Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: add documents for DAMON

This commit adds documents for DAMON under
`Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`.

Link: https://lkml.kernel.org/r/20210716081449.22187-11-sj38.park@gmail.com
Signed-off-by: SeongJae Park <sjpark@amazon.de>
Reviewed-by: Fernand Sieber <sieberf@amazon.com>
Reviewed-by: Markus Boehme <markubo@amazon.de>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Amit Shah <amit@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Brendan Higgins <brendanhiggins@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Rientjes <rientjes@google.com>
Cc: David Woodhouse <dwmw@amazon.com>
Cc: Fan Du <fan.du@intel.com>
Cc: Greg Kroah-Hartman <greg@kroah.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Joe Perches <joe@perches.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Leonard Foerster <foersleo@amazon.de>
Cc: Marco Elver <elver@google.com>
Cc: Maximilian Heyne <mheyne@amazon.de>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by SeongJae Park and committed by Linus Torvalds
c4ba6014 75c1c2b5

+510
+15
Documentation/admin-guide/mm/damon/index.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+========================
+Monitoring Data Accesses
+========================
+
+:doc:`DAMON </vm/damon/index>` allows light-weight data access monitoring.
+Using DAMON, users can analyze the memory access patterns of their systems and
+optimize those.
+
+.. toctree::
+   :maxdepth: 2
+
+   start
+   usage
+114
Documentation/admin-guide/mm/damon/start.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Getting Started
+===============
+
+This document briefly describes how you can use DAMON by demonstrating its
+default user space tool. Please note that this document describes only a part
+of its features for brevity. Please refer to :doc:`usage` for more details.
+
+
+TL; DR
+======
+
+Follow the commands below to monitor and visualize the memory access pattern of
+your workload. ::
+
+    # # build the kernel with CONFIG_DAMON_*=y, install it, and reboot
+    # mount -t debugfs none /sys/kernel/debug/
+    # git clone https://github.com/awslabs/damo
+    # ./damo/damo record $(pidof <your workload>)
+    # ./damo/damo report heat --plot_ascii
+
+The final command draws the access heatmap of ``<your workload>``. The heatmap
+shows which memory region (x-axis) is accessed when (y-axis) and how frequently
+(number; the higher the more accesses have been observed). ::
+
+    111111111111111111111111111111111111111111111111111111110000
+    111121111111111111111111111111211111111111111111111111110000
+    000000000000000000000000000000000000000000000000001555552000
+    000000000000000000000000000000000000000000000222223555552000
+    000000000000000000000000000000000000000011111677775000000000
+    000000000000000000000000000000000000000488888000000000000000
+    000000000000000000000000000000000177888400000000000000000000
+    000000000000000000000000000046666522222100000000000000000000
+    000000000000000000000014444344444300000000000000000000000000
+    000000000000000002222245555510000000000000000000000000000000
+    # access_frequency: 0 1 2 3 4 5 6 7 8 9
+    # x-axis: space (140286319947776-140286426374096: 101.496 MiB)
+    # y-axis: time (605442256436361-605479951866441: 37.695430s)
+    # resolution: 60x10 (1.692 MiB and 3.770s for each character)
+
+
+Prerequisites
+=============
+
+Kernel
+------
+
+You should first ensure your system is running on a kernel built with
+``CONFIG_DAMON_*=y``.
+
+
+User Space Tool
+---------------
+
+For the demonstration, we will use the default user space tool for DAMON,
+called DAMON Operator (DAMO). It is available at
+https://github.com/awslabs/damo. The examples below assume that ``damo`` is on
+your ``$PATH``. It's not mandatory, though.
+
+Because DAMO is using the debugfs interface (refer to :doc:`usage` for the
+detail) of DAMON, you should ensure debugfs is mounted. Mount it manually as
+below::
+
+    # mount -t debugfs none /sys/kernel/debug/
+
+or append the following line to your ``/etc/fstab`` file so that your system
+can automatically mount debugfs upon booting::
+
+    debugfs /sys/kernel/debug debugfs defaults 0 0
+
+
+Recording Data Access Patterns
+==============================
+
+The commands below record the memory access patterns of a program and save the
+monitoring results to a file. ::
+
+    $ git clone https://github.com/sjp38/masim
+    $ cd masim; make; ./masim ./configs/zigzag.cfg &
+    $ sudo damo record -o damon.data $(pidof masim)
+
+The first two lines of the commands download an artificial memory access
+generator program and run it in the background. The generator will repeatedly
+access two 100 MiB sized memory regions one by one. You can substitute this
+with your real workload. The last line asks ``damo`` to record the access
+pattern in the ``damon.data`` file.
+
+
+Visualizing Recorded Patterns
+=============================
+
+The following three commands visualize the recorded access patterns and save
+the results as separate image files. ::
+
+    $ damo report heats --heatmap access_pattern_heatmap.png
+    $ damo report wss --range 0 101 1 --plot wss_dist.png
+    $ damo report wss --range 0 101 1 --sortby time --plot wss_chron_change.png
+
+- ``access_pattern_heatmap.png`` will visualize the data access pattern in a
+  heatmap, showing which memory region (y-axis) got accessed when (x-axis)
+  and how frequently (color).
+- ``wss_dist.png`` will show the distribution of the working set size.
+- ``wss_chron_change.png`` will show how the working set size has
+  chronologically changed.
+
+You can view the visualizations of this example workload at [1]_.
+Visualizations of other realistic workloads are available at [2]_ [3]_ [4]_.
+
+.. [1] https://damonitor.github.io/doc/html/v17/admin-guide/mm/damon/start.html#visualizing-recorded-patterns
+.. [2] https://damonitor.github.io/test/result/visual/latest/rec.heatmap.1.png.html
+.. [3] https://damonitor.github.io/test/result/visual/latest/rec.wss_sz.png.html
+.. [4] https://damonitor.github.io/test/result/visual/latest/rec.wss_time.png.html
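The ``resolution`` line printed under the ASCII heatmap in ``start.rst`` follows from the two axis ranges. As a rough check, the per-character values can be recomputed as in the sketch below. It assumes the x-axis range is in bytes and the y-axis range is in nanoseconds; that is inferred from the printed sizes and is not stated in the document. ``heatmap_cell`` is a hypothetical helper name, not part of ``damo``.

```python
MIB = 1024 * 1024


def heatmap_cell(space_range, time_range, columns, rows):
    """Return (MiB per column, seconds per row) for an ASCII heatmap.

    Assumes space_range is in bytes and time_range is in nanoseconds.
    """
    space_bytes = space_range[1] - space_range[0]
    time_ns = time_range[1] - time_range[0]
    return space_bytes / MIB / columns, time_ns / 1e9 / rows


# Axis ranges copied from the example heatmap header.
cell_mib, cell_sec = heatmap_cell(
    (140286319947776, 140286426374096),
    (605442256436361, 605479951866441),
    columns=60,
    rows=10,
)
print(f"{cell_mib:.3f} MiB and {cell_sec:.3f}s for each character")
```

Under these assumptions the printed values agree with the ``1.692 MiB and 3.770s`` shown in the example output.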
+112
Documentation/admin-guide/mm/damon/usage.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+===============
+Detailed Usages
+===============
+
+DAMON provides the three interfaces below for different users.
+
+- *DAMON user space tool.*
+  This is for privileged people such as system administrators who want a
+  just-working human-friendly interface. Using this, users can use DAMON’s
+  major features in a human-friendly way. It may not be highly tuned for
+  special cases, though. It supports only virtual address spaces monitoring.
+- *debugfs interface.*
+  This is for privileged user space programmers who want more optimized use of
+  DAMON. Using this, users can use DAMON’s major features by reading
+  from and writing to special debugfs files. Therefore, you can write and use
+  your personalized DAMON debugfs wrapper programs that read and write the
+  debugfs files instead of you. The DAMON user space tool is also a reference
+  implementation of such programs. It supports only virtual address spaces
+  monitoring.
+- *Kernel Space Programming Interface.*
+  This is for kernel space programmers. Using this, users can utilize every
+  feature of DAMON most flexibly and efficiently by writing kernel space
+  DAMON application programs for you. You can even extend DAMON for various
+  address spaces.
+
+Nevertheless, you could write your own user space tool using the debugfs
+interface. A reference implementation is available at
+https://github.com/awslabs/damo. If you are a kernel programmer, you could
+refer to :doc:`/vm/damon/api` for the kernel space programming interface. For
+this reason, this document describes only the debugfs interface.
+
+debugfs Interface
+=================
+
+DAMON exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under
+its debugfs directory, ``<debugfs>/damon/``.
+
+
+Attributes
+----------
+
+Users can get and set the ``sampling interval``, ``aggregation interval``,
+``regions update interval``, and min/max number of monitoring target regions by
+reading from and writing to the ``attrs`` file. To know about the monitoring
+attributes in detail, please refer to the :doc:`/vm/damon/design`. For
+example, the commands below set those values to 5 ms, 100 ms, 1,000 ms, 10 and
+1000, and then check them again::
+
+    # cd <debugfs>/damon
+    # echo 5000 100000 1000000 10 1000 > attrs
+    # cat attrs
+    5000 100000 1000000 10 1000
+
+
+Target IDs
+----------
+
+Some types of address spaces support multiple monitoring targets. For example,
+the virtual memory address spaces monitoring can have multiple processes as the
+monitoring targets. Users can set the targets by writing relevant id values of
+the targets to, and get the ids of the current targets by reading from the
+``target_ids`` file. In case of the virtual address spaces monitoring, the
+values should be pids of the monitoring target processes. For example, the
+commands below set processes having pids 42 and 4242 as the monitoring targets
+and check them again::
+
+    # cd <debugfs>/damon
+    # echo 42 4242 > target_ids
+    # cat target_ids
+    42 4242
+
+Note that setting the target ids doesn't start the monitoring.
+
+
+Turning On/Off
+--------------
+
+Setting the files as described above has no effect unless you explicitly
+start the monitoring. You can start, stop, and check the current status of the
+monitoring by writing to and reading from the ``monitor_on`` file. Writing
+``on`` to the file starts the monitoring of the targets with the attributes.
+Writing ``off`` to the file stops the monitoring. DAMON also stops if every
+target process is terminated. The example commands below turn on, off, and
+check the status of DAMON::
+
+    # cd <debugfs>/damon
+    # echo on > monitor_on
+    # echo off > monitor_on
+    # cat monitor_on
+    off
+
+Please note that you cannot write to the above-mentioned debugfs files while
+the monitoring is turned on. If you write to the files while DAMON is running,
+an error code such as ``-EBUSY`` will be returned.
+
+
+Tracepoint for Monitoring Results
+=================================
+
+DAMON provides the monitoring results via a tracepoint,
+``damon:damon_aggregated``. While the monitoring is turned on, you could
+record the tracepoint events and show results using tracepoint supporting tools
+like ``perf``. For example::
+
+    # echo on > monitor_on
+    # perf record -e damon:damon_aggregated &
+    # sleep 5
+    # kill $(pidof perf)
+    # echo off > monitor_on
+    # perf script
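``usage.rst`` notes that users can write their own debugfs wrapper programs. A minimal sketch of such a wrapper is shown here. The function names (``format_attrs``, ``write_file``, ``configure``) are hypothetical, and the millisecond-to-microsecond conversion is an assumption inferred from the example in the text, where 5 ms is written as ``5000``. The demonstration writes into a temporary directory; on a real system the directory would be ``<debugfs>/damon`` and root privileges would be needed.

```python
import os
import tempfile


def format_attrs(sample_ms, aggr_ms, update_ms, min_regions, max_regions):
    """Build the string for the ``attrs`` file.

    Assumption: the three intervals are written in microseconds, as the
    "5 ms -> 5000" example in the document suggests.
    """
    intervals_us = [int(ms * 1000) for ms in (sample_ms, aggr_ms, update_ms)]
    return " ".join(str(v) for v in intervals_us + [min_regions, max_regions])


def write_file(damon_dir, name, value):
    """Write one value to a file under the DAMON debugfs directory."""
    with open(os.path.join(damon_dir, name), "w") as f:
        f.write(value + "\n")


def configure(damon_dir, attrs, pids):
    """Set the attributes and the target pids; this does not start monitoring."""
    write_file(damon_dir, "attrs", attrs)
    write_file(damon_dir, "target_ids", " ".join(str(p) for p in pids))


# Demonstration against a scratch directory instead of a real debugfs mount.
with tempfile.TemporaryDirectory() as scratch_dir:
    configure(scratch_dir, format_attrs(5, 100, 1000, 10, 1000), [42, 4242])
    with open(os.path.join(scratch_dir, "attrs")) as f:
        print(f.read().strip())
```

Starting and stopping would then be a matter of writing ``on`` or ``off`` to ``monitor_on`` with the same ``write_file`` helper, keeping in mind that the other files cannot be written while monitoring is on.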
+1
Documentation/admin-guide/mm/index.rst
@@ -27,6 +27,7 @@
 
    concepts
    cma_debugfs
+   damon/index
    hugetlbpage
    idle_page_tracking
    ksm
+20
Documentation/vm/damon/api.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+=============
+API Reference
+=============
+
+Kernel space programs can use every feature of DAMON using the APIs below. All
+you need to do is to include ``damon.h``, which is located in
+``include/linux/`` of the source tree.
+
+Structures
+==========
+
+.. kernel-doc:: include/linux/damon.h
+
+
+Functions
+=========
+
+.. kernel-doc:: mm/damon/core.c
+166
Documentation/vm/damon/design.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+======
+Design
+======
+
+Configurable Layers
+===================
+
+DAMON provides data access monitoring functionality while making the accuracy
+and the overhead controllable. The fundamental access monitorings require
+primitives that are dependent on and optimized for the target address space. On
+the other hand, the accuracy and overhead tradeoff mechanism, which is the core
+of DAMON, is in the pure logic space. DAMON separates the two parts in
+different layers and defines its interface to allow various low level
+primitives implementations configurable with the core logic.
+
+Due to this separated design and the configurable interface, users can extend
+DAMON for any address space by configuring the core logics with appropriate low
+level primitive implementations. If an appropriate one is not provided, users
+can implement the primitives on their own.
+
+For example, physical memory, virtual memory, swap space, those for specific
+processes, NUMA nodes, files, and backing memory devices would be supportable.
+Also, if some architectures or devices support special optimized access check
+primitives, those will be easily configurable.
+
+
+Reference Implementations of Address Space Specific Primitives
+==============================================================
+
+The low level primitives for the fundamental access monitoring are defined in
+two parts:
+
+1. Identification of the monitoring target address range for the address space.
+2. Access check of specific address range in the target space.
+
+DAMON currently provides the implementation of the primitives for only the
+virtual address spaces. The two subsections below describe how it works.
+
+
+VMA-based Target Address Range Construction
+-------------------------------------------
+
+Only small parts in the super-huge virtual address space of the processes are
+mapped to the physical memory and accessed. Thus, tracking the unmapped
+address regions is just wasteful. However, because DAMON can deal with some
+level of noise using the adaptive regions adjustment mechanism, tracking every
+mapping is not strictly required but could even incur a high overhead in some
+cases. That said, too huge unmapped areas inside the monitoring target should
+be removed to not take the time for the adaptive mechanism.
+
+For this reason, this implementation converts the complex mappings to three
+distinct regions that cover every mapped area of the address space. The two
+gaps between the three regions are the two biggest unmapped areas in the given
+address space. The two biggest unmapped areas would be the gap between the
+heap and the uppermost mmap()-ed region, and the gap between the lowermost
+mmap()-ed region and the stack in most of the cases. Because these gaps are
+exceptionally huge in usual address spaces, excluding these will be sufficient
+to make a reasonable trade-off. Below shows this in detail::
+
+    <heap>
+    <BIG UNMAPPED REGION 1>
+    <uppermost mmap()-ed region>
+    (small mmap()-ed regions and munmap()-ed regions)
+    <lowermost mmap()-ed region>
+    <BIG UNMAPPED REGION 2>
+    <stack>
+
+
+PTE Accessed-bit Based Access Check
+-----------------------------------
+
+The implementation for the virtual address space uses PTE Accessed-bit for
+basic access checks. It finds the relevant PTE Accessed bit from the address
+by walking the page table for the target task of the address. In this way, the
+implementation finds and clears the bit for next sampling target address and
+checks whether the bit is set again after one sampling period. This could
+disturb other kernel subsystems using the Accessed bits, namely Idle page
+tracking and the reclaim logic. To avoid such disturbances, DAMON makes it
+mutually exclusive with Idle page tracking and uses ``PG_idle`` and
+``PG_young`` page flags to solve the conflict with the reclaim logic, as Idle
+page tracking does.
+
+
+Address Space Independent Core Mechanisms
+=========================================
+
+The four sections below describe each of the DAMON core mechanisms and the five
+monitoring attributes, ``sampling interval``, ``aggregation interval``,
+``regions update interval``, ``minimum number of regions``, and ``maximum
+number of regions``.
+
+
+Access Frequency Monitoring
+---------------------------
+
+The output of DAMON says what pages are how frequently accessed for a given
+duration. The resolution of the access frequency is controlled by setting
+``sampling interval`` and ``aggregation interval``. In detail, DAMON checks
+access to each page per ``sampling interval`` and aggregates the results. In
+other words, it counts the number of the accesses to each page. After each
+``aggregation interval`` passes, DAMON calls callback functions that were
+previously registered by users so that users can read the aggregated results,
+and then clears the results. This can be described in the simple pseudo-code
+below::
+
+    while monitoring_on:
+        for page in monitoring_target:
+            if accessed(page):
+                nr_accesses[page] += 1
+        if time() % aggregation_interval == 0:
+            for callback in user_registered_callbacks:
+                callback(monitoring_target, nr_accesses)
+            for page in monitoring_target:
+                nr_accesses[page] = 0
+        sleep(sampling_interval)
+
+The monitoring overhead of this mechanism will arbitrarily increase as the
+size of the target workload grows.
+
+
+Region Based Sampling
+---------------------
+
+To avoid the unbounded increase of the overhead, DAMON groups adjacent pages
+that are assumed to have the same access frequencies into a region. As long as
+the assumption (pages in a region have the same access frequencies) is kept,
+only one page in the region is required to be checked. Thus, for each
+``sampling interval``, DAMON randomly picks one page in each region, waits for
+one ``sampling interval``, checks whether the page is accessed meanwhile, and
+increases the access frequency of the region if so. Therefore, the monitoring
+overhead is controllable by setting the number of regions. DAMON allows users
+to set the minimum and the maximum number of regions for the trade-off.
+
+This scheme, however, cannot preserve the quality of the output if the
+assumption is not guaranteed.
+
+
+Adaptive Regions Adjustment
+---------------------------
+
+Even if the initial monitoring target regions are somehow well constructed to
+fulfill the assumption (pages in same region have similar access frequencies),
+the data access pattern can be dynamically changed. This will result in low
+monitoring quality. To keep the assumption as much as possible, DAMON
+adaptively merges and splits each region based on their access frequency.
+
+For each ``aggregation interval``, it compares the access frequencies of
+adjacent regions and merges those if the frequency difference is small. Then,
+after it reports and clears the aggregated access frequency of each region, it
+splits each region into two or three regions if the total number of regions
+will not exceed the user-specified maximum number of regions after the split.
+
+In this way, DAMON provides its best-effort quality and minimal overhead while
+keeping the bounds users set for their trade-off.
+
+
+Dynamic Target Space Updates Handling
+-------------------------------------
+
+The monitoring target address range could be dynamically changed. For example,
+virtual memory could be dynamically mapped and unmapped. Physical memory could
+be hot-plugged.
+
+As the changes could be quite frequent in some cases, DAMON checks the dynamic
+memory mapping changes and applies them to the abstracted target area only for
+each of a user-specified time interval (``regions update interval``).
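The region based sampling and the merge step of the adaptive regions adjustment described in ``design.rst`` can be approximated with a small user space simulation. This is only an illustrative sketch, not DAMON's kernel code: ``sample_regions``, ``merge_similar``, the ``threshold`` parameter, and the size-weighted average used for a merged region are all assumptions made for the example (the document only says regions are merged "if the frequency difference is small").

```python
import random


def sample_regions(regions, is_accessed, nr_samples, rng=None):
    """Check one randomly picked page per region for each sampling round.

    regions: list of (start, end) page ranges, end exclusive.
    is_accessed: callable telling whether a page was accessed (hypothetical
    stand-in for the Accessed-bit check).
    """
    rng = rng or random.Random()
    nr_accesses = [0] * len(regions)
    for _ in range(nr_samples):
        for i, (start, end) in enumerate(regions):
            page = rng.randrange(start, end)
            if is_accessed(page):
                nr_accesses[i] += 1
    return nr_accesses


def merge_similar(regions, nr_accesses, threshold):
    """Merge adjacent regions whose access counts differ by <= threshold."""
    merged_regions, merged_counts = [], []
    for (start, end), count in zip(regions, nr_accesses):
        if (merged_regions and merged_regions[-1][1] == start
                and abs(merged_counts[-1] - count) <= threshold):
            prev_start, prev_end = merged_regions[-1]
            prev_size, size = prev_end - prev_start, end - start
            # Assumption: the merged count is the size-weighted average.
            merged_counts[-1] = ((merged_counts[-1] * prev_size + count * size)
                                 // (prev_size + size))
            merged_regions[-1] = (prev_start, end)
        else:
            merged_regions.append((start, end))
            merged_counts.append(count)
    return merged_regions, merged_counts


# Pages 0-199 are hot, pages 200-299 are never accessed.
regions = [(0, 100), (100, 200), (200, 300)]
counts = sample_regions(regions, lambda page: page < 200, nr_samples=20,
                        rng=random.Random(0))
print(merge_similar(regions, counts, threshold=1))
```

The cost of one sampling round depends on the number of regions rather than the number of pages, which is the property the document relies on to bound the overhead.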
+51
Documentation/vm/damon/faq.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+Frequently Asked Questions
+==========================
+
+Why a new subsystem, instead of extending perf or other user space tools?
+=========================================================================
+
+First, because it needs to be as lightweight as possible so that it can be
+used online, any unnecessary overhead such as kernel - user space context
+switching cost should be avoided. Second, DAMON aims to be used by other
+programs including the kernel. Therefore, having a dependency on specific
+tools like perf is not desirable. These are the two biggest reasons why DAMON
+is implemented in the kernel space.
+
+
+Can 'idle pages tracking' or 'perf mem' substitute DAMON?
+=========================================================
+
+Idle page tracking is a low level primitive for access check of the physical
+address space. 'perf mem' is similar, though it can use sampling to minimize
+the overhead. On the other hand, DAMON is a higher-level framework for the
+monitoring of various address spaces. It is focused on memory management
+optimization and provides sophisticated accuracy/overhead handling mechanisms.
+Therefore, 'idle pages tracking' and 'perf mem' could provide a subset of
+DAMON's output, but cannot substitute DAMON.
+
+
+Does DAMON support virtual memory only?
+=======================================
+
+No. The core of DAMON is address space independent. The address space
+specific low level primitive parts including monitoring target regions
+constructions and actual access checks can be implemented and configured on the
+DAMON core by the users. In this way, DAMON users can monitor any address
+space with any access check technique.
+
+Nonetheless, DAMON provides vma tracking and PTE Accessed bit check based
+implementations of the address space dependent functions for the virtual memory
+by default, for a reference and convenient use. In the near future, we will
+provide those for physical memory address space.
+
+
+Can I simply monitor page granularity?
+======================================
+
+Yes. You can do so by setting the ``min_nr_regions`` attribute higher than the
+working set size divided by the page size. Because the monitoring target
+regions size is forced to be ``>=page size``, the region split will have no
+effect.
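The page granularity answer in ``faq.rst`` comes down to simple arithmetic. A hedged worked example follows; the helper name is hypothetical, and the 4 KiB page size and 100 MiB working set are assumed values chosen only for illustration.

```python
def min_regions_for_page_granularity(working_set_bytes, page_size=4096):
    """Smallest region count that is higher than working set size / page size.

    page_size defaults to 4096 bytes, which is an assumption; use the page
    size of the actual system.
    """
    return working_set_bytes // page_size + 1


# A 100 MiB working set with 4 KiB pages spans 25600 pages.
print(min_regions_for_page_granularity(100 * 1024 * 1024))
```

With these assumed numbers, setting ``min_nr_regions`` to 25601 or more would satisfy the condition stated in the answer.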
+30
Documentation/vm/damon/index.rst
+.. SPDX-License-Identifier: GPL-2.0
+
+==========================
+DAMON: Data Access MONitor
+==========================
+
+DAMON is a data access monitoring framework subsystem for the Linux kernel.
+The core mechanisms of DAMON (refer to :doc:`design` for the detail) make it
+
+ - *accurate* (the monitoring output is useful enough for DRAM level memory
+   management; it might not be appropriate for CPU Cache levels, though),
+ - *light-weight* (the monitoring overhead is low enough to be applied online),
+   and
+ - *scalable* (the upper-bound of the overhead is in constant range regardless
+   of the size of target workloads).
+
+Using this framework, therefore, the kernel's memory management mechanisms can
+make advanced decisions. Experimental memory management optimization works
+that incurred high data access monitoring overhead could be implemented again.
+In user space, meanwhile, users who have some special workloads can write
+personalized applications for better understanding and optimizations of their
+workloads and systems.
+
+.. toctree::
+   :maxdepth: 2
+
+   faq
+   design
+   api
+   plans
+1
Documentation/vm/index.rst
@@ -32,6 +32,7 @@
    arch_pgtable_helpers
    balance
    cleancache
+   damon/index
    free_page_reporting
    frontswap
    highmem