Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sched/numa-balancing: Move some document to make it consistent with the code

After commit 8a99b6833c88 ("sched: Move SCHED_DEBUG sysctl to
debugfs"), some NUMA balancing sysctls enclosed with SCHED_DEBUG have
been moved to debugfs. This patch moves the documentation for these
sysctls from

Documentation/admin-guide/sysctl/kernel.rst

to

Documentation/scheduler/sched-debug.rst

to make the documentation consistent with the code.

Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Link: https://lkml.kernel.org/r/20220210052514.3038279-1-ying.huang@intel.com

Authored by Huang Ying and committed by Peter Zijlstra
3624ba7b e496132e

+56 -45
+1 -45
Documentation/admin-guide/sysctl/kernel.rst
@@ -609,51 +609,7 @@
 The unmapping of pages and trapping faults incur additional overhead that
 ideally is offset by improved memory locality but there is no universal
 guarantee. If the target workload is already bound to NUMA nodes then this
-feature should be disabled. Otherwise, if the system overhead from the
-feature is too high then the rate the kernel samples for NUMA hinting
-faults may be controlled by the `numa_balancing_scan_period_min_ms,
-numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms,
-numa_balancing_scan_size_mb`_, and numa_balancing_settle_count sysctls.
-
-
-numa_balancing_scan_period_min_ms, numa_balancing_scan_delay_ms, numa_balancing_scan_period_max_ms, numa_balancing_scan_size_mb
-===============================================================================================================================
-
-
-Automatic NUMA balancing scans tasks address space and unmaps pages to
-detect if pages are properly placed or if the data should be migrated to a
-memory node local to where the task is running. Every "scan delay" the task
-scans the next "scan size" number of pages in its address space. When the
-end of the address space is reached the scanner restarts from the beginning.
-
-In combination, the "scan delay" and "scan size" determine the scan rate.
-When "scan delay" decreases, the scan rate increases. The scan delay and
-hence the scan rate of every task is adaptive and depends on historical
-behaviour. If pages are properly placed then the scan delay increases,
-otherwise the scan delay decreases. The "scan size" is not adaptive but
-the higher the "scan size", the higher the scan rate.
-
-Higher scan rates incur higher system overhead as page faults must be
-trapped and potentially data must be migrated. However, the higher the scan
-rate, the more quickly a tasks memory is migrated to a local node if the
-workload pattern changes and minimises performance impact due to remote
-memory accesses. These sysctls control the thresholds for scan delays and
-the number of pages scanned.
-
-``numa_balancing_scan_period_min_ms`` is the minimum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the maximum scanning
-rate for each task.
-
-``numa_balancing_scan_delay_ms`` is the starting "scan delay" used for a task
-when it initially forks.
-
-``numa_balancing_scan_period_max_ms`` is the maximum time in milliseconds to
-scan a tasks virtual memory. It effectively controls the minimum scanning
-rate for each task.
-
-``numa_balancing_scan_size_mb`` is how many megabytes worth of pages are
-scanned for a given scan.
-
+feature should be disabled.
 
 oops_all_cpu_backtrace
 ======================
+1
Documentation/scheduler/index.rst
@@ -17,6 +17,7 @@
 sched-nice-design
 sched-rt-group
 sched-stats
+sched-debug
 
 text_files
 
+54
Documentation/scheduler/sched-debug.rst
@@ -0,0 +1,54 @@
+=================
+Scheduler debugfs
+=================
+
+Booting a kernel with CONFIG_SCHED_DEBUG=y will give access to
+scheduler specific debug files under /sys/kernel/debug/sched. Some of
+those files are described below.
+
+numa_balancing
+==============
+
+`numa_balancing` directory is used to hold files to control NUMA
+balancing feature. If the system overhead from the feature is too
+high then the rate the kernel samples for NUMA hinting faults may be
+controlled by the `scan_period_min_ms, scan_delay_ms,
+scan_period_max_ms, scan_size_mb` files.
+
+
+scan_period_min_ms, scan_delay_ms, scan_period_max_ms, scan_size_mb
+-------------------------------------------------------------------
+
+Automatic NUMA balancing scans tasks address space and unmaps pages to
+detect if pages are properly placed or if the data should be migrated to a
+memory node local to where the task is running. Every "scan delay" the task
+scans the next "scan size" number of pages in its address space. When the
+end of the address space is reached the scanner restarts from the beginning.
+
+In combination, the "scan delay" and "scan size" determine the scan rate.
+When "scan delay" decreases, the scan rate increases. The scan delay and
+hence the scan rate of every task is adaptive and depends on historical
+behaviour. If pages are properly placed then the scan delay increases,
+otherwise the scan delay decreases. The "scan size" is not adaptive but
+the higher the "scan size", the higher the scan rate.
+
+Higher scan rates incur higher system overhead as page faults must be
+trapped and potentially data must be migrated. However, the higher the scan
+rate, the more quickly a tasks memory is migrated to a local node if the
+workload pattern changes and minimises performance impact due to remote
+memory accesses. These files control the thresholds for scan delays and
+the number of pages scanned.
+
+``scan_period_min_ms`` is the minimum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the maximum scanning
+rate for each task.
+
+``scan_delay_ms`` is the starting "scan delay" used for a task when it
+initially forks.
+
+``scan_period_max_ms`` is the maximum time in milliseconds to scan a
+tasks virtual memory. It effectively controls the minimum scanning
+rate for each task.
+
+``scan_size_mb`` is how many megabytes worth of pages are scanned for
+a given scan.
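
To show where the four knobs live after this change, here is a minimal Python sketch (not part of the patch) that reads them from the debugfs directory named in sched-debug.rst. The function name `read_numa_balancing_knobs` and the constants are hypothetical. The sketch assumes that debugfs is mounted at /sys/kernel/debug and that each file holds a single integer. It skips any file that is missing or unreadable, which is what one would expect on a kernel built without CONFIG_SCHED_DEBUG=y or when running without sufficient privileges.

```python
from pathlib import Path

# Directory named in sched-debug.rst. Assumes debugfs is mounted at
# /sys/kernel/debug and the kernel was built with CONFIG_SCHED_DEBUG=y.
NUMA_BALANCING_DIR = Path("/sys/kernel/debug/sched/numa_balancing")

# File names documented in sched-debug.rst.
KNOBS = (
    "scan_period_min_ms",
    "scan_delay_ms",
    "scan_period_max_ms",
    "scan_size_mb",
)


def read_numa_balancing_knobs(base_dir=NUMA_BALANCING_DIR):
    """Return {knob name: integer value} for every documented file that can be read.

    Hypothetical helper for illustration; assumes each file contains one integer.
    """
    values = {}
    for name in KNOBS:
        try:
            values[name] = int((Path(base_dir) / name).read_text().strip())
        except (OSError, ValueError):
            # Missing file, insufficient permission, or unexpected content:
            # skip this knob rather than guess a value.
            continue
    return values


if __name__ == "__main__":
    for name, value in read_numa_balancing_knobs().items():
        print(f"{name} = {value}")
```

Passing a different `base_dir` makes the helper usable against a copy of the directory, for example when inspecting values captured from another machine.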