Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

memory-hotplug.rst: document the "auto-movable" online policy

Commit e83a437faa62 ("mm/memory_hotplug: introduce "auto-movable" online
policy") introduced a new memory online policy to automatically select a
zone for memory blocks to be onlined. It added a way to set the active
online policy and tunables for the auto-movable online policy.

Follow-up commits tweaked the "auto-movable" policy to also consider
memory device details when selecting zones for memory blocks to be
onlined.

Let's document the new toggles and how our two online policies
work.

[david@redhat.com: updates]
Link: https://lkml.kernel.org/r/20211011082058.6076-4-david@redhat.com

Link: https://lkml.kernel.org/r/20210930144117.23641-4-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by David Hildenbrand and committed by Linus Torvalds.
9e122cc1 a8db400f

+120 -19
Documentation/admin-guide/mm/memory-hotplug.rst
@@ -165,9 +165,8 @@
 
 	% echo 1 > /sys/devices/system/memory/memoryXXX/online
 
-The kernel will select the target zone automatically, usually defaulting to
-``ZONE_NORMAL`` unless ``movable_node`` has been specified on the kernel
-command line or if the memory block would intersect the ZONE_MOVABLE already.
+The kernel will select the target zone automatically, depending on the
+configured ``online_policy``.
 
 One can explicitly request to associate an offline memory block with
 ZONE_MOVABLE by::
@@ -196,6 +197,9 @@
 ``online_movable`` to that file, like::
 
 	% echo online > /sys/devices/system/memory/auto_online_blocks
+
+Similarly to manual onlining, with ``online`` the kernel will select the
+target zone automatically, depending on the configured ``online_policy``.
 
 Modifying the auto-online behavior will only affect all subsequently added
 memory blocks only.
@@ -395,10 +393,15 @@
 ======================== =======================================================
 ``memhp_default_state`` configure auto-onlining by essentially setting
                          ``/sys/devices/system/memory/auto_online_blocks``.
-``movable_node``         configure automatic zone selection in the kernel. When
-                         set, the kernel will default to ZONE_MOVABLE, unless
-                         other zones can be kept contiguous.
+``movable_node``         configure automatic zone selection in the kernel when
+                         using the ``contig-zones`` online policy. When
+                         set, the kernel will default to ZONE_MOVABLE when
+                         onlining a memory block, unless other zones can be kept
+                         contiguous.
 ======================== =======================================================
+
+See Documentation/admin-guide/kernel-parameters.txt for a more generic
+description of these command line parameters.
 
 Module Parameters
 ------------------
@@ -421,20 +414,114 @@
 
 The following module parameters are currently defined:
 
-======================== =======================================================
-``memmap_on_memory``     read-write: Allocate memory for the memmap from the
-                         added memory block itself. Even if enabled, actual
-                         support depends on various other system properties and
-                         should only be regarded as a hint whether the behavior
-                         would be desired.
+================================ ===============================================
+``memmap_on_memory``             read-write: Allocate memory for the memmap from
+                                 the added memory block itself. Even if enabled,
+                                 actual support depends on various other system
+                                 properties and should only be regarded as a
+                                 hint whether the behavior would be desired.
 
-                         While allocating the memmap from the memory block
-                         itself makes memory hotplug less likely to fail and
-                         keeps the memmap on the same NUMA node in any case, it
-                         can fragment physical memory in a way that huge pages
-                         in bigger granularity cannot be formed on hotplugged
-                         memory.
-======================== =======================================================
+                                 While allocating the memmap from the memory
+                                 block itself makes memory hotplug less likely
+                                 to fail and keeps the memmap on the same NUMA
+                                 node in any case, it can fragment physical
+                                 memory in a way that huge pages in bigger
+                                 granularity cannot be formed on hotplugged
+                                 memory.
+``online_policy``                read-write: Set the basic policy used for
+                                 automatic zone selection when onlining memory
+                                 blocks without specifying a target zone.
+                                 ``contig-zones`` has been the kernel default
+                                 before this parameter was added. After an
+                                 online policy was configured and memory was
+                                 online, the policy should not be changed
+                                 anymore.
+
+                                 When set to ``contig-zones``, the kernel will
+                                 try keeping zones contiguous. If a memory block
+                                 intersects multiple zones or no zone, the
+                                 behavior depends on the ``movable_node`` kernel
+                                 command line parameter: default to ZONE_MOVABLE
+                                 if set, default to the applicable kernel zone
+                                 (usually ZONE_NORMAL) if not set.
+
+                                 When set to ``auto-movable``, the kernel will
+                                 try onlining memory blocks to ZONE_MOVABLE if
+                                 possible according to the configuration and
+                                 memory device details. With this policy, one
+                                 can avoid zone imbalances when eventually
+                                 hotplugging a lot of memory later and still
+                                 wanting to be able to hotunplug as much as
+                                 possible reliably, very desirable in
+                                 virtualized environments. This policy ignores
+                                 the ``movable_node`` kernel command line
+                                 parameter and isn't really applicable in
+                                 environments that require it (e.g., bare metal
+                                 with hotunpluggable nodes) where hotplugged
+                                 memory might be exposed via the
+                                 firmware-provided memory map early during boot
+                                 to the system instead of getting detected,
+                                 added and onlined later during boot (such as
+                                 done by virtio-mem or by some hypervisors
+                                 implementing emulated DIMMs). As one example, a
+                                 hotplugged DIMM will be onlined either
+                                 completely to ZONE_MOVABLE or completely to
+                                 ZONE_NORMAL, not a mixture.
+                                 As another example, as many memory blocks
+                                 belonging to a virtio-mem device will be
+                                 onlined to ZONE_MOVABLE as possible,
+                                 special-casing units of memory blocks that can
+                                 only get hotunplugged together. *This policy
+                                 does not protect from setups that are
+                                 problematic with ZONE_MOVABLE and does not
+                                 change the zone of memory blocks dynamically
+                                 after they were onlined.*
+``auto_movable_ratio``           read-write: Set the maximum MOVABLE:KERNEL
+                                 memory ratio in % for the ``auto-movable``
+                                 online policy. Whether the ratio applies only
+                                 for the system across all NUMA nodes or also
+                                 per NUMA node depends on the
+                                 ``auto_movable_numa_aware`` configuration.
+
+                                 All accounting is based on present memory pages
+                                 in the zones combined with accounting per
+                                 memory device. Memory dedicated to the CMA
+                                 allocator is accounted as MOVABLE, although
+                                 residing on one of the kernel zones. The
+                                 possible ratio depends on the actual workload.
+                                 The kernel default is "301" %, for example,
+                                 allowing for hotplugging 24 GiB to an 8 GiB VM
+                                 and automatically onlining all hotplugged
+                                 memory to ZONE_MOVABLE in many setups. The
+                                 additional 1% deals with some pages not being
+                                 present, for example, because of some firmware
+                                 allocations.
+
+                                 Note that ZONE_NORMAL memory provided by one
+                                 memory device does not allow for more
+                                 ZONE_MOVABLE memory for a different memory
+                                 device. As one example, onlining memory of a
+                                 hotplugged DIMM to ZONE_NORMAL will not allow
+                                 for another hotplugged DIMM to get onlined to
+                                 ZONE_MOVABLE automatically. In contrast, memory
+                                 hotplugged by a virtio-mem device that got
+                                 onlined to ZONE_NORMAL will allow for more
+                                 ZONE_MOVABLE memory within *the same*
+                                 virtio-mem device.
+``auto_movable_numa_aware``      read-write: Configure whether the
+                                 ``auto_movable_ratio`` in the ``auto-movable``
+                                 online policy also applies per NUMA
+                                 node in addition to the whole system across all
+                                 NUMA nodes. The kernel default is "Y".
+
+                                 Disabling NUMA awareness can be helpful when
+                                 dealing with NUMA nodes that should be
+                                 completely hotunpluggable, onlining the memory
+                                 completely to ZONE_MOVABLE automatically if
+                                 possible.
+
+                                 Parameter availability depends on CONFIG_NUMA.
+================================ ===============================================
 
 ZONE_MOVABLE
 ============
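The new module parameters are read-write, so they can be adjusted at runtime. The following Python sketch shows one way to do that from a script. It assumes the parameters are exposed under `/sys/module/memory_hotplug/parameters/` (the usual location for this module's parameters, not shown in this diff) and that the caller runs as root; the helper names are hypothetical. Because the documentation says the policy should not be changed once memory is online, the sketch is meant to run early, before any hotplugged memory gets onlined.

```python
from pathlib import Path

# Assumed location of the memory_hotplug module parameters (not part of this diff).
PARAM_DIR = Path("/sys/module/memory_hotplug/parameters")

# Values accepted by online_policy, as documented in the table above.
VALID_POLICIES = ("contig-zones", "auto-movable")


def read_param(name: str, base: Path = PARAM_DIR) -> str:
    """Return the current value of a memory_hotplug module parameter."""
    return (base / name).read_text().strip()


def set_online_policy(policy: str, base: Path = PARAM_DIR) -> None:
    """Select the online policy; intended to run before memory is onlined."""
    if policy not in VALID_POLICIES:
        raise ValueError(f"unknown online policy: {policy!r}")
    (base / "online_policy").write_text(policy + "\n")


def set_auto_movable_ratio(percent: int, base: Path = PARAM_DIR) -> None:
    """Set the maximum MOVABLE:KERNEL ratio (in %) used by auto-movable."""
    if percent < 0:
        raise ValueError("ratio must be non-negative")
    (base / "auto_movable_ratio").write_text(f"{percent}\n")
```

Setting the policy on the kernel command line instead (for example `memory_hotplug.online_policy=auto-movable`) sidesteps the question of changing it after memory has already been onlined.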
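The sysfs interactions in the first two hunks (onlining a block via its `online` file, or requesting a zone explicitly, and configuring `auto_online_blocks`) can be wrapped in a small Python helper. This is a sketch under stated assumptions: the function names are hypothetical, `memory{block_id}` stands in for the documentation's `memoryXXX` placeholder, and `online_kernel`/`online_movable` as `state` values come from the surrounding memory hotplug documentation rather than from this diff. Root privileges are required on a real system.

```python
from pathlib import Path
from typing import Optional

MEMORY_DIR = Path("/sys/devices/system/memory")

# Modes accepted by auto_online_blocks ("offline" disables auto-onlining).
AUTO_ONLINE_VALUES = ("offline", "online", "online_kernel", "online_movable")


def online_block(block_id: int, zone: Optional[str] = None,
                 base: Path = MEMORY_DIR) -> None:
    """Online one memory block, optionally forcing the target zone."""
    block = base / f"memory{block_id}"  # "memoryXXX" in the documentation
    if zone is None:
        # Same as "echo 1 > .../online": the configured online_policy picks the zone.
        (block / "online").write_text("1\n")
    elif zone == "movable":
        (block / "state").write_text("online_movable\n")
    elif zone == "kernel":
        (block / "state").write_text("online_kernel\n")
    else:
        raise ValueError(f"unknown zone request: {zone!r}")


def set_auto_online(mode: str, base: Path = MEMORY_DIR) -> None:
    """Configure auto-onlining; only affects subsequently added blocks."""
    if mode not in AUTO_ONLINE_VALUES:
        raise ValueError(f"unsupported auto-online mode: {mode!r}")
    (base / "auto_online_blocks").write_text(mode + "\n")
```

Passing no zone is the case the patch changes: both manual onlining and `auto_online_blocks=online` now defer the zone decision to `online_policy`.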
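The `contig-zones` behavior described for `online_policy` and `movable_node` boils down to a small decision. The Python sketch below models it under simplifying assumptions: a memory block is reduced to two hypothetical flags (whether it intersects an existing kernel zone and whether it intersects ZONE_MOVABLE), and the function name is illustrative. It is a model of the documented rules, not the kernel's implementation.

```python
from enum import Enum


class Zone(Enum):
    KERNEL = "kernel zone (usually ZONE_NORMAL)"
    MOVABLE = "ZONE_MOVABLE"


def contig_zones_target(intersects_kernel: bool, intersects_movable: bool,
                        movable_node: bool) -> Zone:
    """Simplified model of zone selection under the contig-zones policy."""
    # Block overlaps exactly one zone: reuse it so zones stay contiguous.
    if intersects_kernel and not intersects_movable:
        return Zone.KERNEL
    if intersects_movable and not intersects_kernel:
        return Zone.MOVABLE
    # Multiple zones or no zone: the movable_node parameter picks the default.
    return Zone.MOVABLE if movable_node else Zone.KERNEL
```

Note that `movable_node` only matters in the ambiguous cases; a block that can extend a single existing zone keeps that zone contiguous regardless of the parameter.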
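The `auto_movable_ratio` text (present-memory accounting, CMA counted as MOVABLE, the 301% default, and the optional per-node check from `auto_movable_numa_aware`) can be made concrete with a small Python model. It is a deliberately simplified sketch: the class and function names are hypothetical, the `<=` comparison is an assumption, and it omits the per-memory-device accounting (DIMMs onlined all-or-nothing, virtio-mem crediting only its own kernel memory) that the real policy performs.

```python
from dataclasses import dataclass


@dataclass
class ZoneStats:
    """Present memory in one scope (system or NUMA node), any consistent unit."""
    kernel_present: int   # present memory in kernel zones, including CMA
    movable_present: int  # present memory in ZONE_MOVABLE
    cma: int = 0          # part of kernel_present dedicated to CMA

    def kernel(self) -> int:
        # CMA resides in a kernel zone but is accounted as MOVABLE.
        return self.kernel_present - self.cma

    def movable(self) -> int:
        return self.movable_present + self.cma


def ratio_allows(stats: ZoneStats, nr_new: int, ratio_percent: int) -> bool:
    """True if adding nr_new MOVABLE memory keeps MOVABLE:KERNEL <= ratio %."""
    # Integer form of (movable + nr_new) / kernel <= ratio_percent / 100.
    return (stats.movable() + nr_new) * 100 <= ratio_percent * stats.kernel()


def may_online_movable(system: ZoneStats, node: ZoneStats, nr_new: int,
                       ratio_percent: int = 301, numa_aware: bool = True) -> bool:
    """Hypothetical auto-movable check: system-wide, plus per node if NUMA aware."""
    if not ratio_allows(system, nr_new, ratio_percent):
        return False
    return not numa_aware or ratio_allows(node, nr_new, ratio_percent)
```

Working the documented example in MiB: 24 GiB on an 8 GiB VM is exactly 300%. If a hypothetical 16 MiB of the 8 GiB is not present (8176 MiB of kernel memory), onlining 24576 MiB to ZONE_MOVABLE gives 2457600 ≤ 301 × 8176 = 2460976, which passes, whereas 300 × 8176 = 2452800 would fail. That is the situation the extra 1% is meant to absorb.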