Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cxl: docs/linux/memory-hotplug

Add documentation on how the CXL driver surfaces memory through the
DAX driver and memory-hotplug.

Signed-off-by: Gregory Price <gourry@gourry.net>
Link: https://patch.msgid.link/20250512162134.3596150-13-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Authored by Gregory Price and committed by Dave Jiang
641fdea6 36e9f71b

+79
+1
Documentation/driver-api/cxl/index.rst
···
 linux/early-boot
 linux/cxl-driver
 linux/dax-driver
+linux/memory-hotplug
 linux/access-coordinates
+78
Documentation/driver-api/cxl/linux/memory-hotplug.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+==============
+Memory Hotplug
+==============
+The final phase of surfacing CXL memory to the kernel page allocator is for
+the `DAX` driver to surface a `Driver Managed` memory region via the
+memory-hotplug component.
+
+There are four major configurations to consider:
+
+1) Default Online Behavior (on/off and zone)
+2) Hotplug Memory Block size
+3) Memory Map Resource location
+4) Driver-Managed Memory Designation
+
+Default Online Behavior
+=======================
+The default-online behavior of hotplug memory is dictated by the following,
+listed from lowest to highest precedence (each later setting overrides the
+earlier ones):
+
+- :code:`CONFIG_MHP_DEFAULT_ONLINE_TYPE` Build Configuration
+- :code:`memhp_default_state` Boot parameter
+- :code:`/sys/devices/system/memory/auto_online_blocks` value
+
+These dictate whether hotplugged memory blocks arrive in one of three states:
+
+1) Offline
+2) Online in :code:`ZONE_NORMAL`
+3) Online in :code:`ZONE_MOVABLE`
+
+:code:`ZONE_NORMAL` implies this capacity may be used for almost any allocation,
+while :code:`ZONE_MOVABLE` implies this capacity should only be used for
+migratable allocations.
+
+:code:`ZONE_MOVABLE` attempts to retain the hotplug-ability of a memory block
+so that the entire region may be hot-unplugged at a later time. Any capacity
+onlined into :code:`ZONE_NORMAL` should be considered permanently attached to
+the page allocator.
+
+Hotplug Memory Block Size
+=========================
+By default, on most architectures, the Hotplug Memory Block Size is either
+128MB or 256MB. On x86, the block size increases up to 2GB as total memory
+capacity exceeds 64GB. As of v6.15, Linux does not take into account the
+size and alignment of the ACPI CEDT CFMWS regions (see Early Boot docs) when
+deciding the Hotplug Memory Block Size.
+
+Memory Map
+==========
+The location of the :code:`struct folio` allocations that represent the
+hotplugged memory capacity is dictated by the following system settings:
+
+- :code:`/sys/module/memory_hotplug/parameters/memmap_on_memory`
+- :code:`/sys/bus/dax/devices/daxN.Y/memmap_on_memory`
+
+If both of these parameters are set to true, :code:`struct folio` for this
+capacity will be carved out of the memory block being onlined. This has
+performance implications if the memory is particularly high-latency and
+its :code:`struct folio` becomes hotly contended.
+
+If either parameter is set to false, :code:`struct folio` for this capacity
+will be allocated from the local node of the processor running the hotplug
+procedure. This capacity will be allocated from :code:`ZONE_NORMAL` on
+that node, as it is a :code:`GFP_KERNEL` allocation.
+
+Systems with extremely large amounts of :code:`ZONE_MOVABLE` memory (e.g.
+CXL memory pools) must ensure that there is sufficient local
+:code:`ZONE_NORMAL` capacity to host the memory map for the hotplugged capacity.
+
+Driver Managed Memory
+=====================
+The DAX driver surfaces this memory to memory-hotplug as "Driver Managed". This
+is not a configurable setting, but it's important to note that driver managed
+memory is explicitly excluded from use during kexec. This is required to ensure
+any reset or out-of-band operations that the CXL device may be subject to during
+a functional system-reboot (such as a reset-on-probe) will not cause portions of
+the kexec kernel to be overwritten.
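To make the settings described in the patch above easier to inspect, here is a minimal Python sketch. It is not part of the commit. It reads the global hotplug knobs from sysfs and estimates how much local `ZONE_NORMAL` memory the memory map would need when `memmap_on_memory` is disabled. The helper names (`read_sysfs`, `hotplug_settings`, `memmap_bytes`) are hypothetical. The `block_size_bytes` file is not mentioned in the patch and is assumed here to report the block size in hexadecimal. The 64-byte per-page descriptor and 4 KiB page size are assumptions that hold on common x86-64 builds but are not guaranteed elsewhere.

```python
from pathlib import Path
from typing import Optional


def read_sysfs(path: Path) -> Optional[str]:
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def hotplug_settings(sysfs_root: Path = Path("/sys")) -> dict:
    """Collect the global memory-hotplug settings discussed in the patch."""
    mem = sysfs_root / "devices/system/memory"
    # Assumption: block_size_bytes is printed as hex without a 0x prefix.
    block_hex = read_sysfs(mem / "block_size_bytes")
    return {
        "auto_online_blocks": read_sysfs(mem / "auto_online_blocks"),
        "block_size_bytes": int(block_hex, 16) if block_hex else None,
        "memmap_on_memory": read_sysfs(
            sysfs_root / "module/memory_hotplug/parameters/memmap_on_memory"
        ),
    }


def memmap_bytes(capacity_bytes: int,
                 page_size: int = 4096,
                 descriptor_size: int = 64) -> int:
    """Estimate the memory-map size for a hotplugged capacity.

    Assumes one descriptor_size-byte entry per page_size-byte page.
    """
    return (capacity_bytes // page_size) * descriptor_size


if __name__ == "__main__":
    for key, value in hotplug_settings().items():
        print(f"{key}: {value}")
    one_tib = 1 << 40
    print(f"memory map for 1 TiB: {memmap_bytes(one_tib) / (1 << 30):.0f} GiB")
```

Under these assumptions, the memory map costs roughly 1/64 of the hotplugged capacity, which is why large `ZONE_MOVABLE` pools can exhaust a small local `ZONE_NORMAL`. The per-device `/sys/bus/dax/devices/daxN.Y/memmap_on_memory` attribute would need to be checked separately for each DAX device.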