Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cxl: docs/platform/bios-and-efi documentation

Add some docs on CXL configurations done in bios/efi that affect
linux configuration - information vendors may care to consider.

Signed-off-by: Gregory Price <gourry@gourry.net>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Link: https://patch.msgid.link/20250512162134.3596150-5-gourry@gourry.net
Signed-off-by: Dave Jiang <dave.jiang@intel.com>

Authored by Gregory Price and committed by Dave Jiang
e4528b9e 750d662c

+268
+6
Documentation/driver-api/cxl/index.rst
@@ -22,6 +22,12 @@
    devices/device-types

 .. toctree::
+   :maxdepth: 2
+   :caption: Platform Configuration
+
+   platform/bios-and-efi
+
+.. toctree::
    :maxdepth: 1
    :caption: Linux Kernel Configuration

+262
Documentation/driver-api/cxl/platform/bios-and-efi.rst
.. SPDX-License-Identifier: GPL-2.0

======================
BIOS/EFI Configuration
======================

BIOS and EFI are largely responsible for configuring static information about
devices (or potential future devices) such that Linux can build the appropriate
logical representations of these devices.

At a high level, this is what occurs during this phase of configuration.

* The bootloader starts the BIOS/EFI.

* BIOS/EFI does early device probing to determine static configuration.

* BIOS/EFI creates ACPI Tables that describe static config for the OS.

* BIOS/EFI creates the system memory map (EFI Memory Map, E820, etc).

* BIOS/EFI calls :code:`start_kernel` and begins the Linux Early Boot process.

Much of what this section is concerned with is ACPI Table production and
static memory map configuration. More detail on these tables can be found
under Platform Configuration -> ACPI Table Reference.

.. note::
   Platform Vendors should read carefully, as this section has recommendations
   on physical memory region size and alignment, memory holes, HDM interleave,
   and what Linux expects of HDM decoders trying to work with these features.

UEFI Settings
=============
If your platform supports it, the :code:`uefisettings` command can be used to
read/write EFI settings. Changes will be reflected on the next reboot. Kexec
is not a sufficient reboot.

One notable configuration here is the EFI_MEMORY_SP (Specific Purpose) bit.
When this is enabled, this bit tells Linux to defer management of a memory
region to a driver (in this case, the CXL driver). Otherwise, the memory is
treated as "normal memory", and is exposed to the page allocator during
:code:`__init`.
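
One way to check how firmware tagged a range after boot is sketched below. It
rests on an assumption about your kernel configuration: on kernels built with
EFI soft-reserve support, ranges carrying :code:`EFI_MEMORY_SP` are typically
listed as :code:`Soft Reserved` in :code:`/proc/iomem`. The helper name
:code:`soft_reserved_ranges` is hypothetical, and the sample text is
illustrative rather than output captured from a real system.

```python
import re

# Matches /proc/iomem lines such as "100000000-1bfffffff : Soft Reserved".
# Nested entries are indented, so leading whitespace is allowed.
IOMEM_LINE = re.compile(r"^\s*([0-9a-fA-F]+)-([0-9a-fA-F]+) : (.+?)\s*$")


def soft_reserved_ranges(iomem_text):
    """Return (start, end) pairs, end inclusive, for 'Soft Reserved' entries."""
    ranges = []
    for line in iomem_text.splitlines():
        match = IOMEM_LINE.match(line)
        if match and match.group(3) == "Soft Reserved":
            ranges.append((int(match.group(1), 16), int(match.group(2), 16)))
    return ranges


# Illustrative sample only (an assumption, not real output).
SAMPLE = """\
00001000-0009ffff : System RAM
100000000-1bfffffff : Soft Reserved
200000000-23fffffff : Soft Reserved
"""

if __name__ == "__main__":
    for start, end in soft_reserved_ranges(SAMPLE):
        print(f"{start:#x}-{end:#x}")
```

On a live system, pass the contents of :code:`/proc/iomem` read as root;
unprivileged reads usually show zeroed addresses.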

uefisettings examples
---------------------

:code:`uefisettings identify` ::

    uefisettings identify

    bios_vendor: xxx
    bios_version: xxx
    bios_release: xxx
    bios_date: xxx
    product_name: xxx
    product_family: xxx
    product_version: xxx

On some AMD platforms, the :code:`EFI_MEMORY_SP` bit is set via the
:code:`CXL Memory Attribute` field. This may be called something else on your
platform.

:code:`uefisettings get "CXL Memory Attribute"` ::

    selector: xxx
    ...
    question: Question {
        name: "CXL Memory Attribute",
        answer: "Enabled",
        ...
    }

Physical Memory Map
===================

Physical Address Region Alignment
---------------------------------

As of Linux v6.14, the hotplug memory system requires memory regions to be
uniform in size and alignment. While the CXL specification allows for memory
regions as small as 256MB, the supported memory block size and alignment for
hotplugged memory is architecture-defined.

Linux memory blocks may be as small as 128MB and increase in powers of two.

* On ARM, the default block size and alignment is either 128MB or 256MB.

* On x86, the default block size is 256MB, and increases to 2GB as the
  capacity of the system increases up to 64GB.

For best support across versions, platform vendors should place CXL memory at
a 2GB aligned base address, and region sizes should also be 2GB aligned. This
also helps prevent the creation of thousands of memory devices (one per block).

Memory Holes
------------

Holes in the memory map are tricky. Consider a 4GB device located at base
address 0x100000000, but with the following memory map ::

    ---------------------
    |    0x100000000    |
    |        CXL        |
    |    0x1BFFFFFFF    |
    ---------------------
    |    0x1C0000000    |
    |    MEMORY HOLE    |
    |    0x1FFFFFFFF    |
    ---------------------
    |    0x200000000    |
    |     CXL CONT.     |
    |    0x23FFFFFFF    |
    ---------------------

There are two issues to consider:

* decoder programming, and
* memory block alignment.

If your architecture requires 2GB uniform size and aligned memory blocks, the
only capacity Linux is capable of mapping (as of v6.14) would be the capacity
from :code:`0x100000000-0x180000000`. The remaining capacity will be stranded,
as the remaining chunks are not of 2GB aligned length.

Assuming your architecture and memory configuration allows 1GB memory blocks,
this memory map is supported and this should be presented as multiple CFMWS
in the CEDT that describe each side of the memory hole separately - along with
matching decoders.

Multiple decoders can (and should) be used to manage such a memory hole (see
below), but each chunk of a memory hole should be aligned to a reasonable block
size (larger alignment is always better). If you intend to have memory holes
in the memory map, expect to use one decoder per contiguous chunk of host
physical memory.

As of v6.14, Linux does provide support for memory hotplug of multiple
physical memory regions separated by a memory hole described by a single
HDM decoder.


Decoder Programming
===================
If BIOS/EFI intends to program the decoders to be statically configured,
there are a few things to consider to avoid major pitfalls that will
prevent Linux compatibility.
Some of these recommendations are not
required "per the specification", but Linux makes no guarantees of support
otherwise.


Translation Point
-----------------
Per the specification, the only decoders which **TRANSLATE** Host Physical
Address (HPA) to Device Physical Address (DPA) are the **Endpoint Decoders**.
All other decoders in the fabric are intended to route accesses without
translating the addresses.

This is heavily implied by the specification, see: ::

    CXL Specification 3.1
    8.2.4.20: CXL HDM Decoder Capability Structure
    - Implementation Note: CXL Host Bridge and Upstream Switch Port Decoder Flow
    - Implementation Note: Device Decoder Logic

Given this, Linux makes a strong assumption that decoders between CPU and
endpoint will all be programmed with address ranges that are subsets of
their parent decoder's range.

Due to some ambiguity in how Architecture, ACPI, PCI, and CXL specifications
"hand off" responsibility between domains, some early adopting platforms
attempted to do translation at the originating memory controller or host
bridge. This configuration requires a platform specific extension to the
driver and is not officially endorsed - despite being supported.

It is *highly recommended* **NOT** to do this; otherwise, you are on your own
to implement driver support for your platform.

Interleave and Configuration Flexibility
----------------------------------------
If providing cross-host-bridge interleave, a CFMWS entry in the CEDT must be
presented with target host-bridges for the interleaved device sets (there may
be multiple behind each host bridge).

If providing intra-host-bridge interleaving, only 1 CFMWS entry in the CEDT is
required for that host bridge - if it covers the entire capacity of the devices
behind the host bridge.

If intending to provide users flexibility in programming decoders beyond the
root, you may want to provide multiple CFMWS entries in the CEDT intended for
different purposes. For example, you may want to consider adding:

1) A CFMWS entry to cover all interleavable host bridges.
2) A CFMWS entry to cover all devices on a single host bridge.
3) A CFMWS entry to cover each device.

A platform may choose to add all of these, or change the mode based on a BIOS
setting. For each CFMWS entry, Linux expects the SRAT to describe the
corresponding memory regions so that it can determine the number of NUMA nodes
it should reserve during early boot / init.

As of v6.14, Linux will create a NUMA node for each CEDT CFMWS entry, even if
a matching SRAT entry does not exist; however, this is not guaranteed in the
future and such a configuration should be avoided.

Memory Holes
------------
If your platform includes memory holes interspersed between your CXL memory, it
is recommended to utilize multiple decoders to cover these regions of memory,
rather than try to program the decoders to accept the entire range and expect
Linux to manage the overlap.

For example, consider the Memory Hole described above ::

    ---------------------
    |    0x100000000    |
    |        CXL        |
    |    0x1BFFFFFFF    |
    ---------------------
    |    0x1C0000000    |
    |    MEMORY HOLE    |
    |    0x1FFFFFFFF    |
    ---------------------
    |    0x200000000    |
    |     CXL CONT.     |
    |    0x23FFFFFFF    |
    ---------------------

Assuming this is provided by a single device attached directly to a host bridge,
Linux would expect the following decoder programming ::

    -----------------------          -----------------------
    | root-decoder-0      |          | root-decoder-1      |
    |   base: 0x100000000 |          |   base: 0x200000000 |
    |   size:  0xC0000000 |          |   size:  0x40000000 |
    -----------------------          -----------------------
               |                                |
    -----------------------          -----------------------
    | HB-decoder-0        |          | HB-decoder-1        |
    |   base: 0x100000000 |          |   base: 0x200000000 |
    |   size:  0xC0000000 |          |   size:  0x40000000 |
    -----------------------          -----------------------
               |                                |
    -----------------------          -----------------------
    | ep-decoder-0        |          | ep-decoder-1        |
    |   base: 0x100000000 |          |   base: 0x200000000 |
    |   size:  0xC0000000 |          |   size:  0x40000000 |
    -----------------------          -----------------------

This corresponds to a CEDT configuration with two CFMWS entries describing the
above root decoders.

Linux makes no guarantee of support for strange memory hole situations.

Multi-Media Devices
-------------------
CFMWS entries in the CEDT have special restriction bits which describe whether
the described memory region allows volatile or persistent memory (or both). If
the platform intends to support either:

1) A device with multiple media types, or
2) Using a persistent memory device as normal memory

then the platform may wish to create multiple CEDT CFMWS entries to describe
the same memory, with the intent of allowing the end user flexibility in how
that memory is configured. Linux does not presently have strong requirements in
this area.
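
The block-alignment arithmetic from the Memory Holes discussion under Physical
Memory Map can be sketched as follows. This is a simplified model and not the
kernel's hotplug logic: it assumes a region is only usable in whole memory
blocks aligned to the block size. The names :code:`usable_span` and
:code:`stranded_bytes` are hypothetical.

```python
GIB = 1 << 30

# Contiguous CXL chunks from the memory map above, as (base, size).
CHUNKS = [
    (0x100000000, 0xC0000000),  # 3GB below the hole
    (0x200000000, 0x40000000),  # 1GB above the hole
]


def usable_span(base, size, block):
    """Return (start, end) of the block-aligned part of a region, end exclusive.

    Returns None when no whole aligned block fits inside the region.
    """
    start = (base + block - 1) // block * block  # round the base up
    end = (base + size) // block * block         # round the end down
    return (start, end) if end > start else None


def stranded_bytes(chunks, block):
    """Sum the capacity that falls outside whole aligned blocks."""
    total = 0
    for base, size in chunks:
        span = usable_span(base, size, block)
        total += size - ((span[1] - span[0]) if span else 0)
    return total


if __name__ == "__main__":
    for block in (2 * GIB, 1 * GIB):
        spans = [usable_span(base, size, block) for base, size in CHUNKS]
        print(f"{block // GIB}GB blocks: spans={spans} "
              f"stranded={stranded_bytes(CHUNKS, block) // GIB}GB")
```

Under this model, 2GB blocks leave only :code:`0x100000000-0x180000000`
usable and strand the other 2GB, while 1GB blocks cover both chunks fully,
which matches the reasoning in the text.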