Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

docs/vm: add documentation of memory models

Describe what {FLAT,DISCONTIG,SPARSE}MEM are and how they manage to
maintain pfn <-> struct page correspondence.

Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>

Authored by Mike Rapoport and committed by Jonathan Corbet
7d10bdbd 678f784c

+184
+1
Documentation/vm/index.rst
···
    hwpoison
    hugetlbfs_reserv
    ksm
+   memory-model
    mmu_notifier
    numa
    overcommit-accounting
+183
Documentation/vm/memory-model.rst
.. SPDX-License-Identifier: GPL-2.0

.. _physical_memory_model:

=====================
Physical Memory Model
=====================

Physical memory in a system may be addressed in different ways. The
simplest case is when the physical memory starts at address 0 and
spans a contiguous range up to the maximal address. It could be,
however, that this range contains small holes that are not accessible
to the CPU. Then there could be several contiguous ranges at
completely distinct addresses. And, don't forget about NUMA, where
different memory banks are attached to different CPUs.

Linux abstracts this diversity using one of the three memory models:
FLATMEM, DISCONTIGMEM and SPARSEMEM. Each architecture defines what
memory models it supports, what the default memory model is and
whether it is possible to manually override that default.

.. note::
   At the time of this writing, DISCONTIGMEM is considered deprecated,
   although it is still in use by several architectures.

All the memory models track the status of physical page frames using
:c:type:`struct page` arranged in one or more arrays.

Regardless of the selected memory model, there exists a one-to-one
mapping between the physical page frame number (PFN) and the
corresponding `struct page`.

Each memory model defines :c:func:`pfn_to_page` and :c:func:`page_to_pfn`
helpers that allow the conversion from PFN to `struct page` and vice
versa.

FLATMEM
=======

The simplest memory model is FLATMEM. This model is suitable for
non-NUMA systems with contiguous, or mostly contiguous, physical
memory.

In the FLATMEM memory model, there is a global `mem_map` array that
maps the entire physical memory. For most architectures, the holes
have entries in the `mem_map` array.
The `struct page` objects
corresponding to the holes are never fully initialized.

To allocate the `mem_map` array, architecture-specific setup code
should call the :c:func:`free_area_init_node` function or its convenience
wrapper :c:func:`free_area_init`. Yet, the mappings array is not
usable until the call to :c:func:`memblock_free_all` that hands all
the memory to the page allocator.

If an architecture enables the `CONFIG_ARCH_HAS_HOLES_MEMORYMODEL` option,
it may free parts of the `mem_map` array that do not cover the
actual physical pages. In such a case, the architecture-specific
:c:func:`pfn_valid` implementation should take the holes in the
`mem_map` into account.

With FLATMEM, the conversion between a PFN and the `struct page` is
straightforward: `PFN - ARCH_PFN_OFFSET` is an index to the
`mem_map` array.

The `ARCH_PFN_OFFSET` defines the first page frame number for
systems with physical memory starting at an address different from 0.

DISCONTIGMEM
============

The DISCONTIGMEM model treats the physical memory as a collection of
`nodes` similarly to how Linux NUMA support does. For each node Linux
constructs an independent memory management subsystem represented by
`struct pglist_data` (or `pg_data_t` for short). Among other
things, `pg_data_t` holds the `node_mem_map` array that maps
physical pages belonging to that node. The `node_start_pfn` field of
`pg_data_t` is the number of the first page frame belonging to that
node.

The architecture setup code should call :c:func:`free_area_init_node` for
each node in the system to initialize the `pg_data_t` object and its
`node_mem_map`.

Every `node_mem_map` behaves exactly as FLATMEM's `mem_map` -
every physical page frame in a node has a `struct page` entry in the
`node_mem_map` array.
When DISCONTIGMEM is enabled, a portion of the
`flags` field of the `struct page` encodes the node number of the
node hosting that page.

The conversion between a PFN and the `struct page` in the
DISCONTIGMEM model is slightly more complex as it has to determine
which node hosts the physical page and which `pg_data_t` object
holds the `struct page`.

Architectures that support DISCONTIGMEM provide :c:func:`pfn_to_nid`
to convert a PFN to the node number. The opposite conversion helper
:c:func:`page_to_nid` is generic as it uses the node number encoded in
page->flags.

Once the node number is known, the PFN less `node_start_pfn` can be
used to index the appropriate `node_mem_map` array to access the
`struct page`, and the offset of the `struct page` from the
`node_mem_map` plus `node_start_pfn` is the PFN of that page.

SPARSEMEM
=========

SPARSEMEM is the most versatile memory model available in Linux and it
is the only memory model that supports several advanced features such
as hot-plug and hot-remove of the physical memory, alternative memory
maps for non-volatile memory devices and deferred initialization of
the memory map for larger systems.

The SPARSEMEM model presents the physical memory as a collection of
sections. A section is represented with :c:type:`struct mem_section`
that contains `section_mem_map` that is, logically, a pointer to an
array of struct pages. However, it is stored with some other magic
that aids the sections management. The section size and maximal number
of sections are specified using the `SECTION_SIZE_BITS` and
`MAX_PHYSMEM_BITS` constants defined by each architecture that
supports SPARSEMEM. While `MAX_PHYSMEM_BITS` is an actual width of a
physical address that an architecture supports, the
`SECTION_SIZE_BITS` is an arbitrary value.
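
To make the role of `SECTION_SIZE_BITS` more concrete, here is a minimal
user-space sketch of the arithmetic it implies. The constant values are
assumptions picked for illustration (a 4 KiB page and a 27-bit section
size), and the helpers are stand-ins modeled on the kernel's naming, not
the kernel code itself.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed, illustrative values; real ones are set by each architecture. */
#define PAGE_SHIFT         12
#define SECTION_SIZE_BITS  27
#define MAX_PHYSMEM_BITS   46

/* Number of low PFN bits that select a page inside one section. */
#define PFN_SECTION_SHIFT  (SECTION_SIZE_BITS - PAGE_SHIFT)
#define PAGES_PER_SECTION  (UINT64_C(1) << PFN_SECTION_SHIFT)

/* Size of a single section in bytes. */
static inline uint64_t section_size_bytes(void)
{
    return UINT64_C(1) << SECTION_SIZE_BITS;
}

/* Section number covering a page frame: the high bits of the PFN. */
static inline uint64_t pfn_to_section_nr(uint64_t pfn)
{
    /* The PFN must fit in the assumed physical address width. */
    assert(pfn < (UINT64_C(1) << (MAX_PHYSMEM_BITS - PAGE_SHIFT)));
    return pfn >> PFN_SECTION_SHIFT;
}

/* First page frame number that belongs to a section. */
static inline uint64_t section_nr_to_pfn(uint64_t nr)
{
    return nr << PFN_SECTION_SHIFT;
}
```

With these assumed values one section spans 128 MiB, or 32768 pages of
4 KiB each, so PFNs 0 through 32767 fall into section 0 and PFN 32768 is
the first page frame of section 1.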

The maximal number of sections is denoted `NR_MEM_SECTIONS` and
defined as

.. math::

   NR\_MEM\_SECTIONS = 2 ^ {(MAX\_PHYSMEM\_BITS - SECTION\_SIZE\_BITS)}

The `mem_section` objects are arranged in a two-dimensional array
called `mem_sections`. The size and placement of this array depend
on `CONFIG_SPARSEMEM_EXTREME` and the maximal possible number of
sections:

* When `CONFIG_SPARSEMEM_EXTREME` is disabled, the `mem_sections`
  array is static and has `NR_MEM_SECTIONS` rows. Each row holds a
  single `mem_section` object.
* When `CONFIG_SPARSEMEM_EXTREME` is enabled, the `mem_sections`
  array is dynamically allocated. Each row contains PAGE_SIZE worth of
  `mem_section` objects and the number of rows is calculated to fit
  all the memory sections.

The architecture setup code should call :c:func:`memory_present` for
each active memory range or use :c:func:`memblocks_present` or
:c:func:`sparse_memory_present_with_active_regions` wrappers to
initialize the memory sections. Next, the actual memory maps should be
set up using :c:func:`sparse_init`.

With SPARSEMEM there are two possible ways to convert a PFN to the
corresponding `struct page` - a "classic sparse" and "sparse
vmemmap". The selection is made at build time and it is determined by
the value of `CONFIG_SPARSEMEM_VMEMMAP`.

The classic sparse encodes the section number of a page in page->flags
and uses high bits of a PFN to access the section that maps that page
frame. Inside a section, the PFN is the index to the array of pages.

The sparse vmemmap uses a virtually mapped memory map to optimize
pfn_to_page and page_to_pfn operations. There is a global `struct
page *vmemmap` pointer that points to a virtually contiguous array of
`struct page` objects.
A PFN is an index to that array and the
offset of the `struct page` from `vmemmap` is the PFN of that
page.

To use vmemmap, an architecture has to reserve a range of virtual
addresses that will map the physical pages containing the memory
map and make sure that `vmemmap` points to that range. In addition,
the architecture should implement the :c:func:`vmemmap_populate` method
that will allocate the physical memory and create page tables for the
virtual memory map. If an architecture does not have any special
requirements for the vmemmap mappings, it can use the default
:c:func:`vmemmap_populate_basepages` provided by the generic memory
management.

The virtually mapped memory map allows storing `struct page` objects
for persistent memory devices in pre-allocated storage on those
devices. This storage is represented with :c:type:`struct vmem_altmap`
that is eventually passed to vmemmap_populate() through a long chain
of function calls. The vmemmap_populate() implementation may use the
`vmem_altmap` along with the :c:func:`altmap_alloc_block_buf` helper to
allocate the memory map on the persistent memory device.
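
As a recap of the sparse vmemmap conversion described earlier in this
section, the following user-space sketch models it with plain pointer
arithmetic. It is only an illustration: `struct page` is reduced to a
single field, and `toy_memmap` and `NR_PFNS` are hypothetical names
standing in for the virtual range that an architecture would reserve.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's struct page; the real layout is far richer. */
struct page {
    unsigned long flags;
};

/* Hypothetical toy memory map covering NR_PFNS page frames. */
#define NR_PFNS 1024
static struct page toy_memmap[NR_PFNS];

/* In the kernel this points at a reserved virtual range; here, at the toy array. */
static struct page *vmemmap = toy_memmap;

/* A PFN is simply an index into the virtually contiguous array. */
static inline struct page *pfn_to_page(unsigned long pfn)
{
    assert(pfn < NR_PFNS);
    return vmemmap + pfn;
}

/* The offset of a struct page from vmemmap is the PFN of that page. */
static inline unsigned long page_to_pfn(const struct page *page)
{
    ptrdiff_t idx = page - vmemmap;

    assert(idx >= 0 && idx < NR_PFNS);
    return (unsigned long)idx;
}
```

Both directions reduce to one addition or subtraction with no table
lookup, which is the optimization the sparse vmemmap variant aims for.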