.. SPDX-License-Identifier: GPL-2.0

===============
Physical Memory
===============

Linux is available for a wide range of architectures, so there is a need for an
architecture-independent abstraction to represent the physical memory. This
chapter describes the structures used to manage physical memory in a running
system.

The first principal concept prevalent in memory management is
`Non-Uniform Memory Access (NUMA)
<https://en.wikipedia.org/wiki/Non-uniform_memory_access>`_.
With multi-core and multi-socket machines, memory may be arranged into banks
that incur a different cost to access depending on the “distance” from the
processor. For example, there might be a bank of memory assigned to each CPU or
a bank of memory very suitable for DMA near peripheral devices.

Each bank is called a node and the concept is represented under Linux by a
``struct pglist_data`` even if the architecture is UMA. This structure is
always referenced by its typedef ``pg_data_t``. The ``pg_data_t`` structure
for a particular node can be referenced by the ``NODE_DATA(nid)`` macro, where
``nid`` is the ID of that node.

For NUMA architectures, the node structures are allocated by the architecture
specific code early during boot. Usually, these structures are allocated
locally on the memory bank they represent. For UMA architectures, only one
static ``pg_data_t`` structure called ``contig_page_data`` is used. Nodes will
be discussed further in Section :ref:`Nodes <nodes>`.
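
To make the two cases concrete, the following userspace sketch shows how a
``NODE_DATA()``-style macro can resolve a node ID. It assumes the common
pattern of an array of node pointers on NUMA and a single static structure on
UMA; ``DEMO_CONFIG_NUMA``, ``DEMO_MAX_NUMNODES`` and every ``demo_`` name are
hypothetical stand-ins, not kernel definitions.

.. code-block:: c

	#include <assert.h>

	/* Hypothetical stand-in for struct pglist_data (pg_data_t). */
	struct demo_pglist_data {
		int node_id;
	};

	#ifdef DEMO_CONFIG_NUMA
	#define DEMO_MAX_NUMNODES 4

	/* NUMA: pointers filled in by architecture specific code at boot. */
	struct demo_pglist_data *demo_node_data[DEMO_MAX_NUMNODES];
	#define DEMO_NODE_DATA(nid)	(demo_node_data[(nid)])
	#else
	/* UMA: one static structure, playing the role of contig_page_data,
	 * so every node ID resolves to the same object. */
	struct demo_pglist_data demo_contig_page_data = { .node_id = 0 };
	#define DEMO_NODE_DATA(nid)	(&demo_contig_page_data)
	#endif

Built without ``DEMO_CONFIG_NUMA``, ``DEMO_NODE_DATA(0)`` simply yields
``&demo_contig_page_data``, which matches the single node of a UMA system.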

The entire physical address space is partitioned into one or more blocks
called zones which represent ranges within memory. These ranges are usually
determined by architectural constraints for accessing the physical memory.
The memory range within a node that corresponds to a particular zone is
described by a ``struct zone``. Each zone has one of the types described
below.

* ``ZONE_DMA`` and ``ZONE_DMA32`` historically represented memory suitable for
  DMA by peripheral devices that cannot access all of the addressable
  memory. For many years there have been better and more robust interfaces to
  get memory with DMA specific requirements
  (Documentation/core-api/dma-api.rst), but ``ZONE_DMA`` and ``ZONE_DMA32``
  still represent memory ranges that have restrictions on how they can be
  accessed.
  Depending on the architecture, either of these zone types, or even both of
  them, can be disabled at build time using the ``CONFIG_ZONE_DMA`` and
  ``CONFIG_ZONE_DMA32`` configuration options. Some 64-bit platforms may need
  both zones as they support peripherals with different DMA addressing
  limitations.

* ``ZONE_NORMAL`` is for normal memory that can be accessed by the kernel all
  the time. DMA operations can be performed on pages in this zone if the DMA
  devices support transfers to all addressable memory. ``ZONE_NORMAL`` is
  always enabled.

* ``ZONE_HIGHMEM`` is the part of the physical memory that is not covered by a
  permanent mapping in the kernel page tables. The memory in this zone is only
  accessible to the kernel using temporary mappings. This zone is available
  only on some 32-bit architectures and is enabled with ``CONFIG_HIGHMEM``.

* ``ZONE_MOVABLE`` is for normal accessible memory, just like ``ZONE_NORMAL``.
  The difference is that the contents of most pages in ``ZONE_MOVABLE`` are
  movable. That means that while virtual addresses of these pages do not
  change, their content may move between different physical pages. Often
  ``ZONE_MOVABLE`` is populated during memory hotplug, but it may also be
  populated on boot using one of the ``kernelcore``, ``movablecore`` and
  ``movable_node`` kernel command line parameters. See
  Documentation/mm/page_migration.rst and
  Documentation/admin-guide/mm/memory-hotplug.rst for additional details.

* ``ZONE_DEVICE`` represents memory residing on devices such as PMEM and GPU.
  It has different characteristics from RAM zone types and it exists to provide
  :ref:`struct page <Pages>` and memory map services for physical address
  ranges identified by device drivers. ``ZONE_DEVICE`` is enabled with the
  configuration option ``CONFIG_ZONE_DEVICE``.

It is important to note that many kernel operations can only take place using
``ZONE_NORMAL``, so it is the most performance critical zone. Zones are
discussed further in Section :ref:`Zones <zones>`.

The relation between node and zone extents is determined by the physical memory
map reported by the firmware, architectural constraints for memory addressing
and certain parameters in the kernel command line.

For example, with a 32-bit kernel on an x86 UMA machine with 2 Gbytes of RAM,
the entire memory will be on node 0 and there will be three zones:
``ZONE_DMA``, ``ZONE_NORMAL`` and ``ZONE_HIGHMEM``::

  0                                                            2G
  +-------------------------------------------------------------+
  |                            node 0                           |
  +-------------------------------------------------------------+

  0         16M                    896M                        2G
  +----------+-----------------------+--------------------------+
  | ZONE_DMA |      ZONE_NORMAL      |       ZONE_HIGHMEM       |
  +----------+-----------------------+--------------------------+

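The zone boundaries in this example can be expressed as a small lookup table.
The userspace sketch below classifies a physical address and counts the page
frames in each zone for the 2 Gbyte layout above only. It assumes 4 Kbyte
pages, and all ``demo_`` names are hypothetical; the kernel itself derives
zone extents at boot from the firmware memory map and architectural
constraints instead of hard-coding them.

.. code-block:: c

	#include <assert.h>
	#include <stdint.h>

	/* Hypothetical zone identifiers for this example only. */
	enum demo_zone {
		DEMO_ZONE_DMA,
		DEMO_ZONE_NORMAL,
		DEMO_ZONE_HIGHMEM,
		DEMO_NR_ZONES
	};

	#define DEMO_MiB	(1024ULL * 1024ULL)
	#define DEMO_PAGE_SIZE	4096ULL		/* assumed 4 Kbyte pages */

	/* Exclusive end address of each zone in the 2 Gbyte example. */
	static const uint64_t demo_zone_end[DEMO_NR_ZONES] = {
		16 * DEMO_MiB,		/* ZONE_DMA */
		896 * DEMO_MiB,		/* ZONE_NORMAL */
		2048 * DEMO_MiB,	/* ZONE_HIGHMEM */
	};

	/* Return the zone that contains a RAM-backed physical address. */
	enum demo_zone demo_zone_for_phys(uint64_t phys)
	{
		assert(phys < demo_zone_end[DEMO_NR_ZONES - 1]);
		for (int z = 0; z < DEMO_NR_ZONES; z++)
			if (phys < demo_zone_end[z])
				return (enum demo_zone)z;
		/* Reached only for out-of-range addresses built with NDEBUG. */
		return DEMO_ZONE_HIGHMEM;
	}

	/* Number of page frames covered by a zone. */
	uint64_t demo_zone_pages(enum demo_zone z)
	{
		uint64_t start = (z == DEMO_ZONE_DMA) ? 0 : demo_zone_end[z - 1];

		return (demo_zone_end[z] - start) / DEMO_PAGE_SIZE;
	}

For instance, ``demo_zone_pages(DEMO_ZONE_DMA)`` evaluates to 4096 page frames
(16 Mbytes divided by 4 Kbytes).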

With a kernel built with ``ZONE_DMA`` disabled and ``ZONE_DMA32`` enabled and
booted with the ``movablecore=80%`` parameter on an arm64 machine with
16 Gbytes of RAM equally split between two nodes, there will be
``ZONE_DMA32``, ``ZONE_NORMAL`` and ``ZONE_MOVABLE`` on node 0, and
``ZONE_NORMAL`` and ``ZONE_MOVABLE`` on node 1::

  1G                                9G                         17G
  +--------------------------------+ +--------------------------+
  |              node 0            | |          node 1          |
  +--------------------------------+ +--------------------------+

  1G       4G        4200M          9G          9320M          17G
  +---------+----------+-----------+ +------------+-------------+
  |  DMA32  |  NORMAL  |  MOVABLE  | |   NORMAL   |   MOVABLE   |
  +---------+----------+-----------+ +------------+-------------+

Memory banks may belong to interleaving nodes. In the example below an x86
machine has 16 Gbytes of RAM in 4 memory banks; even banks belong to node 0
and odd banks belong to node 1::

  0              4G              8G             12G            16G
  +-------------+ +-------------+ +-------------+ +-------------+
  |    node 0   | |    node 1   | |    node 0   | |    node 1   |
  +-------------+ +-------------+ +-------------+ +-------------+

  0   16M      4G
  +-----+-------+ +-------------+ +-------------+ +-------------+
  | DMA | DMA32 | |    NORMAL   | |    NORMAL   | |    NORMAL   |
  +-----+-------+ +-------------+ +-------------+ +-------------+

In this case node 0 will span from 0 to 12 Gbytes and node 1 will span from
4 to 16 Gbytes, although each node has only 8 Gbytes of RAM present.
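
The gap between the range a node spans and the memory actually present in it
(which the node structure records as ``node_spanned_pages`` and
``node_present_pages``) can be computed from the bank list. This is a
hypothetical userspace sketch working in bytes; ``struct demo_bank`` and
``demo_node_extent()`` are illustrative names, not kernel code.

.. code-block:: c

	#include <assert.h>
	#include <stdint.h>

	#define DEMO_GiB	(1024ULL * 1024ULL * 1024ULL)

	/* A memory bank as reported by firmware: [start, end) owned by nid. */
	struct demo_bank {
		uint64_t start;
		uint64_t end;
		int nid;
	};

	/*
	 * Compute the range spanned by node nid (lowest start to highest end,
	 * holes included) and the number of bytes actually present. A node
	 * with no banks is reported as span_start == UINT64_MAX, present == 0.
	 */
	void demo_node_extent(const struct demo_bank *banks, int nr, int nid,
			      uint64_t *span_start, uint64_t *span_end,
			      uint64_t *present)
	{
		assert(nr >= 0);
		*span_start = UINT64_MAX;
		*span_end = 0;
		*present = 0;
		for (int i = 0; i < nr; i++) {
			if (banks[i].nid != nid)
				continue;
			if (banks[i].start < *span_start)
				*span_start = banks[i].start;
			if (banks[i].end > *span_end)
				*span_end = banks[i].end;
			*present += banks[i].end - banks[i].start;
		}
	}

Fed the four interleaved banks above, the sketch reports that node 0 spans 0
to 12 Gbytes while only 8 Gbytes are present; the bank owned by node 1 in the
middle counts as a hole for node 0.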

.. _nodes:

Nodes
=====

As we have mentioned, each node in memory is described by a ``pg_data_t`` which
is a typedef for a ``struct pglist_data``. When allocating a page, by default
Linux uses a node-local allocation policy to allocate memory from the node
closest to the running CPU. As processes tend to run on the same CPU, it is
likely the memory from the current node will be used. The allocation policy can
be controlled by users as described in
Documentation/admin-guide/mm/numa_memory_policy.rst.

Most NUMA architectures maintain an array of pointers to the node
structures. The actual structures are allocated early during boot when
architecture specific code parses the physical memory map reported by the
firmware. The bulk of the node initialization happens slightly later in the
boot process by the free_area_init() function, described later in Section
:ref:`Initialization <initialization>`.

Along with the node structures, the kernel maintains an array of ``nodemask_t``
bitmasks called ``node_states``. Each bitmask in this array represents a set of
nodes with particular properties as defined by ``enum node_states``:

``N_POSSIBLE``
  The node could become online at some point.
``N_ONLINE``
  The node is online.
``N_NORMAL_MEMORY``
  The node has regular memory.
``N_HIGH_MEMORY``
  The node has regular or high memory. When ``CONFIG_HIGHMEM`` is disabled,
  it is aliased to ``N_NORMAL_MEMORY``.
``N_MEMORY``
  The node has memory (regular, high, movable).
``N_CPU``
  The node has one or more CPUs.
``N_GENERIC_INITIATOR``
  The node has one or more Generic Initiators.

For each node that has a property described above, the bit corresponding to the
node ID in the ``node_states[<property>]`` bitmask is set.

For example, for node 2 with normal memory and CPUs, bit 2 will be set in ::

  node_states[N_POSSIBLE]
  node_states[N_ONLINE]
  node_states[N_NORMAL_MEMORY]
  node_states[N_HIGH_MEMORY]
  node_states[N_MEMORY]
  node_states[N_CPU]

For various operations possible with nodemasks please refer to
``include/linux/nodemask.h``.
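
Conceptually, each ``node_states[]`` entry is a bitmap indexed by node ID. The
userspace sketch below models that with one 64-bit word per state. It is a
simplification (the kernel's ``nodemask_t`` is sized by ``MAX_NUMNODES`` and
manipulated through the helpers in ``include/linux/nodemask.h``), and every
``demo_`` name is hypothetical.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>
	#include <stdint.h>

	/* Hypothetical mirror of a subset of enum node_states. */
	enum demo_node_state {
		DEMO_N_POSSIBLE,
		DEMO_N_ONLINE,
		DEMO_N_NORMAL_MEMORY,
		DEMO_N_HIGH_MEMORY,
		DEMO_N_MEMORY,
		DEMO_N_CPU,
		DEMO_NR_NODE_STATES
	};

	#define DEMO_MAX_NODES 64	/* one bit per node in a uint64_t */

	/* One bitmask per property; bit n set means node n has it. */
	uint64_t demo_node_states[DEMO_NR_NODE_STATES];

	void demo_node_set_state(int nid, enum demo_node_state state)
	{
		assert(nid >= 0 && nid < DEMO_MAX_NODES);
		demo_node_states[state] |= UINT64_C(1) << nid;
	}

	bool demo_node_state(int nid, enum demo_node_state state)
	{
		assert(nid >= 0 && nid < DEMO_MAX_NODES);
		return (demo_node_states[state] >> nid) & 1;
	}

	/* Count nodes with a property by clearing the lowest set bit. */
	int demo_num_node_state(enum demo_node_state state)
	{
		int n = 0;

		for (uint64_t m = demo_node_states[state]; m; m &= m - 1)
			n++;
		return n;
	}

Marking node 2 in every state from ``DEMO_N_POSSIBLE`` to ``DEMO_N_CPU``
reproduces the example above: each of those masks then holds the value
``0x4``. The kernel offers a comparable count through ``num_node_state()``.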

Among other things, nodemasks are used to provide macros for node traversal,
namely ``for_each_node()`` and ``for_each_online_node()``.

For instance, to call a function foo() for each online node::

	int nid;

	for_each_online_node(nid) {
		pg_data_t *pgdat = NODE_DATA(nid);

		foo(pgdat);
	}

Node structure
--------------

The node structure ``struct pglist_data`` is declared in
``include/linux/mmzone.h``. Here we briefly describe fields of this
structure:

General
~~~~~~~

``node_zones``
  The zones for this node. Not all of the zones may be populated, but it is
  the full list. It is referenced by this node's ``node_zonelists`` as well as
  other nodes' ``node_zonelists``.

``node_zonelists``
  The list of all zones in all nodes. This list defines the order of zones
  that allocations are preferred from. The ``node_zonelists`` is set up by
  ``build_zonelists()`` in ``mm/page_alloc.c`` during the initialization of
  core memory management structures.

``nr_zones``
  Number of populated zones in this node.

``node_mem_map``
  For UMA systems that use the FLATMEM memory model, node 0's
  ``node_mem_map`` is an array of struct pages representing each physical
  frame.

``node_page_ext``
  For UMA systems that use the FLATMEM memory model, node 0's
  ``node_page_ext`` is an array of extensions of struct pages. Available only
  in kernels built with ``CONFIG_PAGE_EXTENSION`` enabled.

``node_start_pfn``
  The page frame number of the starting page frame in this node.

``node_present_pages``
  Total number of physical pages present in this node.

``node_spanned_pages``
  Total size of the physical page range of this node, including holes.

``node_size_lock``
  A lock that protects the fields defining the node extents. Only defined when
  at least one of the ``CONFIG_MEMORY_HOTPLUG`` or
  ``CONFIG_DEFERRED_STRUCT_PAGE_INIT`` configuration options is enabled.
  ``pgdat_resize_lock()`` and ``pgdat_resize_unlock()`` are provided to
  manipulate ``node_size_lock`` without checking for ``CONFIG_MEMORY_HOTPLUG``
  or ``CONFIG_DEFERRED_STRUCT_PAGE_INIT``.

``node_id``
  The Node ID (NID) of the node, starts at 0.

``totalreserve_pages``
  This is a per-node reserve of pages that are not available to userspace
  allocations.

``first_deferred_pfn``
  If memory initialization on large machines is deferred then this is the first
  PFN that needs to be initialized. Defined only when
  ``CONFIG_DEFERRED_STRUCT_PAGE_INIT`` is enabled.

``deferred_split_queue``
  Per-node queue of huge pages whose split was deferred. Defined only when
  ``CONFIG_TRANSPARENT_HUGEPAGE`` is enabled.

``__lruvec``
  Per-node lruvec holding LRU lists and related parameters. Used only when
  memory cgroups are disabled. It should not be accessed directly; use
  ``mem_cgroup_lruvec()`` to look up lruvecs instead.

Reclaim control
~~~~~~~~~~~~~~~

See also Documentation/mm/page_reclaim.rst.

``kswapd``
  Per-node instance of kswapd kernel thread.

``kswapd_wait``, ``pfmemalloc_wait``, ``reclaim_wait``
  Wait queues used to synchronize memory reclaim tasks.

``nr_writeback_throttled``
  Number of tasks that are throttled waiting on dirty pages to clean.

``nr_reclaim_start``
  Number of pages written while reclaim is throttled waiting for writeback.

``kswapd_order``
  Controls the order kswapd tries to reclaim.

``kswapd_highest_zoneidx``
  The highest zone index to be reclaimed by kswapd.

``kswapd_failures``
  Number of runs in which kswapd was unable to reclaim any pages.

``min_unmapped_pages``
  Minimal number of unmapped file-backed pages that cannot be reclaimed.
  Determined by ``vm.min_unmapped_ratio`` sysctl. Only defined when
  ``CONFIG_NUMA`` is enabled.

``min_slab_pages``
  Minimal number of SLAB pages that cannot be reclaimed. Determined by
  ``vm.min_slab_ratio`` sysctl. Only defined when ``CONFIG_NUMA`` is enabled.

``flags``
  Flags controlling reclaim behavior.

Compaction control
~~~~~~~~~~~~~~~~~~

``kcompactd_max_order``
  Page order that kcompactd should try to achieve.

``kcompactd_highest_zoneidx``
  The highest zone index to be compacted by kcompactd.

``kcompactd_wait``
  Wait queue used to synchronize memory compaction tasks.

``kcompactd``
  Per-node instance of kcompactd kernel thread.

``proactive_compact_trigger``
  Determines if proactive compaction is enabled. Controlled by
  ``vm.compaction_proactiveness`` sysctl.

Statistics
~~~~~~~~~~

``per_cpu_nodestats``
  Per-CPU VM statistics for the node.

``vm_stat``
  VM statistics for the node.

.. _zones:

Zones
=====

As we have mentioned, each zone in memory is described by a ``struct zone``
which is an element of the ``node_zones`` array of the node it belongs to.
``struct zone`` is the core data structure of the page allocator. A zone
represents a range of physical memory and may have holes.

The page allocator uses the GFP flags, see :ref:`mm-api-gfp-flags`, specified by
a memory allocation to determine the highest zone in a node from which the
memory allocation can allocate memory. The page allocator first tries to
allocate memory from that zone. If it can't allocate the requested amount of
memory from that zone, it falls back to the next lower zone in the node, and
the process continues down to and including the lowest zone. For example, if
a node contains ``ZONE_DMA32``, ``ZONE_NORMAL`` and ``ZONE_MOVABLE`` and the
highest zone of a memory allocation is ``ZONE_MOVABLE``, the order of the zones
from which the page allocator allocates memory is ``ZONE_MOVABLE`` >
``ZONE_NORMAL`` > ``ZONE_DMA32``.
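
The fallback order can be modelled in a few lines. This userspace sketch
assumes a single node with exactly the three zones from the example and tracks
only a free-page count per zone; it ignores watermarks, other nodes and the
other checks the real page allocator performs. All names are hypothetical.

.. code-block:: c

	#include <assert.h>

	enum demo_zone_type {
		DEMO_ZONE_DMA32,
		DEMO_ZONE_NORMAL,
		DEMO_ZONE_MOVABLE,
		DEMO_NR_ZONES
	};

	/* Free pages per zone in a hypothetical node. */
	struct demo_node {
		unsigned long free_pages[DEMO_NR_ZONES];
	};

	/*
	 * Take nr_pages from the highest allowed zone, falling back to each
	 * lower zone in turn. Returns the zone used, or -1 if none had room.
	 */
	int demo_alloc_pages(struct demo_node *node, enum demo_zone_type highest,
			     unsigned long nr_pages)
	{
		assert(highest < DEMO_NR_ZONES);
		for (int z = (int)highest; z >= 0; z--) {
			if (node->free_pages[z] >= nr_pages) {
				node->free_pages[z] -= nr_pages;
				return z;
			}
		}
		return -1;
	}

Because the loop starts at ``highest``, a request limited to
``DEMO_ZONE_NORMAL`` never takes pages from ``DEMO_ZONE_MOVABLE``, even when
that zone still has free pages.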

At runtime, free pages in a zone are in the Per-CPU Pagesets (PCP) or free areas
of the zone. The Per-CPU Pagesets are a vital mechanism in the kernel's memory
management system. By handling the most frequent allocations and frees locally
on each CPU, the Per-CPU Pagesets improve performance and scalability,
especially on systems with many cores. The page allocator in the kernel employs
a two-step strategy for memory allocation, starting with the Per-CPU Pagesets
before falling back to the buddy allocator. Pages are transferred between the
Per-CPU Pagesets and the global free areas (managed by the buddy allocator) in
batches. This minimizes the overhead of frequent interactions with the global
buddy allocator.
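
The batching can be illustrated with a deliberately simplified, single-CPU
model. Pages are only counted, ``high`` and ``batch`` are fixed, and the drain
rule (return one batch once the cache exceeds ``high``) is an assumption made
for this sketch; the kernel's actual PCP sizing, which also involves
``high_min`` and ``high_max``, is more involved. All ``demo_`` names are
hypothetical.

.. code-block:: c

	#include <assert.h>

	/* Global pool standing in for the buddy allocator's free areas. */
	struct demo_buddy {
		unsigned long free;
		unsigned long lock_acquisitions;	/* one per batch transfer */
	};

	struct demo_pcp {
		unsigned long count;	/* pages cached on this CPU */
		unsigned long high;	/* drain once count exceeds this */
		unsigned long batch;	/* pages moved per buddy interaction */
	};

	/* Allocate one page, refilling the PCP in a batch when empty. */
	int demo_pcp_alloc(struct demo_pcp *pcp, struct demo_buddy *buddy)
	{
		assert(pcp->batch > 0);
		if (pcp->count == 0) {
			unsigned long n = buddy->free < pcp->batch ?
					  buddy->free : pcp->batch;

			if (n == 0)
				return -1;	/* out of memory */
			buddy->lock_acquisitions++;
			buddy->free -= n;
			pcp->count += n;
		}
		pcp->count--;
		return 0;
	}

	/* Free one page; return a batch to the buddy pool above high. */
	void demo_pcp_free(struct demo_pcp *pcp, struct demo_buddy *buddy)
	{
		pcp->count++;
		if (pcp->count > pcp->high) {
			unsigned long n = pcp->batch < pcp->count ?
					  pcp->batch : pcp->count;

			buddy->lock_acquisitions++;
			buddy->free += n;
			pcp->count -= n;
		}
	}

With ``batch`` set to 8, eight consecutive allocations take the buddy lock only
once, which is the overhead reduction the batching is meant to achieve.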

Architecture specific code calls free_area_init() to initialize zones.

Zone structure
--------------

The zone structure ``struct zone`` is defined in ``include/linux/mmzone.h``.
Here we briefly describe fields of this structure:

General
~~~~~~~

``_watermark``
  The watermarks for this zone (min, low, high and promo). When the amount of
  free pages in a zone is below the min watermark, boosting is ignored and an
  allocation may trigger direct reclaim and direct compaction; the min
  watermark is also used to throttle direct reclaim. When the amount of free
  pages in a zone is below the low watermark, kswapd is woken up. When the
  amount of free pages in a zone is above the high watermark, kswapd stops
  reclaiming (the zone is balanced), provided the
  ``NUMA_BALANCING_MEMORY_TIERING`` bit of ``sysctl_numa_balancing_mode`` is
  not set. The promo watermark is used for memory tiering and NUMA balancing.
  When the amount of free pages in a zone is above the promo watermark, kswapd
  stops reclaiming if the ``NUMA_BALANCING_MEMORY_TIERING`` bit of
  ``sysctl_numa_balancing_mode`` is set. The watermarks are set by
  ``__setup_per_zone_wmarks()``. The min watermark is calculated according to
  ``vm.min_free_kbytes`` sysctl. Each of the other three watermarks lies one
  step above the previous one, where the step (the distance between two
  adjacent watermarks) is calculated taking ``vm.watermark_scale_factor``
  sysctl into account.

``watermark_boost``
  The number of pages by which the watermarks are temporarily boosted.
  Boosting increases reclaim pressure to reduce the likelihood of future
  fallbacks, and kswapd is woken immediately because the node may be balanced
  overall, in which case kswapd would not wake naturally.

``nr_reserved_highatomic``
  The number of pages which are reserved for high-order atomic allocations.

``nr_free_highatomic``
  The number of free pages in reserved highatomic pageblocks.

``lowmem_reserve``
  The array of the amounts of memory reserved in this zone for memory
  allocations. For example, if the highest zone a memory allocation can
  allocate memory from is ``ZONE_MOVABLE``, the amount of memory reserved in
  this zone for this allocation is ``lowmem_reserve[ZONE_MOVABLE]`` when
  attempting to allocate memory from this zone. This is a mechanism the page
  allocator uses to prevent allocations which could use ``highmem`` from using
  too much ``lowmem``. For some specialised workloads on ``highmem`` machines,
  it is dangerous for the kernel to allow process memory to be allocated from
  the ``lowmem`` zone. This is because that memory could then be pinned via the
  ``mlock()`` system call, or by unavailability of swap space. The
  ``vm.lowmem_reserve_ratio`` sysctl determines how aggressive the kernel is in
  defending these lower zones. This array is recalculated by
  ``setup_per_zone_lowmem_reserve()`` at runtime if ``vm.lowmem_reserve_ratio``
  sysctl changes.

``node``
  The index of the node this zone belongs to. Available only when
  ``CONFIG_NUMA`` is enabled because there is only one node in a UMA system.

``zone_pgdat``
  Pointer to the ``struct pglist_data`` of the node this zone belongs to.

``per_cpu_pageset``
  Pointer to the Per-CPU Pagesets (PCP) allocated and initialized by
  ``setup_zone_pageset()``. By handling the most frequent allocations and frees
  locally on each CPU, PCP improves performance and scalability on systems with
  many cores.

``pageset_high_min``
  Copied to the ``high_min`` of the Per-CPU Pagesets for faster access.

``pageset_high_max``
  Copied to the ``high_max`` of the Per-CPU Pagesets for faster access.

``pageset_batch``
  Copied to the ``batch`` of the Per-CPU Pagesets for faster access. The
  ``batch``, ``high_min`` and ``high_max`` of the Per-CPU Pagesets are used to
  calculate the number of elements the Per-CPU Pagesets obtain from the buddy
  allocator under a single hold of the lock for efficiency. They are also used
  to decide if the Per-CPU Pagesets return pages to the buddy allocator when
  pages are freed.

``pageblock_flags``
  The pointer to the flags for the pageblocks in the zone (see
  ``include/linux/pageblock-flags.h`` for the list of flags). The memory is
  allocated in ``setup_usemap()``. Each pageblock occupies
  ``NR_PAGEBLOCK_BITS`` bits. Defined only when ``CONFIG_FLATMEM`` is enabled.
  The flags are stored in ``mem_section`` when ``CONFIG_SPARSEMEM`` is enabled.

``zone_start_pfn``
  The start PFN of the zone. It is initialized by
  ``calculate_node_totalpages()``.

``managed_pages``
  The present pages managed by the buddy system, which is calculated as:
  ``managed_pages`` = ``present_pages`` - ``reserved_pages``, where
  ``reserved_pages`` includes pages allocated by the memblock allocator. It
  should be used by the page allocator and the VM scanner to calculate all
  kinds of watermarks and thresholds. It is accessed using
  ``atomic_long_xxx()`` functions. It is initialized in
  ``free_area_init_core()`` and then reinitialized when the memblock allocator
  frees pages into the buddy system.

``spanned_pages``
  The total pages spanned by the zone, including holes, which is calculated as:
  ``spanned_pages`` = ``zone_end_pfn`` - ``zone_start_pfn``. It is initialized
  by ``calculate_node_totalpages()``.

``present_pages``
  The physical pages existing within the zone, which is calculated as:
  ``present_pages`` = ``spanned_pages`` - ``absent_pages`` (pages in holes). It
  may be used by memory hotplug or memory power management logic to figure out
  unmanaged pages by checking (``present_pages`` - ``managed_pages``). Write
  access to ``present_pages`` at runtime should be protected by
  ``mem_hotplug_begin/done()``. Any reader who can't tolerate drift of
  ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
  is initialized by ``calculate_node_totalpages()``.

``present_early_pages``
  The present pages existing within the zone located on memory available since
  early boot, excluding hotplugged memory. Defined only when
  ``CONFIG_MEMORY_HOTPLUG`` is enabled and initialized by
  ``calculate_node_totalpages()``.

``cma_pages``
  The pages reserved for CMA use. These pages behave like ``ZONE_MOVABLE`` when
  they are not used for CMA. Defined only when ``CONFIG_CMA`` is enabled.

``name``
  The name of the zone. It is a pointer to the corresponding element of
  the ``zone_names`` array.

``nr_isolate_pageblock``
  Number of isolated pageblocks. It is used to solve an incorrect freepage
  counting problem caused by racy retrieval of the migratetype of a pageblock.
  Protected by ``zone->lock``. Defined only when ``CONFIG_MEMORY_ISOLATION`` is
  enabled.

``span_seqlock``
  The seqlock to protect ``zone_start_pfn`` and ``spanned_pages``. It is a
  seqlock because it has to be read outside of ``zone->lock``, and it is done in
  the main allocator path. However, the seqlock is written quite infrequently.
  Defined only when ``CONFIG_MEMORY_HOTPLUG`` is enabled.

``initialized``
  The flag indicating if the zone is initialized. Set by
  ``init_currently_empty_zone()`` during boot.

``free_area``
  The array of free areas, where each element corresponds to a specific order,
  that is, to free blocks of 2^order contiguous pages. The buddy allocator uses
  this structure to manage free memory efficiently. When allocating, it tries
  to find the smallest sufficient block. If that block is larger than the
  requested size, it is recursively split into the next smaller blocks until
  the required size is reached. When a page is freed, it may be merged with its
  buddy to form a larger block. It is initialized by
  ``zone_init_free_lists()``.

``unaccepted_pages``
  The list of pages to be accepted. All pages on the list are ``MAX_PAGE_ORDER``.
  Defined only when ``CONFIG_UNACCEPTED_MEMORY`` is enabled.

``flags``
  The zone flags. The lowest three bits are used and defined by
  ``enum zone_flags``. ``ZONE_BOOSTED_WATERMARK`` (bit 0): zone recently boosted
  watermarks. Cleared when kswapd is woken. ``ZONE_RECLAIM_ACTIVE`` (bit 1):
  kswapd may be scanning the zone. ``ZONE_BELOW_HIGH`` (bit 2): zone is below
  high watermark.

``lock``
  The main lock that protects the internal data structures of the page allocator
  specific to the zone, in particular ``free_area``.

``percpu_drift_mark``
  When free pages are below this point, additional steps are taken when reading
  the number of free pages to avoid per-cpu counter drift allowing watermarks
  to be breached. It is updated in ``refresh_zone_stat_thresholds()``.
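
The watermark arithmetic and the ``lowmem_reserve`` check described above can
be approximated as follows. This userspace sketch loosely follows
``__setup_per_zone_wmarks()`` and the core comparison in
``__zone_watermark_ok()`` in recent kernels. It assumes the zone's share of
``vm.min_free_kbytes`` has already been converted to a page count
(``min_pages``) and ignores highmem special cases, watermark boosting and
high-order checks. ``struct demo_zone`` and the other ``demo_`` names are
hypothetical.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	enum demo_wmark {
		DEMO_WMARK_MIN,
		DEMO_WMARK_LOW,
		DEMO_WMARK_HIGH,
		DEMO_WMARK_PROMO,
		DEMO_NR_WMARK
	};

	#define DEMO_NR_ZONES 3

	struct demo_zone {
		unsigned long managed_pages;
		unsigned long watermark[DEMO_NR_WMARK];
		/* Pages kept back, indexed by the request's highest zone. */
		unsigned long lowmem_reserve[DEMO_NR_ZONES];
	};

	void demo_setup_wmarks(struct demo_zone *z, unsigned long min_pages,
			       unsigned long watermark_scale_factor)
	{
		/* Step between adjacent watermarks: a quarter of min, or
		 * managed_pages * factor / 10000, whichever is larger. */
		unsigned long step = min_pages / 4;
		unsigned long scaled =
			z->managed_pages * watermark_scale_factor / 10000;

		if (scaled > step)
			step = scaled;
		z->watermark[DEMO_WMARK_MIN] = min_pages;
		z->watermark[DEMO_WMARK_LOW] = min_pages + step;
		z->watermark[DEMO_WMARK_HIGH] = min_pages + 2 * step;
		z->watermark[DEMO_WMARK_PROMO] = min_pages + 3 * step;
	}

	/* Free pages must exceed the mark plus the lowmem reserve that
	 * applies to the request's highest usable zone. */
	bool demo_watermark_ok(const struct demo_zone *z,
			       unsigned long free_pages, enum demo_wmark mark,
			       int highest_zoneidx)
	{
		assert(highest_zoneidx >= 0 && highest_zoneidx < DEMO_NR_ZONES);
		return free_pages >
		       z->watermark[mark] + z->lowmem_reserve[highest_zoneidx];
	}

With ``managed_pages`` set to 2000000, ``min_pages`` to 4000 and a scale factor
of 10, the step is 2000 pages, so the low, high and promo watermarks come out at
6000, 8000 and 10000 pages. A request allowed to use a higher zone faces a
stricter test in this zone because its ``lowmem_reserve`` entry is added on top.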

Compaction control
~~~~~~~~~~~~~~~~~~

``compact_cached_free_pfn``
  The PFN where the compaction free scanner should start in the next scan.

``compact_cached_migrate_pfn``
  The PFNs where the compaction migration scanner should start in the next scan.
  This array has two elements: the first one is used in ``MIGRATE_ASYNC`` mode,
  and the other one is used in ``MIGRATE_SYNC`` mode.

``compact_init_migrate_pfn``
  The initial migration PFN which is initialized to 0 at boot time, and to the
  first pageblock with migratable pages in the zone after a full compaction
  finishes. It is used to check if a scan is a whole zone scan or not.

``compact_init_free_pfn``
  The initial free PFN which is initialized to 0 at boot time and to the last
  pageblock with free ``MIGRATE_MOVABLE`` pages in the zone. It is used to check
  if it is the start of a scan.

``compact_considered``
  The number of compactions attempted since the last failure. It is reset in
  ``defer_compaction()`` when a compaction fails to result in a page allocation
  success. It is increased by 1 in ``compaction_deferred()`` each time it
  checks whether a compaction should be skipped. ``compaction_deferred()`` is
  called before ``compact_zone()`` is called. ``compaction_defer_reset()`` is
  called when ``compact_zone()`` returns ``COMPACT_SUCCESS``, and
  ``defer_compaction()`` is called when ``compact_zone()`` returns
  ``COMPACT_PARTIAL_SKIPPED`` or ``COMPACT_COMPLETE``.

``compact_defer_shift``
  The number of compactions skipped before trying again is
  ``1<<compact_defer_shift``. It is increased by 1 in ``defer_compaction()``.
  It is reset in ``compaction_defer_reset()`` when a direct compaction results
  in a page allocation success. Its maximum value is ``COMPACT_MAX_DEFER_SHIFT``.

``compact_order_failed``
  The minimum compaction failed order. It is set in ``compaction_defer_reset()``
  when a compaction succeeds and in ``defer_compaction()`` when a compaction
  fails to result in a page allocation success.

``compact_blockskip_flush``
  Set to true when the compaction migration scanner and free scanner meet, which
  means the ``PB_compact_skip`` bits should be cleared.

``contiguous``
  Set to true when the zone is contiguous (in other words, has no holes).
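
The interplay of ``compact_considered``, ``compact_defer_shift`` and
``compact_order_failed`` is easiest to follow in code. The userspace sketch
below is modelled on ``compaction_deferred()``, ``defer_compaction()`` and
``compaction_defer_reset()`` in ``mm/compaction.c`` as understood at the time
of writing. It operates on a hypothetical ``struct demo_zone_compact`` and
assumes a value of 6 for ``COMPACT_MAX_DEFER_SHIFT``; treat it as an
illustration rather than a copy of the kernel code.

.. code-block:: c

	#include <assert.h>
	#include <stdbool.h>

	#define DEMO_COMPACT_MAX_DEFER_SHIFT 6	/* assumed kernel value */

	struct demo_zone_compact {
		unsigned int compact_considered;
		unsigned int compact_defer_shift;
		int compact_order_failed;
	};

	/* Return true if a compaction of this order should be skipped. */
	bool demo_compaction_deferred(struct demo_zone_compact *z, int order)
	{
		unsigned long defer_limit = 1UL << z->compact_defer_shift;

		if (order < z->compact_order_failed)
			return false;
		if (++z->compact_considered >= defer_limit) {
			z->compact_considered = defer_limit;	/* no overflow */
			return false;
		}
		return true;
	}

	/* Compaction did not lead to a successful allocation: back off. */
	void demo_defer_compaction(struct demo_zone_compact *z, int order)
	{
		z->compact_considered = 0;
		z->compact_defer_shift++;
		if (order < z->compact_order_failed)
			z->compact_order_failed = order;
		if (z->compact_defer_shift > DEMO_COMPACT_MAX_DEFER_SHIFT)
			z->compact_defer_shift = DEMO_COMPACT_MAX_DEFER_SHIFT;
	}

	/* Compaction succeeded; reset the back-off if allocation worked. */
	void demo_compaction_defer_reset(struct demo_zone_compact *z, int order,
					 bool alloc_success)
	{
		if (alloc_success) {
			z->compact_considered = 0;
			z->compact_defer_shift = 0;
		}
		if (order >= z->compact_order_failed)
			z->compact_order_failed = order + 1;
	}

After one failure the shift becomes 1, so the next attempt is skipped and the
one after it is allowed; each further failure doubles the back-off window until
the shift reaches its cap.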

Statistics
~~~~~~~~~~

``vm_stat``
  VM statistics for the zone. The items tracked are defined by
  ``enum zone_stat_item``.

``vm_numa_event``
  VM NUMA event statistics for the zone. The items tracked are defined by
  ``enum numa_stat_item``.

``per_cpu_zonestats``
  Per-CPU VM statistics for the zone. It records VM statistics and VM NUMA event
  statistics on a per-CPU basis. It reduces updates to the global ``vm_stat``
  and ``vm_numa_event`` fields of the zone to improve performance.
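
The per-CPU batching behind ``per_cpu_zonestats``, and the drift that
``percpu_drift_mark`` guards against, can be modelled as a signed per-CPU delta
that is folded into the global counter only when it crosses a threshold. This
is a simplified userspace sketch in the spirit of ``__mod_zone_page_state()``;
the fixed threshold, ``DEMO_NR_CPUS`` and all ``demo_`` names are assumptions.

.. code-block:: c

	#include <assert.h>

	#define DEMO_NR_CPUS 4

	struct demo_stat {
		long global;			/* cheap to read, may lag */
		long cpu_diff[DEMO_NR_CPUS];	/* unfolded per-CPU deltas */
		long threshold;
	};

	/* Accumulate locally; fold into global once |delta| > threshold. */
	void demo_mod_stat(struct demo_stat *s, int cpu, long delta)
	{
		long x;

		assert(cpu >= 0 && cpu < DEMO_NR_CPUS);
		x = s->cpu_diff[cpu] + delta;
		if (x > s->threshold || x < -s->threshold) {
			s->global += x;
			x = 0;
		}
		s->cpu_diff[cpu] = x;
	}

	/* Exact value: the global counter plus every per-CPU delta. */
	long demo_read_stat_exact(const struct demo_stat *s)
	{
		long v = s->global;

		for (int cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
			v += s->cpu_diff[cpu];
		return v;
	}

Reading ``global`` alone is cheap but may lag by up to the threshold per CPU;
summing the per-CPU deltas gives the exact value at a higher cost, which is the
kind of extra step the ``percpu_drift_mark`` description refers to.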

.. _pages:

Pages
=====

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.

.. _folios:

Folios
======

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.

.. _initialization:

Initialization
==============

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.