.. SPDX-License-Identifier: GPL-2.0

===============
Physical Memory
===============

Linux is available for a wide range of architectures so there is a need for an
architecture-independent abstraction to represent the physical memory. This
chapter describes the structures used to manage physical memory in a running
system.

The first principal concept prevalent in memory management is
`Non-Uniform Memory Access (NUMA)
<https://en.wikipedia.org/wiki/Non-uniform_memory_access>`_.
With multi-core and multi-socket machines, memory may be arranged into banks
that incur a different cost to access depending on the “distance” from the
processor. For example, there might be a bank of memory assigned to each CPU or
a bank of memory very suitable for DMA near peripheral devices.

Each bank is called a node and the concept is represented under Linux by a
``struct pglist_data`` even if the architecture is UMA. This structure is
always referenced by its typedef ``pg_data_t``. A ``pg_data_t`` structure
for a particular node can be referenced by the ``NODE_DATA(nid)`` macro where
``nid`` is the ID of that node.

For NUMA architectures, the node structures are allocated by the architecture
specific code early during boot. Usually, these structures are allocated
locally on the memory bank they represent. For UMA architectures, only one
static ``pg_data_t`` structure called ``contig_page_data`` is used. Nodes will
be discussed further in Section :ref:`Nodes <nodes>`.

The entire physical address space is partitioned into one or more blocks
called zones which represent ranges within memory. These ranges are usually
determined by architectural constraints for accessing the physical memory.
The memory range within a node that corresponds to a particular zone is
described by a ``struct zone``. Each zone has one of the types described
below.

* ``ZONE_DMA`` and ``ZONE_DMA32`` historically represented memory suitable for
  DMA by peripheral devices that cannot access all of the addressable
  memory. For many years there have been better and more robust interfaces to
  get memory with DMA specific requirements
  (Documentation/core-api/dma-api.rst), but ``ZONE_DMA`` and ``ZONE_DMA32``
  still represent memory ranges that have restrictions on how they can be
  accessed.
  Depending on the architecture, either or even both of these zone types
  can be disabled at build time using ``CONFIG_ZONE_DMA`` and
  ``CONFIG_ZONE_DMA32`` configuration options. Some 64-bit platforms may need
  both zones as they support peripherals with different DMA addressing
  limitations.

* ``ZONE_NORMAL`` is for normal memory that can be accessed by the kernel all
  the time. DMA operations can be performed on pages in this zone if the DMA
  devices support transfers to all addressable memory. ``ZONE_NORMAL`` is
  always enabled.

* ``ZONE_HIGHMEM`` is the part of the physical memory that is not covered by a
  permanent mapping in the kernel page tables. The memory in this zone is only
  accessible to the kernel using temporary mappings. This zone is available
  only on some 32-bit architectures and is enabled with ``CONFIG_HIGHMEM``.

* ``ZONE_MOVABLE`` is for normal accessible memory, just like ``ZONE_NORMAL``.
  The difference is that the contents of most pages in ``ZONE_MOVABLE`` are
  movable. That means that while virtual addresses of these pages do not
  change, their content may move between different physical pages. Often
  ``ZONE_MOVABLE`` is populated during memory hotplug, but it may
  also be populated on boot using one of the ``kernelcore``, ``movablecore``
  and ``movable_node`` kernel command line parameters. See
  Documentation/mm/page_migration.rst and
  Documentation/admin-guide/mm/memory-hotplug.rst for additional details.

* ``ZONE_DEVICE`` represents memory residing on devices such as PMEM and GPU.
  It has different characteristics than RAM zone types and it exists to provide
  :ref:`struct page <Pages>` and memory map services for device driver
  identified physical address ranges. ``ZONE_DEVICE`` is enabled with
  configuration option ``CONFIG_ZONE_DEVICE``.

It is important to note that many kernel operations can only take place using
``ZONE_NORMAL`` so it is the most performance critical zone. Zones are
discussed further in Section :ref:`Zones <zones>`.

The relation between node and zone extents is determined by the physical memory
map reported by the firmware, architectural constraints for memory addressing
and certain parameters in the kernel command line.

For example, with a 32-bit kernel on an x86 UMA machine with 2 Gbytes of RAM
the entire memory will be on node 0 and there will be three zones:
``ZONE_DMA``, ``ZONE_NORMAL`` and ``ZONE_HIGHMEM``::

  0                                                            2G
  +-------------------------------------------------------------+
  |                            node 0                           |
  +-------------------------------------------------------------+

  0         16M                    896M                        2G
  +----------+-----------------------+--------------------------+
  | ZONE_DMA |      ZONE_NORMAL      |       ZONE_HIGHMEM       |
  +----------+-----------------------+--------------------------+


With a kernel built with ``ZONE_DMA`` disabled and ``ZONE_DMA32`` enabled and
booted with the ``movablecore=80%`` parameter on an arm64 machine with
16 Gbytes of RAM equally split between two nodes, there will be ``ZONE_DMA32``,
``ZONE_NORMAL`` and ``ZONE_MOVABLE`` on node 0, and ``ZONE_NORMAL`` and
``ZONE_MOVABLE`` on node 1::

  1G                                9G                        17G
  +--------------------------------+ +--------------------------+
  |             node 0             | |          node 1          |
  +--------------------------------+ +--------------------------+

  1G       4G        4200M          9G          9320M         17G
  +---------+----------+-----------+ +------------+-------------+
  |  DMA32  |  NORMAL  |  MOVABLE  | |   NORMAL   |   MOVABLE   |
  +---------+----------+-----------+ +------------+-------------+


Memory banks may belong to interleaving nodes. In the example below an x86
machine has 16 Gbytes of RAM in 4 memory banks; even banks belong to node 0
and odd banks belong to node 1::

  0             4G              8G              12G          16G
  +-------------+ +-------------+ +-------------+ +-------------+
  |    node 0   | |    node 1   | |    node 0   | |    node 1   |
  +-------------+ +-------------+ +-------------+ +-------------+

  0    16M      4G
  +-----+-------+ +-------------+ +-------------+ +-------------+
  | DMA | DMA32 | |    NORMAL   | |    NORMAL   | |    NORMAL   |
  +-----+-------+ +-------------+ +-------------+ +-------------+

In this case node 0 will span from 0 to 12 Gbytes and node 1 will span from
4 to 16 Gbytes.
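
The node spans of the interleaved example can be checked with a small userspace
model. This is only a sketch: ``struct bank``, ``struct span``, ``node_span()``
and ``example_banks`` are hypothetical names used for illustration and are not
kernel interfaces.

.. code-block:: c

  #include <stddef.h>
  #include <stdint.h>

  /* Hypothetical description of one memory bank; not a kernel structure. */
  struct bank {
      int nid;            /* node the bank belongs to */
      uint64_t start_gb;  /* first Gbyte covered by the bank */
      uint64_t end_gb;    /* first Gbyte past the end of the bank */
  };

  struct span {
      uint64_t start_gb;
      uint64_t end_gb;
  };

  /* Return the smallest range that covers every bank of node nid. */
  static struct span node_span(const struct bank *banks, size_t n, int nid)
  {
      struct span s = { UINT64_MAX, 0 };
      size_t i;

      for (i = 0; i < n; i++) {
          if (banks[i].nid != nid)
              continue;
          if (banks[i].start_gb < s.start_gb)
              s.start_gb = banks[i].start_gb;
          if (banks[i].end_gb > s.end_gb)
              s.end_gb = banks[i].end_gb;
      }
      return s;
  }

  /* The four banks from the interleaved example. */
  static const struct bank example_banks[] = {
      { 0, 0, 4 }, { 1, 4, 8 }, { 0, 8, 12 }, { 1, 12, 16 },
  };

With these banks, ``node_span(example_banks, 4, 0)`` covers 0 to 12 and
``node_span(example_banks, 4, 1)`` covers 4 to 16, so the two node spans
overlap between 4 and 12 Gbytes.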

.. _nodes:

Nodes
=====

As we have mentioned, each node in memory is described by a ``pg_data_t`` which
is a typedef for a ``struct pglist_data``. When allocating a page, by default
Linux uses a node-local allocation policy to allocate memory from the node
closest to the running CPU. As processes tend to run on the same CPU, it is
likely the memory from the current node will be used. The allocation policy can
be controlled by users as described in
Documentation/admin-guide/mm/numa_memory_policy.rst.

Most NUMA architectures maintain an array of pointers to the node
structures. The actual structures are allocated early during boot when
architecture specific code parses the physical memory map reported by the
firmware. The bulk of the node initialization happens slightly later in the
boot process by the free_area_init() function, described later in Section
:ref:`Initialization <initialization>`.


Along with the node structures, the kernel maintains an array of ``nodemask_t``
bitmasks called ``node_states``. Each bitmask in this array represents a set of
nodes with particular properties as defined by ``enum node_states``:

``N_POSSIBLE``
  The node could become online at some point.
``N_ONLINE``
  The node is online.
``N_NORMAL_MEMORY``
  The node has regular memory.
``N_HIGH_MEMORY``
  The node has regular or high memory. When ``CONFIG_HIGHMEM`` is disabled,
  it is aliased to ``N_NORMAL_MEMORY``.
``N_MEMORY``
  The node has memory (regular, high, movable).
``N_CPU``
  The node has one or more CPUs.

For each node that has a property described above, the bit corresponding to the
node ID in the ``node_states[<property>]`` bitmask is set.

For example, for node 2 with normal memory and CPUs, bit 2 will be set in ::

  node_states[N_POSSIBLE]
  node_states[N_ONLINE]
  node_states[N_NORMAL_MEMORY]
  node_states[N_HIGH_MEMORY]
  node_states[N_MEMORY]
  node_states[N_CPU]

For various operations possible with nodemasks please refer to
``include/linux/nodemask.h``.

Among other things, nodemasks are used to provide macros for node traversal,
namely ``for_each_node()`` and ``for_each_online_node()``.

For instance, to call a function foo() for each online node::

  int nid;

  for_each_online_node(nid) {
          pg_data_t *pgdat = NODE_DATA(nid);

          foo(pgdat);
  }
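
The ``node_states`` bookkeeping can be pictured with a minimal userspace model.
The sketch below assumes that all node IDs fit into a single ``unsigned long``
and uses hypothetical ``model_*`` names; the kernel's ``nodemask_t`` and the
helpers in ``include/linux/nodemask.h`` are more general, and the real
``enum node_states`` may contain entries not listed here.

.. code-block:: c

  #include <stdbool.h>

  /* Simplified stand-in for the kernel's enum node_states. */
  enum node_state_model {
      N_POSSIBLE, N_ONLINE, N_NORMAL_MEMORY, N_HIGH_MEMORY, N_MEMORY, N_CPU,
      NR_NODE_STATES_MODEL
  };

  /* One bitmask per property, one bit per node ID. */
  static unsigned long node_states_model[NR_NODE_STATES_MODEL];

  static void model_node_set_state(int nid, enum node_state_model state)
  {
      node_states_model[state] |= 1UL << nid;
  }

  static bool model_node_state(int nid, enum node_state_model state)
  {
      return (node_states_model[state] >> nid) & 1UL;
  }

  /* Mark node nid as possible, online and having normal memory and CPUs. */
  static void model_bring_up_node(int nid)
  {
      int s;

      for (s = 0; s < NR_NODE_STATES_MODEL; s++)
          model_node_set_state(nid, (enum node_state_model)s);
  }

After ``model_bring_up_node(2)``, bit 2 is set in every bitmask of the model,
which mirrors the node 2 example above.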

Node structure
--------------

The node structure ``struct pglist_data`` is declared in
``include/linux/mmzone.h``. Here we briefly describe fields of this
structure:

General
~~~~~~~

``node_zones``
  The zones for this node. Not all of the zones may be populated, but it is
  the full list. It is referenced by this node's node_zonelists as well as
  other nodes' node_zonelists.

``node_zonelists``
  The list of all zones in all nodes. This list defines the order of zones
  that allocations are preferred from. The ``node_zonelists`` is set up by
  ``build_zonelists()`` in ``mm/page_alloc.c`` during the initialization of
  core memory management structures.

``nr_zones``
  Number of populated zones in this node.

``node_mem_map``
  For UMA systems that use the FLATMEM memory model, node 0's
  ``node_mem_map`` is an array of struct pages representing each physical
  frame.

``node_page_ext``
  For UMA systems that use the FLATMEM memory model, node 0's
  ``node_page_ext`` is an array of extensions of struct pages. Available only
  in kernels built with ``CONFIG_PAGE_EXTENSION`` enabled.

``node_start_pfn``
  The page frame number of the starting page frame in this node.

``node_present_pages``
  Total number of physical pages present in this node.

``node_spanned_pages``
  Total size of physical page range, including holes.

``node_size_lock``
  A lock that protects the fields defining the node extents. Only defined when
  at least one of the ``CONFIG_MEMORY_HOTPLUG`` or
  ``CONFIG_DEFERRED_STRUCT_PAGE_INIT`` configuration options is enabled.
  ``pgdat_resize_lock()`` and ``pgdat_resize_unlock()`` are provided to
  manipulate ``node_size_lock`` without checking for ``CONFIG_MEMORY_HOTPLUG``
  or ``CONFIG_DEFERRED_STRUCT_PAGE_INIT``.

``node_id``
  The Node ID (NID) of the node, starts at 0.

``totalreserve_pages``
  This is a per-node reserve of pages that are not available to userspace
  allocations.

``first_deferred_pfn``
  If memory initialization on large machines is deferred then this is the first
  PFN that needs to be initialized. Defined only when
  ``CONFIG_DEFERRED_STRUCT_PAGE_INIT`` is enabled.

``deferred_split_queue``
  Per-node queue of huge pages whose split was deferred. Defined only when
  ``CONFIG_TRANSPARENT_HUGEPAGE`` is enabled.

``__lruvec``
  Per-node lruvec holding LRU lists and related parameters. Used only when
  memory cgroups are disabled. It should not be accessed directly, use
  ``mem_cgroup_lruvec()`` to look up lruvecs instead.

Reclaim control
~~~~~~~~~~~~~~~

See also Documentation/mm/page_reclaim.rst.

``kswapd``
  Per-node instance of kswapd kernel thread.

``kswapd_wait``, ``pfmemalloc_wait``, ``reclaim_wait``
  Wait queues used to synchronize memory reclaim tasks.

``nr_writeback_throttled``
  Number of tasks that are throttled waiting on dirty pages to clean.

``nr_reclaim_start``
  Number of pages written while reclaim is throttled waiting for writeback.

``kswapd_order``
  Controls the order kswapd tries to reclaim.

``kswapd_highest_zoneidx``
  The highest zone index to be reclaimed by kswapd.

``kswapd_failures``
  Number of runs in which kswapd was unable to reclaim any pages.

``min_unmapped_pages``
  Minimal number of unmapped file backed pages that cannot be reclaimed.
  Determined by ``vm.min_unmapped_ratio`` sysctl. Only defined when
  ``CONFIG_NUMA`` is enabled.

``min_slab_pages``
  Minimal number of SLAB pages that cannot be reclaimed. Determined by
  ``vm.min_slab_ratio`` sysctl. Only defined when ``CONFIG_NUMA`` is enabled.

``flags``
  Flags controlling reclaim behavior.

Compaction control
~~~~~~~~~~~~~~~~~~

``kcompactd_max_order``
  Page order that kcompactd should try to achieve.

``kcompactd_highest_zoneidx``
  The highest zone index to be compacted by kcompactd.

``kcompactd_wait``
  Wait queue used to synchronize memory compaction tasks.

``kcompactd``
  Per-node instance of kcompactd kernel thread.

``proactive_compact_trigger``
  Determines if proactive compaction is enabled. Controlled by
  ``vm.compaction_proactiveness`` sysctl.

Statistics
~~~~~~~~~~

``per_cpu_nodestats``
  Per-CPU VM statistics for the node.

``vm_stat``
  VM statistics for the node.

.. _zones:

Zones
=====

As we have mentioned, each zone in memory is described by a ``struct zone``
which is an element of the ``node_zones`` array of the node it belongs to.
``struct zone`` is the core data structure of the page allocator. A zone
represents a range of physical memory and may have holes.

The page allocator uses the GFP flags, see :ref:`mm-api-gfp-flags`, specified by
a memory allocation to determine the highest zone in a node from which the
memory allocation can allocate memory. The page allocator first tries to
allocate memory from that zone. If it cannot allocate the requested amount of
memory from that zone, it tries the next lower zone in the node, and the
process continues until the lowest zone has been tried. For example, if
a node contains ``ZONE_DMA32``, ``ZONE_NORMAL`` and ``ZONE_MOVABLE`` and the
highest zone of a memory allocation is ``ZONE_MOVABLE``, the order of the zones
from which the page allocator allocates memory is ``ZONE_MOVABLE`` >
``ZONE_NORMAL`` > ``ZONE_DMA32``.
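
The fallback order can be sketched in userspace as follows. This is a
deliberately simplified model: ``enum zone_idx`` and ``pick_zone()`` are
hypothetical names, and the real page allocator also checks watermarks and
``lowmem_reserve`` and walks zonelists that may span several nodes.

.. code-block:: c

  /* Zones of one hypothetical node, ordered from lowest to highest. */
  enum zone_idx { Z_DMA32, Z_NORMAL, Z_MOVABLE, Z_COUNT };

  /*
   * free_pages[i] holds the number of free pages in zone i. Return the index
   * of the first zone that can satisfy a request of nr pages, starting at
   * highest_zone and falling back to each lower zone, or -1 if none can.
   */
  static int pick_zone(const unsigned long free_pages[Z_COUNT],
                       enum zone_idx highest_zone, unsigned long nr)
  {
      int z;

      for (z = highest_zone; z >= 0; z--)
          if (free_pages[z] >= nr)
              return z;
      return -1;
  }

A request whose highest zone is ``Z_NORMAL`` never reaches ``Z_MOVABLE`` in
this model, even if ``Z_MOVABLE`` has plenty of free pages.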

At runtime, free pages in a zone are in the Per-CPU Pagesets (PCP) or free areas
of the zone. The Per-CPU Pagesets are a vital mechanism in the kernel's memory
management system. By handling most frequent allocations and frees locally on
each CPU, the Per-CPU Pagesets improve performance and scalability, especially
on systems with many cores. The page allocator in the kernel employs a two-step
strategy for memory allocation, starting with the Per-CPU Pagesets before
falling back to the buddy allocator. Pages are transferred between the Per-CPU
Pagesets and the global free areas (managed by the buddy allocator) in batches.
This minimizes the overhead of frequent interactions with the global buddy
allocator.
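
The effect of batching can be illustrated with a toy single-CPU model. Every
name below (``struct pcp_model``, ``struct pool_model``, ``model_alloc_page()``
and ``model_free_page()``) is hypothetical, and the refill and drain rules are
simplifying assumptions rather than the kernel's actual policy.

.. code-block:: c

  /* Toy per-CPU page cache in front of a global pool of free pages. */
  struct pcp_model {
      unsigned long count;      /* pages cached locally */
      unsigned long batch;      /* pages moved per refill or drain */
      unsigned long high;       /* drain to the pool above this level */
  };

  struct pool_model {
      unsigned long free;       /* pages in the global free areas */
      unsigned long lock_takes; /* how often the global lock was needed */
  };

  /* Allocate one page; return 1 on success and 0 if no page is left. */
  static int model_alloc_page(struct pcp_model *pcp, struct pool_model *pool)
  {
      if (pcp->count == 0) {
          unsigned long n = pool->free < pcp->batch ? pool->free : pcp->batch;

          if (n == 0)
              return 0;
          pool->lock_takes++;   /* one lock hold refills a whole batch */
          pool->free -= n;
          pcp->count += n;
      }
      pcp->count--;
      return 1;
  }

  /* Free one page; drain one batch once the cache grows above high. */
  static void model_free_page(struct pcp_model *pcp, struct pool_model *pool)
  {
      pcp->count++;
      if (pcp->count > pcp->high) {
          unsigned long n = pcp->count < pcp->batch ? pcp->count : pcp->batch;

          pool->lock_takes++;
          pcp->count -= n;
          pool->free += n;
      }
  }

With a batch of 4, eight consecutive allocations in this model touch the global
pool only twice; the remaining six are served from the local cache.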

Architecture specific code calls free_area_init() to initialize zones.

Zone structure
--------------

The zone structure ``struct zone`` is defined in ``include/linux/mmzone.h``.
Here we briefly describe fields of this structure:

General
~~~~~~~

``_watermark``
  The watermarks for this zone. When the amount of free pages in a zone is below
  the min watermark, boosting is ignored and an allocation may trigger direct
  reclaim and direct compaction. The min watermark is also used to throttle
  direct reclaim. When the amount of free pages in a zone is below the low
  watermark, kswapd is woken up. When the amount of free pages in a zone is
  above the high watermark, kswapd stops reclaiming (a zone is balanced) when
  the ``NUMA_BALANCING_MEMORY_TIERING`` bit of ``sysctl_numa_balancing_mode`` is
  not set. The promo watermark is used for memory tiering and NUMA balancing.
  When the amount of free pages in a zone is above the promo watermark, kswapd
  stops reclaiming when the ``NUMA_BALANCING_MEMORY_TIERING`` bit of
  ``sysctl_numa_balancing_mode`` is set. The watermarks are set by
  ``__setup_per_zone_wmarks()``. The min watermark is calculated according to
  ``vm.min_free_kbytes`` sysctl. The other three watermarks are set according
  to the distance between two watermarks. The distance itself is calculated
  taking ``vm.watermark_scale_factor`` sysctl into account.

``watermark_boost``
  The number of pages by which the watermarks are boosted. Boosting increases
  reclaim pressure to reduce the likelihood of future fallbacks, and kswapd is
  woken immediately because the node may be balanced overall, in which case
  kswapd would not wake naturally.

``nr_reserved_highatomic``
  The number of pages which are reserved for high-order atomic allocations.

``nr_free_highatomic``
  The number of free pages in reserved highatomic pageblocks.

``lowmem_reserve``
  The array of the amounts of the memory reserved in this zone for memory
  allocations. For example, if the highest zone a memory allocation can
  allocate memory from is ``ZONE_MOVABLE``, the amount of memory reserved in
  this zone for this allocation is ``lowmem_reserve[ZONE_MOVABLE]`` when
  attempting to allocate memory from this zone. This is a mechanism the page
  allocator uses to prevent allocations which could use ``highmem`` from using
  too much ``lowmem``. For some specialised workloads on ``highmem`` machines,
  it is dangerous for the kernel to allow process memory to be allocated from
  the ``lowmem`` zone. This is because that memory could then be pinned via the
  ``mlock()`` system call, or by unavailability of swapspace.
  ``vm.lowmem_reserve_ratio`` sysctl determines how aggressive the kernel is in
  defending these lower zones. This array is recalculated by
  ``setup_per_zone_lowmem_reserve()`` at runtime if ``vm.lowmem_reserve_ratio``
  sysctl changes.

``node``
  The index of the node this zone belongs to. Available only when
  ``CONFIG_NUMA`` is enabled because there is only one node in a UMA system.

``zone_pgdat``
  Pointer to the ``struct pglist_data`` of the node this zone belongs to.

``per_cpu_pageset``
  Pointer to the Per-CPU Pagesets (PCP) allocated and initialized by
  ``setup_zone_pageset()``. By handling most frequent allocations and frees
  locally on each CPU, PCP improves performance and scalability on systems with
  many cores.

``pageset_high_min``
  Copied to the ``high_min`` of the Per-CPU Pagesets for faster access.

``pageset_high_max``
  Copied to the ``high_max`` of the Per-CPU Pagesets for faster access.

``pageset_batch``
  Copied to the ``batch`` of the Per-CPU Pagesets for faster access. The
  ``batch``, ``high_min`` and ``high_max`` of the Per-CPU Pagesets are used to
  calculate the number of elements the Per-CPU Pagesets obtain from the buddy
  allocator under a single hold of the lock for efficiency. They are also used
  to decide whether the Per-CPU Pagesets return pages to the buddy allocator
  when pages are freed.

``pageblock_flags``
  The pointer to the flags for the pageblocks in the zone (see
  ``include/linux/pageblock-flags.h`` for flags list). The memory is allocated
  in ``setup_usemap()``. Each pageblock occupies ``NR_PAGEBLOCK_BITS`` bits.
  Defined only when ``CONFIG_FLATMEM`` is enabled. The flags are stored in
  ``mem_section`` when ``CONFIG_SPARSEMEM`` is enabled.

``zone_start_pfn``
  The start pfn of the zone. It is initialized by
  ``calculate_node_totalpages()``.

``managed_pages``
  The present pages managed by the buddy system, which is calculated as:
  ``managed_pages`` = ``present_pages`` - ``reserved_pages``, where
  ``reserved_pages`` includes pages allocated by the memblock allocator. It
  should be used by the page allocator and the vm scanner to calculate all
  kinds of watermarks and thresholds. It is accessed using
  ``atomic_long_xxx()`` functions. It is initialized in
  ``free_area_init_core()`` and then is reinitialized when the memblock
  allocator frees pages into the buddy system.

``spanned_pages``
  The total pages spanned by the zone, including holes, which is calculated as:
  ``spanned_pages`` = ``zone_end_pfn`` - ``zone_start_pfn``. It is initialized
  by ``calculate_node_totalpages()``.

``present_pages``
  The physical pages existing within the zone, which is calculated as:
  ``present_pages`` = ``spanned_pages`` - ``absent_pages`` (pages in holes). It
  may be used by memory hotplug or memory power management logic to figure out
  unmanaged pages by checking (``present_pages`` - ``managed_pages``). Write
  access to ``present_pages`` at runtime should be protected by
  ``mem_hotplug_begin/done()``. Any reader who cannot tolerate drift of
  ``present_pages`` should use ``get_online_mems()`` to get a stable value. It
  is initialized by ``calculate_node_totalpages()``.

``present_early_pages``
  The present pages existing within the zone located on memory available since
  early boot, excluding hotplugged memory. Defined only when
  ``CONFIG_MEMORY_HOTPLUG`` is enabled and initialized by
  ``calculate_node_totalpages()``.

``cma_pages``
  The pages reserved for CMA use. These pages behave like ``ZONE_MOVABLE`` when
  they are not used for CMA. Defined only when ``CONFIG_CMA`` is enabled.

``name``
  The name of the zone. It is a pointer to the corresponding element of
  the ``zone_names`` array.

``nr_isolate_pageblock``
  Number of isolated pageblocks. It is used to solve the incorrect free page
  counting problem caused by racy retrieval of a pageblock's migratetype.
  Protected by ``zone->lock``. Defined only when ``CONFIG_MEMORY_ISOLATION`` is
  enabled.

``span_seqlock``
  The seqlock to protect ``zone_start_pfn`` and ``spanned_pages``. It is a
  seqlock because it has to be read outside of ``zone->lock``, and it is done in
  the main allocator path. However, the seqlock is written quite infrequently.
  Defined only when ``CONFIG_MEMORY_HOTPLUG`` is enabled.

``initialized``
  The flag indicating if the zone is initialized. Set by
  ``init_currently_empty_zone()`` during boot.

``free_area``
  The array of free areas, where each element corresponds to a specific order
  which is a power of two. The buddy allocator uses this structure to manage
  free memory efficiently. When allocating, it tries to find the smallest
  sufficient block. If the smallest sufficient block is larger than the
  requested size, it is recursively split into the next smaller blocks
  until the required size is reached. When a page is freed, it may be merged
  with its buddy to form a larger block. It is initialized by
  ``zone_init_free_lists()``.

``unaccepted_pages``
  The list of pages to be accepted. All pages on the list are of order
  ``MAX_PAGE_ORDER``. Defined only when ``CONFIG_UNACCEPTED_MEMORY`` is enabled.

``flags``
  The zone flags. The three least significant bits are used and defined by
  ``enum zone_flags``. ``ZONE_BOOSTED_WATERMARK`` (bit 0): zone recently boosted
  watermarks. Cleared when kswapd is woken. ``ZONE_RECLAIM_ACTIVE`` (bit 1):
  kswapd may be scanning the zone. ``ZONE_BELOW_HIGH`` (bit 2): zone is below
  high watermark.

``lock``
  The main lock that protects the internal data structures of the page allocator
  specific to the zone, in particular ``free_area``.

``percpu_drift_mark``
  When free pages are below this point, additional steps are taken when reading
  the number of free pages to avoid per-cpu counter drift allowing watermarks
  to be breached. It is updated in ``refresh_zone_stat_thresholds()``.
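
The relation between the four ``_watermark`` values described earlier in this
list can be sketched in userspace. The arithmetic is an assumption made for
illustration: the distance is taken as the larger of a quarter of the min
watermark and ``managed_pages * watermark_scale_factor / 10000``.
``__setup_per_zone_wmarks()`` in ``mm/page_alloc.c`` remains the authoritative
source, and ``struct wmarks_model`` and ``model_wmarks()`` are hypothetical
names.

.. code-block:: c

  /* Watermarks of one zone, in pages. */
  struct wmarks_model {
      unsigned long min, low, high, promo;
  };

  /*
   * Derive low, high and promo from min by adding the same distance three
   * times. The distance formula is an assumption, see the text above.
   */
  static struct wmarks_model model_wmarks(unsigned long min_pages,
                                          unsigned long managed_pages,
                                          unsigned long scale_factor)
  {
      struct wmarks_model w;
      unsigned long long scaled =
          (unsigned long long)managed_pages * scale_factor / 10000;
      unsigned long dist = min_pages / 4;

      if (scaled > dist)
          dist = (unsigned long)scaled;

      w.min = min_pages;
      w.low = w.min + dist;
      w.high = w.low + dist;
      w.promo = w.high + dist;
      return w;
  }

For instance, with a min watermark of 1000 pages, 1000000 managed pages and a
scale factor of 10, the model yields a distance of 1000 pages and watermarks of
1000, 2000, 3000 and 4000 pages.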

Compaction control
~~~~~~~~~~~~~~~~~~

``compact_cached_free_pfn``
  The PFN where compaction free scanner should start in the next scan.

``compact_cached_migrate_pfn``
  The PFNs where compaction migration scanner should start in the next scan.
  This array has two elements: the first one is used in ``MIGRATE_ASYNC`` mode,
  and the other one is used in ``MIGRATE_SYNC`` mode.

``compact_init_migrate_pfn``
  The initial migration PFN which is initialized to 0 at boot time, and to the
  first pageblock with migratable pages in the zone after a full compaction
  finishes. It is used to check if a scan is a whole zone scan or not.

``compact_init_free_pfn``
  The initial free PFN which is initialized to 0 at boot time and to the last
  pageblock with free ``MIGRATE_MOVABLE`` pages in the zone. It is used to check
  if it is the start of a scan.

``compact_considered``
  The number of compactions attempted since last failure. It is reset in
  ``defer_compaction()`` when a compaction fails to result in a page allocation
  success. It is increased by 1 in ``compaction_deferred()`` when a compaction
  should be skipped. ``compaction_deferred()`` is called before
  ``compact_zone()`` is called; ``compaction_defer_reset()`` is called when
  ``compact_zone()`` returns ``COMPACT_SUCCESS``; and ``defer_compaction()`` is
  called when ``compact_zone()`` returns ``COMPACT_PARTIAL_SKIPPED`` or
  ``COMPACT_COMPLETE``.

``compact_defer_shift``
  The number of compactions skipped before trying again is
  ``1<<compact_defer_shift``. It is increased by 1 in ``defer_compaction()``.
  It is reset in ``compaction_defer_reset()`` when a direct compaction results
  in a page allocation success. Its maximum value is ``COMPACT_MAX_DEFER_SHIFT``.

``compact_order_failed``
  The minimum order at which compaction failed. It is set in
  ``compaction_defer_reset()`` when a compaction succeeds and in
  ``defer_compaction()`` when a compaction fails to result in a page allocation
  success.

``compact_blockskip_flush``
  Set to true when compaction migration scanner and free scanner meet, which
  means the ``PB_migrate_skip`` bits should be cleared.

``contiguous``
  Set to true when the zone is contiguous (in other words, no hole).
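
The interplay of ``compact_considered``, ``compact_defer_shift`` and
``compact_order_failed`` can be sketched with a userspace model. The
``model_*`` functions and ``struct zone_defer_model`` are hypothetical, the
value 6 for the maximum shift is an assumption, and the exact conditions used
by the kernel in ``mm/compaction.c`` may differ.

.. code-block:: c

  #include <stdbool.h>

  #define MODEL_MAX_DEFER_SHIFT 6 /* assumed COMPACT_MAX_DEFER_SHIFT value */

  struct zone_defer_model {
      unsigned int compact_considered;
      unsigned int compact_defer_shift;
      int compact_order_failed;
  };

  /* Compaction failed: restart the count and back off exponentially. */
  static void model_defer_compaction(struct zone_defer_model *z, int order)
  {
      z->compact_considered = 0;
      if (z->compact_defer_shift < MODEL_MAX_DEFER_SHIFT)
          z->compact_defer_shift++;
      if (order < z->compact_order_failed)
          z->compact_order_failed = order;
  }

  /* Return true if this compaction attempt should be skipped. */
  static bool model_compaction_deferred(struct zone_defer_model *z, int order)
  {
      unsigned int limit = 1U << z->compact_defer_shift;

      if (order < z->compact_order_failed)
          return false;
      if (++z->compact_considered >= limit) {
          z->compact_considered = limit;
          return false;
      }
      return true;
  }

  /* Compaction led to a successful allocation: stop deferring. */
  static void model_compaction_defer_reset(struct zone_defer_model *z,
                                           int order)
  {
      z->compact_considered = 0;
      z->compact_defer_shift = 0;
      if (order >= z->compact_order_failed)
          z->compact_order_failed = order + 1;
  }

In this model each failure doubles the window in which attempts are skipped,
up to the assumed cap, and a single success resets the backoff.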

Statistics
~~~~~~~~~~

``vm_stat``
  VM statistics for the zone. The items tracked are defined by
  ``enum zone_stat_item``.

``vm_numa_event``
  VM NUMA event statistics for the zone. The items tracked are defined by
  ``enum numa_stat_item``.

``per_cpu_zonestats``
  Per-CPU VM statistics for the zone. It records VM statistics and VM NUMA event
  statistics on a per-CPU basis. It reduces updates to the global ``vm_stat``
  and ``vm_numa_event`` fields of the zone to improve performance.

.. _pages:

Pages
=====

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.

.. _folios:

Folios
======

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.

.. _initialization:

Initialization
==============

.. admonition:: Stub

   This section is incomplete. Please list and describe the appropriate fields.