Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: introduce and use {pgd,p4d}_populate_kernel()

Introduce and use {pgd,p4d}_populate_kernel() in core MM code when
populating PGD and P4D entries for the kernel address space. These
helpers ensure proper synchronization of page tables when updating the
kernel portion of top-level page tables.

Until now, the kernel has relied on each architecture to handle
synchronization of top-level page tables in an ad-hoc manner. For
example, see commit 9b861528a801 ("x86-64, mem: Update all PGDs for direct
mapping and vmemmap mapping changes").

However, this approach has proven fragile for the following reasons:

1) It is easy to forget to perform the necessary page table
synchronization when introducing new changes.
For instance, commit 4917f55b4ef9 ("mm/sparse-vmemmap: improve memory
savings for compound devmaps") overlooked the need to synchronize
page tables for the vmemmap area.

2) It is also easy to overlook that the vmemmap and direct mapping areas
must not be accessed before explicit page table synchronization.
   For example, commit 8d400913c231 ("x86/vmemmap: handle unpopulated
   sub-pmd ranges") caused crashes by accessing the vmemmap area
   before calling sync_global_pgds().

To address this, as suggested by Dave Hansen, introduce _kernel() variants
of the page table population helpers, which invoke architecture-specific
hooks to properly synchronize page tables. These are introduced in a new
header file, include/linux/pgalloc.h, so they can be called from common
code.

They reuse the infrastructure that already exists for vmalloc and
ioremap: synchronization requirements are expressed via
ARCH_PAGE_TABLE_SYNC_MASK, and the actual synchronization is performed
by arch_sync_kernel_mappings().

This change currently targets only x86_64, so only PGD and P4D level
helpers are introduced. For now, these helpers are no-ops, since no
architecture sets PGTBL_{PGD,P4D}_MODIFIED in ARCH_PAGE_TABLE_SYNC_MASK.

In theory, PUD and PMD level helpers can be added later if needed by other
architectures. For now, 32-bit architectures (x86-32 and arm) only handle
PGTBL_PMD_MODIFIED, so p*d_populate_kernel() will never affect them unless
we introduce a PMD level helper.

[harry.yoo@oracle.com: fix KASAN build error due to p*d_populate_kernel()]
Link: https://lkml.kernel.org/r/20250822020727.202749-1-harry.yoo@oracle.com
Link: https://lkml.kernel.org/r/20250818020206.4517-3-harry.yoo@oracle.com
Fixes: 8d400913c231 ("x86/vmemmap: handle unpopulated sub-pmd ranges")
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kiryl Shutsemau <kas@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Alexander Potapenko <glider@google.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: bibo mao <maobibo@loongson.cn>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Lameter (Ampere) <cl@gentwo.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Gwan-gyeong Mun <gwan-gyeong.mun@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jane Chu <jane.chu@oracle.com>
Cc: Joao Martins <joao.m.martins@oracle.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Peter Xu <peterx@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Zheng <zhengqi.arch@bytedance.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Huth <thuth@redhat.com>
Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com>
Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Harry Yoo, committed by Andrew Morton
f2d2f959 7cc183f2

+48 -18

include/linux/pgalloc.h (new file, +29)
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_PGALLOC_H
+#define _LINUX_PGALLOC_H
+
+#include <linux/pgtable.h>
+#include <asm/pgalloc.h>
+
+/*
+ * {pgd,p4d}_populate_kernel() are defined as macros to allow
+ * compile-time optimization based on the configured page table levels.
+ * Without this, linking may fail because callers (e.g., KASAN) may rely
+ * on calls to these functions being optimized away when passing symbols
+ * that exist only for certain page table levels.
+ */
+#define pgd_populate_kernel(addr, pgd, p4d)			\
+do {								\
+	pgd_populate(&init_mm, pgd, p4d);			\
+	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_PGD_MODIFIED)	\
+		arch_sync_kernel_mappings(addr, addr);		\
+} while (0)
+
+#define p4d_populate_kernel(addr, p4d, pud)			\
+do {								\
+	p4d_populate(&init_mm, p4d, pud);			\
+	if (ARCH_PAGE_TABLE_SYNC_MASK & PGTBL_P4D_MODIFIED)	\
+		arch_sync_kernel_mappings(addr, addr);		\
+} while (0)
+
+#endif /* _LINUX_PGALLOC_H */
include/linux/pgtable.h (+7 -6)
···
 /*
  * Architectures can set this mask to a combination of PGTBL_P?D_MODIFIED values
- * and let generic vmalloc and ioremap code know when arch_sync_kernel_mappings()
- * needs to be called.
+ * and let generic vmalloc, ioremap and page table update code know when
+ * arch_sync_kernel_mappings() needs to be called.
  */
 #ifndef ARCH_PAGE_TABLE_SYNC_MASK
 #define ARCH_PAGE_TABLE_SYNC_MASK 0
···
 /*
  * Page Table Modification bits for pgtbl_mod_mask.
  *
- * These are used by the p?d_alloc_track*() set of functions an in the generic
- * vmalloc/ioremap code to track at which page-table levels entries have been
- * modified. Based on that the code can better decide when vmalloc and ioremap
- * mapping changes need to be synchronized to other page-tables in the system.
+ * These are used by the p?d_alloc_track*() and p*d_populate_kernel()
+ * functions in the generic vmalloc, ioremap and page table update code
+ * to track at which page-table levels entries have been modified.
+ * Based on that the code can better decide when page table changes need
+ * to be synchronized to other page-tables in the system.
  */
 #define __PGTBL_PGD_MODIFIED	0
 #define __PGTBL_P4D_MODIFIED	1
mm/kasan/init.c (+6 -6)
···
 #include <linux/mm.h>
 #include <linux/pfn.h>
 #include <linux/slab.h>
+#include <linux/pgalloc.h>

 #include <asm/page.h>
-#include <asm/pgalloc.h>

 #include "kasan.h"
···
 			pud_t *pud;
 			pmd_t *pmd;

-			p4d_populate(&init_mm, p4d,
+			p4d_populate_kernel(addr, p4d,
 					lm_alias(kasan_early_shadow_pud));
 			pud = pud_offset(p4d, addr);
 			pud_populate(&init_mm, pud,
···
 		} else {
 			p = early_alloc(PAGE_SIZE, NUMA_NO_NODE);
 			pud_init(p);
-			p4d_populate(&init_mm, p4d, p);
+			p4d_populate_kernel(addr, p4d, p);
 		}
 	}
 	zero_pud_populate(p4d, addr, next);
···
 			 * puds,pmds, so pgd_populate(), pud_populate()
 			 * is noops.
 			 */
-			pgd_populate(&init_mm, pgd,
+			pgd_populate_kernel(addr, pgd,
 					lm_alias(kasan_early_shadow_p4d));
 			p4d = p4d_offset(pgd, addr);
-			p4d_populate(&init_mm, p4d,
+			p4d_populate_kernel(addr, p4d,
 					lm_alias(kasan_early_shadow_pud));
 			pud = pud_offset(p4d, addr);
 			pud_populate(&init_mm, pud,
···
 			if (!p)
 				return -ENOMEM;
 		} else {
-			pgd_populate(&init_mm, pgd,
+			pgd_populate_kernel(addr, pgd,
 					early_alloc(PAGE_SIZE, NUMA_NO_NODE));
 		}
 	}
mm/percpu.c (+3 -3)
···
 #endif /* BUILD_EMBED_FIRST_CHUNK */

 #ifdef BUILD_PAGE_FIRST_CHUNK
-#include <asm/pgalloc.h>
+#include <linux/pgalloc.h>

 #ifndef P4D_TABLE_SIZE
 #define P4D_TABLE_SIZE		PAGE_SIZE
···
 	if (pgd_none(*pgd)) {
 		p4d = memblock_alloc_or_panic(P4D_TABLE_SIZE, P4D_TABLE_SIZE);
-		pgd_populate(&init_mm, pgd, p4d);
+		pgd_populate_kernel(addr, pgd, p4d);
 	}

 	p4d = p4d_offset(pgd, addr);
 	if (p4d_none(*p4d)) {
 		pud = memblock_alloc_or_panic(PUD_TABLE_SIZE, PUD_TABLE_SIZE);
-		p4d_populate(&init_mm, p4d, pud);
+		p4d_populate_kernel(addr, p4d, pud);
 	}

 	pud = pud_offset(p4d, addr);
mm/sparse-vmemmap.c (+3 -3)
···
 #include <linux/spinlock.h>
 #include <linux/vmalloc.h>
 #include <linux/sched.h>
+#include <linux/pgalloc.h>

 #include <asm/dma.h>
-#include <asm/pgalloc.h>
 #include <asm/tlbflush.h>

 #include "hugetlb_vmemmap.h"
···
 		if (!p)
 			return NULL;
 		pud_init(p);
-		p4d_populate(&init_mm, p4d, p);
+		p4d_populate_kernel(addr, p4d, p);
 	}
 	return p4d;
 }
···
 		void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
 		if (!p)
 			return NULL;
-		pgd_populate(&init_mm, pgd, p);
+		pgd_populate_kernel(addr, pgd, p);
 	}
 	return pgd;
 }