Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes and updates from Ingo Molnar:

- Fix a large number of x86 Kconfig dependency and help text accuracy
bugs, by Mateusz Jończyk and David Heideberg

- Fix a crash caused by a VM_PAT interaction with fork(); this fix
also touches core kernel code

- Fix an ORC unwinder bug for interrupt entries

- Fixes and cleanups

- Fix an AMD microcode loader bug that can promote verification
failures into success

- Add early-printk support for MMIO based UARTs on an x86 board that
had no other serial debugging facility and also experienced early
boot crashes

* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
x86/fpu: Update the outdated comment above fpstate_init_user()
x86/early_printk: Add support for MMIO-based UARTs
x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
x86/Kconfig: Correct X86_X2APIC help text
x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
x86/Kconfig: Document release year of glibc 2.3.3
x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
x86/Kconfig: Document CONFIG_PCI_MMCONFIG
x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
x86/Kconfig: Enable X86_X2APIC by default and improve help text

+172 -75
+8 -1
Documentation/admin-guide/kernel-parameters.txt
···
 	earlyprintk=serial[,0x...[,baudrate]]
 	earlyprintk=ttySn[,baudrate]
 	earlyprintk=dbgp[debugController#]
-	earlyprintk=pciserial[,force],bus:device.function[,baudrate]
+	earlyprintk=pciserial[,force],bus:device.function[,{nocfg|baudrate}]
 	earlyprintk=xdbc[xhciController#]
 	earlyprintk=bios
+	earlyprintk=mmio,membase[,{nocfg|baudrate}]

 	earlyprintk is useful when the kernel crashes before
 	the normal console is initialized. It is not enabled by
 	default because it has some cosmetic problems.
+
+	Only 32-bit memory addresses are supported for "mmio"
+	and "pciserial" devices.
+
+	Use "nocfg" to skip UART configuration, assume
+	BIOS/firmware has configured UART correctly.

 	Append ",keep" to not disable it when the real console
 	takes over.
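As a worked illustration of the new syntax (the MMIO base address and PCI device below are purely made-up examples, not real devices):

```
earlyprintk=mmio,0xfedc9000,115200,keep
earlyprintk=mmio,0xfedc9000,nocfg
earlyprintk=pciserial,00:18.1,nocfg
```

The first line configures the UART at MMIO base 0xfedc9000 for 115200 baud and keeps the console after the real one registers; "nocfg" in the other two trusts the firmware's existing UART setup.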
+56 -27
arch/x86/Kconfig
···
 	  If you don't know what to do here, say N.

 config X86_X2APIC
-	bool "Support x2apic"
+	bool "x2APIC interrupt controller architecture support"
 	depends on X86_LOCAL_APIC && X86_64 && (IRQ_REMAP || HYPERVISOR_GUEST)
+	default y
 	help
-	  This enables x2apic support on CPUs that have this feature.
-
-	  This allows 32-bit apic IDs (so it can support very large systems),
-	  and accesses the local apic via MSRs not via mmio.
-
-	  Some Intel systems circa 2022 and later are locked into x2APIC mode
-	  and can not fall back to the legacy APIC modes if SGX or TDX are
-	  enabled in the BIOS. They will boot with very reduced functionality
-	  without enabling this option.
-
-	  If you don't know what to do here, say N.
+	  x2APIC is an interrupt controller architecture, a component of which
+	  (the local APIC) is present in the CPU. It allows faster access to
+	  the local APIC and supports a larger number of CPUs in the system
+	  than the predecessors.
+
+	  x2APIC was introduced in Intel CPUs around 2008 and in AMD EPYC CPUs
+	  in 2019, but it can be disabled by the BIOS. It is also frequently
+	  emulated in virtual machines, even when the host CPU does not support
+	  it. Support in the CPU can be checked by executing
+		grep x2apic /proc/cpuinfo
+
+	  If this configuration option is disabled, the kernel will boot with
+	  very reduced functionality and performance on some platforms that
+	  have x2APIC enabled. On the other hand, on hardware that does not
+	  support x2APIC, a kernel with this option enabled will just fallback
+	  to older APIC implementations.
+
+	  If in doubt, say Y.

 config X86_POSTED_MSI
 	bool "Enable MSI and MSI-x delivery by posted interrupts"
···
 	  CONFIG_64BIT.

 	  32-bit platforms (CONFIG_64BIT=n):
-		Goldfish (Android emulator)
-		AMD Elan
+		Goldfish (mostly Android emulator)
+		Intel CE media processor (CE4100) SoC
+		Intel Quark
 		RDC R-321x SoC
-		SGI 320/540 (Visual Workstation)

 	  64-bit platforms (CONFIG_64BIT=y):
 		Numascale NumaChip
 		ScaleMP vSMP
 		SGI Ultraviolet
 		Merrifield/Moorefield MID devices
+		Goldfish (mostly Android emulator)

 	  If you have one of these systems, or if you want to build a
 	  generic distribution kernel, say Y here - otherwise say N.
···
 	  Say Y here if you have a Quark based system such as the Arduino
 	  compatible Intel Galileo.

+config X86_RDC321X
+	bool "RDC R-321x SoC"
+	depends on X86_32
+	depends on X86_EXTENDED_PLATFORM
+	select M486
+	select X86_REBOOTFIXUPS
+	help
+	  This option is needed for RDC R-321x system-on-chip, also known
+	  as R-8610-(G).
+	  If you don't have one of these chips, you should say N here.
+
 config X86_INTEL_LPSS
 	bool "Intel Low Power Subsystem Support"
 	depends on X86 && ACPI && PCI
···
 	  device they want to access.

 	  If you don't require the option or are in doubt, say N.
-
-config X86_RDC321X
-	bool "RDC R-321x SoC"
-	depends on X86_32
-	depends on X86_EXTENDED_PLATFORM
-	select M486
-	select X86_REBOOTFIXUPS
-	help
-	  This option is needed for RDC R-321x system-on-chip, also known
-	  as R-8610-(G).
-	  If you don't have one of these chips, you should say N here.

 config X86_SUPPORTS_MEMORY_FAILURE
 	def_bool y
···
 config ARCH_SPARSEMEM_ENABLE
 	def_bool y
-	depends on X86_64 || NUMA || X86_32
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64
···
 config COMPAT_VDSO
 	def_bool n
-	prompt "Disable the 32-bit vDSO (needed for glibc 2.3.3)"
+	prompt "Workaround for glibc 2.3.2 / 2.3.3 (released in year 2003/2004)"
 	depends on COMPAT_32
 	help
 	  Certain buggy versions of glibc will crash if they are
···
 	default y
 	depends on PCI && (ACPI || JAILHOUSE_GUEST)
 	depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
+	help
+	  Add support for accessing the PCI configuration space as a memory
+	  mapped area. It is the recommended method if the system supports
+	  this (it must have PCI Express and ACPI for it to be available).
+
+	  In the unlikely case that enabling this configuration option causes
+	  problems, the mechanism can be switched off with the 'pci=nommconf'
+	  command line parameter.
+
+	  Say N only if you are sure that your platform does not support this
+	  access method or you have problems caused by it.
+
+	  Say Y otherwise.

 config PCI_OLPC
 	def_bool y
···
 	depends on X86_64 && PCI_MMCONFIG && ACPI

 config PCI_CNB20LE_QUIRK
-	bool "Read CNB20LE Host Bridge Windows" if EXPERT
-	depends on PCI
+	bool "Read PCI host bridge windows from the CNB20LE chipset" if EXPERT
+	depends on X86_32 && PCI
 	help
 	  Read the PCI windows out of the CNB20LE host bridge. This allows
 	  PCI hotplug to work on systems with the CNB20LE chipset which do
 	  not have ACPI.
+
+	  The ServerWorks (later Broadcom) CNB20LE was a chipset designed
+	  most probably only for Pentium III.
+
+	  To find out if you have such a chipset, search for a PCI device with
+	  1166:0009 PCI IDs, for example by executing
+		lspci -nn | grep '1166:0009'
+	  The code is inactive if there is none.

 	  There's no public spec for this chipset, and this functionality
 	  is known to be incomplete.
+2
arch/x86/entry/calling.h
···
 	pushq	%rsi		/* pt_regs->si */
 	movq	8(%rsp), %rsi	/* temporarily store the return address in %rsi */
 	movq	%rdi, 8(%rsp)	/* pt_regs->di (overwriting original return address) */
+	/* We just clobbered the return address - use the IRET frame for unwinding: */
+	UNWIND_HINT_IRET_REGS offset=3*8
 .else
 	pushq	%rdi		/* pt_regs->di */
 	pushq	%rsi		/* pt_regs->si */
-4
arch/x86/include/asm/nospec-branch.h
···
  * Inline asm uses the %V modifier which is only in newer GCC
  * which is ensured when CONFIG_MITIGATION_RETPOLINE is defined.
  */
-#ifdef CONFIG_MITIGATION_RETPOLINE
 #define CALL_NOSPEC	__CS_PREFIX("%V[thunk_target]")	\
 			"call __x86_indirect_thunk_%V[thunk_target]\n"
-#else
-#define CALL_NOSPEC	"call *%[thunk_target]\n"
-#endif

 # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
+1 -1
arch/x86/kernel/cpu/microcode/amd.c
···
 	unsigned long p_addr = (unsigned long)&mc->hdr.data_code;

 	if (!verify_sha256_digest(mc->hdr.patch_id, *cur_rev, (const u8 *)p_addr, psize))
-		return -1;
+		return false;

 	native_wrmsrl(MSR_AMD64_PATCH_LOADER, p_addr);
+2 -3
arch/x86/kernel/dumpstack.c
···
 	printk("%sCall Trace:\n", log_lvl);

 	unwind_start(&state, task, regs, stack);
+	stack = stack ?: get_stack_pointer(task, regs);
 	regs = unwind_get_entry_regs(&state, &partial);

 	/*
···
 	 * - hardirq stack
 	 * - entry stack
 	 */
-	for (stack = stack ?: get_stack_pointer(task, regs);
-	     stack;
-	     stack = stack_info.next_sp) {
+	for (; stack; stack = stack_info.next_sp) {
 		const char *stack_name;

 		stack = PTR_ALIGN(stack, sizeof(long));
+44 -1
arch/x86/kernel/early_printk.c
···
 	early_serial_hw_init(divisor);
 }

-#ifdef CONFIG_PCI
 static __noendbr void mem32_serial_out(unsigned long addr, int offset, int value)
 {
 	u32 __iomem *vaddr = (u32 __iomem *)addr;
···
 }
 ANNOTATE_NOENDBR_SYM(mem32_serial_in);

+/*
+ * early_mmio_serial_init() - Initialize MMIO-based early serial console.
+ * @s: MMIO-based serial specification.
+ */
+static __init void early_mmio_serial_init(char *s)
+{
+	unsigned long baudrate;
+	unsigned long membase;
+	char *e;
+
+	if (*s == ',')
+		s++;
+
+	if (!strncmp(s, "0x", 2)) {
+		/* NB: only 32-bit addresses are supported. */
+		membase = simple_strtoul(s, &e, 16);
+		early_serial_base = (unsigned long)early_ioremap(membase, PAGE_SIZE);
+
+		static_call_update(serial_in, mem32_serial_in);
+		static_call_update(serial_out, mem32_serial_out);
+
+		s += strcspn(s, ",");
+		if (*s == ',')
+			s++;
+	}
+
+	if (!strncmp(s, "nocfg", 5)) {
+		baudrate = 0;
+	} else {
+		baudrate = simple_strtoul(s, &e, 0);
+		if (baudrate == 0 || s == e)
+			baudrate = DEFAULT_BAUD;
+	}
+
+	if (baudrate)
+		early_serial_hw_init(115200 / baudrate);
+}
+
+#ifdef CONFIG_PCI
 /*
  * early_pci_serial_init()
  *
···
 	keep = (strstr(buf, "keep") != NULL);

 	while (*buf != '\0') {
+		if (!strncmp(buf, "mmio", 4)) {
+			early_mmio_serial_init(buf + 4);
+			early_console_register(&early_serial_console, keep);
+			buf += 4;
+		}
 		if (!strncmp(buf, "serial", 6)) {
 			buf += 6;
 			early_serial_init(buf);
+1 -1
arch/x86/kernel/fpu/core.c
···
 /*
  * Used in two places:
  * 1) Early boot to setup init_fpstate for non XSAVE systems
- * 2) fpu_init_fpstate_user() which is invoked from KVM
+ * 2) fpu_alloc_guest_fpstate() which is invoked from KVM
  */
 void fpstate_init_user(struct fpstate *fpstate)
 {
+28 -24
arch/x86/mm/pat/memtype.c
···
 		return -EINVAL;
 	}

-/*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
- *
- * If the vma has a linear pfn mapping for the entire range, we get the prot
- * from pte and reserve the entire vma range with single reserve_pfn_range call.
- */
-int track_pfn_copy(struct vm_area_struct *vma)
+int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
+	const unsigned long vma_size = src_vma->vm_end - src_vma->vm_start;
 	resource_size_t paddr;
-	unsigned long vma_size = vma->vm_end - vma->vm_start;
 	pgprot_t pgprot;
+	int rc;

-	if (vma->vm_flags & VM_PAT) {
-		if (get_pat_info(vma, &paddr, &pgprot))
-			return -EINVAL;
-		/* reserve the whole chunk covered by vma. */
-		return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
-	}
+	if (!(src_vma->vm_flags & VM_PAT))
+		return 0;

+	/*
+	 * Duplicate the PAT information for the dst VMA based on the src
+	 * VMA.
+	 */
+	if (get_pat_info(src_vma, &paddr, &pgprot))
+		return -EINVAL;
+	rc = reserve_pfn_range(paddr, vma_size, &pgprot, 1);
+	if (rc)
+		return rc;
+
+	/* Reservation for the destination VMA succeeded. */
+	vm_flags_set(dst_vma, VM_PAT);
+	*pfn = PHYS_PFN(paddr);
 	return 0;
 }

+void untrack_pfn_copy(struct vm_area_struct *dst_vma, unsigned long pfn)
+{
+	untrack_pfn(dst_vma, pfn, dst_vma->vm_end - dst_vma->vm_start, true);
+	/*
+	 * Reservation was freed, any copied page tables will get cleaned
+	 * up later, but without getting PAT involved again.
+	 */
+}
+
 /*
···
 	}
 }

-/*
- * untrack_pfn_clear is called if the following situation fits:
- *
- * 1) while mremapping a pfnmap for a new region, with the old vma after
- * its pfnmap page table has been removed. The new vma has a new pfnmap
- * to the same pfn & cache type with VM_PAT set.
- * 2) while duplicating vm area, the new vma fails to copy the pgtable from
- * old vma.
- */
 void untrack_pfn_clear(struct vm_area_struct *vma)
 {
 	vm_flags_clear(vma, VM_PAT);
+22 -6
include/linux/pgtable.h
···
 }

 /*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
+ * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
+ * tables copied during copy_page_range(). On success, stores the pfn to be
+ * passed to untrack_pfn_copy().
  */
-static inline int track_pfn_copy(struct vm_area_struct *vma)
+static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
 	return 0;
 }

+/*
+ * untrack_pfn_copy is called when a VM_PFNMAP VMA failed to copy during
+ * copy_page_range(), but after track_pfn_copy() was already called.
+ */
+static inline void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn)
+{
+}
+
 /*
···
 }

 /*
- * untrack_pfn_clear is called while mremapping a pfnmap for a new region
- * or fails to copy pgtable during duplicate vm area.
+ * untrack_pfn_clear is called in the following cases on a VM_PFNMAP VMA:
+ *
+ * 1) During mremap() on the src VMA after the page tables were moved.
+ * 2) During fork() on the dst VMA, immediately after duplicating the src VMA.
  */
 static inline void untrack_pfn_clear(struct vm_area_struct *vma)
 {
···
 		     unsigned long size);
 extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 			     pfn_t pfn);
-extern int track_pfn_copy(struct vm_area_struct *vma);
+extern int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn);
+extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 			unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_clear(struct vm_area_struct *vma);
+4
kernel/fork.c
···
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);

+	/* track_pfn_copy() will later take care of copying internal state. */
+	if (unlikely(new->vm_flags & VM_PFNMAP))
+		untrack_pfn_clear(new);
+
 	return new;
 }
+4 -7
mm/memory.c
···
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	pgd_t *src_pgd, *dst_pgd;
-	unsigned long next;
 	unsigned long addr = src_vma->vm_start;
 	unsigned long end = src_vma->vm_end;
 	struct mm_struct *dst_mm = dst_vma->vm_mm;
 	struct mm_struct *src_mm = src_vma->vm_mm;
 	struct mmu_notifier_range range;
+	unsigned long next, pfn;
 	bool is_cow;
 	int ret;
···
 		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);

 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
-		/*
-		 * We do not free on error cases below as remove_vma
-		 * gets called on error from higher level routine
-		 */
-		ret = track_pfn_copy(src_vma);
+		ret = track_pfn_copy(dst_vma, src_vma, &pfn);
 		if (ret)
 			return ret;
 	}
···
 			continue;
 		if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
 					    addr, next))) {
-			untrack_pfn_clear(dst_vma);
 			ret = -ENOMEM;
 			break;
 		}
···
 		raw_write_seqcount_end(&src_mm->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
 	}
+	if (ret && unlikely(src_vma->vm_flags & VM_PFNMAP))
+		untrack_pfn_copy(dst_vma, pfn);
 	return ret;
 }