Merge tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull misc x86 fixes and updates from Ingo Molnar:

 - Fix a large number of x86 Kconfig dependency and help text accuracy
   bugs, by Mateusz Jończyk and David Heideberg

 - Fix a VM_PAT interaction with fork() crash. This also touches core
   kernel code

 - Fix an ORC unwinder bug for interrupt entries

 - Fixes and cleanups

 - Fix an AMD microcode loader bug that can promote verification
   failures into success

 - Add early-printk support for MMIO-based UARTs on an x86 board that
   had no other serial debugging facility and also experienced early
   boot crashes

* tag 'x86-urgent-2025-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode/AMD: Fix __apply_microcode_amd()'s return value
x86/mm/pat: Fix VM_PAT handling when fork() fails in copy_page_range()
x86/fpu: Update the outdated comment above fpstate_init_user()
x86/early_printk: Add support for MMIO-based UARTs
x86/dumpstack: Fix inaccurate unwinding from exception stacks due to misplaced assignment
x86/entry: Fix ORC unwinder for PUSH_REGS with save_ret=1
x86/Kconfig: Fix lists in X86_EXTENDED_PLATFORM help text
x86/Kconfig: Correct X86_X2APIC help text
x86/speculation: Remove the extra #ifdef around CALL_NOSPEC
x86/Kconfig: Document release year of glibc 2.3.3
x86/Kconfig: Make CONFIG_PCI_CNB20LE_QUIRK depend on X86_32
x86/Kconfig: Document CONFIG_PCI_MMCONFIG
x86/Kconfig: Update lists in X86_EXTENDED_PLATFORM
x86/Kconfig: Move all X86_EXTENDED_PLATFORM options together
x86/Kconfig: Always enable ARCH_SPARSEMEM_ENABLE
x86/Kconfig: Enable X86_X2APIC by default and improve help text

 12 files changed, 172 insertions(+), 75 deletions(-)
Documentation/admin-guide/kernel-parameters.txt (+8 -1)

@@ -1407,13 +1407,20 @@
 			earlyprintk=serial[,0x...[,baudrate]]
 			earlyprintk=ttySn[,baudrate]
 			earlyprintk=dbgp[debugController#]
-			earlyprintk=pciserial[,force],bus:device.function[,baudrate]
+			earlyprintk=pciserial[,force],bus:device.function[,{nocfg|baudrate}]
 			earlyprintk=xdbc[xhciController#]
 			earlyprintk=bios
+			earlyprintk=mmio,membase[,{nocfg|baudrate}]
 
 			earlyprintk is useful when the kernel crashes before
 			the normal console is initialized. It is not enabled by
 			default because it has some cosmetic problems.
+
+			Only 32-bit memory addresses are supported for "mmio"
+			and "pciserial" devices.
+
+			Use "nocfg" to skip UART configuration, assume
+			BIOS/firmware has configured UART correctly.
 
 			Append ",keep" to not disable it when the real console
 			takes over.
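For reference, a kernel command line using the new "mmio" early console could look like the following sketch; the base address 0xfedc9000 is purely illustrative, and the real value must come from the board's firmware or datasheet:

```text
earlyprintk=mmio,0xfedc9000,115200,keep
earlyprintk=mmio,0xfedc9000,nocfg
```

The first form programs the UART for 115200 baud and keeps the console after the real one takes over; the second trusts whatever configuration the BIOS/firmware already applied.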
arch/x86/Kconfig (+56 -27)

@@ -460,20 +460,28 @@
 	  If you don't know what to do here, say N.
 
 config X86_X2APIC
-	bool "Support x2apic"
+	bool "x2APIC interrupt controller architecture support"
 	depends on X86_LOCAL_APIC && X86_64 && (IRQ_REMAP || HYPERVISOR_GUEST)
+	default y
 	help
-	  This enables x2apic support on CPUs that have this feature.
+	  x2APIC is an interrupt controller architecture, a component of which
+	  (the local APIC) is present in the CPU. It allows faster access to
+	  the local APIC and supports a larger number of CPUs in the system
+	  than the predecessors.
 
-	  This allows 32-bit apic IDs (so it can support very large systems),
-	  and accesses the local apic via MSRs not via mmio.
+	  x2APIC was introduced in Intel CPUs around 2008 and in AMD EPYC CPUs
+	  in 2019, but it can be disabled by the BIOS. It is also frequently
+	  emulated in virtual machines, even when the host CPU does not support
+	  it. Support in the CPU can be checked by executing
+	  grep x2apic /proc/cpuinfo
 
-	  Some Intel systems circa 2022 and later are locked into x2APIC mode
-	  and can not fall back to the legacy APIC modes if SGX or TDX are
-	  enabled in the BIOS. They will boot with very reduced functionality
-	  without enabling this option.
+	  If this configuration option is disabled, the kernel will boot with
+	  very reduced functionality and performance on some platforms that
+	  have x2APIC enabled. On the other hand, on hardware that does not
+	  support x2APIC, a kernel with this option enabled will just fallback
+	  to older APIC implementations.
 
-	  If you don't know what to do here, say N.
+	  If in doubt, say Y.
 
 config X86_POSTED_MSI
 	bool "Enable MSI and MSI-x delivery by posted interrupts"

@@ -552,16 +544,17 @@
 	  CONFIG_64BIT.
 
 	  32-bit platforms (CONFIG_64BIT=n):
-		Goldfish (Android emulator)
-		AMD Elan
+		Goldfish (mostly Android emulator)
+		Intel CE media processor (CE4100) SoC
+		Intel Quark
 		RDC R-321x SoC
-		SGI 320/540 (Visual Workstation)
 
 	  64-bit platforms (CONFIG_64BIT=y):
 		Numascale NumaChip
 		ScaleMP vSMP
 		SGI Ultraviolet
 		Merrifield/Moorefield MID devices
+		Goldfish (mostly Android emulator)
 
 	  If you have one of these systems, or if you want to build a
 	  generic distribution kernel, say Y here - otherwise say N.

@@ -676,6 +667,17 @@
 	  Say Y here if you have a Quark based system such as the Arduino
 	  compatible Intel Galileo.
 
+config X86_RDC321X
+	bool "RDC R-321x SoC"
+	depends on X86_32
+	depends on X86_EXTENDED_PLATFORM
+	select M486
+	select X86_REBOOTFIXUPS
+	help
+	  This option is needed for RDC R-321x system-on-chip, also known
+	  as R-8610-(G).
+	  If you don't have one of these chips, you should say N here.
+
 config X86_INTEL_LPSS
 	bool "Intel Low Power Subsystem Support"
 	depends on X86 && ACPI && PCI

@@ -739,17 +719,6 @@
 	  device they want to access.
 
 	  If you don't require the option or are in doubt, say N.
-
-config X86_RDC321X
-	bool "RDC R-321x SoC"
-	depends on X86_32
-	depends on X86_EXTENDED_PLATFORM
-	select M486
-	select X86_REBOOTFIXUPS
-	help
-	  This option is needed for RDC R-321x system-on-chip, also known
-	  as R-8610-(G).
-	  If you don't have one of these chips, you should say N here.
 
 config X86_SUPPORTS_MEMORY_FAILURE
 	def_bool y

@@ -1574,7 +1565,6 @@
 
 config ARCH_SPARSEMEM_ENABLE
 	def_bool y
-	depends on X86_64 || NUMA || X86_32
 	select SPARSEMEM_STATIC if X86_32
 	select SPARSEMEM_VMEMMAP_ENABLE if X86_64
 

@@ -2220,7 +2212,7 @@
 
 config COMPAT_VDSO
 	def_bool n
-	prompt "Disable the 32-bit vDSO (needed for glibc 2.3.3)"
+	prompt "Workaround for glibc 2.3.2 / 2.3.3 (released in year 2003/2004)"
 	depends on COMPAT_32
 	help
 	  Certain buggy versions of glibc will crash if they are

@@ -2909,6 +2901,19 @@
 	default y
 	depends on PCI && (ACPI || JAILHOUSE_GUEST)
 	depends on X86_64 || (PCI_GOANY || PCI_GOMMCONFIG)
+	help
+	  Add support for accessing the PCI configuration space as a memory
+	  mapped area. It is the recommended method if the system supports
+	  this (it must have PCI Express and ACPI for it to be available).
+
+	  In the unlikely case that enabling this configuration option causes
+	  problems, the mechanism can be switched off with the 'pci=nommconf'
+	  command line parameter.
+
+	  Say N only if you are sure that your platform does not support this
+	  access method or you have problems caused by it.
+
+	  Say Y otherwise.
 
 config PCI_OLPC
 	def_bool y

@@ -2936,12 +2915,20 @@
 	depends on X86_64 && PCI_MMCONFIG && ACPI
 
 config PCI_CNB20LE_QUIRK
-	bool "Read CNB20LE Host Bridge Windows" if EXPERT
-	depends on PCI
+	bool "Read PCI host bridge windows from the CNB20LE chipset" if EXPERT
+	depends on X86_32 && PCI
 	help
 	  Read the PCI windows out of the CNB20LE host bridge. This allows
 	  PCI hotplug to work on systems with the CNB20LE chipset which do
 	  not have ACPI.
+
+	  The ServerWorks (later Broadcom) CNB20LE was a chipset designed
+	  most probably only for Pentium III.
+
+	  To find out if you have such a chipset, search for a PCI device with
+	  1166:0009 PCI IDs, for example by executing
+	  lspci -nn | grep '1166:0009'
+	  The code is inactive if there is none.
 
 	  There's no public spec for this chipset, and this functionality
 	  is known to be incomplete.
arch/x86/entry/calling.h (+2)

@@ -70,6 +70,8 @@
 	pushq	%rsi		/* pt_regs->si */
 	movq	8(%rsp), %rsi	/* temporarily store the return address in %rsi */
 	movq	%rdi, 8(%rsp)	/* pt_regs->di (overwriting original return address) */
+	/* We just clobbered the return address - use the IRET frame for unwinding: */
+	UNWIND_HINT_IRET_REGS offset=3*8
 	.else
 	pushq	%rdi		/* pt_regs->di */
 	pushq	%rsi		/* pt_regs->si */
arch/x86/include/asm/nospec-branch.h (-4)

@@ -435,12 +435,8 @@
  * Inline asm uses the %V modifier which is only in newer GCC
  * which is ensured when CONFIG_MITIGATION_RETPOLINE is defined.
  */
-#ifdef CONFIG_MITIGATION_RETPOLINE
 #define CALL_NOSPEC	__CS_PREFIX("%V[thunk_target]")	\
 			"call __x86_indirect_thunk_%V[thunk_target]\n"
-#else
-#define CALL_NOSPEC	"call *%[thunk_target]\n"
-#endif
 
 # define THUNK_TARGET(addr) [thunk_target] "r" (addr)
 
arch/x86/kernel/cpu/microcode/amd.c (+1 -1)

@@ -600,7 +600,7 @@
 	unsigned long p_addr = (unsigned long)&mc->hdr.data_code;
 
 	if (!verify_sha256_digest(mc->hdr.patch_id, *cur_rev, (const u8 *)p_addr, psize))
-		return -1;
+		return false;
 
 	native_wrmsrl(MSR_AMD64_PATCH_LOADER, p_addr);
 
arch/x86/kernel/dumpstack.c (+2 -3)

@@ -195,6 +195,7 @@
 		printk("%sCall Trace:\n", log_lvl);
 
 	unwind_start(&state, task, regs, stack);
+	stack = stack ?: get_stack_pointer(task, regs);
 	regs = unwind_get_entry_regs(&state, &partial);
 
 	/*

@@ -214,9 +213,7 @@
 	 *  - hardirq stack
 	 *  - entry stack
 	 */
-	for (stack = stack ?: get_stack_pointer(task, regs);
-	     stack;
-	     stack = stack_info.next_sp) {
+	for (; stack; stack = stack_info.next_sp) {
 		const char *stack_name;
 
 		stack = PTR_ALIGN(stack, sizeof(long));
arch/x86/kernel/early_printk.c (+44 -1)

@@ -190,7 +190,6 @@
 	early_serial_hw_init(divisor);
 }
 
-#ifdef CONFIG_PCI
 static __noendbr void mem32_serial_out(unsigned long addr, int offset, int value)
 {
 	u32 __iomem *vaddr = (u32 __iomem *)addr;

@@ -206,6 +207,45 @@
 }
 ANNOTATE_NOENDBR_SYM(mem32_serial_in);
 
+/*
+ * early_mmio_serial_init() - Initialize MMIO-based early serial console.
+ * @s: MMIO-based serial specification.
+ */
+static __init void early_mmio_serial_init(char *s)
+{
+	unsigned long baudrate;
+	unsigned long membase;
+	char *e;
+
+	if (*s == ',')
+		s++;
+
+	if (!strncmp(s, "0x", 2)) {
+		/* NB: only 32-bit addresses are supported. */
+		membase = simple_strtoul(s, &e, 16);
+		early_serial_base = (unsigned long)early_ioremap(membase, PAGE_SIZE);
+
+		static_call_update(serial_in, mem32_serial_in);
+		static_call_update(serial_out, mem32_serial_out);
+
+		s += strcspn(s, ",");
+		if (*s == ',')
+			s++;
+	}
+
+	if (!strncmp(s, "nocfg", 5)) {
+		baudrate = 0;
+	} else {
+		baudrate = simple_strtoul(s, &e, 0);
+		if (baudrate == 0 || s == e)
+			baudrate = DEFAULT_BAUD;
+	}
+
+	if (baudrate)
+		early_serial_hw_init(115200 / baudrate);
+}
+
+#ifdef CONFIG_PCI
 /*
  * early_pci_serial_init()
  *

@@ -389,6 +351,11 @@
 	keep = (strstr(buf, "keep") != NULL);
 
 	while (*buf != '\0') {
+		if (!strncmp(buf, "mmio", 4)) {
+			early_mmio_serial_init(buf + 4);
+			early_console_register(&early_serial_console, keep);
+			buf += 4;
+		}
 		if (!strncmp(buf, "serial", 6)) {
 			buf += 6;
 			early_serial_init(buf);
arch/x86/kernel/fpu/core.c (+1 -1)

@@ -508,7 +508,7 @@
 /*
  * Used in two places:
  * 1) Early boot to setup init_fpstate for non XSAVE systems
- * 2) fpu_init_fpstate_user() which is invoked from KVM
+ * 2) fpu_alloc_guest_fpstate() which is invoked from KVM
  */
 void fpstate_init_user(struct fpstate *fpstate)
 {
arch/x86/mm/pat/memtype.c (+28 -24)

@@ -984,27 +984,40 @@
 		return -EINVAL;
 	}
 
-/*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
- *
- * If the vma has a linear pfn mapping for the entire range, we get the prot
- * from pte and reserve the entire vma range with single reserve_pfn_range call.
- */
-int track_pfn_copy(struct vm_area_struct *vma)
+int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
+	const unsigned long vma_size = src_vma->vm_end - src_vma->vm_start;
 	resource_size_t paddr;
-	unsigned long vma_size = vma->vm_end - vma->vm_start;
 	pgprot_t pgprot;
+	int rc;
 
-	if (vma->vm_flags & VM_PAT) {
-		if (get_pat_info(vma, &paddr, &pgprot))
-			return -EINVAL;
-		/* reserve the whole chunk covered by vma. */
-		return reserve_pfn_range(paddr, vma_size, &pgprot, 1);
-	}
+	if (!(src_vma->vm_flags & VM_PAT))
+		return 0;
 
+	/*
+	 * Duplicate the PAT information for the dst VMA based on the src
+	 * VMA.
+	 */
+	if (get_pat_info(src_vma, &paddr, &pgprot))
+		return -EINVAL;
+	rc = reserve_pfn_range(paddr, vma_size, &pgprot, 1);
+	if (rc)
+		return rc;
+
+	/* Reservation for the destination VMA succeeded. */
+	vm_flags_set(dst_vma, VM_PAT);
+	*pfn = PHYS_PFN(paddr);
 	return 0;
+}
+
+void untrack_pfn_copy(struct vm_area_struct *dst_vma, unsigned long pfn)
+{
+	untrack_pfn(dst_vma, pfn, dst_vma->vm_end - dst_vma->vm_start, true);
+	/*
+	 * Reservation was freed, any copied page tables will get cleaned
+	 * up later, but without getting PAT involved again.
+	 */
 }
 
 /*

@@ -1108,15 +1095,6 @@
 	}
 }
 
-/*
- * untrack_pfn_clear is called if the following situation fits:
- *
- * 1) while mremapping a pfnmap for a new region, with the old vma after
- *    its pfnmap page table has been removed. The new vma has a new pfnmap
- *    to the same pfn & cache type with VM_PAT set.
- * 2) while duplicating vm area, the new vma fails to copy the pgtable from
- *    old vma.
- */
 void untrack_pfn_clear(struct vm_area_struct *vma)
 {
 	vm_flags_clear(vma, VM_PAT);
include/linux/pgtable.h (+22 -6)

@@ -1508,12 +1508,23 @@
 }
 
 /*
- * track_pfn_copy is called when vma that is covering the pfnmap gets
- * copied through copy_page_range().
+ * track_pfn_copy is called when a VM_PFNMAP VMA is about to get the page
+ * tables copied during copy_page_range(). On success, stores the pfn to be
+ * passed to untrack_pfn_copy().
  */
-static inline int track_pfn_copy(struct vm_area_struct *vma)
+static inline int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn)
 {
 	return 0;
+}
+
+/*
+ * untrack_pfn_copy is called when a VM_PFNMAP VMA failed to copy during
+ * copy_page_range(), but after track_pfn_copy() was already called.
+ */
+static inline void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn)
+{
 }
 
 /*

@@ -1539,8 +1528,10 @@
 }
 
 /*
- * untrack_pfn_clear is called while mremapping a pfnmap for a new region
- * or fails to copy pgtable during duplicate vm area.
+ * untrack_pfn_clear is called in the following cases on a VM_PFNMAP VMA:
+ *
+ * 1) During mremap() on the src VMA after the page tables were moved.
+ * 2) During fork() on the dst VMA, immediately after duplicating the src VMA.
  */
 static inline void untrack_pfn_clear(struct vm_area_struct *vma)
 {

@@ -1553,7 +1540,10 @@
 		unsigned long size);
 extern void track_pfn_insert(struct vm_area_struct *vma, pgprot_t *prot,
 			     pfn_t pfn);
-extern int track_pfn_copy(struct vm_area_struct *vma);
+extern int track_pfn_copy(struct vm_area_struct *dst_vma,
+		struct vm_area_struct *src_vma, unsigned long *pfn);
+extern void untrack_pfn_copy(struct vm_area_struct *dst_vma,
+		unsigned long pfn);
 extern void untrack_pfn(struct vm_area_struct *vma, unsigned long pfn,
 			unsigned long size, bool mm_wr_locked);
 extern void untrack_pfn_clear(struct vm_area_struct *vma);
kernel/fork.c (+4)

@@ -504,6 +504,10 @@
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
 
+	/* track_pfn_copy() will later take care of copying internal state. */
+	if (unlikely(new->vm_flags & VM_PFNMAP))
+		untrack_pfn_clear(new);
+
 	return new;
 }
 
mm/memory.c (+4 -7)

@@ -1362,12 +1362,12 @@
 copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 {
 	pgd_t *src_pgd, *dst_pgd;
-	unsigned long next;
 	unsigned long addr = src_vma->vm_start;
 	unsigned long end = src_vma->vm_end;
 	struct mm_struct *dst_mm = dst_vma->vm_mm;
 	struct mm_struct *src_mm = src_vma->vm_mm;
 	struct mmu_notifier_range range;
+	unsigned long next, pfn;
 	bool is_cow;
 	int ret;
 

@@ -1378,11 +1378,7 @@
 		return copy_hugetlb_page_range(dst_mm, src_mm, dst_vma, src_vma);
 
 	if (unlikely(src_vma->vm_flags & VM_PFNMAP)) {
-		/*
-		 * We do not free on error cases below as remove_vma
-		 * gets called on error from higher level routine
-		 */
-		ret = track_pfn_copy(src_vma);
+		ret = track_pfn_copy(dst_vma, src_vma, &pfn);
 		if (ret)
 			return ret;
 	}

@@ -1415,7 +1419,6 @@
 			continue;
 		if (unlikely(copy_p4d_range(dst_vma, src_vma, dst_pgd, src_pgd,
 					    addr, next))) {
-			untrack_pfn_clear(dst_vma);
 			ret = -ENOMEM;
 			break;
 		}

@@ -1424,6 +1429,8 @@
 		raw_write_seqcount_end(&src_mm->write_protect_seq);
 		mmu_notifier_invalidate_range_end(&range);
 	}
+	if (ret && unlikely(src_vma->vm_flags & VM_PFNMAP))
+		untrack_pfn_copy(dst_vma, pfn);
 	return ret;
 }
 