Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

- arm64 support for syscall emulation via PTRACE_SYSEMU{,_SINGLESTEP}

- Wire up VM_FLUSH_RESET_PERMS for arm64, allowing the core code to
manage the permissions of executable vmalloc regions more strictly

- Slight performance improvement by keeping softirqs enabled while
touching the FPSIMD/SVE state (kernel_neon_begin/end)

- Expose a couple of ARMv8.5 features to userspace (HWCAP): CondM (new
XAFLAG and AXFLAG instructions for manipulating the floating-point
comparison flags) and FRINT (rounding floating-point numbers to
integers)

- Re-instate ARM64_PSEUDO_NMI support which was previously marked as
BROKEN due to some bugs (now fixed)

- Improve parking of stopped CPUs and implement an arm64-specific
panic_smp_self_stop() to avoid a warning when secondary CPUs cannot be
stopped during panic

- perf: enable the ARM Statistical Profiling Extensions (SPE) on ACPI
platforms

- perf: DDR performance monitor support for iMX8QXP

- cache_line_size() can now be overridden from DT or ACPI/PPTT, to cope
with system caches whose geometry is not exposed via the CPUID registers

- Avoid warning on hardware cache line size greater than
ARCH_DMA_MINALIGN if the system is fully coherent

- arm64 do_page_fault() and hugetlb cleanups

- Refactor set_pte_at() to avoid redundant READ_ONCE(*ptep)

- Accept ACPI 5.1 FADTs that are mis-reported as 5.0 (revision 5.1 is
inferred from the 'arm_boot_flags' field introduced in 5.1)

- CONFIG_RANDOMIZE_BASE now enabled in defconfig

- Allow the selection of ARM64_MODULE_PLTS, currently only done via
RANDOMIZE_BASE (and an erratum workaround), allowing modules to spill
over into the vmalloc area

- Make ZONE_DMA32 configurable

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (54 commits)
perf: arm_spe: Enable ACPI/Platform automatic module loading
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
x86/entry: Simplify _TIF_SYSCALL_EMU handling
arm64: rename dump_instr as dump_kernel_instr
arm64/mm: Drop [PTE|PMD]_TYPE_FAULT
arm64: Implement panic_smp_self_stop()
arm64: Improve parking of stopped CPUs
arm64: Expose FRINT capabilities to userspace
arm64: Expose ARMv8.5 CondM capability to userspace
arm64: defconfig: enable CONFIG_RANDOMIZE_BASE
arm64: ARM64_MODULES_PLTS must depend on MODULES
arm64: bpf: do not allocate executable memory
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
arm64/mm: wire up CONFIG_ARCH_HAS_SET_DIRECT_MAP
arm64: module: create module allocations without exec permissions
arm64: Allow user selection of ARM64_MODULE_PLTS
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
arm64: Allow selecting Pseudo-NMI again
...

Diffstat: +1321 -315
Documentation/arm64/elf_hwcaps.txt | +8

      Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0001.

+ HWCAP2_FLAGM2
+
+     Functionality implied by ID_AA64ISAR0_EL1.TS == 0b0010.
+
  HWCAP_SSBS

      Functionality implied by ID_AA64PFR1_EL1.SSBS == 0b0010.
  ...
      Functionality implied by ID_AA64ISAR1_EL1.GPA == 0b0001 or
      ID_AA64ISAR1_EL1.GPI == 0b0001, as described by
      Documentation/arm64/pointer-authentication.txt.
+
+ HWCAP2_FRINT
+
+     Functionality implied by ID_AA64ISAR1_EL1.FRINTTS == 0b0001.


  4. Unused AT_HWCAP bits
Documentation/devicetree/bindings/perf/fsl-imx-ddr.txt | +21 (new file)

+ * Freescale(NXP) IMX8 DDR performance monitor
+
+ Required properties:
+
+ - compatible: should be one of:
+     "fsl,imx8-ddr-pmu"
+     "fsl,imx8m-ddr-pmu"
+
+ - reg: physical address and size
+
+ - interrupts: single interrupt
+     generated by the control block
+
+ Example:
+
+ ddr-pmu@5c020000 {
+     compatible = "fsl,imx8-ddr-pmu";
+     reg = <0x5c020000 0x10000>;
+     interrupt-parent = <&gic>;
+     interrupts = <GIC_SPI 131 IRQ_TYPE_LEVEL_HIGH>;
+ };
MAINTAINERS | +7

  S:  Maintained
  F:  drivers/i2c/busses/i2c-cpm.c

+ FREESCALE IMX DDR PMU DRIVER
+ M:  Frank Li <Frank.li@nxp.com>
+ L:  linux-arm-kernel@lists.infradead.org
+ S:  Maintained
+ F:  drivers/perf/fsl_imx8_ddr_perf.c
+ F:  Documentation/devicetree/bindings/perf/fsl-imx-ddr.txt
+
  FREESCALE IMX LPI2C DRIVER
  M:  Dong Aisheng <aisheng.dong@nxp.com>
  L:  linux-i2c@vger.kernel.org
arch/arm64/Kconfig | +31 -4

      select ARCH_HAS_MEMBARRIER_SYNC_CORE
      select ARCH_HAS_PTE_SPECIAL
      select ARCH_HAS_SETUP_DMA_OPS
+     select ARCH_HAS_SET_DIRECT_MAP
      select ARCH_HAS_SET_MEMORY
      select ARCH_HAS_STRICT_KERNEL_RWX
      select ARCH_HAS_STRICT_MODULE_RWX
  ...
      def_bool y

  config ZONE_DMA32
-     def_bool y
+     bool "Support DMA32 zone" if EXPERT
+     default y

  config HAVE_GENERIC_GUP
      def_bool y
  ...
  config PARAVIRT_TIME_ACCOUNTING
      bool "Paravirtual steal time accounting"
      select PARAVIRT
-     default n
      help
        Select this option to enable fine granularity task steal time
        accounting. Time spent executing other tasks in parallel with
  ...
        KVM in the same kernel image.

  config ARM64_MODULE_PLTS
-     bool
+     bool "Use PLTs to allow module memory to spill over into vmalloc area"
+     depends on MODULES
      select HAVE_MOD_ARCH_SPECIFIC
+     help
+       Allocate PLTs when loading modules so that jumps and calls whose
+       targets are too far away for their relative offsets to be encoded
+       in the instructions themselves can be bounced via veneers in the
+       module's PLT. This allows modules to be allocated in the generic
+       vmalloc area after the dedicated module memory area has been
+       exhausted.
+
+       When running with address space randomization (KASLR), the module
+       region itself may be too far away for ordinary relative jumps and
+       calls, and so in that case, module PLTs are required and cannot be
+       disabled.
+
+       Specific errata workaround(s) might also force module PLTs to be
+       enabled (ARM64_ERRATUM_843419).

  config ARM64_PSEUDO_NMI
      bool "Support for NMI-like interrupts"
-     depends on BROKEN # 1556553607-46531-1-git-send-email-julien.thierry@arm.com
      select CONFIG_ARM_GIC_V3
      help
        Adds support for mimicking Non-Maskable Interrupts through the use of
  ...
        "irqchip.gicv3_pseudo_nmi" to 1.

        If unsure, say N
+
+ if ARM64_PSEUDO_NMI
+ config ARM64_DEBUG_PRIORITY_MASKING
+     bool "Debug interrupt priority masking"
+     help
+       This adds runtime checks to functions enabling/disabling
+       interrupts when using priority masking. The additional checks verify
+       the validity of ICC_PMR_EL1 when calling concerned functions.
+
+       If unsure, say N
+ endif

  config RELOCATABLE
      bool
arch/arm64/configs/defconfig | +1

  CONFIG_CRASH_DUMP=y
  CONFIG_XEN=y
  CONFIG_COMPAT=y
+ CONFIG_RANDOMIZE_BASE=y
  CONFIG_HIBERNATION=y
  CONFIG_WQ_POWER_EFFICIENT_DEFAULT=y
  CONFIG_ARM_CPUIDLE=y
arch/arm64/include/asm/acpi.h | +3

  (!(entry) || (entry)->header.length < ACPI_MADT_GICC_MIN_LENGTH || \
   (unsigned long)(entry) + (entry)->header.length > (end))

+ #define ACPI_MADT_GICC_SPE (ACPI_OFFSET(struct acpi_madt_generic_interrupt, \
+                             spe_interrupt) + sizeof(u16))
+
  /* Basic configuration for ACPI */
  #ifdef CONFIG_ACPI
  pgprot_t __acpi_get_mem_attribute(phys_addr_t addr);
arch/arm64/include/asm/arch_gicv3.h | +3 -1

  static inline void gic_pmr_mask_irqs(void)
  {
-     BUILD_BUG_ON(GICD_INT_DEF_PRI <= GIC_PRIO_IRQOFF);
+     BUILD_BUG_ON(GICD_INT_DEF_PRI < (GIC_PRIO_IRQOFF |
+                                      GIC_PRIO_PSR_I_SET));
+     BUILD_BUG_ON(GICD_INT_DEF_PRI >= GIC_PRIO_IRQON);
      gic_write_pmr(GIC_PRIO_IRQOFF);
  }
arch/arm64/include/asm/cache.h | +4 -1

  #define __read_mostly __attribute__((__section__(".data..read_mostly")))

- static inline int cache_line_size(void)
+ static inline int cache_line_size_of_cpu(void)
  {
      u32 cwg = cache_type_cwg();
+
      return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
  }
+
+ int cache_line_size(void);

  /*
   * Read the effective value of CTR_EL0.
arch/arm64/include/asm/cacheflush.h | +3

  int set_memory_valid(unsigned long addr, int numpages, int enable);

+ int set_direct_map_invalid_noflush(struct page *page);
+ int set_direct_map_default_noflush(struct page *page);
+
  #endif
arch/arm64/include/asm/cpufeature.h | +6

      cpus_have_const_cap(ARM64_HAS_IRQ_PRIO_MASKING);
  }

+ static inline bool system_has_prio_mask_debugging(void)
+ {
+     return IS_ENABLED(CONFIG_ARM64_DEBUG_PRIORITY_MASKING) &&
+            system_uses_irq_prio_masking();
+ }
+
  #define ARM64_SSBD_UNKNOWN      -1
  #define ARM64_SSBD_FORCE_DISABLE    0
  #define ARM64_SSBD_KERNEL       1
arch/arm64/include/asm/daifflags.h | +50 -29

  #include <linux/irqflags.h>

+ #include <asm/arch_gicv3.h>
  #include <asm/cpufeature.h>

  #define DAIF_PROCCTX        0
  ...
  /* mask/save/unmask/restore all exceptions, including interrupts. */
  static inline void local_daif_mask(void)
  {
+     WARN_ON(system_has_prio_mask_debugging() &&
+             (read_sysreg_s(SYS_ICC_PMR_EL1) == (GIC_PRIO_IRQOFF |
+                                                 GIC_PRIO_PSR_I_SET)));
+
      asm volatile(
          "msr    daifset, #0xf       // local_daif_mask\n"
          :
          :
          : "memory");
+
+     /* Don't really care for a dsb here, we don't intend to enable IRQs */
+     if (system_uses_irq_prio_masking())
+         gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);
+
      trace_hardirqs_off();
  }
  ...
      if (system_uses_irq_prio_masking()) {
          /* If IRQs are masked with PMR, reflect it in the flags */
-         if (read_sysreg_s(SYS_ICC_PMR_EL1) <= GIC_PRIO_IRQOFF)
+         if (read_sysreg_s(SYS_ICC_PMR_EL1) != GIC_PRIO_IRQON)
              flags |= PSR_I_BIT;
      }
  ...
  {
      bool irq_disabled = flags & PSR_I_BIT;

+     WARN_ON(system_has_prio_mask_debugging() &&
+             !(read_sysreg(daif) & PSR_I_BIT));
+
      if (!irq_disabled) {
          trace_hardirqs_on();

-         if (system_uses_irq_prio_masking())
-             arch_local_irq_enable();
-     } else if (!(flags & PSR_A_BIT)) {
-         /*
-          * If interrupts are disabled but we can take
-          * asynchronous errors, we can take NMIs
-          */
          if (system_uses_irq_prio_masking()) {
-             flags &= ~PSR_I_BIT;
-             /*
-              * There has been concern that the write to daif
-              * might be reordered before this write to PMR.
-              * From the ARM ARM DDI 0487D.a, section D1.7.1
-              * "Accessing PSTATE fields":
-              *   Writes to the PSTATE fields have side-effects on
-              *   various aspects of the PE operation. All of these
-              *   side-effects are guaranteed:
-              *   - Not to be visible to earlier instructions in
-              *     the execution stream.
-              *   - To be visible to later instructions in the
-              *     execution stream
-              *
-              * Also, writes to PMR are self-synchronizing, so no
-              * interrupts with a lower priority than PMR is signaled
-              * to the PE after the write.
-              *
-              * So we don't need additional synchronization here.
-              */
-             arch_local_irq_disable();
+             gic_write_pmr(GIC_PRIO_IRQON);
+             dsb(sy);
          }
+     } else if (system_uses_irq_prio_masking()) {
+         u64 pmr;
+
+         if (!(flags & PSR_A_BIT)) {
+             /*
+              * If interrupts are disabled but we can take
+              * asynchronous errors, we can take NMIs
+              */
+             flags &= ~PSR_I_BIT;
+             pmr = GIC_PRIO_IRQOFF;
+         } else {
+             pmr = GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET;
+         }
+
+         /*
+          * There has been concern that the write to daif
+          * might be reordered before this write to PMR.
+          * From the ARM ARM DDI 0487D.a, section D1.7.1
+          * "Accessing PSTATE fields":
+          *   Writes to the PSTATE fields have side-effects on
+          *   various aspects of the PE operation. All of these
+          *   side-effects are guaranteed:
+          *   - Not to be visible to earlier instructions in
+          *     the execution stream.
+          *   - To be visible to later instructions in the
+          *     execution stream
+          *
+          * Also, writes to PMR are self-synchronizing, so no
+          * interrupts with a lower priority than PMR is signaled
+          * to the PE after the write.
+          *
+          * So we don't need additional synchronization here.
+          */
+         gic_write_pmr(pmr);
      }

      write_sysreg(flags, daif);
arch/arm64/include/asm/fpsimd.h | +1 -4

  extern void fpsimd_save_state(struct user_fpsimd_state *state);
  extern void fpsimd_load_state(struct user_fpsimd_state *state);

- extern void fpsimd_save(void);
-
  extern void fpsimd_thread_switch(struct task_struct *next);
  extern void fpsimd_flush_thread(void);
  ...
                     void *sve_state, unsigned int sve_vl);

  extern void fpsimd_flush_task_state(struct task_struct *target);
- extern void fpsimd_flush_cpu_state(void);
- extern void sve_flush_cpu_state(void);
+ extern void fpsimd_save_and_flush_cpu_state(void);

  /* Maximum VL that SVE VL-agnostic software can transparently support */
  #define SVE_VL_ARCH_MAX 0x100
arch/arm64/include/asm/hwcap.h | +2

  #define KERNEL_HWCAP_SVEBITPERM __khwcap2_feature(SVEBITPERM)
  #define KERNEL_HWCAP_SVESHA3    __khwcap2_feature(SVESHA3)
  #define KERNEL_HWCAP_SVESM4     __khwcap2_feature(SVESM4)
+ #define KERNEL_HWCAP_FLAGM2     __khwcap2_feature(FLAGM2)
+ #define KERNEL_HWCAP_FRINT      __khwcap2_feature(FRINT)

  /*
   * This yields a mask that user programs can use to figure out what
arch/arm64/include/asm/irqflags.h | +39 -40

   */
  static inline void arch_local_irq_enable(void)
  {
+     if (system_has_prio_mask_debugging()) {
+         u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
+
+         WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr != GIC_PRIO_IRQOFF);
+     }
+
      asm volatile(ALTERNATIVE(
          "msr    daifclr, #2     // arch_local_irq_enable\n"
          "nop",
  ...
  static inline void arch_local_irq_disable(void)
  {
+     if (system_has_prio_mask_debugging()) {
+         u32 pmr = read_sysreg_s(SYS_ICC_PMR_EL1);
+
+         WARN_ON_ONCE(pmr != GIC_PRIO_IRQON && pmr != GIC_PRIO_IRQOFF);
+     }
+
      asm volatile(ALTERNATIVE(
          "msr    daifset, #2     // arch_local_irq_disable",
          __msr_s(SYS_ICC_PMR_EL1, "%0"),
  ...
   */
  static inline unsigned long arch_local_save_flags(void)
  {
-     unsigned long daif_bits;
      unsigned long flags;

-     daif_bits = read_sysreg(daif);
-
-     /*
-      * The asm is logically equivalent to:
-      *
-      * if (system_uses_irq_prio_masking())
-      *     flags = (daif_bits & PSR_I_BIT) ?
-      *         GIC_PRIO_IRQOFF :
-      *         read_sysreg_s(SYS_ICC_PMR_EL1);
-      * else
-      *     flags = daif_bits;
-      */
      asm volatile(ALTERNATIVE(
-         "mov    %0, %1\n"
-         "nop\n"
-         "nop",
-         __mrs_s("%0", SYS_ICC_PMR_EL1)
-         "ands   %1, %1, " __stringify(PSR_I_BIT) "\n"
-         "csel   %0, %0, %2, eq",
-         ARM64_HAS_IRQ_PRIO_MASKING)
-         : "=&r" (flags), "+r" (daif_bits)
-         : "r" ((unsigned long) GIC_PRIO_IRQOFF)
+         "mrs    %0, daif",
+         __mrs_s("%0", SYS_ICC_PMR_EL1),
+         ARM64_HAS_IRQ_PRIO_MASKING)
+         : "=&r" (flags)
+         :
          : "memory");

      return flags;
+ }
+
+ static inline int arch_irqs_disabled_flags(unsigned long flags)
+ {
+     int res;
+
+     asm volatile(ALTERNATIVE(
+         "and    %w0, %w1, #" __stringify(PSR_I_BIT),
+         "eor    %w0, %w1, #" __stringify(GIC_PRIO_IRQON),
+         ARM64_HAS_IRQ_PRIO_MASKING)
+         : "=&r" (res)
+         : "r" ((int) flags)
+         : "memory");
+
+     return res;
  }

  static inline unsigned long arch_local_irq_save(void)
  ...
      flags = arch_local_save_flags();

-     arch_local_irq_disable();
+     /*
+      * There are too many states with IRQs disabled, just keep the current
+      * state if interrupts are already disabled/masked.
+      */
+     if (!arch_irqs_disabled_flags(flags))
+         arch_local_irq_disable();

      return flags;
  }
  ...
          __msr_s(SYS_ICC_PMR_EL1, "%0")
          "dsb sy",
          ARM64_HAS_IRQ_PRIO_MASKING)
-         : "+r" (flags)
          :
+         : "r" (flags)
          : "memory");
  }

- static inline int arch_irqs_disabled_flags(unsigned long flags)
- {
-     int res;
-
-     asm volatile(ALTERNATIVE(
-         "and    %w0, %w1, #" __stringify(PSR_I_BIT) "\n"
-         "nop",
-         "cmp    %w1, #" __stringify(GIC_PRIO_IRQOFF) "\n"
-         "cset   %w0, ls",
-         ARM64_HAS_IRQ_PRIO_MASKING)
-         : "=&r" (res)
-         : "r" ((int) flags)
-         : "memory");
-
-     return res;
- }
  #endif
  #endif
arch/arm64/include/asm/kvm_host.h | +4 -3

   * will not signal the CPU of interrupts of lower priority, and the
   * only way to get out will be via guest exceptions.
   * Naturally, we want to avoid this.
+  *
+  * local_daif_mask() already sets GIC_PRIO_PSR_I_SET, we just need a
+  * dsb to ensure the redistributor forwards EL2 IRQs to the CPU.
   */
- if (system_uses_irq_prio_masking()) {
-     gic_write_pmr(GIC_PRIO_IRQON);
+ if (system_uses_irq_prio_masking())
      dsb(sy);
- }
  }

  static inline void kvm_arm_vhe_guest_exit(void)
arch/arm64/include/asm/pgtable-hwdef.h | +1 -2

   * Level 2 descriptor (PMD).
   */
  #define PMD_TYPE_MASK       (_AT(pmdval_t, 3) << 0)
- #define PMD_TYPE_FAULT      (_AT(pmdval_t, 0) << 0)
  #define PMD_TYPE_TABLE      (_AT(pmdval_t, 3) << 0)
  #define PMD_TYPE_SECT       (_AT(pmdval_t, 1) << 0)
  #define PMD_TABLE_BIT       (_AT(pmdval_t, 1) << 1)
  ...
  /*
   * Level 3 descriptor (PTE).
   */
+ #define PTE_VALID       (_AT(pteval_t, 1) << 0)
  #define PTE_TYPE_MASK       (_AT(pteval_t, 3) << 0)
- #define PTE_TYPE_FAULT      (_AT(pteval_t, 0) << 0)
  #define PTE_TYPE_PAGE       (_AT(pteval_t, 3) << 0)
  #define PTE_TABLE_BIT       (_AT(pteval_t, 1) << 1)
  #define PTE_USER        (_AT(pteval_t, 1) << 6)     /* AP[1] */
arch/arm64/include/asm/pgtable-prot.h | -1

  /*
   * Software defined PTE bits definition.
   */
- #define PTE_VALID       (_AT(pteval_t, 1) << 0)
  #define PTE_WRITE       (PTE_DBM)        /* same as DBM (51) */
  #define PTE_DIRTY       (_AT(pteval_t, 1) << 55)
  #define PTE_SPECIAL     (_AT(pteval_t, 1) << 56)
arch/arm64/include/asm/pgtable.h | +37 -19

   *
   * PTE_DIRTY || (PTE_WRITE && !PTE_RDONLY)
   */
- static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
-                   pte_t *ptep, pte_t pte)
+
+ static inline void __check_racy_pte_update(struct mm_struct *mm, pte_t *ptep,
+                        pte_t pte)
  {
      pte_t old_pte;

+     if (!IS_ENABLED(CONFIG_DEBUG_VM))
+         return;
+
+     old_pte = READ_ONCE(*ptep);
+
+     if (!pte_valid(old_pte) || !pte_valid(pte))
+         return;
+     if (mm != current->active_mm && atomic_read(&mm->mm_users) <= 1)
+         return;
+
+     /*
+      * Check for potential race with hardware updates of the pte
+      * (ptep_set_access_flags safely changes valid ptes without going
+      * through an invalid entry).
+      */
+     VM_WARN_ONCE(!pte_young(pte),
+              "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
+              __func__, pte_val(old_pte), pte_val(pte));
+     VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte),
+              "%s: racy dirty state clearing: 0x%016llx -> 0x%016llx",
+              __func__, pte_val(old_pte), pte_val(pte));
+ }
+
+ static inline void set_pte_at(struct mm_struct *mm, unsigned long addr,
+                   pte_t *ptep, pte_t pte)
+ {
      if (pte_present(pte) && pte_user_exec(pte) && !pte_special(pte))
          __sync_icache_dcache(pte);

-     /*
-      * If the existing pte is valid, check for potential race with
-      * hardware updates of the pte (ptep_set_access_flags safely changes
-      * valid ptes without going through an invalid entry).
-      */
-     old_pte = READ_ONCE(*ptep);
-     if (IS_ENABLED(CONFIG_DEBUG_VM) && pte_valid(old_pte) && pte_valid(pte) &&
-         (mm == current->active_mm || atomic_read(&mm->mm_users) > 1)) {
-         VM_WARN_ONCE(!pte_young(pte),
-                  "%s: racy access flag clearing: 0x%016llx -> 0x%016llx",
-                  __func__, pte_val(old_pte), pte_val(pte));
-         VM_WARN_ONCE(pte_write(old_pte) && !pte_dirty(pte),
-                  "%s: racy dirty state clearing: 0x%016llx -> 0x%016llx",
-                  __func__, pte_val(old_pte), pte_val(pte));
-     }
+     __check_racy_pte_update(mm, ptep, pte);

      set_pte(ptep, pte);
  }
  ...
      return __pmd(pte_val(pte));
  }

- static inline pgprot_t mk_sect_prot(pgprot_t prot)
+ static inline pgprot_t mk_pud_sect_prot(pgprot_t prot)
  {
-     return __pgprot(pgprot_val(prot) & ~PTE_TABLE_BIT);
+     return __pgprot((pgprot_val(prot) & ~PUD_TABLE_BIT) | PUD_TYPE_SECT);
+ }
+
+ static inline pgprot_t mk_pmd_sect_prot(pgprot_t prot)
+ {
+     return __pgprot((pgprot_val(prot) & ~PMD_TABLE_BIT) | PMD_TYPE_SECT);
  }

  #ifdef CONFIG_NUMA_BALANCING
arch/arm64/include/asm/ptrace.h | +8 -2

   * means masking more IRQs (or at least that the same IRQs remain masked).
   *
   * To mask interrupts, we clear the most significant bit of PMR.
+  *
+  * Some code sections either automatically switch back to PSR.I or explicitly
+  * require to not use priority masking. If bit GIC_PRIO_PSR_I_SET is included
+  * in the priority mask, it indicates that PSR.I should be set and
+  * interrupt disabling temporarily does not rely on IRQ priorities.
   */
- #define GIC_PRIO_IRQON      0xf0
- #define GIC_PRIO_IRQOFF     (GIC_PRIO_IRQON & ~0x80)
+ #define GIC_PRIO_IRQON      0xc0
+ #define GIC_PRIO_IRQOFF     (GIC_PRIO_IRQON & ~0x80)
+ #define GIC_PRIO_PSR_I_SET  (1 << 4)

  /* Additional SPSR bits not exposed in the UABI */
  #define PSR_IL_BIT      (1 << 20)
arch/arm64/include/asm/simd.h | +5 -5

  #include <linux/preempt.h>
  #include <linux/types.h>

- #ifdef CONFIG_KERNEL_MODE_NEON
+ DECLARE_PER_CPU(bool, fpsimd_context_busy);

- DECLARE_PER_CPU(bool, kernel_neon_busy);
+ #ifdef CONFIG_KERNEL_MODE_NEON

  /*
   * may_use_simd - whether it is allowable at this time to issue SIMD
  ...
  static __must_check inline bool may_use_simd(void)
  {
      /*
-      * kernel_neon_busy is only set while preemption is disabled,
+      * fpsimd_context_busy is only set while preemption is disabled,
       * and is clear whenever preemption is enabled. Since
-      * this_cpu_read() is atomic w.r.t. preemption, kernel_neon_busy
+      * this_cpu_read() is atomic w.r.t. preemption, fpsimd_context_busy
       * cannot change under our feet -- if it's set we cannot be
       * migrated, and if it's clear we cannot be migrated to a CPU
       * where it is set.
       */
      return !in_irq() && !irqs_disabled() && !in_nmi() &&
-            !this_cpu_read(kernel_neon_busy);
+            !this_cpu_read(fpsimd_context_busy);
  }

  #else /* ! CONFIG_KERNEL_MODE_NEON */
arch/arm64/include/asm/sysreg.h | +1

  /* id_aa64isar1 */
  #define ID_AA64ISAR1_SB_SHIFT       36
+ #define ID_AA64ISAR1_FRINTTS_SHIFT  32
  #define ID_AA64ISAR1_GPI_SHIFT      28
  #define ID_AA64ISAR1_GPA_SHIFT      24
  #define ID_AA64ISAR1_LRCPC_SHIFT    20
arch/arm64/include/asm/thread_info.h | +4 -1

   * TIF_SYSCALL_TRACEPOINT - syscall tracepoint for ftrace
   * TIF_SYSCALL_AUDIT   - syscall auditing
   * TIF_SECCOMP         - syscall secure computing
+  * TIF_SYSCALL_EMU     - syscall emulation active
   * TIF_SIGPENDING      - signal pending
   * TIF_NEED_RESCHED    - rescheduling necessary
   * TIF_NOTIFY_RESUME   - callback before returning to user
  ...
  #define TIF_SYSCALL_AUDIT   9
  #define TIF_SYSCALL_TRACEPOINT  10
  #define TIF_SECCOMP     11
+ #define TIF_SYSCALL_EMU     12
  #define TIF_MEMDIE      18  /* is terminating due to OOM killer */
  #define TIF_FREEZE      19
  #define TIF_RESTORE_SIGMASK 20
  ...
  #define _TIF_SYSCALL_AUDIT  (1 << TIF_SYSCALL_AUDIT)
  #define _TIF_SYSCALL_TRACEPOINT (1 << TIF_SYSCALL_TRACEPOINT)
  #define _TIF_SECCOMP        (1 << TIF_SECCOMP)
+ #define _TIF_SYSCALL_EMU    (1 << TIF_SYSCALL_EMU)
  #define _TIF_UPROBE     (1 << TIF_UPROBE)
  #define _TIF_FSCHECK        (1 << TIF_FSCHECK)
  #define _TIF_32BIT      (1 << TIF_32BIT)
  ...
  #define _TIF_SYSCALL_WORK   (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_AUDIT | \
                   _TIF_SYSCALL_TRACEPOINT | _TIF_SECCOMP | \
-                  _TIF_NOHZ)
+                  _TIF_NOHZ | _TIF_SYSCALL_EMU)

  #define INIT_THREAD_INFO(tsk)                       \
  {                                                   \
arch/arm64/include/uapi/asm/hwcap.h | +2

  #define HWCAP2_SVEBITPERM   (1 << 4)
  #define HWCAP2_SVESHA3      (1 << 5)
  #define HWCAP2_SVESM4       (1 << 6)
+ #define HWCAP2_FLAGM2       (1 << 7)
+ #define HWCAP2_FRINT        (1 << 8)

  #endif /* _UAPI__ASM_HWCAP_H */
arch/arm64/include/uapi/asm/ptrace.h | +3

  #define PSR_x       0x0000ff00  /* Extension */
  #define PSR_c       0x000000ff  /* Control */

+ /* syscall emulation path in ptrace */
+ #define PTRACE_SYSEMU       31
+ #define PTRACE_SYSEMU_SINGLESTEP 32

  #ifndef __ASSEMBLY__
arch/arm64/kernel/acpi.c | +7 -3

   */
  if (table->revision < 5 ||
      (table->revision == 5 && fadt->minor_revision < 1)) {
-     pr_err("Unsupported FADT revision %d.%d, should be 5.1+\n",
+     pr_err(FW_BUG "Unsupported FADT revision %d.%d, should be 5.1+\n",
             table->revision, fadt->minor_revision);
-     ret = -EINVAL;
-     goto out;
+
+     if (!fadt->arm_boot_flags) {
+         ret = -EINVAL;
+         goto out;
+     }
+     pr_err("FADT has ARM boot flags set, assuming 5.1\n");
  }

  if (!(fadt->flags & ACPI_FADT_HW_REDUCED)) {
arch/arm64/kernel/cacheinfo.c | +9

  #define CLIDR_CTYPE(clidr, level)   \
      (((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))

+ int cache_line_size(void)
+ {
+     if (coherency_max_size != 0)
+         return coherency_max_size;
+
+     return cache_line_size_of_cpu();
+ }
+ EXPORT_SYMBOL_GPL(cache_line_size);
+
  static inline enum cache_type get_cache_type(int level)
  {
      u64 clidr;
arch/arm64/kernel/cpufeature.c | +5 -3

  static void cpu_enable_ssbs(const struct arm64_cpu_capabilities *__unused)
  {
      static bool undef_hook_registered = false;
-     static DEFINE_SPINLOCK(hook_lock);
+     static DEFINE_RAW_SPINLOCK(hook_lock);

-     spin_lock(&hook_lock);
+     raw_spin_lock(&hook_lock);
      if (!undef_hook_registered) {
          register_undef_hook(&ssbs_emulation_hook);
          undef_hook_registered = true;
      }
-     spin_unlock(&hook_lock);
+     raw_spin_unlock(&hook_lock);

      if (arm64_get_ssbd_state() == ARM64_SSBD_FORCE_DISABLE) {
          sysreg_clear_set(sctlr_el1, 0, SCTLR_ELx_DSSBS);
  ...
      HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDDP),
      HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_ASIMDFHM),
      HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_TS_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FLAGM),
+     HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_TS_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_FLAGM2),
      HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, KERNEL_HWCAP_FP),
      HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FPHP),
      HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 0, CAP_HWCAP, KERNEL_HWCAP_ASIMD),
  ...
      HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_FCMA_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FCMA),
      HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_LRCPC_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_LRCPC),
      HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_LRCPC_SHIFT, FTR_UNSIGNED, 2, CAP_HWCAP, KERNEL_HWCAP_ILRCPC),
+     HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_FRINTTS_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_FRINT),
      HWCAP_CAP(SYS_ID_AA64ISAR1_EL1, ID_AA64ISAR1_SB_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_SB),
      HWCAP_CAP(SYS_ID_AA64MMFR2_EL1, ID_AA64MMFR2_AT_SHIFT, FTR_UNSIGNED, 1, CAP_HWCAP, KERNEL_HWCAP_USCAT),
  #ifdef CONFIG_ARM64_SVE
arch/arm64/kernel/cpuinfo.c | +2

      "svebitperm",
      "svesha3",
      "svesm4",
+     "flagm2",
+     "frint",
      NULL
  };
arch/arm64/kernel/entry.S | +68 -18

  /*
   * Registers that may be useful after this macro is invoked:
   *
+  * x20 - ICC_PMR_EL1
   * x21 - aborted SP
   * x22 - aborted PC
   * x23 - aborted PSTATE
  ...
      irq_stack_entry
      blr x1
      irq_stack_exit
      .endm
+
+ #ifdef CONFIG_ARM64_PSEUDO_NMI
+     /*
+      * Set res to 0 if irqs were unmasked in interrupted context.
+      * Otherwise set res to non-0 value.
+      */
+     .macro  test_irqs_unmasked res:req, pmr:req
+ alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+     sub \res, \pmr, #GIC_PRIO_IRQON
+ alternative_else
+     mov \res, xzr
+ alternative_endif
+     .endm
+ #endif
+
+     .macro  gic_prio_kentry_setup, tmp:req
+ #ifdef CONFIG_ARM64_PSEUDO_NMI
+     alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+     mov \tmp, #(GIC_PRIO_PSR_I_SET | GIC_PRIO_IRQON)
+     msr_s   SYS_ICC_PMR_EL1, \tmp
+     alternative_else_nop_endif
+ #endif
+     .endm
+
+     .macro  gic_prio_irq_setup, pmr:req, tmp:req
+ #ifdef CONFIG_ARM64_PSEUDO_NMI
+     alternative_if ARM64_HAS_IRQ_PRIO_MASKING
+     orr \tmp, \pmr, #GIC_PRIO_PSR_I_SET
+     msr_s   SYS_ICC_PMR_EL1, \tmp
+     alternative_else_nop_endif
+ #endif
+     .endm

      .text
  ...
      cmp x24, #ESR_ELx_EC_BRK64      // if BRK64
      cinc    x24, x24, eq            // set bit '0'
      tbz x24, #0, el1_inv            // EL1 only
+     gic_prio_kentry_setup tmp=x3
      mrs x0, far_el1
      mov x2, sp              // struct pt_regs
      bl  do_debug_exception
  ...
      .align  6
  el1_irq:
      kernel_entry 1
+     gic_prio_irq_setup pmr=x20, tmp=x1
      enable_da_f
- #ifdef CONFIG_TRACE_IRQFLAGS
+
  #ifdef CONFIG_ARM64_PSEUDO_NMI
- alternative_if ARM64_HAS_IRQ_PRIO_MASKING
-     ldr x20, [sp, #S_PMR_SAVE]
- alternative_else
-     mov x20, #GIC_PRIO_IRQON
- alternative_endif
-     cmp x20, #GIC_PRIO_IRQOFF
-     /* Irqs were disabled, don't trace */
-     b.ls    1f
- #endif
-     bl  trace_hardirqs_off
+     test_irqs_unmasked  res=x0, pmr=x20
+     cbz x0, 1f
+     bl  asm_nmi_enter
  1:
+ #endif
+
+ #ifdef CONFIG_TRACE_IRQFLAGS
+     bl  trace_hardirqs_off
  #endif

      irq_handler
  ...
      bl  preempt_schedule_irq        // irq en/disable is done inside
  1:
  #endif
- #ifdef CONFIG_TRACE_IRQFLAGS
+
  #ifdef CONFIG_ARM64_PSEUDO_NMI
      /*
-      * if IRQs were disabled when we received the interrupt, we have an NMI
-      * and we are not re-enabling interrupt upon eret. Skip tracing.
+      * When using IRQ priority masking, we can get spurious interrupts while
+      * PMR is set to GIC_PRIO_IRQOFF. An NMI might also have occurred in a
+      * section with interrupts disabled. Skip tracing in those cases.
       */
-     cmp x20, #GIC_PRIO_IRQOFF
-     b.ls    1f
+     test_irqs_unmasked  res=x0, pmr=x20
+     cbz x0, 1f
+     bl  asm_nmi_exit
+ 1:
+ #endif
+
+ #ifdef CONFIG_TRACE_IRQFLAGS
+ #ifdef CONFIG_ARM64_PSEUDO_NMI
+     test_irqs_unmasked  res=x0, pmr=x20
+     cbnz    x0, 1f
  #endif
      bl  trace_hardirqs_on
  1:
  ...
   * Instruction abort handling
   */
      mrs x26, far_el1
+     gic_prio_kentry_setup tmp=x0
      enable_da_f
  #ifdef CONFIG_TRACE_IRQFLAGS
      bl  trace_hardirqs_off
  ...
   * Stack or PC alignment exception handling
   */
      mrs x26, far_el1
+     gic_prio_kentry_setup tmp=x0
      enable_da_f
  #ifdef CONFIG_TRACE_IRQFLAGS
      bl  trace_hardirqs_off
  ...
   * Debug exception handling
   */
      tbnz    x24, #0, el0_inv        // EL0 only
+     gic_prio_kentry_setup tmp=x3
      mrs x0, far_el1
      mov x1, x25
      mov x2, sp
      bl  do_debug_exception
-     enable_daif
+     enable_da_f
      ct_user_exit
      b   ret_to_user
  el0_inv:
  ...
  el0_irq:
      kernel_entry 0
  el0_irq_naked:
+     gic_prio_irq_setup pmr=x20, tmp=x0
      enable_da_f
+
  #ifdef CONFIG_TRACE_IRQFLAGS
      bl  trace_hardirqs_off
  #endif
  ...
  el1_error:
      kernel_entry 1
      mrs x1, esr_el1
+     gic_prio_kentry_setup tmp=x2
      enable_dbg
      mov x0, sp
      bl  do_serror
  ...
      kernel_entry 0
  el0_error_naked:
      mrs x1, esr_el1
+     gic_prio_kentry_setup tmp=x2
      enable_dbg
      mov x0, sp
      bl  do_serror
-     enable_daif
+     enable_da_f
      ct_user_exit
      b   ret_to_user
  ENDPROC(el0_error)
  ...
   */
  ret_to_user:
      disable_daif
+     gic_prio_kentry_setup tmp=x3
      ldr x1, [tsk, #TSK_TI_FLAGS]
      and x2, x1, #_TIF_WORK_MASK
      cbnz    x2, work_pending
  ...
      .align  6
  el0_svc:
+     gic_prio_kentry_setup tmp=x1
      mov x0, sp
      bl  el0_svc_handler
      b   ret_to_user
+96 -43
arch/arm64/kernel/fpsimd.c
··· 82 82 * To prevent this from racing with the manipulation of the task's FPSIMD state 83 83 * from task context and thereby corrupting the state, it is necessary to 84 84 * protect any manipulation of a task's fpsimd_state or TIF_FOREIGN_FPSTATE 85 - * flag with local_bh_disable() unless softirqs are already masked. 85 + * flag with {, __}get_cpu_fpsimd_context(). This will still allow softirqs to 86 + * run but prevent them to use FPSIMD. 86 87 * 87 88 * For a certain task, the sequence may look something like this: 88 89 * - the task gets scheduled in; if both the task's fpsimd_cpu field ··· 145 144 extern void __percpu *efi_sve_state; 146 145 147 146 #endif /* ! CONFIG_ARM64_SVE */ 147 + 148 + DEFINE_PER_CPU(bool, fpsimd_context_busy); 149 + EXPORT_PER_CPU_SYMBOL(fpsimd_context_busy); 150 + 151 + static void __get_cpu_fpsimd_context(void) 152 + { 153 + bool busy = __this_cpu_xchg(fpsimd_context_busy, true); 154 + 155 + WARN_ON(busy); 156 + } 157 + 158 + /* 159 + * Claim ownership of the CPU FPSIMD context for use by the calling context. 160 + * 161 + * The caller may freely manipulate the FPSIMD context metadata until 162 + * put_cpu_fpsimd_context() is called. 163 + * 164 + * The double-underscore version must only be called if you know the task 165 + * can't be preempted. 166 + */ 167 + static void get_cpu_fpsimd_context(void) 168 + { 169 + preempt_disable(); 170 + __get_cpu_fpsimd_context(); 171 + } 172 + 173 + static void __put_cpu_fpsimd_context(void) 174 + { 175 + bool busy = __this_cpu_xchg(fpsimd_context_busy, false); 176 + 177 + WARN_ON(!busy); /* No matching get_cpu_fpsimd_context()? */ 178 + } 179 + 180 + /* 181 + * Release the CPU FPSIMD context. 182 + * 183 + * Must be called from a context in which get_cpu_fpsimd_context() was 184 + * previously called, with no call to put_cpu_fpsimd_context() in the 185 + * meantime. 
186 + */ 187 + static void put_cpu_fpsimd_context(void) 188 + { 189 + __put_cpu_fpsimd_context(); 190 + preempt_enable(); 191 + } 192 + 193 + static bool have_cpu_fpsimd_context(void) 194 + { 195 + return !preemptible() && __this_cpu_read(fpsimd_context_busy); 196 + } 148 197 149 198 /* 150 199 * Call __sve_free() directly only if you know task can't be scheduled ··· 266 215 * This function should be called only when the FPSIMD/SVE state in 267 216 * thread_struct is known to be up to date, when preparing to enter 268 217 * userspace. 269 - * 270 - * Softirqs (and preemption) must be disabled. 271 218 */ 272 219 static void task_fpsimd_load(void) 273 220 { 274 - WARN_ON(!in_softirq() && !irqs_disabled()); 221 + WARN_ON(!have_cpu_fpsimd_context()); 275 222 276 223 if (system_supports_sve() && test_thread_flag(TIF_SVE)) 277 224 sve_load_state(sve_pffr(&current->thread), ··· 282 233 /* 283 234 * Ensure FPSIMD/SVE storage in memory for the loaded context is up to 284 235 * date with respect to the CPU registers. 285 - * 286 - * Softirqs (and preemption) must be disabled. 287 236 */ 288 - void fpsimd_save(void) 237 + static void fpsimd_save(void) 289 238 { 290 239 struct fpsimd_last_state_struct const *last = 291 240 this_cpu_ptr(&fpsimd_last_state); 292 241 /* set by fpsimd_bind_task_to_cpu() or fpsimd_bind_state_to_cpu() */ 293 242 294 - WARN_ON(!in_softirq() && !irqs_disabled()); 243 + WARN_ON(!have_cpu_fpsimd_context()); 295 244 296 245 if (!test_thread_flag(TIF_FOREIGN_FPSTATE)) { 297 246 if (system_supports_sve() && test_thread_flag(TIF_SVE)) { ··· 411 364 * task->thread.sve_state. 412 365 * 413 366 * Task can be a non-runnable task, or current. In the latter case, 414 - * softirqs (and preemption) must be disabled. 367 + * the caller must have ownership of the cpu FPSIMD context before calling 368 + * this function. 415 369 * task->thread.sve_state must point to at least sve_state_size(task) 416 370 * bytes of allocated kernel memory. 
417 371 * task->thread.uw.fpsimd_state must be up to date before calling this ··· 441 393 * task->thread.uw.fpsimd_state. 442 394 * 443 395 * Task can be a non-runnable task, or current. In the latter case, 444 - * softirqs (and preemption) must be disabled. 396 + * the caller must have ownership of the cpu FPSIMD context before calling 397 + * this function. 445 398 * task->thread.sve_state must point to at least sve_state_size(task) 446 399 * bytes of allocated kernel memory. 447 400 * task->thread.sve_state must be up to date before calling this function. ··· 606 557 * non-SVE thread. 607 558 */ 608 559 if (task == current) { 609 - local_bh_disable(); 560 + get_cpu_fpsimd_context(); 610 561 611 562 fpsimd_save(); 612 563 } ··· 616 567 sve_to_fpsimd(task); 617 568 618 569 if (task == current) 619 - local_bh_enable(); 570 + put_cpu_fpsimd_context(); 620 571 621 572 /* 622 573 * Force reallocation of task SVE state to the correct size ··· 929 880 930 881 sve_alloc(current); 931 882 932 - local_bh_disable(); 883 + get_cpu_fpsimd_context(); 933 884 934 885 fpsimd_save(); 935 886 ··· 940 891 if (test_and_set_thread_flag(TIF_SVE)) 941 892 WARN_ON(1); /* SVE access shouldn't have trapped */ 942 893 943 - local_bh_enable(); 894 + put_cpu_fpsimd_context(); 944 895 } 945 896 946 897 /* ··· 984 935 if (!system_supports_fpsimd()) 985 936 return; 986 937 938 + __get_cpu_fpsimd_context(); 939 + 987 940 /* Save unsaved fpsimd state, if any: */ 988 941 fpsimd_save(); 989 942 ··· 1000 949 1001 950 update_tsk_thread_flag(next, TIF_FOREIGN_FPSTATE, 1002 951 wrong_task || wrong_cpu); 952 + 953 + __put_cpu_fpsimd_context(); 1003 954 } 1004 955 1005 956 void fpsimd_flush_thread(void) ··· 1011 958 if (!system_supports_fpsimd()) 1012 959 return; 1013 960 1014 - local_bh_disable(); 961 + get_cpu_fpsimd_context(); 1015 962 1016 963 fpsimd_flush_task_state(current); 1017 964 memset(&current->thread.uw.fpsimd_state, 0, ··· 1052 999 current->thread.sve_vl_onexec = 0; 1053 1000 } 1054 1001 
1055 - local_bh_enable(); 1002 + put_cpu_fpsimd_context(); 1056 1003 } 1057 1004 1058 1005 /* ··· 1064 1011 if (!system_supports_fpsimd()) 1065 1012 return; 1066 1013 1067 - local_bh_disable(); 1014 + get_cpu_fpsimd_context(); 1068 1015 fpsimd_save(); 1069 - local_bh_enable(); 1016 + put_cpu_fpsimd_context(); 1070 1017 } 1071 1018 1072 1019 /* ··· 1083 1030 1084 1031 /* 1085 1032 * Associate current's FPSIMD context with this cpu 1086 - * Preemption must be disabled when calling this function. 1033 + * The caller must have ownership of the cpu FPSIMD context before calling 1034 + * this function. 1087 1035 */ 1088 1036 void fpsimd_bind_task_to_cpu(void) 1089 1037 { ··· 1130 1076 if (!system_supports_fpsimd()) 1131 1077 return; 1132 1078 1133 - local_bh_disable(); 1079 + get_cpu_fpsimd_context(); 1134 1080 1135 1081 if (test_and_clear_thread_flag(TIF_FOREIGN_FPSTATE)) { 1136 1082 task_fpsimd_load(); 1137 1083 fpsimd_bind_task_to_cpu(); 1138 1084 } 1139 1085 1140 - local_bh_enable(); 1086 + put_cpu_fpsimd_context(); 1141 1087 } 1142 1088 1143 1089 /* ··· 1150 1096 if (!system_supports_fpsimd()) 1151 1097 return; 1152 1098 1153 - local_bh_disable(); 1099 + get_cpu_fpsimd_context(); 1154 1100 1155 1101 current->thread.uw.fpsimd_state = *state; 1156 1102 if (system_supports_sve() && test_thread_flag(TIF_SVE)) ··· 1161 1107 1162 1108 clear_thread_flag(TIF_FOREIGN_FPSTATE); 1163 1109 1164 - local_bh_enable(); 1110 + put_cpu_fpsimd_context(); 1165 1111 } 1166 1112 1167 1113 /* ··· 1187 1133 1188 1134 /* 1189 1135 * Invalidate any task's FPSIMD state that is present on this cpu. 1190 - * This function must be called with softirqs disabled. 1136 + * The FPSIMD context should be acquired with get_cpu_fpsimd_context() 1137 + * before calling this function. 
1191 1138 */ 1192 - void fpsimd_flush_cpu_state(void) 1139 + static void fpsimd_flush_cpu_state(void) 1193 1140 { 1194 1141 __this_cpu_write(fpsimd_last_state.st, NULL); 1195 1142 set_thread_flag(TIF_FOREIGN_FPSTATE); 1196 1143 } 1197 1144 1198 - #ifdef CONFIG_KERNEL_MODE_NEON 1145 + /* 1146 + * Save the FPSIMD state to memory and invalidate cpu view. 1147 + * This function must be called with preemption disabled. 1148 + */ 1149 + void fpsimd_save_and_flush_cpu_state(void) 1150 + { 1151 + WARN_ON(preemptible()); 1152 + __get_cpu_fpsimd_context(); 1153 + fpsimd_save(); 1154 + fpsimd_flush_cpu_state(); 1155 + __put_cpu_fpsimd_context(); 1156 + } 1199 1157 1200 - DEFINE_PER_CPU(bool, kernel_neon_busy); 1201 - EXPORT_PER_CPU_SYMBOL(kernel_neon_busy); 1158 + #ifdef CONFIG_KERNEL_MODE_NEON 1202 1159 1203 1160 /* 1204 1161 * Kernel-side NEON support functions ··· 1235 1170 1236 1171 BUG_ON(!may_use_simd()); 1237 1172 1238 - local_bh_disable(); 1239 - 1240 - __this_cpu_write(kernel_neon_busy, true); 1173 + get_cpu_fpsimd_context(); 1241 1174 1242 1175 /* Save unsaved fpsimd state, if any: */ 1243 1176 fpsimd_save(); 1244 1177 1245 1178 /* Invalidate any task state remaining in the fpsimd regs: */ 1246 1179 fpsimd_flush_cpu_state(); 1247 - 1248 - preempt_disable(); 1249 - 1250 - local_bh_enable(); 1251 1180 } 1252 1181 EXPORT_SYMBOL(kernel_neon_begin); 1253 1182 ··· 1256 1197 */ 1257 1198 void kernel_neon_end(void) 1258 1199 { 1259 - bool busy; 1260 - 1261 1200 if (!system_supports_fpsimd()) 1262 1201 return; 1263 1202 1264 - busy = __this_cpu_xchg(kernel_neon_busy, false); 1265 - WARN_ON(!busy); /* No matching kernel_neon_begin()? 
*/ 1266 - 1267 - preempt_enable(); 1203 + put_cpu_fpsimd_context(); 1268 1204 } 1269 1205 EXPORT_SYMBOL(kernel_neon_end); 1270 1206 ··· 1351 1297 { 1352 1298 switch (cmd) { 1353 1299 case CPU_PM_ENTER: 1354 - fpsimd_save(); 1355 - fpsimd_flush_cpu_state(); 1300 + fpsimd_save_and_flush_cpu_state(); 1356 1301 break; 1357 1302 case CPU_PM_EXIT: 1358 1303 break;
+26
arch/arm64/kernel/irq.c
··· 16 16 #include <linux/smp.h> 17 17 #include <linux/init.h> 18 18 #include <linux/irqchip.h> 19 + #include <linux/kprobes.h> 19 20 #include <linux/seq_file.h> 20 21 #include <linux/vmalloc.h> 22 + #include <asm/daifflags.h> 21 23 #include <asm/vmap_stack.h> 22 24 23 25 unsigned long irq_err_count; ··· 66 64 irqchip_init(); 67 65 if (!handle_arch_irq) 68 66 panic("No interrupt controller found."); 67 + 68 + if (system_uses_irq_prio_masking()) { 69 + /* 70 + * Now that we have a stack for our IRQ handler, set 71 + * the PMR/PSR pair to a consistent state. 72 + */ 73 + WARN_ON(read_sysreg(daif) & PSR_A_BIT); 74 + local_daif_restore(DAIF_PROCCTX_NOIRQ); 75 + } 69 76 } 77 + 78 + /* 79 + * Stubs to make nmi_enter/exit() code callable from ASM 80 + */ 81 + asmlinkage void notrace asm_nmi_enter(void) 82 + { 83 + nmi_enter(); 84 + } 85 + NOKPROBE_SYMBOL(asm_nmi_enter); 86 + 87 + asmlinkage void notrace asm_nmi_exit(void) 88 + { 89 + nmi_exit(); 90 + } 91 + NOKPROBE_SYMBOL(asm_nmi_exit);
+2 -2
arch/arm64/kernel/module.c
··· 34 34 module_alloc_end = MODULES_END; 35 35 36 36 p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base, 37 - module_alloc_end, gfp_mask, PAGE_KERNEL_EXEC, 0, 37 + module_alloc_end, gfp_mask, PAGE_KERNEL, 0, 38 38 NUMA_NO_NODE, __builtin_return_address(0)); 39 39 40 40 if (!p && IS_ENABLED(CONFIG_ARM64_MODULE_PLTS) && ··· 50 50 */ 51 51 p = __vmalloc_node_range(size, MODULE_ALIGN, module_alloc_base, 52 52 module_alloc_base + SZ_2G, GFP_KERNEL, 53 - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, 53 + PAGE_KERNEL, 0, NUMA_NO_NODE, 54 54 __builtin_return_address(0)); 55 55 56 56 if (p && (kasan_module_alloc(p, size) < 0)) {
+3 -1
arch/arm64/kernel/probes/kprobes.c
··· 122 122 void *page; 123 123 124 124 page = vmalloc_exec(PAGE_SIZE); 125 - if (page) 125 + if (page) { 126 126 set_memory_ro((unsigned long)page, 1); 127 + set_vm_flush_reset_perms(page); 128 + } 127 129 128 130 return page; 129 131 }
+1 -1
arch/arm64/kernel/process.c
··· 83 83 * be raised. 84 84 */ 85 85 pmr = gic_read_pmr(); 86 - gic_write_pmr(GIC_PRIO_IRQON); 86 + gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET); 87 87 88 88 __cpu_do_idle(); 89 89
+5 -1
arch/arm64/kernel/ptrace.c
··· 1808 1808 1809 1809 int syscall_trace_enter(struct pt_regs *regs) 1810 1810 { 1811 - if (test_thread_flag(TIF_SYSCALL_TRACE)) 1811 + if (test_thread_flag(TIF_SYSCALL_TRACE) || 1812 + test_thread_flag(TIF_SYSCALL_EMU)) { 1812 1813 tracehook_report_syscall(regs, PTRACE_SYSCALL_ENTER); 1814 + if (!in_syscall(regs) || test_thread_flag(TIF_SYSCALL_EMU)) 1815 + return -1; 1816 + } 1813 1817 1814 1818 /* Do the secure computing after ptrace; failures should be fast. */ 1815 1819 if (secure_computing(NULL) == -1)
+1 -1
arch/arm64/kernel/sleep.S
··· 27 27 * aff0 = mpidr_masked & 0xff; 28 28 * aff1 = mpidr_masked & 0xff00; 29 29 * aff2 = mpidr_masked & 0xff0000; 30 - * aff2 = mpidr_masked & 0xff00000000; 30 + * aff3 = mpidr_masked & 0xff00000000; 31 31 * dst = (aff0 >> rs0 | aff1 >> rs1 | aff2 >> rs2 | aff3 >> rs3); 32 32 *} 33 33 * Input registers: rs0, rs1, rs2, rs3, mpidr, mask
+14 -13
arch/arm64/kernel/smp.c
··· 181 181 182 182 WARN_ON(!(cpuflags & PSR_I_BIT)); 183 183 184 - gic_write_pmr(GIC_PRIO_IRQOFF); 185 - 186 - /* We can only unmask PSR.I if we can take aborts */ 187 - if (!(cpuflags & PSR_A_BIT)) 188 - write_sysreg(cpuflags & ~PSR_I_BIT, daif); 184 + gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET); 189 185 } 190 186 191 187 /* ··· 830 834 } 831 835 #endif 832 836 833 - /* 834 - * ipi_cpu_stop - handle IPI from smp_send_stop() 835 - */ 836 - static void ipi_cpu_stop(unsigned int cpu) 837 + static void local_cpu_stop(void) 837 838 { 838 - set_cpu_online(cpu, false); 839 + set_cpu_online(smp_processor_id(), false); 839 840 840 841 local_daif_mask(); 841 842 sdei_mask_local_cpu(); 843 + cpu_park_loop(); 844 + } 842 845 843 - while (1) 844 - cpu_relax(); 846 + /* 847 + * We need to implement panic_smp_self_stop() for parallel panic() calls, so 848 + * that cpu_online_mask gets correctly updated and smp_send_stop() can skip 849 + * CPUs that have already stopped themselves. 850 + */ 851 + void panic_smp_self_stop(void) 852 + { 853 + local_cpu_stop(); 845 854 } 846 855 847 856 #ifdef CONFIG_KEXEC_CORE ··· 899 898 900 899 case IPI_CPU_STOP: 901 900 irq_enter(); 902 - ipi_cpu_stop(cpu); 901 + local_cpu_stop(); 903 902 irq_exit(); 904 903 break; 905 904
+7 -16
arch/arm64/kernel/traps.c
··· 55 55 printk(" %pS\n", (void *)where); 56 56 } 57 57 58 - static void __dump_instr(const char *lvl, struct pt_regs *regs) 58 + static void dump_kernel_instr(const char *lvl, struct pt_regs *regs) 59 59 { 60 60 unsigned long addr = instruction_pointer(regs); 61 61 char str[sizeof("00000000 ") * 5 + 2 + 1], *p = str; 62 62 int i; 63 63 64 + if (user_mode(regs)) 65 + return; 66 + 64 67 for (i = -4; i < 1; i++) { 65 68 unsigned int val, bad; 66 69 67 - bad = get_user(val, &((u32 *)addr)[i]); 70 + bad = aarch64_insn_read(&((u32 *)addr)[i], &val); 68 71 69 72 if (!bad) 70 73 p += sprintf(p, i == 0 ? "(%08x) " : "%08x ", val); ··· 76 73 break; 77 74 } 78 75 } 79 - printk("%sCode: %s\n", lvl, str); 80 - } 81 76 82 - static void dump_instr(const char *lvl, struct pt_regs *regs) 83 - { 84 - if (!user_mode(regs)) { 85 - mm_segment_t fs = get_fs(); 86 - set_fs(KERNEL_DS); 87 - __dump_instr(lvl, regs); 88 - set_fs(fs); 89 - } else { 90 - __dump_instr(lvl, regs); 91 - } 77 + printk("%sCode: %s\n", lvl, str); 92 78 } 93 79 94 80 void dump_backtrace(struct pt_regs *regs, struct task_struct *tsk) ··· 163 171 print_modules(); 164 172 show_regs(regs); 165 173 166 - if (!user_mode(regs)) 167 - dump_instr(KERN_EMERG, regs); 174 + dump_kernel_instr(KERN_EMERG, regs); 168 175 169 176 return ret; 170 177 }
+1 -3
arch/arm64/kvm/fpsimd.c
··· 112 112 if (vcpu->arch.flags & KVM_ARM64_FP_ENABLED) { 113 113 u64 *guest_zcr = &vcpu->arch.ctxt.sys_regs[ZCR_EL1]; 114 114 115 - /* Clean guest FP state to memory and invalidate cpu view */ 116 - fpsimd_save(); 117 - fpsimd_flush_cpu_state(); 115 + fpsimd_save_and_flush_cpu_state(); 118 116 119 117 if (guest_has_sve) 120 118 *guest_zcr = read_sysreg_s(SYS_ZCR_EL12);
+1 -1
arch/arm64/kvm/hyp/switch.c
··· 604 604 * Naturally, we want to avoid this. 605 605 */ 606 606 if (system_uses_irq_prio_masking()) { 607 - gic_write_pmr(GIC_PRIO_IRQON); 607 + gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET); 608 608 dsb(sy); 609 609 } 610 610
+8 -4
arch/arm64/mm/dma-mapping.c
··· 80 80 81 81 static int __init arm64_dma_init(void) 82 82 { 83 - WARN_TAINT(ARCH_DMA_MINALIGN < cache_line_size(), 84 - TAINT_CPU_OUT_OF_SPEC, 85 - "ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)", 86 - ARCH_DMA_MINALIGN, cache_line_size()); 87 83 return dma_atomic_pool_init(GFP_DMA32, __pgprot(PROT_NORMAL_NC)); 88 84 } 89 85 arch_initcall(arm64_dma_init); ··· 457 461 void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, 458 462 const struct iommu_ops *iommu, bool coherent) 459 463 { 464 + int cls = cache_line_size_of_cpu(); 465 + 466 + WARN_TAINT(!coherent && cls > ARCH_DMA_MINALIGN, 467 + TAINT_CPU_OUT_OF_SPEC, 468 + "%s %s: ARCH_DMA_MINALIGN smaller than CTR_EL0.CWG (%d < %d)", 469 + dev_driver_string(dev), dev_name(dev), 470 + ARCH_DMA_MINALIGN, cls); 471 + 460 472 dev->dma_coherent = coherent; 461 473 __iommu_setup_dma_ops(dev, dma_base, size, iommu); 462 474
+30 -31
arch/arm64/mm/fault.c
··· 384 384 #define VM_FAULT_BADACCESS 0x020000 385 385 386 386 static vm_fault_t __do_page_fault(struct mm_struct *mm, unsigned long addr, 387 - unsigned int mm_flags, unsigned long vm_flags, 388 - struct task_struct *tsk) 387 + unsigned int mm_flags, unsigned long vm_flags) 389 388 { 390 - struct vm_area_struct *vma; 391 - vm_fault_t fault; 389 + struct vm_area_struct *vma = find_vma(mm, addr); 392 390 393 - vma = find_vma(mm, addr); 394 - fault = VM_FAULT_BADMAP; 395 391 if (unlikely(!vma)) 396 - goto out; 397 - if (unlikely(vma->vm_start > addr)) 398 - goto check_stack; 392 + return VM_FAULT_BADMAP; 399 393 400 394 /* 401 395 * Ok, we have a good vm_area for this memory access, so we can handle 402 396 * it. 403 397 */ 404 - good_area: 398 + if (unlikely(vma->vm_start > addr)) { 399 + if (!(vma->vm_flags & VM_GROWSDOWN)) 400 + return VM_FAULT_BADMAP; 401 + if (expand_stack(vma, addr)) 402 + return VM_FAULT_BADMAP; 403 + } 404 + 405 405 /* 406 406 * Check that the permissions on the VMA allow for the fault which 407 407 * occurred. 408 408 */ 409 - if (!(vma->vm_flags & vm_flags)) { 410 - fault = VM_FAULT_BADACCESS; 411 - goto out; 412 - } 413 - 409 + if (!(vma->vm_flags & vm_flags)) 410 + return VM_FAULT_BADACCESS; 414 411 return handle_mm_fault(vma, addr & PAGE_MASK, mm_flags); 415 - 416 - check_stack: 417 - if (vma->vm_flags & VM_GROWSDOWN && !expand_stack(vma, addr)) 418 - goto good_area; 419 - out: 420 - return fault; 421 412 } 422 413 423 414 static bool is_el0_instruction_abort(unsigned int esr) ··· 416 425 return ESR_ELx_EC(esr) == ESR_ELx_EC_IABT_LOW; 417 426 } 418 427 428 + /* 429 + * Note: not valid for EL1 DC IVAC, but we never use that such that it 430 + * should fault. EL0 cannot issue DC IVAC (undef). 
431 + */ 432 + static bool is_write_abort(unsigned int esr) 433 + { 434 + return (esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM); 435 + } 436 + 419 437 static int __kprobes do_page_fault(unsigned long addr, unsigned int esr, 420 438 struct pt_regs *regs) 421 439 { 422 440 const struct fault_info *inf; 423 - struct task_struct *tsk; 424 - struct mm_struct *mm; 441 + struct mm_struct *mm = current->mm; 425 442 vm_fault_t fault, major = 0; 426 443 unsigned long vm_flags = VM_READ | VM_WRITE; 427 444 unsigned int mm_flags = FAULT_FLAG_ALLOW_RETRY | FAULT_FLAG_KILLABLE; 428 445 429 446 if (notify_page_fault(regs, esr)) 430 447 return 0; 431 - 432 - tsk = current; 433 - mm = tsk->mm; 434 448 435 449 /* 436 450 * If we're in an interrupt or have no user context, we must not take ··· 449 453 450 454 if (is_el0_instruction_abort(esr)) { 451 455 vm_flags = VM_EXEC; 452 - } else if ((esr & ESR_ELx_WNR) && !(esr & ESR_ELx_CM)) { 456 + mm_flags |= FAULT_FLAG_INSTRUCTION; 457 + } else if (is_write_abort(esr)) { 453 458 vm_flags = VM_WRITE; 454 459 mm_flags |= FAULT_FLAG_WRITE; 455 460 } ··· 489 492 */ 490 493 might_sleep(); 491 494 #ifdef CONFIG_DEBUG_VM 492 - if (!user_mode(regs) && !search_exception_tables(regs->pc)) 495 + if (!user_mode(regs) && !search_exception_tables(regs->pc)) { 496 + up_read(&mm->mmap_sem); 493 497 goto no_context; 498 + } 494 499 #endif 495 500 } 496 501 497 - fault = __do_page_fault(mm, addr, mm_flags, vm_flags, tsk); 502 + fault = __do_page_fault(mm, addr, mm_flags, vm_flags); 498 503 major |= fault & VM_FAULT_MAJOR; 499 504 500 505 if (fault & VM_FAULT_RETRY) { ··· 536 537 * that point. 537 538 */ 538 539 if (major) { 539 - tsk->maj_flt++; 540 + current->maj_flt++; 540 541 perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MAJ, 1, regs, 541 542 addr); 542 543 } else { 543 - tsk->min_flt++; 544 + current->min_flt++; 544 545 perf_sw_event(PERF_COUNT_SW_PAGE_FAULTS_MIN, 1, regs, 545 546 addr); 546 547 }
+6 -6
arch/arm64/mm/hugetlbpage.c
··· 228 228 229 229 if (sz == PUD_SIZE) { 230 230 ptep = (pte_t *)pudp; 231 - } else if (sz == (PAGE_SIZE * CONT_PTES)) { 231 + } else if (sz == (CONT_PTE_SIZE)) { 232 232 pmdp = pmd_alloc(mm, pudp, addr); 233 233 234 234 WARN_ON(addr & (sz - 1)); ··· 246 246 ptep = huge_pmd_share(mm, addr, pudp); 247 247 else 248 248 ptep = (pte_t *)pmd_alloc(mm, pudp, addr); 249 - } else if (sz == (PMD_SIZE * CONT_PMDS)) { 249 + } else if (sz == (CONT_PMD_SIZE)) { 250 250 pmdp = pmd_alloc(mm, pudp, addr); 251 251 WARN_ON(addr & (sz - 1)); 252 252 return (pte_t *)pmdp; ··· 454 454 #ifdef CONFIG_ARM64_4K_PAGES 455 455 add_huge_page_size(PUD_SIZE); 456 456 #endif 457 - add_huge_page_size(PMD_SIZE * CONT_PMDS); 457 + add_huge_page_size(CONT_PMD_SIZE); 458 458 add_huge_page_size(PMD_SIZE); 459 - add_huge_page_size(PAGE_SIZE * CONT_PTES); 459 + add_huge_page_size(CONT_PTE_SIZE); 460 460 461 461 return 0; 462 462 } ··· 470 470 #ifdef CONFIG_ARM64_4K_PAGES 471 471 case PUD_SIZE: 472 472 #endif 473 - case PMD_SIZE * CONT_PMDS: 473 + case CONT_PMD_SIZE: 474 474 case PMD_SIZE: 475 - case PAGE_SIZE * CONT_PTES: 475 + case CONT_PTE_SIZE: 476 476 add_huge_page_size(ps); 477 477 return 1; 478 478 }
+3 -2
arch/arm64/mm/init.c
··· 180 180 { 181 181 unsigned long max_zone_pfns[MAX_NR_ZONES] = {0}; 182 182 183 - if (IS_ENABLED(CONFIG_ZONE_DMA32)) 184 - max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys()); 183 + #ifdef CONFIG_ZONE_DMA32 184 + max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys()); 185 + #endif 185 186 max_zone_pfns[ZONE_NORMAL] = max; 186 187 187 188 free_area_init_nodes(max_zone_pfns);
+5 -9
arch/arm64/mm/mmu.c
··· 765 765 766 766 return 0; 767 767 } 768 - #endif /* CONFIG_ARM64_64K_PAGES */ 768 + #endif /* !ARM64_SWAPPER_USES_SECTION_MAPS */ 769 769 void vmemmap_free(unsigned long start, unsigned long end, 770 770 struct vmem_altmap *altmap) 771 771 { ··· 960 960 961 961 int pud_set_huge(pud_t *pudp, phys_addr_t phys, pgprot_t prot) 962 962 { 963 - pgprot_t sect_prot = __pgprot(PUD_TYPE_SECT | 964 - pgprot_val(mk_sect_prot(prot))); 965 - pud_t new_pud = pfn_pud(__phys_to_pfn(phys), sect_prot); 963 + pud_t new_pud = pfn_pud(__phys_to_pfn(phys), mk_pud_sect_prot(prot)); 966 964 967 965 /* Only allow permission changes for now */ 968 966 if (!pgattr_change_is_safe(READ_ONCE(pud_val(*pudp)), 969 967 pud_val(new_pud))) 970 968 return 0; 971 969 972 - BUG_ON(phys & ~PUD_MASK); 970 + VM_BUG_ON(phys & ~PUD_MASK); 973 971 set_pud(pudp, new_pud); 974 972 return 1; 975 973 } 976 974 977 975 int pmd_set_huge(pmd_t *pmdp, phys_addr_t phys, pgprot_t prot) 978 976 { 979 - pgprot_t sect_prot = __pgprot(PMD_TYPE_SECT | 980 - pgprot_val(mk_sect_prot(prot))); 981 - pmd_t new_pmd = pfn_pmd(__phys_to_pfn(phys), sect_prot); 977 + pmd_t new_pmd = pfn_pmd(__phys_to_pfn(phys), mk_pmd_sect_prot(prot)); 982 978 983 979 /* Only allow permission changes for now */ 984 980 if (!pgattr_change_is_safe(READ_ONCE(pmd_val(*pmdp)), 985 981 pmd_val(new_pmd))) 986 982 return 0; 987 983 988 - BUG_ON(phys & ~PMD_MASK); 984 + VM_BUG_ON(phys & ~PMD_MASK); 989 985 set_pmd(pmdp, new_pmd); 990 986 return 1; 991 987 }
+40 -8
arch/arm64/mm/pageattr.c
··· 151 151 __pgprot(PTE_VALID)); 152 152 } 153 153 154 - #ifdef CONFIG_DEBUG_PAGEALLOC 154 + int set_direct_map_invalid_noflush(struct page *page) 155 + { 156 + struct page_change_data data = { 157 + .set_mask = __pgprot(0), 158 + .clear_mask = __pgprot(PTE_VALID), 159 + }; 160 + 161 + if (!rodata_full) 162 + return 0; 163 + 164 + return apply_to_page_range(&init_mm, 165 + (unsigned long)page_address(page), 166 + PAGE_SIZE, change_page_range, &data); 167 + } 168 + 169 + int set_direct_map_default_noflush(struct page *page) 170 + { 171 + struct page_change_data data = { 172 + .set_mask = __pgprot(PTE_VALID | PTE_WRITE), 173 + .clear_mask = __pgprot(PTE_RDONLY), 174 + }; 175 + 176 + if (!rodata_full) 177 + return 0; 178 + 179 + return apply_to_page_range(&init_mm, 180 + (unsigned long)page_address(page), 181 + PAGE_SIZE, change_page_range, &data); 182 + } 183 + 155 184 void __kernel_map_pages(struct page *page, int numpages, int enable) 156 185 { 186 + if (!debug_pagealloc_enabled() && !rodata_full) 187 + return; 188 + 157 189 set_memory_valid((unsigned long)page_address(page), numpages, enable); 158 190 } 159 - #ifdef CONFIG_HIBERNATION 191 + 160 192 /* 161 - * When built with CONFIG_DEBUG_PAGEALLOC and CONFIG_HIBERNATION, this function 162 - * is used to determine if a linear map page has been marked as not-valid by 163 - * CONFIG_DEBUG_PAGEALLOC. Walk the page table and check the PTE_VALID bit. 164 - * This is based on kern_addr_valid(), which almost does what we need. 193 + * This function is used to determine if a linear map page has been marked as 194 + * not-valid. Walk the page table and check the PTE_VALID bit. This is based 195 + * on kern_addr_valid(), which almost does what we need. 165 196 * 166 197 * Because this is only called on the kernel linear map, p?d_sect() implies 167 198 * p?d_present(). 
When debug_pagealloc is enabled, sections mappings are ··· 205 174 pmd_t *pmdp, pmd; 206 175 pte_t *ptep; 207 176 unsigned long addr = (unsigned long)page_address(page); 177 + 178 + if (!debug_pagealloc_enabled() && !rodata_full) 179 + return true; 208 180 209 181 pgdp = pgd_offset_k(addr); 210 182 if (pgd_none(READ_ONCE(*pgdp))) ··· 230 196 ptep = pte_offset_kernel(pmdp, addr); 231 197 return pte_valid(READ_ONCE(*ptep)); 232 198 } 233 - #endif /* CONFIG_HIBERNATION */ 234 - #endif /* CONFIG_DEBUG_PAGEALLOC */
+1 -1
arch/arm64/net/bpf_jit_comp.c
··· 970 970 { 971 971 return __vmalloc_node_range(size, PAGE_SIZE, BPF_JIT_REGION_START, 972 972 BPF_JIT_REGION_END, GFP_KERNEL, 973 - PAGE_KERNEL_EXEC, 0, NUMA_NO_NODE, 973 + PAGE_KERNEL, 0, NUMA_NO_NODE, 974 974 __builtin_return_address(0)); 975 975 } 976 976
-1
arch/powerpc/kernel/ptrace.c
··· 2521 2521 { 2522 2522 /* make sure the single step bit is not set. */ 2523 2523 user_disable_single_step(child); 2524 - clear_tsk_thread_flag(child, TIF_SYSCALL_EMU); 2525 2524 } 2526 2525 2527 2526 #ifdef CONFIG_PPC_ADV_DEBUG_REGS
+6 -11
arch/x86/entry/common.c
··· 72 72 73 73 struct thread_info *ti = current_thread_info(); 74 74 unsigned long ret = 0; 75 - bool emulated = false; 76 75 u32 work; 77 76 78 77 if (IS_ENABLED(CONFIG_DEBUG_ENTRY)) 79 78 BUG_ON(regs != task_pt_regs(current)); 80 79 81 - work = READ_ONCE(ti->flags) & _TIF_WORK_SYSCALL_ENTRY; 80 + work = READ_ONCE(ti->flags); 82 81 83 - if (unlikely(work & _TIF_SYSCALL_EMU)) 84 - emulated = true; 85 - 86 - if ((emulated || (work & _TIF_SYSCALL_TRACE)) && 87 - tracehook_report_syscall_entry(regs)) 88 - return -1L; 89 - 90 - if (emulated) 91 - return -1L; 82 + if (work & (_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU)) { 83 + ret = tracehook_report_syscall_entry(regs); 84 + if (ret || (work & _TIF_SYSCALL_EMU)) 85 + return -1L; 86 + } 92 87 93 88 #ifdef CONFIG_SECCOMP 94 89 /*
-3
arch/x86/kernel/ptrace.c
··· 747 747 void ptrace_disable(struct task_struct *child) 748 748 { 749 749 user_disable_single_step(child); 750 - #ifdef TIF_SYSCALL_EMU 751 - clear_tsk_thread_flag(child, TIF_SYSCALL_EMU); 752 - #endif 753 750 } 754 751 755 752 #if defined CONFIG_X86_32 || defined CONFIG_IA32_EMULATION
+55 -6
drivers/acpi/pptt.c
··· 432 432 } 433 433 } 434 434 435 + static bool flag_identical(struct acpi_table_header *table_hdr, 436 + struct acpi_pptt_processor *cpu) 437 + { 438 + struct acpi_pptt_processor *next; 439 + 440 + /* heterogeneous machines must use PPTT revision > 1 */ 441 + if (table_hdr->revision < 2) 442 + return false; 443 + 444 + /* Locate the last node in the tree with IDENTICAL set */ 445 + if (cpu->flags & ACPI_PPTT_ACPI_IDENTICAL) { 446 + next = fetch_pptt_node(table_hdr, cpu->parent); 447 + if (!(next && next->flags & ACPI_PPTT_ACPI_IDENTICAL)) 448 + return true; 449 + } 450 + 451 + return false; 452 + } 453 + 435 454 /* Passing level values greater than this will result in search termination */ 436 455 #define PPTT_ABORT_PACKAGE 0xFF 437 456 438 - static struct acpi_pptt_processor *acpi_find_processor_package_id(struct acpi_table_header *table_hdr, 439 - struct acpi_pptt_processor *cpu, 440 - int level, int flag) 457 + static struct acpi_pptt_processor *acpi_find_processor_tag(struct acpi_table_header *table_hdr, 458 + struct acpi_pptt_processor *cpu, 459 + int level, int flag) 441 460 { 442 461 struct acpi_pptt_processor *prev_node; 443 462 444 463 while (cpu && level) { 445 - if (cpu->flags & flag) 464 + /* special case the identical flag to find last identical */ 465 + if (flag == ACPI_PPTT_ACPI_IDENTICAL) { 466 + if (flag_identical(table_hdr, cpu)) 467 + break; 468 + } else if (cpu->flags & flag) 446 469 break; 447 470 pr_debug("level %d\n", level); 448 471 prev_node = fetch_pptt_node(table_hdr, cpu->parent); ··· 503 480 504 481 cpu_node = acpi_find_processor_node(table, acpi_cpu_id); 505 482 if (cpu_node) { 506 - cpu_node = acpi_find_processor_package_id(table, cpu_node, 507 - level, flag); 483 + cpu_node = acpi_find_processor_tag(table, cpu_node, 484 + level, flag); 508 485 /* 509 486 * As per specification if the processor structure represents 510 487 * an actual processor, then ACPI processor ID must be valid. 
··· 682 659 { 683 660 return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE, 684 661 ACPI_PPTT_PHYSICAL_PACKAGE); 662 + } 663 + 664 + /** 665 + * find_acpi_cpu_topology_hetero_id() - Get a core architecture tag 666 + * @cpu: Kernel logical CPU number 667 + * 668 + * Determine a unique heterogeneous tag for the given CPU. CPUs with the same 669 + * implementation should have matching tags. 670 + * 671 + * The returned tag can be used to group peers with identical implementation. 672 + * 673 + * The search terminates when a level is found with the identical implementation 674 + * flag set or we reach a root node. 675 + * 676 + * Due to limitations in the PPTT data structure, there may be rare situations 677 + * where two cores in a heterogeneous machine may be identical, but won't have 678 + * the same tag. 679 + * 680 + * Return: -ENOENT if the PPTT doesn't exist, or the CPU cannot be found. 681 + * Otherwise returns a value which represents a group of identical cores 682 + * similar to this CPU. 683 + */ 684 + int find_acpi_cpu_topology_hetero_id(unsigned int cpu) 685 + { 686 + return find_acpi_cpu_topology_tag(cpu, PPTT_ABORT_PACKAGE, 687 + ACPI_PPTT_ACPI_IDENTICAL); 685 688 }
+5
drivers/base/cacheinfo.c
··· 213 213 return -ENOTSUPP; 214 214 } 215 215 216 + unsigned int coherency_max_size; 217 + 216 218 static int cache_shared_cpu_map_setup(unsigned int cpu) 217 219 { 218 220 struct cpu_cacheinfo *this_cpu_ci = get_cpu_cacheinfo(cpu); ··· 253 251 cpumask_set_cpu(i, &this_leaf->shared_cpu_map); 254 252 } 255 253 } 254 + /* record the maximum cache line size */ 255 + if (this_leaf->coherency_line_size > coherency_max_size) 256 + coherency_max_size = this_leaf->coherency_line_size; 256 257 } 257 258 258 259 return 0;
+7
drivers/irqchip/irq-gic-v3.c
··· 461 461 462 462 static inline void gic_handle_nmi(u32 irqnr, struct pt_regs *regs) 463 463 { 464 + bool irqs_enabled = interrupts_enabled(regs); 464 465 int err; 466 + 467 + if (irqs_enabled) 468 + nmi_enter(); 465 469 466 470 if (static_branch_likely(&supports_deactivate_key)) 467 471 gic_write_eoir(irqnr); ··· 478 474 err = handle_domain_nmi(gic_data.domain, irqnr, regs); 479 475 if (err) 480 476 gic_deactivate_unhandled(irqnr); 477 + 478 + if (irqs_enabled) 479 + nmi_exit(); 481 480 } 482 481 483 482 static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs)
+8
drivers/perf/Kconfig
··· 71 71 system, control logic. The PMU allows counting various events related 72 72 to DSU. 73 73 74 + config FSL_IMX8_DDR_PMU 75 + tristate "Freescale i.MX8 DDR perf monitor" 76 + depends on ARCH_MXC 77 + help 78 + Provides support for the DDR performance monitor in i.MX8, which 79 + can give information about memory throughput and other related 80 + events. 81 + 74 82 config HISI_PMU 75 83 bool "HiSilicon SoC PMU" 76 84 depends on ARM64 && ACPI
+1
drivers/perf/Makefile
··· 5 5 obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o 6 6 obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o 7 7 obj-$(CONFIG_ARM_SMMU_V3_PMU) += arm_smmuv3_pmu.o 8 + obj-$(CONFIG_FSL_IMX8_DDR_PMU) += fsl_imx8_ddr_perf.o 8 9 obj-$(CONFIG_HISI_PMU) += hisilicon/ 9 10 obj-$(CONFIG_QCOM_L2_PMU) += qcom_l2_pmu.o 10 11 obj-$(CONFIG_QCOM_L3_PMU) += qcom_l3_pmu.o
+72
drivers/perf/arm_pmu_acpi.c
··· 71 71 acpi_unregister_gsi(gsi); 72 72 } 73 73 74 + #if IS_ENABLED(CONFIG_ARM_SPE_PMU) 75 + static struct resource spe_resources[] = { 76 + { 77 + /* irq */ 78 + .flags = IORESOURCE_IRQ, 79 + } 80 + }; 81 + 82 + static struct platform_device spe_dev = { 83 + .name = ARMV8_SPE_PDEV_NAME, 84 + .id = -1, 85 + .resource = spe_resources, 86 + .num_resources = ARRAY_SIZE(spe_resources) 87 + }; 88 + 89 + /* 90 + * For lack of a better place, hook the normal PMU MADT walk 91 + * and create a SPE device if we detect a recent MADT with 92 + * a homogeneous PPI mapping. 93 + */ 94 + static void arm_spe_acpi_register_device(void) 95 + { 96 + int cpu, hetid, irq, ret; 97 + bool first = true; 98 + u16 gsi = 0; 99 + 100 + /* 101 + * Sanity check all the GICC tables for the same interrupt number. 102 + * For now, we only support homogeneous ACPI/SPE machines. 103 + */ 104 + for_each_possible_cpu(cpu) { 105 + struct acpi_madt_generic_interrupt *gicc; 106 + 107 + gicc = acpi_cpu_get_madt_gicc(cpu); 108 + if (gicc->header.length < ACPI_MADT_GICC_SPE) 109 + return; 110 + 111 + if (first) { 112 + gsi = gicc->spe_interrupt; 113 + if (!gsi) 114 + return; 115 + hetid = find_acpi_cpu_topology_hetero_id(cpu); 116 + first = false; 117 + } else if ((gsi != gicc->spe_interrupt) || 118 + (hetid != find_acpi_cpu_topology_hetero_id(cpu))) { 119 + pr_warn("ACPI: SPE must be homogeneous\n"); 120 + return; 121 + } 122 + } 123 + 124 + irq = acpi_register_gsi(NULL, gsi, ACPI_LEVEL_SENSITIVE, 125 + ACPI_ACTIVE_HIGH); 126 + if (irq < 0) { 127 + pr_warn("ACPI: SPE Unable to register interrupt: %d\n", gsi); 128 + return; 129 + } 130 + 131 + spe_resources[0].start = irq; 132 + ret = platform_device_register(&spe_dev); 133 + if (ret < 0) { 134 + pr_warn("ACPI: SPE: Unable to register device\n"); 135 + acpi_unregister_gsi(gsi); 136 + } 137 + } 138 + #else 139 + static inline void arm_spe_acpi_register_device(void) 140 + { 141 + } 142 + #endif /* CONFIG_ARM_SPE_PMU */ 143 + 74 144 static int 
arm_pmu_acpi_parse_irqs(void) 75 145 { 76 146 int irq, cpu, irq_cpu, err; ··· 345 275 346 276 if (acpi_disabled) 347 277 return 0; 278 + 279 + arm_spe_acpi_register_device(); 348 280 349 281 ret = arm_pmu_acpi_parse_irqs(); 350 282 if (ret)
+10 -2
drivers/perf/arm_spe_pmu.c
··· 27 27 #include <linux/of_address.h> 28 28 #include <linux/of_device.h> 29 29 #include <linux/perf_event.h> 30 + #include <linux/perf/arm_pmu.h> 30 31 #include <linux/platform_device.h> 31 32 #include <linux/printk.h> 32 33 #include <linux/slab.h> ··· 1158 1157 }; 1159 1158 MODULE_DEVICE_TABLE(of, arm_spe_pmu_of_match); 1160 1159 1161 - static int arm_spe_pmu_device_dt_probe(struct platform_device *pdev) 1160 + static const struct platform_device_id arm_spe_match[] = { 1161 + { ARMV8_SPE_PDEV_NAME, 0}, 1162 + { } 1163 + }; 1164 + MODULE_DEVICE_TABLE(platform, arm_spe_match); 1165 + 1166 + static int arm_spe_pmu_device_probe(struct platform_device *pdev) 1162 1167 { 1163 1168 int ret; 1164 1169 struct arm_spe_pmu *spe_pmu; ··· 1224 1217 } 1225 1218 1226 1219 static struct platform_driver arm_spe_pmu_driver = { 1220 + .id_table = arm_spe_match, 1227 1221 .driver = { 1228 1222 .name = DRVNAME, 1229 1223 .of_match_table = of_match_ptr(arm_spe_pmu_of_match), 1230 1224 }, 1231 - .probe = arm_spe_pmu_device_dt_probe, 1225 + .probe = arm_spe_pmu_device_probe, 1232 1226 .remove = arm_spe_pmu_device_remove, 1233 1227 }; 1234 1228
+554
drivers/perf/fsl_imx8_ddr_perf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * Copyright 2017 NXP 4 + * Copyright 2016 Freescale Semiconductor, Inc. 5 + */ 6 + 7 + #include <linux/bitfield.h> 8 + #include <linux/init.h> 9 + #include <linux/interrupt.h> 10 + #include <linux/io.h> 11 + #include <linux/module.h> 12 + #include <linux/of.h> 13 + #include <linux/of_address.h> 14 + #include <linux/of_device.h> 15 + #include <linux/of_irq.h> 16 + #include <linux/perf_event.h> 17 + #include <linux/slab.h> 18 + 19 + #define COUNTER_CNTL 0x0 20 + #define COUNTER_READ 0x20 21 + 22 + #define COUNTER_DPCR1 0x30 23 + 24 + #define CNTL_OVER 0x1 25 + #define CNTL_CLEAR 0x2 26 + #define CNTL_EN 0x4 27 + #define CNTL_EN_MASK 0xFFFFFFFB 28 + #define CNTL_CLEAR_MASK 0xFFFFFFFD 29 + #define CNTL_OVER_MASK 0xFFFFFFFE 30 + 31 + #define CNTL_CSV_SHIFT 24 32 + #define CNTL_CSV_MASK (0xFF << CNTL_CSV_SHIFT) 33 + 34 + #define EVENT_CYCLES_ID 0 35 + #define EVENT_CYCLES_COUNTER 0 36 + #define NUM_COUNTERS 4 37 + 38 + #define to_ddr_pmu(p) container_of(p, struct ddr_pmu, pmu) 39 + 40 + #define DDR_PERF_DEV_NAME "imx8_ddr" 41 + #define DDR_CPUHP_CB_NAME DDR_PERF_DEV_NAME "_perf_pmu" 42 + 43 + static DEFINE_IDA(ddr_ida); 44 + 45 + static const struct of_device_id imx_ddr_pmu_dt_ids[] = { 46 + { .compatible = "fsl,imx8-ddr-pmu",}, 47 + { .compatible = "fsl,imx8m-ddr-pmu",}, 48 + { /* sentinel */ } 49 + }; 50 + 51 + struct ddr_pmu { 52 + struct pmu pmu; 53 + void __iomem *base; 54 + unsigned int cpu; 55 + struct hlist_node node; 56 + struct device *dev; 57 + struct perf_event *events[NUM_COUNTERS]; 58 + int active_events; 59 + enum cpuhp_state cpuhp_state; 60 + int irq; 61 + int id; 62 + }; 63 + 64 + static ssize_t ddr_perf_cpumask_show(struct device *dev, 65 + struct device_attribute *attr, char *buf) 66 + { 67 + struct ddr_pmu *pmu = dev_get_drvdata(dev); 68 + 69 + return cpumap_print_to_pagebuf(true, buf, cpumask_of(pmu->cpu)); 70 + } 71 + 72 + static struct device_attribute ddr_perf_cpumask_attr = 73 + 
__ATTR(cpumask, 0444, ddr_perf_cpumask_show, NULL); 74 + 75 + static struct attribute *ddr_perf_cpumask_attrs[] = { 76 + &ddr_perf_cpumask_attr.attr, 77 + NULL, 78 + }; 79 + 80 + static struct attribute_group ddr_perf_cpumask_attr_group = { 81 + .attrs = ddr_perf_cpumask_attrs, 82 + }; 83 + 84 + static ssize_t 85 + ddr_pmu_event_show(struct device *dev, struct device_attribute *attr, 86 + char *page) 87 + { 88 + struct perf_pmu_events_attr *pmu_attr; 89 + 90 + pmu_attr = container_of(attr, struct perf_pmu_events_attr, attr); 91 + return sprintf(page, "event=0x%02llx\n", pmu_attr->id); 92 + } 93 + 94 + #define IMX8_DDR_PMU_EVENT_ATTR(_name, _id) \ 95 + (&((struct perf_pmu_events_attr[]) { \ 96 + { .attr = __ATTR(_name, 0444, ddr_pmu_event_show, NULL),\ 97 + .id = _id, } \ 98 + })[0].attr.attr) 99 + 100 + static struct attribute *ddr_perf_events_attrs[] = { 101 + IMX8_DDR_PMU_EVENT_ATTR(cycles, EVENT_CYCLES_ID), 102 + IMX8_DDR_PMU_EVENT_ATTR(selfresh, 0x01), 103 + IMX8_DDR_PMU_EVENT_ATTR(read-accesses, 0x04), 104 + IMX8_DDR_PMU_EVENT_ATTR(write-accesses, 0x05), 105 + IMX8_DDR_PMU_EVENT_ATTR(read-queue-depth, 0x08), 106 + IMX8_DDR_PMU_EVENT_ATTR(write-queue-depth, 0x09), 107 + IMX8_DDR_PMU_EVENT_ATTR(lp-read-credit-cnt, 0x10), 108 + IMX8_DDR_PMU_EVENT_ATTR(hp-read-credit-cnt, 0x11), 109 + IMX8_DDR_PMU_EVENT_ATTR(write-credit-cnt, 0x12), 110 + IMX8_DDR_PMU_EVENT_ATTR(read-command, 0x20), 111 + IMX8_DDR_PMU_EVENT_ATTR(write-command, 0x21), 112 + IMX8_DDR_PMU_EVENT_ATTR(read-modify-write-command, 0x22), 113 + IMX8_DDR_PMU_EVENT_ATTR(hp-read, 0x23), 114 + IMX8_DDR_PMU_EVENT_ATTR(hp-req-nocredit, 0x24), 115 + IMX8_DDR_PMU_EVENT_ATTR(hp-xact-credit, 0x25), 116 + IMX8_DDR_PMU_EVENT_ATTR(lp-req-nocredit, 0x26), 117 + IMX8_DDR_PMU_EVENT_ATTR(lp-xact-credit, 0x27), 118 + IMX8_DDR_PMU_EVENT_ATTR(wr-xact-credit, 0x29), 119 + IMX8_DDR_PMU_EVENT_ATTR(read-cycles, 0x2a), 120 + IMX8_DDR_PMU_EVENT_ATTR(write-cycles, 0x2b), 121 + IMX8_DDR_PMU_EVENT_ATTR(read-write-transition, 0x30), 
122 + IMX8_DDR_PMU_EVENT_ATTR(precharge, 0x31), 123 + IMX8_DDR_PMU_EVENT_ATTR(activate, 0x32), 124 + IMX8_DDR_PMU_EVENT_ATTR(load-mode, 0x33), 125 + IMX8_DDR_PMU_EVENT_ATTR(perf-mwr, 0x34), 126 + IMX8_DDR_PMU_EVENT_ATTR(read, 0x35), 127 + IMX8_DDR_PMU_EVENT_ATTR(read-activate, 0x36), 128 + IMX8_DDR_PMU_EVENT_ATTR(refresh, 0x37), 129 + IMX8_DDR_PMU_EVENT_ATTR(write, 0x38), 130 + IMX8_DDR_PMU_EVENT_ATTR(raw-hazard, 0x39), 131 + NULL, 132 + }; 133 + 134 + static struct attribute_group ddr_perf_events_attr_group = { 135 + .name = "events", 136 + .attrs = ddr_perf_events_attrs, 137 + }; 138 + 139 + PMU_FORMAT_ATTR(event, "config:0-7"); 140 + 141 + static struct attribute *ddr_perf_format_attrs[] = { 142 + &format_attr_event.attr, 143 + NULL, 144 + }; 145 + 146 + static struct attribute_group ddr_perf_format_attr_group = { 147 + .name = "format", 148 + .attrs = ddr_perf_format_attrs, 149 + }; 150 + 151 + static const struct attribute_group *attr_groups[] = { 152 + &ddr_perf_events_attr_group, 153 + &ddr_perf_format_attr_group, 154 + &ddr_perf_cpumask_attr_group, 155 + NULL, 156 + }; 157 + 158 + static u32 ddr_perf_alloc_counter(struct ddr_pmu *pmu, int event) 159 + { 160 + int i; 161 + 162 + /* 163 + * Always map cycle event to counter 0 164 + * Cycles counter is dedicated for cycle event 165 + * can't used for the other events 166 + */ 167 + if (event == EVENT_CYCLES_ID) { 168 + if (pmu->events[EVENT_CYCLES_COUNTER] == NULL) 169 + return EVENT_CYCLES_COUNTER; 170 + else 171 + return -ENOENT; 172 + } 173 + 174 + for (i = 1; i < NUM_COUNTERS; i++) { 175 + if (pmu->events[i] == NULL) 176 + return i; 177 + } 178 + 179 + return -ENOENT; 180 + } 181 + 182 + static void ddr_perf_free_counter(struct ddr_pmu *pmu, int counter) 183 + { 184 + pmu->events[counter] = NULL; 185 + } 186 + 187 + static u32 ddr_perf_read_counter(struct ddr_pmu *pmu, int counter) 188 + { 189 + return readl_relaxed(pmu->base + COUNTER_READ + counter * 4); 190 + } 191 + 192 + static int 
ddr_perf_event_init(struct perf_event *event) 193 + { 194 + struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); 195 + struct hw_perf_event *hwc = &event->hw; 196 + struct perf_event *sibling; 197 + 198 + if (event->attr.type != event->pmu->type) 199 + return -ENOENT; 200 + 201 + if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK) 202 + return -EOPNOTSUPP; 203 + 204 + if (event->cpu < 0) { 205 + dev_warn(pmu->dev, "Can't provide per-task data!\n"); 206 + return -EOPNOTSUPP; 207 + } 208 + 209 + /* 210 + * We must NOT create groups containing mixed PMUs, although software 211 + * events are acceptable (for example to create a CCN group 212 + * periodically read when a hrtimer aka cpu-clock leader triggers). 213 + */ 214 + if (event->group_leader->pmu != event->pmu && 215 + !is_software_event(event->group_leader)) 216 + return -EINVAL; 217 + 218 + for_each_sibling_event(sibling, event->group_leader) { 219 + if (sibling->pmu != event->pmu && 220 + !is_software_event(sibling)) 221 + return -EINVAL; 222 + } 223 + 224 + event->cpu = pmu->cpu; 225 + hwc->idx = -1; 226 + 227 + return 0; 228 + } 229 + 230 + 231 + static void ddr_perf_event_update(struct perf_event *event) 232 + { 233 + struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); 234 + struct hw_perf_event *hwc = &event->hw; 235 + u64 delta, prev_raw_count, new_raw_count; 236 + int counter = hwc->idx; 237 + 238 + do { 239 + prev_raw_count = local64_read(&hwc->prev_count); 240 + new_raw_count = ddr_perf_read_counter(pmu, counter); 241 + } while (local64_cmpxchg(&hwc->prev_count, prev_raw_count, 242 + new_raw_count) != prev_raw_count); 243 + 244 + delta = (new_raw_count - prev_raw_count) & 0xFFFFFFFF; 245 + 246 + local64_add(delta, &event->count); 247 + } 248 + 249 + static void ddr_perf_counter_enable(struct ddr_pmu *pmu, int config, 250 + int counter, bool enable) 251 + { 252 + u8 reg = counter * 4 + COUNTER_CNTL; 253 + int val; 254 + 255 + if (enable) { 256 + /* 257 + * must disable first, then enable again 
258 + * otherwise, cycle counter will not work 259 + * if previous state is enabled. 260 + */ 261 + writel(0, pmu->base + reg); 262 + val = CNTL_EN | CNTL_CLEAR; 263 + val |= FIELD_PREP(CNTL_CSV_MASK, config); 264 + writel(val, pmu->base + reg); 265 + } else { 266 + /* Disable counter */ 267 + writel(0, pmu->base + reg); 268 + } 269 + } 270 + 271 + static void ddr_perf_event_start(struct perf_event *event, int flags) 272 + { 273 + struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); 274 + struct hw_perf_event *hwc = &event->hw; 275 + int counter = hwc->idx; 276 + 277 + local64_set(&hwc->prev_count, 0); 278 + 279 + ddr_perf_counter_enable(pmu, event->attr.config, counter, true); 280 + 281 + hwc->state = 0; 282 + } 283 + 284 + static int ddr_perf_event_add(struct perf_event *event, int flags) 285 + { 286 + struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); 287 + struct hw_perf_event *hwc = &event->hw; 288 + int counter; 289 + int cfg = event->attr.config; 290 + 291 + counter = ddr_perf_alloc_counter(pmu, cfg); 292 + if (counter < 0) { 293 + dev_dbg(pmu->dev, "There are not enough counters\n"); 294 + return -EOPNOTSUPP; 295 + } 296 + 297 + pmu->events[counter] = event; 298 + pmu->active_events++; 299 + hwc->idx = counter; 300 + 301 + hwc->state |= PERF_HES_STOPPED; 302 + 303 + if (flags & PERF_EF_START) 304 + ddr_perf_event_start(event, flags); 305 + 306 + return 0; 307 + } 308 + 309 + static void ddr_perf_event_stop(struct perf_event *event, int flags) 310 + { 311 + struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); 312 + struct hw_perf_event *hwc = &event->hw; 313 + int counter = hwc->idx; 314 + 315 + ddr_perf_counter_enable(pmu, event->attr.config, counter, false); 316 + ddr_perf_event_update(event); 317 + 318 + hwc->state |= PERF_HES_STOPPED; 319 + } 320 + 321 + static void ddr_perf_event_del(struct perf_event *event, int flags) 322 + { 323 + struct ddr_pmu *pmu = to_ddr_pmu(event->pmu); 324 + struct hw_perf_event *hwc = &event->hw; 325 + int counter = hwc->idx; 326 + 327 + 
ddr_perf_event_stop(event, PERF_EF_UPDATE); 328 + 329 + ddr_perf_free_counter(pmu, counter); 330 + pmu->active_events--; 331 + hwc->idx = -1; 332 + } 333 + 334 + static void ddr_perf_pmu_enable(struct pmu *pmu) 335 + { 336 + struct ddr_pmu *ddr_pmu = to_ddr_pmu(pmu); 337 + 338 + /* enable cycle counter if cycle is not active event list */ 339 + if (ddr_pmu->events[EVENT_CYCLES_COUNTER] == NULL) 340 + ddr_perf_counter_enable(ddr_pmu, 341 + EVENT_CYCLES_ID, 342 + EVENT_CYCLES_COUNTER, 343 + true); 344 + } 345 + 346 + static void ddr_perf_pmu_disable(struct pmu *pmu) 347 + { 348 + struct ddr_pmu *ddr_pmu = to_ddr_pmu(pmu); 349 + 350 + if (ddr_pmu->events[EVENT_CYCLES_COUNTER] == NULL) 351 + ddr_perf_counter_enable(ddr_pmu, 352 + EVENT_CYCLES_ID, 353 + EVENT_CYCLES_COUNTER, 354 + false); 355 + } 356 + 357 + static int ddr_perf_init(struct ddr_pmu *pmu, void __iomem *base, 358 + struct device *dev) 359 + { 360 + *pmu = (struct ddr_pmu) { 361 + .pmu = (struct pmu) { 362 + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, 363 + .task_ctx_nr = perf_invalid_context, 364 + .attr_groups = attr_groups, 365 + .event_init = ddr_perf_event_init, 366 + .add = ddr_perf_event_add, 367 + .del = ddr_perf_event_del, 368 + .start = ddr_perf_event_start, 369 + .stop = ddr_perf_event_stop, 370 + .read = ddr_perf_event_update, 371 + .pmu_enable = ddr_perf_pmu_enable, 372 + .pmu_disable = ddr_perf_pmu_disable, 373 + }, 374 + .base = base, 375 + .dev = dev, 376 + }; 377 + 378 + pmu->id = ida_simple_get(&ddr_ida, 0, 0, GFP_KERNEL); 379 + return pmu->id; 380 + } 381 + 382 + static irqreturn_t ddr_perf_irq_handler(int irq, void *p) 383 + { 384 + int i; 385 + struct ddr_pmu *pmu = (struct ddr_pmu *) p; 386 + struct perf_event *event, *cycle_event = NULL; 387 + 388 + /* all counter will stop if cycle counter disabled */ 389 + ddr_perf_counter_enable(pmu, 390 + EVENT_CYCLES_ID, 391 + EVENT_CYCLES_COUNTER, 392 + false); 393 + /* 394 + * When the cycle counter overflows, all counters are stopped, 395 + * and 
an IRQ is raised. If any other counter overflows, it 396 + * continues counting, and no IRQ is raised. 397 + * 398 + * Cycles occur at least 4 times as often as other events, so we 399 + * can update all events on a cycle counter overflow and not 400 + * lose events. 401 + * 402 + */ 403 + for (i = 0; i < NUM_COUNTERS; i++) { 404 + 405 + if (!pmu->events[i]) 406 + continue; 407 + 408 + event = pmu->events[i]; 409 + 410 + ddr_perf_event_update(event); 411 + 412 + if (event->hw.idx == EVENT_CYCLES_COUNTER) 413 + cycle_event = event; 414 + } 415 + 416 + ddr_perf_counter_enable(pmu, 417 + EVENT_CYCLES_ID, 418 + EVENT_CYCLES_COUNTER, 419 + true); 420 + if (cycle_event) 421 + ddr_perf_event_update(cycle_event); 422 + 423 + return IRQ_HANDLED; 424 + } 425 + 426 + static int ddr_perf_offline_cpu(unsigned int cpu, struct hlist_node *node) 427 + { 428 + struct ddr_pmu *pmu = hlist_entry_safe(node, struct ddr_pmu, node); 429 + int target; 430 + 431 + if (cpu != pmu->cpu) 432 + return 0; 433 + 434 + target = cpumask_any_but(cpu_online_mask, cpu); 435 + if (target >= nr_cpu_ids) 436 + return 0; 437 + 438 + perf_pmu_migrate_context(&pmu->pmu, cpu, target); 439 + pmu->cpu = target; 440 + 441 + WARN_ON(irq_set_affinity_hint(pmu->irq, cpumask_of(pmu->cpu))); 442 + 443 + return 0; 444 + } 445 + 446 + static int ddr_perf_probe(struct platform_device *pdev) 447 + { 448 + struct ddr_pmu *pmu; 449 + struct device_node *np; 450 + void __iomem *base; 451 + char *name; 452 + int num; 453 + int ret; 454 + int irq; 455 + 456 + base = devm_platform_ioremap_resource(pdev, 0); 457 + if (IS_ERR(base)) 458 + return PTR_ERR(base); 459 + 460 + np = pdev->dev.of_node; 461 + 462 + pmu = devm_kzalloc(&pdev->dev, sizeof(*pmu), GFP_KERNEL); 463 + if (!pmu) 464 + return -ENOMEM; 465 + 466 + num = ddr_perf_init(pmu, base, &pdev->dev); 467 + 468 + platform_set_drvdata(pdev, pmu); 469 + 470 + name = devm_kasprintf(&pdev->dev, GFP_KERNEL, DDR_PERF_DEV_NAME "%d", 471 + num); 472 + if (!name) 473 + return 
-ENOMEM; 474 + 475 + pmu->cpu = raw_smp_processor_id(); 476 + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, 477 + DDR_CPUHP_CB_NAME, 478 + NULL, 479 + ddr_perf_offline_cpu); 480 + 481 + if (ret < 0) { 482 + dev_err(&pdev->dev, "cpuhp_setup_state_multi failed\n"); 483 + goto ddr_perf_err; 484 + } 485 + 486 + pmu->cpuhp_state = ret; 487 + 488 + /* Register the pmu instance for cpu hotplug */ 489 + cpuhp_state_add_instance_nocalls(pmu->cpuhp_state, &pmu->node); 490 + 491 + /* Request irq */ 492 + irq = of_irq_get(np, 0); 493 + if (irq < 0) { 494 + dev_err(&pdev->dev, "Failed to get irq: %d", irq); 495 + ret = irq; 496 + goto ddr_perf_err; 497 + } 498 + 499 + ret = devm_request_irq(&pdev->dev, irq, 500 + ddr_perf_irq_handler, 501 + IRQF_NOBALANCING | IRQF_NO_THREAD, 502 + DDR_CPUHP_CB_NAME, 503 + pmu); 504 + if (ret < 0) { 505 + dev_err(&pdev->dev, "Request irq failed: %d", ret); 506 + goto ddr_perf_err; 507 + } 508 + 509 + pmu->irq = irq; 510 + ret = irq_set_affinity_hint(pmu->irq, cpumask_of(pmu->cpu)); 511 + if (ret) { 512 + dev_err(pmu->dev, "Failed to set interrupt affinity!\n"); 513 + goto ddr_perf_err; 514 + } 515 + 516 + ret = perf_pmu_register(&pmu->pmu, name, -1); 517 + if (ret) 518 + goto ddr_perf_err; 519 + 520 + return 0; 521 + 522 + ddr_perf_err: 523 + if (pmu->cpuhp_state) 524 + cpuhp_state_remove_instance_nocalls(pmu->cpuhp_state, &pmu->node); 525 + 526 + ida_simple_remove(&ddr_ida, pmu->id); 527 + dev_warn(&pdev->dev, "i.MX8 DDR Perf PMU failed (%d), disabled\n", ret); 528 + return ret; 529 + } 530 + 531 + static int ddr_perf_remove(struct platform_device *pdev) 532 + { 533 + struct ddr_pmu *pmu = platform_get_drvdata(pdev); 534 + 535 + cpuhp_state_remove_instance_nocalls(pmu->cpuhp_state, &pmu->node); 536 + irq_set_affinity_hint(pmu->irq, NULL); 537 + 538 + perf_pmu_unregister(&pmu->pmu); 539 + 540 + ida_simple_remove(&ddr_ida, pmu->id); 541 + return 0; 542 + } 543 + 544 + static struct platform_driver imx_ddr_pmu_driver = { 545 + .driver = { 546 
+ .name = "imx-ddr-pmu", 547 + .of_match_table = imx_ddr_pmu_dt_ids, 548 + }, 549 + .probe = ddr_perf_probe, 550 + .remove = ddr_perf_remove, 551 + }; 552 + 553 + module_platform_driver(imx_ddr_pmu_driver); 554 + MODULE_LICENSE("GPL v2");
+5
include/linux/acpi.h
··· 1303 1303 #ifdef CONFIG_ACPI_PPTT 1304 1304 int find_acpi_cpu_topology(unsigned int cpu, int level); 1305 1305 int find_acpi_cpu_topology_package(unsigned int cpu); 1306 + int find_acpi_cpu_topology_hetero_id(unsigned int cpu); 1306 1307 int find_acpi_cpu_cache_topology(unsigned int cpu, int level); 1307 1308 #else 1308 1309 static inline int find_acpi_cpu_topology(unsigned int cpu, int level) ··· 1311 1310 return -EINVAL; 1312 1311 } 1313 1312 static inline int find_acpi_cpu_topology_package(unsigned int cpu) 1313 + { 1314 + return -EINVAL; 1315 + } 1316 + static inline int find_acpi_cpu_topology_hetero_id(unsigned int cpu) 1314 1317 { 1315 1318 return -EINVAL; 1316 1319 }
+2
include/linux/cacheinfo.h
··· 17 17 CACHE_TYPE_UNIFIED = BIT(2), 18 18 }; 19 19 20 + extern unsigned int coherency_max_size; 21 + 20 22 /** 21 23 * struct cacheinfo - represent a cache leaf node 22 24 * @id: This cache's id. It is unique among caches with the same (type, level).
+2
include/linux/perf/arm_pmu.h
··· 171 171 172 172 #endif /* CONFIG_ARM_PMU */ 173 173 174 + #define ARMV8_SPE_PDEV_NAME "arm,spe-v1" 175 + 174 176 #endif /* __ARM_PMU_H__ */
+6 -2
kernel/irq/irqdesc.c
··· 680 680 * @hwirq: The HW irq number to convert to a logical one 681 681 * @regs: Register file coming from the low-level handling code 682 682 * 683 + * This function must be called from an NMI context. 684 + * 683 685 * Returns: 0 on success, or -EINVAL if conversion has failed 684 686 */ 685 687 int handle_domain_nmi(struct irq_domain *domain, unsigned int hwirq, ··· 691 689 unsigned int irq; 692 690 int ret = 0; 693 691 694 - nmi_enter(); 692 + /* 693 + * NMI context needs to be setup earlier in order to deal with tracing. 694 + */ 695 + WARN_ON(!in_nmi()); 695 696 696 697 irq = irq_find_mapping(domain, hwirq); 697 698 ··· 707 702 else 708 703 ret = -EINVAL; 709 704 710 - nmi_exit(); 711 705 set_irq_regs(old_regs); 712 706 return ret; 713 707 }
+3
kernel/ptrace.c
··· 116 116 BUG_ON(!child->ptrace); 117 117 118 118 clear_tsk_thread_flag(child, TIF_SYSCALL_TRACE); 119 + #ifdef TIF_SYSCALL_EMU 120 + clear_tsk_thread_flag(child, TIF_SYSCALL_EMU); 121 + #endif 119 122 120 123 child->parent = child->real_parent; 121 124 list_del_init(&child->ptrace_entry);
-11
mm/vmalloc.c
··· 2128 2128 int flush_dmap = 0; 2129 2129 int i; 2130 2130 2131 - /* 2132 - * The below block can be removed when all architectures that have 2133 - * direct map permissions also have set_direct_map_() implementations. 2134 - * This is concerned with resetting the direct map any an vm alias with 2135 - * execute permissions, without leaving a RW+X window. 2136 - */ 2137 - if (flush_reset && !IS_ENABLED(CONFIG_ARCH_HAS_SET_DIRECT_MAP)) { 2138 - set_memory_nx((unsigned long)area->addr, area->nr_pages); 2139 - set_memory_rw((unsigned long)area->addr, area->nr_pages); 2140 - } 2141 - 2142 2131 remove_vm_area(area->addr); 2143 2132 2144 2133 /* If this is not VM_FLUSH_RESET_PERMS memory, no need for the below. */