Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/mm: Fix false positive warning in switch_mm_irqs_off()

Multiple testers reported the following new warning:

WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795

Which corresponds to:

	if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm &&
			!cpumask_test_cpu(cpu, mm_cpumask(next))))
		cpumask_set_cpu(cpu, mm_cpumask(next));

So the problem is that unuse_temporary_mm() explicitly clears
that bit; and it has to, because otherwise the flush_tlb_mm_range() in
__text_poke() will try sending IPIs, which are not at all needed.

See also:

https://lore.kernel.org/all/20241113095550.GBZzR3pg-RhJKPDazS@fat_crate.local/

Notably, the whole {,un}use_temporary_mm() thing requires preemption to
be disabled across it with the express purpose of keeping all TLB
nonsense CPU local, such that invalidations can also stay local etc.

However, as a side effect, this trips the WARN() above, which makes
sense for the normal case, but very much doesn't make sense here.

Change unuse_temporary_mm() to mark the mm_struct such that a further
exception (beyond init_mm) can be grafted, to keep the warning for all
the other cases.

Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com>
Reported-by: Jani Nikula <jani.nikula@linux.intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20250430081154.GH4439@noisy.programming.kicks-ass.net

Authored by: Peter Zijlstra
Committed by: Ingo Molnar
7f995823 43c2df7e

Diffstat: +18 -3

arch/x86/include/asm/mmu.h (+2 -2)

@@ -16,6 +16,8 @@
 #define MM_CONTEXT_LOCK_LAM		2
 /* Allow LAM and SVA coexisting */
 #define MM_CONTEXT_FORCE_TAGGED_SVA	3
+/* Tracks mm_cpumask */
+#define MM_CONTEXT_NOTRACK		4
 
 /*
  * x86 has arch-specific MMU state beyond what lives in mm_struct.
@@ -46,9 +44,7 @@
 	struct ldt_struct *ldt;
 #endif
 
-#ifdef CONFIG_X86_64
 	unsigned long flags;
-#endif
 
 #ifdef CONFIG_ADDRESS_MASKING
 	/* Active LAM mode: X86_CR3_LAM_U48 or X86_CR3_LAM_U57 or 0 (disabled) */
arch/x86/include/asm/mmu_context.h (+10)

@@ -247,6 +247,16 @@
 }
 #endif
 
+static inline bool is_notrack_mm(struct mm_struct *mm)
+{
+	return test_bit(MM_CONTEXT_NOTRACK, &mm->context.flags);
+}
+
+static inline void set_notrack_mm(struct mm_struct *mm)
+{
+	set_bit(MM_CONTEXT_NOTRACK, &mm->context.flags);
+}
+
 /*
  * We only want to enforce protection keys on the current process
  * because we effectively have no access to PKRU for other
arch/x86/mm/init.c (+3)

@@ -28,6 +28,7 @@
 #include <asm/text-patching.h>
 #include <asm/memtype.h>
 #include <asm/paravirt.h>
+#include <asm/mmu_context.h>
 
 /*
  * We need to define the tracepoints somewhere, and tlb.c
@@ -830,6 +829,8 @@
 
 	/* Xen PV guests need the PGD to be pinned. */
 	paravirt_enter_mmap(text_poke_mm);
+
+	set_notrack_mm(text_poke_mm);
 
 	/*
 	 * Randomize the poking address, but make sure that the following page
arch/x86/mm/tlb.c (+2 -1)

@@ -847,7 +847,8 @@
 	 * mm_cpumask. The TLB shootdown code can figure out from
 	 * cpu_tlbstate_shared.is_lazy whether or not to send an IPI.
 	 */
-	if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm &&
+	if (IS_ENABLED(CONFIG_DEBUG_VM) &&
+	    WARN_ON_ONCE(prev != &init_mm && !is_notrack_mm(prev) &&
 			!cpumask_test_cpu(cpu, mm_cpumask(next))))
 		cpumask_set_cpu(cpu, mm_cpumask(next));
 
arch/x86/platform/efi/efi_64.c (+1)

@@ -89,6 +89,7 @@
 	efi_mm.pgd = efi_pgd;
 	mm_init_cpumask(&efi_mm);
 	init_new_context(NULL, &efi_mm);
+	set_notrack_mm(&efi_mm);
 
 	return 0;
 