Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

lockdep: Add lockdep_cleanup_dead_cpu()

Add a function to check that an offline CPU has left the tracing
infrastructure in a sane state.

Commit 9bb69ba4c177 ("ACPI: processor_idle: use raw_safe_halt() in
acpi_idle_play_dead()") fixed an issue where the acpi_idle_play_dead()
function called safe_halt() instead of raw_safe_halt(), which had the
side-effect of setting the hardirqs_enabled flag for the offline CPU.

On x86 this triggered warnings from lockdep_assert_irqs_disabled() when
the CPU was brought back online again later. These warnings were too
early for the exception to be handled correctly, leading to a
triple-fault.

Add lockdep_cleanup_dead_cpu() to check for this kind of failure mode,
print the events leading up to it, and correct it so that the CPU can
come online again correctly. Re-introducing the original bug now merely
results in this warning instead:

[ 61.556652] smpboot: CPU 1 is now offline
[ 61.556769] CPU 1 left hardirqs enabled!
[ 61.556915] irq event stamp: 128149
[ 61.556965] hardirqs last enabled at (128149): [<ffffffff81720a36>] acpi_idle_play_dead+0x46/0x70
[ 61.557055] hardirqs last disabled at (128148): [<ffffffff81124d50>] do_idle+0x90/0xe0
[ 61.557117] softirqs last enabled at (128078): [<ffffffff81cec74c>] __do_softirq+0x31c/0x423
[ 61.557199] softirqs last disabled at (128065): [<ffffffff810baae1>] __irq_exit_rcu+0x91/0x100

[boqun: Capitalize the title and reword the message a bit]

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Link: https://lore.kernel.org/r/f7bd2b3b999051bb3ef4be34526a9262008285f5.camel@infradead.org

Authored by David Woodhouse, committed by Boqun Feng
0784181b 87347f14

31 additions in 3 files:

include/linux/irqflags.h (+6)
--- a/include/linux/irqflags.h
+++ b/include/linux/irqflags.h
@@ -18,6 +18,8 @@
 #include <asm/irqflags.h>
 #include <asm/percpu.h>
 
+struct task_struct;
+
 /* Currently lockdep_softirqs_on/off is used only by lockdep */
 #ifdef CONFIG_PROVE_LOCKING
 extern void lockdep_softirqs_on(unsigned long ip);
@@ -25,12 +27,16 @@
 extern void lockdep_hardirqs_on_prepare(void);
 extern void lockdep_hardirqs_on(unsigned long ip);
 extern void lockdep_hardirqs_off(unsigned long ip);
+extern void lockdep_cleanup_dead_cpu(unsigned int cpu,
+				     struct task_struct *idle);
 #else
 static inline void lockdep_softirqs_on(unsigned long ip) { }
 static inline void lockdep_softirqs_off(unsigned long ip) { }
 static inline void lockdep_hardirqs_on_prepare(void) { }
 static inline void lockdep_hardirqs_on(unsigned long ip) { }
 static inline void lockdep_hardirqs_off(unsigned long ip) { }
+static inline void lockdep_cleanup_dead_cpu(unsigned int cpu,
+					    struct task_struct *idle) {}
 #endif
 
 #ifdef CONFIG_TRACE_IRQFLAGS
kernel/cpu.c (+1)
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -1338,6 +1338,7 @@
 
 	cpuhp_bp_sync_dead(cpu);
 
+	lockdep_cleanup_dead_cpu(cpu, idle_thread_get(cpu));
 	tick_cleanup_dead_cpu(cpu);
 
 	/*
kernel/locking/lockdep.c (+24)
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4586,6 +4586,30 @@
 		debug_atomic_inc(redundant_softirqs_off);
 }
 
+/**
+ * lockdep_cleanup_dead_cpu - Ensure CPU lockdep state is cleanly stopped
+ *
+ * @cpu: index of offlined CPU
+ * @idle: task pointer for offlined CPU's idle thread
+ *
+ * Invoked after the CPU is dead. Ensures that the tracing infrastructure
+ * is left in a suitable state for the CPU to be subsequently brought
+ * online again.
+ */
+void lockdep_cleanup_dead_cpu(unsigned int cpu, struct task_struct *idle)
+{
+	if (unlikely(!debug_locks))
+		return;
+
+	if (unlikely(per_cpu(hardirqs_enabled, cpu))) {
+		pr_warn("CPU %u left hardirqs enabled!", cpu);
+		if (idle)
+			print_irqtrace_events(idle);
+		/* Clean it up for when the CPU comes online again. */
+		per_cpu(hardirqs_enabled, cpu) = 0;
+	}
+}
+
 static int
 mark_usage(struct task_struct *curr, struct held_lock *hlock, int check)
 {