Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kernel/watchdog: prevent false hardlockup on overloaded system

On an overloaded system, a change to the watchdog threshold can be
delayed long enough to trigger a false positive.

This can easily be reproduced by having one CPU spin indefinitely in a
task while another CPU updates the watchdog threshold.

What happens is that, while the watchdog threads are being parked, the
hrtimers on the other CPUs fire and reprogram themselves with the new,
slower watchdog threshold. Meanwhile, the NMI watchdog is still
programmed with the old, faster threshold.

Because that one CPU stays busy, thread parking on the other CPUs cannot
complete, and parking must finish before the NMI watchdog can be shut
down and reprogrammed correctly. As a result, the NMI watchdog reports
a false positive.

Fix this by setting a park_in_progress flag that suppresses all lockup
detection until the parking is complete.
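The flag pattern the fix relies on can be sketched in userspace C11 (a
minimal analogue for illustration only, not the kernel code; the names
`watchdog_check` and `update_threshold` are hypothetical stand-ins for
the watchdog callbacks and watchdog_park_threads()):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Analogue of watchdog_park_in_progress in the patch. */
static atomic_int park_in_progress = 0;

/* Analogue of the watchdog callbacks: bail out early while a threshold
 * update is in flight, so stale timer state cannot be misread as a lockup. */
static bool watchdog_check(bool cpu_stuck)
{
    if (atomic_load(&park_in_progress) != 0)
        return false;   /* detection suppressed: state is being reprogrammed */
    return cpu_stuck;   /* normal path: report a lockup if one is seen */
}

/* Analogue of watchdog_park_threads(): hold the flag across the window in
 * which the old and new thresholds can coexist. */
static void update_threshold(void)
{
    atomic_store(&park_in_progress, 1);
    /* ... park watchdog threads, reprogram timers ... */
    atomic_store(&park_in_progress, 0);
}
```

While the flag is set, every check returns "no lockup", which is exactly
the behaviour the patch adds to watchdog_timer_fn() and the NMI overflow
callback.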

Fix provided by Ulrich Obergfell.

[akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com
Signed-off-by: Don Zickus <dzickus@redhat.com>
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
Cc: Ulrich Obergfell <uobergfe@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Don Zickus, committed by Linus Torvalds
b94f5118 6affb9d7

3 files changed, 13 insertions(+)

include/linux/nmi.h | 1 +
--- a/include/linux/nmi.h
+++ b/include/linux/nmi.h
@@ -110,6 +110,7 @@
 extern int watchdog_thresh;
 extern unsigned long watchdog_enabled;
 extern unsigned long *watchdog_cpumask_bits;
+extern atomic_t watchdog_park_in_progress;
 #ifdef CONFIG_SMP
 extern int sysctl_softlockup_all_cpu_backtrace;
 extern int sysctl_hardlockup_all_cpu_backtrace;
kernel/watchdog.c | 9 +++++++++
--- a/kernel/watchdog.c
+++ b/kernel/watchdog.c
@@ -49,6 +49,8 @@
 #define for_each_watchdog_cpu(cpu) \
 	for_each_cpu_and((cpu), cpu_online_mask, &watchdog_cpumask)
 
+atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);
+
 /*
  * The 'watchdog_running' variable is set to 1 when the watchdog threads
  * are registered/started and is set to 0 when the watchdog threads are
@@ -260,6 +262,9 @@
 	int duration;
 	int softlockup_all_cpu_backtrace = sysctl_softlockup_all_cpu_backtrace;
 
+	if (atomic_read(&watchdog_park_in_progress) != 0)
+		return HRTIMER_NORESTART;
+
 	/* kick the hardlockup detector */
 	watchdog_interrupt_count();
 
@@ -467,11 +472,15 @@
 {
 	int cpu, ret = 0;
 
+	atomic_set(&watchdog_park_in_progress, 1);
+
 	for_each_watchdog_cpu(cpu) {
 		ret = kthread_park(per_cpu(softlockup_watchdog, cpu));
 		if (ret)
 			break;
 	}
+
+	atomic_set(&watchdog_park_in_progress, 0);
 
 	return ret;
 }
kernel/watchdog_hld.c | 3 +++
--- a/kernel/watchdog_hld.c
+++ b/kernel/watchdog_hld.c
@@ -84,6 +84,9 @@
 	/* Ensure the watchdog never gets throttled */
 	event->hw.interrupts = 0;
 
+	if (atomic_read(&watchdog_park_in_progress) != 0)
+		return;
+
 	if (__this_cpu_read(watchdog_nmi_touch) == true) {
 		__this_cpu_write(watchdog_nmi_touch, false);
 		return;