Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kernel/watchdog.c: always return NOTIFY_OK during cpu up/down events

This patch addresses two problems. First, when the hardlockup detector
failed to start, the softlockup detector was not started either. There
are valid cases in which the hardlockup detector should not start (no
lapic, bios controls the perf counters), and those should not block the
softlockup detector.
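
As a rough illustration of the first fix, the user-space sketch below mirrors
the patched control flow: the hardlockup step may fail, the softlockup step is
attempted anyway, and the first error seen is the one returned. The struct and
the fake_* and sketch_* names are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one cpu; none of this is kernel code. */
struct fake_cpu {
	bool has_lapic;          /* hardlockup needs a usable NMI/perf source */
	bool thread_create_ok;   /* whether the softlockup thread can be created */
	bool softlockup_running; /* set once the softlockup side has started */
};

/* Stand-in for watchdog_nmi_enable(): fails when there is no lapic. */
int fake_nmi_enable(const struct fake_cpu *c)
{
	return c->has_lapic ? 0 : -ENODEV;
}

/* Stand-in for creating and waking the softlockup thread. */
int fake_thread_start(struct fake_cpu *c)
{
	if (!c->thread_create_ok)
		return -ENOMEM;
	c->softlockup_running = true;
	return 0;
}

/* Same shape as the patched watchdog_enable(): no early return. */
int sketch_watchdog_enable(struct fake_cpu *c)
{
	int err, thread_err;

	assert(c != NULL);

	err = fake_nmi_enable(c);

	/* Regardless of err above, fall through and start softlockup. */
	thread_err = fake_thread_start(c);
	if (thread_err && !err)
		err = thread_err; /* only if hardlockup has not set it already */

	return err;
}
```

On a model cpu with has_lapic set to false, sketch_watchdog_enable() still
returns -ENODEV, but softlockup_running ends up true, which is the behaviour
the first fix is after.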

Second, when the hardlockup detector failed to start (in the no-lapic or
bios-controlled perf counter case), it reported failure to the cpu
notifier chain. This stopped the chain from going on to start other, more
critical pieces of cpu bring-up (in our case, based on a 2.6.32 fork, it
was the mce). As a result, during soft cpu online/offline testing, the
system would panic when a cpu was offlined: the cpu notifier would
successfully process the watchdog disable cpu event and would then panic
in the mce case because of variables left uninitialized by a cpu up event
that never executed.
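
The failure mode above can be modelled in user space. The sketch below is a
deliberately simplified, hypothetical two-entry chain (watchdog first, then an
mce-like subsystem); the NOTIFY_* values are copied from what
include/linux/notifier.h is understood to define, and every model_* name is
invented for illustration. The real cpu notifier chain has priorities and
more events than shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Values believed to match include/linux/notifier.h. */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_BAD       (NOTIFY_STOP_MASK | 0x0002)

enum cpu_event { EV_CPU_UP, EV_CPU_DOWN };

/* Hypothetical shared state for the model. */
struct model_state {
	bool watchdog_fails;  /* hardlockup cannot start on this box */
	bool mce_initialized; /* set by the mce-like callback on cpu up */
	bool crashed;         /* models the panic on cpu down */
};

typedef int (*notifier_fn)(struct model_state *s, enum cpu_event ev);

/* Pre-patch behaviour: report failure on cpu up, succeed on cpu down. */
int model_watchdog_cb(struct model_state *s, enum cpu_event ev)
{
	if (ev == EV_CPU_UP && s->watchdog_fails)
		return NOTIFY_BAD;
	return NOTIFY_OK;
}

/* Initializes on cpu up; tearing down uninitialized state "panics". */
int model_mce_cb(struct model_state *s, enum cpu_event ev)
{
	if (ev == EV_CPU_UP)
		s->mce_initialized = true;
	else if (!s->mce_initialized)
		s->crashed = true;
	return NOTIFY_OK;
}

/* Walk the chain in order and stop at the first stop-masked result. */
int model_call_chain(struct model_state *s, enum cpu_event ev)
{
	notifier_fn chain[] = { model_watchdog_cb, model_mce_cb };
	int ret = NOTIFY_OK;
	size_t i;

	assert(s != NULL);

	for (i = 0; i < sizeof(chain) / sizeof(chain[0]); i++) {
		ret = chain[i](s, ev);
		if (ret & NOTIFY_STOP_MASK)
			break; /* later callbacks never see this event */
	}
	return ret;
}
```

With watchdog_fails set, an EV_CPU_UP walk stops before model_mce_cb() runs, so
a later EV_CPU_DOWN walk reaches it with mce_initialized still false and sets
crashed. With watchdog_fails clear, both walks complete normally.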

I realized the hardlockup/softlockup cases are really just debugging aids
and should never impede the progress of a cpu up/down event. Therefore I
modified the code to always return NOTIFY_OK and instead rely on printks
to inform the user of problems.
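
To see why returning NOTIFY_OK unconditionally keeps the chain moving, it may
help to look at how an errno gets encoded into a notifier return value. The
helpers below are user-space copies written from memory of
include/linux/notifier.h, renamed with a sketch_ prefix to make clear they are
not the kernel's own; the exact definitions in a given kernel version may
differ.

```c
#include <assert.h>
#include <errno.h>

/* Values believed to match include/linux/notifier.h. */
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000

/* Encode a negative errno: any error also sets the stop mask. */
int sketch_notifier_from_errno(int err)
{
	assert(err <= 0); /* kernel-style errors are zero or negative */

	if (err)
		return NOTIFY_STOP_MASK | (NOTIFY_OK - err);
	return NOTIFY_OK;
}

/* Decode a notifier return value back into an errno (0 if none). */
int sketch_notifier_to_errno(int ret)
{
	ret &= ~NOTIFY_STOP_MASK;
	return ret > NOTIFY_OK ? NOTIFY_OK - ret : 0;
}

/* A chain walker stops as soon as a callback returns a stop-masked value. */
int sketch_chain_would_stop(int ret)
{
	return (ret & NOTIFY_STOP_MASK) != 0;
}
```

Under these definitions, sketch_notifier_from_errno(-ENODEV) carries the stop
mask, so the pre-patch return value ended the walk, while a plain NOTIFY_OK
never does.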

Signed-off-by: Don Zickus <dzickus@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Reviewed-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Don Zickus and committed by Linus Torvalds
f99a9933 fef2c9bc

+16 -6
kernel/watchdog.c
···
 static int watchdog_enable(int cpu)
 {
 	struct task_struct *p = per_cpu(softlockup_watchdog, cpu);
-	int err;
+	int err = 0;
 
 	/* enable the perf event */
 	err = watchdog_nmi_enable(cpu);
-	if (err)
-		return err;
+
+	/* Regardless of err above, fall through and start softlockup */
 
 	/* create the watchdog thread */
 	if (!p) {
 		p = kthread_create(watchdog, (void *)(unsigned long)cpu, "watchdog/%d", cpu);
 		if (IS_ERR(p)) {
 			printk(KERN_ERR "softlockup watchdog for %i failed\n", cpu);
-			return PTR_ERR(p);
+			if (!err)
+				/* if hardlockup hasn't already set this */
+				err = PTR_ERR(p);
+			goto out;
 		}
 		kthread_bind(p, cpu);
 		per_cpu(watchdog_touch_ts, cpu) = 0;
···
 		wake_up_process(p);
 	}
 
-	return 0;
+out:
+	return err;
 }
 
 static void watchdog_disable(int cpu)
···
 		break;
 #endif /* CONFIG_HOTPLUG_CPU */
 	}
-	return notifier_from_errno(err);
+
+	/*
+	 * hardlockup and softlockup are not important enough
+	 * to block cpu bring up.  Just always succeed and
+	 * rely on printk output to flag problems.
+	 */
+	return NOTIFY_OK;
 }
 
 static struct notifier_block __cpuinitdata cpu_nfb = {