softlockup: Add sched_clock_tick() to avoid kernel warning on kgdb resume

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, sched_clock() gets
the time from hardware such as the TSC on x86. In this
configuration kgdb will report a softlockup warning message on
resuming or detaching from a debug session.

Sequence of events in the problem case:

1) "cpu sched clock" and "hardware time" are at 100 sec prior
to a call to kgdb_handle_exception()

2) The debugger waits in kgdb_handle_exception() for 80 sec, and on
exit it calls ... touch_softlockup_watchdog() -->
__raw_get_cpu_var(touch_timestamp) = 0;

3) "cpu sched clock" = 100s (it was not updated, because the
interrupt was disabled in kgdb) but the "hardware time" = 180 sec

4) The first timer interrupt after resuming from
kgdb_handle_exception() updates the watchdog from the "cpu sched
clock":

update_process_times() {
  ...
  run_local_timers() --> softlockup_tick()
    --> check (touch_timestamp == 0) (it is "YES" here; we set
        "touch_timestamp = 0" in kgdb)
    --> __touch_softlockup_watchdog()
    ***(A)--> reset "touch_timestamp" to "get_timestamp()"
        (here, "touch_timestamp" will still be set to 100s)
  ...
  scheduler_tick()
  ***(B)--> sched_clock_tick()
      (update "cpu sched clock" to "hardware time" = 180s)
  ...
}

5) The second timer interrupt then sees a large timestamp jump and
trips the softlockup warning:

update_process_times() {
  ...
  run_local_timers() --> softlockup_tick()
    --> "cpu sched clock" - "touch_timestamp" = 180s - 100s > 60s
    --> printk "soft lockup error messages"
  ...
}

note: at ***(A), "touch_timestamp" is reset to
"get_timestamp(this_cpu)"

Why is "touch_timestamp" 100 sec, instead of 180 sec?

When CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is set, the call trace of
get_timestamp() is:

get_timestamp(this_cpu)
  --> cpu_clock(this_cpu)
    --> sched_clock_cpu(this_cpu)
      --> __update_sched_clock(sched_clock_data, now)

The __update_sched_clock() function uses the GTOD value of the last
tick to create a window that bounds the "now" value. So if "now" has
run too far ahead of the window recorded in sched_clock_data, it is
clamped back and effectively ignored.
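
Concretely, the clamping looks roughly like this (condensed from
__update_sched_clock() in kernel/sched_clock.c of this era; scd is
the per-CPU sched_clock_data, whose tick_raw/tick_gtod fields were
last refreshed by sched_clock_tick() before kgdb stopped the CPU):

	static u64 __update_sched_clock(struct sched_clock_data *scd, u64 now)
	{
		s64 delta = now - scd->tick_raw;   /* raw delta since last tick */
		u64 clock, min_clock, max_clock;

		if (unlikely(delta < 0))
			delta = 0;

		/*
		 * Clamp the result to at most one tick beyond the GTOD
		 * value of the last seen tick.  With ticks off in kgdb,
		 * tick_gtod is still ~100s, so a "now" of 180s is
		 * clamped back down to ~100s.
		 */
		clock = scd->tick_gtod + delta;
		min_clock = wrap_max(scd->tick_gtod, scd->clock);
		max_clock = scd->tick_gtod + TICK_NSEC;

		clock = wrap_max(clock, min_clock);
		clock = wrap_min(clock, max_clock);

		scd->clock = clock;
		return scd->clock;
	}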

The fix is to invoke sched_clock_tick() to update the "cpu sched
clock" in order to recover from this state. This is done by
introducing the function touch_softlockup_watchdog_sync(), which
allows kgdb to request that the sched clock be updated when the
softlockup watchdog runs for the first time after a resume from
kgdb.
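
Putting the pieces of the patch below together, the intended flow
after this change is roughly (a sketch, not verbatim kernel code):

	/* kgdb resume path (interrupts still off, sched clock stale): */
	touch_softlockup_watchdog_sync();	/* softlock_touch_sync = true,
						   softlockup_touch_ts = 0 */

	/* first softlockup_tick() after resume: */
	if (touch_ts == 0) {
		if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
			per_cpu(softlock_touch_sync, this_cpu) = false;
			sched_clock_tick();	/* resync "cpu sched clock"
						   with "hardware time" */
		}
		__touch_softlockup_watchdog();	/* now records 180s, not 100s */
		return;
	}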

[yong.zhang0@gmail.com: Use per cpu instead of an array]
Signed-off-by: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Dongdong Deng <Dongdong.Deng@windriver.com>
Cc: kgdb-bugreport@lists.sourceforge.net
Cc: peterz@infradead.org
LKML-Reference: <1264631124-4837-2-git-send-email-jason.wessel@windriver.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

authored by Jason Wessel and committed by Ingo Molnar (commit
d6ad3e28, parent 48d50674; +22 -3 across 3 files)
include/linux/sched.h (+4):

@@ -310,6 +310,7 @@
 #ifdef CONFIG_DETECT_SOFTLOCKUP
 extern void softlockup_tick(void);
 extern void touch_softlockup_watchdog(void);
+extern void touch_softlockup_watchdog_sync(void);
 extern void touch_all_softlockup_watchdogs(void);
 extern int proc_dosoftlockup_thresh(struct ctl_table *table, int write,
 				    void __user *buffer,
@@ -322,6 +323,9 @@
 {
 }
 static inline void touch_softlockup_watchdog(void)
+{
+}
+static inline void touch_softlockup_watchdog_sync(void)
 {
 }
 static inline void touch_all_softlockup_watchdogs(void)
kernel/kgdb.c (+3 -3):

@@ -596,7 +596,7 @@
 
 	/* Signal the primary CPU that we are done: */
 	atomic_set(&cpu_in_kgdb[cpu], 0);
-	touch_softlockup_watchdog();
+	touch_softlockup_watchdog_sync();
 	clocksource_touch_watchdog();
 	local_irq_restore(flags);
 }
@@ -1450,7 +1450,7 @@
 	    (kgdb_info[cpu].task &&
 	     kgdb_info[cpu].task->pid != kgdb_sstep_pid) && --sstep_tries) {
 		atomic_set(&kgdb_active, -1);
-		touch_softlockup_watchdog();
+		touch_softlockup_watchdog_sync();
 		clocksource_touch_watchdog();
 		local_irq_restore(flags);
 
@@ -1550,7 +1550,7 @@
 	}
 	/* Free kgdb_active */
 	atomic_set(&kgdb_active, -1);
-	touch_softlockup_watchdog();
+	touch_softlockup_watchdog_sync();
 	clocksource_touch_watchdog();
 	local_irq_restore(flags);
 
kernel/softlockup.c (+15):

@@ -25,6 +25,7 @@
 static DEFINE_PER_CPU(unsigned long, softlockup_touch_ts); /* touch timestamp */
 static DEFINE_PER_CPU(unsigned long, softlockup_print_ts); /* print timestamp */
 static DEFINE_PER_CPU(struct task_struct *, softlockup_watchdog);
+static DEFINE_PER_CPU(bool, softlock_touch_sync);
 
 static int __read_mostly did_panic;
 int __read_mostly softlockup_thresh = 60;
@@ -79,6 +80,12 @@
 }
 EXPORT_SYMBOL(touch_softlockup_watchdog);
 
+void touch_softlockup_watchdog_sync(void)
+{
+	__raw_get_cpu_var(softlock_touch_sync) = true;
+	__raw_get_cpu_var(softlockup_touch_ts) = 0;
+}
+
 void touch_all_softlockup_watchdogs(void)
 {
 	int cpu;
@@ -118,6 +125,14 @@
 	}
 
 	if (touch_ts == 0) {
+		if (unlikely(per_cpu(softlock_touch_sync, this_cpu))) {
+			/*
+			 * If the time stamp was touched atomically
+			 * make sure the scheduler tick is up to date.
+			 */
+			per_cpu(softlock_touch_sync, this_cpu) = false;
+			sched_clock_tick();
+		}
 		__touch_softlockup_watchdog();
 		return;
 	}