clocksource: Resolve cpu hotplug deadlock with TSC unstable

Martin Schwidefsky analyzed it:
To register a clocksource, the clocksource_mutex is acquired and, if
necessary, timekeeping_notify is called to install the clocksource as
the timekeeper clock. timekeeping_notify uses stop_machine, which needs
to take the cpu_add_remove_lock mutex.

Starting a new cpu is done with the cpu_add_remove_lock mutex held.
native_cpu_up checks the tsc of the new cpu and, if the tsc is no good,
calls clocksource_change_rating, which in turn needs the
clocksource_mutex. The deadlock is complete.
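
As an illustration of the inversion, here is a minimal, self-contained
userspace sketch of the same ABBA pattern using pthreads. The lock and
function names mirror the ones in the analysis but are stand-ins, not
kernel code, and the call chains are collapsed into two threads:

	#include <pthread.h>
	#include <stdio.h>
	#include <unistd.h>

	/* Stand-ins for the two kernel mutexes. */
	static pthread_mutex_t clocksource_mutex = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t cpu_add_remove_lock = PTHREAD_MUTEX_INITIALIZER;

	/* Registration path: clocksource_mutex, then stop_machine,
	 * which needs cpu_add_remove_lock. */
	static void *register_clocksource(void *unused)
	{
		pthread_mutex_lock(&clocksource_mutex);
		sleep(1);			/* widen the race window */
		pthread_mutex_lock(&cpu_add_remove_lock);
		puts("clocksource installed");
		pthread_mutex_unlock(&cpu_add_remove_lock);
		pthread_mutex_unlock(&clocksource_mutex);
		return NULL;
	}

	/* Hotplug path: cpu_add_remove_lock, then a failed TSC check
	 * leads to a rating change, which needs clocksource_mutex. */
	static void *cpu_up_path(void *unused)
	{
		pthread_mutex_lock(&cpu_add_remove_lock);
		sleep(1);			/* tsc of the new cpu is no good */
		pthread_mutex_lock(&clocksource_mutex);
		puts("tsc rating changed");
		pthread_mutex_unlock(&clocksource_mutex);
		pthread_mutex_unlock(&cpu_add_remove_lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t t1, t2;

		pthread_create(&t1, NULL, register_clocksource, NULL);
		pthread_create(&t2, NULL, cpu_up_path, NULL);
		pthread_join(t1, NULL);		/* never returns: ABBA deadlock */
		pthread_join(t2, NULL);
		return 0;
	}

Each thread blocks on the lock the other one holds, and neither can
make progress, which is exactly the situation described above.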

The solution is to replace the TSC via the clocksource watchdog
mechanism: mark the TSC as unstable and schedule the watchdog work, so
that the TSC gets removed in watchdog thread context.
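
The same fix can be sketched in userspace terms: the path that must not
take clocksource_mutex only flags the clocksource under the leaf
watchdog_lock and wakes a worker, and the worker performs the rating
change later without the hotplug lock held. This is an illustrative
pthreads analogue only; the kernel uses schedule_work and the watchdog
kthread rather than a condition variable:

	#include <pthread.h>
	#include <stdbool.h>
	#include <stdio.h>

	static pthread_mutex_t clocksource_mutex = PTHREAD_MUTEX_INITIALIZER;
	static pthread_mutex_t watchdog_lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t watchdog_kick = PTHREAD_COND_INITIALIZER;
	static bool tsc_marked_unstable;

	/* Called from the hotplug path: takes only the leaf
	 * watchdog_lock, flags the clocksource and kicks the worker
	 * (standing in for schedule_work()). */
	static void mark_unstable(void)
	{
		pthread_mutex_lock(&watchdog_lock);
		tsc_marked_unstable = true;
		pthread_cond_signal(&watchdog_kick);
		pthread_mutex_unlock(&watchdog_lock);
	}

	/* Worker (standing in for the watchdog thread): runs without
	 * the hotplug lock held, so taking clocksource_mutex here
	 * cannot complete the cycle. */
	static void *watchdog_worker(void *unused)
	{
		pthread_mutex_lock(&watchdog_lock);
		while (!tsc_marked_unstable)
			pthread_cond_wait(&watchdog_kick, &watchdog_lock);
		pthread_mutex_unlock(&watchdog_lock);

		pthread_mutex_lock(&clocksource_mutex);
		puts("rating changed in watchdog context");
		pthread_mutex_unlock(&clocksource_mutex);
		return NULL;
	}

	int main(void)
	{
		pthread_t wd;

		pthread_create(&wd, NULL, watchdog_worker, NULL);
		mark_unstable();	/* safe even with hotplug lock held */
		pthread_join(wd, NULL);
		return 0;
	}

Deferring the second lock acquisition to a context that holds neither
lock breaks the cycle without changing the lock ordering anywhere else.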

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <new-submission>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: John Stultz <johnstul@us.ibm.com>

---
 arch/x86/kernel/tsc.c       |  8 +++++---
 include/linux/clocksource.h |  1 +
 kernel/time/clocksource.c   | 33 ++++++++++++++++++++++++++++++---
 3 files changed, 36 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -767,12 +767,14 @@
 {
 	if (!tsc_unstable) {
 		tsc_unstable = 1;
-		printk("Marking TSC unstable due to %s\n", reason);
+		printk(KERN_INFO "Marking TSC unstable due to %s\n", reason);
 		/* Change only the rating, when not registered */
 		if (clocksource_tsc.mult)
-			clocksource_change_rating(&clocksource_tsc, 0);
-		else
+			clocksource_mark_unstable(&clocksource_tsc);
+		else {
+			clocksource_tsc.flags |= CLOCK_SOURCE_UNSTABLE;
 			clocksource_tsc.rating = 0;
+		}
 	}
 }
 
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -277,6 +277,7 @@
 extern void clocksource_change_rating(struct clocksource *cs, int rating);
 extern void clocksource_resume(void);
 extern struct clocksource * __init __weak clocksource_default_clock(void);
+extern void clocksource_mark_unstable(struct clocksource *cs);
 
 #ifdef CONFIG_GENERIC_TIME_VSYSCALL
 extern void update_vsyscall(struct timespec *ts, struct clocksource *c);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -149,13 +149,40 @@
 	kthread_run(clocksource_watchdog_kthread, NULL, "kwatchdog");
 }
 
+static void __clocksource_unstable(struct clocksource *cs)
+{
+	cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
+	cs->flags |= CLOCK_SOURCE_UNSTABLE;
+	schedule_work(&watchdog_work);
+}
+
 static void clocksource_unstable(struct clocksource *cs, int64_t delta)
 {
 	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
 	       cs->name, delta);
-	cs->flags &= ~(CLOCK_SOURCE_VALID_FOR_HRES | CLOCK_SOURCE_WATCHDOG);
-	cs->flags |= CLOCK_SOURCE_UNSTABLE;
-	schedule_work(&watchdog_work);
+	__clocksource_unstable(cs);
+}
+
+/**
+ * clocksource_mark_unstable - mark clocksource unstable via watchdog
+ * @cs:		clocksource to be marked unstable
+ *
+ * This function is called instead of clocksource_change_rating from
+ * cpu hotplug code to avoid a deadlock between the clocksource mutex
+ * and the cpu hotplug mutex. It defers the update of the clocksource
+ * to the watchdog thread.
+ */
+void clocksource_mark_unstable(struct clocksource *cs)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&watchdog_lock, flags);
+	if (!(cs->flags & CLOCK_SOURCE_UNSTABLE)) {
+		if (list_empty(&cs->wd_list))
+			list_add(&cs->wd_list, &watchdog_list);
+		__clocksource_unstable(cs);
+	}
+	spin_unlock_irqrestore(&watchdog_lock, flags);
 }
 
 static void clocksource_watchdog(unsigned long data)