hrtimers: Handle CPU state correctly on hotplug

Consider a scenario where a CPU transitions from CPUHP_ONLINE to halfway
through a CPU hotunplug down to CPUHP_HRTIMERS_PREPARE, and then back to
CPUHP_ONLINE:

Since hrtimers_prepare_cpu() does not run, cpu_base.hres_active remains set
to 1 throughout. However, during a CPU unplug operation, the tick and the
clockevents are shut down at CPUHP_AP_TICK_DYING. On return to the online
state, CFS, for instance, incorrectly assumes that the hrtick is already
active, and the chance for the clockevent device to switch to oneshot mode
is also lost forever for that CPU, unless it is taken down below
CPUHP_HRTIMERS_PREPARE once more.
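
For context, the gate that gets stuck is the hres_active check on the tick
path. A simplified excerpt (paraphrased from kernel/time/hrtimer.c, most of
the body trimmed) of the check which, with a stale hres_active == 1, keeps
the CPU from ever retrying the switch to high resolution/oneshot mode:

    void hrtimer_run_queues(void)
    {
            struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);

            /* A stale hres_active == 1 makes this bail out immediately ... */
            if (__hrtimer_hres_active(cpu_base))
                    return;

            /* ... so the switch to highres/oneshot mode is never retried. */
            if (tick_check_oneshot_change(!hrtimer_is_hres_enabled())) {
                    hrtimer_switch_to_hres();
                    return;
            }
            /* remainder trimmed */
    }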

This round-trip reveals another issue: cpu_base.online is not set back to 1
after the transition, which shows up as a WARN_ON_ONCE() in enqueue_hrtimer().
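
That warning is the sanity check added by the commit referenced in the Fixes:
tag below; roughly, as the relevant lines read in kernel/time/hrtimer.c
(excerpt, body trimmed):

    static int enqueue_hrtimer(struct hrtimer *timer,
                               struct hrtimer_clock_base *base,
                               enum hrtimer_mode mode)
    {
            debug_activate(timer, mode);
            /* Fires when a timer is queued on a base not marked online */
            WARN_ON_ONCE(!base->cpu_base->online);

            base->cpu_base->active_bases |= 1 << base->index;
            /* remainder trimmed */
    }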

Aside from that, the bulk of the per-CPU state is not reset either, which
means there are dangling pointers in the worst case.

Address this by adding a corresponding startup() callback, which resets the
stale per CPU state and sets the online flag.

[ tglx: Make the new callback unconditionally available, remove the online
modification in the prepare() callback and clear the remaining
state in the starting callback instead of the prepare callback ]

Fixes: 5c0930ccaad5 ("hrtimers: Push pending hrtimers away from outgoing CPU earlier")
Signed-off-by: Koichiro Den <koichiro.den@canonical.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20241220134421.3809834-1-koichiro.den@canonical.com

Authored by Koichiro Den, committed by Thomas Gleixner (2f8dea16, 922efd29)

Changed files (+12 -2):

include/linux/hrtimer.h (+1)

···
 extern void sysrq_timer_list_show(void);

 int hrtimers_prepare_cpu(unsigned int cpu);
+int hrtimers_cpu_starting(unsigned int cpu);
 #ifdef CONFIG_HOTPLUG_CPU
 int hrtimers_cpu_dying(unsigned int cpu);
 #else

kernel/cpu.c (+1 -1)

···
 	},
 	[CPUHP_AP_HRTIMERS_DYING] = {
 		.name = "hrtimers:dying",
-		.startup.single = NULL,
+		.startup.single = hrtimers_cpu_starting,
 		.teardown.single = hrtimers_cpu_dying,
 	},
 	[CPUHP_AP_TICK_DYING] = {
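
The one-line change above is sufficient because the hotplug core invokes each
state's startup callback while bringing a CPU up and each state's teardown
callback while taking it down. A highly simplified model of that walk
(illustration only; the real kernel/cpu.c also handles multi-instance states,
rollback and error propagation):

    /* Illustration only: a stripped-down model of the hotplug state walk. */
    struct cpuhp_step_model {
            const char *name;
            int (*startup)(unsigned int cpu);   /* runs while bringing the CPU up */
            int (*teardown)(unsigned int cpu);  /* runs while taking the CPU down */
    };

    /*
     * Walking up from the current state to the target invokes every non-NULL
     * startup callback along the way. Before this patch the
     * CPUHP_AP_HRTIMERS_DYING slot had startup == NULL, so nothing undid
     * hrtimers_cpu_dying() when the CPU came back online.
     */
    static int model_cpu_up(struct cpuhp_step_model *steps, int from, int to,
                            unsigned int cpu)
    {
            int st, ret;

            for (st = from + 1; st <= to; st++) {
                    if (!steps[st].startup)
                            continue;
                    ret = steps[st].startup(cpu);
                    if (ret)
                            return ret;
            }
            return 0;
    }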

kernel/time/hrtimer.c (+10 -1)

···
 	}

 	cpu_base->cpu = cpu;
+	hrtimer_cpu_base_init_expiry_lock(cpu_base);
+	return 0;
+}
+
+int hrtimers_cpu_starting(unsigned int cpu)
+{
+	struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);
+
+	/* Clear out any left over state from a CPU down operation */
 	cpu_base->active_bases = 0;
 	cpu_base->hres_active = 0;
 	cpu_base->hang_detected = 0;
···
 	cpu_base->expires_next = KTIME_MAX;
 	cpu_base->softirq_expires_next = KTIME_MAX;
 	cpu_base->online = 1;
-	hrtimer_cpu_base_init_expiry_lock(cpu_base);
 	return 0;
 }
···
 void __init hrtimers_init(void)
 {
 	hrtimers_prepare_cpu(smp_processor_id());
+	hrtimers_cpu_starting(smp_processor_id());
 	open_softirq(HRTIMER_SOFTIRQ, hrtimer_run_softirq);
 }
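
Putting the kernel/time/hrtimer.c hunks together, the new callback reads
roughly as follows after the patch (the context lines hidden behind the '···'
gap above are represented by the elision comment, not reproduced here):

    int hrtimers_cpu_starting(unsigned int cpu)
    {
            struct hrtimer_cpu_base *cpu_base = this_cpu_ptr(&hrtimer_bases);

            /* Clear out any left over state from a CPU down operation */
            cpu_base->active_bases = 0;
            cpu_base->hres_active = 0;
            cpu_base->hang_detected = 0;
            /* ... further resets elided in the hunk above ... */
            cpu_base->expires_next = KTIME_MAX;
            cpu_base->softirq_expires_next = KTIME_MAX;
            cpu_base->online = 1;
            return 0;
    }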