Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cpu/hotplug: Fix rollback during error-out in __cpu_disable()

The recent introduction of the hotplug thread, which invokes the callbacks on
the plugged cpu, caused the following regression:

If takedown_cpu() fails, then we run into several issues:

1) The rollback of the target cpu states is not invoked. That leaves the smp
threads and the hotplug thread in a disabled state.

2) notify_online() is executed due to a missing skip_onerr flag. As a result,
both the CPU_DOWN_FAILED and CPU_ONLINE notifications are invoked, which
confuses quite a few notifiers.

3) The CPU_DOWN_FAILED notification is not invoked on the target CPU. That's
not an issue per se, but it is inconsistent and consequently blocks the
patches which rely on these states being invoked on the target CPU rather
than on the controlling CPU. It also does not preserve the strict call order
on rollback, which is problematic for the ongoing state machine conversion
as well.

To fix this, we add a rollback flag to the remote callback machinery and
invoke the rollback, including the CPU_DOWN_FAILED notification, on the
remote cpu. Furthermore, we mark the notify online state with 'skip_onerr'
so we don't get a double invocation.

This workaround will go away once we have moved the unplug invocation to
the target cpu itself.

[ tglx: Massaged changelog and moved the CPU_DOWN_FAILED notification to the
target cpu ]

Fixes: 4cb28ced23c4 ("cpu/hotplug: Create hotplug threads")
Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: linux-s390@vger.kernel.org
Cc: rt@linutronix.de
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
Link: http://lkml.kernel.org/r/20160408124015.GA21960@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

Authored by Sebastian Andrzej Siewior and committed by Thomas Gleixner
3b9d6da6 c3b46c73

+26 -7
kernel/cpu.c
@@ -36,6 +36,7 @@
  * @target: The target state
  * @thread: Pointer to the hotplug thread
  * @should_run: Thread should execute
+ * @rollback: Perform a rollback
  * @cb_stat: The state for a single callback (install/uninstall)
  * @cb: Single callback function (install/uninstall)
  * @result: Result of the operation
@@ -47,6 +48,7 @@
 #ifdef CONFIG_SMP
 	struct task_struct *thread;
 	bool should_run;
+	bool rollback;
 	enum cpuhp_state cb_state;
 	int (*cb)(unsigned int cpu);
 	int result;
@@ -301,6 +303,11 @@
 	return __cpu_notify(val, cpu, -1, NULL);
 }
 
+static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
+{
+	BUG_ON(cpu_notify(val, cpu));
+}
+
 /* Notifier wrappers for transitioning to state machine */
 static int notify_prepare(unsigned int cpu)
 {
@@ -477,6 +484,16 @@
 		} else {
 			ret = cpuhp_invoke_callback(cpu, st->cb_state, st->cb);
 		}
+	} else if (st->rollback) {
+		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
+
+		undo_cpu_down(cpu, st, cpuhp_ap_states);
+		/*
+		 * This is a momentary workaround to keep the notifier users
+		 * happy. Will go away once we got rid of the notifiers.
+		 */
+		cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
+		st->rollback = false;
 	} else {
 		/* Cannot happen .... */
 		BUG_ON(st->state < CPUHP_AP_ONLINE_IDLE);
@@ -636,11 +653,6 @@
 	read_unlock(&tasklist_lock);
 }
 
-static void cpu_notify_nofail(unsigned long val, unsigned int cpu)
-{
-	BUG_ON(cpu_notify(val, cpu));
-}
-
 static int notify_down_prepare(unsigned int cpu)
 {
 	int err, nr_calls = 0;
@@ -721,9 +733,10 @@
 	 */
 	err = stop_machine(take_cpu_down, NULL, cpumask_of(cpu));
 	if (err) {
-		/* CPU didn't die: tell everyone. Can't complain. */
-		cpu_notify_nofail(CPU_DOWN_FAILED, cpu);
+		/* CPU refused to die */
 		irq_unlock_sparse();
+		/* Unpark the hotplug thread so we can rollback there */
+		kthread_unpark(per_cpu_ptr(&cpuhp_state, cpu)->thread);
 		return err;
 	}
 	BUG_ON(cpu_online(cpu));
@@ -832,6 +845,11 @@
 	 * to do the further cleanups.
 	 */
 	ret = cpuhp_down_callbacks(cpu, st, cpuhp_bp_states, target);
+	if (ret && st->state > CPUHP_TEARDOWN_CPU && st->state < prev_state) {
+		st->target = prev_state;
+		st->rollback = true;
+		cpuhp_kick_ap_work(cpu);
+	}
 
 	hasdied = prev_state != st->state && st->state == CPUHP_OFFLINE;
 out:
@@ -1249,6 +1267,7 @@
 		.name = "notify:online",
 		.startup = notify_online,
 		.teardown = notify_down_prepare,
+		.skip_onerr = true,
 	},
 #endif
 	/*