x86/smp: Put CPUs into INIT on shutdown if possible

Parking CPUs in a HLT loop is not completely safe vs. kexec(), as HLT can
resume execution due to NMI, SMI or MCE, which is the same issue as with
the MWAIT loop.

Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.

A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, page tables or data will end in
disaster too.

So choose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de

+53 -7
+2
arch/x86/include/asm/smp.h
···
 void native_send_call_func_single_ipi(int cpu);
 void x86_idle_thread_init(unsigned int cpu, struct task_struct *idle);
 
+bool smp_park_other_cpus_in_init(void);
+
 void smp_store_boot_cpu_info(void);
 void smp_store_cpu_info(int id);
+32 -7
arch/x86/kernel/smp.c
···
 }
 
 /*
- * this function calls the 'stop' function on all other CPUs in the system.
+ * Disable virtualization, APIC etc. and park the CPU in a HLT loop
  */
 DEFINE_IDTENTRY_SYSVEC(sysvec_reboot)
 {
···
	 * 2) Wait for all other CPUs to report that they reached the
	 *    HLT loop in stop_this_cpu()
	 *
-	 * 3) If #2 timed out send an NMI to the CPUs which did not
-	 *    yet report
+	 * 3) If the system uses INIT/STARTUP for CPU bringup, then
+	 *    send all present CPUs an INIT vector, which brings them
+	 *    completely out of the way.
	 *
-	 * 4) Wait for all other CPUs to report that they reached the
+	 * 4) If #3 is not possible and #2 timed out send an NMI to the
+	 *    CPUs which did not yet report
+	 *
+	 * 5) Wait for all other CPUs to report that they reached the
	 *    HLT loop in stop_this_cpu()
	 *
-	 * #3 can obviously race against a CPU reaching the HLT loop late.
+	 * #4 can obviously race against a CPU reaching the HLT loop late.
	 * That CPU will have reported already and the "have all CPUs
	 * reached HLT" condition will be true despite the fact that the
	 * other CPU is still handling the NMI. Again, there is no
···
	/*
	 * Don't wait longer than a second for IPI completion. The
	 * wait request is not checked here because that would
-	 * prevent an NMI shutdown attempt in case that not all
+	 * prevent an NMI/INIT shutdown in case that not all
	 * CPUs reach shutdown state.
	 */
	timeout = USEC_PER_SEC;
···
		udelay(1);
	}
 
-	/* if the REBOOT_VECTOR didn't work, try with the NMI */
+	/*
+	 * Park all other CPUs in INIT including "offline" CPUs, if
+	 * possible. That's a safe place where they can't resume execution
+	 * of HLT and then execute the HLT loop from overwritten text or
+	 * page tables.
+	 *
+	 * The only downside is a broadcast MCE, but up to the point where
+	 * the kexec() kernel brought all APs online again an MCE will just
+	 * make HLT resume and handle the MCE. The machine crashes and burns
+	 * due to overwritten text, page tables and data. So there is a
+	 * choice between fire and frying pan. The result is pretty much
+	 * the same. Choose frying pan until x86 provides a sane mechanism
+	 * to park a CPU.
+	 */
+	if (smp_park_other_cpus_in_init())
+		goto done;
+
+	/*
+	 * If park with INIT was not possible and the REBOOT_VECTOR didn't
+	 * take all secondary CPUs offline, try with the NMI.
+	 */
	if (!cpumask_empty(&cpus_stop_mask)) {
		/*
		 * If NMI IPI is enabled, try to register the stop handler
···
		udelay(1);
	}
 
+done:
	local_irq_save(flags);
	disable_local_APIC();
	mcheck_cpu_clear(this_cpu_ptr(&cpu_info));
+19
arch/x86/kernel/smpboot.c
···
	cache_aps_init();
 }
 
+bool smp_park_other_cpus_in_init(void)
+{
+	unsigned int cpu, this_cpu = smp_processor_id();
+	unsigned int apicid;
+
+	if (apic->wakeup_secondary_cpu_64 || apic->wakeup_secondary_cpu)
+		return false;
+
+	for_each_present_cpu(cpu) {
+		if (cpu == this_cpu)
+			continue;
+		apicid = apic->cpu_present_to_apicid(cpu);
+		if (apicid == BAD_APICID)
+			continue;
+		send_init_sequence(apicid);
+	}
+	return true;
+}
+
 /*
  * Early setup to make printk work.
  */