x86: fix resume (S2R) broken by Intel microcode module, on A110L

Impact: fix deadlock

This is in response to the following bug report:

Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=12100
Subject : resume (S2R) broken by Intel microcode module, on A110L
Submitter : Andreas Mohr <andi@lisas.de>
Date : 2008-11-25 08:48 (19 days old)
Handled-By : Dmitry Adamushko <dmitry.adamushko@gmail.com>

[ The deadlock scenario has been discovered by Andreas Mohr ]

I think I may have a logical explanation for why the system in

(http://bugzilla.kernel.org/show_bug.cgi?id=12100)

might hang upon resume, although, if the explanation is right, it should
likely have hung each and every time.

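(1) a possible deadlock in microcode_resume_cpu() if either of its 'if'
branches is taken: both call microcode_fini_cpu(), which takes
microcode_mutex, while microcode_resume_cpu() already runs with that mutex
held by its caller. Kernel mutexes are not recursive, so the task blocks on
itself;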

(2) I can't find this in the spec and can't verify it experimentally (newer
microcode doesn't seem to be available for my Core 2 Duo), but logically I'd
expect that, when read upon resume, the microcode revision (MSR 0x8B) is back
to its original value: the microcode has to be reloaded anyway, so it would
not make sense for the CPU to keep reporting the updated revision. If so,
comparing the full 'struct cpu_signature' with memcmp() is wrong, because it
also compares the 'rev' field, and that is how one of the aforementioned 'if'
branches might have been triggered, leading to the deadlock.
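To see why the memcmp() check in point (2) misfires, here is a small C sketch that mirrors the three fields of struct cpu_signature used in the diff (sig, pf, rev). The helper names sig_mismatch_memcmp() and sig_mismatch_fields() are hypothetical. If, as hypothesized above, the revision read after resume differs from the cached one, the whole-struct memcmp() reports a mismatch, while the field-wise check the patch switches to does not.

```c
#include <string.h>

/* Mirrors the fields of struct cpu_signature that the diff uses. Three
 * unsigned ints, so there is no padding for memcmp() to trip over. */
struct cpu_signature {
	unsigned int sig;	/* CPUID signature (family/model/stepping) */
	unsigned int pf;	/* platform flags */
	unsigned int rev;	/* microcode revision read from MSR 0x8B */
};

/* Old check: any difference, including rev, counts as "not this CPU". */
int sig_mismatch_memcmp(const struct cpu_signature *a,
			const struct cpu_signature *b)
{
	return memcmp(a, b, sizeof(*a)) != 0;
}

/* New check, as in the patch: only sig and pf identify the CPU. */
int sig_mismatch_fields(const struct cpu_signature *a,
			const struct cpu_signature *b)
{
	return (a->sig != b->sig) || (a->pf != b->pf);
}
```

For example, with a hypothetical cached signature { 0x6fd, 0x20, 0xa4 } and a resumed one that differs only in rev, sig_mismatch_memcmp() returns 1 while sig_mismatch_fields() returns 0.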

In my tests I simulated loading/resuming with microcode of the same version
(just to check that the file is loaded and reloaded upon resume), so the
revision read back always matched the cached one and this issue never showed
up.

I'd appreciate it if someone with an appropriate system could try the 2nd
patch (titled "fix a comparison && deadlock...").

In any case, the deadlock fix is a must-have.
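Deadlock (1) and the locked/unlocked helper split the patch introduces can be reproduced in user space. This is a minimal sketch assuming POSIX threads; demo_mutex, demo_valid, demo_fini_cpu() and demo_fini_cpu_nolock() are hypothetical stand-ins for microcode_mutex, uci->valid, microcode_fini_cpu() and __microcode_fini_cpu(). It uses an error-checking mutex so that a relock by the owner returns EDEADLK instead of blocking, which is what a kernel mutex would do.

```c
#define _XOPEN_SOURCE 700 /* for PTHREAD_MUTEX_ERRORCHECK */
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-ins for microcode_mutex and uci->valid. */
static pthread_mutex_t demo_mutex;
static int demo_mutex_ready;
static int demo_valid;

/* Set up an error-checking mutex (once): a relock by the owning thread
 * returns EDEADLK instead of blocking forever, so the bug is observable. */
void demo_init(void)
{
	if (!demo_mutex_ready) {
		pthread_mutexattr_t attr;

		pthread_mutexattr_init(&attr);
		pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
		pthread_mutex_init(&demo_mutex, &attr);
		pthread_mutexattr_destroy(&attr);
		demo_mutex_ready = 1;
	}
	demo_valid = 1;
}

/* Caller must hold demo_mutex (the role of __microcode_fini_cpu()). */
static void demo_fini_cpu_nolock(void)
{
	demo_valid = 0;
}

/* Takes demo_mutex itself (the role of microcode_fini_cpu()).
 * Returns 0 on success or the error from pthread_mutex_lock(). */
static int demo_fini_cpu(void)
{
	int err = pthread_mutex_lock(&demo_mutex);

	if (err)
		return err;
	demo_fini_cpu_nolock();
	pthread_mutex_unlock(&demo_mutex);
	return 0;
}

/* Old resume error path: entered with demo_mutex held, then calls the
 * locking helper, which tries to take the same mutex again. */
int demo_resume_error_old(void)
{
	pthread_mutex_lock(&demo_mutex);
	int err = demo_fini_cpu();	/* EDEADLK here; a kernel mutex would hang */
	pthread_mutex_unlock(&demo_mutex);
	return err;
}

/* Fixed resume error path: the lock is already held, so call the
 * lockless helper directly. */
int demo_resume_error_fixed(void)
{
	pthread_mutex_lock(&demo_mutex);
	demo_fini_cpu_nolock();
	pthread_mutex_unlock(&demo_mutex);
	return 0;
}
```

With PTHREAD_MUTEX_NORMAL instead of PTHREAD_MUTEX_ERRORCHECK, POSIX specifies that the relock in demo_resume_error_old() deadlocks, which corresponds to the hang on resume.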

Reported-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Tested-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Cc: <stable@kernel.org>

authored by Dmitry Adamushko and committed by Ingo Molnar 280a9ca5 c9bc03ac

Changed files (+20 -5):

arch/x86/kernel/microcode_core.c (+14 -5)
--- a/arch/x86/kernel/microcode_core.c
+++ b/arch/x86/kernel/microcode_core.c
@@ -272,13 +272,18 @@
 	.name = "microcode",
 };
 
-static void microcode_fini_cpu(int cpu)
+static void __microcode_fini_cpu(int cpu)
 {
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
 
-	mutex_lock(&microcode_mutex);
 	microcode_ops->microcode_fini_cpu(cpu);
 	uci->valid = 0;
+}
+
+static void microcode_fini_cpu(int cpu)
+{
+	mutex_lock(&microcode_mutex);
+	__microcode_fini_cpu(cpu);
 	mutex_unlock(&microcode_mutex);
 }
 
@@ -306,12 +311,16 @@
 	 * to this cpu (a bit of paranoia):
 	 */
 	if (microcode_ops->collect_cpu_info(cpu, &nsig)) {
-		microcode_fini_cpu(cpu);
+		__microcode_fini_cpu(cpu);
+		printk(KERN_ERR "failed to collect_cpu_info for resuming cpu #%d\n",
+				cpu);
 		return -1;
 	}
 
-	if (memcmp(&nsig, &uci->cpu_sig, sizeof(nsig))) {
-		microcode_fini_cpu(cpu);
+	if ((nsig.sig != uci->cpu_sig.sig) || (nsig.pf != uci->cpu_sig.pf)) {
+		__microcode_fini_cpu(cpu);
+		printk(KERN_ERR "cached ucode doesn't match the resuming cpu #%d\n",
+				cpu);
 		/* Should we look for a new ucode here? */
 		return 1;
 	}

arch/x86/kernel/microcode_intel.c (+6)
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -155,6 +155,7 @@
 static int collect_cpu_info(int cpu_num, struct cpu_signature *csig)
 {
 	struct cpuinfo_x86 *c = &cpu_data(cpu_num);
+	unsigned long flags;
 	unsigned int val[2];
 
 	memset(csig, 0, sizeof(*csig));
@@ -174,11 +175,16 @@
 		csig->pf = 1 << ((val[1] >> 18) & 7);
 	}
 
+	/* serialize access to the physical write to MSR 0x79 */
+	spin_lock_irqsave(&microcode_update_lock, flags);
+
 	wrmsr(MSR_IA32_UCODE_REV, 0, 0);
 	/* see notes above for revision 1.07. Apparent chip bug */
 	sync_core();
 	/* get the current revision from MSR 0x8B */
 	rdmsr(MSR_IA32_UCODE_REV, val[0], csig->rev);
+	spin_unlock_irqrestore(&microcode_update_lock, flags);
+
 	pr_debug("microcode: collect_cpu_info : sig=0x%x, pf=0x%x, rev=0x%x\n",
 			csig->sig, csig->pf, csig->rev);
 
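The commit does not spell out which interleaving the new lock in collect_cpu_info() rules out. The C sketch below is one hedged reading, not the kernel's code. It assumes that apply_microcode() (not part of this diff) performs the same write-0 / sync_core() / read sequence on MSR 0x8B under microcode_update_lock right after the MSR 0x79 write, and it models the MSR with a hypothetical struct fake_cpu in which writing 0 clears the register and the serializing step reloads the active revision. It replays by hand one unlocked interleaving, where the reader's write of 0 lands between the updater's serialize and its read, next to the ordering a shared lock enforces.

```c
/* Hypothetical model of one core's MSR 0x8B (microcode revision):
 * writing 0 clears it, and a serializing instruction (sync_core() in
 * the kernel) makes the core reload its active revision into it. */
struct fake_cpu {
	unsigned int active_rev;	/* revision the core is running */
	unsigned int rev_msr;		/* what reading MSR 0x8B returns */
};

static void fake_wr_rev(struct fake_cpu *c, unsigned int v) { c->rev_msr = v; }
static void fake_sync(struct fake_cpu *c) { c->rev_msr = c->active_rev; }
static unsigned int fake_rd_rev(const struct fake_cpu *c) { return c->rev_msr; }

/* The write-0 / serialize / read sequence used by collect_cpu_info(). */
static unsigned int read_rev_seq(struct fake_cpu *c)
{
	fake_wr_rev(c, 0);
	fake_sync(c);
	return fake_rd_rev(c);
}

/* Updater (assumed): activate new_rev, standing in for the MSR 0x79
 * write, then read the revision back to check that the update took. */
static unsigned int update_seq(struct fake_cpu *c, unsigned int new_rev)
{
	c->active_rev = new_rev;
	return read_rev_seq(c);
}

/* No lock: the reader's steps interleave with the updater's.
 * Returns the revision the updater sees. */
unsigned int updater_sees_unlocked(struct fake_cpu *c, unsigned int new_rev)
{
	unsigned int seen;

	c->active_rev = new_rev;	/* updater: trigger the update */
	fake_wr_rev(c, 0);		/* updater: clear MSR 0x8B */
	fake_sync(c);			/* updater: MSR 0x8B = new_rev */
	fake_wr_rev(c, 0);		/* reader:  clear MSR 0x8B (interleaved) */
	seen = fake_rd_rev(c);		/* updater: reads 0, wrongly sees a failed update */
	fake_sync(c);			/* reader:  MSR 0x8B = new_rev */
	(void)fake_rd_rev(c);		/* reader:  reads new_rev */
	return seen;
}

/* Shared lock: each sequence runs as a unit, so the reader runs strictly
 * before or after the updater. Returns the revision the updater sees. */
unsigned int updater_sees_locked(struct fake_cpu *c, unsigned int new_rev)
{
	unsigned int seen = update_seq(c, new_rev);

	(void)read_rev_seq(c);		/* reader, after the updater */
	return seen;
}
```

The sketch does not implement a lock; it only replays the two orderings, which is enough to show why the write-0 / serialize / read steps have to run as one unit with respect to any other user of the same register.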