x86: fix NULL pointer deref in __switch_to

I am able to reproduce, with lguest, the oops in __switch_to() that Simon
reported.

My debugging showed that there is at least one lguest-specific issue
(which should also be present in 2.6.25 and earlier), and that it was
exposed as a kernel oops by the recent FPU dynamic allocation patches.

In addition to the previously discussed scenario (involving fpu_counter),
in the presence of lguest it is possible that the CPU's TS bit is still set
while the lguest launcher task's thread_info still has TS_USEDFPU set.
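Why that combination is dangerous can be shown with a small user-space model. It is a sketch under stated assumptions, not kernel code: it assumes, loosely following the 2.6.26-era x86 lazy-FPU code, that unlazy_fpu() saves the FPU registers only when the outgoing task has TS_USEDFPU set, that the save is itself an FPU instruction, and that executing an FPU instruction while CR0.TS is set raises a DNA (device not available) fault. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state: CR0.TS makes the next FPU instruction trap. */
struct model_cpu {
	bool cr0_ts;
};

/* Hypothetical task state: TS_USEDFPU means its live state is in the FPU. */
struct model_task {
	bool ts_usedfpu;
};

/*
 * Models unlazy_fpu(prev) as called from __switch_to().
 * Returns true if saving prev's FPU state would raise a DNA fault.
 */
static bool model_unlazy_fpu_faults(struct model_cpu *cpu,
				    struct model_task *prev)
{
	assert(cpu && prev);

	if (!prev->ts_usedfpu)
		return false;		/* nothing to save, FPU not touched */
	if (cpu->cr0_ts)
		return true;		/* save runs with TS set: DNA fault */

	prev->ts_usedfpu = false;	/* state saved to memory */
	cpu->cr0_ts = true;		/* stts(): trap on next FPU use */
	return false;
}
```

Only the "TS set and TS_USEDFPU set" combination faults; the usual lazy-FPU states (owner with TS clear, non-owner with TS set) do not.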

This is because of the way the lguest launcher handles the Guest's TS bit
(look at lguest_set_ts() in lguest_arch_run_guest()). This can result
in a DNA (device not available) fault while doing unlazy_fpu() in
__switch_to(). That DNA fault then ends up being handled in the context
of the new process that is being switched in, as opposed to the context
of the lguest launcher/helper process.
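The difference between the old lguest_set_ts() call and the unlazy_fpu(current) call used by the patch can be modeled as two ways of preparing the CPU before entering the Guest. This is a hypothetical sketch, not kernel code. It assumes the host's lazy-FPU convention that CR0.TS is clear only while the current task owns the FPU, and it starts from a state where the launcher owns the FPU. lguest_set_ts() only sets CR0.TS; unlazy_fpu() first saves the launcher's FPU state and clears TS_USEDFPU.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: CR0.TS on this CPU and the launcher's TS_USEDFPU. */
struct model_state {
	bool cr0_ts;
	bool launcher_usedfpu;
};

/* Old path, like lguest_set_ts(): set CR0.TS, leave TS_USEDFPU alone. */
static void model_old_prepare(struct model_state *s)
{
	assert(s);
	s->cr0_ts = true;
}

/*
 * New path, like unlazy_fpu(current): if the launcher owns the FPU,
 * save its state (TS still clear, so no fault), clear TS_USEDFPU and
 * set TS.  Otherwise TS is assumed to be set already.
 */
static void model_new_prepare(struct model_state *s)
{
	assert(s);
	if (s->launcher_usedfpu) {
		s->launcher_usedfpu = false;
		s->cr0_ts = true;
	}
}

/* The bad state from the previous paragraph: switching out would fault. */
static bool model_switch_out_faults(const struct model_state *s)
{
	return s->cr0_ts && s->launcher_usedfpu;
}
```

Both paths leave TS set, so a Guest FPU access still traps as intended. Only the old path also leaves TS_USEDFPU set, which is what makes the later unlazy_fpu() in __switch_to() fault.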

This is wrong in both pre- and post-2.6.25 kernels. In the recent
2.6.26-rc series, it shows up as NULL pointer dereferences or as
"sleeping function called from atomic context" warnings from
__switch_to(), because the FPU context is now freed and dynamically
allocated for newly created threads. Older kernels might instead show
FPU corruption in processes running inside lguest.
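Why dynamic allocation turns this into a crash or a sleep-in-atomic warning can be sketched as follows. The model is a simplification based on the description above, not on the kernel sources. It assumes a newly created thread has no FPU save area until its first DNA fault, and that servicing that fault must allocate one with an allocation that may sleep. A fault taken inside __switch_to(), where sleeping is forbidden, therefore cannot be serviced correctly. All names are hypothetical, and where the real kernel would warn or oops, the model just returns an error code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical FPU save area (size chosen for illustration only). */
struct model_xstate {
	unsigned char regs[512];
};

struct model_task {
	struct model_xstate *xstate;	/* NULL until first FPU use */
};

enum model_dna_result {
	MODEL_DNA_RESTORED,		/* save area present or allocated */
	MODEL_DNA_SLEEP_IN_ATOMIC,	/* would need a sleeping allocation */
	MODEL_DNA_NO_MEMORY,
};

/*
 * Models servicing a DNA fault on behalf of @tsk.  @atomic is true when
 * the fault is taken where sleeping is not allowed, e.g. __switch_to().
 */
static enum model_dna_result model_handle_dna(struct model_task *tsk,
					      bool atomic)
{
	assert(tsk);
	if (tsk->xstate == NULL) {
		if (atomic)
			return MODEL_DNA_SLEEP_IN_ATOMIC;
		/* Stands in for an allocation that may sleep. */
		tsk->xstate = calloc(1, sizeof(*tsk->xstate));
		if (tsk->xstate == NULL)
			return MODEL_DNA_NO_MEMORY;
	}
	return MODEL_DNA_RESTORED;
}
```

A freshly created thread is exactly the case with no save area yet, which is why the misdirected DNA fault from the previous paragraph hurts most on 2.6.26-rc kernels.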

With the appended patch, my test system has been running for more than
50 minutes now, so at least some of your oopses (hopefully all!) should
be fixed. Please give it a try. I will spend more time on this fix tomorrow.

Reported-by: Simon Holm Thøgersen <odie@cs.aau.dk>
Reported-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

authored by Suresh Siddha and committed by Ingo Molnar 54481cf8 ffe6e1da

+9 -6
drivers/lguest/x86/core.c
···
 	 * we set it now, so we can trap and pass that trap to the Guest if it
 	 * uses the FPU. */
 	if (cpu->ts)
-		lguest_set_ts();
+		unlazy_fpu(current);
 
 	/* SYSENTER is an optimized way of doing system calls. We can't allow
 	 * it because it always jumps to privilege level 0. A normal Guest
···
 	 * trap made the switcher code come back, and an error code which some
 	 * traps set. */
 
+	/* Restore SYSENTER if it's supposed to be on. */
+	if (boot_cpu_has(X86_FEATURE_SEP))
+		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
+
 	/* If the Guest page faulted, then the cr2 register will tell us the
 	 * bad virtual address. We have to grab this now, because once we
 	 * re-enable interrupts an interrupt could fault and thus overwrite
···
 	if (cpu->regs->trapnum == 14)
 		cpu->arch.last_pagefault = read_cr2();
 	/* Similarly, if we took a trap because the Guest used the FPU,
-	 * we have to restore the FPU it expects to see. */
+	 * we have to restore the FPU it expects to see.
+	 * math_state_restore() may sleep and we may even move off to
+	 * a different CPU. So all the critical stuff should be done
+	 * before this. */
 	else if (cpu->regs->trapnum == 7)
 		math_state_restore();
-
-	/* Restore SYSENTER if it's supposed to be on. */
-	if (boot_cpu_has(X86_FEATURE_SEP))
-		wrmsr(MSR_IA32_SYSENTER_CS, __KERNEL_CS, 0);
 }
 
 /*H:130 Now we've examined the hypercall code; our Guest can make requests.
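The second half of the patch is about ordering. The new comment says math_state_restore() may sleep and the task may resume on a different CPU. The context comment explains that SYSENTER is kept off while the Guest runs, and the SYSENTER_CS MSR is per-CPU, so it must be restored on the CPU where it was turned off, before anything that can migrate the task. This is a minimal sketch assuming two CPUs and a restore step that may migrate. All names are hypothetical, and the selector value is a placeholder.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS	2
#define MODEL_KERNEL_CS	1u	/* placeholder for __KERNEL_CS; 0 = off */

/* Hypothetical machine: one SYSENTER_CS value per CPU. */
struct model_machine {
	unsigned int sysenter_cs[MODEL_NR_CPUS];
	int task_cpu;		/* CPU the lguest launcher is running on */
};

/* Like wrmsr(): always writes the MSR of the CPU we are running on. */
static void model_write_sysenter(struct model_machine *m, unsigned int val)
{
	assert(m->task_cpu >= 0 && m->task_cpu < MODEL_NR_CPUS);
	m->sysenter_cs[m->task_cpu] = val;
}

/* Like math_state_restore(): may sleep and wake up on another CPU. */
static void model_math_state_restore(struct model_machine *m, bool migrate)
{
	if (migrate)
		m->task_cpu = (m->task_cpu + 1) % MODEL_NR_CPUS;
}

/* One Guest run ending in an FPU trap (trapnum 7). */
static void model_run_guest_fpu_trap(struct model_machine *m,
				     bool fixed_order, bool migrate)
{
	model_write_sysenter(m, 0);	/* SYSENTER off while Guest runs */

	if (fixed_order) {
		/* Patched order: critical per-CPU work first. */
		model_write_sysenter(m, MODEL_KERNEL_CS);
		model_math_state_restore(m, migrate);
	} else {
		/* Old order: may restore SYSENTER on the wrong CPU. */
		model_math_state_restore(m, migrate);
		model_write_sysenter(m, MODEL_KERNEL_CS);
	}
}
```

With migration, the old order leaves SYSENTER disabled on the CPU the Guest ran on. The patched order restores it before the FPU restore gets a chance to sleep.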