x86: fix "Kernel panic - not syncing: IO-APIC + timer doesn't work!"

this is the tale of a full day spent debugging an ancient but elusive bug.

after booting up thousands of random .config kernels, i finally happened
to generate a .config that produced the following rare bootup failure
on 32-bit x86:

| ..TIMER: vector=0x31 apic1=0 pin1=2 apic2=-1 pin2=-1
| ..MP-BIOS bug: 8254 timer not connected to IO-APIC
| ...trying to set up timer (IRQ0) through the 8259A ... failed.
| ...trying to set up timer as Virtual Wire IRQ... failed.
| ...trying to set up timer as ExtINT IRQ... failed :(.
| Kernel panic - not syncing: IO-APIC + timer doesn't work! Boot with apic=debug
| and send a report. Then try booting with the 'noapic' option

this bug has been reported many times during the years, but it was never
reproduced nor fixed.

the bug that i hit was extremely sensitive to .config details.

First i did a .config-bisection - suspecting some .config detail.
That led to CONFIG_X86_MCE: enabling X86_MCE magically made the bug disappear
and the system would boot up just fine.

Debugging my way through the MCE code ended up identifying two unlikely
candidates: the thing that made a real difference to the hang was that
X86_MCE did two printks:

Intel machine check architecture supported.
Intel machine check reporting enabled on CPU#1.

Adding the same printks to a !CONFIG_X86_MCE kernel made the bug go away!

this left timing as the main suspect: i experimented with adding various
udelay()s to the arch/x86/kernel/io_apic_32.c:check_timer() function, and
the race window turned out to be narrower than 30 microseconds (!).

That made debugging especially funny, debugging without having printk
ability before the bug hits is ... interesting ;-)

eventually i started suspecting IRQ activities - those are pretty much the
only thing that happen this early during bootup and have the timescale of
a few dozen microseconds. Also, check_timer() changes the IRQ hardware
in various creative ways, so the main candidate became IRQ0 interaction.

i've added a counter to track timer irqs (on which core they arrived, at
what exact time, etc.) and found that no timer IRQ would arrive after the
bug condition hits - even if we re-enable IRQ0 and re-initialize the i8259A,
but that we'd get a small number of timer irqs right around the time when we
call the check_timer() function.

Eventually i got the following backtrace triggered from debug code in the
timer interrupt:

...trying to set up timer as Virtual Wire IRQ... failed.
...trying to set up timer as ExtINT IRQ...
Pid: 1, comm: swapper Not tainted (2.6.24-rc5 #57)
EIP: 0060:[<c044d57e>] EFLAGS: 00000246 CPU: 0
EIP is at _spin_unlock_irqrestore+0x5/0x1c
EAX: c0634178 EBX: 00000000 ECX: c4947d63 EDX: 00000246
ESI: 00000002 EDI: 00010031 EBP: c04e0f2e ESP: f7c41df4
DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
CR0: 8005003b CR2: ffe04000 CR3: 00630000 CR4: 000006d0
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
[<c05f5784>] setup_IO_APIC+0x9c3/0xc5c

the spin_unlock() was called from init_8259A(). Wait ... we have an IRQ0
entry while we are in the middle of setting up the local APIC, the i8259A
and the PIT??

That is certainly not how it's supposed to work! check_timer() was supposed
to be called with irqs turned off - but this eroded away sometime in the
past. This code would still work most of the time because this code runs
very quickly, but just the right timing conditions are present and IRQ0
hits in this small, ~30 usecs window, timer irqs stop and the system does
not boot up. Also, given how early this is during bootup, the hang is
very deterministic - but it would only occur on certain machines (and
certain configs).

The fix was quite simple: disable/restore interrupts properly in this
function. With that in place the test-system now boots up just fine.

(64-bit x86 io_apic_64.c had the same bug.)

Phew! One down, only 1500 other kernel bugs are left ;-)

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

+24 -8
+12 -4
arch/x86/kernel/io_apic_32.c
··· 1882 1882 static int __init timer_irq_works(void) 1883 1883 { 1884 1884 unsigned long t1 = jiffies; 1885 + unsigned long flags; 1885 1886 1886 1887 if (no_timer_check) 1887 1888 return 1; 1888 1889 1890 + local_save_flags(flags); 1889 1891 local_irq_enable(); 1890 1892 /* Let ten ticks pass... */ 1891 1893 mdelay((10 * 1000) / HZ); 1894 + local_irq_restore(flags); 1892 1895 1893 1896 /* 1894 1897 * Expect a few ticks at least, to be sure some possible ··· 2170 2167 int apic1, pin1, apic2, pin2; 2171 2168 int vector; 2172 2169 unsigned int ver; 2170 + unsigned long flags; 2171 + 2172 + local_irq_save(flags); 2173 2173 2174 2174 ver = apic_read(APIC_LVR); 2175 2175 ver = GET_APIC_VERSION(ver); ··· 2225 2219 } 2226 2220 if (disable_timer_pin_1 > 0) 2227 2221 clear_IO_APIC_pin(0, pin1); 2228 - return; 2222 + goto out; 2229 2223 } 2230 2224 clear_IO_APIC_pin(apic1, pin1); 2231 2225 printk(KERN_ERR "..MP-BIOS bug: 8254 timer not connected to " ··· 2248 2242 if (nmi_watchdog == NMI_IO_APIC) { 2249 2243 setup_nmi(); 2250 2244 } 2251 - return; 2245 + goto out; 2252 2246 } 2253 2247 /* 2254 2248 * Cleanup, just in case ... ··· 2272 2266 2273 2267 if (timer_irq_works()) { 2274 2268 printk(" works.\n"); 2275 - return; 2269 + goto out; 2276 2270 } 2277 2271 apic_write_around(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | vector); 2278 2272 printk(" failed.\n"); ··· 2288 2282 2289 2283 if (timer_irq_works()) { 2290 2284 printk(" works.\n"); 2291 - return; 2285 + goto out; 2292 2286 } 2293 2287 printk(" failed :(.\n"); 2294 2288 panic("IO-APIC + timer doesn't work! Boot with apic=debug and send a " 2295 2289 "report. Then try booting with the 'noapic' option"); 2290 + out: 2291 + local_irq_restore(flags); 2296 2292 } 2297 2293 2298 2294 /*
+12 -4
arch/x86/kernel/io_apic_64.c
··· 1281 1281 static int __init timer_irq_works(void) 1282 1282 { 1283 1283 unsigned long t1 = jiffies; 1284 + unsigned long flags; 1284 1285 1286 + local_save_flags(flags); 1285 1287 local_irq_enable(); 1286 1288 /* Let ten ticks pass... */ 1287 1289 mdelay((10 * 1000) / HZ); 1290 + local_irq_restore(flags); 1288 1291 1289 1292 /* 1290 1293 * Expect a few ticks at least, to be sure some possible ··· 1658 1655 { 1659 1656 struct irq_cfg *cfg = irq_cfg + 0; 1660 1657 int apic1, pin1, apic2, pin2; 1658 + unsigned long flags; 1659 + 1660 + local_irq_save(flags); 1661 1661 1662 1662 /* 1663 1663 * get/set the timer IRQ vector: ··· 1702 1696 } 1703 1697 if (disable_timer_pin_1 > 0) 1704 1698 clear_IO_APIC_pin(0, pin1); 1705 - return; 1699 + goto out; 1706 1700 } 1707 1701 clear_IO_APIC_pin(apic1, pin1); 1708 1702 apic_printk(APIC_QUIET,KERN_ERR "..MP-BIOS bug: 8254 timer not " ··· 1724 1718 if (nmi_watchdog == NMI_IO_APIC) { 1725 1719 setup_nmi(); 1726 1720 } 1727 - return; 1721 + goto out; 1728 1722 } 1729 1723 /* 1730 1724 * Cleanup, just in case ... ··· 1747 1741 1748 1742 if (timer_irq_works()) { 1749 1743 apic_printk(APIC_VERBOSE," works.\n"); 1750 - return; 1744 + goto out; 1751 1745 } 1752 1746 apic_write(APIC_LVT0, APIC_LVT_MASKED | APIC_DM_FIXED | cfg->vector); 1753 1747 apic_printk(APIC_VERBOSE," failed.\n"); ··· 1762 1756 1763 1757 if (timer_irq_works()) { 1764 1758 apic_printk(APIC_VERBOSE," works.\n"); 1765 - return; 1759 + goto out; 1766 1760 } 1767 1761 apic_printk(APIC_VERBOSE," failed :(.\n"); 1768 1762 panic("IO-APIC + timer doesn't work! Try using the 'noapic' kernel parameter\n"); 1763 + out: 1764 + local_irq_restore(flags); 1769 1765 } 1770 1766 1771 1767 static int __init notimercheck(char *s)