Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

x86/apic: Make the ISR clearing sane

apic_pending_intr_clear() is fundamentally voodoo programming. It's primary
purpose is to clear stale ISR bits in the local APIC, which would otherwise
lock the corresponding interrupt priority level.

The comments and the implementation claim falsely that after clearing the
stale ISR bits, eventually stale IRR bits would be turned into ISR bits and
can be cleared as well. That's just wishful thinking because:

1) If interrupts are disabled, the APIC does not propagate an IRR bit to
the ISR.

2) If interrupts are enabled, then the APIC propagates the IRR bit to the
ISR and raises the interrupt in the CPU, which means that code _cannot_
observe the ISR bit for any of those IRR bits.

Rename the function to reflect the purpose and make exactly _one_ attempt
to EOI the pending ISR bits and add comments why traversing the pending bit
map in low to high priority order is correct.

Instead of trying to "clear" IRR bits, simply print a warning message when
the IRR is not empty.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/871ppwih4s.ffs@tglx

authored by

Thomas Gleixner and committed by
Borislav Petkov (AMD)
1b558e14 1b237f19

+38 -41
+38 -41
arch/x86/kernel/apic/apic.c
··· 1428 1428 u32 regs[APIC_IR_REGS]; 1429 1429 }; 1430 1430 1431 - static bool apic_check_and_ack(union apic_ir *irr, union apic_ir *isr) 1431 + static bool apic_check_and_eoi_isr(union apic_ir *isr) 1432 1432 { 1433 1433 int i, bit; 1434 - 1435 - /* Read the IRRs */ 1436 - for (i = 0; i < APIC_IR_REGS; i++) 1437 - irr->regs[i] = apic_read(APIC_IRR + i * 0x10); 1438 1434 1439 1435 /* Read the ISRs */ 1440 1436 for (i = 0; i < APIC_IR_REGS; i++) 1441 1437 isr->regs[i] = apic_read(APIC_ISR + i * 0x10); 1442 1438 1443 - /* 1444 - * If the ISR map is not empty. ACK the APIC and run another round 1445 - * to verify whether a pending IRR has been unblocked and turned 1446 - * into a ISR. 1447 - */ 1448 - if (!bitmap_empty(isr->map, APIC_IR_BITS)) { 1449 - /* 1450 - * There can be multiple ISR bits set when a high priority 1451 - * interrupt preempted a lower priority one. Issue an ACK 1452 - * per set bit. 1453 - */ 1454 - for_each_set_bit(bit, isr->map, APIC_IR_BITS) 1455 - apic_eoi(); 1439 + /* If the ISR map empty, nothing to do here. */ 1440 + if (bitmap_empty(isr->map, APIC_IR_BITS)) 1456 1441 return true; 1457 - } 1458 1442 1459 - return !bitmap_empty(irr->map, APIC_IR_BITS); 1443 + /* 1444 + * There can be multiple ISR bits set when a high priority 1445 + * interrupt preempted a lower priority one. Issue an EOI for each 1446 + * set bit. The priority traversal order does not matter as there 1447 + * can't be new ISR bits raised at this point. What matters is that 1448 + * an EOI is issued for each ISR bit. 1449 + */ 1450 + for_each_set_bit(bit, isr->map, APIC_IR_BITS) 1451 + apic_eoi(); 1452 + 1453 + /* Reread the ISRs, they should be empty now */ 1454 + for (i = 0; i < APIC_IR_REGS; i++) 1455 + isr->regs[i] = apic_read(APIC_ISR + i * 0x10); 1456 + 1457 + return bitmap_empty(isr->map, APIC_IR_BITS); 1460 1458 } 1461 1459 1462 1460 /* 1463 - * After a crash, we no longer service the interrupts and a pending 1464 - * interrupt from previous kernel might still have ISR bit set. 1461 + * If a CPU services an interrupt and crashes before issuing EOI to the 1462 + * local APIC, the corresponding ISR bit is still set when the crashing CPU 1463 + * jumps into a crash kernel. Read the ISR and issue an EOI for each set 1464 + * bit to acknowledge it as otherwise these slots would be locked forever 1465 + * waiting for an EOI. 1465 1466 * 1466 - * Most probably by now the CPU has serviced that pending interrupt and it 1467 - * might not have done the apic_eoi() because it thought, interrupt 1468 - * came from i8259 as ExtInt. LAPIC did not get EOI so it does not clear 1469 - * the ISR bit and cpu thinks it has already serviced the interrupt. Hence 1470 - * a vector might get locked. It was noticed for timer irq (vector 1471 - * 0x31). Issue an extra EOI to clear ISR. 1467 + * If there are pending bits in the IRR, then they won't be converted into 1468 + * ISR bits as the CPU has interrupts disabled. They will be delivered once 1469 + * the CPU enables interrupts and there is nothing which can prevent that. 1472 1470 * 1473 - * If there are pending IRR bits they turn into ISR bits after a higher 1474 - * priority ISR bit has been acked. 1471 + * In the worst case this results in spurious interrupt warnings. 1475 1472 */ 1476 - static void apic_pending_intr_clear(void) 1473 + static void apic_clear_isr(void) 1477 1474 { 1478 - union apic_ir irr, isr; 1475 + union apic_ir ir; 1479 1476 unsigned int i; 1480 1477 1481 - /* 512 loops are way oversized and give the APIC a chance to obey. */ 1482 - for (i = 0; i < 512; i++) { 1483 - if (!apic_check_and_ack(&irr, &isr)) 1484 - return; 1485 - } 1486 - /* Dump the IRR/ISR content if that failed */ 1487 - pr_warn("APIC: Stale IRR: %256pb ISR: %256pb\n", irr.map, isr.map); 1478 + if (!apic_check_and_eoi_isr(&ir)) 1479 + pr_warn("APIC: Stale ISR: %256pb\n", ir.map); 1480 + 1481 + for (i = 0; i < APIC_IR_REGS; i++) 1482 + ir.regs[i] = apic_read(APIC_IRR + i * 0x10); 1483 + 1484 + if (!bitmap_empty(ir.map, APIC_IR_BITS)) 1485 + pr_warn("APIC: Stale IRR: %256pb\n", ir.map); 1488 1486 } 1489 1487 1490 1488 /** ··· 1539 1541 value |= 0x10; 1540 1542 apic_write(APIC_TASKPRI, value); 1541 1543 1542 - /* Clear eventually stale ISR/IRR bits */ 1543 - apic_pending_intr_clear(); 1544 + apic_clear_isr(); 1544 1545 1545 1546 /* 1546 1547 * Now that we are all set up, enable the APIC