Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86, apic: Fix spurious error interrupts triggering on all non-boot APs

This patch fixes a bug reported by a customer, who found
that many spurious error interrupts were reported on all
non-boot CPUs (APs) during the system boot stage.

According to Chapter 10 of Intel Software Developer Manual
Volume 3A, the Local APIC may signal an illegal vector error when
an LVT entry is set to an illegal vector value (0~15) under
FIXED delivery mode (bits 8-10 are 0), regardless of whether
the mask bit is set or an interrupt actually happens. These
errors are seen as error interrupts.
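
The condition described above can be sketched as a small user-space check.
This is an illustration only, not kernel code: the LVT_* names and the helper
lvt_has_illegal_vector() are hypothetical, and the field layout assumed here
(vector in bits 0-7, delivery mode in bits 8-10, mask in bit 16) follows the
0x700 and 0x10000 values used elsewhere in this commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative field masks for an LVT entry (hypothetical names). */
#define LVT_VECTOR_MASK 0x000000ffu /* bits 0-7: interrupt vector */
#define LVT_DM_MASK     0x00000700u /* bits 8-10: delivery mode */
#define LVT_DM_FIXED    0x00000000u /* delivery mode field all zero */
#define LVT_MASKED      0x00010000u /* bit 16: entry is masked */

/*
 * Returns true when an LVT value combines FIXED delivery mode with a
 * vector in the illegal range 0..15. The mask bit is deliberately not
 * consulted, since the error is signalled whether or not it is set.
 */
static bool lvt_has_illegal_vector(uint32_t lvt)
{
	bool fixed = (lvt & LVT_DM_MASK) == LVT_DM_FIXED;
	uint32_t vector = lvt & LVT_VECTOR_MASK;

	return fixed && vector < 16;
}
```

With this helper, a value of 0x10000 (masked, FIXED, vector 0) is flagged,
while 0x00200 (SMI delivery mode) is not.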

The initial value of the thermal LVT entries on all APs always reads
0x10000 because the BSP wakes the APs by issuing an INIT-SIPI-SIPI
sequence to them, and when an AP receives the INIT IPI its LVT
registers are reset to 0s except for the mask bits, which are set to 1s.

When the BIOS takes over the thermal throttling interrupt,
the LVT thermal delivery mode should be SMI, and the kernel is
required to keep each AP's LVT thermal monitor register
programmed the same way.

This issue happens when the BIOS does not take over the thermal throttling
interrupt: each AP's LVT thermal monitor register is restored to
0x10000, which means vector 0 and fixed delivery mode, so all APs
signal illegal vector error interrupts.

This patch checks that the interrupt delivery mode is not fixed before
restoring the AP's LVT thermal monitor register.
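
As a rough sketch of that decision, the check can be isolated into a pure
function. The three APIC_DM_* constants are copied from the diff in this
commit; the helper should_restore_lvtthmr() is hypothetical and exists only
for illustration, since the patch itself performs the test inline before
calling apic_write().

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as they appear in arch/x86/include/asm/apicdef.h in this commit. */
#define APIC_DM_FIXED      0x00000u
#define APIC_DM_FIXED_MASK 0x00700u
#define APIC_DM_SMI        0x00200u

/*
 * Hypothetical helper: decide whether the LVT thermal value saved from
 * the BSP should be written back on an AP. A FIXED delivery mode in the
 * saved value means the BIOS did not claim the interrupt, so the write
 * is skipped and the AP keeps its post-INIT value.
 */
static bool should_restore_lvtthmr(uint32_t lvtthmr_init)
{
	return (lvtthmr_init & APIC_DM_FIXED_MASK) != APIC_DM_FIXED;
}
```

Under these assumptions, a saved value of 0x10000 is not restored, while a
value with APIC_DM_SMI set is.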

Signed-off-by: Youquan Song <youquan.song@intel.com>
Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
Acked-by: Yong Wang <yong.y.wang@intel.com>
Cc: hpa@linux.intel.com
Cc: joe@perches.com
Cc: jbaron@redhat.com
Cc: trenn@suse.de
Cc: kent.liu@intel.com
Cc: chaohong.guo@intel.com
Cc: <stable@kernel.org> # As far back as possible
Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Authored by Youquan Song and committed by Ingo Molnar
e503f9e4 d9a5ac9e

+8 -5
+1
arch/x86/include/asm/apicdef.h
@@ -78,6 +78,7 @@
 #define		APIC_DEST_LOGICAL	0x00800
 #define		APIC_DEST_PHYSICAL	0x00000
 #define		APIC_DM_FIXED		0x00000
+#define		APIC_DM_FIXED_MASK	0x00700
 #define		APIC_DM_LOWEST		0x00100
 #define		APIC_DM_SMI		0x00200
 #define		APIC_DM_REMRD		0x00300
+7 -5
arch/x86/kernel/cpu/mcheck/therm_throt.c
@@ -446,18 +446,20 @@
 	 */
 	rdmsr(MSR_IA32_MISC_ENABLE, l, h);
 
+	h = lvtthmr_init;
 	/*
 	 * The initial value of thermal LVT entries on all APs always reads
 	 * 0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
 	 * sequence to them and LVT registers are reset to 0s except for
 	 * the mask bits which are set to 1s when APs receive INIT IPI.
-	 * Always restore the value that BIOS has programmed on AP based on
-	 * BSP's info we saved since BIOS is always setting the same value
-	 * for all threads/cores
+	 * If BIOS takes over the thermal interrupt and sets its interrupt
+	 * delivery mode to SMI (not fixed), it restores the value that the
+	 * BIOS has programmed on AP based on BSP's info we saved since BIOS
+	 * is always setting the same value for all threads/cores.
 	 */
-	apic_write(APIC_LVTTHMR, lvtthmr_init);
+	if ((h & APIC_DM_FIXED_MASK) != APIC_DM_FIXED)
+		apic_write(APIC_LVTTHMR, lvtthmr_init);
 
-	h = lvtthmr_init;
 
 	if ((l & MSR_IA32_MISC_ENABLE_TM1) && (h & APIC_DM_SMI)) {
 		printk(KERN_DEBUG