Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/powernv: Don't call generic code on offline cpus

On PowerNV platforms, when a CPU is offline, we put it into nap mode.
It's possible that the CPU wakes up from nap mode while it is still
offline due to a stray IPI. A misdirected device interrupt could also
potentially cause it to wake up. In that circumstance, we need to clear
the interrupt so that the CPU can go back to nap mode.

In the past the clearing of the interrupt was accomplished by briefly
enabling interrupts and allowing the normal interrupt handling code
(do_IRQ() etc.) to handle the interrupt. This has the problem that
this code calls irq_enter() and irq_exit(), which call functions such
as account_system_vtime() which use RCU internally. Use of RCU is not
permitted on offline CPUs and will trigger errors if RCU checking is
enabled.

To avoid calling into any generic code which might use RCU, we adopt
a different method of clearing interrupts on offline CPUs. Since we
are on the PowerNV platform, we know that the system interrupt
controller is a XICS being driven directly (i.e. not via hcalls) by
the kernel. Hence this adds a new icp_native_flush_interrupt()
function to the native-mode XICS driver and arranges to call that
when an offline CPU is woken from nap. This new function reads the
interrupt from the XICS. If it is an IPI, it clears the IPI; if it
is a device interrupt, it prints a warning and disables the source.
Then it does the end-of-interrupt processing for the interrupt.
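
The decision described above can be sketched as a small, user-space C
model. This is only an illustration, not the kernel code: the constant
values for XICS_IRQ_SPURIOUS and XICS_IPI are assumed here, and the
flush_action enum and both helper functions are hypothetical names
introduced for the sketch. It relies on the XIRR layout in which the low
24 bits carry the interrupt source number.

```c
#include <stdint.h>

/* Assumed values for illustration; the real ones come from the kernel's XICS headers. */
#define XICS_IRQ_SPURIOUS 0u
#define XICS_IPI          2u

/* Hypothetical summary of what the flush routine does for each case. */
enum flush_action {
	FLUSH_NOTHING,     /* spurious: nothing to clear, no EOI issued */
	FLUSH_CLEAR_IPI,   /* clear the pending IPI, then EOI */
	FLUSH_MASK_SOURCE, /* warn, mask the device source, then EOI */
};

/* The low 24 bits of XIRR hold the interrupt source number. */
static inline uint32_t xirr_to_vec(uint32_t xirr)
{
	return xirr & 0x00ffffffu;
}

/* Map a raw XIRR value onto the action taken on an offline CPU. */
static inline enum flush_action classify_xirr(uint32_t xirr)
{
	uint32_t vec = xirr_to_vec(xirr);

	if (vec == XICS_IRQ_SPURIOUS)
		return FLUSH_NOTHING;
	if (vec == XICS_IPI)
		return FLUSH_CLEAR_IPI;
	return FLUSH_MASK_SOURCE;
}
```

In both non-spurious cases the full XIRR value, not just the source
number, is written back to signal end-of-interrupt.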

The other thing that briefly enabling interrupts did was to check and
clear the irq_happened flag in this CPU's PACA. Therefore, after
flushing the interrupt from the XICS, we also clear all bits except
the PACA_IRQ_HARD_DIS (interrupts are hard disabled) bit from the
irq_happened flag. The PACA_IRQ_HARD_DIS flag is set by power7_nap()
and is left set to indicate that interrupts are hard disabled. This
means we then have to ignore that flag in power7_nap(), which is
reasonable since it doesn't indicate that any interrupt event needs
servicing.
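
The two mask operations can be illustrated with a minimal sketch in
plain C. The bit values below are assumptions chosen for the example,
and flush_irq_happened() and has_pending_event() are hypothetical
helpers, not kernel functions. The second helper models only the
irq_happened test in power7_nap(); the real code additionally checks
its argument before deciding not to nap.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit assignments (assumed for this sketch). */
#define PACA_IRQ_HARD_DIS 0x01u
#define PACA_IRQ_DBELL    0x02u
#define PACA_IRQ_EE       0x04u

/* After flushing the XICS: drop every bit except HARD_DIS. */
static inline uint8_t flush_irq_happened(uint8_t irq_happened)
{
	return irq_happened & PACA_IRQ_HARD_DIS;
}

/* Nap-entry test: HARD_DIS alone does not count as a pending event. */
static inline bool has_pending_event(uint8_t irq_happened)
{
	return (irq_happened & (uint8_t)~PACA_IRQ_HARD_DIS) != 0;
}
```

With these definitions, a flag value containing only PACA_IRQ_HARD_DIS
lets the CPU go back to nap, while any other set bit still counts as an
event that needs servicing.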

Signed-off-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Authored by Paul Mackerras and committed by Michael Ellerman
d6a4f709 423216ed

+30 -4
+1
arch/powerpc/include/asm/xics.h
···
 /* Native ICP */
 #ifdef CONFIG_PPC_ICP_NATIVE
 extern int icp_native_init(void);
+extern void icp_native_flush_interrupt(void);
 #else
 static inline int icp_native_init(void) { return -ENODEV; }
 #endif
+1 -1
arch/powerpc/kernel/idle_power7.S
···
 
 	/* Check if something happened while soft-disabled */
 	lbz	r0,PACAIRQHAPPENED(r13)
-	cmpwi	cr0,r0,0
+	andi.	r0,r0,~PACA_IRQ_HARD_DIS@l
 	beq	1f
 	cmpwi	cr0,r4,0
 	beq	1f
+3 -3
arch/powerpc/platforms/powernv/smp.c
···
 		power7_nap(1);
 		ppc64_runlatch_on();
 
-		/* Reenable IRQs briefly to clear the IPI that woke us */
-		local_irq_enable();
-		local_irq_disable();
+		/* Clear the IPI that woke us up */
+		icp_native_flush_interrupt();
+		local_paca->irq_happened &= PACA_IRQ_HARD_DIS;
 		mb();
 
 		if (cpu_core_split_required())
+25
arch/powerpc/sysdev/xics/icp-native.c
···
 	icp_native_set_qirr(cpu, IPI_PRIORITY);
 }
 
+/*
+ * Called when an interrupt is received on an off-line CPU to
+ * clear the interrupt, so that the CPU can go back to nap mode.
+ */
+void icp_native_flush_interrupt(void)
+{
+	unsigned int xirr = icp_native_get_xirr();
+	unsigned int vec = xirr & 0x00ffffff;
+
+	if (vec == XICS_IRQ_SPURIOUS)
+		return;
+	if (vec == XICS_IPI) {
+		/* Clear pending IPI */
+		int cpu = smp_processor_id();
+		kvmppc_set_host_ipi(cpu, 0);
+		icp_native_set_qirr(cpu, 0xff);
+	} else {
+		pr_err("XICS: hw interrupt 0x%x to offline cpu, disabling\n",
+		       vec);
+		xics_mask_unknown_vec(vec);
+	}
+	/* EOI the interrupt */
+	icp_native_set_xirr(xirr);
+}
+
 void xics_wake_cpu(int cpu)
 {
 	icp_native_set_qirr(cpu, IPI_PRIORITY);