Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/64: irq_work avoid interrupt when called with hardware irqs enabled

irq_work_raise should not cause a decrementer exception unless it is
called from NMI context. Doing so often just results in an immediate
masked decrementer interrupt:

<...>-550 90d... 4us : update_curr_rt <-dequeue_task_rt
<...>-550 90d... 5us : dbs_update_util_handler <-update_curr_rt
<...>-550 90d... 6us : arch_irq_work_raise <-irq_work_queue
<...>-550 90d... 7us : soft_nmi_interrupt <-soft_nmi_common
<...>-550 90d... 7us : printk_nmi_enter <-soft_nmi_interrupt
<...>-550 90d.Z. 8us : rcu_nmi_enter <-soft_nmi_interrupt
<...>-550 90d.Z. 9us : rcu_nmi_exit <-soft_nmi_interrupt
<...>-550 90d... 9us : printk_nmi_exit <-soft_nmi_interrupt
<...>-550 90d... 10us : cpuacct_charge <-update_curr_rt

The soft_nmi_interrupt here is the call into the watchdog, due to the
decrementer interrupt firing with irqs soft-disabled. This is
harmless, but sub-optimal.
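
The sequence in the trace can be illustrated with a small user-space
sketch. This is a toy model, not kernel code: struct toy_cpu,
toy_dec_exception(), toy_local_irq_enable() and the TOY_IRQ_DEC value
are hypothetical names chosen for illustration, and the watchdog call
is left out. It only shows that a decrementer exception taken while
irqs are soft-disabled costs a hardware exception and merely records a
pending bit, which is handled later when interrupts are re-enabled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PACA_IRQ_DEC; the value is arbitrary. */
#define TOY_IRQ_DEC 0x08u

/* Toy per-CPU state, loosely modelled on the fields named in the text. */
struct toy_cpu {
	bool soft_disabled;      /* irqs are soft-disabled */
	uint8_t irq_happened;    /* mask of interrupts to replay later */
	unsigned int exceptions; /* hardware exceptions taken */
	unsigned int handled;    /* timer interrupts actually handled */
};

/* A decrementer exception arrives from the hardware. */
static void toy_dec_exception(struct toy_cpu *cpu)
{
	cpu->exceptions++;
	if (cpu->soft_disabled) {
		/* Masked path: only record that the decrementer fired. */
		cpu->irq_happened |= TOY_IRQ_DEC;
		return;
	}
	cpu->handled++;
}

/* Re-enable interrupts and replay whatever was recorded meanwhile. */
static void toy_local_irq_enable(struct toy_cpu *cpu)
{
	cpu->soft_disabled = false;
	if (cpu->irq_happened & TOY_IRQ_DEC) {
		cpu->irq_happened &= (uint8_t)~TOY_IRQ_DEC;
		cpu->handled++;
	}
}
```

In this model, calling toy_dec_exception() on a CPU with soft_disabled
set takes one exception but handles nothing until
toy_local_irq_enable() runs, which is the round trip the patch avoids.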

When it is called neither from NMI context nor with interrupts enabled,
mark the decrementer pending in the irq_happened mask directly, rather
than having the masked decrementer interrupt handler do it. This will be
replayed at the next local_irq_enable. See the comment for details.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Authored by Nicholas Piggin and committed by Michael Ellerman
ebb37cf3 98fd72fe

+31 -2
arch/powerpc/kernel/time.c
···
 		"i" (offsetof(struct paca_struct, irq_work_pending)));
 }

+void arch_irq_work_raise(void)
+{
+	preempt_disable();
+	set_irq_work_pending_flag();
+	/*
+	 * Non-nmi code running with interrupts disabled will replay
+	 * irq_happened before it re-enables interrupts, so set the
+	 * decrementer there instead of causing a hardware exception
+	 * which would immediately hit the masked interrupt handler
+	 * and have the net effect of setting the decrementer in
+	 * irq_happened.
+	 *
+	 * NMI interrupts can not check this when they return, so the
+	 * decrementer hardware exception is raised, which will fire
+	 * when interrupts are next enabled.
+	 *
+	 * BookE does not support this yet, it must audit all NMI
+	 * interrupt handlers to ensure they call nmi_enter() so this
+	 * check would be correct.
+	 */
+	if (IS_ENABLED(CONFIG_BOOKE) || !irqs_disabled() || in_nmi()) {
+		set_dec(1);
+	} else {
+		hard_irq_disable();
+		local_paca->irq_happened |= PACA_IRQ_DEC;
+	}
+	preempt_enable();
+}
+
 #else /* 32-bit */

 DEFINE_PER_CPU(u8, irq_work_pending);
···
 #define test_irq_work_pending()		__this_cpu_read(irq_work_pending)
 #define clear_irq_work_pending()	__this_cpu_write(irq_work_pending, 0)

-#endif /* 32 vs 64 bit */
-
 void arch_irq_work_raise(void)
 {
 	preempt_disable();
···
 	set_dec(1);
 	preempt_enable();
 }
+
+#endif /* 32 vs 64 bit */

 #else /* CONFIG_IRQ_WORK */
