
powerpc/pseries: Don't enforce MSI affinity with kdump

Depending on the number of online CPUs in the original kernel, it is
likely for CPU #0 to be offline in a kdump kernel. The associated IRQs
in the affinity mappings provided by irq_create_affinity_masks() are
thus not started by irq_startup(), as per design for managed IRQs.

This can be a problem with multi-queue block devices driven by blk-mq:
such a non-started IRQ is very likely paired with the single queue
enforced by blk-mq during kdump (see blk_mq_alloc_tag_set()). This
causes the device to remain silent and likely hangs the guest at
some point.

This is a regression caused by commit 9ea69a55b3b9 ("powerpc/pseries:
Pass MSI affinity to irq_create_mapping()"). Note that this only happens
with the XIVE interrupt controller because XICS has a workaround to bypass
affinity, which is activated during kdump with the "noirqdistrib" kernel
parameter.

The issue comes from a combination of factors:
- a discrepancy between the number of queues detected by the multi-queue
block driver, which was used to create the MSI vectors, and the single
queue mode enforced later on by blk-mq because of kdump (i.e. keeping
all queues fixes the issue)
- CPU #0 being offline (i.e. kdump always succeeds with CPU #0 online)

Given that I couldn't reproduce this on x86, which seems to always have
CPU #0 online even during kdump, I'm not sure where this should be
fixed. Hence going for another approach: fine-grained affinity is for
performance and we don't really care about that during kdump. Simply
revert to the previous working behavior of ignoring affinity masks in
this case only.

Fixes: 9ea69a55b3b9 ("powerpc/pseries: Pass MSI affinity to irq_create_mapping()")
Cc: stable@vger.kernel.org # v5.10+
Signed-off-by: Greg Kurz <groug@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210215094506.1196119-1-groug@kaod.org

Authored by Greg Kurz, committed by Michael Ellerman
f9619d5e eead0893

+23 -2
arch/powerpc/platforms/pseries/msi.c
@@ -4,6 +4,7 @@
  * Copyright 2006-2007 Michael Ellerman, IBM Corp.
  */
 
+#include <linux/crash_dump.h>
 #include <linux/device.h>
 #include <linux/irq.h>
 #include <linux/msi.h>
@@ -459,8 +458,28 @@
 		return hwirq;
 	}
 
-	virq = irq_create_mapping_affinity(NULL, hwirq,
-					   entry->affinity);
+	/*
+	 * Depending on the number of online CPUs in the original
+	 * kernel, it is likely for CPU #0 to be offline in a kdump
+	 * kernel. The associated IRQs in the affinity mappings
+	 * provided by irq_create_affinity_masks() are thus not
+	 * started by irq_startup(), as per-design for managed IRQs.
+	 * This can be a problem with multi-queue block devices driven
+	 * by blk-mq : such a non-started IRQ is very likely paired
+	 * with the single queue enforced by blk-mq during kdump (see
+	 * blk_mq_alloc_tag_set()). This causes the device to remain
+	 * silent and likely hangs the guest at some point.
+	 *
+	 * We don't really care for fine-grained affinity when doing
+	 * kdump actually : simply ignore the pre-computed affinity
+	 * masks in this case and let the default mask with all CPUs
+	 * be used when creating the IRQ mappings.
+	 */
+	if (is_kdump_kernel())
+		virq = irq_create_mapping(NULL, hwirq);
+	else
+		virq = irq_create_mapping_affinity(NULL, hwirq,
+						   entry->affinity);
 
 	if (!virq) {
 		pr_debug("rtas_msi: Failed mapping hwirq %d\n", hwirq);