Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/eeh: Enable IO path on permanent error

We give up recovery on a permanent error and simply shut down the
affected devices and remove them. If the devices can't be put into a
quiet state, they spew more traffic, which is likely to cause another
unexpected EEH error. This was observed on a "p8dtu2u" machine:

0002:00:00.0 PCI bridge: IBM Device 03dc
0002:01:00.0 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.1 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.2 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)
0002:01:00.3 Ethernet controller: Intel Corporation \
Ethernet Controller X710/X557-AT 10GBASE-T (rev 02)

On the P8 PowerNV platform, the IO path is frozen while the devices
are being shut down, meaning their memory-mapped registers are
inaccessible. That is why the devices can't be put into a quiet state
before they are removed. Fix the issue by enabling the IO path before
putting the devices into a quiet state.

Reported-by: Pridhiviraj Paidipeddi <ppaidipe@linux.vnet.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Acked-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Authored by Gavin Shan and committed by Michael Ellerman
387bbc97 d89f473f

+9 -1
arch/powerpc/kernel/eeh.c
@@ -298,9 +298,17 @@
 	 *
 	 * For pHyp, we have to enable IO for log retrieval. Otherwise,
 	 * 0xFF's is always returned from PCI config space.
+	 *
+	 * When the @severity is EEH_LOG_PERM, the PE is going to be
+	 * removed. Prior to that, the drivers for devices included in
+	 * the PE will be closed. The drivers rely on working IO path
+	 * to bring the devices to quiet state. Otherwise, PCI traffic
+	 * from those devices after they are removed is like to cause
+	 * another unexpected EEH error.
 	 */
 	if (!(pe->type & EEH_PE_PHB)) {
-		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG))
+		if (eeh_has_flag(EEH_ENABLE_IO_FOR_LOG) ||
+		    severity == EEH_LOG_PERM)
 			eeh_pci_enable(pe, EEH_OPT_THAW_MMIO);
 
 		/*
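
The condition this hunk changes can be read as a small predicate: MMIO is thawed for a non-PHB PE either when the platform asks for IO during log retrieval or when the error is permanent. The following is only a standalone sketch of that decision, not kernel code. Every `sketch_`/`SKETCH_` name and every constant value is a hypothetical stand-in for the kernel's `EEH_PE_PHB`, `EEH_ENABLE_IO_FOR_LOG` and `EEH_LOG_PERM`, and the global flag lookup done by `eeh_has_flag()` is replaced by a plain parameter.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel definitions. The values are
 * chosen for illustration and are not taken from the kernel headers.
 */
enum sketch_severity {
	SKETCH_LOG_TEMP = 1,	/* recoverable error */
	SKETCH_LOG_PERM = 2	/* permanent error, PE will be removed */
};

#define SKETCH_PE_PHB			(1u << 1)
#define SKETCH_ENABLE_IO_FOR_LOG	(1u << 5)

/*
 * Return true when MMIO should be thawed before touching the devices.
 * A PHB-type PE is never thawed here; any other PE is thawed when the
 * platform flag is set or when the error is permanent.
 */
bool sketch_should_thaw_mmio(unsigned int pe_type,
			     unsigned int eeh_flags,
			     int severity)
{
	if (pe_type & SKETCH_PE_PHB)
		return false;

	return (eeh_flags & SKETCH_ENABLE_IO_FOR_LOG) ||
	       severity == SKETCH_LOG_PERM;
}
```

Before this change, the equivalent predicate would have ignored `severity`, so a permanent error on a platform without the log flag left the IO path frozen while the drivers were being closed.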