Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

powerpc/eeh: Fix crash on converting OF node to edev

The kernel crash was reported by Alexy. He was testing some feature
with private kernel, in which Alexy added some code in pci_pm_reset()
to read the CSR after writting it. The bug could be reproduced on
Fiber Channel card (Fibre Channel: Emulex Corporation Saturn-X:
LightPulse Fibre Channel Host Adapter (rev 03)) by the following
commands.

# echo 1 > /sys/devices/pci0004:01/0004:01:00.0/reset
# rmmod lpfc
# modprobe lpfc

The history behind the test case is that those additional config
space reading operations in pci_pm_reset() would cause EEH error,
but we didn't detect EEH error until "modprobe lpfc". For the case,
all the PCI devices on PCI bus (0004:01) were removed and added after
PE reset. Then the EEH devices would be figured out again based on
the OF nodes. Unfortunately, there were some child OF nodes under
PCI device (0004:01:00.0), but they didn't have attached PCI_DN since
they're invisible from PCI domain. However, we were still trying to
convert OF node to EEH device without checking on the attached PCI_DN.
Eventually, it caused the kernel crash as follows:

Unable to handle kernel paging request for data at address 0x00000030
Faulting instruction address: 0xc00000000004d888
cpu 0x0: Vector: 300 (Data Access) at [c000000fc797b950]
pc: c00000000004d888: .eeh_add_device_tree_early+0x78/0x140
lr: c00000000004d880: .eeh_add_device_tree_early+0x70/0x140
sp: c000000fc797bbd0
msr: 8000000000009032
dar: 30
dsisr: 40000000
current = 0xc000000fc78d9f70
paca = 0xc00000000edb0000 softe: 0 irq_happened: 0x00
pid = 2951, comm = eehd
enter ? for help
[c000000fc797bc50] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140
[c000000fc797bcd0] c00000000004d848 .eeh_add_device_tree_early+0x38/0x140
[c000000fc797bd50] c000000000051b54 .pcibios_add_pci_devices+0x34/0x190
[c000000fc797bde0] c00000000004fb10 .eeh_reset_device+0x100/0x160
[c000000fc797be70] c0000000000502dc .eeh_handle_event+0x19c/0x300
[c000000fc797bf00] c000000000050570 .eeh_event_handler+0x130/0x1a0
[c000000fc797bf90] c000000000020138 .kernel_thread+0x54/0x70

The patch changes of_node_to_eeh_dev() and just returns NULL if the
passed OF node doesn't have attached PCI_DN.

Cc: stable@vger.kernel.org
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Gavin Shan <shangw@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>

authored by

Gavin Shan and committed by
Benjamin Herrenschmidt
1e38b714 feadf7c0

+9 -1
+8
arch/powerpc/include/asm/pci-bridge.h
··· 182 182 #if defined(CONFIG_EEH) 183 183 static inline struct eeh_dev *of_node_to_eeh_dev(struct device_node *dn) 184 184 { 185 + /* 186 + * For those OF nodes whose parent isn't PCI bridge, they 187 + * don't have PCI_DN actually. So we have to skip them for 188 + * any EEH operations. 189 + */ 190 + if (!dn || !PCI_DN(dn)) 191 + return NULL; 192 + 185 193 return PCI_DN(dn)->edev; 186 194 } 187 195 #else
+1 -1
arch/powerpc/platforms/pseries/eeh.c
··· 728 728 { 729 729 struct pci_controller *phb; 730 730 731 - if (!dn || !of_node_to_eeh_dev(dn)) 731 + if (!of_node_to_eeh_dev(dn)) 732 732 return; 733 733 phb = of_node_to_eeh_dev(dn)->phb; 734 734