Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tg3: Disable tg3 PCIe AER on system reboot

Disable PCIe AER on the tg3 device during system reboot on a limited
list of Dell PowerEdge systems. This prevents a fatal PCIe AER event
on the tg3 device during the ACPI _PTS (Prepare To Sleep) method for
S5 on those systems. _PTS is invoked by acpi_enter_sleep_state_prep()
as part of the kernel's reboot sequence as a result of commit
38f34dba806a ("PM: ACPI: reboot: Reinstate S5 for reboot").

An earlier fix for this problem was made by commit 2ca1c94ce0b6
("tg3: Disable tg3 device on system reboot to avoid triggering AER").
However, that fix was found to cause a reboot hang when some Dell
PowerEdge servers were booted via iPXE. To address the reboot hang,
the earlier fix was essentially reverted by commit
9fc3bc764334 ("tg3: power down device only on SYSTEM_POWER_OFF").
That revert re-exposed the tg3 PCIe AER on reboot problem.

This fix is not an ideal solution because the root cause of the AER
is in system firmware. Instead, it's a targeted work-around in the
tg3 driver.

Note also that the PCIe AER must be disabled on the tg3 device even
if the system is configured to use "firmware first" error handling.
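
As an illustration of what "disabling PCIe AER on the device" amounts to,
the following user-space sketch models the read-modify-write on the PCIe
Device Control register value. The bit values match those in
include/uapi/linux/pci_regs.h; the helper name
devctl_disable_error_reporting() is hypothetical and is not part of the
driver, which uses pcie_capability_clear_word() on the real register.

```c
#include <stdint.h>

/* Error reporting enable bits of the PCIe Device Control register,
 * with the values used in include/uapi/linux/pci_regs.h.
 */
#define PCI_EXP_DEVCTL_CERE	0x0001	/* Correctable Error Reporting En. */
#define PCI_EXP_DEVCTL_NFERE	0x0002	/* Non-Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_FERE	0x0004	/* Fatal Error Reporting Enable */
#define PCI_EXP_DEVCTL_URRE	0x0008	/* Unsupported Request Reporting En. */

/* Hypothetical helper: takes a Device Control value and returns it with
 * all four error reporting enable bits cleared. Other bits are untouched.
 */
static uint16_t devctl_disable_error_reporting(uint16_t devctl)
{
	const uint16_t mask = PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE |
			      PCI_EXP_DEVCTL_FERE | PCI_EXP_DEVCTL_URRE;

	return devctl & (uint16_t)~mask;
}
```

With the four enable bits cleared, the device no longer signals these
error classes upstream, which is the effect the work-around relies on.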

V3:
- Fix sparse warning on improper comparison of pdev->current_state
- Adhere to netdev comment style

Fixes: 9fc3bc764334 ("tg3: power down device only on SYSTEM_POWER_OFF")
Signed-off-by: Lenny Szubowicz <lszubowi@redhat.com>
Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Lenny Szubowicz and committed by David S. Miller
e0efe83e 3f1baa91

+58
drivers/net/ethernet/broadcom/tg3.c
···
 #include <linux/hwmon.h>
 #include <linux/hwmon-sysfs.h>
 #include <linux/crc32poly.h>
+#include <linux/dmi.h>
 
 #include <net/checksum.h>
 #include <net/gso.h>
···
 
 static SIMPLE_DEV_PM_OPS(tg3_pm_ops, tg3_suspend, tg3_resume);
 
+/* Systems where ACPI _PTS (Prepare To Sleep) S5 will result in a fatal
+ * PCIe AER event on the tg3 device if the tg3 device is not, or cannot
+ * be, powered down.
+ */
+static const struct dmi_system_id tg3_restart_aer_quirk_table[] = {
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R440"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R540"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R640"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R650"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R740"),
+		},
+	},
+	{
+		.matches = {
+			DMI_MATCH(DMI_SYS_VENDOR, "Dell Inc."),
+			DMI_MATCH(DMI_PRODUCT_NAME, "PowerEdge R750"),
+		},
+	},
+	{}
+};
+
 static void tg3_shutdown(struct pci_dev *pdev)
 {
 	struct net_device *dev = pci_get_drvdata(pdev);
···
 
 	if (system_state == SYSTEM_POWER_OFF)
 		tg3_power_down(tp);
+	else if (system_state == SYSTEM_RESTART &&
+		 dmi_first_match(tg3_restart_aer_quirk_table) &&
+		 pdev->current_state != PCI_D3cold &&
+		 pdev->current_state != PCI_UNKNOWN) {
+		/* Disable PCIe AER on the tg3 to avoid a fatal
+		 * error during this system restart.
+		 */
+		pcie_capability_clear_word(pdev, PCI_EXP_DEVCTL,
+					   PCI_EXP_DEVCTL_CERE |
+					   PCI_EXP_DEVCTL_NFERE |
+					   PCI_EXP_DEVCTL_FERE |
+					   PCI_EXP_DEVCTL_URRE);
+	}
 
 	rtnl_unlock();
 
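
The branch added to tg3_shutdown() can be read as a small decision
function. The sketch below is a user-space model of that logic only, not
driver code: struct quirk_entry, quirk_matches(), pick_shutdown_action()
and the enums are all hypothetical stand-ins for struct dmi_system_id,
dmi_first_match(), system_state and pdev->current_state. It assumes the
substring semantics of DMI_MATCH, modelled here with strstr().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dmi_system_id entry with two match fields. */
struct quirk_entry {
	const char *sys_vendor;
	const char *product_name;
};

/* Same vendor/product pairs as tg3_restart_aer_quirk_table. */
static const struct quirk_entry restart_aer_quirks[] = {
	{ "Dell Inc.", "PowerEdge R440" },
	{ "Dell Inc.", "PowerEdge R540" },
	{ "Dell Inc.", "PowerEdge R640" },
	{ "Dell Inc.", "PowerEdge R650" },
	{ "Dell Inc.", "PowerEdge R740" },
	{ "Dell Inc.", "PowerEdge R750" },
	{ NULL, NULL }	/* terminator, like the empty {} entry */
};

/* Models dmi_first_match(): true if every field of some entry is found
 * as a substring of the corresponding system string.
 */
static bool quirk_matches(const char *vendor, const char *product)
{
	const struct quirk_entry *q;

	for (q = restart_aer_quirks; q->sys_vendor; q++) {
		if (strstr(vendor, q->sys_vendor) &&
		    strstr(product, q->product_name))
			return true;
	}
	return false;
}

/* Hypothetical models of system_state and pdev->current_state. */
enum sys_state { STATE_POWER_OFF, STATE_RESTART, STATE_OTHER };
enum dev_power { DEV_D0, DEV_D3COLD, DEV_UNKNOWN };
enum shutdown_action { ACT_NONE, ACT_POWER_DOWN, ACT_DISABLE_AER };

/* Mirrors the if / else-if chain in tg3_shutdown(). */
static enum shutdown_action pick_shutdown_action(enum sys_state state,
						 enum dev_power power,
						 const char *vendor,
						 const char *product)
{
	if (state == STATE_POWER_OFF)
		return ACT_POWER_DOWN;

	/* Only touch config space if the device is still accessible. */
	if (state == STATE_RESTART && quirk_matches(vendor, product) &&
	    power != DEV_D3COLD && power != DEV_UNKNOWN)
		return ACT_DISABLE_AER;

	return ACT_NONE;
}
```

For example, pick_shutdown_action(STATE_RESTART, DEV_D0, "Dell Inc.",
"PowerEdge R740") selects ACT_DISABLE_AER, while the same call with
DEV_D3COLD, or with a product name not in the table, selects ACT_NONE.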