[PATCH] PCI Error Recovery: documentation

Various PCI bus errors can be signaled by newer PCI controllers.
Recovering from those errors requires an infrastructure to notify
affected device drivers of the error, and a way of walking through
a reset sequence. This patch adds documentation describing the
current error recovery proposal.

Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>

Documentation/pci-error-recovery.txt

                PCI Error Recovery
                ------------------
                   May 31, 2005

Current document maintainer:
    Linas Vepstas <linas@austin.ibm.com>


Some PCI bus controllers are able to detect certain "hard" PCI errors
on the bus, such as parity errors on the data and address busses, as
well as SERR and PERR errors. These chipsets are then able to disable
I/O to/from the affected device, so that, for example, a bad DMA
address doesn't end up corrupting system memory. These same chipsets
are also able to reset the affected PCI device, and return it to
working condition. This document describes a generic API for
performing error recovery.

The core idea is that after a PCI error has been detected, there must
be a way for the kernel to coordinate with all affected device drivers
so that the PCI card can be made operational again, possibly after
performing a full electrical #RST of the PCI card. The API below
allows device drivers to be notified of PCI errors, and to be
notified of, and respond to, a reset sequence.

Preliminary sketch of the API, cut-n-pasted-n-modified from email by
Ben Herrenschmidt, circa 5 April 2005

The error recovery API support is exposed to the driver in the form of
a structure of function pointers pointed to by a new field in struct
pci_driver. The absence of this pointer in pci_driver denotes a
"non-aware" driver; behaviour on these is platform dependent.
Platforms like ppc64 can try to simulate PCI hotplug remove/add.

The definition of "pci_error_token" is not covered here. It is based
on Seto's work on synchronous error detection. We still need to define
functions for extracting information out of an opaque error token.
This is separate from this API.

This structure has the form:

struct pci_error_handlers
{
	int (*error_detected)(struct pci_dev *dev, pci_error_token error);
	int (*mmio_enabled)(struct pci_dev *dev);
	int (*resume)(struct pci_dev *dev);
	int (*link_reset)(struct pci_dev *dev);
	int (*slot_reset)(struct pci_dev *dev);
};

A driver doesn't have to implement all of these callbacks. The
only mandatory one is error_detected(). If a callback is not
implemented, the corresponding feature is considered unsupported.
For example, if mmio_enabled() and resume() aren't there, then the
driver is assumed not to do any direct recovery and to require a
reset. If link_reset() is not implemented, the card is assumed not
to care about link resets; in that case, if recovery is supported,
the core can attempt recovery (but must not call slot_reset() unless
it really did reset the slot). If slot_reset() is not supported,
link_reset() can be called instead on a slot reset.

The first call will always be:

1) error_detected()

This is sent once after an error has been detected. At this point, the
device might not be accessible anymore, depending on the platform (the
slot will be isolated on ppc64). The driver may already have "noticed"
the error because of a failing IO, but this is the proper
"synchronisation point"; that is, it gives the driver a chance to
clean up, waiting for pending stuff (timers, whatever, etc...) to
complete; it can take semaphores, schedule, etc... everything but
touch the device. Within this function and after it returns, the
driver shouldn't do any new IOs. Called in task context. This is sort
of a "quiesce" point. See the note about interrupts at the end of this
document.

Result codes:
 - PCIERR_RESULT_CAN_RECOVER:
	Driver returns this if it thinks it might be able to recover
	the HW by just banging IOs, or if it wants to be given a
	chance to extract some diagnostic information (see below).
 - PCIERR_RESULT_NEED_RESET:
	Driver returns this if it thinks it can't recover unless the
	slot is reset.
 - PCIERR_RESULT_DISCONNECT:
	Return this if the driver thinks it won't recover at all
	(this will detach the driver? or just leave it dangling?
	to be decided).

So at this point, we have called error_detected() for all drivers
on the segment that had the error. On ppc64, the slot is isolated.
What happens now typically depends on the result from the drivers. If
all drivers on the segment/slot return PCIERR_RESULT_CAN_RECOVER, we
re-enable IOs on the slot (or do nothing special if the platform
doesn't isolate slots) and call 2). If not, and we can reset slots,
we go to 4); if neither, we have a dead slot. If it's a hotplug slot,
we might "simulate" a reset by triggering a HW unplug/replug, though.

>>> The current ppc64 implementation assumes that a device driver will
>>> *not* schedule or semaphore in this routine; the current ppc64
>>> implementation uses one kernel thread to notify all devices;
>>> thus, if one device sleeps/schedules, all devices are affected.
>>> Doing better requires complex multi-threaded logic in the error
>>> recovery implementation (e.g. waiting for all notification threads
>>> to "join" before proceeding with recovery.) This seems excessively
>>> complex and not worth implementing.

>>> The current ppc64 implementation doesn't much care whether the
>>> device attempts i/o at this point or not. I/O's will fail,
>>> returning a value of 0xff on read, and writes will be dropped.
>>> If the device driver attempts more than 10K I/O's to a frozen
>>> adapter, it will assume that the device driver has gone into an
>>> infinite loop, and it will panic the kernel.

2) mmio_enabled()

This is the "early recovery" call. IOs are allowed again, but DMA is
not (hrm... to be discussed, I prefer not), with some restrictions.
This is NOT a callback for the driver to start operations again, only
to peek/poke at the device, extract diagnostic information, if any,
and eventually do things like trigger a device local reset or some
such, but not restart operations. This is sent if all drivers on a
segment agree that they can try to recover and no automatic link reset
was performed by the HW. If the platform can't just re-enable IOs
without a slot reset or a link reset, it doesn't call this callback
and goes directly to 3) or 4). All IOs should be done _synchronously_
from within this callback; errors triggered by them will be returned
via the normal pci_check_whatever() API, and no new error_detected()
callback will be issued due to an error happening here. However, such
an error might cause IOs to be re-blocked for the whole segment, and
thus invalidate the recovery that other devices on the same segment
might have done, forcing the whole segment into one of the next
states, that is, link reset or slot reset.

Result codes:
 - PCIERR_RESULT_RECOVERED
	Driver returns this if it thinks the device is fully
	functional and is ready to start normal driver operations
	again. There is no guarantee that the driver will actually
	be allowed to proceed, as another driver on the same segment
	might have failed and thus triggered a slot reset on
	platforms that support it.

 - PCIERR_RESULT_NEED_RESET
	Driver returns this if it thinks the device is not
	recoverable in its current state and it needs a slot
	reset to proceed.

 - PCIERR_RESULT_DISCONNECT
	Same as above. Total failure, no recovery even after
	reset; driver dead. (To be defined more precisely.)

>>> The current ppc64 implementation does not implement this callback.

3) link_reset()

This is called after the link has been reset. This is typically
a PCI Express specific state at this point and is done whenever a
non-fatal error has been detected that can be "solved" by resetting
the link. This call informs the driver of the reset, and the driver
should check whether the device appears to be in working condition.
This function acts a bit like 2) mmio_enabled(), in that the driver
is not supposed to restart normal driver I/O operations right away.
Instead, it should just "probe" the device to check its
recoverability status. If all is right, then the core will call
resume() once all drivers have ack'd link_reset().

Result codes:
	(identical to mmio_enabled)

>>> The current ppc64 implementation does not implement this callback.

4) slot_reset()

This is called after the slot has been soft or hard reset by the
platform. A soft reset consists of asserting the adapter #RST line
and then restoring the PCI BARs and PCI configuration header. If the
platform supports PCI hotplug, then it might instead perform a hard
reset by toggling power on the slot off/on. This call gives drivers
the chance to re-initialize the hardware (re-download firmware, etc.),
but drivers shouldn't restart normal I/O processing operations at
this point. (See the note about interrupts; interrupts aren't
guaranteed to be delivered until the resume() callback has been
called.)
If all device drivers report success on this callback, the platform
will call resume() to complete the error handling and let the driver
restart normal I/O processing.

A driver can still return a critical failure for this function if
it can't get the device operational after reset. If the platform
previously tried a soft reset, it might now try a hard reset (power
cycle) and then call slot_reset() again. If the device still can't
be recovered, there is nothing more that can be done; the platform
will typically report a "permanent failure" in such a case, and the
device will be considered "dead".

Result codes:
 - PCIERR_RESULT_DISCONNECT
	Same as above.

>>> The current ppc64 implementation does not try a power-cycle reset
>>> if the driver returned PCIERR_RESULT_DISCONNECT. However, it should.

5) resume()

This is called if all drivers on the segment have returned
PCIERR_RESULT_RECOVERED from one of the 3 previous callbacks.
It basically tells the driver to restart activity, that everything
is back up and running. No result code is taken into account here. If
a new error happens, it will restart a new error handling process.

That's it. I think this covers all the possibilities. The way those
callbacks are called is platform policy. A platform with no slot
reset capability, for example, may want to just "ignore" drivers that
can't recover (disconnect them) and try to let other cards on the
same segment recover. Keep in mind that in most real life cases,
though, there will be only one driver per segment.

Now, there is a note about interrupts. If you get an interrupt and
your device is dead or has been isolated, there is a problem :)

After much thinking, I decided to leave that to the platform.
That is, the recovery API only specifies that:

- There is no guarantee that interrupt delivery can proceed from any
device on the segment starting from the error detection and until the
restart callback is sent, at which point interrupts are expected to be
fully operational.

- There is no guarantee that interrupt delivery is stopped; that is,
a driver that gets an interrupt after detecting an error, or that
detects an error within the interrupt handler such that it prevents
proper ack'ing of the interrupt (and thus removal of the source),
should just return IRQ_NONE. It's up to the platform to deal with
that condition, typically by masking the irq source for the duration
of the error handling. It is expected that the platform "knows" which
interrupts are routed to error-management capable slots and can deal
with temporarily disabling that irq number during error processing
(this isn't terribly complex). That means some IRQ latency for other
devices sharing the interrupt, but there is simply no other way. High
end platforms aren't supposed to share interrupts between many devices
anyway :)


Revised: 31 May 2005 Linas Vepstas <linas@austin.ibm.com>
MAINTAINERS
 L:	linux-abi-devel@lists.sourceforge.net
 S:	Maintained

+PCI ERROR RECOVERY
+P:	Linas Vepstas
+M:	linas@austin.ibm.com
+L:	linux-kernel@vger.kernel.org
+L:	linux-pci@atrey.karlin.mff.cuni.cz
+S:	Supported
+
 PCI SOUND DRIVERS (ES1370, ES1371 and SONICVIBES)
 P:	Thomas Sailer
 M:	sailer@ife.ee.ethz.ch