Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation: PCI: convert pcieaer-howto.txt to reST

Convert plain text documentation to reStructuredText format and add it to
Sphinx TOC tree. No essential content change.

Signed-off-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>

Authored by Changbin Du and committed by Bjorn Helgaas
4e37f055 8a01fa64

+101 -56
Documentation/PCI/index.rst (+1)
···
    msi-howto
    acpi-info
    pci-error-recovery
+   pcieaer-howto

Documentation/PCI/pcieaer-howto.txt -> Documentation/PCI/pcieaer-howto.rst (+100 -56)
-The PCI Express Advanced Error Reporting Driver Guide HOWTO
-T. Long Nguyen <tom.l.nguyen@intel.com>
-Yanmin Zhang <yanmin.zhang@intel.com>
-07/29/2006
+.. SPDX-License-Identifier: GPL-2.0
+.. include:: <isonum.txt>

+===========================================================
+The PCI Express Advanced Error Reporting Driver Guide HOWTO
+===========================================================

-1. Overview
+:Authors: - T. Long Nguyen <tom.l.nguyen@intel.com>
+          - Yanmin Zhang <yanmin.zhang@intel.com>

-1.1 About this guide
+:Copyright: |copy| 2006 Intel Corporation
+
+Overview
+===========
+
+About this guide
+----------------

 This guide describes the basics of the PCI Express Advanced Error
 Reporting (AER) driver and provides information on how to use it, as
 well as how to enable the drivers of endpoint devices to conform with
 PCI Express AER driver.

-1.2 Copyright (C) Intel Corporation 2006.

-1.3 What is the PCI Express AER Driver?
+What is the PCI Express AER Driver?
+-----------------------------------

 PCI Express error signaling can occur on the PCI Express link itself
 or on behalf of transactions initiated on the link. PCI Express
···
 Express Advanced Error Reporting capability. The PCI Express AER
 driver provides three basic functions:

-- Gathers the comprehensive error information if errors occurred.
-- Reports error to the users.
-- Performs error recovery actions.
+  - Gathers the comprehensive error information if errors occurred.
+  - Reports error to the users.
+  - Performs error recovery actions.

 AER driver only attaches root ports which support PCI-Express AER
 capability.


-2. User Guide
+User Guide
+==========

-2.1 Include the PCI Express AER Root Driver into the Linux Kernel
+Include the PCI Express AER Root Driver into the Linux Kernel
+-------------------------------------------------------------

 The PCI Express AER Root driver is a Root Port service driver attached
 to the PCI Express Port Bus driver. If a user wants to use it, the driver
···
 depends on CONFIG_PCIEPORTBUS, so pls. set CONFIG_PCIEPORTBUS=y and
 CONFIG_PCIEAER = y.

-2.2 Load PCI Express AER Root Driver
+Load PCI Express AER Root Driver
+--------------------------------

 Some systems have AER support in firmware. Enabling Linux AER support at
 the same time the firmware handles AER may result in unpredictable
···
 grants AER control to the OS via the ACPI _OSC method. See the PCI FW 3.0
 Specification for details regarding _OSC usage.

-2.3 AER error output
+AER error output
+----------------

 When a PCIe AER error is captured, an error message will be output to
 console. If it's a correctable error, it is output as a warning.
 Otherwise, it is printed as an error. So users could choose different
 log level to filter out correctable error messages.

-Below shows an example:
-0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
-0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
-0000:50:00.0: [20] Unsupported Request (First)
-0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100
+Below shows an example::
+
+  0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
+  0000:50:00.0: device [8086:0329] error status/mask=00100000/00000000
+  0000:50:00.0: [20] Unsupported Request (First)
+  0000:50:00.0: TLP Header: 04000001 00200a03 05010000 00050100

 In the example, 'Requester ID' means the ID of the device who sends
 the error message to root port. Pls. refer to pci express specs for
 other fields.

-2.4 AER Statistics / Counters
+AER Statistics / Counters
+-------------------------

 When PCIe AER errors are captured, the counters / statistics are also exposed
 in the form of sysfs attributes which are documented at
 Documentation/ABI/testing/sysfs-bus-pci-devices-aer_stats

-3. Developer Guide
+Developer Guide
+===============

 To enable AER aware support requires a software driver to configure
 the AER capability structure within its device and to provide callbacks.
···
 errors because device specific errors will still get sent directly to
 the device driver.

-3.1 Configure the AER capability structure
+Configure the AER capability structure
+--------------------------------------

 AER aware drivers of PCI Express component need change the device
 control registers to enable AER. They also could change AER registers,
···
 pci_enable_pcie_error_reporting could be used to enable AER. See
 section 3.3.

-3.2. Provide callbacks
+Provide callbacks
+-----------------

-3.2.1 callback reset_link to reset pci express link
+callback reset_link to reset pci express link
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 This callback is used to reset the pci express physical link when a
 fatal error happens. The root port aer service driver provides a
···

 In struct pcie_port_service_driver, a new pointer, reset_link, is
 added.
+::

-pci_ers_result_t (*reset_link) (struct pci_dev *dev);
+  pci_ers_result_t (*reset_link) (struct pci_dev *dev);

 Section 3.2.2.2 provides more detailed info on when to call
 reset_link.

-3.2.2 PCI error-recovery callbacks
+PCI error-recovery callbacks
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 The PCI Express AER Root driver uses error callbacks to coordinate
 with downstream device drivers associated with a hierarchy in question
···

 Below sections specify when to call the error callback functions.

-3.2.2.1 Correctable errors
+Correctable errors
+~~~~~~~~~~~~~~~~~~

 Correctable errors pose no impacts on the functionality of
 the interface. The PCI Express protocol can recover without any
···
 require any recovery actions. The AER driver clears the device's
 correctable error status register accordingly and logs these errors.

-3.2.2.2 Non-correctable (non-fatal and fatal) errors
+Non-correctable (non-fatal and fatal) errors
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 If an error message indicates a non-fatal error, performing link reset
 at upstream is not required. The AER driver calls error_detected(dev,
 pci_channel_io_normal) to all drivers associated within a hierarchy in
-question. for example,
-EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort.
+question. for example::
+
+  EndPoint<==>DownstreamPort B<==>UpstreamPort A<==>RootPort
+
 If Upstream port A captures an AER error, the hierarchy consists of
 Downstream port B and EndPoint.

···
 reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
 to mmio_enabled.

-3.3 helper functions
+helper functions
+----------------
+::

-3.3.1 int pci_enable_pcie_error_reporting(struct pci_dev *dev);
+  int pci_enable_pcie_error_reporting(struct pci_dev *dev);
+
 pci_enable_pcie_error_reporting enables the device to send error
 messages to root port when an error is detected. Note that devices
 don't enable the error reporting by default, so device drivers need
 call this function to enable it.

-3.3.2 int pci_disable_pcie_error_reporting(struct pci_dev *dev);
+::
+
+  int pci_disable_pcie_error_reporting(struct pci_dev *dev);
+
 pci_disable_pcie_error_reporting disables the device to send error
 messages to root port when an error is detected.

-3.3.3 int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);
+::
+
+  int pci_cleanup_aer_uncorrect_error_status(struct pci_dev *dev);`
+
 pci_cleanup_aer_uncorrect_error_status cleanups the uncorrectable
 error status register.

-3.4 Frequent Asked Questions
+Frequent Asked Questions
+------------------------

-Q: What happens if a PCI Express device driver does not provide an
-error recovery handler (pci_driver->err_handler is equal to NULL)?
+Q:
+  What happens if a PCI Express device driver does not provide an
+  error recovery handler (pci_driver->err_handler is equal to NULL)?

-A: The devices attached with the driver won't be recovered. If the
-error is fatal, kernel will print out warning messages. Please refer
-to section 3 for more information.
+A:
+  The devices attached with the driver won't be recovered. If the
+  error is fatal, kernel will print out warning messages. Please refer
+  to section 3 for more information.

-Q: What happens if an upstream port service driver does not provide
-callback reset_link?
+Q:
+  What happens if an upstream port service driver does not provide
+  callback reset_link?

-A: Fatal error recovery will fail if the errors are reported by the
-upstream ports who are attached by the service driver.
+A:
+  Fatal error recovery will fail if the errors are reported by the
+  upstream ports who are attached by the service driver.

-Q: How does this infrastructure deal with driver that is not PCI
-Express aware?
+Q:
+  How does this infrastructure deal with driver that is not PCI
+  Express aware?

-A: This infrastructure calls the error callback functions of the
-driver when an error happens. But if the driver is not aware of
-PCI Express, the device might not report its own errors to root
-port.
+A:
+  This infrastructure calls the error callback functions of the
+  driver when an error happens. But if the driver is not aware of
+  PCI Express, the device might not report its own errors to root
+  port.

-Q: What modifications will that driver need to make it compatible
-with the PCI Express AER Root driver?
+Q:
+  What modifications will that driver need to make it compatible
+  with the PCI Express AER Root driver?

-A: It could call the helper functions to enable AER in devices and
-cleanup uncorrectable status register. Pls. refer to section 3.3.
+A:
+  It could call the helper functions to enable AER in devices and
+  cleanup uncorrectable status register. Pls. refer to section 3.3.


-4. Software error injection
+Software error injection
+========================

 Debugging PCIe AER error recovery code is quite difficult because it
 is hard to trigger real hardware errors. Software based error
···

 Then, you need a user space tool named aer-inject, which can be gotten
 from:
+
     https://git.kernel.org/cgit/linux/kernel/git/gong.chen/aer-inject.git/

 More information about aer-inject can be found in the document comes