Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6

* 'linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: (36 commits)
PCI: hotplug: pciehp: Removed check for hotplug of display devices
PCI: read memory ranges out of Broadcom CNB20LE host bridge
PCI: Allow manual resource allocation for PCI hotplug bridges
x86/PCI: make ACPI MCFG reserved error messages ACPI specific
PCI hotplug: Use kmemdup
PM/PCI: Update PCI power management documentation
PCI: output FW warning in pci_read/write_vpd
PCI: fix typos pci_device_dis/enable to pci_dis/enable_device in comments
PCI quirks: disable msi on AMD rs4xx internal gfx bridges
PCI: Disable MSI for MCP55 on P5N32-E SLI
x86/PCI: irq and pci_ids patch for additional Intel Cougar Point DeviceIDs
PCI: aerdrv: trivial cleanup for aerdrv_core.c
PCI: aerdrv: trivial cleanup for aerdrv.c
PCI: aerdrv: introduce default_downstream_reset_link
PCI: aerdrv: rework find_aer_service
PCI: aerdrv: remove is_downstream
PCI: aerdrv: remove magical ROOT_ERR_STATUS_MASKS
PCI: aerdrv: redefine PCI_ERR_ROOT_*_SRC
PCI: aerdrv: rework do_recovery
PCI: aerdrv: rework get_e_source()
...

+1719 -733
+40
Documentation/ABI/testing/sysfs-bus-pci
···
 		The symbolic link points to the PCI device sysfs entry of the
 		Physical Function this device associates with.
 
+
+What:		/sys/bus/pci/slots/...
+Date:		April 2005 (possibly older)
+KernelVersion:	2.6.12 (possibly older)
+Contact:	linux-pci@vger.kernel.org
+Description:
+		When the appropriate driver is loaded, it will create a
+		directory per claimed physical PCI slot in
+		/sys/bus/pci/slots/.  The names of these directories are
+		specific to the driver, which in turn, are specific to the
+		platform, but in general, should match the label on the
+		machine's physical chassis.
+
+		The drivers that can create slot directories include the
+		PCI hotplug drivers, and as of 2.6.27, the pci_slot driver.
+
+		The slot directories contain, at a minimum, a file named
+		'address' which contains the PCI bus:device:function tuple.
+		Other files may appear as well, but are specific to the
+		driver.
+
+What:		/sys/bus/pci/slots/.../function[0-7]
+Date:		March 2010
+KernelVersion:	2.6.35
+Contact:	linux-pci@vger.kernel.org
+Description:
+		If PCI slot directories (as described above) are created,
+		and the physical slot is actually populated with a device,
+		symbolic links in the slot directory pointing to the
+		device's PCI functions are created as well.
+
+What:		/sys/bus/pci/devices/.../slot
+Date:		March 2010
+KernelVersion:	2.6.35
+Contact:	linux-pci@vger.kernel.org
+Description:
+		If PCI slot directories (as described above) are created,
+		a symbolic link pointing to the slot directory will be
+		created as well.
+
 What:		/sys/bus/pci/slots/.../module
 Date:		June 2009
 Contact:	linux-pci@vger.kernel.org
+13 -16
Documentation/PCI/pcieaer-howto.txt
···
 well as how to enable the drivers of endpoint devices to conform with
 PCI Express AER driver.
 
-1.2 Copyright � Intel Corporation 2006.
+1.2 Copyright (C) Intel Corporation 2006.
 
 1.3 What is the PCI Express AER Driver?
 
···
 Otherwise, it is printed as an error. So users could choose different
 log level to filter out correctable error messages.
 
-Below shows an example.
-+------ PCI-Express Device Error -----+
-Error Severity          : Uncorrected (Fatal)
-PCIE Bus Error type     : Transaction Layer
-Unsupported Request     : First
-Requester ID            : 0500
-VendorID=8086h, DeviceID=0329h, Bus=05h, Device=00h, Function=00h
-TLB Header:
-04000001 00200a03 05010000 00050100
+Below shows an example:
+0000:50:00.0: PCIe Bus Error: severity=Uncorrected (Fatal), type=Transaction Layer, id=0500(Requester ID)
+0000:50:00.0:   device [8086:0329] error status/mask=00100000/00000000
+0000:50:00.0:    [20] Unsupported Request    (First)
+0000:50:00.0:   TLP Header: 04000001 00200a03 05010000 00050100
 
 In the example, 'Requester ID' means the ID of the device who sends
 the error message to root port. Pls. refer to pci express specs for
···
 the other hand, cause the link to be unreliable.
 
 When AER is enabled, a PCI Express device will automatically send an
-error message to the PCIE root port above it when the device captures
+error message to the PCIe root port above it when the device captures
 an error. The Root Port, upon receiving an error reporting message,
 internally processes and logs the error message in its PCI Express
 capability structure. Error information being logged includes storing
···
 function to reset link. Firstly, kernel looks for if the upstream
 component has an aer driver. If it has, kernel uses the reset_link
 callback of the aer driver. If the upstream component has no aer driver
-and the port is downstream port, we will use the aer driver of the
-root port who reports the AER error. As for upstream ports,
+and the port is downstream port, we will perform a hot reset as the
+default by setting the Secondary Bus Reset bit of the Bridge Control
+register associated with the downstream port. As for upstream ports,
 they should provide their own aer service drivers with reset_link
 function. If error_detected returns PCI_ERS_RESULT_CAN_RECOVER and
 reset_link returns PCI_ERS_RESULT_RECOVERED, the error handling goes
···
 
 4. Software error injection
 
-Debugging PCIE AER error recovery code is quite difficult because it
+Debugging PCIe AER error recovery code is quite difficult because it
 is hard to trigger real hardware errors. Software based error
-injection can be used to fake various kinds of PCIE errors.
+injection can be used to fake various kinds of PCIe errors.
 
-First you should enable PCIE AER software error injection in kernel
+First you should enable PCIe AER software error injection in kernel
 configuration, that is, following item should be in your .config.
 
 CONFIG_PCIEAER_INJECT=y or CONFIG_PCIEAER_INJECT=m
+984 -258
Documentation/power/pci.txt
··· 1 - 2 1 PCI Power Management 3 - ~~~~~~~~~~~~~~~~~~~~ 4 2 5 - An overview of the concepts and the related functions in the Linux kernel 3 + Copyright (c) 2010 Rafael J. Wysocki <rjw@sisk.pl>, Novell Inc. 6 4 7 - Patrick Mochel <mochel@transmeta.com> 8 - (and others) 5 + An overview of concepts and the Linux kernel's interfaces related to PCI power 6 + management. Based on previous work by Patrick Mochel <mochel@transmeta.com> 7 + (and others). 8 + 9 + This document only covers the aspects of power management specific to PCI 10 + devices. For general description of the kernel's interfaces related to device 11 + power management refer to Documentation/power/devices.txt and 12 + Documentation/power/runtime_pm.txt. 9 13 10 14 --------------------------------------------------------------------------- 11 15 12 - 1. Overview 13 - 2. How the PCI Subsystem Does Power Management 14 - 3. PCI Utility Functions 15 - 4. PCI Device Drivers 16 - 5. Resources 16 + 1. Hardware and Platform Support for PCI Power Management 17 + 2. PCI Subsystem and Device Power Management 18 + 3. PCI Device Drivers and Power Management 19 + 4. Resources 17 20 18 - 1. Overview 19 - ~~~~~~~~~~~ 20 21 21 - The PCI Power Management Specification was introduced between the PCI 2.1 and 22 - PCI 2.2 Specifications. It a standard interface for controlling various 23 - power management operations. 22 + 1. Hardware and Platform Support for PCI Power Management 23 + ========================================================= 24 24 25 - Implementation of the PCI PM Spec is optional, as are several sub-components of 26 - it. If a device supports the PCI PM Spec, the device will have an 8 byte 27 - capability field in its PCI configuration space. This field is used to describe 28 - and control the standard PCI power management features. 25 + 1.1. 
Native and Platform-Based Power Management 26 + ----------------------------------------------- 27 + In general, power management is a feature allowing one to save energy by putting 28 + devices into states in which they draw less power (low-power states) at the 29 + price of reduced functionality or performance. 29 30 30 - The PCI PM spec defines 4 operating states for devices (D0 - D3) and for buses 31 - (B0 - B3). The higher the number, the less power the device consumes. However, 32 - the higher the number, the longer the latency is for the device to return to 33 - an operational state (D0). 31 + Usually, a device is put into a low-power state when it is underutilized or 32 + completely inactive. However, when it is necessary to use the device once 33 + again, it has to be put back into the "fully functional" state (full-power 34 + state). This may happen when there are some data for the device to handle or 35 + as a result of an external event requiring the device to be active, which may 36 + be signaled by the device itself. 34 37 35 - There are actually two D3 states. When someone talks about D3, they usually 36 - mean D3hot, which corresponds to an ACPI D2 state (power is reduced, the 37 - device may lose some context). But they may also mean D3cold, which is an 38 - ACPI D3 state (power is fully off, all state was discarded); or both. 38 + PCI devices may be put into low-power states in two ways, by using the device 39 + capabilities introduced by the PCI Bus Power Management Interface Specification, 40 + or with the help of platform firmware, such as an ACPI BIOS. In the first 41 + approach, that is referred to as the native PCI power management (native PCI PM) 42 + in what follows, the device power state is changed as a result of writing a 43 + specific value into one of its standard configuration registers. 
The second 44 + approach requires the platform firmware to provide special methods that may be 45 + used by the kernel to change the device's power state. 39 46 40 - Bus power management is not covered in this version of this document. 47 + Devices supporting the native PCI PM usually can generate wakeup signals called 48 + Power Management Events (PMEs) to let the kernel know about external events 49 + requiring the device to be active. After receiving a PME the kernel is supposed 50 + to put the device that sent it into the full-power state. However, the PCI Bus 51 + Power Management Interface Specification doesn't define any standard method of 52 + delivering the PME from the device to the CPU and the operating system kernel. 53 + It is assumed that the platform firmware will perform this task and therefore, 54 + even though a PCI device is set up to generate PMEs, it also may be necessary to 55 + prepare the platform firmware for notifying the CPU of the PMEs coming from the 56 + device (e.g. by generating interrupts). 41 57 42 - Note that all PCI devices support D0 and D3cold by default, regardless of 43 - whether or not they implement any of the PCI PM spec. 58 + In turn, if the methods provided by the platform firmware are used for changing 59 + the power state of a device, usually the platform also provides a method for 60 + preparing the device to generate wakeup signals. In that case, however, it 61 + often also is necessary to prepare the device for generating PMEs using the 62 + native PCI PM mechanism, because the method provided by the platform depends on 63 + that. 44 64 45 - The possible state transitions that a device can undergo are: 65 + Thus in many situations both the native and the platform-based power management 66 + mechanisms have to be used simultaneously to obtain the desired result. 
46 67 47 - +---------------------------+ 48 - | Current State | New State | 49 - +---------------------------+ 50 - | D0 | D1, D2, D3| 51 - +---------------------------+ 52 - | D1 | D2, D3 | 53 - +---------------------------+ 54 - | D2 | D3 | 55 - +---------------------------+ 56 - | D1, D2, D3 | D0 | 57 - +---------------------------+ 68 + 1.2. Native PCI Power Management 69 + -------------------------------- 70 + The PCI Bus Power Management Interface Specification (PCI PM Spec) was 71 + introduced between the PCI 2.1 and PCI 2.2 Specifications. It defined a 72 + standard interface for performing various operations related to power 73 + management. 58 74 59 - Note that when the system is entering a global suspend state, all devices will 60 - be placed into D3 and when resuming, all devices will be placed into D0. 61 - However, when the system is running, other state transitions are possible. 75 + The implementation of the PCI PM Spec is optional for conventional PCI devices, 76 + but it is mandatory for PCI Express devices. If a device supports the PCI PM 77 + Spec, it has an 8 byte power management capability field in its PCI 78 + configuration space. This field is used to describe and control the standard 79 + features related to the native PCI power management. 62 80 63 - 2. How The PCI Subsystem Handles Power Management 64 - ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 81 + The PCI PM Spec defines 4 operating states for devices (D0-D3) and for buses 82 + (B0-B3). The higher the number, the less power is drawn by the device or bus 83 + in that state. However, the higher the number, the longer the latency for 84 + the device or bus to return to the full-power state (D0 or B0, respectively). 65 85 66 - The PCI suspend/resume functionality is accessed indirectly via the Power 67 - Management subsystem. At boot, the PCI driver registers a power management 68 - callback with that layer. 
Upon entering a suspend state, the PM layer iterates 69 - through all of its registered callbacks. This currently takes place only during 70 - APM state transitions. 86 + There are two variants of the D3 state defined by the specification. The first 87 + one is D3hot, referred to as the software accessible D3, because devices can be 88 + programmed to go into it. The second one, D3cold, is the state that PCI devices 89 + are in when the supply voltage (Vcc) is removed from them. It is not possible 90 + to program a PCI device to go into D3cold, although there may be a programmable 91 + interface for putting the bus the device is on into a state in which Vcc is 92 + removed from all devices on the bus. 71 93 72 - Upon going to sleep, the PCI subsystem walks its device tree twice. Both times, 73 - it does a depth first walk of the device tree. The first walk saves each of the 74 - device's state and checks for devices that will prevent the system from entering 75 - a global power state. The next walk then places the devices in a low power 94 + PCI bus power management, however, is not supported by the Linux kernel at the 95 + time of this writing and therefore it is not covered by this document. 96 + 97 + Note that every PCI device can be in the full-power state (D0) or in D3cold, 98 + regardless of whether or not it implements the PCI PM Spec. In addition to 99 + that, if the PCI PM Spec is implemented by the device, it must support D3hot 100 + as well as D0. The support for the D1 and D2 power states is optional. 101 + 102 + PCI devices supporting the PCI PM Spec can be programmed to go to any of the 103 + supported low-power states (except for D3cold). While in D1-D3hot the 104 + standard configuration registers of the device must be accessible to software 105 + (i.e. the device is required to respond to PCI configuration accesses), although 106 + its I/O and memory spaces are then disabled. This allows the device to be 107 + programmatically put into D0. 
Thus the kernel can switch the device back and 108 + forth between D0 and the supported low-power states (except for D3cold) and the 109 + possible power state transitions the device can undergo are the following: 110 + 111 + +----------------------------+ 112 + | Current State | New State | 113 + +----------------------------+ 114 + | D0 | D1, D2, D3 | 115 + +----------------------------+ 116 + | D1 | D2, D3 | 117 + +----------------------------+ 118 + | D2 | D3 | 119 + +----------------------------+ 120 + | D1, D2, D3 | D0 | 121 + +----------------------------+ 122 + 123 + The transition from D3cold to D0 occurs when the supply voltage is provided to 124 + the device (i.e. power is restored). In that case the device returns to D0 with 125 + a full power-on reset sequence and the power-on defaults are restored to the 126 + device by hardware just as at initial power up. 127 + 128 + PCI devices supporting the PCI PM Spec can be programmed to generate PMEs 129 + while in a low-power state (D1-D3), but they are not required to be capable 130 + of generating PMEs from all supported low-power states. In particular, the 131 + capability of generating PMEs from D3cold is optional and depends on the 132 + presence of additional voltage (3.3Vaux) allowing the device to remain 133 + sufficiently active to generate a wakeup signal. 134 + 135 + 1.3. ACPI Device Power Management 136 + --------------------------------- 137 + The platform firmware support for the power management of PCI devices is 138 + system-specific. However, if the system in question is compliant with the 139 + Advanced Configuration and Power Interface (ACPI) Specification, like the 140 + majority of x86-based systems, it is supposed to implement device power 141 + management interfaces defined by the ACPI standard. 
142 + 143 + For this purpose the ACPI BIOS provides special functions called "control 144 + methods" that may be executed by the kernel to perform specific tasks, such as 145 + putting a device into a low-power state. These control methods are encoded 146 + using special byte-code language called the ACPI Machine Language (AML) and 147 + stored in the machine's BIOS. The kernel loads them from the BIOS and executes 148 + them as needed using an AML interpreter that translates the AML byte code into 149 + computations and memory or I/O space accesses. This way, in theory, a BIOS 150 + writer can provide the kernel with a means to perform actions depending 151 + on the system design in a system-specific fashion. 152 + 153 + ACPI control methods may be divided into global control methods, that are not 154 + associated with any particular devices, and device control methods, that have 155 + to be defined separately for each device supposed to be handled with the help of 156 + the platform. This means, in particular, that ACPI device control methods can 157 + only be used to handle devices that the BIOS writer knew about in advance. The 158 + ACPI methods used for device power management fall into that category. 159 + 160 + The ACPI specification assumes that devices can be in one of four power states 161 + labeled as D0, D1, D2, and D3 that roughly correspond to the native PCI PM 162 + D0-D3 states (although the difference between D3hot and D3cold is not taken 163 + into account by ACPI). Moreover, for each power state of a device there is a 164 + set of power resources that have to be enabled for the device to be put into 165 + that state. These power resources are controlled (i.e. enabled or disabled) 166 + with the help of their own control methods, _ON and _OFF, that have to be 167 + defined individually for each of them. 
168 + 169 + To put a device into the ACPI power state Dx (where x is a number between 0 and 170 + 3 inclusive) the kernel is supposed to (1) enable the power resources required 171 + by the device in this state using their _ON control methods and (2) execute the 172 + _PSx control method defined for the device. In addition to that, if the device 173 + is going to be put into a low-power state (D1-D3) and is supposed to generate 174 + wakeup signals from that state, the _DSW (or _PSW, replaced with _DSW by ACPI 175 + 3.0) control method defined for it has to be executed before _PSx. Power 176 + resources that are not required by the device in the target power state and are 177 + not required any more by any other device should be disabled (by executing their 178 + _OFF control methods). If the current power state of the device is D3, it can 179 + only be put into D0 this way. 180 + 181 + However, quite often the power states of devices are changed during a 182 + system-wide transition into a sleep state or back into the working state. ACPI 183 + defines four system sleep states, S1, S2, S3, and S4, and denotes the system 184 + working state as S0. In general, the target system sleep (or working) state 185 + determines the highest power (lowest number) state the device can be put 186 + into and the kernel is supposed to obtain this information by executing the 187 + device's _SxD control method (where x is a number between 0 and 4 inclusive). 188 + If the device is required to wake up the system from the target sleep state, the 189 + lowest power (highest number) state it can be put into is also determined by the 190 + target state of the system. The kernel is then supposed to use the device's 191 + _SxW control method to obtain the number of that state. It also is supposed to 192 + use the device's _PRW control method to learn which power resources need to be 193 + enabled for the device to be able to generate wakeup signals. 194 + 195 + 1.4. 
Wakeup Signaling 196 + --------------------- 197 + Wakeup signals generated by PCI devices, either as native PCI PMEs, or as 198 + a result of the execution of the _DSW (or _PSW) ACPI control method before 199 + putting the device into a low-power state, have to be caught and handled as 200 + appropriate. If they are sent while the system is in the working state 201 + (ACPI S0), they should be translated into interrupts so that the kernel can 202 + put the devices generating them into the full-power state and take care of the 203 + events that triggered them. In turn, if they are sent while the system is 204 + sleeping, they should cause the system's core logic to trigger wakeup. 205 + 206 + On ACPI-based systems wakeup signals sent by conventional PCI devices are 207 + converted into ACPI General-Purpose Events (GPEs) which are hardware signals 208 + from the system core logic generated in response to various events that need to 209 + be acted upon. Every GPE is associated with one or more sources of potentially 210 + interesting events. In particular, a GPE may be associated with a PCI device 211 + capable of signaling wakeup. The information on the connections between GPEs 212 + and event sources is recorded in the system's ACPI BIOS from where it can be 213 + read by the kernel. 214 + 215 + If a PCI device known to the system's ACPI BIOS signals wakeup, the GPE 216 + associated with it (if there is one) is triggered. The GPEs associated with PCI 217 + bridges may also be triggered in response to a wakeup signal from one of the 218 + devices below the bridge (this also is the case for root bridges) and, for 219 + example, native PCI PMEs from devices unknown to the system's ACPI BIOS may be 220 + handled this way. 221 + 222 + A GPE may be triggered when the system is sleeping (i.e. 
when it is in one of 223 + the ACPI S1-S4 states), in which case system wakeup is started by its core logic 224 + (the device that was the source of the signal causing the system wakeup to occur 225 + may be identified later). The GPEs used in such situations are referred to as 226 + wakeup GPEs. 227 + 228 + Usually, however, GPEs are also triggered when the system is in the working 229 + state (ACPI S0) and in that case the system's core logic generates a System 230 + Control Interrupt (SCI) to notify the kernel of the event. Then, the SCI 231 + handler identifies the GPE that caused the interrupt to be generated which, 232 + in turn, allows the kernel to identify the source of the event (that may be 233 + a PCI device signaling wakeup). The GPEs used for notifying the kernel of 234 + events occurring while the system is in the working state are referred to as 235 + runtime GPEs. 236 + 237 + Unfortunately, there is no standard way of handling wakeup signals sent by 238 + conventional PCI devices on systems that are not ACPI-based, but there is one 239 + for PCI Express devices. Namely, the PCI Express Base Specification introduced 240 + a native mechanism for converting native PCI PMEs into interrupts generated by 241 + root ports. For conventional PCI devices native PMEs are out-of-band, so they 242 + are routed separately and they need not pass through bridges (in principle they 243 + may be routed directly to the system's core logic), but for PCI Express devices 244 + they are in-band messages that have to pass through the PCI Express hierarchy, 245 + including the root port on the path from the device to the Root Complex. Thus 246 + it was possible to introduce a mechanism by which a root port generates an 247 + interrupt whenever it receives a PME message from one of the devices below it. 
248 + The PCI Express Requester ID of the device that sent the PME message is then 249 + recorded in one of the root port's configuration registers from where it may be 250 + read by the interrupt handler allowing the device to be identified. [PME 251 + messages sent by PCI Express endpoints integrated with the Root Complex don't 252 + pass through root ports, but instead they cause a Root Complex Event Collector 253 + (if there is one) to generate interrupts.] 254 + 255 + In principle the native PCI Express PME signaling may also be used on ACPI-based 256 + systems along with the GPEs, but to use it the kernel has to ask the system's 257 + ACPI BIOS to release control of root port configuration registers. The ACPI 258 + BIOS, however, is not required to allow the kernel to control these registers 259 + and if it doesn't do that, the kernel must not modify their contents. Of course 260 + the native PCI Express PME signaling cannot be used by the kernel in that case. 261 + 262 + 263 + 2. PCI Subsystem and Device Power Management 264 + ============================================ 265 + 266 + 2.1. Device Power Management Callbacks 267 + -------------------------------------- 268 + The PCI Subsystem participates in the power management of PCI devices in a 269 + number of ways. First of all, it provides an intermediate code layer between 270 + the device power management core (PM core) and PCI device drivers. 
271 + Specifically, the pm field of the PCI subsystem's struct bus_type object, 272 + pci_bus_type, points to a struct dev_pm_ops object, pci_dev_pm_ops, containing 273 + pointers to several device power management callbacks: 274 + 275 + const struct dev_pm_ops pci_dev_pm_ops = { 276 + .prepare = pci_pm_prepare, 277 + .complete = pci_pm_complete, 278 + .suspend = pci_pm_suspend, 279 + .resume = pci_pm_resume, 280 + .freeze = pci_pm_freeze, 281 + .thaw = pci_pm_thaw, 282 + .poweroff = pci_pm_poweroff, 283 + .restore = pci_pm_restore, 284 + .suspend_noirq = pci_pm_suspend_noirq, 285 + .resume_noirq = pci_pm_resume_noirq, 286 + .freeze_noirq = pci_pm_freeze_noirq, 287 + .thaw_noirq = pci_pm_thaw_noirq, 288 + .poweroff_noirq = pci_pm_poweroff_noirq, 289 + .restore_noirq = pci_pm_restore_noirq, 290 + .runtime_suspend = pci_pm_runtime_suspend, 291 + .runtime_resume = pci_pm_runtime_resume, 292 + .runtime_idle = pci_pm_runtime_idle, 293 + }; 294 + 295 + These callbacks are executed by the PM core in various situations related to 296 + device power management and they, in turn, execute power management callbacks 297 + provided by PCI device drivers. They also perform power management operations 298 + involving some standard configuration registers of PCI devices that device 299 + drivers need not know or care about. 300 + 301 + The structure representing a PCI device, struct pci_dev, contains several fields 302 + that these callbacks operate on: 303 + 304 + struct pci_dev { 305 + ... 306 + pci_power_t current_state; /* Current operating state. */ 307 + int pm_cap; /* PM capability offset in the 308 + configuration space */ 309 + unsigned int pme_support:5; /* Bitmask of states from which PME# 310 + can be generated */ 311 + unsigned int pme_interrupt:1;/* Is native PCIe PME signaling used? 
	*/
	unsigned int	d1_support:1;	/* Low power state D1 is supported */
	unsigned int	d2_support:1;	/* Low power state D2 is supported */
	unsigned int	no_d1d2:1;	/* D1 and D2 are forbidden */
	unsigned int	wakeup_prepared:1;	/* Device prepared for wake up */
	unsigned int	d3_delay;	/* D3->D0 transition time in ms */
	...
  };

They also indirectly use some fields of the struct device that is embedded in
struct pci_dev.

2.2. Device Initialization
--------------------------
The PCI subsystem's first task related to device power management is to
prepare the device for power management and initialize the fields of struct
pci_dev used for this purpose.  This happens in two functions defined in
drivers/pci/pci.c, pci_pm_init() and platform_pci_wakeup_init().

The first of these functions checks if the device supports native PCI PM and,
if that's the case, the offset of its power management capability structure in
the configuration space is stored in the pm_cap field of the device's struct
pci_dev object.  Next, the function checks which PCI low-power states are
supported by the device and from which low-power states the device can
generate native PCI PMEs.  The power management fields of the device's struct
pci_dev and the struct device embedded in it are updated accordingly and the
generation of PMEs by the device is disabled.

The second function checks if the device can be prepared to signal wakeup with
the help of the platform firmware, such as the ACPI BIOS.  If that is the
case, the function updates the wakeup fields in the struct device embedded in
the device's struct pci_dev and uses the firmware-provided method to prevent
the device from signaling wakeup.

At this point the device is ready for power management.  For driverless
devices, however, this functionality is limited to a few basic operations
carried out during system-wide transitions to a sleep state and back to the
working state.

2.3. Runtime Device Power Management
------------------------------------
The PCI subsystem plays a vital role in the runtime power management of PCI
devices.  For this purpose it uses the general runtime power management
(runtime PM) framework described in Documentation/power/runtime_pm.txt.
Namely, it provides subsystem-level callbacks:

	pci_pm_runtime_suspend()
	pci_pm_runtime_resume()
	pci_pm_runtime_idle()

that are executed by the core runtime PM routines.  It also implements the
entire mechanics necessary for handling runtime wakeup signals from PCI
devices in low-power states, which at the time of this writing works for both
the native PCI Express PME signaling and the ACPI GPE-based wakeup signaling
described in Section 1.

First, a PCI device is put into a low-power state, or suspended, with the help
of pm_schedule_suspend() or pm_runtime_suspend(), which for PCI devices call
pci_pm_runtime_suspend() to do the actual job.  For this to work, the device's
driver has to provide a pm->runtime_suspend() callback (see below), which is
run by pci_pm_runtime_suspend() as the first action.  If the driver's callback
returns successfully, the device's standard configuration registers are saved,
the device is prepared to generate wakeup signals and, finally, it is put into
the target low-power state.

The low-power state to put the device into is the lowest-power (highest number)
state from which it can signal wakeup.  The exact method of signaling wakeup
is system-dependent and is determined by the PCI subsystem on the basis of the
reported capabilities of the device and the platform firmware.  To prepare the
device for signaling wakeup and put it into the selected low-power state, the
PCI subsystem can use the platform firmware as well as the device's native PCI
PM capabilities, if supported.

It is expected that the device driver's pm->runtime_suspend() callback will
not attempt to prepare the device for signaling wakeup or to put it into a
low-power state.  The driver ought to leave these tasks to the PCI subsystem,
which has all of the information necessary to perform them.

A suspended device is brought back into the "active" state, or resumed, with
the help of pm_request_resume() or pm_runtime_resume(), which both call
pci_pm_runtime_resume() for PCI devices.  Again, this only works if the
device's driver provides a pm->runtime_resume() callback (see below).
However, before the driver's callback is executed, pci_pm_runtime_resume()
brings the device back into the full-power state, prevents it from signaling
wakeup while in that state and restores its standard configuration registers.
Thus the driver's callback need not worry about the PCI-specific aspects of
the device resume.

Note that generally pci_pm_runtime_resume() may be called in two different
situations.  First, it may be called at the request of the device's driver,
for example if there are some data for it to process.  Second, it may be
called as a result of a wakeup signal from the device itself (this is
sometimes referred to as "remote wakeup").  Of course, for this purpose the
wakeup signal is handled in one of the ways described in Section 1 and finally
converted into a notification for the PCI subsystem after the source device
has been identified.
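As a rough sketch, the runtime PM callbacks of a hypothetical driver might
look like the fragment below (all "foo" names are made up for illustration,
and error handling is omitted); note how the PCI-specific work is left to the
PCI subsystem as described above:

	static int foo_runtime_suspend(struct device *dev)
	{
		/*
	 	 * Quiesce device-specific activity only; the PCI subsystem
		 * saves the configuration registers, prepares the device to
		 * signal wakeup and selects the low-power state after this
		 * callback returns successfully.
		 */
		return 0;
	}

	static int foo_runtime_resume(struct device *dev)
	{
		/*
		 * The device is already back in the full-power state with
		 * its standard configuration registers restored; just
		 * restart device-specific activity here.
		 */
		return 0;
	}

	static int foo_runtime_idle(struct device *dev)
	{
		/* Returning 0 allows the PM core to suspend the device. */
		return 0;
	}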
The pci_pm_runtime_idle() function, called for PCI devices by pm_runtime_idle()
and pm_request_idle(), executes the device driver's pm->runtime_idle()
callback, if defined, and if that callback doesn't return an error code (or is
not present at all), suspends the device with the help of
pm_runtime_suspend().  Sometimes pci_pm_runtime_idle() is called automatically
by the PM core (for example, it is called right after the device has just been
resumed), in which case it is expected to suspend the device if that makes
sense.  Usually, however, the PCI subsystem doesn't know whether the device
can actually be suspended, so it lets the device's driver decide by running
its pm->runtime_idle() callback.

2.4. System-Wide Power Transitions
----------------------------------
There are a few different types of system-wide power transitions, described in
Documentation/power/devices.txt.  Each of them requires devices to be handled
in a specific way and the PM core executes subsystem-level power management
callbacks for this purpose.  They are executed in phases such that each phase
involves executing the same subsystem-level callback for every device
belonging to the given subsystem before the next phase begins.  These phases
always run after tasks have been frozen.

2.4.1. System Suspend

When the system is going into a sleep state in which the contents of memory
will be preserved, such as one of the ACPI sleep states S1-S3, the phases are:

	prepare, suspend, suspend_noirq.

The following PCI bus type's callbacks, respectively, are used in these
phases:

	pci_pm_prepare()
	pci_pm_suspend()
	pci_pm_suspend_noirq()

The pci_pm_prepare() routine first puts the device into the "fully functional"
state with the help of pm_runtime_resume().  Then, it executes the device
driver's pm->prepare() callback if defined (i.e. if the driver's struct
dev_pm_ops object is present and the prepare pointer in that object is valid).

The pci_pm_suspend() routine first checks if the device's driver implements
legacy PCI suspend routines (see Section 3), in which case the driver's legacy
suspend callback is executed, if present, and its result is returned.  Next,
if the device's driver doesn't provide a struct dev_pm_ops object (containing
pointers to the driver's callbacks), pci_pm_default_suspend() is called, which
simply turns off the device's bus master capability and runs
pcibios_disable_device() to disable it, unless the device is a bridge (PCI
bridges are ignored by this routine).  After that, the device driver's
pm->suspend() callback is executed, if defined, and its result is returned if
it fails.  Finally, pci_fixup_device() is called to apply hardware suspend
quirks related to the device if necessary.

Note that the suspend phase is carried out asynchronously for PCI devices, so
the pci_pm_suspend() callback may be executed in parallel for any pair of PCI
devices that don't depend on each other in a known way (i.e. none of the paths
in the device tree from the root bridge to a leaf device contains both of
them).

The pci_pm_suspend_noirq() routine is executed after suspend_device_irqs() has
been called, which means that the device driver's interrupt handler won't be
invoked while this routine is running.  It first checks if the device's driver
implements legacy PCI suspend routines (see Section 3), in which case the
legacy late suspend routine is called and its result is returned (the standard
configuration registers of the device are saved if the driver's callback
hasn't done that).  Second, if the device driver's struct dev_pm_ops object is
not present, the device's standard configuration registers are saved and the
routine returns success.  Otherwise the device driver's pm->suspend_noirq()
callback is executed, if present, and its result is returned if it fails.
Next, if the device's standard configuration registers haven't been saved yet
(one of the device driver's callbacks executed before might have done that),
pci_pm_suspend_noirq() saves them, prepares the device to signal wakeup (if
necessary) and puts it into a low-power state.

The low-power state to put the device into is the lowest-power (highest
number) state from which it can signal wakeup while the system is in the
target sleep state.  Just like in the runtime PM case described above, the
mechanism of signaling wakeup is system-dependent and determined by the PCI
subsystem, which is also responsible for preparing the device to signal wakeup
from the system's target sleep state as appropriate.

PCI device drivers (that don't implement legacy power management callbacks)
are generally not expected to prepare devices for signaling wakeup or to put
them into low-power states.  However, if one of the driver's suspend callbacks
(pm->suspend() or pm->suspend_noirq()) saves the device's standard
configuration registers, pci_pm_suspend_noirq() will assume that the device
has been prepared to signal wakeup and put into a low-power state by the
driver (the driver is then assumed to have used the helper functions provided
by the PCI subsystem for this purpose).  PCI device drivers are not encouraged
to do that, but in some rare cases doing it in the driver may be the optimum
approach.

2.4.2. System Resume

When the system is undergoing a transition from a sleep state in which the
contents of memory have been preserved, such as one of the ACPI sleep states
S1-S3, into the working state (ACPI S0), the phases are:

	resume_noirq, resume, complete.

The following PCI bus type's callbacks, respectively, are executed in these
phases:

	pci_pm_resume_noirq()
	pci_pm_resume()
	pci_pm_complete()

The pci_pm_resume_noirq() routine first puts the device into the full-power
state, restores its standard configuration registers and applies early resume
hardware quirks related to the device, if necessary.  This is done
unconditionally, regardless of whether or not the device's driver implements
legacy PCI power management callbacks (this way all PCI devices are in the
full-power state and their standard configuration registers have been restored
when their interrupt handlers are invoked for the first time during resume,
which allows the kernel to avoid problems with the handling of shared
interrupts by drivers whose devices are still suspended).  If legacy PCI power
management callbacks (see Section 3) are implemented by the device's driver,
the legacy early resume callback is executed and its result is returned.
Otherwise, the device driver's pm->resume_noirq() callback is executed, if
defined, and its result is returned.

The pci_pm_resume() routine first checks if the device's standard
configuration registers have been restored and restores them if that's not the
case (this is only necessary in the error path of a failing suspend).
Next, resume hardware quirks related to the device are applied, if necessary,
and if the device's driver implements legacy PCI power management callbacks
(see Section 3), the driver's legacy resume callback is executed and its
result is returned.  Otherwise, the device's wakeup signaling mechanisms are
blocked and its driver's pm->resume() callback is executed, if defined (the
callback's result is then returned).

The resume phase is carried out asynchronously for PCI devices, like the
suspend phase described above, which means that if two PCI devices don't
depend on each other in a known way, the pci_pm_resume() routine may be
executed for both of them in parallel.

The pci_pm_complete() routine only executes the device driver's pm->complete()
callback, if defined.

2.4.3. System Hibernation

System hibernation is more complicated than system suspend, because it
requires a system image to be created and written into a persistent storage
medium.  The image is created atomically and all devices are quiesced, or
frozen, before that happens.

The freezing of devices is carried out after enough memory has been freed (at
the time of this writing the image creation requires at least 50% of system
RAM to be free) in the following three phases:

	prepare, freeze, freeze_noirq

that correspond to the PCI bus type's callbacks:

	pci_pm_prepare()
	pci_pm_freeze()
	pci_pm_freeze_noirq()

This means that the prepare phase is exactly the same as for system suspend.
The other two phases, however, are different.

The pci_pm_freeze() routine is quite similar to pci_pm_suspend(), but it runs
the device driver's pm->freeze() callback, if defined, instead of
pm->suspend(), and it doesn't apply the suspend-related hardware quirks.  It
is executed asynchronously for different PCI devices that don't depend on each
other in a known way.

The pci_pm_freeze_noirq() routine, in turn, is similar to
pci_pm_suspend_noirq(), but it calls the device driver's pm->freeze_noirq()
routine instead of pm->suspend_noirq().  It also doesn't attempt to prepare
the device for signaling wakeup or to put it into a low-power state.  Still,
it saves the device's standard configuration registers if they haven't been
saved by one of the driver's callbacks.

Once the image has been created, it has to be saved.  However, at this point
all devices are frozen and they cannot handle I/O, while their ability to
handle I/O is obviously necessary for the image saving.  Thus they have to be
brought back to the fully functional state and this is done in the following
phases:

	thaw_noirq, thaw, complete

using the following PCI bus type's callbacks:

	pci_pm_thaw_noirq()
	pci_pm_thaw()
	pci_pm_complete()

respectively.

The first of them, pci_pm_thaw_noirq(), is analogous to pci_pm_resume_noirq(),
but it doesn't put the device into the full power state and doesn't attempt to
restore its standard configuration registers.  It also executes the device
driver's pm->thaw_noirq() callback, if defined, instead of pm->resume_noirq().

The pci_pm_thaw() routine is similar to pci_pm_resume(), but it runs the
device driver's pm->thaw() callback instead of pm->resume().  It is executed
asynchronously for different PCI devices that don't depend on each other in a
known way.

The complete phase is the same as for system resume.

After saving the image, devices need to be powered down before the system can
enter the target sleep state (ACPI S4 for ACPI-based systems).  This is done
in three phases:

	prepare, poweroff, poweroff_noirq

where the prepare phase is exactly the same as for system suspend.  The other
two phases are analogous to the suspend and suspend_noirq phases,
respectively.  The PCI subsystem-level callbacks they correspond to,

	pci_pm_poweroff()
	pci_pm_poweroff_noirq()

work in analogy with pci_pm_suspend() and pci_pm_suspend_noirq(),
respectively, although they don't attempt to save the device's standard
configuration registers.

2.4.4. System Restore

System restore requires a hibernation image to be loaded into memory and the
pre-hibernation memory contents to be restored before the pre-hibernation
system activity can be resumed.

As described in Documentation/power/devices.txt, the hibernation image is
loaded into memory by a fresh instance of the kernel, called the boot kernel,
which in turn is loaded and run by a boot loader in the usual way.  After the
boot kernel has loaded the image, it needs to replace its own code and data
with the code and data of the "hibernated" kernel stored within the image,
called the image kernel.  For this purpose all devices are frozen just like
before creating the image during hibernation, in the

	prepare, freeze, freeze_noirq

phases described above.  However, the devices affected by these phases are
only those having drivers in the boot kernel; other devices will still be in
whatever state the boot loader left them in.

Should the restoration of the pre-hibernation memory contents fail, the boot
kernel would go through the "thawing" procedure described above, using the
thaw_noirq, thaw, and complete phases (that will only affect the devices
having drivers in the boot kernel), and then continue running normally.
If the pre-hibernation memory contents are restored successfully, which is the
usual situation, control is passed to the image kernel, which then becomes
responsible for bringing the system back to the working state.  To achieve
this, it must restore the devices' pre-hibernation functionality, which is
done much like waking up from the memory sleep state, although it involves
different phases:

	restore_noirq, restore, complete

The first two of these are analogous to the resume_noirq and resume phases
described above, respectively, and correspond to the following PCI subsystem
callbacks:

	pci_pm_restore_noirq()
	pci_pm_restore()

These callbacks work in analogy with pci_pm_resume_noirq() and
pci_pm_resume(), respectively, but they execute the device driver's
pm->restore_noirq() and pm->restore() callbacks, if available.

The complete phase is carried out in exactly the same way as during system
resume.


3. PCI Device Drivers and Power Management
==========================================

3.1. Power Management Callbacks
-------------------------------
PCI device drivers participate in power management by providing callbacks to
be executed by the PCI subsystem's power management routines described above
and by controlling the runtime power management of their devices.

At the time of this writing there are two ways to define power management
callbacks for a PCI device driver: the recommended one, based on using a
dev_pm_ops structure described in Documentation/power/devices.txt, and the
"legacy" one, in which the .suspend(), .suspend_late(), .resume_early(), and
.resume() callbacks from struct pci_driver are used.  The legacy approach,
however, doesn't allow one to define runtime power management callbacks and is
not really suitable for any new drivers.  Therefore it is not covered by this
document (refer to the source code to learn more about it).

It is recommended that all PCI device drivers define a struct dev_pm_ops
object containing pointers to power management (PM) callbacks that will be
executed by the PCI subsystem's PM routines in various circumstances.  A
pointer to the driver's struct dev_pm_ops object has to be assigned to the
driver.pm field in its struct pci_driver object.  Once that has happened, the
"legacy" PM callbacks in struct pci_driver are ignored (even if they are not
NULL).

The PM callbacks in struct dev_pm_ops are not mandatory and if they are not
defined (i.e. the respective fields of struct dev_pm_ops are unset) the PCI
subsystem will handle the device in a simplified default manner.  If they are
defined, though, they are expected to behave as described in the following
subsections.

3.1.1. prepare()

The prepare() callback is executed during system suspend, during hibernation
(when a hibernation image is about to be created), during power-off after
saving a hibernation image and during system restore, when a hibernation image
has just been loaded into memory.

This callback is only necessary if the driver's device has children that in
general may be registered at any time.  In that case the role of the prepare()
callback is to prevent new children of the device from being registered until
one of the resume_noirq(), thaw_noirq(), or restore_noirq() callbacks is run.
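Putting the recommended setup from Section 3.1 together, a driver's struct
dev_pm_ops might be wired up as sketched below (the "foo" driver and all of
its callbacks are hypothetical; a real driver would only fill in the callbacks
it needs):

	static const struct dev_pm_ops foo_pm_ops = {
		.suspend	= foo_suspend,
		.resume		= foo_resume,
		.freeze		= foo_freeze,
		.thaw		= foo_thaw,
		.poweroff	= foo_poweroff,
		.restore	= foo_restore,
	};

	static struct pci_driver foo_pci_driver = {
		.name		= "foo",
		.id_table	= foo_pci_ids,
		.probe		= foo_probe,
		.remove		= foo_remove,
		.driver		= {
			.pm	= &foo_pm_ops,
		},
	};

Once driver.pm is set this way, the "legacy" callbacks in struct pci_driver
are ignored, as noted above.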
In addition to that, the prepare() callback may carry out some operations
preparing the device to be suspended, although it should not allocate memory
(if additional memory is required to suspend the device, it has to be
preallocated earlier, for example in a suspend/hibernate notifier as described
in Documentation/power/notifiers.txt).

3.1.2. suspend()

The suspend() callback is only executed during system suspend, after prepare()
callbacks have been executed for all devices in the system.

This callback is expected to quiesce the device and prepare it to be put into
a low-power state by the PCI subsystem.  It is not required (in fact it is not
even recommended) that a PCI driver's suspend() callback save the standard
configuration registers of the device, prepare it for waking up the system, or
put it into a low-power state.  All of these operations can very well be taken
care of by the PCI subsystem, without the driver's participation.

However, in some rare cases it is convenient to carry out these operations in
a PCI driver.  Then, pci_save_state(), pci_prepare_to_sleep(), and
pci_set_power_state() should be used to save the device's standard
configuration registers, to prepare it for system wakeup (if necessary), and
to put it into a low-power state, respectively.  Moreover, if the driver calls
pci_save_state(), the PCI subsystem will not execute either
pci_prepare_to_sleep() or pci_set_power_state() for its device, so the driver
is then responsible for handling the device as appropriate.

While the suspend() callback is being executed, the driver's interrupt handler
can be invoked to handle an interrupt from the device, so all suspend-related
operations relying on the driver's ability to handle interrupts should be
carried out in this callback.

3.1.3. suspend_noirq()

The suspend_noirq() callback is only executed during system suspend, after
suspend() callbacks have been executed for all devices in the system and
after device interrupts have been disabled by the PM core.

The difference between suspend_noirq() and suspend() is that the driver's
interrupt handler will not be invoked while suspend_noirq() is running.  Thus
suspend_noirq() can carry out operations that would cause race conditions to
arise if they were performed in suspend().

3.1.4. freeze()

The freeze() callback is hibernation-specific and is executed in two
situations: during hibernation, after prepare() callbacks have been executed
for all devices in preparation for the creation of a system image, and during
restore, after a system image has been loaded into memory from persistent
storage and the prepare() callbacks have been executed for all devices.

The role of this callback is analogous to the role of the suspend() callback
described above.  In fact, they only need to be different in the rare cases
when the driver takes the responsibility for putting the device into a
low-power state.

In those cases the freeze() callback should not prepare the device for system
wakeup or put it into a low-power state.  Still, either it or freeze_noirq()
should save the device's standard configuration registers using
pci_save_state().

3.1.5. freeze_noirq()

The freeze_noirq() callback is hibernation-specific.  It is executed during
hibernation, after prepare() and freeze() callbacks have been executed for all
devices in preparation for the creation of a system image, and during restore,
after a system image has been loaded into memory and after prepare() and
freeze() callbacks have been executed for all devices.  It is always executed
after device interrupts have been disabled by the PM core.

The role of this callback is analogous to the role of the suspend_noirq()
callback described above and it is very rarely necessary to define
freeze_noirq().

The difference between freeze_noirq() and freeze() is analogous to the
difference between suspend_noirq() and suspend().

3.1.6. poweroff()

The poweroff() callback is hibernation-specific.  It is executed when the
system is about to be powered off after saving a hibernation image to
persistent storage.  prepare() callbacks are executed for all devices before
poweroff() is called.

The role of this callback is analogous to the role of the suspend() and
freeze() callbacks described above, although it does not need to save the
contents of the device's registers.  In particular, if the driver wants to put
the device into a low-power state itself instead of allowing the PCI subsystem
to do that, the poweroff() callback should use pci_prepare_to_sleep() and
pci_set_power_state() to prepare the device for system wakeup and to put it
into a low-power state, respectively, but it need not save the device's
standard configuration registers.

3.1.7. poweroff_noirq()

The poweroff_noirq() callback is hibernation-specific.  It is executed after
poweroff() callbacks have been executed for all devices in the system.

The role of this callback is analogous to the role of the suspend_noirq() and
freeze_noirq() callbacks described above, but it does not need to save the
contents of the device's registers.

The difference between poweroff_noirq() and poweroff() is analogous to the
difference between suspend_noirq() and suspend().

3.1.8. resume_noirq()

The resume_noirq() callback is only executed during system resume, after the
PM core has enabled the non-boot CPUs.  The driver's interrupt handler will
not be invoked while resume_noirq() is running, so this callback can carry out
operations that might race with the interrupt handler.

Since the PCI subsystem unconditionally puts all devices into the full power
state in the resume_noirq phase of system resume and restores their standard
configuration registers, resume_noirq() is usually not necessary.  In general
it should only be used for performing operations that would lead to race
conditions if carried out by resume().
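For the rare case mentioned in section 3.1.2, in which a driver handles the
low-power transition itself, the suspend() callback might be sketched as
follows (the "foo" driver is hypothetical and error handling is omitted):

	static int foo_suspend(struct device *dev)
	{
		struct pci_dev *pdev = to_pci_dev(dev);

		/* Device-specific quiescing goes here. */

		/*
		 * Because pci_save_state() is called here, the PCI subsystem
		 * will not run pci_prepare_to_sleep() or
		 * pci_set_power_state() for this device itself, so the
		 * driver takes over: pci_prepare_to_sleep() arms wakeup (if
		 * necessary) and puts the device into a low-power state.
		 */
		pci_save_state(pdev);
		pci_prepare_to_sleep(pdev);
		return 0;
	}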
3.1.9. resume()

The resume() callback is only executed during system resume, after
resume_noirq() callbacks have been executed for all devices in the system and
device interrupts have been enabled by the PM core.

This callback is responsible for restoring the pre-suspend configuration of
the device and bringing it back to the fully functional state.  The device
should be able to process I/O in the usual way after resume() has returned.

3.1.10. thaw_noirq()

The thaw_noirq() callback is hibernation-specific.  It is executed after a
system image has been created and the non-boot CPUs have been enabled by the
PM core, in the thaw_noirq phase of hibernation.  It also may be executed if
the loading of a hibernation image fails during system restore (it is then
executed after enabling the non-boot CPUs).  The driver's interrupt handler
will not be invoked while thaw_noirq() is running.

The role of this callback is analogous to the role of resume_noirq().  The
difference between these two callbacks is that thaw_noirq() is executed after
freeze() and freeze_noirq(), so in general it does not need to modify the
contents of the device's registers.

3.1.11. thaw()

The thaw() callback is hibernation-specific.  It is executed after
thaw_noirq() callbacks have been executed for all devices in the system and
after device interrupts have been enabled by the PM core.

This callback is responsible for restoring the pre-freeze configuration of
the device, so that it will work in the usual way after thaw() has returned.

3.1.12. restore_noirq()

The restore_noirq() callback is hibernation-specific.  It is executed in the
restore_noirq phase of hibernation, when the boot kernel has passed control to
the image kernel and the non-boot CPUs have been enabled by the image kernel's
PM core.

This callback is analogous to resume_noirq() with the exception that it cannot
make any assumptions about the previous state of the device, even if the BIOS
(or generally the platform firmware) is known to preserve that state over a
suspend-resume cycle.

For the vast majority of PCI device drivers there is no difference between
resume_noirq() and restore_noirq().

3.1.13. restore()

The restore() callback is hibernation-specific.  It is executed after
restore_noirq() callbacks have been executed for all devices in the system and
after the PM core has enabled device drivers' interrupt handlers to be
invoked.

This callback is analogous to resume(), just like restore_noirq() is analogous
to resume_noirq().  Consequently, the difference between restore_noirq() and
restore() is analogous to the difference between resume_noirq() and resume().

For the vast majority of PCI device drivers there is no difference between
resume() and restore().

3.1.14.
complete() 891 + 892 + The complete() callback is executed in the following situations: 893 + - during system resume, after resume() callbacks have been executed for all 894 + devices, 895 + - during hibernation, before saving the system image, after thaw() callbacks 896 + have been executed for all devices, 897 + - during system restore, when the system is going back to its pre-hibernation 898 + state, after restore() callbacks have been executed for all devices. 899 + It also may be executed if the loading of a hibernation image into memory fails 900 + (in that case it is run after thaw() callbacks have been executed for all 901 + devices that have drivers in the boot kernel). 902 + 903 + This callback is entirely optional, although it may be necessary if the 904 + prepare() callback performs operations that need to be reversed. 905 + 906 + 3.1.15. runtime_suspend() 907 + 908 + The runtime_suspend() callback is specific to device runtime power management 909 + (runtime PM). It is executed by the PM core's runtime PM framework when the 910 + device is about to be suspended (i.e. quiesced and put into a low-power state) 911 + at run time. 912 + 913 + This callback is responsible for freezing the device and preparing it to be 914 + put into a low-power state, but it must allow the PCI subsystem to perform all 915 + of the PCI-specific actions necessary for suspending the device. 916 + 917 + 3.1.16. runtime_resume() 918 + 919 + The runtime_resume() callback is specific to device runtime PM. It is executed 920 + by the PM core's runtime PM framework when the device is about to be resumed 921 + (i.e. put into the full-power state and programmed to process I/O normally) at 922 + run time. 923 + 924 + This callback is responsible for restoring the normal functionality of the 925 + device after it has been put into the full-power state by the PCI subsystem. 926 + The device is expected to be able to process I/O in the usual way after 927 + runtime_resume() has returned. 
928 + 929 + 3.1.17. runtime_idle() 930 + 931 + The runtime_idle() callback is specific to device runtime PM. It is executed 932 + by the PM core's runtime PM framework whenever it may be desirable to suspend 933 + the device according to the PM core's information. In particular, it is 934 + automatically executed right after runtime_resume() has returned in case the 935 + resume of the device has happened as a result of a spurious event. 936 + 937 + This callback is optional, but if it is not implemented or if it returns 0, the 938 + PCI subsystem will call pm_runtime_suspend() for the device, which in turn will 939 + cause the driver's runtime_suspend() callback to be executed. 940 + 941 + 3.1.18. Pointing Multiple Callback Pointers to One Routine 942 + 943 + Although in principle each of the callbacks described in the previous 944 + subsections can be defined as a separate function, it often is convenient to 945 + point two or more members of struct dev_pm_ops to the same routine. There are 946 + a few convenience macros that can be used for this purpose. 947 + 948 + The SIMPLE_DEV_PM_OPS macro declares a struct dev_pm_ops object with one 949 + suspend routine pointed to by the .suspend(), .freeze(), and .poweroff() 950 + members and one resume routine pointed to by the .resume(), .thaw(), and 951 + .restore() members. The other function pointers in this struct dev_pm_ops are 952 + unset. 953 + 954 + The UNIVERSAL_DEV_PM_OPS macro is similar to SIMPLE_DEV_PM_OPS, but it 955 + additionally sets the .runtime_resume() pointer to the same value as 956 + .resume() (and .thaw(), and .restore()) and the .runtime_suspend() pointer to 957 + the same value as .suspend() (and .freeze() and .poweroff()). 
958 + 959 + The SET_SYSTEM_SLEEP_PM_OPS can be used inside of a declaration of struct 960 + dev_pm_ops to indicate that one suspend routine is to be pointed to by the 961 + .suspend(), .freeze(), and .poweroff() members and one resume routine is to 962 + be pointed to by the .resume(), .thaw(), and .restore() members. 963 + 964 + 3.2. Device Runtime Power Management 965 + ------------------------------------ 966 + In addition to providing device power management callbacks PCI device drivers 967 + are responsible for controlling the runtime power management (runtime PM) of 968 + their devices. 969 + 970 + The PCI device runtime PM is optional, but it is recommended that PCI device 971 + drivers implement it at least in the cases where there is a reliable way of 972 + verifying that the device is not used (like when the network cable is detached 973 + from an Ethernet adapter or there are no devices attached to a USB controller). 974 + 975 + To support the PCI runtime PM the driver first needs to implement the 976 + runtime_suspend() and runtime_resume() callbacks. It also may need to implement 977 + the runtime_idle() callback to prevent the device from being suspended again 978 + every time right after the runtime_resume() callback has returned 979 + (alternatively, the runtime_suspend() callback will have to check if the 980 + device should really be suspended and return -EAGAIN if that is not the case). 981 + 982 + The runtime PM of PCI devices is disabled by default. It is also blocked by 983 + pci_pm_init() that runs the pm_runtime_forbid() helper function. If a PCI 984 + driver implements the runtime PM callbacks and intends to use the runtime PM 985 + framework provided by the PM core and the PCI subsystem, it should enable this 986 + feature by executing the pm_runtime_enable() helper function. However, the 987 + driver should not call the pm_runtime_allow() helper function unblocking 988 + the runtime PM of the device. 
Instead, it should allow user space or some 989 + platform-specific code to do that (user space can do it via sysfs), although 990 + once it has called pm_runtime_enable(), it must be prepared to handle the 991 + runtime PM of the device correctly as soon as pm_runtime_allow() is called 992 + (which may happen at any time). [It also is possible that user space causes 993 + pm_runtime_allow() to be called via sysfs before the driver is loaded, so in 994 + fact the driver has to be prepared to handle the runtime PM of the device as 995 + soon as it calls pm_runtime_enable().] 996 + 997 + The runtime PM framework works by processing requests to suspend or resume 998 + devices, or to check if they are idle (in which cases it is reasonable to 999 + subsequently request that they be suspended). These requests are represented 1000 + by work items put into the power management workqueue, pm_wq. Although there 1001 + are a few situations in which power management requests are automatically 1002 + queued by the PM core (for example, after processing a request to resume a 1003 + device the PM core automatically queues a request to check if the device is 1004 + idle), device drivers are generally responsible for queuing power management 1005 + requests for their devices. For this purpose they should use the runtime PM 1006 + helper functions provided by the PM core, discussed in 1007 + Documentation/power/runtime_pm.txt. 1008 + 1009 + Devices can also be suspended and resumed synchronously, without placing a 1010 + request into pm_wq. In the majority of cases this also is done by their 1011 + drivers that use helper functions provided by the PM core for this purpose. 1012 + 1013 + For more information on the runtime PM of devices refer to 1014 + Documentation/power/runtime_pm.txt. 1015 + 1016 + 1017 + 4. Resources 1018 + ============ 1019 + 1020 + PCI Local Bus Specification, Rev. 3.0 1021 + PCI Bus Power Management Interface Specification, Rev. 
1.2 1022 + Advanced Configuration and Power Interface (ACPI) Specification, Rev. 3.0b 1023 + PCI Express Base Specification, Rev. 2.0 1024 + Documentation/power/devices.txt 1025 + Documentation/power/runtime_pm.txt
+8
arch/x86/Kconfig
··· 1923 1923 bool "Support mmconfig PCI config space access" 1924 1924 depends on X86_64 && PCI && ACPI 1925 1925 1926 + config PCI_CNB20LE_QUIRK 1927 + bool "Read CNB20LE Host Bridge Windows" 1928 + depends on PCI 1929 + help 1930 + Read the PCI windows out of the CNB20LE host bridge. This allows 1931 + PCI hotplug to work on systems with the CNB20LE chipset which do 1932 + not have ACPI. 1933 + 1926 1934 config DMAR 1927 1935 bool "Support for DMA Remapping Devices (EXPERIMENTAL)" 1928 1936 depends on PCI_MSI && ACPI && EXPERIMENTAL
+1 -1
arch/x86/include/asm/pci_x86.h
··· 83 83 84 84 extern unsigned int pcibios_irq_mask; 85 85 86 - extern spinlock_t pci_config_lock; 86 + extern raw_spinlock_t pci_config_lock; 87 87 88 88 extern int (*pcibios_enable_irq)(struct pci_dev *dev); 89 89 extern void (*pcibios_disable_irq)(struct pci_dev *dev);
+2
arch/x86/pci/Makefile
··· 18 18 obj-y += common.o early.o 19 19 obj-y += amd_bus.o bus_numa.o 20 20 21 + obj-$(CONFIG_PCI_CNB20LE_QUIRK) += broadcom_bus.o 22 + 21 23 ifeq ($(CONFIG_PCI_DEBUG),y) 22 24 EXTRA_CFLAGS += -DDEBUG 23 25 endif
+101
arch/x86/pci/broadcom_bus.c
··· 1 + /* 2 + * Read address ranges from a Broadcom CNB20LE Host Bridge 3 + * 4 + * Copyright (c) 2010 Ira W. Snyder <iws@ovro.caltech.edu> 5 + * 6 + * This program is free software; you can redistribute it and/or modify it 7 + * under the terms of the GNU General Public License as published by the 8 + * Free Software Foundation; either version 2 of the License, or (at your 9 + * option) any later version. 10 + */ 11 + 12 + #include <linux/delay.h> 13 + #include <linux/dmi.h> 14 + #include <linux/pci.h> 15 + #include <linux/init.h> 16 + #include <asm/pci_x86.h> 17 + 18 + #include "bus_numa.h" 19 + 20 + static void __devinit cnb20le_res(struct pci_dev *dev) 21 + { 22 + struct pci_root_info *info; 23 + struct resource res; 24 + u16 word1, word2; 25 + u8 fbus, lbus; 26 + int i; 27 + 28 + /* 29 + * The x86_pci_root_bus_res_quirks() function already refuses to use 30 + * this information if ACPI _CRS was used. Therefore, we don't bother 31 + * checking if ACPI is enabled, and just generate the information 32 + * for both the ACPI _CRS and no ACPI cases. 33 + */ 34 + 35 + info = &pci_root_info[pci_root_num]; 36 + pci_root_num++; 37 + 38 + /* read the PCI bus numbers */ 39 + pci_read_config_byte(dev, 0x44, &fbus); 40 + pci_read_config_byte(dev, 0x45, &lbus); 41 + info->bus_min = fbus; 42 + info->bus_max = lbus; 43 + 44 + /* 45 + * Add the legacy IDE ports on bus 0 46 + * 47 + * These do not exist anywhere in the bridge registers, AFAICT. I do 48 + * not have the datasheet, so this is the best I can do. 
49 + */ 50 + if (fbus == 0) { 51 + update_res(info, 0x01f0, 0x01f7, IORESOURCE_IO, 0); 52 + update_res(info, 0x03f6, 0x03f6, IORESOURCE_IO, 0); 53 + update_res(info, 0x0170, 0x0177, IORESOURCE_IO, 0); 54 + update_res(info, 0x0376, 0x0376, IORESOURCE_IO, 0); 55 + update_res(info, 0xffa0, 0xffaf, IORESOURCE_IO, 0); 56 + } 57 + 58 + /* read the non-prefetchable memory window */ 59 + pci_read_config_word(dev, 0xc0, &word1); 60 + pci_read_config_word(dev, 0xc2, &word2); 61 + if (word1 != word2) { 62 + res.start = (word1 << 16) | 0x0000; 63 + res.end = (word2 << 16) | 0xffff; 64 + res.flags = IORESOURCE_MEM; 65 + update_res(info, res.start, res.end, res.flags, 0); 66 + } 67 + 68 + /* read the prefetchable memory window */ 69 + pci_read_config_word(dev, 0xc4, &word1); 70 + pci_read_config_word(dev, 0xc6, &word2); 71 + if (word1 != word2) { 72 + res.start = (word1 << 16) | 0x0000; 73 + res.end = (word2 << 16) | 0xffff; 74 + res.flags = IORESOURCE_MEM | IORESOURCE_PREFETCH; 75 + update_res(info, res.start, res.end, res.flags, 0); 76 + } 77 + 78 + /* read the IO port window */ 79 + pci_read_config_word(dev, 0xd0, &word1); 80 + pci_read_config_word(dev, 0xd2, &word2); 81 + if (word1 != word2) { 82 + res.start = word1; 83 + res.end = word2; 84 + res.flags = IORESOURCE_IO; 85 + update_res(info, res.start, res.end, res.flags, 0); 86 + } 87 + 88 + /* print information about this host bridge */ 89 + res.start = fbus; 90 + res.end = lbus; 91 + res.flags = IORESOURCE_BUS; 92 + dev_info(&dev->dev, "CNB20LE PCI Host Bridge (domain %04x %pR)\n", 93 + pci_domain_nr(dev->bus), &res); 94 + 95 + for (i = 0; i < info->res_num; i++) 96 + dev_info(&dev->dev, "host bridge window %pR\n", &info->res[i]); 97 + } 98 + 99 + DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_SERVERWORKS, PCI_DEVICE_ID_SERVERWORKS_LE, 100 + cnb20le_res); 101 +
+1 -1
arch/x86/pci/common.c
··· 76 76 * This interrupt-safe spinlock protects all accesses to PCI 77 77 * configuration space. 78 78 */ 79 - DEFINE_SPINLOCK(pci_config_lock); 79 + DEFINE_RAW_SPINLOCK(pci_config_lock); 80 80 81 81 static int __devinit can_skip_ioresource_align(const struct dmi_system_id *d) 82 82 {
+8 -8
arch/x86/pci/direct.c
··· 27 27 return -EINVAL; 28 28 } 29 29 30 - spin_lock_irqsave(&pci_config_lock, flags); 30 + raw_spin_lock_irqsave(&pci_config_lock, flags); 31 31 32 32 outl(PCI_CONF1_ADDRESS(bus, devfn, reg), 0xCF8); 33 33 ··· 43 43 break; 44 44 } 45 45 46 - spin_unlock_irqrestore(&pci_config_lock, flags); 46 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 47 47 48 48 return 0; 49 49 } ··· 56 56 if ((bus > 255) || (devfn > 255) || (reg > 4095)) 57 57 return -EINVAL; 58 58 59 - spin_lock_irqsave(&pci_config_lock, flags); 59 + raw_spin_lock_irqsave(&pci_config_lock, flags); 60 60 61 61 outl(PCI_CONF1_ADDRESS(bus, devfn, reg), 0xCF8); 62 62 ··· 72 72 break; 73 73 } 74 74 75 - spin_unlock_irqrestore(&pci_config_lock, flags); 75 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 76 76 77 77 return 0; 78 78 } ··· 108 108 if (dev & 0x10) 109 109 return PCIBIOS_DEVICE_NOT_FOUND; 110 110 111 - spin_lock_irqsave(&pci_config_lock, flags); 111 + raw_spin_lock_irqsave(&pci_config_lock, flags); 112 112 113 113 outb((u8)(0xF0 | (fn << 1)), 0xCF8); 114 114 outb((u8)bus, 0xCFA); ··· 127 127 128 128 outb(0, 0xCF8); 129 129 130 - spin_unlock_irqrestore(&pci_config_lock, flags); 130 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 131 131 132 132 return 0; 133 133 } ··· 147 147 if (dev & 0x10) 148 148 return PCIBIOS_DEVICE_NOT_FOUND; 149 149 150 - spin_lock_irqsave(&pci_config_lock, flags); 150 + raw_spin_lock_irqsave(&pci_config_lock, flags); 151 151 152 152 outb((u8)(0xF0 | (fn << 1)), 0xCF8); 153 153 outb((u8)bus, 0xCFA); ··· 166 166 167 167 outb(0, 0xCF8); 168 168 169 - spin_unlock_irqrestore(&pci_config_lock, flags); 169 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 170 170 171 171 return 0; 172 172 }
+7 -2
arch/x86/pci/irq.c
··· 589 589 case PCI_DEVICE_ID_INTEL_ICH10_1: 590 590 case PCI_DEVICE_ID_INTEL_ICH10_2: 591 591 case PCI_DEVICE_ID_INTEL_ICH10_3: 592 - case PCI_DEVICE_ID_INTEL_CPT_LPC1: 593 - case PCI_DEVICE_ID_INTEL_CPT_LPC2: 594 592 r->name = "PIIX/ICH"; 595 593 r->get = pirq_piix_get; 596 594 r->set = pirq_piix_set; ··· 603 605 return 1; 604 606 } 605 607 608 + if ((device >= PCI_DEVICE_ID_INTEL_CPT_LPC_MIN) && 609 + (device <= PCI_DEVICE_ID_INTEL_CPT_LPC_MAX)) { 610 + r->name = "PIIX/ICH"; 611 + r->get = pirq_piix_get; 612 + r->set = pirq_piix_set; 613 + return 1; 614 + } 606 615 return 0; 607 616 } 608 617
+9 -8
arch/x86/pci/mmconfig-shared.c
··· 483 483 list_for_each_entry(cfg, &pci_mmcfg_list, list) { 484 484 int valid = 0; 485 485 486 - if (!early && !acpi_disabled) 486 + if (!early && !acpi_disabled) { 487 487 valid = is_mmconf_reserved(is_acpi_reserved, cfg, 0); 488 488 489 - if (valid) 490 - continue; 491 - 492 - if (!early) 493 - printk(KERN_ERR FW_BUG PREFIX 494 - "MMCONFIG at %pR not reserved in " 495 - "ACPI motherboard resources\n", &cfg->res); 489 + if (valid) 490 + continue; 491 + else 492 + printk(KERN_ERR FW_BUG PREFIX 493 + "MMCONFIG at %pR not reserved in " 494 + "ACPI motherboard resources\n", 495 + &cfg->res); 496 + } 496 497 497 498 /* Don't try to do this check unless configuration 498 499 type 1 is available. how about type 2 ?*/
+4 -4
arch/x86/pci/mmconfig_32.c
··· 64 64 if (!base) 65 65 goto err; 66 66 67 - spin_lock_irqsave(&pci_config_lock, flags); 67 + raw_spin_lock_irqsave(&pci_config_lock, flags); 68 68 69 69 pci_exp_set_dev_base(base, bus, devfn); 70 70 ··· 79 79 *value = mmio_config_readl(mmcfg_virt_addr + reg); 80 80 break; 81 81 } 82 - spin_unlock_irqrestore(&pci_config_lock, flags); 82 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 83 83 84 84 return 0; 85 85 } ··· 97 97 if (!base) 98 98 return -EINVAL; 99 99 100 - spin_lock_irqsave(&pci_config_lock, flags); 100 + raw_spin_lock_irqsave(&pci_config_lock, flags); 101 101 102 102 pci_exp_set_dev_base(base, bus, devfn); 103 103 ··· 112 112 mmio_config_writel(mmcfg_virt_addr + reg, value); 113 113 break; 114 114 } 115 - spin_unlock_irqrestore(&pci_config_lock, flags); 115 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 116 116 117 117 return 0; 118 118 }
+4 -4
arch/x86/pci/numaq_32.c
··· 37 37 if (!value || (bus >= MAX_MP_BUSSES) || (devfn > 255) || (reg > 255)) 38 38 return -EINVAL; 39 39 40 - spin_lock_irqsave(&pci_config_lock, flags); 40 + raw_spin_lock_irqsave(&pci_config_lock, flags); 41 41 42 42 write_cf8(bus, devfn, reg); 43 43 ··· 62 62 break; 63 63 } 64 64 65 - spin_unlock_irqrestore(&pci_config_lock, flags); 65 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 66 66 67 67 return 0; 68 68 } ··· 76 76 if ((bus >= MAX_MP_BUSSES) || (devfn > 255) || (reg > 255)) 77 77 return -EINVAL; 78 78 79 - spin_lock_irqsave(&pci_config_lock, flags); 79 + raw_spin_lock_irqsave(&pci_config_lock, flags); 80 80 81 81 write_cf8(bus, devfn, reg); 82 82 ··· 101 101 break; 102 102 } 103 103 104 - spin_unlock_irqrestore(&pci_config_lock, flags); 104 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 105 105 106 106 return 0; 107 107 }
+4 -4
arch/x86/pci/pcbios.c
··· 162 162 if (!value || (bus > 255) || (devfn > 255) || (reg > 255)) 163 163 return -EINVAL; 164 164 165 - spin_lock_irqsave(&pci_config_lock, flags); 165 + raw_spin_lock_irqsave(&pci_config_lock, flags); 166 166 167 167 switch (len) { 168 168 case 1: ··· 213 213 break; 214 214 } 215 215 216 - spin_unlock_irqrestore(&pci_config_lock, flags); 216 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 217 217 218 218 return (int)((result & 0xff00) >> 8); 219 219 } ··· 228 228 if ((bus > 255) || (devfn > 255) || (reg > 255)) 229 229 return -EINVAL; 230 230 231 - spin_lock_irqsave(&pci_config_lock, flags); 231 + raw_spin_lock_irqsave(&pci_config_lock, flags); 232 232 233 233 switch (len) { 234 234 case 1: ··· 269 269 break; 270 270 } 271 271 272 - spin_unlock_irqrestore(&pci_config_lock, flags); 272 + raw_spin_unlock_irqrestore(&pci_config_lock, flags); 273 273 274 274 return (int)((result & 0xff00) >> 8); 275 275 }
+1 -1
drivers/edac/amd76x_edac.c
··· 294 294 { 295 295 debugf0("%s()\n", __func__); 296 296 297 - /* don't need to call pci_device_enable() */ 297 + /* don't need to call pci_enable_device() */ 298 298 return amd76x_probe1(pdev, ent->driver_data); 299 299 } 300 300
+1 -1
drivers/edac/i82443bxgx_edac.c
··· 354 354 355 355 debugf0("MC: " __FILE__ ": %s()\n", __func__); 356 356 357 - /* don't need to call pci_device_enable() */ 357 + /* don't need to call pci_enable_device() */ 358 358 rc = i82443bxgx_edacmc_probe1(pdev, ent->driver_data); 359 359 360 360 if (mci_pdev == NULL)
+1 -1
drivers/edac/r82600_edac.c
··· 354 354 { 355 355 debugf0("%s()\n", __func__); 356 356 357 - /* don't need to call pci_device_enable() */ 357 + /* don't need to call pci_enable_device() */ 358 358 return r82600_probe1(pdev, ent->driver_data); 359 359 } 360 360
+1 -1
drivers/pci/Kconfig
··· 19 19 by using the 'pci=nomsi' option. This disables MSI for the 20 20 entire system. 21 21 22 - If you don't know what to do here, say N. 22 + If you don't know what to do here, say Y. 23 23 24 24 config PCI_DEBUG 25 25 bool "PCI Debugging"
+23 -18
drivers/pci/access.c
··· 13 13 * configuration space. 14 14 */ 15 15 16 - static DEFINE_SPINLOCK(pci_lock); 16 + static DEFINE_RAW_SPINLOCK(pci_lock); 17 17 18 18 /* 19 19 * Wrappers for all PCI configuration access functions. They just check ··· 33 33 unsigned long flags; \ 34 34 u32 data = 0; \ 35 35 if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ 36 - spin_lock_irqsave(&pci_lock, flags); \ 36 + raw_spin_lock_irqsave(&pci_lock, flags); \ 37 37 res = bus->ops->read(bus, devfn, pos, len, &data); \ 38 38 *value = (type)data; \ 39 - spin_unlock_irqrestore(&pci_lock, flags); \ 39 + raw_spin_unlock_irqrestore(&pci_lock, flags); \ 40 40 return res; \ 41 41 } 42 42 ··· 47 47 int res; \ 48 48 unsigned long flags; \ 49 49 if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ 50 - spin_lock_irqsave(&pci_lock, flags); \ 50 + raw_spin_lock_irqsave(&pci_lock, flags); \ 51 51 res = bus->ops->write(bus, devfn, pos, len, value); \ 52 - spin_unlock_irqrestore(&pci_lock, flags); \ 52 + raw_spin_unlock_irqrestore(&pci_lock, flags); \ 53 53 return res; \ 54 54 } 55 55 ··· 79 79 struct pci_ops *old_ops; 80 80 unsigned long flags; 81 81 82 - spin_lock_irqsave(&pci_lock, flags); 82 + raw_spin_lock_irqsave(&pci_lock, flags); 83 83 old_ops = bus->ops; 84 84 bus->ops = ops; 85 - spin_unlock_irqrestore(&pci_lock, flags); 85 + raw_spin_unlock_irqrestore(&pci_lock, flags); 86 86 return old_ops; 87 87 } 88 88 EXPORT_SYMBOL(pci_bus_set_ops); ··· 136 136 __add_wait_queue(&pci_ucfg_wait, &wait); 137 137 do { 138 138 set_current_state(TASK_UNINTERRUPTIBLE); 139 - spin_unlock_irq(&pci_lock); 139 + raw_spin_unlock_irq(&pci_lock); 140 140 schedule(); 141 - spin_lock_irq(&pci_lock); 141 + raw_spin_lock_irq(&pci_lock); 142 142 } while (dev->block_ucfg_access); 143 143 __remove_wait_queue(&pci_ucfg_wait, &wait); 144 144 } ··· 150 150 int ret = 0; \ 151 151 u32 data = -1; \ 152 152 if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ 153 - spin_lock_irq(&pci_lock); \ 153 + 
raw_spin_lock_irq(&pci_lock); \ 154 154 if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev); \ 155 155 ret = dev->bus->ops->read(dev->bus, dev->devfn, \ 156 156 pos, sizeof(type), &data); \ 157 - spin_unlock_irq(&pci_lock); \ 157 + raw_spin_unlock_irq(&pci_lock); \ 158 158 *val = (type)data; \ 159 159 return ret; \ 160 160 } ··· 165 165 { \ 166 166 int ret = -EIO; \ 167 167 if (PCI_##size##_BAD) return PCIBIOS_BAD_REGISTER_NUMBER; \ 168 - spin_lock_irq(&pci_lock); \ 168 + raw_spin_lock_irq(&pci_lock); \ 169 169 if (unlikely(dev->block_ucfg_access)) pci_wait_ucfg(dev); \ 170 170 ret = dev->bus->ops->write(dev->bus, dev->devfn, \ 171 171 pos, sizeof(type), val); \ 172 - spin_unlock_irq(&pci_lock); \ 172 + raw_spin_unlock_irq(&pci_lock); \ 173 173 return ret; \ 174 174 } 175 175 ··· 220 220 return 0; 221 221 } 222 222 223 - if (time_after(jiffies, timeout)) 223 + if (time_after(jiffies, timeout)) { 224 + dev_printk(KERN_DEBUG, &dev->dev, 225 + "vpd r/w failed. This is likely a firmware " 226 + "bug on this device. Contact the card " 227 + "vendor for a firmware update."); 224 228 return -ETIMEDOUT; 229 + } 225 230 if (fatal_signal_pending(current)) 226 231 return -EINTR; 227 232 if (!cond_resched()) ··· 401 396 unsigned long flags; 402 397 int was_blocked; 403 398 404 - spin_lock_irqsave(&pci_lock, flags); 399 + raw_spin_lock_irqsave(&pci_lock, flags); 405 400 was_blocked = dev->block_ucfg_access; 406 401 dev->block_ucfg_access = 1; 407 - spin_unlock_irqrestore(&pci_lock, flags); 402 + raw_spin_unlock_irqrestore(&pci_lock, flags); 408 403 409 404 /* If we BUG() inside the pci_lock, we're guaranteed to hose 410 405 * the machine */ ··· 422 417 { 423 418 unsigned long flags; 424 419 425 - spin_lock_irqsave(&pci_lock, flags); 420 + raw_spin_lock_irqsave(&pci_lock, flags); 426 421 427 422 /* This indicates a problem in the caller, but we don't need 428 423 * to kill them, unlike a double-block above. 
*/ ··· 430 425 431 426 dev->block_ucfg_access = 0; 432 427 wake_up_all(&pci_ucfg_wait); 433 - spin_unlock_irqrestore(&pci_lock, flags); 428 + raw_spin_unlock_irqrestore(&pci_lock, flags); 434 429 } 435 430 EXPORT_SYMBOL_GPL(pci_unblock_user_cfg_access);
+1 -2
drivers/pci/hotplug/cpqphp_core.c
··· 1075 1075 1076 1076 /* make our own copy of the pci bus structure, 1077 1077 * as we like tweaking it a lot */ 1078 - ctrl->pci_bus = kmalloc(sizeof(*ctrl->pci_bus), GFP_KERNEL); 1078 + ctrl->pci_bus = kmemdup(pdev->bus, sizeof(*ctrl->pci_bus), GFP_KERNEL); 1079 1079 if (!ctrl->pci_bus) { 1080 1080 err("out of memory\n"); 1081 1081 rc = -ENOMEM; 1082 1082 goto err_free_ctrl; 1083 1083 } 1084 - memcpy(ctrl->pci_bus, pdev->bus, sizeof(*ctrl->pci_bus)); 1085 1084 1086 1085 ctrl->bus = pdev->bus->number; 1087 1086 ctrl->rev = pdev->revision;
+3 -14
drivers/pci/hotplug/pciehp_pci.c
··· 84 84 dev = pci_get_slot(parent, PCI_DEVFN(0, fn)); 85 85 if (!dev) 86 86 continue; 87 - if ((dev->class >> 16) == PCI_BASE_CLASS_DISPLAY) { 88 - ctrl_err(ctrl, "Cannot hot-add display device %s\n", 89 - pci_name(dev)); 90 - pci_dev_put(dev); 91 - continue; 92 - } 93 87 if ((dev->hdr_type == PCI_HEADER_TYPE_BRIDGE) || 94 88 (dev->hdr_type == PCI_HEADER_TYPE_CARDBUS)) { 95 89 pciehp_add_bridge(dev); ··· 127 133 presence = 0; 128 134 129 135 for (j = 0; j < 8; j++) { 130 - struct pci_dev* temp = pci_get_slot(parent, PCI_DEVFN(0, j)); 136 + struct pci_dev *temp = pci_get_slot(parent, PCI_DEVFN(0, j)); 131 137 if (!temp) 132 138 continue; 133 - if ((temp->class >> 16) == PCI_BASE_CLASS_DISPLAY) { 134 - ctrl_err(ctrl, "Cannot remove display device %s\n", 135 - pci_name(temp)); 136 - pci_dev_put(temp); 137 - continue; 138 - } 139 139 if (temp->hdr_type == PCI_HEADER_TYPE_BRIDGE && presence) { 140 140 pci_read_config_byte(temp, PCI_BRIDGE_CONTROL, &bctl); 141 141 if (bctl & PCI_BRIDGE_CTL_VGA) { ··· 137 149 "Cannot remove display device %s\n", 138 150 pci_name(temp)); 139 151 pci_dev_put(temp); 140 - continue; 152 + rc = EINVAL; 153 + break; 141 154 } 142 155 } 143 156 pci_remove_bus_device(temp);
(Note: the `rc = -EINVAL;` assignment in the hunk above is what the commit actually adds; renderings of this diff that show `rc = EINVAL;` have dropped the minus sign — a positive errno value here would be a bug, since pciehp_unconfigure_device() returns negative errno codes.)
+43 -1
drivers/pci/pci-sysfs.c
··· 979 979 980 980 if (val != 1) 981 981 return -EINVAL; 982 - return pci_reset_function(pdev); 982 + 983 + result = pci_reset_function(pdev); 984 + if (result < 0) 985 + return result; 986 + 987 + return count; 983 988 } 984 989 985 990 static struct device_attribute reset_attr = __ATTR(reset, 0200, NULL, reset_store); ··· 1033 1028 } 1034 1029 1035 1030 return retval; 1031 + } 1032 + 1033 + static void pci_remove_slot_links(struct pci_dev *dev) 1034 + { 1035 + char func[10]; 1036 + struct pci_slot *slot; 1037 + 1038 + sysfs_remove_link(&dev->dev.kobj, "slot"); 1039 + list_for_each_entry(slot, &dev->bus->slots, list) { 1040 + if (slot->number != PCI_SLOT(dev->devfn)) 1041 + continue; 1042 + snprintf(func, 10, "function%d", PCI_FUNC(dev->devfn)); 1043 + sysfs_remove_link(&slot->kobj, func); 1044 + } 1045 + } 1046 + 1047 + static int pci_create_slot_links(struct pci_dev *dev) 1048 + { 1049 + int result = 0; 1050 + char func[10]; 1051 + struct pci_slot *slot; 1052 + 1053 + list_for_each_entry(slot, &dev->bus->slots, list) { 1054 + if (slot->number != PCI_SLOT(dev->devfn)) 1055 + continue; 1056 + result = sysfs_create_link(&dev->dev.kobj, &slot->kobj, "slot"); 1057 + if (result) 1058 + goto out; 1059 + snprintf(func, 10, "function%d", PCI_FUNC(dev->devfn)); 1060 + result = sysfs_create_link(&slot->kobj, &dev->dev.kobj, func); 1061 + } 1062 + out: 1063 + return result; 1036 1064 } 1037 1065 1038 1066 int __must_check pci_create_sysfs_dev_files (struct pci_dev *pdev) ··· 1130 1092 if (retval) 1131 1093 goto err_vga_file; 1132 1094 1095 + pci_create_slot_links(pdev); 1096 + 1133 1097 return 0; 1134 1098 1135 1099 err_vga_file: ··· 1180 1140 1181 1141 if (!sysfs_initialized) 1182 1142 return; 1143 + 1144 + pci_remove_slot_links(pdev); 1183 1145 1184 1146 pci_remove_capabilities_sysfs(pdev); 1185 1147
+1 -3
drivers/pci/pci.c
··· 1193 1193 * anymore. This only involves disabling PCI bus-mastering, if active. 1194 1194 * 1195 1195 * Note we don't actually disable the device until all callers of 1196 - * pci_device_enable() have called pci_device_disable(). 1196 + * pci_enable_device() have called pci_disable_device(). 1197 1197 */ 1198 1198 void 1199 1199 pci_disable_device(struct pci_dev *dev) ··· 1631 1631 * let the user space enable it to wake up the system as needed. 1632 1632 */ 1633 1633 device_set_wakeup_capable(&dev->dev, true); 1634 - device_set_wakeup_enable(&dev->dev, false); 1635 1634 /* Disable the PME# generation functionality */ 1636 1635 pci_pme_active(dev, false); 1637 1636 } else { ··· 1654 1655 return; 1655 1656 1656 1657 device_set_wakeup_capable(&dev->dev, true); 1657 - device_set_wakeup_enable(&dev->dev, false); 1658 1658 platform_pci_sleep_wake(dev, false); 1659 1659 } 1660 1660
+1 -1
drivers/pci/pcie/aer/aer_inject.c
··· 168 168 target = &err->root_status; 169 169 rw1cs = 1; 170 170 break; 171 - case PCI_ERR_ROOT_COR_SRC: 171 + case PCI_ERR_ROOT_ERR_SRC: 172 172 target = &err->source_id; 173 173 break; 174 174 }
+133 -46
drivers/pci/pcie/aer/aerdrv.c
··· 72 72 pcie_aer_disable = 1; /* has priority over 'forceload' */ 73 73 } 74 74 75 + static int set_device_error_reporting(struct pci_dev *dev, void *data) 76 + { 77 + bool enable = *((bool *)data); 78 + 79 + if ((dev->pcie_type == PCI_EXP_TYPE_ROOT_PORT) || 80 + (dev->pcie_type == PCI_EXP_TYPE_UPSTREAM) || 81 + (dev->pcie_type == PCI_EXP_TYPE_DOWNSTREAM)) { 82 + if (enable) 83 + pci_enable_pcie_error_reporting(dev); 84 + else 85 + pci_disable_pcie_error_reporting(dev); 86 + } 87 + 88 + if (enable) 89 + pcie_set_ecrc_checking(dev); 90 + 91 + return 0; 92 + } 93 + 94 + /** 95 + * set_downstream_devices_error_reporting - enable/disable the error reporting bits on the root port and its downstream ports. 96 + * @dev: pointer to root port's pci_dev data structure 97 + * @enable: true = enable error reporting, false = disable error reporting. 98 + */ 99 + static void set_downstream_devices_error_reporting(struct pci_dev *dev, 100 + bool enable) 101 + { 102 + set_device_error_reporting(dev, &enable); 103 + 104 + if (!dev->subordinate) 105 + return; 106 + pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable); 107 + } 108 + 109 + /** 110 + * aer_enable_rootport - enable Root Port's interrupts when receiving messages 111 + * @rpc: pointer to a Root Port data structure 112 + * 113 + * Invoked when PCIe bus loads AER service driver. 
114 + */ 115 + static void aer_enable_rootport(struct aer_rpc *rpc) 116 + { 117 + struct pci_dev *pdev = rpc->rpd->port; 118 + int pos, aer_pos; 119 + u16 reg16; 120 + u32 reg32; 121 + 122 + pos = pci_pcie_cap(pdev); 123 + /* Clear PCIe Capability's Device Status */ 124 + pci_read_config_word(pdev, pos+PCI_EXP_DEVSTA, &reg16); 125 + pci_write_config_word(pdev, pos+PCI_EXP_DEVSTA, reg16); 126 + 127 + /* Disable system error generation in response to error messages */ 128 + pci_read_config_word(pdev, pos + PCI_EXP_RTCTL, &reg16); 129 + reg16 &= ~(SYSTEM_ERROR_INTR_ON_MESG_MASK); 130 + pci_write_config_word(pdev, pos + PCI_EXP_RTCTL, reg16); 131 + 132 + aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR); 133 + /* Clear error status */ 134 + pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, &reg32); 135 + pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, reg32); 136 + pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, &reg32); 137 + pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, reg32); 138 + pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32); 139 + pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32); 140 + 141 + /* 142 + * Enable error reporting for the root port device and downstream port 143 + * devices. 144 + */ 145 + set_downstream_devices_error_reporting(pdev, true); 146 + 147 + /* Enable Root Port's interrupt in response to error messages */ 148 + pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, &reg32); 149 + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; 150 + pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_COMMAND, reg32); 151 + } 152 + 153 + /** 154 + * aer_disable_rootport - disable Root Port's interrupts when receiving messages 155 + * @rpc: pointer to a Root Port data structure 156 + * 157 + * Invoked when PCIe bus unloads AER service driver. 
158 + */ 159 + static void aer_disable_rootport(struct aer_rpc *rpc) 160 + { 161 + struct pci_dev *pdev = rpc->rpd->port; 162 + u32 reg32; 163 + int pos; 164 + 165 + /* 166 + * Disable error reporting for the root port device and downstream port 167 + * devices. 168 + */ 169 + set_downstream_devices_error_reporting(pdev, false); 170 + 171 + pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR); 172 + /* Disable Root's interrupt in response to error messages */ 173 + pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, &reg32); 174 + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; 175 + pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, reg32); 176 + 177 + /* Clear Root's error status reg */ 178 + pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, &reg32); 179 + pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32); 180 + } 181 + 75 182 /** 76 183 * aer_irq - Root Port's ISR 77 184 * @irq: IRQ assigned to Root Port 78 185 * @context: pointer to Root Port data structure 79 186 * 80 187 * Invoked when Root Port detects AER messages. 81 - **/ 188 + */ 82 189 irqreturn_t aer_irq(int irq, void *context) 83 190 { 84 191 unsigned int status, id; ··· 204 97 205 98 /* Read error status */ 206 99 pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, &status); 207 - if (!(status & ROOT_ERR_STATUS_MASKS)) { 100 + if (!(status & (PCI_ERR_ROOT_UNCOR_RCV|PCI_ERR_ROOT_COR_RCV))) { 208 101 spin_unlock_irqrestore(&rpc->e_lock, flags); 209 102 return IRQ_NONE; 210 103 } 211 104 212 105 /* Read error source and clear error status */ 213 - pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_COR_SRC, &id); 106 + pci_read_config_dword(pdev->port, pos + PCI_ERR_ROOT_ERR_SRC, &id); 214 107 pci_write_config_dword(pdev->port, pos + PCI_ERR_ROOT_STATUS, status); 215 108 216 109 /* Store error source for later DPC handler */ ··· 242 135 * @dev: pointer to the pcie_dev data structure 243 136 * 244 137 * Invoked when Root Port's AER service is loaded. 
245 - **/ 138 + */ 246 139 static struct aer_rpc *aer_alloc_rpc(struct pcie_device *dev) 247 140 { 248 141 struct aer_rpc *rpc; ··· 251 144 if (!rpc) 252 145 return NULL; 253 146 254 - /* 255 - * Initialize Root lock access, e_lock, to Root Error Status Reg, 256 - * Root Error ID Reg, and Root error producer/consumer index. 257 - */ 147 + /* Initialize Root lock access, e_lock, to Root Error Status Reg */ 258 148 spin_lock_init(&rpc->e_lock); 259 149 260 150 rpc->rpd = dev; 261 151 INIT_WORK(&rpc->dpc_handler, aer_isr); 262 - rpc->prod_idx = rpc->cons_idx = 0; 263 152 mutex_init(&rpc->rpc_mutex); 264 153 init_waitqueue_head(&rpc->wait_release); 265 154 ··· 270 167 * @dev: pointer to the pcie_dev data structure 271 168 * 272 169 * Invoked when PCI Express bus unloads or AER probe fails. 273 - **/ 170 + */ 274 171 static void aer_remove(struct pcie_device *dev) 275 172 { 276 173 struct aer_rpc *rpc = get_service_data(dev); ··· 282 179 283 180 wait_event(rpc->wait_release, rpc->prod_idx == rpc->cons_idx); 284 181 285 - aer_delete_rootport(rpc); 182 + aer_disable_rootport(rpc); 183 + kfree(rpc); 286 184 set_service_data(dev, NULL); 287 185 } 288 186 } ··· 294 190 * @id: pointer to the service id data structure 295 191 * 296 192 * Invoked when PCI Express bus loads AER service driver. 297 - **/ 193 + */ 298 194 static int __devinit aer_probe(struct pcie_device *dev) 299 195 { 300 196 int status; ··· 334 230 * @dev: pointer to Root Port's pci_dev data structure 335 231 * 336 232 * Invoked by Port Bus driver when performing link reset at Root Port. 
337 - **/ 233 + */ 338 234 static pci_ers_result_t aer_root_reset(struct pci_dev *dev) 339 235 { 340 - u16 p2p_ctrl; 341 - u32 status; 236 + u32 reg32; 342 237 int pos; 343 238 344 239 pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR); 345 240 346 241 /* Disable Root's interrupt in response to error messages */ 347 - pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, 0); 242 + pci_read_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, &reg32); 243 + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; 244 + pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, reg32); 348 245 349 - /* Assert Secondary Bus Reset */ 350 - pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &p2p_ctrl); 351 - p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET; 352 - pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl); 353 - 354 - /* 355 - * we should send hot reset message for 2ms to allow it time to 356 - * propogate to all downstream ports 357 - */ 358 - msleep(2); 359 - 360 - /* De-assert Secondary Bus Reset */ 361 - p2p_ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET; 362 - pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl); 363 - 364 - /* 365 - * System software must wait for at least 100ms from the end 366 - * of a reset of one or more device before it is permitted 367 - * to issue Configuration Requests to those devices. 
368 - */ 369 - msleep(200); 246 + aer_do_secondary_bus_reset(dev); 370 247 dev_printk(KERN_DEBUG, &dev->dev, "Root Port link has been reset\n"); 371 248 249 + /* Clear Root Error Status */ 250 + pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &reg32); 251 + pci_write_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, reg32); 252 + 372 253 /* Enable Root Port's interrupt in response to error messages */ 373 - pci_read_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, &status); 374 - pci_write_config_dword(dev, pos + PCI_ERR_ROOT_STATUS, status); 375 - pci_write_config_dword(dev, 376 - pos + PCI_ERR_ROOT_COMMAND, 377 - ROOT_PORT_INTR_ON_MESG_MASK); 254 + pci_read_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, &reg32); 255 + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; 256 + pci_write_config_dword(dev, pos + PCI_ERR_ROOT_COMMAND, reg32); 378 257 379 258 return PCI_ERS_RESULT_RECOVERED; 380 259 } ··· 368 281 * @error: error severity being notified by port bus 369 282 * 370 283 * Invoked by Port Bus driver during error recovery. 371 - **/ 284 + */ 372 285 static pci_ers_result_t aer_error_detected(struct pci_dev *dev, 373 286 enum pci_channel_state error) 374 287 { ··· 381 294 * @dev: pointer to Root Port's pci_dev data structure 382 295 * 383 296 * Invoked by Port Bus driver during nonfatal recovery. 384 - **/ 297 + */ 385 298 static void aer_error_resume(struct pci_dev *dev) 386 299 { 387 300 int pos; ··· 408 321 * aer_service_init - register AER root service driver 409 322 * 410 323 * Invoked when AER root service driver is loaded. 411 - **/ 324 + */ 412 325 static int __init aer_service_init(void) 413 326 { 414 327 if (pcie_aer_disable) ··· 422 335 * aer_service_exit - unregister AER root service driver 423 336 * 424 337 * Invoked when AER root service driver is unloaded. 425 - **/ 338 + */ 426 339 static void __exit aer_service_exit(void) 427 340 { 428 341 pcie_port_service_unregister(&aerdriver);
+1 -5
drivers/pci/pcie/aer/aerdrv.h
··· 17 17 #define AER_FATAL 1 18 18 #define AER_CORRECTABLE 2 19 19 20 - /* Root Error Status Register Bits */ 21 - #define ROOT_ERR_STATUS_MASKS 0x0f 22 - 23 20 #define SYSTEM_ERROR_INTR_ON_MESG_MASK (PCI_EXP_RTCTL_SECEE| \ 24 21 PCI_EXP_RTCTL_SENFEE| \ 25 22 PCI_EXP_RTCTL_SEFEE) ··· 114 117 } 115 118 116 119 extern struct bus_type pcie_port_bus_type; 117 - extern void aer_enable_rootport(struct aer_rpc *rpc); 118 - extern void aer_delete_rootport(struct aer_rpc *rpc); 120 + extern void aer_do_secondary_bus_reset(struct pci_dev *dev); 119 121 extern int aer_init(struct pcie_device *dev); 120 122 extern void aer_isr(struct work_struct *work); 121 123 extern void aer_print_error(struct pci_dev *dev, struct aer_err_info *info);
+236 -324
drivers/pci/pcie/aer/aerdrv_core.c
··· 47 47 if (!pos) 48 48 return -EIO; 49 49 50 - pci_read_config_word(dev, pos+PCI_EXP_DEVCTL, &reg16); 51 - reg16 = reg16 | 52 - PCI_EXP_DEVCTL_CERE | 50 + pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16); 51 + reg16 |= (PCI_EXP_DEVCTL_CERE | 53 52 PCI_EXP_DEVCTL_NFERE | 54 53 PCI_EXP_DEVCTL_FERE | 55 - PCI_EXP_DEVCTL_URRE; 56 - pci_write_config_word(dev, pos+PCI_EXP_DEVCTL, reg16); 54 + PCI_EXP_DEVCTL_URRE); 55 + pci_write_config_word(dev, pos + PCI_EXP_DEVCTL, reg16); 57 56 58 57 return 0; 59 58 } ··· 70 71 if (!pos) 71 72 return -EIO; 72 73 73 - pci_read_config_word(dev, pos+PCI_EXP_DEVCTL, &reg16); 74 - reg16 = reg16 & ~(PCI_EXP_DEVCTL_CERE | 75 - PCI_EXP_DEVCTL_NFERE | 76 - PCI_EXP_DEVCTL_FERE | 77 - PCI_EXP_DEVCTL_URRE); 78 - pci_write_config_word(dev, pos+PCI_EXP_DEVCTL, reg16); 74 + pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16); 75 + reg16 &= ~(PCI_EXP_DEVCTL_CERE | 76 + PCI_EXP_DEVCTL_NFERE | 77 + PCI_EXP_DEVCTL_FERE | 78 + PCI_EXP_DEVCTL_URRE); 79 + pci_write_config_word(dev, pos + PCI_EXP_DEVCTL, reg16); 79 80 80 81 return 0; 81 82 } ··· 98 99 } 99 100 EXPORT_SYMBOL_GPL(pci_cleanup_aer_uncorrect_error_status); 100 101 101 - static int set_device_error_reporting(struct pci_dev *dev, void *data) 102 - { 103 - bool enable = *((bool *)data); 104 - 105 - if ((dev->pcie_type == PCI_EXP_TYPE_ROOT_PORT) || 106 - (dev->pcie_type == PCI_EXP_TYPE_UPSTREAM) || 107 - (dev->pcie_type == PCI_EXP_TYPE_DOWNSTREAM)) { 108 - if (enable) 109 - pci_enable_pcie_error_reporting(dev); 110 - else 111 - pci_disable_pcie_error_reporting(dev); 112 - } 113 - 114 - if (enable) 115 - pcie_set_ecrc_checking(dev); 116 - 117 - return 0; 118 - } 119 - 120 102 /** 121 - * set_downstream_devices_error_reporting - enable/disable the error reporting bits on the root port and its downstream ports. 122 - * @dev: pointer to root port's pci_dev data structure 123 - * @enable: true = enable error reporting, false = disable error reporting. 
103 + * add_error_device - list device to be handled
104 + * @e_info: pointer to error info
105 + * @dev: pointer to pci_dev to be added
124 106 */
125 - static void set_downstream_devices_error_reporting(struct pci_dev *dev,
126 - bool enable)
127 - {
128 - set_device_error_reporting(dev, &enable);
129 -
130 - if (!dev->subordinate)
131 - return;
132 - pci_walk_bus(dev->subordinate, set_device_error_reporting, &enable);
133 - }
134 -
135 - static inline int compare_device_id(struct pci_dev *dev,
136 - struct aer_err_info *e_info)
137 - {
138 - if (e_info->id == ((dev->bus->number << 8) | dev->devfn)) {
139 - /*
140 - * Device ID match
141 - */
142 - return 1;
143 - }
144 -
145 - return 0;
146 - }
147 -
148 107 static int add_error_device(struct aer_err_info *e_info, struct pci_dev *dev)
149 108 {
150 109 if (e_info->error_dev_num < AER_MAX_MULTI_ERR_DEVICES) {
151 110 e_info->dev[e_info->error_dev_num] = dev;
152 111 e_info->error_dev_num++;
153 - return 1;
112 + return 0;
154 113 }
155 -
156 - return 0;
114 + return -ENOSPC;
157 115 }
158 -
159 116
160 117 #define PCI_BUS(x) (((x) >> 8) & 0xff)
161 118
162 - static int find_device_iter(struct pci_dev *dev, void *data)
119 + /**
120 + * is_error_source - check whether the device is source of reported error
121 + * @dev: pointer to pci_dev to be checked
122 + * @e_info: pointer to reported error info
123 + */
124 + static bool is_error_source(struct pci_dev *dev, struct aer_err_info *e_info)
163 125 {
164 126 int pos;
165 - u32 status;
166 - u32 mask;
127 + u32 status, mask;
167 128 u16 reg16;
168 - int result;
169 - struct aer_err_info *e_info = (struct aer_err_info *)data;
170 129
171 130 /*
172 131 * When bus id is equal to 0, it might be a bad id
173 132 * reported by root port.
174 133 */
175 134 if (!nosourceid && (PCI_BUS(e_info->id) != 0)) {
176 - result = compare_device_id(dev, e_info);
177 - if (result)
178 - add_error_device(e_info, dev);
135 + /* Device ID match? */
136 + if (e_info->id == ((dev->bus->number << 8) | dev->devfn))
137 + return true;
179 138
180 - /*
181 - * If there is no multiple error, we stop
182 - * or continue based on the id comparing.
183 - */
139 + /* Continue id comparing if there is no multiple error */
184 140 if (!e_info->multi_error_valid)
185 - return result;
186 -
187 - /*
188 - * If there are multiple errors and id does match,
189 - * We need continue to search other devices under
190 - * the root port. Return 0 means that.
191 - */
192 - if (result)
193 - return 0;
141 + return false;
194 142 }
195 143
196 144 /*
··· 146 200 * 2) bus id is equal to 0. Some ports might lose the bus
147 201 * id of error source id;
148 202 * 3) There are multiple errors and prior id comparing fails;
149 - * We check AER status registers to find the initial reporter.
203 + * We check AER status registers to find possible reporter.
150 204 */
151 205 if (atomic_read(&dev->enable_cnt) == 0)
152 - return 0;
206 + return false;
153 207 pos = pci_pcie_cap(dev);
154 208 if (!pos)
155 - return 0;
209 + return false;
210 +
156 211 /* Check if AER is enabled */
157 - pci_read_config_word(dev, pos+PCI_EXP_DEVCTL, &reg16);
212 + pci_read_config_word(dev, pos + PCI_EXP_DEVCTL, &reg16);
158 213 if (!(reg16 & (
159 214 PCI_EXP_DEVCTL_CERE |
160 215 PCI_EXP_DEVCTL_NFERE |
161 216 PCI_EXP_DEVCTL_FERE |
162 217 PCI_EXP_DEVCTL_URRE)))
163 - return 0;
218 + return false;
164 219 pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ERR);
165 220 if (!pos)
166 - return 0;
221 + return false;
167 222
168 - status = 0;
169 - mask = 0;
223 + /* Check if error is recorded */
170 224 if (e_info->severity == AER_CORRECTABLE) {
171 225 pci_read_config_dword(dev, pos + PCI_ERR_COR_STATUS, &status);
172 226 pci_read_config_dword(dev, pos + PCI_ERR_COR_MASK, &mask);
173 - if (status & ~mask) {
174 - add_error_device(e_info, dev);
175 - goto added;
176 - }
177 227 } else {
178 228 pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_STATUS, &status);
179 229 pci_read_config_dword(dev, pos + PCI_ERR_UNCOR_MASK, &mask);
180 - if (status & ~mask) {
181 - add_error_device(e_info, dev);
182 - goto added;
183 - }
184 230 }
231 + if (status & ~mask)
232 + return true;
185 233
234 + return false;
235 + }
236 +
237 + static int find_device_iter(struct pci_dev *dev, void *data)
238 + {
239 + struct aer_err_info *e_info = (struct aer_err_info *)data;
240 +
241 + if (is_error_source(dev, e_info)) {
242 + /* List this device */
243 + if (add_error_device(e_info, dev)) {
244 + /* We cannot handle more... Stop iteration */
245 + /* TODO: Should print error message here? */
246 + return 1;
247 + }
248 +
249 + /* If there is only a single error, stop iteration */
250 + if (!e_info->multi_error_valid)
251 + return 1;
252 + }
186 253 return 0;
187 -
188 - added:
189 - if (e_info->multi_error_valid)
190 - return 0;
191 - else
192 - return 1;
193 254 }
194 255
195 256 /**
196 257 * find_source_device - search through device hierarchy for source device
197 258 * @parent: pointer to Root Port pci_dev data structure
198 - * @err_info: including detailed error information such like id
259 + * @e_info: including detailed error information such like id
199 260 *
200 - * Invoked when error is detected at the Root Port.
261 + * Return true if found.
262 + *
263 + * Invoked by DPC when error is detected at the Root Port.
264 + * Caller of this function must set id, severity, and multi_error_valid of
265 + * struct aer_err_info pointed by @e_info properly. This function must fill
266 + * e_info->error_dev_num and e_info->dev[], based on the given information.
201 267 */
202 - static void find_source_device(struct pci_dev *parent,
268 + static bool find_source_device(struct pci_dev *parent,
203 269 struct aer_err_info *e_info)
204 270 {
205 271 struct pci_dev *dev = parent;
206 272 int result;
207 273
274 + /* Must reset in this function */
275 + e_info->error_dev_num = 0;
276 +
208 277 /* Is Root Port an agent that sends error message? */
209 278 result = find_device_iter(dev, e_info);
210 279 if (result)
211 - return;
280 + return true;
212 281
213 282 pci_walk_bus(parent->subordinate, find_device_iter, e_info);
283 +
284 + if (!e_info->error_dev_num) {
285 + dev_printk(KERN_DEBUG, &parent->dev,
286 + "can't find device of ID%04x\n",
287 + e_info->id);
288 + return false;
289 + }
290 + return true;
214 291 }
215 292
216 293 static int report_error_detected(struct pci_dev *dev, void *data)
··· 372 403 return result_data.result;
373 404 }
374 405
375 - struct find_aer_service_data {
376 - struct pcie_port_service_driver *aer_driver;
377 - int is_downstream;
378 - };
406 + /**
407 + * aer_do_secondary_bus_reset - perform secondary bus reset
408 + * @dev: pointer to bridge's pci_dev data structure
409 + *
410 + * Invoked when performing link reset at Root Port or Downstream Port.
411 + */
412 + void aer_do_secondary_bus_reset(struct pci_dev *dev)
413 + {
414 + u16 p2p_ctrl;
415 +
416 + /* Assert Secondary Bus Reset */
417 + pci_read_config_word(dev, PCI_BRIDGE_CONTROL, &p2p_ctrl);
418 + p2p_ctrl |= PCI_BRIDGE_CTL_BUS_RESET;
419 + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
420 +
421 + /*
422 + * we should send hot reset message for 2ms to allow it time to
423 + * propagate to all downstream ports
424 + */
425 + msleep(2);
426 +
427 + /* De-assert Secondary Bus Reset */
428 + p2p_ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
429 + pci_write_config_word(dev, PCI_BRIDGE_CONTROL, p2p_ctrl);
430 +
431 + /*
432 + * System software must wait for at least 100ms from the end
433 + * of a reset of one or more device before it is permitted
434 + * to issue Configuration Requests to those devices.
435 + */
436 + msleep(200);
437 + }
438 +
439 + /**
440 + * default_downstream_reset_link - default reset function for Downstream Port
441 + * @dev: pointer to downstream port's pci_dev data structure
442 + *
443 + * Invoked when performing link reset at Downstream Port w/ no aer driver.
444 + */
445 + static pci_ers_result_t default_downstream_reset_link(struct pci_dev *dev)
446 + {
447 + aer_do_secondary_bus_reset(dev);
448 + dev_printk(KERN_DEBUG, &dev->dev,
449 + "Downstream Port link has been reset\n");
450 + return PCI_ERS_RESULT_RECOVERED;
451 + }
379 452
380 453 static int find_aer_service_iter(struct device *device, void *data)
381 454 {
382 - struct device_driver *driver;
383 - struct pcie_port_service_driver *service_driver;
384 - struct find_aer_service_data *result;
455 + struct pcie_port_service_driver *service_driver, **drv;
385 456
386 - result = (struct find_aer_service_data *) data;
457 + drv = (struct pcie_port_service_driver **) data;
387 458
388 - if (device->bus == &pcie_port_bus_type) {
389 - struct pcie_device *pcie = to_pcie_device(device);
390 -
391 - if (pcie->port->pcie_type == PCI_EXP_TYPE_DOWNSTREAM)
392 - result->is_downstream = 1;
393 -
394 - driver = device->driver;
395 - if (driver) {
396 - service_driver = to_service_driver(driver);
397 - if (service_driver->service == PCIE_PORT_SERVICE_AER) {
398 - result->aer_driver = service_driver;
399 - return 1;
400 - }
459 + if (device->bus == &pcie_port_bus_type && device->driver) {
460 + service_driver = to_service_driver(device->driver);
461 + if (service_driver->service == PCIE_PORT_SERVICE_AER) {
462 + *drv = service_driver;
463 + return 1;
401 464 }
402 465 }
403 466
404 467 return 0;
405 468 }
406 469
407 - static void find_aer_service(struct pci_dev *dev,
408 - struct find_aer_service_data *data)
470 + static struct pcie_port_service_driver *find_aer_service(struct pci_dev *dev)
409 471 {
410 - int retval;
411 - retval = device_for_each_child(&dev->dev, data, find_aer_service_iter);
472 + struct pcie_port_service_driver *drv = NULL;
473 +
474 + device_for_each_child(&dev->dev, &drv, find_aer_service_iter);
475 +
476 + return drv;
412 477 }
413 478
414 479 static pci_ers_result_t reset_link(struct pcie_device *aerdev,
··· 450 447 {
451 448 struct pci_dev *udev;
452 449 pci_ers_result_t status;
453 - struct find_aer_service_data data;
450 + struct pcie_port_service_driver *driver;
454 451
455 - if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE)
452 + if (dev->hdr_type & PCI_HEADER_TYPE_BRIDGE) {
453 + /* Reset this port for all subordinates */
456 454 udev = dev;
457 - else
455 + } else {
456 + /* Reset the upstream component (likely downstream port) */
458 457 udev = dev->bus->self;
459 -
460 - data.is_downstream = 0;
461 - data.aer_driver = NULL;
462 - find_aer_service(udev, &data);
463 -
464 - /*
465 - * Use the aer driver of the error agent firstly.
466 - * If it hasn't the aer driver, use the root port's
467 - */
468 - if (!data.aer_driver || !data.aer_driver->reset_link) {
469 - if (data.is_downstream &&
470 - aerdev->device.driver &&
471 - to_service_driver(aerdev->device.driver)->reset_link) {
472 - data.aer_driver =
473 - to_service_driver(aerdev->device.driver);
474 - } else {
475 - dev_printk(KERN_DEBUG, &dev->dev, "no link-reset "
476 - "support\n");
477 - return PCI_ERS_RESULT_DISCONNECT;
478 - }
479 458 }
480 459
481 - status = data.aer_driver->reset_link(udev);
460 + /* Use the aer driver of the component firstly */
461 + driver = find_aer_service(udev);
462 +
463 + if (driver && driver->reset_link) {
464 + status = driver->reset_link(udev);
465 + } else if (udev->pcie_type == PCI_EXP_TYPE_DOWNSTREAM) {
466 + status = default_downstream_reset_link(udev);
467 + } else {
468 + dev_printk(KERN_DEBUG, &dev->dev,
469 + "no link-reset support at upstream device %s\n",
470 + pci_name(udev));
471 + return PCI_ERS_RESULT_DISCONNECT;
472 + }
473 +
482 474 if (status != PCI_ERS_RESULT_RECOVERED) {
483 - dev_printk(KERN_DEBUG, &dev->dev, "link reset at upstream "
484 - "device %s failed\n", pci_name(udev));
475 + dev_printk(KERN_DEBUG, &dev->dev,
476 + "link reset at upstream device %s failed\n",
477 + pci_name(udev));
485 478 return PCI_ERS_RESULT_DISCONNECT;
486 479 }
487 480
··· 494 495 * error detected message to all downstream drivers within a hierarchy in
495 496 * question and return the returned code.
496 497 */
497 - static pci_ers_result_t do_recovery(struct pcie_device *aerdev,
498 - struct pci_dev *dev,
498 + static void do_recovery(struct pcie_device *aerdev, struct pci_dev *dev,
499 499 int severity)
500 500 {
501 501 pci_ers_result_t status, result = PCI_ERS_RESULT_RECOVERED;
··· 512 514
513 515 if (severity == AER_FATAL) {
514 516 result = reset_link(aerdev, dev);
515 - if (result != PCI_ERS_RESULT_RECOVERED) {
516 - /* TODO: Should panic here? */
517 - return result;
518 - }
517 + if (result != PCI_ERS_RESULT_RECOVERED)
518 + goto failed;
519 519 }
520 520
521 521 if (status == PCI_ERS_RESULT_CAN_RECOVER)
··· 534 538 report_slot_reset);
535 539 }
536 540
537 - if (status == PCI_ERS_RESULT_RECOVERED)
538 - broadcast_error_message(dev,
541 + if (status != PCI_ERS_RESULT_RECOVERED)
542 + goto failed;
543 +
544 + broadcast_error_message(dev,
539 545 state,
540 546 "resume",
541 547 report_resume);
542 548
543 - return status;
549 + dev_printk(KERN_DEBUG, &dev->dev,
550 + "AER driver successfully recovered\n");
551 + return;
552 +
553 + failed:
554 + /* TODO: Should kernel panic here? */
555 + dev_printk(KERN_DEBUG, &dev->dev,
556 + "AER driver didn't recover\n");
544 557 }
545 558
546 559 /**
··· 564 559 struct pci_dev *dev,
565 560 struct aer_err_info *info)
566 561 {
567 - pci_ers_result_t status = 0;
568 562 int pos;
569 563
570 564 if (info->severity == AER_CORRECTABLE) {
··· 575 571 if (pos)
576 572 pci_write_config_dword(dev, pos + PCI_ERR_COR_STATUS,
577 573 info->status);
578 - } else {
579 - status = do_recovery(aerdev, dev, info->severity);
580 - if (status == PCI_ERS_RESULT_RECOVERED) {
581 - dev_printk(KERN_DEBUG, &dev->dev, "AER driver "
582 - "successfully recovered\n");
583 - } else {
584 - /* TODO: Should kernel panic here? */
585 - dev_printk(KERN_DEBUG, &dev->dev, "AER driver didn't "
586 - "recover\n");
587 - }
588 - }
589 - }
590 -
591 - /**
592 - * aer_enable_rootport - enable Root Port's interrupts when receiving messages
593 - * @rpc: pointer to a Root Port data structure
594 - *
595 - * Invoked when PCIe bus loads AER service driver.
596 - */
597 - void aer_enable_rootport(struct aer_rpc *rpc)
598 - {
599 - struct pci_dev *pdev = rpc->rpd->port;
600 - int pos, aer_pos;
601 - u16 reg16;
602 - u32 reg32;
603 -
604 - pos = pci_pcie_cap(pdev);
605 - /* Clear PCIe Capability's Device Status */
606 - pci_read_config_word(pdev, pos+PCI_EXP_DEVSTA, &reg16);
607 - pci_write_config_word(pdev, pos+PCI_EXP_DEVSTA, reg16);
608 -
609 - /* Disable system error generation in response to error messages */
610 - pci_read_config_word(pdev, pos + PCI_EXP_RTCTL, &reg16);
611 - reg16 &= ~(SYSTEM_ERROR_INTR_ON_MESG_MASK);
612 - pci_write_config_word(pdev, pos + PCI_EXP_RTCTL, reg16);
613 -
614 - aer_pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR);
615 - /* Clear error status */
616 - pci_read_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, &reg32);
617 - pci_write_config_dword(pdev, aer_pos + PCI_ERR_ROOT_STATUS, reg32);
618 - pci_read_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, &reg32);
619 - pci_write_config_dword(pdev, aer_pos + PCI_ERR_COR_STATUS, reg32);
620 - pci_read_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, &reg32);
621 - pci_write_config_dword(pdev, aer_pos + PCI_ERR_UNCOR_STATUS, reg32);
622 -
623 - /*
624 - * Enable error reporting for the root port device and downstream port
625 - * devices.
626 - */ 627 - set_downstream_devices_error_reporting(pdev, true); 628 - 629 - /* Enable Root Port's interrupt in response to error messages */ 630 - pci_write_config_dword(pdev, 631 - aer_pos + PCI_ERR_ROOT_COMMAND, 632 - ROOT_PORT_INTR_ON_MESG_MASK); 633 - } 634 - 635 - /** 636 - * disable_root_aer - disable Root Port's interrupts when receiving messages 637 - * @rpc: pointer to a Root Port data structure 638 - * 639 - * Invoked when PCIe bus unloads AER service driver. 640 - */ 641 - static void disable_root_aer(struct aer_rpc *rpc) 642 - { 643 - struct pci_dev *pdev = rpc->rpd->port; 644 - u32 reg32; 645 - int pos; 646 - 647 - /* 648 - * Disable error reporting for the root port device and downstream port 649 - * devices. 650 - */ 651 - set_downstream_devices_error_reporting(pdev, false); 652 - 653 - pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ERR); 654 - /* Disable Root's interrupt in response to error messages */ 655 - pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_COMMAND, 0); 656 - 657 - /* Clear Root's error status reg */ 658 - pci_read_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, &reg32); 659 - pci_write_config_dword(pdev, pos + PCI_ERR_ROOT_STATUS, reg32); 660 - } 661 - 662 - /** 663 - * get_e_source - retrieve an error source 664 - * @rpc: pointer to the root port which holds an error 665 - * 666 - * Invoked by DPC handler to consume an error. 
667 - */
668 - static struct aer_err_source *get_e_source(struct aer_rpc *rpc)
669 - {
670 - struct aer_err_source *e_source;
671 - unsigned long flags;
672 -
673 - /* Lock access to Root error producer/consumer index */
674 - spin_lock_irqsave(&rpc->e_lock, flags);
675 - if (rpc->prod_idx == rpc->cons_idx) {
676 - spin_unlock_irqrestore(&rpc->e_lock, flags);
677 - return NULL;
678 - }
679 - e_source = &rpc->e_sources[rpc->cons_idx];
680 - rpc->cons_idx++;
681 - if (rpc->cons_idx == AER_ERROR_SOURCES_MAX)
682 - rpc->cons_idx = 0;
683 - spin_unlock_irqrestore(&rpc->e_lock, flags);
684 -
685 - return e_source;
574 + } else
575 + do_recovery(aerdev, dev, info->severity);
686 576 }
687 577
688 578 /**
··· 585 687 * @info: pointer to structure to store the error record
586 688 *
587 689 * Return 1 on success, 0 on error.
690 + *
691 + * Note that @info is reused among all error devices. Clear fields properly.
588 692 */
589 693 static int get_device_error_info(struct pci_dev *dev, struct aer_err_info *info)
590 694 {
591 695 int pos, temp;
592 696
697 + /* Must reset in this function */
593 698 info->status = 0;
594 699 info->tlp_header_valid = 0;
595 700
··· 645 744 {
646 745 int i;
647 746
648 - if (!e_info->dev[0]) {
649 - dev_printk(KERN_DEBUG, &p_device->port->dev,
650 - "can't find device of ID%04x\n",
651 - e_info->id);
652 - }
653 -
654 747 /* Report all before handle them, not to lost records by reset etc. */
655 748 for (i = 0; i < e_info->error_dev_num && e_info->dev[i]; i++) {
656 749 if (get_device_error_info(e_info->dev[i], e_info))
··· 665 770 struct aer_err_source *e_src)
666 771 {
667 772 struct aer_err_info *e_info;
668 - int i;
669 773
670 774 /* struct aer_err_info might be big, so we allocate it with slab */
671 775 e_info = kmalloc(sizeof(struct aer_err_info), GFP_KERNEL);
672 - if (e_info == NULL) {
776 + if (!e_info) {
673 777 dev_printk(KERN_DEBUG, &p_device->port->dev,
674 778 "Can't allocate mem when processing AER errors\n");
675 779 return;
··· 678 784 * There is a possibility that both correctable error and
679 785 * uncorrectable error being logged. Report correctable error first.
680 786 */
681 - for (i = 1; i & ROOT_ERR_STATUS_MASKS ; i <<= 2) {
682 - if (i > 4)
683 - break;
684 - if (!(e_src->status & i))
685 - continue;
787 + if (e_src->status & PCI_ERR_ROOT_COR_RCV) {
788 + e_info->id = ERR_COR_ID(e_src->id);
789 + e_info->severity = AER_CORRECTABLE;
686 790
687 - memset(e_info, 0, sizeof(struct aer_err_info));
688 -
689 - /* Init comprehensive error information */
690 - if (i & PCI_ERR_ROOT_COR_RCV) {
691 - e_info->id = ERR_COR_ID(e_src->id);
692 - e_info->severity = AER_CORRECTABLE;
693 - } else {
694 - e_info->id = ERR_UNCOR_ID(e_src->id);
695 - e_info->severity = ((e_src->status >> 6) & 1);
696 - }
697 - if (e_src->status &
698 - (PCI_ERR_ROOT_MULTI_COR_RCV |
699 - PCI_ERR_ROOT_MULTI_UNCOR_RCV))
791 + if (e_src->status & PCI_ERR_ROOT_MULTI_COR_RCV)
700 792 e_info->multi_error_valid = 1;
793 + else
794 + e_info->multi_error_valid = 0;
701 795
702 796 aer_print_port_info(p_device->port, e_info);
703 797
704 - find_source_device(p_device->port, e_info);
705 - aer_process_err_devices(p_device, e_info);
798 + if (find_source_device(p_device->port, e_info))
799 + aer_process_err_devices(p_device, e_info);
800 + }
801 +
802 + if (e_src->status & PCI_ERR_ROOT_UNCOR_RCV) {
803 + e_info->id = ERR_UNCOR_ID(e_src->id);
804 +
805 + if (e_src->status & PCI_ERR_ROOT_FATAL_RCV)
806 + e_info->severity = AER_FATAL;
807 + else
808 + e_info->severity = AER_NONFATAL;
809 +
810 + if (e_src->status & PCI_ERR_ROOT_MULTI_UNCOR_RCV)
811 + e_info->multi_error_valid = 1;
812 + else
813 + e_info->multi_error_valid = 0;
814 +
815 + aer_print_port_info(p_device->port, e_info);
816 +
817 + if (find_source_device(p_device->port, e_info))
818 + aer_process_err_devices(p_device, e_info);
706 819 }
707 820
708 821 kfree(e_info);
822 + }
823 +
824 + /**
825 + * get_e_source - retrieve an error source
826 + * @rpc: pointer to the root port which holds an error
827 + * @e_src: pointer to store retrieved error source
828 + *
829 + * Return 1 if an error source is retrieved, otherwise 0.
830 + *
831 + * Invoked by DPC handler to consume an error.
832 + */
833 + static int get_e_source(struct aer_rpc *rpc, struct aer_err_source *e_src)
834 + {
835 + unsigned long flags;
836 + int ret = 0;
837 +
838 + /* Lock access to Root error producer/consumer index */
839 + spin_lock_irqsave(&rpc->e_lock, flags);
840 + if (rpc->prod_idx != rpc->cons_idx) {
841 + *e_src = rpc->e_sources[rpc->cons_idx];
842 + rpc->cons_idx++;
843 + if (rpc->cons_idx == AER_ERROR_SOURCES_MAX)
844 + rpc->cons_idx = 0;
845 + ret = 1;
846 + }
847 + spin_unlock_irqrestore(&rpc->e_lock, flags);
848 +
849 + return ret;
709 850 }
710 851
711 852 /**
··· 753 824 {
754 825 struct aer_rpc *rpc = container_of(work, struct aer_rpc, dpc_handler);
755 826 struct pcie_device *p_device = rpc->rpd;
756 - struct aer_err_source *e_src;
827 + struct aer_err_source e_src;
757 828
758 829 mutex_lock(&rpc->rpc_mutex);
759 - e_src = get_e_source(rpc);
760 - while (e_src) {
761 - aer_isr_one_error(p_device, e_src);
762 - e_src = get_e_source(rpc);
763 - }
830 + while (get_e_source(rpc, &e_src))
831 + aer_isr_one_error(p_device, &e_src);
764 832 mutex_unlock(&rpc->rpc_mutex);
765 833
766 834 wake_up(&rpc->wait_release);
767 - }
768 -
769 - /**
770 - * aer_delete_rootport - disable root port aer and delete service data
771 - * @rpc: pointer to a root port device being deleted
772 - *
773 - * Invoked when AER service unloaded on a specific Root Port
774 - */
775 - void aer_delete_rootport(struct aer_rpc *rpc)
776 - {
777 - /* Disable root port AER itself */
778 - disable_root_aer(rpc);
779 -
780 - kfree(rpc);
781 835 }
782 836
783 837 /**
+18 -3
drivers/pci/quirks.c
···
2127 2127 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ASUSTEK, 0x9602, quirk_disable_msi);
2128 2128 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AI, 0x9602, quirk_disable_msi);
2129 2129 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_VIA, 0xa238, quirk_disable_msi);
2130 + DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_ATI, 0x5a3f, quirk_disable_msi);
2130 2131
2131 2132 /* Go through the list of Hypertransport capabilities and
2132 2133 * return 1 if a HT MSI capability is found and enabled */
···
2219 2218 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8132_BRIDGE,
2220 2219 ht_enable_msi_mapping);
2221 2220
2222 - /* The P5N32-SLI Premium motherboard from Asus has a problem with msi
2221 + /* The P5N32-SLI motherboards from Asus have a problem with msi
2223 2222 * for the MCP55 NIC. It is not yet determined whether the msi problem
2224 2223 * also affects other devices. As for now, turn off msi for this device.
2225 2224 */
2226 2225 static void __devinit nvenet_msi_disable(struct pci_dev *dev)
2227 2226 {
2228 - if (dmi_name_in_vendors("P5N32-SLI PREMIUM")) {
2227 + if (dmi_name_in_vendors("P5N32-SLI PREMIUM") ||
2228 + dmi_name_in_vendors("P5N32-E SLI")) {
2229 2229 dev_info(&dev->dev,
2230 - "Disabling msi for MCP55 NIC on P5N32-SLI Premium\n");
2230 + "Disabling msi for MCP55 NIC on P5N32-SLI\n");
2231 2231 dev->no_msi = 1;
2232 2232 }
2233 2233 }
···
2553 2551 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1518, quirk_i82576_sriov);
2554 2552
2555 2553 #endif /* CONFIG_PCI_IOV */
2554 +
2555 + /* Allow manual resource allocation for PCI hotplug bridges
2556 + * via pci=hpmemsize=nnM and pci=hpiosize=nnM parameters. For
2557 + * some PCI-PCI hotplug bridges, like PLX 6254 (former HINT HB6),
2558 + * kernel fails to allocate resources when hotplug device is
2559 + * inserted and PCI bus is rescanned.
2560 + */
2561 + static void __devinit quirk_hotplug_bridge(struct pci_dev *dev)
2562 + {
2563 + dev->is_hotplug_bridge = 1;
2564 + }
2565 +
2566 + DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_HINT, 0x0020, quirk_hotplug_bridge);
2556 2567
2557 2568 /*
2558 2569 * This is a quirk for the Ricoh MMC controller found as a part of
+48
drivers/pci/slot.c
···
97 97 return bus_speed_read(slot->bus->cur_bus_speed, buf);
98 98 }
99 99
100 + static void remove_sysfs_files(struct pci_slot *slot)
101 + {
102 + char func[10];
103 + struct list_head *tmp;
104 +
105 + list_for_each(tmp, &slot->bus->devices) {
106 + struct pci_dev *dev = pci_dev_b(tmp);
107 + if (PCI_SLOT(dev->devfn) != slot->number)
108 + continue;
109 + sysfs_remove_link(&dev->dev.kobj, "slot");
110 +
111 + snprintf(func, 10, "function%d", PCI_FUNC(dev->devfn));
112 + sysfs_remove_link(&slot->kobj, func);
113 + }
114 + }
115 +
116 + static int create_sysfs_files(struct pci_slot *slot)
117 + {
118 + int result;
119 + char func[10];
120 + struct list_head *tmp;
121 +
122 + list_for_each(tmp, &slot->bus->devices) {
123 + struct pci_dev *dev = pci_dev_b(tmp);
124 + if (PCI_SLOT(dev->devfn) != slot->number)
125 + continue;
126 +
127 + result = sysfs_create_link(&dev->dev.kobj, &slot->kobj, "slot");
128 + if (result)
129 + goto fail;
130 +
131 + snprintf(func, 10, "function%d", PCI_FUNC(dev->devfn));
132 + result = sysfs_create_link(&slot->kobj, &dev->dev.kobj, func);
133 + if (result)
134 + goto fail;
135 + }
136 +
137 + return 0;
138 +
139 + fail:
140 + remove_sysfs_files(slot);
141 + return result;
142 + }
143 +
100 144 static void pci_slot_release(struct kobject *kobj)
101 145 {
102 146 struct pci_dev *dev;
···
152 108 list_for_each_entry(dev, &slot->bus->devices, bus_list)
153 109 if (PCI_SLOT(dev->devfn) == slot->number)
154 110 dev->slot = NULL;
111 +
112 + remove_sysfs_files(slot);
155 113
156 114 list_del(&slot->list);
157 115
···
345 299
346 300 INIT_LIST_HEAD(&slot->list);
347 301 list_add(&slot->list, &parent->slots);
302 +
303 + create_sysfs_files(slot);
348 304
349 305 list_for_each_entry(dev, &parent->devices, bus_list)
350 306 if (PCI_SLOT(dev->devfn) == slot_nr
+3 -1
include/linux/ioport.h
···
52 52
53 53 #define IORESOURCE_MEM_64 0x00100000
54 54 #define IORESOURCE_WINDOW 0x00200000 /* forwarded by bridge */
55 + #define IORESOURCE_MUXED 0x00400000 /* Resource is software muxed */
55 56
56 57 #define IORESOURCE_EXCLUSIVE 0x08000000 /* Userland may not map this resource */
57 58 #define IORESOURCE_DISABLED 0x10000000
···
144 143 }
145 144
146 145 /* Convenience shorthand with allocation */
147 - #define request_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), 0)
146 + #define request_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), 0)
147 + #define request_muxed_region(start,n,name) __request_region(&ioport_resource, (start), (n), (name), IORESOURCE_MUXED)
148 148 #define __request_mem_region(start,n,name, excl) __request_region(&iomem_resource, (start), (n), (name), excl)
149 149 #define request_mem_region(start,n,name) __request_region(&iomem_resource, (start), (n), (name), 0)
150 150 #define request_mem_region_exclusive(start,n,name) \
+2 -2
include/linux/pci_ids.h
···
2419 2419 #define PCI_DEVICE_ID_INTEL_82845_HB 0x1a30
2420 2420 #define PCI_DEVICE_ID_INTEL_IOAT 0x1a38
2421 2421 #define PCI_DEVICE_ID_INTEL_CPT_SMBUS 0x1c22
2422 - #define PCI_DEVICE_ID_INTEL_CPT_LPC1 0x1c42
2423 - #define PCI_DEVICE_ID_INTEL_CPT_LPC2 0x1c43
2422 + #define PCI_DEVICE_ID_INTEL_CPT_LPC_MIN 0x1c41
2423 + #define PCI_DEVICE_ID_INTEL_CPT_LPC_MAX 0x1c5f
2424 2424 #define PCI_DEVICE_ID_INTEL_82801AA_0 0x2410
2425 2425 #define PCI_DEVICE_ID_INTEL_82801AA_1 0x2411
2426 2426 #define PCI_DEVICE_ID_INTEL_82801AA_3 0x2413
+1 -2
include/linux/pci_regs.h
···
566 566 #define PCI_ERR_ROOT_FIRST_FATAL 0x00000010 /* First Fatal */
567 567 #define PCI_ERR_ROOT_NONFATAL_RCV 0x00000020 /* Non-Fatal Received */
568 568 #define PCI_ERR_ROOT_FATAL_RCV 0x00000040 /* Fatal Received */
569 - #define PCI_ERR_ROOT_COR_SRC 52
570 - #define PCI_ERR_ROOT_SRC 54
569 + #define PCI_ERR_ROOT_ERR_SRC 52 /* Error Source Identification */
571 570
572 571 /* Virtual Channel */
573 572 #define PCI_VC_PORT_REG1 4
+15 -1
kernel/resource.c
···
15 15 #include <linux/spinlock.h>
16 16 #include <linux/fs.h>
17 17 #include <linux/proc_fs.h>
18 + #include <linux/sched.h>
18 19 #include <linux/seq_file.h>
19 20 #include <linux/device.h>
20 21 #include <linux/pfn.h>
···
682 681 * release_region releases a matching busy region.
683 682 */
684 683
684 + static DECLARE_WAIT_QUEUE_HEAD(muxed_resource_wait);
685 +
685 686 /**
686 687 * __request_region - create a new busy resource region
687 688 * @parent: parent resource descriptor
···
696 693 resource_size_t start, resource_size_t n,
697 694 const char *name, int flags)
698 695 {
696 + DECLARE_WAITQUEUE(wait, current);
699 697 struct resource *res = kzalloc(sizeof(*res), GFP_KERNEL);
700 698
701 699 if (!res)
···
721 717 if (!(conflict->flags & IORESOURCE_BUSY))
722 718 continue;
723 719 }
724 -
720 + if (conflict->flags & flags & IORESOURCE_MUXED) {
721 + add_wait_queue(&muxed_resource_wait, &wait);
722 + write_unlock(&resource_lock);
723 + set_current_state(TASK_UNINTERRUPTIBLE);
724 + schedule();
725 + remove_wait_queue(&muxed_resource_wait, &wait);
726 + write_lock(&resource_lock);
727 + continue;
728 + }
725 729 /* Uhhuh, that didn't work out.. */
726 730 kfree(res);
727 731 res = NULL;
···
803 791 break;
804 792 *p = res->sibling;
805 793 write_unlock(&resource_lock);
794 + if (res->flags & IORESOURCE_MUXED)
795 + wake_up(&muxed_resource_wait);
806 796 kfree(res);
807 797 return;
808 798 }