Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
at v2.6.22-rc2

                The MSI Driver Guide HOWTO
        Tom L Nguyen tom.l.nguyen@intel.com
                        10/03/2003
        Revised Feb 12, 2004 by Martine Silbermann
                email: Martine.Silbermann@hp.com
        Revised Jun 25, 2004 by Tom L Nguyen

1. About this guide

This guide describes the basics of Message Signaled Interrupts (MSI),
the advantages of using MSI over traditional interrupt mechanisms,
and how to enable your driver to use MSI or MSI-X. Also included is
a Frequently Asked Questions (FAQ) section.

1.1 Terminology

PCI devices can be single-function or multi-function. In either case,
when this text talks about enabling or disabling MSI on a "device
function," it is referring to one specific PCI device and function and
not to all functions on a PCI device (unless the PCI device has only
one function).

2. Copyright 2003 Intel Corporation

3. What is MSI/MSI-X?

Message Signaled Interrupt (MSI), as described in the PCI Local Bus
Specification Revision 2.3 or later, is an optional feature for PCI
devices and a required feature for PCI Express devices. MSI enables a
device function to request service by sending an Inbound Memory Write
on its PCI bus to the FSB as a Message Signaled Interrupt transaction.
Because MSI is generated in the form of a Memory Write, all transaction
conditions, such as a Retry, Master-Abort, Target-Abort or normal
completion, are supported.

A PCI device that supports MSI must also support the pin IRQ assertion
interrupt mechanism to provide backward compatibility for systems that
do not support MSI. In systems which support MSI, the bus driver is
responsible for initializing the message address and message data of
the device function's MSI/MSI-X capability structure during initial
device configuration.

An MSI capable device function indicates MSI support by implementing
the MSI/MSI-X capability structure in its PCI capability list.
The device function may implement both the MSI capability structure and
the MSI-X capability structure; however, the bus driver should not
enable both.

The MSI capability structure contains the Message Control register,
the Message Address register and the Message Data register. These
registers provide the bus driver control over MSI. The Message Control
register indicates the MSI capability supported by the device. The
Message Address register specifies the target address and the Message
Data register specifies the characteristics of the message. To request
service, the device function writes the content of the Message Data
register to the target address. The device and its software driver
are prohibited from writing to these registers.

The MSI-X capability structure is an optional extension to MSI. It
uses an independent and separate capability structure. Implementing
the MSI-X capability structure has some key advantages over the MSI
capability structure, as described below.

 - Support a larger maximum number of vectors per function.

 - Provide the ability for system software to configure
   each vector with an independent message address and message
   data, specified by a table that resides in Memory Space.

 - MSI and MSI-X both support per-vector masking. Per-vector
   masking is an optional extension of MSI but a required
   feature for MSI-X. Per-vector masking gives the kernel the
   ability to mask/unmask a single MSI while running its
   interrupt service routine. If per-vector masking is
   not supported, then the device driver should provide the
   hardware/software synchronization to ensure that the device
   generates MSI only when the driver wants it to do so.

4. Why use MSI?

MSI simplifies board design by allowing board designers to remove
out-of-band interrupt routing. MSI is another step towards a
legacy-free environment.

Due to increasing pressure on chipset and processor packages to
reduce pin count, the need for interrupt pins is expected to
diminish over time. Devices, due to pin constraints, may implement
messages to increase performance.

PCI Express endpoints use INTx emulation (in-band messages) instead
of IRQ pin assertion. Using INTx emulation requires interrupt
sharing among devices connected to the same node (PCI bridge) while
MSI is unique (non-shared) and does not require BIOS configuration
support. As a result, the PCI Express technology requires MSI
support for better interrupt performance.

Using MSI enables the device functions to support two or more
vectors, which can be configured to target different CPUs to
increase scalability.

5. Configuring a driver to use MSI/MSI-X

By default, the kernel does not enable MSI/MSI-X on devices that
support this capability. The CONFIG_PCI_MSI kernel option
must be selected to enable MSI/MSI-X support.

5.1 Including MSI/MSI-X support into the kernel

To allow MSI/MSI-X capable device drivers to selectively enable
MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
below), the VECTOR based scheme needs to be enabled by setting
CONFIG_PCI_MSI during kernel config.

Since the target of the inbound message is the local APIC,
CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.

5.2 Configuring for MSI support

Due to the non-contiguous fashion in vector assignment of the
existing Linux kernel, this version does not support multiple
messages regardless of whether a device function is capable of
supporting more than one vector. Enabling MSI on a device function's
MSI capability structure requires a device driver to call the function
pci_enable_msi() explicitly.
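
The call ordering that sections 5.2.1 and 5.2.2 require (enable MSI
before request_irq(), and free_irq() before disabling MSI) can be
sketched with a small userspace model. Everything in it is an
assumption made for illustration: struct model_dev, the model_*()
functions, the example_*() driver paths and the vector value 193 are
hypothetical stand-ins, not the kernel's struct pci_dev or its
implementations of pci_enable_msi(), request_irq(), free_irq() and
pci_disable_msi().

```c
#include <assert.h>
#include <stdio.h>

/* Userspace model only. struct model_dev and every model_*() function are
 * hypothetical stand-ins for struct pci_dev and the kernel calls named in
 * the comments; they are NOT the kernel implementations. */
struct model_dev {
	unsigned int irq;       /* plays the role of dev->irq */
	unsigned int saved_irq; /* pre-assigned vector kept by the "PCI subsystem" */
	int msi_enabled;
	int irq_requested;
};

#define MODEL_MSI_VECTOR 193u /* arbitrary example value */

/* Stand-in for pci_enable_msi(): returns 0 and replaces dev->irq. */
static int model_enable_msi(struct model_dev *dev)
{
	dev->saved_irq = dev->irq;
	dev->irq = MODEL_MSI_VECTOR;
	dev->msi_enabled = 1;
	return 0;
}

/* Stand-in for request_irq(): records that a handler is attached. */
static int model_request_irq(struct model_dev *dev)
{
	dev->irq_requested = 1;
	return 0;
}

/* Stand-in for free_irq(). */
static void model_free_irq(struct model_dev *dev)
{
	dev->irq_requested = 0;
}

/* Stand-in for pci_disable_msi(): the assert models the BUG_ON() that is
 * hit when the vector has not been freed first. */
static void model_disable_msi(struct model_dev *dev)
{
	assert(!dev->irq_requested);
	dev->irq = dev->saved_irq;
	dev->msi_enabled = 0;
}

/* Driver load path: enable MSI first, then request whatever dev->irq now
 * holds. If MSI cannot be enabled, the legacy vector is used instead. */
static int example_attach(struct model_dev *dev)
{
	if (model_enable_msi(dev) != 0)
		printf("MSI unavailable, staying in legacy mode\n");
	return model_request_irq(dev);
}

/* Driver unload path: free the vector before disabling MSI. */
static void example_detach(struct model_dev *dev)
{
	model_free_irq(dev);
	if (dev->msi_enabled)
		model_disable_msi(dev);
}
```

A caller would initialize a struct model_dev with its legacy vector,
call example_attach() at load time and example_detach() at unload
time; after detach, the irq field holds the legacy vector again.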

5.2.1 API pci_enable_msi

int pci_enable_msi(struct pci_dev *dev)

A device driver that wants to have MSI enabled on its device function
must call this API to enable MSI. A successful call will initialize
the MSI capability structure with ONE vector, regardless of whether a
device function is capable of supporting multiple messages. The new
MSI vector replaces the pre-assigned vector in dev->irq. To avoid a
conflict between the newly assigned vector and the existing
pre-assigned vector, a device driver must call this API before
calling request_irq().

5.2.2 API pci_disable_msi

void pci_disable_msi(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msi()
when a device driver is unloading. This API restores dev->irq with
the pre-assigned IOAPIC vector and switches a device's interrupt
mode to PCI pin-irq assertion/INTx emulation mode.

Note that a device driver should always call free_irq() on the MSI vector
that it has done request_irq() on before calling this API. Failure to do
so results in a BUG_ON(), and the device will be left with MSI enabled
and will leak its vector.

5.2.3 MSI mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-capable device function between MSI mode and
PIN-IRQ assertion mode.

	 ------------   pci_enable_msi    ------------------------
	|            | <===============  |                        |
	| MSI MODE   |                   | PIN-IRQ ASSERTION MODE |
	|            | ===============>  |                        |
	 ------------   pci_disable_msi   ------------------------


Figure 1. MSI Mode vs. Legacy Mode

In Figure 1, a device operates by default in legacy mode. Legacy
in this context means PCI pin-irq assertion or PCI-Express INTx
emulation. A successful MSI request (using pci_enable_msi()) switches
a device's interrupt mode to MSI mode.
The pre-assigned IOAPIC vector stored in dev->irq will be saved by the
PCI subsystem and a newly assigned MSI vector will replace dev->irq.

To return to its default mode, a device driver should always call
pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
device driver should always call free_irq() on the MSI vector it has
done request_irq() on before calling pci_disable_msi(). Failure to do
so results in a BUG_ON(), and the device will be left with MSI enabled
and will leak its vector. Otherwise, the PCI subsystem restores the
device's dev->irq with the pre-assigned IOAPIC vector and marks the
released MSI vector as unused.

Once a vector is marked as unused, there is no guarantee that the PCI
subsystem will reserve this MSI vector for the device. Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, this MSI vector may be
re-assigned.

If the PCI subsystem re-assigns this MSI vector to another driver,
a request to switch back to MSI mode may result in a different MSI
vector being assigned, or in a failure if no more vectors are
available.

5.3 Configuring for MSI-X support

Because the system software can configure each vector of the MSI-X
capability structure with an independent message address and message
data, the non-contiguous fashion in vector assignment of the existing
Linux kernel has no impact on supporting multiple messages on an
MSI-X capable device function. Enabling MSI-X on a device function's
MSI-X capability structure requires its device driver to call the
function pci_enable_msix() explicitly.

The function pci_enable_msix(), once invoked, enables either
all or nothing, depending on the current availability of PCI vector
resources.
If the PCI vector resources are available for the number
of vectors requested by a device driver, this function will configure
the MSI-X table of the MSI-X capability structure of a device with
the requested messages. A driver need not request every vector its
device supports: for example, a device may be capable of supporting a
maximum of 32 vectors while its software driver typically requests
only 4 vectors. It is recommended that the device driver call this
function once during its initialization phase.

Unlike the function pci_enable_msi(), the function pci_enable_msix()
does not replace the pre-assigned IOAPIC dev->irq with a new MSI
vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
into the field 'vector' of each element of the array passed as the
second argument. Note that the pre-assigned IOAPIC dev->irq is valid
only if the device operates in PIN-IRQ assertion mode. In MSI-X mode,
any attempt by the device driver to use dev->irq to request interrupt
service may result in unpredictable behavior.

For each MSI-X vector granted, a device driver is responsible for calling
other functions like request_irq(), enable_irq(), etc. to enable
this vector with its corresponding interrupt service handler. It is
a device driver's choice to assign all vectors the same interrupt
service handler or each vector a unique interrupt service handler.

5.3.1 Handling MMIO address space of MSI-X Table

The PCI 3.0 specification has implementation notes stating that the
MMIO address space for a device's MSI-X structure should be isolated
so that the software system can set different pages for controlling
accesses to the MSI-X structure. The implementation of MSI support
requires the PCI subsystem, not a device driver, to maintain full
control of the MSI-X table/MSI-X PBA (Pending Bit Array) and the MMIO
address space of the MSI-X table/MSI-X PBA.
A device driver is prohibited from requesting the MMIO
address space of the MSI-X table/MSI-X PBA. Otherwise, the PCI subsystem
will fail to enable MSI-X on the hardware device when the driver calls
the function pci_enable_msix().

5.3.2 Handling MSI-X allocation

The number of MSI-X vectors allocated to a function depends on the
number of MSI capable devices and MSI-X capable devices populated in
the system. The policy for allocating MSI-X vectors to a function is
defined as follows:

# of MSI-X vectors allocated to a function = (x - y)/z where

x =	The number of available PCI vector resources by the time
	the device driver calls pci_enable_msix(). The PCI vector
	resources are the sum of the number of unassigned vectors
	(new) and the number of vectors released when any MSI/MSI-X
	device driver switches its hardware device back to legacy
	mode or is hot-removed. The number of unassigned vectors
	may exclude some reserved vectors, as defined in parameter
	NR_HP_RESERVED_VECTORS, for the case where the system is
	capable of supporting hot-add/hot-remove operations. Users
	may change the value defined in NR_HP_RESERVED_VECTORS to
	meet their specific needs.

y =	The number of MSI capable devices populated in the system.
	This policy ensures that each MSI capable device has its
	vector reserved to avoid the case where some MSI-X capable
	drivers may attempt to claim all available vector resources.

z =	The number of MSI-X capable devices populated in the system.
	This policy ensures that at most (x - y) vectors are
	distributed evenly among MSI-X capable devices.

Note that the PCI subsystem determines y and z during bus enumeration.
When the PCI subsystem completes configuring the MSI/MSI-X capability
structure of a device as requested by its device driver, y or z is
decremented accordingly.
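
As a worked illustration of the (x - y)/z policy above, the sketch
below computes the per-function share for given counts. It is not the
kernel's code: the function name msix_vectors_per_function() is
hypothetical, and both the round-down integer division and the result
of 0 when no vectors remain are assumptions the text does not spell
out.

```c
/* Sketch of the (x - y)/z allocation policy; not the kernel implementation.
 *   x: available PCI vector resources
 *   y: MSI capable devices (one vector reserved for each)
 *   z: MSI-X capable devices sharing the remainder
 * Assumptions: integer division rounds down, and 0 is returned when
 * nothing is left to share or there is no MSI-X capable device. */
static int msix_vectors_per_function(int x, int y, int z)
{
	if (z <= 0 || x <= y)
		return 0;
	return (x - y) / z;
}
```

For example, with 40 available vectors, 4 MSI capable devices and
3 MSI-X capable devices, each MSI-X capable function would be
allocated (40 - 4)/3 = 12 vectors under these assumptions.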

5.3.3 Handling MSI-X shortages

If fewer MSI-X vectors are allocated to a function than requested,
the function pci_enable_msix() will return the maximum number of
MSI-X vectors available to the caller. A device driver may then
re-send its request with a number of vectors less than or equal to
the returned value. For example, if a device driver requests 5
vectors, but only 3 vectors are available, a value of 3 will be
returned by the pci_enable_msix() call. A function could be
designed so that its driver uses only 3 MSI-X table entries in
different combinations such as ABC--, A-B-C, A--CB, etc. Note that
this implementation does not support multiple entries with the same
vector. An attempt by a device driver to use 5 MSI-X table entries
with 3 vectors, as in ABBCC, AABCC, BCCBA, etc., will cause
pci_enable_msix() to fail. Supporting multiple entries with the same
vector is undesirable for the following reasons.

 - The PCI subsystem cannot determine the entry that
   generated the message to mask/unmask MSI while handling
   the software driver ISR. Walking through all MSI-X
   table entries (2048 max) to mask/unmask any matching vector
   is an undesirable solution.

 - Walking through all MSI-X table entries (2048 max) to handle
   SMP affinity of any matching vector is an undesirable solution.

5.3.4 API pci_enable_msix

int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)

This API enables a device driver to request the PCI subsystem
to enable MSI-X messages on its hardware device. Depending on
the availability of PCI vector resources, the PCI subsystem enables
either all or none of the requested vectors.

Argument 'dev' points to the device (pci_dev) structure.

Argument 'entries' is a pointer to an array of msix_entry structs.
The number of entries is indicated in argument 'nvec'.
struct msix_entry is defined in /driver/pci/msi.h:

struct msix_entry {
	u16	vector; /* kernel uses to write alloc vector */
	u16	entry;  /* driver uses to specify entry */
};

A device driver is responsible for initializing the field 'entry' of
each element with a unique entry supported by the MSI-X table.
Otherwise, -EINVAL will be returned. A successful return of zero
indicates that the PCI subsystem completed initializing each of the
requested entries of the MSI-X table with message address and message
data. The PCI subsystem also writes the 1:1 vector-to-entry mapping
into the field 'vector' of each element. A device driver is
responsible for keeping track of allocated MSI-X vectors in its
internal data structure.

A return of zero indicates that the requested number of MSI-X vectors
was successfully allocated. A return of greater than zero indicates
an MSI-X vector shortage. A return of less than zero indicates
a failure. This failure may be a result of duplicate entries
specified in the second argument, of no vector being available,
or of a failure to initialize MSI-X table entries.

5.3.5 API pci_disable_msix

void pci_disable_msix(struct pci_dev *dev)

This API should always be used to undo the effect of pci_enable_msix()
when a device driver is unloading. Note that a device driver should
always call free_irq() on all MSI-X vectors it has done request_irq()
on before calling this API. Failure to do so results in a BUG_ON(), and
the device will be left with MSI-X enabled and will leak its vectors.

5.3.6 MSI-X mode vs. legacy mode diagram

The diagram below shows the events which switch the interrupt
mode on the MSI-X capable device function between MSI-X mode and
PIN-IRQ assertion mode (legacy).

	 ------------   pci_enable_msix(,,n)   ------------------------
	|            | <===============       |                        |
	| MSI-X MODE |                        | PIN-IRQ ASSERTION MODE |
	|            | ===============>       |                        |
	 ------------   pci_disable_msix       ------------------------

Figure 2. MSI-X Mode vs. Legacy Mode

In Figure 2, a device operates by default in legacy mode. A
successful MSI-X request (using pci_enable_msix()) switches a
device's interrupt mode to MSI-X mode. The pre-assigned IOAPIC vector
stored in dev->irq will be saved by the PCI subsystem; however,
unlike MSI mode, the PCI subsystem will not replace dev->irq with an
assigned MSI-X vector because the PCI subsystem already writes the 1:1
vector-to-entry mapping into the field 'vector' of each element
specified in the second argument.

To return to its default mode, a device driver should always call
pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
a device driver should always call free_irq() on all MSI-X vectors it
has done request_irq() on before calling pci_disable_msix(). Failure
to do so results in a BUG_ON(), and the device will be left with MSI-X
enabled and will leak its vectors. Otherwise, the PCI subsystem
switches the device function's interrupt mode from MSI-X mode to
legacy mode and marks all allocated MSI-X vectors as unused.

Once vectors are marked as unused, there is no guarantee that the PCI
subsystem will reserve these MSI-X vectors for the device. Depending on
the availability of current PCI vector resources and the number of
MSI/MSI-X requests from other drivers, these MSI-X vectors may be
re-assigned.

If the PCI subsystem re-assigns these MSI-X vectors to other drivers,
a request to switch back to MSI-X mode may result in another set of
MSI-X vectors being assigned, or in a failure if no more vectors are
available.
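
The return-value convention of pci_enable_msix() (zero on success, a
positive count on shortage, a negative value on failure) and the
retry described in section 5.3.3 can be sketched with a userspace
model. All names here are assumptions for illustration only:
struct model_msix_entry mirrors struct msix_entry with plain C types,
model_enable_msix() is a hypothetical stand-in for pci_enable_msix(),
the pool of 3 free vectors and the vector numbers starting at 200 are
arbitrary, and the use of -EBUSY for "no vector available" is a
choice made for the model, not something the text specifies.

```c
#include <errno.h>

/* Mirrors struct msix_entry from the text, using plain C types. */
struct model_msix_entry {
	unsigned short vector; /* written by the allocator */
	unsigned short entry;  /* chosen by the driver */
};

static int model_available = 3; /* hypothetical pool of free vectors */

/* Hypothetical stand-in for pci_enable_msix(): all or nothing. */
static int model_enable_msix(struct model_msix_entry *entries, int nvec)
{
	int i, j;

	for (i = 0; i < nvec; i++)
		for (j = i + 1; j < nvec; j++)
			if (entries[i].entry == entries[j].entry)
				return -EINVAL; /* duplicate table entry */
	if (model_available == 0)
		return -EBUSY; /* errno chosen for this model only */
	if (nvec > model_available)
		return model_available; /* shortage: report what is left */
	for (i = 0; i < nvec; i++)
		entries[i].vector = (unsigned short)(200 + i); /* example vectors */
	model_available -= nvec;
	return 0;
}

/* Driver side: ask for 'want' vectors and retry once with the smaller
 * count reported on shortage. Returns the number of vectors granted,
 * or a negative errno value on failure. */
static int example_setup_msix(struct model_msix_entry *entries, int want)
{
	int i, ret;

	for (i = 0; i < want; i++)
		entries[i].entry = (unsigned short)i; /* unique entries 0..want-1 */

	ret = model_enable_msix(entries, want);
	if (ret > 0) {
		want = ret;
		ret = model_enable_msix(entries, want);
	}
	if (ret != 0)
		return ret < 0 ? ret : -EBUSY;
	return want;
}
```

In this model a driver that asks for 5 vectors while only 3 are free
ends up with 3, using table entries 0 to 2; a real driver would then
call request_irq() on each granted vector.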

5.4 Handling function implementing both MSI and MSI-X capabilities

If a function implements both MSI and MSI-X capabilities, the PCI
subsystem enables the device to run either in MSI mode or in MSI-X
mode but not both. A device driver determines whether it wants MSI or
MSI-X enabled on its hardware device. Once a device driver requests
MSI, for example, it is prohibited from requesting MSI-X; in other
words, a device driver is not permitted to ping-pong between MSI mode
and MSI-X mode at run-time.

5.5 Hardware requirements for MSI/MSI-X support

MSI/MSI-X support requires support from both system hardware and
individual hardware device functions.

5.5.1 System hardware support

Since the target of the MSI address is the local APIC, enabling
MSI/MSI-X support in the Linux kernel depends on whether the existing
system hardware supports the local APIC. Users should verify that
their system supports local APIC operation by testing that it runs
when CONFIG_X86_LOCAL_APIC=y.

In an SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
however, in a UP environment, users must manually set
CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
MSI-capable device drivers to selectively enable MSI/MSI-X.

Note that the CONFIG_X86_IO_APIC setting is irrelevant because
MSI/MSI-X vectors are newly allocated at runtime and MSI/MSI-X support
does not depend on BIOS support. This key independence enables
MSI/MSI-X support on future IOxAPIC-free platforms.

5.5.2 Device hardware support

The hardware device function supports MSI by indicating the
MSI/MSI-X capability structure on its PCI capability list. By
default, this capability structure will not be initialized by
the kernel to enable MSI during the system boot.
In other words,
the device function is running in its default pin assertion mode.
Note that in many cases hardware supporting MSI has bugs,
which may result in system hangs. The software driver of specific
MSI-capable hardware is responsible for deciding whether to call
pci_enable_msi() or not. A return of zero indicates the kernel
successfully initialized the MSI/MSI-X capability structure of the
device function. The device function is then running in MSI/MSI-X mode.

5.6 How to tell whether MSI/MSI-X is enabled on device function

At the driver level, a return of zero from the function call of
pci_enable_msi()/pci_enable_msix() indicates to a device driver that
its device function is initialized successfully and ready to run in
MSI/MSI-X mode.

At the user level, users can use the command 'cat /proc/interrupts'
to display the vectors allocated for devices and their interrupt
MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). The output below shows that
MSI mode is enabled on an Adaptec 39320D Ultra320 SCSI controller.

           CPU0       CPU1
  0:     324639          0    IO-APIC-edge  timer
  1:       1186          0    IO-APIC-edge  i8042
  2:          0          0          XT-PIC  cascade
 12:       2797          0    IO-APIC-edge  i8042
 14:       6543          0    IO-APIC-edge  ide0
 15:          1          0    IO-APIC-edge  ide1
169:          0          0   IO-APIC-level  uhci-hcd
185:          0          0   IO-APIC-level  uhci-hcd
193:        138         10         PCI-MSI  aic79xx
201:         30          0         PCI-MSI  aic79xx
225:         30          0   IO-APIC-level  aic7xxx
233:         30          0   IO-APIC-level  aic7xxx
NMI:          0          0
LOC:     324553     325068
ERR:          0
MIS:          0

6. MSI quirks

Several PCI chipsets or devices are known to not support MSI.
The PCI stack provides 3 possible levels of MSI disabling:
* on a single device
* on all devices behind a specific bridge
* globally

6.1. Disabling MSI on a single device

Under some circumstances it might be required to disable MSI on a
single device.
This may be achieved by either not calling pci_enable_msi()
at all, or setting the pci_dev->no_msi flag beforehand (most of the
time in a quirk).

6.2. Disabling MSI below a bridge

The vast majority of MSI quirks are required by PCI bridges not
being able to route MSI between busses. In this case, MSI has to be
disabled on all devices behind this bridge. This is achieved by setting
the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
subordinate bus. There is no need to set the same flag on bridges that
are below the broken bridge. When pci_enable_msi() is called to enable
MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
flag in all parent busses of the device.

Some bridges actually support enabling/disabling MSI support
dynamically by changing some bits in their PCI configuration space
(especially the Hypertransport chipsets such as the nVidia nForce and
Serverworks HT2000). It may then be required to update the NO_MSI flag
on the corresponding devices in the sysfs hierarchy. To enable MSI
support on device "0000:00:0e", do:

	echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus

To disable MSI support, echo 0 instead of 1. Note that it should be
used with caution since changing this value might break interrupts.

6.3. Disabling MSI globally

Some extreme cases may require disabling MSI globally on the system.
For now, the only known case is a Serverworks PCI-X chipset (MSI is
not supported on several busses that are not all connected to the
chipset in the Linux PCI hierarchy). In the vast majority of other
cases, disabling only behind a specific bridge is enough.

For debugging purposes, the user may also pass pci=nomsi on the kernel
command-line to explicitly disable MSI globally. But, once the
appropriate quirks are added to the kernel, this option should not be
required anymore.

6.4. Finding why MSI cannot be enabled on a device

Assuming that MSI is not enabled on a device, you should look at
dmesg to find messages that quirks may output when disabling MSI
on some devices, some bridges or even globally.
Then, lspci -t gives the list of bridges above a device. Reading
/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
is enabled (1) or disabled (0). If 0 is found in the msi_bus file of
any single bridge above the device, MSI cannot be enabled.

7. FAQ

Q1. Are there any limitations on using the MSI?

A1. If the PCI device supports MSI and conforms to the
specification and the platform supports the APIC local bus,
then using MSI should work.

Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
AMD processors)? In P3 IPI's are transmitted on the APIC local
bus and in P4 and Xeon they are transmitted on the system
bus. Are there any implications with this?

A2. MSI support enables a PCI device to send an inbound
memory write (0xfeexxxxx as target address) on its PCI bus
directly to the FSB. Since the message address has the
redirection hint bit cleared, it should work.

Q3. The target address 0xfeexxxxx will be translated by the
Host Bridge into an interrupt message. Are there any
limitations on the chipsets such as Intel 8xx, Intel e7xxx,
or VIA?

A3. If these chipsets support an inbound memory write with
target address set as 0xfeexxxxx, in conformance with PCI
specification 2.3 or later, then it should work.

Q4. From the driver point of view, if the MSI is lost because
of errors occurring during inbound memory write, then it may
wait forever. Is there a mechanism for it to recover?

A4. Since the target of the transaction is an inbound memory
write, all transaction termination conditions (Retry,
Master-Abort, Target-Abort, or normal completion) are
supported.
A device sending an MSI must abide by all the PCI
rules and conditions regarding that inbound memory write. So,
if a retry is signaled it must retry, etc. We believe that
the recommendation for Abort is also a retry (refer to PCI
specification 2.3 or later).