Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus

Pull virtio updates from Rusty Russell.

* tag 'virtio-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
  virtio: fix typo in comment
  virtio-mmio: Devices parameter parsing
  virtio_blk: Drop unused request tracking list
  virtio-blk: Fix hot-unplug race in remove method
  virtio: Use ida to allocate virtio index
  virtio: balloon: separate out common code between remove and freeze functions
  virtio: balloon: drop restore_common()
  9p: disconnect channel when PCI device is removed
  virtio: update documentation to v0.9.5 of spec

+1314 -120
+17
Documentation/kernel-parameters.txt
···
110 110 USB USB support is enabled.
111 111 USBHID USB Human Interface Device support is enabled.
112 112 V4L Video For Linux support is enabled.
113 + VMMIO Driver for memory mapped virtio devices is enabled.
113 114 VGA The VGA console has been enabled.
114 115 VT Virtual terminal support is enabled.
115 116 WDT Watchdog support is enabled.
···
2932 2931
2933 2932 video= [FB] Frame buffer configuration
2934 2933 See Documentation/fb/modedb.txt.
2934 +
2935 + virtio_mmio.device=
2936 + [VMMIO] Memory mapped virtio (platform) device.
2937 +
2938 + <size>@<baseaddr>:<irq>[:<id>]
2939 + where:
2940 + <size> := size (can use standard suffixes
2941 + like K, M and G)
2942 + <baseaddr> := physical base address
2943 + <irq> := interrupt number (as passed to
2944 + request_irq())
2945 + <id> := (optional) platform device id
2946 + example:
2947 + virtio_mmio.device=1K@0x100b0000:48:7
2948 +
2949 + Can be used multiple times for multiple devices.
2935 2950
2936 2951 vga= [BOOT,X86-32] Select a particular video mode
2937 2952 See Documentation/x86/boot.txt and
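Note on the virtio_mmio.device= syntax documented above: the format <size>@<baseaddr>:<irq>[:<id>] can be illustrated with a small userspace parser. This is only a sketch under stated assumptions, not the kernel's parsing code: struct vmmio_spec and vmmio_parse() are hypothetical names introduced here, the K/M/G suffixes are taken as powers of 1024, and a missing <id> is reported as -1.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical holder for one parsed virtio_mmio.device= value. */
struct vmmio_spec {
	uint64_t size;    /* region size in bytes */
	uint64_t base;    /* physical base address */
	unsigned int irq; /* interrupt number */
	int id;           /* platform device id, -1 when omitted */
};

/* Parse "<size>@<baseaddr>:<irq>[:<id>]". Returns 0 on success, -1 on error. */
static int vmmio_parse(const char *s, struct vmmio_spec *out)
{
	char *end;

	assert(s != NULL && out != NULL);

	out->size = strtoull(s, &end, 0);
	if (end == s)
		return -1;
	switch (*end) { /* optional K, M or G suffix (assumed binary multiples) */
	case 'G': case 'g':
		out->size <<= 10; /* fall through */
	case 'M': case 'm':
		out->size <<= 10; /* fall through */
	case 'K': case 'k':
		out->size <<= 10;
		end++;
		break;
	default:
		break;
	}
	if (*end != '@')
		return -1;

	s = end + 1;
	out->base = strtoull(s, &end, 0);
	if (end == s || *end != ':')
		return -1;

	s = end + 1;
	out->irq = (unsigned int)strtoul(s, &end, 0);
	if (end == s)
		return -1;

	out->id = -1; /* <id> is optional */
	if (*end == ':') {
		s = end + 1;
		out->id = (int)strtol(s, &end, 0);
		if (end == s)
			return -1;
	}
	return *end == '\0' ? 0 : -1;
}
```

Feeding it the documented example, 1K@0x100b0000:48:7, should give a 1024-byte region at 0x100b0000 with interrupt 48 and platform device id 7.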
+1087 -77
Documentation/virtual/virtio-spec.txt
···
1 1 [Generated file: see http://ozlabs.org/~rusty/virtio-spec/]
2 2 Virtio PCI Card Specification
3 - v0.9.1 DRAFT
3 + v0.9.5 DRAFT
4 4 -
5 5
6 - Rusty Russell <rusty@rustcorp.com.au>IBM Corporation (Editor)
6 + Rusty Russell <rusty@rustcorp.com.au> IBM Corporation (Editor)
7 7
8 - 2011 August 1.
8 + 2012 May 7.
9 9
10 10 Purpose and Description
11 11
···
68 68 +-------------------+-----------------------------------+-----------+
69 69
70 70
71 - When the driver wants to send buffers to the device, it puts them
72 - in one or more slots in the descriptor table, and writes the
73 - descriptor indices into the available ring. It then notifies the
74 - device. When the device has finished with the buffers, it writes
75 - the descriptors into the used ring, and sends an interrupt.
71 + When the driver wants to send a buffer to the device, it fills in
72 + a slot in the descriptor table (or chains several together), and
73 + writes the descriptor index into the available ring. It then
74 + notifies the device. When the device has finished a buffer, it
75 + writes the descriptor into the used ring, and sends an interrupt.
76 76
77 77 Specification
78 78
···
106 106 +----------------------+--------------------+---------------+
107 107 | 6 | ioMemory | - |
108 108 +----------------------+--------------------+---------------+
109 + | 7 | rpmsg | Appendix H |
110 + +----------------------+--------------------+---------------+
111 + | 8 | SCSI host | Appendix I |
112 + +----------------------+--------------------+---------------+
109 113 | 9 | 9P transport | - |
114 + +----------------------+--------------------+---------------+
115 + | 10 | mac80211 wlan | - |
110 116 +----------------------+--------------------+---------------+
111 117
112 118
···
133 127 the native endian of the guest (where such distinction is
134 128 applicable).
135 129
136 - Device Initialization Sequence
130 + Device Initialization Sequence<sub:Device-Initialization-Sequence>
137 131
138 132 We start with an overview of device initialization, then expand
139 133 on the details of the device and how each step is preformed.
···
183 177
184 178
185 179 If MSI-X is enabled for the device, two additional fields
186 - immediately follow this header:
180 + immediately follow this header:[footnote:
181 + ie. once you enable MSI-X on the device, the other fields move.
182 + If you turn it off again, they move back!
183 + ]
187 184
188 185
189 186 +------------++----------------+--------+
···
198 189 | Purpose || Configuration | Queue |
199 190 | (MSI-X) || Vector | Vector |
200 191 +------------++----------------+--------+
201 -
202 -
203 - Finally, if feature bits (VIRTIO_F_FEATURES_HI) this is
204 - immediately followed by two additional fields:
205 -
206 -
207 - +------------++----------------------+----------------------
208 - | Bits || 32 | 32
209 - +------------++----------------------+----------------------
210 - | Read/Write || R | R+W
211 - +------------++----------------------+----------------------
212 - | Purpose || Device | Guest
213 - | || Features bits 32:63 | Features bits 32:63
214 - +------------++----------------------+----------------------
215 192
216 193
217 194 Immediately following these general headers, there may be
···
233 238 may be a significant (or infinite) delay before setting this
234 239 bit.
235 240
236 - DRIVER_OK (3) Indicates that the driver is set up and ready to
241 + DRIVER_OK (4) Indicates that the driver is set up and ready to
237 242 drive the device.
238 243
239 - FAILED (8) Indicates that something went wrong in the guest,
244 + FAILED (128) Indicates that something went wrong in the guest,
240 245 and it has given up on the device. This could be an internal
241 246 error, or the driver didn't like the device for some reason, or
242 247 even a fatal error during device operation. The device must be
243 248 reset before attempting to re-initialize.
244 249
245 - Feature Bits
250 + Feature Bits<sub:Feature-Bits>
246 251
247 - The least significant 31 bits of the first configuration field
248 - indicates the features that the device supports (the high bit is
249 - reserved, and will be used to indicate the presence of future
250 - feature bits elsewhere). If more than 31 feature bits are
251 - supported, the device indicates so by setting feature bit 31 (see
252 - [cha:Reserved-Feature-Bits]). The bits are allocated as follows:
252 + Thefirst configuration field indicates the features that the
253 + device supports. The bits are allocated as follows:
253 254
254 255 0 to 23 Feature bits for the specific device type
255 256
256 - 24 to 40 Feature bits reserved for extensions to the queue and
257 + 24 to 32 Feature bits reserved for extensions to the queue and
257 258 feature negotiation mechanisms
258 -
259 - 41 to 63 Feature bits reserved for future extensions
260 259
261 260 For example, feature bit 0 for a network device (i.e. Subsystem
262 261 Device ID 1) indicates that the device supports checksumming of
···
274 285 will not see that feature bit in the Device Features field and
275 286 can go into backwards compatibility mode (or, for poor
276 287 implementations, set the FAILED Device Status bit).
277 -
278 - Access to feature bits 32 to 63 is enabled by Guest by setting
279 - feature bit 31. If this bit is unset, Device must assume that all
280 - feature bits > 31 are unset.
281 288
282 289 Configuration/Queue Vectors
283 290
···
309 324 failure, NO_VECTOR is returned. If a mapping failure is detected,
310 325 the driver can retry mapping with fewervectors, or disable MSI-X.
311 326
312 - Virtqueue Configuration
327 + Virtqueue Configuration<sec:Virtqueue-Configuration>
313 328
314 329 As a device can have zero or more virtqueues for bulk data
315 330 transport (for example, the network driver has two), the driver
···
572 587 freely used by all other projects, and is reproduced (with slight
573 588 variation to remove Linux assumptions) in Appendix A.
574 589
575 - Device Operation
590 + Device Operation<sec:Device-Operation>
576 591
577 592 There are two parts to device operation: supplying new buffers to
578 593 the device, and processing used buffers from the device. As an
···
798 813
799 814 }
800 815
801 - Dealing With Configuration Changes
816 + Dealing With Configuration Changes<sub:Dealing-With-Configuration>
802 817
803 818 Some virtio PCI devices can change the device configuration
804 819 state, as reflected in the virtio header in the PCI configuration
···
1245 1260 driver should ignore the used_event field; the device should
1246 1261 ignore the avail_event field; the flags field is used
1247 1262
1248 - VIRTIO_F_BAD_FEATURE(30) This feature should never be
1249 - negotiated by the guest; doing so is an indication that the
1250 - guest is faulty[footnote:
1251 - An experimental virtio PCI driver contained in Linux version
1252 - 2.6.25 had this problem, and this feature bit can be used to
1253 - detect it.
1254 - ]
1255 -
1256 - VIRTIO_F_FEATURES_HIGH(31) This feature indicates that the
1257 - device supports feature bits 32:63. If unset, feature bits
1258 - 32:63 are unset.
1259 -
1260 1263 Appendix C: Network Device
1261 1264
1262 1265 The virtio network device is a virtual ethernet card, and is the
···
1308 1335
1309 1336 VIRTIO_NET_F_CTRL_VLAN (19) Control channel VLAN filtering.
1310 1337
1338 + VIRTIO_NET_F_GUEST_ANNOUNCE(21) Guest can send gratuitous
1339 + packets.
1340 +
1311 1341 Device configuration layout Two configuration fields are
1312 1342 currently defined. The mac address field always exists (though
1313 1343 is only valid if VIRTIO_NET_F_MAC is set), and the status field
1314 - only exists if VIRTIO_NET_F_STATUS is set. Only one bit is
1315 - currently defined for the status field: VIRTIO_NET_S_LINK_UP. #define VIRTIO_NET_S_LINK_UP 1
1344 + only exists if VIRTIO_NET_F_STATUS is set. Two read-only bits
1345 + are currently defined for the status field:
1346 + VIRTIO_NET_S_LINK_UP and VIRTIO_NET_S_ANNOUNCE. #define VIRTIO_NET_S_LINK_UP 1
1347 +
1348 + #define VIRTIO_NET_S_ANNOUNCE 2
1316 1349
1317 1350
1318 1351
···
1356 1377 packets by negotating the VIRTIO_NET_F_CSUM feature. This “
1357 1378 checksum offload” is a common feature on modern network cards.
1358 1379
1359 - If that feature is negotiated, a driver can use TCP or UDP
1360 - segmentation offload by negotiating the VIRTIO_NET_F_HOST_TSO4
1361 - (IPv4 TCP), VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and
1362 - VIRTIO_NET_F_HOST_UFO (UDP fragmentation) features. It should
1363 - not send TCP packets requiring segmentation offload which have
1364 - the Explicit Congestion Notification bit set, unless the
1380 + If that feature is negotiated[footnote:
1381 + ie. VIRTIO_NET_F_HOST_TSO* and VIRTIO_NET_F_HOST_UFO are
1382 + dependent on VIRTIO_NET_F_CSUM; a dvice which offers the offload
1383 + features must offer the checksum feature, and a driver which
1384 + accepts the offload features must accept the checksum feature.
1385 + Similar logic applies to the VIRTIO_NET_F_GUEST_TSO4 features
1386 + depending on VIRTIO_NET_F_GUEST_CSUM.
1387 + ], a driver can use TCP or UDP segmentation offload by
1388 + negotiating the VIRTIO_NET_F_HOST_TSO4 (IPv4 TCP),
1389 + VIRTIO_NET_F_HOST_TSO6 (IPv6 TCP) and VIRTIO_NET_F_HOST_UFO
1390 + (UDP fragmentation) features. It should not send TCP packets
1391 + requiring segmentation offload which have the Explicit
1392 + Congestion Notification bit set, unless the
1365 1393 VIRTIO_NET_F_HOST_ECN feature is negotiated.[footnote:
1366 1394 This is a common restriction in real, older network cards.
1367 1395 ]
···
1389 1403
1390 1404 Packets are transmitted by placing them in the transmitq, and
1391 1405 buffers for incoming packets are placed in the receiveq. In each
1392 - case, the packet itself is preceded by a header:
1406 + case, the packet itself is preceeded by a header:
1393 1407
1394 1408 struct virtio_net_hdr {
1395 1409
···
1448 1462 followed by the TCP header (with the TCP checksum field 16 bytes
1449 1463 into that header). csum_start will be 14+20 = 34 (the TCP
1450 1464 checksum includes the header), and csum_offset will be 16. The
1451 - value in the TCP checksum field will be the sum of the TCP pseudo
1452 - header, so that replacing it by the ones' complement checksum of
1453 - the TCP header and body will give the correct result.
1465 + value in the TCP checksum field should be initialized to the sum
1466 + of the TCP pseudo header, so that replacing it by the ones'
1467 + complement checksum of the TCP header and body will give the
1468 + correct result.
1454 1469 ]
1455 1470
1456 1471 <enu:If-the-driver>If the driver negotiated
···
1470 1483 as a guarantee of the transport header size.
1471 1484 ]
1472 1485
1473 - gso_size is the size of the packet beyond that header (ie.
1474 - MSS).
1486 + gso_size is the maximum size of each packet beyond that header
1487 + (ie. MSS).
1475 1488
1476 1489 If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the
1477 1490 VIRTIO_NET_HDR_GSO_ECN bit may be set in “gso_type” as well,
···
1554 1567 If the VIRTIO_NET_F_GUEST_TSO4, TSO6 or UFO options were
1555 1568 negotiated, then the “gso_type” may be something other than
1556 1569 VIRTIO_NET_HDR_GSO_NONE, and the “gso_size” field indicates the
1557 - desired MSS (see [enu:If-the-driver]).Control Virtqueue
1570 + desired MSS (see [enu:If-the-driver]).
1571 +
1572 + Control Virtqueue
1558 1573
1559 1574 The driver uses the control virtqueue (if VIRTIO_NET_F_VTRL_VQ is
1560 1575 negotiated) to send commands to manipulate various features of
···
1631 1642
1632 1643 The device can filter incoming packets by any number of
1633 1644 destination MAC addresses.[footnote:
1634 - Since there are no guarantees, it can use a hash filter
1645 + Since there are no guarentees, it can use a hash filter
1635 1646 orsilently switch to allmulti or promiscuous mode if it is given
1636 1647 too many addresses.
1637 1648 ] This table is set using the class VIRTIO_NET_CTRL_MAC and the
···
1653 1664
1654 1665 Both the VIRTIO_NET_CTRL_VLAN_ADD and VIRTIO_NET_CTRL_VLAN_DEL
1655 1666 command take a 16-bit VLAN id as the command-specific-data.
1667 +
1668 + Gratuitous Packet Sending
1669 +
1670 + If the driver negotiates the VIRTIO_NET_F_GUEST_ANNOUNCE (depends
1671 + on VIRTIO_NET_F_CTRL_VQ), it can ask the guest to send gratuitous
1672 + packets; this is usually done after the guest has been physically
1673 + migrated, and needs to announce its presence on the new network
1674 + links. (As hypervisor does not have the knowledge of guest
1675 + network configuration (eg. tagged vlan) it is simplest to prod
1676 + the guest in this way).
1677 +
1678 + #define VIRTIO_NET_CTRL_ANNOUNCE 3
1679 +
1680 + #define VIRTIO_NET_CTRL_ANNOUNCE_ACK 0
1681 +
1682 + The Guest needs to check VIRTIO_NET_S_ANNOUNCE bit in status
1683 + field when it notices the changes of device configuration. The
1684 + command VIRTIO_NET_CTRL_ANNOUNCE_ACK is used to indicate that
1685 + driver has recevied the notification and device would clear the
1686 + VIRTIO_NET_S_ANNOUNCE bit in the status filed after it received
1687 + this command.
1688 +
1689 + Processing this notification involves:
1690 +
1691 + Sending the gratuitous packets or marking there are pending
1692 + gratuitous packets to be sent and letting deferred routine to
1693 + send them.
1694 +
1695 + Sending VIRTIO_NET_CTRL_ANNOUNCE_ACK command through control
1696 + vq.
1697 +
1698 + .
1656 1699
1657 1700 Appendix D: Block Device
1658 1701
···
1719 1698 VIRTIO_BLK_F_SCSI (7) Device supports scsi packet commands.
1720 1699
1721 1700 VIRTIO_BLK_F_FLUSH (9) Cache flush command support.
1722 -
1723 -
1724 1701
1725 1702 Device configuration layout The capacity of the device
1726 1703 (expressed in 512-byte sectors) is always present. The
···
1761 1742
1762 1743 If the VIRTIO_BLK_F_RO feature is set by the device, any write
1763 1744 requests will fail.
1764 -
1765 -
1766 1745
1767 1746 Device Operation
1768 1747
···
1822 1805 distinguish between them
1823 1806 ]). If the device has VIRTIO_BLK_F_BARRIER feature the high bit
1824 1807 (VIRTIO_BLK_T_BARRIER) indicates that this request acts as a
1825 - barrier and that all preceding requests must be complete before
1808 + barrier and that all preceeding requests must be complete before
1826 1809 this one, and all following requests must not be started until
1827 1810 this is complete. Note that a barrier does not flush caches in
1828 1811 the underlying backend device in host, and thus does not serve as
···
2135 2118
2136 2119 Otherwise, the guest may begin to re-use pages previously given
2137 2120 to the balloon before the device has acknowledged their
2138 - withdrawal. [footnote:
2121 + withdrawl. [footnote:
2139 2122 In this case, deflation advice is merely a courtesy
2140 2123 ]
2141 2124
···
2214 2197
2215 2198 VIRTIO_BALLOON_S_MEMTOT The total amount of memory available
2216 2199 (in bytes).
2200 +
2201 + Appendix H: Rpmsg: Remote Processor Messaging
2202 +
2203 + Virtio rpmsg devices represent remote processors on the system
2204 + which run in asymmetric multi-processing (AMP) configuration, and
2205 + which are usually used to offload cpu-intensive tasks from the
2206 + main application processor (a typical SoC methodology).
2207 +
2208 + Virtio is being used to communicate with those remote processors;
2209 + empty buffers are placed in one virtqueue for receiving messages,
2210 + and non-empty buffers, containing outbound messages, are enqueued
2211 + in a second virtqueue for transmission.
2212 +
2213 + Numerous communication channels can be multiplexed over those two
2214 + virtqueues, so different entities, running on the application and
2215 + remote processor, can directly communicate in a point-to-point
2216 + fashion.
2217 +
2218 + Configuration
2219 +
2220 + Subsystem Device ID 7
2221 +
2222 + Virtqueues 0:receiveq. 1:transmitq.
2223 +
2224 + Feature bits
2225 +
2226 + VIRTIO_RPMSG_F_NS (0) Device sends (and capable of receiving)
2227 + name service messages announcing the creation (or
2228 + destruction) of a channel:/**
2229 +
2230 + * struct rpmsg_ns_msg - dynamic name service announcement
2231 + message
2232 +
2233 + * @name: name of remote service that is published
2234 +
2235 + * @addr: address of remote service that is published
2236 +
2237 + * @flags: indicates whether service is created or destroyed
2238 +
2239 + *
2240 +
2241 + * This message is sent across to publish a new service (or
2242 + announce
2243 +
2244 + * about its removal). When we receives these messages, an
2245 + appropriate
2246 +
2247 + * rpmsg channel (i.e device) is created/destroyed.
2248 +
2249 + */
2250 +
2251 + struct rpmsg_ns_msgoon_config {
2252 +
2253 + char name[RPMSG_NAME_SIZE];
2254 +
2255 + u32 addr;
2256 +
2257 + u32 flags;
2258 +
2259 + } __packed;
2260 +
2261 +
2262 +
2263 + /**
2264 +
2265 + * enum rpmsg_ns_flags - dynamic name service announcement flags
2266 +
2267 + *
2268 +
2269 + * @RPMSG_NS_CREATE: a new remote service was just created
2270 +
2271 + * @RPMSG_NS_DESTROY: a remote service was just destroyed
2272 +
2273 + */
2274 +
2275 + enum rpmsg_ns_flags {
2276 +
2277 + RPMSG_NS_CREATE = 0,
2278 +
2279 + RPMSG_NS_DESTROY = 1,
2280 +
2281 + };
2282 +
2283 + Device configuration layout
2284 +
2285 + At his point none currently defined.
2286 +
2287 + Device Initialization
2288 +
2289 + The initialization routine should identify the receive and
2290 + transmission virtqueues.
2291 +
2292 + The receive virtqueue should be filled with receive buffers.
2293 +
2294 + Device Operation
2295 +
2296 + Messages are transmitted by placing them in the transmitq, and
2297 + buffers for inbound messages are placed in the receiveq. In any
2298 + case, messages are always preceded by the following header: /**
2299 +
2300 + * struct rpmsg_hdr - common header for all rpmsg messages
2301 +
2302 + * @src: source address
2303 +
2304 + * @dst: destination address
2305 +
2306 + * @reserved: reserved for future use
2307 +
2308 + * @len: length of payload (in bytes)
2309 +
2310 + * @flags: message flags
2311 +
2312 + * @data: @len bytes of message payload data
2313 +
2314 + *
2315 +
2316 + * Every message sent(/received) on the rpmsg bus begins with
2317 + this header.
2318 +
2319 + */
2320 +
2321 + struct rpmsg_hdr {
2322 +
2323 + u32 src;
2324 +
2325 + u32 dst;
2326 +
2327 + u32 reserved;
2328 +
2329 + u16 len;
2330 +
2331 + u16 flags;
2332 +
2333 + u8 data[0];
2334 +
2335 + } __packed;
2336 +
2337 + Appendix I: SCSI Host Device
2338 +
2339 + The virtio SCSI host device groups together one or more virtual
2340 + logical units (such as disks), and allows communicating to them
2341 + using the SCSI protocol. An instance of the device represents a
2342 + SCSI host to which many targets and LUNs are attached.
2343 +
2344 + The virtio SCSI device services two kinds of requests:
2345 +
2346 + command requests for a logical unit;
2347 +
2348 + task management functions related to a logical unit, target or
2349 + command.
2350 +
2351 + The device is also able to send out notifications about added and
2352 + removed logical units. Together, these capabilities provide a
2353 + SCSI transport protocol that uses virtqueues as the transfer
2354 + medium. In the transport protocol, the virtio driver acts as the
2355 + initiator, while the virtio SCSI host provides one or more
2356 + targets that receive and process the requests.
2357 +
2358 + Configuration
2359 +
2360 + Subsystem Device ID 8
2361 +
2362 + Virtqueues 0:controlq; 1:eventq; 2..n:request queues.
2363 +
2364 + Feature bits
2365 +
2366 + VIRTIO_SCSI_F_INOUT (0) A single request can include both
2367 + read-only and write-only data buffers.
2368 +
2369 + VIRTIO_SCSI_F_HOTPLUG (1) The host should enable
2370 + hot-plug/hot-unplug of new LUNs and targets on the SCSI bus.
2371 +
2372 + Device configuration layout All fields of this configuration
2373 + are always available. sense_size and cdb_size are writable by
2374 + the guest.struct virtio_scsi_config {
2375 +
2376 + u32 num_queues;
2377 +
2378 + u32 seg_max;
2379 +
2380 + u32 max_sectors;
2381 +
2382 + u32 cmd_per_lun;
2383 +
2384 + u32 event_info_size;
2385 +
2386 + u32 sense_size;
2387 +
2388 + u32 cdb_size;
2389 +
2390 + u16 max_channel;
2391 +
2392 + u16 max_target;
2393 +
2394 + u32 max_lun;
2395 +
2396 + };
2397 +
2398 + num_queues is the total number of request virtqueues exposed by
2399 + the device. The driver is free to use only one request queue,
2400 + or it can use more to achieve better performance.
2401 +
2402 + seg_max is the maximum number of segments that can be in a
2403 + command. A bidirectional command can include seg_max input
2404 + segments and seg_max output segments.
2405 +
2406 + max_sectors is a hint to the guest about the maximum transfer
2407 + size it should use.
2408 +
2409 + cmd_per_lun is a hint to the guest about the maximum number of
2410 + linked commands it should send to one LUN. The actual value
2411 + to be used is the minimum of cmd_per_lun and the virtqueue
2412 + size.
2413 +
2414 + event_info_size is the maximum size that the device will fill
2415 + for buffers that the driver places in the eventq. The driver
2416 + should always put buffers at least of this size. It is
2417 + written by the device depending on the set of negotated
2418 + features.
2419 +
2420 + sense_size is the maximum size of the sense data that the
2421 + device will write. The default value is written by the device
2422 + and will always be 96, but the driver can modify it. It is
2423 + restored to the default when the device is reset.
2424 +
2425 + cdb_size is the maximum size of the CDB that the driver will
2426 + write. The default value is written by the device and will
2427 + always be 32, but the driver can likewise modify it. It is
2428 + restored to the default when the device is reset.
2429 +
2430 + max_channel, max_target and max_lun can be used by the driver
2431 + as hints to constrain scanning the logical units on the
2432 + host.h
2433 +
2434 + Device Initialization
2435 +
2436 + The initialization routine should first of all discover the
2437 + device's virtqueues.
2438 +
2439 + If the driver uses the eventq, it should then place at least a
2440 + buffer in the eventq.
2441 +
2442 + The driver can immediately issue requests (for example, INQUIRY
2443 + or REPORT LUNS) or task management functions (for example, I_T
2444 + RESET).
2445 +
2446 + Device Operation: request queues
2447 +
2448 + The driver queues requests to an arbitrary request queue, and
2449 + they are used by the device on that same queue. It is the
2450 + responsibility of the driver to ensure strict request ordering
2451 + for commands placed on different queues, because they will be
2452 + consumed with no order constraints.
2453 +
2454 + Requests have the following format:
2455 +
2456 + struct virtio_scsi_req_cmd {
2457 +
2458 + // Read-only
2459 +
2460 + u8 lun[8];
2461 +
2462 + u64 id;
2463 +
2464 + u8 task_attr;
2465 +
2466 + u8 prio;
2467 +
2468 + u8 crn;
2469 +
2470 + char cdb[cdb_size];
2471 +
2472 + char dataout[];
2473 +
2474 + // Write-only part
2475 +
2476 + u32 sense_len;
2477 +
2478 + u32 residual;
2479 +
2480 + u16 status_qualifier;
2481 +
2482 + u8 status;
2483 +
2484 + u8 response;
2485 +
2486 + u8 sense[sense_size];
2487 +
2488 + char datain[];
2489 +
2490 + };
2491 +
2492 +
2493 +
2494 + /* command-specific response values */
2495 +
2496 + #define VIRTIO_SCSI_S_OK 0
2497 +
2498 + #define VIRTIO_SCSI_S_OVERRUN 1
2499 +
2500 + #define VIRTIO_SCSI_S_ABORTED 2
2501 +
2502 + #define VIRTIO_SCSI_S_BAD_TARGET 3
2503 +
2504 + #define VIRTIO_SCSI_S_RESET 4
2505 +
2506 + #define VIRTIO_SCSI_S_BUSY 5
2507 +
2508 + #define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
2509 +
2510 + #define VIRTIO_SCSI_S_TARGET_FAILURE 7
2511 +
2512 + #define VIRTIO_SCSI_S_NEXUS_FAILURE 8
2513 +
2514 + #define VIRTIO_SCSI_S_FAILURE 9
2515 +
2516 +
2517 +
2518 + /* task_attr */
2519 +
2520 + #define VIRTIO_SCSI_S_SIMPLE 0
2521 +
2522 + #define VIRTIO_SCSI_S_ORDERED 1
2523 +
2524 + #define VIRTIO_SCSI_S_HEAD 2
2525 +
2526 + #define VIRTIO_SCSI_S_ACA 3
2527 +
2528 + The lun field addresses a target and logical unit in the
2529 + virtio-scsi device's SCSI domain. The only supported format for
2530 + the LUN field is: first byte set to 1, second byte set to target,
2531 + third and fourth byte representing a single level LUN structure,
2532 + followed by four zero bytes. With this representation, a
2533 + virtio-scsi device can serve up to 256 targets and 16384 LUNs per
2534 + target.
2535 +
2536 + The id field is the command identifier (“tag”).
2537 +
2538 + task_attr, prio and crn should be left to zero. task_attr defines
2539 + the task attribute as in the table above, but all task attributes
2540 + may be mapped to SIMPLE by the device; crn may also be provided
2541 + by clients, but is generally expected to be 0. The maximum CRN
2542 + value defined by the protocol is 255, since CRN is stored in an
2543 + 8-bit integer.
2544 +
2545 + All of these fields are defined in SAM. They are always
2546 + read-only, as are the cdb and dataout field. The cdb_size is
2547 + taken from the configuration space.
2548 +
2549 + sense and subsequent fields are always write-only. The sense_len
2550 + field indicates the number of bytes actually written to the sense
2551 + buffer. The residual field indicates the residual size,
2552 + calculated as “data_length - number_of_transferred_bytes”, for
2553 + read or write operations. For bidirectional commands, the
2554 + number_of_transferred_bytes includes both read and written bytes.
2555 + A residual field that is less than the size of datain means that
2556 + the dataout field was processed entirely. A residual field that
2557 + exceeds the size of datain means that the dataout field was
2558 + processed partially and the datain field was not processed at
2559 + all.
2560 +
2561 + The status byte is written by the device to be the status code as
2562 + defined in SAM.
2563 +
2564 + The response byte is written by the device to be one of the
2565 + following:
2566 +
2567 + VIRTIO_SCSI_S_OK when the request was completed and the status
2568 + byte is filled with a SCSI status code (not necessarily
2569 + "GOOD").
2570 +
2571 + VIRTIO_SCSI_S_OVERRUN if the content of the CDB requires
2572 + transferring more data than is available in the data buffers.
2573 +
2574 + VIRTIO_SCSI_S_ABORTED if the request was cancelled due to an
2575 + ABORT TASK or ABORT TASK SET task management function.
2576 +
2577 + VIRTIO_SCSI_S_BAD_TARGET if the request was never processed
2578 + because the target indicated by the lun field does not exist.
2579 +
2580 + VIRTIO_SCSI_S_RESET if the request was cancelled due to a bus
2581 + or device reset (including a task management function).
2582 +
2583 + VIRTIO_SCSI_S_TRANSPORT_FAILURE if the request failed due to a
2584 + problem in the connection between the host and the target
2585 + (severed link).
2586 +
2587 + VIRTIO_SCSI_S_TARGET_FAILURE if the target is suffering a
2588 + failure and the guest should not retry on other paths.
2589 +
2590 + VIRTIO_SCSI_S_NEXUS_FAILURE if the nexus is suffering a failure
2591 + but retrying on other paths might yield a different result.
2592 +
2593 + VIRTIO_SCSI_S_BUSY if the request failed but retrying on the
2594 + same path should work.
2595 +
2596 + VIRTIO_SCSI_S_FAILURE for other host or guest error. In
2597 + particular, if neither dataout nor datain is empty, and the
2598 + VIRTIO_SCSI_F_INOUT feature has not been negotiated, the
2599 + request will be immediately returned with a response equal to
2600 + VIRTIO_SCSI_S_FAILURE.
2601 +
2602 + Device Operation: controlq
2603 +
2604 + The controlq is used for other SCSI transport operations.
2605 + Requests have the following format:
2606 +
2607 + struct virtio_scsi_ctrl {
2608 +
2609 + u32 type;
2610 +
2611 + ...
2612 +
2613 + u8 response;
2614 +
2615 + };
2616 +
2617 +
2618 +
2619 + /* response values valid for all commands */
2620 +
2621 + #define VIRTIO_SCSI_S_OK 0
2622 +
2623 + #define VIRTIO_SCSI_S_BAD_TARGET 3
2624 +
2625 + #define VIRTIO_SCSI_S_BUSY 5
2626 +
2627 + #define VIRTIO_SCSI_S_TRANSPORT_FAILURE 6
2628 +
2629 + #define VIRTIO_SCSI_S_TARGET_FAILURE 7
2630 +
2631 + #define VIRTIO_SCSI_S_NEXUS_FAILURE 8
2632 +
2633 + #define VIRTIO_SCSI_S_FAILURE 9
2634 +
2635 + #define VIRTIO_SCSI_S_INCORRECT_LUN 12
2636 +
2637 + The type identifies the remaining fields.
2638 +
2639 + The following commands are defined:
2640 +
2641 + Task management function
2642 + #define VIRTIO_SCSI_T_TMF 0
2643 +
2644 +
2645 +
2646 + #define VIRTIO_SCSI_T_TMF_ABORT_TASK 0
2647 +
2648 + #define VIRTIO_SCSI_T_TMF_ABORT_TASK_SET 1
2649 +
2650 + #define VIRTIO_SCSI_T_TMF_CLEAR_ACA 2
2651 +
2652 + #define VIRTIO_SCSI_T_TMF_CLEAR_TASK_SET 3
2653 +
2654 + #define VIRTIO_SCSI_T_TMF_I_T_NEXUS_RESET 4
2655 +
2656 + #define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5
2657 +
2658 + #define VIRTIO_SCSI_T_TMF_QUERY_TASK 6
2659 +
2660 + #define VIRTIO_SCSI_T_TMF_QUERY_TASK_SET 7
2661 +
2662 +
2663 +
2664 + struct virtio_scsi_ctrl_tmf
2665 +
2666 + {
2667 +
2668 + // Read-only part
2669 +
2670 + u32 type;
2671 +
2672 + u32 subtype;
2673 +
2674 + u8 lun[8];
2675 +
2676 + u64 id;
2677 +
2678 + // Write-only part
2679 +
2680 + u8 response;
2681 +
2682 + }
2683 +
2684 +
2685 +
2686 + /* command-specific response values */
2687 +
2688 + #define VIRTIO_SCSI_S_FUNCTION_COMPLETE 0
2689 +
2690 + #define VIRTIO_SCSI_S_FUNCTION_SUCCEEDED 10
2691 +
2692 + #define VIRTIO_SCSI_S_FUNCTION_REJECTED 11
2693 +
2694 + The type is VIRTIO_SCSI_T_TMF; the subtype field defines. All
2695 + fields except response are filled by the driver. The subtype
2696 + field must always be specified and identifies the requested
2697 + task management function.
2698 +
2699 + Other fields may be irrelevant for the requested TMF; if so,
2700 + they are ignored but they should still be present. The lun
2701 + field is in the same format specified for request queues; the
2702 + single level LUN is ignored when the task management function
2703 + addresses a whole I_T nexus. When relevant, the value of the id
2704 + field is matched against the id values passed on the requestq.
2705 +
2706 + The outcome of the task management function is written by the
2707 + device in the response field. The command-specific response
2708 + values map 1-to-1 with those defined in SAM.
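Note on the TMF request described above: the driver-filled part of struct virtio_scsi_ctrl_tmf, including the 8-byte lun field, could be prepared roughly as follows. This is a sketch, not driver code. The helper names scsi_encode_lun() and scsi_prepare_lu_reset() are hypothetical, the packed attribute is a GCC/Clang extension chosen here to avoid padding, and OR-ing 0x40 into the third byte (the SAM flat-space addressing method) is an assumption that the spec text only implies through its 16384-LUN limit.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VIRTIO_SCSI_T_TMF 0
#define VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET 5

/* Field order follows the struct quoted in the spec text. */
struct virtio_scsi_ctrl_tmf {
	/* Read-only part (filled by the driver) */
	uint32_t type;
	uint32_t subtype;
	uint8_t lun[8];
	uint64_t id;
	/* Write-only part (filled by the device) */
	uint8_t response;
} __attribute__((packed));

/* Byte 0 = 1, byte 1 = target, bytes 2-3 = single level LUN, bytes 4-7 = 0. */
static void scsi_encode_lun(uint8_t lun[8], uint8_t target, uint16_t unit)
{
	assert(unit < 16384); /* 14 bits, matching 16384 LUNs per target */
	memset(lun, 0, 8);
	lun[0] = 1;
	lun[1] = target;
	lun[2] = (uint8_t)((unit >> 8) | 0x40); /* assumed flat addressing flag */
	lun[3] = (uint8_t)(unit & 0xff);
}

/* Prepare a LOGICAL UNIT RESET; id is not relevant here but stays present. */
static void scsi_prepare_lu_reset(struct virtio_scsi_ctrl_tmf *req,
				  uint8_t target, uint16_t unit)
{
	memset(req, 0, sizeof(*req));
	req->type = VIRTIO_SCSI_T_TMF;
	req->subtype = VIRTIO_SCSI_T_TMF_LOGICAL_UNIT_RESET;
	scsi_encode_lun(req->lun, target, unit);
}
```

For target 3 and LUN 0x123 this yields lun bytes 01 03 41 23 00 00 00 00. The response byte is left at zero and would be overwritten by the device.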
2709 + 2710 + Asynchronous notification query 2711 + #define VIRTIO_SCSI_T_AN_QUERY 1 2712 + 2713 + 2714 + 2715 + struct virtio_scsi_ctrl_an { 2716 + 2717 + // Read-only part 2718 + 2719 + u32 type; 2720 + 2721 + u8 lun[8]; 2722 + 2723 + u32 event_requested; 2724 + 2725 + // Write-only part 2726 + 2727 + u32 event_actual; 2728 + 2729 + u8 response; 2730 + 2731 + } 2732 + 2733 + 2734 + 2735 + #define VIRTIO_SCSI_EVT_ASYNC_OPERATIONAL_CHANGE 2 2736 + 2737 + #define VIRTIO_SCSI_EVT_ASYNC_POWER_MGMT 4 2738 + 2739 + #define VIRTIO_SCSI_EVT_ASYNC_EXTERNAL_REQUEST 8 2740 + 2741 + #define VIRTIO_SCSI_EVT_ASYNC_MEDIA_CHANGE 16 2742 + 2743 + #define VIRTIO_SCSI_EVT_ASYNC_MULTI_HOST 32 2744 + 2745 + #define VIRTIO_SCSI_EVT_ASYNC_DEVICE_BUSY 64 2746 + 2747 + By sending this command, the driver asks the device which 2748 + events the given LUN can report, as described in paragraphs 6.6 2749 + and A.6 of the SCSI MMC specification. The driver writes the 2750 + events it is interested in into the event_requested; the device 2751 + responds by writing the events that it supports into 2752 + event_actual. 2753 + 2754 + The type is VIRTIO_SCSI_T_AN_QUERY. The lun and event_requested 2755 + fields are written by the driver. The event_actual and response 2756 + fields are written by the device. 2757 + 2758 + No command-specific values are defined for the response byte. 2759 + 2760 + Asynchronous notification subscription 2761 + #define VIRTIO_SCSI_T_AN_SUBSCRIBE 2 2762 + 2763 + 2764 + 2765 + struct virtio_scsi_ctrl_an { 2766 + 2767 + // Read-only part 2768 + 2769 + u32 type; 2770 + 2771 + u8 lun[8]; 2772 + 2773 + u32 event_requested; 2774 + 2775 + // Write-only part 2776 + 2777 + u32 event_actual; 2778 + 2779 + u8 response; 2780 + 2781 + } 2782 + 2783 + By sending this command, the driver asks the specified LUN to 2784 + report events for its physical interface, again as described in 2785 + the SCSI MMC specification. 
The driver writes the events it is 2786 + interested in into the event_requested; the device responds by 2787 + writing the events that it supports into event_actual. 2788 + 2789 + Event types are the same as for the asynchronous notification 2790 + query message. 2791 + 2792 + The type is VIRTIO_SCSI_T_AN_SUBSCRIBE. The lun and 2793 + event_requested fields are written by the driver. The 2794 + event_actual and response fields are written by the device. 2795 + 2796 + No command-specific values are defined for the response byte. 2797 + 2798 + Device Operation: eventq 2799 + 2800 + The eventq is used by the device to report information on logical 2801 + units that are attached to it. The driver should always leave a 2802 + few buffers ready in the eventq. In general, the device will not 2803 + queue events to cope with an empty eventq, and will end up 2804 + dropping events if it finds no buffer ready. However, when 2805 + reporting events for many LUNs (e.g. when a whole target 2806 + disappears), the device can throttle events to avoid dropping 2807 + them. For this reason, placing 10-15 buffers on the event queue 2808 + should be enough. 2809 + 2810 + Buffers are placed in the eventq and filled by the device when 2811 + interesting events occur. The buffers should be strictly 2812 + write-only (device-filled) and the size of the buffers should be 2813 + at least the value given in the device's configuration 2814 + information. 2815 + 2816 + Buffers returned by the device on the eventq will be referred to 2817 + as "events" in the rest of this section. Events have the 2818 + following format: 2819 + 2820 + #define VIRTIO_SCSI_T_EVENTS_MISSED 0x80000000 2821 + 2822 + 2823 + 2824 + struct virtio_scsi_event { 2825 + 2826 + // Write-only part 2827 + 2828 + u32 event; 2829 + 2830 + ... 2831 + 2832 + } 2833 + 2834 + If bit 31 is set in the event field, the device failed to report 2835 + an event due to missing buffers. 
In this case, the driver should 2836 + poll the logical units for unit attention conditions, and/or do 2837 + whatever form of bus scan is appropriate for the guest operating 2838 + system. 2839 + 2840 + Other data that the device writes to the buffer depends on the 2841 + contents of the event field. The following events are defined: 2842 + 2843 + No event 2844 + #define VIRTIO_SCSI_T_NO_EVENT 0 2845 + 2846 + This event is fired in the following cases: 2847 + 2848 + When the device detects in the eventq a buffer that is shorter 2849 + than what is indicated in the configuration field, it might 2850 + use it immediately and put this dummy value in the event 2851 + field. A well-written driver will never observe this 2852 + situation. 2853 + 2854 + When events are dropped, the device may signal this event as 2855 + soon as the driver makes a buffer available, in order to 2856 + request action from the driver. In this case, of course, this 2857 + event will be reported with the VIRTIO_SCSI_T_EVENTS_MISSED 2858 + flag. 2859 + 2860 + Transport reset 2861 + #define VIRTIO_SCSI_T_TRANSPORT_RESET 1 2862 + 2863 + 2864 + 2865 + struct virtio_scsi_event_reset { 2866 + 2867 + // Write-only part 2868 + 2869 + u32 event; 2870 + 2871 + u8 lun[8]; 2872 + 2873 + u32 reason; 2874 + 2875 + } 2876 + 2877 + 2878 + 2879 + #define VIRTIO_SCSI_EVT_RESET_HARD 0 2880 + 2881 + #define VIRTIO_SCSI_EVT_RESET_RESCAN 1 2882 + 2883 + #define VIRTIO_SCSI_EVT_RESET_REMOVED 2 2884 + 2885 + By sending this event, the device signals that a logical unit 2886 + on a target has been reset, including the case of a new device 2887 + appearing or disappearing on the bus. The device fills in all 2888 + fields. The event field is set to 2889 + VIRTIO_SCSI_T_TRANSPORT_RESET. The lun field addresses a 2890 + logical unit in the SCSI host.
2891 + 2892 + The reason value is one of the three #define values appearing 2893 + above: 2894 + 2895 + VIRTIO_SCSI_EVT_RESET_REMOVED (“LUN/target removed”) is used if 2896 + the target or logical unit is no longer able to receive 2897 + commands. 2898 + 2899 + VIRTIO_SCSI_EVT_RESET_HARD (“LUN hard reset”) is used if the 2900 + logical unit has been reset, but is still present. 2901 + 2902 + VIRTIO_SCSI_EVT_RESET_RESCAN (“rescan LUN/target”) is used if a 2903 + target or logical unit has just appeared on the device. 2904 + 2905 + The “removed” and “rescan” events, when sent for LUN 0, may 2906 + apply to the entire target. After receiving them the driver 2907 + should ask the initiator to rescan the target, in order to 2908 + detect the case when an entire target has appeared or 2909 + disappeared. These two events will never be reported unless the 2910 + VIRTIO_SCSI_F_HOTPLUG feature was negotiated between the host 2911 + and the guest. 2912 + 2913 + Events will also be reported via sense codes (this obviously 2914 + does not apply to newly appeared buses or targets, since the 2915 + application has never discovered them): 2916 + 2917 + “LUN/target removed” maps to sense key ILLEGAL REQUEST, asc 2918 + 0x25, ascq 0x00 (LOGICAL UNIT NOT SUPPORTED) 2919 + 2920 + “LUN hard reset” maps to sense key UNIT ATTENTION, asc 0x29 2921 + (POWER ON, RESET OR BUS DEVICE RESET OCCURRED) 2922 + 2923 + “rescan LUN/target” maps to sense key UNIT ATTENTION, asc 0x3f, 2924 + ascq 0x0e (REPORTED LUNS DATA HAS CHANGED) 2925 + 2926 + The preferred way to detect transport reset is always to use 2927 + events, because sense codes are only seen by the driver when it 2928 + sends a SCSI command to the logical unit or target. However, in 2929 + case events are dropped, the initiator will still be able to 2930 + synchronize with the actual state of the controller if the 2931 + driver asks the initiator to rescan the SCSI bus.
During the 2932 + rescan, the initiator will be able to observe the above sense 2933 + codes, and it will process them as if the driver had 2934 + received the equivalent event. 2935 + 2936 + Asynchronous notification 2937 + #define VIRTIO_SCSI_T_ASYNC_NOTIFY 2 2938 + 2939 + 2940 + 2941 + struct virtio_scsi_event_an { 2942 + 2943 + // Write-only part 2944 + 2945 + u32 event; 2946 + 2947 + u8 lun[8]; 2948 + 2949 + u32 reason; 2950 + 2951 + } 2952 + 2953 + By sending this event, the device signals that an asynchronous 2954 + event was fired from a physical interface. 2955 + 2956 + All fields are written by the device. The event field is set to 2957 + VIRTIO_SCSI_T_ASYNC_NOTIFY. The lun field addresses a logical 2958 + unit in the SCSI host. The reason field is a subset of the 2959 + events that the driver has subscribed to via the "Asynchronous 2960 + notification subscription" command. 2961 + 2962 + When dropped events are reported, the driver should poll for 2963 + asynchronous events manually using SCSI commands. 2964 + 2965 + Appendix X: virtio-mmio 2966 + 2967 + Virtual environments without PCI support (a common situation in 2968 + embedded device models) might use a simple memory mapped device 2969 + (“virtio-mmio”) instead of the PCI device. 2970 + 2971 + The memory mapped virtio device behaviour is based on the PCI 2972 + device specification. Therefore most operations, such as device 2973 + initialization, queue configuration and buffer transfers, are 2974 + nearly identical. Existing differences are described in the 2975 + following sections. 2976 + 2977 + Device Initialization 2978 + 2979 + Instead of using the PCI IO space for the virtio header, the 2980 + “virtio-mmio” device provides a set of memory mapped control 2981 + registers, all 32 bits wide, followed by device-specific 2982 + configuration space.
The following list presents their layout: 2983 + 2984 + Offset from the device base address | Direction | Name 2985 + Description 2986 + 2987 + 0x000 | R | MagicValue 2988 + “virt” string. 2989 + 2990 + 0x004 | R | Version 2991 + Device version number. Currently must be 1. 2992 + 2993 + 0x008 | R | DeviceID 2994 + Virtio Subsystem Device ID (ie. 1 for network card). 2995 + 2996 + 0x00c | R | VendorID 2997 + Virtio Subsystem Vendor ID. 2998 + 2999 + 0x010 | R | HostFeatures 3000 + Flags representing features the device supports. 3001 + Reading from this register returns 32 consecutive flag bits, 3002 + first bit depending on the last value written to 3003 + HostFeaturesSel register. Access to this register returns bits HostFeaturesSel*32 3004 + 3005 + to (HostFeaturesSel*32)+31 3006 + , eg. feature bits 0 to 31 if 3007 + HostFeaturesSel is set to 0 and features bits 32 to 63 if 3008 + HostFeaturesSel is set to 1. Also see [sub:Feature-Bits] 3009 + 3010 + 0x014 | W | HostFeaturesSel 3011 + Device (Host) features word selection. 3012 + Writing to this register selects a set of 32 device feature bits 3013 + accessible by reading from HostFeatures register. Device driver 3014 + must write a value to the HostFeaturesSel register before 3015 + reading from the HostFeatures register. 3016 + 3017 + 0x020 | W | GuestFeatures 3018 + Flags representing device features understood and activated by 3019 + the driver. 3020 + Writing to this register sets 32 consecutive flag bits, first 3021 + bit depending on the last value written to GuestFeaturesSel 3022 + register. Access to this register sets bits GuestFeaturesSel*32 3023 + 3024 + to (GuestFeaturesSel*32)+31 3025 + , eg. feature bits 0 to 31 if 3026 + GuestFeaturesSel is set to 0 and features bits 32 to 63 if 3027 + GuestFeaturesSel is set to 1. Also see [sub:Feature-Bits] 3028 + 3029 + 0x024 | W | GuestFeaturesSel 3030 + Activated (Guest) features word selection. 
3031 + Writing to this register selects a set of 32 activated feature 3032 + bits accessible by writing to the GuestFeatures register. 3033 + Device driver must write a value to the GuestFeaturesSel 3034 + register before writing to the GuestFeatures register. 3035 + 3036 + 0x028 | W | GuestPageSize 3037 + Guest page size. 3038 + Device driver must write the guest page size in bytes to the 3039 + register during initialization, before any queues are used. 3040 + This value must be a power of 2 and is used by the Host to 3041 + calculate Guest address of the first queue page (see QueuePFN). 3042 + 3043 + 0x030 | W | QueueSel 3044 + Virtual queue index (first queue is 0). 3045 + Writing to this register selects the virtual queue that the 3046 + following operations on QueueNum, QueueAlign and QueuePFN apply 3047 + to. 3048 + 3049 + 0x034 | R | QueueNumMax 3050 + Maximum virtual queue size. 3051 + Reading from the register returns the maximum size of the queue 3052 + the Host is ready to process or zero (0x0) if the queue is not 3053 + available. This applies to the queue selected by writing to 3054 + QueueSel and is allowed only when QueuePFN is set to zero 3055 + (0x0), so when the queue is not actively used. 3056 + 3057 + 0x038 | W | QueueNum 3058 + Virtual queue size. 3059 + Queue size is a number of elements in the queue, therefore size 3060 + of the descriptor table and both available and used rings. 3061 + Writing to this register notifies the Host what size of the 3062 + queue the Guest will use. This applies to the queue selected by 3063 + writing to QueueSel. 3064 + 3065 + 0x03c | W | QueueAlign 3066 + Used Ring alignment in the virtual queue. 3067 + Writing to this register notifies the Host about alignment 3068 + boundary of the Used Ring in bytes. This value must be a power 3069 + of 2 and applies to the queue selected by writing to QueueSel. 3070 + 3071 + 0x040 | RW | QueuePFN 3072 + Guest physical page number of the virtual queue. 
3073 + Writing to this register notifies the Host about the location of the 3074 + virtual queue in the Guest's physical address space. This value 3075 + is the index number of a page starting with the queue 3076 + Descriptor Table. Value zero (0x0) means physical address zero 3077 + (0x00000000) and is illegal. When the Guest stops using the 3078 + queue it must write zero (0x0) to this register. 3079 + Reading from this register returns the currently used page 3080 + number of the queue, therefore a value other than zero (0x0) 3081 + means that the queue is in use. 3082 + Both read and write accesses apply to the queue selected by 3083 + writing to QueueSel. 3084 + 3085 + 0x050 | W | QueueNotify 3086 + Queue notifier. 3087 + Writing a queue index to this register notifies the Host that 3088 + there are new buffers to process in the queue. 3089 + 3090 + 0x060 | R | InterruptStatus 3091 + Interrupt status. 3092 + Reading from this register returns a bit mask of interrupts 3093 + asserted by the device. An interrupt is asserted if the 3094 + corresponding bit is set, i.e. equals one (1). 3095 + 3096 + Bit 0 | Used Ring Update 3097 + This interrupt is asserted when the Host has updated the Used 3098 + Ring in at least one of the active virtual queues. 3099 + 3100 + Bit 1 | Configuration change 3101 + This interrupt is asserted when configuration of the device has 3102 + changed. 3103 + 3104 + 0x064 | W | InterruptACK 3105 + Interrupt acknowledge. 3106 + Writing to this register notifies the Host that the Guest 3107 + finished handling interrupts. Set bits in the value clear the 3108 + corresponding bits of the InterruptStatus register. 3109 + 3110 + 0x070 | RW | Status 3111 + Device status. 3112 + Reading from this register returns the current device status 3113 + flags. 3114 + Writing non-zero values to this register sets the status flags, 3115 + indicating the Guest's progress. Writing zero (0x0) to this 3116 + register triggers a device reset.
3117 + Also see [sub:Device-Initialization-Sequence] 3118 + 3119 + 0x100+ | RW | Config 3120 + Device-specific configuration space starts at an offset 0x100 3121 + and is accessed with byte alignment. Its meaning and size 3122 + depends on the device and the driver. 3123 + 3124 + Virtual queue size is a number of elements in the queue, 3125 + therefore size of the descriptor table and both available and 3126 + used rings. 3127 + 3128 + The endianness of the registers follows the native endianness of 3129 + the Guest. Writing to registers described as “R” and reading from 3130 + registers described as “W” is not permitted and can cause 3131 + undefined behavior. 3132 + 3133 + The device initialization is performed as described in [sub:Device-Initialization-Sequence] 3134 + with one exception: the Guest must notify the Host about its 3135 + page size, writing the size in bytes to GuestPageSize register 3136 + before the initialization is finished. 3137 + 3138 + The memory mapped virtio devices generate single interrupt only, 3139 + therefore no special configuration is required. 3140 + 3141 + Virtqueue Configuration 3142 + 3143 + The virtual queue configuration is performed in a similar way to 3144 + the one described in [sec:Virtqueue-Configuration] with a few 3145 + additional operations: 3146 + 3147 + Select the queue writing its index (first queue is 0) to the 3148 + QueueSel register. 3149 + 3150 + Check if the queue is not already in use: read QueuePFN 3151 + register, returned value should be zero (0x0). 3152 + 3153 + Read maximum queue size (number of elements) from the 3154 + QueueNumMax register. If the returned value is zero (0x0) the 3155 + queue is not available. 3156 + 3157 + Allocate and zero the queue pages in contiguous virtual memory, 3158 + aligning the Used Ring to an optimal boundary (usually page 3159 + size). Size of the allocated queue may be smaller than or equal 3160 + to the maximum size returned by the Host. 
3161 + 3162 + Notify the Host about the queue size by writing the size to 3163 + QueueNum register. 3164 + 3165 + Notify the Host about the used alignment by writing its value 3166 + in bytes to QueueAlign register. 3167 + 3168 + Write the physical number of the first page of the queue to the 3169 + QueuePFN register. 3170 + 3171 + The queue and the device are ready to begin normal operations 3172 + now. 3173 + 3174 + Device Operation 3175 + 3176 + The memory mapped virtio device behaves in the same way as 3177 + described in [sec:Device-Operation], with the following 3178 + exceptions: 3179 + 3180 + The device is notified about new buffers available in a queue 3181 + by writing the queue index to register QueueNotify instead of the 3182 + virtio header in PCI I/O space ([sub:Notifying-The-Device]). 3183 + 3184 + The memory mapped virtio device uses a single, dedicated 3185 + interrupt signal, which is raised when at least one of the 3186 + interrupts described in the InterruptStatus register 3187 + description is asserted. After receiving an interrupt, the 3188 + driver must read the InterruptStatus register to check what 3189 + caused the interrupt (see the register description). After the 3190 + interrupt is handled, the driver must acknowledge it by writing 3191 + a bit mask corresponding to the serviced interrupt to the 3192 + InterruptACK register. 2217 3193
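The virtqueue configuration steps and the notify/acknowledge rules in the virtio-mmio appendix can be sketched against a simulated register window. The offsets are taken from the register list in the spec text (notifications go to QueueNotify at 0x050, per that list); everything else here is an assumption for illustration: the `vm_read`/`vm_write` accessors over a plain array, the `vm_setup_queue`, `vm_notify` and `vm_ack_interrupts` names, and the fact that queue memory allocation and the earlier GuestPageSize write are left to the caller. It is not the driver code from this patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets as listed in the spec text. */
#define VM_QUEUE_SEL         0x030
#define VM_QUEUE_NUM_MAX     0x034
#define VM_QUEUE_NUM         0x038
#define VM_QUEUE_ALIGN       0x03c
#define VM_QUEUE_PFN         0x040
#define VM_QUEUE_NOTIFY      0x050
#define VM_INTERRUPT_STATUS  0x060
#define VM_INTERRUPT_ACK     0x064

/* Hypothetical accessors: an array of 32-bit words stands in for MMIO. */
static uint32_t vm_read(const volatile uint32_t *base, unsigned int off)
{
	return base[off / 4];
}

static void vm_write(volatile uint32_t *base, unsigned int off, uint32_t val)
{
	base[off / 4] = val;
}

/*
 * Configure one queue following the listed steps. The caller has already
 * allocated and zeroed the queue pages; pfn is their first page number.
 * Returns the queue size in use, or 0 if the queue is busy or unavailable.
 */
static uint32_t vm_setup_queue(volatile uint32_t *base, uint32_t index,
			       uint32_t want, uint32_t align, uint32_t pfn)
{
	uint32_t max;

	assert(pfn != 0);			/* zero is illegal for QueuePFN */
	vm_write(base, VM_QUEUE_SEL, index);	/* select the queue */
	if (vm_read(base, VM_QUEUE_PFN) != 0)	/* already in use? */
		return 0;
	max = vm_read(base, VM_QUEUE_NUM_MAX);
	if (max == 0)				/* queue not available */
		return 0;
	if (want > max)				/* may be smaller than the maximum */
		want = max;
	vm_write(base, VM_QUEUE_NUM, want);
	vm_write(base, VM_QUEUE_ALIGN, align);
	vm_write(base, VM_QUEUE_PFN, pfn);
	return want;
}

/* Tell the Host that queue `index` has new buffers. */
static void vm_notify(volatile uint32_t *base, uint32_t index)
{
	vm_write(base, VM_QUEUE_NOTIFY, index);
}

/* Read the pending interrupt bits and acknowledge exactly those bits. */
static uint32_t vm_ack_interrupts(volatile uint32_t *base)
{
	uint32_t status = vm_read(base, VM_INTERRUPT_STATUS);

	vm_write(base, VM_INTERRUPT_ACK, status);
	return status;
}
```

A real driver would use the kernel's MMIO accessors on an ioremapped region rather than array indexing, and would handle each status bit between the read and the acknowledge.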
+11 -10
drivers/block/virtio_blk.c
··· 29 29 /* The disk structure for the kernel. */ 30 30 struct gendisk *disk; 31 31 32 - /* Request tracking. */ 33 - struct list_head reqs; 34 - 35 32 mempool_t *pool; 36 33 37 34 /* Process context for config space updates */ ··· 52 55 53 56 struct virtblk_req 54 57 { 55 - struct list_head list; 56 58 struct request *req; 57 59 struct virtio_blk_outhdr out_hdr; 58 60 struct virtio_scsi_inhdr in_hdr; ··· 95 99 } 96 100 97 101 __blk_end_request_all(vbr->req, error); 98 - list_del(&vbr->list); 99 102 mempool_free(vbr, vblk->pool); 100 103 } 101 104 /* In case queue is stopped waiting for more buffers. */ ··· 179 184 return false; 180 185 } 181 186 182 - list_add_tail(&vbr->list, &vblk->reqs); 183 187 return true; 184 188 } 185 189 ··· 431 437 goto out_free_index; 432 438 } 433 439 434 - INIT_LIST_HEAD(&vblk->reqs); 435 440 spin_lock_init(&vblk->lock); 436 441 vblk->vdev = vdev; 437 442 vblk->sg_elems = sg_elems; ··· 576 583 { 577 584 struct virtio_blk *vblk = vdev->priv; 578 585 int index = vblk->index; 586 + struct virtblk_req *vbr; 587 + unsigned long flags; 579 588 580 589 /* Prevent config work handler from accessing the device. */ 581 590 mutex_lock(&vblk->config_lock); 582 591 vblk->config_enable = false; 583 592 mutex_unlock(&vblk->config_lock); 584 - 585 - /* Nothing should be pending. */ 586 - BUG_ON(!list_empty(&vblk->reqs)); 587 593 588 594 /* Stop all the virtqueues. */ 589 595 vdev->config->reset(vdev); ··· 590 598 flush_work(&vblk->config_work); 591 599 592 600 del_gendisk(vblk->disk); 601 + 602 + /* Abort requests dispatched to driver. */ 603 + spin_lock_irqsave(&vblk->lock, flags); 604 + while ((vbr = virtqueue_detach_unused_buf(vblk->vq))) { 605 + __blk_end_request_all(vbr->req, -EIO); 606 + mempool_free(vbr, vblk->pool); 607 + } 608 + spin_unlock_irqrestore(&vblk->lock, flags); 609 + 593 610 blk_cleanup_queue(vblk->disk->queue); 594 611 put_disk(vblk->disk); 595 612 mempool_destroy(vblk->pool);
+11
drivers/virtio/Kconfig
··· 46 46 47 47 If unsure, say N. 48 48 49 + config VIRTIO_MMIO_CMDLINE_DEVICES 50 + bool "Memory mapped virtio devices parameter parsing" 51 + depends on VIRTIO_MMIO 52 + ---help--- 53 + Allow virtio-mmio devices instantiation via the kernel command line 54 + or module parameters. Be aware that using incorrect parameters (base 55 + address in particular) can crash your system - you have been warned. 56 + See Documentation/kernel-parameters.txt for details. 57 + 58 + If unsure, say 'N'. 59 + 49 60 endmenu
+9 -2
drivers/virtio/virtio.c
··· 2 2 #include <linux/spinlock.h> 3 3 #include <linux/virtio_config.h> 4 4 #include <linux/module.h> 5 + #include <linux/idr.h> 5 6 6 7 /* Unique numbering for virtio devices. */ 7 - static unsigned int dev_index; 8 + static DEFINE_IDA(virtio_index_ida); 8 9 9 10 static ssize_t device_show(struct device *_d, 10 11 struct device_attribute *attr, char *buf) ··· 194 193 dev->dev.bus = &virtio_bus; 195 194 196 195 /* Assign a unique device index and hence name. */ 197 - dev->index = dev_index++; 196 + err = ida_simple_get(&virtio_index_ida, 0, 0, GFP_KERNEL); 197 + if (err < 0) 198 + goto out; 199 + 200 + dev->index = err; 198 201 dev_set_name(&dev->dev, "virtio%u", dev->index); 199 202 200 203 /* We always start by resetting the device, in case a previous ··· 213 208 /* device_register() causes the bus infrastructure to look for a 214 209 * matching driver. */ 215 210 err = device_register(&dev->dev); 211 + out: 216 212 if (err) 217 213 add_status(dev, VIRTIO_CONFIG_S_FAILED); 218 214 return err; ··· 223 217 void unregister_virtio_device(struct virtio_device *dev) 224 218 { 225 219 device_unregister(&dev->dev); 220 + ida_simple_remove(&virtio_index_ida, dev->index); 226 221 } 227 222 EXPORT_SYMBOL_GPL(unregister_virtio_device); 228 223
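The practical effect of replacing the ever-increasing `dev_index` counter with an ID allocator is that an index released at unregister time can be handed out again. The following userspace sketch illustrates that behaviour with a 64-bit bitmap. It is not the kernel's IDA implementation: `toy_ida`, `toy_ida_get` and `toy_ida_remove` are hypothetical names, the 64-entry limit is arbitrary, and lowest-free-first allocation is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_IDA_MAX 64

/* One bit per index; a set bit means the index is in use. */
struct toy_ida {
	uint64_t used;
};

/* Hand out the lowest free index, or -1 when all are taken. */
static int toy_ida_get(struct toy_ida *ida)
{
	int id;

	assert(ida != NULL);
	for (id = 0; id < TOY_IDA_MAX; id++) {
		if (!(ida->used & (UINT64_C(1) << id))) {
			ida->used |= UINT64_C(1) << id;
			return id;
		}
	}
	return -1;
}

/* Release an index so a later toy_ida_get() can reuse it. */
static void toy_ida_remove(struct toy_ida *ida, int id)
{
	assert(ida != NULL && id >= 0 && id < TOY_IDA_MAX);
	ida->used &= ~(UINT64_C(1) << id);
}
```

With a plain counter, repeatedly plugging and unplugging a device would keep producing new "virtio%u" names; with an allocator of this kind the released number becomes available again.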
+13 -20
drivers/virtio/virtio_balloon.c
··· 381 381 return err; 382 382 } 383 383 384 - static void __devexit virtballoon_remove(struct virtio_device *vdev) 384 + static void remove_common(struct virtio_balloon *vb) 385 385 { 386 - struct virtio_balloon *vb = vdev->priv; 387 - 388 - kthread_stop(vb->thread); 389 - 390 386 /* There might be pages left in the balloon: free them. */ 391 387 while (vb->num_pages) 392 388 leak_balloon(vb, vb->num_pages); 393 389 update_balloon_size(vb); 394 390 395 391 /* Now we reset the device so we can clean up the queues. */ 396 - vdev->config->reset(vdev); 392 + vb->vdev->config->reset(vb->vdev); 397 393 398 - vdev->config->del_vqs(vdev); 394 + vb->vdev->config->del_vqs(vb->vdev); 395 + } 396 + 397 + static void __devexit virtballoon_remove(struct virtio_device *vdev) 398 + { 399 + struct virtio_balloon *vb = vdev->priv; 400 + 401 + kthread_stop(vb->thread); 402 + remove_common(vb); 399 403 kfree(vb); 400 404 } 401 405 ··· 413 409 * function is called. 414 410 */ 415 411 416 - while (vb->num_pages) 417 - leak_balloon(vb, vb->num_pages); 418 - update_balloon_size(vb); 419 - 420 - /* Ensure we don't get any more requests from the host */ 421 - vdev->config->reset(vdev); 422 - vdev->config->del_vqs(vdev); 412 + remove_common(vb); 423 413 return 0; 424 414 } 425 415 426 - static int restore_common(struct virtio_device *vdev) 416 + static int virtballoon_restore(struct virtio_device *vdev) 427 417 { 428 418 struct virtio_balloon *vb = vdev->priv; 429 419 int ret; ··· 429 431 fill_balloon(vb, towards_target(vb)); 430 432 update_balloon_size(vb); 431 433 return 0; 432 - } 433 - 434 - static int virtballoon_restore(struct virtio_device *vdev) 435 - { 436 - return restore_common(vdev); 437 434 } 438 435 #endif 439 436
+163
drivers/virtio/virtio_mmio.c
··· 6 6 * This module allows virtio devices to be used over a virtual, memory mapped 7 7 * platform device. 8 8 * 9 + * The guest device(s) may be instantiated in one of three equivalent ways: 10 + * 11 + * 1. Static platform device in board's code, eg.: 12 + * 13 + * static struct platform_device v2m_virtio_device = { 14 + * .name = "virtio-mmio", 15 + * .id = -1, 16 + * .num_resources = 2, 17 + * .resource = (struct resource []) { 18 + * { 19 + * .start = 0x1001e000, 20 + * .end = 0x1001e0ff, 21 + * .flags = IORESOURCE_MEM, 22 + * }, { 23 + * .start = 42 + 32, 24 + * .end = 42 + 32, 25 + * .flags = IORESOURCE_IRQ, 26 + * }, 27 + * } 28 + * }; 29 + * 30 + * 2. Device Tree node, eg.: 31 + * 32 + * virtio_block@1e000 { 33 + * compatible = "virtio,mmio"; 34 + * reg = <0x1e000 0x100>; 35 + * interrupts = <42>; 36 + * } 37 + * 38 + * 3. Kernel module (or command line) parameter. Can be used more than once - 39 + * one device will be created for each one. Syntax: 40 + * 41 + * [virtio_mmio.]device=<size>@<baseaddr>:<irq>[:<id>] 42 + * where: 43 + * <size> := size (can use standard suffixes like K, M or G) 44 + * <baseaddr> := physical base address 45 + * <irq> := interrupt number (as passed to request_irq()) 46 + * <id> := (optional) platform device id 47 + * eg.: 48 + * virtio_mmio.device=0x100@0x100b0000:48 \ 49 + * virtio_mmio.device=1K@0x1001e000:74 50 + * 51 + * 52 + * 9 53 * Registers layout (all 32-bit wide): 10 54 * 11 55 * offset d. name description ··· 85 41 * This work is licensed under the terms of the GNU GPL, version 2 or later. 86 42 * See the COPYING file in the top-level directory. 
87 43 */ 44 + 45 + #define pr_fmt(fmt) "virtio-mmio: " fmt 88 46 89 47 #include <linux/highmem.h> 90 48 #include <linux/interrupt.h> ··· 495 449 496 450 497 451 452 + /* Devices list parameter */ 453 + 454 + #if defined(CONFIG_VIRTIO_MMIO_CMDLINE_DEVICES) 455 + 456 + static struct device vm_cmdline_parent = { 457 + .init_name = "virtio-mmio-cmdline", 458 + }; 459 + 460 + static int vm_cmdline_parent_registered; 461 + static int vm_cmdline_id; 462 + 463 + static int vm_cmdline_set(const char *device, 464 + const struct kernel_param *kp) 465 + { 466 + int err; 467 + struct resource resources[2] = {}; 468 + char *str; 469 + long long int base; 470 + int processed, consumed = 0; 471 + struct platform_device *pdev; 472 + 473 + resources[0].flags = IORESOURCE_MEM; 474 + resources[1].flags = IORESOURCE_IRQ; 475 + 476 + resources[0].end = memparse(device, &str) - 1; 477 + 478 + processed = sscanf(str, "@%lli:%u%n:%d%n", 479 + &base, &resources[1].start, &consumed, 480 + &vm_cmdline_id, &consumed); 481 + 482 + if (processed < 2 || processed > 3 || str[consumed]) 483 + return -EINVAL; 484 + 485 + resources[0].start = base; 486 + resources[0].end += base; 487 + resources[1].end = resources[1].start; 488 + 489 + if (!vm_cmdline_parent_registered) { 490 + err = device_register(&vm_cmdline_parent); 491 + if (err) { 492 + pr_err("Failed to register parent device!\n"); 493 + return err; 494 + } 495 + vm_cmdline_parent_registered = 1; 496 + } 497 + 498 + pr_info("Registering device virtio-mmio.%d at 0x%llx-0x%llx, IRQ %d.\n", 499 + vm_cmdline_id, 500 + (unsigned long long)resources[0].start, 501 + (unsigned long long)resources[0].end, 502 + (int)resources[1].start); 503 + 504 + pdev = platform_device_register_resndata(&vm_cmdline_parent, 505 + "virtio-mmio", vm_cmdline_id++, 506 + resources, ARRAY_SIZE(resources), NULL, 0); 507 + if (IS_ERR(pdev)) 508 + return PTR_ERR(pdev); 509 + 510 + return 0; 511 + } 512 + 513 + static int vm_cmdline_get_device(struct device *dev, void *data) 
514 + { 515 + char *buffer = data; 516 + unsigned int len = strlen(buffer); 517 + struct platform_device *pdev = to_platform_device(dev); 518 + 519 + snprintf(buffer + len, PAGE_SIZE - len, "0x%llx@0x%llx:%llu:%d\n", 520 + pdev->resource[0].end - pdev->resource[0].start + 1ULL, 521 + (unsigned long long)pdev->resource[0].start, 522 + (unsigned long long)pdev->resource[1].start, 523 + pdev->id); 524 + return 0; 525 + } 526 + 527 + static int vm_cmdline_get(char *buffer, const struct kernel_param *kp) 528 + { 529 + buffer[0] = '\0'; 530 + device_for_each_child(&vm_cmdline_parent, buffer, 531 + vm_cmdline_get_device); 532 + return strlen(buffer) + 1; 533 + } 534 + 535 + static struct kernel_param_ops vm_cmdline_param_ops = { 536 + .set = vm_cmdline_set, 537 + .get = vm_cmdline_get, 538 + }; 539 + 540 + device_param_cb(device, &vm_cmdline_param_ops, NULL, S_IRUSR); 541 + 542 + static int vm_unregister_cmdline_device(struct device *dev, 543 + void *data) 544 + { 545 + platform_device_unregister(to_platform_device(dev)); 546 + 547 + return 0; 548 + } 549 + 550 + static void vm_unregister_cmdline_devices(void) 551 + { 552 + if (vm_cmdline_parent_registered) { 553 + device_for_each_child(&vm_cmdline_parent, NULL, 554 + vm_unregister_cmdline_device); 555 + device_unregister(&vm_cmdline_parent); 556 + vm_cmdline_parent_registered = 0; 557 + } 558 + } 559 + 560 + #else 561 + 562 + static void vm_unregister_cmdline_devices(void) 563 + { 564 + } 565 + 566 + #endif 567 + 498 568 /* Platform driver */ 499 569 500 570 static struct of_device_id virtio_mmio_match[] = { ··· 637 475 static void __exit virtio_mmio_exit(void) 638 476 { 639 477 platform_driver_unregister(&virtio_mmio_driver); 478 + vm_unregister_cmdline_devices(); 640 479 } 641 480 642 481 module_init(virtio_mmio_init);
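The `<size>@<baseaddr>:<irq>[:<id>]` syntax handled by `vm_cmdline_set()` can be tried out in userspace with the same `sscanf()` format string. The sketch below is an approximation under stated assumptions: `toy_memparse` is a hypothetical stand-in for the kernel's `memparse()` that handles only the K, M and G suffixes, `vm_cmdline_parse` and `struct vm_cmdline_dev` are names invented here, the default id is passed in rather than kept in a static counter, and the rejection of a zero size is an extra check that the patch does not perform.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Parsed form of one device= argument (hypothetical type). */
struct vm_cmdline_dev {
	unsigned long long start;	/* first byte of the MMIO window */
	unsigned long long end;		/* last byte of the MMIO window */
	unsigned int irq;
	int id;
};

/* Stand-in for memparse(): number with optional K/M/G suffix. */
static unsigned long long toy_memparse(const char *s, char **retptr)
{
	unsigned long long v = strtoull(s, retptr, 0);

	switch (**retptr) {
	case 'G': case 'g':
		v <<= 10;	/* fall through */
	case 'M': case 'm':
		v <<= 10;	/* fall through */
	case 'K': case 'k':
		v <<= 10;
		(*retptr)++;
		break;
	default:
		break;
	}
	return v;
}

/* Returns 0 on success, -1 on a malformed argument. */
static int vm_cmdline_parse(const char *arg, int default_id,
			    struct vm_cmdline_dev *dev)
{
	char *str;
	long long base;
	unsigned int irq;
	int id = default_id;
	int processed, consumed = 0;
	unsigned long long size;

	assert(arg != NULL && dev != NULL);
	size = toy_memparse(arg, &str);

	/* %n records how much was consumed after <irq> and after <id>. */
	processed = sscanf(str, "@%lli:%u%n:%d%n",
			   &base, &irq, &consumed, &id, &consumed);

	/* Need base and irq, at most one id, and no trailing characters. */
	if (processed < 2 || processed > 3 || str[consumed] != '\0')
		return -1;
	if (size == 0)		/* extra check, not in the patch */
		return -1;

	dev->start = (unsigned long long)base;
	dev->end = dev->start + size - 1;
	dev->irq = irq;
	dev->id = id;
	return 0;
}
```

For the example from kernel-parameters.txt, `virtio_mmio.device=1K@0x100b0000:48:7`, this yields a window of 0x100b0000 to 0x100b03ff, IRQ 48 and platform device id 7. When the optional id is omitted, the caller-supplied default is kept.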
+1 -10
include/linux/virtio_config.h
··· 74 74 * @set_status: write the status byte 75 75 * vdev: the virtio_device 76 76 * status: the new status byte 77 - * @request_vqs: request the specified number of virtqueues 78 - * vdev: the virtio_device 79 - * max_vqs: the max number of virtqueues we want 80 - * If supplied, must call before any virtqueues are instantiated. 81 - * To modify the max number of virtqueues after request_vqs has been 82 - * called, call free_vqs and then request_vqs with a new value. 83 - * @free_vqs: cleanup resources allocated by request_vqs 84 - * vdev: the virtio_device 85 - * If supplied, must call after all virtqueues have been deleted. 86 77 * @reset: reset the device 87 78 * vdev: the virtio device 88 79 * After this, status and feature negotiation must be done again ··· 147 156 * @vdev: the virtio device 148 157 * @fbit: the feature bit 149 158 * @offset: the type to search for. 150 - * @val: a pointer to the value to fill in. 159 + * @v: a pointer to the value to fill in. 151 160 * 152 161 * The return value is -ENOENT if the feature doesn't exist. Otherwise 153 162 * the config value is copied into whatever is pointed to by v. */
+2 -1
net/9p/trans_virtio.c
··· 615 615 { 616 616 struct virtio_chan *chan = vdev->priv; 617 617 618 - BUG_ON(chan->inuse); 618 + if (chan->inuse) 619 + p9_virtio_close(chan->client); 619 620 vdev->config->del_vqs(vdev); 620 621 621 622 mutex_lock(&virtio_9p_lock);