Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost

Pull virtio updates from Michael Tsirkin:

- vduse driver ("vDPA Device in Userspace") supporting emulated virtio
block devices

- virtio-vsock support for end of record with SEQPACKET

- vdpa: mac and mq support for ifcvf and mlx5

- vdpa: management netlink for ifcvf

- virtio-i2c, gpio dt bindings

- misc fixes and cleanups

* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits)
Documentation: Add documentation for VDUSE
vduse: Introduce VDUSE - vDPA Device in Userspace
vduse: Implement an MMU-based software IOTLB
vdpa: Support transferring virtual addressing during DMA mapping
vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()
vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()
vhost-iotlb: Add an opaque pointer for vhost IOTLB
vhost-vdpa: Handle the failure of vdpa_reset()
vdpa: Add reset callback in vdpa_config_ops
vdpa: Fix some coding style issues
file: Export receive_fd() to modules
eventfd: Export eventfd_wake_count to modules
iova: Export alloc_iova_fast() and free_iova_fast()
virtio-blk: remove unneeded "likely" statements
virtio-balloon: Use virtio_find_vqs() helper
vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro
vsock_test: update message bounds test for MSG_EOR
af_vsock: rename variables in receive loop
virtio/vsock: support MSG_EOR bit processing
vhost/vsock: support MSG_EOR bit processing
...

+4131 -329
+59
Documentation/devicetree/bindings/gpio/gpio-virtio.yaml
···
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/gpio/gpio-virtio.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Virtio GPIO controller

maintainers:
  - Viresh Kumar <viresh.kumar@linaro.org>

allOf:
  - $ref: /schemas/virtio/virtio-device.yaml#

description:
  Virtio GPIO controller, see /schemas/virtio/virtio-device.yaml for more
  details.

properties:
  $nodename:
    const: gpio

  compatible:
    const: virtio,device29

  gpio-controller: true

  "#gpio-cells":
    const: 2

  interrupt-controller: true

  "#interrupt-cells":
    const: 2

required:
  - compatible
  - gpio-controller
  - "#gpio-cells"

unevaluatedProperties: false

examples:
  - |
    virtio@3000 {
        compatible = "virtio,mmio";
        reg = <0x3000 0x100>;
        interrupts = <41>;

        gpio {
            compatible = "virtio,device29";
            gpio-controller;
            #gpio-cells = <2>;
            interrupt-controller;
            #interrupt-cells = <2>;
        };
    };

...
+51
Documentation/devicetree/bindings/i2c/i2c-virtio.yaml
···
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/i2c/i2c-virtio.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Virtio I2C Adapter

maintainers:
  - Viresh Kumar <viresh.kumar@linaro.org>

allOf:
  - $ref: /schemas/i2c/i2c-controller.yaml#
  - $ref: /schemas/virtio/virtio-device.yaml#

description:
  Virtio I2C device, see /schemas/virtio/virtio-device.yaml for more details.

properties:
  $nodename:
    const: i2c

  compatible:
    const: virtio,device22

required:
  - compatible

unevaluatedProperties: false

examples:
  - |
    virtio@3000 {
        compatible = "virtio,mmio";
        reg = <0x3000 0x100>;
        interrupts = <41>;

        i2c {
            compatible = "virtio,device22";

            #address-cells = <1>;
            #size-cells = <0>;

            light-sensor@20 {
                compatible = "dynaimage,al3320a";
                reg = <0x20>;
            };
        };
    };

...
+2 -1
Documentation/devicetree/bindings/virtio/mmio.yaml
···
   - reg
   - interrupts

-additionalProperties: false
+additionalProperties:
+  type: object

 examples:
   - |
+41
Documentation/devicetree/bindings/virtio/virtio-device.yaml
···
# SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
%YAML 1.2
---
$id: http://devicetree.org/schemas/virtio/virtio-device.yaml#
$schema: http://devicetree.org/meta-schemas/core.yaml#

title: Virtio device bindings

maintainers:
  - Viresh Kumar <viresh.kumar@linaro.org>

description:
  These bindings are applicable to virtio devices irrespective of the bus they
  are bound to, like mmio or pci.

# We need a select here so we don't match all nodes with 'virtio,mmio'
properties:
  compatible:
    pattern: "^virtio,device[0-9a-f]{1,8}$"
    description: Virtio device nodes.
      "virtio,deviceID", where ID is the virtio device id. The textual
      representation of ID shall be in lower case hexadecimal with leading
      zeroes suppressed.

required:
  - compatible

additionalProperties: true

examples:
  - |
    virtio@3000 {
        compatible = "virtio,mmio";
        reg = <0x3000 0x100>;
        interrupts = <43>;

        i2c {
            compatible = "virtio,device22";
        };
    };
...
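The binding above fixes the compatible string format: "virtio,deviceID" with the ID in lower-case hexadecimal, leading zeroes suppressed. As a quick illustration, the helper below (hypothetical, not part of the kernel) formats a virtio device ID the same way; I2C (virtio ID 34) and GPIO (virtio ID 41) yield exactly the "virtio,device22" and "virtio,device29" strings used in these schemas.

```c
#include <stdio.h>

/*
 * Format a virtio device ID into the DT compatible string mandated by the
 * binding: lower-case hex with leading zeroes suppressed ("%x" does both).
 * build_compatible() is an illustrative helper, not a kernel function.
 */
static void build_compatible(unsigned int id, char *buf, size_t len)
{
	snprintf(buf, len, "virtio,device%x", id);
}
```

For example, build_compatible(34, ...) produces "virtio,device22" and build_compatible(41, ...) produces "virtio,device29".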
+1
Documentation/userspace-api/index.rst
···
    iommu
    media/index
    sysfs-platform_profile
+   vduse

 .. only:: subproject and html
+1
Documentation/userspace-api/ioctl/ioctl-number.rst
···
 'z'   10-4F  drivers/s390/crypto/zcrypt_api.h                        conflict!
 '|'   00-7F  linux/media.h
 0x80  00-1F  linux/fb.h
+0x81  00-1F  linux/vduse.h
 0x89  00-06  arch/x86/include/asm/sockios.h
 0x89  0B-DF  linux/sockios.h
 0x89  E0-EF  linux/sockios.h                                         SIOCPROTOPRIVATE range
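The table above reserves ioctl type 0x81, command numbers 00-1F, for linux/vduse.h. An ioctl command value packs the type, number, direction and argument size together via the _IO* macros, and the _IOC_* macros recover those fields. The sketch below uses a made-up command number and payload (not the real vduse.h definitions) just to show how a command in that reserved space encodes.

```c
#include <linux/ioctl.h>

/*
 * Illustrative only: a command defined in the 0x81/00-1F space that the
 * registry reserves for linux/vduse.h. The nr (0x02) and payload struct
 * are invented for this example.
 */
struct demo_config {
	unsigned int vq_num;
};

#define DEMO_BASE	0x81
#define DEMO_CREATE_DEV	_IOW(DEMO_BASE, 0x02, struct demo_config)
```

Decoding DEMO_CREATE_DEV with _IOC_TYPE(), _IOC_NR() and _IOC_SIZE() gives back 0x81, 0x02 and sizeof(struct demo_config) respectively, which is how the kernel routes the call and validates the copied argument size.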
+233
Documentation/userspace-api/vduse.rst
···
==================================
VDUSE - "vDPA Device in Userspace"
==================================

A vDPA (virtio data path acceleration) device is a device whose datapath
complies with the virtio specification, but whose control path is vendor
specific. vDPA devices can either be physically implemented in hardware or
emulated in software. VDUSE is a framework that makes it possible to
implement software-emulated vDPA devices in userspace. To make device
emulation more secure, the emulated vDPA device's control path is handled
in the kernel and only the data path is implemented in userspace.

Note that only the virtio block device is supported by the VDUSE framework
now, which can reduce security risks when the userspace process that
implements the data path is run by an unprivileged user. Support for other
device types can be added in the future, once the security issues of the
corresponding device drivers are clarified or fixed.

Create/Destroy VDUSE devices
----------------------------

VDUSE devices are created as follows:

1. Create a new VDUSE instance with ioctl(VDUSE_CREATE_DEV) on
   /dev/vduse/control.

2. Set up each virtqueue with ioctl(VDUSE_VQ_SETUP) on /dev/vduse/$NAME.

3. Begin processing VDUSE messages from /dev/vduse/$NAME. The first
   messages will arrive while attaching the VDUSE instance to the vDPA bus.

4. Send the VDPA_CMD_DEV_NEW netlink message to attach the VDUSE
   instance to the vDPA bus.

VDUSE devices are destroyed as follows:

1. Send the VDPA_CMD_DEV_DEL netlink message to detach the VDUSE
   instance from the vDPA bus.

2. Close the file descriptor referring to /dev/vduse/$NAME.

3. Destroy the VDUSE instance with ioctl(VDUSE_DESTROY_DEV) on
   /dev/vduse/control.

The netlink messages can be sent via the vdpa tool in iproute2, or with
sample code like the following:

.. code-block:: c

	static int netlink_add_vduse(const char *name, enum vdpa_command cmd)
	{
		struct nl_sock *nlsock;
		struct nl_msg *msg;
		int famid;

		nlsock = nl_socket_alloc();
		if (!nlsock)
			return -ENOMEM;

		if (genl_connect(nlsock))
			goto free_sock;

		famid = genl_ctrl_resolve(nlsock, VDPA_GENL_NAME);
		if (famid < 0)
			goto close_sock;

		msg = nlmsg_alloc();
		if (!msg)
			goto close_sock;

		if (!genlmsg_put(msg, NL_AUTO_PORT, NL_AUTO_SEQ, famid, 0, 0, cmd, 0))
			goto nla_put_failure;

		NLA_PUT_STRING(msg, VDPA_ATTR_DEV_NAME, name);
		if (cmd == VDPA_CMD_DEV_NEW)
			NLA_PUT_STRING(msg, VDPA_ATTR_MGMTDEV_DEV_NAME, "vduse");

		if (nl_send_sync(nlsock, msg))
			goto close_sock;

		nl_close(nlsock);
		nl_socket_free(nlsock);

		return 0;
	nla_put_failure:
		nlmsg_free(msg);
	close_sock:
		nl_close(nlsock);
	free_sock:
		nl_socket_free(nlsock);
		return -1;
	}

How VDUSE works
---------------

As mentioned above, a VDUSE device is created by ioctl(VDUSE_CREATE_DEV) on
/dev/vduse/control. With this ioctl, userspace can specify basic
configuration for the emulated device, such as the device name (which
uniquely identifies a VDUSE device), the virtio features, the virtio
configuration space and the number of virtqueues. Then a char device
interface (/dev/vduse/$NAME) is exported to userspace for device emulation.
Userspace can use the VDUSE_VQ_SETUP ioctl on /dev/vduse/$NAME to add
per-virtqueue configuration, such as the maximum virtqueue size, to the
device.

After initialization, the VDUSE device can be attached to the vDPA bus via
the VDPA_CMD_DEV_NEW netlink message. Userspace needs to read()/write() on
/dev/vduse/$NAME to receive control messages from, and reply to, the VDUSE
kernel module, as follows:

.. code-block:: c

	static int vduse_message_handler(int dev_fd)
	{
		int len;
		struct vduse_dev_request req;
		struct vduse_dev_response resp;

		len = read(dev_fd, &req, sizeof(req));
		if (len != sizeof(req))
			return -1;

		resp.request_id = req.request_id;

		switch (req.type) {

		/* handle different types of messages */

		}

		len = write(dev_fd, &resp, sizeof(resp));
		if (len != sizeof(resp))
			return -1;

		return 0;
	}

There are now three types of messages introduced by the VDUSE framework:

- VDUSE_GET_VQ_STATE: Get the state of a virtqueue. Userspace should return
  the available index for a split virtqueue, or the device/driver ring wrap
  counters and the available and used index for a packed virtqueue.

- VDUSE_SET_STATUS: Set the device status. Userspace should follow
  the virtio spec: https://docs.oasis-open.org/virtio/virtio/v1.1/virtio-v1.1.html
  to process this message. For example, fail to set the FEATURES_OK device
  status bit if the device cannot accept the negotiated virtio features
  obtained from the VDUSE_DEV_GET_FEATURES ioctl.

- VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for a
  specified IOVA range. Userspace should first remove the old mapping, then
  set up the new mapping via the VDUSE_IOTLB_GET_FD ioctl.

After the DRIVER_OK status bit is set via the VDUSE_SET_STATUS message,
userspace is able to start the dataplane processing as follows:

1. Get the specified virtqueue's information with the VDUSE_VQ_GET_INFO
   ioctl, including the size, the IOVAs of the descriptor table, available
   ring and used ring, the state and the ready status.

2. Pass the above IOVAs to the VDUSE_IOTLB_GET_FD ioctl so that those IOVA
   regions can be mapped into userspace. Some sample code is shown below:

.. code-block:: c

	static int perm_to_prot(uint8_t perm)
	{
		int prot = 0;

		switch (perm) {
		case VDUSE_ACCESS_WO:
			prot |= PROT_WRITE;
			break;
		case VDUSE_ACCESS_RO:
			prot |= PROT_READ;
			break;
		case VDUSE_ACCESS_RW:
			prot |= PROT_READ | PROT_WRITE;
			break;
		}

		return prot;
	}

	static void *iova_to_va(int dev_fd, uint64_t iova, uint64_t *len)
	{
		int fd;
		void *addr;
		size_t size;
		struct vduse_iotlb_entry entry;

		entry.start = iova;
		entry.last = iova;

		/*
		 * Find the first IOVA region that overlaps with the specified
		 * range [start, last] and return the corresponding file descriptor.
		 */
		fd = ioctl(dev_fd, VDUSE_IOTLB_GET_FD, &entry);
		if (fd < 0)
			return NULL;

		size = entry.last - entry.start + 1;
		*len = entry.last - iova + 1;
		addr = mmap(0, size, perm_to_prot(entry.perm), MAP_SHARED,
			    fd, entry.offset);
		close(fd);
		if (addr == MAP_FAILED)
			return NULL;

		/*
		 * Use some data structure such as a linked list to cache
		 * the iotlb mappings. munmap(2) should be called for a
		 * cached mapping when the corresponding VDUSE_UPDATE_IOTLB
		 * message is received or the device is reset.
		 */

		return addr + iova - entry.start;
	}

3. Set up the kick eventfd for the specified virtqueues with the
   VDUSE_VQ_SETUP_KICKFD ioctl. The kick eventfd is used by the VDUSE
   kernel module to notify userspace to consume the available ring. This
   is optional since userspace can choose to poll the available ring
   instead.

4. Listen to the kick eventfd (optional) and consume the available ring.
   The buffers described by the descriptors in the descriptor table should
   also be mapped into userspace via the VDUSE_IOTLB_GET_FD ioctl before
   being accessed.

5. Inject an interrupt for a specific virtqueue with the
   VDUSE_INJECT_VQ_IRQ ioctl after the used ring is filled.

For more details on the uAPI, please see include/uapi/linux/vduse.h.
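The iova_to_va() comment above suggests caching the iotlb mappings in a data structure such as a linked list. A minimal userspace sketch of such a cache follows; the struct and helper names are hypothetical, and a real implementation would store mmap() results and munmap() them on invalidation.

```c
#include <stdlib.h>
#include <stdint.h>
#include <stddef.h>

/*
 * Sketch of the mapping cache the VDUSE documentation suggests: a linked
 * list of cached IOVA regions, consulted on the datapath and pruned when a
 * VDUSE_UPDATE_IOTLB message invalidates a range. All names here are
 * hypothetical; 'va' would come from mmap() in real code.
 */
struct iova_mapping {
	uint64_t start, last;		/* inclusive IOVA range */
	void *va;			/* userspace address of first byte */
	struct iova_mapping *next;
};

static struct iova_mapping *cache_insert(struct iova_mapping *head,
					 uint64_t start, uint64_t last,
					 void *va)
{
	struct iova_mapping *m = malloc(sizeof(*m));

	if (!m)
		return head;
	m->start = start;
	m->last = last;
	m->va = va;
	m->next = head;
	return m;
}

/*
 * Translate an IOVA through the cache; NULL means the caller must fall
 * back to the VDUSE_IOTLB_GET_FD ioctl and insert the new mapping.
 */
static void *cache_lookup(struct iova_mapping *head, uint64_t iova)
{
	for (struct iova_mapping *m = head; m; m = m->next)
		if (iova >= m->start && iova <= m->last)
			return (char *)m->va + (iova - m->start);
	return NULL;
}

/*
 * Drop every cached mapping overlapping [start, last], as on a
 * VDUSE_UPDATE_IOTLB message (the munmap() call is omitted in this sketch).
 */
static struct iova_mapping *cache_invalidate(struct iova_mapping *head,
					     uint64_t start, uint64_t last)
{
	struct iova_mapping **pp = &head;

	while (*pp) {
		struct iova_mapping *m = *pp;

		if (m->start <= last && m->last >= start) {
			*pp = m->next;
			free(m);
		} else {
			pp = &m->next;
		}
	}
	return head;
}
```

A datapath would call cache_lookup() first, and only fall back to the VDUSE_IOTLB_GET_FD ioctl plus cache_insert() on a miss.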
+2 -2
drivers/block/virtio_blk.c
···
 		goto out_free_vblk;

 	/* Default queue sizing is to fill the ring. */
-	if (likely(!virtblk_queue_depth)) {
+	if (!virtblk_queue_depth) {
 		queue_depth = vblk->vqs[0].vq->num_free;
 		/* ... but without indirect descs, we use 2 descs per req */
 		if (!virtio_has_feature(vdev, VIRTIO_RING_F_INDIRECT_DESC))
···
 	else
 		blk_size = queue_logical_block_size(q);

-	if (unlikely(blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE)) {
+	if (blk_size < SECTOR_SIZE || blk_size > PAGE_SIZE) {
 		dev_err(&vdev->dev,
 			"block size is changed unexpectedly, now is %u\n",
 			blk_size);
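For context, the code around the first hunk sizes the default queue depth from the ring's free entries and halves it when VIRTIO_RING_F_INDIRECT_DESC is absent, since each request then consumes two descriptors. A standalone restatement of that arithmetic (a hypothetical helper, not the driver function itself):

```c
#include <stdbool.h>

/*
 * Default queue sizing is to fill the ring, but without indirect
 * descriptors each request uses two descriptors, so only half the ring's
 * free entries can back in-flight requests. Illustrative helper only.
 */
static unsigned int default_queue_depth(unsigned int num_free,
					bool has_indirect_desc)
{
	unsigned int depth = num_free;

	if (!has_indirect_desc)
		depth /= 2;
	return depth;
}
```

So a ring with 128 free entries yields a depth of 128 with indirect descriptors and 64 without.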
+2
drivers/iommu/iova.c
···
 	return new_iova->pfn_lo;
 }
+EXPORT_SYMBOL_GPL(alloc_iova_fast);

 /**
  * free_iova_fast - free iova pfn range into rcache
···
 	free_iova(iovad, pfn);
 }
+EXPORT_SYMBOL_GPL(free_iova_fast);

 #define fq_ring_for_each(i, fq) \
 	for ((i) = (fq)->head; (i) != (fq)->tail; (i) = ((i) + 1) % IOVA_FQ_SIZE)
+11
drivers/vdpa/Kconfig
···
 	  vDPA block device simulator which terminates IO request in a
 	  memory buffer.

+config VDPA_USER
+	tristate "VDUSE (vDPA Device in Userspace) support"
+	depends on EVENTFD && MMU && HAS_DMA
+	select DMA_OPS
+	select VHOST_IOTLB
+	select IOMMU_IOVA
+	help
+	  With VDUSE it is possible to emulate a vDPA Device
+	  in a userspace program.
+
 config IFCVF
 	tristate "Intel IFC VF vDPA driver"
 	depends on PCI_MSI
···
 config MLX5_VDPA_NET
 	tristate "vDPA driver for ConnectX devices"
 	select MLX5_VDPA
+	select VHOST_RING
 	depends on MLX5_CORE
 	help
 	  VDPA network driver for ConnectX6 and newer. Provides offloading
+1
drivers/vdpa/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_VDPA) += vdpa.o
 obj-$(CONFIG_VDPA_SIM) += vdpa_sim/
+obj-$(CONFIG_VDPA_USER) += vdpa_user/
 obj-$(CONFIG_IFCVF) += ifcvf/
 obj-$(CONFIG_MLX5_VDPA) += mlx5/
 obj-$(CONFIG_VP_VDPA) += virtio_pci/
+5 -3
drivers/vdpa/ifcvf/ifcvf_base.c
···
 		return -EIO;
 	}

-	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
+	hw->nr_vring = ifc_ioread16(&hw->common_cfg->num_queues);
+
+	for (i = 0; i < hw->nr_vring; i++) {
 		ifc_iowrite16(i, &hw->common_cfg->queue_select);
 		notify_off = ifc_ioread16(&hw->common_cfg->queue_notify_off);
 		hw->vring[i].notify_addr = hw->notify_base +
···
 	u32 q_pair_id;

 	ifcvf_lm = (struct ifcvf_lm_cfg __iomem *)hw->lm_cfg;
-	q_pair_id = qid / (IFCVF_MAX_QUEUE_PAIRS * 2);
+	q_pair_id = qid / hw->nr_vring;
 	avail_idx_addr = &ifcvf_lm->vring_lm_cfg[q_pair_id].idx_addr[qid % 2];
 	last_avail_idx = ifc_ioread16(avail_idx_addr);

···
 	u32 q_pair_id;

 	ifcvf_lm = (struct ifcvf_lm_cfg __iomem *)hw->lm_cfg;
-	q_pair_id = qid / (IFCVF_MAX_QUEUE_PAIRS * 2);
+	q_pair_id = qid / hw->nr_vring;
 	avail_idx_addr = &ifcvf_lm->vring_lm_cfg[q_pair_id].idx_addr[qid % 2];
 	hw->vring[qid].last_avail_idx = num;
 	ifc_iowrite16(num, avail_idx_addr);
+10 -15
drivers/vdpa/ifcvf/ifcvf_base.h
···
 #define N3000_DEVICE_ID		0x1041
 #define N3000_SUBSYS_DEVICE_ID	0x001A

-#define IFCVF_NET_SUPPORTED_FEATURES \
-		((1ULL << VIRTIO_NET_F_MAC)			| \
-		 (1ULL << VIRTIO_F_ANY_LAYOUT)			| \
-		 (1ULL << VIRTIO_F_VERSION_1)			| \
-		 (1ULL << VIRTIO_NET_F_STATUS)			| \
-		 (1ULL << VIRTIO_F_ORDER_PLATFORM)		| \
-		 (1ULL << VIRTIO_F_ACCESS_PLATFORM)		| \
-		 (1ULL << VIRTIO_NET_F_MRG_RXBUF))
-
-/* Only one queue pair for now. */
-#define IFCVF_MAX_QUEUE_PAIRS	1
+/* Max 8 data queue pairs(16 queues) and one control vq for now. */
+#define IFCVF_MAX_QUEUES	17

 #define IFCVF_QUEUE_ALIGNMENT	PAGE_SIZE
 #define IFCVF_QUEUE_MAX		32768
···
 #define ifcvf_private_to_vf(adapter) \
 	(&((struct ifcvf_adapter *)adapter)->vf)
-
-#define IFCVF_MAX_INTR (IFCVF_MAX_QUEUE_PAIRS * 2 + 1)

 struct vring_info {
 	u64 desc;
···
 	u32 dev_type;
 	struct virtio_pci_common_cfg __iomem *common_cfg;
 	void __iomem *net_cfg;
-	struct vring_info vring[IFCVF_MAX_QUEUE_PAIRS * 2];
+	struct vring_info vring[IFCVF_MAX_QUEUES];
 	void __iomem * const *base;
 	char config_msix_name[256];
 	struct vdpa_callback config_cb;
···
 struct ifcvf_lm_cfg {
 	u8 reserved[IFCVF_LM_RING_STATE_OFFSET];
-	struct ifcvf_vring_lm_cfg vring_lm_cfg[IFCVF_MAX_QUEUE_PAIRS];
+	struct ifcvf_vring_lm_cfg vring_lm_cfg[IFCVF_MAX_QUEUES];
+};
+
+struct ifcvf_vdpa_mgmt_dev {
+	struct vdpa_mgmt_dev mdev;
+	struct ifcvf_adapter *adapter;
+	struct pci_dev *pdev;
 };

 int ifcvf_init_hw(struct ifcvf_hw *hw, struct pci_dev *dev);
+179 -78
drivers/vdpa/ifcvf/ifcvf_main.c
···
 	struct pci_dev *pdev = adapter->pdev;
 	struct ifcvf_hw *vf = &adapter->vf;
 	int vector, i, ret, irq;
+	u16 max_intr;

-	ret = pci_alloc_irq_vectors(pdev, IFCVF_MAX_INTR,
-				    IFCVF_MAX_INTR, PCI_IRQ_MSIX);
+	/* all queues and config interrupt */
+	max_intr = vf->nr_vring + 1;
+
+	ret = pci_alloc_irq_vectors(pdev, max_intr,
+				    max_intr, PCI_IRQ_MSIX);
 	if (ret < 0) {
 		IFCVF_ERR(pdev, "Failed to alloc IRQ vectors\n");
 		return ret;
···
 		return ret;
 	}

-	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
+	for (i = 0; i < vf->nr_vring; i++) {
 		snprintf(vf->vring[i].msix_name, 256, "ifcvf[%s]-%d\n",
 			 pci_name(pdev), i);
 		vector = i + IFCVF_MSI_QUEUE_OFF;
···
 	u8 status;
 	int ret;

-	vf->nr_vring = IFCVF_MAX_QUEUE_PAIRS * 2;
 	ret = ifcvf_start_hw(vf);
 	if (ret < 0) {
 		status = ifcvf_get_status(vf);
···
 	struct ifcvf_hw *vf = ifcvf_private_to_vf(private);
 	int i;

-	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++)
+	for (i = 0; i < vf->nr_vring; i++)
 		vf->vring[i].cb.callback = NULL;

 	ifcvf_stop_hw(vf);
···
 	struct ifcvf_hw *vf = ifcvf_private_to_vf(adapter);
 	int i;

-	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++) {
+	for (i = 0; i < vf->nr_vring; i++) {
 		vf->vring[i].last_avail_idx = 0;
 		vf->vring[i].desc = 0;
 		vf->vring[i].avail = 0;
···
 	struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
 	struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
 	struct pci_dev *pdev = adapter->pdev;
-
+	u32 type = vf->dev_type;
 	u64 features;

-	switch (vf->dev_type) {
-	case VIRTIO_ID_NET:
-		features = ifcvf_get_features(vf) & IFCVF_NET_SUPPORTED_FEATURES;
-		break;
-	case VIRTIO_ID_BLOCK:
+	if (type == VIRTIO_ID_NET || type == VIRTIO_ID_BLOCK)
 		features = ifcvf_get_features(vf);
-		break;
-	default:
+	else {
 		features = 0;
 		IFCVF_ERR(pdev, "VIRTIO ID %u not supported\n", vf->dev_type);
 	}
···
 	int ret;

 	vf = vdpa_to_vf(vdpa_dev);
-	adapter = dev_get_drvdata(vdpa_dev->dev.parent);
+	adapter = vdpa_to_adapter(vdpa_dev);
 	status_old = ifcvf_get_status(vf);

 	if (status_old == status)
 		return;
-
-	if ((status_old & VIRTIO_CONFIG_S_DRIVER_OK) &&
-	    !(status & VIRTIO_CONFIG_S_DRIVER_OK)) {
-		ifcvf_stop_datapath(adapter);
-		ifcvf_free_irq(adapter, IFCVF_MAX_QUEUE_PAIRS * 2);
-	}
-
-	if (status == 0) {
-		ifcvf_reset_vring(adapter);
-		return;
-	}

 	if ((status & VIRTIO_CONFIG_S_DRIVER_OK) &&
 	    !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) {
···
 	}

 	ifcvf_set_status(vf, status);
+}
+
+static int ifcvf_vdpa_reset(struct vdpa_device *vdpa_dev)
+{
+	struct ifcvf_adapter *adapter;
+	struct ifcvf_hw *vf;
+	u8 status_old;
+
+	vf = vdpa_to_vf(vdpa_dev);
+	adapter = vdpa_to_adapter(vdpa_dev);
+	status_old = ifcvf_get_status(vf);
+
+	if (status_old == 0)
+		return 0;
+
+	if (status_old & VIRTIO_CONFIG_S_DRIVER_OK) {
+		ifcvf_stop_datapath(adapter);
+		ifcvf_free_irq(adapter, vf->nr_vring);
+	}
+
+	ifcvf_reset_vring(adapter);
+
+	return 0;
 }

 static u16 ifcvf_vdpa_get_vq_num_max(struct vdpa_device *vdpa_dev)
···
 	.set_features	= ifcvf_vdpa_set_features,
 	.get_status	= ifcvf_vdpa_get_status,
 	.set_status	= ifcvf_vdpa_set_status,
+	.reset		= ifcvf_vdpa_reset,
 	.get_vq_num_max	= ifcvf_vdpa_get_vq_num_max,
 	.get_vq_state	= ifcvf_vdpa_get_vq_state,
 	.set_vq_state	= ifcvf_vdpa_set_vq_state,
···
 	.get_vq_notification = ifcvf_get_vq_notification,
 };

-static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+static struct virtio_device_id id_table_net[] = {
+	{VIRTIO_ID_NET, VIRTIO_DEV_ANY_ID},
+	{0},
+};
+
+static struct virtio_device_id id_table_blk[] = {
+	{VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID},
+	{0},
+};
+
+static u32 get_dev_type(struct pci_dev *pdev)
 {
-	struct device *dev = &pdev->dev;
-	struct ifcvf_adapter *adapter;
-	struct ifcvf_hw *vf;
-	int ret, i;
-
-	ret = pcim_enable_device(pdev);
-	if (ret) {
-		IFCVF_ERR(pdev, "Failed to enable device\n");
-		return ret;
-	}
-
-	ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4),
-				 IFCVF_DRIVER_NAME);
-	if (ret) {
-		IFCVF_ERR(pdev, "Failed to request MMIO region\n");
-		return ret;
-	}
-
-	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
-	if (ret) {
-		IFCVF_ERR(pdev, "No usable DMA configuration\n");
-		return ret;
-	}
-
-	ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev);
-	if (ret) {
-		IFCVF_ERR(pdev,
-			  "Failed for adding devres for freeing irq vectors\n");
-		return ret;
-	}
-
-	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
-				    dev, &ifc_vdpa_ops, NULL);
-	if (IS_ERR(adapter)) {
-		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
-		return PTR_ERR(adapter);
-	}
-
-	pci_set_master(pdev);
-	pci_set_drvdata(pdev, adapter);
-
-	vf = &adapter->vf;
+	u32 dev_type;

 	/* This driver drives both modern virtio devices and transitional
 	 * devices in modern mode.
···
 	 * mode will not work for vDPA, this driver will not
 	 * drive devices with legacy interface.
 	 */
-	if (pdev->device < 0x1040)
-		vf->dev_type = pdev->subsystem_device;
-	else
-		vf->dev_type = pdev->device - 0x1040;

+	if (pdev->device < 0x1040)
+		dev_type = pdev->subsystem_device;
+	else
+		dev_type = pdev->device - 0x1040;
+
+	return dev_type;
+}
+
+static int ifcvf_vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name)
+{
+	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
+	struct ifcvf_adapter *adapter;
+	struct pci_dev *pdev;
+	struct ifcvf_hw *vf;
+	struct device *dev;
+	int ret, i;
+
+	ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev);
+	if (ifcvf_mgmt_dev->adapter)
+		return -EOPNOTSUPP;
+
+	pdev = ifcvf_mgmt_dev->pdev;
+	dev = &pdev->dev;
+	adapter = vdpa_alloc_device(struct ifcvf_adapter, vdpa,
+				    dev, &ifc_vdpa_ops, name, false);
+	if (IS_ERR(adapter)) {
+		IFCVF_ERR(pdev, "Failed to allocate vDPA structure");
+		return PTR_ERR(adapter);
+	}
+
+	ifcvf_mgmt_dev->adapter = adapter;
+	pci_set_drvdata(pdev, ifcvf_mgmt_dev);
+
+	vf = &adapter->vf;
+	vf->dev_type = get_dev_type(pdev);
 	vf->base = pcim_iomap_table(pdev);

 	adapter->pdev = pdev;
···
 		goto err;
 	}

-	for (i = 0; i < IFCVF_MAX_QUEUE_PAIRS * 2; i++)
+	for (i = 0; i < vf->nr_vring; i++)
 		vf->vring[i].irq = -EINVAL;

 	vf->hw_features = ifcvf_get_hw_features(vf);

-	ret = vdpa_register_device(&adapter->vdpa, IFCVF_MAX_QUEUE_PAIRS * 2);
+	adapter->vdpa.mdev = &ifcvf_mgmt_dev->mdev;
+	ret = _vdpa_register_device(&adapter->vdpa, vf->nr_vring);
 	if (ret) {
-		IFCVF_ERR(pdev, "Failed to register ifcvf to vdpa bus");
+		IFCVF_ERR(pdev, "Failed to register to vDPA bus");
 		goto err;
 	}
···
 	return ret;
 }

+static void ifcvf_vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev)
+{
+	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
+
+	ifcvf_mgmt_dev = container_of(mdev, struct ifcvf_vdpa_mgmt_dev, mdev);
+	_vdpa_unregister_device(dev);
+	ifcvf_mgmt_dev->adapter = NULL;
+}
+
+static const struct vdpa_mgmtdev_ops ifcvf_vdpa_mgmt_dev_ops = {
+	.dev_add = ifcvf_vdpa_dev_add,
+	.dev_del = ifcvf_vdpa_dev_del
+};
+
+static int ifcvf_probe(struct pci_dev *pdev, const struct pci_device_id *id)
+{
+	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;
+	struct device *dev = &pdev->dev;
+	u32 dev_type;
+	int ret;
+
+	ifcvf_mgmt_dev = kzalloc(sizeof(struct ifcvf_vdpa_mgmt_dev), GFP_KERNEL);
+	if (!ifcvf_mgmt_dev) {
+		IFCVF_ERR(pdev, "Failed to alloc memory for the vDPA management device\n");
+		return -ENOMEM;
+	}
+
+	dev_type = get_dev_type(pdev);
+	switch (dev_type) {
+	case VIRTIO_ID_NET:
+		ifcvf_mgmt_dev->mdev.id_table = id_table_net;
+		break;
+	case VIRTIO_ID_BLOCK:
+		ifcvf_mgmt_dev->mdev.id_table = id_table_blk;
+		break;
+	default:
+		IFCVF_ERR(pdev, "VIRTIO ID %u not supported\n", dev_type);
+		ret = -EOPNOTSUPP;
+		goto err;
+	}
+
+	ifcvf_mgmt_dev->mdev.ops = &ifcvf_vdpa_mgmt_dev_ops;
+	ifcvf_mgmt_dev->mdev.device = dev;
+	ifcvf_mgmt_dev->pdev = pdev;
+
+	ret = pcim_enable_device(pdev);
+	if (ret) {
+		IFCVF_ERR(pdev, "Failed to enable device\n");
+		goto err;
+	}
+
+	ret = pcim_iomap_regions(pdev, BIT(0) | BIT(2) | BIT(4),
+				 IFCVF_DRIVER_NAME);
+	if (ret) {
+		IFCVF_ERR(pdev, "Failed to request MMIO region\n");
+		goto err;
+	}
+
+	ret = dma_set_mask_and_coherent(dev, DMA_BIT_MASK(64));
+	if (ret) {
+		IFCVF_ERR(pdev, "No usable DMA configuration\n");
+		goto err;
+	}
+
+	ret = devm_add_action_or_reset(dev, ifcvf_free_irq_vectors, pdev);
+	if (ret) {
+		IFCVF_ERR(pdev,
+			  "Failed for adding devres for freeing irq vectors\n");
+		goto err;
+	}
+
+	pci_set_master(pdev);
+
+	ret = vdpa_mgmtdev_register(&ifcvf_mgmt_dev->mdev);
+	if (ret) {
+		IFCVF_ERR(pdev,
+			  "Failed to initialize the management interfaces\n");
+		goto err;
+	}
+
+	return 0;
+
+err:
+	kfree(ifcvf_mgmt_dev);
+	return ret;
+}
+
 static void ifcvf_remove(struct pci_dev *pdev)
 {
-	struct ifcvf_adapter *adapter = pci_get_drvdata(pdev);
+	struct ifcvf_vdpa_mgmt_dev *ifcvf_mgmt_dev;

-	vdpa_unregister_device(&adapter->vdpa);
+	ifcvf_mgmt_dev = pci_get_drvdata(pdev);
+	vdpa_mgmtdev_unregister(&ifcvf_mgmt_dev->mdev);
+	kfree(ifcvf_mgmt_dev);
 }

 static struct pci_device_id ifcvf_pci_ids[] = {
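The get_dev_type() helper factored out above follows the virtio PCI ID scheme: modern devices use PCI device ID 0x1040 plus the virtio device ID, while transitional devices (PCI device IDs below 0x1040) carry the virtio device ID in the PCI subsystem device ID. A userspace restatement of that mapping, for illustration only:

```c
#include <stdint.h>

/*
 * Userspace restatement of the driver's get_dev_type() logic, operating on
 * raw PCI IDs instead of a struct pci_dev. Modern virtio PCI devices use
 * device ID 0x1040 + virtio device ID; transitional devices report the
 * virtio device ID in the subsystem device ID.
 */
static uint32_t virtio_dev_type(uint16_t pci_device, uint16_t subsystem_device)
{
	if (pci_device < 0x1040)
		return subsystem_device;
	return pci_device - 0x1040;
}
```

For example, the N3000 VF (PCI device ID 0x1041) maps to virtio device type 1 (net), a modern 0x1042 device maps to type 2 (block), and a transitional 0x1000 device with subsystem device ID 1 also maps to type 1.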
+25 -1
drivers/vdpa/mlx5/core/mlx5_vdpa.h
···
 #define __MLX5_VDPA_H__

 #include <linux/etherdevice.h>
-#include <linux/if_vlan.h>
+#include <linux/vringh.h>
 #include <linux/vdpa.h>
 #include <linux/mlx5/driver.h>
···
 	bool valid;
 };

+struct mlx5_control_vq {
+	struct vhost_iotlb *iotlb;
+	/* spinlock to synchronize iommu table */
+	spinlock_t iommu_lock;
+	struct vringh vring;
+	bool ready;
+	u64 desc_addr;
+	u64 device_addr;
+	u64 driver_addr;
+	struct vdpa_callback event_cb;
+	struct vringh_kiov riov;
+	struct vringh_kiov wiov;
+	unsigned short head;
+};
+
+struct mlx5_ctrl_wq_ent {
+	struct work_struct work;
+	struct mlx5_vdpa_dev *mvdev;
+};
+
 struct mlx5_vdpa_dev {
 	struct vdpa_device vdev;
 	struct mlx5_core_dev *mdev;
···
 	u64 actual_features;
 	u8 status;
 	u32 max_vqs;
+	u16 max_idx;
 	u32 generation;

 	struct mlx5_vdpa_mr mr;
+	struct mlx5_control_vq cvq;
+	struct workqueue_struct *wq;
 };

 int mlx5_vdpa_alloc_pd(struct mlx5_vdpa_dev *dev, u32 *pdn, u16 uid);
···
 int mlx5_vdpa_create_tis(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tisn);
 void mlx5_vdpa_destroy_tis(struct mlx5_vdpa_dev *mvdev, u32 tisn);
 int mlx5_vdpa_create_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 *rqtn);
+int mlx5_vdpa_modify_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 rqtn);
 void mlx5_vdpa_destroy_rqt(struct mlx5_vdpa_dev *mvdev, u32 rqtn);
 int mlx5_vdpa_create_tir(struct mlx5_vdpa_dev *mvdev, void *in, u32 *tirn);
 void mlx5_vdpa_destroy_tir(struct mlx5_vdpa_dev *mvdev, u32 tirn);
+61 -20
drivers/vdpa/mlx5/core/mr.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 2 /* Copyright (c) 2020 Mellanox Technologies Ltd. */ 3 3 4 + #include <linux/vhost_types.h> 4 5 #include <linux/vdpa.h> 5 6 #include <linux/gcd.h> 6 7 #include <linux/string.h> ··· 452 451 mlx5_vdpa_destroy_mkey(mvdev, &mr->mkey); 453 452 } 454 453 455 - static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb) 454 + static int dup_iotlb(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *src) 456 455 { 457 - struct mlx5_vdpa_mr *mr = &mvdev->mr; 456 + struct vhost_iotlb_map *map; 457 + u64 start = 0, last = ULLONG_MAX; 458 458 int err; 459 459 460 - if (mr->initialized) 461 - return 0; 460 + if (!src) { 461 + err = vhost_iotlb_add_range(mvdev->cvq.iotlb, start, last, start, VHOST_ACCESS_RW); 462 + return err; 463 + } 462 464 463 - if (iotlb) 464 - err = create_user_mr(mvdev, iotlb); 465 - else 466 - err = create_dma_mr(mvdev, mr); 467 - 468 - if (!err) 469 - mr->initialized = true; 470 - 471 - return err; 465 + for (map = vhost_iotlb_itree_first(src, start, last); map; 466 + map = vhost_iotlb_itree_next(map, start, last)) { 467 + err = vhost_iotlb_add_range(mvdev->cvq.iotlb, map->start, map->last, 468 + map->addr, map->perm); 469 + if (err) 470 + return err; 471 + } 472 + return 0; 472 473 } 473 474 474 - int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb) 475 + static void prune_iotlb(struct mlx5_vdpa_dev *mvdev) 475 476 { 476 - int err; 477 - 478 - mutex_lock(&mvdev->mr.mkey_mtx); 479 - err = _mlx5_vdpa_create_mr(mvdev, iotlb); 480 - mutex_unlock(&mvdev->mr.mkey_mtx); 481 - return err; 477 + vhost_iotlb_del_range(mvdev->cvq.iotlb, 0, ULLONG_MAX); 482 478 } 483 479 484 480 static void destroy_user_mr(struct mlx5_vdpa_dev *mvdev, struct mlx5_vdpa_mr *mr) ··· 499 501 if (!mr->initialized) 500 502 goto out; 501 503 504 + prune_iotlb(mvdev); 502 505 if (mr->user_mr) 503 506 destroy_user_mr(mvdev, mr); 504 507 else ··· 509 510 mr->initialized = false; 
510 511 out: 511 512 mutex_unlock(&mr->mkey_mtx); 513 + } 514 + 515 + static int _mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb) 516 + { 517 + struct mlx5_vdpa_mr *mr = &mvdev->mr; 518 + int err; 519 + 520 + if (mr->initialized) 521 + return 0; 522 + 523 + if (iotlb) 524 + err = create_user_mr(mvdev, iotlb); 525 + else 526 + err = create_dma_mr(mvdev, mr); 527 + 528 + if (err) 529 + return err; 530 + 531 + err = dup_iotlb(mvdev, iotlb); 532 + if (err) 533 + goto out_err; 534 + 535 + mr->initialized = true; 536 + return 0; 537 + 538 + out_err: 539 + if (iotlb) 540 + destroy_user_mr(mvdev, mr); 541 + else 542 + destroy_dma_mr(mvdev, mr); 543 + 544 + return err; 545 + } 546 + 547 + int mlx5_vdpa_create_mr(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb) 548 + { 549 + int err; 550 + 551 + mutex_lock(&mvdev->mr.mkey_mtx); 552 + err = _mlx5_vdpa_create_mr(mvdev, iotlb); 553 + mutex_unlock(&mvdev->mr.mkey_mtx); 554 + return err; 512 555 } 513 556 514 557 int mlx5_vdpa_handle_set_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb,
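The `dup_iotlb()`/`prune_iotlb()` pair added above mirrors the guest's IOTLB into a private copy that the control virtqueue's vringh will translate through, falling back to a single identity mapping when no guest IOTLB exists. A minimal standalone model of that copy-or-identity logic (an array-backed range list standing in for the kernel's interval tree; all names here are hypothetical, not the driver's):

```c
#include <assert.h>

/* Hypothetical flat model of an IOTLB: a bounded list of [start,last] -> addr ranges. */
struct range { unsigned long long start, last, addr; };
struct tlb { struct range r[16]; int n; };

static int tlb_add(struct tlb *t, unsigned long long start,
		   unsigned long long last, unsigned long long addr)
{
	if (t->n == 16)
		return -1;	/* table full, stands in for an allocation failure */
	t->r[t->n++] = (struct range){ start, last, addr };
	return 0;
}

/* Like dup_iotlb(): with no source, install one identity mapping covering
 * the whole address space; otherwise copy every source range verbatim. */
static int dup_tlb(struct tlb *dst, const struct tlb *src)
{
	if (!src)
		return tlb_add(dst, 0, ~0ULL, 0);
	for (int i = 0; i < src->n; i++)
		if (tlb_add(dst, src->r[i].start, src->r[i].last, src->r[i].addr))
			return -1;
	return 0;
}

/* Like prune_iotlb(): drop every cached range. */
static void prune_tlb(struct tlb *t)
{
	t->n = 0;
}
```

The ordering in `_mlx5_vdpa_create_mr()` matters: the MR is created first, then the CVQ copy is populated, and the error path tears the MR back down so the two views never diverge.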
+35
drivers/vdpa/mlx5/core/resources.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB 2 2 /* Copyright (c) 2020 Mellanox Technologies Ltd. */ 3 3 4 + #include <linux/iova.h> 4 5 #include <linux/mlx5/driver.h> 5 6 #include "mlx5_vdpa.h" 6 7 ··· 129 128 return err; 130 129 } 131 130 131 + int mlx5_vdpa_modify_rqt(struct mlx5_vdpa_dev *mvdev, void *in, int inlen, u32 rqtn) 132 + { 133 + u32 out[MLX5_ST_SZ_DW(create_rqt_out)] = {}; 134 + 135 + MLX5_SET(modify_rqt_in, in, uid, mvdev->res.uid); 136 + MLX5_SET(modify_rqt_in, in, rqtn, rqtn); 137 + MLX5_SET(modify_rqt_in, in, opcode, MLX5_CMD_OP_MODIFY_RQT); 138 + return mlx5_cmd_exec(mvdev->mdev, in, inlen, out, sizeof(out)); 139 + } 140 + 132 141 void mlx5_vdpa_destroy_rqt(struct mlx5_vdpa_dev *mvdev, u32 rqtn) 133 142 { 134 143 u32 in[MLX5_ST_SZ_DW(destroy_rqt_in)] = {}; ··· 232 221 return mlx5_cmd_exec_in(mvdev->mdev, destroy_mkey, in); 233 222 } 234 223 224 + static int init_ctrl_vq(struct mlx5_vdpa_dev *mvdev) 225 + { 226 + mvdev->cvq.iotlb = vhost_iotlb_alloc(0, 0); 227 + if (!mvdev->cvq.iotlb) 228 + return -ENOMEM; 229 + 230 + vringh_set_iotlb(&mvdev->cvq.vring, mvdev->cvq.iotlb, &mvdev->cvq.iommu_lock); 231 + 232 + return 0; 233 + } 234 + 235 + static void cleanup_ctrl_vq(struct mlx5_vdpa_dev *mvdev) 236 + { 237 + vhost_iotlb_free(mvdev->cvq.iotlb); 238 + } 239 + 235 240 int mlx5_vdpa_alloc_resources(struct mlx5_vdpa_dev *mvdev) 236 241 { 237 242 u64 offset = MLX5_CAP64_DEV_VDPA_EMULATION(mvdev->mdev, doorbell_bar_offset); ··· 287 260 err = -ENOMEM; 288 261 goto err_key; 289 262 } 263 + 264 + err = init_ctrl_vq(mvdev); 265 + if (err) 266 + goto err_ctrl; 267 + 290 268 res->valid = true; 291 269 292 270 return 0; 293 271 272 + err_ctrl: 273 + iounmap(res->kick_addr); 294 274 err_key: 295 275 dealloc_pd(mvdev, res->pdn, res->uid); 296 276 err_pd: ··· 316 282 if (!res->valid) 317 283 return; 318 284 285 + cleanup_ctrl_vq(mvdev); 319 286 iounmap(res->kick_addr); 320 287 res->kick_addr = NULL; 321 288 dealloc_pd(mvdev, res->pdn, res->uid);
+477 -78
drivers/vdpa/mlx5/net/mlx5_vnet.c
··· 45 45 (VIRTIO_CONFIG_S_ACKNOWLEDGE | VIRTIO_CONFIG_S_DRIVER | VIRTIO_CONFIG_S_DRIVER_OK | \ 46 46 VIRTIO_CONFIG_S_FEATURES_OK | VIRTIO_CONFIG_S_NEEDS_RESET | VIRTIO_CONFIG_S_FAILED) 47 47 48 + #define MLX5_FEATURE(_mvdev, _feature) (!!((_mvdev)->actual_features & BIT_ULL(_feature))) 49 + 48 50 struct mlx5_vdpa_net_resources { 49 51 u32 tisn; 50 52 u32 tdn; ··· 92 90 u16 avail_index; 93 91 u16 used_index; 94 92 bool ready; 95 - struct vdpa_callback cb; 96 93 bool restore; 97 94 }; 98 95 ··· 101 100 u64 device_addr; 102 101 u64 driver_addr; 103 102 u32 num_ent; 104 - struct vdpa_callback event_cb; 105 103 106 104 /* Resources for implementing the notification channel from the device 107 105 * to the driver. fwqp is the firmware end of an RC connection; the ··· 135 135 */ 136 136 #define MLX5_MAX_SUPPORTED_VQS 16 137 137 138 + static bool is_index_valid(struct mlx5_vdpa_dev *mvdev, u16 idx) 139 + { 140 + if (unlikely(idx > mvdev->max_idx)) 141 + return false; 142 + 143 + return true; 144 + } 145 + 138 146 struct mlx5_vdpa_net { 139 147 struct mlx5_vdpa_dev mvdev; 140 148 struct mlx5_vdpa_net_resources res; 141 149 struct virtio_net_config config; 142 150 struct mlx5_vdpa_virtqueue vqs[MLX5_MAX_SUPPORTED_VQS]; 151 + struct vdpa_callback event_cbs[MLX5_MAX_SUPPORTED_VQS + 1]; 143 152 144 153 /* Serialize vq resources creation and destruction. 
This is required 145 154 * since memory map might change and we need to destroy and create ··· 160 151 struct mlx5_flow_handle *rx_rule; 161 152 bool setup; 162 153 u16 mtu; 154 + u32 cur_num_vqs; 163 155 }; 164 156 165 157 static void free_resources(struct mlx5_vdpa_net *ndev); 166 158 static void init_mvqs(struct mlx5_vdpa_net *ndev); 167 - static int setup_driver(struct mlx5_vdpa_net *ndev); 159 + static int setup_driver(struct mlx5_vdpa_dev *mvdev); 168 160 static void teardown_driver(struct mlx5_vdpa_net *ndev); 169 161 170 162 static bool mlx5_vdpa_debug; 163 + 164 + #define MLX5_CVQ_MAX_ENT 16 171 165 172 166 #define MLX5_LOG_VIO_FLAG(_feature) \ 173 167 do { \ ··· 184 172 mlx5_vdpa_info(mvdev, "%s\n", #_status); \ 185 173 } while (0) 186 174 175 + /* TODO: cross-endian support */ 176 + static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) 177 + { 178 + return virtio_legacy_is_little_endian() || 179 + (mvdev->actual_features & BIT_ULL(VIRTIO_F_VERSION_1)); 180 + } 181 + 182 + static u16 mlx5vdpa16_to_cpu(struct mlx5_vdpa_dev *mvdev, __virtio16 val) 183 + { 184 + return __virtio16_to_cpu(mlx5_vdpa_is_little_endian(mvdev), val); 185 + } 186 + 187 + static __virtio16 cpu_to_mlx5vdpa16(struct mlx5_vdpa_dev *mvdev, u16 val) 188 + { 189 + return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val); 190 + } 191 + 187 192 static inline u32 mlx5_vdpa_max_qps(int max_vqs) 188 193 { 189 194 return max_vqs / 2; 195 + } 196 + 197 + static u16 ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev) 198 + { 199 + if (!(mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_MQ))) 200 + return 2; 201 + 202 + return 2 * mlx5_vdpa_max_qps(mvdev->max_vqs); 203 + } 204 + 205 + static bool is_ctrl_vq_idx(struct mlx5_vdpa_dev *mvdev, u16 idx) 206 + { 207 + return idx == ctrl_vq_idx(mvdev); 190 208 } 191 209 192 210 static void print_status(struct mlx5_vdpa_dev *mvdev, u8 status, bool set) ··· 523 481 524 482 static void mlx5_vdpa_handle_completions(struct mlx5_vdpa_virtqueue 
*mvq, int num) 525 483 { 484 + struct mlx5_vdpa_net *ndev = mvq->ndev; 485 + struct vdpa_callback *event_cb; 486 + 487 + event_cb = &ndev->event_cbs[mvq->index]; 526 488 mlx5_cq_set_ci(&mvq->cq.mcq); 527 489 528 490 /* make sure CQ cosumer update is visible to the hardware before updating ··· 534 488 */ 535 489 dma_wmb(); 536 490 rx_post(&mvq->vqqp, num); 537 - if (mvq->event_cb.callback) 538 - mvq->event_cb.callback(mvq->event_cb.private); 491 + if (event_cb->callback) 492 + event_cb->callback(event_cb->private); 539 493 } 540 494 541 495 static void mlx5_vdpa_cq_comp(struct mlx5_core_cq *mcq, struct mlx5_eqe *eqe) ··· 1146 1100 if (!mvq->num_ent) 1147 1101 return 0; 1148 1102 1149 - if (mvq->initialized) { 1150 - mlx5_vdpa_warn(&ndev->mvdev, "attempt re init\n"); 1151 - return -EINVAL; 1152 - } 1103 + if (mvq->initialized) 1104 + return 0; 1153 1105 1154 1106 err = cq_create(ndev, idx, mvq->num_ent); 1155 1107 if (err) ··· 1234 1190 1235 1191 static int create_rqt(struct mlx5_vdpa_net *ndev) 1236 1192 { 1237 - int log_max_rqt; 1238 1193 __be32 *list; 1194 + int max_rqt; 1239 1195 void *rqtc; 1240 1196 int inlen; 1241 1197 void *in; 1242 1198 int i, j; 1243 1199 int err; 1244 1200 1245 - log_max_rqt = min_t(int, 1, MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size)); 1246 - if (log_max_rqt < 1) 1201 + max_rqt = min_t(int, MLX5_MAX_SUPPORTED_VQS / 2, 1202 + 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size)); 1203 + if (max_rqt < 1) 1247 1204 return -EOPNOTSUPP; 1248 1205 1249 - inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + (1 << log_max_rqt) * MLX5_ST_SZ_BYTES(rq_num); 1206 + inlen = MLX5_ST_SZ_BYTES(create_rqt_in) + max_rqt * MLX5_ST_SZ_BYTES(rq_num); 1250 1207 in = kzalloc(inlen, GFP_KERNEL); 1251 1208 if (!in) 1252 1209 return -ENOMEM; ··· 1256 1211 rqtc = MLX5_ADDR_OF(create_rqt_in, in, rqt_context); 1257 1212 1258 1213 MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q); 1259 - MLX5_SET(rqtc, rqtc, rqt_max_size, 1 << log_max_rqt); 1260 - 
MLX5_SET(rqtc, rqtc, rqt_actual_size, 1); 1214 + MLX5_SET(rqtc, rqtc, rqt_max_size, max_rqt); 1261 1215 list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]); 1262 - for (i = 0, j = 0; j < ndev->mvdev.max_vqs; j++) { 1216 + for (i = 0, j = 0; j < max_rqt; j++) { 1263 1217 if (!ndev->vqs[j].initialized) 1264 1218 continue; 1265 1219 ··· 1267 1223 i++; 1268 1224 } 1269 1225 } 1226 + MLX5_SET(rqtc, rqtc, rqt_actual_size, i); 1270 1227 1271 1228 err = mlx5_vdpa_create_rqt(&ndev->mvdev, in, inlen, &ndev->res.rqtn); 1229 + kfree(in); 1230 + if (err) 1231 + return err; 1232 + 1233 + return 0; 1234 + } 1235 + 1236 + #define MLX5_MODIFY_RQT_NUM_RQS ((u64)1) 1237 + 1238 + static int modify_rqt(struct mlx5_vdpa_net *ndev, int num) 1239 + { 1240 + __be32 *list; 1241 + int max_rqt; 1242 + void *rqtc; 1243 + int inlen; 1244 + void *in; 1245 + int i, j; 1246 + int err; 1247 + 1248 + max_rqt = min_t(int, ndev->cur_num_vqs / 2, 1249 + 1 << MLX5_CAP_GEN(ndev->mvdev.mdev, log_max_rqt_size)); 1250 + if (max_rqt < 1) 1251 + return -EOPNOTSUPP; 1252 + 1253 + inlen = MLX5_ST_SZ_BYTES(modify_rqt_in) + max_rqt * MLX5_ST_SZ_BYTES(rq_num); 1254 + in = kzalloc(inlen, GFP_KERNEL); 1255 + if (!in) 1256 + return -ENOMEM; 1257 + 1258 + MLX5_SET(modify_rqt_in, in, uid, ndev->mvdev.res.uid); 1259 + MLX5_SET64(modify_rqt_in, in, bitmask, MLX5_MODIFY_RQT_NUM_RQS); 1260 + rqtc = MLX5_ADDR_OF(modify_rqt_in, in, ctx); 1261 + MLX5_SET(rqtc, rqtc, list_q_type, MLX5_RQTC_LIST_Q_TYPE_VIRTIO_NET_Q); 1262 + 1263 + list = MLX5_ADDR_OF(rqtc, rqtc, rq_num[0]); 1264 + for (i = 0, j = 0; j < num; j++) { 1265 + if (!ndev->vqs[j].initialized) 1266 + continue; 1267 + 1268 + if (!vq_is_tx(ndev->vqs[j].index)) { 1269 + list[i] = cpu_to_be32(ndev->vqs[j].virtq_id); 1270 + i++; 1271 + } 1272 + } 1273 + MLX5_SET(rqtc, rqtc, rqt_actual_size, i); 1274 + err = mlx5_vdpa_modify_rqt(&ndev->mvdev, in, inlen, ndev->res.rqtn); 1272 1275 kfree(in); 1273 1276 if (err) 1274 1277 return err; ··· 1436 1345 ndev->rx_rule = NULL; 1437 1346 } 1438 
1347 1348 + static virtio_net_ctrl_ack handle_ctrl_mac(struct mlx5_vdpa_dev *mvdev, u8 cmd) 1349 + { 1350 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1351 + struct mlx5_control_vq *cvq = &mvdev->cvq; 1352 + virtio_net_ctrl_ack status = VIRTIO_NET_ERR; 1353 + struct mlx5_core_dev *pfmdev; 1354 + size_t read; 1355 + u8 mac[ETH_ALEN]; 1356 + 1357 + pfmdev = pci_get_drvdata(pci_physfn(mvdev->mdev->pdev)); 1358 + switch (cmd) { 1359 + case VIRTIO_NET_CTRL_MAC_ADDR_SET: 1360 + read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->riov, (void *)mac, ETH_ALEN); 1361 + if (read != ETH_ALEN) 1362 + break; 1363 + 1364 + if (!memcmp(ndev->config.mac, mac, 6)) { 1365 + status = VIRTIO_NET_OK; 1366 + break; 1367 + } 1368 + 1369 + if (!is_zero_ether_addr(ndev->config.mac)) { 1370 + if (mlx5_mpfs_del_mac(pfmdev, ndev->config.mac)) { 1371 + mlx5_vdpa_warn(mvdev, "failed to delete old MAC %pM from MPFS table\n", 1372 + ndev->config.mac); 1373 + break; 1374 + } 1375 + } 1376 + 1377 + if (mlx5_mpfs_add_mac(pfmdev, mac)) { 1378 + mlx5_vdpa_warn(mvdev, "failed to insert new MAC %pM into MPFS table\n", 1379 + mac); 1380 + break; 1381 + } 1382 + 1383 + memcpy(ndev->config.mac, mac, ETH_ALEN); 1384 + status = VIRTIO_NET_OK; 1385 + break; 1386 + 1387 + default: 1388 + break; 1389 + } 1390 + 1391 + return status; 1392 + } 1393 + 1394 + static int change_num_qps(struct mlx5_vdpa_dev *mvdev, int newqps) 1395 + { 1396 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1397 + int cur_qps = ndev->cur_num_vqs / 2; 1398 + int err; 1399 + int i; 1400 + 1401 + if (cur_qps > newqps) { 1402 + err = modify_rqt(ndev, 2 * newqps); 1403 + if (err) 1404 + return err; 1405 + 1406 + for (i = ndev->cur_num_vqs - 1; i >= 2 * newqps; i--) 1407 + teardown_vq(ndev, &ndev->vqs[i]); 1408 + 1409 + ndev->cur_num_vqs = 2 * newqps; 1410 + } else { 1411 + ndev->cur_num_vqs = 2 * newqps; 1412 + for (i = cur_qps * 2; i < 2 * newqps; i++) { 1413 + err = setup_vq(ndev, &ndev->vqs[i]); 1414 + if (err) 1415 + goto 
clean_added; 1416 + } 1417 + err = modify_rqt(ndev, 2 * newqps); 1418 + if (err) 1419 + goto clean_added; 1420 + } 1421 + return 0; 1422 + 1423 + clean_added: 1424 + for (--i; i >= cur_qps; --i) 1425 + teardown_vq(ndev, &ndev->vqs[i]); 1426 + 1427 + return err; 1428 + } 1429 + 1430 + static virtio_net_ctrl_ack handle_ctrl_mq(struct mlx5_vdpa_dev *mvdev, u8 cmd) 1431 + { 1432 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1433 + virtio_net_ctrl_ack status = VIRTIO_NET_ERR; 1434 + struct mlx5_control_vq *cvq = &mvdev->cvq; 1435 + struct virtio_net_ctrl_mq mq; 1436 + size_t read; 1437 + u16 newqps; 1438 + 1439 + switch (cmd) { 1440 + case VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET: 1441 + read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->riov, (void *)&mq, sizeof(mq)); 1442 + if (read != sizeof(mq)) 1443 + break; 1444 + 1445 + newqps = mlx5vdpa16_to_cpu(mvdev, mq.virtqueue_pairs); 1446 + if (ndev->cur_num_vqs == 2 * newqps) { 1447 + status = VIRTIO_NET_OK; 1448 + break; 1449 + } 1450 + 1451 + if (newqps & (newqps - 1)) 1452 + break; 1453 + 1454 + if (!change_num_qps(mvdev, newqps)) 1455 + status = VIRTIO_NET_OK; 1456 + 1457 + break; 1458 + default: 1459 + break; 1460 + } 1461 + 1462 + return status; 1463 + } 1464 + 1465 + static void mlx5_cvq_kick_handler(struct work_struct *work) 1466 + { 1467 + virtio_net_ctrl_ack status = VIRTIO_NET_ERR; 1468 + struct virtio_net_ctrl_hdr ctrl; 1469 + struct mlx5_ctrl_wq_ent *wqent; 1470 + struct mlx5_vdpa_dev *mvdev; 1471 + struct mlx5_control_vq *cvq; 1472 + struct mlx5_vdpa_net *ndev; 1473 + size_t read, write; 1474 + int err; 1475 + 1476 + wqent = container_of(work, struct mlx5_ctrl_wq_ent, work); 1477 + mvdev = wqent->mvdev; 1478 + ndev = to_mlx5_vdpa_ndev(mvdev); 1479 + cvq = &mvdev->cvq; 1480 + if (!(ndev->mvdev.actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ))) 1481 + goto out; 1482 + 1483 + if (!cvq->ready) 1484 + goto out; 1485 + 1486 + while (true) { 1487 + err = vringh_getdesc_iotlb(&cvq->vring, &cvq->riov, &cvq->wiov, 
&cvq->head, 1488 + GFP_ATOMIC); 1489 + if (err <= 0) 1490 + break; 1491 + 1492 + read = vringh_iov_pull_iotlb(&cvq->vring, &cvq->riov, &ctrl, sizeof(ctrl)); 1493 + if (read != sizeof(ctrl)) 1494 + break; 1495 + 1496 + switch (ctrl.class) { 1497 + case VIRTIO_NET_CTRL_MAC: 1498 + status = handle_ctrl_mac(mvdev, ctrl.cmd); 1499 + break; 1500 + case VIRTIO_NET_CTRL_MQ: 1501 + status = handle_ctrl_mq(mvdev, ctrl.cmd); 1502 + break; 1503 + 1504 + default: 1505 + break; 1506 + } 1507 + 1508 + /* Make sure data is written before advancing index */ 1509 + smp_wmb(); 1510 + 1511 + write = vringh_iov_push_iotlb(&cvq->vring, &cvq->wiov, &status, sizeof(status)); 1512 + vringh_complete_iotlb(&cvq->vring, cvq->head, write); 1513 + vringh_kiov_cleanup(&cvq->riov); 1514 + vringh_kiov_cleanup(&cvq->wiov); 1515 + 1516 + if (vringh_need_notify_iotlb(&cvq->vring)) 1517 + vringh_notify(&cvq->vring); 1518 + } 1519 + out: 1520 + kfree(wqent); 1521 + } 1522 + 1439 1523 static void mlx5_vdpa_kick_vq(struct vdpa_device *vdev, u16 idx) 1440 1524 { 1441 1525 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1442 1526 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1443 - struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx]; 1527 + struct mlx5_vdpa_virtqueue *mvq; 1528 + struct mlx5_ctrl_wq_ent *wqent; 1444 1529 1530 + if (!is_index_valid(mvdev, idx)) 1531 + return; 1532 + 1533 + if (unlikely(is_ctrl_vq_idx(mvdev, idx))) { 1534 + if (!mvdev->cvq.ready) 1535 + return; 1536 + 1537 + wqent = kzalloc(sizeof(*wqent), GFP_ATOMIC); 1538 + if (!wqent) 1539 + return; 1540 + 1541 + wqent->mvdev = mvdev; 1542 + INIT_WORK(&wqent->work, mlx5_cvq_kick_handler); 1543 + queue_work(mvdev->wq, &wqent->work); 1544 + return; 1545 + } 1546 + 1547 + mvq = &ndev->vqs[idx]; 1445 1548 if (unlikely(!mvq->ready)) 1446 1549 return; 1447 1550 ··· 1647 1362 { 1648 1363 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1649 1364 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1650 - struct mlx5_vdpa_virtqueue *mvq = 
&ndev->vqs[idx]; 1365 + struct mlx5_vdpa_virtqueue *mvq; 1651 1366 1367 + if (!is_index_valid(mvdev, idx)) 1368 + return -EINVAL; 1369 + 1370 + if (is_ctrl_vq_idx(mvdev, idx)) { 1371 + mvdev->cvq.desc_addr = desc_area; 1372 + mvdev->cvq.device_addr = device_area; 1373 + mvdev->cvq.driver_addr = driver_area; 1374 + return 0; 1375 + } 1376 + 1377 + mvq = &ndev->vqs[idx]; 1652 1378 mvq->desc_addr = desc_area; 1653 1379 mvq->device_addr = device_area; 1654 1380 mvq->driver_addr = driver_area; ··· 1672 1376 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1673 1377 struct mlx5_vdpa_virtqueue *mvq; 1674 1378 1379 + if (!is_index_valid(mvdev, idx) || is_ctrl_vq_idx(mvdev, idx)) 1380 + return; 1381 + 1675 1382 mvq = &ndev->vqs[idx]; 1676 1383 mvq->num_ent = num; 1677 1384 } ··· 1683 1384 { 1684 1385 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1685 1386 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1686 - struct mlx5_vdpa_virtqueue *vq = &ndev->vqs[idx]; 1687 1387 1688 - vq->event_cb = *cb; 1388 + ndev->event_cbs[idx] = *cb; 1389 + } 1390 + 1391 + static void mlx5_cvq_notify(struct vringh *vring) 1392 + { 1393 + struct mlx5_control_vq *cvq = container_of(vring, struct mlx5_control_vq, vring); 1394 + 1395 + if (!cvq->event_cb.callback) 1396 + return; 1397 + 1398 + cvq->event_cb.callback(cvq->event_cb.private); 1399 + } 1400 + 1401 + static void set_cvq_ready(struct mlx5_vdpa_dev *mvdev, bool ready) 1402 + { 1403 + struct mlx5_control_vq *cvq = &mvdev->cvq; 1404 + 1405 + cvq->ready = ready; 1406 + if (!ready) 1407 + return; 1408 + 1409 + cvq->vring.notify = mlx5_cvq_notify; 1689 1410 } 1690 1411 1691 1412 static void mlx5_vdpa_set_vq_ready(struct vdpa_device *vdev, u16 idx, bool ready) 1692 1413 { 1693 1414 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1694 1415 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1695 - struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx]; 1416 + struct mlx5_vdpa_virtqueue *mvq; 1696 1417 1418 + if (!is_index_valid(mvdev, 
idx)) 1419 + return; 1420 + 1421 + if (is_ctrl_vq_idx(mvdev, idx)) { 1422 + set_cvq_ready(mvdev, ready); 1423 + return; 1424 + } 1425 + 1426 + mvq = &ndev->vqs[idx]; 1697 1427 if (!ready) 1698 1428 suspend_vq(ndev, mvq); 1699 1429 ··· 1733 1405 { 1734 1406 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1735 1407 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1736 - struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx]; 1737 1408 1738 - return mvq->ready; 1409 + if (!is_index_valid(mvdev, idx)) 1410 + return false; 1411 + 1412 + if (is_ctrl_vq_idx(mvdev, idx)) 1413 + return mvdev->cvq.ready; 1414 + 1415 + return ndev->vqs[idx].ready; 1739 1416 } 1740 1417 1741 1418 static int mlx5_vdpa_set_vq_state(struct vdpa_device *vdev, u16 idx, ··· 1748 1415 { 1749 1416 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1750 1417 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1751 - struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx]; 1418 + struct mlx5_vdpa_virtqueue *mvq; 1752 1419 1420 + if (!is_index_valid(mvdev, idx)) 1421 + return -EINVAL; 1422 + 1423 + if (is_ctrl_vq_idx(mvdev, idx)) { 1424 + mvdev->cvq.vring.last_avail_idx = state->split.avail_index; 1425 + return 0; 1426 + } 1427 + 1428 + mvq = &ndev->vqs[idx]; 1753 1429 if (mvq->fw_state == MLX5_VIRTIO_NET_Q_OBJECT_STATE_RDY) { 1754 1430 mlx5_vdpa_warn(mvdev, "can't modify available index\n"); 1755 1431 return -EINVAL; ··· 1773 1431 { 1774 1432 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1775 1433 struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1776 - struct mlx5_vdpa_virtqueue *mvq = &ndev->vqs[idx]; 1434 + struct mlx5_vdpa_virtqueue *mvq; 1777 1435 struct mlx5_virtq_attr attr; 1778 1436 int err; 1779 1437 1438 + if (!is_index_valid(mvdev, idx)) 1439 + return -EINVAL; 1440 + 1441 + if (is_ctrl_vq_idx(mvdev, idx)) { 1442 + state->split.avail_index = mvdev->cvq.vring.last_avail_idx; 1443 + return 0; 1444 + } 1445 + 1446 + mvq = &ndev->vqs[idx]; 1780 1447 /* If the virtq object was destroyed, use the value 
saved at 1781 1448 * the last minute of suspend_vq. This caters for userspace 1782 1449 * that cares about emulating the index after vq is stopped. ··· 1842 1491 u16 dev_features; 1843 1492 1844 1493 dev_features = MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, device_features_bits_mask); 1845 - ndev->mvdev.mlx_features = mlx_to_vritio_features(dev_features); 1494 + ndev->mvdev.mlx_features |= mlx_to_vritio_features(dev_features); 1846 1495 if (MLX5_CAP_DEV_VDPA_EMULATION(mvdev->mdev, virtio_version_1_0)) 1847 1496 ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_VERSION_1); 1848 1497 ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_F_ACCESS_PLATFORM); 1498 + ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_CTRL_VQ); 1499 + ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR); 1500 + ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MQ); 1501 + 1849 1502 print_features(mvdev, ndev->mvdev.mlx_features, false); 1850 1503 return ndev->mvdev.mlx_features; 1851 1504 } ··· 1862 1507 return 0; 1863 1508 } 1864 1509 1865 - static int setup_virtqueues(struct mlx5_vdpa_net *ndev) 1510 + static int setup_virtqueues(struct mlx5_vdpa_dev *mvdev) 1866 1511 { 1512 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1513 + struct mlx5_control_vq *cvq = &mvdev->cvq; 1867 1514 int err; 1868 1515 int i; 1869 1516 1870 - for (i = 0; i < 2 * mlx5_vdpa_max_qps(ndev->mvdev.max_vqs); i++) { 1517 + for (i = 0; i < 2 * mlx5_vdpa_max_qps(mvdev->max_vqs); i++) { 1871 1518 err = setup_vq(ndev, &ndev->vqs[i]); 1519 + if (err) 1520 + goto err_vq; 1521 + } 1522 + 1523 + if (mvdev->actual_features & BIT_ULL(VIRTIO_NET_F_CTRL_VQ)) { 1524 + err = vringh_init_iotlb(&cvq->vring, mvdev->actual_features, 1525 + MLX5_CVQ_MAX_ENT, false, 1526 + (struct vring_desc *)(uintptr_t)cvq->desc_addr, 1527 + (struct vring_avail *)(uintptr_t)cvq->driver_addr, 1528 + (struct vring_used *)(uintptr_t)cvq->device_addr); 1872 1529 if (err) 1873 1530 goto err_vq; 1874 1531 } ··· 1908 1541 } 1909 1542 } 1910 1543 1911 - /* 
TODO: cross-endian support */ 1912 - static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev) 1544 + static void update_cvq_info(struct mlx5_vdpa_dev *mvdev) 1913 1545 { 1914 - return virtio_legacy_is_little_endian() || 1915 - (mvdev->actual_features & BIT_ULL(VIRTIO_F_VERSION_1)); 1916 - } 1917 - 1918 - static __virtio16 cpu_to_mlx5vdpa16(struct mlx5_vdpa_dev *mvdev, u16 val) 1919 - { 1920 - return __cpu_to_virtio16(mlx5_vdpa_is_little_endian(mvdev), val); 1546 + if (MLX5_FEATURE(mvdev, VIRTIO_NET_F_CTRL_VQ)) { 1547 + if (MLX5_FEATURE(mvdev, VIRTIO_NET_F_MQ)) { 1548 + /* MQ supported. CVQ index is right above the last data virtqueue's */ 1549 + mvdev->max_idx = mvdev->max_vqs; 1550 + } else { 1551 + /* Only CVQ supportted. data virtqueues occupy indices 0 and 1. 1552 + * CVQ gets index 2 1553 + */ 1554 + mvdev->max_idx = 2; 1555 + } 1556 + } else { 1557 + /* Two data virtqueues only: one for rx and one for tx */ 1558 + mvdev->max_idx = 1; 1559 + } 1921 1560 } 1922 1561 1923 1562 static int mlx5_vdpa_set_features(struct vdpa_device *vdev, u64 features) ··· 1941 1568 ndev->mvdev.actual_features = features & ndev->mvdev.mlx_features; 1942 1569 ndev->config.mtu = cpu_to_mlx5vdpa16(mvdev, ndev->mtu); 1943 1570 ndev->config.status |= cpu_to_mlx5vdpa16(mvdev, VIRTIO_NET_S_LINK_UP); 1571 + update_cvq_info(mvdev); 1944 1572 return err; 1945 1573 } 1946 1574 ··· 1979 1605 static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq) 1980 1606 { 1981 1607 struct mlx5_vq_restore_info *ri = &mvq->ri; 1982 - struct mlx5_virtq_attr attr; 1608 + struct mlx5_virtq_attr attr = {}; 1983 1609 int err; 1984 1610 1985 - if (!mvq->initialized) 1986 - return 0; 1987 - 1988 - err = query_virtqueue(ndev, mvq, &attr); 1989 - if (err) 1990 - return err; 1611 + if (mvq->initialized) { 1612 + err = query_virtqueue(ndev, mvq, &attr); 1613 + if (err) 1614 + return err; 1615 + } 1991 1616 1992 1617 ri->avail_index = attr.available_index; 1993 1618 
ri->used_index = attr.used_index; ··· 1995 1622 ri->desc_addr = mvq->desc_addr; 1996 1623 ri->device_addr = mvq->device_addr; 1997 1624 ri->driver_addr = mvq->driver_addr; 1998 - ri->cb = mvq->event_cb; 1999 1625 ri->restore = true; 2000 1626 return 0; 2001 1627 } ··· 2039 1667 mvq->desc_addr = ri->desc_addr; 2040 1668 mvq->device_addr = ri->device_addr; 2041 1669 mvq->driver_addr = ri->driver_addr; 2042 - mvq->event_cb = ri->cb; 2043 1670 } 2044 1671 } 2045 1672 2046 - static int mlx5_vdpa_change_map(struct mlx5_vdpa_net *ndev, struct vhost_iotlb *iotlb) 1673 + static int mlx5_vdpa_change_map(struct mlx5_vdpa_dev *mvdev, struct vhost_iotlb *iotlb) 2047 1674 { 1675 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 2048 1676 int err; 2049 1677 2050 1678 suspend_vqs(ndev); ··· 2053 1681 goto err_mr; 2054 1682 2055 1683 teardown_driver(ndev); 2056 - mlx5_vdpa_destroy_mr(&ndev->mvdev); 2057 - err = mlx5_vdpa_create_mr(&ndev->mvdev, iotlb); 1684 + mlx5_vdpa_destroy_mr(mvdev); 1685 + err = mlx5_vdpa_create_mr(mvdev, iotlb); 2058 1686 if (err) 2059 1687 goto err_mr; 2060 1688 2061 - if (!(ndev->mvdev.status & VIRTIO_CONFIG_S_DRIVER_OK)) 1689 + if (!(mvdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) 2062 1690 return 0; 2063 1691 2064 1692 restore_channels_info(ndev); 2065 - err = setup_driver(ndev); 1693 + err = setup_driver(mvdev); 2066 1694 if (err) 2067 1695 goto err_setup; 2068 1696 2069 1697 return 0; 2070 1698 2071 1699 err_setup: 2072 - mlx5_vdpa_destroy_mr(&ndev->mvdev); 1700 + mlx5_vdpa_destroy_mr(mvdev); 2073 1701 err_mr: 2074 1702 return err; 2075 1703 } 2076 1704 2077 - static int setup_driver(struct mlx5_vdpa_net *ndev) 1705 + static int setup_driver(struct mlx5_vdpa_dev *mvdev) 2078 1706 { 1707 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 2079 1708 int err; 2080 1709 2081 1710 mutex_lock(&ndev->reslock); 2082 1711 if (ndev->setup) { 2083 - mlx5_vdpa_warn(&ndev->mvdev, "setup driver called for already setup driver\n"); 1712 + mlx5_vdpa_warn(mvdev, 
"setup driver called for already setup driver\n"); 2084 1713 err = 0; 2085 1714 goto out; 2086 1715 } 2087 - err = setup_virtqueues(ndev); 1716 + err = setup_virtqueues(mvdev); 2088 1717 if (err) { 2089 - mlx5_vdpa_warn(&ndev->mvdev, "setup_virtqueues\n"); 1718 + mlx5_vdpa_warn(mvdev, "setup_virtqueues\n"); 2090 1719 goto out; 2091 1720 } 2092 1721 2093 1722 err = create_rqt(ndev); 2094 1723 if (err) { 2095 - mlx5_vdpa_warn(&ndev->mvdev, "create_rqt\n"); 1724 + mlx5_vdpa_warn(mvdev, "create_rqt\n"); 2096 1725 goto err_rqt; 2097 1726 } 2098 1727 2099 1728 err = create_tir(ndev); 2100 1729 if (err) { 2101 - mlx5_vdpa_warn(&ndev->mvdev, "create_tir\n"); 1730 + mlx5_vdpa_warn(mvdev, "create_tir\n"); 2102 1731 goto err_tir; 2103 1732 } 2104 1733 2105 1734 err = add_fwd_to_tir(ndev); 2106 1735 if (err) { 2107 - mlx5_vdpa_warn(&ndev->mvdev, "add_fwd_to_tir\n"); 1736 + mlx5_vdpa_warn(mvdev, "add_fwd_to_tir\n"); 2108 1737 goto err_fwd; 2109 1738 } 2110 1739 ndev->setup = true; ··· 2154 1781 int err; 2155 1782 2156 1783 print_status(mvdev, status, true); 2157 - if (!status) { 2158 - mlx5_vdpa_info(mvdev, "performing device reset\n"); 2159 - teardown_driver(ndev); 2160 - clear_vqs_ready(ndev); 2161 - mlx5_vdpa_destroy_mr(&ndev->mvdev); 2162 - ndev->mvdev.status = 0; 2163 - ndev->mvdev.mlx_features = 0; 2164 - ++mvdev->generation; 2165 - if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) { 2166 - if (mlx5_vdpa_create_mr(mvdev, NULL)) 2167 - mlx5_vdpa_warn(mvdev, "create MR failed\n"); 2168 - } 2169 - return; 2170 - } 2171 1784 2172 1785 if ((status ^ ndev->mvdev.status) & VIRTIO_CONFIG_S_DRIVER_OK) { 2173 1786 if (status & VIRTIO_CONFIG_S_DRIVER_OK) { 2174 - err = setup_driver(ndev); 1787 + err = setup_driver(mvdev); 2175 1788 if (err) { 2176 1789 mlx5_vdpa_warn(mvdev, "failed to setup driver\n"); 2177 1790 goto err_setup; ··· 2174 1815 err_setup: 2175 1816 mlx5_vdpa_destroy_mr(&ndev->mvdev); 2176 1817 ndev->mvdev.status |= VIRTIO_CONFIG_S_FAILED; 1818 + } 1819 + 1820 + static int 
mlx5_vdpa_reset(struct vdpa_device *vdev) 1821 + { 1822 + struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 1823 + struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 1824 + 1825 + print_status(mvdev, 0, true); 1826 + mlx5_vdpa_info(mvdev, "performing device reset\n"); 1827 + teardown_driver(ndev); 1828 + clear_vqs_ready(ndev); 1829 + mlx5_vdpa_destroy_mr(&ndev->mvdev); 1830 + ndev->mvdev.status = 0; 1831 + ndev->mvdev.mlx_features = 0; 1832 + memset(ndev->event_cbs, 0, sizeof(ndev->event_cbs)); 1833 + ndev->mvdev.actual_features = 0; 1834 + ++mvdev->generation; 1835 + if (MLX5_CAP_GEN(mvdev->mdev, umem_uid_0)) { 1836 + if (mlx5_vdpa_create_mr(mvdev, NULL)) 1837 + mlx5_vdpa_warn(mvdev, "create MR failed\n"); 1838 + } 1839 + 1840 + return 0; 2177 1841 } 2178 1842 2179 1843 static size_t mlx5_vdpa_get_config_size(struct vdpa_device *vdev) ··· 2230 1848 static int mlx5_vdpa_set_map(struct vdpa_device *vdev, struct vhost_iotlb *iotlb) 2231 1849 { 2232 1850 struct mlx5_vdpa_dev *mvdev = to_mvdev(vdev); 2233 - struct mlx5_vdpa_net *ndev = to_mlx5_vdpa_ndev(mvdev); 2234 1851 bool change_map; 2235 1852 int err; 2236 1853 ··· 2240 1859 } 2241 1860 2242 1861 if (change_map) 2243 - return mlx5_vdpa_change_map(ndev, iotlb); 1862 + return mlx5_vdpa_change_map(mvdev, iotlb); 2244 1863 2245 1864 return 0; 2246 1865 } ··· 2269 1888 struct vdpa_notification_area ret = {}; 2270 1889 struct mlx5_vdpa_net *ndev; 2271 1890 phys_addr_t addr; 1891 + 1892 + if (!is_index_valid(mvdev, idx) || is_ctrl_vq_idx(mvdev, idx)) 1893 + return ret; 2272 1894 2273 1895 /* If SF BAR size is smaller than PAGE_SIZE, do not use direct 2274 1896 * notification to avoid the risk of mapping pages that contain BAR of more ··· 2312 1928 .get_vendor_id = mlx5_vdpa_get_vendor_id, 2313 1929 .get_status = mlx5_vdpa_get_status, 2314 1930 .set_status = mlx5_vdpa_set_status, 1931 + .reset = mlx5_vdpa_reset, 2315 1932 .get_config_size = mlx5_vdpa_get_config_size, 2316 1933 .get_config = mlx5_vdpa_get_config, 2317 1934 
.set_config = mlx5_vdpa_set_config, ··· 2425 2040 max_vqs = min_t(u32, max_vqs, MLX5_MAX_SUPPORTED_VQS); 2426 2041 2427 2042 ndev = vdpa_alloc_device(struct mlx5_vdpa_net, mvdev.vdev, mdev->device, &mlx5_vdpa_ops, 2428 - name); 2043 + name, false); 2429 2044 if (IS_ERR(ndev)) 2430 2045 return PTR_ERR(ndev); 2431 2046 ··· 2448 2063 err = mlx5_mpfs_add_mac(pfmdev, config->mac); 2449 2064 if (err) 2450 2065 goto err_mtu; 2066 + 2067 + ndev->mvdev.mlx_features |= BIT_ULL(VIRTIO_NET_F_MAC); 2451 2068 } 2452 2069 2070 + config->max_virtqueue_pairs = cpu_to_mlx5vdpa16(mvdev, mlx5_vdpa_max_qps(max_vqs)); 2453 2071 mvdev->vdev.dma_dev = &mdev->pdev->dev; 2454 2072 err = mlx5_vdpa_alloc_resources(&ndev->mvdev); 2455 2073 if (err) ··· 2468 2080 if (err) 2469 2081 goto err_mr; 2470 2082 2083 + mvdev->wq = create_singlethread_workqueue("mlx5_vdpa_ctrl_wq"); 2084 + if (!mvdev->wq) { 2085 + err = -ENOMEM; 2086 + goto err_res2; 2087 + } 2088 + 2089 + ndev->cur_num_vqs = 2 * mlx5_vdpa_max_qps(max_vqs); 2471 2090 mvdev->vdev.mdev = &mgtdev->mgtdev; 2472 - err = _vdpa_register_device(&mvdev->vdev, 2 * mlx5_vdpa_max_qps(max_vqs)); 2091 + err = _vdpa_register_device(&mvdev->vdev, ndev->cur_num_vqs + 1); 2473 2092 if (err) 2474 2093 goto err_reg; 2475 2094 ··· 2484 2089 return 0; 2485 2090 2486 2091 err_reg: 2092 + destroy_workqueue(mvdev->wq); 2093 + err_res2: 2487 2094 free_resources(ndev); 2488 2095 err_mr: 2489 2096 mlx5_vdpa_destroy_mr(mvdev); ··· 2503 2106 static void mlx5_vdpa_dev_del(struct vdpa_mgmt_dev *v_mdev, struct vdpa_device *dev) 2504 2107 { 2505 2108 struct mlx5_vdpa_mgmtdev *mgtdev = container_of(v_mdev, struct mlx5_vdpa_mgmtdev, mgtdev); 2109 + struct mlx5_vdpa_dev *mvdev = to_mvdev(dev); 2506 2110 2111 + destroy_workqueue(mvdev->wq); 2507 2112 _vdpa_unregister_device(dev); 2508 2113 mgtdev->ndev = NULL; 2509 2114 }
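The control-VQ index arithmetic threaded through the hunks above (`ctrl_vq_idx()`, `update_cvq_info()`, `is_index_valid()`) condenses to a small amount of feature-bit math: without VIRTIO_NET_F_MQ only one RX/TX pair exists and the CVQ sits at index 2; with MQ it sits right above the last data queue. A standalone sketch (the feature-bit numbers are the real virtio ones; the functions are simplified restatements, not the driver's code):

```c
#include <assert.h>
#include <stdint.h>

#define VIRTIO_NET_F_CTRL_VQ	17	/* virtio-net feature bit numbers */
#define VIRTIO_NET_F_MQ		22

/* CVQ index: 2 without MQ, otherwise 2 * max queue pairs, as in ctrl_vq_idx(). */
static uint16_t cvq_index(uint64_t features, int max_vqs)
{
	if (!(features & (1ULL << VIRTIO_NET_F_MQ)))
		return 2;
	return 2 * (max_vqs / 2);	/* mlx5_vdpa_max_qps() is max_vqs / 2 */
}

/* Highest acceptable vq index, as in update_cvq_info():
 * no CVQ -> just vq0/vq1; CVQ without MQ -> CVQ at 2; with MQ -> CVQ above
 * all data queues. */
static uint16_t highest_index(uint64_t features, int max_vqs)
{
	if (!(features & (1ULL << VIRTIO_NET_F_CTRL_VQ)))
		return 1;
	if (features & (1ULL << VIRTIO_NET_F_MQ))
		return (uint16_t)max_vqs;
	return 2;
}
```

With the driver's MLX5_MAX_SUPPORTED_VQS of 16, negotiating both features puts the CVQ at index 16, which is why `_vdpa_register_device()` is now called with `ndev->cur_num_vqs + 1` queues.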
+8 -1
drivers/vdpa/vdpa.c
··· 69 69 * @config: the bus operations that is supported by this device 70 70 * @size: size of the parent structure that contains private data 71 71 * @name: name of the vdpa device; optional. 72 + * @use_va: indicate whether virtual address must be used by this device 72 73 * 73 74 * Driver should use vdpa_alloc_device() wrapper macro instead of 74 75 * using this directly. ··· 79 78 */ 80 79 struct vdpa_device *__vdpa_alloc_device(struct device *parent, 81 80 const struct vdpa_config_ops *config, 82 - size_t size, const char *name) 81 + size_t size, const char *name, 82 + bool use_va) 83 83 { 84 84 struct vdpa_device *vdev; 85 85 int err = -EINVAL; ··· 89 87 goto err; 90 88 91 89 if (!!config->dma_map != !!config->dma_unmap) 90 + goto err; 91 + 92 + /* It should only work for the device that use on-chip IOMMU */ 93 + if (use_va && !(config->dma_map || config->set_map)) 92 94 goto err; 93 95 94 96 err = -ENOMEM; ··· 110 104 vdev->index = err; 111 105 vdev->config = config; 112 106 vdev->features_valid = false; 107 + vdev->use_va = use_va; 113 108 114 109 if (name) 115 110 err = dev_set_name(&vdev->dev, "%s", name);
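The validation added to `__vdpa_alloc_device()` above enforces two invariants: `dma_map`/`dma_unmap` must be provided as a pair, and a device requesting virtual addressing (`use_va`) must be able to translate through its own `dma_map` or `set_map`. A standalone sketch of that predicate, with booleans standing in for "callback is non-NULL":

```c
#include <stdbool.h>

/*
 * Sketch of the config-ops checks in __vdpa_alloc_device(). Returns true
 * when the combination of callbacks is acceptable. Names are illustrative.
 */
static bool vdpa_config_valid(bool has_dma_map, bool has_dma_unmap,
			      bool has_set_map, bool use_va)
{
	/* dma_map and dma_unmap must come as a pair. */
	if (has_dma_map != has_dma_unmap)
		return false;

	/* use_va only works for devices with an on-chip IOMMU, i.e. ones
	 * that translate via their own dma_map or set_map callback. */
	if (use_va && !(has_dma_map || has_set_map))
		return false;

	return true;
}
```

VDUSE is the first `use_va` user: it supplies `set_map`, so the second check passes even without `dma_map`.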
+21 -8
drivers/vdpa/vdpa_sim/vdpa_sim.c
··· 92 92 vq->vring.notify = NULL; 93 93 } 94 94 95 - static void vdpasim_reset(struct vdpasim *vdpasim) 95 + static void vdpasim_do_reset(struct vdpasim *vdpasim) 96 96 { 97 97 int i; 98 98 ··· 137 137 int ret; 138 138 139 139 /* We set the limit_pfn to the maximum (ULONG_MAX - 1) */ 140 - iova = alloc_iova(&vdpasim->iova, size, ULONG_MAX - 1, true); 140 + iova = alloc_iova(&vdpasim->iova, size >> iova_shift(&vdpasim->iova), 141 + ULONG_MAX - 1, true); 141 142 if (!iova) 142 143 return DMA_MAPPING_ERROR; 143 144 ··· 251 250 ops = &vdpasim_config_ops; 252 251 253 252 vdpasim = vdpa_alloc_device(struct vdpasim, vdpa, NULL, ops, 254 - dev_attr->name); 253 + dev_attr->name, false); 255 254 if (IS_ERR(vdpasim)) { 256 255 ret = PTR_ERR(vdpasim); 257 256 goto err_alloc; ··· 460 459 461 460 spin_lock(&vdpasim->lock); 462 461 vdpasim->status = status; 463 - if (status == 0) 464 - vdpasim_reset(vdpasim); 465 462 spin_unlock(&vdpasim->lock); 463 + } 464 + 465 + static int vdpasim_reset(struct vdpa_device *vdpa) 466 + { 467 + struct vdpasim *vdpasim = vdpa_to_sim(vdpa); 468 + 469 + spin_lock(&vdpasim->lock); 470 + vdpasim->status = 0; 471 + vdpasim_do_reset(vdpasim); 472 + spin_unlock(&vdpasim->lock); 473 + 474 + return 0; 466 475 } 467 476 468 477 static size_t vdpasim_get_config_size(struct vdpa_device *vdpa) ··· 555 544 } 556 545 557 546 static int vdpasim_dma_map(struct vdpa_device *vdpa, u64 iova, u64 size, 558 - u64 pa, u32 perm) 547 + u64 pa, u32 perm, void *opaque) 559 548 { 560 549 struct vdpasim *vdpasim = vdpa_to_sim(vdpa); 561 550 int ret; 562 551 563 552 spin_lock(&vdpasim->iommu_lock); 564 - ret = vhost_iotlb_add_range(vdpasim->iommu, iova, iova + size - 1, pa, 565 - perm); 553 + ret = vhost_iotlb_add_range_ctx(vdpasim->iommu, iova, iova + size - 1, 554 + pa, perm, opaque); 566 555 spin_unlock(&vdpasim->iommu_lock); 567 556 568 557 return ret; ··· 618 607 .get_vendor_id = vdpasim_get_vendor_id, 619 608 .get_status = vdpasim_get_status, 620 609 .set_status = 
vdpasim_set_status, 610 + .reset = vdpasim_reset, 621 611 .get_config_size = vdpasim_get_config_size, 622 612 .get_config = vdpasim_get_config, 623 613 .set_config = vdpasim_set_config, ··· 647 635 .get_vendor_id = vdpasim_get_vendor_id, 648 636 .get_status = vdpasim_get_status, 649 637 .set_status = vdpasim_set_status, 638 + .reset = vdpasim_reset, 650 639 .get_config_size = vdpasim_get_config_size, 651 640 .get_config = vdpasim_get_config, 652 641 .set_config = vdpasim_set_config,
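The `alloc_iova()` change in the simulator above fixes a units bug: a byte count was being passed where the allocator expects a number of IOVA granules. A standalone sketch of the conversion; `shift` plays the role of `iova_shift()` (e.g. 12 for 4 KiB granules), and the round-up is an extra safety for unaligned sizes — for the simulator's granule-aligned sizes it reduces to the plain `size >> shift` used in the patch:

```c
#include <stdint.h>

/* Convert a byte size into an IOVA granule count, rounding up so an
 * unaligned tail still reserves a full granule. */
static uint64_t iova_len_from_bytes(uint64_t size, unsigned int shift)
{
	return (size + (1ULL << shift) - 1) >> shift;
}
```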
+5
drivers/vdpa/vdpa_user/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + 3 + vduse-y := vduse_dev.o iova_domain.o 4 + 5 + obj-$(CONFIG_VDPA_USER) += vduse.o
+545
drivers/vdpa/vdpa_user/iova_domain.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * MMU-based software IOTLB. 4 + * 5 + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. 6 + * 7 + * Author: Xie Yongji <xieyongji@bytedance.com> 8 + * 9 + */ 10 + 11 + #include <linux/slab.h> 12 + #include <linux/file.h> 13 + #include <linux/anon_inodes.h> 14 + #include <linux/highmem.h> 15 + #include <linux/vmalloc.h> 16 + #include <linux/vdpa.h> 17 + 18 + #include "iova_domain.h" 19 + 20 + static int vduse_iotlb_add_range(struct vduse_iova_domain *domain, 21 + u64 start, u64 last, 22 + u64 addr, unsigned int perm, 23 + struct file *file, u64 offset) 24 + { 25 + struct vdpa_map_file *map_file; 26 + int ret; 27 + 28 + map_file = kmalloc(sizeof(*map_file), GFP_ATOMIC); 29 + if (!map_file) 30 + return -ENOMEM; 31 + 32 + map_file->file = get_file(file); 33 + map_file->offset = offset; 34 + 35 + ret = vhost_iotlb_add_range_ctx(domain->iotlb, start, last, 36 + addr, perm, map_file); 37 + if (ret) { 38 + fput(map_file->file); 39 + kfree(map_file); 40 + return ret; 41 + } 42 + return 0; 43 + } 44 + 45 + static void vduse_iotlb_del_range(struct vduse_iova_domain *domain, 46 + u64 start, u64 last) 47 + { 48 + struct vdpa_map_file *map_file; 49 + struct vhost_iotlb_map *map; 50 + 51 + while ((map = vhost_iotlb_itree_first(domain->iotlb, start, last))) { 52 + map_file = (struct vdpa_map_file *)map->opaque; 53 + fput(map_file->file); 54 + kfree(map_file); 55 + vhost_iotlb_map_free(domain->iotlb, map); 56 + } 57 + } 58 + 59 + int vduse_domain_set_map(struct vduse_iova_domain *domain, 60 + struct vhost_iotlb *iotlb) 61 + { 62 + struct vdpa_map_file *map_file; 63 + struct vhost_iotlb_map *map; 64 + u64 start = 0ULL, last = ULLONG_MAX; 65 + int ret; 66 + 67 + spin_lock(&domain->iotlb_lock); 68 + vduse_iotlb_del_range(domain, start, last); 69 + 70 + for (map = vhost_iotlb_itree_first(iotlb, start, last); map; 71 + map = vhost_iotlb_itree_next(map, start, last)) { 72 + map_file = (struct 
vdpa_map_file *)map->opaque; 73 + ret = vduse_iotlb_add_range(domain, map->start, map->last, 74 + map->addr, map->perm, 75 + map_file->file, 76 + map_file->offset); 77 + if (ret) 78 + goto err; 79 + } 80 + spin_unlock(&domain->iotlb_lock); 81 + 82 + return 0; 83 + err: 84 + vduse_iotlb_del_range(domain, start, last); 85 + spin_unlock(&domain->iotlb_lock); 86 + return ret; 87 + } 88 + 89 + void vduse_domain_clear_map(struct vduse_iova_domain *domain, 90 + struct vhost_iotlb *iotlb) 91 + { 92 + struct vhost_iotlb_map *map; 93 + u64 start = 0ULL, last = ULLONG_MAX; 94 + 95 + spin_lock(&domain->iotlb_lock); 96 + for (map = vhost_iotlb_itree_first(iotlb, start, last); map; 97 + map = vhost_iotlb_itree_next(map, start, last)) { 98 + vduse_iotlb_del_range(domain, map->start, map->last); 99 + } 100 + spin_unlock(&domain->iotlb_lock); 101 + } 102 + 103 + static int vduse_domain_map_bounce_page(struct vduse_iova_domain *domain, 104 + u64 iova, u64 size, u64 paddr) 105 + { 106 + struct vduse_bounce_map *map; 107 + u64 last = iova + size - 1; 108 + 109 + while (iova <= last) { 110 + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; 111 + if (!map->bounce_page) { 112 + map->bounce_page = alloc_page(GFP_ATOMIC); 113 + if (!map->bounce_page) 114 + return -ENOMEM; 115 + } 116 + map->orig_phys = paddr; 117 + paddr += PAGE_SIZE; 118 + iova += PAGE_SIZE; 119 + } 120 + return 0; 121 + } 122 + 123 + static void vduse_domain_unmap_bounce_page(struct vduse_iova_domain *domain, 124 + u64 iova, u64 size) 125 + { 126 + struct vduse_bounce_map *map; 127 + u64 last = iova + size - 1; 128 + 129 + while (iova <= last) { 130 + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; 131 + map->orig_phys = INVALID_PHYS_ADDR; 132 + iova += PAGE_SIZE; 133 + } 134 + } 135 + 136 + static void do_bounce(phys_addr_t orig, void *addr, size_t size, 137 + enum dma_data_direction dir) 138 + { 139 + unsigned long pfn = PFN_DOWN(orig); 140 + unsigned int offset = offset_in_page(orig); 141 + char *buffer; 142 + unsigned 
int sz = 0; 143 + 144 + while (size) { 145 + sz = min_t(size_t, PAGE_SIZE - offset, size); 146 + 147 + buffer = kmap_atomic(pfn_to_page(pfn)); 148 + if (dir == DMA_TO_DEVICE) 149 + memcpy(addr, buffer + offset, sz); 150 + else 151 + memcpy(buffer + offset, addr, sz); 152 + kunmap_atomic(buffer); 153 + 154 + size -= sz; 155 + pfn++; 156 + addr += sz; 157 + offset = 0; 158 + } 159 + } 160 + 161 + static void vduse_domain_bounce(struct vduse_iova_domain *domain, 162 + dma_addr_t iova, size_t size, 163 + enum dma_data_direction dir) 164 + { 165 + struct vduse_bounce_map *map; 166 + unsigned int offset; 167 + void *addr; 168 + size_t sz; 169 + 170 + if (iova >= domain->bounce_size) 171 + return; 172 + 173 + while (size) { 174 + map = &domain->bounce_maps[iova >> PAGE_SHIFT]; 175 + offset = offset_in_page(iova); 176 + sz = min_t(size_t, PAGE_SIZE - offset, size); 177 + 178 + if (WARN_ON(!map->bounce_page || 179 + map->orig_phys == INVALID_PHYS_ADDR)) 180 + return; 181 + 182 + addr = page_address(map->bounce_page) + offset; 183 + do_bounce(map->orig_phys + offset, addr, sz, dir); 184 + size -= sz; 185 + iova += sz; 186 + } 187 + } 188 + 189 + static struct page * 190 + vduse_domain_get_coherent_page(struct vduse_iova_domain *domain, u64 iova) 191 + { 192 + u64 start = iova & PAGE_MASK; 193 + u64 last = start + PAGE_SIZE - 1; 194 + struct vhost_iotlb_map *map; 195 + struct page *page = NULL; 196 + 197 + spin_lock(&domain->iotlb_lock); 198 + map = vhost_iotlb_itree_first(domain->iotlb, start, last); 199 + if (!map) 200 + goto out; 201 + 202 + page = pfn_to_page((map->addr + iova - map->start) >> PAGE_SHIFT); 203 + get_page(page); 204 + out: 205 + spin_unlock(&domain->iotlb_lock); 206 + 207 + return page; 208 + } 209 + 210 + static struct page * 211 + vduse_domain_get_bounce_page(struct vduse_iova_domain *domain, u64 iova) 212 + { 213 + struct vduse_bounce_map *map; 214 + struct page *page = NULL; 215 + 216 + spin_lock(&domain->iotlb_lock); 217 + map = 
&domain->bounce_maps[iova >> PAGE_SHIFT]; 218 + if (!map->bounce_page) 219 + goto out; 220 + 221 + page = map->bounce_page; 222 + get_page(page); 223 + out: 224 + spin_unlock(&domain->iotlb_lock); 225 + 226 + return page; 227 + } 228 + 229 + static void 230 + vduse_domain_free_bounce_pages(struct vduse_iova_domain *domain) 231 + { 232 + struct vduse_bounce_map *map; 233 + unsigned long pfn, bounce_pfns; 234 + 235 + bounce_pfns = domain->bounce_size >> PAGE_SHIFT; 236 + 237 + for (pfn = 0; pfn < bounce_pfns; pfn++) { 238 + map = &domain->bounce_maps[pfn]; 239 + if (WARN_ON(map->orig_phys != INVALID_PHYS_ADDR)) 240 + continue; 241 + 242 + if (!map->bounce_page) 243 + continue; 244 + 245 + __free_page(map->bounce_page); 246 + map->bounce_page = NULL; 247 + } 248 + } 249 + 250 + void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain) 251 + { 252 + if (!domain->bounce_map) 253 + return; 254 + 255 + spin_lock(&domain->iotlb_lock); 256 + if (!domain->bounce_map) 257 + goto unlock; 258 + 259 + vduse_iotlb_del_range(domain, 0, domain->bounce_size - 1); 260 + domain->bounce_map = 0; 261 + unlock: 262 + spin_unlock(&domain->iotlb_lock); 263 + } 264 + 265 + static int vduse_domain_init_bounce_map(struct vduse_iova_domain *domain) 266 + { 267 + int ret = 0; 268 + 269 + if (domain->bounce_map) 270 + return 0; 271 + 272 + spin_lock(&domain->iotlb_lock); 273 + if (domain->bounce_map) 274 + goto unlock; 275 + 276 + ret = vduse_iotlb_add_range(domain, 0, domain->bounce_size - 1, 277 + 0, VHOST_MAP_RW, domain->file, 0); 278 + if (ret) 279 + goto unlock; 280 + 281 + domain->bounce_map = 1; 282 + unlock: 283 + spin_unlock(&domain->iotlb_lock); 284 + return ret; 285 + } 286 + 287 + static dma_addr_t 288 + vduse_domain_alloc_iova(struct iova_domain *iovad, 289 + unsigned long size, unsigned long limit) 290 + { 291 + unsigned long shift = iova_shift(iovad); 292 + unsigned long iova_len = iova_align(iovad, size) >> shift; 293 + unsigned long iova_pfn; 294 + 295 + /* 296 + * 
Freeing non-power-of-two-sized allocations back into the IOVA caches 297 + * will come back to bite us badly, so we have to waste a bit of space 298 + * rounding up anything cacheable to make sure that can't happen. The 299 + * order of the unadjusted size will still match upon freeing. 300 + */ 301 + if (iova_len < (1 << (IOVA_RANGE_CACHE_MAX_SIZE - 1))) 302 + iova_len = roundup_pow_of_two(iova_len); 303 + iova_pfn = alloc_iova_fast(iovad, iova_len, limit >> shift, true); 304 + 305 + return iova_pfn << shift; 306 + } 307 + 308 + static void vduse_domain_free_iova(struct iova_domain *iovad, 309 + dma_addr_t iova, size_t size) 310 + { 311 + unsigned long shift = iova_shift(iovad); 312 + unsigned long iova_len = iova_align(iovad, size) >> shift; 313 + 314 + free_iova_fast(iovad, iova >> shift, iova_len); 315 + } 316 + 317 + dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain, 318 + struct page *page, unsigned long offset, 319 + size_t size, enum dma_data_direction dir, 320 + unsigned long attrs) 321 + { 322 + struct iova_domain *iovad = &domain->stream_iovad; 323 + unsigned long limit = domain->bounce_size - 1; 324 + phys_addr_t pa = page_to_phys(page) + offset; 325 + dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit); 326 + 327 + if (!iova) 328 + return DMA_MAPPING_ERROR; 329 + 330 + if (vduse_domain_init_bounce_map(domain)) 331 + goto err; 332 + 333 + if (vduse_domain_map_bounce_page(domain, (u64)iova, (u64)size, pa)) 334 + goto err; 335 + 336 + if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL) 337 + vduse_domain_bounce(domain, iova, size, DMA_TO_DEVICE); 338 + 339 + return iova; 340 + err: 341 + vduse_domain_free_iova(iovad, iova, size); 342 + return DMA_MAPPING_ERROR; 343 + } 344 + 345 + void vduse_domain_unmap_page(struct vduse_iova_domain *domain, 346 + dma_addr_t dma_addr, size_t size, 347 + enum dma_data_direction dir, unsigned long attrs) 348 + { 349 + struct iova_domain *iovad = &domain->stream_iovad; 350 + 351 + if (dir == 
DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL) 352 + vduse_domain_bounce(domain, dma_addr, size, DMA_FROM_DEVICE); 353 + 354 + vduse_domain_unmap_bounce_page(domain, (u64)dma_addr, (u64)size); 355 + vduse_domain_free_iova(iovad, dma_addr, size); 356 + } 357 + 358 + void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain, 359 + size_t size, dma_addr_t *dma_addr, 360 + gfp_t flag, unsigned long attrs) 361 + { 362 + struct iova_domain *iovad = &domain->consistent_iovad; 363 + unsigned long limit = domain->iova_limit; 364 + dma_addr_t iova = vduse_domain_alloc_iova(iovad, size, limit); 365 + void *orig = alloc_pages_exact(size, flag); 366 + 367 + if (!iova || !orig) 368 + goto err; 369 + 370 + spin_lock(&domain->iotlb_lock); 371 + if (vduse_iotlb_add_range(domain, (u64)iova, (u64)iova + size - 1, 372 + virt_to_phys(orig), VHOST_MAP_RW, 373 + domain->file, (u64)iova)) { 374 + spin_unlock(&domain->iotlb_lock); 375 + goto err; 376 + } 377 + spin_unlock(&domain->iotlb_lock); 378 + 379 + *dma_addr = iova; 380 + 381 + return orig; 382 + err: 383 + *dma_addr = DMA_MAPPING_ERROR; 384 + if (orig) 385 + free_pages_exact(orig, size); 386 + if (iova) 387 + vduse_domain_free_iova(iovad, iova, size); 388 + 389 + return NULL; 390 + } 391 + 392 + void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size, 393 + void *vaddr, dma_addr_t dma_addr, 394 + unsigned long attrs) 395 + { 396 + struct iova_domain *iovad = &domain->consistent_iovad; 397 + struct vhost_iotlb_map *map; 398 + struct vdpa_map_file *map_file; 399 + phys_addr_t pa; 400 + 401 + spin_lock(&domain->iotlb_lock); 402 + map = vhost_iotlb_itree_first(domain->iotlb, (u64)dma_addr, 403 + (u64)dma_addr + size - 1); 404 + if (WARN_ON(!map)) { 405 + spin_unlock(&domain->iotlb_lock); 406 + return; 407 + } 408 + map_file = (struct vdpa_map_file *)map->opaque; 409 + fput(map_file->file); 410 + kfree(map_file); 411 + pa = map->addr; 412 + vhost_iotlb_map_free(domain->iotlb, map); 413 + 
spin_unlock(&domain->iotlb_lock); 414 + 415 + vduse_domain_free_iova(iovad, dma_addr, size); 416 + free_pages_exact(phys_to_virt(pa), size); 417 + } 418 + 419 + static vm_fault_t vduse_domain_mmap_fault(struct vm_fault *vmf) 420 + { 421 + struct vduse_iova_domain *domain = vmf->vma->vm_private_data; 422 + unsigned long iova = vmf->pgoff << PAGE_SHIFT; 423 + struct page *page; 424 + 425 + if (!domain) 426 + return VM_FAULT_SIGBUS; 427 + 428 + if (iova < domain->bounce_size) 429 + page = vduse_domain_get_bounce_page(domain, iova); 430 + else 431 + page = vduse_domain_get_coherent_page(domain, iova); 432 + 433 + if (!page) 434 + return VM_FAULT_SIGBUS; 435 + 436 + vmf->page = page; 437 + 438 + return 0; 439 + } 440 + 441 + static const struct vm_operations_struct vduse_domain_mmap_ops = { 442 + .fault = vduse_domain_mmap_fault, 443 + }; 444 + 445 + static int vduse_domain_mmap(struct file *file, struct vm_area_struct *vma) 446 + { 447 + struct vduse_iova_domain *domain = file->private_data; 448 + 449 + vma->vm_flags |= VM_DONTDUMP | VM_DONTEXPAND; 450 + vma->vm_private_data = domain; 451 + vma->vm_ops = &vduse_domain_mmap_ops; 452 + 453 + return 0; 454 + } 455 + 456 + static int vduse_domain_release(struct inode *inode, struct file *file) 457 + { 458 + struct vduse_iova_domain *domain = file->private_data; 459 + 460 + spin_lock(&domain->iotlb_lock); 461 + vduse_iotlb_del_range(domain, 0, ULLONG_MAX); 462 + vduse_domain_free_bounce_pages(domain); 463 + spin_unlock(&domain->iotlb_lock); 464 + put_iova_domain(&domain->stream_iovad); 465 + put_iova_domain(&domain->consistent_iovad); 466 + vhost_iotlb_free(domain->iotlb); 467 + vfree(domain->bounce_maps); 468 + kfree(domain); 469 + 470 + return 0; 471 + } 472 + 473 + static const struct file_operations vduse_domain_fops = { 474 + .owner = THIS_MODULE, 475 + .mmap = vduse_domain_mmap, 476 + .release = vduse_domain_release, 477 + }; 478 + 479 + void vduse_domain_destroy(struct vduse_iova_domain *domain) 480 + { 481 + 
fput(domain->file); 482 + } 483 + 484 + struct vduse_iova_domain * 485 + vduse_domain_create(unsigned long iova_limit, size_t bounce_size) 486 + { 487 + struct vduse_iova_domain *domain; 488 + struct file *file; 489 + struct vduse_bounce_map *map; 490 + unsigned long pfn, bounce_pfns; 491 + 492 + bounce_pfns = PAGE_ALIGN(bounce_size) >> PAGE_SHIFT; 493 + if (iova_limit <= bounce_size) 494 + return NULL; 495 + 496 + domain = kzalloc(sizeof(*domain), GFP_KERNEL); 497 + if (!domain) 498 + return NULL; 499 + 500 + domain->iotlb = vhost_iotlb_alloc(0, 0); 501 + if (!domain->iotlb) 502 + goto err_iotlb; 503 + 504 + domain->iova_limit = iova_limit; 505 + domain->bounce_size = PAGE_ALIGN(bounce_size); 506 + domain->bounce_maps = vzalloc(bounce_pfns * 507 + sizeof(struct vduse_bounce_map)); 508 + if (!domain->bounce_maps) 509 + goto err_map; 510 + 511 + for (pfn = 0; pfn < bounce_pfns; pfn++) { 512 + map = &domain->bounce_maps[pfn]; 513 + map->orig_phys = INVALID_PHYS_ADDR; 514 + } 515 + file = anon_inode_getfile("[vduse-domain]", &vduse_domain_fops, 516 + domain, O_RDWR); 517 + if (IS_ERR(file)) 518 + goto err_file; 519 + 520 + domain->file = file; 521 + spin_lock_init(&domain->iotlb_lock); 522 + init_iova_domain(&domain->stream_iovad, 523 + PAGE_SIZE, IOVA_START_PFN); 524 + init_iova_domain(&domain->consistent_iovad, 525 + PAGE_SIZE, bounce_pfns); 526 + 527 + return domain; 528 + err_file: 529 + vfree(domain->bounce_maps); 530 + err_map: 531 + vhost_iotlb_free(domain->iotlb); 532 + err_iotlb: 533 + kfree(domain); 534 + return NULL; 535 + } 536 + 537 + int vduse_domain_init(void) 538 + { 539 + return iova_cache_get(); 540 + } 541 + 542 + void vduse_domain_exit(void) 543 + { 544 + iova_cache_put(); 545 + }
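`vduse_domain_alloc_iova()` above rounds small allocations up to a power of two before calling `alloc_iova_fast()`, so that freed ranges always match a size class of the IOVA caches. A standalone sketch of that sizing rule; `RANGE_CACHE_MAX_SIZE` is assumed to be 6, the value of the kernel's `IOVA_RANGE_CACHE_MAX_SIZE` at the time of this merge:

```c
/* log2 of the largest cached IOVA range, in granules (assumed value). */
#define RANGE_CACHE_MAX_SIZE 6

static unsigned long roundup_pow2(unsigned long v)
{
	unsigned long p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* Lengths small enough to be cached are rounded up to a power of two;
 * larger lengths are passed through unchanged. */
static unsigned long cached_iova_len(unsigned long iova_len)
{
	if (iova_len < (1UL << (RANGE_CACHE_MAX_SIZE - 1)))
		iova_len = roundup_pow2(iova_len);
	return iova_len;
}
```

As the comment in the driver notes, the order of the unadjusted size still matches on free, so `free_iova_fast()` with the same rounded length returns the range to the right cache.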
+73
drivers/vdpa/vdpa_user/iova_domain.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * MMU-based software IOTLB. 4 + * 5 + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. 6 + * 7 + * Author: Xie Yongji <xieyongji@bytedance.com> 8 + * 9 + */ 10 + 11 + #ifndef _VDUSE_IOVA_DOMAIN_H 12 + #define _VDUSE_IOVA_DOMAIN_H 13 + 14 + #include <linux/iova.h> 15 + #include <linux/dma-mapping.h> 16 + #include <linux/vhost_iotlb.h> 17 + 18 + #define IOVA_START_PFN 1 19 + 20 + #define INVALID_PHYS_ADDR (~(phys_addr_t)0) 21 + 22 + struct vduse_bounce_map { 23 + struct page *bounce_page; 24 + u64 orig_phys; 25 + }; 26 + 27 + struct vduse_iova_domain { 28 + struct iova_domain stream_iovad; 29 + struct iova_domain consistent_iovad; 30 + struct vduse_bounce_map *bounce_maps; 31 + size_t bounce_size; 32 + unsigned long iova_limit; 33 + int bounce_map; 34 + struct vhost_iotlb *iotlb; 35 + spinlock_t iotlb_lock; 36 + struct file *file; 37 + }; 38 + 39 + int vduse_domain_set_map(struct vduse_iova_domain *domain, 40 + struct vhost_iotlb *iotlb); 41 + 42 + void vduse_domain_clear_map(struct vduse_iova_domain *domain, 43 + struct vhost_iotlb *iotlb); 44 + 45 + dma_addr_t vduse_domain_map_page(struct vduse_iova_domain *domain, 46 + struct page *page, unsigned long offset, 47 + size_t size, enum dma_data_direction dir, 48 + unsigned long attrs); 49 + 50 + void vduse_domain_unmap_page(struct vduse_iova_domain *domain, 51 + dma_addr_t dma_addr, size_t size, 52 + enum dma_data_direction dir, unsigned long attrs); 53 + 54 + void *vduse_domain_alloc_coherent(struct vduse_iova_domain *domain, 55 + size_t size, dma_addr_t *dma_addr, 56 + gfp_t flag, unsigned long attrs); 57 + 58 + void vduse_domain_free_coherent(struct vduse_iova_domain *domain, size_t size, 59 + void *vaddr, dma_addr_t dma_addr, 60 + unsigned long attrs); 61 + 62 + void vduse_domain_reset_bounce_map(struct vduse_iova_domain *domain); 63 + 64 + void vduse_domain_destroy(struct vduse_iova_domain *domain); 65 + 66 + struct 
vduse_iova_domain *vduse_domain_create(unsigned long iova_limit, 67 + size_t bounce_size); 68 + 69 + int vduse_domain_init(void); 70 + 71 + void vduse_domain_exit(void); 72 + 73 + #endif /* _VDUSE_IOVA_DOMAIN_H */
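The header above shows the VDUSE IOVA space split in two: `stream_iovad` covers `[0, bounce_size)` and is backed by bounce pages for streaming DMA, while `consistent_iovad` covers `[bounce_size, iova_limit)` for coherent allocations. A standalone sketch of the classification the mmap fault handler performs (with VDUSE's 64 MiB bounce size and 128 MiB IOVA limit as the example values); names are illustrative:

```c
#include <stdbool.h>
#include <stdint.h>

enum vduse_region { VDUSE_BOUNCE, VDUSE_COHERENT, VDUSE_INVALID };

/* Decide which backing an IOVA falls into, as the fault handler does:
 * below bounce_size -> bounce page, otherwise -> coherent mapping. */
static enum vduse_region classify_iova(uint64_t iova, uint64_t bounce_size,
				       uint64_t iova_limit)
{
	if (iova >= iova_limit)
		return VDUSE_INVALID;
	if (iova < bounce_size)
		return VDUSE_BOUNCE;
	return VDUSE_COHERENT;
}
```

This split is also why `vduse_domain_create()` rejects `iova_limit <= bounce_size`: the coherent region would otherwise be empty.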
+1641
drivers/vdpa/vdpa_user/vduse_dev.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * VDUSE: vDPA Device in Userspace 4 + * 5 + * Copyright (C) 2020-2021 Bytedance Inc. and/or its affiliates. All rights reserved. 6 + * 7 + * Author: Xie Yongji <xieyongji@bytedance.com> 8 + * 9 + */ 10 + 11 + #include <linux/init.h> 12 + #include <linux/module.h> 13 + #include <linux/cdev.h> 14 + #include <linux/device.h> 15 + #include <linux/eventfd.h> 16 + #include <linux/slab.h> 17 + #include <linux/wait.h> 18 + #include <linux/dma-map-ops.h> 19 + #include <linux/poll.h> 20 + #include <linux/file.h> 21 + #include <linux/uio.h> 22 + #include <linux/vdpa.h> 23 + #include <linux/nospec.h> 24 + #include <uapi/linux/vduse.h> 25 + #include <uapi/linux/vdpa.h> 26 + #include <uapi/linux/virtio_config.h> 27 + #include <uapi/linux/virtio_ids.h> 28 + #include <uapi/linux/virtio_blk.h> 29 + #include <linux/mod_devicetable.h> 30 + 31 + #include "iova_domain.h" 32 + 33 + #define DRV_AUTHOR "Yongji Xie <xieyongji@bytedance.com>" 34 + #define DRV_DESC "vDPA Device in Userspace" 35 + #define DRV_LICENSE "GPL v2" 36 + 37 + #define VDUSE_DEV_MAX (1U << MINORBITS) 38 + #define VDUSE_BOUNCE_SIZE (64 * 1024 * 1024) 39 + #define VDUSE_IOVA_SIZE (128 * 1024 * 1024) 40 + #define VDUSE_MSG_DEFAULT_TIMEOUT 30 41 + 42 + struct vduse_virtqueue { 43 + u16 index; 44 + u16 num_max; 45 + u32 num; 46 + u64 desc_addr; 47 + u64 driver_addr; 48 + u64 device_addr; 49 + struct vdpa_vq_state state; 50 + bool ready; 51 + bool kicked; 52 + spinlock_t kick_lock; 53 + spinlock_t irq_lock; 54 + struct eventfd_ctx *kickfd; 55 + struct vdpa_callback cb; 56 + struct work_struct inject; 57 + struct work_struct kick; 58 + }; 59 + 60 + struct vduse_dev; 61 + 62 + struct vduse_vdpa { 63 + struct vdpa_device vdpa; 64 + struct vduse_dev *dev; 65 + }; 66 + 67 + struct vduse_dev { 68 + struct vduse_vdpa *vdev; 69 + struct device *dev; 70 + struct vduse_virtqueue *vqs; 71 + struct vduse_iova_domain *domain; 72 + char *name; 73 + struct mutex lock; 74 + 
spinlock_t msg_lock; 75 + u64 msg_unique; 76 + u32 msg_timeout; 77 + wait_queue_head_t waitq; 78 + struct list_head send_list; 79 + struct list_head recv_list; 80 + struct vdpa_callback config_cb; 81 + struct work_struct inject; 82 + spinlock_t irq_lock; 83 + int minor; 84 + bool broken; 85 + bool connected; 86 + u64 api_version; 87 + u64 device_features; 88 + u64 driver_features; 89 + u32 device_id; 90 + u32 vendor_id; 91 + u32 generation; 92 + u32 config_size; 93 + void *config; 94 + u8 status; 95 + u32 vq_num; 96 + u32 vq_align; 97 + }; 98 + 99 + struct vduse_dev_msg { 100 + struct vduse_dev_request req; 101 + struct vduse_dev_response resp; 102 + struct list_head list; 103 + wait_queue_head_t waitq; 104 + bool completed; 105 + }; 106 + 107 + struct vduse_control { 108 + u64 api_version; 109 + }; 110 + 111 + static DEFINE_MUTEX(vduse_lock); 112 + static DEFINE_IDR(vduse_idr); 113 + 114 + static dev_t vduse_major; 115 + static struct class *vduse_class; 116 + static struct cdev vduse_ctrl_cdev; 117 + static struct cdev vduse_cdev; 118 + static struct workqueue_struct *vduse_irq_wq; 119 + 120 + static u32 allowed_device_id[] = { 121 + VIRTIO_ID_BLOCK, 122 + }; 123 + 124 + static inline struct vduse_dev *vdpa_to_vduse(struct vdpa_device *vdpa) 125 + { 126 + struct vduse_vdpa *vdev = container_of(vdpa, struct vduse_vdpa, vdpa); 127 + 128 + return vdev->dev; 129 + } 130 + 131 + static inline struct vduse_dev *dev_to_vduse(struct device *dev) 132 + { 133 + struct vdpa_device *vdpa = dev_to_vdpa(dev); 134 + 135 + return vdpa_to_vduse(vdpa); 136 + } 137 + 138 + static struct vduse_dev_msg *vduse_find_msg(struct list_head *head, 139 + uint32_t request_id) 140 + { 141 + struct vduse_dev_msg *msg; 142 + 143 + list_for_each_entry(msg, head, list) { 144 + if (msg->req.request_id == request_id) { 145 + list_del(&msg->list); 146 + return msg; 147 + } 148 + } 149 + 150 + return NULL; 151 + } 152 + 153 + static struct vduse_dev_msg *vduse_dequeue_msg(struct list_head *head) 154 
+ { 155 + struct vduse_dev_msg *msg = NULL; 156 + 157 + if (!list_empty(head)) { 158 + msg = list_first_entry(head, struct vduse_dev_msg, list); 159 + list_del(&msg->list); 160 + } 161 + 162 + return msg; 163 + } 164 + 165 + static void vduse_enqueue_msg(struct list_head *head, 166 + struct vduse_dev_msg *msg) 167 + { 168 + list_add_tail(&msg->list, head); 169 + } 170 + 171 + static void vduse_dev_broken(struct vduse_dev *dev) 172 + { 173 + struct vduse_dev_msg *msg, *tmp; 174 + 175 + if (unlikely(dev->broken)) 176 + return; 177 + 178 + list_splice_init(&dev->recv_list, &dev->send_list); 179 + list_for_each_entry_safe(msg, tmp, &dev->send_list, list) { 180 + list_del(&msg->list); 181 + msg->completed = 1; 182 + msg->resp.result = VDUSE_REQ_RESULT_FAILED; 183 + wake_up(&msg->waitq); 184 + } 185 + dev->broken = true; 186 + wake_up(&dev->waitq); 187 + } 188 + 189 + static int vduse_dev_msg_sync(struct vduse_dev *dev, 190 + struct vduse_dev_msg *msg) 191 + { 192 + int ret; 193 + 194 + if (unlikely(dev->broken)) 195 + return -EIO; 196 + 197 + init_waitqueue_head(&msg->waitq); 198 + spin_lock(&dev->msg_lock); 199 + if (unlikely(dev->broken)) { 200 + spin_unlock(&dev->msg_lock); 201 + return -EIO; 202 + } 203 + msg->req.request_id = dev->msg_unique++; 204 + vduse_enqueue_msg(&dev->send_list, msg); 205 + wake_up(&dev->waitq); 206 + spin_unlock(&dev->msg_lock); 207 + if (dev->msg_timeout) 208 + ret = wait_event_killable_timeout(msg->waitq, msg->completed, 209 + (long)dev->msg_timeout * HZ); 210 + else 211 + ret = wait_event_killable(msg->waitq, msg->completed); 212 + 213 + spin_lock(&dev->msg_lock); 214 + if (!msg->completed) { 215 + list_del(&msg->list); 216 + msg->resp.result = VDUSE_REQ_RESULT_FAILED; 217 + /* Mark the device as malfunction when there is a timeout */ 218 + if (!ret) 219 + vduse_dev_broken(dev); 220 + } 221 + ret = (msg->resp.result == VDUSE_REQ_RESULT_OK) ? 
0 : -EIO; 222 + spin_unlock(&dev->msg_lock); 223 + 224 + return ret; 225 + } 226 + 227 + static int vduse_dev_get_vq_state_packed(struct vduse_dev *dev, 228 + struct vduse_virtqueue *vq, 229 + struct vdpa_vq_state_packed *packed) 230 + { 231 + struct vduse_dev_msg msg = { 0 }; 232 + int ret; 233 + 234 + msg.req.type = VDUSE_GET_VQ_STATE; 235 + msg.req.vq_state.index = vq->index; 236 + 237 + ret = vduse_dev_msg_sync(dev, &msg); 238 + if (ret) 239 + return ret; 240 + 241 + packed->last_avail_counter = 242 + msg.resp.vq_state.packed.last_avail_counter & 0x0001; 243 + packed->last_avail_idx = 244 + msg.resp.vq_state.packed.last_avail_idx & 0x7FFF; 245 + packed->last_used_counter = 246 + msg.resp.vq_state.packed.last_used_counter & 0x0001; 247 + packed->last_used_idx = 248 + msg.resp.vq_state.packed.last_used_idx & 0x7FFF; 249 + 250 + return 0; 251 + } 252 + 253 + static int vduse_dev_get_vq_state_split(struct vduse_dev *dev, 254 + struct vduse_virtqueue *vq, 255 + struct vdpa_vq_state_split *split) 256 + { 257 + struct vduse_dev_msg msg = { 0 }; 258 + int ret; 259 + 260 + msg.req.type = VDUSE_GET_VQ_STATE; 261 + msg.req.vq_state.index = vq->index; 262 + 263 + ret = vduse_dev_msg_sync(dev, &msg); 264 + if (ret) 265 + return ret; 266 + 267 + split->avail_index = msg.resp.vq_state.split.avail_index; 268 + 269 + return 0; 270 + } 271 + 272 + static int vduse_dev_set_status(struct vduse_dev *dev, u8 status) 273 + { 274 + struct vduse_dev_msg msg = { 0 }; 275 + 276 + msg.req.type = VDUSE_SET_STATUS; 277 + msg.req.s.status = status; 278 + 279 + return vduse_dev_msg_sync(dev, &msg); 280 + } 281 + 282 + static int vduse_dev_update_iotlb(struct vduse_dev *dev, 283 + u64 start, u64 last) 284 + { 285 + struct vduse_dev_msg msg = { 0 }; 286 + 287 + if (last < start) 288 + return -EINVAL; 289 + 290 + msg.req.type = VDUSE_UPDATE_IOTLB; 291 + msg.req.iova.start = start; 292 + msg.req.iova.last = last; 293 + 294 + return vduse_dev_msg_sync(dev, &msg); 295 + } 296 + 297 + static ssize_t 
vduse_dev_read_iter(struct kiocb *iocb, struct iov_iter *to) 298 + { 299 + struct file *file = iocb->ki_filp; 300 + struct vduse_dev *dev = file->private_data; 301 + struct vduse_dev_msg *msg; 302 + int size = sizeof(struct vduse_dev_request); 303 + ssize_t ret; 304 + 305 + if (iov_iter_count(to) < size) 306 + return -EINVAL; 307 + 308 + spin_lock(&dev->msg_lock); 309 + while (1) { 310 + msg = vduse_dequeue_msg(&dev->send_list); 311 + if (msg) 312 + break; 313 + 314 + ret = -EAGAIN; 315 + if (file->f_flags & O_NONBLOCK) 316 + goto unlock; 317 + 318 + spin_unlock(&dev->msg_lock); 319 + ret = wait_event_interruptible_exclusive(dev->waitq, 320 + !list_empty(&dev->send_list)); 321 + if (ret) 322 + return ret; 323 + 324 + spin_lock(&dev->msg_lock); 325 + } 326 + spin_unlock(&dev->msg_lock); 327 + ret = copy_to_iter(&msg->req, size, to); 328 + spin_lock(&dev->msg_lock); 329 + if (ret != size) { 330 + ret = -EFAULT; 331 + vduse_enqueue_msg(&dev->send_list, msg); 332 + goto unlock; 333 + } 334 + vduse_enqueue_msg(&dev->recv_list, msg); 335 + unlock: 336 + spin_unlock(&dev->msg_lock); 337 + 338 + return ret; 339 + } 340 + 341 + static bool is_mem_zero(const char *ptr, int size) 342 + { 343 + int i; 344 + 345 + for (i = 0; i < size; i++) { 346 + if (ptr[i]) 347 + return false; 348 + } 349 + return true; 350 + } 351 + 352 + static ssize_t vduse_dev_write_iter(struct kiocb *iocb, struct iov_iter *from) 353 + { 354 + struct file *file = iocb->ki_filp; 355 + struct vduse_dev *dev = file->private_data; 356 + struct vduse_dev_response resp; 357 + struct vduse_dev_msg *msg; 358 + size_t ret; 359 + 360 + ret = copy_from_iter(&resp, sizeof(resp), from); 361 + if (ret != sizeof(resp)) 362 + return -EINVAL; 363 + 364 + if (!is_mem_zero((const char *)resp.reserved, sizeof(resp.reserved))) 365 + return -EINVAL; 366 + 367 + spin_lock(&dev->msg_lock); 368 + msg = vduse_find_msg(&dev->recv_list, resp.request_id); 369 + if (!msg) { 370 + ret = -ENOENT; 371 + goto unlock; 372 + } 373 + 374 + 
memcpy(&msg->resp, &resp, sizeof(resp)); 375 + msg->completed = 1; 376 + wake_up(&msg->waitq); 377 + unlock: 378 + spin_unlock(&dev->msg_lock); 379 + 380 + return ret; 381 + } 382 + 383 + static __poll_t vduse_dev_poll(struct file *file, poll_table *wait) 384 + { 385 + struct vduse_dev *dev = file->private_data; 386 + __poll_t mask = 0; 387 + 388 + poll_wait(file, &dev->waitq, wait); 389 + 390 + spin_lock(&dev->msg_lock); 391 + 392 + if (unlikely(dev->broken)) 393 + mask |= EPOLLERR; 394 + if (!list_empty(&dev->send_list)) 395 + mask |= EPOLLIN | EPOLLRDNORM; 396 + if (!list_empty(&dev->recv_list)) 397 + mask |= EPOLLOUT | EPOLLWRNORM; 398 + 399 + spin_unlock(&dev->msg_lock); 400 + 401 + return mask; 402 + } 403 + 404 + static void vduse_dev_reset(struct vduse_dev *dev) 405 + { 406 + int i; 407 + struct vduse_iova_domain *domain = dev->domain; 408 + 409 + /* The coherent mappings are handled in vduse_dev_free_coherent() */ 410 + if (domain->bounce_map) 411 + vduse_domain_reset_bounce_map(domain); 412 + 413 + dev->status = 0; 414 + dev->driver_features = 0; 415 + dev->generation++; 416 + spin_lock(&dev->irq_lock); 417 + dev->config_cb.callback = NULL; 418 + dev->config_cb.private = NULL; 419 + spin_unlock(&dev->irq_lock); 420 + flush_work(&dev->inject); 421 + 422 + for (i = 0; i < dev->vq_num; i++) { 423 + struct vduse_virtqueue *vq = &dev->vqs[i]; 424 + 425 + vq->ready = false; 426 + vq->desc_addr = 0; 427 + vq->driver_addr = 0; 428 + vq->device_addr = 0; 429 + vq->num = 0; 430 + memset(&vq->state, 0, sizeof(vq->state)); 431 + 432 + spin_lock(&vq->kick_lock); 433 + vq->kicked = false; 434 + if (vq->kickfd) 435 + eventfd_ctx_put(vq->kickfd); 436 + vq->kickfd = NULL; 437 + spin_unlock(&vq->kick_lock); 438 + 439 + spin_lock(&vq->irq_lock); 440 + vq->cb.callback = NULL; 441 + vq->cb.private = NULL; 442 + spin_unlock(&vq->irq_lock); 443 + flush_work(&vq->inject); 444 + flush_work(&vq->kick); 445 + } 446 + } 447 + 448 + static int vduse_vdpa_set_vq_address(struct 
vdpa_device *vdpa, u16 idx, 449 + u64 desc_area, u64 driver_area, 450 + u64 device_area) 451 + { 452 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 453 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 454 + 455 + vq->desc_addr = desc_area; 456 + vq->driver_addr = driver_area; 457 + vq->device_addr = device_area; 458 + 459 + return 0; 460 + } 461 + 462 + static void vduse_vq_kick(struct vduse_virtqueue *vq) 463 + { 464 + spin_lock(&vq->kick_lock); 465 + if (!vq->ready) 466 + goto unlock; 467 + 468 + if (vq->kickfd) 469 + eventfd_signal(vq->kickfd, 1); 470 + else 471 + vq->kicked = true; 472 + unlock: 473 + spin_unlock(&vq->kick_lock); 474 + } 475 + 476 + static void vduse_vq_kick_work(struct work_struct *work) 477 + { 478 + struct vduse_virtqueue *vq = container_of(work, 479 + struct vduse_virtqueue, kick); 480 + 481 + vduse_vq_kick(vq); 482 + } 483 + 484 + static void vduse_vdpa_kick_vq(struct vdpa_device *vdpa, u16 idx) 485 + { 486 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 487 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 488 + 489 + if (!eventfd_signal_allowed()) { 490 + schedule_work(&vq->kick); 491 + return; 492 + } 493 + vduse_vq_kick(vq); 494 + } 495 + 496 + static void vduse_vdpa_set_vq_cb(struct vdpa_device *vdpa, u16 idx, 497 + struct vdpa_callback *cb) 498 + { 499 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 500 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 501 + 502 + spin_lock(&vq->irq_lock); 503 + vq->cb.callback = cb->callback; 504 + vq->cb.private = cb->private; 505 + spin_unlock(&vq->irq_lock); 506 + } 507 + 508 + static void vduse_vdpa_set_vq_num(struct vdpa_device *vdpa, u16 idx, u32 num) 509 + { 510 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 511 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 512 + 513 + vq->num = num; 514 + } 515 + 516 + static void vduse_vdpa_set_vq_ready(struct vdpa_device *vdpa, 517 + u16 idx, bool ready) 518 + { 519 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 520 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 
521 + 522 + vq->ready = ready; 523 + } 524 + 525 + static bool vduse_vdpa_get_vq_ready(struct vdpa_device *vdpa, u16 idx) 526 + { 527 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 528 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 529 + 530 + return vq->ready; 531 + } 532 + 533 + static int vduse_vdpa_set_vq_state(struct vdpa_device *vdpa, u16 idx, 534 + const struct vdpa_vq_state *state) 535 + { 536 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 537 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 538 + 539 + if (dev->driver_features & BIT_ULL(VIRTIO_F_RING_PACKED)) { 540 + vq->state.packed.last_avail_counter = 541 + state->packed.last_avail_counter; 542 + vq->state.packed.last_avail_idx = state->packed.last_avail_idx; 543 + vq->state.packed.last_used_counter = 544 + state->packed.last_used_counter; 545 + vq->state.packed.last_used_idx = state->packed.last_used_idx; 546 + } else 547 + vq->state.split.avail_index = state->split.avail_index; 548 + 549 + return 0; 550 + } 551 + 552 + static int vduse_vdpa_get_vq_state(struct vdpa_device *vdpa, u16 idx, 553 + struct vdpa_vq_state *state) 554 + { 555 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 556 + struct vduse_virtqueue *vq = &dev->vqs[idx]; 557 + 558 + if (dev->driver_features & BIT_ULL(VIRTIO_F_RING_PACKED)) 559 + return vduse_dev_get_vq_state_packed(dev, vq, &state->packed); 560 + 561 + return vduse_dev_get_vq_state_split(dev, vq, &state->split); 562 + } 563 + 564 + static u32 vduse_vdpa_get_vq_align(struct vdpa_device *vdpa) 565 + { 566 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 567 + 568 + return dev->vq_align; 569 + } 570 + 571 + static u64 vduse_vdpa_get_features(struct vdpa_device *vdpa) 572 + { 573 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 574 + 575 + return dev->device_features; 576 + } 577 + 578 + static int vduse_vdpa_set_features(struct vdpa_device *vdpa, u64 features) 579 + { 580 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 581 + 582 + dev->driver_features = features; 583 + return 0; 584 + 
} 585 + 586 + static void vduse_vdpa_set_config_cb(struct vdpa_device *vdpa, 587 + struct vdpa_callback *cb) 588 + { 589 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 590 + 591 + spin_lock(&dev->irq_lock); 592 + dev->config_cb.callback = cb->callback; 593 + dev->config_cb.private = cb->private; 594 + spin_unlock(&dev->irq_lock); 595 + } 596 + 597 + static u16 vduse_vdpa_get_vq_num_max(struct vdpa_device *vdpa) 598 + { 599 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 600 + u16 num_max = 0; 601 + int i; 602 + 603 + for (i = 0; i < dev->vq_num; i++) 604 + if (num_max < dev->vqs[i].num_max) 605 + num_max = dev->vqs[i].num_max; 606 + 607 + return num_max; 608 + } 609 + 610 + static u32 vduse_vdpa_get_device_id(struct vdpa_device *vdpa) 611 + { 612 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 613 + 614 + return dev->device_id; 615 + } 616 + 617 + static u32 vduse_vdpa_get_vendor_id(struct vdpa_device *vdpa) 618 + { 619 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 620 + 621 + return dev->vendor_id; 622 + } 623 + 624 + static u8 vduse_vdpa_get_status(struct vdpa_device *vdpa) 625 + { 626 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 627 + 628 + return dev->status; 629 + } 630 + 631 + static void vduse_vdpa_set_status(struct vdpa_device *vdpa, u8 status) 632 + { 633 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 634 + 635 + if (vduse_dev_set_status(dev, status)) 636 + return; 637 + 638 + dev->status = status; 639 + } 640 + 641 + static size_t vduse_vdpa_get_config_size(struct vdpa_device *vdpa) 642 + { 643 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 644 + 645 + return dev->config_size; 646 + } 647 + 648 + static void vduse_vdpa_get_config(struct vdpa_device *vdpa, unsigned int offset, 649 + void *buf, unsigned int len) 650 + { 651 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 652 + 653 + if (offset > dev->config_size || len > dev->config_size - offset) 654 + return; 655 + 656 + memcpy(buf, dev->config + offset, len); 657 + } 658 + 659 + static void vduse_vdpa_set_config(struct vdpa_device
*vdpa, unsigned int offset, 660 + const void *buf, unsigned int len) 661 + { 662 + /* Now we only support read-only configuration space */ 663 + } 664 + 665 + static int vduse_vdpa_reset(struct vdpa_device *vdpa) 666 + { 667 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 668 + 669 + if (vduse_dev_set_status(dev, 0)) 670 + return -EIO; 671 + 672 + vduse_dev_reset(dev); 673 + 674 + return 0; 675 + } 676 + 677 + static u32 vduse_vdpa_get_generation(struct vdpa_device *vdpa) 678 + { 679 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 680 + 681 + return dev->generation; 682 + } 683 + 684 + static int vduse_vdpa_set_map(struct vdpa_device *vdpa, 685 + struct vhost_iotlb *iotlb) 686 + { 687 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 688 + int ret; 689 + 690 + ret = vduse_domain_set_map(dev->domain, iotlb); 691 + if (ret) 692 + return ret; 693 + 694 + ret = vduse_dev_update_iotlb(dev, 0ULL, ULLONG_MAX); 695 + if (ret) { 696 + vduse_domain_clear_map(dev->domain, iotlb); 697 + return ret; 698 + } 699 + 700 + return 0; 701 + } 702 + 703 + static void vduse_vdpa_free(struct vdpa_device *vdpa) 704 + { 705 + struct vduse_dev *dev = vdpa_to_vduse(vdpa); 706 + 707 + dev->vdev = NULL; 708 + } 709 + 710 + static const struct vdpa_config_ops vduse_vdpa_config_ops = { 711 + .set_vq_address = vduse_vdpa_set_vq_address, 712 + .kick_vq = vduse_vdpa_kick_vq, 713 + .set_vq_cb = vduse_vdpa_set_vq_cb, 714 + .set_vq_num = vduse_vdpa_set_vq_num, 715 + .set_vq_ready = vduse_vdpa_set_vq_ready, 716 + .get_vq_ready = vduse_vdpa_get_vq_ready, 717 + .set_vq_state = vduse_vdpa_set_vq_state, 718 + .get_vq_state = vduse_vdpa_get_vq_state, 719 + .get_vq_align = vduse_vdpa_get_vq_align, 720 + .get_features = vduse_vdpa_get_features, 721 + .set_features = vduse_vdpa_set_features, 722 + .set_config_cb = vduse_vdpa_set_config_cb, 723 + .get_vq_num_max = vduse_vdpa_get_vq_num_max, 724 + .get_device_id = vduse_vdpa_get_device_id, 725 + .get_vendor_id = vduse_vdpa_get_vendor_id, 726 + .get_status = 
vduse_vdpa_get_status, 727 + .set_status = vduse_vdpa_set_status, 728 + .get_config_size = vduse_vdpa_get_config_size, 729 + .get_config = vduse_vdpa_get_config, 730 + .set_config = vduse_vdpa_set_config, 731 + .get_generation = vduse_vdpa_get_generation, 732 + .reset = vduse_vdpa_reset, 733 + .set_map = vduse_vdpa_set_map, 734 + .free = vduse_vdpa_free, 735 + }; 736 + 737 + static dma_addr_t vduse_dev_map_page(struct device *dev, struct page *page, 738 + unsigned long offset, size_t size, 739 + enum dma_data_direction dir, 740 + unsigned long attrs) 741 + { 742 + struct vduse_dev *vdev = dev_to_vduse(dev); 743 + struct vduse_iova_domain *domain = vdev->domain; 744 + 745 + return vduse_domain_map_page(domain, page, offset, size, dir, attrs); 746 + } 747 + 748 + static void vduse_dev_unmap_page(struct device *dev, dma_addr_t dma_addr, 749 + size_t size, enum dma_data_direction dir, 750 + unsigned long attrs) 751 + { 752 + struct vduse_dev *vdev = dev_to_vduse(dev); 753 + struct vduse_iova_domain *domain = vdev->domain; 754 + 755 + return vduse_domain_unmap_page(domain, dma_addr, size, dir, attrs); 756 + } 757 + 758 + static void *vduse_dev_alloc_coherent(struct device *dev, size_t size, 759 + dma_addr_t *dma_addr, gfp_t flag, 760 + unsigned long attrs) 761 + { 762 + struct vduse_dev *vdev = dev_to_vduse(dev); 763 + struct vduse_iova_domain *domain = vdev->domain; 764 + unsigned long iova; 765 + void *addr; 766 + 767 + *dma_addr = DMA_MAPPING_ERROR; 768 + addr = vduse_domain_alloc_coherent(domain, size, 769 + (dma_addr_t *)&iova, flag, attrs); 770 + if (!addr) 771 + return NULL; 772 + 773 + *dma_addr = (dma_addr_t)iova; 774 + 775 + return addr; 776 + } 777 + 778 + static void vduse_dev_free_coherent(struct device *dev, size_t size, 779 + void *vaddr, dma_addr_t dma_addr, 780 + unsigned long attrs) 781 + { 782 + struct vduse_dev *vdev = dev_to_vduse(dev); 783 + struct vduse_iova_domain *domain = vdev->domain; 784 + 785 + vduse_domain_free_coherent(domain, size, vaddr, 
dma_addr, attrs); 786 + } 787 + 788 + static size_t vduse_dev_max_mapping_size(struct device *dev) 789 + { 790 + struct vduse_dev *vdev = dev_to_vduse(dev); 791 + struct vduse_iova_domain *domain = vdev->domain; 792 + 793 + return domain->bounce_size; 794 + } 795 + 796 + static const struct dma_map_ops vduse_dev_dma_ops = { 797 + .map_page = vduse_dev_map_page, 798 + .unmap_page = vduse_dev_unmap_page, 799 + .alloc = vduse_dev_alloc_coherent, 800 + .free = vduse_dev_free_coherent, 801 + .max_mapping_size = vduse_dev_max_mapping_size, 802 + }; 803 + 804 + static unsigned int perm_to_file_flags(u8 perm) 805 + { 806 + unsigned int flags = 0; 807 + 808 + switch (perm) { 809 + case VDUSE_ACCESS_WO: 810 + flags |= O_WRONLY; 811 + break; 812 + case VDUSE_ACCESS_RO: 813 + flags |= O_RDONLY; 814 + break; 815 + case VDUSE_ACCESS_RW: 816 + flags |= O_RDWR; 817 + break; 818 + default: 819 + WARN(1, "invalid vhost IOTLB permission\n"); 820 + break; 821 + } 822 + 823 + return flags; 824 + } 825 + 826 + static int vduse_kickfd_setup(struct vduse_dev *dev, 827 + struct vduse_vq_eventfd *eventfd) 828 + { 829 + struct eventfd_ctx *ctx = NULL; 830 + struct vduse_virtqueue *vq; 831 + u32 index; 832 + 833 + if (eventfd->index >= dev->vq_num) 834 + return -EINVAL; 835 + 836 + index = array_index_nospec(eventfd->index, dev->vq_num); 837 + vq = &dev->vqs[index]; 838 + if (eventfd->fd >= 0) { 839 + ctx = eventfd_ctx_fdget(eventfd->fd); 840 + if (IS_ERR(ctx)) 841 + return PTR_ERR(ctx); 842 + } else if (eventfd->fd != VDUSE_EVENTFD_DEASSIGN) 843 + return 0; 844 + 845 + spin_lock(&vq->kick_lock); 846 + if (vq->kickfd) 847 + eventfd_ctx_put(vq->kickfd); 848 + vq->kickfd = ctx; 849 + if (vq->ready && vq->kicked && vq->kickfd) { 850 + eventfd_signal(vq->kickfd, 1); 851 + vq->kicked = false; 852 + } 853 + spin_unlock(&vq->kick_lock); 854 + 855 + return 0; 856 + } 857 + 858 + static bool vduse_dev_is_ready(struct vduse_dev *dev) 859 + { 860 + int i; 861 + 862 + for (i = 0; i < dev->vq_num; i++)
863 + if (!dev->vqs[i].num_max) 864 + return false; 865 + 866 + return true; 867 + } 868 + 869 + static void vduse_dev_irq_inject(struct work_struct *work) 870 + { 871 + struct vduse_dev *dev = container_of(work, struct vduse_dev, inject); 872 + 873 + spin_lock_irq(&dev->irq_lock); 874 + if (dev->config_cb.callback) 875 + dev->config_cb.callback(dev->config_cb.private); 876 + spin_unlock_irq(&dev->irq_lock); 877 + } 878 + 879 + static void vduse_vq_irq_inject(struct work_struct *work) 880 + { 881 + struct vduse_virtqueue *vq = container_of(work, 882 + struct vduse_virtqueue, inject); 883 + 884 + spin_lock_irq(&vq->irq_lock); 885 + if (vq->ready && vq->cb.callback) 886 + vq->cb.callback(vq->cb.private); 887 + spin_unlock_irq(&vq->irq_lock); 888 + } 889 + 890 + static long vduse_dev_ioctl(struct file *file, unsigned int cmd, 891 + unsigned long arg) 892 + { 893 + struct vduse_dev *dev = file->private_data; 894 + void __user *argp = (void __user *)arg; 895 + int ret; 896 + 897 + if (unlikely(dev->broken)) 898 + return -EPERM; 899 + 900 + switch (cmd) { 901 + case VDUSE_IOTLB_GET_FD: { 902 + struct vduse_iotlb_entry entry; 903 + struct vhost_iotlb_map *map; 904 + struct vdpa_map_file *map_file; 905 + struct vduse_iova_domain *domain = dev->domain; 906 + struct file *f = NULL; 907 + 908 + ret = -EFAULT; 909 + if (copy_from_user(&entry, argp, sizeof(entry))) 910 + break; 911 + 912 + ret = -EINVAL; 913 + if (entry.start > entry.last) 914 + break; 915 + 916 + spin_lock(&domain->iotlb_lock); 917 + map = vhost_iotlb_itree_first(domain->iotlb, 918 + entry.start, entry.last); 919 + if (map) { 920 + map_file = (struct vdpa_map_file *)map->opaque; 921 + f = get_file(map_file->file); 922 + entry.offset = map_file->offset; 923 + entry.start = map->start; 924 + entry.last = map->last; 925 + entry.perm = map->perm; 926 + } 927 + spin_unlock(&domain->iotlb_lock); 928 + ret = -EINVAL; 929 + if (!f) 930 + break; 931 + 932 + ret = -EFAULT; 933 + if (copy_to_user(argp, &entry, 
sizeof(entry))) { 934 + fput(f); 935 + break; 936 + } 937 + ret = receive_fd(f, perm_to_file_flags(entry.perm)); 938 + fput(f); 939 + break; 940 + } 941 + case VDUSE_DEV_GET_FEATURES: 942 + /* 943 + * Just mirror what driver wrote here. 944 + * The driver is expected to check FEATURE_OK later. 945 + */ 946 + ret = put_user(dev->driver_features, (u64 __user *)argp); 947 + break; 948 + case VDUSE_DEV_SET_CONFIG: { 949 + struct vduse_config_data config; 950 + unsigned long size = offsetof(struct vduse_config_data, 951 + buffer); 952 + 953 + ret = -EFAULT; 954 + if (copy_from_user(&config, argp, size)) 955 + break; 956 + 957 + ret = -EINVAL; 958 + if (config.offset > dev->config_size || config.length == 0 || 959 + config.length > dev->config_size - config.offset) 960 + break; 961 + 962 + ret = -EFAULT; 963 + if (copy_from_user(dev->config + config.offset, argp + size, 964 + config.length)) 965 + break; 966 + 967 + ret = 0; 968 + break; 969 + } 970 + case VDUSE_DEV_INJECT_CONFIG_IRQ: 971 + ret = 0; 972 + queue_work(vduse_irq_wq, &dev->inject); 973 + break; 974 + case VDUSE_VQ_SETUP: { 975 + struct vduse_vq_config config; 976 + u32 index; 977 + 978 + ret = -EFAULT; 979 + if (copy_from_user(&config, argp, sizeof(config))) 980 + break; 981 + 982 + ret = -EINVAL; 983 + if (config.index >= dev->vq_num) 984 + break; 985 + 986 + if (!is_mem_zero((const char *)config.reserved, 987 + sizeof(config.reserved))) 988 + break; 989 + 990 + index = array_index_nospec(config.index, dev->vq_num); 991 + dev->vqs[index].num_max = config.max_size; 992 + ret = 0; 993 + break; 994 + } 995 + case VDUSE_VQ_GET_INFO: { 996 + struct vduse_vq_info vq_info; 997 + struct vduse_virtqueue *vq; 998 + u32 index; 999 + 1000 + ret = -EFAULT; 1001 + if (copy_from_user(&vq_info, argp, sizeof(vq_info))) 1002 + break; 1003 + 1004 + ret = -EINVAL; 1005 + if (vq_info.index >= dev->vq_num) 1006 + break; 1007 + 1008 + index = array_index_nospec(vq_info.index, dev->vq_num); 1009 + vq = &dev->vqs[index]; 1010 + vq_info.desc_addr = vq->desc_addr; 1011
+ vq_info.driver_addr = vq->driver_addr; 1012 + vq_info.device_addr = vq->device_addr; 1013 + vq_info.num = vq->num; 1014 + 1015 + if (dev->driver_features & BIT_ULL(VIRTIO_F_RING_PACKED)) { 1016 + vq_info.packed.last_avail_counter = 1017 + vq->state.packed.last_avail_counter; 1018 + vq_info.packed.last_avail_idx = 1019 + vq->state.packed.last_avail_idx; 1020 + vq_info.packed.last_used_counter = 1021 + vq->state.packed.last_used_counter; 1022 + vq_info.packed.last_used_idx = 1023 + vq->state.packed.last_used_idx; 1024 + } else 1025 + vq_info.split.avail_index = 1026 + vq->state.split.avail_index; 1027 + 1028 + vq_info.ready = vq->ready; 1029 + 1030 + ret = -EFAULT; 1031 + if (copy_to_user(argp, &vq_info, sizeof(vq_info))) 1032 + break; 1033 + 1034 + ret = 0; 1035 + break; 1036 + } 1037 + case VDUSE_VQ_SETUP_KICKFD: { 1038 + struct vduse_vq_eventfd eventfd; 1039 + 1040 + ret = -EFAULT; 1041 + if (copy_from_user(&eventfd, argp, sizeof(eventfd))) 1042 + break; 1043 + 1044 + ret = vduse_kickfd_setup(dev, &eventfd); 1045 + break; 1046 + } 1047 + case VDUSE_VQ_INJECT_IRQ: { 1048 + u32 index; 1049 + 1050 + ret = -EFAULT; 1051 + if (get_user(index, (u32 __user *)argp)) 1052 + break; 1053 + 1054 + ret = -EINVAL; 1055 + if (index >= dev->vq_num) 1056 + break; 1057 + 1058 + ret = 0; 1059 + index = array_index_nospec(index, dev->vq_num); 1060 + queue_work(vduse_irq_wq, &dev->vqs[index].inject); 1061 + break; 1062 + } 1063 + default: 1064 + ret = -ENOIOCTLCMD; 1065 + break; 1066 + } 1067 + 1068 + return ret; 1069 + } 1070 + 1071 + static int vduse_dev_release(struct inode *inode, struct file *file) 1072 + { 1073 + struct vduse_dev *dev = file->private_data; 1074 + 1075 + spin_lock(&dev->msg_lock); 1076 + /* Make sure the inflight messages can be processed after reconnection */ 1077 + list_splice_init(&dev->recv_list, &dev->send_list); 1078 + spin_unlock(&dev->msg_lock); 1079 + dev->connected = false; 1080 + 1081 + return 0; 1082 + } 1083 + 1084 + static struct vduse_dev
*vduse_dev_get_from_minor(int minor) 1085 + { 1086 + struct vduse_dev *dev; 1087 + 1088 + mutex_lock(&vduse_lock); 1089 + dev = idr_find(&vduse_idr, minor); 1090 + mutex_unlock(&vduse_lock); 1091 + 1092 + return dev; 1093 + } 1094 + 1095 + static int vduse_dev_open(struct inode *inode, struct file *file) 1096 + { 1097 + int ret; 1098 + struct vduse_dev *dev = vduse_dev_get_from_minor(iminor(inode)); 1099 + 1100 + if (!dev) 1101 + return -ENODEV; 1102 + 1103 + ret = -EBUSY; 1104 + mutex_lock(&dev->lock); 1105 + if (dev->connected) 1106 + goto unlock; 1107 + 1108 + ret = 0; 1109 + dev->connected = true; 1110 + file->private_data = dev; 1111 + unlock: 1112 + mutex_unlock(&dev->lock); 1113 + 1114 + return ret; 1115 + } 1116 + 1117 + static const struct file_operations vduse_dev_fops = { 1118 + .owner = THIS_MODULE, 1119 + .open = vduse_dev_open, 1120 + .release = vduse_dev_release, 1121 + .read_iter = vduse_dev_read_iter, 1122 + .write_iter = vduse_dev_write_iter, 1123 + .poll = vduse_dev_poll, 1124 + .unlocked_ioctl = vduse_dev_ioctl, 1125 + .compat_ioctl = compat_ptr_ioctl, 1126 + .llseek = noop_llseek, 1127 + }; 1128 + 1129 + static struct vduse_dev *vduse_dev_create(void) 1130 + { 1131 + struct vduse_dev *dev = kzalloc(sizeof(*dev), GFP_KERNEL); 1132 + 1133 + if (!dev) 1134 + return NULL; 1135 + 1136 + mutex_init(&dev->lock); 1137 + spin_lock_init(&dev->msg_lock); 1138 + INIT_LIST_HEAD(&dev->send_list); 1139 + INIT_LIST_HEAD(&dev->recv_list); 1140 + spin_lock_init(&dev->irq_lock); 1141 + 1142 + INIT_WORK(&dev->inject, vduse_dev_irq_inject); 1143 + init_waitqueue_head(&dev->waitq); 1144 + 1145 + return dev; 1146 + } 1147 + 1148 + static void vduse_dev_destroy(struct vduse_dev *dev) 1149 + { 1150 + kfree(dev); 1151 + } 1152 + 1153 + static struct vduse_dev *vduse_find_dev(const char *name) 1154 + { 1155 + struct vduse_dev *dev; 1156 + int id; 1157 + 1158 + idr_for_each_entry(&vduse_idr, dev, id) 1159 + if (!strcmp(dev->name, name)) 1160 + return dev; 1161 + 1162 + 
return NULL; 1163 + } 1164 + 1165 + static int vduse_destroy_dev(char *name) 1166 + { 1167 + struct vduse_dev *dev = vduse_find_dev(name); 1168 + 1169 + if (!dev) 1170 + return -EINVAL; 1171 + 1172 + mutex_lock(&dev->lock); 1173 + if (dev->vdev || dev->connected) { 1174 + mutex_unlock(&dev->lock); 1175 + return -EBUSY; 1176 + } 1177 + dev->connected = true; 1178 + mutex_unlock(&dev->lock); 1179 + 1180 + vduse_dev_reset(dev); 1181 + device_destroy(vduse_class, MKDEV(MAJOR(vduse_major), dev->minor)); 1182 + idr_remove(&vduse_idr, dev->minor); 1183 + kvfree(dev->config); 1184 + kfree(dev->vqs); 1185 + vduse_domain_destroy(dev->domain); 1186 + kfree(dev->name); 1187 + vduse_dev_destroy(dev); 1188 + module_put(THIS_MODULE); 1189 + 1190 + return 0; 1191 + } 1192 + 1193 + static bool device_is_allowed(u32 device_id) 1194 + { 1195 + int i; 1196 + 1197 + for (i = 0; i < ARRAY_SIZE(allowed_device_id); i++) 1198 + if (allowed_device_id[i] == device_id) 1199 + return true; 1200 + 1201 + return false; 1202 + } 1203 + 1204 + static bool features_is_valid(u64 features) 1205 + { 1206 + if (!(features & (1ULL << VIRTIO_F_ACCESS_PLATFORM))) 1207 + return false; 1208 + 1209 + /* Now we only support read-only configuration space */ 1210 + if (features & (1ULL << VIRTIO_BLK_F_CONFIG_WCE)) 1211 + return false; 1212 + 1213 + return true; 1214 + } 1215 + 1216 + static bool vduse_validate_config(struct vduse_dev_config *config) 1217 + { 1218 + if (!is_mem_zero((const char *)config->reserved, 1219 + sizeof(config->reserved))) 1220 + return false; 1221 + 1222 + if (config->vq_align > PAGE_SIZE) 1223 + return false; 1224 + 1225 + if (config->config_size > PAGE_SIZE) 1226 + return false; 1227 + 1228 + if (!device_is_allowed(config->device_id)) 1229 + return false; 1230 + 1231 + if (!features_is_valid(config->features)) 1232 + return false; 1233 + 1234 + return true; 1235 + } 1236 + 1237 + static ssize_t msg_timeout_show(struct device *device, 1238 + struct device_attribute *attr, char *buf) 
1239 + { 1240 + struct vduse_dev *dev = dev_get_drvdata(device); 1241 + 1242 + return sysfs_emit(buf, "%u\n", dev->msg_timeout); 1243 + } 1244 + 1245 + static ssize_t msg_timeout_store(struct device *device, 1246 + struct device_attribute *attr, 1247 + const char *buf, size_t count) 1248 + { 1249 + struct vduse_dev *dev = dev_get_drvdata(device); 1250 + int ret; 1251 + 1252 + ret = kstrtouint(buf, 10, &dev->msg_timeout); 1253 + if (ret < 0) 1254 + return ret; 1255 + 1256 + return count; 1257 + } 1258 + 1259 + static DEVICE_ATTR_RW(msg_timeout); 1260 + 1261 + static struct attribute *vduse_dev_attrs[] = { 1262 + &dev_attr_msg_timeout.attr, 1263 + NULL 1264 + }; 1265 + 1266 + ATTRIBUTE_GROUPS(vduse_dev); 1267 + 1268 + static int vduse_create_dev(struct vduse_dev_config *config, 1269 + void *config_buf, u64 api_version) 1270 + { 1271 + int i, ret; 1272 + struct vduse_dev *dev; 1273 + 1274 + ret = -EEXIST; 1275 + if (vduse_find_dev(config->name)) 1276 + goto err; 1277 + 1278 + ret = -ENOMEM; 1279 + dev = vduse_dev_create(); 1280 + if (!dev) 1281 + goto err; 1282 + 1283 + dev->api_version = api_version; 1284 + dev->device_features = config->features; 1285 + dev->device_id = config->device_id; 1286 + dev->vendor_id = config->vendor_id; 1287 + dev->name = kstrdup(config->name, GFP_KERNEL); 1288 + if (!dev->name) 1289 + goto err_str; 1290 + 1291 + dev->domain = vduse_domain_create(VDUSE_IOVA_SIZE - 1, 1292 + VDUSE_BOUNCE_SIZE); 1293 + if (!dev->domain) 1294 + goto err_domain; 1295 + 1296 + dev->config = config_buf; 1297 + dev->config_size = config->config_size; 1298 + dev->vq_align = config->vq_align; 1299 + dev->vq_num = config->vq_num; 1300 + dev->vqs = kcalloc(dev->vq_num, sizeof(*dev->vqs), GFP_KERNEL); 1301 + if (!dev->vqs) 1302 + goto err_vqs; 1303 + 1304 + for (i = 0; i < dev->vq_num; i++) { 1305 + dev->vqs[i].index = i; 1306 + INIT_WORK(&dev->vqs[i].inject, vduse_vq_irq_inject); 1307 + INIT_WORK(&dev->vqs[i].kick, vduse_vq_kick_work); 1308 + 
spin_lock_init(&dev->vqs[i].kick_lock); 1309 + spin_lock_init(&dev->vqs[i].irq_lock); 1310 + } 1311 + 1312 + ret = idr_alloc(&vduse_idr, dev, 1, VDUSE_DEV_MAX, GFP_KERNEL); 1313 + if (ret < 0) 1314 + goto err_idr; 1315 + 1316 + dev->minor = ret; 1317 + dev->msg_timeout = VDUSE_MSG_DEFAULT_TIMEOUT; 1318 + dev->dev = device_create(vduse_class, NULL, 1319 + MKDEV(MAJOR(vduse_major), dev->minor), 1320 + dev, "%s", config->name); 1321 + if (IS_ERR(dev->dev)) { 1322 + ret = PTR_ERR(dev->dev); 1323 + goto err_dev; 1324 + } 1325 + __module_get(THIS_MODULE); 1326 + 1327 + return 0; 1328 + err_dev: 1329 + idr_remove(&vduse_idr, dev->minor); 1330 + err_idr: 1331 + kfree(dev->vqs); 1332 + err_vqs: 1333 + vduse_domain_destroy(dev->domain); 1334 + err_domain: 1335 + kfree(dev->name); 1336 + err_str: 1337 + vduse_dev_destroy(dev); 1338 + err: 1339 + kvfree(config_buf); 1340 + return ret; 1341 + } 1342 + 1343 + static long vduse_ioctl(struct file *file, unsigned int cmd, 1344 + unsigned long arg) 1345 + { 1346 + int ret; 1347 + void __user *argp = (void __user *)arg; 1348 + struct vduse_control *control = file->private_data; 1349 + 1350 + mutex_lock(&vduse_lock); 1351 + switch (cmd) { 1352 + case VDUSE_GET_API_VERSION: 1353 + ret = put_user(control->api_version, (u64 __user *)argp); 1354 + break; 1355 + case VDUSE_SET_API_VERSION: { 1356 + u64 api_version; 1357 + 1358 + ret = -EFAULT; 1359 + if (get_user(api_version, (u64 __user *)argp)) 1360 + break; 1361 + 1362 + ret = -EINVAL; 1363 + if (api_version > VDUSE_API_VERSION) 1364 + break; 1365 + 1366 + ret = 0; 1367 + control->api_version = api_version; 1368 + break; 1369 + } 1370 + case VDUSE_CREATE_DEV: { 1371 + struct vduse_dev_config config; 1372 + unsigned long size = offsetof(struct vduse_dev_config, config); 1373 + void *buf; 1374 + 1375 + ret = -EFAULT; 1376 + if (copy_from_user(&config, argp, size)) 1377 + break; 1378 + 1379 + ret = -EINVAL; 1380 + if (!vduse_validate_config(&config)) 1381 + break; 1382 + 1383 + buf
= vmemdup_user(argp + size, config.config_size); 1384 + if (IS_ERR(buf)) { 1385 + ret = PTR_ERR(buf); 1386 + break; 1387 + } 1388 + config.name[VDUSE_NAME_MAX - 1] = '\0'; 1389 + ret = vduse_create_dev(&config, buf, control->api_version); 1390 + break; 1391 + } 1392 + case VDUSE_DESTROY_DEV: { 1393 + char name[VDUSE_NAME_MAX]; 1394 + 1395 + ret = -EFAULT; 1396 + if (copy_from_user(name, argp, VDUSE_NAME_MAX)) 1397 + break; 1398 + 1399 + name[VDUSE_NAME_MAX - 1] = '\0'; 1400 + ret = vduse_destroy_dev(name); 1401 + break; 1402 + } 1403 + default: 1404 + ret = -EINVAL; 1405 + break; 1406 + } 1407 + mutex_unlock(&vduse_lock); 1408 + 1409 + return ret; 1410 + } 1411 + 1412 + static int vduse_release(struct inode *inode, struct file *file) 1413 + { 1414 + struct vduse_control *control = file->private_data; 1415 + 1416 + kfree(control); 1417 + return 0; 1418 + } 1419 + 1420 + static int vduse_open(struct inode *inode, struct file *file) 1421 + { 1422 + struct vduse_control *control; 1423 + 1424 + control = kmalloc(sizeof(struct vduse_control), GFP_KERNEL); 1425 + if (!control) 1426 + return -ENOMEM; 1427 + 1428 + control->api_version = VDUSE_API_VERSION; 1429 + file->private_data = control; 1430 + 1431 + return 0; 1432 + } 1433 + 1434 + static const struct file_operations vduse_ctrl_fops = { 1435 + .owner = THIS_MODULE, 1436 + .open = vduse_open, 1437 + .release = vduse_release, 1438 + .unlocked_ioctl = vduse_ioctl, 1439 + .compat_ioctl = compat_ptr_ioctl, 1440 + .llseek = noop_llseek, 1441 + }; 1442 + 1443 + static char *vduse_devnode(struct device *dev, umode_t *mode) 1444 + { 1445 + return kasprintf(GFP_KERNEL, "vduse/%s", dev_name(dev)); 1446 + } 1447 + 1448 + static void vduse_mgmtdev_release(struct device *dev) 1449 + { 1450 + } 1451 + 1452 + static struct device vduse_mgmtdev = { 1453 + .init_name = "vduse", 1454 + .release = vduse_mgmtdev_release, 1455 + }; 1456 + 1457 + static struct vdpa_mgmt_dev mgmt_dev; 1458 + 1459 + static int vduse_dev_init_vdpa(struct 
vduse_dev *dev, const char *name) 1460 + { 1461 + struct vduse_vdpa *vdev; 1462 + int ret; 1463 + 1464 + if (dev->vdev) 1465 + return -EEXIST; 1466 + 1467 + vdev = vdpa_alloc_device(struct vduse_vdpa, vdpa, dev->dev, 1468 + &vduse_vdpa_config_ops, name, true); 1469 + if (IS_ERR(vdev)) 1470 + return PTR_ERR(vdev); 1471 + 1472 + dev->vdev = vdev; 1473 + vdev->dev = dev; 1474 + vdev->vdpa.dev.dma_mask = &vdev->vdpa.dev.coherent_dma_mask; 1475 + ret = dma_set_mask_and_coherent(&vdev->vdpa.dev, DMA_BIT_MASK(64)); 1476 + if (ret) { 1477 + put_device(&vdev->vdpa.dev); 1478 + return ret; 1479 + } 1480 + set_dma_ops(&vdev->vdpa.dev, &vduse_dev_dma_ops); 1481 + vdev->vdpa.dma_dev = &vdev->vdpa.dev; 1482 + vdev->vdpa.mdev = &mgmt_dev; 1483 + 1484 + return 0; 1485 + } 1486 + 1487 + static int vdpa_dev_add(struct vdpa_mgmt_dev *mdev, const char *name) 1488 + { 1489 + struct vduse_dev *dev; 1490 + int ret; 1491 + 1492 + mutex_lock(&vduse_lock); 1493 + dev = vduse_find_dev(name); 1494 + if (!dev || !vduse_dev_is_ready(dev)) { 1495 + mutex_unlock(&vduse_lock); 1496 + return -EINVAL; 1497 + } 1498 + ret = vduse_dev_init_vdpa(dev, name); 1499 + mutex_unlock(&vduse_lock); 1500 + if (ret) 1501 + return ret; 1502 + 1503 + ret = _vdpa_register_device(&dev->vdev->vdpa, dev->vq_num); 1504 + if (ret) { 1505 + put_device(&dev->vdev->vdpa.dev); 1506 + return ret; 1507 + } 1508 + 1509 + return 0; 1510 + } 1511 + 1512 + static void vdpa_dev_del(struct vdpa_mgmt_dev *mdev, struct vdpa_device *dev) 1513 + { 1514 + _vdpa_unregister_device(dev); 1515 + } 1516 + 1517 + static const struct vdpa_mgmtdev_ops vdpa_dev_mgmtdev_ops = { 1518 + .dev_add = vdpa_dev_add, 1519 + .dev_del = vdpa_dev_del, 1520 + }; 1521 + 1522 + static struct virtio_device_id id_table[] = { 1523 + { VIRTIO_ID_BLOCK, VIRTIO_DEV_ANY_ID }, 1524 + { 0 }, 1525 + }; 1526 + 1527 + static struct vdpa_mgmt_dev mgmt_dev = { 1528 + .device = &vduse_mgmtdev, 1529 + .id_table = id_table, 1530 + .ops = &vdpa_dev_mgmtdev_ops, 1531 + }; 1532 + 
1533 + static int vduse_mgmtdev_init(void) 1534 + { 1535 + int ret; 1536 + 1537 + ret = device_register(&vduse_mgmtdev); 1538 + if (ret) 1539 + return ret; 1540 + 1541 + ret = vdpa_mgmtdev_register(&mgmt_dev); 1542 + if (ret) 1543 + goto err; 1544 + 1545 + return 0; 1546 + err: 1547 + device_unregister(&vduse_mgmtdev); 1548 + return ret; 1549 + } 1550 + 1551 + static void vduse_mgmtdev_exit(void) 1552 + { 1553 + vdpa_mgmtdev_unregister(&mgmt_dev); 1554 + device_unregister(&vduse_mgmtdev); 1555 + } 1556 + 1557 + static int vduse_init(void) 1558 + { 1559 + int ret; 1560 + struct device *dev; 1561 + 1562 + vduse_class = class_create(THIS_MODULE, "vduse"); 1563 + if (IS_ERR(vduse_class)) 1564 + return PTR_ERR(vduse_class); 1565 + 1566 + vduse_class->devnode = vduse_devnode; 1567 + vduse_class->dev_groups = vduse_dev_groups; 1568 + 1569 + ret = alloc_chrdev_region(&vduse_major, 0, VDUSE_DEV_MAX, "vduse"); 1570 + if (ret) 1571 + goto err_chardev_region; 1572 + 1573 + /* /dev/vduse/control */ 1574 + cdev_init(&vduse_ctrl_cdev, &vduse_ctrl_fops); 1575 + vduse_ctrl_cdev.owner = THIS_MODULE; 1576 + ret = cdev_add(&vduse_ctrl_cdev, vduse_major, 1); 1577 + if (ret) 1578 + goto err_ctrl_cdev; 1579 + 1580 + dev = device_create(vduse_class, NULL, vduse_major, NULL, "control"); 1581 + if (IS_ERR(dev)) { 1582 + ret = PTR_ERR(dev); 1583 + goto err_device; 1584 + } 1585 + 1586 + /* /dev/vduse/$DEVICE */ 1587 + cdev_init(&vduse_cdev, &vduse_dev_fops); 1588 + vduse_cdev.owner = THIS_MODULE; 1589 + ret = cdev_add(&vduse_cdev, MKDEV(MAJOR(vduse_major), 1), 1590 + VDUSE_DEV_MAX - 1); 1591 + if (ret) 1592 + goto err_cdev; 1593 + 1594 + vduse_irq_wq = alloc_workqueue("vduse-irq", 1595 + WQ_HIGHPRI | WQ_SYSFS | WQ_UNBOUND, 0); 1596 + ret = -ENOMEM; 1597 + if (!vduse_irq_wq) 1598 + goto err_wq; 1599 + ret = vduse_domain_init(); 1600 + if (ret) 1601 + goto err_domain; 1602 + 1603 + ret = vduse_mgmtdev_init(); 1604 + if (ret) 1605 + goto err_mgmtdev; 1606 + 1607 + return 0; 1608 + err_mgmtdev: 1609 +
vduse_domain_exit(); 1610 + err_domain: 1611 + destroy_workqueue(vduse_irq_wq); 1612 + err_wq: 1613 + cdev_del(&vduse_cdev); 1614 + err_cdev: 1615 + device_destroy(vduse_class, vduse_major); 1616 + err_device: 1617 + cdev_del(&vduse_ctrl_cdev); 1618 + err_ctrl_cdev: 1619 + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); 1620 + err_chardev_region: 1621 + class_destroy(vduse_class); 1622 + return ret; 1623 + } 1624 + module_init(vduse_init); 1625 + 1626 + static void vduse_exit(void) 1627 + { 1628 + vduse_mgmtdev_exit(); 1629 + vduse_domain_exit(); 1630 + destroy_workqueue(vduse_irq_wq); 1631 + cdev_del(&vduse_cdev); 1632 + device_destroy(vduse_class, vduse_major); 1633 + cdev_del(&vduse_ctrl_cdev); 1634 + unregister_chrdev_region(vduse_major, VDUSE_DEV_MAX); 1635 + class_destroy(vduse_class); 1636 + } 1637 + module_exit(vduse_exit); 1638 + 1639 + MODULE_LICENSE(DRV_LICENSE); 1640 + MODULE_AUTHOR(DRV_AUTHOR); 1641 + MODULE_DESCRIPTION(DRV_DESC);
+14 -3
drivers/vdpa/virtio_pci/vp_vdpa.c
··· 189 189 } 190 190 191 191 vp_modern_set_status(mdev, status); 192 + } 192 193 193 - if (!(status & VIRTIO_CONFIG_S_DRIVER_OK) && 194 - (s & VIRTIO_CONFIG_S_DRIVER_OK)) 194 + static int vp_vdpa_reset(struct vdpa_device *vdpa) 195 + { 196 + struct vp_vdpa *vp_vdpa = vdpa_to_vp(vdpa); 197 + struct virtio_pci_modern_device *mdev = &vp_vdpa->mdev; 198 + u8 s = vp_vdpa_get_status(vdpa); 199 + 200 + vp_modern_set_status(mdev, 0); 201 + 202 + if (s & VIRTIO_CONFIG_S_DRIVER_OK) 195 203 vp_vdpa_free_irq(vp_vdpa); 204 + 205 + return 0; 196 206 } 197 207 198 208 static u16 vp_vdpa_get_vq_num_max(struct vdpa_device *vdpa) ··· 408 398 .set_features = vp_vdpa_set_features, 409 399 .get_status = vp_vdpa_get_status, 410 400 .set_status = vp_vdpa_set_status, 401 + .reset = vp_vdpa_reset, 411 402 .get_vq_num_max = vp_vdpa_get_vq_num_max, 412 403 .get_vq_state = vp_vdpa_get_vq_state, 413 404 .get_vq_notification = vp_vdpa_get_vq_notification, ··· 446 435 return ret; 447 436 448 437 vp_vdpa = vdpa_alloc_device(struct vp_vdpa, vdpa, 449 - dev, &vp_vdpa_ops, NULL); 438 + dev, &vp_vdpa_ops, NULL, false); 450 439 if (IS_ERR(vp_vdpa)) { 451 440 dev_err(dev, "vp_vdpa: Failed to allocate vDPA structure\n"); 452 441 return PTR_ERR(vp_vdpa);
+16 -4
drivers/vhost/iotlb.c
··· 36 36 EXPORT_SYMBOL_GPL(vhost_iotlb_map_free); 37 37 38 38 /** 39 - * vhost_iotlb_add_range - add a new range to vhost IOTLB 39 + * vhost_iotlb_add_range_ctx - add a new range to vhost IOTLB 40 40 * @iotlb: the IOTLB 41 41 * @start: start of the IOVA range 42 42 * @last: last of IOVA range 43 43 * @addr: the address that is mapped to @start 44 44 * @perm: access permission of this range 45 + * @opaque: the opaque pointer for the new mapping 45 46 * 46 47 * Returns an error last is smaller than start or memory allocation 47 48 * fails 48 49 */ 49 - int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, 50 - u64 start, u64 last, 51 - u64 addr, unsigned int perm) 50 + int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, 51 + u64 start, u64 last, 52 + u64 addr, unsigned int perm, 53 + void *opaque) 52 54 { 53 55 struct vhost_iotlb_map *map; 54 56 ··· 73 71 map->last = last; 74 72 map->addr = addr; 75 73 map->perm = perm; 74 + map->opaque = opaque; 76 75 77 76 iotlb->nmaps++; 78 77 vhost_iotlb_itree_insert(map, &iotlb->root); ··· 82 79 list_add_tail(&map->link, &iotlb->list); 83 80 84 81 return 0; 82 + } 83 + EXPORT_SYMBOL_GPL(vhost_iotlb_add_range_ctx); 84 + 85 + int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, 86 + u64 start, u64 last, 87 + u64 addr, unsigned int perm) 88 + { 89 + return vhost_iotlb_add_range_ctx(iotlb, start, last, 90 + addr, perm, NULL); 85 91 } 86 92 EXPORT_SYMBOL_GPL(vhost_iotlb_add_range); 87 93
+1 -13
drivers/vhost/scsi.c
··· 1 + // SPDX-License-Identifier: GPL-2.0+ 1 2 /******************************************************************************* 2 3 * Vhost kernel TCM fabric driver for virtio SCSI initiators 3 4 * 4 5 * (C) Copyright 2010-2013 Datera, Inc. 5 6 * (C) Copyright 2010-2012 IBM Corp. 6 7 * 7 - * Licensed to the Linux Foundation under the General Public License (GPL) version 2. 8 - * 9 8 * Authors: Nicholas A. Bellinger <nab@daterainc.com> 10 9 * Stefan Hajnoczi <stefanha@linux.vnet.ibm.com> 11 - * 12 - * This program is free software; you can redistribute it and/or modify 13 - * it under the terms of the GNU General Public License as published by 14 - * the Free Software Foundation; either version 2 of the License, or 15 - * (at your option) any later version. 16 - * 17 - * This program is distributed in the hope that it will be useful, 18 - * but WITHOUT ANY WARRANTY; without even the implied warranty of 19 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 20 - * GNU General Public License for more details. 21 - * 22 10 ****************************************************************************/ 23 11 24 12 #include <linux/module.h>
+144 -44
drivers/vhost/vdpa.c
··· 116 116 irq_bypass_unregister_producer(&vq->call_ctx.producer); 117 117 } 118 118 119 - static void vhost_vdpa_reset(struct vhost_vdpa *v) 119 + static int vhost_vdpa_reset(struct vhost_vdpa *v) 120 120 { 121 121 struct vdpa_device *vdpa = v->vdpa; 122 122 123 - vdpa_reset(vdpa); 124 123 v->in_batch = 0; 124 + 125 + return vdpa_reset(vdpa); 125 126 } 126 127 127 128 static long vhost_vdpa_get_device_id(struct vhost_vdpa *v, u8 __user *argp) ··· 158 157 struct vdpa_device *vdpa = v->vdpa; 159 158 const struct vdpa_config_ops *ops = vdpa->config; 160 159 u8 status, status_old; 161 - int nvqs = v->nvqs; 160 + int ret, nvqs = v->nvqs; 162 161 u16 i; 163 162 164 163 if (copy_from_user(&status, statusp, sizeof(status))) ··· 173 172 if (status != 0 && (ops->get_status(vdpa) & ~status) != 0) 174 173 return -EINVAL; 175 174 176 - ops->set_status(vdpa, status); 175 + if (status == 0) { 176 + ret = ops->reset(vdpa); 177 + if (ret) 178 + return ret; 179 + } else 180 + ops->set_status(vdpa, status); 177 181 178 182 if ((status & VIRTIO_CONFIG_S_DRIVER_OK) && !(status_old & VIRTIO_CONFIG_S_DRIVER_OK)) 179 183 for (i = 0; i < nvqs; i++) ··· 504 498 return r; 505 499 } 506 500 507 - static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) 501 + static void vhost_vdpa_pa_unmap(struct vhost_vdpa *v, u64 start, u64 last) 508 502 { 509 503 struct vhost_dev *dev = &v->vdev; 510 504 struct vhost_iotlb *iotlb = dev->iotlb; ··· 513 507 unsigned long pfn, pinned; 514 508 515 509 while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) { 516 - pinned = map->size >> PAGE_SHIFT; 517 - for (pfn = map->addr >> PAGE_SHIFT; 510 + pinned = PFN_DOWN(map->size); 511 + for (pfn = PFN_DOWN(map->addr); 518 512 pinned > 0; pfn++, pinned--) { 519 513 page = pfn_to_page(pfn); 520 514 if (map->perm & VHOST_ACCESS_WO) 521 515 set_page_dirty_lock(page); 522 516 unpin_user_page(page); 523 517 } 524 - atomic64_sub(map->size >> PAGE_SHIFT, &dev->mm->pinned_vm); 518 + 
atomic64_sub(PFN_DOWN(map->size), &dev->mm->pinned_vm); 525 519 vhost_iotlb_map_free(iotlb, map); 526 520 } 521 + } 522 + 523 + static void vhost_vdpa_va_unmap(struct vhost_vdpa *v, u64 start, u64 last) 524 + { 525 + struct vhost_dev *dev = &v->vdev; 526 + struct vhost_iotlb *iotlb = dev->iotlb; 527 + struct vhost_iotlb_map *map; 528 + struct vdpa_map_file *map_file; 529 + 530 + while ((map = vhost_iotlb_itree_first(iotlb, start, last)) != NULL) { 531 + map_file = (struct vdpa_map_file *)map->opaque; 532 + fput(map_file->file); 533 + kfree(map_file); 534 + vhost_iotlb_map_free(iotlb, map); 535 + } 536 + } 537 + 538 + static void vhost_vdpa_iotlb_unmap(struct vhost_vdpa *v, u64 start, u64 last) 539 + { 540 + struct vdpa_device *vdpa = v->vdpa; 541 + 542 + if (vdpa->use_va) 543 + return vhost_vdpa_va_unmap(v, start, last); 544 + 545 + return vhost_vdpa_pa_unmap(v, start, last); 527 546 } 528 547 529 548 static void vhost_vdpa_iotlb_free(struct vhost_vdpa *v) ··· 582 551 return flags | IOMMU_CACHE; 583 552 } 584 553 585 - static int vhost_vdpa_map(struct vhost_vdpa *v, 586 - u64 iova, u64 size, u64 pa, u32 perm) 554 + static int vhost_vdpa_map(struct vhost_vdpa *v, u64 iova, 555 + u64 size, u64 pa, u32 perm, void *opaque) 587 556 { 588 557 struct vhost_dev *dev = &v->vdev; 589 558 struct vdpa_device *vdpa = v->vdpa; 590 559 const struct vdpa_config_ops *ops = vdpa->config; 591 560 int r = 0; 592 561 593 - r = vhost_iotlb_add_range(dev->iotlb, iova, iova + size - 1, 594 - pa, perm); 562 + r = vhost_iotlb_add_range_ctx(dev->iotlb, iova, iova + size - 1, 563 + pa, perm, opaque); 595 564 if (r) 596 565 return r; 597 566 598 567 if (ops->dma_map) { 599 - r = ops->dma_map(vdpa, iova, size, pa, perm); 568 + r = ops->dma_map(vdpa, iova, size, pa, perm, opaque); 600 569 } else if (ops->set_map) { 601 570 if (!v->in_batch) 602 571 r = ops->set_map(vdpa, dev->iotlb); ··· 604 573 r = iommu_map(v->domain, iova, pa, size, 605 574 perm_to_iommu_flags(perm)); 606 575 } 607 - 608 - if 
(r) 576 + if (r) { 609 577 vhost_iotlb_del_range(dev->iotlb, iova, iova + size - 1); 610 - else 611 - atomic64_add(size >> PAGE_SHIFT, &dev->mm->pinned_vm); 578 + return r; 579 + } 612 580 613 - return r; 581 + if (!vdpa->use_va) 582 + atomic64_add(PFN_DOWN(size), &dev->mm->pinned_vm); 583 + 584 + return 0; 614 585 } 615 586 616 587 static void vhost_vdpa_unmap(struct vhost_vdpa *v, u64 iova, u64 size) ··· 633 600 } 634 601 } 635 602 636 - static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, 637 - struct vhost_iotlb_msg *msg) 603 + static int vhost_vdpa_va_map(struct vhost_vdpa *v, 604 + u64 iova, u64 size, u64 uaddr, u32 perm) 638 605 { 639 606 struct vhost_dev *dev = &v->vdev; 640 - struct vhost_iotlb *iotlb = dev->iotlb; 607 + u64 offset, map_size, map_iova = iova; 608 + struct vdpa_map_file *map_file; 609 + struct vm_area_struct *vma; 610 + int ret; 611 + 612 + mmap_read_lock(dev->mm); 613 + 614 + while (size) { 615 + vma = find_vma(dev->mm, uaddr); 616 + if (!vma) { 617 + ret = -EINVAL; 618 + break; 619 + } 620 + map_size = min(size, vma->vm_end - uaddr); 621 + if (!(vma->vm_file && (vma->vm_flags & VM_SHARED) && 622 + !(vma->vm_flags & (VM_IO | VM_PFNMAP)))) 623 + goto next; 624 + 625 + map_file = kzalloc(sizeof(*map_file), GFP_KERNEL); 626 + if (!map_file) { 627 + ret = -ENOMEM; 628 + break; 629 + } 630 + offset = (vma->vm_pgoff << PAGE_SHIFT) + uaddr - vma->vm_start; 631 + map_file->offset = offset; 632 + map_file->file = get_file(vma->vm_file); 633 + ret = vhost_vdpa_map(v, map_iova, map_size, uaddr, 634 + perm, map_file); 635 + if (ret) { 636 + fput(map_file->file); 637 + kfree(map_file); 638 + break; 639 + } 640 + next: 641 + size -= map_size; 642 + uaddr += map_size; 643 + map_iova += map_size; 644 + } 645 + if (ret) 646 + vhost_vdpa_unmap(v, iova, map_iova - iova); 647 + 648 + mmap_read_unlock(dev->mm); 649 + 650 + return ret; 651 + } 652 + 653 + static int vhost_vdpa_pa_map(struct vhost_vdpa *v, 654 + u64 iova, u64 size, u64 uaddr, u32 
perm) 655 + { 656 + struct vhost_dev *dev = &v->vdev; 641 657 struct page **page_list; 642 658 unsigned long list_size = PAGE_SIZE / sizeof(struct page *); 643 659 unsigned int gup_flags = FOLL_LONGTERM; 644 660 unsigned long npages, cur_base, map_pfn, last_pfn = 0; 645 661 unsigned long lock_limit, sz2pin, nchunks, i; 646 - u64 iova = msg->iova; 662 + u64 start = iova; 647 663 long pinned; 648 664 int ret = 0; 649 - 650 - if (msg->iova < v->range.first || !msg->size || 651 - msg->iova > U64_MAX - msg->size + 1 || 652 - msg->iova + msg->size - 1 > v->range.last) 653 - return -EINVAL; 654 - 655 - if (vhost_iotlb_itree_first(iotlb, msg->iova, 656 - msg->iova + msg->size - 1)) 657 - return -EEXIST; 658 665 659 666 /* Limit the use of memory for bookkeeping */ 660 667 page_list = (struct page **) __get_free_page(GFP_KERNEL); 661 668 if (!page_list) 662 669 return -ENOMEM; 663 670 664 - if (msg->perm & VHOST_ACCESS_WO) 671 + if (perm & VHOST_ACCESS_WO) 665 672 gup_flags |= FOLL_WRITE; 666 673 667 - npages = PAGE_ALIGN(msg->size + (iova & ~PAGE_MASK)) >> PAGE_SHIFT; 674 + npages = PFN_UP(size + (iova & ~PAGE_MASK)); 668 675 if (!npages) { 669 676 ret = -EINVAL; 670 677 goto free; ··· 712 639 713 640 mmap_read_lock(dev->mm); 714 641 715 - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; 642 + lock_limit = PFN_DOWN(rlimit(RLIMIT_MEMLOCK)); 716 643 if (npages + atomic64_read(&dev->mm->pinned_vm) > lock_limit) { 717 644 ret = -ENOMEM; 718 645 goto unlock; 719 646 } 720 647 721 - cur_base = msg->uaddr & PAGE_MASK; 648 + cur_base = uaddr & PAGE_MASK; 722 649 iova &= PAGE_MASK; 723 650 nchunks = 0; 724 651 ··· 746 673 747 674 if (last_pfn && (this_pfn != last_pfn + 1)) { 748 675 /* Pin a contiguous chunk of memory */ 749 - csize = (last_pfn - map_pfn + 1) << PAGE_SHIFT; 676 + csize = PFN_PHYS(last_pfn - map_pfn + 1); 750 677 ret = vhost_vdpa_map(v, iova, csize, 751 - map_pfn << PAGE_SHIFT, 752 - msg->perm); 678 + PFN_PHYS(map_pfn), 679 + perm, NULL); 753 680 if (ret) { 754 
681 /* 755 682 * Unpin the pages that are left unmapped ··· 772 699 last_pfn = this_pfn; 773 700 } 774 701 775 - cur_base += pinned << PAGE_SHIFT; 702 + cur_base += PFN_PHYS(pinned); 776 703 npages -= pinned; 777 704 } 778 705 779 706 /* Pin the rest chunk */ 780 - ret = vhost_vdpa_map(v, iova, (last_pfn - map_pfn + 1) << PAGE_SHIFT, 781 - map_pfn << PAGE_SHIFT, msg->perm); 707 + ret = vhost_vdpa_map(v, iova, PFN_PHYS(last_pfn - map_pfn + 1), 708 + PFN_PHYS(map_pfn), perm, NULL); 782 709 out: 783 710 if (ret) { 784 711 if (nchunks) { ··· 797 724 for (pfn = map_pfn; pfn <= last_pfn; pfn++) 798 725 unpin_user_page(pfn_to_page(pfn)); 799 726 } 800 - vhost_vdpa_unmap(v, msg->iova, msg->size); 727 + vhost_vdpa_unmap(v, start, size); 801 728 } 802 729 unlock: 803 730 mmap_read_unlock(dev->mm); 804 731 free: 805 732 free_page((unsigned long)page_list); 806 733 return ret; 734 + 735 + } 736 + 737 + static int vhost_vdpa_process_iotlb_update(struct vhost_vdpa *v, 738 + struct vhost_iotlb_msg *msg) 739 + { 740 + struct vhost_dev *dev = &v->vdev; 741 + struct vdpa_device *vdpa = v->vdpa; 742 + struct vhost_iotlb *iotlb = dev->iotlb; 743 + 744 + if (msg->iova < v->range.first || !msg->size || 745 + msg->iova > U64_MAX - msg->size + 1 || 746 + msg->iova + msg->size - 1 > v->range.last) 747 + return -EINVAL; 748 + 749 + if (vhost_iotlb_itree_first(iotlb, msg->iova, 750 + msg->iova + msg->size - 1)) 751 + return -EEXIST; 752 + 753 + if (vdpa->use_va) 754 + return vhost_vdpa_va_map(v, msg->iova, msg->size, 755 + msg->uaddr, msg->perm); 756 + 757 + return vhost_vdpa_pa_map(v, msg->iova, msg->size, msg->uaddr, 758 + msg->perm); 807 759 } 808 760 809 761 static int vhost_vdpa_process_iotlb_msg(struct vhost_dev *dev, ··· 958 860 return -EBUSY; 959 861 960 862 nvqs = v->nvqs; 961 - vhost_vdpa_reset(v); 863 + r = vhost_vdpa_reset(v); 864 + if (r) 865 + goto err; 962 866 963 867 vqs = kmalloc_array(nvqs, sizeof(*vqs), GFP_KERNEL); 964 868 if (!vqs) { ··· 1045 945 1046 946 
vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); 1047 947 if (remap_pfn_range(vma, vmf->address & PAGE_MASK, 1048 - notify.addr >> PAGE_SHIFT, PAGE_SIZE, 948 + PFN_DOWN(notify.addr), PAGE_SIZE, 1049 949 vma->vm_page_prot)) 1050 950 return VM_FAULT_SIGBUS; 1051 951
+16 -12
drivers/vhost/vsock.c
··· 114 114 size_t nbytes; 115 115 size_t iov_len, payload_len; 116 116 int head; 117 - bool restore_flag = false; 117 + u32 flags_to_restore = 0; 118 118 119 119 spin_lock_bh(&vsock->send_pkt_list_lock); 120 120 if (list_empty(&vsock->send_pkt_list)) { ··· 178 178 * small rx buffers, headers of packets in rx queue are 179 179 * created dynamically and are initialized with header 180 180 * of current packet(except length). But in case of 181 - * SOCK_SEQPACKET, we also must clear record delimeter 182 - * bit(VIRTIO_VSOCK_SEQ_EOR). Otherwise, instead of one 183 - * packet with delimeter(which marks end of record), 184 - * there will be sequence of packets with delimeter 185 - * bit set. After initialized header will be copied to 186 - * rx buffer, this bit will be restored. 181 + * SOCK_SEQPACKET, we also must clear message delimeter 182 + * bit (VIRTIO_VSOCK_SEQ_EOM) and MSG_EOR bit 183 + * (VIRTIO_VSOCK_SEQ_EOR) if set. Otherwise, 184 + * there will be sequence of packets with these 185 + * bits set. After initialized header will be copied to 186 + * rx buffer, these required bits will be restored. 187 187 */ 188 - if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) { 189 - pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); 190 - restore_flag = true; 188 + if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) { 189 + pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM); 190 + flags_to_restore |= VIRTIO_VSOCK_SEQ_EOM; 191 + 192 + if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) { 193 + pkt->hdr.flags &= ~cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); 194 + flags_to_restore |= VIRTIO_VSOCK_SEQ_EOR; 195 + } 191 196 } 192 197 } 193 198 ··· 229 224 * to send it with the next available buffer. 
230 225 */ 231 226 if (pkt->off < pkt->len) { 232 - if (restore_flag) 233 - pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR); 227 + pkt->hdr.flags |= cpu_to_le32(flags_to_restore); 234 228 235 229 /* We are queueing the same virtio_vsock_pkt to handle 236 230 * the remaining bytes, and we want to deliver it
+53 -3
drivers/virtio/virtio.c
··· 4 4 #include <linux/virtio_config.h> 5 5 #include <linux/module.h> 6 6 #include <linux/idr.h> 7 + #include <linux/of.h> 7 8 #include <uapi/linux/virtio_ids.h> 8 9 9 10 /* Unique numbering for virtio devices. */ ··· 293 292 294 293 /* Acknowledge the device's existence again. */ 295 294 virtio_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE); 295 + 296 + of_node_put(dev->dev.of_node); 296 297 } 297 298 298 299 static struct bus_type virtio_bus = { ··· 321 318 } 322 319 EXPORT_SYMBOL_GPL(unregister_virtio_driver); 323 320 321 + static int virtio_device_of_init(struct virtio_device *dev) 322 + { 323 + struct device_node *np, *pnode = dev_of_node(dev->dev.parent); 324 + char compat[] = "virtio,deviceXXXXXXXX"; 325 + int ret, count; 326 + 327 + if (!pnode) 328 + return 0; 329 + 330 + count = of_get_available_child_count(pnode); 331 + if (!count) 332 + return 0; 333 + 334 + /* There can be only 1 child node */ 335 + if (WARN_ON(count > 1)) 336 + return -EINVAL; 337 + 338 + np = of_get_next_available_child(pnode, NULL); 339 + if (WARN_ON(!np)) 340 + return -ENODEV; 341 + 342 + ret = snprintf(compat, sizeof(compat), "virtio,device%x", dev->id.device); 343 + BUG_ON(ret >= sizeof(compat)); 344 + 345 + if (!of_device_is_compatible(np, compat)) { 346 + ret = -EINVAL; 347 + goto out; 348 + } 349 + 350 + dev->dev.of_node = np; 351 + return 0; 352 + 353 + out: 354 + of_node_put(np); 355 + return ret; 356 + } 357 + 324 358 /** 325 359 * register_virtio_device - register virtio device 326 360 * @dev : virtio device to be registered ··· 382 342 dev->index = err; 383 343 dev_set_name(&dev->dev, "virtio%u", dev->index); 384 344 345 + err = virtio_device_of_init(dev); 346 + if (err) 347 + goto out_ida_remove; 348 + 385 349 spin_lock_init(&dev->config_lock); 386 350 dev->config_enabled = false; 387 351 dev->config_change_pending = false; ··· 406 362 */ 407 363 err = device_add(&dev->dev); 408 364 if (err) 409 - ida_simple_remove(&virtio_index_ida, dev->index); 365 + goto 
out_of_node_put; 366 + 367 + return 0; 368 + 369 + out_of_node_put: 370 + of_node_put(dev->dev.of_node); 371 + out_ida_remove: 372 + ida_simple_remove(&virtio_index_ida, dev->index); 410 373 out: 411 - if (err) 412 - virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED); 374 + virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED); 413 375 return err; 414 376 } 415 377 EXPORT_SYMBOL_GPL(register_virtio_device);
+2 -2
drivers/virtio/virtio_balloon.c
··· 531 531 callbacks[VIRTIO_BALLOON_VQ_REPORTING] = balloon_ack; 532 532 } 533 533 534 - err = vb->vdev->config->find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, 535 - vqs, callbacks, names, NULL, NULL); 534 + err = virtio_find_vqs(vb->vdev, VIRTIO_BALLOON_VQ_MAX, vqs, 535 + callbacks, names, NULL); 536 536 if (err) 537 537 return err; 538 538
+6
fs/file.c
··· 1150 1150 return new_fd; 1151 1151 } 1152 1152 1153 + int receive_fd(struct file *file, unsigned int o_flags) 1154 + { 1155 + return __receive_fd(file, NULL, o_flags); 1156 + } 1157 + EXPORT_SYMBOL_GPL(receive_fd); 1158 + 1153 1159 static int ksys_dup3(unsigned int oldfd, unsigned int newfd, int flags) 1154 1160 { 1155 1161 int err = -EBADF;
+3 -4
include/linux/file.h
··· 94 94 95 95 extern int __receive_fd(struct file *file, int __user *ufd, 96 96 unsigned int o_flags); 97 + 98 + extern int receive_fd(struct file *file, unsigned int o_flags); 99 + 97 100 static inline int receive_fd_user(struct file *file, int __user *ufd, 98 101 unsigned int o_flags) 99 102 { 100 103 if (ufd == NULL) 101 104 return -EFAULT; 102 105 return __receive_fd(file, ufd, o_flags); 103 - } 104 - static inline int receive_fd(struct file *file, unsigned int o_flags) 105 - { 106 - return __receive_fd(file, NULL, o_flags); 107 106 } 108 107 int receive_fd_replace(int new_fd, struct file *file, unsigned int o_flags); 109 108
+40 -22
include/linux/vdpa.h
··· 43 43 * @last_used_idx: used index 44 44 */ 45 45 struct vdpa_vq_state_packed { 46 - u16 last_avail_counter:1; 47 - u16 last_avail_idx:15; 48 - u16 last_used_counter:1; 49 - u16 last_used_idx:15; 46 + u16 last_avail_counter:1; 47 + u16 last_avail_idx:15; 48 + u16 last_used_counter:1; 49 + u16 last_used_idx:15; 50 50 }; 51 51 52 52 struct vdpa_vq_state { 53 - union { 54 - struct vdpa_vq_state_split split; 55 - struct vdpa_vq_state_packed packed; 56 - }; 53 + union { 54 + struct vdpa_vq_state_split split; 55 + struct vdpa_vq_state_packed packed; 56 + }; 57 57 }; 58 58 59 59 struct vdpa_mgmt_dev; ··· 65 65 * @config: the configuration ops for this device. 66 66 * @index: device index 67 67 * @features_valid: were features initialized? for legacy guests 68 + * @use_va: indicate whether virtual address must be used by this device 68 69 * @nvqs: maximum number of supported virtqueues 69 70 * @mdev: management device pointer; caller must setup when registering device as part 70 71 * of dev_add() mgmtdev ops callback before invoking _vdpa_register_device(). 
··· 76 75 const struct vdpa_config_ops *config; 77 76 unsigned int index; 78 77 bool features_valid; 78 + bool use_va; 79 79 int nvqs; 80 80 struct vdpa_mgmt_dev *mdev; 81 81 }; ··· 89 87 struct vdpa_iova_range { 90 88 u64 first; 91 89 u64 last; 90 + }; 91 + 92 + /** 93 + * Corresponding file area for device memory mapping 94 + * @file: vma->vm_file for the mapping 95 + * @offset: mapping offset in the vm_file 96 + */ 97 + struct vdpa_map_file { 98 + struct file *file; 99 + u64 offset; 92 100 }; 93 101 94 102 /** ··· 143 131 * @vdev: vdpa device 144 132 * @idx: virtqueue index 145 133 * @state: pointer to returned state (last_avail_idx) 146 - * @get_vq_notification: Get the notification area for a virtqueue 134 + * @get_vq_notification: Get the notification area for a virtqueue 147 135 * @vdev: vdpa device 148 136 * @idx: virtqueue index 149 137 * Returns the notifcation area ··· 183 171 * @set_status: Set the device status 184 172 * @vdev: vdpa device 185 173 * @status: virtio device status 174 + * @reset: Reset device 175 + * @vdev: vdpa device 176 + * Returns integer: success (0) or error (< 0) 186 177 * @get_config_size: Get the size of the configuration space 187 178 * @vdev: vdpa device 188 179 * Returns size_t: configuration size ··· 270 255 u32 (*get_vendor_id)(struct vdpa_device *vdev); 271 256 u8 (*get_status)(struct vdpa_device *vdev); 272 257 void (*set_status)(struct vdpa_device *vdev, u8 status); 258 + int (*reset)(struct vdpa_device *vdev); 273 259 size_t (*get_config_size)(struct vdpa_device *vdev); 274 260 void (*get_config)(struct vdpa_device *vdev, unsigned int offset, 275 261 void *buf, unsigned int len); ··· 282 266 /* DMA ops */ 283 267 int (*set_map)(struct vdpa_device *vdev, struct vhost_iotlb *iotlb); 284 268 int (*dma_map)(struct vdpa_device *vdev, u64 iova, u64 size, 285 - u64 pa, u32 perm); 269 + u64 pa, u32 perm, void *opaque); 286 270 int (*dma_unmap)(struct vdpa_device *vdev, u64 iova, u64 size); 287 271 288 272 /* Free device 
resources */ ··· 291 275 292 276 struct vdpa_device *__vdpa_alloc_device(struct device *parent, 293 277 const struct vdpa_config_ops *config, 294 - size_t size, const char *name); 278 + size_t size, const char *name, 279 + bool use_va); 295 280 296 281 /** 297 282 * vdpa_alloc_device - allocate and initilaize a vDPA device ··· 302 285 * @parent: the parent device 303 286 * @config: the bus operations that is supported by this device 304 287 * @name: name of the vdpa device 288 + * @use_va: indicate whether virtual address must be used by this device 305 289 * 306 290 * Return allocated data structure or ERR_PTR upon error 307 291 */ 308 - #define vdpa_alloc_device(dev_struct, member, parent, config, name) \ 292 + #define vdpa_alloc_device(dev_struct, member, parent, config, name, use_va) \ 309 293 container_of(__vdpa_alloc_device( \ 310 294 parent, config, \ 311 295 sizeof(dev_struct) + \ 312 296 BUILD_BUG_ON_ZERO(offsetof( \ 313 - dev_struct, member)), name), \ 297 + dev_struct, member)), name, use_va), \ 314 298 dev_struct, member) 315 299 316 300 int vdpa_register_device(struct vdpa_device *vdev, int nvqs); ··· 366 348 return vdev->dma_dev; 367 349 } 368 350 369 - static inline void vdpa_reset(struct vdpa_device *vdev) 351 + static inline int vdpa_reset(struct vdpa_device *vdev) 370 352 { 371 - const struct vdpa_config_ops *ops = vdev->config; 353 + const struct vdpa_config_ops *ops = vdev->config; 372 354 373 355 vdev->features_valid = false; 374 - ops->set_status(vdev, 0); 356 + return ops->reset(vdev); 375 357 } 376 358 377 359 static inline int vdpa_set_features(struct vdpa_device *vdev, u64 features) 378 360 { 379 - const struct vdpa_config_ops *ops = vdev->config; 361 + const struct vdpa_config_ops *ops = vdev->config; 380 362 381 363 vdev->features_valid = true; 382 - return ops->set_features(vdev, features); 364 + return ops->set_features(vdev, features); 383 365 } 384 366 385 - 386 - static inline void vdpa_get_config(struct vdpa_device *vdev, unsigned 
offset, 387 - void *buf, unsigned int len) 367 + static inline void vdpa_get_config(struct vdpa_device *vdev, 368 + unsigned int offset, void *buf, 369 + unsigned int len) 388 370 { 389 - const struct vdpa_config_ops *ops = vdev->config; 371 + const struct vdpa_config_ops *ops = vdev->config; 390 372 391 373 /* 392 374 * Config accesses aren't supposed to trigger before features are set.
+3
include/linux/vhost_iotlb.h
··· 17 17 u32 perm; 18 18 u32 flags_padding; 19 19 u64 __subtree_last; 20 + void *opaque; 20 21 }; 21 22 22 23 #define VHOST_IOTLB_FLAG_RETIRE 0x1 ··· 30 29 unsigned int flags; 31 30 }; 32 31 32 + int vhost_iotlb_add_range_ctx(struct vhost_iotlb *iotlb, u64 start, u64 last, 33 + u64 addr, unsigned int perm, void *opaque); 33 34 int vhost_iotlb_add_range(struct vhost_iotlb *iotlb, u64 start, u64 last, 34 35 u64 addr, unsigned int perm); 35 36 void vhost_iotlb_del_range(struct vhost_iotlb *iotlb, u64 start, u64 last);
+306
include/uapi/linux/vduse.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */ 2 + #ifndef _UAPI_VDUSE_H_ 3 + #define _UAPI_VDUSE_H_ 4 + 5 + #include <linux/types.h> 6 + 7 + #define VDUSE_BASE 0x81 8 + 9 + /* The ioctls for control device (/dev/vduse/control) */ 10 + 11 + #define VDUSE_API_VERSION 0 12 + 13 + /* 14 + * Get the version of VDUSE API that kernel supported (VDUSE_API_VERSION). 15 + * This is used for future extension. 16 + */ 17 + #define VDUSE_GET_API_VERSION _IOR(VDUSE_BASE, 0x00, __u64) 18 + 19 + /* Set the version of VDUSE API that userspace supported. */ 20 + #define VDUSE_SET_API_VERSION _IOW(VDUSE_BASE, 0x01, __u64) 21 + 22 + /** 23 + * struct vduse_dev_config - basic configuration of a VDUSE device 24 + * @name: VDUSE device name, needs to be NUL terminated 25 + * @vendor_id: virtio vendor id 26 + * @device_id: virtio device id 27 + * @features: virtio features 28 + * @vq_num: the number of virtqueues 29 + * @vq_align: the allocation alignment of virtqueue's metadata 30 + * @reserved: for future use, needs to be initialized to zero 31 + * @config_size: the size of the configuration space 32 + * @config: the buffer of the configuration space 33 + * 34 + * Structure used by VDUSE_CREATE_DEV ioctl to create VDUSE device. 35 + */ 36 + struct vduse_dev_config { 37 + #define VDUSE_NAME_MAX 256 38 + char name[VDUSE_NAME_MAX]; 39 + __u32 vendor_id; 40 + __u32 device_id; 41 + __u64 features; 42 + __u32 vq_num; 43 + __u32 vq_align; 44 + __u32 reserved[13]; 45 + __u32 config_size; 46 + __u8 config[]; 47 + }; 48 + 49 + /* Create a VDUSE device which is represented by a char device (/dev/vduse/$NAME) */ 50 + #define VDUSE_CREATE_DEV _IOW(VDUSE_BASE, 0x02, struct vduse_dev_config) 51 + 52 + /* 53 + * Destroy a VDUSE device. Make sure there are no more references 54 + * to the char device (/dev/vduse/$NAME). 
55 + */ 56 + #define VDUSE_DESTROY_DEV _IOW(VDUSE_BASE, 0x03, char[VDUSE_NAME_MAX]) 57 + 58 + /* The ioctls for VDUSE device (/dev/vduse/$NAME) */ 59 + 60 + /** 61 + * struct vduse_iotlb_entry - entry of IOTLB to describe one IOVA region [start, last] 62 + * @offset: the mmap offset on returned file descriptor 63 + * @start: start of the IOVA region 64 + * @last: last of the IOVA region 65 + * @perm: access permission of the IOVA region 66 + * 67 + * Structure used by VDUSE_IOTLB_GET_FD ioctl to find an overlapped IOVA region. 68 + */ 69 + struct vduse_iotlb_entry { 70 + __u64 offset; 71 + __u64 start; 72 + __u64 last; 73 + #define VDUSE_ACCESS_RO 0x1 74 + #define VDUSE_ACCESS_WO 0x2 75 + #define VDUSE_ACCESS_RW 0x3 76 + __u8 perm; 77 + }; 78 + 79 + /* 80 + * Find the first IOVA region that overlaps with the range [start, last] 81 + * and return the corresponding file descriptor. Return -EINVAL means the 82 + * IOVA region doesn't exist. Caller should set start and last fields. 83 + */ 84 + #define VDUSE_IOTLB_GET_FD _IOWR(VDUSE_BASE, 0x10, struct vduse_iotlb_entry) 85 + 86 + /* 87 + * Get the negotiated virtio features. It's a subset of the features in 88 + * struct vduse_dev_config which can be accepted by virtio driver. It's 89 + * only valid after FEATURES_OK status bit is set. 90 + */ 91 + #define VDUSE_DEV_GET_FEATURES _IOR(VDUSE_BASE, 0x11, __u64) 92 + 93 + /** 94 + * struct vduse_config_data - data used to update configuration space 95 + * @offset: the offset from the beginning of configuration space 96 + * @length: the length to write to configuration space 97 + * @buffer: the buffer used to write from 98 + * 99 + * Structure used by VDUSE_DEV_SET_CONFIG ioctl to update device 100 + * configuration space. 
···
 101 +  */
 102 + struct vduse_config_data {
 103 + 	__u32 offset;
 104 + 	__u32 length;
 105 + 	__u8 buffer[];
 106 + };
 107 + 
 108 + /* Set device configuration space */
 109 + #define VDUSE_DEV_SET_CONFIG	_IOW(VDUSE_BASE, 0x12, struct vduse_config_data)
 110 + 
 111 + /*
 112 +  * Inject a config interrupt. It's usually used to notify virtio driver
 113 +  * that device configuration space has changed.
 114 +  */
 115 + #define VDUSE_DEV_INJECT_CONFIG_IRQ	_IO(VDUSE_BASE, 0x13)
 116 + 
 117 + /**
 118 +  * struct vduse_vq_config - basic configuration of a virtqueue
 119 +  * @index: virtqueue index
 120 +  * @max_size: the max size of virtqueue
 121 +  * @reserved: for future use, needs to be initialized to zero
 122 +  *
 123 +  * Structure used by VDUSE_VQ_SETUP ioctl to setup a virtqueue.
 124 +  */
 125 + struct vduse_vq_config {
 126 + 	__u32 index;
 127 + 	__u16 max_size;
 128 + 	__u16 reserved[13];
 129 + };
 130 + 
 131 + /*
 132 +  * Setup the specified virtqueue. Make sure all virtqueues have been
 133 +  * configured before the device is attached to vDPA bus.
 134 +  */
 135 + #define VDUSE_VQ_SETUP	_IOW(VDUSE_BASE, 0x14, struct vduse_vq_config)
 136 + 
 137 + /**
 138 +  * struct vduse_vq_state_split - split virtqueue state
 139 +  * @avail_index: available index
 140 +  */
 141 + struct vduse_vq_state_split {
 142 + 	__u16 avail_index;
 143 + };
 144 + 
 145 + /**
 146 +  * struct vduse_vq_state_packed - packed virtqueue state
 147 +  * @last_avail_counter: last driver ring wrap counter observed by device
 148 +  * @last_avail_idx: device available index
 149 +  * @last_used_counter: device ring wrap counter
 150 +  * @last_used_idx: used index
 151 +  */
 152 + struct vduse_vq_state_packed {
 153 + 	__u16 last_avail_counter;
 154 + 	__u16 last_avail_idx;
 155 + 	__u16 last_used_counter;
 156 + 	__u16 last_used_idx;
 157 + };
 158 + 
 159 + /**
 160 +  * struct vduse_vq_info - information of a virtqueue
 161 +  * @index: virtqueue index
 162 +  * @num: the size of virtqueue
 163 +  * @desc_addr: address of desc area
 164 +  * @driver_addr: address of driver area
 165 +  * @device_addr: address of device area
 166 +  * @split: split virtqueue state
 167 +  * @packed: packed virtqueue state
 168 +  * @ready: ready status of virtqueue
 169 +  *
 170 +  * Structure used by VDUSE_VQ_GET_INFO ioctl to get virtqueue's information.
 171 +  */
 172 + struct vduse_vq_info {
 173 + 	__u32 index;
 174 + 	__u32 num;
 175 + 	__u64 desc_addr;
 176 + 	__u64 driver_addr;
 177 + 	__u64 device_addr;
 178 + 	union {
 179 + 		struct vduse_vq_state_split split;
 180 + 		struct vduse_vq_state_packed packed;
 181 + 	};
 182 + 	__u8 ready;
 183 + };
 184 + 
 185 + /* Get the specified virtqueue's information. Caller should set index field. */
 186 + #define VDUSE_VQ_GET_INFO	_IOWR(VDUSE_BASE, 0x15, struct vduse_vq_info)
 187 + 
 188 + /**
 189 +  * struct vduse_vq_eventfd - eventfd configuration for a virtqueue
 190 +  * @index: virtqueue index
 191 +  * @fd: eventfd, -1 means de-assigning the eventfd
 192 +  *
 193 +  * Structure used by VDUSE_VQ_SETUP_KICKFD ioctl to setup kick eventfd.
 194 +  */
 195 + struct vduse_vq_eventfd {
 196 + 	__u32 index;
 197 + #define VDUSE_EVENTFD_DEASSIGN -1
 198 + 	int fd;
 199 + };
 200 + 
 201 + /*
 202 +  * Setup kick eventfd for specified virtqueue. The kick eventfd is used
 203 +  * by VDUSE kernel module to notify userspace to consume the avail vring.
 204 +  */
 205 + #define VDUSE_VQ_SETUP_KICKFD	_IOW(VDUSE_BASE, 0x16, struct vduse_vq_eventfd)
 206 + 
 207 + /*
 208 +  * Inject an interrupt for specific virtqueue. It's used to notify virtio driver
 209 +  * to consume the used vring.
 210 +  */
 211 + #define VDUSE_VQ_INJECT_IRQ	_IOW(VDUSE_BASE, 0x17, __u32)
 212 + 
 213 + /* The control messages definition for read(2)/write(2) on /dev/vduse/$NAME */
 214 + 
 215 + /**
 216 +  * enum vduse_req_type - request type
 217 +  * @VDUSE_GET_VQ_STATE: get the state for specified virtqueue from userspace
 218 +  * @VDUSE_SET_STATUS: set the device status
 219 +  * @VDUSE_UPDATE_IOTLB: Notify userspace to update the memory mapping for
 220 +  *                     specified IOVA range via VDUSE_IOTLB_GET_FD ioctl
 221 +  */
 222 + enum vduse_req_type {
 223 + 	VDUSE_GET_VQ_STATE,
 224 + 	VDUSE_SET_STATUS,
 225 + 	VDUSE_UPDATE_IOTLB,
 226 + };
 227 + 
 228 + /**
 229 +  * struct vduse_vq_state - virtqueue state
 230 +  * @index: virtqueue index
 231 +  * @split: split virtqueue state
 232 +  * @packed: packed virtqueue state
 233 +  */
 234 + struct vduse_vq_state {
 235 + 	__u32 index;
 236 + 	union {
 237 + 		struct vduse_vq_state_split split;
 238 + 		struct vduse_vq_state_packed packed;
 239 + 	};
 240 + };
 241 + 
 242 + /**
 243 +  * struct vduse_dev_status - device status
 244 +  * @status: device status
 245 +  */
 246 + struct vduse_dev_status {
 247 + 	__u8 status;
 248 + };
 249 + 
 250 + /**
 251 +  * struct vduse_iova_range - IOVA range [start, last]
 252 +  * @start: start of the IOVA range
 253 +  * @last: last of the IOVA range
 254 +  */
 255 + struct vduse_iova_range {
 256 + 	__u64 start;
 257 + 	__u64 last;
 258 + };
 259 + 
 260 + /**
 261 +  * struct vduse_dev_request - control request
 262 +  * @type: request type
 263 +  * @request_id: request id
 264 +  * @reserved: for future use
 265 +  * @vq_state: virtqueue state, only index field is available
 266 +  * @s: device status
 267 +  * @iova: IOVA range for updating
 268 +  * @padding: padding
 269 +  *
 270 +  * Structure used by read(2) on /dev/vduse/$NAME.
 271 +  */
 272 + struct vduse_dev_request {
 273 + 	__u32 type;
 274 + 	__u32 request_id;
 275 + 	__u32 reserved[4];
 276 + 	union {
 277 + 		struct vduse_vq_state vq_state;
 278 + 		struct vduse_dev_status s;
 279 + 		struct vduse_iova_range iova;
 280 + 		__u32 padding[32];
 281 + 	};
 282 + };
 283 + 
 284 + /**
 285 +  * struct vduse_dev_response - response to control request
 286 +  * @request_id: corresponding request id
 287 +  * @result: the result of request
 288 +  * @reserved: for future use, needs to be initialized to zero
 289 +  * @vq_state: virtqueue state
 290 +  * @padding: padding
 291 +  *
 292 +  * Structure used by write(2) on /dev/vduse/$NAME.
 293 +  */
 294 + struct vduse_dev_response {
 295 + 	__u32 request_id;
 296 + #define VDUSE_REQ_RESULT_OK	0x00
 297 + #define VDUSE_REQ_RESULT_FAILED	0x01
 298 + 	__u32 result;
 299 + 	__u32 reserved[4];
 300 + 	union {
 301 + 		struct vduse_vq_state vq_state;
 302 + 		__u32 padding[32];
 303 + 	};
 304 + };
 305 + 
 306 + #endif /* _UAPI_VDUSE_H_ */
+9
include/uapi/linux/virtio_ids.h
···
  54  54 #define VIRTIO_ID_SOUND			25 /* virtio sound */
  55  55 #define VIRTIO_ID_FS			26 /* virtio filesystem */
  56  56 #define VIRTIO_ID_PMEM			27 /* virtio pmem */
      57 + #define VIRTIO_ID_RPMB			28 /* virtio rpmb */
  57  58 #define VIRTIO_ID_MAC80211_HWSIM	29 /* virtio mac80211-hwsim */
      59 + #define VIRTIO_ID_VIDEO_ENCODER	30 /* virtio video encoder */
      60 + #define VIRTIO_ID_VIDEO_DECODER	31 /* virtio video decoder */
  58  61 #define VIRTIO_ID_SCMI			32 /* virtio SCMI */
      62 + #define VIRTIO_ID_NITRO_SEC_MOD	33 /* virtio nitro secure module*/
  59  63 #define VIRTIO_ID_I2C_ADAPTER		34 /* virtio i2c adapter */
      64 + #define VIRTIO_ID_WATCHDOG		35 /* virtio watchdog */
      65 + #define VIRTIO_ID_CAN			36 /* virtio can */
      66 + #define VIRTIO_ID_DMABUF		37 /* virtio dmabuf */
      67 + #define VIRTIO_ID_PARAM_SERV		38 /* virtio parameter server */
      68 + #define VIRTIO_ID_AUDIO_POLICY	39 /* virtio audio policy */
  60  69 #define VIRTIO_ID_BT			40 /* virtio bluetooth */
  61  70 #define VIRTIO_ID_GPIO			41 /* virtio gpio */
  62  71 
+2 -1
include/uapi/linux/virtio_vsock.h
···
  97  97 
  98  98 /* VIRTIO_VSOCK_OP_RW flags values */
  99  99 enum virtio_vsock_rw {
 100     -	VIRTIO_VSOCK_SEQ_EOR = 1,
     100 +	VIRTIO_VSOCK_SEQ_EOM = 1,
     101 +	VIRTIO_VSOCK_SEQ_EOR = 2,
 101 102 };
 102 103 
 103 104 #endif /* _UAPI_LINUX_VIRTIO_VSOCK_H */
+5 -5
net/vmw_vsock/af_vsock.c
···
2014 2014 {
2015 2015 	const struct vsock_transport *transport;
2016 2016 	struct vsock_sock *vsk;
2017      -	ssize_t record_len;
     2017 +	ssize_t msg_len;
2018 2018 	long timeout;
2019 2019 	int err = 0;
2020 2020 	DEFINE_WAIT(wait);
···
2028 2028 	if (err <= 0)
2029 2029 		goto out;
2030 2030 
2031      -	record_len = transport->seqpacket_dequeue(vsk, msg, flags);
     2031 +	msg_len = transport->seqpacket_dequeue(vsk, msg, flags);
2032 2032 
2033      -	if (record_len < 0) {
     2033 +	if (msg_len < 0) {
2034 2034 		err = -ENOMEM;
2035 2035 		goto out;
2036 2036 	}
···
2044 2044 	 * packet.
2045 2045 	 */
2046 2046 	if (flags & MSG_TRUNC)
2047      -		err = record_len;
     2047 +		err = msg_len;
2048 2048 	else
2049 2049 		err = len - msg_data_left(msg);
2050 2050 
2051 2051 	/* Always set MSG_TRUNC if real length of packet is
2052 2052 	 * bigger than user's buffer.
2053 2053 	 */
2054      -	if (msg_len > len)
     2054 +	if (msg_len > len)
2055 2055 		msg->msg_flags |= MSG_TRUNC;
2056 2056 }
2057 2057 
+15 -8
net/vmw_vsock/virtio_transport_common.c
···
  76  76 		goto out;
  77  77 
  78  78 	if (msg_data_left(info->msg) == 0 &&
  79      -	    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET)
  80      -		pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
      79 +	    info->type == VIRTIO_VSOCK_TYPE_SEQPACKET) {
      80 +		pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOM);
      81 +
      82 +		if (info->msg->msg_flags & MSG_EOR)
      83 +			pkt->hdr.flags |= cpu_to_le32(VIRTIO_VSOCK_SEQ_EOR);
      84 +	}
  81  85 	}
  82  86 
  83  87 	trace_virtio_transport_alloc_pkt(src_cid, src_port,
···
 461 457 		dequeued_len += pkt_len;
 462 458 	}
 463 459 
 464      -	if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR) {
     460 +	if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM) {
 465 461 		msg_ready = true;
 466 462 		vvs->msg_count--;
     463 +
     464 +		if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR)
     465 +			msg->msg_flags |= MSG_EOR;
 467 466 	}
 468 467 
 469 468 	virtio_transport_dec_rx_pkt(vvs, pkt);
···
1036 1029 		goto out;
1037 1030 	}
1038 1031 
1039      -	if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOR)
     1032 +	if (le32_to_cpu(pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)
1040 1033 		vvs->msg_count++;
1041 1034 
1042 1035 	/* Try to copy small packets into the buffer of last packet queued,
···
1051 1044 
1052 1045 	/* If there is space in the last packet queued, we copy the
1053 1046 	 * new packet in its buffer. We avoid this if the last packet
1054      -	 * queued has VIRTIO_VSOCK_SEQ_EOR set, because this is
1055      -	 * delimiter of SEQPACKET record, so 'pkt' is the first packet
1056      -	 * of a new record.
     1047 +	 * queued has VIRTIO_VSOCK_SEQ_EOM set, because this is
     1048 +	 * delimiter of SEQPACKET message, so 'pkt' is the first packet
     1049 +	 * of a new message.
1057 1050 	 */
1058 1051 	if ((pkt->len <= last_pkt->buf_len - last_pkt->len) &&
1059      -	    !(le32_to_cpu(last_pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)) {
     1052 +	    !(le32_to_cpu(last_pkt->hdr.flags) & VIRTIO_VSOCK_SEQ_EOM)) {
1060 1053 		memcpy(last_pkt->buf + last_pkt->len, pkt->buf,
1061 1054 			pkt->len);
1062 1055 		last_pkt->len += pkt->len;
+7 -1
tools/testing/vsock/vsock_test.c
···
 282 282 }
 283 283 
 284 284 #define MESSAGES_CNT 7
     285 + #define MSG_EOR_IDX (MESSAGES_CNT / 2)
 285 286 static void test_seqpacket_msg_bounds_client(const struct test_opts *opts)
 286 287 {
 287 288 	int fd;
···
 295 294 
 296 295 	/* Send several messages, one with MSG_EOR flag */
 297 296 	for (int i = 0; i < MESSAGES_CNT; i++)
 298      -		send_byte(fd, 1, 0);
     297 +		send_byte(fd, 1, (i == MSG_EOR_IDX) ? MSG_EOR : 0);
 299 298 
 300 299 	control_writeln("SENDDONE");
 301 300 	close(fd);
···
 323 322 	for (int i = 0; i < MESSAGES_CNT; i++) {
 324 323 		if (recvmsg(fd, &msg, 0) != 1) {
 325 324 			perror("message bound violated");
     325 +			exit(EXIT_FAILURE);
     326 +		}
     327 +
     328 +		if ((i == MSG_EOR_IDX) ^ !!(msg.msg_flags & MSG_EOR)) {
     329 +			perror("MSG_EOR");
 326 330 			exit(EXIT_FAILURE);
 327 331 		}
 328 332 	}