Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfio-v6.8-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

- Add debugfs support, initially used for reporting device migration
state (Longfang Liu)

- Fixes and support for migration dirty tracking across multiple IOVA
regions in the pds-vfio-pci driver (Brett Creeley)

- Improved IOMMU allocation accounting visibility (Pasha Tatashin)

- Virtio infrastructure and a new virtio-vfio-pci variant driver, which
  provides emulation of legacy virtio interfaces on modern virtio
  hardware for virtio-net VF devices where the PF driver exposes
  support for legacy admin queues, i.e. an emulated IO BAR on an SR-IOV
  VF to provide driver ABI compatibility to legacy devices (Yishai
  Hadas & Feng Liu)

- Migration fixes for the hisi-acc-vfio-pci variant driver (Shameer
Kolothum)

- Kconfig dependency fix for new virtio-vfio-pci variant driver (Arnd
Bergmann)

* tag 'vfio-v6.8-rc1' of https://github.com/awilliam/linux-vfio: (22 commits)
vfio/virtio: fix virtio-pci dependency
hisi_acc_vfio_pci: Update migration data pointer correctly on saving/resume
vfio/virtio: Declare virtiovf_pci_aer_reset_done() static
vfio/virtio: Introduce a vfio driver over virtio devices
vfio/pci: Expose vfio_pci_core_iowrite/read##size()
vfio/pci: Expose vfio_pci_core_setup_barmap()
virtio-pci: Introduce APIs to execute legacy IO admin commands
virtio-pci: Initialize the supported admin commands
virtio-pci: Introduce admin commands
virtio-pci: Introduce admin command sending function
virtio-pci: Introduce admin virtqueue
virtio: Define feature bit for administration virtqueue
vfio/type1: account iommu allocations
vfio/pds: Add multi-region support
vfio/pds: Move seq/ack bitmaps into region struct
vfio/pds: Pass region info to relevant functions
vfio/pds: Move and rename region specific info
vfio/pds: Only use a single SGL for both seq and ack
vfio/pds: Fix calculations in pds_vfio_dirty_sync
MAINTAINERS: Add vfio debugfs interface doc link
...

+1754 -164
+25
Documentation/ABI/testing/debugfs-vfio
···
+ What:		/sys/kernel/debug/vfio
+ Date:		December 2023
+ KernelVersion:	6.8
+ Contact:	Longfang Liu <liulongfang@huawei.com>
+ Description:	This debugfs file directory is used for debugging
+		of vfio devices, it's a common directory for all vfio devices.
+		Vfio core will create a device subdirectory under this
+		directory.
+
+ What:		/sys/kernel/debug/vfio/<device>/migration
+ Date:		December 2023
+ KernelVersion:	6.8
+ Contact:	Longfang Liu <liulongfang@huawei.com>
+ Description:	This debugfs file directory is used for debugging
+		of vfio devices that support live migration.
+		The debugfs of each vfio device that supports live migration
+		could be created under this directory.
+
+ What:		/sys/kernel/debug/vfio/<device>/migration/state
+ Date:		December 2023
+ KernelVersion:	6.8
+ Contact:	Longfang Liu <liulongfang@huawei.com>
+ Description:	Read the live migration status of the vfio device.
+		The contents of the state file reflects the migration state
+		relative to those defined in the vfio_device_mig_state enum
+8
MAINTAINERS
···
  L:	kvm@vger.kernel.org
  S:	Maintained
  T:	git https://github.com/awilliam/linux-vfio.git
+ F:	Documentation/ABI/testing/debugfs-vfio
  F:	Documentation/ABI/testing/sysfs-devices-vfio-dev
  F:	Documentation/driver-api/vfio.rst
  F:	drivers/vfio/
···
  L:	kvm@vger.kernel.org
  S:	Maintained
  F:	drivers/vfio/pci/mlx5/
+
+ VFIO VIRTIO PCI DRIVER
+ M:	Yishai Hadas <yishaih@nvidia.com>
+ L:	kvm@vger.kernel.org
+ L:	virtualization@lists.linux-foundation.org
+ S:	Maintained
+ F:	drivers/vfio/pci/virtio

  VFIO PCI DEVICE SPECIFIC DRIVERS
  R:	Jason Gunthorpe <jgg@nvidia.com>
+10
drivers/vfio/Kconfig
···
 	select EVENTFD
 	default n

+config VFIO_DEBUGFS
+	bool "Export VFIO internals in DebugFS"
+	depends on DEBUG_FS
+	help
+	  Allows exposure of VFIO device internals. This option enables
+	  the use of debugfs by VFIO drivers as required. The device can
+	  cause the VFIO code create a top-level debug/vfio directory
+	  during initialization, and then populate a subdirectory with
+	  entries as required.
+
 source "drivers/vfio/pci/Kconfig"
 source "drivers/vfio/platform/Kconfig"
 source "drivers/vfio/mdev/Kconfig"
+1
drivers/vfio/Makefile
···
 vfio-$(CONFIG_IOMMUFD) += iommufd.o
 vfio-$(CONFIG_VFIO_CONTAINER) += container.o
 vfio-$(CONFIG_VFIO_VIRQFD) += virqfd.o
+vfio-$(CONFIG_VFIO_DEBUGFS) += debugfs.o

 obj-$(CONFIG_VFIO_IOMMU_TYPE1) += vfio_iommu_type1.o
 obj-$(CONFIG_VFIO_IOMMU_SPAPR_TCE) += vfio_iommu_spapr_tce.o
+92
drivers/vfio/debugfs.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (c) 2023, HiSilicon Ltd.
+ */
+
+#include <linux/device.h>
+#include <linux/debugfs.h>
+#include <linux/seq_file.h>
+#include <linux/vfio.h>
+#include "vfio.h"
+
+static struct dentry *vfio_debugfs_root;
+
+static int vfio_device_state_read(struct seq_file *seq, void *data)
+{
+	struct device *vf_dev = seq->private;
+	struct vfio_device *vdev = container_of(vf_dev,
+						struct vfio_device, device);
+	enum vfio_device_mig_state state;
+	int ret;
+
+	BUILD_BUG_ON(VFIO_DEVICE_STATE_NR !=
+		     VFIO_DEVICE_STATE_PRE_COPY_P2P + 1);
+
+	ret = vdev->mig_ops->migration_get_state(vdev, &state);
+	if (ret)
+		return -EINVAL;
+
+	switch (state) {
+	case VFIO_DEVICE_STATE_ERROR:
+		seq_puts(seq, "ERROR\n");
+		break;
+	case VFIO_DEVICE_STATE_STOP:
+		seq_puts(seq, "STOP\n");
+		break;
+	case VFIO_DEVICE_STATE_RUNNING:
+		seq_puts(seq, "RUNNING\n");
+		break;
+	case VFIO_DEVICE_STATE_STOP_COPY:
+		seq_puts(seq, "STOP_COPY\n");
+		break;
+	case VFIO_DEVICE_STATE_RESUMING:
+		seq_puts(seq, "RESUMING\n");
+		break;
+	case VFIO_DEVICE_STATE_RUNNING_P2P:
+		seq_puts(seq, "RUNNING_P2P\n");
+		break;
+	case VFIO_DEVICE_STATE_PRE_COPY:
+		seq_puts(seq, "PRE_COPY\n");
+		break;
+	case VFIO_DEVICE_STATE_PRE_COPY_P2P:
+		seq_puts(seq, "PRE_COPY_P2P\n");
+		break;
+	default:
+		seq_puts(seq, "Invalid\n");
+	}
+
+	return 0;
+}
+
+void vfio_device_debugfs_init(struct vfio_device *vdev)
+{
+	struct device *dev = &vdev->device;
+
+	vdev->debug_root = debugfs_create_dir(dev_name(vdev->dev),
+					      vfio_debugfs_root);
+
+	if (vdev->mig_ops) {
+		struct dentry *vfio_dev_migration = NULL;
+
+		vfio_dev_migration = debugfs_create_dir("migration",
+							vdev->debug_root);
+		debugfs_create_devm_seqfile(dev, "state", vfio_dev_migration,
+					    vfio_device_state_read);
+	}
+}
+
+void vfio_device_debugfs_exit(struct vfio_device *vdev)
+{
+	debugfs_remove_recursive(vdev->debug_root);
+}
+
+void vfio_debugfs_create_root(void)
+{
+	vfio_debugfs_root = debugfs_create_dir("vfio", NULL);
+}
+
+void vfio_debugfs_remove_root(void)
+{
+	debugfs_remove_recursive(vfio_debugfs_root);
+	vfio_debugfs_root = NULL;
+}
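Reading the new /sys/kernel/debug/vfio/&lt;device&gt;/migration/state file returns one of the fixed tokens emitted by the seq_puts() calls in vfio_device_state_read() above. A small userspace-side mirror of that mapping, for illustration only (the enum values are assumed from the vfio uAPI and the names are shortened here):

```c
#include <string.h>

/* Assumed vfio_device_mig_state values from the vfio uAPI;
 * names abbreviated for this sketch. */
enum mig_state {
	MIG_ERROR = 0,
	MIG_STOP = 1,
	MIG_RUNNING = 2,
	MIG_STOP_COPY = 3,
	MIG_RESUMING = 4,
	MIG_RUNNING_P2P = 5,
	MIG_PRE_COPY = 6,
	MIG_PRE_COPY_P2P = 7,
};

/* The token a read of .../migration/state is expected to return for
 * each state, matching the switch in debugfs.c above. */
static const char *mig_state_str(enum mig_state s)
{
	switch (s) {
	case MIG_ERROR:        return "ERROR";
	case MIG_STOP:         return "STOP";
	case MIG_RUNNING:      return "RUNNING";
	case MIG_STOP_COPY:    return "STOP_COPY";
	case MIG_RESUMING:     return "RESUMING";
	case MIG_RUNNING_P2P:  return "RUNNING_P2P";
	case MIG_PRE_COPY:     return "PRE_COPY";
	case MIG_PRE_COPY_P2P: return "PRE_COPY_P2P";
	default:               return "Invalid";
	}
}
```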
+2
drivers/vfio/pci/Kconfig
···
 source "drivers/vfio/pci/pds/Kconfig"

+source "drivers/vfio/pci/virtio/Kconfig"
+
 endmenu
+2
drivers/vfio/pci/Makefile
···
 obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/

 obj-$(CONFIG_PDS_VFIO_PCI) += pds/
+
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio/
+5 -2
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
···
 			size_t len, loff_t *pos)
 {
 	struct hisi_acc_vf_migration_file *migf = filp->private_data;
+	u8 *vf_data = (u8 *)&migf->vf_data;
 	loff_t requested_length;
 	ssize_t done = 0;
 	int ret;
···
 		goto out_unlock;
 	}

-	ret = copy_from_user(&migf->vf_data, buf, len);
+	ret = copy_from_user(vf_data + *pos, buf, len);
 	if (ret) {
 		done = -EFAULT;
 		goto out_unlock;
···
 	len = min_t(size_t, migf->total_length - *pos, len);
 	if (len) {
-		ret = copy_to_user(buf, &migf->vf_data, len);
+		u8 *vf_data = (u8 *)&migf->vf_data;
+
+		ret = copy_to_user(buf, vf_data + *pos, len);
 		if (ret) {
 			done = -EFAULT;
 			goto out_unlock;
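The fix above makes the hisi-acc save/resume file honor the file offset, so that migration data transferred in multiple read()/write() chunks is reassembled correctly. A minimal userspace model of the corrected write path, with memcpy() standing in for copy_from_user() and illustrative names:

```c
#include <stddef.h>
#include <string.h>

/* Each chunk must land at the current file offset (*pos), not at the
 * start of the buffer -- the pre-fix code copied every chunk to
 * offset 0.  'vf_data' models migf->vf_data, 'pos' the loff_t. */
static void chunk_write(unsigned char *vf_data, const unsigned char *ubuf,
			size_t len, long *pos)
{
	memcpy(vf_data + *pos, ubuf, len);	/* was: vf_data + 0 */
	*pos += len;
}
```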
+194 -115
drivers/vfio/pci/pds/dirty.c
···
 	kfree(region_info);
 }

-static int pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_dirty *dirty,
+static int pds_vfio_dirty_alloc_bitmaps(struct pds_vfio_region *region,
 					unsigned long bytes)
 {
 	unsigned long *host_seq_bmp, *host_ack_bmp;
···
 		return -ENOMEM;
 	}

-	dirty->host_seq.bmp = host_seq_bmp;
-	dirty->host_ack.bmp = host_ack_bmp;
+	region->host_seq = host_seq_bmp;
+	region->host_ack = host_ack_bmp;
+	region->bmp_bytes = bytes;

 	return 0;
 }

 static void pds_vfio_dirty_free_bitmaps(struct pds_vfio_dirty *dirty)
 {
-	vfree(dirty->host_seq.bmp);
-	vfree(dirty->host_ack.bmp);
-	dirty->host_seq.bmp = NULL;
-	dirty->host_ack.bmp = NULL;
+	if (!dirty->regions)
+		return;
+
+	for (int i = 0; i < dirty->num_regions; i++) {
+		struct pds_vfio_region *region = &dirty->regions[i];
+
+		vfree(region->host_seq);
+		vfree(region->host_ack);
+		region->host_seq = NULL;
+		region->host_ack = NULL;
+		region->bmp_bytes = 0;
+	}
 }

 static void __pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio,
-				      struct pds_vfio_bmp_info *bmp_info)
+				      struct pds_vfio_region *region)
 {
 	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct device *pdsc_dev = &pci_physfn(pdev)->dev;

-	dma_unmap_single(pdsc_dev, bmp_info->sgl_addr,
-			 bmp_info->num_sge * sizeof(struct pds_lm_sg_elem),
+	dma_unmap_single(pdsc_dev, region->sgl_addr,
+			 region->num_sge * sizeof(struct pds_lm_sg_elem),
 			 DMA_BIDIRECTIONAL);
-	kfree(bmp_info->sgl);
+	kfree(region->sgl);

-	bmp_info->num_sge = 0;
-	bmp_info->sgl = NULL;
-	bmp_info->sgl_addr = 0;
+	region->num_sge = 0;
+	region->sgl = NULL;
+	region->sgl_addr = 0;
 }

 static void pds_vfio_dirty_free_sgl(struct pds_vfio_pci_device *pds_vfio)
 {
-	if (pds_vfio->dirty.host_seq.sgl)
-		__pds_vfio_dirty_free_sgl(pds_vfio, &pds_vfio->dirty.host_seq);
-	if (pds_vfio->dirty.host_ack.sgl)
-		__pds_vfio_dirty_free_sgl(pds_vfio, &pds_vfio->dirty.host_ack);
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+
+	if (!dirty->regions)
+		return;
+
+	for (int i = 0; i < dirty->num_regions; i++) {
+		struct pds_vfio_region *region = &dirty->regions[i];
+
+		if (region->sgl)
+			__pds_vfio_dirty_free_sgl(pds_vfio, region);
+	}
 }

-static int __pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
-				      struct pds_vfio_bmp_info *bmp_info,
-				      u32 page_count)
+static int pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
+				    struct pds_vfio_region *region,
+				    u32 page_count)
 {
 	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct device *pdsc_dev = &pci_physfn(pdev)->dev;
···
 		return -EIO;
 	}

-	bmp_info->sgl = sgl;
-	bmp_info->num_sge = max_sge;
-	bmp_info->sgl_addr = sgl_addr;
+	region->sgl = sgl;
+	region->num_sge = max_sge;
+	region->sgl_addr = sgl_addr;

 	return 0;
 }

-static int pds_vfio_dirty_alloc_sgl(struct pds_vfio_pci_device *pds_vfio,
-				    u32 page_count)
+static void pds_vfio_dirty_free_regions(struct pds_vfio_dirty *dirty)
 {
+	vfree(dirty->regions);
+	dirty->regions = NULL;
+	dirty->num_regions = 0;
+}
+
+static int pds_vfio_dirty_alloc_regions(struct pds_vfio_pci_device *pds_vfio,
+					struct pds_lm_dirty_region_info *region_info,
+					u64 region_page_size, u8 num_regions)
+{
+	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	u32 dev_bmp_offset_byte = 0;
 	int err;

-	err = __pds_vfio_dirty_alloc_sgl(pds_vfio, &dirty->host_seq,
-					 page_count);
-	if (err)
-		return err;
+	dirty->regions = vcalloc(num_regions, sizeof(struct pds_vfio_region));
+	if (!dirty->regions)
+		return -ENOMEM;
+	dirty->num_regions = num_regions;

-	err = __pds_vfio_dirty_alloc_sgl(pds_vfio, &dirty->host_ack,
-					 page_count);
-	if (err) {
-		__pds_vfio_dirty_free_sgl(pds_vfio, &dirty->host_seq);
-		return err;
+	for (int i = 0; i < num_regions; i++) {
+		struct pds_lm_dirty_region_info *ri = &region_info[i];
+		struct pds_vfio_region *region = &dirty->regions[i];
+		u64 region_size, region_start;
+		u32 page_count;
+
+		/* page_count might be adjusted by the device */
+		page_count = le32_to_cpu(ri->page_count);
+		region_start = le64_to_cpu(ri->dma_base);
+		region_size = page_count * region_page_size;
+
+		err = pds_vfio_dirty_alloc_bitmaps(region,
+						   page_count / BITS_PER_BYTE);
+		if (err) {
+			dev_err(&pdev->dev, "Failed to alloc dirty bitmaps: %pe\n",
+				ERR_PTR(err));
+			goto out_free_regions;
+		}
+
+		err = pds_vfio_dirty_alloc_sgl(pds_vfio, region, page_count);
+		if (err) {
+			dev_err(&pdev->dev, "Failed to alloc dirty sg lists: %pe\n",
+				ERR_PTR(err));
+			goto out_free_regions;
+		}
+
+		region->size = region_size;
+		region->start = region_start;
+		region->page_size = region_page_size;
+		region->dev_bmp_offset_start_byte = dev_bmp_offset_byte;
+
+		dev_bmp_offset_byte += page_count / BITS_PER_BYTE;
+		if (dev_bmp_offset_byte % BITS_PER_BYTE) {
+			dev_err(&pdev->dev, "Device bitmap offset is mis-aligned\n");
+			err = -EINVAL;
+			goto out_free_regions;
+		}
 	}

 	return 0;
+
+out_free_regions:
+	pds_vfio_dirty_free_bitmaps(dirty);
+	pds_vfio_dirty_free_sgl(pds_vfio);
+	pds_vfio_dirty_free_regions(dirty);
+
+	return err;
 }

 static int pds_vfio_dirty_enable(struct pds_vfio_pci_device *pds_vfio,
···
 {
 	struct pci_dev *pdev = pds_vfio->vfio_coredev.pdev;
 	struct device *pdsc_dev = &pci_physfn(pdev)->dev;
-	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
-	u64 region_start, region_size, region_page_size;
 	struct pds_lm_dirty_region_info *region_info;
 	struct interval_tree_node *node = NULL;
+	u64 region_page_size = *page_size;
 	u8 max_regions = 0, num_regions;
 	dma_addr_t regions_dma = 0;
 	u32 num_ranges = nnodes;
-	u32 page_count;
-	u16 len;
 	int err;
+	u16 len;

 	dev_dbg(&pdev->dev, "vf%u: Start dirty page tracking\n",
 		pds_vfio->vf_id);
···
 		return -EOPNOTSUPP;
 	}

-	/*
-	 * Only support 1 region for now. If there are any large gaps in the
-	 * VM's address regions, then this would be a waste of memory as we are
-	 * generating 2 bitmaps (ack/seq) from the min address to the max
-	 * address of the VM's address regions. In the future, if we support
-	 * more than one region in the device/driver we can split the bitmaps
-	 * on the largest address region gaps. We can do this split up to the
-	 * max_regions times returned from the dirty_status command.
-	 */
-	max_regions = 1;
 	if (num_ranges > max_regions) {
 		vfio_combine_iova_ranges(ranges, nnodes, max_regions);
 		num_ranges = max_regions;
 	}

+	region_info = kcalloc(num_ranges, sizeof(*region_info), GFP_KERNEL);
+	if (!region_info)
+		return -ENOMEM;
+	len = num_ranges * sizeof(*region_info);
+
 	node = interval_tree_iter_first(ranges, 0, ULONG_MAX);
 	if (!node)
 		return -EINVAL;
+	for (int i = 0; i < num_ranges; i++) {
+		struct pds_lm_dirty_region_info *ri = &region_info[i];
+		u64 region_size = node->last - node->start + 1;
+		u64 region_start = node->start;
+		u32 page_count;

-	region_size = node->last - node->start + 1;
-	region_start = node->start;
-	region_page_size = *page_size;
+		page_count = DIV_ROUND_UP(region_size, region_page_size);

-	len = sizeof(*region_info);
-	region_info = kzalloc(len, GFP_KERNEL);
-	if (!region_info)
-		return -ENOMEM;
+		ri->dma_base = cpu_to_le64(region_start);
+		ri->page_count = cpu_to_le32(page_count);
+		ri->page_size_log2 = ilog2(region_page_size);

-	page_count = DIV_ROUND_UP(region_size, region_page_size);
+		dev_dbg(&pdev->dev,
+			"region_info[%d]: region_start 0x%llx region_end 0x%lx region_size 0x%llx page_count %u page_size %llu\n",
+			i, region_start, node->last, region_size, page_count,
+			region_page_size);

-	region_info->dma_base = cpu_to_le64(region_start);
-	region_info->page_count = cpu_to_le32(page_count);
-	region_info->page_size_log2 = ilog2(region_page_size);
+		node = interval_tree_iter_next(node, 0, ULONG_MAX);
+	}

 	regions_dma = dma_map_single(pdsc_dev, (void *)region_info, len,
 				     DMA_BIDIRECTIONAL);
···
 		goto out_free_region_info;
 	}

-	err = pds_vfio_dirty_enable_cmd(pds_vfio, regions_dma, max_regions);
+	err = pds_vfio_dirty_enable_cmd(pds_vfio, regions_dma, num_ranges);
 	dma_unmap_single(pdsc_dev, regions_dma, len, DMA_BIDIRECTIONAL);
 	if (err)
 		goto out_free_region_info;

-	/*
-	 * page_count might be adjusted by the device,
-	 * update it before freeing region_info DMA
-	 */
-	page_count = le32_to_cpu(region_info->page_count);
-
-	dev_dbg(&pdev->dev,
-		"region_info: regions_dma 0x%llx dma_base 0x%llx page_count %u page_size_log2 %u\n",
-		regions_dma, region_start, page_count,
-		(u8)ilog2(region_page_size));
-
-	err = pds_vfio_dirty_alloc_bitmaps(dirty, page_count / BITS_PER_BYTE);
+	err = pds_vfio_dirty_alloc_regions(pds_vfio, region_info,
+					   region_page_size, num_ranges);
 	if (err) {
-		dev_err(&pdev->dev, "Failed to alloc dirty bitmaps: %pe\n",
-			ERR_PTR(err));
-		goto out_free_region_info;
+		dev_err(&pdev->dev,
+			"Failed to allocate %d regions for tracking dirty regions: %pe\n",
+			num_regions, ERR_PTR(err));
+		goto out_dirty_disable;
 	}

-	err = pds_vfio_dirty_alloc_sgl(pds_vfio, page_count);
-	if (err) {
-		dev_err(&pdev->dev, "Failed to alloc dirty sg lists: %pe\n",
-			ERR_PTR(err));
-		goto out_free_bitmaps;
-	}
-
-	dirty->region_start = region_start;
-	dirty->region_size = region_size;
-	dirty->region_page_size = region_page_size;
 	pds_vfio_dirty_set_enabled(pds_vfio);

 	pds_vfio_print_guest_region_info(pds_vfio, max_regions);
···
 	return 0;

-out_free_bitmaps:
-	pds_vfio_dirty_free_bitmaps(dirty);
+out_dirty_disable:
+	pds_vfio_dirty_disable_cmd(pds_vfio);
 out_free_region_info:
 	kfree(region_info);
 	return err;
···
 		pds_vfio_dirty_disable_cmd(pds_vfio);
 		pds_vfio_dirty_free_sgl(pds_vfio);
 		pds_vfio_dirty_free_bitmaps(&pds_vfio->dirty);
+		pds_vfio_dirty_free_regions(&pds_vfio->dirty);
 	}

 	if (send_cmd)
···
 }

 static int pds_vfio_dirty_seq_ack(struct pds_vfio_pci_device *pds_vfio,
-				  struct pds_vfio_bmp_info *bmp_info,
-				  u32 offset, u32 bmp_bytes, bool read_seq)
+				  struct pds_vfio_region *region,
+				  unsigned long *seq_ack_bmp, u32 offset,
+				  u32 bmp_bytes, bool read_seq)
 {
 	const char *bmp_type_str = read_seq ? "read_seq" : "write_ack";
 	u8 dma_dir = read_seq ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
···
 	int err;
 	int i;

-	bmp = (void *)((u64)bmp_info->bmp + offset);
+	bmp = (void *)((u64)seq_ack_bmp + offset);
 	page_offset = offset_in_page(bmp);
 	bmp -= page_offset;
···
 		goto out_free_sg_table;

 	for_each_sgtable_dma_sg(&sg_table, sg, i) {
-		struct pds_lm_sg_elem *sg_elem = &bmp_info->sgl[i];
+		struct pds_lm_sg_elem *sg_elem = &region->sgl[i];

 		sg_elem->addr = cpu_to_le64(sg_dma_address(sg));
 		sg_elem->len = cpu_to_le32(sg_dma_len(sg));
···
 	num_sge = sg_table.nents;
 	size = num_sge * sizeof(struct pds_lm_sg_elem);
-	dma_sync_single_for_device(pdsc_dev, bmp_info->sgl_addr, size, dma_dir);
-	err = pds_vfio_dirty_seq_ack_cmd(pds_vfio, bmp_info->sgl_addr, num_sge,
+	offset += region->dev_bmp_offset_start_byte;
+	dma_sync_single_for_device(pdsc_dev, region->sgl_addr, size, dma_dir);
+	err = pds_vfio_dirty_seq_ack_cmd(pds_vfio, region->sgl_addr, num_sge,
 					 offset, bmp_bytes, read_seq);
 	if (err)
 		dev_err(&pdev->dev,
 			"Dirty bitmap %s failed offset %u bmp_bytes %u num_sge %u DMA 0x%llx: %pe\n",
 			bmp_type_str, offset, bmp_bytes,
-			num_sge, bmp_info->sgl_addr, ERR_PTR(err));
-	dma_sync_single_for_cpu(pdsc_dev, bmp_info->sgl_addr, size, dma_dir);
+			num_sge, region->sgl_addr, ERR_PTR(err));
+	dma_sync_single_for_cpu(pdsc_dev, region->sgl_addr, size, dma_dir);

 	dma_unmap_sgtable(pdsc_dev, &sg_table, dma_dir, 0);
 out_free_sg_table:
···
 }

 static int pds_vfio_dirty_write_ack(struct pds_vfio_pci_device *pds_vfio,
+				    struct pds_vfio_region *region,
 				    u32 offset, u32 len)
 {
-	return pds_vfio_dirty_seq_ack(pds_vfio, &pds_vfio->dirty.host_ack,
+
+	return pds_vfio_dirty_seq_ack(pds_vfio, region, region->host_ack,
 				      offset, len, WRITE_ACK);
 }

 static int pds_vfio_dirty_read_seq(struct pds_vfio_pci_device *pds_vfio,
+				   struct pds_vfio_region *region,
 				   u32 offset, u32 len)
 {
-	return pds_vfio_dirty_seq_ack(pds_vfio, &pds_vfio->dirty.host_seq,
+	return pds_vfio_dirty_seq_ack(pds_vfio, region, region->host_seq,
 				      offset, len, READ_SEQ);
 }

 static int pds_vfio_dirty_process_bitmaps(struct pds_vfio_pci_device *pds_vfio,
+					  struct pds_vfio_region *region,
 					  struct iova_bitmap *dirty_bitmap,
 					  u32 bmp_offset, u32 len_bytes)
 {
-	u64 page_size = pds_vfio->dirty.region_page_size;
-	u64 region_start = pds_vfio->dirty.region_start;
+	u64 page_size = region->page_size;
+	u64 region_start = region->start;
 	u32 bmp_offset_bit;
 	__le64 *seq, *ack;
 	int dword_count;

 	dword_count = len_bytes / sizeof(u64);
-	seq = (__le64 *)((u64)pds_vfio->dirty.host_seq.bmp + bmp_offset);
-	ack = (__le64 *)((u64)pds_vfio->dirty.host_ack.bmp + bmp_offset);
+	seq = (__le64 *)((u64)region->host_seq + bmp_offset);
+	ack = (__le64 *)((u64)region->host_ack + bmp_offset);
 	bmp_offset_bit = bmp_offset * 8;

 	for (int i = 0; i < dword_count; i++) {
···
 	return 0;
 }

+static struct pds_vfio_region *
+pds_vfio_get_region(struct pds_vfio_pci_device *pds_vfio, unsigned long iova)
+{
+	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+
+	for (int i = 0; i < dirty->num_regions; i++) {
+		struct pds_vfio_region *region = &dirty->regions[i];
+
+		if (iova >= region->start &&
+		    iova < (region->start + region->size))
+			return region;
+	}
+
+	return NULL;
+}
+
 static int pds_vfio_dirty_sync(struct pds_vfio_pci_device *pds_vfio,
 			       struct iova_bitmap *dirty_bitmap,
 			       unsigned long iova, unsigned long length)
 {
 	struct device *dev = &pds_vfio->vfio_coredev.pdev->dev;
-	struct pds_vfio_dirty *dirty = &pds_vfio->dirty;
+	struct pds_vfio_region *region;
 	u64 bmp_offset, bmp_bytes;
 	u64 bitmap_size, pages;
 	int err;
···
 		return -EINVAL;
 	}

-	pages = DIV_ROUND_UP(length, pds_vfio->dirty.region_page_size);
+	region = pds_vfio_get_region(pds_vfio, iova);
+	if (!region) {
+		dev_err(dev, "vf%u: Failed to find region that contains iova 0x%lx length 0x%lx\n",
+			pds_vfio->vf_id, iova, length);
+		return -EINVAL;
+	}
+
+	pages = DIV_ROUND_UP(length, region->page_size);
 	bitmap_size =
 		round_up(pages, sizeof(u64) * BITS_PER_BYTE) / BITS_PER_BYTE;

 	dev_dbg(dev,
 		"vf%u: iova 0x%lx length %lu page_size %llu pages %llu bitmap_size %llu\n",
-		pds_vfio->vf_id, iova, length, pds_vfio->dirty.region_page_size,
+		pds_vfio->vf_id, iova, length, region->page_size,
 		pages, bitmap_size);

-	if (!length || ((dirty->region_start + iova + length) >
-			(dirty->region_start + dirty->region_size))) {
+	if (!length || ((iova - region->start + length) > region->size)) {
 		dev_err(dev, "Invalid iova 0x%lx and/or length 0x%lx to sync\n",
 			iova, length);
 		return -EINVAL;
 	}

 	/* bitmap is modified in 64 bit chunks */
-	bmp_bytes = ALIGN(DIV_ROUND_UP(length / dirty->region_page_size,
-				       sizeof(u64)),
-			  sizeof(u64));
+	bmp_bytes = ALIGN(DIV_ROUND_UP(length / region->page_size,
+				       sizeof(u64)), sizeof(u64));
 	if (bmp_bytes != bitmap_size) {
 		dev_err(dev,
 			"Calculated bitmap bytes %llu not equal to bitmap size %llu\n",
···
 		return -EINVAL;
 	}

-	bmp_offset = DIV_ROUND_UP(iova / dirty->region_page_size, sizeof(u64));
+	if (bmp_bytes > region->bmp_bytes) {
+		dev_err(dev,
+			"Calculated bitmap bytes %llu larger than region's cached bmp_bytes %llu\n",
+			bmp_bytes, region->bmp_bytes);
+		return -EINVAL;
+	}
+
+	bmp_offset = DIV_ROUND_UP((iova - region->start) /
+				  region->page_size, sizeof(u64));

 	dev_dbg(dev,
 		"Syncing dirty bitmap, iova 0x%lx length 0x%lx, bmp_offset %llu bmp_bytes %llu\n",
 		iova, length, bmp_offset, bmp_bytes);

-	err = pds_vfio_dirty_read_seq(pds_vfio, bmp_offset, bmp_bytes);
+	err = pds_vfio_dirty_read_seq(pds_vfio, region, bmp_offset, bmp_bytes);
 	if (err)
 		return err;

-	err = pds_vfio_dirty_process_bitmaps(pds_vfio, dirty_bitmap, bmp_offset,
-					     bmp_bytes);
+	err = pds_vfio_dirty_process_bitmaps(pds_vfio, region, dirty_bitmap,
+					     bmp_offset, bmp_bytes);
 	if (err)
 		return err;

-	err = pds_vfio_dirty_write_ack(pds_vfio, bmp_offset, bmp_bytes);
+	err = pds_vfio_dirty_write_ack(pds_vfio, region, bmp_offset, bmp_bytes);
 	if (err)
 		return err;
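The bitmap arithmetic in pds_vfio_dirty_sync() above is now region-relative: sizes come from the sync length and the region's page size, and the offset is taken from the start of the containing region rather than from IOVA 0. A standalone sketch of the same sizing math, with the kernel macros reimplemented here for a userspace model:

```c
#include <stdint.h>

/* Reimplemented stand-ins for the kernel's helpers. */
#define BITS_PER_BYTE 8
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))
#define ALIGN_UP(x, a) (((x) + (a) - 1) / (a) * (a))

/* Bitmap bytes for a sync of 'length' bytes at 'page_size' granularity;
 * the bitmap is modified in 64-bit chunks, as in the driver. */
static uint64_t bmp_bytes_for(uint64_t length, uint64_t page_size)
{
	return ALIGN_UP(DIV_ROUND_UP(length / page_size, sizeof(uint64_t)),
			sizeof(uint64_t));
}

/* Byte offset into the region's bitmap, relative to region_start. */
static uint64_t bmp_offset_for(uint64_t iova, uint64_t region_start,
			       uint64_t page_size)
{
	return DIV_ROUND_UP((iova - region_start) / page_size,
			    sizeof(uint64_t));
}
```

For example, syncing 2 MiB of 4 KiB pages covers 512 page bits, which rounds to 64 bitmap bytes.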
+10 -8
drivers/vfio/pci/pds/dirty.h
···
 #ifndef _DIRTY_H_
 #define _DIRTY_H_

-struct pds_vfio_bmp_info {
-	unsigned long *bmp;
-	u32 bmp_bytes;
+struct pds_vfio_region {
+	unsigned long *host_seq;
+	unsigned long *host_ack;
+	u64 bmp_bytes;
+	u64 size;
+	u64 start;
+	u64 page_size;
 	struct pds_lm_sg_elem *sgl;
 	dma_addr_t sgl_addr;
+	u32 dev_bmp_offset_start_byte;
 	u16 num_sge;
 };

 struct pds_vfio_dirty {
-	struct pds_vfio_bmp_info host_seq;
-	struct pds_vfio_bmp_info host_ack;
-	u64 region_size;
-	u64 region_start;
-	u64 region_page_size;
+	struct pds_vfio_region *regions;
+	u8 num_regions;
 	bool is_enabled;
 };
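With dirty.h now describing an array of regions, every sync request must first be matched to the region that contains its IOVA. A userspace sketch of that containment test, using only an illustrative subset of the struct pds_vfio_region fields:

```c
#include <stddef.h>
#include <stdint.h>

/* Field subset of struct pds_vfio_region; each region owns a slice of
 * the device's dirty bitmap starting at dev_bmp_offset_start_byte. */
struct region {
	uint64_t start;
	uint64_t size;
	uint32_t dev_bmp_offset_start_byte;
};

/* Same half-open containment test as pds_vfio_get_region(). */
static const struct region *find_region(const struct region *r, size_t n,
					uint64_t iova)
{
	for (size_t i = 0; i < n; i++)
		if (iova >= r[i].start && iova < r[i].start + r[i].size)
			return &r[i];
	return NULL;
}
```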
+30 -27
drivers/vfio/pci/vfio_pci_rdwr.c
···
 #define vfio_iowrite8 iowrite8

 #define VFIO_IOWRITE(size) \
-static int vfio_pci_iowrite##size(struct vfio_pci_core_device *vdev,	\
+int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,	\
 			bool test_mem, u##size val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
···
 	up_read(&vdev->memory_lock);					\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_core_iowrite##size);

 VFIO_IOWRITE(8)
 VFIO_IOWRITE(16)
···
 #endif

 #define VFIO_IOREAD(size) \
-static int vfio_pci_ioread##size(struct vfio_pci_core_device *vdev,	\
+int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
 			bool test_mem, u##size *val, void __iomem *io)	\
 {									\
 	if (test_mem) {							\
···
 	up_read(&vdev->memory_lock);					\
 									\
 	return 0;							\
-}
+}									\
+EXPORT_SYMBOL_GPL(vfio_pci_core_ioread##size);

 VFIO_IOREAD(8)
 VFIO_IOREAD(16)
···
 			if (copy_from_user(&val, buf, 4))
 				return -EFAULT;

-			ret = vfio_pci_iowrite32(vdev, test_mem,
-						 val, io + off);
+			ret = vfio_pci_core_iowrite32(vdev, test_mem,
+						      val, io + off);
 			if (ret)
 				return ret;
 		} else {
-			ret = vfio_pci_ioread32(vdev, test_mem,
-						&val, io + off);
+			ret = vfio_pci_core_ioread32(vdev, test_mem,
+						     &val, io + off);
 			if (ret)
 				return ret;
···
 			if (copy_from_user(&val, buf, 2))
 				return -EFAULT;

-			ret = vfio_pci_iowrite16(vdev, test_mem,
-						 val, io + off);
+			ret = vfio_pci_core_iowrite16(vdev, test_mem,
+						      val, io + off);
 			if (ret)
 				return ret;
 		} else {
-			ret = vfio_pci_ioread16(vdev, test_mem,
-						&val, io + off);
+			ret = vfio_pci_core_ioread16(vdev, test_mem,
+						     &val, io + off);
 			if (ret)
 				return ret;
···
 			if (copy_from_user(&val, buf, 1))
 				return -EFAULT;

-			ret = vfio_pci_iowrite8(vdev, test_mem,
-						val, io + off);
+			ret = vfio_pci_core_iowrite8(vdev, test_mem,
+						     val, io + off);
 			if (ret)
 				return ret;
 		} else {
-			ret = vfio_pci_ioread8(vdev, test_mem,
-					       &val, io + off);
+			ret = vfio_pci_core_ioread8(vdev, test_mem,
+						    &val, io + off);
 			if (ret)
 				return ret;
···
 	return done;
 }

-static int vfio_pci_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar)
 {
 	struct pci_dev *pdev = vdev->pdev;
 	int ret;
···
 	return 0;
 }
+EXPORT_SYMBOL_GPL(vfio_pci_core_setup_barmap);

 ssize_t vfio_pci_bar_rw(struct vfio_pci_core_device *vdev, char __user *buf,
 			size_t count, loff_t *ppos, bool iswrite)
···
 		}
 		x_end = end;
 	} else {
-		int ret = vfio_pci_setup_barmap(vdev, bar);
+		int ret = vfio_pci_core_setup_barmap(vdev, bar);
 		if (ret) {
 			done = ret;
 			goto out;
···
 {
 	switch (ioeventfd->count) {
 	case 1:
-		vfio_pci_iowrite8(ioeventfd->vdev, test_mem,
-				  ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite8(ioeventfd->vdev, test_mem,
+				       ioeventfd->data, ioeventfd->addr);
 		break;
 	case 2:
-		vfio_pci_iowrite16(ioeventfd->vdev, test_mem,
-				   ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite16(ioeventfd->vdev, test_mem,
+					ioeventfd->data, ioeventfd->addr);
 		break;
 	case 4:
-		vfio_pci_iowrite32(ioeventfd->vdev, test_mem,
-				   ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite32(ioeventfd->vdev, test_mem,
+					ioeventfd->data, ioeventfd->addr);
 		break;
 #ifdef iowrite64
 	case 8:
-		vfio_pci_iowrite64(ioeventfd->vdev, test_mem,
-				   ioeventfd->data, ioeventfd->addr);
+		vfio_pci_core_iowrite64(ioeventfd->vdev, test_mem,
+					ioeventfd->data, ioeventfd->addr);
 		break;
 #endif
 	}
···
 		return -EINVAL;
 #endif

-	ret = vfio_pci_setup_barmap(vdev, bar);
+	ret = vfio_pci_core_setup_barmap(vdev, bar);
 	if (ret)
 		return ret;
drivers/vfio/pci/virtio/Kconfig (+15)

+# SPDX-License-Identifier: GPL-2.0-only
+config VIRTIO_VFIO_PCI
+	tristate "VFIO support for VIRTIO NET PCI devices"
+	depends on VIRTIO_PCI && VIRTIO_PCI_ADMIN_LEGACY
+	select VFIO_PCI_CORE
+	help
+	  This provides support for exposing VIRTIO NET VF devices which support
+	  legacy IO access, using the VFIO framework that can work with a legacy
+	  virtio driver in the guest.
+	  Per the PCIe spec, VFs do not support I/O Space; this driver therefore
+	  emulates an I/O BAR in software so that a VF can be seen as a
+	  transitional device by its users and work with a legacy driver.
+
+	  If you don't know what to do here, say N.
drivers/vfio/pci/virtio/Makefile (+3)

+# SPDX-License-Identifier: GPL-2.0-only
+obj-$(CONFIG_VIRTIO_VFIO_PCI) += virtio-vfio-pci.o
+virtio-vfio-pci-y := main.o
drivers/vfio/pci/virtio/main.c (+576)
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved 4 + */ 5 + 6 + #include <linux/device.h> 7 + #include <linux/module.h> 8 + #include <linux/mutex.h> 9 + #include <linux/pci.h> 10 + #include <linux/pm_runtime.h> 11 + #include <linux/types.h> 12 + #include <linux/uaccess.h> 13 + #include <linux/vfio.h> 14 + #include <linux/vfio_pci_core.h> 15 + #include <linux/virtio_pci.h> 16 + #include <linux/virtio_net.h> 17 + #include <linux/virtio_pci_admin.h> 18 + 19 + struct virtiovf_pci_core_device { 20 + struct vfio_pci_core_device core_device; 21 + u8 *bar0_virtual_buf; 22 + /* synchronize access to the virtual buf */ 23 + struct mutex bar_mutex; 24 + void __iomem *notify_addr; 25 + u64 notify_offset; 26 + __le32 pci_base_addr_0; 27 + __le16 pci_cmd; 28 + u8 bar0_virtual_buf_size; 29 + u8 notify_bar; 30 + }; 31 + 32 + static int 33 + virtiovf_issue_legacy_rw_cmd(struct virtiovf_pci_core_device *virtvdev, 34 + loff_t pos, char __user *buf, 35 + size_t count, bool read) 36 + { 37 + bool msix_enabled = 38 + (virtvdev->core_device.irq_type == VFIO_PCI_MSIX_IRQ_INDEX); 39 + struct pci_dev *pdev = virtvdev->core_device.pdev; 40 + u8 *bar0_buf = virtvdev->bar0_virtual_buf; 41 + bool common; 42 + u8 offset; 43 + int ret; 44 + 45 + common = pos < VIRTIO_PCI_CONFIG_OFF(msix_enabled); 46 + /* offset within the relevant configuration area */ 47 + offset = common ? 
pos : pos - VIRTIO_PCI_CONFIG_OFF(msix_enabled); 48 + mutex_lock(&virtvdev->bar_mutex); 49 + if (read) { 50 + if (common) 51 + ret = virtio_pci_admin_legacy_common_io_read(pdev, offset, 52 + count, bar0_buf + pos); 53 + else 54 + ret = virtio_pci_admin_legacy_device_io_read(pdev, offset, 55 + count, bar0_buf + pos); 56 + if (ret) 57 + goto out; 58 + if (copy_to_user(buf, bar0_buf + pos, count)) 59 + ret = -EFAULT; 60 + } else { 61 + if (copy_from_user(bar0_buf + pos, buf, count)) { 62 + ret = -EFAULT; 63 + goto out; 64 + } 65 + 66 + if (common) 67 + ret = virtio_pci_admin_legacy_common_io_write(pdev, offset, 68 + count, bar0_buf + pos); 69 + else 70 + ret = virtio_pci_admin_legacy_device_io_write(pdev, offset, 71 + count, bar0_buf + pos); 72 + } 73 + out: 74 + mutex_unlock(&virtvdev->bar_mutex); 75 + return ret; 76 + } 77 + 78 + static int 79 + virtiovf_pci_bar0_rw(struct virtiovf_pci_core_device *virtvdev, 80 + loff_t pos, char __user *buf, 81 + size_t count, bool read) 82 + { 83 + struct vfio_pci_core_device *core_device = &virtvdev->core_device; 84 + struct pci_dev *pdev = core_device->pdev; 85 + u16 queue_notify; 86 + int ret; 87 + 88 + if (!(le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO)) 89 + return -EIO; 90 + 91 + if (pos + count > virtvdev->bar0_virtual_buf_size) 92 + return -EINVAL; 93 + 94 + ret = pm_runtime_resume_and_get(&pdev->dev); 95 + if (ret) { 96 + pci_info_ratelimited(pdev, "runtime resume failed %d\n", ret); 97 + return -EIO; 98 + } 99 + 100 + switch (pos) { 101 + case VIRTIO_PCI_QUEUE_NOTIFY: 102 + if (count != sizeof(queue_notify)) { 103 + ret = -EINVAL; 104 + goto end; 105 + } 106 + if (read) { 107 + ret = vfio_pci_core_ioread16(core_device, true, &queue_notify, 108 + virtvdev->notify_addr); 109 + if (ret) 110 + goto end; 111 + if (copy_to_user(buf, &queue_notify, 112 + sizeof(queue_notify))) { 113 + ret = -EFAULT; 114 + goto end; 115 + } 116 + } else { 117 + if (copy_from_user(&queue_notify, buf, count)) { 118 + ret = -EFAULT; 119 + goto 
end; 120 + } 121 + ret = vfio_pci_core_iowrite16(core_device, true, queue_notify, 122 + virtvdev->notify_addr); 123 + } 124 + break; 125 + default: 126 + ret = virtiovf_issue_legacy_rw_cmd(virtvdev, pos, buf, count, 127 + read); 128 + } 129 + 130 + end: 131 + pm_runtime_put(&pdev->dev); 132 + return ret ? ret : count; 133 + } 134 + 135 + static bool range_intersect_range(loff_t range1_start, size_t count1, 136 + loff_t range2_start, size_t count2, 137 + loff_t *start_offset, 138 + size_t *intersect_count, 139 + size_t *register_offset) 140 + { 141 + if (range1_start <= range2_start && 142 + range1_start + count1 > range2_start) { 143 + *start_offset = range2_start - range1_start; 144 + *intersect_count = min_t(size_t, count2, 145 + range1_start + count1 - range2_start); 146 + *register_offset = 0; 147 + return true; 148 + } 149 + 150 + if (range1_start > range2_start && 151 + range1_start < range2_start + count2) { 152 + *start_offset = 0; 153 + *intersect_count = min_t(size_t, count1, 154 + range2_start + count2 - range1_start); 155 + *register_offset = range1_start - range2_start; 156 + return true; 157 + } 158 + 159 + return false; 160 + } 161 + 162 + static ssize_t virtiovf_pci_read_config(struct vfio_device *core_vdev, 163 + char __user *buf, size_t count, 164 + loff_t *ppos) 165 + { 166 + struct virtiovf_pci_core_device *virtvdev = container_of( 167 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 168 + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; 169 + size_t register_offset; 170 + loff_t copy_offset; 171 + size_t copy_count; 172 + __le32 val32; 173 + __le16 val16; 174 + u8 val8; 175 + int ret; 176 + 177 + ret = vfio_pci_core_read(core_vdev, buf, count, ppos); 178 + if (ret < 0) 179 + return ret; 180 + 181 + if (range_intersect_range(pos, count, PCI_DEVICE_ID, sizeof(val16), 182 + &copy_offset, &copy_count, &register_offset)) { 183 + val16 = cpu_to_le16(VIRTIO_TRANS_ID_NET); 184 + if (copy_to_user(buf + copy_offset, (void *)&val16 + 
register_offset, copy_count)) 185 + return -EFAULT; 186 + } 187 + 188 + if ((le16_to_cpu(virtvdev->pci_cmd) & PCI_COMMAND_IO) && 189 + range_intersect_range(pos, count, PCI_COMMAND, sizeof(val16), 190 + &copy_offset, &copy_count, &register_offset)) { 191 + if (copy_from_user((void *)&val16 + register_offset, buf + copy_offset, 192 + copy_count)) 193 + return -EFAULT; 194 + val16 |= cpu_to_le16(PCI_COMMAND_IO); 195 + if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, 196 + copy_count)) 197 + return -EFAULT; 198 + } 199 + 200 + if (range_intersect_range(pos, count, PCI_REVISION_ID, sizeof(val8), 201 + &copy_offset, &copy_count, &register_offset)) { 202 + /* Transitional needs to have revision 0 */ 203 + val8 = 0; 204 + if (copy_to_user(buf + copy_offset, &val8, copy_count)) 205 + return -EFAULT; 206 + } 207 + 208 + if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, sizeof(val32), 209 + &copy_offset, &copy_count, &register_offset)) { 210 + u32 bar_mask = ~(virtvdev->bar0_virtual_buf_size - 1); 211 + u32 pci_base_addr_0 = le32_to_cpu(virtvdev->pci_base_addr_0); 212 + 213 + val32 = cpu_to_le32((pci_base_addr_0 & bar_mask) | PCI_BASE_ADDRESS_SPACE_IO); 214 + if (copy_to_user(buf + copy_offset, (void *)&val32 + register_offset, copy_count)) 215 + return -EFAULT; 216 + } 217 + 218 + if (range_intersect_range(pos, count, PCI_SUBSYSTEM_ID, sizeof(val16), 219 + &copy_offset, &copy_count, &register_offset)) { 220 + /* 221 + * Transitional devices use the PCI subsystem device id as 222 + * virtio device id, same as legacy driver always did. 
223 + */ 224 + val16 = cpu_to_le16(VIRTIO_ID_NET); 225 + if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, 226 + copy_count)) 227 + return -EFAULT; 228 + } 229 + 230 + if (range_intersect_range(pos, count, PCI_SUBSYSTEM_VENDOR_ID, sizeof(val16), 231 + &copy_offset, &copy_count, &register_offset)) { 232 + val16 = cpu_to_le16(PCI_VENDOR_ID_REDHAT_QUMRANET); 233 + if (copy_to_user(buf + copy_offset, (void *)&val16 + register_offset, 234 + copy_count)) 235 + return -EFAULT; 236 + } 237 + 238 + return count; 239 + } 240 + 241 + static ssize_t 242 + virtiovf_pci_core_read(struct vfio_device *core_vdev, char __user *buf, 243 + size_t count, loff_t *ppos) 244 + { 245 + struct virtiovf_pci_core_device *virtvdev = container_of( 246 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 247 + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); 248 + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; 249 + 250 + if (!count) 251 + return 0; 252 + 253 + if (index == VFIO_PCI_CONFIG_REGION_INDEX) 254 + return virtiovf_pci_read_config(core_vdev, buf, count, ppos); 255 + 256 + if (index == VFIO_PCI_BAR0_REGION_INDEX) 257 + return virtiovf_pci_bar0_rw(virtvdev, pos, buf, count, true); 258 + 259 + return vfio_pci_core_read(core_vdev, buf, count, ppos); 260 + } 261 + 262 + static ssize_t virtiovf_pci_write_config(struct vfio_device *core_vdev, 263 + const char __user *buf, size_t count, 264 + loff_t *ppos) 265 + { 266 + struct virtiovf_pci_core_device *virtvdev = container_of( 267 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 268 + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; 269 + size_t register_offset; 270 + loff_t copy_offset; 271 + size_t copy_count; 272 + 273 + if (range_intersect_range(pos, count, PCI_COMMAND, sizeof(virtvdev->pci_cmd), 274 + &copy_offset, &copy_count, 275 + &register_offset)) { 276 + if (copy_from_user((void *)&virtvdev->pci_cmd + register_offset, 277 + buf + copy_offset, 278 + copy_count)) 279 + return -EFAULT; 280 + } 
281 + 282 + if (range_intersect_range(pos, count, PCI_BASE_ADDRESS_0, 283 + sizeof(virtvdev->pci_base_addr_0), 284 + &copy_offset, &copy_count, 285 + &register_offset)) { 286 + if (copy_from_user((void *)&virtvdev->pci_base_addr_0 + register_offset, 287 + buf + copy_offset, 288 + copy_count)) 289 + return -EFAULT; 290 + } 291 + 292 + return vfio_pci_core_write(core_vdev, buf, count, ppos); 293 + } 294 + 295 + static ssize_t 296 + virtiovf_pci_core_write(struct vfio_device *core_vdev, const char __user *buf, 297 + size_t count, loff_t *ppos) 298 + { 299 + struct virtiovf_pci_core_device *virtvdev = container_of( 300 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 301 + unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); 302 + loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; 303 + 304 + if (!count) 305 + return 0; 306 + 307 + if (index == VFIO_PCI_CONFIG_REGION_INDEX) 308 + return virtiovf_pci_write_config(core_vdev, buf, count, ppos); 309 + 310 + if (index == VFIO_PCI_BAR0_REGION_INDEX) 311 + return virtiovf_pci_bar0_rw(virtvdev, pos, (char __user *)buf, count, false); 312 + 313 + return vfio_pci_core_write(core_vdev, buf, count, ppos); 314 + } 315 + 316 + static int 317 + virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev, 318 + unsigned int cmd, unsigned long arg) 319 + { 320 + struct virtiovf_pci_core_device *virtvdev = container_of( 321 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 322 + unsigned long minsz = offsetofend(struct vfio_region_info, offset); 323 + void __user *uarg = (void __user *)arg; 324 + struct vfio_region_info info = {}; 325 + 326 + if (copy_from_user(&info, uarg, minsz)) 327 + return -EFAULT; 328 + 329 + if (info.argsz < minsz) 330 + return -EINVAL; 331 + 332 + switch (info.index) { 333 + case VFIO_PCI_BAR0_REGION_INDEX: 334 + info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 335 + info.size = virtvdev->bar0_virtual_buf_size; 336 + info.flags = VFIO_REGION_INFO_FLAG_READ | 337 + 
VFIO_REGION_INFO_FLAG_WRITE; 338 + return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0; 339 + default: 340 + return vfio_pci_core_ioctl(core_vdev, cmd, arg); 341 + } 342 + } 343 + 344 + static long 345 + virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd, 346 + unsigned long arg) 347 + { 348 + switch (cmd) { 349 + case VFIO_DEVICE_GET_REGION_INFO: 350 + return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg); 351 + default: 352 + return vfio_pci_core_ioctl(core_vdev, cmd, arg); 353 + } 354 + } 355 + 356 + static int 357 + virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev) 358 + { 359 + struct vfio_pci_core_device *core_device = &virtvdev->core_device; 360 + int ret; 361 + 362 + /* 363 + * Setup the BAR where the 'notify' exists to be used by vfio as well 364 + * This will let us mmap it only once and use it when needed. 365 + */ 366 + ret = vfio_pci_core_setup_barmap(core_device, 367 + virtvdev->notify_bar); 368 + if (ret) 369 + return ret; 370 + 371 + virtvdev->notify_addr = core_device->barmap[virtvdev->notify_bar] + 372 + virtvdev->notify_offset; 373 + return 0; 374 + } 375 + 376 + static int virtiovf_pci_open_device(struct vfio_device *core_vdev) 377 + { 378 + struct virtiovf_pci_core_device *virtvdev = container_of( 379 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 380 + struct vfio_pci_core_device *vdev = &virtvdev->core_device; 381 + int ret; 382 + 383 + ret = vfio_pci_core_enable(vdev); 384 + if (ret) 385 + return ret; 386 + 387 + if (virtvdev->bar0_virtual_buf) { 388 + /* 389 + * Upon close_device() the vfio_pci_core_disable() is called 390 + * and will close all the previous mmaps, so it seems that the 391 + * valid life cycle for the 'notify' addr is per open/close. 
392 + */ 393 + ret = virtiovf_set_notify_addr(virtvdev); 394 + if (ret) { 395 + vfio_pci_core_disable(vdev); 396 + return ret; 397 + } 398 + } 399 + 400 + vfio_pci_core_finish_enable(vdev); 401 + return 0; 402 + } 403 + 404 + static int virtiovf_get_device_config_size(unsigned short device) 405 + { 406 + /* Network card */ 407 + return offsetofend(struct virtio_net_config, status); 408 + } 409 + 410 + static int virtiovf_read_notify_info(struct virtiovf_pci_core_device *virtvdev) 411 + { 412 + u64 offset; 413 + int ret; 414 + u8 bar; 415 + 416 + ret = virtio_pci_admin_legacy_io_notify_info(virtvdev->core_device.pdev, 417 + VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM, 418 + &bar, &offset); 419 + if (ret) 420 + return ret; 421 + 422 + virtvdev->notify_bar = bar; 423 + virtvdev->notify_offset = offset; 424 + return 0; 425 + } 426 + 427 + static int virtiovf_pci_init_device(struct vfio_device *core_vdev) 428 + { 429 + struct virtiovf_pci_core_device *virtvdev = container_of( 430 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 431 + struct pci_dev *pdev; 432 + int ret; 433 + 434 + ret = vfio_pci_core_init_dev(core_vdev); 435 + if (ret) 436 + return ret; 437 + 438 + pdev = virtvdev->core_device.pdev; 439 + ret = virtiovf_read_notify_info(virtvdev); 440 + if (ret) 441 + return ret; 442 + 443 + virtvdev->bar0_virtual_buf_size = VIRTIO_PCI_CONFIG_OFF(true) + 444 + virtiovf_get_device_config_size(pdev->device); 445 + BUILD_BUG_ON(!is_power_of_2(virtvdev->bar0_virtual_buf_size)); 446 + virtvdev->bar0_virtual_buf = kzalloc(virtvdev->bar0_virtual_buf_size, 447 + GFP_KERNEL); 448 + if (!virtvdev->bar0_virtual_buf) 449 + return -ENOMEM; 450 + mutex_init(&virtvdev->bar_mutex); 451 + return 0; 452 + } 453 + 454 + static void virtiovf_pci_core_release_dev(struct vfio_device *core_vdev) 455 + { 456 + struct virtiovf_pci_core_device *virtvdev = container_of( 457 + core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 458 + 459 + 
kfree(virtvdev->bar0_virtual_buf); 460 + vfio_pci_core_release_dev(core_vdev); 461 + } 462 + 463 + static const struct vfio_device_ops virtiovf_vfio_pci_tran_ops = { 464 + .name = "virtio-vfio-pci-trans", 465 + .init = virtiovf_pci_init_device, 466 + .release = virtiovf_pci_core_release_dev, 467 + .open_device = virtiovf_pci_open_device, 468 + .close_device = vfio_pci_core_close_device, 469 + .ioctl = virtiovf_vfio_pci_core_ioctl, 470 + .device_feature = vfio_pci_core_ioctl_feature, 471 + .read = virtiovf_pci_core_read, 472 + .write = virtiovf_pci_core_write, 473 + .mmap = vfio_pci_core_mmap, 474 + .request = vfio_pci_core_request, 475 + .match = vfio_pci_core_match, 476 + .bind_iommufd = vfio_iommufd_physical_bind, 477 + .unbind_iommufd = vfio_iommufd_physical_unbind, 478 + .attach_ioas = vfio_iommufd_physical_attach_ioas, 479 + .detach_ioas = vfio_iommufd_physical_detach_ioas, 480 + }; 481 + 482 + static const struct vfio_device_ops virtiovf_vfio_pci_ops = { 483 + .name = "virtio-vfio-pci", 484 + .init = vfio_pci_core_init_dev, 485 + .release = vfio_pci_core_release_dev, 486 + .open_device = virtiovf_pci_open_device, 487 + .close_device = vfio_pci_core_close_device, 488 + .ioctl = vfio_pci_core_ioctl, 489 + .device_feature = vfio_pci_core_ioctl_feature, 490 + .read = vfio_pci_core_read, 491 + .write = vfio_pci_core_write, 492 + .mmap = vfio_pci_core_mmap, 493 + .request = vfio_pci_core_request, 494 + .match = vfio_pci_core_match, 495 + .bind_iommufd = vfio_iommufd_physical_bind, 496 + .unbind_iommufd = vfio_iommufd_physical_unbind, 497 + .attach_ioas = vfio_iommufd_physical_attach_ioas, 498 + .detach_ioas = vfio_iommufd_physical_detach_ioas, 499 + }; 500 + 501 + static bool virtiovf_bar0_exists(struct pci_dev *pdev) 502 + { 503 + struct resource *res = pdev->resource; 504 + 505 + return res->flags; 506 + } 507 + 508 + static int virtiovf_pci_probe(struct pci_dev *pdev, 509 + const struct pci_device_id *id) 510 + { 511 + const struct vfio_device_ops *ops = 
&virtiovf_vfio_pci_ops; 512 + struct virtiovf_pci_core_device *virtvdev; 513 + int ret; 514 + 515 + if (pdev->is_virtfn && virtio_pci_admin_has_legacy_io(pdev) && 516 + !virtiovf_bar0_exists(pdev)) 517 + ops = &virtiovf_vfio_pci_tran_ops; 518 + 519 + virtvdev = vfio_alloc_device(virtiovf_pci_core_device, core_device.vdev, 520 + &pdev->dev, ops); 521 + if (IS_ERR(virtvdev)) 522 + return PTR_ERR(virtvdev); 523 + 524 + dev_set_drvdata(&pdev->dev, &virtvdev->core_device); 525 + ret = vfio_pci_core_register_device(&virtvdev->core_device); 526 + if (ret) 527 + goto out; 528 + return 0; 529 + out: 530 + vfio_put_device(&virtvdev->core_device.vdev); 531 + return ret; 532 + } 533 + 534 + static void virtiovf_pci_remove(struct pci_dev *pdev) 535 + { 536 + struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev); 537 + 538 + vfio_pci_core_unregister_device(&virtvdev->core_device); 539 + vfio_put_device(&virtvdev->core_device.vdev); 540 + } 541 + 542 + static const struct pci_device_id virtiovf_pci_table[] = { 543 + /* Only virtio-net is supported/tested so far */ 544 + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_REDHAT_QUMRANET, 0x1041) }, 545 + {} 546 + }; 547 + 548 + MODULE_DEVICE_TABLE(pci, virtiovf_pci_table); 549 + 550 + static void virtiovf_pci_aer_reset_done(struct pci_dev *pdev) 551 + { 552 + struct virtiovf_pci_core_device *virtvdev = dev_get_drvdata(&pdev->dev); 553 + 554 + virtvdev->pci_cmd = 0; 555 + } 556 + 557 + static const struct pci_error_handlers virtiovf_err_handlers = { 558 + .reset_done = virtiovf_pci_aer_reset_done, 559 + .error_detected = vfio_pci_core_aer_err_detected, 560 + }; 561 + 562 + static struct pci_driver virtiovf_pci_driver = { 563 + .name = KBUILD_MODNAME, 564 + .id_table = virtiovf_pci_table, 565 + .probe = virtiovf_pci_probe, 566 + .remove = virtiovf_pci_remove, 567 + .err_handler = &virtiovf_err_handlers, 568 + .driver_managed_dma = true, 569 + }; 570 + 571 + module_pci_driver(virtiovf_pci_driver); 572 + 573 + 
MODULE_LICENSE("GPL"); 574 + MODULE_AUTHOR("Yishai Hadas <yishaih@nvidia.com>"); 575 + MODULE_DESCRIPTION( 576 + "VIRTIO VFIO PCI - User Level meta-driver for VIRTIO NET devices");
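The config-space emulation in virtiovf_pci_read_config()/virtiovf_pci_write_config() above leans entirely on the range_intersect_range() helper, which maps a user access window onto an emulated register. The same overlap arithmetic can be sketched standalone in userspace C (loff_t replaced by long long for portability; the logic mirrors the driver's helper):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userspace re-implementation of the driver's range_intersect_range():
 * range1 is the user's config-space access [r1_start, r1_start + count1),
 * range2 is the emulated register. On overlap it reports the offset into
 * the user buffer, the number of intersecting bytes, and the offset into
 * the register.
 */
static bool range_intersect_range(long long r1_start, size_t count1,
				  long long r2_start, size_t count2,
				  long long *start_offset,
				  size_t *intersect_count,
				  size_t *register_offset)
{
	if (r1_start <= r2_start && r1_start + (long long)count1 > r2_start) {
		/* Access begins at or before the register: copy at an
		 * offset into the user buffer, from register byte 0. */
		size_t avail = r1_start + count1 - r2_start;

		*start_offset = r2_start - r1_start;
		*intersect_count = count2 < avail ? count2 : avail;
		*register_offset = 0;
		return true;
	}

	if (r1_start > r2_start && r1_start < r2_start + (long long)count2) {
		/* Access begins inside the register: copy from the start
		 * of the user buffer, at an offset into the register. */
		size_t avail = r2_start + count2 - r1_start;

		*start_offset = 0;
		*intersect_count = count1 < avail ? count1 : avail;
		*register_offset = r1_start - r2_start;
		return true;
	}

	return false;
}
```

For example, a 4-byte read at config offset 0 against a 2-byte register at offset 2 yields start_offset 2, intersect_count 2, register_offset 0; a 2-byte read at offset 3 clips to 1 byte starting at register byte 1.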
drivers/vfio/vfio.h (+14)

···
 }
 #endif

+#ifdef CONFIG_VFIO_DEBUGFS
+void vfio_debugfs_create_root(void);
+void vfio_debugfs_remove_root(void);
+
+void vfio_device_debugfs_init(struct vfio_device *vdev);
+void vfio_device_debugfs_exit(struct vfio_device *vdev);
+#else
+static inline void vfio_debugfs_create_root(void) { }
+static inline void vfio_debugfs_remove_root(void) { }
+
+static inline void vfio_device_debugfs_init(struct vfio_device *vdev) { }
+static inline void vfio_device_debugfs_exit(struct vfio_device *vdev) { }
+#endif /* CONFIG_VFIO_DEBUGFS */
+
 #endif
drivers/vfio/vfio_iommu_type1.c (+5 -3)

···
 	list_for_each_entry(d, &iommu->domain_list, next) {
 		ret = iommu_map(d->domain, iova, (phys_addr_t)pfn << PAGE_SHIFT,
 				npage << PAGE_SHIFT, prot | IOMMU_CACHE,
-				GFP_KERNEL);
+				GFP_KERNEL_ACCOUNT);
 		if (ret)
 			goto unwind;

···
 	}

 	ret = iommu_map(domain->domain, iova, phys, size,
-			dma->prot | IOMMU_CACHE, GFP_KERNEL);
+			dma->prot | IOMMU_CACHE,
+			GFP_KERNEL_ACCOUNT);
 	if (ret) {
 		if (!dma->iommu_mapped) {
 			vfio_unpin_pages_remote(dma, iova,
···
 			continue;

 		ret = iommu_map(domain->domain, start, page_to_phys(pages), PAGE_SIZE * 2,
-				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE, GFP_KERNEL);
+				IOMMU_READ | IOMMU_WRITE | IOMMU_CACHE,
+				GFP_KERNEL_ACCOUNT);
 		if (!ret) {
 			size_t unmapped = iommu_unmap(domain->domain, start, PAGE_SIZE);

drivers/vfio/vfio_main.c (+4)

···
 	refcount_set(&device->refcount, 1);

 	vfio_device_group_register(device);
+	vfio_device_debugfs_init(device);

 	return 0;
 err_out:
···
 		}
 	}

+	vfio_device_debugfs_exit(device);
 	/* Balances vfio_device_set_group in register path */
 	vfio_device_remove_group(device);
 }
···
 	if (ret)
 		goto err_alloc_dev_chrdev;

+	vfio_debugfs_create_root();
 	pr_info(DRIVER_DESC " version: " DRIVER_VERSION "\n");
 	return 0;

···

 static void __exit vfio_cleanup(void)
 {
+	vfio_debugfs_remove_root();
 	ida_destroy(&vfio.device_ida);
 	vfio_cdev_cleanup();
 	class_destroy(vfio.device_class);
drivers/virtio/Kconfig (+5)

···

 	  If unsure, say M.

+config VIRTIO_PCI_ADMIN_LEGACY
+	bool
+	depends on VIRTIO_PCI && (X86 || COMPILE_TEST)
+	default y
+
 config VIRTIO_PCI_LEGACY
 	bool "Support for legacy virtio draft 0.9.X and older devices"
 	default y
drivers/virtio/Makefile (+1)

···
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
 virtio_pci-y := virtio_pci_modern.o virtio_pci_common.o
 virtio_pci-$(CONFIG_VIRTIO_PCI_LEGACY) += virtio_pci_legacy.o
+virtio_pci-$(CONFIG_VIRTIO_PCI_ADMIN_LEGACY) += virtio_pci_admin_legacy_io.o
 obj-$(CONFIG_VIRTIO_BALLOON) += virtio_balloon.o
 obj-$(CONFIG_VIRTIO_INPUT) += virtio_input.o
 obj-$(CONFIG_VIRTIO_VDPA) += virtio_vdpa.o
drivers/virtio/virtio.c (+33 -4)

···
 	if (err)
 		goto err;

+	if (dev->config->create_avq) {
+		err = dev->config->create_avq(dev);
+		if (err)
+			goto err;
+	}
+
 	err = drv->probe(dev);
 	if (err)
-		goto err;
+		goto err_probe;

 	/* If probe didn't do it, mark device DRIVER_OK ourselves. */
 	if (!(dev->config->get_status(dev) & VIRTIO_CONFIG_S_DRIVER_OK))
···
 	virtio_config_enable(dev);

 	return 0;
+
+err_probe:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return err;
···
 	virtio_config_disable(dev);

 	drv->remove(dev);
+
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);

 	/* Driver should have reset device. */
 	WARN_ON_ONCE(dev->config->get_status(dev));
···
 int virtio_device_freeze(struct virtio_device *dev)
 {
 	struct virtio_driver *drv = drv_to_virtio(dev->dev.driver);
+	int ret;

 	virtio_config_disable(dev);

 	dev->failed = dev->config->get_status(dev) & VIRTIO_CONFIG_S_FAILED;

-	if (drv && drv->freeze)
-		return drv->freeze(dev);
+	if (drv && drv->freeze) {
+		ret = drv->freeze(dev);
+		if (ret)
+			return ret;
+	}
+
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);

 	return 0;
 }
···
 	if (ret)
 		goto err;

+	if (dev->config->create_avq) {
+		ret = dev->config->create_avq(dev);
+		if (ret)
+			goto err;
+	}
+
 	if (drv->restore) {
 		ret = drv->restore(dev);
 		if (ret)
-			goto err;
+			goto err_restore;
 	}

 	/* If restore didn't do it, mark device DRIVER_OK ourselves. */
···

 	return 0;

+err_restore:
+	if (dev->config->destroy_avq)
+		dev->config->destroy_avq(dev);
 err:
 	virtio_add_status(dev, VIRTIO_CONFIG_S_FAILED);
 	return ret;
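The virtio.c changes above thread admin-virtqueue creation into every setup path (probe, restore) and its destruction into every teardown and failure path. The underlying pattern, acquire in order and release in reverse via goto labels, can be sketched in isolation; every name below (create_avq, dev_probe, avq_live) is a hypothetical stand-in, not the kernel API:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Sketch of the error-unwind ordering added to virtio_dev_probe():
 * the admin VQ is created before the driver's probe runs, and must be
 * torn down on every later failure path.
 */
static int avq_live;	/* tracks the fake admin VQ's lifetime */

static int create_avq(void)
{
	avq_live = 1;
	return 0;
}

static void destroy_avq(void)
{
	avq_live = 0;
}

static int dev_probe(bool driver_probe_fails)
{
	int err;

	err = create_avq();
	if (err)
		goto err;

	/* stand-in for drv->probe(dev) */
	err = driver_probe_fails ? -1 : 0;
	if (err)
		goto err_probe;

	return 0;

err_probe:
	destroy_avq();	/* release only what was already set up */
err:
	return err;
}
```

The two labels keep each failure path from touching state that was never initialized, which is exactly why the patch replaces the single `goto err` after `drv->probe()` with `goto err_probe`.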
drivers/virtio/virtio_pci_admin_legacy_io.c (+244)
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (c) 2023, NVIDIA CORPORATION & AFFILIATES. All rights reserved 4 + */ 5 + 6 + #include <linux/virtio_pci_admin.h> 7 + #include "virtio_pci_common.h" 8 + 9 + /* 10 + * virtio_pci_admin_has_legacy_io - Checks whether the legacy IO 11 + * commands are supported 12 + * @dev: VF pci_dev 13 + * 14 + * Returns true on success. 15 + */ 16 + bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev) 17 + { 18 + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); 19 + struct virtio_pci_device *vp_dev; 20 + 21 + if (!virtio_dev) 22 + return false; 23 + 24 + if (!virtio_has_feature(virtio_dev, VIRTIO_F_ADMIN_VQ)) 25 + return false; 26 + 27 + vp_dev = to_vp_device(virtio_dev); 28 + 29 + if ((vp_dev->admin_vq.supported_cmds & VIRTIO_LEGACY_ADMIN_CMD_BITMAP) == 30 + VIRTIO_LEGACY_ADMIN_CMD_BITMAP) 31 + return true; 32 + return false; 33 + } 34 + EXPORT_SYMBOL_GPL(virtio_pci_admin_has_legacy_io); 35 + 36 + static int virtio_pci_admin_legacy_io_write(struct pci_dev *pdev, u16 opcode, 37 + u8 offset, u8 size, u8 *buf) 38 + { 39 + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); 40 + struct virtio_admin_cmd_legacy_wr_data *data; 41 + struct virtio_admin_cmd cmd = {}; 42 + struct scatterlist data_sg; 43 + int vf_id; 44 + int ret; 45 + 46 + if (!virtio_dev) 47 + return -ENODEV; 48 + 49 + vf_id = pci_iov_vf_id(pdev); 50 + if (vf_id < 0) 51 + return vf_id; 52 + 53 + data = kzalloc(sizeof(*data) + size, GFP_KERNEL); 54 + if (!data) 55 + return -ENOMEM; 56 + 57 + data->offset = offset; 58 + memcpy(data->registers, buf, size); 59 + sg_init_one(&data_sg, data, sizeof(*data) + size); 60 + cmd.opcode = cpu_to_le16(opcode); 61 + cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV); 62 + cmd.group_member_id = cpu_to_le64(vf_id + 1); 63 + cmd.data_sg = &data_sg; 64 + ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd); 65 + 66 + kfree(data); 67 + return ret; 68 + } 69 + 70 + /* 71 + * 
virtio_pci_admin_legacy_common_io_write - Write legacy common configuration 72 + * of a member device 73 + * @pdev: VF pci_dev 74 + * @offset: starting byte offset within the common configuration area to write to 75 + * @size: size of the data to write 76 + * @buf: buffer which holds the data 77 + * 78 + * Note: caller must serialize access for the given device. 79 + * Returns 0 on success, or negative on failure. 80 + */ 81 + int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset, 82 + u8 size, u8 *buf) 83 + { 84 + return virtio_pci_admin_legacy_io_write(pdev, 85 + VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE, 86 + offset, size, buf); 87 + } 88 + EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_write); 89 + 90 + /* 91 + * virtio_pci_admin_legacy_device_io_write - Write legacy device configuration 92 + * of a member device 93 + * @pdev: VF pci_dev 94 + * @offset: starting byte offset within the device configuration area to write to 95 + * @size: size of the data to write 96 + * @buf: buffer which holds the data 97 + * 98 + * Note: caller must serialize access for the given device. 99 + * Returns 0 on success, or negative on failure. 
100 + */ 101 + int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset, 102 + u8 size, u8 *buf) 103 + { 104 + return virtio_pci_admin_legacy_io_write(pdev, 105 + VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE, 106 + offset, size, buf); 107 + } 108 + EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_write); 109 + 110 + static int virtio_pci_admin_legacy_io_read(struct pci_dev *pdev, u16 opcode, 111 + u8 offset, u8 size, u8 *buf) 112 + { 113 + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); 114 + struct virtio_admin_cmd_legacy_rd_data *data; 115 + struct scatterlist data_sg, result_sg; 116 + struct virtio_admin_cmd cmd = {}; 117 + int vf_id; 118 + int ret; 119 + 120 + if (!virtio_dev) 121 + return -ENODEV; 122 + 123 + vf_id = pci_iov_vf_id(pdev); 124 + if (vf_id < 0) 125 + return vf_id; 126 + 127 + data = kzalloc(sizeof(*data), GFP_KERNEL); 128 + if (!data) 129 + return -ENOMEM; 130 + 131 + data->offset = offset; 132 + sg_init_one(&data_sg, data, sizeof(*data)); 133 + sg_init_one(&result_sg, buf, size); 134 + cmd.opcode = cpu_to_le16(opcode); 135 + cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV); 136 + cmd.group_member_id = cpu_to_le64(vf_id + 1); 137 + cmd.data_sg = &data_sg; 138 + cmd.result_sg = &result_sg; 139 + ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd); 140 + 141 + kfree(data); 142 + return ret; 143 + } 144 + 145 + /* 146 + * virtio_pci_admin_legacy_device_io_read - Read legacy device configuration of 147 + * a member device 148 + * @dev: VF pci_dev 149 + * @offset: starting byte offset within the device configuration area to read from 150 + * @size: size of the data to be read 151 + * @buf: buffer to hold the returned data 152 + * 153 + * Note: caller must serialize access for the given device. 154 + * Returns 0 on success, or negative on failure. 
155 + */ 156 + int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset, 157 + u8 size, u8 *buf) 158 + { 159 + return virtio_pci_admin_legacy_io_read(pdev, 160 + VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ, 161 + offset, size, buf); 162 + } 163 + EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_device_io_read); 164 + 165 + /* 166 + * virtio_pci_admin_legacy_common_io_read - Read legacy common configuration of 167 + * a member device 168 + * @dev: VF pci_dev 169 + * @offset: starting byte offset within the common configuration area to read from 170 + * @size: size of the data to be read 171 + * @buf: buffer to hold the returned data 172 + * 173 + * Note: caller must serialize access for the given device. 174 + * Returns 0 on success, or negative on failure. 175 + */ 176 + int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset, 177 + u8 size, u8 *buf) 178 + { 179 + return virtio_pci_admin_legacy_io_read(pdev, 180 + VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ, 181 + offset, size, buf); 182 + } 183 + EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_common_io_read); 184 + 185 + /* 186 + * virtio_pci_admin_legacy_io_notify_info - Read the queue notification 187 + * information for legacy interface 188 + * @dev: VF pci_dev 189 + * @req_bar_flags: requested bar flags 190 + * @bar: on output the BAR number of the owner or member device 191 + * @bar_offset: on output the offset within bar 192 + * 193 + * Returns 0 on success, or negative on failure. 
194 + */ 195 + int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev, 196 + u8 req_bar_flags, u8 *bar, 197 + u64 *bar_offset) 198 + { 199 + struct virtio_device *virtio_dev = virtio_pci_vf_get_pf_dev(pdev); 200 + struct virtio_admin_cmd_notify_info_result *result; 201 + struct virtio_admin_cmd cmd = {}; 202 + struct scatterlist result_sg; 203 + int vf_id; 204 + int ret; 205 + 206 + if (!virtio_dev) 207 + return -ENODEV; 208 + 209 + vf_id = pci_iov_vf_id(pdev); 210 + if (vf_id < 0) 211 + return vf_id; 212 + 213 + result = kzalloc(sizeof(*result), GFP_KERNEL); 214 + if (!result) 215 + return -ENOMEM; 216 + 217 + sg_init_one(&result_sg, result, sizeof(*result)); 218 + cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO); 219 + cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV); 220 + cmd.group_member_id = cpu_to_le64(vf_id + 1); 221 + cmd.result_sg = &result_sg; 222 + ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd); 223 + if (!ret) { 224 + struct virtio_admin_cmd_notify_info_data *entry; 225 + int i; 226 + 227 + ret = -ENOENT; 228 + for (i = 0; i < VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO; i++) { 229 + entry = &result->entries[i]; 230 + if (entry->flags == VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END) 231 + break; 232 + if (entry->flags != req_bar_flags) 233 + continue; 234 + *bar = entry->bar; 235 + *bar_offset = le64_to_cpu(entry->offset); 236 + ret = 0; 237 + break; 238 + } 239 + } 240 + 241 + kfree(result); 242 + return ret; 243 + } 244 + EXPORT_SYMBOL_GPL(virtio_pci_admin_legacy_io_notify_info);
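virtio_pci_admin_has_legacy_io() at the top of this file accepts a device only when every one of the five legacy admin commands is advertised by the PF, using a bitmap-subset test against VIRTIO_LEGACY_ADMIN_CMD_BITMAP. The check in isolation (the bit positions below are illustrative stand-ins for the spec opcodes, not authoritative values):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Subset test used by virtio_pci_admin_has_legacy_io(): legacy I/O is
 * usable only if ALL required legacy admin commands are supported.
 */
#define CMD_COMMON_CFG_WRITE	(1ULL << 2)
#define CMD_COMMON_CFG_READ	(1ULL << 3)
#define CMD_DEV_CFG_WRITE	(1ULL << 4)
#define CMD_DEV_CFG_READ	(1ULL << 5)
#define CMD_NOTIFY_INFO		(1ULL << 6)

#define LEGACY_ADMIN_CMD_BITMAP					\
	(CMD_COMMON_CFG_WRITE | CMD_COMMON_CFG_READ |		\
	 CMD_DEV_CFG_WRITE | CMD_DEV_CFG_READ | CMD_NOTIFY_INFO)

/* (supported & required) == required: every required bit must be set;
 * extra supported commands are fine, any missing one fails the check. */
static bool has_legacy_io(uint64_t supported_cmds)
{
	return (supported_cmds & LEGACY_ADMIN_CMD_BITMAP) ==
	       LEGACY_ADMIN_CMD_BITMAP;
}
```

Masking before comparing (rather than `supported == required`) is what lets a device advertise additional admin commands without failing the probe.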
+14
drivers/virtio/virtio_pci_common.c
···
 	int i;
 
 	list_for_each_entry_safe(vq, n, &vdev->vqs, list) {
+		if (vp_dev->is_avq(vdev, vq->index))
+			continue;
+
 		if (vp_dev->per_vq_vectors) {
 			int v = vp_dev->vqs[vq->index]->msix_vector;
 
···
 #endif
 	.sriov_configure = virtio_pci_sriov_configure,
 };
+
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev)
+{
+	struct virtio_pci_device *pf_vp_dev;
+
+	pf_vp_dev = pci_iov_get_pf_drvdata(pdev, &virtio_pci_driver);
+	if (IS_ERR(pf_vp_dev))
+		return NULL;
+
+	return &pf_vp_dev->vdev;
+}
 
 module_pci_driver(virtio_pci_driver);
 
+41 -1
drivers/virtio/virtio_pci_common.h
···
 #include <linux/virtio_pci_modern.h>
 #include <linux/highmem.h>
 #include <linux/spinlock.h>
+#include <linux/mutex.h>
 
 struct virtio_pci_vq_info {
 	/* the actual virtqueue */
···
 
 	/* MSI-X vector (or none) */
 	unsigned int msix_vector;
+};
+
+struct virtio_pci_admin_vq {
+	/* Virtqueue info associated with this admin queue. */
+	struct virtio_pci_vq_info info;
+	/* serializing admin commands execution and virtqueue deletion */
+	struct mutex cmd_lock;
+	u64 supported_cmds;
+	/* Name of the admin queue: avq.$vq_index. */
+	char name[10];
+	u16 vq_index;
 };
 
 /* Our device structure */
···
 	spinlock_t lock;
 	struct list_head virtqueues;
 
-	/* array of all queues for house-keeping */
+	/* Array of all virtqueues reported in the
+	 * PCI common config num_queues field
+	 */
 	struct virtio_pci_vq_info **vqs;
+
+	struct virtio_pci_admin_vq admin_vq;
 
 	/* MSI-X support */
 	int msix_enabled;
···
 	void (*del_vq)(struct virtio_pci_vq_info *info);
 
 	u16 (*config_vector)(struct virtio_pci_device *vp_dev, u16 vector);
+	bool (*is_avq)(struct virtio_device *vdev, unsigned int index);
 };
 
 /* Constants for MSI-X */
···
 #endif
 int virtio_pci_modern_probe(struct virtio_pci_device *);
 void virtio_pci_modern_remove(struct virtio_pci_device *);
+
+struct virtio_device *virtio_pci_vf_get_pf_dev(struct pci_dev *pdev);
+
+#define VIRTIO_LEGACY_ADMIN_CMD_BITMAP \
+	(BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ) | \
+	 BIT_ULL(VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO))
+
+/* Unlike modern drivers which support hardware virtio devices, legacy drivers
+ * assume software-based devices: e.g. they don't use proper memory barriers
+ * on ARM, use big endian on PPC, etc. X86 drivers are mostly ok though, more
+ * or less by chance. For now, only support legacy IO on X86.
+ */
+#ifdef CONFIG_VIRTIO_PCI_ADMIN_LEGACY
+#define VIRTIO_ADMIN_CMD_BITMAP VIRTIO_LEGACY_ADMIN_CMD_BITMAP
+#else
+#define VIRTIO_ADMIN_CMD_BITMAP 0
+#endif
+
+int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
+			     struct virtio_admin_cmd *cmd);
 
 #endif
+257 -2
drivers/virtio/virtio_pci_modern.c
···
 #define VIRTIO_RING_NO_LEGACY
 #include "virtio_pci_common.h"
 
+#define VIRTIO_AVQ_SGS_MAX	4
+
 static u64 vp_get_features(struct virtio_device *vdev)
 {
 	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
 
 	return vp_modern_get_features(&vp_dev->mdev);
+}
+
+static bool vp_is_avq(struct virtio_device *vdev, unsigned int index)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return false;
+
+	return index == vp_dev->admin_vq.vq_index;
+}
+
+static int virtqueue_exec_admin_cmd(struct virtio_pci_admin_vq *admin_vq,
+				    u16 opcode,
+				    struct scatterlist **sgs,
+				    unsigned int out_num,
+				    unsigned int in_num,
+				    void *data)
+{
+	struct virtqueue *vq;
+	int ret, len;
+
+	vq = admin_vq->info.vq;
+	if (!vq)
+		return -EIO;
+
+	if (opcode != VIRTIO_ADMIN_CMD_LIST_QUERY &&
+	    opcode != VIRTIO_ADMIN_CMD_LIST_USE &&
+	    !((1ULL << opcode) & admin_vq->supported_cmds))
+		return -EOPNOTSUPP;
+
+	ret = virtqueue_add_sgs(vq, sgs, out_num, in_num, data, GFP_KERNEL);
+	if (ret < 0)
+		return -EIO;
+
+	if (unlikely(!virtqueue_kick(vq)))
+		return -EIO;
+
+	while (!virtqueue_get_buf(vq, &len) &&
+	       !virtqueue_is_broken(vq))
+		cpu_relax();
+
+	if (virtqueue_is_broken(vq))
+		return -EIO;
+
+	return 0;
+}
+
+int vp_modern_admin_cmd_exec(struct virtio_device *vdev,
+			     struct virtio_admin_cmd *cmd)
+{
+	struct scatterlist *sgs[VIRTIO_AVQ_SGS_MAX], hdr, stat;
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_admin_cmd_status *va_status;
+	unsigned int out_num = 0, in_num = 0;
+	struct virtio_admin_cmd_hdr *va_hdr;
+	u16 status;
+	int ret;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return -EOPNOTSUPP;
+
+	va_status = kzalloc(sizeof(*va_status), GFP_KERNEL);
+	if (!va_status)
+		return -ENOMEM;
+
+	va_hdr = kzalloc(sizeof(*va_hdr), GFP_KERNEL);
+	if (!va_hdr) {
+		ret = -ENOMEM;
+		goto err_alloc;
+	}
+
+	va_hdr->opcode = cmd->opcode;
+	va_hdr->group_type = cmd->group_type;
+	va_hdr->group_member_id = cmd->group_member_id;
+
+	/* Add header */
+	sg_init_one(&hdr, va_hdr, sizeof(*va_hdr));
+	sgs[out_num] = &hdr;
+	out_num++;
+
+	if (cmd->data_sg) {
+		sgs[out_num] = cmd->data_sg;
+		out_num++;
+	}
+
+	/* Add return status */
+	sg_init_one(&stat, va_status, sizeof(*va_status));
+	sgs[out_num + in_num] = &stat;
+	in_num++;
+
+	if (cmd->result_sg) {
+		sgs[out_num + in_num] = cmd->result_sg;
+		in_num++;
+	}
+
+	mutex_lock(&vp_dev->admin_vq.cmd_lock);
+	ret = virtqueue_exec_admin_cmd(&vp_dev->admin_vq,
+				       le16_to_cpu(cmd->opcode),
+				       sgs, out_num, in_num, sgs);
+	mutex_unlock(&vp_dev->admin_vq.cmd_lock);
+
+	if (ret) {
+		dev_err(&vdev->dev,
+			"Failed to execute command on admin vq: %d\n.", ret);
+		goto err_cmd_exec;
+	}
+
+	status = le16_to_cpu(va_status->status);
+	if (status != VIRTIO_ADMIN_STATUS_OK) {
+		dev_err(&vdev->dev,
+			"admin command error: status(%#x) qualifier(%#x)\n",
+			status, le16_to_cpu(va_status->status_qualifier));
+		ret = -status;
+	}
+
+err_cmd_exec:
+	kfree(va_hdr);
+err_alloc:
+	kfree(va_status);
+	return ret;
+}
+
+static void virtio_pci_admin_cmd_list_init(struct virtio_device *virtio_dev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(virtio_dev);
+	struct virtio_admin_cmd cmd = {};
+	struct scatterlist result_sg;
+	struct scatterlist data_sg;
+	__le64 *data;
+	int ret;
+
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return;
+
+	sg_init_one(&result_sg, data, sizeof(*data));
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_QUERY);
+	cmd.group_type = cpu_to_le16(VIRTIO_ADMIN_GROUP_TYPE_SRIOV);
+	cmd.result_sg = &result_sg;
+
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+	if (ret)
+		goto end;
+
+	*data &= cpu_to_le64(VIRTIO_ADMIN_CMD_BITMAP);
+	sg_init_one(&data_sg, data, sizeof(*data));
+	cmd.opcode = cpu_to_le16(VIRTIO_ADMIN_CMD_LIST_USE);
+	cmd.data_sg = &data_sg;
+	cmd.result_sg = NULL;
+
+	ret = vp_modern_admin_cmd_exec(virtio_dev, &cmd);
+	if (ret)
+		goto end;
+
+	vp_dev->admin_vq.supported_cmds = le64_to_cpu(*data);
+end:
+	kfree(data);
+}
+
+static void vp_modern_avq_activate(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	__virtqueue_unbreak(admin_vq->info.vq);
+	virtio_pci_admin_cmd_list_init(vdev);
+}
+
+static void vp_modern_avq_deactivate(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *admin_vq = &vp_dev->admin_vq;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	__virtqueue_break(admin_vq->info.vq);
 }
 
 static void vp_transport_features(struct virtio_device *vdev, u64 features)
···
 
 	if (features & BIT_ULL(VIRTIO_F_RING_RESET))
 		__virtio_set_bit(vdev, VIRTIO_F_RING_RESET);
+
+	if (features & BIT_ULL(VIRTIO_F_ADMIN_VQ))
+		__virtio_set_bit(vdev, VIRTIO_F_ADMIN_VQ);
 }
 
 static int __vp_check_common_size_one_feature(struct virtio_device *vdev, u32 fbit,
···
 		return -EINVAL;
 
 	if (vp_check_common_size_one_feature(vdev, VIRTIO_F_RING_RESET, queue_reset))
+		return -EINVAL;
+
+	if (vp_check_common_size_one_feature(vdev, VIRTIO_F_ADMIN_VQ, admin_queue_num))
 		return -EINVAL;
 
 	return 0;
···
 	/* We should never be setting status to 0. */
 	BUG_ON(status == 0);
 	vp_modern_set_status(&vp_dev->mdev, status);
+	if (status & VIRTIO_CONFIG_S_DRIVER_OK)
+		vp_modern_avq_activate(vdev);
 }
 
 static void vp_reset(struct virtio_device *vdev)
···
 	 */
 	while (vp_modern_get_status(mdev))
 		msleep(1);
+
+	vp_modern_avq_deactivate(vdev);
+
 	/* Flush pending VQ/configuration callbacks. */
 	vp_synchronize_vectors(vdev);
 }
···
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
 	bool (*notify)(struct virtqueue *vq);
 	struct virtqueue *vq;
+	bool is_avq;
 	u16 num;
 	int err;
 
···
 	else
 		notify = vp_notify;
 
-	if (index >= vp_modern_get_num_queues(mdev))
+	is_avq = vp_is_avq(&vp_dev->vdev, index);
+	if (index >= vp_modern_get_num_queues(mdev) && !is_avq)
 		return ERR_PTR(-EINVAL);
 
+	num = is_avq ?
+		VIRTIO_AVQ_SGS_MAX : vp_modern_get_queue_size(mdev, index);
 	/* Check if queue is either not available or already active. */
-	num = vp_modern_get_queue_size(mdev, index);
 	if (!num || vp_modern_get_queue_enable(mdev, index))
 		return ERR_PTR(-ENOENT);
 
···
 	if (!vq->priv) {
 		err = -ENOMEM;
 		goto err;
+	}
+
+	if (is_avq) {
+		mutex_lock(&vp_dev->admin_vq.cmd_lock);
+		vp_dev->admin_vq.info.vq = vq;
+		mutex_unlock(&vp_dev->admin_vq.cmd_lock);
 	}
 
 	return vq;
···
 	struct virtqueue *vq = info->vq;
 	struct virtio_pci_device *vp_dev = to_vp_device(vq->vdev);
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
+
+	if (vp_is_avq(&vp_dev->vdev, vq->index)) {
+		mutex_lock(&vp_dev->admin_vq.cmd_lock);
+		vp_dev->admin_vq.info.vq = NULL;
+		mutex_unlock(&vp_dev->admin_vq.cmd_lock);
+	}
 
 	if (vp_dev->msix_enabled)
 		vp_modern_queue_vector(mdev, vq->index,
···
 	return true;
 }
 
+static int vp_modern_create_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+	struct virtio_pci_admin_vq *avq;
+	struct virtqueue *vq;
+	u16 admin_q_num;
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return 0;
+
+	admin_q_num = vp_modern_avq_num(&vp_dev->mdev);
+	if (!admin_q_num)
+		return -EINVAL;
+
+	avq = &vp_dev->admin_vq;
+	avq->vq_index = vp_modern_avq_index(&vp_dev->mdev);
+	sprintf(avq->name, "avq.%u", avq->vq_index);
+	vq = vp_dev->setup_vq(vp_dev, &vp_dev->admin_vq.info, avq->vq_index, NULL,
+			      avq->name, NULL, VIRTIO_MSI_NO_VECTOR);
+	if (IS_ERR(vq)) {
+		dev_err(&vdev->dev, "failed to setup admin virtqueue, err=%ld",
+			PTR_ERR(vq));
+		return PTR_ERR(vq);
+	}
+
+	vp_modern_set_queue_enable(&vp_dev->mdev, avq->info.vq->index, true);
+	return 0;
+}
+
+static void vp_modern_destroy_avq(struct virtio_device *vdev)
+{
+	struct virtio_pci_device *vp_dev = to_vp_device(vdev);
+
+	if (!virtio_has_feature(vdev, VIRTIO_F_ADMIN_VQ))
+		return;
+
+	vp_dev->del_vq(&vp_dev->admin_vq.info);
+}
+
 static const struct virtio_config_ops virtio_pci_config_nodev_ops = {
 	.get = NULL,
 	.set = NULL,
···
 	.get_shm_region = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_modern_create_avq,
+	.destroy_avq = vp_modern_destroy_avq,
 };
 
 static const struct virtio_config_ops virtio_pci_config_ops = {
···
 	.get_shm_region = vp_get_shm_region,
 	.disable_vq_and_reset = vp_modern_disable_vq_and_reset,
 	.enable_vq_after_reset = vp_modern_enable_vq_after_reset,
+	.create_avq = vp_modern_create_avq,
+	.destroy_avq = vp_modern_destroy_avq,
 };
 
 /* the PCI probing function */
···
 	vp_dev->config_vector = vp_config_vector;
 	vp_dev->setup_vq = setup_vq;
 	vp_dev->del_vq = del_vq;
+	vp_dev->is_avq = vp_is_avq;
 	vp_dev->isr = mdev->isr;
 	vp_dev->vdev.id = mdev->id;
 
+	mutex_init(&vp_dev->admin_vq.cmd_lock);
 	return 0;
 }
 
···
 {
 	struct virtio_pci_modern_device *mdev = &vp_dev->mdev;
 
+	mutex_destroy(&vp_dev->admin_vq.cmd_lock);
 	vp_modern_remove(mdev);
 }
+23 -1
drivers/virtio/virtio_pci_modern_dev.c
···
 		     offsetof(struct virtio_pci_modern_common_cfg, queue_notify_data));
 	BUILD_BUG_ON(VIRTIO_PCI_COMMON_Q_RESET !=
 		     offsetof(struct virtio_pci_modern_common_cfg, queue_reset));
+	BUILD_BUG_ON(VIRTIO_PCI_COMMON_ADM_Q_IDX !=
+		     offsetof(struct virtio_pci_modern_common_cfg, admin_queue_index));
+	BUILD_BUG_ON(VIRTIO_PCI_COMMON_ADM_Q_NUM !=
+		     offsetof(struct virtio_pci_modern_common_cfg, admin_queue_num));
 }
 
 /*
···
 	mdev->common = vp_modern_map_capability(mdev, common,
 				      sizeof(struct virtio_pci_common_cfg), 4, 0,
 				      offsetofend(struct virtio_pci_modern_common_cfg,
-						  queue_reset),
+						  admin_queue_num),
 				      &mdev->common_len, NULL);
 	if (!mdev->common)
 		goto err_map_common;
···
 	}
 }
 EXPORT_SYMBOL_GPL(vp_modern_map_vq_notify);
+
+u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_num);
+}
+EXPORT_SYMBOL_GPL(vp_modern_avq_num);
+
+u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev)
+{
+	struct virtio_pci_modern_common_cfg __iomem *cfg;
+
+	cfg = (struct virtio_pci_modern_common_cfg __iomem *)mdev->common;
+	return vp_ioread16(&cfg->admin_queue_index);
+}
+EXPORT_SYMBOL_GPL(vp_modern_avq_index);
 
 MODULE_VERSION("0.1");
 MODULE_DESCRIPTION("Modern Virtio PCI Device");
+7
include/linux/vfio.h
···
 	u8 iommufd_attached:1;
 #endif
 	u8 cdev_opened:1;
+#ifdef CONFIG_DEBUG_FS
+	/*
+	 * debug_root is a static property of the vfio_device
+	 * which must be set prior to registering the vfio_device.
+	 */
+	struct dentry *debug_root;
+#endif
 };
 
 /**
+20
include/linux/vfio_pci_core.h
···
 int vfio_pci_core_enable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_disable(struct vfio_pci_core_device *vdev);
 void vfio_pci_core_finish_enable(struct vfio_pci_core_device *vdev);
+int vfio_pci_core_setup_barmap(struct vfio_pci_core_device *vdev, int bar);
 pci_ers_result_t vfio_pci_core_aer_err_detected(struct pci_dev *pdev,
 						pci_channel_state_t state);
+
+#define VFIO_IOWRITE_DECLATION(size) \
+int vfio_pci_core_iowrite##size(struct vfio_pci_core_device *vdev,	\
+			bool test_mem, u##size val, void __iomem *io);
+
+VFIO_IOWRITE_DECLATION(8)
+VFIO_IOWRITE_DECLATION(16)
+VFIO_IOWRITE_DECLATION(32)
+#ifdef iowrite64
+VFIO_IOWRITE_DECLATION(64)
+#endif
+
+#define VFIO_IOREAD_DECLATION(size) \
+int vfio_pci_core_ioread##size(struct vfio_pci_core_device *vdev,	\
+			bool test_mem, u##size *val, void __iomem *io);
+
+VFIO_IOREAD_DECLATION(8)
+VFIO_IOREAD_DECLATION(16)
+VFIO_IOREAD_DECLATION(32)
 
 #endif /* VFIO_PCI_CORE_H */
+8
include/linux/virtio.h
···
 int virtqueue_reset(struct virtqueue *vq,
 		    void (*recycle)(struct virtqueue *vq, void *buf));
 
+struct virtio_admin_cmd {
+	__le16 opcode;
+	__le16 group_type;
+	__le64 group_member_id;
+	struct scatterlist *data_sg;
+	struct scatterlist *result_sg;
+};
+
 /**
  * struct virtio_device - representation of a device using virtio
  * @index: unique position on the virtio bus
+4
include/linux/virtio_config.h
···
  * Returns 0 on success or error status
  * If disable_vq_and_reset is set, then enable_vq_after_reset must also be
  * set.
+ * @create_avq: create admin virtqueue resource.
+ * @destroy_avq: destroy admin virtqueue resource.
  */
 struct virtio_config_ops {
 	void (*get)(struct virtio_device *vdev, unsigned offset,
···
 			       struct virtio_shm_region *region, u8 id);
 	int (*disable_vq_and_reset)(struct virtqueue *vq);
 	int (*enable_vq_after_reset)(struct virtqueue *vq);
+	int (*create_avq)(struct virtio_device *vdev);
+	void (*destroy_avq)(struct virtio_device *vdev);
 };
 
 /* If driver didn't advertise the feature, it will never appear. */
+23
include/linux/virtio_pci_admin.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_VIRTIO_PCI_ADMIN_H
+#define _LINUX_VIRTIO_PCI_ADMIN_H
+
+#include <linux/types.h>
+#include <linux/pci.h>
+
+#ifdef CONFIG_VIRTIO_PCI_ADMIN_LEGACY
+bool virtio_pci_admin_has_legacy_io(struct pci_dev *pdev);
+int virtio_pci_admin_legacy_common_io_write(struct pci_dev *pdev, u8 offset,
+					    u8 size, u8 *buf);
+int virtio_pci_admin_legacy_common_io_read(struct pci_dev *pdev, u8 offset,
+					   u8 size, u8 *buf);
+int virtio_pci_admin_legacy_device_io_write(struct pci_dev *pdev, u8 offset,
+					    u8 size, u8 *buf);
+int virtio_pci_admin_legacy_device_io_read(struct pci_dev *pdev, u8 offset,
+					   u8 size, u8 *buf);
+int virtio_pci_admin_legacy_io_notify_info(struct pci_dev *pdev,
+					   u8 req_bar_flags, u8 *bar,
+					   u64 *bar_offset);
+#endif
+
+#endif /* _LINUX_VIRTIO_PCI_ADMIN_H */
+2
include/linux/virtio_pci_modern.h
···
 void vp_modern_remove(struct virtio_pci_modern_device *mdev);
 int vp_modern_get_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
 void vp_modern_set_queue_reset(struct virtio_pci_modern_device *mdev, u16 index);
+u16 vp_modern_avq_num(struct virtio_pci_modern_device *mdev);
+u16 vp_modern_avq_index(struct virtio_pci_modern_device *mdev);
 #endif
+1
include/uapi/linux/vfio.h
···
 	VFIO_DEVICE_STATE_RUNNING_P2P = 5,
 	VFIO_DEVICE_STATE_PRE_COPY = 6,
 	VFIO_DEVICE_STATE_PRE_COPY_P2P = 7,
+	VFIO_DEVICE_STATE_NR,
 };
 
 /**
+7 -1
include/uapi/linux/virtio_config.h
···
  * rest are per-device feature bits.
  */
 #define VIRTIO_TRANSPORT_F_START	28
-#define VIRTIO_TRANSPORT_F_END		41
+#define VIRTIO_TRANSPORT_F_END		42
 
 #ifndef VIRTIO_CONFIG_NO_LEGACY
 /* Do we get callbacks when the ring is completely used, even if we've
···
  * This feature indicates that the driver can reset a queue individually.
  */
 #define VIRTIO_F_RING_RESET		40
+
+/*
+ * This feature indicates that the device supports administration virtqueues.
+ */
+#define VIRTIO_F_ADMIN_VQ		41
+
 #endif /* _UAPI_LINUX_VIRTIO_CONFIG_H */
+68
include/uapi/linux/virtio_pci.h
···
 
 	__le16 queue_notify_data;	/* read-write */
 	__le16 queue_reset;		/* read-write */
+
+	__le16 admin_queue_index;	/* read-only */
+	__le16 admin_queue_num;		/* read-only */
 };
 
 /* Fields in VIRTIO_PCI_CAP_PCI_CFG: */
···
 #define VIRTIO_PCI_COMMON_Q_USEDHI	52
 #define VIRTIO_PCI_COMMON_Q_NDATA	56
 #define VIRTIO_PCI_COMMON_Q_RESET	58
+#define VIRTIO_PCI_COMMON_ADM_Q_IDX	60
+#define VIRTIO_PCI_COMMON_ADM_Q_NUM	62
 
 #endif /* VIRTIO_PCI_NO_MODERN */
+
+/* Admin command status. */
+#define VIRTIO_ADMIN_STATUS_OK		0
+
+/* Admin command opcode. */
+#define VIRTIO_ADMIN_CMD_LIST_QUERY	0x0
+#define VIRTIO_ADMIN_CMD_LIST_USE	0x1
+
+/* Admin command group type. */
+#define VIRTIO_ADMIN_GROUP_TYPE_SRIOV	0x1
+
+/* Transitional device admin command. */
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_WRITE	0x2
+#define VIRTIO_ADMIN_CMD_LEGACY_COMMON_CFG_READ		0x3
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_WRITE		0x4
+#define VIRTIO_ADMIN_CMD_LEGACY_DEV_CFG_READ		0x5
+#define VIRTIO_ADMIN_CMD_LEGACY_NOTIFY_INFO		0x6
+
+struct __packed virtio_admin_cmd_hdr {
+	__le16 opcode;
+	/*
+	 * 1 - SR-IOV
+	 * 2-65535 - reserved
+	 */
+	__le16 group_type;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved1[12];
+	__le64 group_member_id;
+};
+
+struct __packed virtio_admin_cmd_status {
+	__le16 status;
+	__le16 status_qualifier;
+	/* Unused, reserved for future extensions. */
+	__u8 reserved2[4];
+};
+
+struct __packed virtio_admin_cmd_legacy_wr_data {
+	__u8 offset; /* Starting offset of the register(s) to write. */
+	__u8 reserved[7];
+	__u8 registers[];
+};
+
+struct __packed virtio_admin_cmd_legacy_rd_data {
+	__u8 offset; /* Starting offset of the register(s) to read. */
+};
+
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_END		0
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_DEV	0x1
+#define VIRTIO_ADMIN_CMD_NOTIFY_INFO_FLAGS_OWNER_MEM	0x2
+
+#define VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO	4
+
+struct __packed virtio_admin_cmd_notify_info_data {
+	__u8 flags; /* 0 = end of list, 1 = owner device, 2 = member device */
+	__u8 bar; /* BAR of the member or the owner device */
+	__u8 padding[6];
+	__le64 offset; /* Offset within bar. */
+};
+
+struct virtio_admin_cmd_notify_info_result {
+	struct virtio_admin_cmd_notify_info_data entries[VIRTIO_ADMIN_CMD_MAX_NOTIFY_INFO];
+};
 
 #endif