Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio

Pull VFIO updates from Alex Williamson:

- Move libvfio selftest artifacts in preparation for more tightly
coupled integration with KVM selftests (David Matlack)

- Fix comment typo in mtty driver (Chu Guangqing)

- Support for a new hardware revision in the hisi_acc vfio-pci variant
driver, where the migration registers can now be accessed via the PF.
When this support is enabled, the full BAR can be exposed to the
user (Longfang Liu)

- Fix vfio cdev support for VF token passing by using the correct size
for the kernel structure, thereby actually allowing userspace to
provide a non-zero UUID token. Also set the match token callback for
the hisi_acc driver, fixing VF token support for this vfio-pci
variant driver (Raghavendra Rao Ananta)

- Introduce internal callbacks on vfio devices to simplify and
consolidate duplicate code for generating VFIO_DEVICE_GET_REGION_INFO
data, replacing various ioctl intercepts with a more structured
solution (Jason Gunthorpe)

- Introduce dma-buf support for vfio-pci devices, allowing MMIO regions
to be exposed through dma-buf objects with lifecycle managed through
move operations. This enables low-level interactions such as a
vfio-pci based SPDK driver interacting directly with dma-buf capable
RDMA devices to enable peer-to-peer operations. IOMMUFD is also now
able to build upon this support to fill a long-standing feature gap
versus the legacy vfio type1 IOMMU backend with an implementation of
P2P support for VM use cases that better manages the lifecycle of the
P2P mapping (Leon Romanovsky, Jason Gunthorpe, Vivek Kasireddy)

- Convert eventfd triggering for error and request signals to use RCU
mechanisms in order to avoid a lockdep-reported 3-way deadlock
(Alex Williamson)

- Fix a 32-bit overflow introduced via dma-buf support manifesting with
large DMA buffers (Alex Mastro)

- Convert the nvgrace-gpu vfio-pci variant driver to insert mappings on
fault rather than at mmap time. This conversion makes use of huge
PFNMAPs, avoids corrected RAS events during reset by becoming subject
to vfio-pci-core's use of unmap_mapping_range(), and enables a device
readiness test after reset (Ankit Agrawal)

- Refactoring of vfio selftests to support multi-device tests and split
code to provide better separation between IOMMU and device objects.
This work also enables a new test suite addition to measure parallel
device initialization latency (David Matlack)

* tag 'vfio-v6.19-rc1' of https://github.com/awilliam/linux-vfio: (65 commits)
vfio: selftests: Add vfio_pci_device_init_perf_test
vfio: selftests: Eliminate INVALID_IOVA
vfio: selftests: Split libvfio.h into separate header files
vfio: selftests: Move vfio_selftests_*() helpers into libvfio.c
vfio: selftests: Rename vfio_util.h to libvfio.h
vfio: selftests: Stop passing device for IOMMU operations
vfio: selftests: Move IOVA allocator into iova_allocator.c
vfio: selftests: Move IOMMU library code into iommu.c
vfio: selftests: Rename struct vfio_dma_region to dma_region
vfio: selftests: Upgrade driver logging to dev_err()
vfio: selftests: Prefix logs with device BDF where relevant
vfio: selftests: Eliminate overly chatty logging
vfio: selftests: Support multiple devices in the same container/iommufd
vfio: selftests: Introduce struct iommu
vfio: selftests: Rename struct vfio_iommu_mode to iommu_mode
vfio: selftests: Allow passing multiple BDFs on the command line
vfio: selftests: Split run.sh into separate scripts
vfio: selftests: Move run.sh into scripts directory
vfio/nvgrace-gpu: wait for the GPU mem to be ready
vfio/nvgrace-gpu: Inform devmem unmapped after reset
...

+3431 -1904
+73 -22
Documentation/driver-api/pci/p2pdma.rst
··· 9 9 called Peer-to-Peer (or P2P). However, there are a number of issues that 10 10 make P2P transactions tricky to do in a perfectly safe way. 11 11 12 - One of the biggest issues is that PCI doesn't require forwarding 13 - transactions between hierarchy domains, and in PCIe, each Root Port 14 - defines a separate hierarchy domain. To make things worse, there is no 15 - simple way to determine if a given Root Complex supports this or not. 16 - (See PCIe r4.0, sec 1.3.1). Therefore, as of this writing, the kernel 17 - only supports doing P2P when the endpoints involved are all behind the 18 - same PCI bridge, as such devices are all in the same PCI hierarchy 19 - domain, and the spec guarantees that all transactions within the 20 - hierarchy will be routable, but it does not require routing 21 - between hierarchies. 12 + For PCIe the routing of Transaction Layer Packets (TLPs) is well-defined up 13 + until they reach a host bridge or root port. If the path includes PCIe switches 14 + then based on the ACS settings the transaction can route entirely within 15 + the PCIe hierarchy and never reach the root port. The kernel will evaluate 16 + the PCIe topology and always permit P2P in these well-defined cases. 22 17 23 - The second issue is that to make use of existing interfaces in Linux, 24 - memory that is used for P2P transactions needs to be backed by struct 25 - pages. However, PCI BARs are not typically cache coherent so there are 26 - a few corner case gotchas with these pages so developers need to 27 - be careful about what they do with them. 18 + However, if the P2P transaction reaches the host bridge then it might have to 19 + hairpin back out the same root port, be routed inside the CPU SOC to another 20 + PCIe root port, or routed internally to the SOC. 21 + 22 + The PCIe specification doesn't define the forwarding of transactions between 23 + hierarchy domains and kernel defaults to blocking such routing. 
There is an 24 + allow list to allow detecting known-good HW, in which case P2P between any 25 + two PCIe devices will be permitted. 26 + 27 + Since P2P inherently is doing transactions between two devices it requires two 28 + drivers to be co-operating inside the kernel. The providing driver has to convey 29 + its MMIO to the consuming driver. To meet the driver model lifecycle rules the 30 + MMIO must have all DMA mapping removed, all CPU accesses prevented, all page 31 + table mappings undone before the providing driver completes remove(). 32 + 33 + This requires the providing and consuming driver to actively work together to 34 + guarantee that the consuming driver has stopped using the MMIO during a removal 35 + cycle. This is done by either a synchronous invalidation shutdown or waiting 36 + for all usage refcounts to reach zero. 37 + 38 + At the lowest level the P2P subsystem offers a naked struct p2p_provider that 39 + delegates lifecycle management to the providing driver. It is expected that 40 + drivers using this option will wrap their MMIO memory in DMABUF and use DMABUF 41 + to provide an invalidation shutdown. These MMIO addresses have no struct page, and 42 + if used with mmap() must create special PTEs. As such there are very few 43 + kernel uAPIs that can accept pointers to them; in particular they cannot be used 44 + with read()/write(), including O_DIRECT. 45 + 46 + Building on this, the subsystem offers a layer to wrap the MMIO in a ZONE_DEVICE 47 + pgmap of MEMORY_DEVICE_PCI_P2PDMA to create struct pages. The lifecycle of 48 + pgmap ensures that when the pgmap is destroyed all other drivers have stopped 49 + using the MMIO. This option works with O_DIRECT flows, in some cases, if the 50 + underlying subsystem supports handling MEMORY_DEVICE_PCI_P2PDMA through 51 + FOLL_PCI_P2PDMA. The use of FOLL_LONGTERM is prevented. As this relies on pgmap 52 + it also relies on architecture support along with alignment and minimum size 53 + limitations. 
28 54 29 55 30 56 Driver Writer's Guide ··· 140 114 Struct Page Caveats 141 115 ------------------- 142 116 143 - Driver writers should be very careful about not passing these special 144 - struct pages to code that isn't prepared for it. At this time, the kernel 145 - interfaces do not have any checks for ensuring this. This obviously 146 - precludes passing these pages to userspace. 117 + While the MEMORY_DEVICE_PCI_P2PDMA pages can be installed in VMAs, 118 + pin_user_pages() and related will not return them unless FOLL_PCI_P2PDMA is set. 147 119 148 - P2P memory is also technically IO memory but should never have any side 149 - effects behind it. Thus, the order of loads and stores should not be important 150 - and ioreadX(), iowriteX() and friends should not be necessary. 120 + The MEMORY_DEVICE_PCI_P2PDMA pages require care to support in the kernel. The 121 + KVA is still MMIO and must still be accessed through the normal 122 + readX()/writeX()/etc helpers. Direct CPU access (e.g. memcpy) is forbidden, just 123 + like any other MMIO mapping. While this will actually work on some 124 + architectures, others will experience corruption or just crash in the kernel. 125 + Supporting FOLL_PCI_P2PDMA in a subsystem requires scrubbing it to ensure no CPU 126 + access happens. 127 + 128 + 129 + Usage With DMABUF 130 + ================= 131 + 132 + DMABUF provides an alternative to the above struct page-based 133 + client/provider/orchestrator system and should be used when struct page 134 + doesn't exist. In this mode the exporting driver will wrap 135 + some of its MMIO in a DMABUF and give the DMABUF FD to userspace. 136 + 137 + Userspace can then pass the FD to an importing driver which will ask the 138 + exporting driver to map it to the importer. 139 + 140 + In this case the initiator and target pci_devices are known and the P2P subsystem 141 + is used to determine the mapping type. The phys_addr_t-based DMA API is used to 142 + establish the dma_addr_t. 
143 + 144 + Lifecycle is controlled by DMABUF move_notify(). When the exporting driver wants 145 + to remove() it must deliver an invalidation shutdown to all DMABUF importing 146 + drivers through move_notify() and synchronously DMA unmap all the MMIO. 147 + 148 + No importing driver can continue to have a DMA map to the MMIO after the 149 + exporting driver has destroyed its p2p_provider. 151 150 152 151 153 152 P2P DMA Support Library
+1 -1
block/blk-mq-dma.c
··· 84 84 85 85 static bool blk_dma_map_bus(struct blk_dma_iter *iter, struct phys_vec *vec) 86 86 { 87 - iter->addr = pci_p2pdma_bus_addr_map(&iter->p2pdma, vec->paddr); 87 + iter->addr = pci_p2pdma_bus_addr_map(iter->p2pdma.mem, vec->paddr); 88 88 iter->len = vec->len; 89 89 return true; 90 90 }
+27
drivers/crypto/hisilicon/qm.c
··· 3032 3032 pci_release_mem_regions(pdev); 3033 3033 } 3034 3034 3035 + static void hisi_mig_region_clear(struct hisi_qm *qm) 3036 + { 3037 + u32 val; 3038 + 3039 + /* Clear migration region set of PF */ 3040 + if (qm->fun_type == QM_HW_PF && qm->ver > QM_HW_V3) { 3041 + val = readl(qm->io_base + QM_MIG_REGION_SEL); 3042 + val &= ~QM_MIG_REGION_EN; 3043 + writel(val, qm->io_base + QM_MIG_REGION_SEL); 3044 + } 3045 + } 3046 + 3047 + static void hisi_mig_region_enable(struct hisi_qm *qm) 3048 + { 3049 + u32 val; 3050 + 3051 + /* Select migration region of PF */ 3052 + if (qm->fun_type == QM_HW_PF && qm->ver > QM_HW_V3) { 3053 + val = readl(qm->io_base + QM_MIG_REGION_SEL); 3054 + val |= QM_MIG_REGION_EN; 3055 + writel(val, qm->io_base + QM_MIG_REGION_SEL); 3056 + } 3057 + } 3058 + 3035 3059 static void hisi_qm_pci_uninit(struct hisi_qm *qm) 3036 3060 { 3037 3061 struct pci_dev *pdev = qm->pdev; 3038 3062 3039 3063 pci_free_irq_vectors(pdev); 3064 + hisi_mig_region_clear(qm); 3040 3065 qm_put_pci_res(qm); 3041 3066 pci_disable_device(pdev); 3042 3067 } ··· 5777 5752 goto err_free_qm_memory; 5778 5753 5779 5754 qm_cmd_init(qm); 5755 + hisi_mig_region_enable(qm); 5780 5756 5781 5757 return 0; 5782 5758 ··· 5916 5890 } 5917 5891 5918 5892 qm_cmd_init(qm); 5893 + hisi_mig_region_enable(qm); 5919 5894 hisi_qm_dev_err_init(qm); 5920 5895 /* Set the doorbell timeout to QM_DB_TIMEOUT_CFG ns. */ 5921 5896 writel(QM_DB_TIMEOUT_SET, qm->io_base + QM_DB_TIMEOUT_CFG);
+1 -1
drivers/dma-buf/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 2 obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \ 3 - dma-fence-unwrap.o dma-resv.o 3 + dma-fence-unwrap.o dma-resv.o dma-buf-mapping.o 4 4 obj-$(CONFIG_DMABUF_HEAPS) += dma-heap.o 5 5 obj-$(CONFIG_DMABUF_HEAPS) += heaps/ 6 6 obj-$(CONFIG_SYNC_FILE) += sync_file.o
+248
drivers/dma-buf/dma-buf-mapping.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * DMA BUF Mapping Helpers 4 + * 5 + */ 6 + #include <linux/dma-buf-mapping.h> 7 + #include <linux/dma-resv.h> 8 + 9 + static struct scatterlist *fill_sg_entry(struct scatterlist *sgl, size_t length, 10 + dma_addr_t addr) 11 + { 12 + unsigned int len, nents; 13 + int i; 14 + 15 + nents = DIV_ROUND_UP(length, UINT_MAX); 16 + for (i = 0; i < nents; i++) { 17 + len = min_t(size_t, length, UINT_MAX); 18 + length -= len; 19 + /* 20 + * DMABUF abuses scatterlist to create a scatterlist 21 + * that does not have any CPU list, only the DMA list. 22 + * Always set the page related values to NULL to ensure 23 + * importers can't use it. The phys_addr based DMA API 24 + * does not require the CPU list for mapping or unmapping. 25 + */ 26 + sg_set_page(sgl, NULL, 0, 0); 27 + sg_dma_address(sgl) = addr + (dma_addr_t)i * UINT_MAX; 28 + sg_dma_len(sgl) = len; 29 + sgl = sg_next(sgl); 30 + } 31 + 32 + return sgl; 33 + } 34 + 35 + static unsigned int calc_sg_nents(struct dma_iova_state *state, 36 + struct dma_buf_phys_vec *phys_vec, 37 + size_t nr_ranges, size_t size) 38 + { 39 + unsigned int nents = 0; 40 + size_t i; 41 + 42 + if (!state || !dma_use_iova(state)) { 43 + for (i = 0; i < nr_ranges; i++) 44 + nents += DIV_ROUND_UP(phys_vec[i].len, UINT_MAX); 45 + } else { 46 + /* 47 + * In IOVA case, there is only one SG entry which spans 48 + * for whole IOVA address space, but we need to make sure 49 + * that it fits sg->length, maybe we need more. 
50 + */ 51 + nents = DIV_ROUND_UP(size, UINT_MAX); 52 + } 53 + 54 + return nents; 55 + } 56 + 57 + /** 58 + * struct dma_buf_dma - holds DMA mapping information 59 + * @sgt: Scatter-gather table 60 + * @state: DMA IOVA state relevant in IOMMU-based DMA 61 + * @size: Total size of DMA transfer 62 + */ 63 + struct dma_buf_dma { 64 + struct sg_table sgt; 65 + struct dma_iova_state *state; 66 + size_t size; 67 + }; 68 + 69 + /** 70 + * dma_buf_phys_vec_to_sgt - Returns the scatterlist table of the attachment 71 + * from arrays of physical vectors. This function is intended for MMIO memory 72 + * only. 73 + * @attach: [in] attachment whose scatterlist is to be returned 74 + * @provider: [in] p2pdma provider 75 + * @phys_vec: [in] array of physical vectors 76 + * @nr_ranges: [in] number of entries in phys_vec array 77 + * @size: [in] total size of phys_vec 78 + * @dir: [in] direction of DMA transfer 79 + * 80 + * Returns sg_table containing the scatterlist to be returned; returns ERR_PTR 81 + * on error. May return -EINTR if it is interrupted by a signal. 82 + * 83 + * On success, the DMA addresses and lengths in the returned scatterlist are 84 + * PAGE_SIZE aligned. 85 + * 86 + * A mapping must be unmapped by using dma_buf_free_sgt(). 87 + * 88 + * NOTE: This function is intended for exporters. If direct traffic routing is 89 + * mandatory exporter should call pci_p2pdma_map_type() before calling 90 + * this function. 
91 + */ 92 + struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach, 93 + struct p2pdma_provider *provider, 94 + struct dma_buf_phys_vec *phys_vec, 95 + size_t nr_ranges, size_t size, 96 + enum dma_data_direction dir) 97 + { 98 + unsigned int nents, mapped_len = 0; 99 + struct dma_buf_dma *dma; 100 + struct scatterlist *sgl; 101 + dma_addr_t addr; 102 + size_t i; 103 + int ret; 104 + 105 + dma_resv_assert_held(attach->dmabuf->resv); 106 + 107 + if (WARN_ON(!attach || !attach->dmabuf || !provider)) 108 + /* This function is supposed to work on MMIO memory only */ 109 + return ERR_PTR(-EINVAL); 110 + 111 + dma = kzalloc(sizeof(*dma), GFP_KERNEL); 112 + if (!dma) 113 + return ERR_PTR(-ENOMEM); 114 + 115 + switch (pci_p2pdma_map_type(provider, attach->dev)) { 116 + case PCI_P2PDMA_MAP_BUS_ADDR: 117 + /* 118 + * There is no need in IOVA at all for this flow. 119 + */ 120 + break; 121 + case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: 122 + dma->state = kzalloc(sizeof(*dma->state), GFP_KERNEL); 123 + if (!dma->state) { 124 + ret = -ENOMEM; 125 + goto err_free_dma; 126 + } 127 + 128 + dma_iova_try_alloc(attach->dev, dma->state, 0, size); 129 + break; 130 + default: 131 + ret = -EINVAL; 132 + goto err_free_dma; 133 + } 134 + 135 + nents = calc_sg_nents(dma->state, phys_vec, nr_ranges, size); 136 + ret = sg_alloc_table(&dma->sgt, nents, GFP_KERNEL | __GFP_ZERO); 137 + if (ret) 138 + goto err_free_state; 139 + 140 + sgl = dma->sgt.sgl; 141 + 142 + for (i = 0; i < nr_ranges; i++) { 143 + if (!dma->state) { 144 + addr = pci_p2pdma_bus_addr_map(provider, 145 + phys_vec[i].paddr); 146 + } else if (dma_use_iova(dma->state)) { 147 + ret = dma_iova_link(attach->dev, dma->state, 148 + phys_vec[i].paddr, 0, 149 + phys_vec[i].len, dir, 150 + DMA_ATTR_MMIO); 151 + if (ret) 152 + goto err_unmap_dma; 153 + 154 + mapped_len += phys_vec[i].len; 155 + } else { 156 + addr = dma_map_phys(attach->dev, phys_vec[i].paddr, 157 + phys_vec[i].len, dir, 158 + DMA_ATTR_MMIO); 159 + ret = 
dma_mapping_error(attach->dev, addr); 160 + if (ret) 161 + goto err_unmap_dma; 162 + } 163 + 164 + if (!dma->state || !dma_use_iova(dma->state)) 165 + sgl = fill_sg_entry(sgl, phys_vec[i].len, addr); 166 + } 167 + 168 + if (dma->state && dma_use_iova(dma->state)) { 169 + WARN_ON_ONCE(mapped_len != size); 170 + ret = dma_iova_sync(attach->dev, dma->state, 0, mapped_len); 171 + if (ret) 172 + goto err_unmap_dma; 173 + 174 + sgl = fill_sg_entry(sgl, mapped_len, dma->state->addr); 175 + } 176 + 177 + dma->size = size; 178 + 179 + /* 180 + * No CPU list included — set orig_nents = 0 so others can detect 181 + * this via SG table (use nents only). 182 + */ 183 + dma->sgt.orig_nents = 0; 184 + 185 + 186 + /* 187 + * SGL must be NULL to indicate that SGL is the last one 188 + * and we allocated correct number of entries in sg_alloc_table() 189 + */ 190 + WARN_ON_ONCE(sgl); 191 + return &dma->sgt; 192 + 193 + err_unmap_dma: 194 + if (!i || !dma->state) { 195 + ; /* Do nothing */ 196 + } else if (dma_use_iova(dma->state)) { 197 + dma_iova_destroy(attach->dev, dma->state, mapped_len, dir, 198 + DMA_ATTR_MMIO); 199 + } else { 200 + for_each_sgtable_dma_sg(&dma->sgt, sgl, i) 201 + dma_unmap_phys(attach->dev, sg_dma_address(sgl), 202 + sg_dma_len(sgl), dir, DMA_ATTR_MMIO); 203 + } 204 + sg_free_table(&dma->sgt); 205 + err_free_state: 206 + kfree(dma->state); 207 + err_free_dma: 208 + kfree(dma); 209 + return ERR_PTR(ret); 210 + } 211 + EXPORT_SYMBOL_NS_GPL(dma_buf_phys_vec_to_sgt, "DMA_BUF"); 212 + 213 + /** 214 + * dma_buf_free_sgt - unmaps the buffer 215 + * @attach: [in] attachment to unmap buffer from 216 + * @sgt: [in] scatterlist info of the buffer to unmap 217 + * @dir: [in] direction of DMA transfer 218 + * 219 + * This unmaps a DMA mapping for @attach obtained 220 + * by dma_buf_phys_vec_to_sgt(). 
221 + */ 222 + void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt, 223 + enum dma_data_direction dir) 224 + { 225 + struct dma_buf_dma *dma = container_of(sgt, struct dma_buf_dma, sgt); 226 + int i; 227 + 228 + dma_resv_assert_held(attach->dmabuf->resv); 229 + 230 + if (!dma->state) { 231 + ; /* Do nothing */ 232 + } else if (dma_use_iova(dma->state)) { 233 + dma_iova_destroy(attach->dev, dma->state, dma->size, dir, 234 + DMA_ATTR_MMIO); 235 + } else { 236 + struct scatterlist *sgl; 237 + 238 + for_each_sgtable_dma_sg(sgt, sgl, i) 239 + dma_unmap_phys(attach->dev, sg_dma_address(sgl), 240 + sg_dma_len(sgl), dir, DMA_ATTR_MMIO); 241 + } 242 + 243 + sg_free_table(sgt); 244 + kfree(dma->state); 245 + kfree(dma); 246 + 247 + } 248 + EXPORT_SYMBOL_NS_GPL(dma_buf_free_sgt, "DMA_BUF");
+117 -146
drivers/gpu/drm/i915/gvt/kvmgt.c
··· 1141 1141 return func(vgpu, index, start, count, flags, data); 1142 1142 } 1143 1143 1144 + static int intel_vgpu_ioctl_get_region_info(struct vfio_device *vfio_dev, 1145 + struct vfio_region_info *info, 1146 + struct vfio_info_cap *caps) 1147 + { 1148 + struct vfio_region_info_cap_sparse_mmap *sparse = NULL; 1149 + struct intel_vgpu *vgpu = vfio_dev_to_vgpu(vfio_dev); 1150 + int nr_areas = 1; 1151 + int cap_type_id; 1152 + unsigned int i; 1153 + int ret; 1154 + 1155 + switch (info->index) { 1156 + case VFIO_PCI_CONFIG_REGION_INDEX: 1157 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1158 + info->size = vgpu->gvt->device_info.cfg_space_size; 1159 + info->flags = VFIO_REGION_INFO_FLAG_READ | 1160 + VFIO_REGION_INFO_FLAG_WRITE; 1161 + break; 1162 + case VFIO_PCI_BAR0_REGION_INDEX: 1163 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1164 + info->size = vgpu->cfg_space.bar[info->index].size; 1165 + if (!info->size) { 1166 + info->flags = 0; 1167 + break; 1168 + } 1169 + 1170 + info->flags = VFIO_REGION_INFO_FLAG_READ | 1171 + VFIO_REGION_INFO_FLAG_WRITE; 1172 + break; 1173 + case VFIO_PCI_BAR1_REGION_INDEX: 1174 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1175 + info->size = 0; 1176 + info->flags = 0; 1177 + break; 1178 + case VFIO_PCI_BAR2_REGION_INDEX: 1179 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1180 + info->flags = VFIO_REGION_INFO_FLAG_CAPS | 1181 + VFIO_REGION_INFO_FLAG_MMAP | 1182 + VFIO_REGION_INFO_FLAG_READ | 1183 + VFIO_REGION_INFO_FLAG_WRITE; 1184 + info->size = gvt_aperture_sz(vgpu->gvt); 1185 + 1186 + sparse = kzalloc(struct_size(sparse, areas, nr_areas), 1187 + GFP_KERNEL); 1188 + if (!sparse) 1189 + return -ENOMEM; 1190 + 1191 + sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; 1192 + sparse->header.version = 1; 1193 + sparse->nr_areas = nr_areas; 1194 + cap_type_id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; 1195 + sparse->areas[0].offset = 1196 + PAGE_ALIGN(vgpu_aperture_offset(vgpu)); 1197 + 
sparse->areas[0].size = vgpu_aperture_sz(vgpu); 1198 + break; 1199 + 1200 + case VFIO_PCI_BAR3_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: 1201 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1202 + info->size = 0; 1203 + info->flags = 0; 1204 + 1205 + gvt_dbg_core("get region info bar:%d\n", info->index); 1206 + break; 1207 + 1208 + case VFIO_PCI_ROM_REGION_INDEX: 1209 + case VFIO_PCI_VGA_REGION_INDEX: 1210 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1211 + info->size = 0; 1212 + info->flags = 0; 1213 + 1214 + gvt_dbg_core("get region info index:%d\n", info->index); 1215 + break; 1216 + default: { 1217 + struct vfio_region_info_cap_type cap_type = { 1218 + .header.id = VFIO_REGION_INFO_CAP_TYPE, 1219 + .header.version = 1 1220 + }; 1221 + 1222 + if (info->index >= VFIO_PCI_NUM_REGIONS + vgpu->num_regions) 1223 + return -EINVAL; 1224 + info->index = array_index_nospec( 1225 + info->index, VFIO_PCI_NUM_REGIONS + vgpu->num_regions); 1226 + 1227 + i = info->index - VFIO_PCI_NUM_REGIONS; 1228 + 1229 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1230 + info->size = vgpu->region[i].size; 1231 + info->flags = vgpu->region[i].flags; 1232 + 1233 + cap_type.type = vgpu->region[i].type; 1234 + cap_type.subtype = vgpu->region[i].subtype; 1235 + 1236 + ret = vfio_info_add_capability(caps, &cap_type.header, 1237 + sizeof(cap_type)); 1238 + if (ret) 1239 + return ret; 1240 + } 1241 + } 1242 + 1243 + if ((info->flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) { 1244 + ret = -EINVAL; 1245 + if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) { 1246 + ret = vfio_info_add_capability( 1247 + caps, &sparse->header, 1248 + struct_size(sparse, areas, sparse->nr_areas)); 1249 + } 1250 + if (ret) { 1251 + kfree(sparse); 1252 + return ret; 1253 + } 1254 + } 1255 + 1256 + kfree(sparse); 1257 + return 0; 1258 + } 1259 + 1144 1260 static long intel_vgpu_ioctl(struct vfio_device *vfio_dev, unsigned int cmd, 1145 1261 unsigned long arg) 1146 1262 { ··· 1285 1169 
return copy_to_user((void __user *)arg, &info, minsz) ? 1286 1170 -EFAULT : 0; 1287 1171 1288 - } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { 1289 - struct vfio_region_info info; 1290 - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; 1291 - unsigned int i; 1292 - int ret; 1293 - struct vfio_region_info_cap_sparse_mmap *sparse = NULL; 1294 - int nr_areas = 1; 1295 - int cap_type_id; 1296 - 1297 - minsz = offsetofend(struct vfio_region_info, offset); 1298 - 1299 - if (copy_from_user(&info, (void __user *)arg, minsz)) 1300 - return -EFAULT; 1301 - 1302 - if (info.argsz < minsz) 1303 - return -EINVAL; 1304 - 1305 - switch (info.index) { 1306 - case VFIO_PCI_CONFIG_REGION_INDEX: 1307 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1308 - info.size = vgpu->gvt->device_info.cfg_space_size; 1309 - info.flags = VFIO_REGION_INFO_FLAG_READ | 1310 - VFIO_REGION_INFO_FLAG_WRITE; 1311 - break; 1312 - case VFIO_PCI_BAR0_REGION_INDEX: 1313 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1314 - info.size = vgpu->cfg_space.bar[info.index].size; 1315 - if (!info.size) { 1316 - info.flags = 0; 1317 - break; 1318 - } 1319 - 1320 - info.flags = VFIO_REGION_INFO_FLAG_READ | 1321 - VFIO_REGION_INFO_FLAG_WRITE; 1322 - break; 1323 - case VFIO_PCI_BAR1_REGION_INDEX: 1324 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1325 - info.size = 0; 1326 - info.flags = 0; 1327 - break; 1328 - case VFIO_PCI_BAR2_REGION_INDEX: 1329 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1330 - info.flags = VFIO_REGION_INFO_FLAG_CAPS | 1331 - VFIO_REGION_INFO_FLAG_MMAP | 1332 - VFIO_REGION_INFO_FLAG_READ | 1333 - VFIO_REGION_INFO_FLAG_WRITE; 1334 - info.size = gvt_aperture_sz(vgpu->gvt); 1335 - 1336 - sparse = kzalloc(struct_size(sparse, areas, nr_areas), 1337 - GFP_KERNEL); 1338 - if (!sparse) 1339 - return -ENOMEM; 1340 - 1341 - sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; 1342 - sparse->header.version = 1; 1343 - sparse->nr_areas = nr_areas; 1344 - cap_type_id = 
VFIO_REGION_INFO_CAP_SPARSE_MMAP; 1345 - sparse->areas[0].offset = 1346 - PAGE_ALIGN(vgpu_aperture_offset(vgpu)); 1347 - sparse->areas[0].size = vgpu_aperture_sz(vgpu); 1348 - break; 1349 - 1350 - case VFIO_PCI_BAR3_REGION_INDEX ... VFIO_PCI_BAR5_REGION_INDEX: 1351 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1352 - info.size = 0; 1353 - info.flags = 0; 1354 - 1355 - gvt_dbg_core("get region info bar:%d\n", info.index); 1356 - break; 1357 - 1358 - case VFIO_PCI_ROM_REGION_INDEX: 1359 - case VFIO_PCI_VGA_REGION_INDEX: 1360 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1361 - info.size = 0; 1362 - info.flags = 0; 1363 - 1364 - gvt_dbg_core("get region info index:%d\n", info.index); 1365 - break; 1366 - default: 1367 - { 1368 - struct vfio_region_info_cap_type cap_type = { 1369 - .header.id = VFIO_REGION_INFO_CAP_TYPE, 1370 - .header.version = 1 }; 1371 - 1372 - if (info.index >= VFIO_PCI_NUM_REGIONS + 1373 - vgpu->num_regions) 1374 - return -EINVAL; 1375 - info.index = 1376 - array_index_nospec(info.index, 1377 - VFIO_PCI_NUM_REGIONS + 1378 - vgpu->num_regions); 1379 - 1380 - i = info.index - VFIO_PCI_NUM_REGIONS; 1381 - 1382 - info.offset = 1383 - VFIO_PCI_INDEX_TO_OFFSET(info.index); 1384 - info.size = vgpu->region[i].size; 1385 - info.flags = vgpu->region[i].flags; 1386 - 1387 - cap_type.type = vgpu->region[i].type; 1388 - cap_type.subtype = vgpu->region[i].subtype; 1389 - 1390 - ret = vfio_info_add_capability(&caps, 1391 - &cap_type.header, 1392 - sizeof(cap_type)); 1393 - if (ret) 1394 - return ret; 1395 - } 1396 - } 1397 - 1398 - if ((info.flags & VFIO_REGION_INFO_FLAG_CAPS) && sparse) { 1399 - ret = -EINVAL; 1400 - if (cap_type_id == VFIO_REGION_INFO_CAP_SPARSE_MMAP) 1401 - ret = vfio_info_add_capability(&caps, 1402 - &sparse->header, 1403 - struct_size(sparse, areas, 1404 - sparse->nr_areas)); 1405 - if (ret) { 1406 - kfree(sparse); 1407 - return ret; 1408 - } 1409 - } 1410 - 1411 - if (caps.size) { 1412 - info.flags |= 
VFIO_REGION_INFO_FLAG_CAPS; 1413 - if (info.argsz < sizeof(info) + caps.size) { 1414 - info.argsz = sizeof(info) + caps.size; 1415 - info.cap_offset = 0; 1416 - } else { 1417 - vfio_info_cap_shift(&caps, sizeof(info)); 1418 - if (copy_to_user((void __user *)arg + 1419 - sizeof(info), caps.buf, 1420 - caps.size)) { 1421 - kfree(caps.buf); 1422 - kfree(sparse); 1423 - return -EFAULT; 1424 - } 1425 - info.cap_offset = sizeof(info); 1426 - } 1427 - 1428 - kfree(caps.buf); 1429 - } 1430 - 1431 - kfree(sparse); 1432 - return copy_to_user((void __user *)arg, &info, minsz) ? 1433 - -EFAULT : 0; 1434 1172 } else if (cmd == VFIO_DEVICE_GET_IRQ_INFO) { 1435 1173 struct vfio_irq_info info; 1436 1174 ··· 1447 1477 .write = intel_vgpu_write, 1448 1478 .mmap = intel_vgpu_mmap, 1449 1479 .ioctl = intel_vgpu_ioctl, 1480 + .get_region_info_caps = intel_vgpu_ioctl_get_region_info, 1450 1481 .dma_unmap = intel_vgpu_dma_unmap, 1451 1482 .bind_iommufd = vfio_iommufd_emulated_bind, 1452 1483 .unbind_iommufd = vfio_iommufd_emulated_unbind,
+2 -2
drivers/iommu/dma-iommu.c
··· 1439 1439 * as a bus address, __finalise_sg() will copy the dma 1440 1440 * address into the output segment. 1441 1441 */ 1442 - s->dma_address = pci_p2pdma_bus_addr_map(&p2pdma_state, 1443 - sg_phys(s)); 1442 + s->dma_address = pci_p2pdma_bus_addr_map( 1443 + p2pdma_state.mem, sg_phys(s)); 1444 1444 sg_dma_len(s) = sg->length; 1445 1445 sg_dma_mark_bus_address(s); 1446 1446 continue;
+145 -43
drivers/pci/p2pdma.c
··· 25 25 struct gen_pool *pool; 26 26 bool p2pmem_published; 27 27 struct xarray map_types; 28 + struct p2pdma_provider mem[PCI_STD_NUM_BARS]; 28 29 }; 29 30 30 31 struct pci_p2pdma_pagemap { 31 - struct pci_dev *provider; 32 - u64 bus_offset; 33 32 struct dev_pagemap pgmap; 33 + struct p2pdma_provider *mem; 34 34 }; 35 35 36 36 static struct pci_p2pdma_pagemap *to_p2p_pgmap(struct dev_pagemap *pgmap) ··· 204 204 { 205 205 struct pci_p2pdma_pagemap *pgmap = to_p2p_pgmap(page_pgmap(page)); 206 206 /* safe to dereference while a reference is held to the percpu ref */ 207 - struct pci_p2pdma *p2pdma = 208 - rcu_dereference_protected(pgmap->provider->p2pdma, 1); 207 + struct pci_p2pdma *p2pdma = rcu_dereference_protected( 208 + to_pci_dev(pgmap->mem->owner)->p2pdma, 1); 209 209 struct percpu_ref *ref; 210 210 211 211 gen_pool_free_owner(p2pdma->pool, (uintptr_t)page_to_virt(page), ··· 228 228 229 229 /* Flush and disable pci_alloc_p2p_mem() */ 230 230 pdev->p2pdma = NULL; 231 - synchronize_rcu(); 231 + if (p2pdma->pool) 232 + synchronize_rcu(); 233 + xa_destroy(&p2pdma->map_types); 234 + 235 + if (!p2pdma->pool) 236 + return; 232 237 233 238 gen_pool_destroy(p2pdma->pool); 234 239 sysfs_remove_group(&pdev->dev.kobj, &p2pmem_group); 235 - xa_destroy(&p2pdma->map_types); 236 240 } 237 241 238 - static int pci_p2pdma_setup(struct pci_dev *pdev) 242 + /** 243 + * pcim_p2pdma_init - Initialise peer-to-peer DMA providers 244 + * @pdev: The PCI device to enable P2PDMA for 245 + * 246 + * This function initializes the peer-to-peer DMA infrastructure 247 + * for a PCI device. It allocates and sets up the necessary data 248 + * structures to support P2PDMA operations, including mapping type 249 + * tracking. 
250 + */ 251 + int pcim_p2pdma_init(struct pci_dev *pdev) 239 252 { 240 - int error = -ENOMEM; 241 253 struct pci_p2pdma *p2p; 254 + int i, ret; 255 + 256 + p2p = rcu_dereference_protected(pdev->p2pdma, 1); 257 + if (p2p) 258 + return 0; 242 259 243 260 p2p = devm_kzalloc(&pdev->dev, sizeof(*p2p), GFP_KERNEL); 244 261 if (!p2p) 245 262 return -ENOMEM; 246 263 247 264 xa_init(&p2p->map_types); 265 + /* 266 + * Iterate over all standard PCI BARs and record only those that 267 + * correspond to MMIO regions. Skip non-memory resources (e.g. I/O 268 + * port BARs) since they cannot be used for peer-to-peer (P2P) 269 + * transactions. 270 + */ 271 + for (i = 0; i < PCI_STD_NUM_BARS; i++) { 272 + if (!(pci_resource_flags(pdev, i) & IORESOURCE_MEM)) 273 + continue; 248 274 249 - p2p->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev)); 250 - if (!p2p->pool) 251 - goto out; 275 + p2p->mem[i].owner = &pdev->dev; 276 + p2p->mem[i].bus_offset = 277 + pci_bus_address(pdev, i) - pci_resource_start(pdev, i); 278 + } 252 279 253 - error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); 254 - if (error) 255 - goto out_pool_destroy; 256 - 257 - error = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group); 258 - if (error) 259 - goto out_pool_destroy; 280 + ret = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_release, pdev); 281 + if (ret) 282 + goto out_p2p; 260 283 261 284 rcu_assign_pointer(pdev->p2pdma, p2p); 262 285 return 0; 263 286 264 - out_pool_destroy: 265 - gen_pool_destroy(p2p->pool); 266 - out: 287 + out_p2p: 267 288 devm_kfree(&pdev->dev, p2p); 268 - return error; 289 + return ret; 290 + } 291 + EXPORT_SYMBOL_GPL(pcim_p2pdma_init); 292 + 293 + /** 294 + * pcim_p2pdma_provider - Get peer-to-peer DMA provider 295 + * @pdev: The PCI device to enable P2PDMA for 296 + * @bar: BAR index to get provider 297 + * 298 + * This function gets peer-to-peer DMA provider for a PCI device. 
The lifetime 299 + * of the provider (and of course the MMIO) is bound to the lifetime of the 300 + * driver. A driver calling this function must ensure that all references to the 301 + * provider, and any DMA mappings created for any MMIO, are all cleaned up 302 + * before the driver remove() completes. 303 + * 304 + * Since P2P is almost always shared with a second driver, this means some system 305 + * to notify, invalidate and revoke the MMIO's DMA must be in place to use this 306 + * function. For example, a revoke can be built using DMABUF. 307 + */ 308 + struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar) 309 + { 310 + struct pci_p2pdma *p2p; 311 + 312 + if (!(pci_resource_flags(pdev, bar) & IORESOURCE_MEM)) 313 + return NULL; 314 + 315 + p2p = rcu_dereference_protected(pdev->p2pdma, 1); 316 + if (WARN_ON(!p2p)) 317 + /* Someone forgot to call pcim_p2pdma_init() first */ 318 + return NULL; 319 + 320 + return &p2p->mem[bar]; 321 + } 322 + EXPORT_SYMBOL_GPL(pcim_p2pdma_provider); 323 + 324 + static int pci_p2pdma_setup_pool(struct pci_dev *pdev) 325 + { 326 + struct pci_p2pdma *p2pdma; 327 + int ret; 328 + 329 + p2pdma = rcu_dereference_protected(pdev->p2pdma, 1); 330 + if (p2pdma->pool) 331 + /* The pool is already set up, do nothing. */ 332 + return 0; 333 + 334 + p2pdma->pool = gen_pool_create(PAGE_SHIFT, dev_to_node(&pdev->dev)); 335 + if (!p2pdma->pool) 336 + return -ENOMEM; 337 + 338 + ret = sysfs_create_group(&pdev->dev.kobj, &p2pmem_group); 339 + if (ret) 340 + goto out_pool_destroy; 341 + 342 + return 0; 343 + 344 + out_pool_destroy: 345 + gen_pool_destroy(p2pdma->pool); 346 + p2pdma->pool = NULL; 347 + return ret; 269 348 } 270 349 271 350 static void pci_p2pdma_unmap_mappings(void *data) 272 351 { 273 - struct pci_dev *pdev = data; 352 + struct pci_p2pdma_pagemap *p2p_pgmap = data; 274 353 275 354 /* 276 355 * Removing the alloc attribute from sysfs will call 277 356 * unmap_mapping_range() on the inode, tear down any existing 
userspace 278 357 * mappings and prevent new ones from being created. 279 358 */ 280 - sysfs_remove_file_from_group(&pdev->dev.kobj, &p2pmem_alloc_attr.attr, 359 + sysfs_remove_file_from_group(&p2p_pgmap->mem->owner->kobj, 360 + &p2pmem_alloc_attr.attr, 281 361 p2pmem_group.name); 282 362 } 283 363 ··· 375 295 u64 offset) 376 296 { 377 297 struct pci_p2pdma_pagemap *p2p_pgmap; 298 + struct p2pdma_provider *mem; 378 299 struct dev_pagemap *pgmap; 379 300 struct pci_p2pdma *p2pdma; 380 301 void *addr; ··· 393 312 if (size + offset > pci_resource_len(pdev, bar)) 394 313 return -EINVAL; 395 314 396 - if (!pdev->p2pdma) { 397 - error = pci_p2pdma_setup(pdev); 398 - if (error) 399 - return error; 400 - } 315 + error = pcim_p2pdma_init(pdev); 316 + if (error) 317 + return error; 318 + 319 + error = pci_p2pdma_setup_pool(pdev); 320 + if (error) 321 + return error; 322 + 323 + mem = pcim_p2pdma_provider(pdev, bar); 324 + /* 325 + * We checked the validity of the BAR prior to the call 326 + * to pcim_p2pdma_provider. It should never return NULL. 
327 + */ 328 + if (WARN_ON(!mem)) 329 + return -EINVAL; 401 330 402 331 p2p_pgmap = devm_kzalloc(&pdev->dev, sizeof(*p2p_pgmap), GFP_KERNEL); 403 332 if (!p2p_pgmap) ··· 419 328 pgmap->nr_range = 1; 420 329 pgmap->type = MEMORY_DEVICE_PCI_P2PDMA; 421 330 pgmap->ops = &p2pdma_pgmap_ops; 422 - 423 - p2p_pgmap->provider = pdev; 424 - p2p_pgmap->bus_offset = pci_bus_address(pdev, bar) - 425 - pci_resource_start(pdev, bar); 331 + p2p_pgmap->mem = mem; 426 332 427 333 addr = devm_memremap_pages(&pdev->dev, pgmap); 428 334 if (IS_ERR(addr)) { ··· 428 340 } 429 341 430 342 error = devm_add_action_or_reset(&pdev->dev, pci_p2pdma_unmap_mappings, 431 - pdev); 343 + p2p_pgmap); 432 344 if (error) 433 345 goto pages_free; 434 346 ··· 1060 972 } 1061 973 EXPORT_SYMBOL_GPL(pci_p2pmem_publish); 1062 974 1063 - static enum pci_p2pdma_map_type pci_p2pdma_map_type(struct dev_pagemap *pgmap, 1064 - struct device *dev) 975 + /** 976 + * pci_p2pdma_map_type - Determine the mapping type for P2PDMA transfers 977 + * @provider: P2PDMA provider structure 978 + * @dev: Target device for the transfer 979 + * 980 + * Determines how peer-to-peer DMA transfers should be mapped between 981 + * the provider and the target device. The mapping type indicates whether 982 + * the transfer can be done directly through PCI switches or must go 983 + * through the host bridge. 
984 + */ 985 + enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider, 986 + struct device *dev) 1065 987 { 1066 988 enum pci_p2pdma_map_type type = PCI_P2PDMA_MAP_NOT_SUPPORTED; 1067 - struct pci_dev *provider = to_p2p_pgmap(pgmap)->provider; 989 + struct pci_dev *pdev = to_pci_dev(provider->owner); 1068 990 struct pci_dev *client; 1069 991 struct pci_p2pdma *p2pdma; 1070 992 int dist; 1071 993 1072 - if (!provider->p2pdma) 994 + if (!pdev->p2pdma) 1073 995 return PCI_P2PDMA_MAP_NOT_SUPPORTED; 1074 996 1075 997 if (!dev_is_pci(dev)) ··· 1088 990 client = to_pci_dev(dev); 1089 991 1090 992 rcu_read_lock(); 1091 - p2pdma = rcu_dereference(provider->p2pdma); 993 + p2pdma = rcu_dereference(pdev->p2pdma); 1092 994 1093 995 if (p2pdma) 1094 996 type = xa_to_value(xa_load(&p2pdma->map_types, ··· 1096 998 rcu_read_unlock(); 1097 999 1098 1000 if (type == PCI_P2PDMA_MAP_UNKNOWN) 1099 - return calc_map_type_and_dist(provider, client, &dist, true); 1001 + return calc_map_type_and_dist(pdev, client, &dist, true); 1100 1002 1101 1003 return type; 1102 1004 } ··· 1104 1006 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state, 1105 1007 struct device *dev, struct page *page) 1106 1008 { 1107 - state->pgmap = page_pgmap(page); 1108 - state->map = pci_p2pdma_map_type(state->pgmap, dev); 1109 - state->bus_off = to_p2p_pgmap(state->pgmap)->bus_offset; 1009 + struct pci_p2pdma_pagemap *p2p_pgmap = to_p2p_pgmap(page_pgmap(page)); 1010 + 1011 + if (state->mem == p2p_pgmap->mem) 1012 + return; 1013 + 1014 + state->mem = p2p_pgmap->mem; 1015 + state->map = pci_p2pdma_map_type(p2p_pgmap->mem, dev); 1110 1016 } 1111 1017 1112 1018 /**
+7 -40
drivers/s390/cio/vfio_ccw_ops.c
··· 313 313 return 0; 314 314 } 315 315 316 - static int vfio_ccw_mdev_get_region_info(struct vfio_ccw_private *private, 317 - struct vfio_region_info *info, 318 - unsigned long arg) 316 + static int vfio_ccw_mdev_ioctl_get_region_info(struct vfio_device *vdev, 317 + struct vfio_region_info *info, 318 + struct vfio_info_cap *caps) 319 319 { 320 + struct vfio_ccw_private *private = 321 + container_of(vdev, struct vfio_ccw_private, vdev); 320 322 int i; 321 323 322 324 switch (info->index) { ··· 330 328 return 0; 331 329 default: /* all other regions are handled via capability chain */ 332 330 { 333 - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; 334 331 struct vfio_region_info_cap_type cap_type = { 335 332 .header.id = VFIO_REGION_INFO_CAP_TYPE, 336 333 .header.version = 1 }; ··· 352 351 cap_type.type = private->region[i].type; 353 352 cap_type.subtype = private->region[i].subtype; 354 353 355 - ret = vfio_info_add_capability(&caps, &cap_type.header, 354 + ret = vfio_info_add_capability(caps, &cap_type.header, 356 355 sizeof(cap_type)); 357 356 if (ret) 358 357 return ret; 359 - 360 - info->flags |= VFIO_REGION_INFO_FLAG_CAPS; 361 - if (info->argsz < sizeof(*info) + caps.size) { 362 - info->argsz = sizeof(*info) + caps.size; 363 - info->cap_offset = 0; 364 - } else { 365 - vfio_info_cap_shift(&caps, sizeof(*info)); 366 - if (copy_to_user((void __user *)arg + sizeof(*info), 367 - caps.buf, caps.size)) { 368 - kfree(caps.buf); 369 - return -EFAULT; 370 - } 371 - info->cap_offset = sizeof(*info); 372 - } 373 - 374 - kfree(caps.buf); 375 - 376 358 } 377 359 } 378 360 return 0; ··· 516 532 517 533 return copy_to_user((void __user *)arg, &info, minsz) ? 
-EFAULT : 0; 518 534 } 519 - case VFIO_DEVICE_GET_REGION_INFO: 520 - { 521 - struct vfio_region_info info; 522 - 523 - minsz = offsetofend(struct vfio_region_info, offset); 524 - 525 - if (copy_from_user(&info, (void __user *)arg, minsz)) 526 - return -EFAULT; 527 - 528 - if (info.argsz < minsz) 529 - return -EINVAL; 530 - 531 - ret = vfio_ccw_mdev_get_region_info(private, &info, arg); 532 - if (ret) 533 - return ret; 534 - 535 - return copy_to_user((void __user *)arg, &info, minsz) ? -EFAULT : 0; 536 - } 537 535 case VFIO_DEVICE_GET_IRQ_INFO: 538 536 { 539 537 struct vfio_irq_info info; ··· 593 627 .read = vfio_ccw_mdev_read, 594 628 .write = vfio_ccw_mdev_write, 595 629 .ioctl = vfio_ccw_mdev_ioctl, 630 + .get_region_info_caps = vfio_ccw_mdev_ioctl_get_region_info, 596 631 .request = vfio_ccw_mdev_request, 597 632 .dma_unmap = vfio_ccw_dma_unmap, 598 633 .bind_iommufd = vfio_iommufd_emulated_bind,
+11 -18
drivers/vfio/cdx/main.c
··· 129 129 return copy_to_user(arg, &info, minsz) ? -EFAULT : 0; 130 130 } 131 131 132 - static int vfio_cdx_ioctl_get_region_info(struct vfio_cdx_device *vdev, 133 - struct vfio_region_info __user *arg) 132 + static int vfio_cdx_ioctl_get_region_info(struct vfio_device *core_vdev, 133 + struct vfio_region_info *info, 134 + struct vfio_info_cap *caps) 134 135 { 135 - unsigned long minsz = offsetofend(struct vfio_region_info, offset); 136 + struct vfio_cdx_device *vdev = 137 + container_of(core_vdev, struct vfio_cdx_device, vdev); 136 138 struct cdx_device *cdx_dev = to_cdx_device(vdev->vdev.dev); 137 - struct vfio_region_info info; 138 139 139 - if (copy_from_user(&info, arg, minsz)) 140 - return -EFAULT; 141 - 142 - if (info.argsz < minsz) 143 - return -EINVAL; 144 - 145 - if (info.index >= cdx_dev->res_count) 140 + if (info->index >= cdx_dev->res_count) 146 141 return -EINVAL; 147 142 148 143 /* map offset to the physical address */ 149 - info.offset = vfio_cdx_index_to_offset(info.index); 150 - info.size = vdev->regions[info.index].size; 151 - info.flags = vdev->regions[info.index].flags; 152 - 153 - return copy_to_user(arg, &info, minsz) ? 
-EFAULT : 0; 144 + info->offset = vfio_cdx_index_to_offset(info->index); 145 + info->size = vdev->regions[info->index].size; 146 + info->flags = vdev->regions[info->index].flags; 147 + return 0; 154 148 } 155 149 156 150 static int vfio_cdx_ioctl_get_irq_info(struct vfio_cdx_device *vdev, ··· 213 219 switch (cmd) { 214 220 case VFIO_DEVICE_GET_INFO: 215 221 return vfio_cdx_ioctl_get_info(vdev, uarg); 216 - case VFIO_DEVICE_GET_REGION_INFO: 217 - return vfio_cdx_ioctl_get_region_info(vdev, uarg); 218 222 case VFIO_DEVICE_GET_IRQ_INFO: 219 223 return vfio_cdx_ioctl_get_irq_info(vdev, uarg); 220 224 case VFIO_DEVICE_SET_IRQS: ··· 276 284 .open_device = vfio_cdx_open_device, 277 285 .close_device = vfio_cdx_close_device, 278 286 .ioctl = vfio_cdx_ioctl, 287 + .get_region_info_caps = vfio_cdx_ioctl_get_region_info, 279 288 .device_feature = vfio_cdx_ioctl_feature, 280 289 .mmap = vfio_cdx_mmap, 281 290 .bind_iommufd = vfio_iommufd_physical_bind,
+1 -1
drivers/vfio/device_cdev.c
··· 99 99 return ret; 100 100 if (user_size < minsz) 101 101 return -EINVAL; 102 - ret = copy_struct_from_user(&bind, minsz, arg, user_size); 102 + ret = copy_struct_from_user(&bind, sizeof(bind), arg, user_size); 103 103 if (ret) 104 104 return ret; 105 105
+19 -24
drivers/vfio/fsl-mc/vfio_fsl_mc.c
··· 117 117 fsl_mc_cleanup_irq_pool(mc_cont); 118 118 } 119 119 120 + static int vfio_fsl_mc_ioctl_get_region_info(struct vfio_device *core_vdev, 121 + struct vfio_region_info *info, 122 + struct vfio_info_cap *caps) 123 + { 124 + struct vfio_fsl_mc_device *vdev = 125 + container_of(core_vdev, struct vfio_fsl_mc_device, vdev); 126 + struct fsl_mc_device *mc_dev = vdev->mc_dev; 127 + 128 + if (info->index >= mc_dev->obj_desc.region_count) 129 + return -EINVAL; 130 + 131 + /* map the offset to the physical address */ 132 + info->offset = VFIO_FSL_MC_INDEX_TO_OFFSET(info->index); 133 + info->size = vdev->regions[info->index].size; 134 + info->flags = vdev->regions[info->index].flags; 135 + return 0; 136 + } 137 + 120 138 static long vfio_fsl_mc_ioctl(struct vfio_device *core_vdev, 121 139 unsigned int cmd, unsigned long arg) 122 140 { ··· 166 148 167 149 return copy_to_user((void __user *)arg, &info, minsz) ? 168 150 -EFAULT : 0; 169 - } 170 - case VFIO_DEVICE_GET_REGION_INFO: 171 - { 172 - struct vfio_region_info info; 173 - 174 - minsz = offsetofend(struct vfio_region_info, offset); 175 - 176 - if (copy_from_user(&info, (void __user *)arg, minsz)) 177 - return -EFAULT; 178 - 179 - if (info.argsz < minsz) 180 - return -EINVAL; 181 - 182 - if (info.index >= mc_dev->obj_desc.region_count) 183 - return -EINVAL; 184 - 185 - /* map offset to the physical address */ 186 - info.offset = VFIO_FSL_MC_INDEX_TO_OFFSET(info.index); 187 - info.size = vdev->regions[info.index].size; 188 - info.flags = vdev->regions[info.index].flags; 189 - 190 - if (copy_to_user((void __user *)arg, &info, minsz)) 191 - return -EFAULT; 192 - return 0; 193 151 } 194 152 case VFIO_DEVICE_GET_IRQ_INFO: 195 153 { ··· 583 589 .open_device = vfio_fsl_mc_open_device, 584 590 .close_device = vfio_fsl_mc_close_device, 585 591 .ioctl = vfio_fsl_mc_ioctl, 592 + .get_region_info_caps = vfio_fsl_mc_ioctl_get_region_info, 586 593 .read = vfio_fsl_mc_read, 587 594 .write = vfio_fsl_mc_write, 588 595 .mmap = 
vfio_fsl_mc_mmap,
+3
drivers/vfio/pci/Kconfig
··· 55 55 56 56 To enable s390x KVM vfio-pci extensions, say Y. 57 57 58 + config VFIO_PCI_DMABUF 59 + def_bool y if VFIO_PCI_CORE && PCI_P2PDMA && DMA_SHARED_BUFFER 60 + 58 61 source "drivers/vfio/pci/mlx5/Kconfig" 59 62 60 63 source "drivers/vfio/pci/hisilicon/Kconfig"
+1
drivers/vfio/pci/Makefile
··· 2 2 3 3 vfio-pci-core-y := vfio_pci_core.o vfio_pci_intrs.o vfio_pci_rdwr.o vfio_pci_config.o 4 4 vfio-pci-core-$(CONFIG_VFIO_PCI_ZDEV_KVM) += vfio_pci_zdev.o 5 + vfio-pci-core-$(CONFIG_VFIO_PCI_DMABUF) += vfio_pci_dmabuf.o 5 6 obj-$(CONFIG_VFIO_PCI_CORE) += vfio-pci-core.o 6 7 7 8 vfio-pci-y := vfio_pci.o
+108 -63
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.c
··· 125 125 return 0; 126 126 } 127 127 128 + static void qm_xqc_reg_offsets(struct hisi_qm *qm, 129 + u32 *eqc_addr, u32 *aeqc_addr) 130 + { 131 + struct hisi_acc_vf_core_device *hisi_acc_vdev = 132 + container_of(qm, struct hisi_acc_vf_core_device, vf_qm); 133 + 134 + if (hisi_acc_vdev->drv_mode == HW_ACC_MIG_VF_CTRL) { 135 + *eqc_addr = QM_EQC_VF_DW0; 136 + *aeqc_addr = QM_AEQC_VF_DW0; 137 + } else { 138 + *eqc_addr = QM_EQC_PF_DW0; 139 + *aeqc_addr = QM_AEQC_PF_DW0; 140 + } 141 + } 142 + 128 143 static int qm_get_regs(struct hisi_qm *qm, struct acc_vf_data *vf_data) 129 144 { 130 145 struct device *dev = &qm->pdev->dev; 146 + u32 eqc_addr, aeqc_addr; 131 147 int ret; 132 148 133 149 ret = qm_read_regs(qm, QM_VF_AEQ_INT_MASK, &vf_data->aeq_int_mask, 1); ··· 183 167 return ret; 184 168 } 185 169 170 + qm_xqc_reg_offsets(qm, &eqc_addr, &aeqc_addr); 186 171 /* QM_EQC_DW has 7 regs */ 187 - ret = qm_read_regs(qm, QM_EQC_DW0, vf_data->qm_eqc_dw, 7); 172 + ret = qm_read_regs(qm, eqc_addr, vf_data->qm_eqc_dw, 7); 188 173 if (ret) { 189 174 dev_err(dev, "failed to read QM_EQC_DW\n"); 190 175 return ret; 191 176 } 192 177 193 178 /* QM_AEQC_DW has 7 regs */ 194 - ret = qm_read_regs(qm, QM_AEQC_DW0, vf_data->qm_aeqc_dw, 7); 179 + ret = qm_read_regs(qm, aeqc_addr, vf_data->qm_aeqc_dw, 7); 195 180 if (ret) { 196 181 dev_err(dev, "failed to read QM_AEQC_DW\n"); 197 182 return ret; ··· 204 187 static int qm_set_regs(struct hisi_qm *qm, struct acc_vf_data *vf_data) 205 188 { 206 189 struct device *dev = &qm->pdev->dev; 190 + u32 eqc_addr, aeqc_addr; 207 191 int ret; 208 192 209 193 /* Check VF state */ ··· 257 239 return ret; 258 240 } 259 241 242 + qm_xqc_reg_offsets(qm, &eqc_addr, &aeqc_addr); 260 243 /* QM_EQC_DW has 7 regs */ 261 - ret = qm_write_regs(qm, QM_EQC_DW0, vf_data->qm_eqc_dw, 7); 244 + ret = qm_write_regs(qm, eqc_addr, vf_data->qm_eqc_dw, 7); 262 245 if (ret) { 263 246 dev_err(dev, "failed to write QM_EQC_DW\n"); 264 247 return ret; 265 248 } 266 249 267 250 /* 
QM_AEQC_DW has 7 regs */ 268 - ret = qm_write_regs(qm, QM_AEQC_DW0, vf_data->qm_aeqc_dw, 7); 251 + ret = qm_write_regs(qm, aeqc_addr, vf_data->qm_aeqc_dw, 7); 269 252 if (ret) { 270 253 dev_err(dev, "failed to write QM_AEQC_DW\n"); 271 254 return ret; ··· 1205 1186 { 1206 1187 struct vfio_pci_core_device *vdev = &hisi_acc_vdev->core_device; 1207 1188 struct hisi_qm *vf_qm = &hisi_acc_vdev->vf_qm; 1189 + struct hisi_qm *pf_qm = hisi_acc_vdev->pf_qm; 1208 1190 struct pci_dev *vf_dev = vdev->pdev; 1191 + u32 val; 1209 1192 1210 - /* 1211 - * ACC VF dev BAR2 region consists of both functional register space 1212 - * and migration control register space. For migration to work, we 1213 - * need access to both. Hence, we map the entire BAR2 region here. 1214 - * But unnecessarily exposing the migration BAR region to the Guest 1215 - * has the potential to prevent/corrupt the Guest migration. Hence, 1216 - * we restrict access to the migration control space from 1217 - * Guest(Please see mmap/ioctl/read/write override functions). 1218 - * 1219 - * Please note that it is OK to expose the entire VF BAR if migration 1220 - * is not supported or required as this cannot affect the ACC PF 1221 - * configurations. 1222 - * 1223 - * Also the HiSilicon ACC VF devices supported by this driver on 1224 - * HiSilicon hardware platforms are integrated end point devices 1225 - * and the platform lacks the capability to perform any PCIe P2P 1226 - * between these devices. 
1227 - */ 1193 + val = readl(pf_qm->io_base + QM_MIG_REGION_SEL); 1194 + if (pf_qm->ver > QM_HW_V3 && (val & QM_MIG_REGION_EN)) 1195 + hisi_acc_vdev->drv_mode = HW_ACC_MIG_PF_CTRL; 1196 + else 1197 + hisi_acc_vdev->drv_mode = HW_ACC_MIG_VF_CTRL; 1228 1198 1229 - vf_qm->io_base = 1230 - ioremap(pci_resource_start(vf_dev, VFIO_PCI_BAR2_REGION_INDEX), 1231 - pci_resource_len(vf_dev, VFIO_PCI_BAR2_REGION_INDEX)); 1232 - if (!vf_qm->io_base) 1233 - return -EIO; 1199 + if (hisi_acc_vdev->drv_mode == HW_ACC_MIG_PF_CTRL) { 1200 + /* 1201 + * On hardware platforms newer than QM_HW_V3, the migration function 1202 + * registers are placed in the BAR2 configuration region of the PF, 1203 + * and each VF device occupies 8KB of configuration space. 1204 + */ 1205 + vf_qm->io_base = pf_qm->io_base + QM_MIG_REGION_OFFSET + 1206 + hisi_acc_vdev->vf_id * QM_MIG_REGION_SIZE; 1207 + } else { 1208 + /* 1209 + * ACC VF dev BAR2 region consists of both functional register space 1210 + * and migration control register space. For migration to work, we 1211 + * need access to both. Hence, we map the entire BAR2 region here. 1212 + * But unnecessarily exposing the migration BAR region to the Guest 1213 + * has the potential to prevent/corrupt the Guest migration. Hence, 1214 + * we restrict access to the migration control space from 1215 + * the Guest (please see mmap/ioctl/read/write override functions). 1216 + * 1217 + * Please note that it is OK to expose the entire VF BAR if migration 1218 + * is not supported or required as this cannot affect the ACC PF 1219 + * configurations. 1220 + * 1221 + * Also the HiSilicon ACC VF devices supported by this driver on 1222 + * HiSilicon hardware platforms are integrated end point devices 1223 + * and the platform lacks the capability to perform any PCIe P2P 1224 + * between these devices. 
1225 + */ 1234 1226 1227 + vf_qm->io_base = 1228 + ioremap(pci_resource_start(vf_dev, VFIO_PCI_BAR2_REGION_INDEX), 1229 + pci_resource_len(vf_dev, VFIO_PCI_BAR2_REGION_INDEX)); 1230 + if (!vf_qm->io_base) 1231 + return -EIO; 1232 + } 1235 1233 vf_qm->fun_type = QM_HW_VF; 1234 + vf_qm->ver = pf_qm->ver; 1236 1235 vf_qm->pdev = vf_dev; 1237 1236 mutex_init(&vf_qm->mailbox_lock); 1238 1237 ··· 1287 1250 return !IS_ERR(pf_qm) ? pf_qm : NULL; 1288 1251 } 1289 1252 1253 + static size_t hisi_acc_get_resource_len(struct vfio_pci_core_device *vdev, 1254 + unsigned int index) 1255 + { 1256 + struct hisi_acc_vf_core_device *hisi_acc_vdev = 1257 + hisi_acc_drvdata(vdev->pdev); 1258 + 1259 + /* 1260 + * On devices using the old HW_ACC_MIG_VF_CTRL mode, the ACC VF device 1261 + * BAR2 region encompasses both functional register space 1262 + * and migration control register space. 1263 + * Only the functional region should be reported to the Guest. 1264 + */ 1265 + if (hisi_acc_vdev->drv_mode == HW_ACC_MIG_VF_CTRL) 1266 + return (pci_resource_len(vdev->pdev, index) >> 1); 1267 + /* 1268 + * On the new HW device, the migration control registers 1269 + * have been moved to the PF device BAR2 region. 1270 + * The VF device BAR2 is entirely functional register space. 
1271 + */ 1272 + return pci_resource_len(vdev->pdev, index); 1273 + } 1274 + 1290 1275 static int hisi_acc_pci_rw_access_check(struct vfio_device *core_vdev, 1291 1276 size_t count, loff_t *ppos, 1292 1277 size_t *new_count) ··· 1319 1260 1320 1261 if (index == VFIO_PCI_BAR2_REGION_INDEX) { 1321 1262 loff_t pos = *ppos & VFIO_PCI_OFFSET_MASK; 1322 - resource_size_t end = pci_resource_len(vdev->pdev, index) / 2; 1263 + resource_size_t end; 1323 1264 1265 + end = hisi_acc_get_resource_len(vdev, index); 1324 1266 /* Check if access is for migration control region */ 1325 1267 if (pos >= end) 1326 1268 return -EINVAL; ··· 1342 1282 index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); 1343 1283 if (index == VFIO_PCI_BAR2_REGION_INDEX) { 1344 1284 u64 req_len, pgoff, req_start; 1345 - resource_size_t end = pci_resource_len(vdev->pdev, index) / 2; 1285 + resource_size_t end; 1346 1286 1287 + end = hisi_acc_get_resource_len(vdev, index); 1347 1288 req_len = vma->vm_end - vma->vm_start; 1348 1289 pgoff = vma->vm_pgoff & 1349 1290 ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); ··· 1385 1324 return vfio_pci_core_read(core_vdev, buf, new_count, ppos); 1386 1325 } 1387 1326 1388 - static long hisi_acc_vfio_pci_ioctl(struct vfio_device *core_vdev, unsigned int cmd, 1389 - unsigned long arg) 1327 + static int hisi_acc_vfio_ioctl_get_region(struct vfio_device *core_vdev, 1328 + struct vfio_region_info *info, 1329 + struct vfio_info_cap *caps) 1390 1330 { 1391 - if (cmd == VFIO_DEVICE_GET_REGION_INFO) { 1392 - struct vfio_pci_core_device *vdev = 1393 - container_of(core_vdev, struct vfio_pci_core_device, vdev); 1394 - struct pci_dev *pdev = vdev->pdev; 1395 - struct vfio_region_info info; 1396 - unsigned long minsz; 1331 + struct vfio_pci_core_device *vdev = 1332 + container_of(core_vdev, struct vfio_pci_core_device, vdev); 1397 1333 1398 - minsz = offsetofend(struct vfio_region_info, offset); 1334 + if (info->index != VFIO_PCI_BAR2_REGION_INDEX) 1335 + return 
vfio_pci_ioctl_get_region_info(core_vdev, info, caps); 1399 1336 1400 - if (copy_from_user(&info, (void __user *)arg, minsz)) 1401 - return -EFAULT; 1337 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1402 1338 1403 - if (info.argsz < minsz) 1404 - return -EINVAL; 1339 + info->size = hisi_acc_get_resource_len(vdev, info->index); 1405 1340 1406 - if (info.index == VFIO_PCI_BAR2_REGION_INDEX) { 1407 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1408 - 1409 - /* 1410 - * ACC VF dev BAR2 region consists of both functional 1411 - * register space and migration control register space. 1412 - * Report only the functional region to Guest. 1413 - */ 1414 - info.size = pci_resource_len(pdev, info.index) / 2; 1415 - 1416 - info.flags = VFIO_REGION_INFO_FLAG_READ | 1417 - VFIO_REGION_INFO_FLAG_WRITE | 1418 - VFIO_REGION_INFO_FLAG_MMAP; 1419 - 1420 - return copy_to_user((void __user *)arg, &info, minsz) ? 1421 - -EFAULT : 0; 1422 - } 1423 - } 1424 - return vfio_pci_core_ioctl(core_vdev, cmd, arg); 1341 + info->flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE | 1342 + VFIO_REGION_INFO_FLAG_MMAP; 1343 + return 0; 1425 1344 } 1426 1345 1427 1346 static int hisi_acc_vf_debug_check(struct seq_file *seq, struct vfio_device *vdev) ··· 1562 1521 hisi_acc_vf_disable_fds(hisi_acc_vdev); 1563 1522 mutex_lock(&hisi_acc_vdev->open_mutex); 1564 1523 hisi_acc_vdev->dev_opened = false; 1565 - iounmap(vf_qm->io_base); 1524 + if (hisi_acc_vdev->drv_mode == HW_ACC_MIG_VF_CTRL) 1525 + iounmap(vf_qm->io_base); 1566 1526 mutex_unlock(&hisi_acc_vdev->open_mutex); 1567 1527 vfio_pci_core_close_device(core_vdev); 1568 1528 } ··· 1599 1557 .release = vfio_pci_core_release_dev, 1600 1558 .open_device = hisi_acc_vfio_pci_open_device, 1601 1559 .close_device = hisi_acc_vfio_pci_close_device, 1602 - .ioctl = hisi_acc_vfio_pci_ioctl, 1560 + .ioctl = vfio_pci_core_ioctl, 1561 + .get_region_info_caps = hisi_acc_vfio_ioctl_get_region, 1603 1562 .device_feature = 
vfio_pci_core_ioctl_feature, 1604 1563 .read = hisi_acc_vfio_pci_read, 1605 1564 .write = hisi_acc_vfio_pci_write, 1606 1565 .mmap = hisi_acc_vfio_pci_mmap, 1607 1566 .request = vfio_pci_core_request, 1608 1567 .match = vfio_pci_core_match, 1568 + .match_token_uuid = vfio_pci_core_match_token_uuid, 1609 1569 .bind_iommufd = vfio_iommufd_physical_bind, 1610 1570 .unbind_iommufd = vfio_iommufd_physical_unbind, 1611 1571 .attach_ioas = vfio_iommufd_physical_attach_ioas, ··· 1621 1577 .open_device = hisi_acc_vfio_pci_open_device, 1622 1578 .close_device = vfio_pci_core_close_device, 1623 1579 .ioctl = vfio_pci_core_ioctl, 1580 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 1624 1581 .device_feature = vfio_pci_core_ioctl_feature, 1625 1582 .read = vfio_pci_core_read, 1626 1583 .write = vfio_pci_core_write,
+21 -2
drivers/vfio/pci/hisilicon/hisi_acc_vfio_pci.h
··· 50 50 #define QM_QUE_ISO_CFG_V 0x0030 51 51 #define QM_PAGE_SIZE 0x0034 52 52 53 - #define QM_EQC_DW0 0X8000 54 - #define QM_AEQC_DW0 0X8020 53 + #define QM_EQC_VF_DW0 0X8000 54 + #define QM_AEQC_VF_DW0 0X8020 55 + #define QM_EQC_PF_DW0 0x1c00 56 + #define QM_AEQC_PF_DW0 0x1c20 55 57 56 58 #define ACC_DRV_MAJOR_VER 1 57 59 #define ACC_DRV_MINOR_VER 0 58 60 59 61 #define ACC_DEV_MAGIC_V1 0XCDCDCDCDFEEDAACC 60 62 #define ACC_DEV_MAGIC_V2 0xAACCFEEDDECADEDE 63 + 64 + #define QM_MIG_REGION_OFFSET 0x180000 65 + #define QM_MIG_REGION_SIZE 0x2000 66 + 67 + /* 68 + * In HW_ACC_MIG_VF_CTRL mode, the configuration domain supporting live 69 + * migration functionality is located in the latter 32KB of the VF's BAR2. 70 + * The Guest is only provided with the first 32KB of the VF's BAR2. 71 + * In HW_ACC_MIG_PF_CTRL mode, the configuration domain supporting live 72 + * migration functionality is located in the PF's BAR2, and the entire 64KB 73 + * of the VF's BAR2 is allocated to the Guest. 74 + */ 75 + enum hw_drv_mode { 76 + HW_ACC_MIG_VF_CTRL = 0, 77 + HW_ACC_MIG_PF_CTRL, 78 + }; 61 79 62 80 struct acc_vf_data { 63 81 #define QM_MATCH_SIZE offsetofend(struct acc_vf_data, qm_rsv_state) ··· 143 125 struct pci_dev *vf_dev; 144 126 struct hisi_qm *pf_qm; 145 127 struct hisi_qm vf_qm; 128 + enum hw_drv_mode drv_mode; 146 129 /* 147 130 * vf_qm_state represents the QM_VF_STATE register value. 148 131 * It is set by Guest driver for the ACC VF dev indicating
+1
drivers/vfio/pci/mlx5/main.c
··· 1366 1366 .open_device = mlx5vf_pci_open_device, 1367 1367 .close_device = mlx5vf_pci_close_device, 1368 1368 .ioctl = vfio_pci_core_ioctl, 1369 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 1369 1370 .device_feature = vfio_pci_core_ioctl_feature, 1370 1371 .read = vfio_pci_core_read, 1371 1372 .write = vfio_pci_core_write,
+255 -87
drivers/vfio/pci/nvgrace-gpu/main.c
··· 7 7 #include <linux/vfio_pci_core.h> 8 8 #include <linux/delay.h> 9 9 #include <linux/jiffies.h> 10 + #include <linux/pci-p2pdma.h> 11 + #include <linux/pm_runtime.h> 10 12 11 13 /* 12 14 * The device memory usable to the workloads running in the VM is cached ··· 60 58 /* Lock to control device memory kernel mapping */ 61 59 struct mutex remap_lock; 62 60 bool has_mig_hw_bug; 61 + /* GPU has just been reset */ 62 + bool reset_done; 63 63 }; 64 64 65 65 static void nvgrace_gpu_init_fake_bar_emu_regs(struct vfio_device *core_vdev) ··· 106 102 mutex_init(&nvdev->remap_lock); 107 103 } 108 104 105 + /* 106 + * GPU readiness is checked by reading the BAR0 registers. 107 + * 108 + * ioremap BAR0 to ensure that the BAR0 mapping is present before 109 + * register reads on first fault before establishing any GPU 110 + * memory mapping. 111 + */ 112 + ret = vfio_pci_core_setup_barmap(vdev, 0); 113 + if (ret) { 114 + vfio_pci_core_disable(vdev); 115 + return ret; 116 + } 117 + 109 118 vfio_pci_core_finish_enable(vdev); 110 119 111 120 return 0; ··· 147 130 vfio_pci_core_close_device(core_vdev); 148 131 } 149 132 133 + static int nvgrace_gpu_wait_device_ready(void __iomem *io) 134 + { 135 + unsigned long timeout = jiffies + msecs_to_jiffies(POLL_TIMEOUT_MS); 136 + 137 + do { 138 + if ((ioread32(io + C2C_LINK_BAR0_OFFSET) == STATUS_READY) && 139 + (ioread32(io + HBM_TRAINING_BAR0_OFFSET) == STATUS_READY)) 140 + return 0; 141 + msleep(POLL_QUANTUM_MS); 142 + } while (!time_after(jiffies, timeout)); 143 + 144 + return -ETIME; 145 + } 146 + 147 + /* 148 + * If the GPU memory is accessed by the CPU while the GPU is not ready 149 + * after reset, it can cause harmless corrected RAS events to be logged. 150 + * Make sure the GPU is ready before establishing the mappings. 
151 + */ 152 + static int 153 + nvgrace_gpu_check_device_ready(struct nvgrace_gpu_pci_core_device *nvdev) 154 + { 155 + struct vfio_pci_core_device *vdev = &nvdev->core_device; 156 + int ret; 157 + 158 + lockdep_assert_held_read(&vdev->memory_lock); 159 + 160 + if (!nvdev->reset_done) 161 + return 0; 162 + 163 + if (!__vfio_pci_memory_enabled(vdev)) 164 + return -EIO; 165 + 166 + ret = nvgrace_gpu_wait_device_ready(vdev->barmap[0]); 167 + if (ret) 168 + return ret; 169 + 170 + nvdev->reset_done = false; 171 + 172 + return 0; 173 + } 174 + 175 + static unsigned long addr_to_pgoff(struct vm_area_struct *vma, 176 + unsigned long addr) 177 + { 178 + u64 pgoff = vma->vm_pgoff & 179 + ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); 180 + 181 + return ((addr - vma->vm_start) >> PAGE_SHIFT) + pgoff; 182 + } 183 + 184 + static vm_fault_t nvgrace_gpu_vfio_pci_huge_fault(struct vm_fault *vmf, 185 + unsigned int order) 186 + { 187 + struct vm_area_struct *vma = vmf->vma; 188 + struct nvgrace_gpu_pci_core_device *nvdev = vma->vm_private_data; 189 + struct vfio_pci_core_device *vdev = &nvdev->core_device; 190 + unsigned int index = 191 + vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); 192 + vm_fault_t ret = VM_FAULT_FALLBACK; 193 + struct mem_region *memregion; 194 + unsigned long pfn, addr; 195 + 196 + memregion = nvgrace_gpu_memregion(index, nvdev); 197 + if (!memregion) 198 + return VM_FAULT_SIGBUS; 199 + 200 + addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order); 201 + pfn = PHYS_PFN(memregion->memphys) + addr_to_pgoff(vma, addr); 202 + 203 + if (is_aligned_for_order(vma, addr, pfn, order)) { 204 + scoped_guard(rwsem_read, &vdev->memory_lock) { 205 + if (vdev->pm_runtime_engaged || 206 + nvgrace_gpu_check_device_ready(nvdev)) 207 + return VM_FAULT_SIGBUS; 208 + 209 + ret = vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); 210 + } 211 + } 212 + 213 + dev_dbg_ratelimited(&vdev->pdev->dev, 214 + "%s order = %d pfn 0x%lx: 0x%x\n", 215 + __func__, order, pfn, 216 + 
(unsigned int)ret); 217 + 218 + return ret; 219 + } 220 + 221 + static vm_fault_t nvgrace_gpu_vfio_pci_fault(struct vm_fault *vmf) 222 + { 223 + return nvgrace_gpu_vfio_pci_huge_fault(vmf, 0); 224 + } 225 + 226 + static const struct vm_operations_struct nvgrace_gpu_vfio_pci_mmap_ops = { 227 + .fault = nvgrace_gpu_vfio_pci_fault, 228 + #ifdef CONFIG_ARCH_SUPPORTS_HUGE_PFNMAP 229 + .huge_fault = nvgrace_gpu_vfio_pci_huge_fault, 230 + #endif 231 + }; 232 + 150 233 static int nvgrace_gpu_mmap(struct vfio_device *core_vdev, 151 234 struct vm_area_struct *vma) 152 235 { ··· 254 137 container_of(core_vdev, struct nvgrace_gpu_pci_core_device, 255 138 core_device.vdev); 256 139 struct mem_region *memregion; 257 - unsigned long start_pfn; 258 140 u64 req_len, pgoff, end; 259 141 unsigned int index; 260 - int ret = 0; 261 142 262 143 index = vma->vm_pgoff >> (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT); 263 144 ··· 272 157 ((1U << (VFIO_PCI_OFFSET_SHIFT - PAGE_SHIFT)) - 1); 273 158 274 159 if (check_sub_overflow(vma->vm_end, vma->vm_start, &req_len) || 275 - check_add_overflow(PHYS_PFN(memregion->memphys), pgoff, &start_pfn) || 276 160 check_add_overflow(PFN_PHYS(pgoff), req_len, &end)) 277 161 return -EOVERFLOW; 278 162 279 163 /* 280 - * Check that the mapping request does not go beyond available device 281 - * memory size 164 + * Check that the mapping request does not go beyond the exposed 165 + * device memory size. 282 166 */ 283 167 if (end > memregion->memlength) 284 168 return -EINVAL; 169 + 170 + vm_flags_set(vma, VM_IO | VM_PFNMAP | VM_DONTEXPAND | VM_DONTDUMP); 285 171 286 172 /* 287 173 * The carved out region of the device memory needs the NORMAL_NC ··· 300 184 vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot); 301 185 } 302 186 303 - /* 304 - * Perform a PFN map to the memory and back the device BAR by the 305 - * GPU memory. 306 - * 307 - * The available GPU memory size may not be power-of-2 aligned. 
The 308 - * remainder is only backed by vfio_device_ops read/write handlers. 309 - * 310 - * During device reset, the GPU is safely disconnected to the CPU 311 - * and access to the BAR will be immediately returned preventing 312 - * machine check. 313 - */ 314 - ret = remap_pfn_range(vma, vma->vm_start, start_pfn, 315 - req_len, vma->vm_page_prot); 316 - if (ret) 317 - return ret; 318 - 319 - vma->vm_pgoff = start_pfn; 187 + vma->vm_ops = &nvgrace_gpu_vfio_pci_mmap_ops; 188 + vma->vm_private_data = nvdev; 320 189 321 190 return 0; 322 191 } 323 192 324 - static long 325 - nvgrace_gpu_ioctl_get_region_info(struct vfio_device *core_vdev, 326 - unsigned long arg) 193 + static int nvgrace_gpu_ioctl_get_region_info(struct vfio_device *core_vdev, 194 + struct vfio_region_info *info, 195 + struct vfio_info_cap *caps) 327 196 { 328 197 struct nvgrace_gpu_pci_core_device *nvdev = 329 198 container_of(core_vdev, struct nvgrace_gpu_pci_core_device, 330 199 core_device.vdev); 331 - unsigned long minsz = offsetofend(struct vfio_region_info, offset); 332 - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; 333 200 struct vfio_region_info_cap_sparse_mmap *sparse; 334 - struct vfio_region_info info; 335 201 struct mem_region *memregion; 336 202 u32 size; 337 203 int ret; 338 - 339 - if (copy_from_user(&info, (void __user *)arg, minsz)) 340 - return -EFAULT; 341 - 342 - if (info.argsz < minsz) 343 - return -EINVAL; 344 204 345 205 /* 346 206 * Request to determine the BAR region information. Send the 347 207 * GPU memory information. 
348 208 */ 349 - memregion = nvgrace_gpu_memregion(info.index, nvdev); 209 + memregion = nvgrace_gpu_memregion(info->index, nvdev); 350 210 if (!memregion) 351 - return vfio_pci_core_ioctl(core_vdev, 352 - VFIO_DEVICE_GET_REGION_INFO, arg); 211 + return vfio_pci_ioctl_get_region_info(core_vdev, info, caps); 353 212 354 213 size = struct_size(sparse, areas, 1); 355 214 ··· 343 252 sparse->header.id = VFIO_REGION_INFO_CAP_SPARSE_MMAP; 344 253 sparse->header.version = 1; 345 254 346 - ret = vfio_info_add_capability(&caps, &sparse->header, size); 255 + ret = vfio_info_add_capability(caps, &sparse->header, size); 347 256 kfree(sparse); 348 257 if (ret) 349 258 return ret; 350 259 351 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 260 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 352 261 /* 353 262 * The region memory size may not be power-of-2 aligned. 354 263 * Given that the memory is a BAR and may not be 355 264 * aligned, roundup to the next power-of-2. 356 265 */ 357 - info.size = memregion->bar_size; 358 - info.flags = VFIO_REGION_INFO_FLAG_READ | 266 + info->size = memregion->bar_size; 267 + info->flags = VFIO_REGION_INFO_FLAG_READ | 359 268 VFIO_REGION_INFO_FLAG_WRITE | 360 269 VFIO_REGION_INFO_FLAG_MMAP; 361 - 362 - if (caps.size) { 363 - info.flags |= VFIO_REGION_INFO_FLAG_CAPS; 364 - if (info.argsz < sizeof(info) + caps.size) { 365 - info.argsz = sizeof(info) + caps.size; 366 - info.cap_offset = 0; 367 - } else { 368 - vfio_info_cap_shift(&caps, sizeof(info)); 369 - if (copy_to_user((void __user *)arg + 370 - sizeof(info), caps.buf, 371 - caps.size)) { 372 - kfree(caps.buf); 373 - return -EFAULT; 374 - } 375 - info.cap_offset = sizeof(info); 376 - } 377 - kfree(caps.buf); 378 - } 379 - return copy_to_user((void __user *)arg, &info, minsz) ? 
380 - -EFAULT : 0; 270 + return 0; 381 271 } 382 272 383 273 static long nvgrace_gpu_ioctl(struct vfio_device *core_vdev, 384 274 unsigned int cmd, unsigned long arg) 385 275 { 386 276 switch (cmd) { 387 - case VFIO_DEVICE_GET_REGION_INFO: 388 - return nvgrace_gpu_ioctl_get_region_info(core_vdev, arg); 389 277 case VFIO_DEVICE_IOEVENTFD: 390 278 return -ENOTTY; 391 279 case VFIO_DEVICE_RESET: ··· 580 510 nvgrace_gpu_read_mem(struct nvgrace_gpu_pci_core_device *nvdev, 581 511 char __user *buf, size_t count, loff_t *ppos) 582 512 { 513 + struct vfio_pci_core_device *vdev = &nvdev->core_device; 583 514 u64 offset = *ppos & VFIO_PCI_OFFSET_MASK; 584 515 unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); 585 516 struct mem_region *memregion; ··· 607 536 else 608 537 mem_count = min(count, memregion->memlength - (size_t)offset); 609 538 610 - ret = nvgrace_gpu_map_and_read(nvdev, buf, mem_count, ppos); 611 - if (ret) 612 - return ret; 539 + scoped_guard(rwsem_read, &vdev->memory_lock) { 540 + ret = nvgrace_gpu_check_device_ready(nvdev); 541 + if (ret) 542 + return ret; 543 + 544 + ret = nvgrace_gpu_map_and_read(nvdev, buf, mem_count, ppos); 545 + if (ret) 546 + return ret; 547 + } 613 548 614 549 /* 615 550 * Only the device memory present on the hardware is mapped, which may ··· 640 563 struct nvgrace_gpu_pci_core_device *nvdev = 641 564 container_of(core_vdev, struct nvgrace_gpu_pci_core_device, 642 565 core_device.vdev); 566 + struct vfio_pci_core_device *vdev = &nvdev->core_device; 567 + int ret; 643 568 644 - if (nvgrace_gpu_memregion(index, nvdev)) 645 - return nvgrace_gpu_read_mem(nvdev, buf, count, ppos); 569 + if (nvgrace_gpu_memregion(index, nvdev)) { 570 + if (pm_runtime_resume_and_get(&vdev->pdev->dev)) 571 + return -EIO; 572 + ret = nvgrace_gpu_read_mem(nvdev, buf, count, ppos); 573 + pm_runtime_put(&vdev->pdev->dev); 574 + return ret; 575 + } 646 576 647 577 if (index == VFIO_PCI_CONFIG_REGION_INDEX) 648 578 return nvgrace_gpu_read_config_emu(core_vdev, 
buf, count, ppos); ··· 711 627 nvgrace_gpu_write_mem(struct nvgrace_gpu_pci_core_device *nvdev, 712 628 size_t count, loff_t *ppos, const char __user *buf) 713 629 { 630 + struct vfio_pci_core_device *vdev = &nvdev->core_device; 714 631 unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); 715 632 u64 offset = *ppos & VFIO_PCI_OFFSET_MASK; 716 633 struct mem_region *memregion; ··· 741 656 */ 742 657 mem_count = min(count, memregion->memlength - (size_t)offset); 743 658 744 - ret = nvgrace_gpu_map_and_write(nvdev, buf, mem_count, ppos); 745 - if (ret) 746 - return ret; 659 + scoped_guard(rwsem_read, &vdev->memory_lock) { 660 + ret = nvgrace_gpu_check_device_ready(nvdev); 661 + if (ret) 662 + return ret; 663 + 664 + ret = nvgrace_gpu_map_and_write(nvdev, buf, mem_count, ppos); 665 + if (ret) 666 + return ret; 667 + } 747 668 748 669 exitfn: 749 670 *ppos += count; ··· 763 672 struct nvgrace_gpu_pci_core_device *nvdev = 764 673 container_of(core_vdev, struct nvgrace_gpu_pci_core_device, 765 674 core_device.vdev); 675 + struct vfio_pci_core_device *vdev = &nvdev->core_device; 766 676 unsigned int index = VFIO_PCI_OFFSET_TO_INDEX(*ppos); 677 + int ret; 767 678 768 - if (nvgrace_gpu_memregion(index, nvdev)) 769 - return nvgrace_gpu_write_mem(nvdev, count, ppos, buf); 679 + if (nvgrace_gpu_memregion(index, nvdev)) { 680 + if (pm_runtime_resume_and_get(&vdev->pdev->dev)) 681 + return -EIO; 682 + ret = nvgrace_gpu_write_mem(nvdev, count, ppos, buf); 683 + pm_runtime_put(&vdev->pdev->dev); 684 + return ret; 685 + } 770 686 771 687 if (index == VFIO_PCI_CONFIG_REGION_INDEX) 772 688 return nvgrace_gpu_write_config_emu(core_vdev, buf, count, ppos); 773 689 774 690 return vfio_pci_core_write(core_vdev, buf, count, ppos); 775 691 } 692 + 693 + static int nvgrace_get_dmabuf_phys(struct vfio_pci_core_device *core_vdev, 694 + struct p2pdma_provider **provider, 695 + unsigned int region_index, 696 + struct dma_buf_phys_vec *phys_vec, 697 + struct vfio_region_dma_range *dma_ranges, 
698 + size_t nr_ranges) 699 + { 700 + struct nvgrace_gpu_pci_core_device *nvdev = container_of( 701 + core_vdev, struct nvgrace_gpu_pci_core_device, core_device); 702 + struct pci_dev *pdev = core_vdev->pdev; 703 + struct mem_region *mem_region; 704 + 705 + /* 706 + * if (nvdev->resmem.memlength && region_index == RESMEM_REGION_INDEX) { 707 + * The P2P properties of the non-BAR memory are the same as the 708 + * BAR memory, so just use the provider for index 0. Someday 709 + * when CXL gets P2P support we could create CXLish providers 710 + * for the non-BAR memory. 711 + * } else if (region_index == USEMEM_REGION_INDEX) { 712 + * This is actually cacheable memory and isn't treated as P2P in 713 + * the chip. For now we have no way to push cacheable memory 714 + * through everything, and the Grace HW doesn't care what caching 715 + * attribute is programmed into the SMMU. So use BAR 0. 716 + * } 717 + */ 718 + mem_region = nvgrace_gpu_memregion(region_index, nvdev); 719 + if (mem_region) { 720 + *provider = pcim_p2pdma_provider(pdev, 0); 721 + if (!*provider) 722 + return -EINVAL; 723 + return vfio_pci_core_fill_phys_vec(phys_vec, dma_ranges, 724 + nr_ranges, 725 + mem_region->memphys, 726 + mem_region->memlength); 727 + } 728 + 729 + return vfio_pci_core_get_dmabuf_phys(core_vdev, provider, region_index, 730 + phys_vec, dma_ranges, nr_ranges); 731 + } 732 + 733 + static const struct vfio_pci_device_ops nvgrace_gpu_pci_dev_ops = { 734 + .get_dmabuf_phys = nvgrace_get_dmabuf_phys, 735 + }; 776 736 777 737 static const struct vfio_device_ops nvgrace_gpu_pci_ops = { 778 738 .name = "nvgrace-gpu-vfio-pci", ··· 832 690 .open_device = nvgrace_gpu_open_device, 833 691 .close_device = nvgrace_gpu_close_device, 834 692 .ioctl = nvgrace_gpu_ioctl, 693 + .get_region_info_caps = nvgrace_gpu_ioctl_get_region_info, 835 694 .device_feature = vfio_pci_core_ioctl_feature, 836 695 .read = nvgrace_gpu_read, 837 696 .write = nvgrace_gpu_write, ··· 846 703 .detach_ioas =
vfio_iommufd_physical_detach_ioas, 847 704 }; 848 705 706 + static const struct vfio_pci_device_ops nvgrace_gpu_pci_dev_core_ops = { 707 + .get_dmabuf_phys = vfio_pci_core_get_dmabuf_phys, 708 + }; 709 + 849 710 static const struct vfio_device_ops nvgrace_gpu_pci_core_ops = { 850 711 .name = "nvgrace-gpu-vfio-pci-core", 851 712 .init = vfio_pci_core_init_dev, ··· 857 710 .open_device = nvgrace_gpu_open_device, 858 711 .close_device = vfio_pci_core_close_device, 859 712 .ioctl = vfio_pci_core_ioctl, 713 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 860 714 .device_feature = vfio_pci_core_ioctl_feature, 861 715 .read = vfio_pci_core_read, 862 716 .write = vfio_pci_core_write, ··· 1041 893 * Ensure that the BAR0 region is enabled before accessing the 1042 894 * registers. 1043 895 */ 1044 - static int nvgrace_gpu_wait_device_ready(struct pci_dev *pdev) 896 + static int nvgrace_gpu_probe_check_device_ready(struct pci_dev *pdev) 1045 897 { 1046 - unsigned long timeout = jiffies + msecs_to_jiffies(POLL_TIMEOUT_MS); 1047 898 void __iomem *io; 1048 - int ret = -ETIME; 899 + int ret; 1049 900 1050 901 ret = pci_enable_device(pdev); 1051 902 if (ret) ··· 1060 913 goto iomap_exit; 1061 914 } 1062 915 1063 - do { 1064 - if ((ioread32(io + C2C_LINK_BAR0_OFFSET) == STATUS_READY) && 1065 - (ioread32(io + HBM_TRAINING_BAR0_OFFSET) == STATUS_READY)) { 1066 - ret = 0; 1067 - goto reg_check_exit; 1068 - } 1069 - msleep(POLL_QUANTUM_MS); 1070 - } while (!time_after(jiffies, timeout)); 916 + ret = nvgrace_gpu_wait_device_ready(io); 1071 917 1072 - reg_check_exit: 1073 918 pci_iounmap(pdev, io); 1074 919 iomap_exit: 1075 920 pci_release_selected_regions(pdev, 1 << 0); ··· 1078 939 u64 memphys, memlength; 1079 940 int ret; 1080 941 1081 - ret = nvgrace_gpu_wait_device_ready(pdev); 942 + ret = nvgrace_gpu_probe_check_device_ready(pdev); 1082 943 if (ret) 1083 944 return ret; 1084 945 ··· 1104 965 memphys, memlength); 1105 966 if (ret) 1106 967 goto out_put_vdev; 968 + 
nvdev->core_device.pci_ops = &nvgrace_gpu_pci_dev_ops; 969 + } else { 970 + nvdev->core_device.pci_ops = &nvgrace_gpu_pci_dev_core_ops; 1107 971 } 1108 972 1109 973 ret = vfio_pci_core_register_device(&nvdev->core_device); ··· 1144 1002 1145 1003 MODULE_DEVICE_TABLE(pci, nvgrace_gpu_vfio_pci_table); 1146 1004 1005 + /* 1006 + * The GPU reset must be serialized against the *first* mapping 1007 + * fault and read/write accesses to prevent potential RAS event logging. 1008 + * 1009 + * The first fault or access after a reset needs to poll device readiness, 1010 + * so .reset_done flags that a reset has occurred. The readiness test is 1011 + * done while holding the memory_lock read lock, and we expect all vfio-pci 1012 + * initiated resets to hold the memory_lock write lock to avoid races. 1013 + * However, .reset_done extends beyond the scope of vfio-pci initiated resets, 1014 + * therefore we cannot assert this behavior with lockdep_assert_held_write. 1015 + */ 1016 + static void nvgrace_gpu_vfio_pci_reset_done(struct pci_dev *pdev) 1017 + { 1018 + struct vfio_pci_core_device *core_device = dev_get_drvdata(&pdev->dev); 1019 + struct nvgrace_gpu_pci_core_device *nvdev = 1020 + container_of(core_device, struct nvgrace_gpu_pci_core_device, 1021 + core_device); 1022 + 1023 + nvdev->reset_done = true; 1024 + } 1025 + 1026 + static const struct pci_error_handlers nvgrace_gpu_vfio_pci_err_handlers = { 1027 + .reset_done = nvgrace_gpu_vfio_pci_reset_done, 1028 + .error_detected = vfio_pci_core_aer_err_detected, 1029 + }; 1030 + 1147 1031 static struct pci_driver nvgrace_gpu_vfio_pci_driver = { 1148 1032 .name = KBUILD_MODNAME, 1149 1033 .id_table = nvgrace_gpu_vfio_pci_table, 1150 1034 .probe = nvgrace_gpu_probe, 1151 1035 .remove = nvgrace_gpu_remove, 1152 - .err_handler = &vfio_pci_core_err_handlers, 1036 + .err_handler = &nvgrace_gpu_vfio_pci_err_handlers, 1153 1037 .driver_managed_dma = true, 1154 1038 }; 1155 1039
+1
drivers/vfio/pci/pds/vfio_dev.c
··· 195 195 .open_device = pds_vfio_open_device, 196 196 .close_device = pds_vfio_close_device, 197 197 .ioctl = vfio_pci_core_ioctl, 198 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 198 199 .device_feature = vfio_pci_core_ioctl_feature, 199 200 .read = vfio_pci_core_read, 200 201 .write = vfio_pci_core_write,
+1
drivers/vfio/pci/qat/main.c
··· 609 609 .open_device = qat_vf_pci_open_device, 610 610 .close_device = qat_vf_pci_close_device, 611 611 .ioctl = vfio_pci_core_ioctl, 612 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 612 613 .read = vfio_pci_core_read, 613 614 .write = vfio_pci_core_write, 614 615 .mmap = vfio_pci_core_mmap,
+6
drivers/vfio/pci/vfio_pci.c
··· 132 132 .open_device = vfio_pci_open_device, 133 133 .close_device = vfio_pci_core_close_device, 134 134 .ioctl = vfio_pci_core_ioctl, 135 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 135 136 .device_feature = vfio_pci_core_ioctl_feature, 136 137 .read = vfio_pci_core_read, 137 138 .write = vfio_pci_core_write, ··· 146 145 .detach_ioas = vfio_iommufd_physical_detach_ioas, 147 146 .pasid_attach_ioas = vfio_iommufd_physical_pasid_attach_ioas, 148 147 .pasid_detach_ioas = vfio_iommufd_physical_pasid_detach_ioas, 148 + }; 149 + 150 + static const struct vfio_pci_device_ops vfio_pci_dev_ops = { 151 + .get_dmabuf_phys = vfio_pci_core_get_dmabuf_phys, 149 152 }; 150 153 151 154 static int vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) ··· 166 161 return PTR_ERR(vdev); 167 162 168 163 dev_set_drvdata(&pdev->dev, vdev); 164 + vdev->pci_ops = &vfio_pci_dev_ops; 169 165 ret = vfio_pci_core_register_device(vdev); 170 166 if (ret) 171 167 goto out_put_vdev;
+19 -4
drivers/vfio/pci/vfio_pci_config.c
··· 416 416 return pdev->current_state < PCI_D3hot && 417 417 (pdev->no_command_memory || (cmd & PCI_COMMAND_MEMORY)); 418 418 } 419 + EXPORT_SYMBOL_GPL(__vfio_pci_memory_enabled); 419 420 420 421 /* 421 422 * Restore the *real* BARs after we detect a FLR or backdoor reset. ··· 590 589 virt_mem = !!(le16_to_cpu(*virt_cmd) & PCI_COMMAND_MEMORY); 591 590 new_mem = !!(new_cmd & PCI_COMMAND_MEMORY); 592 591 593 - if (!new_mem) 592 + if (!new_mem) { 594 593 vfio_pci_zap_and_down_write_memory_lock(vdev); 595 - else 594 + vfio_pci_dma_buf_move(vdev, true); 595 + } else { 596 596 down_write(&vdev->memory_lock); 597 + } 597 598 598 599 /* 599 600 * If the user is writing mem/io enable (new_mem/io) and we ··· 630 627 *virt_cmd &= cpu_to_le16(~mask); 631 628 *virt_cmd |= cpu_to_le16(new_cmd & mask); 632 629 630 + if (__vfio_pci_memory_enabled(vdev)) 631 + vfio_pci_dma_buf_move(vdev, false); 633 632 up_write(&vdev->memory_lock); 634 633 } 635 634 ··· 712 707 static void vfio_lock_and_set_power_state(struct vfio_pci_core_device *vdev, 713 708 pci_power_t state) 714 709 { 715 - if (state >= PCI_D3hot) 710 + if (state >= PCI_D3hot) { 716 711 vfio_pci_zap_and_down_write_memory_lock(vdev); 717 - else 712 + vfio_pci_dma_buf_move(vdev, true); 713 + } else { 718 714 down_write(&vdev->memory_lock); 715 + } 719 716 720 717 vfio_pci_set_power_state(vdev, state); 718 + if (__vfio_pci_memory_enabled(vdev)) 719 + vfio_pci_dma_buf_move(vdev, false); 721 720 up_write(&vdev->memory_lock); 722 721 } 723 722 ··· 909 900 910 901 if (!ret && (cap & PCI_EXP_DEVCAP_FLR)) { 911 902 vfio_pci_zap_and_down_write_memory_lock(vdev); 903 + vfio_pci_dma_buf_move(vdev, true); 912 904 pci_try_reset_function(vdev->pdev); 905 + if (__vfio_pci_memory_enabled(vdev)) 906 + vfio_pci_dma_buf_move(vdev, false); 913 907 up_write(&vdev->memory_lock); 914 908 } 915 909 } ··· 994 982 995 983 if (!ret && (cap & PCI_AF_CAP_FLR) && (cap & PCI_AF_CAP_TP)) { 996 984 vfio_pci_zap_and_down_write_memory_lock(vdev); 985 + 
vfio_pci_dma_buf_move(vdev, true); 997 986 pci_try_reset_function(vdev->pdev); 987 + if (__vfio_pci_memory_enabled(vdev)) 988 + vfio_pci_dma_buf_move(vdev, false); 998 989 up_write(&vdev->memory_lock); 999 990 } 1000 991 }
+161 -149
drivers/vfio/pci/vfio_pci_core.c
··· 28 28 #include <linux/nospec.h> 29 29 #include <linux/sched/mm.h> 30 30 #include <linux/iommufd.h> 31 + #include <linux/pci-p2pdma.h> 31 32 #if IS_ENABLED(CONFIG_EEH) 32 33 #include <asm/eeh.h> 33 34 #endif ··· 41 40 static bool nointxmask; 42 41 static bool disable_vga; 43 42 static bool disable_idle_d3; 43 + 44 + static void vfio_pci_eventfd_rcu_free(struct rcu_head *rcu) 45 + { 46 + struct vfio_pci_eventfd *eventfd = 47 + container_of(rcu, struct vfio_pci_eventfd, rcu); 48 + 49 + eventfd_ctx_put(eventfd->ctx); 50 + kfree(eventfd); 51 + } 52 + 53 + int vfio_pci_eventfd_replace_locked(struct vfio_pci_core_device *vdev, 54 + struct vfio_pci_eventfd __rcu **peventfd, 55 + struct eventfd_ctx *ctx) 56 + { 57 + struct vfio_pci_eventfd *new = NULL; 58 + struct vfio_pci_eventfd *old; 59 + 60 + lockdep_assert_held(&vdev->igate); 61 + 62 + if (ctx) { 63 + new = kzalloc(sizeof(*new), GFP_KERNEL_ACCOUNT); 64 + if (!new) 65 + return -ENOMEM; 66 + 67 + new->ctx = ctx; 68 + } 69 + 70 + old = rcu_replace_pointer(*peventfd, new, 71 + lockdep_is_held(&vdev->igate)); 72 + if (old) 73 + call_rcu(&old->rcu, vfio_pci_eventfd_rcu_free); 74 + 75 + return 0; 76 + } 44 77 45 78 /* List of PF's that vfio_pci_core_sriov_configure() has been called on */ 46 79 static DEFINE_MUTEX(vfio_pci_sriov_pfs_mutex); ··· 321 286 * semaphore. 
322 287 */ 323 288 vfio_pci_zap_and_down_write_memory_lock(vdev); 289 + vfio_pci_dma_buf_move(vdev, true); 290 + 324 291 if (vdev->pm_runtime_engaged) { 325 292 up_write(&vdev->memory_lock); 326 293 return -EINVAL; ··· 336 299 return 0; 337 300 } 338 301 339 - static int vfio_pci_core_pm_entry(struct vfio_device *device, u32 flags, 302 + static int vfio_pci_core_pm_entry(struct vfio_pci_core_device *vdev, u32 flags, 340 303 void __user *arg, size_t argsz) 341 304 { 342 - struct vfio_pci_core_device *vdev = 343 - container_of(device, struct vfio_pci_core_device, vdev); 344 305 int ret; 345 306 346 307 ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0); ··· 355 320 } 356 321 357 322 static int vfio_pci_core_pm_entry_with_wakeup( 358 - struct vfio_device *device, u32 flags, 323 + struct vfio_pci_core_device *vdev, u32 flags, 359 324 struct vfio_device_low_power_entry_with_wakeup __user *arg, 360 325 size_t argsz) 361 326 { 362 - struct vfio_pci_core_device *vdev = 363 - container_of(device, struct vfio_pci_core_device, vdev); 364 327 struct vfio_device_low_power_entry_with_wakeup entry; 365 328 struct eventfd_ctx *efdctx; 366 329 int ret; ··· 406 373 */ 407 374 down_write(&vdev->memory_lock); 408 375 __vfio_pci_runtime_pm_exit(vdev); 376 + if (__vfio_pci_memory_enabled(vdev)) 377 + vfio_pci_dma_buf_move(vdev, false); 409 378 up_write(&vdev->memory_lock); 410 379 } 411 380 412 - static int vfio_pci_core_pm_exit(struct vfio_device *device, u32 flags, 381 + static int vfio_pci_core_pm_exit(struct vfio_pci_core_device *vdev, u32 flags, 413 382 void __user *arg, size_t argsz) 414 383 { 415 - struct vfio_pci_core_device *vdev = 416 - container_of(device, struct vfio_pci_core_device, vdev); 417 384 int ret; 418 385 419 386 ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_SET, 0); ··· 728 695 #endif 729 696 vfio_pci_core_disable(vdev); 730 697 698 + vfio_pci_dma_buf_cleanup(vdev); 699 + 731 700 mutex_lock(&vdev->igate); 732 - if (vdev->err_trigger) { 
733 - eventfd_ctx_put(vdev->err_trigger); 734 - vdev->err_trigger = NULL; 735 - } 736 - if (vdev->req_trigger) { 737 - eventfd_ctx_put(vdev->req_trigger); 738 - vdev->req_trigger = NULL; 739 - } 701 + vfio_pci_eventfd_replace_locked(vdev, &vdev->err_trigger, NULL); 702 + vfio_pci_eventfd_replace_locked(vdev, &vdev->req_trigger, NULL); 740 703 mutex_unlock(&vdev->igate); 741 704 } 742 705 EXPORT_SYMBOL_GPL(vfio_pci_core_close_device); ··· 1025 996 return copy_to_user(arg, &info, minsz) ? -EFAULT : 0; 1026 997 } 1027 998 1028 - static int vfio_pci_ioctl_get_region_info(struct vfio_pci_core_device *vdev, 1029 - struct vfio_region_info __user *arg) 999 + int vfio_pci_ioctl_get_region_info(struct vfio_device *core_vdev, 1000 + struct vfio_region_info *info, 1001 + struct vfio_info_cap *caps) 1030 1002 { 1031 - unsigned long minsz = offsetofend(struct vfio_region_info, offset); 1003 + struct vfio_pci_core_device *vdev = 1004 + container_of(core_vdev, struct vfio_pci_core_device, vdev); 1032 1005 struct pci_dev *pdev = vdev->pdev; 1033 - struct vfio_region_info info; 1034 - struct vfio_info_cap caps = { .buf = NULL, .size = 0 }; 1035 1006 int i, ret; 1036 1007 1037 - if (copy_from_user(&info, arg, minsz)) 1038 - return -EFAULT; 1039 - 1040 - if (info.argsz < minsz) 1041 - return -EINVAL; 1042 - 1043 - switch (info.index) { 1008 + switch (info->index) { 1044 1009 case VFIO_PCI_CONFIG_REGION_INDEX: 1045 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1046 - info.size = pdev->cfg_size; 1047 - info.flags = VFIO_REGION_INFO_FLAG_READ | 1048 - VFIO_REGION_INFO_FLAG_WRITE; 1010 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1011 + info->size = pdev->cfg_size; 1012 + info->flags = VFIO_REGION_INFO_FLAG_READ | 1013 + VFIO_REGION_INFO_FLAG_WRITE; 1049 1014 break; 1050 1015 case VFIO_PCI_BAR0_REGION_INDEX ... 
VFIO_PCI_BAR5_REGION_INDEX: 1051 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1052 - info.size = pci_resource_len(pdev, info.index); 1053 - if (!info.size) { 1054 - info.flags = 0; 1016 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1017 + info->size = pci_resource_len(pdev, info->index); 1018 + if (!info->size) { 1019 + info->flags = 0; 1055 1020 break; 1056 1021 } 1057 1022 1058 - info.flags = VFIO_REGION_INFO_FLAG_READ | 1059 - VFIO_REGION_INFO_FLAG_WRITE; 1060 - if (vdev->bar_mmap_supported[info.index]) { 1061 - info.flags |= VFIO_REGION_INFO_FLAG_MMAP; 1062 - if (info.index == vdev->msix_bar) { 1063 - ret = msix_mmappable_cap(vdev, &caps); 1023 + info->flags = VFIO_REGION_INFO_FLAG_READ | 1024 + VFIO_REGION_INFO_FLAG_WRITE; 1025 + if (vdev->bar_mmap_supported[info->index]) { 1026 + info->flags |= VFIO_REGION_INFO_FLAG_MMAP; 1027 + if (info->index == vdev->msix_bar) { 1028 + ret = msix_mmappable_cap(vdev, caps); 1064 1029 if (ret) 1065 1030 return ret; 1066 1031 } ··· 1066 1043 size_t size; 1067 1044 u16 cmd; 1068 1045 1069 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1070 - info.flags = 0; 1071 - info.size = 0; 1046 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1047 + info->flags = 0; 1048 + info->size = 0; 1072 1049 1073 1050 if (pci_resource_start(pdev, PCI_ROM_RESOURCE)) { 1074 1051 /* ··· 1078 1055 cmd = vfio_pci_memory_lock_and_enable(vdev); 1079 1056 io = pci_map_rom(pdev, &size); 1080 1057 if (io) { 1081 - info.flags = VFIO_REGION_INFO_FLAG_READ; 1058 + info->flags = VFIO_REGION_INFO_FLAG_READ; 1082 1059 /* Report the BAR size, not the ROM size. 
*/ 1083 - info.size = pci_resource_len(pdev, PCI_ROM_RESOURCE); 1060 + info->size = pci_resource_len(pdev, 1061 + PCI_ROM_RESOURCE); 1084 1062 pci_unmap_rom(pdev, io); 1085 1063 } 1086 1064 vfio_pci_memory_unlock_and_restore(vdev, cmd); 1087 1065 } else if (pdev->rom && pdev->romlen) { 1088 - info.flags = VFIO_REGION_INFO_FLAG_READ; 1066 + info->flags = VFIO_REGION_INFO_FLAG_READ; 1089 1067 /* Report BAR size as power of two. */ 1090 - info.size = roundup_pow_of_two(pdev->romlen); 1068 + info->size = roundup_pow_of_two(pdev->romlen); 1091 1069 } 1092 1070 1093 1071 break; ··· 1097 1073 if (!vdev->has_vga) 1098 1074 return -EINVAL; 1099 1075 1100 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1101 - info.size = 0xc0000; 1102 - info.flags = VFIO_REGION_INFO_FLAG_READ | 1103 - VFIO_REGION_INFO_FLAG_WRITE; 1076 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1077 + info->size = 0xc0000; 1078 + info->flags = VFIO_REGION_INFO_FLAG_READ | 1079 + VFIO_REGION_INFO_FLAG_WRITE; 1104 1080 1105 1081 break; 1106 1082 default: { ··· 1109 1085 .header.version = 1 1110 1086 }; 1111 1087 1112 - if (info.index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) 1088 + if (info->index >= VFIO_PCI_NUM_REGIONS + vdev->num_regions) 1113 1089 return -EINVAL; 1114 - info.index = array_index_nospec( 1115 - info.index, VFIO_PCI_NUM_REGIONS + vdev->num_regions); 1090 + info->index = array_index_nospec( 1091 + info->index, VFIO_PCI_NUM_REGIONS + vdev->num_regions); 1116 1092 1117 - i = info.index - VFIO_PCI_NUM_REGIONS; 1093 + i = info->index - VFIO_PCI_NUM_REGIONS; 1118 1094 1119 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 1120 - info.size = vdev->region[i].size; 1121 - info.flags = vdev->region[i].flags; 1095 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 1096 + info->size = vdev->region[i].size; 1097 + info->flags = vdev->region[i].flags; 1122 1098 1123 1099 cap_type.type = vdev->region[i].type; 1124 1100 cap_type.subtype = vdev->region[i].subtype; 1125 1101 1126 
- ret = vfio_info_add_capability(&caps, &cap_type.header, 1102 + ret = vfio_info_add_capability(caps, &cap_type.header, 1127 1103 sizeof(cap_type)); 1128 1104 if (ret) 1129 1105 return ret; 1130 1106 1131 1107 if (vdev->region[i].ops->add_capability) { 1132 1108 ret = vdev->region[i].ops->add_capability( 1133 - vdev, &vdev->region[i], &caps); 1109 + vdev, &vdev->region[i], caps); 1134 1110 if (ret) 1135 1111 return ret; 1136 1112 } 1137 1113 } 1138 1114 } 1139 - 1140 - if (caps.size) { 1141 - info.flags |= VFIO_REGION_INFO_FLAG_CAPS; 1142 - if (info.argsz < sizeof(info) + caps.size) { 1143 - info.argsz = sizeof(info) + caps.size; 1144 - info.cap_offset = 0; 1145 - } else { 1146 - vfio_info_cap_shift(&caps, sizeof(info)); 1147 - if (copy_to_user(arg + 1, caps.buf, caps.size)) { 1148 - kfree(caps.buf); 1149 - return -EFAULT; 1150 - } 1151 - info.cap_offset = sizeof(*arg); 1152 - } 1153 - 1154 - kfree(caps.buf); 1155 - } 1156 - 1157 - return copy_to_user(arg, &info, minsz) ? -EFAULT : 0; 1115 + return 0; 1158 1116 } 1117 + EXPORT_SYMBOL_GPL(vfio_pci_ioctl_get_region_info); 1159 1118 1160 1119 static int vfio_pci_ioctl_get_irq_info(struct vfio_pci_core_device *vdev, 1161 1120 struct vfio_irq_info __user *arg) ··· 1234 1227 */ 1235 1228 vfio_pci_set_power_state(vdev, PCI_D0); 1236 1229 1230 + vfio_pci_dma_buf_move(vdev, true); 1237 1231 ret = pci_try_reset_function(vdev->pdev); 1232 + if (__vfio_pci_memory_enabled(vdev)) 1233 + vfio_pci_dma_buf_move(vdev, false); 1238 1234 up_write(&vdev->memory_lock); 1239 1235 1240 1236 return ret; ··· 1467 1457 return vfio_pci_ioctl_get_irq_info(vdev, uarg); 1468 1458 case VFIO_DEVICE_GET_PCI_HOT_RESET_INFO: 1469 1459 return vfio_pci_ioctl_get_pci_hot_reset_info(vdev, uarg); 1470 - case VFIO_DEVICE_GET_REGION_INFO: 1471 - return vfio_pci_ioctl_get_region_info(vdev, uarg); 1472 1460 case VFIO_DEVICE_IOEVENTFD: 1473 1461 return vfio_pci_ioctl_ioeventfd(vdev, uarg); 1474 1462 case VFIO_DEVICE_PCI_HOT_RESET: ··· 1481 1473 } 1482 1474 
EXPORT_SYMBOL_GPL(vfio_pci_core_ioctl); 1483 1475 1484 - static int vfio_pci_core_feature_token(struct vfio_device *device, u32 flags, 1485 - uuid_t __user *arg, size_t argsz) 1476 + static int vfio_pci_core_feature_token(struct vfio_pci_core_device *vdev, 1477 + u32 flags, uuid_t __user *arg, 1478 + size_t argsz) 1486 1479 { 1487 - struct vfio_pci_core_device *vdev = 1488 - container_of(device, struct vfio_pci_core_device, vdev); 1489 1480 uuid_t uuid; 1490 1481 int ret; 1491 1482 ··· 1511 1504 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags, 1512 1505 void __user *arg, size_t argsz) 1513 1506 { 1507 + struct vfio_pci_core_device *vdev = 1508 + container_of(device, struct vfio_pci_core_device, vdev); 1509 + 1514 1510 switch (flags & VFIO_DEVICE_FEATURE_MASK) { 1515 1511 case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY: 1516 - return vfio_pci_core_pm_entry(device, flags, arg, argsz); 1512 + return vfio_pci_core_pm_entry(vdev, flags, arg, argsz); 1517 1513 case VFIO_DEVICE_FEATURE_LOW_POWER_ENTRY_WITH_WAKEUP: 1518 - return vfio_pci_core_pm_entry_with_wakeup(device, flags, 1514 + return vfio_pci_core_pm_entry_with_wakeup(vdev, flags, 1519 1515 arg, argsz); 1520 1516 case VFIO_DEVICE_FEATURE_LOW_POWER_EXIT: 1521 - return vfio_pci_core_pm_exit(device, flags, arg, argsz); 1517 + return vfio_pci_core_pm_exit(vdev, flags, arg, argsz); 1522 1518 case VFIO_DEVICE_FEATURE_PCI_VF_TOKEN: 1523 - return vfio_pci_core_feature_token(device, flags, arg, argsz); 1519 + return vfio_pci_core_feature_token(vdev, flags, arg, argsz); 1520 + case VFIO_DEVICE_FEATURE_DMA_BUF: 1521 + return vfio_pci_core_feature_dma_buf(vdev, flags, arg, argsz); 1524 1522 default: 1525 1523 return -ENOTTY; 1526 1524 } ··· 1652 1640 return (pci_resource_start(vdev->pdev, index) >> PAGE_SHIFT) + pgoff; 1653 1641 } 1654 1642 1643 + vm_fault_t vfio_pci_vmf_insert_pfn(struct vfio_pci_core_device *vdev, 1644 + struct vm_fault *vmf, 1645 + unsigned long pfn, 1646 + unsigned int order) 1647 + { 1648 + 
lockdep_assert_held_read(&vdev->memory_lock); 1649 + 1650 + if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) 1651 + return VM_FAULT_SIGBUS; 1652 + 1653 + switch (order) { 1654 + case 0: 1655 + return vmf_insert_pfn(vmf->vma, vmf->address, pfn); 1656 + #ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP 1657 + case PMD_ORDER: 1658 + return vmf_insert_pfn_pmd(vmf, pfn, false); 1659 + #endif 1660 + #ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP 1661 + case PUD_ORDER: 1662 + return vmf_insert_pfn_pud(vmf, pfn, false); 1663 + break; 1664 + #endif 1665 + default: 1666 + return VM_FAULT_FALLBACK; 1667 + } 1668 + } 1669 + EXPORT_SYMBOL_GPL(vfio_pci_vmf_insert_pfn); 1670 + 1655 1671 static vm_fault_t vfio_pci_mmap_huge_fault(struct vm_fault *vmf, 1656 1672 unsigned int order) 1657 1673 { ··· 1688 1648 unsigned long addr = vmf->address & ~((PAGE_SIZE << order) - 1); 1689 1649 unsigned long pgoff = (addr - vma->vm_start) >> PAGE_SHIFT; 1690 1650 unsigned long pfn = vma_to_pfn(vma) + pgoff; 1691 - vm_fault_t ret = VM_FAULT_SIGBUS; 1651 + vm_fault_t ret = VM_FAULT_FALLBACK; 1692 1652 1693 - if (order && (addr < vma->vm_start || 1694 - addr + (PAGE_SIZE << order) > vma->vm_end || 1695 - pfn & ((1 << order) - 1))) { 1696 - ret = VM_FAULT_FALLBACK; 1697 - goto out; 1653 + if (is_aligned_for_order(vma, addr, pfn, order)) { 1654 + scoped_guard(rwsem_read, &vdev->memory_lock) 1655 + ret = vfio_pci_vmf_insert_pfn(vdev, vmf, pfn, order); 1698 1656 } 1699 1657 1700 - down_read(&vdev->memory_lock); 1701 - 1702 - if (vdev->pm_runtime_engaged || !__vfio_pci_memory_enabled(vdev)) 1703 - goto out_unlock; 1704 - 1705 - switch (order) { 1706 - case 0: 1707 - ret = vmf_insert_pfn(vma, vmf->address, pfn); 1708 - break; 1709 - #ifdef CONFIG_ARCH_SUPPORTS_PMD_PFNMAP 1710 - case PMD_ORDER: 1711 - ret = vmf_insert_pfn_pmd(vmf, pfn, false); 1712 - break; 1713 - #endif 1714 - #ifdef CONFIG_ARCH_SUPPORTS_PUD_PFNMAP 1715 - case PUD_ORDER: 1716 - ret = vmf_insert_pfn_pud(vmf, pfn, false); 1717 - break; 1718 
- #endif 1719 - default: 1720 - ret = VM_FAULT_FALLBACK; 1721 - } 1722 - 1723 - out_unlock: 1724 - up_read(&vdev->memory_lock); 1725 - out: 1726 1658 dev_dbg_ratelimited(&vdev->pdev->dev, 1727 1659 "%s(,order = %d) BAR %ld page offset 0x%lx: 0x%x\n", 1728 1660 __func__, order, ··· 1761 1749 * Even though we don't make use of the barmap for the mmap, 1762 1750 * we need to request the region and the barmap tracks that. 1763 1751 */ 1764 - if (!vdev->barmap[index]) { 1765 - ret = pci_request_selected_regions(pdev, 1766 - 1 << index, "vfio-pci"); 1767 - if (ret) 1768 - return ret; 1769 - 1770 - vdev->barmap[index] = pci_iomap(pdev, index, 0); 1771 - if (!vdev->barmap[index]) { 1772 - pci_release_selected_regions(pdev, 1 << index); 1773 - return -ENOMEM; 1774 - } 1775 - } 1752 + ret = vfio_pci_core_setup_barmap(vdev, index); 1753 + if (ret) 1754 + return ret; 1776 1755 1777 1756 vma->vm_private_data = vdev; 1778 1757 vma->vm_page_prot = pgprot_noncached(vma->vm_page_prot); ··· 1803 1800 struct vfio_pci_core_device *vdev = 1804 1801 container_of(core_vdev, struct vfio_pci_core_device, vdev); 1805 1802 struct pci_dev *pdev = vdev->pdev; 1803 + struct vfio_pci_eventfd *eventfd; 1806 1804 1807 - mutex_lock(&vdev->igate); 1808 - 1809 - if (vdev->req_trigger) { 1805 + rcu_read_lock(); 1806 + eventfd = rcu_dereference(vdev->req_trigger); 1807 + if (eventfd) { 1810 1808 if (!(count % 10)) 1811 1809 pci_notice_ratelimited(pdev, 1812 1810 "Relaying device request to user (#%u)\n", 1813 1811 count); 1814 - eventfd_signal(vdev->req_trigger); 1812 + eventfd_signal(eventfd->ctx); 1815 1813 } else if (count == 0) { 1816 1814 pci_warn(pdev, 1817 1815 "No device request channel registered, blocked until released by user\n"); 1818 1816 } 1819 - 1820 - mutex_unlock(&vdev->igate); 1817 + rcu_read_unlock(); 1821 1818 } 1822 1819 EXPORT_SYMBOL_GPL(vfio_pci_core_request); 1823 1820 ··· 2088 2085 { 2089 2086 struct vfio_pci_core_device *vdev = 2090 2087 container_of(core_vdev, struct 
vfio_pci_core_device, vdev); 2088 + int ret; 2091 2089 2092 2090 vdev->pdev = to_pci_dev(core_vdev->dev); 2093 2091 vdev->irq_type = VFIO_PCI_NUM_IRQS; ··· 2098 2094 INIT_LIST_HEAD(&vdev->dummy_resources_list); 2099 2095 INIT_LIST_HEAD(&vdev->ioeventfds_list); 2100 2096 INIT_LIST_HEAD(&vdev->sriov_pfs_item); 2097 + ret = pcim_p2pdma_init(vdev->pdev); 2098 + if (ret && ret != -EOPNOTSUPP) 2099 + return ret; 2100 + INIT_LIST_HEAD(&vdev->dmabufs); 2101 2101 init_rwsem(&vdev->memory_lock); 2102 2102 xa_init(&vdev->ctx); 2103 2103 ··· 2235 2227 pci_channel_state_t state) 2236 2228 { 2237 2229 struct vfio_pci_core_device *vdev = dev_get_drvdata(&pdev->dev); 2230 + struct vfio_pci_eventfd *eventfd; 2238 2231 2239 - mutex_lock(&vdev->igate); 2240 - 2241 - if (vdev->err_trigger) 2242 - eventfd_signal(vdev->err_trigger); 2243 - 2244 - mutex_unlock(&vdev->igate); 2232 + rcu_read_lock(); 2233 + eventfd = rcu_dereference(vdev->err_trigger); 2234 + if (eventfd) 2235 + eventfd_signal(eventfd->ctx); 2236 + rcu_read_unlock(); 2245 2237 2246 2238 return PCI_ERS_RESULT_CAN_RECOVER; 2247 2239 } ··· 2466 2458 break; 2467 2459 } 2468 2460 2461 + vfio_pci_dma_buf_move(vdev, true); 2469 2462 vfio_pci_zap_bars(vdev); 2470 2463 } 2471 2464 ··· 2495 2486 2496 2487 err_undo: 2497 2488 list_for_each_entry_from_reverse(vdev, &dev_set->device_list, 2498 - vdev.dev_set_list) 2489 + vdev.dev_set_list) { 2490 + if (vdev->vdev.open_count && __vfio_pci_memory_enabled(vdev)) 2491 + vfio_pci_dma_buf_move(vdev, false); 2499 2492 up_write(&vdev->memory_lock); 2493 + } 2500 2494 2501 2495 list_for_each_entry(vdev, &dev_set->device_list, vdev.dev_set_list) 2502 2496 pm_runtime_put(&vdev->pdev->dev);
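The reworked fault handler above only attempts a higher-order (PMD/PUD) insertion when `is_aligned_for_order()` accepts the faulting address and pfn, and otherwise returns `VM_FAULT_FALLBACK`. A minimal standalone sketch of that check, with the vma bounds passed explicitly (PMD order is 9 with 4K pages):

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)

/* Mirror of the alignment test the huge-fault path performs before a
 * higher-order PFN insertion: the order-aligned address range must lie
 * entirely inside the vma, and the pfn must be aligned to the order. */
static bool is_aligned_for_order(unsigned long vm_start, unsigned long vm_end,
				 unsigned long addr, unsigned long pfn,
				 unsigned int order)
{
	if (!order)
		return true;	/* a single page always fits */
	if (addr < vm_start || addr + (PAGE_SIZE << order) > vm_end)
		return false;
	return !(pfn & ((1UL << order) - 1));
}
```

Any rejected order simply falls back to the next smaller mapping size, ending at order 0.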
+316
drivers/vfio/pci/vfio_pci_dmabuf.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. 3 + */ 4 + #include <linux/dma-buf-mapping.h> 5 + #include <linux/pci-p2pdma.h> 6 + #include <linux/dma-resv.h> 7 + 8 + #include "vfio_pci_priv.h" 9 + 10 + MODULE_IMPORT_NS("DMA_BUF"); 11 + 12 + struct vfio_pci_dma_buf { 13 + struct dma_buf *dmabuf; 14 + struct vfio_pci_core_device *vdev; 15 + struct list_head dmabufs_elm; 16 + size_t size; 17 + struct dma_buf_phys_vec *phys_vec; 18 + struct p2pdma_provider *provider; 19 + u32 nr_ranges; 20 + u8 revoked : 1; 21 + }; 22 + 23 + static int vfio_pci_dma_buf_attach(struct dma_buf *dmabuf, 24 + struct dma_buf_attachment *attachment) 25 + { 26 + struct vfio_pci_dma_buf *priv = dmabuf->priv; 27 + 28 + if (!attachment->peer2peer) 29 + return -EOPNOTSUPP; 30 + 31 + if (priv->revoked) 32 + return -ENODEV; 33 + 34 + return 0; 35 + } 36 + 37 + static struct sg_table * 38 + vfio_pci_dma_buf_map(struct dma_buf_attachment *attachment, 39 + enum dma_data_direction dir) 40 + { 41 + struct vfio_pci_dma_buf *priv = attachment->dmabuf->priv; 42 + 43 + dma_resv_assert_held(priv->dmabuf->resv); 44 + 45 + if (priv->revoked) 46 + return ERR_PTR(-ENODEV); 47 + 48 + return dma_buf_phys_vec_to_sgt(attachment, priv->provider, 49 + priv->phys_vec, priv->nr_ranges, 50 + priv->size, dir); 51 + } 52 + 53 + static void vfio_pci_dma_buf_unmap(struct dma_buf_attachment *attachment, 54 + struct sg_table *sgt, 55 + enum dma_data_direction dir) 56 + { 57 + dma_buf_free_sgt(attachment, sgt, dir); 58 + } 59 + 60 + static void vfio_pci_dma_buf_release(struct dma_buf *dmabuf) 61 + { 62 + struct vfio_pci_dma_buf *priv = dmabuf->priv; 63 + 64 + /* 65 + * Either this or vfio_pci_dma_buf_cleanup() will remove from the list. 66 + * The refcount prevents both. 
67 + */ 68 + if (priv->vdev) { 69 + down_write(&priv->vdev->memory_lock); 70 + list_del_init(&priv->dmabufs_elm); 71 + up_write(&priv->vdev->memory_lock); 72 + vfio_device_put_registration(&priv->vdev->vdev); 73 + } 74 + kfree(priv->phys_vec); 75 + kfree(priv); 76 + } 77 + 78 + static const struct dma_buf_ops vfio_pci_dmabuf_ops = { 79 + .attach = vfio_pci_dma_buf_attach, 80 + .map_dma_buf = vfio_pci_dma_buf_map, 81 + .unmap_dma_buf = vfio_pci_dma_buf_unmap, 82 + .release = vfio_pci_dma_buf_release, 83 + }; 84 + 85 + int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec, 86 + struct vfio_region_dma_range *dma_ranges, 87 + size_t nr_ranges, phys_addr_t start, 88 + phys_addr_t len) 89 + { 90 + phys_addr_t max_addr; 91 + unsigned int i; 92 + 93 + max_addr = start + len; 94 + for (i = 0; i < nr_ranges; i++) { 95 + phys_addr_t end; 96 + 97 + if (!dma_ranges[i].length) 98 + return -EINVAL; 99 + 100 + if (check_add_overflow(start, dma_ranges[i].offset, 101 + &phys_vec[i].paddr) || 102 + check_add_overflow(phys_vec[i].paddr, 103 + dma_ranges[i].length, &end)) 104 + return -EOVERFLOW; 105 + if (end > max_addr) 106 + return -EINVAL; 107 + 108 + phys_vec[i].len = dma_ranges[i].length; 109 + } 110 + return 0; 111 + } 112 + EXPORT_SYMBOL_GPL(vfio_pci_core_fill_phys_vec); 113 + 114 + int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev, 115 + struct p2pdma_provider **provider, 116 + unsigned int region_index, 117 + struct dma_buf_phys_vec *phys_vec, 118 + struct vfio_region_dma_range *dma_ranges, 119 + size_t nr_ranges) 120 + { 121 + struct pci_dev *pdev = vdev->pdev; 122 + 123 + *provider = pcim_p2pdma_provider(pdev, region_index); 124 + if (!*provider) 125 + return -EINVAL; 126 + 127 + return vfio_pci_core_fill_phys_vec( 128 + phys_vec, dma_ranges, nr_ranges, 129 + pci_resource_start(pdev, region_index), 130 + pci_resource_len(pdev, region_index)); 131 + } 132 + EXPORT_SYMBOL_GPL(vfio_pci_core_get_dmabuf_phys); 133 + 134 + static int 
validate_dmabuf_input(struct vfio_device_feature_dma_buf *dma_buf, 135 + struct vfio_region_dma_range *dma_ranges, 136 + size_t *lengthp) 137 + { 138 + size_t length = 0; 139 + u32 i; 140 + 141 + for (i = 0; i < dma_buf->nr_ranges; i++) { 142 + u64 offset = dma_ranges[i].offset; 143 + u64 len = dma_ranges[i].length; 144 + 145 + if (!len || !PAGE_ALIGNED(offset) || !PAGE_ALIGNED(len)) 146 + return -EINVAL; 147 + 148 + if (check_add_overflow(length, len, &length)) 149 + return -EINVAL; 150 + } 151 + 152 + /* 153 + * dma_iova_try_alloc() will WARN if userspace proposes a size that 154 + * is too big, e.g. with lots of ranges. 155 + */ 156 + if ((u64)(length) & DMA_IOVA_USE_SWIOTLB) 157 + return -EINVAL; 158 + 159 + *lengthp = length; 160 + return 0; 161 + } 162 + 163 + int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, 164 + struct vfio_device_feature_dma_buf __user *arg, 165 + size_t argsz) 166 + { 167 + struct vfio_device_feature_dma_buf get_dma_buf = {}; 168 + struct vfio_region_dma_range *dma_ranges; 169 + DEFINE_DMA_BUF_EXPORT_INFO(exp_info); 170 + struct vfio_pci_dma_buf *priv; 171 + size_t length; 172 + int ret; 173 + 174 + if (!vdev->pci_ops || !vdev->pci_ops->get_dmabuf_phys) 175 + return -EOPNOTSUPP; 176 + 177 + ret = vfio_check_feature(flags, argsz, VFIO_DEVICE_FEATURE_GET, 178 + sizeof(get_dma_buf)); 179 + if (ret != 1) 180 + return ret; 181 + 182 + if (copy_from_user(&get_dma_buf, arg, sizeof(get_dma_buf))) 183 + return -EFAULT; 184 + 185 + if (!get_dma_buf.nr_ranges || get_dma_buf.flags) 186 + return -EINVAL; 187 + 188 + /* 189 + * For PCI the region_index is the BAR number like everything else.
190 + */ 191 + if (get_dma_buf.region_index >= VFIO_PCI_ROM_REGION_INDEX) 192 + return -ENODEV; 193 + 194 + dma_ranges = memdup_array_user(&arg->dma_ranges, get_dma_buf.nr_ranges, 195 + sizeof(*dma_ranges)); 196 + if (IS_ERR(dma_ranges)) 197 + return PTR_ERR(dma_ranges); 198 + 199 + ret = validate_dmabuf_input(&get_dma_buf, dma_ranges, &length); 200 + if (ret) 201 + goto err_free_ranges; 202 + 203 + priv = kzalloc(sizeof(*priv), GFP_KERNEL); 204 + if (!priv) { 205 + ret = -ENOMEM; 206 + goto err_free_ranges; 207 + } 208 + priv->phys_vec = kcalloc(get_dma_buf.nr_ranges, sizeof(*priv->phys_vec), 209 + GFP_KERNEL); 210 + if (!priv->phys_vec) { 211 + ret = -ENOMEM; 212 + goto err_free_priv; 213 + } 214 + 215 + priv->vdev = vdev; 216 + priv->nr_ranges = get_dma_buf.nr_ranges; 217 + priv->size = length; 218 + ret = vdev->pci_ops->get_dmabuf_phys(vdev, &priv->provider, 219 + get_dma_buf.region_index, 220 + priv->phys_vec, dma_ranges, 221 + priv->nr_ranges); 222 + if (ret) 223 + goto err_free_phys; 224 + 225 + kfree(dma_ranges); 226 + dma_ranges = NULL; 227 + 228 + if (!vfio_device_try_get_registration(&vdev->vdev)) { 229 + ret = -ENODEV; 230 + goto err_free_phys; 231 + } 232 + 233 + exp_info.ops = &vfio_pci_dmabuf_ops; 234 + exp_info.size = priv->size; 235 + exp_info.flags = get_dma_buf.open_flags; 236 + exp_info.priv = priv; 237 + 238 + priv->dmabuf = dma_buf_export(&exp_info); 239 + if (IS_ERR(priv->dmabuf)) { 240 + ret = PTR_ERR(priv->dmabuf); 241 + goto err_dev_put; 242 + } 243 + 244 + /* dma_buf_put() now frees priv */ 245 + INIT_LIST_HEAD(&priv->dmabufs_elm); 246 + down_write(&vdev->memory_lock); 247 + dma_resv_lock(priv->dmabuf->resv, NULL); 248 + priv->revoked = !__vfio_pci_memory_enabled(vdev); 249 + list_add_tail(&priv->dmabufs_elm, &vdev->dmabufs); 250 + dma_resv_unlock(priv->dmabuf->resv); 251 + up_write(&vdev->memory_lock); 252 + 253 + /* 254 + * dma_buf_fd() consumes the reference, when the file closes the dmabuf 255 + * will be released. 
256 + */ 257 + ret = dma_buf_fd(priv->dmabuf, get_dma_buf.open_flags); 258 + if (ret < 0) 259 + goto err_dma_buf; 260 + return ret; 261 + 262 + err_dma_buf: 263 + dma_buf_put(priv->dmabuf); 264 + err_dev_put: 265 + vfio_device_put_registration(&vdev->vdev); 266 + err_free_phys: 267 + kfree(priv->phys_vec); 268 + err_free_priv: 269 + kfree(priv); 270 + err_free_ranges: 271 + kfree(dma_ranges); 272 + return ret; 273 + } 274 + 275 + void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked) 276 + { 277 + struct vfio_pci_dma_buf *priv; 278 + struct vfio_pci_dma_buf *tmp; 279 + 280 + lockdep_assert_held_write(&vdev->memory_lock); 281 + 282 + list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { 283 + if (!get_file_active(&priv->dmabuf->file)) 284 + continue; 285 + 286 + if (priv->revoked != revoked) { 287 + dma_resv_lock(priv->dmabuf->resv, NULL); 288 + priv->revoked = revoked; 289 + dma_buf_move_notify(priv->dmabuf); 290 + dma_resv_unlock(priv->dmabuf->resv); 291 + } 292 + fput(priv->dmabuf->file); 293 + } 294 + } 295 + 296 + void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) 297 + { 298 + struct vfio_pci_dma_buf *priv; 299 + struct vfio_pci_dma_buf *tmp; 300 + 301 + down_write(&vdev->memory_lock); 302 + list_for_each_entry_safe(priv, tmp, &vdev->dmabufs, dmabufs_elm) { 303 + if (!get_file_active(&priv->dmabuf->file)) 304 + continue; 305 + 306 + dma_resv_lock(priv->dmabuf->resv, NULL); 307 + list_del_init(&priv->dmabufs_elm); 308 + priv->vdev = NULL; 309 + priv->revoked = true; 310 + dma_buf_move_notify(priv->dmabuf); 311 + dma_resv_unlock(priv->dmabuf->resv); 312 + vfio_device_put_registration(&vdev->vdev); 313 + fput(priv->dmabuf->file); 314 + } 315 + up_write(&vdev->memory_lock); 316 + }
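`vfio_pci_core_fill_phys_vec()` above rejects empty ranges, arithmetic overflow, and any range that escapes the BAR. The same arithmetic can be exercised standalone; this sketch substitutes `__builtin_add_overflow()` (a GCC/Clang builtin) for the kernel's `check_add_overflow()` and plain error codes for `-EINVAL`/`-EOVERFLOW`:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range { uint64_t offset, length; };
struct phys_vec { uint64_t paddr, len; };

/* Mirrors the validation in vfio_pci_core_fill_phys_vec(): each user
 * range is translated to a physical range inside [start, start + len),
 * with explicit overflow checks on both additions. */
static int fill_phys_vec(struct phys_vec *out, const struct range *in,
			 size_t nr, uint64_t start, uint64_t len)
{
	uint64_t max_addr = start + len;
	size_t i;

	for (i = 0; i < nr; i++) {
		uint64_t end;

		if (!in[i].length)
			return -1;	/* -EINVAL in the kernel */
		if (__builtin_add_overflow(start, in[i].offset, &out[i].paddr) ||
		    __builtin_add_overflow(out[i].paddr, in[i].length, &end))
			return -2;	/* -EOVERFLOW */
		if (end > max_addr)
			return -1;	/* range escapes the BAR */
		out[i].len = in[i].length;
	}
	return 0;
}
```

This is the check that the +32-bit-overflow fix mentioned in the merge log protects: every sum is carried out in full 64-bit width.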
+33 -19
drivers/vfio/pci/vfio_pci_intrs.c
··· 731 731 return 0; 732 732 } 733 733 734 - static int vfio_pci_set_ctx_trigger_single(struct eventfd_ctx **ctx, 734 + static int vfio_pci_set_ctx_trigger_single(struct vfio_pci_core_device *vdev, 735 + struct vfio_pci_eventfd __rcu **peventfd, 735 736 unsigned int count, uint32_t flags, 736 737 void *data) 737 738 { 738 739 /* DATA_NONE/DATA_BOOL enables loopback testing */ 739 740 if (flags & VFIO_IRQ_SET_DATA_NONE) { 740 - if (*ctx) { 741 - if (count) { 742 - eventfd_signal(*ctx); 743 - } else { 744 - eventfd_ctx_put(*ctx); 745 - *ctx = NULL; 746 - } 741 + struct vfio_pci_eventfd *eventfd; 742 + 743 + eventfd = rcu_dereference_protected(*peventfd, 744 + lockdep_is_held(&vdev->igate)); 745 + 746 + if (!eventfd) 747 + return -EINVAL; 748 + 749 + if (count) { 750 + eventfd_signal(eventfd->ctx); 747 751 return 0; 748 752 } 753 + 754 + return vfio_pci_eventfd_replace_locked(vdev, peventfd, NULL); 749 755 } else if (flags & VFIO_IRQ_SET_DATA_BOOL) { 750 756 uint8_t trigger; 751 757 ··· 759 753 return -EINVAL; 760 754 761 755 trigger = *(uint8_t *)data; 762 - if (trigger && *ctx) 763 - eventfd_signal(*ctx); 756 + 757 + if (trigger) { 758 + struct vfio_pci_eventfd *eventfd = 759 + rcu_dereference_protected(*peventfd, 760 + lockdep_is_held(&vdev->igate)); 761 + 762 + if (eventfd) 763 + eventfd_signal(eventfd->ctx); 764 + } 764 765 765 766 return 0; 766 767 } else if (flags & VFIO_IRQ_SET_DATA_EVENTFD) { ··· 778 765 779 766 fd = *(int32_t *)data; 780 767 if (fd == -1) { 781 - if (*ctx) 782 - eventfd_ctx_put(*ctx); 783 - *ctx = NULL; 768 + return vfio_pci_eventfd_replace_locked(vdev, 769 + peventfd, NULL); 784 770 } else if (fd >= 0) { 785 771 struct eventfd_ctx *efdctx; 772 + int ret; 786 773 787 774 efdctx = eventfd_ctx_fdget(fd); 788 775 if (IS_ERR(efdctx)) 789 776 return PTR_ERR(efdctx); 790 777 791 - if (*ctx) 792 - eventfd_ctx_put(*ctx); 778 + ret = vfio_pci_eventfd_replace_locked(vdev, 779 + peventfd, efdctx); 780 + if (ret) 781 + eventfd_ctx_put(efdctx); 793 782 
794 - *ctx = efdctx; 783 + return ret; 795 784 } 796 - return 0; 797 785 } 798 786 799 787 return -EINVAL; ··· 807 793 if (index != VFIO_PCI_ERR_IRQ_INDEX || start != 0 || count > 1) 808 794 return -EINVAL; 809 795 810 - return vfio_pci_set_ctx_trigger_single(&vdev->err_trigger, 796 + return vfio_pci_set_ctx_trigger_single(vdev, &vdev->err_trigger, 811 797 count, flags, data); 812 798 } 813 799 ··· 818 804 if (index != VFIO_PCI_REQ_IRQ_INDEX || start != 0 || count > 1) 819 805 return -EINVAL; 820 806 821 - return vfio_pci_set_ctx_trigger_single(&vdev->req_trigger, 807 + return vfio_pci_set_ctx_trigger_single(vdev, &vdev->req_trigger, 822 808 count, flags, data); 823 809 } 824 810
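The err/req triggers are now RCU-managed `struct vfio_pci_eventfd` wrappers rather than bare `eventfd_ctx` pointers, so the signalling paths can run under `rcu_read_lock()` instead of `vdev->igate` (avoiding the lockdep-reported deadlock). A simplified userspace model of the replace helper's semantics; plain assignment stands in for `rcu_replace_pointer()`, and the caller must free the old wrapper only after a grace period, which the kernel would do via the embedded `rcu_head` (e.g. `kfree_rcu()`):

```c
#include <assert.h>
#include <stdlib.h>

struct eventfd_ctx;		/* opaque, as in the kernel */

struct vfio_pci_eventfd {
	struct eventfd_ctx *ctx;
	/* struct rcu_head rcu;  -- used for deferred free in the kernel */
};

/* Model of vfio_pci_eventfd_replace_locked(): wrap the incoming context
 * (or clear the slot for NULL), publish the new wrapper, and hand the old
 * one back for reclamation after a grace period. Allocation-failure
 * handling is elided in this sketch. */
static struct vfio_pci_eventfd *
eventfd_replace(struct vfio_pci_eventfd **slot, struct eventfd_ctx *ctx)
{
	struct vfio_pci_eventfd *old = *slot;
	struct vfio_pci_eventfd *new = NULL;

	if (ctx) {
		new = calloc(1, sizeof(*new));
		new->ctx = ctx;
	}
	*slot = new;	/* rcu_replace_pointer() in the kernel */
	return old;	/* caller frees after a grace period */
}
```

Readers that loaded the old wrapper before the swap keep a valid pointer until the grace period ends, which is exactly what lets `vfio_pci_core_request()` and the AER handler signal without taking a mutex.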
+27 -1
drivers/vfio/pci/vfio_pci_priv.h
··· 26 26 bool vfio_pci_intx_mask(struct vfio_pci_core_device *vdev); 27 27 void vfio_pci_intx_unmask(struct vfio_pci_core_device *vdev); 28 28 29 + int vfio_pci_eventfd_replace_locked(struct vfio_pci_core_device *vdev, 30 + struct vfio_pci_eventfd __rcu **peventfd, 31 + struct eventfd_ctx *ctx); 32 + 29 33 int vfio_pci_set_irqs_ioctl(struct vfio_pci_core_device *vdev, uint32_t flags, 30 34 unsigned index, unsigned start, unsigned count, 31 35 void *data); ··· 64 60 int vfio_pci_set_power_state(struct vfio_pci_core_device *vdev, 65 61 pci_power_t state); 66 62 67 - bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev); 68 63 void vfio_pci_zap_and_down_write_memory_lock(struct vfio_pci_core_device *vdev); 69 64 u16 vfio_pci_memory_lock_and_enable(struct vfio_pci_core_device *vdev); 70 65 void vfio_pci_memory_unlock_and_restore(struct vfio_pci_core_device *vdev, ··· 109 106 { 110 107 return (pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA; 111 108 } 109 + 110 + #ifdef CONFIG_VFIO_PCI_DMABUF 111 + int vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, 112 + struct vfio_device_feature_dma_buf __user *arg, 113 + size_t argsz); 114 + void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev); 115 + void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, bool revoked); 116 + #else 117 + static inline int 118 + vfio_pci_core_feature_dma_buf(struct vfio_pci_core_device *vdev, u32 flags, 119 + struct vfio_device_feature_dma_buf __user *arg, 120 + size_t argsz) 121 + { 122 + return -ENOTTY; 123 + } 124 + static inline void vfio_pci_dma_buf_cleanup(struct vfio_pci_core_device *vdev) 125 + { 126 + } 127 + static inline void vfio_pci_dma_buf_move(struct vfio_pci_core_device *vdev, 128 + bool revoked) 129 + { 130 + } 131 + #endif 112 132 113 133 #endif
+2 -3
drivers/vfio/pci/virtio/common.h
··· 109 109 110 110 #ifdef CONFIG_VIRTIO_VFIO_PCI_ADMIN_LEGACY 111 111 int virtiovf_open_legacy_io(struct virtiovf_pci_core_device *virtvdev); 112 - long virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, 113 - unsigned int cmd, unsigned long arg); 114 112 int virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev, 115 - unsigned int cmd, unsigned long arg); 113 + struct vfio_region_info *info, 114 + struct vfio_info_cap *caps); 116 115 ssize_t virtiovf_pci_core_write(struct vfio_device *core_vdev, 117 116 const char __user *buf, size_t count, 118 117 loff_t *ppos);
+8 -30
drivers/vfio/pci/virtio/legacy_io.c
··· 281 281 } 282 282 283 283 int virtiovf_pci_ioctl_get_region_info(struct vfio_device *core_vdev, 284 - unsigned int cmd, unsigned long arg) 284 + struct vfio_region_info *info, 285 + struct vfio_info_cap *caps) 285 286 { 286 287 struct virtiovf_pci_core_device *virtvdev = container_of( 287 288 core_vdev, struct virtiovf_pci_core_device, core_device.vdev); 288 - unsigned long minsz = offsetofend(struct vfio_region_info, offset); 289 - void __user *uarg = (void __user *)arg; 290 - struct vfio_region_info info = {}; 291 289 292 - if (copy_from_user(&info, uarg, minsz)) 293 - return -EFAULT; 290 + if (info->index != VFIO_PCI_BAR0_REGION_INDEX) 291 + return vfio_pci_ioctl_get_region_info(core_vdev, info, caps); 294 292 295 - if (info.argsz < minsz) 296 - return -EINVAL; 297 - 298 - switch (info.index) { 299 - case VFIO_PCI_BAR0_REGION_INDEX: 300 - info.offset = VFIO_PCI_INDEX_TO_OFFSET(info.index); 301 - info.size = virtvdev->bar0_virtual_buf_size; 302 - info.flags = VFIO_REGION_INFO_FLAG_READ | 303 - VFIO_REGION_INFO_FLAG_WRITE; 304 - return copy_to_user(uarg, &info, minsz) ? -EFAULT : 0; 305 - default: 306 - return vfio_pci_core_ioctl(core_vdev, cmd, arg); 307 - } 308 - } 309 - 310 - long virtiovf_vfio_pci_core_ioctl(struct vfio_device *core_vdev, unsigned int cmd, 311 - unsigned long arg) 312 - { 313 - switch (cmd) { 314 - case VFIO_DEVICE_GET_REGION_INFO: 315 - return virtiovf_pci_ioctl_get_region_info(core_vdev, cmd, arg); 316 - default: 317 - return vfio_pci_core_ioctl(core_vdev, cmd, arg); 318 - } 293 + info->offset = VFIO_PCI_INDEX_TO_OFFSET(info->index); 294 + info->size = virtvdev->bar0_virtual_buf_size; 295 + info->flags = VFIO_REGION_INFO_FLAG_READ | VFIO_REGION_INFO_FLAG_WRITE; 296 + return 0; 319 297 } 320 298 321 299 static int virtiovf_set_notify_addr(struct virtiovf_pci_core_device *virtvdev)
+4 -1
drivers/vfio/pci/virtio/main.c
··· 88 88 .open_device = virtiovf_pci_open_device, 89 89 .close_device = virtiovf_pci_close_device, 90 90 .ioctl = vfio_pci_core_ioctl, 91 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 91 92 .device_feature = vfio_pci_core_ioctl_feature, 92 93 .read = vfio_pci_core_read, 93 94 .write = vfio_pci_core_write, ··· 109 108 .release = virtiovf_pci_core_release_dev, 110 109 .open_device = virtiovf_pci_open_device, 111 110 .close_device = virtiovf_pci_close_device, 112 - .ioctl = virtiovf_vfio_pci_core_ioctl, 111 + .ioctl = vfio_pci_core_ioctl, 112 + .get_region_info_caps = virtiovf_pci_ioctl_get_region_info, 113 113 .device_feature = vfio_pci_core_ioctl_feature, 114 114 .read = virtiovf_pci_core_read, 115 115 .write = virtiovf_pci_core_write, ··· 132 130 .open_device = virtiovf_pci_open_device, 133 131 .close_device = vfio_pci_core_close_device, 134 132 .ioctl = vfio_pci_core_ioctl, 133 + .get_region_info_caps = vfio_pci_ioctl_get_region_info, 135 134 .device_feature = vfio_pci_core_ioctl_feature, 136 135 .read = vfio_pci_core_read, 137 136 .write = vfio_pci_core_write,
+1
drivers/vfio/platform/vfio_amba.c
··· 115 115 .open_device = vfio_platform_open_device, 116 116 .close_device = vfio_platform_close_device, 117 117 .ioctl = vfio_platform_ioctl, 118 + .get_region_info_caps = vfio_platform_ioctl_get_region_info, 118 119 .read = vfio_platform_read, 119 120 .write = vfio_platform_write, 120 121 .mmap = vfio_platform_mmap,
+1
drivers/vfio/platform/vfio_platform.c
··· 101 101 .open_device = vfio_platform_open_device, 102 102 .close_device = vfio_platform_close_device, 103 103 .ioctl = vfio_platform_ioctl, 104 + .get_region_info_caps = vfio_platform_ioctl_get_region_info, 104 105 .read = vfio_platform_read, 105 106 .write = vfio_platform_write, 106 107 .mmap = vfio_platform_mmap,
+18 -22
drivers/vfio/platform/vfio_platform_common.c
··· 272 272 } 273 273 EXPORT_SYMBOL_GPL(vfio_platform_open_device); 274 274 275 + int vfio_platform_ioctl_get_region_info(struct vfio_device *core_vdev, 276 + struct vfio_region_info *info, 277 + struct vfio_info_cap *caps) 278 + { 279 + struct vfio_platform_device *vdev = 280 + container_of(core_vdev, struct vfio_platform_device, vdev); 281 + 282 + if (info->index >= vdev->num_regions) 283 + return -EINVAL; 284 + 285 + /* map offset to the physical address */ 286 + info->offset = VFIO_PLATFORM_INDEX_TO_OFFSET(info->index); 287 + info->size = vdev->regions[info->index].size; 288 + info->flags = vdev->regions[info->index].flags; 289 + return 0; 290 + } 291 + EXPORT_SYMBOL_GPL(vfio_platform_ioctl_get_region_info); 292 + 275 293 long vfio_platform_ioctl(struct vfio_device *core_vdev, 276 294 unsigned int cmd, unsigned long arg) 277 295 { ··· 314 296 info.flags = vdev->flags; 315 297 info.num_regions = vdev->num_regions; 316 298 info.num_irqs = vdev->num_irqs; 317 - 318 - return copy_to_user((void __user *)arg, &info, minsz) ? 319 - -EFAULT : 0; 320 - 321 - } else if (cmd == VFIO_DEVICE_GET_REGION_INFO) { 322 - struct vfio_region_info info; 323 - 324 - minsz = offsetofend(struct vfio_region_info, offset); 325 - 326 - if (copy_from_user(&info, (void __user *)arg, minsz)) 327 - return -EFAULT; 328 - 329 - if (info.argsz < minsz) 330 - return -EINVAL; 331 - 332 - if (info.index >= vdev->num_regions) 333 - return -EINVAL; 334 - 335 - /* map offset to the physical address */ 336 - info.offset = VFIO_PLATFORM_INDEX_TO_OFFSET(info.index); 337 - info.size = vdev->regions[info.index].size; 338 - info.flags = vdev->regions[info.index].flags; 339 299 340 300 return copy_to_user((void __user *)arg, &info, minsz) ? 341 301 -EFAULT : 0;
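The new `vfio_platform_ioctl_get_region_info()` packs the region index into the file offset with `VFIO_PLATFORM_INDEX_TO_OFFSET()`. Assuming the usual 40-bit offset shift vfio uses for this encoding (the value is defined in `vfio_platform_private.h`, not shown in this diff), the round trip looks like:

```c
#include <assert.h>
#include <stdint.h>

/* vfio encodes the region index in the high bits of the pread/pwrite/mmap
 * offset; the low bits address within the region. The 40-bit shift is
 * assumed here from the vfio-platform/vfio-pci convention. */
#define VFIO_PLATFORM_OFFSET_SHIFT 40
#define VFIO_PLATFORM_OFFSET_MASK \
	(((uint64_t)1 << VFIO_PLATFORM_OFFSET_SHIFT) - 1)

#define VFIO_PLATFORM_INDEX_TO_OFFSET(index) \
	((uint64_t)(index) << VFIO_PLATFORM_OFFSET_SHIFT)

/* Recover the region index from an encoded offset. */
static inline uint32_t offset_to_index(uint64_t off)
{
	return (uint32_t)(off >> VFIO_PLATFORM_OFFSET_SHIFT);
}

/* Recover the byte offset within the region. */
static inline uint64_t offset_in_region(uint64_t off)
{
	return off & VFIO_PLATFORM_OFFSET_MASK;
}
```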
+3
drivers/vfio/platform/vfio_platform_private.h
··· 85 85 void vfio_platform_close_device(struct vfio_device *core_vdev); 86 86 long vfio_platform_ioctl(struct vfio_device *core_vdev, 87 87 unsigned int cmd, unsigned long arg); 88 + int vfio_platform_ioctl_get_region_info(struct vfio_device *core_vdev, 89 + struct vfio_region_info *info, 90 + struct vfio_info_cap *caps); 88 91 ssize_t vfio_platform_read(struct vfio_device *core_vdev, 89 92 char __user *buf, size_t count, 90 93 loff_t *ppos);
+51
drivers/vfio/vfio_main.c
··· 172 172 if (refcount_dec_and_test(&device->refcount)) 173 173 complete(&device->comp); 174 174 } 175 + EXPORT_SYMBOL_GPL(vfio_device_put_registration); 175 176 176 177 bool vfio_device_try_get_registration(struct vfio_device *device) 177 178 { 178 179 return refcount_inc_not_zero(&device->refcount); 179 180 } 181 + EXPORT_SYMBOL_GPL(vfio_device_try_get_registration); 180 182 181 183 /* 182 184 * VFIO driver API ··· 1261 1259 } 1262 1260 } 1263 1261 1262 + static long vfio_get_region_info(struct vfio_device *device, 1263 + struct vfio_region_info __user *arg) 1264 + { 1265 + unsigned long minsz = offsetofend(struct vfio_region_info, offset); 1266 + struct vfio_region_info info = {}; 1267 + struct vfio_info_cap caps = {}; 1268 + int ret; 1269 + 1270 + if (unlikely(!device->ops->get_region_info_caps)) 1271 + return -EINVAL; 1272 + 1273 + if (copy_from_user(&info, arg, minsz)) 1274 + return -EFAULT; 1275 + if (info.argsz < minsz) 1276 + return -EINVAL; 1277 + 1278 + ret = device->ops->get_region_info_caps(device, &info, &caps); 1279 + if (ret) 1280 + goto out_free; 1281 + 1282 + if (caps.size) { 1283 + info.flags |= VFIO_REGION_INFO_FLAG_CAPS; 1284 + if (info.argsz < sizeof(info) + caps.size) { 1285 + info.argsz = sizeof(info) + caps.size; 1286 + info.cap_offset = 0; 1287 + } else { 1288 + vfio_info_cap_shift(&caps, sizeof(info)); 1289 + if (copy_to_user(arg + 1, caps.buf, caps.size)) { 1290 + ret = -EFAULT; 1291 + goto out_free; 1292 + } 1293 + info.cap_offset = sizeof(info); 1294 + } 1295 + } 1296 + 1297 + if (copy_to_user(arg, &info, minsz)) { 1298 + ret = -EFAULT; 1299 + goto out_free; 1300 + } 1301 + 1302 + out_free: 1303 + kfree(caps.buf); 1304 + return ret; 1305 + } 1306 + 1264 1307 static long vfio_device_fops_unl_ioctl(struct file *filep, 1265 1308 unsigned int cmd, unsigned long arg) 1266 1309 { ··· 1341 1294 switch (cmd) { 1342 1295 case VFIO_DEVICE_FEATURE: 1343 1296 ret = vfio_ioctl_device_feature(device, uptr); 1297 + break; 1298 + 1299 + case
VFIO_DEVICE_GET_REGION_INFO: 1300 + ret = vfio_get_region_info(device, uptr); 1344 1301 break; 1345 1302 1346 1303 default:
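The consolidated `vfio_get_region_info()` above keeps the standard vfio capability-chain contract: when the user's `argsz` cannot hold the chain, the required size is reported back with `cap_offset` zeroed so userspace retries with a larger buffer; otherwise the chain is copied just past the fixed struct. That negotiation can be modelled as:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct caps_result {
	uint32_t argsz;		/* possibly grown to the required size */
	uint32_t cap_offset;	/* 0 when the buffer was too small */
	bool copy_chain;	/* whether the cap chain fits and is copied */
};

/* Mirrors the argsz negotiation in vfio_get_region_info(): info_size is
 * sizeof(struct vfio_region_info), caps_size the built chain's length. */
static struct caps_result negotiate_caps(uint32_t argsz, uint32_t info_size,
					 uint32_t caps_size)
{
	struct caps_result r = { .argsz = argsz };

	if (!caps_size)
		return r;		/* no capabilities to report */
	if (argsz < info_size + caps_size) {
		r.argsz = info_size + caps_size;
		r.cap_offset = 0;	/* tells userspace to retry, larger */
	} else {
		r.cap_offset = info_size;
		r.copy_chain = true;
	}
	return r;
}
```

Centralizing this in vfio_main.c is what lets the variant drivers drop their `VFIO_DEVICE_GET_REGION_INFO` ioctl intercepts in favor of the `get_region_info_caps` callback.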
+17
include/linux/dma-buf-mapping.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * DMA BUF Mapping Helpers 4 + * 5 + */ 6 + #ifndef __DMA_BUF_MAPPING_H__ 7 + #define __DMA_BUF_MAPPING_H__ 8 + #include <linux/dma-buf.h> 9 + 10 + struct sg_table *dma_buf_phys_vec_to_sgt(struct dma_buf_attachment *attach, 11 + struct p2pdma_provider *provider, 12 + struct dma_buf_phys_vec *phys_vec, 13 + size_t nr_ranges, size_t size, 14 + enum dma_data_direction dir); 15 + void dma_buf_free_sgt(struct dma_buf_attachment *attach, struct sg_table *sgt, 16 + enum dma_data_direction dir); 17 + #endif
+11
include/linux/dma-buf.h
··· 22 22 #include <linux/fs.h> 23 23 #include <linux/dma-fence.h> 24 24 #include <linux/wait.h> 25 + #include <linux/pci-p2pdma.h> 25 26 26 27 struct device; 27 28 struct dma_buf; ··· 529 528 int flags; 530 529 struct dma_resv *resv; 531 530 void *priv; 531 + }; 532 + 533 + /** 534 + * struct dma_buf_phys_vec - describe continuous chunk of memory 535 + * @paddr: physical address of that chunk 536 + * @len: Length of this chunk 537 + */ 538 + struct dma_buf_phys_vec { 539 + phys_addr_t paddr; 540 + size_t len; 532 541 }; 533 542 534 543 /**
+3
include/linux/hisi_acc_qm.h
··· 99 99 100 100 #define QM_DEV_ALG_MAX_LEN 256 101 101 102 + #define QM_MIG_REGION_SEL 0x100198 103 + #define QM_MIG_REGION_EN BIT(0) 104 + 102 105 /* uacce mode of the driver */ 103 106 #define UACCE_MODE_NOUACCE 0 /* don't use uacce */ 104 107 #define UACCE_MODE_SVA 1 /* use uacce sva mode */
+73 -47
include/linux/pci-p2pdma.h
··· 16 16 struct block_device; 17 17 struct scatterlist; 18 18 19 + /** 20 + * struct p2pdma_provider 21 + * 22 + * A p2pdma provider is a range of MMIO address space available to the CPU. 23 + */ 24 + struct p2pdma_provider { 25 + struct device *owner; 26 + u64 bus_offset; 27 + }; 28 + 29 + enum pci_p2pdma_map_type { 30 + /* 31 + * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before 32 + * the mapping type has been calculated. Exported routines for the API 33 + * will never return this value. 34 + */ 35 + PCI_P2PDMA_MAP_UNKNOWN = 0, 36 + 37 + /* 38 + * Not a PCI P2PDMA transfer. 39 + */ 40 + PCI_P2PDMA_MAP_NONE, 41 + 42 + /* 43 + * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will 44 + * traverse the host bridge and the host bridge is not in the 45 + * allowlist. DMA Mapping routines should return an error when 46 + * this is returned. 47 + */ 48 + PCI_P2PDMA_MAP_NOT_SUPPORTED, 49 + 50 + /* 51 + * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to 52 + * each other directly through a PCI switch and the transaction will 53 + * not traverse the host bridge. Such a mapping should program 54 + * the DMA engine with PCI bus addresses. 55 + */ 56 + PCI_P2PDMA_MAP_BUS_ADDR, 57 + 58 + /* 59 + * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk 60 + * to each other, but the transaction traverses a host bridge on the 61 + * allowlist. In this case, a normal mapping either with CPU physical 62 + * addresses (in the case of dma-direct) or IOVA addresses (in the 63 + * case of IOMMUs) should be used to program the DMA engine. 
+  */
+ PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
+};
+
 #ifdef CONFIG_PCI_P2PDMA
+int pcim_p2pdma_init(struct pci_dev *pdev);
+struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev, int bar);
 int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar, size_t size,
         u64 offset);
 int pci_p2pdma_distance_many(struct pci_dev *provider, struct device **clients,
···
             bool *use_p2pdma);
 ssize_t pci_p2pdma_enable_show(char *page, struct pci_dev *p2p_dev,
                bool use_p2pdma);
+enum pci_p2pdma_map_type pci_p2pdma_map_type(struct p2pdma_provider *provider,
+                         struct device *dev);
 #else /* CONFIG_PCI_P2PDMA */
+static inline int pcim_p2pdma_init(struct pci_dev *pdev)
+{
+    return -EOPNOTSUPP;
+}
+static inline struct p2pdma_provider *pcim_p2pdma_provider(struct pci_dev *pdev,
+                               int bar)
+{
+    return NULL;
+}
 static inline int pci_p2pdma_add_resource(struct pci_dev *pdev, int bar,
         size_t size, u64 offset)
 {
···
 {
     return sprintf(page, "none\n");
 }
+static inline enum pci_p2pdma_map_type
+pci_p2pdma_map_type(struct p2pdma_provider *provider, struct device *dev)
+{
+    return PCI_P2PDMA_MAP_NOT_SUPPORTED;
+}
 #endif /* CONFIG_PCI_P2PDMA */


···
     return pci_p2pmem_find_many(&client, 1);
 }

-enum pci_p2pdma_map_type {
-    /*
-     * PCI_P2PDMA_MAP_UNKNOWN: Used internally as an initial state before
-     * the mapping type has been calculated. Exported routines for the API
-     * will never return this value.
-     */
-    PCI_P2PDMA_MAP_UNKNOWN = 0,
-
-    /*
-     * Not a PCI P2PDMA transfer.
-     */
-    PCI_P2PDMA_MAP_NONE,
-
-    /*
-     * PCI_P2PDMA_MAP_NOT_SUPPORTED: Indicates the transaction will
-     * traverse the host bridge and the host bridge is not in the
-     * allowlist. DMA Mapping routines should return an error when
-     * this is returned.
-     */
-    PCI_P2PDMA_MAP_NOT_SUPPORTED,
-
-    /*
-     * PCI_P2PDMA_MAP_BUS_ADDR: Indicates that two devices can talk to
-     * each other directly through a PCI switch and the transaction will
-     * not traverse the host bridge. Such a mapping should program
-     * the DMA engine with PCI bus addresses.
-     */
-    PCI_P2PDMA_MAP_BUS_ADDR,
-
-    /*
-     * PCI_P2PDMA_MAP_THRU_HOST_BRIDGE: Indicates two devices can talk
-     * to each other, but the transaction traverses a host bridge on the
-     * allowlist. In this case, a normal mapping either with CPU physical
-     * addresses (in the case of dma-direct) or IOVA addresses (in the
-     * case of IOMMUs) should be used to program the DMA engine.
-     */
-    PCI_P2PDMA_MAP_THRU_HOST_BRIDGE,
-};
-
 struct pci_p2pdma_map_state {
-    struct dev_pagemap *pgmap;
+    struct p2pdma_provider *mem;
     enum pci_p2pdma_map_type map;
-    u64 bus_off;
 };
+

 /* helper for pci_p2pdma_state(), do not use directly */
 void __pci_p2pdma_update_state(struct pci_p2pdma_map_state *state,
···
                        struct page *page)
 {
     if (IS_ENABLED(CONFIG_PCI_P2PDMA) && is_pci_p2pdma_page(page)) {
-        if (state->pgmap != page_pgmap(page))
-            __pci_p2pdma_update_state(state, dev, page);
+        __pci_p2pdma_update_state(state, dev, page);
         return state->map;
     }
     return PCI_P2PDMA_MAP_NONE;
···
 /**
  * pci_p2pdma_bus_addr_map - Translate a physical address to a bus address
  * for a PCI_P2PDMA_MAP_BUS_ADDR transfer.
- * @state: P2P state structure
+ * @provider: P2P provider structure
  * @paddr: physical address to map
  *
  * Map a physically contiguous PCI_P2PDMA_MAP_BUS_ADDR transfer.
  */
 static inline dma_addr_t
-pci_p2pdma_bus_addr_map(struct pci_p2pdma_map_state *state, phys_addr_t paddr)
+pci_p2pdma_bus_addr_map(struct p2pdma_provider *provider, phys_addr_t paddr)
 {
-    WARN_ON_ONCE(state->map != PCI_P2PDMA_MAP_BUS_ADDR);
-    return paddr + state->bus_off;
+    return paddr + provider->bus_offset;
 }

 #endif /* _LINUX_PCI_P2P_H */
+6
include/linux/vfio.h
···
 struct iommufd_ctx;
 struct iommufd_device;
 struct iommufd_access;
+struct vfio_info_cap;

 /*
  * VFIO devices can be placed in a set, this allows all devices to share this
···
             size_t count, loff_t *size);
     long    (*ioctl)(struct vfio_device *vdev, unsigned int cmd,
              unsigned long arg);
+    int     (*get_region_info_caps)(struct vfio_device *vdev,
+                    struct vfio_region_info *info,
+                    struct vfio_info_cap *caps);
     int     (*mmap)(struct vfio_device *vdev, struct vm_area_struct *vma);
     void    (*request)(struct vfio_device *vdev, unsigned int count);
     int     (*match)(struct vfio_device *vdev, char *buf);
···
 int vfio_register_group_dev(struct vfio_device *device);
 int vfio_register_emulated_iommu_dev(struct vfio_device *device);
 void vfio_unregister_group_dev(struct vfio_device *device);
+bool vfio_device_try_get_registration(struct vfio_device *device);
+void vfio_device_put_registration(struct vfio_device *device);

 int vfio_assign_device_set(struct vfio_device *device, void *set_id);
 unsigned int vfio_device_set_open_count(struct vfio_device_set *dev_set);
+67 -2
include/linux/vfio_pci_core.h
···
 #include <linux/pci.h>
 #include <linux/vfio.h>
 #include <linux/irqbypass.h>
+#include <linux/rcupdate.h>
 #include <linux/types.h>
 #include <linux/uuid.h>
 #include <linux/notifier.h>
···
 struct vfio_pci_core_device;
 struct vfio_pci_region;
+struct p2pdma_provider;
+struct dma_buf_phys_vec;
+
+struct vfio_pci_eventfd {
+    struct eventfd_ctx *ctx;
+    struct rcu_head rcu;
+};

 struct vfio_pci_regops {
     ssize_t (*rw)(struct vfio_pci_core_device *vdev, char __user *buf,
···
     u32 flags;
 };

+struct vfio_pci_device_ops {
+    int (*get_dmabuf_phys)(struct vfio_pci_core_device *vdev,
+                   struct p2pdma_provider **provider,
+                   unsigned int region_index,
+                   struct dma_buf_phys_vec *phys_vec,
+                   struct vfio_region_dma_range *dma_ranges,
+                   size_t nr_ranges);
+};
+
+#if IS_ENABLED(CONFIG_VFIO_PCI_DMABUF)
+int vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
+                struct vfio_region_dma_range *dma_ranges,
+                size_t nr_ranges, phys_addr_t start,
+                phys_addr_t len);
+int vfio_pci_core_get_dmabuf_phys(struct vfio_pci_core_device *vdev,
+                  struct p2pdma_provider **provider,
+                  unsigned int region_index,
+                  struct dma_buf_phys_vec *phys_vec,
+                  struct vfio_region_dma_range *dma_ranges,
+                  size_t nr_ranges);
+#else
+static inline int
+vfio_pci_core_fill_phys_vec(struct dma_buf_phys_vec *phys_vec,
+                struct vfio_region_dma_range *dma_ranges,
+                size_t nr_ranges, phys_addr_t start,
+                phys_addr_t len)
+{
+    return -EINVAL;
+}
+static inline int vfio_pci_core_get_dmabuf_phys(
+    struct vfio_pci_core_device *vdev, struct p2pdma_provider **provider,
+    unsigned int region_index, struct dma_buf_phys_vec *phys_vec,
+    struct vfio_region_dma_range *dma_ranges, size_t nr_ranges)
+{
+    return -EOPNOTSUPP;
+}
+#endif
+
 struct vfio_pci_core_device {
     struct vfio_device vdev;
     struct pci_dev *pdev;
+    const struct vfio_pci_device_ops *pci_ops;
     void __iomem *barmap[PCI_STD_NUM_BARS];
     bool bar_mmap_supported[PCI_STD_NUM_BARS];
     u8 *pci_config_map;
···
     struct pci_saved_state *pci_saved_state;
     struct pci_saved_state *pm_save;
     int ioeventfds_nr;
-    struct eventfd_ctx *err_trigger;
-    struct eventfd_ctx *req_trigger;
+    struct vfio_pci_eventfd __rcu *err_trigger;
+    struct vfio_pci_eventfd __rcu *req_trigger;
     struct eventfd_ctx *pm_wake_eventfd_ctx;
     struct list_head dummy_resources_list;
     struct mutex ioeventfds_lock;
···
     struct vfio_pci_core_device *sriov_pf_core_dev;
     struct notifier_block nb;
     struct rw_semaphore memory_lock;
+    struct list_head dmabufs;
 };

 /* Will be exported for vfio pci drivers usage */
···
         unsigned long arg);
 int vfio_pci_core_ioctl_feature(struct vfio_device *device, u32 flags,
                 void __user *arg, size_t argsz);
+int vfio_pci_ioctl_get_region_info(struct vfio_device *core_vdev,
+                   struct vfio_region_info *info,
+                   struct vfio_info_cap *caps);
 ssize_t vfio_pci_core_read(struct vfio_device *core_vdev, char __user *buf,
         size_t count, loff_t *ppos);
 ssize_t vfio_pci_core_write(struct vfio_device *core_vdev, const char __user *buf,
         size_t count, loff_t *ppos);
+vm_fault_t vfio_pci_vmf_insert_pfn(struct vfio_pci_core_device *vdev,
+                   struct vm_fault *vmf, unsigned long pfn,
+                   unsigned int order);
 int vfio_pci_core_mmap(struct vfio_device *core_vdev, struct vm_area_struct *vma);
 void vfio_pci_core_request(struct vfio_device *core_vdev, unsigned int count);
 int vfio_pci_core_match(struct vfio_device *core_vdev, char *buf);
···
             void __iomem *io, char __user *buf,
             loff_t off, size_t count, size_t x_start,
             size_t x_end, bool iswrite);
+bool __vfio_pci_memory_enabled(struct vfio_pci_core_device *vdev);
 bool vfio_pci_core_range_intersect_range(loff_t buf_start, size_t buf_cnt,
                      loff_t reg_start, size_t reg_cnt,
                      loff_t *buf_offset,
···
 #ifdef ioread64
 VFIO_IOREAD_DECLARATION(64)
 #endif
+
+static inline bool is_aligned_for_order(struct vm_area_struct *vma,
+                    unsigned long addr,
+                    unsigned long pfn,
+                    unsigned int order)
+{
+    return !(order && (addr < vma->vm_start ||
+               addr + (PAGE_SIZE << order) > vma->vm_end ||
+               !IS_ALIGNED(pfn, 1 << order)));
+}

 #endif /* VFIO_PCI_CORE_H */
+28
include/uapi/linux/vfio.h
···

 #include <linux/types.h>
 #include <linux/ioctl.h>
+#include <linux/stddef.h>

 #define VFIO_API_VERSION	0

···
 #define VFIO_DEVICE_FEATURE_SET_MASTER		1	/* Set Bus Master */
 };
 #define VFIO_DEVICE_FEATURE_BUS_MASTER 10
+
+/**
+ * Upon VFIO_DEVICE_FEATURE_GET create a dma_buf fd for the
+ * regions selected.
+ *
+ * open_flags are the typical flags passed to open(2), eg O_RDWR, O_CLOEXEC,
+ * etc. offset/length specify a slice of the region to create the dmabuf from.
+ * nr_ranges is the total number of (P2P DMA) ranges that comprise the dmabuf.
+ *
+ * flags should be 0.
+ *
+ * Return: The fd number on success, -1 and errno is set on failure.
+ */
+#define VFIO_DEVICE_FEATURE_DMA_BUF 11
+
+struct vfio_region_dma_range {
+    __u64 offset;
+    __u64 length;
+};
+
+struct vfio_device_feature_dma_buf {
+    __u32 region_index;
+    __u32 open_flags;
+    __u32 flags;
+    __u32 nr_ranges;
+    struct vfio_region_dma_range dma_ranges[] __counted_by(nr_ranges);
+};

 /* -------- API for Type1 VFIO IOMMU -------- */

+2 -2
kernel/dma/direct.c
···
         }
             break;
         case PCI_P2PDMA_MAP_BUS_ADDR:
-            sg->dma_address = pci_p2pdma_bus_addr_map(&p2pdma_state,
-                    sg_phys(sg));
+            sg->dma_address = pci_p2pdma_bus_addr_map(
+                p2pdma_state.mem, sg_phys(sg));
             sg_dma_len(sg) = sg->length;
             sg_dma_mark_bus_address(sg);
             continue;
+1 -1
mm/hmm.c
···
         break;
     case PCI_P2PDMA_MAP_BUS_ADDR:
         pfns[idx] |= HMM_PFN_P2PDMA_BUS | HMM_PFN_DMA_MAPPED;
-        return pci_p2pdma_bus_addr_map(p2pdma_state, paddr);
+        return pci_p2pdma_bus_addr_map(p2pdma_state->mem, paddr);
     default:
         return DMA_MAPPING_ERROR;
     }
+24 -47
samples/vfio-mdev/mbochs.c
···
 static atomic_t mbochs_avail_mbytes;
 static const struct vfio_device_ops mbochs_dev_ops;

-struct vfio_region_info_ext {
-    struct vfio_region_info          base;
-    struct vfio_region_info_cap_type type;
-};
-
 struct mbochs_mode {
     u32 drm_format;
     u32 bytepp;
···
     return 0;
 }

-static int mbochs_get_region_info(struct mdev_state *mdev_state,
-                  struct vfio_region_info_ext *ext)
+static int mbochs_ioctl_get_region_info(struct vfio_device *vdev,
+                    struct vfio_region_info *region_info,
+                    struct vfio_info_cap *caps)
 {
-    struct vfio_region_info *region_info = &ext->base;
+    struct mdev_state *mdev_state =
+        container_of(vdev, struct mdev_state, vdev);

     if (region_info->index >= MBOCHS_NUM_REGIONS)
         return -EINVAL;
···
         region_info->flags = (VFIO_REGION_INFO_FLAG_READ |
                       VFIO_REGION_INFO_FLAG_WRITE);
         break;
-    case MBOCHS_EDID_REGION_INDEX:
-        ext->base.argsz = sizeof(*ext);
-        ext->base.offset = MBOCHS_EDID_OFFSET;
-        ext->base.size = MBOCHS_EDID_SIZE;
-        ext->base.flags = (VFIO_REGION_INFO_FLAG_READ |
-                   VFIO_REGION_INFO_FLAG_WRITE |
-                   VFIO_REGION_INFO_FLAG_CAPS);
-        ext->base.cap_offset = offsetof(typeof(*ext), type);
-        ext->type.header.id = VFIO_REGION_INFO_CAP_TYPE;
-        ext->type.header.version = 1;
-        ext->type.header.next = 0;
-        ext->type.type = VFIO_REGION_TYPE_GFX;
-        ext->type.subtype = VFIO_REGION_SUBTYPE_GFX_EDID;
-        break;
+    case MBOCHS_EDID_REGION_INDEX: {
+        struct vfio_region_info_cap_type cap_type = {
+            .header.id = VFIO_REGION_INFO_CAP_TYPE,
+            .header.version = 1,
+            .type = VFIO_REGION_TYPE_GFX,
+            .subtype = VFIO_REGION_SUBTYPE_GFX_EDID,
+        };
+
+        region_info->offset = MBOCHS_EDID_OFFSET;
+        region_info->size = MBOCHS_EDID_SIZE;
+        region_info->flags = (VFIO_REGION_INFO_FLAG_READ |
+                      VFIO_REGION_INFO_FLAG_WRITE |
+                      VFIO_REGION_INFO_FLAG_CAPS);
+
+        return vfio_info_add_capability(caps, &cap_type.header,
+                        sizeof(cap_type));
+    }
     default:
         region_info->size = 0;
         region_info->offset = 0;
···
     struct mdev_state *mdev_state =
         container_of(vdev, struct mdev_state, vdev);
     int ret = 0;
-    unsigned long minsz, outsz;
+    unsigned long minsz;

     switch (cmd) {
     case VFIO_DEVICE_GET_INFO:
···
             return ret;

         if (copy_to_user((void __user *)arg, &info, minsz))
-            return -EFAULT;
-
-        return 0;
-    }
-    case VFIO_DEVICE_GET_REGION_INFO:
-    {
-        struct vfio_region_info_ext info;
-
-        minsz = offsetofend(typeof(info), base.offset);
-
-        if (copy_from_user(&info, (void __user *)arg, minsz))
-            return -EFAULT;
-
-        outsz = info.base.argsz;
-        if (outsz < minsz)
-            return -EINVAL;
-        if (outsz > sizeof(info))
-            return -EINVAL;
-
-        ret = mbochs_get_region_info(mdev_state, &info);
-        if (ret)
-            return ret;
-
-        if (copy_to_user((void __user *)arg, &info, outsz))
             return -EFAULT;

         return 0;
···
     .read = mbochs_read,
     .write = mbochs_write,
     .ioctl = mbochs_ioctl,
+    .get_region_info_caps = mbochs_ioctl_get_region_info,
     .mmap = mbochs_mmap,
     .bind_iommufd = vfio_iommufd_emulated_bind,
     .unbind_iommufd = vfio_iommufd_emulated_unbind,
+7 -27
samples/vfio-mdev/mdpy.c
···
     return remap_vmalloc_range(vma, mdev_state->memblk, 0);
 }

-static int mdpy_get_region_info(struct mdev_state *mdev_state,
-                struct vfio_region_info *region_info,
-                u16 *cap_type_id, void **cap_type)
+static int mdpy_ioctl_get_region_info(struct vfio_device *vdev,
+                      struct vfio_region_info *region_info,
+                      struct vfio_info_cap *caps)
 {
+    struct mdev_state *mdev_state =
+        container_of(vdev, struct mdev_state, vdev);
+
     if (region_info->index >= VFIO_PCI_NUM_REGIONS &&
         region_info->index != MDPY_DISPLAY_REGION)
         return -EINVAL;
···

         return 0;
     }
-    case VFIO_DEVICE_GET_REGION_INFO:
-    {
-        struct vfio_region_info info;
-        u16 cap_type_id = 0;
-        void *cap_type = NULL;
-
-        minsz = offsetofend(struct vfio_region_info, offset);
-
-        if (copy_from_user(&info, (void __user *)arg, minsz))
-            return -EFAULT;
-
-        if (info.argsz < minsz)
-            return -EINVAL;
-
-        ret = mdpy_get_region_info(mdev_state, &info, &cap_type_id,
-                       &cap_type);
-        if (ret)
-            return ret;
-
-        if (copy_to_user((void __user *)arg, &info, minsz))
-            return -EFAULT;
-
-        return 0;
-    }

     case VFIO_DEVICE_GET_IRQ_INFO:
     {
···
     .read = mdpy_read,
     .write = mdpy_write,
     .ioctl = mdpy_ioctl,
+    .get_region_info_caps = mdpy_ioctl_get_region_info,
     .mmap = mdpy_mmap,
     .bind_iommufd = vfio_iommufd_emulated_bind,
     .unbind_iommufd = vfio_iommufd_emulated_unbind,
+7 -28
samples/vfio-mdev/mtty.c
···
     u8 lsr = 0;

     mutex_lock(&mdev_state->rxtx_lock);
-    /* atleast one char in FIFO */
+    /* at least one char in FIFO */
     if (mdev_state->s[index].rxtx.head !=
         mdev_state->s[index].rxtx.tail)
         lsr |= UART_LSR_DR;
···
     return ret;
 }

-static int mtty_get_region_info(struct mdev_state *mdev_state,
-                struct vfio_region_info *region_info,
-                u16 *cap_type_id, void **cap_type)
+static int mtty_ioctl_get_region_info(struct vfio_device *vdev,
+                      struct vfio_region_info *region_info,
+                      struct vfio_info_cap *caps)
 {
+    struct mdev_state *mdev_state =
+        container_of(vdev, struct mdev_state, vdev);
     unsigned int size = 0;
     u32 bar_index;

···
             return ret;

         memcpy(&mdev_state->dev_info, &info, sizeof(info));
-
-        if (copy_to_user((void __user *)arg, &info, minsz))
-            return -EFAULT;
-
-        return 0;
-    }
-    case VFIO_DEVICE_GET_REGION_INFO:
-    {
-        struct vfio_region_info info;
-        u16 cap_type_id = 0;
-        void *cap_type = NULL;
-
-        minsz = offsetofend(struct vfio_region_info, offset);
-
-        if (copy_from_user(&info, (void __user *)arg, minsz))
-            return -EFAULT;
-
-        if (info.argsz < minsz)
-            return -EINVAL;
-
-        ret = mtty_get_region_info(mdev_state, &info, &cap_type_id,
-                       &cap_type);
-        if (ret)
-            return ret;

         if (copy_to_user((void __user *)arg, &info, minsz))
             return -EFAULT;
···
     .read = mtty_read,
     .write = mtty_write,
     .ioctl = mtty_ioctl,
+    .get_region_info_caps = mtty_ioctl_get_region_info,
     .bind_iommufd = vfio_iommufd_emulated_bind,
     .unbind_iommufd = vfio_iommufd_emulated_unbind,
     .attach_ioas = vfio_iommufd_emulated_attach_ioas,
+9 -1
tools/testing/selftests/vfio/Makefile
···
 TEST_GEN_PROGS += vfio_dma_mapping_test
 TEST_GEN_PROGS += vfio_iommufd_setup_test
 TEST_GEN_PROGS += vfio_pci_device_test
+TEST_GEN_PROGS += vfio_pci_device_init_perf_test
 TEST_GEN_PROGS += vfio_pci_driver_test
-TEST_PROGS_EXTENDED := run.sh
+
+TEST_FILES += scripts/cleanup.sh
+TEST_FILES += scripts/lib.sh
+TEST_FILES += scripts/run.sh
+TEST_FILES += scripts/setup.sh
+
 include ../lib.mk
 include lib/libvfio.mk

 CFLAGS += -I$(top_srcdir)/tools/include
 CFLAGS += -MD
 CFLAGS += $(EXTRA_CFLAGS)
+
+LDFLAGS += -pthread

 $(TEST_GEN_PROGS): %: %.o $(LIBVFIO_O)
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(LDFLAGS) $< $(LIBVFIO_O) $(LDLIBS) -o $@
+18 -18
tools/testing/selftests/vfio/lib/drivers/dsa/dsa.c
···
 #include <linux/pci_ids.h>
 #include <linux/sizes.h>

-#include <vfio_util.h>
+#include <libvfio.h>

 #include "registers.h"

···
         return -EINVAL;

     if (dsa_int_handle_request_required(device)) {
-        printf("Device requires requesting interrupt handles\n");
+        dev_err(device, "Device requires requesting interrupt handles\n");
         return -EINVAL;
     }

···
         return;
     }

-    fprintf(stderr, "SWERR: 0x%016lx 0x%016lx 0x%016lx 0x%016lx\n",
+    dev_err(device, "SWERR: 0x%016lx 0x%016lx 0x%016lx 0x%016lx\n",
         err.bits[0], err.bits[1], err.bits[2], err.bits[3]);

-    fprintf(stderr, "  valid:         0x%x\n", err.valid);
-    fprintf(stderr, "  overflow:      0x%x\n", err.overflow);
-    fprintf(stderr, "  desc_valid:    0x%x\n", err.desc_valid);
-    fprintf(stderr, "  wq_idx_valid:  0x%x\n", err.wq_idx_valid);
-    fprintf(stderr, "  batch:         0x%x\n", err.batch);
-    fprintf(stderr, "  fault_rw:      0x%x\n", err.fault_rw);
-    fprintf(stderr, "  priv:          0x%x\n", err.priv);
-    fprintf(stderr, "  error:         0x%x\n", err.error);
-    fprintf(stderr, "  wq_idx:        0x%x\n", err.wq_idx);
-    fprintf(stderr, "  operation:     0x%x\n", err.operation);
-    fprintf(stderr, "  pasid:         0x%x\n", err.pasid);
-    fprintf(stderr, "  batch_idx:     0x%x\n", err.batch_idx);
-    fprintf(stderr, "  invalid_flags: 0x%x\n", err.invalid_flags);
-    fprintf(stderr, "  fault_addr:    0x%lx\n", err.fault_addr);
+    dev_err(device, "  valid:         0x%x\n", err.valid);
+    dev_err(device, "  overflow:      0x%x\n", err.overflow);
+    dev_err(device, "  desc_valid:    0x%x\n", err.desc_valid);
+    dev_err(device, "  wq_idx_valid:  0x%x\n", err.wq_idx_valid);
+    dev_err(device, "  batch:         0x%x\n", err.batch);
+    dev_err(device, "  fault_rw:      0x%x\n", err.fault_rw);
+    dev_err(device, "  priv:          0x%x\n", err.priv);
+    dev_err(device, "  error:         0x%x\n", err.error);
+    dev_err(device, "  wq_idx:        0x%x\n", err.wq_idx);
+    dev_err(device, "  operation:     0x%x\n", err.operation);
+    dev_err(device, "  pasid:         0x%x\n", err.pasid);
+    dev_err(device, "  batch_idx:     0x%x\n", err.batch_idx);
+    dev_err(device, "  invalid_flags: 0x%x\n", err.invalid_flags);
+    dev_err(device, "  fault_addr:    0x%lx\n", err.fault_addr);

     VFIO_FAIL("Software Error Detected!\n");
 }
···
     if (status == DSA_COMP_SUCCESS)
         return 0;

-    printf("Error detected during memcpy operation: 0x%x\n", status);
+    dev_err(device, "Error detected during memcpy operation: 0x%x\n", status);
     return -1;
 }
+9 -9
tools/testing/selftests/vfio/lib/drivers/ioat/ioat.c
···
 #include <linux/pci_ids.h>
 #include <linux/sizes.h>

-#include <vfio_util.h>
+#include <libvfio.h>

 #include "hw.h"
 #include "registers.h"
···
         r = 0;
         break;
     default:
-        printf("ioat: Unsupported version: 0x%x\n", version);
+        dev_err(device, "ioat: Unsupported version: 0x%x\n", version);
         r = -EINVAL;
     }
     return r;
···
 {
     void *registers = ioat_channel_registers(device);

-    printf("Error detected during memcpy operation!\n"
-           "  CHANERR:      0x%x\n"
-           "  CHANERR_INT:  0x%x\n"
-           "  DMAUNCERRSTS: 0x%x\n",
-           readl(registers + IOAT_CHANERR_OFFSET),
-           vfio_pci_config_readl(device, IOAT_PCI_CHANERR_INT_OFFSET),
-           vfio_pci_config_readl(device, IOAT_PCI_DMAUNCERRSTS_OFFSET));
+    dev_err(device, "Error detected during memcpy operation!\n"
+        "  CHANERR:      0x%x\n"
+        "  CHANERR_INT:  0x%x\n"
+        "  DMAUNCERRSTS: 0x%x\n",
+        readl(registers + IOAT_CHANERR_OFFSET),
+        vfio_pci_config_readl(device, IOAT_PCI_CHANERR_INT_OFFSET),
+        vfio_pci_config_readl(device, IOAT_PCI_DMAUNCERRSTS_OFFSET));

     ioat_reset(device);
 }
+26
tools/testing/selftests/vfio/lib/include/libvfio.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_H
+#define SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_H
+
+#include <libvfio/assert.h>
+#include <libvfio/iommu.h>
+#include <libvfio/iova_allocator.h>
+#include <libvfio/vfio_pci_device.h>
+#include <libvfio/vfio_pci_driver.h>
+
+/*
+ * Return the BDF string of the device that the test should use.
+ *
+ * If a BDF string is provided by the user on the command line (as the last
+ * element of argv[]), then this function will return that and decrement argc
+ * by 1.
+ *
+ * Otherwise this function will attempt to use the environment variable
+ * $VFIO_SELFTESTS_BDF.
+ *
+ * If BDF cannot be determined then the test will exit with KSFT_SKIP.
+ */
+const char *vfio_selftests_get_bdf(int *argc, char *argv[]);
+char **vfio_selftests_get_bdfs(int *argc, char *argv[], int *nr_bdfs);
+
+#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_H */
+54
tools/testing/selftests/vfio/lib/include/libvfio/assert.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_ASSERT_H
+#define SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_ASSERT_H
+
+#include <stdio.h>
+#include <string.h>
+#include <sys/ioctl.h>
+
+#include "../../../../kselftest.h"
+
+#define VFIO_LOG_AND_EXIT(...) do {			\
+    fprintf(stderr, "  " __VA_ARGS__);			\
+    fprintf(stderr, "\n");				\
+    exit(KSFT_FAIL);					\
+} while (0)
+
+#define VFIO_ASSERT_OP(_lhs, _rhs, _op, ...) do {				\
+    typeof(_lhs) __lhs = (_lhs);						\
+    typeof(_rhs) __rhs = (_rhs);						\
+										\
+    if (__lhs _op __rhs)							\
+        break;									\
+										\
+    fprintf(stderr, "%s:%u: Assertion Failure\n\n", __FILE__, __LINE__);	\
+    fprintf(stderr, "  Expression: " #_lhs " " #_op " " #_rhs "\n");		\
+    fprintf(stderr, "  Observed: %#lx %s %#lx\n",				\
+        (u64)__lhs, #_op, (u64)__rhs);						\
+    fprintf(stderr, "  [errno: %d - %s]\n", errno, strerror(errno));		\
+    VFIO_LOG_AND_EXIT(__VA_ARGS__);						\
+} while (0)
+
+#define VFIO_ASSERT_EQ(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, ==, ##__VA_ARGS__)
+#define VFIO_ASSERT_NE(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, !=, ##__VA_ARGS__)
+#define VFIO_ASSERT_LT(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, <, ##__VA_ARGS__)
+#define VFIO_ASSERT_LE(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, <=, ##__VA_ARGS__)
+#define VFIO_ASSERT_GT(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, >, ##__VA_ARGS__)
+#define VFIO_ASSERT_GE(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, >=, ##__VA_ARGS__)
+#define VFIO_ASSERT_TRUE(_a, ...) VFIO_ASSERT_NE(false, (_a), ##__VA_ARGS__)
+#define VFIO_ASSERT_FALSE(_a, ...) VFIO_ASSERT_EQ(false, (_a), ##__VA_ARGS__)
+#define VFIO_ASSERT_NULL(_a, ...) VFIO_ASSERT_EQ(NULL, _a, ##__VA_ARGS__)
+#define VFIO_ASSERT_NOT_NULL(_a, ...) VFIO_ASSERT_NE(NULL, _a, ##__VA_ARGS__)
+
+#define VFIO_FAIL(_fmt, ...) do {					\
+    fprintf(stderr, "%s:%u: FAIL\n\n", __FILE__, __LINE__);		\
+    VFIO_LOG_AND_EXIT(_fmt, ##__VA_ARGS__);				\
+} while (0)
+
+#define ioctl_assert(_fd, _op, _arg) do {					\
+    void *__arg = (_arg);							\
+    int __ret = ioctl((_fd), (_op), (__arg));					\
+    VFIO_ASSERT_EQ(__ret, 0, "ioctl(%s, %s, %s) returned %d\n", #_fd, #_op, #_arg, __ret); \
+} while (0)
+
+#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_ASSERT_H */
+76
tools/testing/selftests/vfio/lib/include/libvfio/iommu.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_IOMMU_H
+#define SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_IOMMU_H
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+#include <libvfio/assert.h>
+
+typedef u64 iova_t;
+
+struct iommu_mode {
+    const char *name;
+    const char *container_path;
+    unsigned long iommu_type;
+};
+
+extern const char *default_iommu_mode;
+
+struct dma_region {
+    struct list_head link;
+    void *vaddr;
+    iova_t iova;
+    u64 size;
+};
+
+struct iommu {
+    const struct iommu_mode *mode;
+    int container_fd;
+    int iommufd;
+    u32 ioas_id;
+    struct list_head dma_regions;
+};
+
+struct iommu *iommu_init(const char *iommu_mode);
+void iommu_cleanup(struct iommu *iommu);
+
+int __iommu_map(struct iommu *iommu, struct dma_region *region);
+
+static inline void iommu_map(struct iommu *iommu, struct dma_region *region)
+{
+    VFIO_ASSERT_EQ(__iommu_map(iommu, region), 0);
+}
+
+int __iommu_unmap(struct iommu *iommu, struct dma_region *region, u64 *unmapped);
+
+static inline void iommu_unmap(struct iommu *iommu, struct dma_region *region)
+{
+    VFIO_ASSERT_EQ(__iommu_unmap(iommu, region, NULL), 0);
+}
+
+int __iommu_unmap_all(struct iommu *iommu, u64 *unmapped);
+
+static inline void iommu_unmap_all(struct iommu *iommu)
+{
+    VFIO_ASSERT_EQ(__iommu_unmap_all(iommu, NULL), 0);
+}
+
+int __iommu_hva2iova(struct iommu *iommu, void *vaddr, iova_t *iova);
+iova_t iommu_hva2iova(struct iommu *iommu, void *vaddr);
+
+struct iommu_iova_range *iommu_iova_ranges(struct iommu *iommu, u32 *nranges);
+
+/*
+ * Generator for VFIO selftests fixture variants that replicate across all
+ * possible IOMMU modes. Tests must define FIXTURE_VARIANT_ADD_IOMMU_MODE()
+ * which should then use FIXTURE_VARIANT_ADD() to create the variant.
+ */
+#define FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(...)				\
+    FIXTURE_VARIANT_ADD_IOMMU_MODE(vfio_type1_iommu, ##__VA_ARGS__);		\
+    FIXTURE_VARIANT_ADD_IOMMU_MODE(vfio_type1v2_iommu, ##__VA_ARGS__);		\
+    FIXTURE_VARIANT_ADD_IOMMU_MODE(iommufd_compat_type1, ##__VA_ARGS__);	\
+    FIXTURE_VARIANT_ADD_IOMMU_MODE(iommufd_compat_type1v2, ##__VA_ARGS__);	\
+    FIXTURE_VARIANT_ADD_IOMMU_MODE(iommufd, ##__VA_ARGS__)
+
+#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_IOMMU_H */
+23
tools/testing/selftests/vfio/lib/include/libvfio/iova_allocator.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_IOVA_ALLOCATOR_H
+#define SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_IOVA_ALLOCATOR_H
+
+#include <uapi/linux/types.h>
+#include <linux/list.h>
+#include <linux/types.h>
+#include <linux/iommufd.h>
+
+#include <libvfio/iommu.h>
+
+struct iova_allocator {
+    struct iommu_iova_range *ranges;
+    u32 nranges;
+    u32 range_idx;
+    u64 range_offset;
+};
+
+struct iova_allocator *iova_allocator_init(struct iommu *iommu);
+void iova_allocator_cleanup(struct iova_allocator *allocator);
+iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size);
+
+#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_IOVA_ALLOCATOR_H */
+125
tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_device.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H
+#define SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H
+
+#include <fcntl.h>
+#include <linux/vfio.h>
+#include <linux/pci_regs.h>
+
+#include <libvfio/assert.h>
+#include <libvfio/iommu.h>
+#include <libvfio/vfio_pci_driver.h>
+
+struct vfio_pci_bar {
+    struct vfio_region_info info;
+    void *vaddr;
+};
+
+struct vfio_pci_device {
+    const char *bdf;
+    int fd;
+    int group_fd;
+
+    struct iommu *iommu;
+
+    struct vfio_device_info info;
+    struct vfio_region_info config_space;
+    struct vfio_pci_bar bars[PCI_STD_NUM_BARS];
+
+    struct vfio_irq_info msi_info;
+    struct vfio_irq_info msix_info;
+
+    /* eventfds for MSI and MSI-x interrupts */
+    int msi_eventfds[PCI_MSIX_FLAGS_QSIZE + 1];
+
+    struct vfio_pci_driver driver;
+};
+
+#define dev_info(_dev, _fmt, ...) printf("%s: " _fmt, (_dev)->bdf, ##__VA_ARGS__)
+#define dev_err(_dev, _fmt, ...) fprintf(stderr, "%s: " _fmt, (_dev)->bdf, ##__VA_ARGS__)
+
+struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu);
+void vfio_pci_device_cleanup(struct vfio_pci_device *device);
+
+void vfio_pci_device_reset(struct vfio_pci_device *device);
+
+void vfio_pci_config_access(struct vfio_pci_device *device, bool write,
+                size_t config, size_t size, void *data);
+
+#define vfio_pci_config_read(_device, _offset, _type) ({			\
+    _type __data;								\
+    vfio_pci_config_access((_device), false, _offset, sizeof(__data), &__data); \
+    __data;									\
+})
+
+#define vfio_pci_config_readb(_d, _o) vfio_pci_config_read(_d, _o, u8)
+#define vfio_pci_config_readw(_d, _o) vfio_pci_config_read(_d, _o, u16)
+#define vfio_pci_config_readl(_d, _o) vfio_pci_config_read(_d, _o, u32)
+
+#define vfio_pci_config_write(_device, _offset, _value, _type) do {		\
+    _type __data = (_value);							\
+    vfio_pci_config_access((_device), true, _offset, sizeof(_type), &__data);	\
+} while (0)
+
+#define vfio_pci_config_writeb(_d, _o, _v) vfio_pci_config_write(_d, _o, _v, u8)
+#define vfio_pci_config_writew(_d, _o, _v) vfio_pci_config_write(_d, _o, _v, u16)
+#define vfio_pci_config_writel(_d, _o, _v) vfio_pci_config_write(_d, _o, _v, u32)
+
+void vfio_pci_irq_enable(struct vfio_pci_device *device, u32 index,
+             u32 vector, int count);
+void vfio_pci_irq_disable(struct vfio_pci_device *device, u32 index);
+void vfio_pci_irq_trigger(struct vfio_pci_device *device, u32 index, u32 vector);
+
+static inline void fcntl_set_nonblock(int fd)
+{
+    int r;
+
+    r = fcntl(fd, F_GETFL, 0);
+    VFIO_ASSERT_NE(r, -1, "F_GETFL failed for fd %d\n", fd);
+
+    r = fcntl(fd, F_SETFL, r | O_NONBLOCK);
+    VFIO_ASSERT_NE(r, -1, "F_SETFL O_NONBLOCK failed for fd %d\n", fd);
+}
+
+static inline void vfio_pci_msi_enable(struct vfio_pci_device *device,
+                       u32 vector, int count)
+{
+    vfio_pci_irq_enable(device, VFIO_PCI_MSI_IRQ_INDEX, vector, count);
+}
+
+static inline void vfio_pci_msi_disable(struct vfio_pci_device *device)
+{
+    vfio_pci_irq_disable(device, VFIO_PCI_MSI_IRQ_INDEX);
+}
+
+static inline void vfio_pci_msix_enable(struct vfio_pci_device *device,
+                    u32 vector, int count)
+{
+    vfio_pci_irq_enable(device, VFIO_PCI_MSIX_IRQ_INDEX, vector, count);
+}
+
+static inline void vfio_pci_msix_disable(struct vfio_pci_device *device)
+{
+    vfio_pci_irq_disable(device, VFIO_PCI_MSIX_IRQ_INDEX);
+}
+
+static inline int __to_iova(struct vfio_pci_device *device, void *vaddr, iova_t *iova)
+{
+    return __iommu_hva2iova(device->iommu, vaddr, iova);
+}
+
+static inline iova_t to_iova(struct vfio_pci_device *device, void *vaddr)
+{
+    return iommu_hva2iova(device->iommu, vaddr);
+}
+
+static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
+                     u16 vendor_id, u16 device_id)
+{
+    return (vendor_id == vfio_pci_config_readw(device, PCI_VENDOR_ID)) &&
+           (device_id == vfio_pci_config_readw(device, PCI_DEVICE_ID));
+}
+
+const char *vfio_pci_get_cdev_path(const char *bdf);
+
+#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DEVICE_H */
+97
tools/testing/selftests/vfio/lib/include/libvfio/vfio_pci_driver.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DRIVER_H
+#define SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DRIVER_H
+
+#include <libvfio/iommu.h>
+
+struct vfio_pci_device;
+
+struct vfio_pci_driver_ops {
+    const char *name;
+
+    /**
+     * @probe() - Check if the driver supports the given device.
+     *
+     * Return: 0 on success, non-0 on failure.
+     */
+    int (*probe)(struct vfio_pci_device *device);
+
+    /**
+     * @init() - Initialize the driver for @device.
+     *
+     * Must be called after device->driver.region has been initialized.
+     */
+    void (*init)(struct vfio_pci_device *device);
+
+    /**
+     * remove() - Deinitialize the driver for @device.
+     */
+    void (*remove)(struct vfio_pci_device *device);
+
+    /**
+     * memcpy_start() - Kick off @count repeated memcpy operations from
+     * [@src, @src + @size) to [@dst, @dst + @size).
+     *
+     * Guarantees:
+     *  - The device will attempt DMA reads on [src, src + size).
+     *  - The device will attempt DMA writes on [dst, dst + size).
+     *  - The device will not generate any interrupts.
+     *
+     * memcpy_start() returns immediately, it does not wait for the
+     * copies to complete.
+     */
+    void (*memcpy_start)(struct vfio_pci_device *device,
+                 iova_t src, iova_t dst, u64 size, u64 count);
+
+    /**
+     * memcpy_wait() - Wait until the memcpy operations started by
+     * memcpy_start() have finished.
+     *
+     * Guarantees:
+     *  - All in-flight DMAs initiated by memcpy_start() are fully complete
+     *    before memcpy_wait() returns.
+     *
+     * Returns non-0 if the driver detects that an error occurred during the
+     * memcpy, 0 otherwise.
+     */
+    int (*memcpy_wait)(struct vfio_pci_device *device);
+
+    /**
+     * send_msi() - Make the device send the MSI device->driver.msi.
+     *
+     * Guarantees:
+     *  - The device will send the MSI once.
+     */
+    void (*send_msi)(struct vfio_pci_device *device);
+};
+
+struct vfio_pci_driver {
+    const struct vfio_pci_driver_ops *ops;
+    bool initialized;
+    bool memcpy_in_progress;
+
+    /* Region to be used by the driver (e.g. for in-memory descriptors) */
+    struct dma_region region;
+
+    /* The maximum size that can be passed to memcpy_start(). */
+    u64 max_memcpy_size;
+
+    /* The maximum count that can be passed to memcpy_start(). */
+    u64 max_memcpy_count;
+
+    /* The MSI vector the device will signal in ops->send_msi(). */
+    int msi;
+};
+
+void vfio_pci_driver_probe(struct vfio_pci_device *device);
+void vfio_pci_driver_init(struct vfio_pci_device *device);
+void vfio_pci_driver_remove(struct vfio_pci_device *device);
+int vfio_pci_driver_memcpy(struct vfio_pci_device *device,
+               iova_t src, iova_t dst, u64 size);
+void vfio_pci_driver_memcpy_start(struct vfio_pci_device *device,
+                  iova_t src, iova_t dst, u64 size,
+                  u64 count);
+int vfio_pci_driver_memcpy_wait(struct vfio_pci_device *device);
+void vfio_pci_driver_send_msi(struct vfio_pci_device *device);
+
+#endif /* SELFTESTS_VFIO_LIB_INCLUDE_LIBVFIO_VFIO_PCI_DRIVER_H */
-331
tools/testing/selftests/vfio/lib/include/vfio_util.h
Removed file:

```c
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef SELFTESTS_VFIO_LIB_INCLUDE_VFIO_UTIL_H
#define SELFTESTS_VFIO_LIB_INCLUDE_VFIO_UTIL_H

#include <fcntl.h>
#include <string.h>

#include <uapi/linux/types.h>
#include <linux/iommufd.h>
#include <linux/list.h>
#include <linux/pci_regs.h>
#include <linux/vfio.h>

#include "../../../kselftest.h"

#define VFIO_LOG_AND_EXIT(...) do {		\
	fprintf(stderr, "  " __VA_ARGS__);	\
	fprintf(stderr, "\n");			\
	exit(KSFT_FAIL);			\
} while (0)

#define VFIO_ASSERT_OP(_lhs, _rhs, _op, ...) do {				\
	typeof(_lhs) __lhs = (_lhs);						\
	typeof(_rhs) __rhs = (_rhs);						\
										\
	if (__lhs _op __rhs)							\
		break;								\
										\
	fprintf(stderr, "%s:%u: Assertion Failure\n\n", __FILE__, __LINE__);	\
	fprintf(stderr, "  Expression: " #_lhs " " #_op " " #_rhs "\n");	\
	fprintf(stderr, "  Observed: %#lx %s %#lx\n",				\
		(u64)__lhs, #_op, (u64)__rhs);					\
	fprintf(stderr, "  [errno: %d - %s]\n", errno, strerror(errno));	\
	VFIO_LOG_AND_EXIT(__VA_ARGS__);						\
} while (0)

#define VFIO_ASSERT_EQ(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, ==, ##__VA_ARGS__)
#define VFIO_ASSERT_NE(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, !=, ##__VA_ARGS__)
#define VFIO_ASSERT_LT(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, <, ##__VA_ARGS__)
#define VFIO_ASSERT_LE(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, <=, ##__VA_ARGS__)
#define VFIO_ASSERT_GT(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, >, ##__VA_ARGS__)
#define VFIO_ASSERT_GE(_a, _b, ...) VFIO_ASSERT_OP(_a, _b, >=, ##__VA_ARGS__)
#define VFIO_ASSERT_TRUE(_a, ...) VFIO_ASSERT_NE(false, (_a), ##__VA_ARGS__)
#define VFIO_ASSERT_FALSE(_a, ...) VFIO_ASSERT_EQ(false, (_a), ##__VA_ARGS__)
#define VFIO_ASSERT_NULL(_a, ...) VFIO_ASSERT_EQ(NULL, _a, ##__VA_ARGS__)
#define VFIO_ASSERT_NOT_NULL(_a, ...) VFIO_ASSERT_NE(NULL, _a, ##__VA_ARGS__)

#define VFIO_FAIL(_fmt, ...) do {					\
	fprintf(stderr, "%s:%u: FAIL\n\n", __FILE__, __LINE__);		\
	VFIO_LOG_AND_EXIT(_fmt, ##__VA_ARGS__);				\
} while (0)

struct vfio_iommu_mode {
	const char *name;
	const char *container_path;
	unsigned long iommu_type;
};

/*
 * Generator for VFIO selftests fixture variants that replicate across all
 * possible IOMMU modes. Tests must define FIXTURE_VARIANT_ADD_IOMMU_MODE()
 * which should then use FIXTURE_VARIANT_ADD() to create the variant.
 */
#define FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(...)			\
FIXTURE_VARIANT_ADD_IOMMU_MODE(vfio_type1_iommu, ##__VA_ARGS__);	\
FIXTURE_VARIANT_ADD_IOMMU_MODE(vfio_type1v2_iommu, ##__VA_ARGS__);	\
FIXTURE_VARIANT_ADD_IOMMU_MODE(iommufd_compat_type1, ##__VA_ARGS__);	\
FIXTURE_VARIANT_ADD_IOMMU_MODE(iommufd_compat_type1v2, ##__VA_ARGS__);	\
FIXTURE_VARIANT_ADD_IOMMU_MODE(iommufd, ##__VA_ARGS__)

struct vfio_pci_bar {
	struct vfio_region_info info;
	void *vaddr;
};

typedef u64 iova_t;

#define INVALID_IOVA UINT64_MAX

struct vfio_dma_region {
	struct list_head link;
	void *vaddr;
	iova_t iova;
	u64 size;
};

struct vfio_pci_device;

struct vfio_pci_driver_ops {
	const char *name;

	/**
	 * @probe() - Check if the driver supports the given device.
	 *
	 * Return: 0 on success, non-0 on failure.
	 */
	int (*probe)(struct vfio_pci_device *device);

	/**
	 * @init() - Initialize the driver for @device.
	 *
	 * Must be called after device->driver.region has been initialized.
	 */
	void (*init)(struct vfio_pci_device *device);

	/**
	 * remove() - Deinitialize the driver for @device.
	 */
	void (*remove)(struct vfio_pci_device *device);

	/**
	 * memcpy_start() - Kick off @count repeated memcpy operations from
	 * [@src, @src + @size) to [@dst, @dst + @size).
	 *
	 * Guarantees:
	 *  - The device will attempt DMA reads on [src, src + size).
	 *  - The device will attempt DMA writes on [dst, dst + size).
	 *  - The device will not generate any interrupts.
	 *
	 * memcpy_start() returns immediately, it does not wait for the
	 * copies to complete.
	 */
	void (*memcpy_start)(struct vfio_pci_device *device,
			     iova_t src, iova_t dst, u64 size, u64 count);

	/**
	 * memcpy_wait() - Wait until the memcpy operations started by
	 * memcpy_start() have finished.
	 *
	 * Guarantees:
	 *  - All in-flight DMAs initiated by memcpy_start() are fully complete
	 *    before memcpy_wait() returns.
	 *
	 * Returns non-0 if the driver detects that an error occurred during the
	 * memcpy, 0 otherwise.
	 */
	int (*memcpy_wait)(struct vfio_pci_device *device);

	/**
	 * send_msi() - Make the device send the MSI device->driver.msi.
	 *
	 * Guarantees:
	 *  - The device will send the MSI once.
	 */
	void (*send_msi)(struct vfio_pci_device *device);
};

struct vfio_pci_driver {
	const struct vfio_pci_driver_ops *ops;
	bool initialized;
	bool memcpy_in_progress;

	/* Region to be used by the driver (e.g. for in-memory descriptors) */
	struct vfio_dma_region region;

	/* The maximum size that can be passed to memcpy_start(). */
	u64 max_memcpy_size;

	/* The maximum count that can be passed to memcpy_start(). */
	u64 max_memcpy_count;

	/* The MSI vector the device will signal in ops->send_msi(). */
	int msi;
};

struct vfio_pci_device {
	int fd;

	const struct vfio_iommu_mode *iommu_mode;
	int group_fd;
	int container_fd;

	int iommufd;
	u32 ioas_id;

	struct vfio_device_info info;
	struct vfio_region_info config_space;
	struct vfio_pci_bar bars[PCI_STD_NUM_BARS];

	struct vfio_irq_info msi_info;
	struct vfio_irq_info msix_info;

	struct list_head dma_regions;

	/* eventfds for MSI and MSI-x interrupts */
	int msi_eventfds[PCI_MSIX_FLAGS_QSIZE + 1];

	struct vfio_pci_driver driver;
};

struct iova_allocator {
	struct iommu_iova_range *ranges;
	u32 nranges;
	u32 range_idx;
	u64 range_offset;
};

/*
 * Return the BDF string of the device that the test should use.
 *
 * If a BDF string is provided by the user on the command line (as the last
 * element of argv[]), then this function will return that and decrement argc
 * by 1.
 *
 * Otherwise this function will attempt to use the environment variable
 * $VFIO_SELFTESTS_BDF.
 *
 * If BDF cannot be determined then the test will exit with KSFT_SKIP.
 */
const char *vfio_selftests_get_bdf(int *argc, char *argv[]);
const char *vfio_pci_get_cdev_path(const char *bdf);

extern const char *default_iommu_mode;

struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_mode);
void vfio_pci_device_cleanup(struct vfio_pci_device *device);
void vfio_pci_device_reset(struct vfio_pci_device *device);

struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device,
					      u32 *nranges);

struct iova_allocator *iova_allocator_init(struct vfio_pci_device *device);
void iova_allocator_cleanup(struct iova_allocator *allocator);
iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size);

int __vfio_pci_dma_map(struct vfio_pci_device *device,
		       struct vfio_dma_region *region);
int __vfio_pci_dma_unmap(struct vfio_pci_device *device,
			 struct vfio_dma_region *region,
			 u64 *unmapped);
int __vfio_pci_dma_unmap_all(struct vfio_pci_device *device, u64 *unmapped);

static inline void vfio_pci_dma_map(struct vfio_pci_device *device,
				    struct vfio_dma_region *region)
{
	VFIO_ASSERT_EQ(__vfio_pci_dma_map(device, region), 0);
}

static inline void vfio_pci_dma_unmap(struct vfio_pci_device *device,
				      struct vfio_dma_region *region)
{
	VFIO_ASSERT_EQ(__vfio_pci_dma_unmap(device, region, NULL), 0);
}

static inline void vfio_pci_dma_unmap_all(struct vfio_pci_device *device)
{
	VFIO_ASSERT_EQ(__vfio_pci_dma_unmap_all(device, NULL), 0);
}

void vfio_pci_config_access(struct vfio_pci_device *device, bool write,
			    size_t config, size_t size, void *data);

#define vfio_pci_config_read(_device, _offset, _type) ({			\
	_type __data;								\
	vfio_pci_config_access((_device), false, _offset, sizeof(__data), &__data); \
	__data;									\
})

#define vfio_pci_config_readb(_d, _o) vfio_pci_config_read(_d, _o, u8)
#define vfio_pci_config_readw(_d, _o) vfio_pci_config_read(_d, _o, u16)
#define vfio_pci_config_readl(_d, _o) vfio_pci_config_read(_d, _o, u32)

#define vfio_pci_config_write(_device, _offset, _value, _type) do {		\
	_type __data = (_value);						\
	vfio_pci_config_access((_device), true, _offset, sizeof(_type), &__data); \
} while (0)

#define vfio_pci_config_writeb(_d, _o, _v) vfio_pci_config_write(_d, _o, _v, u8)
#define vfio_pci_config_writew(_d, _o, _v) vfio_pci_config_write(_d, _o, _v, u16)
#define vfio_pci_config_writel(_d, _o, _v) vfio_pci_config_write(_d, _o, _v, u32)

void vfio_pci_irq_enable(struct vfio_pci_device *device, u32 index,
			 u32 vector, int count);
void vfio_pci_irq_disable(struct vfio_pci_device *device, u32 index);
void vfio_pci_irq_trigger(struct vfio_pci_device *device, u32 index, u32 vector);

static inline void fcntl_set_nonblock(int fd)
{
	int r;

	r = fcntl(fd, F_GETFL, 0);
	VFIO_ASSERT_NE(r, -1, "F_GETFL failed for fd %d\n", fd);

	r = fcntl(fd, F_SETFL, r | O_NONBLOCK);
	VFIO_ASSERT_NE(r, -1, "F_SETFL O_NONBLOCK failed for fd %d\n", fd);
}

static inline void vfio_pci_msi_enable(struct vfio_pci_device *device,
				       u32 vector, int count)
{
	vfio_pci_irq_enable(device, VFIO_PCI_MSI_IRQ_INDEX, vector, count);
}

static inline void vfio_pci_msi_disable(struct vfio_pci_device *device)
{
	vfio_pci_irq_disable(device, VFIO_PCI_MSI_IRQ_INDEX);
}

static inline void vfio_pci_msix_enable(struct vfio_pci_device *device,
					u32 vector, int count)
{
	vfio_pci_irq_enable(device, VFIO_PCI_MSIX_IRQ_INDEX, vector, count);
}

static inline void vfio_pci_msix_disable(struct vfio_pci_device *device)
{
	vfio_pci_irq_disable(device, VFIO_PCI_MSIX_IRQ_INDEX);
}

iova_t __to_iova(struct vfio_pci_device *device, void *vaddr);
iova_t to_iova(struct vfio_pci_device *device, void *vaddr);

static inline bool vfio_pci_device_match(struct vfio_pci_device *device,
					 u16 vendor_id, u16 device_id)
{
	return (vendor_id == vfio_pci_config_readw(device, PCI_VENDOR_ID)) &&
	       (device_id == vfio_pci_config_readw(device, PCI_DEVICE_ID));
}

void vfio_pci_driver_probe(struct vfio_pci_device *device);
void vfio_pci_driver_init(struct vfio_pci_device *device);
void vfio_pci_driver_remove(struct vfio_pci_device *device);
int vfio_pci_driver_memcpy(struct vfio_pci_device *device,
			   iova_t src, iova_t dst, u64 size);
void vfio_pci_driver_memcpy_start(struct vfio_pci_device *device,
				  iova_t src, iova_t dst, u64 size,
				  u64 count);
int vfio_pci_driver_memcpy_wait(struct vfio_pci_device *device);
void vfio_pci_driver_send_msi(struct vfio_pci_device *device);

#endif /* SELFTESTS_VFIO_LIB_INCLUDE_VFIO_UTIL_H */
```
+465
tools/testing/selftests/vfio/lib/iommu.c
New file:

```c
// SPDX-License-Identifier: GPL-2.0-only
#include <dirent.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <uapi/linux/types.h>
#include <linux/limits.h>
#include <linux/mman.h>
#include <linux/types.h>
#include <linux/vfio.h>
#include <linux/iommufd.h>

#include "../../../kselftest.h"
#include <libvfio.h>

const char *default_iommu_mode = "iommufd";

/* Reminder: Keep in sync with FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(). */
static const struct iommu_mode iommu_modes[] = {
	{
		.name = "vfio_type1_iommu",
		.container_path = "/dev/vfio/vfio",
		.iommu_type = VFIO_TYPE1_IOMMU,
	},
	{
		.name = "vfio_type1v2_iommu",
		.container_path = "/dev/vfio/vfio",
		.iommu_type = VFIO_TYPE1v2_IOMMU,
	},
	{
		.name = "iommufd_compat_type1",
		.container_path = "/dev/iommu",
		.iommu_type = VFIO_TYPE1_IOMMU,
	},
	{
		.name = "iommufd_compat_type1v2",
		.container_path = "/dev/iommu",
		.iommu_type = VFIO_TYPE1v2_IOMMU,
	},
	{
		.name = "iommufd",
	},
};

static const struct iommu_mode *lookup_iommu_mode(const char *iommu_mode)
{
	int i;

	if (!iommu_mode)
		iommu_mode = default_iommu_mode;

	for (i = 0; i < ARRAY_SIZE(iommu_modes); i++) {
		if (strcmp(iommu_mode, iommu_modes[i].name))
			continue;

		return &iommu_modes[i];
	}

	VFIO_FAIL("Unrecognized IOMMU mode: %s\n", iommu_mode);
}

int __iommu_hva2iova(struct iommu *iommu, void *vaddr, iova_t *iova)
{
	struct dma_region *region;

	list_for_each_entry(region, &iommu->dma_regions, link) {
		if (vaddr < region->vaddr)
			continue;

		if (vaddr >= region->vaddr + region->size)
			continue;

		if (iova)
			*iova = region->iova + (vaddr - region->vaddr);

		return 0;
	}

	return -ENOENT;
}

iova_t iommu_hva2iova(struct iommu *iommu, void *vaddr)
{
	iova_t iova;
	int ret;

	ret = __iommu_hva2iova(iommu, vaddr, &iova);
	VFIO_ASSERT_EQ(ret, 0, "%p is not mapped into the iommu\n", vaddr);

	return iova;
}

static int vfio_iommu_map(struct iommu *iommu, struct dma_region *region)
{
	struct vfio_iommu_type1_dma_map args = {
		.argsz = sizeof(args),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (u64)region->vaddr,
		.iova = region->iova,
		.size = region->size,
	};

	if (ioctl(iommu->container_fd, VFIO_IOMMU_MAP_DMA, &args))
		return -errno;

	return 0;
}

static int iommufd_map(struct iommu *iommu, struct dma_region *region)
{
	struct iommu_ioas_map args = {
		.size = sizeof(args),
		.flags = IOMMU_IOAS_MAP_READABLE |
			 IOMMU_IOAS_MAP_WRITEABLE |
			 IOMMU_IOAS_MAP_FIXED_IOVA,
		.user_va = (u64)region->vaddr,
		.iova = region->iova,
		.length = region->size,
		.ioas_id = iommu->ioas_id,
	};

	if (ioctl(iommu->iommufd, IOMMU_IOAS_MAP, &args))
		return -errno;

	return 0;
}

int __iommu_map(struct iommu *iommu, struct dma_region *region)
{
	int ret;

	if (iommu->iommufd)
		ret = iommufd_map(iommu, region);
	else
		ret = vfio_iommu_map(iommu, region);

	if (ret)
		return ret;

	list_add(&region->link, &iommu->dma_regions);

	return 0;
}

static int __vfio_iommu_unmap(int fd, u64 iova, u64 size, u32 flags, u64 *unmapped)
{
	struct vfio_iommu_type1_dma_unmap args = {
		.argsz = sizeof(args),
		.iova = iova,
		.size = size,
		.flags = flags,
	};

	if (ioctl(fd, VFIO_IOMMU_UNMAP_DMA, &args))
		return -errno;

	if (unmapped)
		*unmapped = args.size;

	return 0;
}

static int vfio_iommu_unmap(struct iommu *iommu, struct dma_region *region,
			    u64 *unmapped)
{
	return __vfio_iommu_unmap(iommu->container_fd, region->iova,
				  region->size, 0, unmapped);
}

static int __iommufd_unmap(int fd, u64 iova, u64 length, u32 ioas_id, u64 *unmapped)
{
	struct iommu_ioas_unmap args = {
		.size = sizeof(args),
		.iova = iova,
		.length = length,
		.ioas_id = ioas_id,
	};

	if (ioctl(fd, IOMMU_IOAS_UNMAP, &args))
		return -errno;

	if (unmapped)
		*unmapped = args.length;

	return 0;
}

static int iommufd_unmap(struct iommu *iommu, struct dma_region *region,
			 u64 *unmapped)
{
	return __iommufd_unmap(iommu->iommufd, region->iova, region->size,
			       iommu->ioas_id, unmapped);
}

int __iommu_unmap(struct iommu *iommu, struct dma_region *region, u64 *unmapped)
{
	int ret;

	if (iommu->iommufd)
		ret = iommufd_unmap(iommu, region, unmapped);
	else
		ret = vfio_iommu_unmap(iommu, region, unmapped);

	if (ret)
		return ret;

	list_del_init(&region->link);

	return 0;
}

int __iommu_unmap_all(struct iommu *iommu, u64 *unmapped)
{
	int ret;
	struct dma_region *curr, *next;

	if (iommu->iommufd)
		ret = __iommufd_unmap(iommu->iommufd, 0, UINT64_MAX,
				      iommu->ioas_id, unmapped);
	else
		ret = __vfio_iommu_unmap(iommu->container_fd, 0, 0,
					 VFIO_DMA_UNMAP_FLAG_ALL, unmapped);

	if (ret)
		return ret;

	list_for_each_entry_safe(curr, next, &iommu->dma_regions, link)
		list_del_init(&curr->link);

	return 0;
}

static struct vfio_info_cap_header *next_cap_hdr(void *buf, u32 bufsz,
						 u32 *cap_offset)
{
	struct vfio_info_cap_header *hdr;

	if (!*cap_offset)
		return NULL;

	VFIO_ASSERT_LT(*cap_offset, bufsz);
	VFIO_ASSERT_GE(bufsz - *cap_offset, sizeof(*hdr));

	hdr = (struct vfio_info_cap_header *)((u8 *)buf + *cap_offset);
	*cap_offset = hdr->next;

	return hdr;
}

static struct vfio_info_cap_header *vfio_iommu_info_cap_hdr(struct vfio_iommu_type1_info *info,
							    u16 cap_id)
{
	struct vfio_info_cap_header *hdr;
	u32 cap_offset = info->cap_offset;
	u32 max_depth;
	u32 depth = 0;

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
		return NULL;

	if (cap_offset)
		VFIO_ASSERT_GE(cap_offset, sizeof(*info));

	max_depth = (info->argsz - sizeof(*info)) / sizeof(*hdr);

	while ((hdr = next_cap_hdr(info, info->argsz, &cap_offset))) {
		depth++;
		VFIO_ASSERT_LE(depth, max_depth, "Capability chain contains a cycle\n");

		if (hdr->id == cap_id)
			return hdr;
	}

	return NULL;
}

/* Return buffer including capability chain, if present. Free with free() */
static struct vfio_iommu_type1_info *vfio_iommu_get_info(int container_fd)
{
	struct vfio_iommu_type1_info *info;

	info = malloc(sizeof(*info));
	VFIO_ASSERT_NOT_NULL(info);

	*info = (struct vfio_iommu_type1_info) {
		.argsz = sizeof(*info),
	};

	ioctl_assert(container_fd, VFIO_IOMMU_GET_INFO, info);
	VFIO_ASSERT_GE(info->argsz, sizeof(*info));

	info = realloc(info, info->argsz);
	VFIO_ASSERT_NOT_NULL(info);

	ioctl_assert(container_fd, VFIO_IOMMU_GET_INFO, info);
	VFIO_ASSERT_GE(info->argsz, sizeof(*info));

	return info;
}

/*
 * Return iova ranges for the device's container. Normalize vfio_iommu_type1 to
 * report iommufd's iommu_iova_range. Free with free().
 */
static struct iommu_iova_range *vfio_iommu_iova_ranges(struct iommu *iommu,
						       u32 *nranges)
{
	struct vfio_iommu_type1_info_cap_iova_range *cap_range;
	struct vfio_iommu_type1_info *info;
	struct vfio_info_cap_header *hdr;
	struct iommu_iova_range *ranges = NULL;

	info = vfio_iommu_get_info(iommu->container_fd);
	hdr = vfio_iommu_info_cap_hdr(info, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
	VFIO_ASSERT_NOT_NULL(hdr);

	cap_range = container_of(hdr, struct vfio_iommu_type1_info_cap_iova_range, header);
	VFIO_ASSERT_GT(cap_range->nr_iovas, 0);

	ranges = calloc(cap_range->nr_iovas, sizeof(*ranges));
	VFIO_ASSERT_NOT_NULL(ranges);

	for (u32 i = 0; i < cap_range->nr_iovas; i++) {
		ranges[i] = (struct iommu_iova_range){
			.start = cap_range->iova_ranges[i].start,
			.last = cap_range->iova_ranges[i].end,
		};
	}

	*nranges = cap_range->nr_iovas;

	free(info);
	return ranges;
}

/* Return iova ranges of the device's IOAS. Free with free() */
static struct iommu_iova_range *iommufd_iova_ranges(struct iommu *iommu,
						    u32 *nranges)
{
	struct iommu_iova_range *ranges;
	int ret;

	struct iommu_ioas_iova_ranges query = {
		.size = sizeof(query),
		.ioas_id = iommu->ioas_id,
	};

	ret = ioctl(iommu->iommufd, IOMMU_IOAS_IOVA_RANGES, &query);
	VFIO_ASSERT_EQ(ret, -1);
	VFIO_ASSERT_EQ(errno, EMSGSIZE);
	VFIO_ASSERT_GT(query.num_iovas, 0);

	ranges = calloc(query.num_iovas, sizeof(*ranges));
	VFIO_ASSERT_NOT_NULL(ranges);

	query.allowed_iovas = (uintptr_t)ranges;

	ioctl_assert(iommu->iommufd, IOMMU_IOAS_IOVA_RANGES, &query);
	*nranges = query.num_iovas;

	return ranges;
}

static int iova_range_comp(const void *a, const void *b)
{
	const struct iommu_iova_range *ra = a, *rb = b;

	if (ra->start < rb->start)
		return -1;

	if (ra->start > rb->start)
		return 1;

	return 0;
}

/* Return sorted IOVA ranges of the device. Free with free(). */
struct iommu_iova_range *iommu_iova_ranges(struct iommu *iommu, u32 *nranges)
{
	struct iommu_iova_range *ranges;

	if (iommu->iommufd)
		ranges = iommufd_iova_ranges(iommu, nranges);
	else
		ranges = vfio_iommu_iova_ranges(iommu, nranges);

	if (!ranges)
		return NULL;

	VFIO_ASSERT_GT(*nranges, 0);

	/* Sort and check that ranges are sane and non-overlapping */
	qsort(ranges, *nranges, sizeof(*ranges), iova_range_comp);
	VFIO_ASSERT_LT(ranges[0].start, ranges[0].last);

	for (u32 i = 1; i < *nranges; i++) {
		VFIO_ASSERT_LT(ranges[i].start, ranges[i].last);
		VFIO_ASSERT_LT(ranges[i - 1].last, ranges[i].start);
	}

	return ranges;
}

static u32 iommufd_ioas_alloc(int iommufd)
{
	struct iommu_ioas_alloc args = {
		.size = sizeof(args),
	};

	ioctl_assert(iommufd, IOMMU_IOAS_ALLOC, &args);
	return args.out_ioas_id;
}

struct iommu *iommu_init(const char *iommu_mode)
{
	const char *container_path;
	struct iommu *iommu;
	int version;

	iommu = calloc(1, sizeof(*iommu));
	VFIO_ASSERT_NOT_NULL(iommu);

	INIT_LIST_HEAD(&iommu->dma_regions);

	iommu->mode = lookup_iommu_mode(iommu_mode);

	container_path = iommu->mode->container_path;
	if (container_path) {
		iommu->container_fd = open(container_path, O_RDWR);
		VFIO_ASSERT_GE(iommu->container_fd, 0, "open(%s) failed\n", container_path);

		version = ioctl(iommu->container_fd, VFIO_GET_API_VERSION);
		VFIO_ASSERT_EQ(version, VFIO_API_VERSION, "Unsupported version: %d\n", version);
	} else {
		/*
		 * Require device->iommufd to be >0 so that a simple non-0 check can be
		 * used to check if iommufd is enabled. In practice open() will never
		 * return 0 unless stdin is closed.
		 */
		iommu->iommufd = open("/dev/iommu", O_RDWR);
		VFIO_ASSERT_GT(iommu->iommufd, 0);

		iommu->ioas_id = iommufd_ioas_alloc(iommu->iommufd);
	}

	return iommu;
}

void iommu_cleanup(struct iommu *iommu)
{
	if (iommu->iommufd)
		VFIO_ASSERT_EQ(close(iommu->iommufd), 0);
	else
		VFIO_ASSERT_EQ(close(iommu->container_fd), 0);

	free(iommu);
}
```
+94
tools/testing/selftests/vfio/lib/iova_allocator.c
New file:

```c
// SPDX-License-Identifier: GPL-2.0-only
#include <dirent.h>
#include <fcntl.h>
#include <libgen.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/eventfd.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <uapi/linux/types.h>
#include <linux/iommufd.h>
#include <linux/limits.h>
#include <linux/mman.h>
#include <linux/overflow.h>
#include <linux/types.h>
#include <linux/vfio.h>

#include <libvfio.h>

struct iova_allocator *iova_allocator_init(struct iommu *iommu)
{
	struct iova_allocator *allocator;
	struct iommu_iova_range *ranges;
	u32 nranges;

	ranges = iommu_iova_ranges(iommu, &nranges);
	VFIO_ASSERT_NOT_NULL(ranges);

	allocator = malloc(sizeof(*allocator));
	VFIO_ASSERT_NOT_NULL(allocator);

	*allocator = (struct iova_allocator){
		.ranges = ranges,
		.nranges = nranges,
		.range_idx = 0,
		.range_offset = 0,
	};

	return allocator;
}

void iova_allocator_cleanup(struct iova_allocator *allocator)
{
	free(allocator->ranges);
	free(allocator);
}

iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size)
{
	VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n");
	VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n");

	for (;;) {
		struct iommu_iova_range *range;
		iova_t iova, last;

		VFIO_ASSERT_LT(allocator->range_idx, allocator->nranges,
			       "IOVA allocator out of space\n");

		range = &allocator->ranges[allocator->range_idx];
		iova = range->start + allocator->range_offset;

		/* Check for sufficient space at the current offset */
		if (check_add_overflow(iova, size - 1, &last) ||
		    last > range->last)
			goto next_range;

		/* Align iova to size */
		iova = last & ~(size - 1);

		/* Check for sufficient space at the aligned iova */
		if (check_add_overflow(iova, size - 1, &last) ||
		    last > range->last)
			goto next_range;

		if (last == range->last) {
			allocator->range_idx++;
			allocator->range_offset = 0;
		} else {
			allocator->range_offset = last - range->start + 1;
		}

		return iova;

next_range:
		allocator->range_idx++;
		allocator->range_offset = 0;
	}
}
```
+78
tools/testing/selftests/vfio/lib/libvfio.c
New file:

```c
// SPDX-License-Identifier: GPL-2.0-only

#include <stdio.h>
#include <stdlib.h>

#include "../../../kselftest.h"
#include <libvfio.h>

static bool is_bdf(const char *str)
{
	unsigned int s, b, d, f;
	int length, count;

	count = sscanf(str, "%4x:%2x:%2x.%2x%n", &s, &b, &d, &f, &length);
	return count == 4 && length == strlen(str);
}

static char **get_bdfs_cmdline(int *argc, char *argv[], int *nr_bdfs)
{
	int i;

	for (i = *argc - 1; i > 0 && is_bdf(argv[i]); i--)
		continue;

	i++;
	*nr_bdfs = *argc - i;
	*argc -= *nr_bdfs;

	return *nr_bdfs ? &argv[i] : NULL;
}

static char *get_bdf_env(void)
{
	char *bdf;

	bdf = getenv("VFIO_SELFTESTS_BDF");
	if (!bdf)
		return NULL;

	VFIO_ASSERT_TRUE(is_bdf(bdf), "Invalid BDF: %s\n", bdf);
	return bdf;
}

char **vfio_selftests_get_bdfs(int *argc, char *argv[], int *nr_bdfs)
{
	static char *env_bdf;
	char **bdfs;

	bdfs = get_bdfs_cmdline(argc, argv, nr_bdfs);
	if (bdfs)
		return bdfs;

	env_bdf = get_bdf_env();
	if (env_bdf) {
		*nr_bdfs = 1;
		return &env_bdf;
	}

	fprintf(stderr, "Unable to determine which device(s) to use, skipping test.\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "To pass the device address via environment variable:\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "  export VFIO_SELFTESTS_BDF=\"segment:bus:device.function\"\n");
	fprintf(stderr, "  %s [options]\n", argv[0]);
	fprintf(stderr, "\n");
	fprintf(stderr, "To pass the device address(es) via argv:\n");
	fprintf(stderr, "\n");
	fprintf(stderr, "  %s [options] segment:bus:device.function ...\n", argv[0]);
	fprintf(stderr, "\n");
	exit(KSFT_SKIP);
}

const char *vfio_selftests_get_bdf(int *argc, char *argv[])
{
	int nr_bdfs;

	return vfio_selftests_get_bdfs(argc, argv, &nr_bdfs)[0];
}
```
+14 -9
tools/testing/selftests/vfio/lib/libvfio.mk
```diff
 include $(top_srcdir)/scripts/subarch.include
 ARCH ?= $(SUBARCH)
 
-VFIO_DIR := $(selfdir)/vfio
+LIBVFIO_SRCDIR := $(selfdir)/vfio/lib
 
-LIBVFIO_C := lib/vfio_pci_device.c
-LIBVFIO_C += lib/vfio_pci_driver.c
+LIBVFIO_C := iommu.c
+LIBVFIO_C += iova_allocator.c
+LIBVFIO_C += libvfio.c
+LIBVFIO_C += vfio_pci_device.c
+LIBVFIO_C += vfio_pci_driver.c
 
 ifeq ($(ARCH:x86_64=x86),x86)
-LIBVFIO_C += lib/drivers/ioat/ioat.c
-LIBVFIO_C += lib/drivers/dsa/dsa.c
+LIBVFIO_C += drivers/ioat/ioat.c
+LIBVFIO_C += drivers/dsa/dsa.c
 endif
 
-LIBVFIO_O := $(patsubst %.c, $(OUTPUT)/%.o, $(LIBVFIO_C))
+LIBVFIO_OUTPUT := $(OUTPUT)/libvfio
+
+LIBVFIO_O := $(patsubst %.c, $(LIBVFIO_OUTPUT)/%.o, $(LIBVFIO_C))
 
 LIBVFIO_O_DIRS := $(shell dirname $(LIBVFIO_O) | uniq)
 $(shell mkdir -p $(LIBVFIO_O_DIRS))
 
-CFLAGS += -I$(VFIO_DIR)/lib/include
+CFLAGS += -I$(LIBVFIO_SRCDIR)/include
 
-$(LIBVFIO_O): $(OUTPUT)/%.o : $(VFIO_DIR)/%.c
+$(LIBVFIO_O): $(LIBVFIO_OUTPUT)/%.o : $(LIBVFIO_SRCDIR)/%.c
 	$(CC) $(CFLAGS) $(CPPFLAGS) $(TARGET_ARCH) -c $< -o $@
 
-EXTRA_CLEAN += $(LIBVFIO_O)
+EXTRA_CLEAN += $(LIBVFIO_OUTPUT)
```
+19 -537
tools/testing/selftests/vfio/lib/vfio_pci_device.c
```diff
 #include <linux/vfio.h>
 
 #include "../../../kselftest.h"
-#include <vfio_util.h>
+#include <libvfio.h>
 
 #define PCI_SYSFS_PATH "/sys/bus/pci/devices"
-
-#define ioctl_assert(_fd, _op, _arg) do { \
-	void *__arg = (_arg); \
-	int __ret = ioctl((_fd), (_op), (__arg)); \
-	VFIO_ASSERT_EQ(__ret, 0, "ioctl(%s, %s, %s) returned %d\n", #_fd, #_op, #_arg, __ret); \
-} while (0)
-
-static struct vfio_info_cap_header *next_cap_hdr(void *buf, u32 bufsz,
-						 u32 *cap_offset)
-{
-	struct vfio_info_cap_header *hdr;
-
-	if (!*cap_offset)
-		return NULL;
-
-	VFIO_ASSERT_LT(*cap_offset, bufsz);
-	VFIO_ASSERT_GE(bufsz - *cap_offset, sizeof(*hdr));
-
-	hdr = (struct vfio_info_cap_header *)((u8 *)buf + *cap_offset);
-	*cap_offset = hdr->next;
-
-	return hdr;
-}
-
-static struct vfio_info_cap_header *vfio_iommu_info_cap_hdr(struct vfio_iommu_type1_info *info,
-							    u16 cap_id)
-{
-	struct vfio_info_cap_header *hdr;
-	u32 cap_offset = info->cap_offset;
-	u32 max_depth;
-	u32 depth = 0;
-
-	if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
-		return NULL;
-
-	if (cap_offset)
-		VFIO_ASSERT_GE(cap_offset, sizeof(*info));
-
-	max_depth = (info->argsz - sizeof(*info)) / sizeof(*hdr);
-
-	while ((hdr = next_cap_hdr(info, info->argsz, &cap_offset))) {
-		depth++;
-		VFIO_ASSERT_LE(depth, max_depth, "Capability chain contains a cycle\n");
-
-		if (hdr->id == cap_id)
-			return hdr;
-	}
-
-	return NULL;
-}
-
-/* Return buffer including capability chain, if present. Free with free() */
-static struct vfio_iommu_type1_info *vfio_iommu_get_info(struct vfio_pci_device *device)
-{
-	struct vfio_iommu_type1_info *info;
-
-	info = malloc(sizeof(*info));
-	VFIO_ASSERT_NOT_NULL(info);
-
-	*info = (struct vfio_iommu_type1_info) {
-		.argsz = sizeof(*info),
-	};
-
-	ioctl_assert(device->container_fd, VFIO_IOMMU_GET_INFO, info);
-	VFIO_ASSERT_GE(info->argsz, sizeof(*info));
-
-	info = realloc(info, info->argsz);
-	VFIO_ASSERT_NOT_NULL(info);
-
-	ioctl_assert(device->container_fd, VFIO_IOMMU_GET_INFO, info);
-	VFIO_ASSERT_GE(info->argsz, sizeof(*info));
-
-	return info;
-}
-
-/*
- * Return iova ranges for the device's container. Normalize vfio_iommu_type1 to
- * report iommufd's iommu_iova_range. Free with free().
- */
-static struct iommu_iova_range *vfio_iommu_iova_ranges(struct vfio_pci_device *device,
-						       u32 *nranges)
-{
-	struct vfio_iommu_type1_info_cap_iova_range *cap_range;
-	struct vfio_iommu_type1_info *info;
-	struct vfio_info_cap_header *hdr;
-	struct iommu_iova_range *ranges = NULL;
-
-	info = vfio_iommu_get_info(device);
-	hdr = vfio_iommu_info_cap_hdr(info, VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE);
-	VFIO_ASSERT_NOT_NULL(hdr);
-
-	cap_range = container_of(hdr, struct vfio_iommu_type1_info_cap_iova_range, header);
-	VFIO_ASSERT_GT(cap_range->nr_iovas, 0);
-
-	ranges = calloc(cap_range->nr_iovas, sizeof(*ranges));
-	VFIO_ASSERT_NOT_NULL(ranges);
-
-	for (u32 i = 0; i < cap_range->nr_iovas; i++) {
-		ranges[i] = (struct iommu_iova_range){
-			.start = cap_range->iova_ranges[i].start,
-			.last = cap_range->iova_ranges[i].end,
-		};
-	}
-
-	*nranges = cap_range->nr_iovas;
-
-	free(info);
-	return ranges;
-}
-
-/* Return iova ranges of the device's IOAS. Free with free() */
-static struct iommu_iova_range *iommufd_iova_ranges(struct vfio_pci_device *device,
-						    u32 *nranges)
-{
-	struct iommu_iova_range *ranges;
-	int ret;
-
-	struct iommu_ioas_iova_ranges query = {
-		.size = sizeof(query),
-		.ioas_id = device->ioas_id,
-	};
-
-	ret = ioctl(device->iommufd, IOMMU_IOAS_IOVA_RANGES, &query);
-	VFIO_ASSERT_EQ(ret, -1);
-	VFIO_ASSERT_EQ(errno, EMSGSIZE);
-	VFIO_ASSERT_GT(query.num_iovas, 0);
-
-	ranges = calloc(query.num_iovas, sizeof(*ranges));
-	VFIO_ASSERT_NOT_NULL(ranges);
-
-	query.allowed_iovas = (uintptr_t)ranges;
-
-	ioctl_assert(device->iommufd, IOMMU_IOAS_IOVA_RANGES, &query);
-	*nranges = query.num_iovas;
-
-	return ranges;
-}
-
-static int iova_range_comp(const void *a, const void *b)
-{
-	const struct iommu_iova_range *ra = a, *rb = b;
-
-	if (ra->start < rb->start)
-		return -1;
-
-	if (ra->start > rb->start)
-		return 1;
-
-	return 0;
-}
-
-/* Return sorted IOVA ranges of the device. Free with free().
```
*/ 178 - struct iommu_iova_range *vfio_pci_iova_ranges(struct vfio_pci_device *device, 179 - u32 *nranges) 180 - { 181 - struct iommu_iova_range *ranges; 182 - 183 - if (device->iommufd) 184 - ranges = iommufd_iova_ranges(device, nranges); 185 - else 186 - ranges = vfio_iommu_iova_ranges(device, nranges); 187 - 188 - if (!ranges) 189 - return NULL; 190 - 191 - VFIO_ASSERT_GT(*nranges, 0); 192 - 193 - /* Sort and check that ranges are sane and non-overlapping */ 194 - qsort(ranges, *nranges, sizeof(*ranges), iova_range_comp); 195 - VFIO_ASSERT_LT(ranges[0].start, ranges[0].last); 196 - 197 - for (u32 i = 1; i < *nranges; i++) { 198 - VFIO_ASSERT_LT(ranges[i].start, ranges[i].last); 199 - VFIO_ASSERT_LT(ranges[i - 1].last, ranges[i].start); 200 - } 201 - 202 - return ranges; 203 - } 204 - 205 - struct iova_allocator *iova_allocator_init(struct vfio_pci_device *device) 206 - { 207 - struct iova_allocator *allocator; 208 - struct iommu_iova_range *ranges; 209 - u32 nranges; 210 - 211 - ranges = vfio_pci_iova_ranges(device, &nranges); 212 - VFIO_ASSERT_NOT_NULL(ranges); 213 - 214 - allocator = malloc(sizeof(*allocator)); 215 - VFIO_ASSERT_NOT_NULL(allocator); 216 - 217 - *allocator = (struct iova_allocator){ 218 - .ranges = ranges, 219 - .nranges = nranges, 220 - .range_idx = 0, 221 - .range_offset = 0, 222 - }; 223 - 224 - return allocator; 225 - } 226 - 227 - void iova_allocator_cleanup(struct iova_allocator *allocator) 228 - { 229 - free(allocator->ranges); 230 - free(allocator); 231 - } 232 - 233 - iova_t iova_allocator_alloc(struct iova_allocator *allocator, size_t size) 234 - { 235 - VFIO_ASSERT_GT(size, 0, "Invalid size arg, zero\n"); 236 - VFIO_ASSERT_EQ(size & (size - 1), 0, "Invalid size arg, non-power-of-2\n"); 237 - 238 - for (;;) { 239 - struct iommu_iova_range *range; 240 - iova_t iova, last; 241 - 242 - VFIO_ASSERT_LT(allocator->range_idx, allocator->nranges, 243 - "IOVA allocator out of space\n"); 244 - 245 - range = 
&allocator->ranges[allocator->range_idx]; 246 - iova = range->start + allocator->range_offset; 247 - 248 - /* Check for sufficient space at the current offset */ 249 - if (check_add_overflow(iova, size - 1, &last) || 250 - last > range->last) 251 - goto next_range; 252 - 253 - /* Align iova to size */ 254 - iova = last & ~(size - 1); 255 - 256 - /* Check for sufficient space at the aligned iova */ 257 - if (check_add_overflow(iova, size - 1, &last) || 258 - last > range->last) 259 - goto next_range; 260 - 261 - if (last == range->last) { 262 - allocator->range_idx++; 263 - allocator->range_offset = 0; 264 - } else { 265 - allocator->range_offset = last - range->start + 1; 266 - } 267 - 268 - return iova; 269 - 270 - next_range: 271 - allocator->range_idx++; 272 - allocator->range_offset = 0; 273 - } 274 - } 275 - 276 - iova_t __to_iova(struct vfio_pci_device *device, void *vaddr) 277 - { 278 - struct vfio_dma_region *region; 279 - 280 - list_for_each_entry(region, &device->dma_regions, link) { 281 - if (vaddr < region->vaddr) 282 - continue; 283 - 284 - if (vaddr >= region->vaddr + region->size) 285 - continue; 286 - 287 - return region->iova + (vaddr - region->vaddr); 288 - } 289 - 290 - return INVALID_IOVA; 291 - } 292 - 293 - iova_t to_iova(struct vfio_pci_device *device, void *vaddr) 294 - { 295 - iova_t iova; 296 - 297 - iova = __to_iova(device, vaddr); 298 - VFIO_ASSERT_NE(iova, INVALID_IOVA, "%p is not mapped into device.\n", vaddr); 299 - 300 - return iova; 301 - } 302 26 303 27 static void vfio_pci_irq_set(struct vfio_pci_device *device, 304 28 u32 index, u32 vector, u32 count, int *fds) ··· 108 384 irq_info->index = index; 109 385 110 386 ioctl_assert(device->fd, VFIO_DEVICE_GET_IRQ_INFO, irq_info); 111 - } 112 - 113 - static int vfio_iommu_dma_map(struct vfio_pci_device *device, 114 - struct vfio_dma_region *region) 115 - { 116 - struct vfio_iommu_type1_dma_map args = { 117 - .argsz = sizeof(args), 118 - .flags = VFIO_DMA_MAP_FLAG_READ | 
VFIO_DMA_MAP_FLAG_WRITE, 119 - .vaddr = (u64)region->vaddr, 120 - .iova = region->iova, 121 - .size = region->size, 122 - }; 123 - 124 - if (ioctl(device->container_fd, VFIO_IOMMU_MAP_DMA, &args)) 125 - return -errno; 126 - 127 - return 0; 128 - } 129 - 130 - static int iommufd_dma_map(struct vfio_pci_device *device, 131 - struct vfio_dma_region *region) 132 - { 133 - struct iommu_ioas_map args = { 134 - .size = sizeof(args), 135 - .flags = IOMMU_IOAS_MAP_READABLE | 136 - IOMMU_IOAS_MAP_WRITEABLE | 137 - IOMMU_IOAS_MAP_FIXED_IOVA, 138 - .user_va = (u64)region->vaddr, 139 - .iova = region->iova, 140 - .length = region->size, 141 - .ioas_id = device->ioas_id, 142 - }; 143 - 144 - if (ioctl(device->iommufd, IOMMU_IOAS_MAP, &args)) 145 - return -errno; 146 - 147 - return 0; 148 - } 149 - 150 - int __vfio_pci_dma_map(struct vfio_pci_device *device, 151 - struct vfio_dma_region *region) 152 - { 153 - int ret; 154 - 155 - if (device->iommufd) 156 - ret = iommufd_dma_map(device, region); 157 - else 158 - ret = vfio_iommu_dma_map(device, region); 159 - 160 - if (ret) 161 - return ret; 162 - 163 - list_add(&region->link, &device->dma_regions); 164 - 165 - return 0; 166 - } 167 - 168 - static int vfio_iommu_dma_unmap(int fd, u64 iova, u64 size, u32 flags, 169 - u64 *unmapped) 170 - { 171 - struct vfio_iommu_type1_dma_unmap args = { 172 - .argsz = sizeof(args), 173 - .iova = iova, 174 - .size = size, 175 - .flags = flags, 176 - }; 177 - 178 - if (ioctl(fd, VFIO_IOMMU_UNMAP_DMA, &args)) 179 - return -errno; 180 - 181 - if (unmapped) 182 - *unmapped = args.size; 183 - 184 - return 0; 185 - } 186 - 187 - static int iommufd_dma_unmap(int fd, u64 iova, u64 length, u32 ioas_id, 188 - u64 *unmapped) 189 - { 190 - struct iommu_ioas_unmap args = { 191 - .size = sizeof(args), 192 - .iova = iova, 193 - .length = length, 194 - .ioas_id = ioas_id, 195 - }; 196 - 197 - if (ioctl(fd, IOMMU_IOAS_UNMAP, &args)) 198 - return -errno; 199 - 200 - if (unmapped) 201 - *unmapped = args.length; 202 - 
203 - return 0; 204 - } 205 - 206 - int __vfio_pci_dma_unmap(struct vfio_pci_device *device, 207 - struct vfio_dma_region *region, u64 *unmapped) 208 - { 209 - int ret; 210 - 211 - if (device->iommufd) 212 - ret = iommufd_dma_unmap(device->iommufd, region->iova, 213 - region->size, device->ioas_id, 214 - unmapped); 215 - else 216 - ret = vfio_iommu_dma_unmap(device->container_fd, region->iova, 217 - region->size, 0, unmapped); 218 - 219 - if (ret) 220 - return ret; 221 - 222 - list_del_init(&region->link); 223 - 224 - return 0; 225 - } 226 - 227 - int __vfio_pci_dma_unmap_all(struct vfio_pci_device *device, u64 *unmapped) 228 - { 229 - int ret; 230 - struct vfio_dma_region *curr, *next; 231 - 232 - if (device->iommufd) 233 - ret = iommufd_dma_unmap(device->iommufd, 0, UINT64_MAX, 234 - device->ioas_id, unmapped); 235 - else 236 - ret = vfio_iommu_dma_unmap(device->container_fd, 0, 0, 237 - VFIO_DMA_UNMAP_FLAG_ALL, unmapped); 238 - 239 - if (ret) 240 - return ret; 241 - 242 - list_for_each_entry_safe(curr, next, &device->dma_regions, link) 243 - list_del_init(&curr->link); 244 - 245 - return 0; 246 387 } 247 388 248 389 static void vfio_pci_region_get(struct vfio_pci_device *device, int index, ··· 216 627 ioctl_assert(device->group_fd, VFIO_GROUP_GET_STATUS, &group_status); 217 628 VFIO_ASSERT_TRUE(group_status.flags & VFIO_GROUP_FLAGS_VIABLE); 218 629 219 - ioctl_assert(device->group_fd, VFIO_GROUP_SET_CONTAINER, &device->container_fd); 630 + ioctl_assert(device->group_fd, VFIO_GROUP_SET_CONTAINER, &device->iommu->container_fd); 220 631 } 221 632 222 633 static void vfio_pci_container_setup(struct vfio_pci_device *device, const char *bdf) 223 634 { 224 - unsigned long iommu_type = device->iommu_mode->iommu_type; 225 - const char *path = device->iommu_mode->container_path; 226 - int version; 635 + struct iommu *iommu = device->iommu; 636 + unsigned long iommu_type = iommu->mode->iommu_type; 227 637 int ret; 228 - 229 - device->container_fd = open(path, O_RDWR); 230 
- VFIO_ASSERT_GE(device->container_fd, 0, "open(%s) failed\n", path); 231 - 232 - version = ioctl(device->container_fd, VFIO_GET_API_VERSION); 233 - VFIO_ASSERT_EQ(version, VFIO_API_VERSION, "Unsupported version: %d\n", version); 234 638 235 639 vfio_pci_group_setup(device, bdf); 236 640 237 - ret = ioctl(device->container_fd, VFIO_CHECK_EXTENSION, iommu_type); 641 + ret = ioctl(iommu->container_fd, VFIO_CHECK_EXTENSION, iommu_type); 238 642 VFIO_ASSERT_GT(ret, 0, "VFIO IOMMU type %lu not supported\n", iommu_type); 239 643 240 - ioctl_assert(device->container_fd, VFIO_SET_IOMMU, (void *)iommu_type); 644 + /* 645 + * Allow multiple threads to race to set the IOMMU type on the 646 + * container. The first will succeed and the rest should fail 647 + * because the IOMMU type is already set. 648 + */ 649 + (void)ioctl(iommu->container_fd, VFIO_SET_IOMMU, (void *)iommu_type); 241 650 242 651 device->fd = ioctl(device->group_fd, VFIO_GROUP_GET_DEVICE_FD, bdf); 243 652 VFIO_ASSERT_GE(device->fd, 0); ··· 299 712 return cdev_path; 300 713 } 301 714 302 - /* Reminder: Keep in sync with FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES(). 
*/ 303 - static const struct vfio_iommu_mode iommu_modes[] = { 304 - { 305 - .name = "vfio_type1_iommu", 306 - .container_path = "/dev/vfio/vfio", 307 - .iommu_type = VFIO_TYPE1_IOMMU, 308 - }, 309 - { 310 - .name = "vfio_type1v2_iommu", 311 - .container_path = "/dev/vfio/vfio", 312 - .iommu_type = VFIO_TYPE1v2_IOMMU, 313 - }, 314 - { 315 - .name = "iommufd_compat_type1", 316 - .container_path = "/dev/iommu", 317 - .iommu_type = VFIO_TYPE1_IOMMU, 318 - }, 319 - { 320 - .name = "iommufd_compat_type1v2", 321 - .container_path = "/dev/iommu", 322 - .iommu_type = VFIO_TYPE1v2_IOMMU, 323 - }, 324 - { 325 - .name = "iommufd", 326 - }, 327 - }; 328 - 329 - const char *default_iommu_mode = "iommufd"; 330 - 331 - static const struct vfio_iommu_mode *lookup_iommu_mode(const char *iommu_mode) 332 - { 333 - int i; 334 - 335 - if (!iommu_mode) 336 - iommu_mode = default_iommu_mode; 337 - 338 - for (i = 0; i < ARRAY_SIZE(iommu_modes); i++) { 339 - if (strcmp(iommu_mode, iommu_modes[i].name)) 340 - continue; 341 - 342 - return &iommu_modes[i]; 343 - } 344 - 345 - VFIO_FAIL("Unrecognized IOMMU mode: %s\n", iommu_mode); 346 - } 347 - 348 715 static void vfio_device_bind_iommufd(int device_fd, int iommufd) 349 716 { 350 717 struct vfio_device_bind_iommufd args = { ··· 307 766 }; 308 767 309 768 ioctl_assert(device_fd, VFIO_DEVICE_BIND_IOMMUFD, &args); 310 - } 311 - 312 - static u32 iommufd_ioas_alloc(int iommufd) 313 - { 314 - struct iommu_ioas_alloc args = { 315 - .size = sizeof(args), 316 - }; 317 - 318 - ioctl_assert(iommufd, IOMMU_IOAS_ALLOC, &args); 319 - return args.out_ioas_id; 320 769 } 321 770 322 771 static void vfio_device_attach_iommufd_pt(int device_fd, u32 pt_id) ··· 327 796 VFIO_ASSERT_GE(device->fd, 0); 328 797 free((void *)cdev_path); 329 798 330 - /* 331 - * Require device->iommufd to be >0 so that a simple non-0 check can be 332 - * used to check if iommufd is enabled. In practice open() will never 333 - * return 0 unless stdin is closed. 
334 - */ 335 - device->iommufd = open("/dev/iommu", O_RDWR); 336 - VFIO_ASSERT_GT(device->iommufd, 0); 337 - 338 - vfio_device_bind_iommufd(device->fd, device->iommufd); 339 - device->ioas_id = iommufd_ioas_alloc(device->iommufd); 340 - vfio_device_attach_iommufd_pt(device->fd, device->ioas_id); 799 + vfio_device_bind_iommufd(device->fd, device->iommu->iommufd); 800 + vfio_device_attach_iommufd_pt(device->fd, device->iommu->ioas_id); 341 801 } 342 802 343 - struct vfio_pci_device *vfio_pci_device_init(const char *bdf, const char *iommu_mode) 803 + struct vfio_pci_device *vfio_pci_device_init(const char *bdf, struct iommu *iommu) 344 804 { 345 805 struct vfio_pci_device *device; 346 806 347 807 device = calloc(1, sizeof(*device)); 348 808 VFIO_ASSERT_NOT_NULL(device); 349 809 350 - INIT_LIST_HEAD(&device->dma_regions); 810 + VFIO_ASSERT_NOT_NULL(iommu); 811 + device->iommu = iommu; 812 + device->bdf = bdf; 351 813 352 - device->iommu_mode = lookup_iommu_mode(iommu_mode); 353 - 354 - if (device->iommu_mode->container_path) 814 + if (iommu->mode->container_path) 355 815 vfio_pci_container_setup(device, bdf); 356 816 else 357 817 vfio_pci_iommufd_setup(device, bdf); ··· 371 849 VFIO_ASSERT_EQ(close(device->msi_eventfds[i]), 0); 372 850 } 373 851 374 - if (device->iommufd) { 375 - VFIO_ASSERT_EQ(close(device->iommufd), 0); 376 - } else { 852 + if (device->group_fd) 377 853 VFIO_ASSERT_EQ(close(device->group_fd), 0); 378 - VFIO_ASSERT_EQ(close(device->container_fd), 0); 379 - } 380 854 381 855 free(device); 382 - } 383 - 384 - static bool is_bdf(const char *str) 385 - { 386 - unsigned int s, b, d, f; 387 - int length, count; 388 - 389 - count = sscanf(str, "%4x:%2x:%2x.%2x%n", &s, &b, &d, &f, &length); 390 - return count == 4 && length == strlen(str); 391 - } 392 - 393 - const char *vfio_selftests_get_bdf(int *argc, char *argv[]) 394 - { 395 - char *bdf; 396 - 397 - if (*argc > 1 && is_bdf(argv[*argc - 1])) 398 - return argv[--(*argc)]; 399 - 400 - bdf = 
getenv("VFIO_SELFTESTS_BDF"); 401 - if (bdf) { 402 - VFIO_ASSERT_TRUE(is_bdf(bdf), "Invalid BDF: %s\n", bdf); 403 - return bdf; 404 - } 405 - 406 - fprintf(stderr, "Unable to determine which device to use, skipping test.\n"); 407 - fprintf(stderr, "\n"); 408 - fprintf(stderr, "To pass the device address via environment variable:\n"); 409 - fprintf(stderr, "\n"); 410 - fprintf(stderr, " export VFIO_SELFTESTS_BDF=segment:bus:device.function\n"); 411 - fprintf(stderr, " %s [options]\n", argv[0]); 412 - fprintf(stderr, "\n"); 413 - fprintf(stderr, "To pass the device address via argv:\n"); 414 - fprintf(stderr, "\n"); 415 - fprintf(stderr, " %s [options] segment:bus:device.function\n", argv[0]); 416 - fprintf(stderr, "\n"); 417 - exit(KSFT_SKIP); 418 856 }
+1 -15
tools/testing/selftests/vfio/lib/vfio_pci_driver.c
··· 1 1 // SPDX-License-Identifier: GPL-2.0-only 2 - #include <stdio.h> 3 - 4 2 #include "../../../kselftest.h" 5 - #include <vfio_util.h> 3 + #include <libvfio.h> 6 4 7 5 #ifdef __x86_64__ 8 6 extern struct vfio_pci_driver_ops dsa_ops; ··· 27 29 if (ops->probe(device)) 28 30 continue; 29 31 30 - printf("Driver found: %s\n", ops->name); 31 32 device->driver.ops = ops; 32 33 } 33 34 } ··· 55 58 driver->ops->init(device); 56 59 57 60 driver->initialized = true; 58 - 59 - printf("%s: region: vaddr %p, iova 0x%lx, size 0x%lx\n", 60 - driver->ops->name, 61 - driver->region.vaddr, 62 - driver->region.iova, 63 - driver->region.size); 64 - 65 - printf("%s: max_memcpy_size 0x%lx, max_memcpy_count 0x%lx\n", 66 - driver->ops->name, 67 - driver->max_memcpy_size, 68 - driver->max_memcpy_count); 69 61 } 70 62 71 63 void vfio_pci_driver_remove(struct vfio_pci_device *device)
-109
tools/testing/selftests/vfio/run.sh
··· 1 - # SPDX-License-Identifier: GPL-2.0-or-later 2 - 3 - # Global variables initialized in main() and then used during cleanup() when 4 - # the script exits. 5 - declare DEVICE_BDF 6 - declare NEW_DRIVER 7 - declare OLD_DRIVER 8 - declare OLD_NUMVFS 9 - declare DRIVER_OVERRIDE 10 - 11 - function write_to() { 12 - # Unfortunately set -x does not show redirects so use echo to manually 13 - # tell the user what commands are being run. 14 - echo "+ echo \"${2}\" > ${1}" 15 - echo "${2}" > ${1} 16 - } 17 - 18 - function bind() { 19 - write_to /sys/bus/pci/drivers/${2}/bind ${1} 20 - } 21 - 22 - function unbind() { 23 - write_to /sys/bus/pci/drivers/${2}/unbind ${1} 24 - } 25 - 26 - function set_sriov_numvfs() { 27 - write_to /sys/bus/pci/devices/${1}/sriov_numvfs ${2} 28 - } 29 - 30 - function set_driver_override() { 31 - write_to /sys/bus/pci/devices/${1}/driver_override ${2} 32 - } 33 - 34 - function clear_driver_override() { 35 - set_driver_override ${1} "" 36 - } 37 - 38 - function cleanup() { 39 - if [ "${NEW_DRIVER}" ]; then unbind ${DEVICE_BDF} ${NEW_DRIVER} ; fi 40 - if [ "${DRIVER_OVERRIDE}" ]; then clear_driver_override ${DEVICE_BDF} ; fi 41 - if [ "${OLD_DRIVER}" ]; then bind ${DEVICE_BDF} ${OLD_DRIVER} ; fi 42 - if [ "${OLD_NUMVFS}" ]; then set_sriov_numvfs ${DEVICE_BDF} ${OLD_NUMVFS} ; fi 43 - } 44 - 45 - function usage() { 46 - echo "usage: $0 [-d segment:bus:device.function] [-s] [-h] [cmd ...]" >&2 47 - echo >&2 48 - echo " -d: The BDF of the device to use for the test (required)" >&2 49 - echo " -h: Show this help message" >&2 50 - echo " -s: Drop into a shell rather than running a command" >&2 51 - echo >&2 52 - echo " cmd: The command to run and arguments to pass to it." >&2 53 - echo " Required when not using -s. The SBDF will be " >&2 54 - echo " appended to the argument list." 
>&2 55 - exit 1 56 - } 57 - 58 - function main() { 59 - local shell 60 - 61 - while getopts "d:hs" opt; do 62 - case $opt in 63 - d) DEVICE_BDF="$OPTARG" ;; 64 - s) shell=true ;; 65 - *) usage ;; 66 - esac 67 - done 68 - 69 - # Shift past all optional arguments. 70 - shift $((OPTIND - 1)) 71 - 72 - # Check that the user passed in the command to run. 73 - [ ! "${shell}" ] && [ $# = 0 ] && usage 74 - 75 - # Check that the user passed in a BDF. 76 - [ "${DEVICE_BDF}" ] || usage 77 - 78 - trap cleanup EXIT 79 - set -e 80 - 81 - test -d /sys/bus/pci/devices/${DEVICE_BDF} 82 - 83 - if [ -f /sys/bus/pci/devices/${DEVICE_BDF}/sriov_numvfs ]; then 84 - OLD_NUMVFS=$(cat /sys/bus/pci/devices/${DEVICE_BDF}/sriov_numvfs) 85 - set_sriov_numvfs ${DEVICE_BDF} 0 86 - fi 87 - 88 - if [ -L /sys/bus/pci/devices/${DEVICE_BDF}/driver ]; then 89 - OLD_DRIVER=$(basename $(readlink -m /sys/bus/pci/devices/${DEVICE_BDF}/driver)) 90 - unbind ${DEVICE_BDF} ${OLD_DRIVER} 91 - fi 92 - 93 - set_driver_override ${DEVICE_BDF} vfio-pci 94 - DRIVER_OVERRIDE=true 95 - 96 - bind ${DEVICE_BDF} vfio-pci 97 - NEW_DRIVER=vfio-pci 98 - 99 - echo 100 - if [ "${shell}" ]; then 101 - echo "Dropping into ${SHELL} with VFIO_SELFTESTS_BDF=${DEVICE_BDF}" 102 - VFIO_SELFTESTS_BDF=${DEVICE_BDF} ${SHELL} 103 - else 104 - "$@" ${DEVICE_BDF} 105 - fi 106 - echo 107 - } 108 - 109 - main "$@"
+41
tools/testing/selftests/vfio/scripts/cleanup.sh
··· 1 + # SPDX-License-Identifier: GPL-2.0-or-later 2 + 3 + source $(dirname -- "${BASH_SOURCE[0]}")/lib.sh 4 + 5 + function cleanup_devices() { 6 + local device_bdf 7 + local device_dir 8 + 9 + for device_bdf in "$@"; do 10 + device_dir=${DEVICES_DIR}/${device_bdf} 11 + 12 + if [ -f ${device_dir}/vfio-pci ]; then 13 + unbind ${device_bdf} vfio-pci 14 + fi 15 + 16 + if [ -f ${device_dir}/driver_override ]; then 17 + clear_driver_override ${device_bdf} 18 + fi 19 + 20 + if [ -f ${device_dir}/driver ]; then 21 + bind ${device_bdf} $(cat ${device_dir}/driver) 22 + fi 23 + 24 + if [ -f ${device_dir}/sriov_numvfs ]; then 25 + set_sriov_numvfs ${device_bdf} $(cat ${device_dir}/sriov_numvfs) 26 + fi 27 + 28 + rm -rf ${device_dir} 29 + done 30 + } 31 + 32 + function main() { 33 + if [ $# = 0 ]; then 34 + cleanup_devices $(ls ${DEVICES_DIR}) 35 + rmdir ${DEVICES_DIR} 36 + else 37 + cleanup_devices "$@" 38 + fi 39 + } 40 + 41 + main "$@"
+42
tools/testing/selftests/vfio/scripts/lib.sh
··· 1 + # SPDX-License-Identifier: GPL-2.0-or-later 2 + 3 + readonly DEVICES_DIR="${TMPDIR:-/tmp}/vfio-selftests-devices" 4 + 5 + function write_to() { 6 + # Unfortunately set -x does not show redirects so use echo to manually 7 + # tell the user what commands are being run. 8 + echo "+ echo \"${2}\" > ${1}" 9 + echo "${2}" > ${1} 10 + } 11 + 12 + function get_driver() { 13 + if [ -L /sys/bus/pci/devices/${1}/driver ]; then 14 + basename $(readlink -m /sys/bus/pci/devices/${1}/driver) 15 + fi 16 + } 17 + 18 + function bind() { 19 + write_to /sys/bus/pci/drivers/${2}/bind ${1} 20 + } 21 + 22 + function unbind() { 23 + write_to /sys/bus/pci/drivers/${2}/unbind ${1} 24 + } 25 + 26 + function set_sriov_numvfs() { 27 + write_to /sys/bus/pci/devices/${1}/sriov_numvfs ${2} 28 + } 29 + 30 + function get_sriov_numvfs() { 31 + if [ -f /sys/bus/pci/devices/${1}/sriov_numvfs ]; then 32 + cat /sys/bus/pci/devices/${1}/sriov_numvfs 33 + fi 34 + } 35 + 36 + function set_driver_override() { 37 + write_to /sys/bus/pci/devices/${1}/driver_override ${2} 38 + } 39 + 40 + function clear_driver_override() { 41 + set_driver_override ${1} "" 42 + }
+16
tools/testing/selftests/vfio/scripts/run.sh
··· 1 + # SPDX-License-Identifier: GPL-2.0-or-later 2 + 3 + source $(dirname -- "${BASH_SOURCE[0]}")/lib.sh 4 + 5 + function main() { 6 + local device_bdfs=$(ls ${DEVICES_DIR}) 7 + 8 + if [ -z "${device_bdfs}" ]; then 9 + echo "No devices found, skipping." 10 + exit 4 11 + fi 12 + 13 + "$@" ${device_bdfs} 14 + } 15 + 16 + main "$@"
+48
tools/testing/selftests/vfio/scripts/setup.sh
··· 1 + # SPDX-License-Identifier: GPL-2.0-or-later 2 + set -e 3 + 4 + source $(dirname -- "${BASH_SOURCE[0]}")/lib.sh 5 + 6 + function main() { 7 + local device_bdf 8 + local device_dir 9 + local numvfs 10 + local driver 11 + 12 + if [ $# = 0 ]; then 13 + echo "usage: $0 segment:bus:device.function ..." >&2 14 + exit 1 15 + fi 16 + 17 + for device_bdf in "$@"; do 18 + test -d /sys/bus/pci/devices/${device_bdf} 19 + 20 + device_dir=${DEVICES_DIR}/${device_bdf} 21 + if [ -d "${device_dir}" ]; then 22 + echo "${device_bdf} has already been set up, exiting." 23 + exit 0 24 + fi 25 + 26 + mkdir -p ${device_dir} 27 + 28 + numvfs=$(get_sriov_numvfs ${device_bdf}) 29 + if [ "${numvfs}" ]; then 30 + set_sriov_numvfs ${device_bdf} 0 31 + echo ${numvfs} > ${device_dir}/sriov_numvfs 32 + fi 33 + 34 + driver=$(get_driver ${device_bdf}) 35 + if [ "${driver}" ]; then 36 + unbind ${device_bdf} ${driver} 37 + echo ${driver} > ${device_dir}/driver 38 + fi 39 + 40 + set_driver_override ${device_bdf} vfio-pci 41 + touch ${device_dir}/driver_override 42 + 43 + bind ${device_bdf} vfio-pci 44 + touch ${device_dir}/vfio-pci 45 + done 46 + } 47 + 48 + main "$@"
+26 -20
tools/testing/selftests/vfio/vfio_dma_mapping_test.c
··· 10 10 #include <linux/sizes.h> 11 11 #include <linux/vfio.h> 12 12 13 - #include <vfio_util.h> 13 + #include <libvfio.h> 14 14 15 15 #include "../kselftest_harness.h" 16 16 ··· 94 94 } 95 95 96 96 FIXTURE(vfio_dma_mapping_test) { 97 + struct iommu *iommu; 97 98 struct vfio_pci_device *device; 98 99 struct iova_allocator *iova_allocator; 99 100 }; ··· 120 119 121 120 FIXTURE_SETUP(vfio_dma_mapping_test) 122 121 { 123 - self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode); 124 - self->iova_allocator = iova_allocator_init(self->device); 122 + self->iommu = iommu_init(variant->iommu_mode); 123 + self->device = vfio_pci_device_init(device_bdf, self->iommu); 124 + self->iova_allocator = iova_allocator_init(self->iommu); 125 125 } 126 126 127 127 FIXTURE_TEARDOWN(vfio_dma_mapping_test) 128 128 { 129 129 iova_allocator_cleanup(self->iova_allocator); 130 130 vfio_pci_device_cleanup(self->device); 131 + iommu_cleanup(self->iommu); 131 132 } 132 133 133 134 TEST_F(vfio_dma_mapping_test, dma_map_unmap) 134 135 { 135 136 const u64 size = variant->size ?: getpagesize(); 136 137 const int flags = variant->mmap_flags; 137 - struct vfio_dma_region region; 138 + struct dma_region region; 138 139 struct iommu_mapping mapping; 139 140 u64 mapping_size = size; 140 141 u64 unmapped; ··· 153 150 region.iova = iova_allocator_alloc(self->iova_allocator, size); 154 151 region.size = size; 155 152 156 - vfio_pci_dma_map(self->device, &region); 153 + iommu_map(self->iommu, &region); 157 154 printf("Mapped HVA %p (size 0x%lx) at IOVA 0x%lx\n", region.vaddr, size, region.iova); 158 155 159 156 ASSERT_EQ(region.iova, to_iova(self->device, region.vaddr)); ··· 195 192 } 196 193 197 194 unmap: 198 - rc = __vfio_pci_dma_unmap(self->device, &region, &unmapped); 195 + rc = __iommu_unmap(self->iommu, &region, &unmapped); 199 196 ASSERT_EQ(rc, 0); 200 197 ASSERT_EQ(unmapped, region.size); 201 198 printf("Unmapped IOVA 0x%lx\n", region.iova); 202 - ASSERT_EQ(INVALID_IOVA, 
__to_iova(self->device, region.vaddr)); 199 + ASSERT_NE(0, __to_iova(self->device, region.vaddr, NULL)); 203 200 ASSERT_NE(0, iommu_mapping_get(device_bdf, region.iova, &mapping)); 204 201 205 202 ASSERT_TRUE(!munmap(region.vaddr, size)); 206 203 } 207 204 208 205 FIXTURE(vfio_dma_map_limit_test) { 206 + struct iommu *iommu; 209 207 struct vfio_pci_device *device; 210 - struct vfio_dma_region region; 208 + struct dma_region region; 211 209 size_t mmap_size; 212 210 }; 213 211 ··· 227 223 228 224 FIXTURE_SETUP(vfio_dma_map_limit_test) 229 225 { 230 - struct vfio_dma_region *region = &self->region; 226 + struct dma_region *region = &self->region; 231 227 struct iommu_iova_range *ranges; 232 228 u64 region_size = getpagesize(); 233 229 iova_t last_iova; ··· 239 235 */ 240 236 self->mmap_size = 2 * region_size; 241 237 242 - self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode); 238 + self->iommu = iommu_init(variant->iommu_mode); 239 + self->device = vfio_pci_device_init(device_bdf, self->iommu); 243 240 region->vaddr = mmap(NULL, self->mmap_size, PROT_READ | PROT_WRITE, 244 241 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); 245 242 ASSERT_NE(region->vaddr, MAP_FAILED); 246 243 247 - ranges = vfio_pci_iova_ranges(self->device, &nranges); 244 + ranges = iommu_iova_ranges(self->iommu, &nranges); 248 245 VFIO_ASSERT_NOT_NULL(ranges); 249 246 last_iova = ranges[nranges - 1].last; 250 247 free(ranges); ··· 258 253 FIXTURE_TEARDOWN(vfio_dma_map_limit_test) 259 254 { 260 255 vfio_pci_device_cleanup(self->device); 256 + iommu_cleanup(self->iommu); 261 257 ASSERT_EQ(munmap(self->region.vaddr, self->mmap_size), 0); 262 258 } 263 259 264 260 TEST_F(vfio_dma_map_limit_test, unmap_range) 265 261 { 266 - struct vfio_dma_region *region = &self->region; 262 + struct dma_region *region = &self->region; 267 263 u64 unmapped; 268 264 int rc; 269 265 270 - vfio_pci_dma_map(self->device, region); 266 + iommu_map(self->iommu, region); 271 267 ASSERT_EQ(region->iova, 
to_iova(self->device, region->vaddr)); 272 268 273 - rc = __vfio_pci_dma_unmap(self->device, region, &unmapped); 269 + rc = __iommu_unmap(self->iommu, region, &unmapped); 274 270 ASSERT_EQ(rc, 0); 275 271 ASSERT_EQ(unmapped, region->size); 276 272 } 277 273 278 274 TEST_F(vfio_dma_map_limit_test, unmap_all) 279 275 { 280 - struct vfio_dma_region *region = &self->region; 276 + struct dma_region *region = &self->region; 281 277 u64 unmapped; 282 278 int rc; 283 279 284 - vfio_pci_dma_map(self->device, region); 280 + iommu_map(self->iommu, region); 285 281 ASSERT_EQ(region->iova, to_iova(self->device, region->vaddr)); 286 282 287 - rc = __vfio_pci_dma_unmap_all(self->device, &unmapped); 283 + rc = __iommu_unmap_all(self->iommu, &unmapped); 288 284 ASSERT_EQ(rc, 0); 289 285 ASSERT_EQ(unmapped, region->size); 290 286 } 291 287 292 288 TEST_F(vfio_dma_map_limit_test, overflow) 293 289 { 294 - struct vfio_dma_region *region = &self->region; 290 + struct dma_region *region = &self->region; 295 291 int rc; 296 292 297 293 region->iova = ~(iova_t)0 & ~(region->size - 1); 298 294 region->size = self->mmap_size; 299 295 300 - rc = __vfio_pci_dma_map(self->device, region); 296 + rc = __iommu_map(self->iommu, region); 301 297 ASSERT_EQ(rc, -EOVERFLOW); 302 298 303 - rc = __vfio_pci_dma_unmap(self->device, region, NULL); 299 + rc = __iommu_unmap(self->iommu, region, NULL); 304 300 ASSERT_EQ(rc, -EOVERFLOW); 305 301 } 306 302
+1 -1
tools/testing/selftests/vfio/vfio_iommufd_setup_test.c
··· 10 10 #include <sys/ioctl.h> 11 11 #include <unistd.h> 12 12 13 - #include <vfio_util.h> 13 + #include <libvfio.h> 14 14 #include "../kselftest_harness.h" 15 15 16 16 static const char iommu_dev_path[] = "/dev/iommu";
+168
tools/testing/selftests/vfio/vfio_pci_device_init_perf_test.c
// SPDX-License-Identifier: GPL-2.0-only
#include <pthread.h>
#include <sys/ioctl.h>
#include <sys/mman.h>

#include <linux/sizes.h>
#include <linux/time64.h>
#include <linux/vfio.h>

#include <libvfio.h>

#include "../kselftest_harness.h"

static char **device_bdfs;
static int nr_devices;

struct thread_args {
	struct iommu *iommu;
	int device_index;
	struct timespec start;
	struct timespec end;
	pthread_barrier_t *barrier;
};

FIXTURE(vfio_pci_device_init_perf_test) {
	pthread_t *threads;
	pthread_barrier_t barrier;
	struct thread_args *thread_args;
	struct iommu *iommu;
};

FIXTURE_VARIANT(vfio_pci_device_init_perf_test) {
	const char *iommu_mode;
};

#define FIXTURE_VARIANT_ADD_IOMMU_MODE(_iommu_mode) \
FIXTURE_VARIANT_ADD(vfio_pci_device_init_perf_test, _iommu_mode) { \
	.iommu_mode = #_iommu_mode, \
}

FIXTURE_VARIANT_ADD_ALL_IOMMU_MODES();

FIXTURE_SETUP(vfio_pci_device_init_perf_test)
{
	int i;

	self->iommu = iommu_init(variant->iommu_mode);
	self->threads = calloc(nr_devices, sizeof(self->threads[0]));
	self->thread_args = calloc(nr_devices, sizeof(self->thread_args[0]));

	pthread_barrier_init(&self->barrier, NULL, nr_devices);

	for (i = 0; i < nr_devices; i++) {
		self->thread_args[i].iommu = self->iommu;
		self->thread_args[i].barrier = &self->barrier;
		self->thread_args[i].device_index = i;
	}
}

FIXTURE_TEARDOWN(vfio_pci_device_init_perf_test)
{
	iommu_cleanup(self->iommu);
	free(self->threads);
	free(self->thread_args);
}

static s64 to_ns(struct timespec ts)
{
	return (s64)ts.tv_nsec + NSEC_PER_SEC * (s64)ts.tv_sec;
}

static struct timespec to_timespec(s64 ns)
{
	struct timespec ts = {
		.tv_nsec = ns % NSEC_PER_SEC,
		.tv_sec = ns / NSEC_PER_SEC,
	};

	return ts;
}

static struct timespec timespec_sub(struct timespec a, struct timespec b)
{
	return to_timespec(to_ns(a) - to_ns(b));
}

static struct timespec timespec_min(struct timespec a, struct timespec b)
{
	return to_ns(a) < to_ns(b) ? a : b;
}

static struct timespec timespec_max(struct timespec a, struct timespec b)
{
	return to_ns(a) > to_ns(b) ? a : b;
}

static void *thread_main(void *__args)
{
	struct thread_args *args = __args;
	struct vfio_pci_device *device;

	pthread_barrier_wait(args->barrier);

	clock_gettime(CLOCK_MONOTONIC, &args->start);
	device = vfio_pci_device_init(device_bdfs[args->device_index], args->iommu);
	clock_gettime(CLOCK_MONOTONIC, &args->end);

	pthread_barrier_wait(args->barrier);

	vfio_pci_device_cleanup(device);
	return NULL;
}

TEST_F(vfio_pci_device_init_perf_test, init)
{
	struct timespec start = to_timespec(INT64_MAX), end = {};
	struct timespec min = to_timespec(INT64_MAX);
	struct timespec max = {};
	struct timespec avg = {};
	struct timespec wall_time;
	s64 thread_ns = 0;
	int i;

	for (i = 0; i < nr_devices; i++) {
		pthread_create(&self->threads[i], NULL, thread_main,
			       &self->thread_args[i]);
	}

	for (i = 0; i < nr_devices; i++) {
		struct thread_args *args = &self->thread_args[i];
		struct timespec init_time;

		pthread_join(self->threads[i], NULL);

		start = timespec_min(start, args->start);
		end = timespec_max(end, args->end);

		init_time = timespec_sub(args->end, args->start);
		min = timespec_min(min, init_time);
		max = timespec_max(max, init_time);
		thread_ns += to_ns(init_time);
	}

	avg = to_timespec(thread_ns / nr_devices);
	wall_time = timespec_sub(end, start);

	printf("Wall time: %lu.%09lus\n",
	       wall_time.tv_sec, wall_time.tv_nsec);
	printf("Min init time (per device): %lu.%09lus\n",
	       min.tv_sec, min.tv_nsec);
	printf("Max init time (per device): %lu.%09lus\n",
	       max.tv_sec, max.tv_nsec);
	printf("Avg init time (per device): %lu.%09lus\n",
	       avg.tv_sec, avg.tv_nsec);
}

int main(int argc, char *argv[])
{
	int i;

	device_bdfs = vfio_selftests_get_bdfs(&argc, argv, &nr_devices);

	printf("Testing parallel initialization of %d devices:\n", nr_devices);
	for (i = 0; i < nr_devices; i++)
		printf("  %s\n", device_bdfs[i]);

	return test_harness_run(argc, argv);
}
tools/testing/selftests/vfio/vfio_pci_device_test.c (+9 -3)
···
 #include <linux/sizes.h>
 #include <linux/vfio.h>
 
-#include <vfio_util.h>
+#include <libvfio.h>
 
 #include "../kselftest_harness.h"
···
 #define MAX_TEST_MSI 16U
 
 FIXTURE(vfio_pci_device_test) {
+	struct iommu *iommu;
 	struct vfio_pci_device *device;
 };
 
 FIXTURE_SETUP(vfio_pci_device_test)
 {
-	self->device = vfio_pci_device_init(device_bdf, default_iommu_mode);
+	self->iommu = iommu_init(default_iommu_mode);
+	self->device = vfio_pci_device_init(device_bdf, self->iommu);
 }
 
 FIXTURE_TEARDOWN(vfio_pci_device_test)
 {
 	vfio_pci_device_cleanup(self->device);
+	iommu_cleanup(self->iommu);
 }
 
 #define read_pci_id_from_sysfs(_file) ({ \
···
 }
 
 FIXTURE(vfio_pci_irq_test) {
+	struct iommu *iommu;
 	struct vfio_pci_device *device;
 };
···
 FIXTURE_SETUP(vfio_pci_irq_test)
 {
-	self->device = vfio_pci_device_init(device_bdf, default_iommu_mode);
+	self->iommu = iommu_init(default_iommu_mode);
+	self->device = vfio_pci_device_init(device_bdf, self->iommu);
 }
 
 FIXTURE_TEARDOWN(vfio_pci_irq_test)
 {
 	vfio_pci_device_cleanup(self->device);
+	iommu_cleanup(self->iommu);
 }
 
 TEST_F(vfio_pci_irq_test, enable_trigger_disable)
tools/testing/selftests/vfio/vfio_pci_driver_test.c (+33 -18)
···
 #include <linux/sizes.h>
 #include <linux/vfio.h>
 
-#include <vfio_util.h>
+#include <libvfio.h>
 
 #include "../kselftest_harness.h"
···
 	ASSERT_EQ(EAGAIN, errno); \
 } while (0)
 
-static void region_setup(struct vfio_pci_device *device,
+static void region_setup(struct iommu *iommu,
 			 struct iova_allocator *iova_allocator,
-			 struct vfio_dma_region *region, u64 size)
+			 struct dma_region *region, u64 size)
 {
 	const int flags = MAP_SHARED | MAP_ANONYMOUS;
 	const int prot = PROT_READ | PROT_WRITE;
···
 	region->iova = iova_allocator_alloc(iova_allocator, size);
 	region->size = size;
 
-	vfio_pci_dma_map(device, region);
+	iommu_map(iommu, region);
 }
 
-static void region_teardown(struct vfio_pci_device *device,
-			    struct vfio_dma_region *region)
+static void region_teardown(struct iommu *iommu, struct dma_region *region)
 {
-	vfio_pci_dma_unmap(device, region);
+	iommu_unmap(iommu, region);
 	VFIO_ASSERT_EQ(munmap(region->vaddr, region->size), 0);
 }
 
 FIXTURE(vfio_pci_driver_test) {
+	struct iommu *iommu;
 	struct vfio_pci_device *device;
 	struct iova_allocator *iova_allocator;
-	struct vfio_dma_region memcpy_region;
+	struct dma_region memcpy_region;
 	void *vaddr;
 	int msi_fd;
···
 {
 	struct vfio_pci_driver *driver;
 
-	self->device = vfio_pci_device_init(device_bdf, variant->iommu_mode);
-	self->iova_allocator = iova_allocator_init(self->device);
+	self->iommu = iommu_init(variant->iommu_mode);
+	self->device = vfio_pci_device_init(device_bdf, self->iommu);
+	self->iova_allocator = iova_allocator_init(self->iommu);
 
 	driver = &self->device->driver;
 
-	region_setup(self->device, self->iova_allocator, &self->memcpy_region, SZ_1G);
-	region_setup(self->device, self->iova_allocator, &driver->region, SZ_2M);
+	region_setup(self->iommu, self->iova_allocator, &self->memcpy_region, SZ_1G);
+	region_setup(self->iommu, self->iova_allocator, &driver->region, SZ_2M);
 
 	/* Any IOVA that doesn't overlap memcpy_region and driver->region. */
 	self->unmapped_iova = iova_allocator_alloc(self->iova_allocator, SZ_1G);
···
 
 	vfio_pci_driver_remove(self->device);
 
-	region_teardown(self->device, &self->memcpy_region);
-	region_teardown(self->device, &driver->region);
+	region_teardown(self->iommu, &self->memcpy_region);
+	region_teardown(self->iommu, &driver->region);
 
 	iova_allocator_cleanup(self->iova_allocator);
 	vfio_pci_device_cleanup(self->device);
+	iommu_cleanup(self->iommu);
 }
 
 TEST_F(vfio_pci_driver_test, init_remove)
···
 	ASSERT_NO_MSI(self->msi_fd);
 }
 
-int main(int argc, char *argv[])
+static bool device_has_selftests_driver(const char *bdf)
 {
 	struct vfio_pci_device *device;
+	struct iommu *iommu;
+	bool has_driver;
 
+	iommu = iommu_init(default_iommu_mode);
+	device = vfio_pci_device_init(device_bdf, iommu);
+
+	has_driver = !!device->driver.ops;
+
+	vfio_pci_device_cleanup(device);
+	iommu_cleanup(iommu);
+
+	return has_driver;
+}
+
+int main(int argc, char *argv[])
+{
 	device_bdf = vfio_selftests_get_bdf(&argc, argv);
 
-	device = vfio_pci_device_init(device_bdf, default_iommu_mode);
-	if (!device->driver.ops) {
+	if (!device_has_selftests_driver(device_bdf)) {
 		fprintf(stderr, "No driver found for device %s\n", device_bdf);
 		return KSFT_SKIP;
 	}
-	vfio_pci_device_cleanup(device);
 
 	return test_harness_run(argc, argv);
 }