Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

Pull iommufd updates from Jason Gunthorpe:
"Several new features and uAPI for iommufd:

- IOMMU_IOAS_MAP_FILE allows passing in a file descriptor as the
backing memory for an iommu mapping. To date VFIO/iommufd have used
VMAs and pin_user_pages(); this now allows using memfds and
memfd_pin_folios(). Notably this creates a pure folio path from the
memfd to the iommu page table where memory is never broken down to
PAGE_SIZE.

- IOMMU_IOAS_CHANGE_PROCESS moves the pinned page accounting between
two processes. Combined with the above this allows iommufd to
support a VMM re-start using exec() where something like qemu would
exec() a new version of itself and fd pass the memfds/iommufd/etc
to the new process. The memfd allows DMA access to the memory to
continue while the new process is getting set up, and the
CHANGE_PROCESS updates all the accounting.

- Support for fault reporting to userspace on non-PRI HW, such as ARM
stall-mode embedded devices.

- IOMMU_VIOMMU_ALLOC introduces the concept of a HW/driver backed
virtual iommu. This will be used by VMMs to access hardware
features that are contained within a VM. The first use is to
inform the kernel of the virtual SID to physical SID mapping when
issuing SID based invalidation on ARM. Further uses will tie HW
features that are directly accessed by the VM, such as invalidation
queue assignment and others.

- IOMMU_VDEVICE_ALLOC informs the kernel about the mapping of virtual
device to physical device within a VIOMMU. Minimally this is used
to translate VM issued cache invalidation commands from virtual to
physical device IDs.

- Enhancements to IOMMU_HWPT_INVALIDATE and IOMMU_HWPT_ALLOC to work
with the VIOMMU

- ARM SMMUv3 support for nested translation. Using the VIOMMU and
VDEVICE the driver can model this HW's behavior for nested
translation. This includes a shared branch from Will"

* tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd: (51 commits)
iommu/arm-smmu-v3: Import IOMMUFD module namespace
iommufd: IOMMU_IOAS_CHANGE_PROCESS selftest
iommufd: Add IOMMU_IOAS_CHANGE_PROCESS
iommufd: Lock all IOAS objects
iommufd: Export do_update_pinned
iommu/arm-smmu-v3: Support IOMMU_HWPT_INVALIDATE using a VIOMMU object
iommu/arm-smmu-v3: Allow ATS for IOMMU_DOMAIN_NESTED
iommu/arm-smmu-v3: Use S2FWB for NESTED domains
iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED
iommu/arm-smmu-v3: Support IOMMU_VIOMMU_ALLOC
Documentation: userspace-api: iommufd: Update vDEVICE
iommufd/selftest: Add vIOMMU coverage for IOMMU_HWPT_INVALIDATE ioctl
iommufd/selftest: Add IOMMU_TEST_OP_DEV_CHECK_CACHE test command
iommufd/selftest: Add mock_viommu_cache_invalidate
iommufd/viommu: Add iommufd_viommu_find_dev helper
iommu: Add iommu_copy_struct_from_full_user_array helper
iommufd: Allow hwpt_id to carry viommu_id for IOMMU_HWPT_INVALIDATE
iommu/viommu: Add cache_invalidate to iommufd_viommu_ops
iommufd/selftest: Add IOMMU_VDEVICE_ALLOC test coverage
iommufd/viommu: Add IOMMUFD_OBJ_VDEVICE and IOMMU_VDEVICE_ALLOC ioctl
...

+3355 -428
+177 -45
Documentation/userspace-api/iommufd.rst
···
 - IOMMUFD_OBJ_DEVICE, representing a device that is bound to iommufd by an
   external driver.
 
-- IOMMUFD_OBJ_HW_PAGETABLE, representing an actual hardware I/O page table
-  (i.e. a single struct iommu_domain) managed by the iommu driver.
+- IOMMUFD_OBJ_HWPT_PAGING, representing an actual hardware I/O page table
+  (i.e. a single struct iommu_domain) managed by the iommu driver. "PAGING"
+  primarily indicates this type of HWPT should be linked to an IOAS. It also
+  indicates that it is backed by an iommu_domain with __IOMMU_DOMAIN_PAGING
+  feature flag. This can be either an UNMANAGED stage-1 domain for a device
+  running in the user space, or a nesting parent stage-2 domain for mappings
+  from guest-level physical addresses to host-level physical addresses.
 
-  The IOAS has a list of HW_PAGETABLES that share the same IOVA mapping and
-  it will synchronize its mapping with each member HW_PAGETABLE.
+  The IOAS has a list of HWPT_PAGINGs that share the same IOVA mapping and
+  it will synchronize its mapping with each member HWPT_PAGING.
+
+- IOMMUFD_OBJ_HWPT_NESTED, representing an actual hardware I/O page table
+  (i.e. a single struct iommu_domain) managed by user space (e.g. guest OS).
+  "NESTED" indicates that this type of HWPT should be linked to an HWPT_PAGING.
+  It also indicates that it is backed by an iommu_domain that has a type of
+  IOMMU_DOMAIN_NESTED. This must be a stage-1 domain for a device running in
+  the user space (e.g. in a guest VM enabling the IOMMU nested translation
+  feature.) As such, it must be created with a given nesting parent stage-2
+  domain to associate to. This nested stage-1 page table managed by the user
+  space usually has mappings from guest-level I/O virtual addresses to guest-
+  level physical addresses.
+
+- IOMMUFD_OBJ_VIOMMU, representing a slice of the physical IOMMU instance,
+  passed to or shared with a VM. It may be some HW-accelerated virtualization
+  features and some SW resources used by the VM. For examples:
+
+  * Security namespace for guest owned ID, e.g. guest-controlled cache tags
+  * Non-device-affiliated event reporting, e.g. invalidation queue errors
+  * Access to a sharable nesting parent pagetable across physical IOMMUs
+  * Virtualization of various platforms IDs, e.g. RIDs and others
+  * Delivery of paravirtualized invalidation
+  * Direct assigned invalidation queues
+  * Direct assigned interrupts
+
+  Such a vIOMMU object generally has the access to a nesting parent pagetable
+  to support some HW-accelerated virtualization features. So, a vIOMMU object
+  must be created given a nesting parent HWPT_PAGING object, and then it would
+  encapsulate that HWPT_PAGING object. Therefore, a vIOMMU object can be used
+  to allocate an HWPT_NESTED object in place of the encapsulated HWPT_PAGING.
+
+  .. note::
+
+     The name "vIOMMU" isn't necessarily identical to a virtualized IOMMU in a
+     VM. A VM can have one giant virtualized IOMMU running on a machine having
+     multiple physical IOMMUs, in which case the VMM will dispatch the requests
+     or configurations from this single virtualized IOMMU instance to multiple
+     vIOMMU objects created for individual slices of different physical IOMMUs.
+     In other words, a vIOMMU object is always a representation of one physical
+     IOMMU, not necessarily of a virtualized IOMMU. For VMMs that want the full
+     virtualization features from physical IOMMUs, it is suggested to build the
+     same number of virtualized IOMMUs as the number of physical IOMMUs, so the
+     passed-through devices would be connected to their own virtualized IOMMUs
+     backed by corresponding vIOMMU objects, in which case a guest OS would do
+     the "dispatch" naturally instead of VMM trappings.
+
+- IOMMUFD_OBJ_VDEVICE, representing a virtual device for an IOMMUFD_OBJ_DEVICE
+  against an IOMMUFD_OBJ_VIOMMU. This virtual device holds the device's virtual
+  information or attributes (related to the vIOMMU) in a VM. An immediate vDATA
+  example can be the virtual ID of the device on a vIOMMU, which is a unique ID
+  that VMM assigns to the device for a translation channel/port of the vIOMMU,
+  e.g. vSID of ARM SMMUv3, vDeviceID of AMD IOMMU, and vRID of Intel VT-d to a
+  Context Table. Potential use cases of some advanced security information can
+  be forwarded via this object too, such as security level or realm information
+  in a Confidential Compute Architecture. A VMM should create a vDEVICE object
+  to forward all the device information in a VM, when it connects a device to a
+  vIOMMU, which is a separate ioctl call from attaching the same device to an
+  HWPT_PAGING that the vIOMMU holds.
 
 All user-visible objects are destroyed via the IOMMU_DESTROY uAPI.
 
-The diagram below shows relationship between user-visible objects and kernel
+The diagrams below show relationships between user-visible objects and kernel
 datastructures (external to iommufd), with numbers referred to operations
 creating the objects and links::
 
-  _________________________________________________________
- |                         iommufd                         |
- |       [1]                                               |
- |  _________________                                      |
- | |                 |                                     |
- | |                 |                                     |
- | |                 |                                     |
- | |                 |                                     |
- | |                 |                                     |
- | |                 |                                     |
- | |                 |        [3]                 [2]      |
- | |                 |    ____________         __________  |
- | |      IOAS       |<--|            |<------|          | |
- | |                 |   |HW_PAGETABLE|       |  DEVICE  | |
- | |                 |   |____________|       |__________| |
- | |                 |         |                   |       |
- | |                 |         |                   |       |
- | |                 |         |                   |       |
- | |                 |         |                   |       |
- | |                 |         |                   |       |
- | |_________________|         |                   |       |
- |         |                   |                   |       |
- |_________|___________________|___________________|_______|
-           |                   |                   |
-           |              _____v______      _______v_____
-           | PFN storage |            |    |             |
-           |------------>|iommu_domain|    |struct device|
-                         |____________|    |_____________|
+  _______________________________________________________________________
+ |                      iommufd (HWPT_PAGING only)                       |
+ |                                                                       |
+ |        [1]                   [3]                              [2]     |
+ |  ________________      _____________                        ________  |
+ | |                |    |             |                      |        | |
+ | |      IOAS      |<---| HWPT_PAGING |<---------------------| DEVICE | |
+ | |________________|    |_____________|                      |________| |
+ |         |                    |                                 |      |
+ |_________|____________________|_________________________________|______|
+           |                    |                                 |
+           |              ______v_____                          ___v__
+           | PFN storage |  (paging)  |                        |struct|
+           |------------>|iommu_domain|<----------------------|device|
+                         |____________|                       |______|
+
+  _______________________________________________________________________
+ |                      iommufd (with HWPT_NESTED)                       |
+ |                                                                       |
+ |        [1]                   [3]                [4]            [2]    |
+ |  ________________      _____________      _____________      ________ |
+ | |                |    |             |    |             |    |        ||
+ | |      IOAS      |<---| HWPT_PAGING |<---| HWPT_NESTED |<--| DEVICE | |
+ | |________________|    |_____________|    |_____________|   |________| |
+ |         |                    |                  |              |      |
+ |_________|____________________|__________________|______________|______|
+           |                    |                  |              |
+           |              ______v_____       ______v_____      ___v__
+           | PFN storage |  (paging)  |     |  (nested)  |    |struct|
+           |------------>|iommu_domain|<----|iommu_domain|<---|device|
+                         |____________|     |____________|    |______|
+
+  _______________________________________________________________________
+ |                     iommufd (with vIOMMU/vDEVICE)                     |
+ |                                                                       |
+ |                             [5]                [6]                    |
+ |                        _____________      _____________               |
+ |                       |             |    |             |              |
+ |       |---------------|   vIOMMU    |<---|   vDEVICE   |<----|        |
+ |       |               |             |    |_____________|     |        |
+ |       |               |             |                        |        |
+ |       |     [1]       |             |          [4]           | [2]    |
+ |       |    ______     |             |     _____________     _|______  |
+ |       |   |      |    |     [3]     |    |             |   | |      | |
+ |       |   | IOAS |<---|(HWPT_PAGING)|<---| HWPT_NESTED |<--| DEVICE | |
+ |       |   |______|    |_____________|    |_____________|   |________| |
+ |       |       |              |                  |              |      |
+ |_______|_______|______________|__________________|______________|______|
+         |       |              |                  |              |
+   ______v_____  |        ______v_____       ______v_____      ___v__
+  |   struct   | | PFN   |  (paging)  |     |  (nested)  |    |struct|
+  |iommu_device| |------>|iommu_domain|<----|iommu_domain|<---|device|
+  |____________|  storage|____________|     |____________|    |______|
 
 1. IOMMUFD_OBJ_IOAS is created via the IOMMU_IOAS_ALLOC uAPI. An iommufd can
    hold multiple IOAS objects. IOAS is the most generic object and does not
···
    device. The driver must also set the driver_managed_dma flag and must not
    touch the device until this operation succeeds.
 
-3. IOMMUFD_OBJ_HW_PAGETABLE is created when an external driver calls the IOMMUFD
-   kAPI to attach a bound device to an IOAS. Similarly the external driver uAPI
-   allows userspace to initiate the attaching operation. If a compatible
-   pagetable already exists then it is reused for the attachment. Otherwise a
-   new pagetable object and iommu_domain is created. Successful completion of
-   this operation sets up the linkages among IOAS, device and iommu_domain. Once
-   this completes the device could do DMA.
+3. IOMMUFD_OBJ_HWPT_PAGING can be created in two ways:
 
-   Every iommu_domain inside the IOAS is also represented to userspace as a
-   HW_PAGETABLE object.
+   * IOMMUFD_OBJ_HWPT_PAGING is automatically created when an external driver
+     calls the IOMMUFD kAPI to attach a bound device to an IOAS. Similarly the
+     external driver uAPI allows userspace to initiate the attaching operation.
+     If a compatible member HWPT_PAGING object exists in the IOAS's HWPT_PAGING
+     list, then it will be reused. Otherwise a new HWPT_PAGING that represents
+     an iommu_domain to userspace will be created, and then added to the list.
+     Successful completion of this operation sets up the linkages among IOAS,
+     device and iommu_domain. Once this completes the device could do DMA.
+
+   * IOMMUFD_OBJ_HWPT_PAGING can be manually created via the IOMMU_HWPT_ALLOC
+     uAPI, provided an ioas_id via @pt_id to associate the new HWPT_PAGING to
+     the corresponding IOAS object. The benefit of this manual allocation is to
+     allow allocation flags (defined in enum iommufd_hwpt_alloc_flags), e.g. it
+     allocates a nesting parent HWPT_PAGING if the IOMMU_HWPT_ALLOC_NEST_PARENT
+     flag is set.
+
+4. IOMMUFD_OBJ_HWPT_NESTED can be only manually created via the IOMMU_HWPT_ALLOC
+   uAPI, provided an hwpt_id or a viommu_id of a vIOMMU object encapsulating a
+   nesting parent HWPT_PAGING via @pt_id to associate the new HWPT_NESTED object
+   to the corresponding HWPT_PAGING object. The associating HWPT_PAGING object
+   must be a nesting parent manually allocated via the same uAPI previously with
+   an IOMMU_HWPT_ALLOC_NEST_PARENT flag, otherwise the allocation will fail. The
+   allocation will be further validated by the IOMMU driver to ensure that the
+   nesting parent domain and the nested domain being allocated are compatible.
+   Successful completion of this operation sets up linkages among IOAS, device,
+   and iommu_domains. Once this completes the device could do DMA via a 2-stage
+   translation, a.k.a nested translation. Note that multiple HWPT_NESTED objects
+   can be allocated by (and then associated to) the same nesting parent.
 
 .. note::
 
-   Future IOMMUFD updates will provide an API to create and manipulate the
-   HW_PAGETABLE directly.
+   Either a manual IOMMUFD_OBJ_HWPT_PAGING or an IOMMUFD_OBJ_HWPT_NESTED is
+   created via the same IOMMU_HWPT_ALLOC uAPI. The difference is at the type
+   of the object passed in via the @pt_id field of struct iommufd_hwpt_alloc.
+
+5. IOMMUFD_OBJ_VIOMMU can be only manually created via the IOMMU_VIOMMU_ALLOC
+   uAPI, provided a dev_id (for the device's physical IOMMU to back the vIOMMU)
+   and an hwpt_id (to associate the vIOMMU to a nesting parent HWPT_PAGING). The
+   iommufd core will link the vIOMMU object to the struct iommu_device that the
+   struct device is behind. And an IOMMU driver can implement a viommu_alloc op
+   to allocate its own vIOMMU data structure embedding the core-level structure
+   iommufd_viommu and some driver-specific data. If necessary, the driver can
+   also configure its HW virtualization feature for that vIOMMU (and thus for
+   the VM). Successful completion of this operation sets up the linkages between
+   the vIOMMU object and the HWPT_PAGING, then this vIOMMU object can be used
+   as a nesting parent object to allocate an HWPT_NESTED object described above.
+
+6. IOMMUFD_OBJ_VDEVICE can be only manually created via the IOMMU_VDEVICE_ALLOC
+   uAPI, provided a viommu_id for an iommufd_viommu object and a dev_id for an
+   iommufd_device object. The vDEVICE object will be the binding between these
+   two parent objects. Another @virt_id will be also set via the uAPI providing
+   the iommufd core an index to store the vDEVICE object to a vDEVICE array per
+   vIOMMU. If necessary, the IOMMU driver may choose to implement a
+   vdevice_alloc op to init its HW for virtualization feature related to a
+   vDEVICE. Successful completion of this operation sets up the linkages
+   between vIOMMU and device.
 
 A device can only bind to an iommufd due to DMA ownership claim and attach to at
 most one IOAS object (no support of PASID yet).
···
 
 - iommufd_ioas for IOMMUFD_OBJ_IOAS.
 - iommufd_device for IOMMUFD_OBJ_DEVICE.
-- iommufd_hw_pagetable for IOMMUFD_OBJ_HW_PAGETABLE.
+- iommufd_hwpt_paging for IOMMUFD_OBJ_HWPT_PAGING.
+- iommufd_hwpt_nested for IOMMUFD_OBJ_HWPT_NESTED.
+- iommufd_viommu for IOMMUFD_OBJ_VIOMMU.
+- iommufd_vdevice for IOMMUFD_OBJ_VDEVICE.
 
 Several terminologies when looking at these datastructures:
+13
drivers/acpi/arm64/iort.c
···
 	return pci_rc->ats_attribute & ACPI_IORT_ATS_SUPPORTED;
 }
 
+static bool iort_pci_rc_supports_canwbs(struct acpi_iort_node *node)
+{
+	struct acpi_iort_memory_access *memory_access;
+	struct acpi_iort_root_complex *pci_rc;
+
+	pci_rc = (struct acpi_iort_root_complex *)node->node_data;
+	memory_access =
+		(struct acpi_iort_memory_access *)&pci_rc->memory_properties;
+	return memory_access->memory_flags & ACPI_IORT_MF_CANWBS;
+}
+
 static int iort_iommu_xlate(struct device *dev, struct acpi_iort_node *node,
 			    u32 streamid)
 {
···
 		fwspec = dev_iommu_fwspec_get(dev);
 		if (fwspec && iort_pci_rc_supports_ats(node))
 			fwspec->flags |= IOMMU_FWSPEC_PCI_RC_ATS;
+		if (fwspec && iort_pci_rc_supports_canwbs(node))
+			fwspec->flags |= IOMMU_FWSPEC_PCI_RC_CANWBS;
 	} else {
 		node = iort_scan_node(ACPI_IORT_NODE_NAMED_COMPONENT,
 				      iort_match_node_callback, dev);
+9
drivers/iommu/Kconfig
···
 	  Say Y here if your system supports SVA extensions such as PCIe PASID
 	  and PRI.
 
+config ARM_SMMU_V3_IOMMUFD
+	bool "Enable IOMMUFD features for ARM SMMUv3 (EXPERIMENTAL)"
+	depends on IOMMUFD
+	help
+	  Support for IOMMUFD features intended to support virtual machines
+	  with accelerated virtual IOMMUs.
+
+	  Say Y here if you are doing development and testing on this feature.
+
 config ARM_SMMU_V3_KUNIT_TEST
 	tristate "KUnit tests for arm-smmu-v3 driver" if !KUNIT_ALL_TESTS
 	depends on KUNIT
+1
drivers/iommu/arm/arm-smmu-v3/Makefile
···
 # SPDX-License-Identifier: GPL-2.0
 obj-$(CONFIG_ARM_SMMU_V3) += arm_smmu_v3.o
 arm_smmu_v3-y := arm-smmu-v3.o
+arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_IOMMUFD) += arm-smmu-v3-iommufd.o
 arm_smmu_v3-$(CONFIG_ARM_SMMU_V3_SVA) += arm-smmu-v3-sva.o
 arm_smmu_v3-$(CONFIG_TEGRA241_CMDQV) += tegra241-cmdqv.o
+401
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+
+#include <uapi/linux/iommufd.h>
+
+#include "arm-smmu-v3.h"
+
+void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type)
+{
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct iommu_hw_info_arm_smmuv3 *info;
+	u32 __iomem *base_idr;
+	unsigned int i;
+
+	info = kzalloc(sizeof(*info), GFP_KERNEL);
+	if (!info)
+		return ERR_PTR(-ENOMEM);
+
+	base_idr = master->smmu->base + ARM_SMMU_IDR0;
+	for (i = 0; i <= 5; i++)
+		info->idr[i] = readl_relaxed(base_idr + i);
+	info->iidr = readl_relaxed(master->smmu->base + ARM_SMMU_IIDR);
+	info->aidr = readl_relaxed(master->smmu->base + ARM_SMMU_AIDR);
+
+	*length = sizeof(*info);
+	*type = IOMMU_HW_INFO_TYPE_ARM_SMMUV3;
+
+	return info;
+}
+
+static void arm_smmu_make_nested_cd_table_ste(
+	struct arm_smmu_ste *target, struct arm_smmu_master *master,
+	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
+{
+	arm_smmu_make_s2_domain_ste(
+		target, master, nested_domain->vsmmu->s2_parent, ats_enabled);
+
+	target->data[0] = cpu_to_le64(STRTAB_STE_0_V |
+				      FIELD_PREP(STRTAB_STE_0_CFG,
+						 STRTAB_STE_0_CFG_NESTED));
+	target->data[0] |= nested_domain->ste[0] &
+			   ~cpu_to_le64(STRTAB_STE_0_CFG);
+	target->data[1] |= nested_domain->ste[1];
+}
+
+/*
+ * Create a physical STE from the virtual STE that userspace provided when it
+ * created the nested domain. Using the vSTE userspace can request:
+ * - Non-valid STE
+ * - Abort STE
+ * - Bypass STE (install the S2, no CD table)
+ * - CD table STE (install the S2 and the userspace CD table)
+ */
+static void arm_smmu_make_nested_domain_ste(
+	struct arm_smmu_ste *target, struct arm_smmu_master *master,
+	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
+{
+	unsigned int cfg =
+		FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(nested_domain->ste[0]));
+
+	/*
+	 * Userspace can request a non-valid STE through the nesting interface.
+	 * We relay that into an abort physical STE with the intention that
+	 * C_BAD_STE for this SID can be generated to userspace.
+	 */
+	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V)))
+		cfg = STRTAB_STE_0_CFG_ABORT;
+
+	switch (cfg) {
+	case STRTAB_STE_0_CFG_S1_TRANS:
+		arm_smmu_make_nested_cd_table_ste(target, master, nested_domain,
+						  ats_enabled);
+		break;
+	case STRTAB_STE_0_CFG_BYPASS:
+		arm_smmu_make_s2_domain_ste(target, master,
+					    nested_domain->vsmmu->s2_parent,
+					    ats_enabled);
+		break;
+	case STRTAB_STE_0_CFG_ABORT:
+	default:
+		arm_smmu_make_abort_ste(target);
+		break;
+	}
+}
+
+static int arm_smmu_attach_dev_nested(struct iommu_domain *domain,
+				      struct device *dev)
+{
+	struct arm_smmu_nested_domain *nested_domain =
+		to_smmu_nested_domain(domain);
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_attach_state state = {
+		.master = master,
+		.old_domain = iommu_get_domain_for_dev(dev),
+		.ssid = IOMMU_NO_PASID,
+	};
+	struct arm_smmu_ste ste;
+	int ret;
+
+	if (nested_domain->vsmmu->smmu != master->smmu)
+		return -EINVAL;
+	if (arm_smmu_ssids_in_use(&master->cd_table))
+		return -EBUSY;
+
+	mutex_lock(&arm_smmu_asid_lock);
+	/*
+	 * The VM has to control the actual ATS state at the PCI device because
+	 * we forward the invalidations directly from the VM. If the VM doesn't
+	 * think ATS is on it will not generate ATC flushes and the ATC will
+	 * become incoherent. Since we can't access the actual virtual PCI ATS
+	 * config bit here base this off the EATS value in the STE. If the EATS
+	 * is set then the VM must generate ATC flushes.
+	 */
+	state.disable_ats = !nested_domain->enable_ats;
+	ret = arm_smmu_attach_prepare(&state, domain);
+	if (ret) {
+		mutex_unlock(&arm_smmu_asid_lock);
+		return ret;
+	}
+
+	arm_smmu_make_nested_domain_ste(&ste, master, nested_domain,
+					state.ats_enabled);
+	arm_smmu_install_ste_for_dev(master, &ste);
+	arm_smmu_attach_commit(&state);
+	mutex_unlock(&arm_smmu_asid_lock);
+	return 0;
+}
+
+static void arm_smmu_domain_nested_free(struct iommu_domain *domain)
+{
+	kfree(to_smmu_nested_domain(domain));
+}
+
+static const struct iommu_domain_ops arm_smmu_nested_ops = {
+	.attach_dev = arm_smmu_attach_dev_nested,
+	.free = arm_smmu_domain_nested_free,
+};
+
+static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg,
+				  bool *enable_ats)
+{
+	unsigned int eats;
+	unsigned int cfg;
+
+	if (!(arg->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
+		memset(arg->ste, 0, sizeof(arg->ste));
+		return 0;
+	}
+
+	/* EIO is reserved for invalid STE data. */
+	if ((arg->ste[0] & ~STRTAB_STE_0_NESTING_ALLOWED) ||
+	    (arg->ste[1] & ~STRTAB_STE_1_NESTING_ALLOWED))
+		return -EIO;
+
+	cfg = FIELD_GET(STRTAB_STE_0_CFG, le64_to_cpu(arg->ste[0]));
+	if (cfg != STRTAB_STE_0_CFG_ABORT && cfg != STRTAB_STE_0_CFG_BYPASS &&
+	    cfg != STRTAB_STE_0_CFG_S1_TRANS)
+		return -EIO;
+
+	/*
+	 * Only Full ATS or ATS UR is supported
+	 * The EATS field will be set by arm_smmu_make_nested_domain_ste()
+	 */
+	eats = FIELD_GET(STRTAB_STE_1_EATS, le64_to_cpu(arg->ste[1]));
+	arg->ste[1] &= ~cpu_to_le64(STRTAB_STE_1_EATS);
+	if (eats != STRTAB_STE_1_EATS_ABT && eats != STRTAB_STE_1_EATS_TRANS)
+		return -EIO;
+
+	if (cfg == STRTAB_STE_0_CFG_S1_TRANS)
+		*enable_ats = (eats == STRTAB_STE_1_EATS_TRANS);
+	return 0;
+}
+
+static struct iommu_domain *
+arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+			      const struct iommu_user_data *user_data)
+{
+	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+	const u32 SUPPORTED_FLAGS = IOMMU_HWPT_FAULT_ID_VALID;
+	struct arm_smmu_nested_domain *nested_domain;
+	struct iommu_hwpt_arm_smmuv3 arg;
+	bool enable_ats = false;
+	int ret;
+
+	/*
+	 * Faults delivered to the nested domain are faults that originated by
+	 * the S1 in the domain. The core code will match all PASIDs when
+	 * delivering the fault due to user_pasid_table
+	 */
+	if (flags & ~SUPPORTED_FLAGS)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	ret = iommu_copy_struct_from_user(&arg, user_data,
+					  IOMMU_HWPT_DATA_ARM_SMMUV3, ste);
+	if (ret)
+		return ERR_PTR(ret);
+
+	ret = arm_smmu_validate_vste(&arg, &enable_ats);
+	if (ret)
+		return ERR_PTR(ret);
+
+	nested_domain = kzalloc(sizeof(*nested_domain), GFP_KERNEL_ACCOUNT);
+	if (!nested_domain)
+		return ERR_PTR(-ENOMEM);
+
+	nested_domain->domain.type = IOMMU_DOMAIN_NESTED;
+	nested_domain->domain.ops = &arm_smmu_nested_ops;
+	nested_domain->enable_ats = enable_ats;
+	nested_domain->vsmmu = vsmmu;
+	nested_domain->ste[0] = arg.ste[0];
+	nested_domain->ste[1] = arg.ste[1] & ~cpu_to_le64(STRTAB_STE_1_EATS);
+
+	return &nested_domain->domain;
+}
+
+static int arm_vsmmu_vsid_to_sid(struct arm_vsmmu *vsmmu, u32 vsid, u32 *sid)
+{
+	struct arm_smmu_master *master;
+	struct device *dev;
+	int ret = 0;
+
+	xa_lock(&vsmmu->core.vdevs);
+	dev = iommufd_viommu_find_dev(&vsmmu->core, (unsigned long)vsid);
+	if (!dev) {
+		ret = -EIO;
+		goto unlock;
+	}
+	master = dev_iommu_priv_get(dev);
+
+	/* At this moment, iommufd only supports PCI device that has one SID */
+	if (sid)
+		*sid = master->streams[0].id;
+unlock:
+	xa_unlock(&vsmmu->core.vdevs);
+	return ret;
+}
+
+/* This is basically iommu_viommu_arm_smmuv3_invalidate in u64 for conversion */
+struct arm_vsmmu_invalidation_cmd {
+	union {
+		u64 cmd[2];
+		struct iommu_viommu_arm_smmuv3_invalidate ucmd;
+	};
+};
+
+/*
+ * Convert, in place, the raw invalidation command into an internal format that
+ * can be passed to arm_smmu_cmdq_issue_cmdlist(). Internally commands are
+ * stored in CPU endian.
+ *
+ * Enforce the VMID or SID on the command.
+ */
+static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
+				      struct arm_vsmmu_invalidation_cmd *cmd)
+{
+	/* Commands are le64 stored in u64 */
+	cmd->cmd[0] = le64_to_cpu(cmd->ucmd.cmd[0]);
+	cmd->cmd[1] = le64_to_cpu(cmd->ucmd.cmd[1]);
+
+	switch (cmd->cmd[0] & CMDQ_0_OP) {
+	case CMDQ_OP_TLBI_NSNH_ALL:
+		/* Convert to NH_ALL */
+		cmd->cmd[0] = CMDQ_OP_TLBI_NH_ALL |
+			      FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+		cmd->cmd[1] = 0;
+		break;
+	case CMDQ_OP_TLBI_NH_VA:
+	case CMDQ_OP_TLBI_NH_VAA:
+	case CMDQ_OP_TLBI_NH_ALL:
+	case CMDQ_OP_TLBI_NH_ASID:
+		cmd->cmd[0] &= ~CMDQ_TLBI_0_VMID;
+		cmd->cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, vsmmu->vmid);
+		break;
+	case CMDQ_OP_ATC_INV:
+	case CMDQ_OP_CFGI_CD:
+	case CMDQ_OP_CFGI_CD_ALL: {
+		u32 sid, vsid = FIELD_GET(CMDQ_CFGI_0_SID, cmd->cmd[0]);
+
+		if (arm_vsmmu_vsid_to_sid(vsmmu, vsid, &sid))
+			return -EIO;
+		cmd->cmd[0] &= ~CMDQ_CFGI_0_SID;
+		cmd->cmd[0] |= FIELD_PREP(CMDQ_CFGI_0_SID, sid);
+		break;
+	}
+	default:
+		return -EIO;
+	}
+	return 0;
+}
+
+static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+				      struct iommu_user_data_array *array)
+{
+	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
+	struct arm_smmu_device *smmu = vsmmu->smmu;
+	struct arm_vsmmu_invalidation_cmd *last;
+	struct arm_vsmmu_invalidation_cmd *cmds;
+	struct arm_vsmmu_invalidation_cmd *cur;
+	struct arm_vsmmu_invalidation_cmd *end;
+	int ret;
+
+	cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+	if (!cmds)
+		return -ENOMEM;
+	cur = cmds;
+	end = cmds + array->entry_num;
+
+	static_assert(sizeof(*cmds) == 2 * sizeof(u64));
+	ret = iommu_copy_struct_from_full_user_array(
+		cmds, sizeof(*cmds), array,
+		IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3);
+	if (ret)
+		goto out;
+
+	last = cmds;
+	while (cur != end) {
+		ret = arm_vsmmu_convert_user_cmd(vsmmu, cur);
+		if (ret)
+			goto out;
+
+		/* FIXME work in blocks of CMDQ_BATCH_ENTRIES and copy each block? */
+		cur++;
+		if (cur != end && (cur - last) != CMDQ_BATCH_ENTRIES - 1)
+			continue;
+
+		/* FIXME always uses the main cmdq rather than trying to group by type */
+		ret = arm_smmu_cmdq_issue_cmdlist(smmu, &smmu->cmdq, last->cmd,
+						  cur - last, true);
+		if (ret) {
+			cur--;
+			goto out;
+		}
+		last = cur;
+	}
+out:
+	array->entry_num = cur - cmds;
+	kfree(cmds);
+	return ret;
+}
+
+static const struct iommufd_viommu_ops arm_vsmmu_ops = {
+	.alloc_domain_nested = arm_vsmmu_alloc_domain_nested,
+	.cache_invalidate = arm_vsmmu_cache_invalidate,
+};
+
+struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev,
+				       struct iommu_domain *parent,
+				       struct iommufd_ctx *ictx,
+				       unsigned int viommu_type)
+{
+	struct arm_smmu_device *smmu =
+		iommu_get_iommu_dev(dev, struct arm_smmu_device, iommu);
+	struct arm_smmu_master *master = dev_iommu_priv_get(dev);
+	struct arm_smmu_domain *s2_parent = to_smmu_domain(parent);
+	struct arm_vsmmu *vsmmu;
+
+	if (viommu_type != IOMMU_VIOMMU_TYPE_ARM_SMMUV3)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	if (!(smmu->features & ARM_SMMU_FEAT_NESTING))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	if (s2_parent->smmu != master->smmu)
+		return ERR_PTR(-EINVAL);
+
+	/*
+	 * FORCE_SYNC is not set with FEAT_NESTING. Some study of the exact HW
+	 * defect is needed to determine if arm_vsmmu_cache_invalidate() needs
+	 * any change to remove this.
+	 */
+	if (WARN_ON(smmu->options & ARM_SMMU_OPT_CMDQ_FORCE_SYNC))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	/*
+	 * Must support some way to prevent the VM from bypassing the cache
+	 * because VFIO currently does not do any cache maintenance. canwbs
+	 * indicates the device is fully coherent and no cache maintenance is
+	 * ever required, even for PCI No-Snoop. S2FWB means the S1 can't make
+	 * things non-coherent using the memattr, but No-Snoop behavior is not
+	 * affected.
+	 */
+	if (!arm_smmu_master_canwbs(master) &&
+	    !(smmu->features & ARM_SMMU_FEAT_S2FWB))
+		return ERR_PTR(-EOPNOTSUPP);
+
+	vsmmu = iommufd_viommu_alloc(ictx, struct arm_vsmmu, core,
+				     &arm_vsmmu_ops);
+	if (IS_ERR(vsmmu))
+		return ERR_CAST(vsmmu);
+
+	vsmmu->smmu = smmu;
+	vsmmu->s2_parent = s2_parent;
+	/* FIXME Move VMID allocation from the S2 domain allocation to here */
+	vsmmu->vmid = s2_parent->s2_cfg.vmid;
+
+	return &vsmmu->core;
+}
+
+MODULE_IMPORT_NS(IOMMUFD);
+96 -43
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
··· 295 295 case CMDQ_OP_TLBI_NH_ASID: 296 296 cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_ASID, ent->tlbi.asid); 297 297 fallthrough; 298 + case CMDQ_OP_TLBI_NH_ALL: 298 299 case CMDQ_OP_TLBI_S12_VMALL: 299 300 cmd[0] |= FIELD_PREP(CMDQ_TLBI_0_VMID, ent->tlbi.vmid); 300 301 break; ··· 766 765 * insert their own list of commands then all of the commands from one 767 766 * CPU will appear before any of the commands from the other CPU. 768 767 */ 769 - static int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, 770 - struct arm_smmu_cmdq *cmdq, 771 - u64 *cmds, int n, bool sync) 768 + int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, 769 + struct arm_smmu_cmdq *cmdq, u64 *cmds, int n, 770 + bool sync) 772 771 { 773 772 u64 cmd_sync[CMDQ_ENT_DWORDS]; 774 773 u32 prod; ··· 1046 1045 /* S2 translates */ 1047 1046 if (cfg & BIT(1)) { 1048 1047 used_bits[1] |= 1049 - cpu_to_le64(STRTAB_STE_1_EATS | STRTAB_STE_1_SHCFG); 1048 + cpu_to_le64(STRTAB_STE_1_S2FWB | STRTAB_STE_1_EATS | 1049 + STRTAB_STE_1_SHCFG); 1050 1050 used_bits[2] |= 1051 1051 cpu_to_le64(STRTAB_STE_2_S2VMID | STRTAB_STE_2_VTCR | 1052 1052 STRTAB_STE_2_S2AA64 | STRTAB_STE_2_S2ENDI | ··· 1551 1549 } 1552 1550 } 1553 1551 1554 - VISIBLE_IF_KUNIT 1555 1552 void arm_smmu_make_abort_ste(struct arm_smmu_ste *target) 1556 1553 { 1557 1554 memset(target, 0, sizeof(*target)); ··· 1633 1632 } 1634 1633 EXPORT_SYMBOL_IF_KUNIT(arm_smmu_make_cdtable_ste); 1635 1634 1636 - VISIBLE_IF_KUNIT 1637 1635 void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, 1638 1636 struct arm_smmu_master *master, 1639 1637 struct arm_smmu_domain *smmu_domain, ··· 1655 1655 FIELD_PREP(STRTAB_STE_1_EATS, 1656 1656 ats_enabled ? 
STRTAB_STE_1_EATS_TRANS : 0)); 1657 1657 1658 + if (pgtbl_cfg->quirks & IO_PGTABLE_QUIRK_ARM_S2FWB) 1659 + target->data[1] |= cpu_to_le64(STRTAB_STE_1_S2FWB); 1658 1660 if (smmu->features & ARM_SMMU_FEAT_ATTR_TYPES_OVR) 1659 1661 target->data[1] |= cpu_to_le64(FIELD_PREP(STRTAB_STE_1_SHCFG, 1660 1662 STRTAB_STE_1_SHCFG_INCOMING)); ··· 2107 2105 if (!master->ats_enabled) 2108 2106 continue; 2109 2107 2110 - arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, &cmd); 2108 + if (master_domain->nested_ats_flush) { 2109 + /* 2110 + * If a S2 used as a nesting parent is changed we have 2111 + * no option but to completely flush the ATC. 2112 + */ 2113 + arm_smmu_atc_inv_to_cmd(IOMMU_NO_PASID, 0, 0, &cmd); 2114 + } else { 2115 + arm_smmu_atc_inv_to_cmd(master_domain->ssid, iova, size, 2116 + &cmd); 2117 + } 2111 2118 2112 2119 for (i = 0; i < master->num_streams; i++) { 2113 2120 cmd.atc.sid = master->streams[i].id; ··· 2243 2232 } 2244 2233 __arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain); 2245 2234 2235 + if (smmu_domain->nest_parent) { 2236 + /* 2237 + * When the S2 domain changes all the nested S1 ASIDs have to be 2238 + * flushed too. 2239 + */ 2240 + cmd.opcode = CMDQ_OP_TLBI_NH_ALL; 2241 + arm_smmu_cmdq_issue_cmd_with_sync(smmu_domain->smmu, &cmd); 2242 + } 2243 + 2246 2244 /* 2247 2245 * Unfortunately, this can't be leaf-only since we may have 2248 2246 * zapped an entire table. 
··· 2313 2293 case IOMMU_CAP_CACHE_COHERENCY: 2314 2294 /* Assume that a coherent TCU implies coherent TBUs */ 2315 2295 return master->smmu->features & ARM_SMMU_FEAT_COHERENCY; 2296 + case IOMMU_CAP_ENFORCE_CACHE_COHERENCY: 2297 + return arm_smmu_master_canwbs(master); 2316 2298 case IOMMU_CAP_NOEXEC: 2317 2299 case IOMMU_CAP_DEFERRED_FLUSH: 2318 2300 return true; ··· 2323 2301 default: 2324 2302 return false; 2325 2303 } 2304 + } 2305 + 2306 + static bool arm_smmu_enforce_cache_coherency(struct iommu_domain *domain) 2307 + { 2308 + struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); 2309 + struct arm_smmu_master_domain *master_domain; 2310 + unsigned long flags; 2311 + bool ret = true; 2312 + 2313 + spin_lock_irqsave(&smmu_domain->devices_lock, flags); 2314 + list_for_each_entry(master_domain, &smmu_domain->devices, 2315 + devices_elm) { 2316 + if (!arm_smmu_master_canwbs(master_domain->master)) { 2317 + ret = false; 2318 + break; 2319 + } 2320 + } 2321 + smmu_domain->enforce_cache_coherency = ret; 2322 + spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); 2323 + return ret; 2326 2324 } 2327 2325 2328 2326 struct arm_smmu_domain *arm_smmu_domain_alloc(void) ··· 2484 2442 pgtbl_cfg.oas = smmu->oas; 2485 2443 fmt = ARM_64_LPAE_S2; 2486 2444 finalise_stage_fn = arm_smmu_domain_finalise_s2; 2445 + if ((smmu->features & ARM_SMMU_FEAT_S2FWB) && 2446 + (flags & IOMMU_HWPT_ALLOC_NEST_PARENT)) 2447 + pgtbl_cfg.quirks |= IO_PGTABLE_QUIRK_ARM_S2FWB; 2487 2448 break; 2488 2449 default: 2489 2450 return -EINVAL; ··· 2528 2483 } 2529 2484 } 2530 2485 2531 - static void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, 2532 - const struct arm_smmu_ste *target) 2486 + void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, 2487 + const struct arm_smmu_ste *target) 2533 2488 { 2534 2489 int i, j; 2535 2490 struct arm_smmu_device *smmu = master->smmu; ··· 2640 2595 static struct arm_smmu_master_domain * 2641 2596 
arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain, 2642 2597 struct arm_smmu_master *master, 2643 - ioasid_t ssid) 2598 + ioasid_t ssid, bool nested_ats_flush) 2644 2599 { 2645 2600 struct arm_smmu_master_domain *master_domain; 2646 2601 ··· 2649 2604 list_for_each_entry(master_domain, &smmu_domain->devices, 2650 2605 devices_elm) { 2651 2606 if (master_domain->master == master && 2652 - master_domain->ssid == ssid) 2607 + master_domain->ssid == ssid && 2608 + master_domain->nested_ats_flush == nested_ats_flush) 2653 2609 return master_domain; 2654 2610 } 2655 2611 return NULL; ··· 2670 2624 if ((domain->type & __IOMMU_DOMAIN_PAGING) || 2671 2625 domain->type == IOMMU_DOMAIN_SVA) 2672 2626 return to_smmu_domain(domain); 2627 + if (domain->type == IOMMU_DOMAIN_NESTED) 2628 + return to_smmu_nested_domain(domain)->vsmmu->s2_parent; 2673 2629 return NULL; 2674 2630 } 2675 2631 ··· 2681 2633 { 2682 2634 struct arm_smmu_domain *smmu_domain = to_smmu_domain_devices(domain); 2683 2635 struct arm_smmu_master_domain *master_domain; 2636 + bool nested_ats_flush = false; 2684 2637 unsigned long flags; 2685 2638 2686 2639 if (!smmu_domain) 2687 2640 return; 2688 2641 2642 + if (domain->type == IOMMU_DOMAIN_NESTED) 2643 + nested_ats_flush = to_smmu_nested_domain(domain)->enable_ats; 2644 + 2689 2645 spin_lock_irqsave(&smmu_domain->devices_lock, flags); 2690 - master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid); 2646 + master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid, 2647 + nested_ats_flush); 2691 2648 if (master_domain) { 2692 2649 list_del(&master_domain->devices_elm); 2693 2650 kfree(master_domain); ··· 2701 2648 } 2702 2649 spin_unlock_irqrestore(&smmu_domain->devices_lock, flags); 2703 2650 } 2704 - 2705 - struct arm_smmu_attach_state { 2706 - /* Inputs */ 2707 - struct iommu_domain *old_domain; 2708 - struct arm_smmu_master *master; 2709 - bool cd_needs_ats; 2710 - ioasid_t ssid; 2711 - /* Resulting state */ 2712 - bool 
ats_enabled; 2713 - }; 2714 2651 2715 2652 /* 2716 2653 * Start the sequence to attach a domain to a master. The sequence contains three ··· 2722 2679 * new_domain can be a non-paging domain. In this case ATS will not be enabled, 2723 2680 * and invalidations won't be tracked. 2724 2681 */ 2725 - static int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, 2726 - struct iommu_domain *new_domain) 2682 + int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, 2683 + struct iommu_domain *new_domain) 2727 2684 { 2728 2685 struct arm_smmu_master *master = state->master; 2729 2686 struct arm_smmu_master_domain *master_domain; ··· 2749 2706 * enabled if we have arm_smmu_domain, those always have page 2750 2707 * tables. 2751 2708 */ 2752 - state->ats_enabled = arm_smmu_ats_supported(master); 2709 + state->ats_enabled = !state->disable_ats && 2710 + arm_smmu_ats_supported(master); 2753 2711 } 2754 2712 2755 2713 if (smmu_domain) { ··· 2759 2715 return -ENOMEM; 2760 2716 master_domain->master = master; 2761 2717 master_domain->ssid = state->ssid; 2718 + if (new_domain->type == IOMMU_DOMAIN_NESTED) 2719 + master_domain->nested_ats_flush = 2720 + to_smmu_nested_domain(new_domain)->enable_ats; 2762 2721 2763 2722 /* 2764 2723 * During prepare we want the current smmu_domain and new ··· 2778 2731 * one of them. 2779 2732 */ 2780 2733 spin_lock_irqsave(&smmu_domain->devices_lock, flags); 2734 + if (smmu_domain->enforce_cache_coherency && 2735 + !arm_smmu_master_canwbs(master)) { 2736 + spin_unlock_irqrestore(&smmu_domain->devices_lock, 2737 + flags); 2738 + kfree(master_domain); 2739 + return -EINVAL; 2740 + } 2741 + 2781 2742 if (state->ats_enabled) 2782 2743 atomic_inc(&smmu_domain->nr_ats_masters); 2783 2744 list_add(&master_domain->devices_elm, &smmu_domain->devices); ··· 2809 2754 * completes synchronizing the PCI device's ATC and finishes manipulating the 2810 2755 * smmu_domain->devices list. 
2811 2756 */ 2812 - static void arm_smmu_attach_commit(struct arm_smmu_attach_state *state) 2757 + void arm_smmu_attach_commit(struct arm_smmu_attach_state *state) 2813 2758 { 2814 2759 struct arm_smmu_master *master = state->master; 2815 2760 ··· 3139 3084 const struct iommu_user_data *user_data) 3140 3085 { 3141 3086 struct arm_smmu_master *master = dev_iommu_priv_get(dev); 3142 - const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING; 3087 + const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING | 3088 + IOMMU_HWPT_ALLOC_NEST_PARENT; 3143 3089 struct arm_smmu_domain *smmu_domain; 3144 3090 int ret; 3145 3091 ··· 3152 3096 smmu_domain = arm_smmu_domain_alloc(); 3153 3097 if (IS_ERR(smmu_domain)) 3154 3098 return ERR_CAST(smmu_domain); 3099 + 3100 + if (flags & IOMMU_HWPT_ALLOC_NEST_PARENT) { 3101 + if (!(master->smmu->features & ARM_SMMU_FEAT_NESTING)) { 3102 + ret = -EOPNOTSUPP; 3103 + goto err_free; 3104 + } 3105 + smmu_domain->stage = ARM_SMMU_DOMAIN_S2; 3106 + smmu_domain->nest_parent = true; 3107 + } 3155 3108 3156 3109 smmu_domain->domain.type = IOMMU_DOMAIN_UNMANAGED; 3157 3110 smmu_domain->domain.ops = arm_smmu_ops.default_domain_ops; ··· 3443 3378 return group; 3444 3379 } 3445 3380 3446 - static int arm_smmu_enable_nesting(struct iommu_domain *domain) 3447 - { 3448 - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); 3449 - int ret = 0; 3450 - 3451 - mutex_lock(&smmu_domain->init_mutex); 3452 - if (smmu_domain->smmu) 3453 - ret = -EPERM; 3454 - else 3455 - smmu_domain->stage = ARM_SMMU_DOMAIN_S2; 3456 - mutex_unlock(&smmu_domain->init_mutex); 3457 - 3458 - return ret; 3459 - } 3460 - 3461 3381 static int arm_smmu_of_xlate(struct device *dev, 3462 3382 const struct of_phandle_args *args) 3463 3383 { ··· 3541 3491 .identity_domain = &arm_smmu_identity_domain, 3542 3492 .blocked_domain = &arm_smmu_blocked_domain, 3543 3493 .capable = arm_smmu_capable, 3494 + .hw_info = arm_smmu_hw_info, 3544 3495 .domain_alloc_paging = 
arm_smmu_domain_alloc_paging, 3545 3496 .domain_alloc_sva = arm_smmu_sva_domain_alloc, 3546 3497 .domain_alloc_user = arm_smmu_domain_alloc_user, ··· 3555 3504 .dev_disable_feat = arm_smmu_dev_disable_feature, 3556 3505 .page_response = arm_smmu_page_response, 3557 3506 .def_domain_type = arm_smmu_def_domain_type, 3507 + .viommu_alloc = arm_vsmmu_alloc, 3508 + .user_pasid_table = 1, 3558 3509 .pgsize_bitmap = -1UL, /* Restricted during device attach */ 3559 3510 .owner = THIS_MODULE, 3560 3511 .default_domain_ops = &(const struct iommu_domain_ops) { 3561 3512 .attach_dev = arm_smmu_attach_dev, 3513 + .enforce_cache_coherency = arm_smmu_enforce_cache_coherency, 3562 3514 .set_dev_pasid = arm_smmu_s1_set_dev_pasid, 3563 3515 .map_pages = arm_smmu_map_pages, 3564 3516 .unmap_pages = arm_smmu_unmap_pages, 3565 3517 .flush_iotlb_all = arm_smmu_flush_iotlb_all, 3566 3518 .iotlb_sync = arm_smmu_iotlb_sync, 3567 3519 .iova_to_phys = arm_smmu_iova_to_phys, 3568 - .enable_nesting = arm_smmu_enable_nesting, 3569 3520 .free = arm_smmu_domain_free_paging, 3570 3521 } 3571 3522 };
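The new arm_smmu_enforce_cache_coherency() hook above only succeeds when every master already attached to the domain supports write-back snooping (CANWBS); one incapable device blocks enforcement for the whole domain. A minimal standalone sketch of that all-devices check (the `struct master` here is a simplified stand-in, not the real `struct arm_smmu_master`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct arm_smmu_master: only the CANWBS bit. */
struct master {
    bool canwbs;
};

/* Mirrors the device-list walk in arm_smmu_enforce_cache_coherency():
 * coherency can be enforced only if every attached master can
 * write-back snoop. */
static bool domain_can_enforce_coherency(const struct master *masters,
                                         size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (!masters[i].canwbs)
            return false;
    return true;
}
```

The real function also records the result in `smmu_domain->enforce_cache_coherency` under `devices_lock`, so later attaches of a non-CANWBS master can be rejected in arm_smmu_attach_prepare().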
+87 -5
drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h
··· 10 10 11 11 #include <linux/bitfield.h> 12 12 #include <linux/iommu.h> 13 + #include <linux/iommufd.h> 13 14 #include <linux/kernel.h> 14 15 #include <linux/mmzone.h> 15 16 #include <linux/sizes.h> ··· 58 57 #define IDR1_SIDSIZE GENMASK(5, 0) 59 58 60 59 #define ARM_SMMU_IDR3 0xc 60 + #define IDR3_FWB (1 << 8) 61 61 #define IDR3_RIL (1 << 10) 62 62 63 63 #define ARM_SMMU_IDR5 0x14 ··· 82 80 #define IIDR_VARIANT GENMASK(19, 16) 83 81 #define IIDR_REVISION GENMASK(15, 12) 84 82 #define IIDR_IMPLEMENTER GENMASK(11, 0) 83 + 84 + #define ARM_SMMU_AIDR 0x1C 85 85 86 86 #define ARM_SMMU_CR0 0x20 87 87 #define CR0_ATSCHK (1 << 4) ··· 245 241 #define STRTAB_STE_0_CFG_BYPASS 4 246 242 #define STRTAB_STE_0_CFG_S1_TRANS 5 247 243 #define STRTAB_STE_0_CFG_S2_TRANS 6 244 + #define STRTAB_STE_0_CFG_NESTED 7 248 245 249 246 #define STRTAB_STE_0_S1FMT GENMASK_ULL(5, 4) 250 247 #define STRTAB_STE_0_S1FMT_LINEAR 0 ··· 266 261 #define STRTAB_STE_1_S1COR GENMASK_ULL(5, 4) 267 262 #define STRTAB_STE_1_S1CSH GENMASK_ULL(7, 6) 268 263 264 + #define STRTAB_STE_1_S2FWB (1UL << 25) 269 265 #define STRTAB_STE_1_S1STALLD (1UL << 27) 270 266 271 267 #define STRTAB_STE_1_EATS GENMASK_ULL(29, 28) ··· 297 291 #define STRTAB_STE_2_S2R (1UL << 58) 298 292 299 293 #define STRTAB_STE_3_S2TTB_MASK GENMASK_ULL(51, 4) 294 + 295 + /* These bits can be controlled by userspace for STRTAB_STE_0_CFG_NESTED */ 296 + #define STRTAB_STE_0_NESTING_ALLOWED \ 297 + cpu_to_le64(STRTAB_STE_0_V | STRTAB_STE_0_CFG | STRTAB_STE_0_S1FMT | \ 298 + STRTAB_STE_0_S1CTXPTR_MASK | STRTAB_STE_0_S1CDMAX) 299 + #define STRTAB_STE_1_NESTING_ALLOWED \ 300 + cpu_to_le64(STRTAB_STE_1_S1DSS | STRTAB_STE_1_S1CIR | \ 301 + STRTAB_STE_1_S1COR | STRTAB_STE_1_S1CSH | \ 302 + STRTAB_STE_1_S1STALLD | STRTAB_STE_1_EATS) 300 303 301 304 /* 302 305 * Context descriptors. 
··· 526 511 }; 527 512 } cfgi; 528 513 514 + #define CMDQ_OP_TLBI_NH_ALL 0x10 529 515 #define CMDQ_OP_TLBI_NH_ASID 0x11 530 516 #define CMDQ_OP_TLBI_NH_VA 0x12 517 + #define CMDQ_OP_TLBI_NH_VAA 0x13 531 518 #define CMDQ_OP_TLBI_EL2_ALL 0x20 532 519 #define CMDQ_OP_TLBI_EL2_ASID 0x21 533 520 #define CMDQ_OP_TLBI_EL2_VA 0x22 ··· 743 726 #define ARM_SMMU_FEAT_ATTR_TYPES_OVR (1 << 20) 744 727 #define ARM_SMMU_FEAT_HA (1 << 21) 745 728 #define ARM_SMMU_FEAT_HD (1 << 22) 729 + #define ARM_SMMU_FEAT_S2FWB (1 << 23) 746 730 u32 features; 747 731 748 732 #define ARM_SMMU_OPT_SKIP_PREFETCH (1 << 0) ··· 829 811 /* List of struct arm_smmu_master_domain */ 830 812 struct list_head devices; 831 813 spinlock_t devices_lock; 814 + bool enforce_cache_coherency : 1; 815 + bool nest_parent : 1; 832 816 833 817 struct mmu_notifier mmu_notifier; 818 + }; 819 + 820 + struct arm_smmu_nested_domain { 821 + struct iommu_domain domain; 822 + struct arm_vsmmu *vsmmu; 823 + bool enable_ats : 1; 824 + 825 + __le64 ste[2]; 834 826 }; 835 827 836 828 /* The following are exposed for testing purposes. 
*/ ··· 855 827 void (*sync)(struct arm_smmu_entry_writer *writer); 856 828 }; 857 829 830 + void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); 831 + void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, 832 + struct arm_smmu_master *master, 833 + struct arm_smmu_domain *smmu_domain, 834 + bool ats_enabled); 835 + 858 836 #if IS_ENABLED(CONFIG_KUNIT) 859 837 void arm_smmu_get_ste_used(const __le64 *ent, __le64 *used_bits); 860 838 void arm_smmu_write_entry(struct arm_smmu_entry_writer *writer, __le64 *cur, 861 839 const __le64 *target); 862 840 void arm_smmu_get_cd_used(const __le64 *ent, __le64 *used_bits); 863 - void arm_smmu_make_abort_ste(struct arm_smmu_ste *target); 864 841 void arm_smmu_make_bypass_ste(struct arm_smmu_device *smmu, 865 842 struct arm_smmu_ste *target); 866 843 void arm_smmu_make_cdtable_ste(struct arm_smmu_ste *target, 867 844 struct arm_smmu_master *master, bool ats_enabled, 868 845 unsigned int s1dss); 869 - void arm_smmu_make_s2_domain_ste(struct arm_smmu_ste *target, 870 - struct arm_smmu_master *master, 871 - struct arm_smmu_domain *smmu_domain, 872 - bool ats_enabled); 873 846 void arm_smmu_make_sva_cd(struct arm_smmu_cd *target, 874 847 struct arm_smmu_master *master, struct mm_struct *mm, 875 848 u16 asid); ··· 880 851 struct list_head devices_elm; 881 852 struct arm_smmu_master *master; 882 853 ioasid_t ssid; 854 + bool nested_ats_flush : 1; 883 855 }; 884 856 885 857 static inline struct arm_smmu_domain *to_smmu_domain(struct iommu_domain *dom) 886 858 { 887 859 return container_of(dom, struct arm_smmu_domain, domain); 860 + } 861 + 862 + static inline struct arm_smmu_nested_domain * 863 + to_smmu_nested_domain(struct iommu_domain *dom) 864 + { 865 + return container_of(dom, struct arm_smmu_nested_domain, domain); 888 866 } 889 867 890 868 extern struct xarray arm_smmu_asid_xa; ··· 928 892 size_t dwords, const char *name); 929 893 int arm_smmu_cmdq_init(struct arm_smmu_device *smmu, 930 894 struct arm_smmu_cmdq 
*cmdq); 895 + 896 + static inline bool arm_smmu_master_canwbs(struct arm_smmu_master *master) 897 + { 898 + return dev_iommu_fwspec_get(master->dev)->flags & 899 + IOMMU_FWSPEC_PCI_RC_CANWBS; 900 + } 901 + 902 + struct arm_smmu_attach_state { 903 + /* Inputs */ 904 + struct iommu_domain *old_domain; 905 + struct arm_smmu_master *master; 906 + bool cd_needs_ats; 907 + bool disable_ats; 908 + ioasid_t ssid; 909 + /* Resulting state */ 910 + bool ats_enabled; 911 + }; 912 + 913 + int arm_smmu_attach_prepare(struct arm_smmu_attach_state *state, 914 + struct iommu_domain *new_domain); 915 + void arm_smmu_attach_commit(struct arm_smmu_attach_state *state); 916 + void arm_smmu_install_ste_for_dev(struct arm_smmu_master *master, 917 + const struct arm_smmu_ste *target); 918 + 919 + int arm_smmu_cmdq_issue_cmdlist(struct arm_smmu_device *smmu, 920 + struct arm_smmu_cmdq *cmdq, u64 *cmds, int n, 921 + bool sync); 931 922 932 923 #ifdef CONFIG_ARM_SMMU_V3_SVA 933 924 bool arm_smmu_sva_supported(struct arm_smmu_device *smmu); ··· 1012 949 return ERR_PTR(-ENODEV); 1013 950 } 1014 951 #endif /* CONFIG_TEGRA241_CMDQV */ 952 + 953 + struct arm_vsmmu { 954 + struct iommufd_viommu core; 955 + struct arm_smmu_device *smmu; 956 + struct arm_smmu_domain *s2_parent; 957 + u16 vmid; 958 + }; 959 + 960 + #if IS_ENABLED(CONFIG_ARM_SMMU_V3_IOMMUFD) 961 + void *arm_smmu_hw_info(struct device *dev, u32 *length, u32 *type); 962 + struct iommufd_viommu *arm_vsmmu_alloc(struct device *dev, 963 + struct iommu_domain *parent, 964 + struct iommufd_ctx *ictx, 965 + unsigned int viommu_type); 966 + #else 967 + #define arm_smmu_hw_info NULL 968 + #define arm_vsmmu_alloc NULL 969 + #endif /* CONFIG_ARM_SMMU_V3_IOMMUFD */ 970 + 1015 971 #endif /* _ARM_SMMU_V3_H */
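The STRTAB_STE_0_NESTING_ALLOWED / STRTAB_STE_1_NESTING_ALLOWED masks above whitelist the STE bits a VMM may control for STRTAB_STE_0_CFG_NESTED; everything else must come from the kernel-owned entry. A hedged sketch of that merge idea, with purely illustrative mask values rather than the real SMMUv3 register layout:

```c
#include <assert.h>
#include <stdint.h>

/* Combine a guest-supplied STE word with a kernel template: only bits in
 * the allowed mask may be taken from the guest, the rest are forced to
 * the kernel-owned values. Mask/word values here are hypothetical. */
static uint64_t merge_nested_ste_word(uint64_t guest, uint64_t allowed,
                                      uint64_t kernel)
{
    return (guest & allowed) | (kernel & ~allowed);
}
```

This is why the masks are defined with `cpu_to_le64()` in the header: the STE words are little-endian in memory, so the merge can be done directly on the `__le64` representation.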
-16
drivers/iommu/arm/arm-smmu/arm-smmu.c
··· 1558 1558 return group; 1559 1559 } 1560 1560 1561 - static int arm_smmu_enable_nesting(struct iommu_domain *domain) 1562 - { 1563 - struct arm_smmu_domain *smmu_domain = to_smmu_domain(domain); 1564 - int ret = 0; 1565 - 1566 - mutex_lock(&smmu_domain->init_mutex); 1567 - if (smmu_domain->smmu) 1568 - ret = -EPERM; 1569 - else 1570 - smmu_domain->stage = ARM_SMMU_DOMAIN_NESTED; 1571 - mutex_unlock(&smmu_domain->init_mutex); 1572 - 1573 - return ret; 1574 - } 1575 - 1576 1561 static int arm_smmu_set_pgtable_quirks(struct iommu_domain *domain, 1577 1562 unsigned long quirks) 1578 1563 { ··· 1641 1656 .flush_iotlb_all = arm_smmu_flush_iotlb_all, 1642 1657 .iotlb_sync = arm_smmu_iotlb_sync, 1643 1658 .iova_to_phys = arm_smmu_iova_to_phys, 1644 - .enable_nesting = arm_smmu_enable_nesting, 1645 1659 .set_pgtable_quirks = arm_smmu_set_pgtable_quirks, 1646 1660 .free = arm_smmu_domain_free, 1647 1661 }
+21 -6
drivers/iommu/io-pgtable-arm.c
··· 106 106 #define ARM_LPAE_PTE_HAP_FAULT (((arm_lpae_iopte)0) << 6) 107 107 #define ARM_LPAE_PTE_HAP_READ (((arm_lpae_iopte)1) << 6) 108 108 #define ARM_LPAE_PTE_HAP_WRITE (((arm_lpae_iopte)2) << 6) 109 + /* 110 + * For !FWB these encode to: 111 + * 1111 = Normal outer write back cacheable / Inner Write Back Cacheable 112 + * Permit S1 to override 113 + * 0101 = Normal Non-cacheable / Inner Non-cacheable 114 + * 0001 = Device / Device-nGnRE 115 + * For S2FWB these encode: 116 + * 0110 Force Normal Write Back 117 + * 0101 Normal* is forced Normal-NC, Device unchanged 118 + * 0001 Force Device-nGnRE 119 + */ 120 + #define ARM_LPAE_PTE_MEMATTR_FWB_WB (((arm_lpae_iopte)0x6) << 2) 109 121 #define ARM_LPAE_PTE_MEMATTR_OIWB (((arm_lpae_iopte)0xf) << 2) 110 122 #define ARM_LPAE_PTE_MEMATTR_NC (((arm_lpae_iopte)0x5) << 2) 111 123 #define ARM_LPAE_PTE_MEMATTR_DEV (((arm_lpae_iopte)0x1) << 2) ··· 470 458 */ 471 459 if (data->iop.fmt == ARM_64_LPAE_S2 || 472 460 data->iop.fmt == ARM_32_LPAE_S2) { 473 - if (prot & IOMMU_MMIO) { 474 462 pte |= ARM_LPAE_PTE_MEMATTR_DEV; 475 - else if (prot & IOMMU_CACHE) 476 - pte |= ARM_LPAE_PTE_MEMATTR_OIWB; 477 - else 463 + } else if (prot & IOMMU_CACHE) { 464 + if (data->iop.cfg.quirks & IO_PGTABLE_QUIRK_ARM_S2FWB) 465 + pte |= ARM_LPAE_PTE_MEMATTR_FWB_WB; 466 + else 467 + pte |= ARM_LPAE_PTE_MEMATTR_OIWB; 468 + } else { 478 469 pte |= ARM_LPAE_PTE_MEMATTR_NC; 470 + } 479 471 } else { 480 472 if (prot & IOMMU_MMIO) 481 473 pte |= (ARM_LPAE_MAIR_ATTR_IDX_DEV ··· 1051 1035 struct arm_lpae_io_pgtable *data; 1052 1036 typeof(&cfg->arm_lpae_s2_cfg.vtcr) vtcr = &cfg->arm_lpae_s2_cfg.vtcr; 1053 1037 1054 - /* The NS quirk doesn't apply at stage 2 */ 1055 - if (cfg->quirks) 1038 + if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_S2FWB)) 1056 1039 return NULL; 1057 1040 1058 1041 data = arm_lpae_alloc_pgtable(cfg);
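The quirk-dependent stage-2 memory-attribute selection in the io-pgtable-arm.c hunk above condenses to a small pure function. This sketch mirrors the S2 branch of the PTE-building code, with the constants matching the #defines in the diff (attribute nibble shifted into PTE bits [5:2]):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MEMATTR_FWB_WB (0x6ULL << 2) /* S2FWB: force Normal Write-Back */
#define MEMATTR_OIWB   (0xfULL << 2) /* !FWB: Outer/Inner Write-Back   */
#define MEMATTR_NC     (0x5ULL << 2) /* Normal Non-cacheable           */
#define MEMATTR_DEV    (0x1ULL << 2) /* Device-nGnRE                   */

/* Mirror of the S2 arm of the prot-to-PTE logic: pick the stage-2
 * MemAttr field from the IOMMU_MMIO/IOMMU_CACHE prot bits, honoring
 * the IO_PGTABLE_QUIRK_ARM_S2FWB quirk for cacheable mappings. */
static uint64_t s2_memattr(bool mmio, bool cache, bool s2fwb_quirk)
{
    if (mmio)
        return MEMATTR_DEV;
    if (cache)
        return s2fwb_quirk ? MEMATTR_FWB_WB : MEMATTR_OIWB;
    return MEMATTR_NC;
}
```

With S2FWB the 0b0110 encoding forces Normal Write-Back regardless of the stage-1 attributes, which is what lets the SMMU driver skip trusting the guest's memattr choices.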
-10
drivers/iommu/iommu.c
··· 2723 2723 } 2724 2724 core_initcall(iommu_init); 2725 2725 2726 - int iommu_enable_nesting(struct iommu_domain *domain) 2727 - { 2728 - if (domain->type != IOMMU_DOMAIN_UNMANAGED) 2729 - return -EINVAL; 2730 - if (!domain->ops->enable_nesting) 2731 - return -EINVAL; 2732 - return domain->ops->enable_nesting(domain); 2733 - } 2734 - EXPORT_SYMBOL_GPL(iommu_enable_nesting); 2735 - 2736 2726 int iommu_set_pgtable_quirks(struct iommu_domain *domain, 2737 2727 unsigned long quirk) 2738 2728 {
+4
drivers/iommu/iommufd/Kconfig
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 + config IOMMUFD_DRIVER_CORE 3 + tristate 4 + default (IOMMUFD_DRIVER || IOMMUFD) if IOMMUFD!=n 5 + 2 6 config IOMMUFD 3 7 tristate "IOMMU Userspace API" 4 8 select INTERVAL_TREE
+5 -1
drivers/iommu/iommufd/Makefile
··· 7 7 ioas.o \ 8 8 main.o \ 9 9 pages.o \ 10 - vfio_compat.o 10 + vfio_compat.o \ 11 + viommu.o 11 12 12 13 iommufd-$(CONFIG_IOMMUFD_TEST) += selftest.o 13 14 14 15 obj-$(CONFIG_IOMMUFD) += iommufd.o 15 16 obj-$(CONFIG_IOMMUFD_DRIVER) += iova_bitmap.o 17 + 18 + iommufd_driver-y := driver.o 19 + obj-$(CONFIG_IOMMUFD_DRIVER_CORE) += iommufd_driver.o
+53
drivers/iommu/iommufd/driver.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES 3 + */ 4 + #include "iommufd_private.h" 5 + 6 + struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx, 7 + size_t size, 8 + enum iommufd_object_type type) 9 + { 10 + struct iommufd_object *obj; 11 + int rc; 12 + 13 + obj = kzalloc(size, GFP_KERNEL_ACCOUNT); 14 + if (!obj) 15 + return ERR_PTR(-ENOMEM); 16 + obj->type = type; 17 + /* Starts out bias'd by 1 until it is removed from the xarray */ 18 + refcount_set(&obj->shortterm_users, 1); 19 + refcount_set(&obj->users, 1); 20 + 21 + /* 22 + * Reserve an ID in the xarray but do not publish the pointer yet since 23 + * the caller hasn't initialized it yet. Once the pointer is published 24 + * in the xarray and visible to other threads we can't reliably destroy 25 + * it anymore, so the caller must complete all errorable operations 26 + * before calling iommufd_object_finalize(). 27 + */ 28 + rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY, xa_limit_31b, 29 + GFP_KERNEL_ACCOUNT); 30 + if (rc) 31 + goto out_free; 32 + return obj; 33 + out_free: 34 + kfree(obj); 35 + return ERR_PTR(rc); 36 + } 37 + EXPORT_SYMBOL_NS_GPL(_iommufd_object_alloc, IOMMUFD); 38 + 39 + /* Caller should xa_lock(&viommu->vdevs) to protect the return value */ 40 + struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu, 41 + unsigned long vdev_id) 42 + { 43 + struct iommufd_vdevice *vdev; 44 + 45 + lockdep_assert_held(&viommu->vdevs.xa_lock); 46 + 47 + vdev = xa_load(&viommu->vdevs, vdev_id); 48 + return vdev ? vdev->dev : NULL; 49 + } 50 + EXPORT_SYMBOL_NS_GPL(iommufd_viommu_find_dev, IOMMUFD); 51 + 52 + MODULE_DESCRIPTION("iommufd code shared with builtin modules"); 53 + MODULE_LICENSE("GPL");
+7 -2
drivers/iommu/iommufd/fault.c
··· 10 10 #include <linux/module.h> 11 11 #include <linux/mutex.h> 12 12 #include <linux/pci.h> 13 + #include <linux/pci-ats.h> 13 14 #include <linux/poll.h> 14 15 #include <uapi/linux/iommufd.h> 15 16 ··· 28 27 * resource between PF and VFs. There is no coordination for this 29 28 * shared capability. This waits for a vPRI reset to recover. 30 29 */ 31 - if (dev_is_pci(dev) && to_pci_dev(dev)->is_virtfn) 32 - return -EINVAL; 30 + if (dev_is_pci(dev)) { 31 + struct pci_dev *pdev = to_pci_dev(dev); 32 + 33 + if (pdev->is_virtfn && pci_pri_supported(pdev)) 34 + return -EINVAL; 35 + } 33 36 34 37 mutex_lock(&idev->iopf_lock); 35 38 /* Device iopf has already been on. */
+103 -10
drivers/iommu/iommufd/hw_pagetable.c
··· 57 57 container_of(obj, struct iommufd_hwpt_nested, common.obj); 58 58 59 59 __iommufd_hwpt_destroy(&hwpt_nested->common); 60 - refcount_dec(&hwpt_nested->parent->common.obj.users); 60 + if (hwpt_nested->viommu) 61 + refcount_dec(&hwpt_nested->viommu->obj.users); 62 + else 63 + refcount_dec(&hwpt_nested->parent->common.obj.users); 61 64 } 62 65 63 66 void iommufd_hwpt_nested_abort(struct iommufd_object *obj) ··· 251 248 } 252 249 hwpt->domain->owner = ops; 253 250 254 - if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED || 255 - !hwpt->domain->ops->cache_invalidate_user)) { 251 + if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) { 256 252 rc = -EINVAL; 257 253 goto out_abort; 258 254 } ··· 259 257 260 258 out_abort: 261 259 iommufd_object_abort_and_destroy(ictx, &hwpt->obj); 260 + return ERR_PTR(rc); 261 + } 262 + 263 + /** 264 + * iommufd_viommu_alloc_hwpt_nested() - Get a hwpt_nested for a vIOMMU 265 + * @viommu: vIOMMU object to associate the hwpt_nested/domain with 266 + * @flags: Flags from userspace 267 + * @user_data: user_data pointer. Must be valid 268 + * 269 + * Allocate a new IOMMU_DOMAIN_NESTED for a vIOMMU and return it as a NESTED 270 + * hw_pagetable.
271 + */ 272 + static struct iommufd_hwpt_nested * 273 + iommufd_viommu_alloc_hwpt_nested(struct iommufd_viommu *viommu, u32 flags, 274 + const struct iommu_user_data *user_data) 275 + { 276 + struct iommufd_hwpt_nested *hwpt_nested; 277 + struct iommufd_hw_pagetable *hwpt; 278 + int rc; 279 + 280 + if (!user_data->len) 281 + return ERR_PTR(-EOPNOTSUPP); 282 + if (!viommu->ops || !viommu->ops->alloc_domain_nested) 283 + return ERR_PTR(-EOPNOTSUPP); 284 + 285 + hwpt_nested = __iommufd_object_alloc( 286 + viommu->ictx, hwpt_nested, IOMMUFD_OBJ_HWPT_NESTED, common.obj); 287 + if (IS_ERR(hwpt_nested)) 288 + return ERR_CAST(hwpt_nested); 289 + hwpt = &hwpt_nested->common; 290 + 291 + hwpt_nested->viommu = viommu; 292 + refcount_inc(&viommu->obj.users); 293 + hwpt_nested->parent = viommu->hwpt; 294 + 295 + hwpt->domain = 296 + viommu->ops->alloc_domain_nested(viommu, flags, user_data); 297 + if (IS_ERR(hwpt->domain)) { 298 + rc = PTR_ERR(hwpt->domain); 299 + hwpt->domain = NULL; 300 + goto out_abort; 301 + } 302 + hwpt->domain->owner = viommu->iommu_dev->ops; 303 + 304 + if (WARN_ON_ONCE(hwpt->domain->type != IOMMU_DOMAIN_NESTED)) { 305 + rc = -EINVAL; 306 + goto out_abort; 307 + } 308 + return hwpt_nested; 309 + 310 + out_abort: 311 + iommufd_object_abort_and_destroy(viommu->ictx, &hwpt->obj); 262 312 return ERR_PTR(rc); 263 313 } 264 314 ··· 365 311 container_of(pt_obj, struct iommufd_hwpt_paging, 366 312 common.obj), 367 313 idev, cmd->flags, &user_data); 314 + if (IS_ERR(hwpt_nested)) { 315 + rc = PTR_ERR(hwpt_nested); 316 + goto out_unlock; 317 + } 318 + hwpt = &hwpt_nested->common; 319 + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) { 320 + struct iommufd_hwpt_nested *hwpt_nested; 321 + struct iommufd_viommu *viommu; 322 + 323 + viommu = container_of(pt_obj, struct iommufd_viommu, obj); 324 + if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) { 325 + rc = -EINVAL; 326 + goto out_unlock; 327 + } 328 + hwpt_nested = iommufd_viommu_alloc_hwpt_nested( 329 + 
viommu, cmd->flags, &user_data); 368 330 if (IS_ERR(hwpt_nested)) { 369 331 rc = PTR_ERR(hwpt_nested); 370 332 goto out_unlock; ··· 482 412 .entry_len = cmd->entry_len, 483 413 .entry_num = cmd->entry_num, 484 414 }; 485 - struct iommufd_hw_pagetable *hwpt; 415 + struct iommufd_object *pt_obj; 486 416 u32 done_num = 0; 487 417 int rc; 488 418 ··· 496 426 goto out; 497 427 } 498 428 499 - hwpt = iommufd_get_hwpt_nested(ucmd, cmd->hwpt_id); 500 - if (IS_ERR(hwpt)) { 501 - rc = PTR_ERR(hwpt); 429 + pt_obj = iommufd_get_object(ucmd->ictx, cmd->hwpt_id, IOMMUFD_OBJ_ANY); 430 + if (IS_ERR(pt_obj)) { 431 + rc = PTR_ERR(pt_obj); 502 432 goto out; 503 433 } 434 + if (pt_obj->type == IOMMUFD_OBJ_HWPT_NESTED) { 435 + struct iommufd_hw_pagetable *hwpt = 436 + container_of(pt_obj, struct iommufd_hw_pagetable, obj); 504 437 505 - rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, 506 - &data_array); 438 + if (!hwpt->domain->ops || 439 + !hwpt->domain->ops->cache_invalidate_user) { 440 + rc = -EOPNOTSUPP; 441 + goto out_put_pt; 442 + } 443 + rc = hwpt->domain->ops->cache_invalidate_user(hwpt->domain, 444 + &data_array); 445 + } else if (pt_obj->type == IOMMUFD_OBJ_VIOMMU) { 446 + struct iommufd_viommu *viommu = 447 + container_of(pt_obj, struct iommufd_viommu, obj); 448 + 449 + if (!viommu->ops || !viommu->ops->cache_invalidate) { 450 + rc = -EOPNOTSUPP; 451 + goto out_put_pt; 452 + } 453 + rc = viommu->ops->cache_invalidate(viommu, &data_array); 454 + } else { 455 + rc = -EINVAL; 456 + goto out_put_pt; 457 + } 458 + 507 459 done_num = data_array.entry_num; 508 460 509 - iommufd_put_object(ucmd->ictx, &hwpt->obj); 461 + out_put_pt: 462 + iommufd_put_object(ucmd->ictx, pt_obj); 510 463 out: 511 464 cmd->entry_num = done_num; 512 465 if (iommufd_ucmd_respond(ucmd, sizeof(*cmd)))
+76 -29
drivers/iommu/iommufd/io_pagetable.c
···
  * Does not return a 0 IOVA even if it is valid.
  */
 static int iopt_alloc_iova(struct io_pagetable *iopt, unsigned long *iova,
-                           unsigned long uptr, unsigned long length)
+                           unsigned long addr, unsigned long length)
 {
-    unsigned long page_offset = uptr % PAGE_SIZE;
+    unsigned long page_offset = addr % PAGE_SIZE;
     struct interval_tree_double_span_iter used_span;
     struct interval_tree_span_iter allowed_span;
     unsigned long max_alignment = PAGE_SIZE;
···
         return -EOVERFLOW;
 
     /*
-     * Keep alignment present in the uptr when building the IOVA, this
+     * Keep alignment present in addr when building the IOVA, which
      * increases the chance we can map a THP.
      */
-    if (!uptr)
+    if (!addr)
         iova_alignment = roundup_pow_of_two(length);
     else
         iova_alignment = min_t(unsigned long,
                                roundup_pow_of_two(length),
-                               1UL << __ffs64(uptr));
+                               1UL << __ffs64(addr));
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
     max_alignment = HPAGE_SIZE;
···
                    int iommu_prot, unsigned int flags)
 {
     struct iopt_pages_list *elm;
+    unsigned long start;
     unsigned long iova;
     int rc = 0;
···
         /* Use the first entry to guess the ideal IOVA alignment */
         elm = list_first_entry(pages_list, struct iopt_pages_list,
                                next);
-        rc = iopt_alloc_iova(
-            iopt, dst_iova,
-            (uintptr_t)elm->pages->uptr + elm->start_byte, length);
+        switch (elm->pages->type) {
+        case IOPT_ADDRESS_USER:
+            start = elm->start_byte + (uintptr_t)elm->pages->uptr;
+            break;
+        case IOPT_ADDRESS_FILE:
+            start = elm->start_byte + elm->pages->start;
+            break;
+        }
+        rc = iopt_alloc_iova(iopt, dst_iova, start, length);
         if (rc)
             goto out_unlock;
         if (IS_ENABLED(CONFIG_IOMMUFD_TEST) &&
···
     return rc;
 }
 
+static int
+iopt_map_common(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+                struct iopt_pages *pages, unsigned long *iova,
+                unsigned long length, unsigned long start_byte,
+                int iommu_prot, unsigned int flags)
+{
+    struct iopt_pages_list elm = {};
+    LIST_HEAD(pages_list);
+    int rc;
+
+    elm.pages = pages;
+    elm.start_byte = start_byte;
+    if (ictx->account_mode == IOPT_PAGES_ACCOUNT_MM &&
+        elm.pages->account_mode == IOPT_PAGES_ACCOUNT_USER)
+        elm.pages->account_mode = IOPT_PAGES_ACCOUNT_MM;
+    elm.length = length;
+    list_add(&elm.next, &pages_list);
+
+    rc = iopt_map_pages(iopt, &pages_list, length, iova, iommu_prot, flags);
+    if (rc) {
+        if (elm.area)
+            iopt_abort_area(elm.area);
+        if (elm.pages)
+            iopt_put_pages(elm.pages);
+        return rc;
+    }
+    return 0;
+}
+
 /**
  * iopt_map_user_pages() - Map a user VA to an iova in the io page table
  * @ictx: iommufd_ctx the iopt is part of
···
                         unsigned long length, int iommu_prot,
                         unsigned int flags)
 {
-    struct iopt_pages_list elm = {};
-    LIST_HEAD(pages_list);
-    int rc;
+    struct iopt_pages *pages;
 
-    elm.pages = iopt_alloc_pages(uptr, length, iommu_prot & IOMMU_WRITE);
-    if (IS_ERR(elm.pages))
-        return PTR_ERR(elm.pages);
-    if (ictx->account_mode == IOPT_PAGES_ACCOUNT_MM &&
-        elm.pages->account_mode == IOPT_PAGES_ACCOUNT_USER)
-        elm.pages->account_mode = IOPT_PAGES_ACCOUNT_MM;
-    elm.start_byte = uptr - elm.pages->uptr;
-    elm.length = length;
-    list_add(&elm.next, &pages_list);
+    pages = iopt_alloc_user_pages(uptr, length, iommu_prot & IOMMU_WRITE);
+    if (IS_ERR(pages))
+        return PTR_ERR(pages);
 
-    rc = iopt_map_pages(iopt, &pages_list, length, iova, iommu_prot, flags);
-    if (rc) {
-        if (elm.area)
-            iopt_abort_area(elm.area);
-        if (elm.pages)
-            iopt_put_pages(elm.pages);
-        return rc;
-    }
-    return 0;
+    return iopt_map_common(ictx, iopt, pages, iova, length,
+                           uptr - pages->uptr, iommu_prot, flags);
+}
+
+/**
+ * iopt_map_file_pages() - Like iopt_map_user_pages, but map a file.
+ * @ictx: iommufd_ctx the iopt is part of
+ * @iopt: io_pagetable to act on
+ * @iova: If IOPT_ALLOC_IOVA is set this is unused on input and contains
+ *        the chosen iova on output. Otherwise is the iova to map to on input
+ * @file: file to map
+ * @start: map file starting at this byte offset
+ * @length: Number of bytes to map
+ * @iommu_prot: Combination of IOMMU_READ/WRITE/etc bits for the mapping
+ * @flags: IOPT_ALLOC_IOVA or zero
+ */
+int iopt_map_file_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+                        unsigned long *iova, struct file *file,
+                        unsigned long start, unsigned long length,
+                        int iommu_prot, unsigned int flags)
+{
+    struct iopt_pages *pages;
+
+    pages = iopt_alloc_file_pages(file, start, length,
+                                  iommu_prot & IOMMU_WRITE);
+    if (IS_ERR(pages))
+        return PTR_ERR(pages);
+    return iopt_map_common(ictx, iopt, pages, iova, length,
+                           start - pages->start, iommu_prot, flags);
 }
 
 struct iova_bitmap_fn_arg {
drivers/iommu/iommufd/io_pagetable.h  +23 -3
···
     IOPT_PAGES_ACCOUNT_NONE = 0,
     IOPT_PAGES_ACCOUNT_USER = 1,
     IOPT_PAGES_ACCOUNT_MM = 2,
+    IOPT_PAGES_ACCOUNT_MODE_NUM = 3,
+};
+
+enum iopt_address_type {
+    IOPT_ADDRESS_USER = 0,
+    IOPT_ADDRESS_FILE = 1,
 };
 
 /*
···
     struct task_struct *source_task;
     struct mm_struct *source_mm;
     struct user_struct *source_user;
-    void __user *uptr;
+    enum iopt_address_type type;
+    union {
+        void __user *uptr;      /* IOPT_ADDRESS_USER */
+        struct {                /* IOPT_ADDRESS_FILE */
+            struct file *file;
+            unsigned long start;
+        };
+    };
     bool writable:1;
     u8 account_mode;
···
     struct rb_root_cached domains_itree;
 };
 
-struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
-                                    bool writable);
+struct iopt_pages *iopt_alloc_user_pages(void __user *uptr,
+                                         unsigned long length, bool writable);
+struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start,
+                                         unsigned long length, bool writable);
 void iopt_release_pages(struct kref *kref);
 static inline void iopt_put_pages(struct iopt_pages *pages)
 {
···
     struct interval_tree_node node;
     unsigned int users;
 };
+
+struct pfn_reader_user;
+
+int iopt_pages_update_pinned(struct iopt_pages *pages, unsigned long npages,
+                             bool inc, struct pfn_reader_user *user);
 
 #endif
drivers/iommu/iommufd/ioas.c  +259
···
 /*
  * Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES
  */
+#include <linux/file.h>
 #include <linux/interval_tree.h>
 #include <linux/iommu.h>
 #include <linux/iommufd.h>
···
     rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
     if (rc)
         goto out_table;
+
+    down_read(&ucmd->ictx->ioas_creation_lock);
     iommufd_object_finalize(ucmd->ictx, &ioas->obj);
+    up_read(&ucmd->ictx->ioas_creation_lock);
     return 0;
 
 out_table:
···
     return iommu_prot;
 }
 
+int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd)
+{
+    struct iommu_ioas_map_file *cmd = ucmd->cmd;
+    unsigned long iova = cmd->iova;
+    struct iommufd_ioas *ioas;
+    unsigned int flags = 0;
+    struct file *file;
+    int rc;
+
+    if (cmd->flags &
+        ~(IOMMU_IOAS_MAP_FIXED_IOVA | IOMMU_IOAS_MAP_WRITEABLE |
+          IOMMU_IOAS_MAP_READABLE))
+        return -EOPNOTSUPP;
+
+    if (cmd->iova >= ULONG_MAX || cmd->length >= ULONG_MAX)
+        return -EOVERFLOW;
+
+    if (!(cmd->flags &
+          (IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_READABLE)))
+        return -EINVAL;
+
+    ioas = iommufd_get_ioas(ucmd->ictx, cmd->ioas_id);
+    if (IS_ERR(ioas))
+        return PTR_ERR(ioas);
+
+    if (!(cmd->flags & IOMMU_IOAS_MAP_FIXED_IOVA))
+        flags = IOPT_ALLOC_IOVA;
+
+    file = fget(cmd->fd);
+    if (!file)
+        return -EBADF;
+
+    rc = iopt_map_file_pages(ucmd->ictx, &ioas->iopt, &iova, file,
+                             cmd->start, cmd->length,
+                             conv_iommu_prot(cmd->flags), flags);
+    if (rc)
+        goto out_put;
+
+    cmd->iova = iova;
+    rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+out_put:
+    iommufd_put_object(ucmd->ictx, &ioas->obj);
+    fput(file);
+    return rc;
+}
+
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd)
 {
     struct iommu_ioas_map *cmd = ucmd->cmd;
···
 
 out_put:
     iommufd_put_object(ucmd->ictx, &ioas->obj);
+    return rc;
+}
+
+static void iommufd_release_all_iova_rwsem(struct iommufd_ctx *ictx,
+                                           struct xarray *ioas_list)
+{
+    struct iommufd_ioas *ioas;
+    unsigned long index;
+
+    xa_for_each(ioas_list, index, ioas) {
+        up_write(&ioas->iopt.iova_rwsem);
+        refcount_dec(&ioas->obj.users);
+    }
+    up_write(&ictx->ioas_creation_lock);
+    xa_destroy(ioas_list);
+}
+
+static int iommufd_take_all_iova_rwsem(struct iommufd_ctx *ictx,
+                                       struct xarray *ioas_list)
+{
+    struct iommufd_object *obj;
+    unsigned long index;
+    int rc;
+
+    /*
+     * This is very ugly, it is done instead of adding a lock around
+     * pages->source_mm, which is a performance path for mdev, we just
+     * obtain the write side of all the iova_rwsems which also protects the
+     * pages->source_*. Due to copies we can't know which IOAS could read
+     * from the pages, so we just lock everything. This is the only place
+     * locks are nested and they are uniformly taken in ID order.
+     *
+     * ioas_creation_lock prevents new IOAS from being installed in the
+     * xarray while we do this, and also prevents more than one thread from
+     * holding nested locks.
+     */
+    down_write(&ictx->ioas_creation_lock);
+    xa_lock(&ictx->objects);
+    xa_for_each(&ictx->objects, index, obj) {
+        struct iommufd_ioas *ioas;
+
+        if (!obj || obj->type != IOMMUFD_OBJ_IOAS)
+            continue;
+
+        if (!refcount_inc_not_zero(&obj->users))
+            continue;
+
+        xa_unlock(&ictx->objects);
+
+        ioas = container_of(obj, struct iommufd_ioas, obj);
+        down_write_nest_lock(&ioas->iopt.iova_rwsem,
+                             &ictx->ioas_creation_lock);
+
+        rc = xa_err(xa_store(ioas_list, index, ioas, GFP_KERNEL));
+        if (rc) {
+            iommufd_release_all_iova_rwsem(ictx, ioas_list);
+            return rc;
+        }
+
+        xa_lock(&ictx->objects);
+    }
+    xa_unlock(&ictx->objects);
+    return 0;
+}
+
+static bool need_charge_update(struct iopt_pages *pages)
+{
+    switch (pages->account_mode) {
+    case IOPT_PAGES_ACCOUNT_NONE:
+        return false;
+    case IOPT_PAGES_ACCOUNT_MM:
+        return pages->source_mm != current->mm;
+    case IOPT_PAGES_ACCOUNT_USER:
+        /*
+         * Update when mm changes because it also accounts
+         * in mm->pinned_vm.
+         */
+        return (pages->source_user != current_user()) ||
+               (pages->source_mm != current->mm);
+    }
+    return true;
+}
+
+static int charge_current(unsigned long *npinned)
+{
+    struct iopt_pages tmp = {
+        .source_mm = current->mm,
+        .source_task = current->group_leader,
+        .source_user = current_user(),
+    };
+    unsigned int account_mode;
+    int rc;
+
+    for (account_mode = 0; account_mode != IOPT_PAGES_ACCOUNT_MODE_NUM;
+         account_mode++) {
+        if (!npinned[account_mode])
+            continue;
+
+        tmp.account_mode = account_mode;
+        rc = iopt_pages_update_pinned(&tmp, npinned[account_mode], true,
+                                      NULL);
+        if (rc)
+            goto err_undo;
+    }
+    return 0;
+
+err_undo:
+    while (account_mode != 0) {
+        account_mode--;
+        if (!npinned[account_mode])
+            continue;
+        tmp.account_mode = account_mode;
+        iopt_pages_update_pinned(&tmp, npinned[account_mode], false,
+                                 NULL);
+    }
+    return rc;
+}
+
+static void change_mm(struct iopt_pages *pages)
+{
+    struct task_struct *old_task = pages->source_task;
+    struct user_struct *old_user = pages->source_user;
+    struct mm_struct *old_mm = pages->source_mm;
+
+    pages->source_mm = current->mm;
+    mmgrab(pages->source_mm);
+    mmdrop(old_mm);
+
+    pages->source_task = current->group_leader;
+    get_task_struct(pages->source_task);
+    put_task_struct(old_task);
+
+    pages->source_user = get_uid(current_user());
+    free_uid(old_user);
+}
+
+#define for_each_ioas_area(_xa, _index, _ioas, _area) \
+    xa_for_each((_xa), (_index), (_ioas)) \
+        for (_area = iopt_area_iter_first(&_ioas->iopt, 0, ULONG_MAX); \
+             _area; \
+             _area = iopt_area_iter_next(_area, 0, ULONG_MAX))
+
+int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd)
+{
+    struct iommu_ioas_change_process *cmd = ucmd->cmd;
+    struct iommufd_ctx *ictx = ucmd->ictx;
+    unsigned long all_npinned[IOPT_PAGES_ACCOUNT_MODE_NUM] = {};
+    struct iommufd_ioas *ioas;
+    struct iopt_area *area;
+    struct iopt_pages *pages;
+    struct xarray ioas_list;
+    unsigned long index;
+    int rc;
+
+    if (cmd->__reserved)
+        return -EOPNOTSUPP;
+
+    xa_init(&ioas_list);
+    rc = iommufd_take_all_iova_rwsem(ictx, &ioas_list);
+    if (rc)
+        return rc;
+
+    for_each_ioas_area(&ioas_list, index, ioas, area) {
+        if (area->pages->type != IOPT_ADDRESS_FILE) {
+            rc = -EINVAL;
+            goto out;
+        }
+    }
+
+    /*
+     * Count last_pinned pages, then clear it to avoid double counting
+     * if the same iopt_pages is visited multiple times in this loop.
+     * Since we are under all the locks, npinned == last_npinned, so we
+     * can easily restore last_npinned before we return.
+     */
+    for_each_ioas_area(&ioas_list, index, ioas, area) {
+        pages = area->pages;
+
+        if (need_charge_update(pages)) {
+            all_npinned[pages->account_mode] += pages->last_npinned;
+            pages->last_npinned = 0;
+        }
+    }
+
+    rc = charge_current(all_npinned);
+
+    if (rc) {
+        /* Charge failed. Fix last_npinned and bail. */
+        for_each_ioas_area(&ioas_list, index, ioas, area)
+            area->pages->last_npinned = area->pages->npinned;
+        goto out;
+    }
+
+    for_each_ioas_area(&ioas_list, index, ioas, area) {
+        pages = area->pages;
+
+        /* Uncharge the old one (which also restores last_npinned) */
+        if (need_charge_update(pages)) {
+            int r = iopt_pages_update_pinned(pages, pages->npinned,
+                                             false, NULL);
+
+            if (WARN_ON(r))
+                rc = r;
+        }
+        change_mm(pages);
+    }
+
+out:
+    iommufd_release_all_iova_rwsem(ictx, &ioas_list);
     return rc;
 }
 
drivers/iommu/iommufd/iommufd_private.h  +30 -28
···
 #define __IOMMUFD_PRIVATE_H
 
 #include <linux/iommu.h>
+#include <linux/iommufd.h>
 #include <linux/iova_bitmap.h>
-#include <linux/refcount.h>
 #include <linux/rwsem.h>
 #include <linux/uaccess.h>
 #include <linux/xarray.h>
···
     struct xarray objects;
     struct xarray groups;
     wait_queue_head_t destroy_wait;
+    struct rw_semaphore ioas_creation_lock;
 
     u8 account_mode;
     /* Compatibility with VFIO no iommu */
···
                          unsigned long *iova, void __user *uptr,
                          unsigned long length, int iommu_prot,
                          unsigned int flags);
+int iopt_map_file_pages(struct iommufd_ctx *ictx, struct io_pagetable *iopt,
+                        unsigned long *iova, struct file *file,
+                        unsigned long start, unsigned long length,
+                        int iommu_prot, unsigned int flags);
 int iopt_map_pages(struct io_pagetable *iopt, struct list_head *pages_list,
                    unsigned long length, unsigned long *dst_iova,
                    int iommu_prot, unsigned int flags);
···
         return -EFAULT;
     return 0;
 }
-
-enum iommufd_object_type {
-    IOMMUFD_OBJ_NONE,
-    IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE,
-    IOMMUFD_OBJ_DEVICE,
-    IOMMUFD_OBJ_HWPT_PAGING,
-    IOMMUFD_OBJ_HWPT_NESTED,
-    IOMMUFD_OBJ_IOAS,
-    IOMMUFD_OBJ_ACCESS,
-    IOMMUFD_OBJ_FAULT,
-#ifdef CONFIG_IOMMUFD_TEST
-    IOMMUFD_OBJ_SELFTEST,
-#endif
-    IOMMUFD_OBJ_MAX,
-};
-
-/* Base struct for all objects with a userspace ID handle. */
-struct iommufd_object {
-    refcount_t shortterm_users;
-    refcount_t users;
-    enum iommufd_object_type type;
-    unsigned int id;
-};
 
 static inline bool iommufd_lock_obj(struct iommufd_object *obj)
 {
···
     iommufd_object_remove(ictx, obj, obj->id, 0);
 }
 
-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
-                                             size_t size,
-                                             enum iommufd_object_type type);
-
 #define __iommufd_object_alloc(ictx, ptr, type, obj) \
     container_of(_iommufd_object_alloc( \
                          ictx, \
···
 int iommufd_ioas_iova_ranges(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_allow_iovas(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_map(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_map_file(struct iommufd_ucmd *ucmd);
+int iommufd_ioas_change_process(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_copy(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_unmap(struct iommufd_ucmd *ucmd);
 int iommufd_ioas_option(struct iommufd_ucmd *ucmd);
···
 struct iommufd_hwpt_nested {
     struct iommufd_hw_pagetable common;
     struct iommufd_hwpt_paging *parent;
+    struct iommufd_viommu *viommu;
 };
 
 static inline bool hwpt_is_paging(struct iommufd_hw_pagetable *hwpt)
···
 
     return iommu_group_replace_domain(idev->igroup->group, hwpt->domain);
 }
+
+static inline struct iommufd_viommu *
+iommufd_get_viommu(struct iommufd_ucmd *ucmd, u32 id)
+{
+    return container_of(iommufd_get_object(ucmd->ictx, id,
+                                           IOMMUFD_OBJ_VIOMMU),
+                        struct iommufd_viommu, obj);
+}
+
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_viommu_destroy(struct iommufd_object *obj);
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd);
+void iommufd_vdevice_destroy(struct iommufd_object *obj);
+
+struct iommufd_vdevice {
+    struct iommufd_object obj;
+    struct iommufd_ctx *ictx;
+    struct iommufd_viommu *viommu;
+    struct device *dev;
+    u64 id; /* per-vIOMMU virtual ID */
+};
 
 #ifdef CONFIG_IOMMUFD_TEST
 int iommufd_test(struct iommufd_ucmd *ucmd);
drivers/iommu/iommufd/iommufd_test.h  +32
···
     IOMMU_TEST_OP_DIRTY,
     IOMMU_TEST_OP_MD_CHECK_IOTLB,
     IOMMU_TEST_OP_TRIGGER_IOPF,
+    IOMMU_TEST_OP_DEV_CHECK_CACHE,
 };
 
 enum {
···
 enum {
     MOCK_NESTED_DOMAIN_IOTLB_ID_MAX = 3,
     MOCK_NESTED_DOMAIN_IOTLB_NUM = 4,
+};
+
+enum {
+    MOCK_DEV_CACHE_ID_MAX = 3,
+    MOCK_DEV_CACHE_NUM = 4,
 };
 
 struct iommu_test_cmd {
···
             __u32 perm;
             __u64 addr;
         } trigger_iopf;
+        struct {
+            __u32 id;
+            __u32 cache;
+        } check_dev_cache;
     };
     __u32 last;
 };
···
 /* Should not be equal to any defined value in enum iommu_hwpt_data_type */
 #define IOMMU_HWPT_DATA_SELFTEST 0xdead
 #define IOMMU_TEST_IOTLB_DEFAULT 0xbadbeef
+#define IOMMU_TEST_DEV_CACHE_DEFAULT 0xbaddad
 
 /**
  * struct iommu_hwpt_selftest
···
 #define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0)
     __u32 flags;
     __u32 iotlb_id;
+};
+
+#define IOMMU_VIOMMU_TYPE_SELFTEST 0xdeadbeef
+
+/* Should not be equal to any defined value in enum iommu_viommu_invalidate_data_type */
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST 0xdeadbeef
+#define IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID 0xdadbeef
+
+/**
+ * struct iommu_viommu_invalidate_selftest - Invalidation data for Mock VIOMMU
+ *                                           (IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST)
+ * @flags: Invalidate flags
+ * @cache_id: Invalidate cache entry index
+ *
+ * If IOMMU_TEST_INVALIDATE_ALL is set in @flags, @cache_id will be ignored
+ */
+struct iommu_viommu_invalidate_selftest {
+#define IOMMU_TEST_INVALIDATE_FLAG_ALL (1 << 0)
+    __u32 flags;
+    __u32 vdev_id;
+    __u32 cache_id;
 };
 
 #endif
drivers/iommu/iommufd/main.c  +28 -37
···
 static const struct iommufd_object_ops iommufd_object_ops[];
 static struct miscdevice vfio_misc_dev;
 
-struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
-                                             size_t size,
-                                             enum iommufd_object_type type)
-{
-    struct iommufd_object *obj;
-    int rc;
-
-    obj = kzalloc(size, GFP_KERNEL_ACCOUNT);
-    if (!obj)
-        return ERR_PTR(-ENOMEM);
-    obj->type = type;
-    /* Starts out bias'd by 1 until it is removed from the xarray */
-    refcount_set(&obj->shortterm_users, 1);
-    refcount_set(&obj->users, 1);
-
-    /*
-     * Reserve an ID in the xarray but do not publish the pointer yet since
-     * the caller hasn't initialized it yet. Once the pointer is published
-     * in the xarray and visible to other threads we can't reliably destroy
-     * it anymore, so the caller must complete all errorable operations
-     * before calling iommufd_object_finalize().
-     */
-    rc = xa_alloc(&ictx->objects, &obj->id, XA_ZERO_ENTRY,
-                  xa_limit_31b, GFP_KERNEL_ACCOUNT);
-    if (rc)
-        goto out_free;
-    return obj;
-out_free:
-    kfree(obj);
-    return ERR_PTR(rc);
-}
-
 /*
  * Allow concurrent access to the object.
  *
···
 void iommufd_object_finalize(struct iommufd_ctx *ictx,
                              struct iommufd_object *obj)
 {
+    XA_STATE(xas, &ictx->objects, obj->id);
     void *old;
 
-    old = xa_store(&ictx->objects, obj->id, obj, GFP_KERNEL);
-    /* obj->id was returned from xa_alloc() so the xa_store() cannot fail */
-    WARN_ON(old);
+    xa_lock(&ictx->objects);
+    old = xas_store(&xas, obj);
+    xa_unlock(&ictx->objects);
+    /* obj->id was returned from xa_alloc() so the xas_store() cannot fail */
+    WARN_ON(old != XA_ZERO_ENTRY);
 }
 
 /* Undo _iommufd_object_alloc() if iommufd_object_finalize() was not called */
 void iommufd_object_abort(struct iommufd_ctx *ictx, struct iommufd_object *obj)
 {
+    XA_STATE(xas, &ictx->objects, obj->id);
     void *old;
 
-    old = xa_erase(&ictx->objects, obj->id);
-    WARN_ON(old);
+    xa_lock(&ictx->objects);
+    old = xas_store(&xas, NULL);
+    xa_unlock(&ictx->objects);
+    WARN_ON(old != XA_ZERO_ENTRY);
     kfree(obj);
 }
 
···
         pr_info_once("IOMMUFD is providing /dev/vfio/vfio, not VFIO.\n");
     }
 
+    init_rwsem(&ictx->ioas_creation_lock);
     xa_init_flags(&ictx->objects, XA_FLAGS_ALLOC1 | XA_FLAGS_ACCOUNT);
     xa_init(&ictx->groups);
     ictx->file = filp;
···
     struct iommu_ioas_unmap unmap;
     struct iommu_option option;
     struct iommu_vfio_ioas vfio_ioas;
+    struct iommu_viommu_alloc viommu;
+    struct iommu_vdevice_alloc vdev;
 #ifdef CONFIG_IOMMUFD_TEST
     struct iommu_test_cmd test;
 #endif
···
              struct iommu_ioas_alloc, out_ioas_id),
     IOCTL_OP(IOMMU_IOAS_ALLOW_IOVAS, iommufd_ioas_allow_iovas,
              struct iommu_ioas_allow_iovas, allowed_iovas),
+    IOCTL_OP(IOMMU_IOAS_CHANGE_PROCESS, iommufd_ioas_change_process,
+             struct iommu_ioas_change_process, __reserved),
     IOCTL_OP(IOMMU_IOAS_COPY, iommufd_ioas_copy, struct iommu_ioas_copy,
              src_iova),
     IOCTL_OP(IOMMU_IOAS_IOVA_RANGES, iommufd_ioas_iova_ranges,
              struct iommu_ioas_iova_ranges, out_iova_alignment),
     IOCTL_OP(IOMMU_IOAS_MAP, iommufd_ioas_map, struct iommu_ioas_map,
              iova),
+    IOCTL_OP(IOMMU_IOAS_MAP_FILE, iommufd_ioas_map_file,
+             struct iommu_ioas_map_file, iova),
     IOCTL_OP(IOMMU_IOAS_UNMAP, iommufd_ioas_unmap, struct iommu_ioas_unmap,
              length),
     IOCTL_OP(IOMMU_OPTION, iommufd_option, struct iommu_option,
              val64),
     IOCTL_OP(IOMMU_VFIO_IOAS, iommufd_vfio_ioas, struct iommu_vfio_ioas,
              __reserved),
+    IOCTL_OP(IOMMU_VIOMMU_ALLOC, iommufd_viommu_alloc_ioctl,
+             struct iommu_viommu_alloc, out_viommu_id),
+    IOCTL_OP(IOMMU_VDEVICE_ALLOC, iommufd_vdevice_alloc_ioctl,
+             struct iommu_vdevice_alloc, virt_id),
 #ifdef CONFIG_IOMMUFD_TEST
     IOCTL_OP(IOMMU_TEST_CMD, iommufd_test, struct iommu_test_cmd, last),
 #endif
···
     },
     [IOMMUFD_OBJ_FAULT] = {
         .destroy = iommufd_fault_destroy,
+    },
+    [IOMMUFD_OBJ_VIOMMU] = {
+        .destroy = iommufd_viommu_destroy,
+    },
+    [IOMMUFD_OBJ_VDEVICE] = {
+        .destroy = iommufd_vdevice_destroy,
     },
 #ifdef CONFIG_IOMMUFD_TEST
     [IOMMUFD_OBJ_SELFTEST] = {
drivers/iommu/iommufd/pages.c  +259 -62
···
  * last_iova + 1 can overflow. An iopt_pages index will always be much less than
  * ULONG_MAX so last_index + 1 cannot overflow.
  */
+#include <linux/file.h>
 #include <linux/highmem.h>
 #include <linux/iommu.h>
 #include <linux/iommufd.h>
···
     kfree(batch->pfns);
 }
 
+static bool batch_add_pfn_num(struct pfn_batch *batch, unsigned long pfn,
+                              u32 nr)
+{
+    const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
+    unsigned int end = batch->end;
+
+    if (end && pfn == batch->pfns[end - 1] + batch->npfns[end - 1] &&
+        nr <= MAX_NPFNS - batch->npfns[end - 1]) {
+        batch->npfns[end - 1] += nr;
+    } else if (end < batch->array_size) {
+        batch->pfns[end] = pfn;
+        batch->npfns[end] = nr;
+        batch->end++;
+    } else {
+        return false;
+    }
+
+    batch->total_pfns += nr;
+    return true;
+}
+
+static void batch_remove_pfn_num(struct pfn_batch *batch, unsigned long nr)
+{
+    batch->npfns[batch->end - 1] -= nr;
+    if (batch->npfns[batch->end - 1] == 0)
+        batch->end--;
+    batch->total_pfns -= nr;
+}
+
 /* true if the pfn was added, false otherwise */
 static bool batch_add_pfn(struct pfn_batch *batch, unsigned long pfn)
 {
-    const unsigned int MAX_NPFNS = type_max(typeof(*batch->npfns));
-
-    if (batch->end &&
-        pfn == batch->pfns[batch->end - 1] + batch->npfns[batch->end - 1] &&
-        batch->npfns[batch->end - 1] != MAX_NPFNS) {
-        batch->npfns[batch->end - 1]++;
-        batch->total_pfns++;
-        return true;
-    }
-    if (batch->end == batch->array_size)
-        return false;
-    batch->total_pfns++;
-    batch->pfns[batch->end] = pfn;
-    batch->npfns[batch->end] = 1;
-    batch->end++;
-    return true;
+    return batch_add_pfn_num(batch, pfn, 1);
 }
 
 /*
···
         break;
     }
 
+static int batch_from_folios(struct pfn_batch *batch, struct folio ***folios_p,
+                             unsigned long *offset_p, unsigned long npages)
+{
+    int rc = 0;
+    struct folio **folios = *folios_p;
+    unsigned long offset = *offset_p;
+
+    while (npages) {
+        struct folio *folio = *folios;
+        unsigned long nr = folio_nr_pages(folio) - offset;
+        unsigned long pfn = page_to_pfn(folio_page(folio, offset));
+
+        nr = min(nr, npages);
+        npages -= nr;
+
+        if (!batch_add_pfn_num(batch, pfn, nr))
+            break;
+        if (nr > 1) {
+            rc = folio_add_pins(folio, nr - 1);
+            if (rc) {
+                batch_remove_pfn_num(batch, nr);
+                goto out;
+            }
+        }
+
+        folios++;
+        offset = 0;
+    }
+
+out:
+    *folios_p = folios;
+    *offset_p = offset;
+    return rc;
+}
+
 static void batch_unpin(struct pfn_batch *batch, struct iopt_pages *pages,
                         unsigned int first_page_off, size_t npages)
 {
···
      * neither
      */
     int locked;
+
+    /* The following are only valid if file != NULL. */
+    struct file *file;
+    struct folio **ufolios;
+    size_t ufolios_len;
+    unsigned long ufolios_offset;
+    struct folio **ufolios_next;
 };
 
 static void pfn_reader_user_init(struct pfn_reader_user *user,
                                  struct iopt_pages *pages)
 {
     user->upages = NULL;
+    user->upages_len = 0;
     user->upages_start = 0;
     user->upages_end = 0;
     user->locked = -1;
-
     user->gup_flags = FOLL_LONGTERM;
     if (pages->writable)
         user->gup_flags |= FOLL_WRITE;
+
+    user->file = (pages->type == IOPT_ADDRESS_FILE) ? pages->file : NULL;
+    user->ufolios = NULL;
+    user->ufolios_len = 0;
+    user->ufolios_next = NULL;
+    user->ufolios_offset = 0;
 }
 
 static void pfn_reader_user_destroy(struct pfn_reader_user *user,
···
     if (user->locked != -1) {
         if (user->locked)
             mmap_read_unlock(pages->source_mm);
-        if (pages->source_mm != current->mm)
+        if (!user->file && pages->source_mm != current->mm)
             mmput(pages->source_mm);
         user->locked = -1;
     }
 
     kfree(user->upages);
     user->upages = NULL;
+    kfree(user->ufolios);
+    user->ufolios = NULL;
+}
+
+static long pin_memfd_pages(struct pfn_reader_user *user, unsigned long start,
+                            unsigned long npages)
+{
+    unsigned long i;
+    unsigned long offset;
+    unsigned long npages_out = 0;
+    struct page **upages = user->upages;
+    unsigned long end = start + (npages << PAGE_SHIFT) - 1;
+    long nfolios = user->ufolios_len / sizeof(*user->ufolios);
+
+    /*
+     * todo: memfd_pin_folios should return the last pinned offset so
+     * we can compute npages pinned, and avoid looping over folios here
+     * if upages == NULL.
+     */
+    nfolios = memfd_pin_folios(user->file, start, end, user->ufolios,
+                               nfolios, &offset);
+    if (nfolios <= 0)
+        return nfolios;
+
+    offset >>= PAGE_SHIFT;
+    user->ufolios_next = user->ufolios;
+    user->ufolios_offset = offset;
+
+    for (i = 0; i < nfolios; i++) {
+        struct folio *folio = user->ufolios[i];
+        unsigned long nr = folio_nr_pages(folio);
+        unsigned long npin = min(nr - offset, npages);
+
+        npages -= npin;
+        npages_out += npin;
+
+        if (upages) {
+            if (npin == 1) {
+                *upages++ = folio_page(folio, offset);
+            } else {
+                int rc = folio_add_pins(folio, npin - 1);
+
+                if (rc)
+                    return rc;
+
+                while (npin--)
+                    *upages++ = folio_page(folio, offset++);
+            }
+        }
+
+        offset = 0;
+    }
+
+    return npages_out;
 }
 
 static int pfn_reader_user_pin(struct pfn_reader_user *user,
···
                                unsigned long last_index)
 {
     bool remote_mm = pages->source_mm != current->mm;
-    unsigned long npages;
+    unsigned long npages = last_index - start_index + 1;
+    unsigned long start;
+    unsigned long unum;
     uintptr_t uptr;
     long rc;
 
···
         WARN_ON(last_index < start_index))
         return -EINVAL;
 
-    if (!user->upages) {
+    if (!user->file && !user->upages) {
         /* All undone in pfn_reader_destroy() */
-        user->upages_len =
-            (last_index - start_index + 1) * sizeof(*user->upages);
+        user->upages_len = npages * sizeof(*user->upages);
         user->upages = temp_kmalloc(&user->upages_len, NULL, 0);
         if (!user->upages)
+            return -ENOMEM;
+    }
+
+    if (user->file && !user->ufolios) {
+        user->ufolios_len = npages * sizeof(*user->ufolios);
+        user->ufolios = temp_kmalloc(&user->ufolios_len, NULL, 0);
+        if (!user->ufolios)
             return -ENOMEM;
     }
 
···
          * providing the pages, so we can optimize into
          * get_user_pages_fast()
          */
-        if (remote_mm) {
+        if (!user->file && remote_mm) {
             if (!mmget_not_zero(pages->source_mm))
                 return -EFAULT;
         }
         user->locked = 0;
     }
 
-    npages = min_t(unsigned long, last_index - start_index + 1,
-                   user->upages_len / sizeof(*user->upages));
-
+    unum = user->file ? user->ufolios_len / sizeof(*user->ufolios) :
+                        user->upages_len / sizeof(*user->upages);
+    npages = min_t(unsigned long, npages, unum);
 
     if (iommufd_should_fail())
         return -EFAULT;
 
-    uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
-    if (!remote_mm)
+    if (user->file) {
+        start = pages->start + (start_index * PAGE_SIZE);
+        rc = pin_memfd_pages(user, start, npages);
+    } else if (!remote_mm) {
+        uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
         rc = pin_user_pages_fast(uptr, npages, user->gup_flags,
                                  user->upages);
-    else {
+    } else {
+        uptr = (uintptr_t)(pages->uptr + start_index * PAGE_SIZE);
         if (!user->locked) {
             mmap_read_lock(pages->source_mm);
             user->locked = 1;
···
         mmap_read_unlock(pages->source_mm);
         user->locked = 0;
         /* If we had the lock then we also have a get */
-    } else if ((!user || !user->upages) &&
+
+    } else if ((!user || (!user->upages && !user->ufolios)) &&
                pages->source_mm != current->mm) {
         if (!mmget_not_zero(pages->source_mm))
             return -EINVAL;
···
     return rc;
 }
 
-static int do_update_pinned(struct iopt_pages *pages, unsigned long npages,
-                            bool inc, struct pfn_reader_user *user)
+int iopt_pages_update_pinned(struct iopt_pages *pages, unsigned long npages,
+                             bool inc, struct pfn_reader_user *user)
 {
     int rc = 0;
 
···
         return;
     if (pages->npinned == pages->last_npinned)
         return;
-    do_update_pinned(pages, pages->last_npinned - pages->npinned, false,
-                     NULL);
+    iopt_pages_update_pinned(pages, pages->last_npinned - pages->npinned,
+                             false, NULL);
 }
 
 /*
···
         npages = pages->npinned - pages->last_npinned;
         inc = true;
     }
-    return do_update_pinned(pages, npages, inc, user);
+    return iopt_pages_update_pinned(pages, npages, inc, user);
 }
 
 /*
···
 {
     struct interval_tree_double_span_iter *span = &pfns->span;
     unsigned long start_index = pfns->batch_end_index;
+    struct pfn_reader_user *user = &pfns->user;
+    unsigned long npages;
     struct iopt_area *area;
     int rc;
 
···
         return rc;
     }
 
-    batch_from_pages(&pfns->batch,
-                     pfns->user.upages +
-                         (start_index - pfns->user.upages_start),
-                     pfns->user.upages_end - start_index);
-    return 0;
+    npages = user->upages_end - start_index;
+    start_index -= user->upages_start;
+    rc = 0;
+
+    if (!user->file)
+        batch_from_pages(&pfns->batch, user->upages + start_index,
+                         npages);
+    else
+        rc = batch_from_folios(&pfns->batch, &user->ufolios_next,
+                               &user->ufolios_offset, npages);
+    return rc;
 }
 
 static bool pfn_reader_done(struct pfn_reader *pfns)
···
 static void pfn_reader_release_pins(struct pfn_reader *pfns)
 {
     struct iopt_pages *pages = pfns->pages;
+    struct pfn_reader_user *user = &pfns->user;
 
-    if (pfns->user.upages_end > pfns->batch_end_index) {
-        size_t npages = pfns->user.upages_end - pfns->batch_end_index;
-
+    if (user->upages_end > pfns->batch_end_index) {
         /* Any pages not transferred to the batch are just unpinned */
-        unpin_user_pages(pfns->user.upages + (pfns->batch_end_index -
-                                              pfns->user.upages_start),
-                         npages);
+
+        unsigned long npages = user->upages_end - pfns->batch_end_index;
+        unsigned long start_index = pfns->batch_end_index -
+                                    user->upages_start;
+
+        if (!user->file) {
+            unpin_user_pages(user->upages + start_index, npages);
+        } else {
+            long n = user->ufolios_len / sizeof(*user->ufolios);
+
+            unpin_folios(user->ufolios_next,
+                         user->ufolios + n - user->ufolios_next);
+        }
         iopt_pages_sub_npinned(pages, npages);
-        pfns->user.upages_end = pfns->batch_end_index;
+        user->upages_end = pfns->batch_end_index;
     }
     if (pfns->batch_start_index != pfns->batch_end_index) {
         pfn_reader_unpin(pfns);
···
     return 0;
 }
 
-struct iopt_pages *iopt_alloc_pages(void __user *uptr, unsigned long length,
-                                    bool writable)
+static struct iopt_pages *iopt_alloc_pages(unsigned long start_byte,
+                                           unsigned long length,
+                                           bool writable)
 {
     struct iopt_pages *pages;
-    unsigned long end;
 
     /*
      * The iommu API uses size_t as the length, and protect the DIV_ROUND_UP
···
      */
     if (length > SIZE_MAX - PAGE_SIZE || length == 0)
         return ERR_PTR(-EINVAL);
-
-    if (check_add_overflow((unsigned long)uptr, length, &end))
-        return ERR_PTR(-EOVERFLOW);
 
     pages = kzalloc(sizeof(*pages), GFP_KERNEL_ACCOUNT);
     if (!pages)
···
     mutex_init(&pages->mutex);
     pages->source_mm = current->mm;
     mmgrab(pages->source_mm);
-    pages->uptr = (void __user *)ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE);
-    pages->npages = DIV_ROUND_UP(length + (uptr - pages->uptr), PAGE_SIZE);
+    pages->npages = DIV_ROUND_UP(length + start_byte, PAGE_SIZE);
     pages->access_itree = RB_ROOT_CACHED;
     pages->domains_itree = RB_ROOT_CACHED;
     pages->writable = writable;
···
     pages->source_task = current->group_leader;
1320 1177 get_task_struct(current->group_leader); 1321 1178 pages->source_user = get_uid(current_user()); 1179 + return pages; 1180 + } 1181 + 1182 + struct iopt_pages *iopt_alloc_user_pages(void __user *uptr, 1183 + unsigned long length, bool writable) 1184 + { 1185 + struct iopt_pages *pages; 1186 + unsigned long end; 1187 + void __user *uptr_down = 1188 + (void __user *) ALIGN_DOWN((uintptr_t)uptr, PAGE_SIZE); 1189 + 1190 + if (check_add_overflow((unsigned long)uptr, length, &end)) 1191 + return ERR_PTR(-EOVERFLOW); 1192 + 1193 + pages = iopt_alloc_pages(uptr - uptr_down, length, writable); 1194 + if (IS_ERR(pages)) 1195 + return pages; 1196 + pages->uptr = uptr_down; 1197 + pages->type = IOPT_ADDRESS_USER; 1198 + return pages; 1199 + } 1200 + 1201 + struct iopt_pages *iopt_alloc_file_pages(struct file *file, unsigned long start, 1202 + unsigned long length, bool writable) 1203 + 1204 + { 1205 + struct iopt_pages *pages; 1206 + unsigned long start_down = ALIGN_DOWN(start, PAGE_SIZE); 1207 + unsigned long end; 1208 + 1209 + if (length && check_add_overflow(start, length - 1, &end)) 1210 + return ERR_PTR(-EOVERFLOW); 1211 + 1212 + pages = iopt_alloc_pages(start - start_down, length, writable); 1213 + if (IS_ERR(pages)) 1214 + return pages; 1215 + pages->file = get_file(file); 1216 + pages->start = start_down; 1217 + pages->type = IOPT_ADDRESS_FILE; 1322 1218 return pages; 1323 1219 } 1324 1220 ··· 1373 1191 mutex_destroy(&pages->mutex); 1374 1192 put_task_struct(pages->source_task); 1375 1193 free_uid(pages->source_user); 1194 + if (pages->type == IOPT_ADDRESS_FILE) 1195 + fput(pages->file); 1376 1196 kfree(pages); 1377 1197 } 1378 1198 ··· 1814 1630 return 0; 1815 1631 } 1816 1632 1817 - static int iopt_pages_fill_from_mm(struct iopt_pages *pages, 1818 - struct pfn_reader_user *user, 1819 - unsigned long start_index, 1820 - unsigned long last_index, 1821 - struct page **out_pages) 1633 + static int iopt_pages_fill(struct iopt_pages *pages, 1634 + struct 
pfn_reader_user *user, 1635 + unsigned long start_index, 1636 + unsigned long last_index, 1637 + struct page **out_pages) 1822 1638 { 1823 1639 unsigned long cur_index = start_index; 1824 1640 int rc; ··· 1892 1708 1893 1709 /* hole */ 1894 1710 cur_pages = out_pages + (span.start_hole - start_index); 1895 - rc = iopt_pages_fill_from_mm(pages, &user, span.start_hole, 1896 - span.last_hole, cur_pages); 1711 + rc = iopt_pages_fill(pages, &user, span.start_hole, 1712 + span.last_hole, cur_pages); 1897 1713 if (rc) 1898 1714 goto out_clean_xa; 1899 1715 rc = pages_to_xarray(&pages->pinned_pfns, span.start_hole, ··· 1973 1789 struct page *page = NULL; 1974 1790 int rc; 1975 1791 1792 + if (IS_ENABLED(CONFIG_IOMMUFD_TEST) && 1793 + WARN_ON(pages->type != IOPT_ADDRESS_USER)) 1794 + return -EINVAL; 1795 + 1976 1796 if (!mmget_not_zero(pages->source_mm)) 1977 1797 return iopt_pages_rw_slow(pages, index, index, offset, data, 1978 1798 length, flags); ··· 2031 1843 2032 1844 if ((flags & IOMMUFD_ACCESS_RW_WRITE) && !pages->writable) 2033 1845 return -EPERM; 1846 + 1847 + if (pages->type == IOPT_ADDRESS_FILE) 1848 + return iopt_pages_rw_slow(pages, start_index, last_index, 1849 + start_byte % PAGE_SIZE, data, length, 1850 + flags); 1851 + 1852 + if (IS_ENABLED(CONFIG_IOMMUFD_TEST) && 1853 + WARN_ON(pages->type != IOPT_ADDRESS_USER)) 1854 + return -EINVAL; 2034 1855 2035 1856 if (!(flags & IOMMUFD_ACCESS_RW_KTHREAD) && change_mm) { 2036 1857 if (start_index == last_index)
+287 -81
drivers/iommu/iommufd/selftest.c
···
 	struct xarray pfns;
 };
 
+static inline struct mock_iommu_domain *
+to_mock_domain(struct iommu_domain *domain)
+{
+	return container_of(domain, struct mock_iommu_domain, domain);
+}
+
 struct mock_iommu_domain_nested {
 	struct iommu_domain domain;
+	struct mock_viommu *mock_viommu;
 	struct mock_iommu_domain *parent;
 	u32 iotlb[MOCK_NESTED_DOMAIN_IOTLB_NUM];
 };
+
+static inline struct mock_iommu_domain_nested *
+to_mock_nested(struct iommu_domain *domain)
+{
+	return container_of(domain, struct mock_iommu_domain_nested, domain);
+}
+
+struct mock_viommu {
+	struct iommufd_viommu core;
+	struct mock_iommu_domain *s2_parent;
+};
+
+static inline struct mock_viommu *to_mock_viommu(struct iommufd_viommu *viommu)
+{
+	return container_of(viommu, struct mock_viommu, core);
+}
 
 enum selftest_obj_type {
 	TYPE_IDEV,
···
 	struct device dev;
 	unsigned long flags;
 	int id;
+	u32 cache[MOCK_DEV_CACHE_NUM];
 };
+
+static inline struct mock_dev *to_mock_dev(struct device *dev)
+{
+	return container_of(dev, struct mock_dev, dev);
+}
 
 struct selftest_obj {
 	struct iommufd_object obj;
···
 	};
 };
 
+static inline struct selftest_obj *to_selftest_obj(struct iommufd_object *obj)
+{
+	return container_of(obj, struct selftest_obj, obj);
+}
+
 static int mock_domain_nop_attach(struct iommu_domain *domain,
 				  struct device *dev)
 {
-	struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+	struct mock_dev *mdev = to_mock_dev(dev);
 
 	if (domain->dirty_ops && (mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY))
 		return -EINVAL;
···
 static int mock_domain_set_dirty_tracking(struct iommu_domain *domain,
 					  bool enable)
 {
-	struct mock_iommu_domain *mock =
-		container_of(domain, struct mock_iommu_domain, domain);
+	struct mock_iommu_domain *mock = to_mock_domain(domain);
 	unsigned long flags = mock->flags;
 
 	if (enable && !domain->dirty_ops)
···
 					    unsigned long flags,
 					    struct iommu_dirty_bitmap *dirty)
 {
-	struct mock_iommu_domain *mock =
-		container_of(domain, struct mock_iommu_domain, domain);
+	struct mock_iommu_domain *mock = to_mock_domain(domain);
 	unsigned long end = iova + size;
 	void *ent;
 
···
 
 static struct iommu_domain *mock_domain_alloc_paging(struct device *dev)
 {
-	struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+	struct mock_dev *mdev = to_mock_dev(dev);
 	struct mock_iommu_domain *mock;
 
 	mock = kzalloc(sizeof(*mock), GFP_KERNEL);
···
 	return &mock->domain;
 }
 
-static struct iommu_domain *
-__mock_domain_alloc_nested(struct mock_iommu_domain *mock_parent,
-			   const struct iommu_hwpt_selftest *user_cfg)
+static struct mock_iommu_domain_nested *
+__mock_domain_alloc_nested(const struct iommu_user_data *user_data)
 {
 	struct mock_iommu_domain_nested *mock_nested;
-	int i;
+	struct iommu_hwpt_selftest user_cfg;
+	int rc, i;
+
+	if (user_data->type != IOMMU_HWPT_DATA_SELFTEST)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	rc = iommu_copy_struct_from_user(&user_cfg, user_data,
+					 IOMMU_HWPT_DATA_SELFTEST, iotlb);
+	if (rc)
+		return ERR_PTR(rc);
 
 	mock_nested = kzalloc(sizeof(*mock_nested), GFP_KERNEL);
 	if (!mock_nested)
 		return ERR_PTR(-ENOMEM);
-	mock_nested->parent = mock_parent;
 	mock_nested->domain.ops = &domain_nested_ops;
 	mock_nested->domain.type = IOMMU_DOMAIN_NESTED;
 	for (i = 0; i < MOCK_NESTED_DOMAIN_IOTLB_NUM; i++)
-		mock_nested->iotlb[i] = user_cfg->iotlb;
+		mock_nested->iotlb[i] = user_cfg.iotlb;
+	return mock_nested;
+}
+
+static struct iommu_domain *
+mock_domain_alloc_nested(struct iommu_domain *parent, u32 flags,
+			 const struct iommu_user_data *user_data)
+{
+	struct mock_iommu_domain_nested *mock_nested;
+	struct mock_iommu_domain *mock_parent;
+
+	if (flags)
+		return ERR_PTR(-EOPNOTSUPP);
+	if (!parent || parent->ops != mock_ops.default_domain_ops)
+		return ERR_PTR(-EINVAL);
+
+	mock_parent = to_mock_domain(parent);
+	if (!mock_parent)
+		return ERR_PTR(-EINVAL);
+
+	mock_nested = __mock_domain_alloc_nested(user_data);
+	if (IS_ERR(mock_nested))
+		return ERR_CAST(mock_nested);
+	mock_nested->parent = mock_parent;
 	return &mock_nested->domain;
 }
 
···
 			struct iommu_domain *parent,
 			const struct iommu_user_data *user_data)
 {
-	struct mock_iommu_domain *mock_parent;
-	struct iommu_hwpt_selftest user_cfg;
-	int rc;
+	bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
+	const u32 PAGING_FLAGS = IOMMU_HWPT_ALLOC_DIRTY_TRACKING |
+				 IOMMU_HWPT_ALLOC_NEST_PARENT;
+	bool no_dirty_ops = to_mock_dev(dev)->flags &
+			    MOCK_FLAGS_DEVICE_NO_DIRTY;
+	struct iommu_domain *domain;
 
-	/* must be mock_domain */
-	if (!parent) {
-		struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
-		bool has_dirty_flag = flags & IOMMU_HWPT_ALLOC_DIRTY_TRACKING;
-		bool no_dirty_ops = mdev->flags & MOCK_FLAGS_DEVICE_NO_DIRTY;
-		struct iommu_domain *domain;
+	if (parent)
+		return mock_domain_alloc_nested(parent, flags, user_data);
 
-		if (flags & (~(IOMMU_HWPT_ALLOC_NEST_PARENT |
-			       IOMMU_HWPT_ALLOC_DIRTY_TRACKING)))
-			return ERR_PTR(-EOPNOTSUPP);
-		if (user_data || (has_dirty_flag && no_dirty_ops))
-			return ERR_PTR(-EOPNOTSUPP);
-		domain = mock_domain_alloc_paging(dev);
-		if (!domain)
-			return ERR_PTR(-ENOMEM);
-		if (has_dirty_flag)
-			container_of(domain, struct mock_iommu_domain, domain)
-				->domain.dirty_ops = &dirty_ops;
-		return domain;
-	}
-
-	/* must be mock_domain_nested */
-	if (user_data->type != IOMMU_HWPT_DATA_SELFTEST || flags)
+	if (user_data)
 		return ERR_PTR(-EOPNOTSUPP);
-	if (!parent || parent->ops != mock_ops.default_domain_ops)
-		return ERR_PTR(-EINVAL);
+	if ((flags & ~PAGING_FLAGS) || (has_dirty_flag && no_dirty_ops))
+		return ERR_PTR(-EOPNOTSUPP);
 
-	mock_parent = container_of(parent, struct mock_iommu_domain, domain);
-	if (!mock_parent)
-		return ERR_PTR(-EINVAL);
-
-	rc = iommu_copy_struct_from_user(&user_cfg, user_data,
-					 IOMMU_HWPT_DATA_SELFTEST, iotlb);
-	if (rc)
-		return ERR_PTR(rc);
-
-	return __mock_domain_alloc_nested(mock_parent, &user_cfg);
+	domain = mock_domain_alloc_paging(dev);
+	if (!domain)
+		return ERR_PTR(-ENOMEM);
+	if (has_dirty_flag)
+		domain->dirty_ops = &dirty_ops;
+	return domain;
 }
 
 static void mock_domain_free(struct iommu_domain *domain)
 {
-	struct mock_iommu_domain *mock =
-		container_of(domain, struct mock_iommu_domain, domain);
+	struct mock_iommu_domain *mock = to_mock_domain(domain);
 
 	WARN_ON(!xa_empty(&mock->pfns));
 	kfree(mock);
···
 			  size_t pgsize, size_t pgcount, int prot,
 			  gfp_t gfp, size_t *mapped)
 {
-	struct mock_iommu_domain *mock =
-		container_of(domain, struct mock_iommu_domain, domain);
+	struct mock_iommu_domain *mock = to_mock_domain(domain);
 	unsigned long flags = MOCK_PFN_START_IOVA;
 	unsigned long start_iova = iova;
 
···
 			       size_t pgcount,
 			       struct iommu_iotlb_gather *iotlb_gather)
 {
-	struct mock_iommu_domain *mock =
-		container_of(domain, struct mock_iommu_domain, domain);
+	struct mock_iommu_domain *mock = to_mock_domain(domain);
 	bool first = true;
 	size_t ret = 0;
 	void *ent;
···
 static phys_addr_t mock_domain_iova_to_phys(struct iommu_domain *domain,
 					    dma_addr_t iova)
 {
-	struct mock_iommu_domain *mock =
-		container_of(domain, struct mock_iommu_domain, domain);
+	struct mock_iommu_domain *mock = to_mock_domain(domain);
 	void *ent;
 
 	WARN_ON(iova % MOCK_IO_PAGE_SIZE);
···
 
 static bool mock_domain_capable(struct device *dev, enum iommu_cap cap)
 {
-	struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+	struct mock_dev *mdev = to_mock_dev(dev);
 
 	switch (cap) {
 	case IOMMU_CAP_CACHE_COHERENCY:
···
 
 static struct iopf_queue *mock_iommu_iopf_queue;
 
-static struct iommu_device mock_iommu_device = {
-};
+static struct mock_iommu_device {
+	struct iommu_device iommu_dev;
+	struct completion complete;
+	refcount_t users;
+} mock_iommu;
 
 static struct iommu_device *mock_probe_device(struct device *dev)
 {
 	if (dev->bus != &iommufd_mock_bus_type.bus)
 		return ERR_PTR(-ENODEV);
-	return &mock_iommu_device;
+	return &mock_iommu.iommu_dev;
 }
 
 static void mock_domain_page_response(struct device *dev, struct iopf_fault *evt,
···
 	return 0;
 }
 
+static void mock_viommu_destroy(struct iommufd_viommu *viommu)
+{
+	struct mock_iommu_device *mock_iommu = container_of(
+		viommu->iommu_dev, struct mock_iommu_device, iommu_dev);
+
+	if (refcount_dec_and_test(&mock_iommu->users))
+		complete(&mock_iommu->complete);
+
+	/* iommufd core frees mock_viommu and viommu */
+}
+
+static struct iommu_domain *
+mock_viommu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+				const struct iommu_user_data *user_data)
+{
+	struct mock_viommu *mock_viommu = to_mock_viommu(viommu);
+	struct mock_iommu_domain_nested *mock_nested;
+
+	if (flags & ~IOMMU_HWPT_FAULT_ID_VALID)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mock_nested = __mock_domain_alloc_nested(user_data);
+	if (IS_ERR(mock_nested))
+		return ERR_CAST(mock_nested);
+	mock_nested->mock_viommu = mock_viommu;
+	mock_nested->parent = mock_viommu->s2_parent;
+	return &mock_nested->domain;
+}
+
+static int mock_viommu_cache_invalidate(struct iommufd_viommu *viommu,
+					struct iommu_user_data_array *array)
+{
+	struct iommu_viommu_invalidate_selftest *cmds;
+	struct iommu_viommu_invalidate_selftest *cur;
+	struct iommu_viommu_invalidate_selftest *end;
+	int rc;
+
+	/* A zero-length array is allowed to validate the array type */
+	if (array->entry_num == 0 &&
+	    array->type == IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST) {
+		array->entry_num = 0;
+		return 0;
+	}
+
+	cmds = kcalloc(array->entry_num, sizeof(*cmds), GFP_KERNEL);
+	if (!cmds)
+		return -ENOMEM;
+	cur = cmds;
+	end = cmds + array->entry_num;
+
+	static_assert(sizeof(*cmds) == 3 * sizeof(u32));
+	rc = iommu_copy_struct_from_full_user_array(
+		cmds, sizeof(*cmds), array,
+		IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST);
+	if (rc)
+		goto out;
+
+	while (cur != end) {
+		struct mock_dev *mdev;
+		struct device *dev;
+		int i;
+
+		if (cur->flags & ~IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+			rc = -EOPNOTSUPP;
+			goto out;
+		}
+
+		if (cur->cache_id > MOCK_DEV_CACHE_ID_MAX) {
+			rc = -EINVAL;
+			goto out;
+		}
+
+		xa_lock(&viommu->vdevs);
+		dev = iommufd_viommu_find_dev(viommu,
+					      (unsigned long)cur->vdev_id);
+		if (!dev) {
+			xa_unlock(&viommu->vdevs);
+			rc = -EINVAL;
+			goto out;
+		}
+		mdev = container_of(dev, struct mock_dev, dev);
+
+		if (cur->flags & IOMMU_TEST_INVALIDATE_FLAG_ALL) {
+			/* Invalidate all cache entries and ignore cache_id */
+			for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+				mdev->cache[i] = 0;
+		} else {
+			mdev->cache[cur->cache_id] = 0;
+		}
+		xa_unlock(&viommu->vdevs);
+
+		cur++;
+	}
+out:
+	array->entry_num = cur - cmds;
+	kfree(cmds);
+	return rc;
+}
+
+static struct iommufd_viommu_ops mock_viommu_ops = {
+	.destroy = mock_viommu_destroy,
+	.alloc_domain_nested = mock_viommu_alloc_domain_nested,
+	.cache_invalidate = mock_viommu_cache_invalidate,
+};
+
+static struct iommufd_viommu *mock_viommu_alloc(struct device *dev,
+						struct iommu_domain *domain,
+						struct iommufd_ctx *ictx,
+						unsigned int viommu_type)
+{
+	struct mock_iommu_device *mock_iommu =
+		iommu_get_iommu_dev(dev, struct mock_iommu_device, iommu_dev);
+	struct mock_viommu *mock_viommu;
+
+	if (viommu_type != IOMMU_VIOMMU_TYPE_SELFTEST)
+		return ERR_PTR(-EOPNOTSUPP);
+
+	mock_viommu = iommufd_viommu_alloc(ictx, struct mock_viommu, core,
+					   &mock_viommu_ops);
+	if (IS_ERR(mock_viommu))
+		return ERR_CAST(mock_viommu);
+
+	refcount_inc(&mock_iommu->users);
+	return &mock_viommu->core;
+}
+
 static const struct iommu_ops mock_ops = {
 	/*
 	 * IOMMU_DOMAIN_BLOCKED cannot be returned from def_domain_type()
···
 	.dev_enable_feat = mock_dev_enable_feat,
 	.dev_disable_feat = mock_dev_disable_feat,
 	.user_pasid_table = true,
+	.viommu_alloc = mock_viommu_alloc,
 	.default_domain_ops =
 		&(struct iommu_domain_ops){
 			.free = mock_domain_free,
···
 
 static void mock_domain_free_nested(struct iommu_domain *domain)
 {
-	struct mock_iommu_domain_nested *mock_nested =
-		container_of(domain, struct mock_iommu_domain_nested, domain);
-
-	kfree(mock_nested);
+	kfree(to_mock_nested(domain));
 }
 
 static int
 mock_domain_cache_invalidate_user(struct iommu_domain *domain,
 				  struct iommu_user_data_array *array)
 {
-	struct mock_iommu_domain_nested *mock_nested =
-		container_of(domain, struct mock_iommu_domain_nested, domain);
+	struct mock_iommu_domain_nested *mock_nested = to_mock_nested(domain);
 	struct iommu_hwpt_invalidate_selftest inv;
 	u32 processed = 0;
 	int i = 0, j;
···
 		iommufd_put_object(ucmd->ictx, &hwpt->obj);
 		return ERR_PTR(-EINVAL);
 	}
-	*mock = container_of(hwpt->domain, struct mock_iommu_domain, domain);
+	*mock = to_mock_domain(hwpt->domain);
 	return hwpt;
 }
 
···
 		iommufd_put_object(ucmd->ictx, &hwpt->obj);
 		return ERR_PTR(-EINVAL);
 	}
-	*mock_nested = container_of(hwpt->domain,
-				    struct mock_iommu_domain_nested, domain);
+	*mock_nested = to_mock_nested(hwpt->domain);
 	return hwpt;
 }
 
 static void mock_dev_release(struct device *dev)
 {
-	struct mock_dev *mdev = container_of(dev, struct mock_dev, dev);
+	struct mock_dev *mdev = to_mock_dev(dev);
 
 	ida_free(&mock_dev_ida, mdev->id);
 	kfree(mdev);
···
 static struct mock_dev *mock_dev_create(unsigned long dev_flags)
 {
 	struct mock_dev *mdev;
-	int rc;
+	int rc, i;
 
 	if (dev_flags &
 	    ~(MOCK_FLAGS_DEVICE_NO_DIRTY | MOCK_FLAGS_DEVICE_HUGE_IOVA))
···
 	mdev->flags = dev_flags;
 	mdev->dev.release = mock_dev_release;
 	mdev->dev.bus = &iommufd_mock_bus_type.bus;
+	for (i = 0; i < MOCK_DEV_CACHE_NUM; i++)
+		mdev->cache[i] = IOMMU_TEST_DEV_CACHE_DEFAULT;
 
 	rc = ida_alloc(&mock_dev_ida, GFP_KERNEL);
 	if (rc < 0)
···
 	if (IS_ERR(dev_obj))
 		return PTR_ERR(dev_obj);
 
-	sobj = container_of(dev_obj, struct selftest_obj, obj);
+	sobj = to_selftest_obj(dev_obj);
 	if (sobj->type != TYPE_IDEV) {
 		rc = -EINVAL;
 		goto out_dev_obj;
···
 	if (IS_ERR(hwpt))
 		return PTR_ERR(hwpt);
 
-	mock_nested = container_of(hwpt->domain,
-				   struct mock_iommu_domain_nested, domain);
+	mock_nested = to_mock_nested(hwpt->domain);
 
 	if (iotlb_id > MOCK_NESTED_DOMAIN_IOTLB_ID_MAX ||
 	    mock_nested->iotlb[iotlb_id] != iotlb)
 		rc = -EINVAL;
 	iommufd_put_object(ucmd->ictx, &hwpt->obj);
 	return rc;
 }
 
+static int iommufd_test_dev_check_cache(struct iommufd_ucmd *ucmd, u32 idev_id,
+					unsigned int cache_id, u32 cache)
+{
+	struct iommufd_device *idev;
+	struct mock_dev *mdev;
+	int rc = 0;
+
+	idev = iommufd_get_device(ucmd, idev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+	mdev = container_of(idev->dev, struct mock_dev, dev);
+
+	if (cache_id > MOCK_DEV_CACHE_ID_MAX || mdev->cache[cache_id] != cache)
+		rc = -EINVAL;
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
+
···
 
 void iommufd_selftest_destroy(struct iommufd_object *obj)
 {
-	struct selftest_obj *sobj = container_of(obj, struct selftest_obj, obj);
+	struct selftest_obj *sobj = to_selftest_obj(obj);
 
 	switch (sobj->type) {
 	case TYPE_IDEV:
···
 		return iommufd_test_md_check_iotlb(ucmd, cmd->id,
 						   cmd->check_iotlb.id,
 						   cmd->check_iotlb.iotlb);
+	case IOMMU_TEST_OP_DEV_CHECK_CACHE:
+		return iommufd_test_dev_check_cache(ucmd, cmd->id,
+						    cmd->check_dev_cache.id,
+						    cmd->check_dev_cache.cache);
 	case IOMMU_TEST_OP_CREATE_ACCESS:
 		return iommufd_test_create_access(ucmd, cmd->id,
 						  cmd->create_access.flags);
···
 	if (rc)
 		goto err_platform;
 
-	rc = iommu_device_sysfs_add(&mock_iommu_device,
+	rc = iommu_device_sysfs_add(&mock_iommu.iommu_dev,
 				    &selftest_iommu_dev->dev, NULL, "%s",
 				    dev_name(&selftest_iommu_dev->dev));
 	if (rc)
 		goto err_bus;
 
-	rc = iommu_device_register_bus(&mock_iommu_device, &mock_ops,
+	rc = iommu_device_register_bus(&mock_iommu.iommu_dev, &mock_ops,
 				       &iommufd_mock_bus_type.bus,
 				       &iommufd_mock_bus_type.nb);
 	if (rc)
 		goto err_sysfs;
+
+	refcount_set(&mock_iommu.users, 1);
+	init_completion(&mock_iommu.complete);
 
 	mock_iommu_iopf_queue = iopf_queue_alloc("mock-iopfq");
 
 	return 0;
 
 err_sysfs:
-	iommu_device_sysfs_remove(&mock_iommu_device);
+	iommu_device_sysfs_remove(&mock_iommu.iommu_dev);
 err_bus:
 	bus_unregister(&iommufd_mock_bus_type.bus);
 err_platform:
···
 	return rc;
 }
 
+static void iommufd_test_wait_for_users(void)
+{
+	if (refcount_dec_and_test(&mock_iommu.users))
+		return;
+	/*
+	 * Time out waiting for iommu device user count to become 0.
+	 *
+	 * Note that this is just making an example here, since the selftest is
+	 * built into the iommufd module, i.e. it only unplugs the iommu device
+	 * when unloading the module. So, it is expected that this WARN_ON will
+	 * not trigger, as long as any iommufd FDs are open.
+	 */
+	WARN_ON(!wait_for_completion_timeout(&mock_iommu.complete,
+					     msecs_to_jiffies(10000)));
+}
+
 void iommufd_test_exit(void)
 {
 	if (mock_iommu_iopf_queue) {
···
 		mock_iommu_iopf_queue = NULL;
 	}
 
-	iommu_device_sysfs_remove(&mock_iommu_device);
-	iommu_device_unregister_bus(&mock_iommu_device,
+	iommufd_test_wait_for_users();
+	iommu_device_sysfs_remove(&mock_iommu.iommu_dev);
+	iommu_device_unregister_bus(&mock_iommu.iommu_dev,
 				    &iommufd_mock_bus_type.bus,
 				    &iommufd_mock_bus_type.nb);
 	bus_unregister(&iommufd_mock_bus_type.bus);
+1 -6
drivers/iommu/iommufd/vfio_compat.c
···
 	case VFIO_DMA_CC_IOMMU:
 		return iommufd_vfio_cc_iommu(ictx);
 
-	/*
-	 * This is obsolete, and to be removed from VFIO. It was an incomplete
-	 * idea that got merged.
-	 * https://lore.kernel.org/kvm/0-v1-0093c9b0e345+19-vfio_no_nesting_jgg@nvidia.com/
-	 */
-	case VFIO_TYPE1_NESTING_IOMMU:
+	case __VFIO_RESERVED_TYPE1_NESTING_IOMMU:
 		return 0;
 
 	/*
+157
drivers/iommu/iommufd/viommu.c
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2024, NVIDIA CORPORATION & AFFILIATES
+ */
+#include "iommufd_private.h"
+
+void iommufd_viommu_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_viommu *viommu =
+		container_of(obj, struct iommufd_viommu, obj);
+
+	if (viommu->ops && viommu->ops->destroy)
+		viommu->ops->destroy(viommu);
+	refcount_dec(&viommu->hwpt->common.obj.users);
+	xa_destroy(&viommu->vdevs);
+}
+
+int iommufd_viommu_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_viommu_alloc *cmd = ucmd->cmd;
+	struct iommufd_hwpt_paging *hwpt_paging;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	const struct iommu_ops *ops;
+	int rc;
+
+	if (cmd->flags || cmd->type == IOMMU_VIOMMU_TYPE_DEFAULT)
+		return -EOPNOTSUPP;
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev))
+		return PTR_ERR(idev);
+
+	ops = dev_iommu_ops(idev->dev);
+	if (!ops->viommu_alloc) {
+		rc = -EOPNOTSUPP;
+		goto out_put_idev;
+	}
+
+	hwpt_paging = iommufd_get_hwpt_paging(ucmd, cmd->hwpt_id);
+	if (IS_ERR(hwpt_paging)) {
+		rc = PTR_ERR(hwpt_paging);
+		goto out_put_idev;
+	}
+
+	if (!hwpt_paging->nest_parent) {
+		rc = -EINVAL;
+		goto out_put_hwpt;
+	}
+
+	viommu = ops->viommu_alloc(idev->dev, hwpt_paging->common.domain,
+				   ucmd->ictx, cmd->type);
+	if (IS_ERR(viommu)) {
+		rc = PTR_ERR(viommu);
+		goto out_put_hwpt;
+	}
+
+	xa_init(&viommu->vdevs);
+	viommu->type = cmd->type;
+	viommu->ictx = ucmd->ictx;
+	viommu->hwpt = hwpt_paging;
+	refcount_inc(&viommu->hwpt->common.obj.users);
+	/*
+	 * It is the most likely case that a physical IOMMU is unpluggable. A
+	 * pluggable IOMMU instance (if exists) is responsible for refcounting
+	 * on its own.
+	 */
+	viommu->iommu_dev = __iommu_get_iommu_dev(idev->dev);
+
+	cmd->out_viommu_id = viommu->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &viommu->obj);
+	goto out_put_hwpt;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &viommu->obj);
+out_put_hwpt:
+	iommufd_put_object(ucmd->ictx, &hwpt_paging->common.obj);
+out_put_idev:
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+	return rc;
+}
+
+void iommufd_vdevice_destroy(struct iommufd_object *obj)
+{
+	struct iommufd_vdevice *vdev =
+		container_of(obj, struct iommufd_vdevice, obj);
+	struct iommufd_viommu *viommu = vdev->viommu;
+
+	/* xa_cmpxchg is okay to fail if alloc failed xa_cmpxchg previously */
+	xa_cmpxchg(&viommu->vdevs, vdev->id, vdev, NULL, GFP_KERNEL);
+	refcount_dec(&viommu->obj.users);
+	put_device(vdev->dev);
+}
+
+int iommufd_vdevice_alloc_ioctl(struct iommufd_ucmd *ucmd)
+{
+	struct iommu_vdevice_alloc *cmd = ucmd->cmd;
+	struct iommufd_vdevice *vdev, *curr;
+	struct iommufd_viommu *viommu;
+	struct iommufd_device *idev;
+	u64 virt_id = cmd->virt_id;
+	int rc = 0;
+
+	/* virt_id indexes an xarray */
+	if (virt_id > ULONG_MAX)
+		return -EINVAL;
+
+	viommu = iommufd_get_viommu(ucmd, cmd->viommu_id);
+	if (IS_ERR(viommu))
+		return PTR_ERR(viommu);
+
+	idev = iommufd_get_device(ucmd, cmd->dev_id);
+	if (IS_ERR(idev)) {
+		rc = PTR_ERR(idev);
+		goto out_put_viommu;
+	}
+
+	if (viommu->iommu_dev != __iommu_get_iommu_dev(idev->dev)) {
+		rc = -EINVAL;
+		goto out_put_idev;
+	}
+
+	vdev = iommufd_object_alloc(ucmd->ictx, vdev, IOMMUFD_OBJ_VDEVICE);
+	if (IS_ERR(vdev)) {
+		rc = PTR_ERR(vdev);
+		goto out_put_idev;
+	}
+
+	vdev->id = virt_id;
+	vdev->dev = idev->dev;
+	get_device(idev->dev);
+	vdev->viommu = viommu;
+	refcount_inc(&viommu->obj.users);
+
+	curr = xa_cmpxchg(&viommu->vdevs, virt_id, NULL, vdev, GFP_KERNEL);
+	if (curr) {
+		rc = xa_err(curr) ?: -EEXIST;
+		goto out_abort;
+	}
+
+	cmd->out_vdevice_id = vdev->obj.id;
+	rc = iommufd_ucmd_respond(ucmd, sizeof(*cmd));
+	if (rc)
+		goto out_abort;
+	iommufd_object_finalize(ucmd->ictx, &vdev->obj);
+	goto out_put_idev;
+
+out_abort:
+	iommufd_object_abort_and_destroy(ucmd->ictx, &vdev->obj);
+out_put_idev:
+	iommufd_put_object(ucmd->ictx, &idev->obj);
+out_put_viommu:
+	iommufd_put_object(ucmd->ictx, &viommu->obj);
+	return rc;
+}
+1 -11
drivers/vfio/vfio_iommu_type1.c
···
 	uint64_t		pgsize_bitmap;
 	uint64_t		num_non_pinned_groups;
 	bool			v2;
-	bool			nesting;
 	bool			dirty_page_tracking;
 	struct list_head	emulated_iommu_groups;
 };
···
 		goto out_free_domain;
 	}
 
-	if (iommu->nesting) {
-		ret = iommu_enable_nesting(domain->domain);
-		if (ret)
-			goto out_domain;
-	}
-
 	ret = iommu_attach_group(domain->domain, group->iommu_group);
 	if (ret)
 		goto out_domain;
···
 	switch (arg) {
 	case VFIO_TYPE1_IOMMU:
 		break;
-	case VFIO_TYPE1_NESTING_IOMMU:
-		iommu->nesting = true;
-		fallthrough;
+	case __VFIO_RESERVED_TYPE1_NESTING_IOMMU:
 	case VFIO_TYPE1v2_IOMMU:
 		iommu->v2 = true;
 		break;
···
 	switch (arg) {
 	case VFIO_TYPE1_IOMMU:
 	case VFIO_TYPE1v2_IOMMU:
-	case VFIO_TYPE1_NESTING_IOMMU:
 	case VFIO_UNMAP_ALL:
 		return 1;
 	case VFIO_UPDATE_VADDR:
+2 -1
include/acpi/actbl2.h
···
  * IORT - IO Remapping Table
  *
  * Conforms to "IO Remapping Table System Software on ARM Platforms",
- * Document number: ARM DEN 0049E.e, Sep 2022
+ * Document number: ARM DEN 0049E.f, Apr 2024
  *
  ******************************************************************************/
···
 
 #define ACPI_IORT_MF_COHERENCY		(1)
 #define ACPI_IORT_MF_ATTRIBUTES		(1<<1)
+#define ACPI_IORT_MF_CANWBS		(1<<2)
 
 /*
  * IORT node specific subtables
+2
include/linux/io-pgtable.h
···
  *	attributes set in the TCR for a non-coherent page-table walker.
  *
  * IO_PGTABLE_QUIRK_ARM_HD: Enables dirty tracking in stage 1 pagetable.
+ * IO_PGTABLE_QUIRK_ARM_S2FWB: Use the FWB format for the MemAttrs bits
  */
 	#define IO_PGTABLE_QUIRK_ARM_NS			BIT(0)
 	#define IO_PGTABLE_QUIRK_NO_PERMS		BIT(1)
···
 	#define IO_PGTABLE_QUIRK_ARM_TTBR1		BIT(5)
 	#define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA		BIT(6)
 	#define IO_PGTABLE_QUIRK_ARM_HD			BIT(7)
+	#define IO_PGTABLE_QUIRK_ARM_S2FWB		BIT(8)
 	unsigned long quirks;
 	unsigned long pgsize_bitmap;
 	unsigned int ias;
+63 -4
include/linux/iommu.h
···
 struct iommu_sva;
 struct iommu_dma_cookie;
 struct iommu_fault_param;
+struct iommufd_ctx;
+struct iommufd_viommu;
 
 #define IOMMU_FAULT_PERM_READ	(1 << 0) /* read */
 #define IOMMU_FAULT_PERM_WRITE	(1 << 1) /* write */
···
  * @index: Index to the location in the array to copy user data from
  * @min_last: The last member of the data structure @kdst points in the
  *            initial version.
- * Return 0 for success, otherwise -error.
+ *
+ * Copy a single entry from a user array. Return 0 for success, otherwise
+ * -error.
  */
 #define iommu_copy_struct_from_user_array(kdst, user_array, data_type, index, \
					  min_last)                            \
	__iommu_copy_struct_from_user_array(                                   \
		kdst, user_array, data_type, index, sizeof(*(kdst)),           \
		offsetofend(typeof(*(kdst)), min_last))
+
+/**
+ * iommu_copy_struct_from_full_user_array - Copy iommu driver specific user
+ *                                          space data from an
+ *                                          iommu_user_data_array
+ * @kdst: Pointer to an iommu driver specific user data that is defined in
+ *        include/uapi/linux/iommufd.h
+ * @kdst_entry_size: sizeof(*kdst)
+ * @user_array: Pointer to a struct iommu_user_data_array for a user space
+ *              array
+ * @data_type: The data type of the @kdst. Must match with @user_array->type
+ *
+ * Copy the entire user array. kdst must have room for kdst_entry_size *
+ * user_array->entry_num bytes. Return 0 for success, otherwise -error.
+ */
+static inline int
+iommu_copy_struct_from_full_user_array(void *kdst, size_t kdst_entry_size,
+				       struct iommu_user_data_array *user_array,
+				       unsigned int data_type)
+{
+	unsigned int i;
+	int ret;
+
+	if (user_array->type != data_type)
+		return -EINVAL;
+	if (!user_array->entry_num)
+		return -EINVAL;
+	if (likely(user_array->entry_len == kdst_entry_size)) {
+		if (copy_from_user(kdst, user_array->uptr,
+				   user_array->entry_num *
+					   user_array->entry_len))
+			return -EFAULT;
+	}
+
+	/* Copy item by item */
+	for (i = 0; i != user_array->entry_num; i++) {
+		ret = copy_struct_from_user(
+			kdst + kdst_entry_size * i, kdst_entry_size,
+			user_array->uptr + user_array->entry_len * i,
+			user_array->entry_len);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
 
 /**
  * struct iommu_ops - iommu ops and capabilities
···
  * @remove_dev_pasid: Remove any translation configurations of a specific
  *                    pasid, so that any DMA transactions with this pasid
  *                    will be blocked by the hardware.
+ * @viommu_alloc: Allocate an iommufd_viommu on a physical IOMMU instance behind
+ *                the @dev, as the set of virtualization resources shared/passed
+ *                to a user space IOMMU instance, and associate it with a
+ *                nesting @parent_domain. The @viommu_type must be defined in
+ *                the header include/uapi/linux/iommufd.h.
+ *                It is required to call the iommufd_viommu_alloc() helper for
+ *                a bundled allocation of the core and the driver structures,
+ *                using the given @ictx pointer.
 * @pgsize_bitmap: bitmap of all possible supported page sizes
 * @owner: Driver module providing these ops
 * @identity_domain: An always available, always attachable identity
···
 	void (*remove_dev_pasid)(struct device *dev, ioasid_t pasid,
				 struct iommu_domain *domain);
 
+	struct iommufd_viommu *(*viommu_alloc)(
+		struct device *dev, struct iommu_domain *parent_domain,
+		struct iommufd_ctx *ictx, unsigned int viommu_type);
+
 	const struct iommu_domain_ops *default_domain_ops;
 	unsigned long pgsize_bitmap;
 	struct module *owner;
···
  * @enforce_cache_coherency: Prevent any kind of DMA from bypassing IOMMU_CACHE,
  *                           including no-snoop TLPs on PCIe or other platform
  *                           specific mechanisms.
- * @enable_nesting: Enable nesting
  * @set_pgtable_quirks: Set io page table quirks (IO_PGTABLE_QUIRK_*)
  * @free: Release the domain after use.
  */
···
			      dma_addr_t iova);
 
 	bool (*enforce_cache_coherency)(struct iommu_domain *domain);
-	int (*enable_nesting)(struct iommu_domain *domain);
 	int (*set_pgtable_quirks)(struct iommu_domain *domain,
				  unsigned long quirks);
···
 extern int iommu_group_id(struct iommu_group *group);
 extern struct iommu_domain *iommu_group_default_domain(struct iommu_group *);
 
-int iommu_enable_nesting(struct iommu_domain *domain);
 int iommu_set_pgtable_quirks(struct iommu_domain *domain,
			     unsigned long quirks);
···
 
 /* ATS is supported */
 #define IOMMU_FWSPEC_PCI_RC_ATS		(1 << 0)
+/* CANWBS is supported */
+#define IOMMU_FWSPEC_PCI_RC_CANWBS	(1 << 1)
 
 /*
  * An iommu attach handle represents a relationship between an iommu domain
+108
include/linux/iommufd.h
···
 
 #include <linux/err.h>
 #include <linux/errno.h>
+#include <linux/refcount.h>
 #include <linux/types.h>
+#include <linux/xarray.h>
 
 struct device;
 struct file;
 struct iommu_group;
+struct iommu_user_data;
+struct iommu_user_data_array;
 struct iommufd_access;
 struct iommufd_ctx;
 struct iommufd_device;
+struct iommufd_viommu_ops;
 struct page;
+
+enum iommufd_object_type {
+	IOMMUFD_OBJ_NONE,
+	IOMMUFD_OBJ_ANY = IOMMUFD_OBJ_NONE,
+	IOMMUFD_OBJ_DEVICE,
+	IOMMUFD_OBJ_HWPT_PAGING,
+	IOMMUFD_OBJ_HWPT_NESTED,
+	IOMMUFD_OBJ_IOAS,
+	IOMMUFD_OBJ_ACCESS,
+	IOMMUFD_OBJ_FAULT,
+	IOMMUFD_OBJ_VIOMMU,
+	IOMMUFD_OBJ_VDEVICE,
+#ifdef CONFIG_IOMMUFD_TEST
+	IOMMUFD_OBJ_SELFTEST,
+#endif
+	IOMMUFD_OBJ_MAX,
+};
+
+/* Base struct for all objects with a userspace ID handle. */
+struct iommufd_object {
+	refcount_t shortterm_users;
+	refcount_t users;
+	enum iommufd_object_type type;
+	unsigned int id;
+};
 
 struct iommufd_device *iommufd_device_bind(struct iommufd_ctx *ictx,
					   struct device *dev, u32 *id);
···
 void iommufd_access_detach(struct iommufd_access *access);
 
 void iommufd_ctx_get(struct iommufd_ctx *ictx);
+
+struct iommufd_viommu {
+	struct iommufd_object obj;
+	struct iommufd_ctx *ictx;
+	struct iommu_device *iommu_dev;
+	struct iommufd_hwpt_paging *hwpt;
+
+	const struct iommufd_viommu_ops *ops;
+
+	struct xarray vdevs;
+
+	unsigned int type;
+};
+
+/**
+ * struct iommufd_viommu_ops - vIOMMU specific operations
+ * @destroy: Clean up all driver-specific parts of an iommufd_viommu. The memory
+ *           of the vIOMMU will be freed by the iommufd core after calling this
+ *           op
+ * @alloc_domain_nested: Allocate an IOMMU_DOMAIN_NESTED on a vIOMMU that holds
+ *                       a nesting parent domain (IOMMU_DOMAIN_PAGING).
+ *                       @user_data must be defined in
+ *                       include/uapi/linux/iommufd.h.
+ *                       It must fully initialize the new iommu_domain before
+ *                       returning. Upon failure, ERR_PTR must be returned.
+ * @cache_invalidate: Flush hardware cache used by a vIOMMU. It can be used for
+ *                    any IOMMU hardware specific cache: TLB and device cache.
+ *                    The @array passes in the cache invalidation requests, in
+ *                    form of a driver data structure. A driver must update the
+ *                    array->entry_num to report the number of handled requests.
+ *                    The data structure of the array entry must be defined in
+ *                    include/uapi/linux/iommufd.h
+ */
+struct iommufd_viommu_ops {
+	void (*destroy)(struct iommufd_viommu *viommu);
+	struct iommu_domain *(*alloc_domain_nested)(
+		struct iommufd_viommu *viommu, u32 flags,
+		const struct iommu_user_data *user_data);
+	int (*cache_invalidate)(struct iommufd_viommu *viommu,
+				struct iommu_user_data_array *array);
+};
 
 #if IS_ENABLED(CONFIG_IOMMUFD)
 struct iommufd_ctx *iommufd_ctx_from_file(struct file *file);
···
 	return -EOPNOTSUPP;
 }
 #endif /* CONFIG_IOMMUFD */
+
+#if IS_ENABLED(CONFIG_IOMMUFD_DRIVER_CORE)
+struct iommufd_object *_iommufd_object_alloc(struct iommufd_ctx *ictx,
+					     size_t size,
+					     enum iommufd_object_type type);
+struct device *iommufd_viommu_find_dev(struct iommufd_viommu *viommu,
+				       unsigned long vdev_id);
+#else /* !CONFIG_IOMMUFD_DRIVER_CORE */
+static inline struct iommufd_object *
+_iommufd_object_alloc(struct iommufd_ctx *ictx, size_t size,
+		      enum iommufd_object_type type)
+{
+	return ERR_PTR(-EOPNOTSUPP);
+}
+
+static inline struct device *
+iommufd_viommu_find_dev(struct iommufd_viommu *viommu, unsigned long vdev_id)
+{
+	return NULL;
+}
+#endif /* CONFIG_IOMMUFD_DRIVER_CORE */
+
+/*
+ * Helpers for IOMMU driver to allocate driver structures that will be freed by
+ * the iommufd core. The free op will be called prior to freeing the memory.
+ */
+#define iommufd_viommu_alloc(ictx, drv_struct, member, viommu_ops)             \
+	({                                                                     \
+		drv_struct *ret;                                               \
+                                                                               \
+		static_assert(__same_type(struct iommufd_viommu,               \
+					  ((drv_struct *)NULL)->member));      \
+		static_assert(offsetof(drv_struct, member.obj) == 0);          \
+		ret = (drv_struct *)_iommufd_object_alloc(                     \
+			ictx, sizeof(drv_struct), IOMMUFD_OBJ_VIOMMU);         \
+		if (!IS_ERR(ret))                                              \
+			ret->member.ops = viommu_ops;                          \
+		ret;                                                           \
+	})
 #endif
+1
include/linux/mm.h
···
 long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
		      struct folio **folios, unsigned int max_folios,
		      pgoff_t *offset);
+int folio_add_pins(struct folio *folio, unsigned int pins);
 
 int get_user_pages_fast(unsigned long start, int nr_pages,
			unsigned int gup_flags, struct page **pages);
+207 -9
include/uapi/linux/iommufd.h
···
 	IOMMUFD_CMD_HWPT_GET_DIRTY_BITMAP = 0x8c,
 	IOMMUFD_CMD_HWPT_INVALIDATE = 0x8d,
 	IOMMUFD_CMD_FAULT_QUEUE_ALLOC = 0x8e,
+	IOMMUFD_CMD_IOAS_MAP_FILE = 0x8f,
+	IOMMUFD_CMD_VIOMMU_ALLOC = 0x90,
+	IOMMUFD_CMD_VDEVICE_ALLOC = 0x91,
+	IOMMUFD_CMD_IOAS_CHANGE_PROCESS = 0x92,
 };
 
 /**
···
 	__aligned_u64 iova;
 };
 #define IOMMU_IOAS_MAP _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP)
+
+/**
+ * struct iommu_ioas_map_file - ioctl(IOMMU_IOAS_MAP_FILE)
+ * @size: sizeof(struct iommu_ioas_map_file)
+ * @flags: same as for iommu_ioas_map
+ * @ioas_id: same as for iommu_ioas_map
+ * @fd: the memfd to map
+ * @start: byte offset from start of file to map from
+ * @length: same as for iommu_ioas_map
+ * @iova: same as for iommu_ioas_map
+ *
+ * Set an IOVA mapping from a memfd file. All other arguments and semantics
+ * match those of IOMMU_IOAS_MAP.
+ */
+struct iommu_ioas_map_file {
+	__u32 size;
+	__u32 flags;
+	__u32 ioas_id;
+	__s32 fd;
+	__aligned_u64 start;
+	__aligned_u64 length;
+	__aligned_u64 iova;
+};
+#define IOMMU_IOAS_MAP_FILE _IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_MAP_FILE)
 
 /**
  * struct iommu_ioas_copy - ioctl(IOMMU_IOAS_COPY)
···
 };
 
 /**
+ * struct iommu_hwpt_arm_smmuv3 - ARM SMMUv3 nested STE
+ *                                (IOMMU_HWPT_DATA_ARM_SMMUV3)
+ *
+ * @ste: The first two double words of the user space Stream Table Entry for
+ *       the translation. Must be little-endian.
+ *       Allowed fields: (Refer to "5.2 Stream Table Entry" in SMMUv3 HW Spec)
+ *       - word-0: V, Cfg, S1Fmt, S1ContextPtr, S1CDMax
+ *       - word-1: EATS, S1DSS, S1CIR, S1COR, S1CSH, S1STALLD
+ *
+ * -EIO will be returned if @ste is not legal or contains any non-allowed field.
+ * Cfg can be used to select a S1, Bypass or Abort configuration. A Bypass
+ * nested domain will translate the same as the nesting parent. The S1 will
+ * install a Context Descriptor Table pointing at userspace memory translated
+ * by the nesting parent.
+ */
+struct iommu_hwpt_arm_smmuv3 {
+	__aligned_le64 ste[2];
+};
+
+/**
  * enum iommu_hwpt_data_type - IOMMU HWPT Data Type
  * @IOMMU_HWPT_DATA_NONE: no data
  * @IOMMU_HWPT_DATA_VTD_S1: Intel VT-d stage-1 page table
+ * @IOMMU_HWPT_DATA_ARM_SMMUV3: ARM SMMUv3 Context Descriptor Table
  */
 enum iommu_hwpt_data_type {
 	IOMMU_HWPT_DATA_NONE = 0,
 	IOMMU_HWPT_DATA_VTD_S1 = 1,
+	IOMMU_HWPT_DATA_ARM_SMMUV3 = 2,
 };
 
 /**
···
  * @size: sizeof(struct iommu_hwpt_alloc)
  * @flags: Combination of enum iommufd_hwpt_alloc_flags
  * @dev_id: The device to allocate this HWPT for
- * @pt_id: The IOAS or HWPT to connect this HWPT to
+ * @pt_id: The IOAS or HWPT or vIOMMU to connect this HWPT to
  * @out_hwpt_id: The ID of the new HWPT
  * @__reserved: Must be 0
  * @data_type: One of enum iommu_hwpt_data_type
···
  * IOMMU_HWPT_DATA_NONE. The HWPT can be allocated as a parent HWPT for a
  * nesting configuration by passing IOMMU_HWPT_ALLOC_NEST_PARENT via @flags.
  *
- * A user-managed nested HWPT will be created from a given parent HWPT via
- * @pt_id, in which the parent HWPT must be allocated previously via the
- * same ioctl from a given IOAS (@pt_id). In this case, the @data_type
- * must be set to a pre-defined type corresponding to an I/O page table
- * type supported by the underlying IOMMU hardware.
+ * A user-managed nested HWPT will be created from a given vIOMMU (wrapping a
+ * parent HWPT) or a parent HWPT via @pt_id, in which the parent HWPT must be
+ * allocated previously via the same ioctl from a given IOAS (@pt_id). In this
+ * case, the @data_type must be set to a pre-defined type corresponding to an
+ * I/O page table type supported by the underlying IOMMU hardware. The device
+ * via @dev_id and the vIOMMU via @pt_id must be associated to the same IOMMU
+ * instance.
  *
  * If the @data_type is set to IOMMU_HWPT_DATA_NONE, @data_len and
  * @data_uptr should be zero. Otherwise, both @data_len and @data_uptr
···
 };
 
 /**
+ * struct iommu_hw_info_arm_smmuv3 - ARM SMMUv3 hardware information
+ *                                   (IOMMU_HW_INFO_TYPE_ARM_SMMUV3)
+ *
+ * @flags: Must be set to 0
+ * @__reserved: Must be 0
+ * @idr: Implemented features for ARM SMMU Non-secure programming interface
+ * @iidr: Information about the implementation and implementer of ARM SMMU,
+ *        and architecture version supported
+ * @aidr: ARM SMMU architecture version
+ *
+ * For the details of @idr, @iidr and @aidr, please refer to the chapters
+ * from 6.3.1 to 6.3.6 in the SMMUv3 Spec.
+ *
+ * User space should read the underlying ARM SMMUv3 hardware information for
+ * the list of supported features.
+ *
+ * Note that these values reflect the raw HW capability, without any insight if
+ * any required kernel driver support is present. Bits may be set indicating the
+ * HW has functionality that is lacking kernel software support, such as BTM. If
+ * a VMM is using this information to construct emulated copies of these
+ * registers it should only forward bits that it knows it can support.
+ *
+ * In future, presence of required kernel support will be indicated in flags.
+ */
+struct iommu_hw_info_arm_smmuv3 {
+	__u32 flags;
+	__u32 __reserved;
+	__u32 idr[6];
+	__u32 iidr;
+	__u32 aidr;
+};
+
+/**
  * enum iommu_hw_info_type - IOMMU Hardware Info Types
  * @IOMMU_HW_INFO_TYPE_NONE: Used by the drivers that do not report hardware
  *                           info
  * @IOMMU_HW_INFO_TYPE_INTEL_VTD: Intel VT-d iommu info type
+ * @IOMMU_HW_INFO_TYPE_ARM_SMMUV3: ARM SMMUv3 iommu info type
  */
 enum iommu_hw_info_type {
 	IOMMU_HW_INFO_TYPE_NONE = 0,
 	IOMMU_HW_INFO_TYPE_INTEL_VTD = 1,
+	IOMMU_HW_INFO_TYPE_ARM_SMMUV3 = 2,
 };
 
 /**
···
  * enum iommu_hwpt_invalidate_data_type - IOMMU HWPT Cache Invalidation
  *                                        Data Type
  * @IOMMU_HWPT_INVALIDATE_DATA_VTD_S1: Invalidation data for VTD_S1
+ * @IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3: Invalidation data for ARM SMMUv3
  */
 enum iommu_hwpt_invalidate_data_type {
 	IOMMU_HWPT_INVALIDATE_DATA_VTD_S1 = 0,
+	IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3 = 1,
 };
 
 /**
···
 };
 
 /**
+ * struct iommu_viommu_arm_smmuv3_invalidate - ARM SMMUv3 cache invalidation
+ *         (IOMMU_VIOMMU_INVALIDATE_DATA_ARM_SMMUV3)
+ * @cmd: 128-bit cache invalidation command that runs in SMMU CMDQ.
+ *       Must be little-endian.
+ *
+ * Supported command list only when passing in a vIOMMU via @hwpt_id:
+ *	CMDQ_OP_TLBI_NSNH_ALL
+ *	CMDQ_OP_TLBI_NH_VA
+ *	CMDQ_OP_TLBI_NH_VAA
+ *	CMDQ_OP_TLBI_NH_ALL
+ *	CMDQ_OP_TLBI_NH_ASID
+ *	CMDQ_OP_ATC_INV
+ *	CMDQ_OP_CFGI_CD
+ *	CMDQ_OP_CFGI_CD_ALL
+ *
+ * -EIO will be returned if the command is not supported.
+ */
+struct iommu_viommu_arm_smmuv3_invalidate {
+	__aligned_le64 cmd[2];
+};
+
+/**
  * struct iommu_hwpt_invalidate - ioctl(IOMMU_HWPT_INVALIDATE)
  * @size: sizeof(struct iommu_hwpt_invalidate)
- * @hwpt_id: ID of a nested HWPT for cache invalidation
+ * @hwpt_id: ID of a nested HWPT or a vIOMMU, for cache invalidation
  * @data_uptr: User pointer to an array of driver-specific cache invalidation
  *             data.
  * @data_type: One of enum iommu_hwpt_invalidate_data_type, defining the data
···
  *             Output the number of requests successfully handled by kernel.
  * @__reserved: Must be 0.
  *
- * Invalidate the iommu cache for user-managed page table. Modifications on a
- * user-managed page table should be followed by this operation to sync cache.
+ * Invalidate the iommu cache for a user-managed page table or vIOMMU.
+ * Modifications on a user-managed page table should be followed by this
+ * operation, if a HWPT is passed in via @hwpt_id. Other caches, such as device
+ * cache or descriptor cache can be flushed if a vIOMMU is passed in via the
+ * @hwpt_id field.
+ *
  * Each ioctl can support one or more cache invalidation requests in the array
  * that has a total size of @entry_len * @entry_num.
···
 	__u32 out_fault_fd;
 };
 #define IOMMU_FAULT_QUEUE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_FAULT_QUEUE_ALLOC)
+
+/**
+ * enum iommu_viommu_type - Virtual IOMMU Type
+ * @IOMMU_VIOMMU_TYPE_DEFAULT: Reserved for future use
+ * @IOMMU_VIOMMU_TYPE_ARM_SMMUV3: ARM SMMUv3 driver specific type
+ */
+enum iommu_viommu_type {
+	IOMMU_VIOMMU_TYPE_DEFAULT = 0,
+	IOMMU_VIOMMU_TYPE_ARM_SMMUV3 = 1,
+};
+
+/**
+ * struct iommu_viommu_alloc - ioctl(IOMMU_VIOMMU_ALLOC)
+ * @size: sizeof(struct iommu_viommu_alloc)
+ * @flags: Must be 0
+ * @type: Type of the virtual IOMMU. Must be defined in enum iommu_viommu_type
+ * @dev_id: The device's physical IOMMU will be used to back the virtual IOMMU
+ * @hwpt_id: ID of a nesting parent HWPT to associate to
+ * @out_viommu_id: Output virtual IOMMU ID for the allocated object
+ *
+ * Allocate a virtual IOMMU object, representing the underlying physical IOMMU's
+ * virtualization support that is a security-isolated slice of the real IOMMU HW
+ * that is unique to a specific VM. Operations global to the IOMMU are connected
+ * to the vIOMMU, such as:
+ * - Security namespace for guest owned ID, e.g. guest-controlled cache tags
+ * - Non-device-affiliated event reporting, e.g. invalidation queue errors
+ * - Access to a sharable nesting parent pagetable across physical IOMMUs
+ * - Virtualization of various platform IDs, e.g. RIDs and others
+ * - Delivery of paravirtualized invalidation
+ * - Direct assigned invalidation queues
+ * - Direct assigned interrupts
+ */
+struct iommu_viommu_alloc {
+	__u32 size;
+	__u32 flags;
+	__u32 type;
+	__u32 dev_id;
+	__u32 hwpt_id;
+	__u32 out_viommu_id;
+};
+#define IOMMU_VIOMMU_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VIOMMU_ALLOC)
+
+/**
+ * struct iommu_vdevice_alloc - ioctl(IOMMU_VDEVICE_ALLOC)
+ * @size: sizeof(struct iommu_vdevice_alloc)
+ * @viommu_id: vIOMMU ID to associate with the virtual device
+ * @dev_id: The physical device to allocate a virtual instance on the vIOMMU
+ * @out_vdevice_id: Object handle for the vDevice. Pass to IOMMU_DESTROY
+ * @virt_id: Virtual device ID per vIOMMU, e.g. vSID of ARM SMMUv3, vDeviceID
+ *           of AMD IOMMU, and vRID of a nested Intel VT-d to a Context Table
+ *
+ * Allocate a virtual device instance (for a physical device) against a vIOMMU.
+ * This instance holds the device's information (related to its vIOMMU) in a VM.
+ */
+struct iommu_vdevice_alloc {
+	__u32 size;
+	__u32 viommu_id;
+	__u32 dev_id;
+	__u32 out_vdevice_id;
+	__aligned_u64 virt_id;
+};
+#define IOMMU_VDEVICE_ALLOC _IO(IOMMUFD_TYPE, IOMMUFD_CMD_VDEVICE_ALLOC)
+
+/**
+ * struct iommu_ioas_change_process - ioctl(IOMMU_IOAS_CHANGE_PROCESS)
+ * @size: sizeof(struct iommu_ioas_change_process)
+ * @__reserved: Must be 0
+ *
+ * This transfers pinned memory counts for every memory map in every IOAS
+ * in the context to the current process. This only supports maps created
+ * with IOMMU_IOAS_MAP_FILE, and returns EINVAL if other maps are present.
+ * If the ioctl returns a failure status, then nothing is changed.
+ *
+ * This API is useful for transferring operation of a device from one process
+ * to another, such as during userland live update.
+ */
+struct iommu_ioas_change_process {
+	__u32 size;
+	__u32 __reserved;
+};
+
+#define IOMMU_IOAS_CHANGE_PROCESS \
+	_IO(IOMMUFD_TYPE, IOMMUFD_CMD_IOAS_CHANGE_PROCESS)
+
 #endif
+1 -1
include/uapi/linux/vfio.h
···
 #define VFIO_EEH			5
 
 /* Two-stage IOMMU */
-#define VFIO_TYPE1_NESTING_IOMMU	6	/* Implies v2 */
+#define __VFIO_RESERVED_TYPE1_NESTING_IOMMU	6	/* Implies v2 */
 
 #define VFIO_SPAPR_TCE_v2_IOMMU		7
+24
mm/gup.c
···
 	return ret;
 }
 EXPORT_SYMBOL_GPL(memfd_pin_folios);
+
+/**
+ * folio_add_pins() - add pins to an already-pinned folio
+ * @folio: the folio to add more pins to
+ * @pins: number of pins to add
+ *
+ * Try to add more pins to an already-pinned folio. The semantics
+ * of the pin (e.g., FOLL_WRITE) follow any existing pin and cannot
+ * be changed.
+ *
+ * This function is helpful when having obtained a pin on a large folio
+ * using memfd_pin_folios(), but wanting to logically unpin parts
+ * (e.g., individual pages) of the folio later, for example, using
+ * unpin_user_page_range_dirty_lock().
+ *
+ * This is not the right interface to initially pin a folio.
+ */
+int folio_add_pins(struct folio *folio, unsigned int pins)
+{
+	VM_WARN_ON_ONCE(!folio_maybe_dma_pinned(folio));
+
+	return try_grab_folio(folio, pins, FOLL_PIN);
+}
+EXPORT_SYMBOL_GPL(folio_add_pins);
+1
tools/testing/selftests/iommu/Makefile
···
 # SPDX-License-Identifier: GPL-2.0-only
 CFLAGS += -Wall -O2 -Wno-unused-function
 CFLAGS += $(KHDR_INCLUDES)
+LDLIBS += -lcap
 
 TEST_GEN_PROGS :=
 TEST_GEN_PROGS += iommufd
+588 -18
tools/testing/selftests/iommu/iommufd.c
···
 // SPDX-License-Identifier: GPL-2.0-only
 /* Copyright (c) 2021-2022, NVIDIA CORPORATION & AFFILIATES */
+#include <asm/unistd.h>
 #include <stdlib.h>
+#include <sys/capability.h>
 #include <sys/mman.h>
 #include <sys/eventfd.h>
···
 	vrc = mmap(buffer, BUFFER_SIZE, PROT_READ | PROT_WRITE,
		   MAP_SHARED | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
 	assert(vrc == buffer);
+
+	mfd_buffer = memfd_mmap(BUFFER_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED,
+				&mfd);
 }
 
 FIXTURE(iommufd)
···
 	TEST_LENGTH(iommu_ioas_unmap, IOMMU_IOAS_UNMAP, length);
 	TEST_LENGTH(iommu_option, IOMMU_OPTION, val64);
 	TEST_LENGTH(iommu_vfio_ioas, IOMMU_VFIO_IOAS, __reserved);
+	TEST_LENGTH(iommu_ioas_map_file, IOMMU_IOAS_MAP_FILE, iova);
+	TEST_LENGTH(iommu_viommu_alloc, IOMMU_VIOMMU_ALLOC, out_viommu_id);
+	TEST_LENGTH(iommu_vdevice_alloc, IOMMU_VDEVICE_ALLOC, virt_id);
+	TEST_LENGTH(iommu_ioas_change_process, IOMMU_IOAS_CHANGE_PROCESS,
+		    __reserved);
 #undef TEST_LENGTH
 }
···
 	EXPECT_ERRNO(ENOENT, ioctl(self->fd, IOMMU_OPTION, &cmd));
 }
 
+static void drop_cap_ipc_lock(struct __test_metadata *_metadata)
+{
+	cap_t caps;
+	cap_value_t cap_list[1] = { CAP_IPC_LOCK };
+
+	caps = cap_get_proc();
+	ASSERT_NE(caps, NULL);
+	ASSERT_NE(-1,
+		  cap_set_flag(caps, CAP_EFFECTIVE, 1, cap_list, CAP_CLEAR));
+	ASSERT_NE(-1, cap_set_proc(caps));
+	cap_free(caps);
+}
+
+static long get_proc_status_value(pid_t pid, const char *var)
+{
+	FILE *fp;
+	char buf[80], tag[80];
+	long val = -1;
+
+	snprintf(buf, sizeof(buf), "/proc/%d/status", pid);
+	fp = fopen(buf, "r");
+	if (!fp)
+		return val;
+
+	while (fgets(buf, sizeof(buf), fp))
+		if (fscanf(fp, "%s %ld\n", tag, &val) == 2 && !strcmp(tag, var))
+			break;
+
+	fclose(fp);
+	return val;
+}
+
+static long get_vm_pinned(pid_t pid)
+{
+	return get_proc_status_value(pid, "VmPin:");
+}
+
+static long get_vm_locked(pid_t pid)
+{
+	return get_proc_status_value(pid, "VmLck:");
+}
+
+FIXTURE(change_process)
+{
+	int fd;
+	uint32_t ioas_id;
+};
+
+FIXTURE_VARIANT(change_process)
+{
+	int accounting;
+};
+
+FIXTURE_SETUP(change_process)
+{
+	self->fd = open("/dev/iommu", O_RDWR);
+	ASSERT_NE(-1, self->fd);
+
+	drop_cap_ipc_lock(_metadata);
+	if (variant->accounting != IOPT_PAGES_ACCOUNT_NONE) {
+		struct iommu_option set_limit_cmd = {
+			.size = sizeof(set_limit_cmd),
+			.option_id = IOMMU_OPTION_RLIMIT_MODE,
+			.op = IOMMU_OPTION_OP_SET,
+			.val64 = (variant->accounting == IOPT_PAGES_ACCOUNT_MM),
+		};
+		ASSERT_EQ(0, ioctl(self->fd, IOMMU_OPTION, &set_limit_cmd));
+	}
+
+	test_ioctl_ioas_alloc(&self->ioas_id);
+	test_cmd_mock_domain(self->ioas_id, NULL, NULL, NULL);
+}
+
+FIXTURE_TEARDOWN(change_process)
+{
+	teardown_iommufd(self->fd, _metadata);
+}
+
+FIXTURE_VARIANT_ADD(change_process, account_none)
+{
+	.accounting = IOPT_PAGES_ACCOUNT_NONE,
+};
+
+FIXTURE_VARIANT_ADD(change_process, account_user)
+{
+	.accounting = IOPT_PAGES_ACCOUNT_USER,
+};
+
+FIXTURE_VARIANT_ADD(change_process, account_mm)
+{
+	.accounting = IOPT_PAGES_ACCOUNT_MM,
+};
+
+TEST_F(change_process, basic)
+{
+	pid_t parent = getpid();
+	pid_t child;
+	__u64 iova;
+	struct iommu_ioas_change_process cmd = {
+		.size = sizeof(cmd),
+	};
+
+	/* Expect failure if non-file maps exist */
+	test_ioctl_ioas_map(buffer, PAGE_SIZE, &iova);
+	EXPECT_ERRNO(EINVAL, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
+	test_ioctl_ioas_unmap(iova, PAGE_SIZE);
+
+	/* Change process works in the current process */
+	test_ioctl_ioas_map_file(mfd, 0, PAGE_SIZE, &iova);
+	ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
+
+	/* Change process works in another process */
+	child = fork();
+	if (!child) {
+		int nlock = PAGE_SIZE / 1024;
+
+		/* Parent accounts for locked memory before */
+		ASSERT_EQ(nlock, get_vm_pinned(parent));
+		if (variant->accounting == IOPT_PAGES_ACCOUNT_MM)
+			ASSERT_EQ(nlock, get_vm_locked(parent));
+		ASSERT_EQ(0, get_vm_pinned(getpid()));
+		ASSERT_EQ(0, get_vm_locked(getpid()));
+
+		ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_CHANGE_PROCESS, &cmd));
+
+		/* Child accounts for locked memory after */
+		ASSERT_EQ(0, get_vm_pinned(parent));
+		ASSERT_EQ(0, get_vm_locked(parent));
+		ASSERT_EQ(nlock, get_vm_pinned(getpid()));
+		if (variant->accounting == IOPT_PAGES_ACCOUNT_MM)
+			ASSERT_EQ(nlock, get_vm_locked(getpid()));
+
+		exit(0);
+	}
+	ASSERT_NE(-1, child);
+	ASSERT_EQ(child, waitpid(child, NULL, 0));
+}
+
 FIXTURE(iommufd_ioas)
 {
 	int fd;
···
 	for (i = 0; i != variant->mock_domains; i++) {
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_id,
				     &self->hwpt_id, &self->device_id);
+		test_cmd_dev_check_cache_all(self->device_id,
+					     IOMMU_TEST_DEV_CACHE_DEFAULT);
 		self->base_iova = MOCK_APERTURE_START;
 	}
···
 	EXPECT_ERRNO(EBUSY,
		     _test_ioctl_destroy(self->fd, parent_hwpt_id));
 
-	/* hwpt_invalidate only supports a user-managed hwpt (nested) */
+	/* hwpt_invalidate does not support a parent hwpt */
 	num_inv = 1;
-	test_err_hwpt_invalidate(ENOENT, parent_hwpt_id, inv_reqs,
+	test_err_hwpt_invalidate(EINVAL, parent_hwpt_id, inv_reqs,
				 IOMMU_HWPT_INVALIDATE_DATA_SELFTEST,
				 sizeof(*inv_reqs), &num_inv);
 	assert(!num_inv);
···
 {
 	unsigned int mock_domains;
 	bool hugepages;
+	bool file;
 };
 
 FIXTURE_SETUP(iommufd_mock_domain)
···
 
 	ASSERT_GE(ARRAY_SIZE(self->hwpt_ids), variant->mock_domains);
 
-	for (i = 0; i != variant->mock_domains; i++)
+	for (i = 0; i != variant->mock_domains; i++) {
 		test_cmd_mock_domain(self->ioas_id, &self->stdev_ids[i],
				     &self->hwpt_ids[i], &self->idev_ids[i]);
+		test_cmd_dev_check_cache_all(self->idev_ids[0],
+					     IOMMU_TEST_DEV_CACHE_DEFAULT);
+	}
 	self->hwpt_id = self->hwpt_ids[0];
 
 	self->mmap_flags = MAP_SHARED | MAP_ANONYMOUS;
···
 {
 	.mock_domains = 1,
 	.hugepages = false,
+	.file = false,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_mock_domain, two_domains)
 {
 	.mock_domains = 2,
 	.hugepages = false,
+	.file = false,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_hugepage)
 {
 	.mock_domains = 1,
 	.hugepages = true,
+	.file = false,
 };
 
 FIXTURE_VARIANT_ADD(iommufd_mock_domain, two_domains_hugepage)
 {
 	.mock_domains = 2,
 	.hugepages = true,
+	.file = false,
 };
+
+FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_file)
+{
+	.mock_domains = 1,
+	.hugepages = false,
+	.file = true,
+};
+
+FIXTURE_VARIANT_ADD(iommufd_mock_domain, one_domain_file_hugepage)
+{
+	.mock_domains = 1,
+	.hugepages = true,
+	.file = true,
+};
 
 /* Have the kernel check that the user pages made it to the iommu_domain */
 #define check_mock_iova(_ptr, _iova, _length)                                  \
···
 	}                                                                      \
 })
 
-TEST_F(iommufd_mock_domain, basic)
+static void
+test_basic_mmap(struct __test_metadata *_metadata,
+		struct _test_data_iommufd_mock_domain *self,
+		const struct _fixture_variant_iommufd_mock_domain *variant)
 {
 	size_t buf_size = self->mmap_buf_size;
 	uint8_t *buf;
···
 	/* EFAULT on first page */
 	ASSERT_EQ(0, munmap(buf, buf_size / 2));
 	test_err_ioctl_ioas_map(EFAULT, buf, buf_size, &iova);
+}
+
+static void
+test_basic_file(struct __test_metadata *_metadata,
+		struct _test_data_iommufd_mock_domain *self,
+		const struct _fixture_variant_iommufd_mock_domain *variant)
+{
+	size_t buf_size = self->mmap_buf_size;
+	uint8_t *buf;
+	__u64 iova;
+	int mfd_tmp;
+	int prot = PROT_READ | PROT_WRITE;
+
+	/* Simple one page map */
+	test_ioctl_ioas_map_file(mfd, 0, PAGE_SIZE, &iova);
+	check_mock_iova(mfd_buffer, iova, PAGE_SIZE);
+
+	buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd_tmp);
+	ASSERT_NE(MAP_FAILED, buf);
+
+	test_err_ioctl_ioas_map_file(EINVAL, mfd_tmp, 0, buf_size + 1, &iova);
+
+	ASSERT_EQ(0, ftruncate(mfd_tmp, 0));
+	test_err_ioctl_ioas_map_file(EINVAL, mfd_tmp, 0, buf_size, &iova);
+
+	close(mfd_tmp);
+}
+
+TEST_F(iommufd_mock_domain, basic)
+{
+	if (variant->file)
+		test_basic_file(_metadata, self, variant);
+	else
+		test_basic_mmap(_metadata, self, variant);
 }
 
 TEST_F(iommufd_mock_domain, ro_unshare)
···
 	unsigned int start;
 	unsigned int end;
 	uint8_t *buf;
+	int prot = PROT_READ | PROT_WRITE;
+	int mfd;
 
-	buf = mmap(0, buf_size, PROT_READ | PROT_WRITE, self->mmap_flags, -1,
-		   0);
+	if (variant->file)
+		buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd);
+	else
+		buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0);
 	ASSERT_NE(MAP_FAILED, buf);
 	check_refs(buf, buf_size, 0);
···
size_t length = end - start; 1747 1533 __u64 iova; 1748 1534 1749 - test_ioctl_ioas_map(buf + start, length, &iova); 1535 + if (variant->file) { 1536 + test_ioctl_ioas_map_file(mfd, start, length, 1537 + &iova); 1538 + } else { 1539 + test_ioctl_ioas_map(buf + start, length, &iova); 1540 + } 1750 1541 check_mock_iova(buf + start, iova, length); 1751 1542 check_refs(buf + start / PAGE_SIZE * PAGE_SIZE, 1752 1543 end / PAGE_SIZE * PAGE_SIZE - ··· 1763 1544 } 1764 1545 check_refs(buf, buf_size, 0); 1765 1546 ASSERT_EQ(0, munmap(buf, buf_size)); 1547 + if (variant->file) 1548 + close(mfd); 1766 1549 } 1767 1550 1768 1551 TEST_F(iommufd_mock_domain, all_aligns_copy) ··· 1775 1554 unsigned int start; 1776 1555 unsigned int end; 1777 1556 uint8_t *buf; 1557 + int prot = PROT_READ | PROT_WRITE; 1558 + int mfd; 1778 1559 1779 - buf = mmap(0, buf_size, PROT_READ | PROT_WRITE, self->mmap_flags, -1, 1780 - 0); 1560 + if (variant->file) 1561 + buf = memfd_mmap(buf_size, prot, MAP_SHARED, &mfd); 1562 + else 1563 + buf = mmap(0, buf_size, prot, self->mmap_flags, -1, 0); 1781 1564 ASSERT_NE(MAP_FAILED, buf); 1782 1565 check_refs(buf, buf_size, 0); 1783 1566 ··· 1800 1575 uint32_t mock_stdev_id; 1801 1576 __u64 iova; 1802 1577 1803 - test_ioctl_ioas_map(buf + start, length, &iova); 1578 + if (variant->file) { 1579 + test_ioctl_ioas_map_file(mfd, start, length, 1580 + &iova); 1581 + } else { 1582 + test_ioctl_ioas_map(buf + start, length, &iova); 1583 + } 1804 1584 1805 1585 /* Add and destroy a domain while the area exists */ 1806 1586 old_id = self->hwpt_ids[1]; ··· 1826 1596 } 1827 1597 check_refs(buf, buf_size, 0); 1828 1598 ASSERT_EQ(0, munmap(buf, buf_size)); 1599 + if (variant->file) 1600 + close(mfd); 1829 1601 } 1830 1602 1831 1603 TEST_F(iommufd_mock_domain, user_copy) 1832 1604 { 1605 + void *buf = variant->file ? 
mfd_buffer : buffer; 1833 1606 struct iommu_test_cmd access_cmd = { 1834 1607 .size = sizeof(access_cmd), 1835 1608 .op = IOMMU_TEST_OP_ACCESS_PAGES, 1836 1609 .access_pages = { .length = BUFFER_SIZE, 1837 - .uptr = (uintptr_t)buffer }, 1610 + .uptr = (uintptr_t)buf }, 1838 1611 }; 1839 1612 struct iommu_ioas_copy copy_cmd = { 1840 1613 .size = sizeof(copy_cmd), ··· 1856 1623 1857 1624 /* Pin the pages in an IOAS with no domains then copy to an IOAS with domains */ 1858 1625 test_ioctl_ioas_alloc(&ioas_id); 1859 - test_ioctl_ioas_map_id(ioas_id, buffer, BUFFER_SIZE, 1860 - &copy_cmd.src_iova); 1861 - 1626 + if (variant->file) { 1627 + test_ioctl_ioas_map_id_file(ioas_id, mfd, 0, BUFFER_SIZE, 1628 + &copy_cmd.src_iova); 1629 + } else { 1630 + test_ioctl_ioas_map_id(ioas_id, buf, BUFFER_SIZE, 1631 + &copy_cmd.src_iova); 1632 + } 1862 1633 test_cmd_create_access(ioas_id, &access_cmd.id, 1863 1634 MOCK_FLAGS_ACCESS_CREATE_NEEDS_PIN_PAGES); 1864 1635 ··· 1872 1635 &access_cmd)); 1873 1636 copy_cmd.src_ioas_id = ioas_id; 1874 1637 ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd)); 1875 - check_mock_iova(buffer, MOCK_APERTURE_START, BUFFER_SIZE); 1638 + check_mock_iova(buf, MOCK_APERTURE_START, BUFFER_SIZE); 1876 1639 1877 1640 /* Now replace the ioas with a new one */ 1878 1641 test_ioctl_ioas_alloc(&new_ioas_id); 1879 - test_ioctl_ioas_map_id(new_ioas_id, buffer, BUFFER_SIZE, 1880 - &copy_cmd.src_iova); 1642 + if (variant->file) { 1643 + test_ioctl_ioas_map_id_file(new_ioas_id, mfd, 0, BUFFER_SIZE, 1644 + &copy_cmd.src_iova); 1645 + } else { 1646 + test_ioctl_ioas_map_id(new_ioas_id, buf, BUFFER_SIZE, 1647 + &copy_cmd.src_iova); 1648 + } 1881 1649 test_cmd_access_replace_ioas(access_cmd.id, new_ioas_id); 1882 1650 1883 1651 /* Destroy the old ioas and cleanup copied mapping */ ··· 1896 1654 &access_cmd)); 1897 1655 copy_cmd.src_ioas_id = new_ioas_id; 1898 1656 ASSERT_EQ(0, ioctl(self->fd, IOMMU_IOAS_COPY, &copy_cmd)); 1899 - check_mock_iova(buffer, 
MOCK_APERTURE_START, BUFFER_SIZE); 1657 + check_mock_iova(buf, MOCK_APERTURE_START, BUFFER_SIZE); 1900 1658 1901 1659 test_cmd_destroy_access_pages( 1902 1660 access_cmd.id, access_cmd.access_pages.out_access_pages_id); ··· 2625 2383 ioctl(self->fd, VFIO_IOMMU_UNMAP_DMA, 2626 2384 &unmap_cmd)); 2627 2385 } 2386 + } 2387 + } 2388 + 2389 + FIXTURE(iommufd_viommu) 2390 + { 2391 + int fd; 2392 + uint32_t ioas_id; 2393 + uint32_t stdev_id; 2394 + uint32_t hwpt_id; 2395 + uint32_t nested_hwpt_id; 2396 + uint32_t device_id; 2397 + uint32_t viommu_id; 2398 + }; 2399 + 2400 + FIXTURE_VARIANT(iommufd_viommu) 2401 + { 2402 + unsigned int viommu; 2403 + }; 2404 + 2405 + FIXTURE_SETUP(iommufd_viommu) 2406 + { 2407 + self->fd = open("/dev/iommu", O_RDWR); 2408 + ASSERT_NE(-1, self->fd); 2409 + test_ioctl_ioas_alloc(&self->ioas_id); 2410 + test_ioctl_set_default_memory_limit(); 2411 + 2412 + if (variant->viommu) { 2413 + struct iommu_hwpt_selftest data = { 2414 + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, 2415 + }; 2416 + 2417 + test_cmd_mock_domain(self->ioas_id, &self->stdev_id, NULL, 2418 + &self->device_id); 2419 + 2420 + /* Allocate a nesting parent hwpt */ 2421 + test_cmd_hwpt_alloc(self->device_id, self->ioas_id, 2422 + IOMMU_HWPT_ALLOC_NEST_PARENT, 2423 + &self->hwpt_id); 2424 + 2425 + /* Allocate a vIOMMU taking refcount of the parent hwpt */ 2426 + test_cmd_viommu_alloc(self->device_id, self->hwpt_id, 2427 + IOMMU_VIOMMU_TYPE_SELFTEST, 2428 + &self->viommu_id); 2429 + 2430 + /* Allocate a regular nested hwpt */ 2431 + test_cmd_hwpt_alloc_nested(self->device_id, self->viommu_id, 0, 2432 + &self->nested_hwpt_id, 2433 + IOMMU_HWPT_DATA_SELFTEST, &data, 2434 + sizeof(data)); 2435 + } 2436 + } 2437 + 2438 + FIXTURE_TEARDOWN(iommufd_viommu) 2439 + { 2440 + teardown_iommufd(self->fd, _metadata); 2441 + } 2442 + 2443 + FIXTURE_VARIANT_ADD(iommufd_viommu, no_viommu) 2444 + { 2445 + .viommu = 0, 2446 + }; 2447 + 2448 + FIXTURE_VARIANT_ADD(iommufd_viommu, mock_viommu) 2449 + { 2450 + 
.viommu = 1, 2451 + }; 2452 + 2453 + TEST_F(iommufd_viommu, viommu_auto_destroy) 2454 + { 2455 + } 2456 + 2457 + TEST_F(iommufd_viommu, viommu_negative_tests) 2458 + { 2459 + uint32_t device_id = self->device_id; 2460 + uint32_t ioas_id = self->ioas_id; 2461 + uint32_t hwpt_id; 2462 + 2463 + if (self->device_id) { 2464 + /* Negative test -- invalid hwpt (hwpt_id=0) */ 2465 + test_err_viommu_alloc(ENOENT, device_id, 0, 2466 + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); 2467 + 2468 + /* Negative test -- not a nesting parent hwpt */ 2469 + test_cmd_hwpt_alloc(device_id, ioas_id, 0, &hwpt_id); 2470 + test_err_viommu_alloc(EINVAL, device_id, hwpt_id, 2471 + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); 2472 + test_ioctl_destroy(hwpt_id); 2473 + 2474 + /* Negative test -- unsupported viommu type */ 2475 + test_err_viommu_alloc(EOPNOTSUPP, device_id, self->hwpt_id, 2476 + 0xdead, NULL); 2477 + EXPECT_ERRNO(EBUSY, 2478 + _test_ioctl_destroy(self->fd, self->hwpt_id)); 2479 + EXPECT_ERRNO(EBUSY, 2480 + _test_ioctl_destroy(self->fd, self->viommu_id)); 2481 + } else { 2482 + test_err_viommu_alloc(ENOENT, self->device_id, self->hwpt_id, 2483 + IOMMU_VIOMMU_TYPE_SELFTEST, NULL); 2484 + } 2485 + } 2486 + 2487 + TEST_F(iommufd_viommu, viommu_alloc_nested_iopf) 2488 + { 2489 + struct iommu_hwpt_selftest data = { 2490 + .iotlb = IOMMU_TEST_IOTLB_DEFAULT, 2491 + }; 2492 + uint32_t viommu_id = self->viommu_id; 2493 + uint32_t dev_id = self->device_id; 2494 + uint32_t iopf_hwpt_id; 2495 + uint32_t fault_id; 2496 + uint32_t fault_fd; 2497 + 2498 + if (self->device_id) { 2499 + test_ioctl_fault_alloc(&fault_id, &fault_fd); 2500 + test_err_hwpt_alloc_iopf( 2501 + ENOENT, dev_id, viommu_id, UINT32_MAX, 2502 + IOMMU_HWPT_FAULT_ID_VALID, &iopf_hwpt_id, 2503 + IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); 2504 + test_err_hwpt_alloc_iopf( 2505 + EOPNOTSUPP, dev_id, viommu_id, fault_id, 2506 + IOMMU_HWPT_FAULT_ID_VALID | (1 << 31), &iopf_hwpt_id, 2507 + IOMMU_HWPT_DATA_SELFTEST, &data, sizeof(data)); 2508 + 
test_cmd_hwpt_alloc_iopf( 2509 + dev_id, viommu_id, fault_id, IOMMU_HWPT_FAULT_ID_VALID, 2510 + &iopf_hwpt_id, IOMMU_HWPT_DATA_SELFTEST, &data, 2511 + sizeof(data)); 2512 + 2513 + test_cmd_mock_domain_replace(self->stdev_id, iopf_hwpt_id); 2514 + EXPECT_ERRNO(EBUSY, 2515 + _test_ioctl_destroy(self->fd, iopf_hwpt_id)); 2516 + test_cmd_trigger_iopf(dev_id, fault_fd); 2517 + 2518 + test_cmd_mock_domain_replace(self->stdev_id, self->ioas_id); 2519 + test_ioctl_destroy(iopf_hwpt_id); 2520 + close(fault_fd); 2521 + test_ioctl_destroy(fault_id); 2522 + } 2523 + } 2524 + 2525 + TEST_F(iommufd_viommu, vdevice_alloc) 2526 + { 2527 + uint32_t viommu_id = self->viommu_id; 2528 + uint32_t dev_id = self->device_id; 2529 + uint32_t vdev_id = 0; 2530 + 2531 + if (dev_id) { 2532 + /* Set vdev_id to 0x99, unset it, and set to 0x88 */ 2533 + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); 2534 + test_err_vdevice_alloc(EEXIST, viommu_id, dev_id, 0x99, 2535 + &vdev_id); 2536 + test_ioctl_destroy(vdev_id); 2537 + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x88, &vdev_id); 2538 + test_ioctl_destroy(vdev_id); 2539 + } else { 2540 + test_err_vdevice_alloc(ENOENT, viommu_id, dev_id, 0x99, NULL); 2541 + } 2542 + } 2543 + 2544 + TEST_F(iommufd_viommu, vdevice_cache) 2545 + { 2546 + struct iommu_viommu_invalidate_selftest inv_reqs[2] = {}; 2547 + uint32_t viommu_id = self->viommu_id; 2548 + uint32_t dev_id = self->device_id; 2549 + uint32_t vdev_id = 0; 2550 + uint32_t num_inv; 2551 + 2552 + if (dev_id) { 2553 + test_cmd_vdevice_alloc(viommu_id, dev_id, 0x99, &vdev_id); 2554 + 2555 + test_cmd_dev_check_cache_all(dev_id, 2556 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2557 + 2558 + /* Check data_type by passing zero-length array */ 2559 + num_inv = 0; 2560 + test_cmd_viommu_invalidate(viommu_id, inv_reqs, 2561 + sizeof(*inv_reqs), &num_inv); 2562 + assert(!num_inv); 2563 + 2564 + /* Negative test: Invalid data_type */ 2565 + num_inv = 1; 2566 + test_err_viommu_invalidate(EINVAL, viommu_id, 
inv_reqs, 2567 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST_INVALID, 2568 + sizeof(*inv_reqs), &num_inv); 2569 + assert(!num_inv); 2570 + 2571 + /* Negative test: structure size sanity */ 2572 + num_inv = 1; 2573 + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, 2574 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2575 + sizeof(*inv_reqs) + 1, &num_inv); 2576 + assert(!num_inv); 2577 + 2578 + num_inv = 1; 2579 + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, 2580 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2581 + 1, &num_inv); 2582 + assert(!num_inv); 2583 + 2584 + /* Negative test: invalid flag is passed */ 2585 + num_inv = 1; 2586 + inv_reqs[0].flags = 0xffffffff; 2587 + inv_reqs[0].vdev_id = 0x99; 2588 + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, 2589 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2590 + sizeof(*inv_reqs), &num_inv); 2591 + assert(!num_inv); 2592 + 2593 + /* Negative test: invalid data_uptr when array is not empty */ 2594 + num_inv = 1; 2595 + inv_reqs[0].flags = 0; 2596 + inv_reqs[0].vdev_id = 0x99; 2597 + test_err_viommu_invalidate(EINVAL, viommu_id, NULL, 2598 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2599 + sizeof(*inv_reqs), &num_inv); 2600 + assert(!num_inv); 2601 + 2602 + /* Negative test: invalid entry_len when array is not empty */ 2603 + num_inv = 1; 2604 + inv_reqs[0].flags = 0; 2605 + inv_reqs[0].vdev_id = 0x99; 2606 + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, 2607 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2608 + 0, &num_inv); 2609 + assert(!num_inv); 2610 + 2611 + /* Negative test: invalid cache_id */ 2612 + num_inv = 1; 2613 + inv_reqs[0].flags = 0; 2614 + inv_reqs[0].vdev_id = 0x99; 2615 + inv_reqs[0].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; 2616 + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, 2617 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2618 + sizeof(*inv_reqs), &num_inv); 2619 + assert(!num_inv); 2620 + 2621 + /* Negative test: invalid vdev_id */ 2622 + num_inv = 1; 2623 + inv_reqs[0].flags = 0; 
2624 + inv_reqs[0].vdev_id = 0x9; 2625 + inv_reqs[0].cache_id = 0; 2626 + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, 2627 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2628 + sizeof(*inv_reqs), &num_inv); 2629 + assert(!num_inv); 2630 + 2631 + /* 2632 + * Invalidate the 1st cache entry but fail the 2nd request 2633 + * due to invalid flags configuration in the 2nd request. 2634 + */ 2635 + num_inv = 2; 2636 + inv_reqs[0].flags = 0; 2637 + inv_reqs[0].vdev_id = 0x99; 2638 + inv_reqs[0].cache_id = 0; 2639 + inv_reqs[1].flags = 0xffffffff; 2640 + inv_reqs[1].vdev_id = 0x99; 2641 + inv_reqs[1].cache_id = 1; 2642 + test_err_viommu_invalidate(EOPNOTSUPP, viommu_id, inv_reqs, 2643 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2644 + sizeof(*inv_reqs), &num_inv); 2645 + assert(num_inv == 1); 2646 + test_cmd_dev_check_cache(dev_id, 0, 0); 2647 + test_cmd_dev_check_cache(dev_id, 1, 2648 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2649 + test_cmd_dev_check_cache(dev_id, 2, 2650 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2651 + test_cmd_dev_check_cache(dev_id, 3, 2652 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2653 + 2654 + /* 2655 + * Invalidate the 1st cache entry but fail the 2nd request 2656 + * due to invalid cache_id configuration in the 2nd request. 
2657 + */ 2658 + num_inv = 2; 2659 + inv_reqs[0].flags = 0; 2660 + inv_reqs[0].vdev_id = 0x99; 2661 + inv_reqs[0].cache_id = 0; 2662 + inv_reqs[1].flags = 0; 2663 + inv_reqs[1].vdev_id = 0x99; 2664 + inv_reqs[1].cache_id = MOCK_DEV_CACHE_ID_MAX + 1; 2665 + test_err_viommu_invalidate(EINVAL, viommu_id, inv_reqs, 2666 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, 2667 + sizeof(*inv_reqs), &num_inv); 2668 + assert(num_inv == 1); 2669 + test_cmd_dev_check_cache(dev_id, 0, 0); 2670 + test_cmd_dev_check_cache(dev_id, 1, 2671 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2672 + test_cmd_dev_check_cache(dev_id, 2, 2673 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2674 + test_cmd_dev_check_cache(dev_id, 3, 2675 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2676 + 2677 + /* Invalidate the 2nd cache entry and verify */ 2678 + num_inv = 1; 2679 + inv_reqs[0].flags = 0; 2680 + inv_reqs[0].vdev_id = 0x99; 2681 + inv_reqs[0].cache_id = 1; 2682 + test_cmd_viommu_invalidate(viommu_id, inv_reqs, 2683 + sizeof(*inv_reqs), &num_inv); 2684 + assert(num_inv == 1); 2685 + test_cmd_dev_check_cache(dev_id, 0, 0); 2686 + test_cmd_dev_check_cache(dev_id, 1, 0); 2687 + test_cmd_dev_check_cache(dev_id, 2, 2688 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2689 + test_cmd_dev_check_cache(dev_id, 3, 2690 + IOMMU_TEST_DEV_CACHE_DEFAULT); 2691 + 2692 + /* Invalidate the 3rd and 4th cache entries and verify */ 2693 + num_inv = 2; 2694 + inv_reqs[0].flags = 0; 2695 + inv_reqs[0].vdev_id = 0x99; 2696 + inv_reqs[0].cache_id = 2; 2697 + inv_reqs[1].flags = 0; 2698 + inv_reqs[1].vdev_id = 0x99; 2699 + inv_reqs[1].cache_id = 3; 2700 + test_cmd_viommu_invalidate(viommu_id, inv_reqs, 2701 + sizeof(*inv_reqs), &num_inv); 2702 + assert(num_inv == 2); 2703 + test_cmd_dev_check_cache_all(dev_id, 0); 2704 + 2705 + /* Invalidate all cache entries for nested_dev_id[1] and verify */ 2706 + num_inv = 1; 2707 + inv_reqs[0].vdev_id = 0x99; 2708 + inv_reqs[0].flags = IOMMU_TEST_INVALIDATE_FLAG_ALL; 2709 + test_cmd_viommu_invalidate(viommu_id, inv_reqs, 2710 + 
sizeof(*inv_reqs), &num_inv); 2711 + assert(num_inv == 1); 2712 + test_cmd_dev_check_cache_all(dev_id, 0); 2713 + test_ioctl_destroy(vdev_id); 2628 2714 } 2629 2715 } 2630 2716
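The IOMMU_IOAS_CHANGE_PROCESS tests above verify accounting by comparing the VmPin:/VmLck: fields of /proc/&lt;pid&gt;/status before and after the ioctl. The diff calls a get_proc_status_value() helper defined earlier in the file; the standalone version below is a sketch of what such a parser looks like (its exact shape here is an assumption, not a copy of the selftest's). Values are reported in kB, and -1 signals a missing field (e.g. VmPin: on kernels without pinned-page reporting).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Sketch of a /proc/<pid>/status field reader, mirroring the helper the
 * selftest uses for VmPin:/VmLck: checks. Returns the value in kB, or -1
 * if the field is absent or the file cannot be read. */
static long get_proc_status_value(pid_t pid, const char *id)
{
	char path[64], line[128];
	long val = -1;
	FILE *f;

	snprintf(path, sizeof(path), "/proc/%d/status", (int)pid);
	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f)) {
		if (!strncmp(line, id, strlen(id))) {
			/* Field matched; parse the numeric value after it */
			sscanf(line + strlen(id), "%ld", &val);
			break;
		}
	}
	fclose(f);
	return val;
}
```

Because the child in the fork()ed branch reads both its own and the parent's status file, a pid parameter (rather than hardcoding /proc/self/status) is what makes the before/after comparison in the test possible.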
tools/testing/selftests/iommu/iommufd_fail_nth.c  +54
··· 47 47 48 48 buffer = mmap(0, BUFFER_SIZE, PROT_READ | PROT_WRITE, 49 49 MAP_SHARED | MAP_ANONYMOUS, -1, 0); 50 + 51 + mfd_buffer = memfd_mmap(BUFFER_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, 52 + &mfd); 50 53 } 51 54 52 55 /* ··· 334 331 return 0; 335 332 } 336 333 334 + /* iopt_area_fill_domains() and iopt_area_fill_domain() */ 335 + TEST_FAIL_NTH(basic_fail_nth, map_file_domain) 336 + { 337 + uint32_t ioas_id; 338 + __u32 stdev_id; 339 + __u32 hwpt_id; 340 + __u64 iova; 341 + 342 + self->fd = open("/dev/iommu", O_RDWR); 343 + if (self->fd == -1) 344 + return -1; 345 + 346 + if (_test_ioctl_ioas_alloc(self->fd, &ioas_id)) 347 + return -1; 348 + 349 + if (_test_ioctl_set_temp_memory_limit(self->fd, 32)) 350 + return -1; 351 + 352 + fail_nth_enable(); 353 + 354 + if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL)) 355 + return -1; 356 + 357 + if (_test_ioctl_ioas_map_file(self->fd, ioas_id, mfd, 0, 262144, &iova, 358 + IOMMU_IOAS_MAP_WRITEABLE | 359 + IOMMU_IOAS_MAP_READABLE)) 360 + return -1; 361 + 362 + if (_test_ioctl_destroy(self->fd, stdev_id)) 363 + return -1; 364 + 365 + if (_test_cmd_mock_domain(self->fd, ioas_id, &stdev_id, &hwpt_id, NULL)) 366 + return -1; 367 + return 0; 368 + } 369 + 337 370 TEST_FAIL_NTH(basic_fail_nth, map_two_domains) 338 371 { 339 372 uint32_t ioas_id; ··· 621 582 uint32_t stdev_id; 622 583 uint32_t idev_id; 623 584 uint32_t hwpt_id; 585 + uint32_t viommu_id; 586 + uint32_t vdev_id; 624 587 __u64 iova; 625 588 626 589 self->fd = open("/dev/iommu", O_RDWR); ··· 665 624 666 625 if (_test_cmd_mock_domain_replace(self->fd, stdev_id, hwpt_id, NULL)) 667 626 return -1; 627 + 628 + if (_test_cmd_hwpt_alloc(self->fd, idev_id, ioas_id, 0, 629 + IOMMU_HWPT_ALLOC_NEST_PARENT, &hwpt_id, 630 + IOMMU_HWPT_DATA_NONE, 0, 0)) 631 + return -1; 632 + 633 + if (_test_cmd_viommu_alloc(self->fd, idev_id, hwpt_id, 634 + IOMMU_VIOMMU_TYPE_SELFTEST, 0, &viommu_id)) 635 + return -1; 636 + 637 + if (_test_cmd_vdevice_alloc(self->fd, 
viommu_id, idev_id, 0, &vdev_id)) 638 + return -1; 639 + 668 640 return 0; 669 641 } 670 642
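The fail_nth additions above map a memfd into an IOAS via _test_ioctl_ioas_map_file(), using the mfd_buffer set up through memfd_mmap() from iommufd_utils.h. The following is a self-contained sketch of the same memfd_create() + ftruncate() + mmap() pattern that helper wraps; the close-on-error cleanup here is an addition for illustration, and the function name is hypothetical (the real helper is shown in the iommufd_utils.h hunk below and returns MAP_FAILED without closing the fd).

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an anonymous memfd of the given length and map it MAP_SHARED.
 * The resulting fd is what IOMMU_IOAS_MAP_FILE takes as backing memory;
 * the mapping lets the test populate and check the same pages via the CPU. */
static void *memfd_buffer_alloc(size_t length, int *mfd_p)
{
	int mfd = memfd_create("buffer", 0); /* needs glibc >= 2.27 */
	void *buf;

	if (mfd < 0)
		return MAP_FAILED;
	if (ftruncate(mfd, length)) {
		close(mfd);
		return MAP_FAILED;
	}
	buf = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_SHARED, mfd, 0);
	if (buf == MAP_FAILED)
		close(mfd);
	else
		*mfd_p = mfd;
	return buf;
}
```

Keeping the fd alive alongside the mapping matters for the exec()-restart flow described in the merge message: the fd, not the VMA, is what gets passed to the new process.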
tools/testing/selftests/iommu/iommufd_utils.h  +174
··· 22 22 #define BIT_MASK(nr) (1UL << ((nr) % __BITS_PER_LONG)) 23 23 #define BIT_WORD(nr) ((nr) / __BITS_PER_LONG) 24 24 25 + enum { 26 + IOPT_PAGES_ACCOUNT_NONE = 0, 27 + IOPT_PAGES_ACCOUNT_USER = 1, 28 + IOPT_PAGES_ACCOUNT_MM = 2, 29 + }; 30 + 25 31 #define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d)) 26 32 27 33 static inline void set_bit(unsigned int nr, unsigned long *addr) ··· 46 40 static void *buffer; 47 41 static unsigned long BUFFER_SIZE; 48 42 43 + static void *mfd_buffer; 44 + static int mfd; 45 + 49 46 static unsigned long PAGE_SIZE; 50 47 51 48 #define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER)) 52 49 #define offsetofend(TYPE, MEMBER) \ 53 50 (offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER)) 51 + 52 + static inline void *memfd_mmap(size_t length, int prot, int flags, int *mfd_p) 53 + { 54 + int mfd_flags = (flags & MAP_HUGETLB) ? MFD_HUGETLB : 0; 55 + int mfd = memfd_create("buffer", mfd_flags); 56 + 57 + if (mfd <= 0) 58 + return MAP_FAILED; 59 + if (ftruncate(mfd, length)) 60 + return MAP_FAILED; 61 + *mfd_p = mfd; 62 + return mmap(0, length, prot, flags, mfd, 0); 63 + } 54 64 55 65 /* 56 66 * Have the kernel check the refcount on pages. 
I don't know why a freshly ··· 256 234 test_cmd_hwpt_check_iotlb(hwpt_id, i, expected); \ 257 235 }) 258 236 237 + #define test_cmd_dev_check_cache(device_id, cache_id, expected) \ 238 + ({ \ 239 + struct iommu_test_cmd test_cmd = { \ 240 + .size = sizeof(test_cmd), \ 241 + .op = IOMMU_TEST_OP_DEV_CHECK_CACHE, \ 242 + .id = device_id, \ 243 + .check_dev_cache = { \ 244 + .id = cache_id, \ 245 + .cache = expected, \ 246 + }, \ 247 + }; \ 248 + ASSERT_EQ(0, ioctl(self->fd, \ 249 + _IOMMU_TEST_CMD( \ 250 + IOMMU_TEST_OP_DEV_CHECK_CACHE), \ 251 + &test_cmd)); \ 252 + }) 253 + 254 + #define test_cmd_dev_check_cache_all(device_id, expected) \ 255 + ({ \ 256 + int c; \ 257 + for (c = 0; c < MOCK_DEV_CACHE_NUM; c++) \ 258 + test_cmd_dev_check_cache(device_id, c, expected); \ 259 + }) 260 + 259 261 static int _test_cmd_hwpt_invalidate(int fd, __u32 hwpt_id, void *reqs, 260 262 uint32_t data_type, uint32_t lreq, 261 263 uint32_t *nreqs) ··· 309 263 EXPECT_ERRNO(_errno, _test_cmd_hwpt_invalidate( \ 310 264 self->fd, hwpt_id, reqs, \ 311 265 data_type, lreq, nreqs)); \ 266 + }) 267 + 268 + static int _test_cmd_viommu_invalidate(int fd, __u32 viommu_id, void *reqs, 269 + uint32_t data_type, uint32_t lreq, 270 + uint32_t *nreqs) 271 + { 272 + struct iommu_hwpt_invalidate cmd = { 273 + .size = sizeof(cmd), 274 + .hwpt_id = viommu_id, 275 + .data_type = data_type, 276 + .data_uptr = (uint64_t)reqs, 277 + .entry_len = lreq, 278 + .entry_num = *nreqs, 279 + }; 280 + int rc = ioctl(fd, IOMMU_HWPT_INVALIDATE, &cmd); 281 + *nreqs = cmd.entry_num; 282 + return rc; 283 + } 284 + 285 + #define test_cmd_viommu_invalidate(viommu, reqs, lreq, nreqs) \ 286 + ({ \ 287 + ASSERT_EQ(0, \ 288 + _test_cmd_viommu_invalidate(self->fd, viommu, reqs, \ 289 + IOMMU_VIOMMU_INVALIDATE_DATA_SELFTEST, \ 290 + lreq, nreqs)); \ 291 + }) 292 + #define test_err_viommu_invalidate(_errno, viommu_id, reqs, data_type, lreq, \ 293 + nreqs) \ 294 + ({ \ 295 + EXPECT_ERRNO(_errno, _test_cmd_viommu_invalidate( \ 296 + 
self->fd, viommu_id, reqs, \ 297 + data_type, lreq, nreqs)); \ 312 298 }) 313 299 314 300 static int _test_cmd_access_replace_ioas(int fd, __u32 access_id, ··· 667 589 EXPECT_ERRNO(_errno, _test_ioctl_ioas_unmap(self->fd, self->ioas_id, \ 668 590 iova, length, NULL)) 669 591 592 + static int _test_ioctl_ioas_map_file(int fd, unsigned int ioas_id, int mfd, 593 + size_t start, size_t length, __u64 *iova, 594 + unsigned int flags) 595 + { 596 + struct iommu_ioas_map_file cmd = { 597 + .size = sizeof(cmd), 598 + .flags = flags, 599 + .ioas_id = ioas_id, 600 + .fd = mfd, 601 + .start = start, 602 + .length = length, 603 + }; 604 + int ret; 605 + 606 + if (flags & IOMMU_IOAS_MAP_FIXED_IOVA) 607 + cmd.iova = *iova; 608 + 609 + ret = ioctl(fd, IOMMU_IOAS_MAP_FILE, &cmd); 610 + *iova = cmd.iova; 611 + return ret; 612 + } 613 + 614 + #define test_ioctl_ioas_map_file(mfd, start, length, iova_p) \ 615 + ASSERT_EQ(0, \ 616 + _test_ioctl_ioas_map_file( \ 617 + self->fd, self->ioas_id, mfd, start, length, iova_p, \ 618 + IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_READABLE)) 619 + 620 + #define test_err_ioctl_ioas_map_file(_errno, mfd, start, length, iova_p) \ 621 + EXPECT_ERRNO( \ 622 + _errno, \ 623 + _test_ioctl_ioas_map_file( \ 624 + self->fd, self->ioas_id, mfd, start, length, iova_p, \ 625 + IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_READABLE)) 626 + 627 + #define test_ioctl_ioas_map_id_file(ioas_id, mfd, start, length, iova_p) \ 628 + ASSERT_EQ(0, \ 629 + _test_ioctl_ioas_map_file( \ 630 + self->fd, ioas_id, mfd, start, length, iova_p, \ 631 + IOMMU_IOAS_MAP_WRITEABLE | IOMMU_IOAS_MAP_READABLE)) 632 + 670 633 static int _test_ioctl_set_temp_memory_limit(int fd, unsigned int limit) 671 634 { 672 635 struct iommu_test_cmd memlimit_cmd = { ··· 881 762 882 763 #define test_cmd_trigger_iopf(device_id, fault_fd) \ 883 764 ASSERT_EQ(0, _test_cmd_trigger_iopf(self->fd, device_id, fault_fd)) 765 + 766 + static int _test_cmd_viommu_alloc(int fd, __u32 device_id, __u32 hwpt_id, 767 + 
__u32 type, __u32 flags, __u32 *viommu_id) 768 + { 769 + struct iommu_viommu_alloc cmd = { 770 + .size = sizeof(cmd), 771 + .flags = flags, 772 + .type = type, 773 + .dev_id = device_id, 774 + .hwpt_id = hwpt_id, 775 + }; 776 + int ret; 777 + 778 + ret = ioctl(fd, IOMMU_VIOMMU_ALLOC, &cmd); 779 + if (ret) 780 + return ret; 781 + if (viommu_id) 782 + *viommu_id = cmd.out_viommu_id; 783 + return 0; 784 + } 785 + 786 + #define test_cmd_viommu_alloc(device_id, hwpt_id, type, viommu_id) \ 787 + ASSERT_EQ(0, _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ 788 + type, 0, viommu_id)) 789 + #define test_err_viommu_alloc(_errno, device_id, hwpt_id, type, viommu_id) \ 790 + EXPECT_ERRNO(_errno, \ 791 + _test_cmd_viommu_alloc(self->fd, device_id, hwpt_id, \ 792 + type, 0, viommu_id)) 793 + 794 + static int _test_cmd_vdevice_alloc(int fd, __u32 viommu_id, __u32 idev_id, 795 + __u64 virt_id, __u32 *vdev_id) 796 + { 797 + struct iommu_vdevice_alloc cmd = { 798 + .size = sizeof(cmd), 799 + .dev_id = idev_id, 800 + .viommu_id = viommu_id, 801 + .virt_id = virt_id, 802 + }; 803 + int ret; 804 + 805 + ret = ioctl(fd, IOMMU_VDEVICE_ALLOC, &cmd); 806 + if (ret) 807 + return ret; 808 + if (vdev_id) 809 + *vdev_id = cmd.out_vdevice_id; 810 + return 0; 811 + } 812 + 813 + #define test_cmd_vdevice_alloc(viommu_id, idev_id, virt_id, vdev_id) \ 814 + ASSERT_EQ(0, _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ 815 + virt_id, vdev_id)) 816 + #define test_err_vdevice_alloc(_errno, viommu_id, idev_id, virt_id, vdev_id) \ 817 + EXPECT_ERRNO(_errno, \ 818 + _test_cmd_vdevice_alloc(self->fd, viommu_id, idev_id, \ 819 + virt_id, vdev_id))
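iommufd_utils.h defines sizeof_field()/offsetofend() near the top of the file, and every uAPI wrapper above fills cmd.size = sizeof(cmd). A small worked example of what offsetofend() computes, using a hypothetical struct (the layout below is illustrative, not an iommufd uAPI struct): it yields the first byte past a member, which is how size-extensible ioctl structs decide whether a caller's buffer covers a given field.

```c
#include <stddef.h>
#include <stdint.h>

/* Same helpers as iommufd_utils.h */
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
#define offsetofend(TYPE, MEMBER) \
	(offsetof(TYPE, MEMBER) + sizeof_field(TYPE, MEMBER))

/* Hypothetical size-extensible command struct for illustration */
struct sample_cmd {
	uint32_t size;	/* bytes 0..3: caller-reported struct size */
	uint32_t flags;	/* bytes 4..7 */
	uint64_t iova;	/* bytes 8..15: u64 member, 8-byte aligned */
};
```

A kernel handler can then accept an older userspace by checking `cmd->size >= offsetofend(struct sample_cmd, flags)` before touching `iova`, zero-filling any fields the caller's shorter struct did not supply.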