Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

iommu/vt-d: Clear Present bit before tearing down context entry

When tearing down a context entry, the current implementation zeros the
entire 128-bit entry using multiple 64-bit writes. This creates a window
where the hardware can fetch a "torn" entry — where some fields are
already zeroed while the 'Present' bit is still set — leading to
unpredictable behavior or spurious faults.

While x86 provides strong write ordering, the compiler may reorder writes
to the two 64-bit halves of the context entry. Even without compiler
reordering, the hardware fetch is not guaranteed to be atomic with
respect to multiple CPU writes.

Align with the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:

1. Clear only the 'Present' (P) bit of the context entry first to
signal the transition of ownership from hardware to software.
2. Use dma_wmb() to ensure the cleared bit is visible to the IOMMU.
3. Perform the required cache and context-cache invalidation to ensure
hardware no longer has cached references to the entry.
4. Fully zero out the entry only after the invalidation is complete.

Also, add a dma_wmb() to context_set_present() to ensure the entry
is fully initialized before the 'Present' bit becomes visible.

Fixes: ba39592764ed2 ("Intel IOMMU: Intel IOMMU driver")
Reported-by: Dmytro Maluka <dmaluka@chromium.org>
Closes: https://lore.kernel.org/all/aTG7gc7I5wExai3S@google.com/
Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com>
Reviewed-by: Dmytro Maluka <dmaluka@chromium.org>
Reviewed-by: Samiullah Khawaja <skhawaja@google.com>
Reviewed-by: Kevin Tian <kevin.tian@intel.com>
Link: https://lore.kernel.org/r/20260120061816.2132558-3-baolu.lu@linux.intel.com
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>

authored by

Lu Baolu and committed by
Joerg Roedel
c1e4f1dc 75ed0005

+27 -3
+3 -1
drivers/iommu/intel/iommu.c
··· 1240 1240 } 1241 1241 1242 1242 did = context_domain_id(context); 1243 - context_clear_entry(context); 1243 + context_clear_present(context); 1244 1244 __iommu_flush_cache(iommu, context, sizeof(*context)); 1245 1245 spin_unlock(&iommu->lock); 1246 1246 intel_context_flush_no_pasid(info, context, did); 1247 + context_clear_entry(context); 1248 + __iommu_flush_cache(iommu, context, sizeof(*context)); 1247 1249 } 1248 1250 1249 1251 int __domain_setup_first_level(struct intel_iommu *iommu, struct device *dev,
+20 -1
drivers/iommu/intel/iommu.h
··· 900 900 901 901 static inline void context_set_present(struct context_entry *context) 902 902 { 903 - context->lo |= 1; 903 + u64 val; 904 + 905 + dma_wmb(); 906 + val = READ_ONCE(context->lo) | 1; 907 + WRITE_ONCE(context->lo, val); 908 + } 909 + 910 + /* 911 + * Clear the Present (P) bit (bit 0) of a context table entry. This initiates 912 + * the transition of the entry's ownership from hardware to software. The 913 + * caller is responsible for fulfilling the invalidation handshake recommended 914 + * by the VT-d spec, Section 6.5.3.3 (Guidance to Software for Invalidations). 915 + */ 916 + static inline void context_clear_present(struct context_entry *context) 917 + { 918 + u64 val; 919 + 920 + val = READ_ONCE(context->lo) & GENMASK_ULL(63, 1); 921 + WRITE_ONCE(context->lo, val); 922 + dma_wmb(); 904 923 } 905 924 906 925 static inline void context_set_fault_enable(struct context_entry *context)
+4 -1
drivers/iommu/intel/pasid.c
··· 1024 1024 } 1025 1025 1026 1026 if (context_copied(iommu, bus, devfn)) { 1027 - context_clear_entry(context); 1027 + context_clear_present(context); 1028 1028 __iommu_flush_cache(iommu, context, sizeof(*context)); 1029 1029 1030 1030 /* ··· 1043 1043 qi_flush_pasid_cache(iommu, 0, QI_PC_GLOBAL, 0); 1044 1044 iommu->flush.flush_iotlb(iommu, 0, 0, 0, DMA_TLB_GLOBAL_FLUSH); 1045 1045 devtlb_invalidation_with_pasid(iommu, dev, IOMMU_NO_PASID); 1046 + 1047 + context_clear_entry(context); 1048 + __iommu_flush_cache(iommu, context, sizeof(*context)); 1046 1049 1047 1050 /* 1048 1051 * At this point, the device is supposed to finish reset at