
KVM: x86/mmu: Don't force atomic update if only the Accessed bit is volatile

Don't force SPTE modifications to be done atomically if the only volatile
bit in the SPTE is the Accessed bit. KVM and the primary MMU tolerate
stale aging state, and the probability of an Accessed bit A/D assist being
clobbered *and* affecting aging is likely far lower than the probability
of consuming stale information due to not flushing TLBs when aging.
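
To make the trade-off concrete, here is a minimal, self-contained C11 sketch
(illustrative names and bit positions, not the kernel's code) of the race that
a non-atomic update tolerates:

  #include <stdatomic.h>
  #include <stdint.h>

  #define ACCESSED_BIT (1ull << 8)  /* illustrative bit position */

  /* Hardware A/D assist, running concurrently with KVM: */
  static void cpu_assist(_Atomic uint64_t *sptep)
  {
          atomic_fetch_or(sptep, ACCESSED_BIT);  /* CPU marks the page accessed */
  }

  /* KVM's cheap, non-atomic update path: */
  static void kvm_nonatomic_update(_Atomic uint64_t *sptep, uint64_t new_spte)
  {
          /*
           * If cpu_assist() lands between KVM reading the old SPTE and this
           * store, the just-set Accessed bit is silently overwritten.  That
           * only skews page aging, which is already approximate: KVM doesn't
           * flush TLBs when aging, so stale Accessed state is expected.
           */
          atomic_store(sptep, new_spte);
  }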

Rename spte_has_volatile_bits() to spte_needs_atomic_update() to better
capture the nature of the helper.

Opportunistically do s/write/update on the TDP MMU wrapper, as it's not
simply the "write" that needs to be done atomically, it's the entire
update, i.e. the entire read-modify-write operation needs to be done
atomically so that KVM has an accurate view of the old SPTE.
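
For contrast, a sketch of the atomic flavor (again illustrative, not the
kernel's actual helper, which layers RCU dereferencing on top): an exchange
makes the read-modify-write indivisible and returns the value it actually
replaced, so a concurrently-set bit is observed rather than lost:

  #include <stdatomic.h>
  #include <stdint.h>

  static uint64_t kvm_atomic_update(_Atomic uint64_t *sptep, uint64_t new_spte)
  {
          /*
           * The exchange returns exactly the SPTE value that new_spte
           * replaced, so e.g. a Dirty bit set by the CPU in the race window
           * is seen by the caller and can be propagated instead of dropped.
           */
          return atomic_exchange(sptep, new_spte);
  }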

Leave kvm_tdp_mmu_write_spte_atomic() as is. While the name is imperfect,
it pairs with kvm_tdp_mmu_write_spte(), which in turn pairs with
kvm_tdp_mmu_read_spte(). Renaming all of those isn't obviously a net
positive, and would require significant churn.

Signed-off-by: James Houghton <jthoughton@google.com>
Link: https://lore.kernel.org/r/20250204004038.1680123-6-jthoughton@google.com
Co-developed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>

Authored by James Houghton, committed by Sean Christopherson
61d65f2d e29b7492

+28 -30
+2 -2
Documentation/virt/kvm/locking.rst
@@ -196,7 +196,7 @@
 The Dirty bit is lost in this case.
 
 In order to avoid this kind of issue, we always treat the spte as "volatile"
-if it can be updated out of mmu-lock [see spte_has_volatile_bits()]; it means
+if it can be updated out of mmu-lock [see spte_needs_atomic_update()]; it means
 the spte is always atomically updated in this case.
 
 3) flush tlbs due to spte updated
@@ -212,7 +212,7 @@
 
 Since the spte is "volatile" if it can be updated out of mmu-lock, we always
 atomically update the spte and the race caused by fast page fault can be avoided.
-See the comments in spte_has_volatile_bits() and mmu_spte_update().
+See the comments in spte_needs_atomic_update() and mmu_spte_update().
 
 Lockless Access Tracking:
 
+2 -2
arch/x86/kvm/mmu/mmu.c
@@ -501,7 +501,7 @@
 		return false;
 	}
 
-	if (!spte_has_volatile_bits(old_spte))
+	if (!spte_needs_atomic_update(old_spte))
 		__update_clear_spte_fast(sptep, new_spte);
 	else
 		old_spte = __update_clear_spte_slow(sptep, new_spte);
@@ -524,7 +524,7 @@
 	int level = sptep_to_sp(sptep)->role.level;
 
 	if (!is_shadow_present_pte(old_spte) ||
-	    !spte_has_volatile_bits(old_spte))
+	    !spte_needs_atomic_update(old_spte))
 		__update_clear_spte_fast(sptep, SHADOW_NONPRESENT_VALUE);
 	else
 		old_spte = __update_clear_spte_slow(sptep, SHADOW_NONPRESENT_VALUE);
+16 -11
arch/x86/kvm/mmu/spte.c
@@ -129,25 +129,30 @@
 }
 
 /*
- * Returns true if the SPTE has bits that may be set without holding mmu_lock.
- * The caller is responsible for checking if the SPTE is shadow-present, and
- * for determining whether or not the caller cares about non-leaf SPTEs.
+ * Returns true if the SPTE needs to be updated atomically due to having bits
+ * that may be changed without holding mmu_lock, and for which KVM must not
+ * lose information.  E.g. KVM must not drop Dirty bit information.  The caller
+ * is responsible for checking if the SPTE is shadow-present, and for
+ * determining whether or not the caller cares about non-leaf SPTEs.
 */
-bool spte_has_volatile_bits(u64 spte)
+bool spte_needs_atomic_update(u64 spte)
 {
+	/* SPTEs can be made Writable bit by KVM's fast page fault handler. */
 	if (!is_writable_pte(spte) && is_mmu_writable_spte(spte))
 		return true;
 
+	/* Access-tracked SPTEs can be restored by KVM's fast page fault handler. */
 	if (is_access_track_spte(spte))
 		return true;
 
-	if (spte_ad_enabled(spte)) {
-		if (!(spte & shadow_accessed_mask) ||
-		    (is_writable_pte(spte) && !(spte & shadow_dirty_mask)))
-			return true;
-	}
-
-	return false;
+	/*
+	 * Dirty and Accessed bits can be set by the CPU.  Ignore the Accessed
+	 * bit, as KVM tolerates false negatives/positives, e.g. KVM doesn't
+	 * invalidate TLBs when aging SPTEs, and so it's safe to clobber the
+	 * Accessed bit (and rare in practice).
+	 */
+	return spte_ad_enabled(spte) && is_writable_pte(spte) &&
+	       !(spte & shadow_dirty_mask);
 }
 
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+1 -1
arch/x86/kvm/mmu/spte.h
@@ -519,7 +519,7 @@
 	return gen;
 }
 
-bool spte_has_volatile_bits(u64 spte);
+bool spte_needs_atomic_update(u64 spte);
 
 bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 	       const struct kvm_memory_slot *slot,
+7 -14
arch/x86/kvm/mmu/tdp_iter.h
@@ -39,28 +39,21 @@
 }
 
 /*
- * SPTEs must be modified atomically if they are shadow-present, leaf
- * SPTEs, and have volatile bits, i.e. has bits that can be set outside
- * of mmu_lock. The Writable bit can be set by KVM's fast page fault
- * handler, and Accessed and Dirty bits can be set by the CPU.
- *
- * Note, non-leaf SPTEs do have Accessed bits and those bits are
- * technically volatile, but KVM doesn't consume the Accessed bit of
- * non-leaf SPTEs, i.e. KVM doesn't care if it clobbers the bit. This
- * logic needs to be reassessed if KVM were to use non-leaf Accessed
- * bits, e.g. to skip stepping down into child SPTEs when aging SPTEs.
+ * SPTEs must be modified atomically if they are shadow-present, leaf SPTEs,
+ * and have volatile bits (bits that can be set outside of mmu_lock) that
+ * must not be clobbered.
 */
-static inline bool kvm_tdp_mmu_spte_need_atomic_write(u64 old_spte, int level)
+static inline bool kvm_tdp_mmu_spte_need_atomic_update(u64 old_spte, int level)
 {
 	return is_shadow_present_pte(old_spte) &&
 	       is_last_spte(old_spte, level) &&
-	       spte_has_volatile_bits(old_spte);
+	       spte_needs_atomic_update(old_spte);
 }
 
 static inline u64 kvm_tdp_mmu_write_spte(tdp_ptep_t sptep, u64 old_spte,
 					 u64 new_spte, int level)
 {
-	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
+	if (kvm_tdp_mmu_spte_need_atomic_update(old_spte, level))
 		return kvm_tdp_mmu_write_spte_atomic(sptep, new_spte);
 
 	__kvm_tdp_mmu_write_spte(sptep, new_spte);
@@ -70,7 +63,7 @@
 static inline u64 tdp_mmu_clear_spte_bits(tdp_ptep_t sptep, u64 old_spte,
 					  u64 mask, int level)
 {
-	if (kvm_tdp_mmu_spte_need_atomic_write(old_spte, level))
+	if (kvm_tdp_mmu_spte_need_atomic_update(old_spte, level))
 		return tdp_mmu_clear_spte_bits_atomic(sptep, mask);
 
 	__kvm_tdp_mmu_write_spte(sptep, old_spte & ~mask);
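
Tying the pieces together, here is a standalone model of the dispatch these
wrappers implement after the patch. The helper names and bit layout are
invented for the sketch, not the kernel's; the real predicate also checks the
mmu-writable and access-tracking state shown in the spte.c diff above:

  #include <stdatomic.h>
  #include <stdbool.h>
  #include <stdint.h>

  /* Illustrative bit layout, not the real x86 SPTE encoding. */
  #define SPTE_WRITABLE (1ull << 1)
  #define SPTE_ACCESSED (1ull << 8)
  #define SPTE_DIRTY    (1ull << 9)

  /* Toy stand-in for spte_needs_atomic_update() post-patch. */
  static bool needs_atomic_update(uint64_t spte)
  {
          /* Only an unpropagated Dirty bit forces the atomic path here. */
          return (spte & SPTE_WRITABLE) && !(spte & SPTE_DIRTY);
  }

  static uint64_t write_spte(_Atomic uint64_t *sptep, uint64_t old_spte,
                             uint64_t new_spte)
  {
          if (needs_atomic_update(old_spte))
                  return atomic_exchange(sptep, new_spte);  /* full RMW */

          /* An Accessed-only-volatile SPTE now takes this cheap path. */
          atomic_store(sptep, new_spte);
          return old_spte;
  }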