KVM: selftests: Ensure all vCPUs hit -EFAULT during initial RO stage

During the initial mprotect(RO) stage of mmu_stress_test, keep vCPUs
spinning until all vCPUs have hit -EFAULT, i.e. until all vCPUs have tried
to write to a read-only page. If a vCPU manages to complete an entire
iteration of the loop without hitting a read-only page, *and* the vCPU
observes mprotect_ro_done before starting a second iteration, then the
vCPU will prematurely fall through to GUEST_SYNC(3) (on x86 and arm64) and
get out of sequence.
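
The fix is a two-sided rendezvous, sketched here in condensed form (the lines are lifted from the hunks at the end of this patch; the real guest loop walks the entire region rather than a single gpa):

        /* Host side, in vcpu_worker(), after KVM_RUN fails with -EFAULT. */
        atomic_inc(&nr_ro_faults);
        if (atomic_read(&nr_ro_faults) == nr_vcpus) {
                WRITE_ONCE(all_vcpus_hit_ro_fault, true);
                sync_global_to_guest(vm, all_vcpus_hit_ro_fault);
        }

        /* Guest side, in guest_code(): keep writing until the memory is RO
         * *and* every vCPU has observed a fault. */
        do {
                vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
        } while (!READ_ONCE(mprotect_ro_done) || !READ_ONCE(all_vcpus_hit_ro_fault));

The guest only needs one extra READ_ONCE() in its exit condition, and the host side piggybacks on the existing -EFAULT handling, so no additional rendezvous with the boss thread is needed.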

Replace the "do-while (!r)" loop around the associated _vcpu_run() with
a single invocation, as barring a KVM bug, the vCPU is guaranteed to hit
-EFAULT, and retrying on success is super confusing, hides KVM bugs, and
complicates this fix. The do-while loop was semi-unintentionally added
specifically to fudge around a KVM x86 bug, and said bug is unhittable
without modifying the test to force x86 down the !(x86||arm64) path.

On x86, if forced emulation is enabled, vcpu_arch_put_guest() may trigger
emulation of the store to memory. Due to a (very, very) longstanding bug in
KVM x86's emulator, emulated writes to guest memory that fail during
__kvm_write_guest_page() unconditionally return KVM_EXIT_MMIO. While that
is desirable in the !memslot case, it's wrong in this case as the failure
happens due to __copy_to_user() hitting a read-only page, not an emulated
MMIO region.

But as above, x86 only uses vcpu_arch_put_guest() if the __x86_64__ guards
are clobbered to force x86 down the common path, and of course the
unexpected MMIO is a KVM bug, i.e. *should* cause a test failure.
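
For illustration, this is how that hypothetical failure would surface at the
assertion (the code lines mirror the hunk below; the annotations describe the
forced-emulation case, which the test as written never reaches):

        r = _vcpu_run(vcpu);

        /*
         * Expected: the store faults on the now read-only userspace mapping,
         * so KVM_RUN returns -1 with errno == EFAULT.
         *
         * With the emulator bug (forced emulation + clobbered __x86_64__
         * guards), KVM_RUN instead "succeeds" (r == 0) and the vCPU exits
         * with exit_reason == KVM_EXIT_MMIO, so the assert fires.
         */
        TEST_ASSERT(r == -1 && errno == EFAULT,
                    "Expected EFAULT on write to RO memory, got r = %d, errno = %d", r, errno);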

Fixes: b6c304aec648 ("KVM: selftests: Verify KVM correctly handles mprotect(PROT_READ)")
Reported-by: Yan Zhao <yan.y.zhao@intel.com>
Closes: https://lore.kernel.org/all/20250208105318.16861-1-yan.y.zhao@intel.com
Debugged-by: Yan Zhao <yan.y.zhao@intel.com>
Reviewed-by: Yan Zhao <yan.y.zhao@intel.com>
Tested-by: Yan Zhao <yan.y.zhao@intel.com>
Link: https://lore.kernel.org/r/20250228230804.3845860-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

+13 -8
tools/testing/selftests/kvm/mmu_stress_test.c
···
 #include "ucall_common.h"
 
 static bool mprotect_ro_done;
+static bool all_vcpus_hit_ro_fault;
 
 static void guest_code(uint64_t start_gpa, uint64_t end_gpa, uint64_t stride)
 {
···
 
        /*
         * Write to the region while mprotect(PROT_READ) is underway.  Keep
-        * looping until the memory is guaranteed to be read-only, otherwise
-        * vCPUs may complete their writes and advance to the next stage
-        * prematurely.
+        * looping until the memory is guaranteed to be read-only and a fault
+        * has occurred, otherwise vCPUs may complete their writes and advance
+        * to the next stage prematurely.
         *
         * For architectures that support skipping the faulting instruction,
         * generate the store via inline assembly to ensure the exact length
···
 #else
                        vcpu_arch_put_guest(*((volatile uint64_t *)gpa), gpa);
 #endif
-       } while (!READ_ONCE(mprotect_ro_done));
+       } while (!READ_ONCE(mprotect_ro_done) || !READ_ONCE(all_vcpus_hit_ro_fault));
 
        /*
         * Only architectures that write the entire range can explicitly sync,
···
 
 static int nr_vcpus;
 static atomic_t rendezvous;
+static atomic_t nr_ro_faults;
 
 static void rendezvous_with_boss(void)
 {
···
         * be stuck on the faulting instruction for other architectures.  Go to
         * stage 3 without a rendezvous
         */
-       do {
-               r = _vcpu_run(vcpu);
-       } while (!r);
+       r = _vcpu_run(vcpu);
        TEST_ASSERT(r == -1 && errno == EFAULT,
                    "Expected EFAULT on write to RO memory, got r = %d, errno = %d", r, errno);
 
+       atomic_inc(&nr_ro_faults);
+       if (atomic_read(&nr_ro_faults) == nr_vcpus) {
+               WRITE_ONCE(all_vcpus_hit_ro_fault, true);
+               sync_global_to_guest(vm, all_vcpus_hit_ro_fault);
+       }
+
 #if defined(__x86_64__) || defined(__aarch64__)
        /*
···
        rendezvous_with_vcpus(&time_run2, "run 2");
 
        mprotect(mem, slot_size, PROT_READ);
-       usleep(10);
        mprotect_ro_done = true;
        sync_global_to_guest(vm, mprotect_ro_done);
 