Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

KVM: selftests: Test prefault memory during concurrent memslot removal

Expand the prefault memory selftest to add a regression test for a KVM bug
where KVM's retry logic would result in (breakable) deadlock due to the
memslot deletion waiting on prefaulting to release SRCU, and prefaulting
waiting on the memslot to fully disappear (KVM uses a two-step process to
delete memslots, and KVM x86 retries page faults if a to-be-deleted, a.k.a.
INVALID, memslot is encountered).

To exercise concurrent memslot removal, spawn a second thread to initiate
memslot removal at roughly the same time as prefaulting. Test memslot
removal for all testcases, i.e. don't limit concurrent removal to only the
success case. There are essentially three prefault scenarios (so far)
that are of interest:

1. Success
2. ENOENT due to no memslot
3. EAGAIN due to INVALID memslot

For all intents and purposes, #1 and #2 are mutually exclusive, or rather,
easier to test via separate testcases since writing to non-existent memory
is trivial. But for #3, making it mutually exclusive with #1 _or_ #2 is
actually more complex than testing memslot removal for all scenarios. The
only requirement to let memslot removal coexist with other scenarios is a
way to guarantee a stable result, e.g. that the "no memslot" test observes
ENOENT, not EAGAIN, for the final checks.

So, rather than make memslot removal mutually exclusive with the ENOENT
scenario, simply restore the memslot and retry prefaulting. For the "no
memslot" case, KVM_PRE_FAULT_MEMORY should be idempotent, i.e. should
always fail with ENOENT regardless of how many times userspace attempts
prefaulting.
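
As an illustration, the restore-and-retry policy described above can be
written as a small pure decision function. This is only a sketch: the enum,
its values, and next_step() are hypothetical names that do not exist in the
selftest, which implements the same decisions inline in its loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names, for illustration only; not part of the selftest. */
enum prefault_step {
	STEP_RETRY,		/* call the ioctl again on the same range */
	STEP_RESTORE_AND_RETRY,	/* recreate the memslot, then call again */
	STEP_RESTORE_AND_STOP,	/* recreate the memslot, result is final */
	STEP_STOP,		/* result is final */
};

/*
 * Decide what to do after one KVM_PRE_FAULT_MEMORY attempt.
 * ret/err are the ioctl return value and the saved errno, bytes_left is
 * the remaining range size, slot_restored says whether the memslot has
 * already been recreated after the racing deletion.
 */
static enum prefault_step next_step(int ret, int err, uint64_t bytes_left,
				    bool slot_restored)
{
	/* A failure without an errno would be a bug in the caller. */
	assert(ret >= 0 || err != 0);

	/* An interrupted call says nothing about the memslot: just retry. */
	if (ret < 0 && err == EINTR)
		return STEP_RETRY;

	/*
	 * The first real result may have raced with the deletion (e.g.
	 * EAGAIN), so restore the slot. Retry only if bytes remain, since
	 * prefaulting with size=0 would fail with EINVAL.
	 */
	if (!slot_restored)
		return bytes_left ? STEP_RESTORE_AND_RETRY :
				    STEP_RESTORE_AND_STOP;

	/* With the slot restored, completion or any error is final. */
	if (!bytes_left || ret < 0)
		return STEP_STOP;

	/* Partial progress: keep going. */
	return STEP_RETRY;
}
```

With this shape, an EAGAIN seen before the slot is restored leads to a
retry, while an ENOENT seen after the restore is treated as the stable,
final result.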

Pass in both the base GPA and the offset (instead of the "full" GPA) so
that the worker can recreate the memslot.
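
A minimal sketch of why the split matters, assuming a 4 KiB page size and a
made-up base GPA: for the "no memslot" testcase the prefault target lies past
the end of the slot, so the slot's base cannot be recovered from the "full"
GPA alone. The struct and helpers below are hypothetical and exist only for
this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; the 4 KiB page size is an assumption. */
#define SKETCH_PAGE_SIZE	UINT64_C(0x1000)
#define SKETCH_SZ_2M		UINT64_C(0x200000)
#define SKETCH_TEST_SIZE	(SKETCH_SZ_2M + SKETCH_PAGE_SIZE)

/* Hypothetical helper type, not part of the selftest. */
struct prefault_target {
	uint64_t base_gpa;	/* start of the memslot; what gets re-added */
	uint64_t offset;	/* start of prefaulting, relative to the base */
};

/* The GPA handed to the prefault ioctl. */
static uint64_t target_gpa(const struct prefault_target *t)
{
	assert(t->offset <= UINT64_MAX - t->base_gpa);	/* no wraparound */
	return t->base_gpa + t->offset;
}

/* True if prefaulting starts inside the memslot. */
static bool target_in_slot(const struct prefault_target *t)
{
	return t->offset < SKETCH_TEST_SIZE;
}
```

For example, with base_gpa = 0x10000000 and offset = SKETCH_TEST_SIZE, the
target GPA is 0x10201000, which is outside the slot, yet the worker still
knows to recreate the slot at 0x10000000.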

Signed-off-by: Yan Zhao <yan.y.zhao@intel.com>
Co-developed-by: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20250924174255.2141847-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

Authored by Yan Zhao and committed by Sean Christopherson
1bcc3f87 6b36119b

+114 -17
tools/testing/selftests/kvm/pre_fault_memory_test.c
···
 #include <test_util.h>
 #include <kvm_util.h>
 #include <processor.h>
+#include <pthread.h>

 /* Arbitrarily chosen values */
 #define TEST_SIZE		(SZ_2M + PAGE_SIZE)
···
 	GUEST_DONE();
 }

-static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 gpa, u64 size,
-			     u64 left)
+struct slot_worker_data {
+	struct kvm_vm *vm;
+	u64 gpa;
+	uint32_t flags;
+	bool worker_ready;
+	bool prefault_ready;
+	bool recreate_slot;
+};
+
+static void *delete_slot_worker(void *__data)
+{
+	struct slot_worker_data *data = __data;
+	struct kvm_vm *vm = data->vm;
+
+	WRITE_ONCE(data->worker_ready, true);
+
+	while (!READ_ONCE(data->prefault_ready))
+		cpu_relax();
+
+	vm_mem_region_delete(vm, TEST_SLOT);
+
+	while (!READ_ONCE(data->recreate_slot))
+		cpu_relax();
+
+	vm_userspace_mem_region_add(vm, VM_MEM_SRC_ANONYMOUS, data->gpa,
+				    TEST_SLOT, TEST_NPAGES, data->flags);
+
+	return NULL;
+}
+
+static void pre_fault_memory(struct kvm_vcpu *vcpu, u64 base_gpa, u64 offset,
+			     u64 size, u64 expected_left, bool private)
 {
 	struct kvm_pre_fault_memory range = {
-		.gpa = gpa,
+		.gpa = base_gpa + offset,
 		.size = size,
 		.flags = 0,
 	};
-	u64 prev;
+	struct slot_worker_data data = {
+		.vm = vcpu->vm,
+		.gpa = base_gpa,
+		.flags = private ? KVM_MEM_GUEST_MEMFD : 0,
+	};
+	bool slot_recreated = false;
+	pthread_t slot_worker;
 	int ret, save_errno;
+	u64 prev;

-	do {
+	/*
+	 * Concurrently delete (and recreate) the slot to test KVM's handling
+	 * of a racing memslot deletion with prefaulting.
+	 */
+	pthread_create(&slot_worker, NULL, delete_slot_worker, &data);
+
+	while (!READ_ONCE(data.worker_ready))
+		cpu_relax();
+
+	WRITE_ONCE(data.prefault_ready, true);
+
+	for (;;) {
 		prev = range.size;
 		ret = __vcpu_ioctl(vcpu, KVM_PRE_FAULT_MEMORY, &range);
 		save_errno = errno;
···
 			    "%sexpecting range.size to change on %s",
 			    ret < 0 ? "not " : "",
 			    ret < 0 ? "failure" : "success");
-	} while (ret >= 0 ? range.size : save_errno == EINTR);

-	TEST_ASSERT(range.size == left,
-		    "Completed with %lld bytes left, expected %" PRId64,
-		    range.size, left);
+		/*
+		 * Immediately retry prefaulting if KVM was interrupted by an
+		 * unrelated signal/event.
+		 */
+		if (ret < 0 && save_errno == EINTR)
+			continue;

-	if (left == 0)
-		__TEST_ASSERT_VM_VCPU_IOCTL(!ret, "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+		/*
+		 * Tell the worker to recreate the slot in order to complete
+		 * prefaulting (if prefault didn't already succeed before the
+		 * slot was deleted) and/or to prepare for the next testcase.
+		 * Wait for the worker to exit so that the next invocation of
+		 * prefaulting is guaranteed to complete (assuming no KVM bugs).
+		 */
+		if (!slot_recreated) {
+			WRITE_ONCE(data.recreate_slot, true);
+			pthread_join(slot_worker, NULL);
+			slot_recreated = true;
+
+			/*
+			 * Retry prefaulting to get a stable result, i.e. to
+			 * avoid seeing random EAGAIN failures.  Don't retry if
+			 * prefaulting already succeeded, as KVM disallows
+			 * prefaulting with size=0, i.e. blindly retrying would
+			 * result in test failures due to EINVAL.  KVM should
+			 * always return success if all bytes are prefaulted,
+			 * i.e. there is no need to guard against EAGAIN being
+			 * returned.
+			 */
+			if (range.size)
+				continue;
+		}
+
+		/*
+		 * All done if there are no remaining bytes to prefault, or if
+		 * prefaulting failed (EINTR was handled above, and EAGAIN due
+		 * to prefaulting a memslot that's being actively deleted should
+		 * be impossible since the memslot has already been recreated).
+		 */
+		if (!range.size || ret < 0)
+			break;
+	}
+
+	TEST_ASSERT(range.size == expected_left,
+		    "Completed with %llu bytes left, expected %lu",
+		    range.size, expected_left);
+
+	/*
+	 * Assert success if prefaulting the entire range should succeed, i.e.
+	 * complete with no bytes remaining.  Otherwise prefaulting should have
+	 * failed due to ENOENT (due to RET_PF_EMULATE for emulated MMIO when
+	 * no memslot exists).
+	 */
+	if (!expected_left)
+		TEST_ASSERT_VM_VCPU_IOCTL(!ret, KVM_PRE_FAULT_MEMORY, ret, vcpu->vm);
 	else
-		/* No memory slot causes RET_PF_EMULATE. it results in -ENOENT. */
-		__TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
-					    "KVM_PRE_FAULT_MEMORY", ret, vcpu->vm);
+		TEST_ASSERT_VM_VCPU_IOCTL(ret && save_errno == ENOENT,
+					  KVM_PRE_FAULT_MEMORY, ret, vcpu->vm);
 }

 static void __test_pre_fault_memory(unsigned long vm_type, bool private)
···

 	if (private)
 		vm_mem_set_private(vm, guest_test_phys_mem, TEST_SIZE);
-	pre_fault_memory(vcpu, guest_test_phys_mem, SZ_2M, 0);
-	pre_fault_memory(vcpu, guest_test_phys_mem + SZ_2M, PAGE_SIZE * 2, PAGE_SIZE);
-	pre_fault_memory(vcpu, guest_test_phys_mem + TEST_SIZE, PAGE_SIZE, PAGE_SIZE);
+
+	pre_fault_memory(vcpu, guest_test_phys_mem, 0, SZ_2M, 0, private);
+	pre_fault_memory(vcpu, guest_test_phys_mem, SZ_2M, PAGE_SIZE * 2, PAGE_SIZE, private);
+	pre_fault_memory(vcpu, guest_test_phys_mem, TEST_SIZE, PAGE_SIZE, PAGE_SIZE, private);

 	vcpu_args_set(vcpu, 1, guest_test_virt_mem);
 	vcpu_run(vcpu);
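
The ready/go/recreate handshake between the prefaulting thread and the slot
worker can be tried outside of KVM. The following is a rough userspace
sketch, not the selftest's code: it swaps the selftest's READ_ONCE(),
WRITE_ONCE() and cpu_relax() for C11 atomics and sched_yield(), and it uses
a boolean flag as a stand-in for the memslot. All names in it are
hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the selftest's worker data. */
struct handshake {
	atomic_bool worker_ready;	/* worker -> main: thread is running */
	atomic_bool main_ready;		/* main -> worker: start the race */
	atomic_bool restore;		/* main -> worker: put resource back */
	atomic_bool present;		/* stand-in for "memslot exists" */
	atomic_int deletions;		/* how often the worker deleted it */
};

static void *worker(void *arg)
{
	struct handshake *hs = arg;

	atomic_store(&hs->worker_ready, true);

	/* Wait until the main thread is about to start its operation. */
	while (!atomic_load(&hs->main_ready))
		sched_yield();

	/* "Delete" the resource while the main thread is working. */
	atomic_store(&hs->present, false);
	atomic_fetch_add(&hs->deletions, 1);

	/* Wait for permission, then "recreate" the resource. */
	while (!atomic_load(&hs->restore))
		sched_yield();
	atomic_store(&hs->present, true);

	return NULL;
}

/* Returns true if the resource was deleted once and is back at the end. */
static bool run_handshake(void)
{
	struct handshake hs;
	pthread_t tid;
	int rc;

	atomic_init(&hs.worker_ready, false);
	atomic_init(&hs.main_ready, false);
	atomic_init(&hs.restore, false);
	atomic_init(&hs.present, true);
	atomic_init(&hs.deletions, 0);

	if (pthread_create(&tid, NULL, worker, &hs))
		return false;

	/* Don't start until the worker is running, to tighten the race. */
	while (!atomic_load(&hs.worker_ready))
		sched_yield();
	atomic_store(&hs.main_ready, true);

	/* The racing operation (prefaulting, in the selftest) goes here. */

	/* Ask for the resource back and wait for the worker to exit. */
	atomic_store(&hs.restore, true);
	rc = pthread_join(tid, NULL);
	assert(rc == 0);
	(void)rc;

	return atomic_load(&hs.present) && atomic_load(&hs.deletions) == 1;
}
```

Joining the worker before looking at the result is what makes the final
state deterministic: once pthread_join() returns, the deletion and the
recreation have both happened.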