Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

KVM: SVM: Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs

Add support for KVM_CAP_X86_BUS_LOCK_EXIT on SVM CPUs with Bus Lock
Threshold, which is close enough to VMX's Bus Lock Detection VM-Exit to
allow reusing KVM_CAP_X86_BUS_LOCK_EXIT.

The biggest difference between the two features is that Threshold is
fault-like, whereas Detection is trap-like. To allow the guest to make
forward progress, Threshold provides a per-VMCB counter which is
decremented every time a bus lock occurs, and a VM-Exit is triggered if
and only if the counter is '0'.

To provide Detection-like semantics, initialize the counter to '0', i.e.
exit on every bus lock, and when re-executing the guilty instruction, set
the counter to '1' to effectively step past the instruction.

Note, in the unlikely scenario that re-executing the instruction doesn't
trigger a bus lock, e.g. because the guest has changed memory types or
patched the guilty instruction, the bus lock counter will be left at '1',
i.e. the guest will be able to do a bus lock on a different instruction.
In a perfect world, KVM would ensure the counter is '0' if the guest has
made forward progress, e.g. if RIP has changed. But trying to close that
hole would incur non-trivial complexity, for marginal benefit; the intent
of KVM_CAP_X86_BUS_LOCK_EXIT is to allow userspace to rate-limit bus locks,
not to allow for precise detection of problematic guest code. And, it's
simply not feasible to fully close the hole, e.g. if an interrupt arrives
before the original instruction can re-execute, the guest could step past
a different bus lock.

Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Manali Shukla <manali.shukla@amd.com>
Link: https://lore.kernel.org/r/20250502050346.14274-5-manali.shukla@amd.com
[sean: fix typo in comment]
Signed-off-by: Sean Christopherson <seanjc@google.com>

Authored by Manali Shukla, committed by Sean Christopherson
89f9edf4 827547bc

+78 total

Documentation/virt/kvm/api.rst (+5)
···
 KVM_RUN_X86_BUS_LOCK in vcpu->run->flags, and conditionally sets the exit_reason
 to KVM_EXIT_X86_BUS_LOCK.

+Due to differences in the underlying hardware implementation, the vCPU's RIP at
+the time of exit diverges between Intel and AMD. On Intel hosts, RIP points at
+the next instruction, i.e. the exit is trap-like. On AMD hosts, RIP points at
+the offending instruction, i.e. the exit is fault-like.
+
 Note! Detected bus locks may be coincident with other exits to userspace, i.e.
 KVM_RUN_X86_BUS_LOCK should be checked regardless of the primary exit reason if
 userspace wants to take action on all detected bus locks.
arch/x86/kvm/svm/nested.c (+34)
···
 	vmcb02->control.iopm_base_pa = vmcb01->control.iopm_base_pa;
 	vmcb02->control.msrpm_base_pa = vmcb01->control.msrpm_base_pa;

+	/*
+	 * Stash vmcb02's counter if the guest hasn't moved past the guilty
+	 * instruction; otherwise, reset the counter to '0'.
+	 *
+	 * In order to detect if L2 has made forward progress or not, track the
+	 * RIP at which a bus lock has occurred on a per-vmcb12 basis.  If RIP
+	 * has changed, the guest has clearly made forward progress even though
+	 * bus_lock_counter remained '1', so reset bus_lock_counter to '0'.
+	 * E.g. in the scenario where a bus lock happened in L1 before VMRUN,
+	 * the bus lock firmly happened on an instruction in the past.  Even if
+	 * vmcb01's counter is still '1' (because the guilty instruction got
+	 * patched), the vCPU has clearly made forward progress and so KVM
+	 * should reset vmcb02's counter to '0'.
+	 *
+	 * If the RIP hasn't changed, stash the bus lock counter at nested VMRUN
+	 * to prevent the same guilty instruction from triggering a VM-Exit.
+	 * E.g. if userspace rate-limits the vCPU, then it's entirely possible
+	 * that L1's tick interrupt is pending by the time userspace re-runs the
+	 * vCPU.  If KVM unconditionally clears the counter on VMRUN, then when
+	 * L1 re-enters L2, the same instruction will trigger a VM-Exit and the
+	 * entire cycle starts over.
+	 */
+	if (vmcb02->save.rip && (svm->nested.ctl.bus_lock_rip == vmcb02->save.rip))
+		vmcb02->control.bus_lock_counter = 1;
+	else
+		vmcb02->control.bus_lock_counter = 0;
+
 	/* Done at vmrun: asid. */

 	/* Also overwritten later if necessary. */
···
 		vmcb_mark_dirty(vmcb01, VMCB_INTERCEPTS);
 	}

+	/*
+	 * Invalidate bus_lock_rip unless KVM is still waiting for the guest
+	 * to make forward progress before re-enabling bus lock detection.
+	 */
+	if (!vmcb02->control.bus_lock_counter)
+		svm->nested.ctl.bus_lock_rip = INVALID_GPA;

 	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
arch/x86/kvm/svm/svm.c (+38)
···
 		svm->vmcb->control.int_ctl |= V_GIF_ENABLE_MASK;
 	}

+	if (vcpu->kvm->arch.bus_lock_detection_enabled)
+		svm_set_intercept(svm, INTERCEPT_BUSLOCK);
+
 	if (sev_guest(vcpu->kvm))
 		sev_init_vmcb(svm);

···
 	return kvm_handle_invpcid(vcpu, type, gva);
 }

+static inline int complete_userspace_buslock(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	/*
+	 * If userspace has NOT changed RIP, then KVM's ABI is to let the guest
+	 * execute the bus-locking instruction.  Set the bus lock counter to '1'
+	 * to effectively step past the bus lock.
+	 */
+	if (kvm_is_linear_rip(vcpu, vcpu->arch.cui_linear_rip))
+		svm->vmcb->control.bus_lock_counter = 1;
+
+	return 1;
+}
+
+static int bus_lock_exit(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	vcpu->run->exit_reason = KVM_EXIT_X86_BUS_LOCK;
+	vcpu->run->flags |= KVM_RUN_X86_BUS_LOCK;
+
+	vcpu->arch.cui_linear_rip = kvm_get_linear_rip(vcpu);
+	vcpu->arch.complete_userspace_io = complete_userspace_buslock;
+
+	if (is_guest_mode(vcpu))
+		svm->nested.ctl.bus_lock_rip = vcpu->arch.cui_linear_rip;
+
+	return 0;
+}
+
 static int (*const svm_exit_handlers[])(struct kvm_vcpu *vcpu) = {
 	[SVM_EXIT_READ_CR0]			= cr_interception,
 	[SVM_EXIT_READ_CR3]			= cr_interception,
···
 	[SVM_EXIT_INVPCID]			= invpcid_interception,
 	[SVM_EXIT_IDLE_HLT]			= kvm_emulate_halt,
 	[SVM_EXIT_NPF]				= npf_interception,
+	[SVM_EXIT_BUS_LOCK]			= bus_lock_exit,
 	[SVM_EXIT_RSM]				= rsm_interception,
 	[SVM_EXIT_AVIC_INCOMPLETE_IPI]		= avic_incomplete_ipi_interception,
 	[SVM_EXIT_AVIC_UNACCELERATED_ACCESS]	= avic_unaccelerated_access_interception,
···
 		/* Nested VM can receive #VMEXIT instead of triggering #GP */
 		kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK);
 	}
+
+	if (cpu_feature_enabled(X86_FEATURE_BUS_LOCK_THRESHOLD))
+		kvm_caps.has_bus_lock_exit = true;

 	/* CPUID 0x80000008 */
 	if (boot_cpu_has(X86_FEATURE_LS_CFG_SSBD) ||
arch/x86/kvm/svm/svm.h (+1)
···
 	u64 nested_cr3;
 	u64 virt_ext;
 	u32 clean;
+	u64 bus_lock_rip;
 	union {
 #if IS_ENABLED(CONFIG_HYPERV) || IS_ENABLED(CONFIG_KVM_HYPERV)
 		struct hv_vmcb_enlightenments hv_enlightenments;