KVM: x86: Load DR6 with guest value only before entering .vcpu_run() loop

Move the conditional loading of hardware DR6 with the guest's DR6 value
out of the core .vcpu_run() loop to fix a bug where KVM can load hardware
with a stale vcpu->arch.dr6.

When the guest accesses a DR and host userspace isn't debugging the guest,
KVM disables DR interception and loads the guest's values into hardware on
VM-Enter and saves them on VM-Exit. This allows the guest to access DRs
at will, e.g. so that a sequence of DR accesses to configure a breakpoint
only generates one VM-Exit.

For DR0-DR3, the logic/behavior is identical between VMX and SVM, and also
identical between KVM_DEBUGREG_BP_ENABLED (userspace debugging the guest)
and KVM_DEBUGREG_WONT_EXIT (guest using DRs), and so KVM handles loading
DR0-DR3 in common code, _outside_ of the core kvm_x86_ops.vcpu_run() loop.

But for DR6, the guest's value doesn't need to be loaded into hardware for
KVM_DEBUGREG_BP_ENABLED, and SVM provides a dedicated VMCB field whereas
VMX requires software to manually load the guest value, and so loading the
guest's value into DR6 is handled by {svm,vmx}_vcpu_run(), i.e. is done
_inside_ the core run loop.

Unfortunately, saving the guest values on VM-Exit is initiated by common
x86, again outside of the core run loop. If the guest modifies DR6 (in
hardware, when DR interception is disabled), and then the next VM-Exit is
a fastpath VM-Exit, KVM will reload hardware DR6 with vcpu->arch.dr6 and
clobber the guest's actual value.

The bug shows up primarily with nested VMX because KVM handles the VMX
preemption timer in the fastpath, and the window between hardware DR6
being modified (in guest context) and DR6 being read by guest software is
orders of magnitude larger in a nested setup. E.g. in non-nested, the
VMX preemption timer would need to fire precisely between #DB injection
and the #DB handler's read of DR6, whereas with a KVM-on-KVM setup, the
window where hardware DR6 is "dirty" extends all the way from L1 writing
DR6 to VMRESUME (in L1).

L1's view:
==========
<L1 disables DR interception>
CPU 0/KVM-7289 [023] d.... 2925.640961: kvm_entry: vcpu 0
A: L1 Writes DR6
CPU 0/KVM-7289 [023] d.... 2925.640963: <hack>: Set DRs, DR6 = 0xffff0ff1

B: CPU 0/KVM-7289 [023] d.... 2925.640967: kvm_exit: vcpu 0 reason EXTERNAL_INTERRUPT intr_info 0x800000ec

D: L1 reads DR6, arch.dr6 = 0
CPU 0/KVM-7289 [023] d.... 2925.640969: <hack>: Sync DRs, DR6 = 0xffff0ff0

CPU 0/KVM-7289 [023] d.... 2925.640976: kvm_entry: vcpu 0
L2 reads DR6, L1 disables DR interception
CPU 0/KVM-7289 [023] d.... 2925.640980: kvm_exit: vcpu 0 reason DR_ACCESS info1 0x0000000000000216
CPU 0/KVM-7289 [023] d.... 2925.640983: kvm_entry: vcpu 0

CPU 0/KVM-7289 [023] d.... 2925.640983: <hack>: Set DRs, DR6 = 0xffff0ff0

L2 detects failure
CPU 0/KVM-7289 [023] d.... 2925.640987: kvm_exit: vcpu 0 reason HLT
L1 reads DR6 (confirms failure)
CPU 0/KVM-7289 [023] d.... 2925.640990: <hack>: Sync DRs, DR6 = 0xffff0ff0

L0's view:
==========
L2 reads DR6, arch.dr6 = 0
CPU 23/KVM-5046 [001] d.... 3410.005610: kvm_exit: vcpu 23 reason DR_ACCESS info1 0x0000000000000216
CPU 23/KVM-5046 [001] ..... 3410.005610: kvm_nested_vmexit: vcpu 23 reason DR_ACCESS info1 0x0000000000000216

L2 => L1 nested VM-Exit
CPU 23/KVM-5046 [001] ..... 3410.005610: kvm_nested_vmexit_inject: reason: DR_ACCESS ext_inf1: 0x0000000000000216

CPU 23/KVM-5046 [001] d.... 3410.005610: kvm_entry: vcpu 23
CPU 23/KVM-5046 [001] d.... 3410.005611: kvm_exit: vcpu 23 reason VMREAD
CPU 23/KVM-5046 [001] d.... 3410.005611: kvm_entry: vcpu 23
CPU 23/KVM-5046 [001] d.... 3410.005612: kvm_exit: vcpu 23 reason VMREAD
CPU 23/KVM-5046 [001] d.... 3410.005612: kvm_entry: vcpu 23

L1 writes DR7, L0 disables DR interception
CPU 23/KVM-5046 [001] d.... 3410.005612: kvm_exit: vcpu 23 reason DR_ACCESS info1 0x0000000000000007
CPU 23/KVM-5046 [001] d.... 3410.005613: kvm_entry: vcpu 23

L0 writes DR6 = 0 (arch.dr6)
CPU 23/KVM-5046 [001] d.... 3410.005613: <hack>: Set DRs, DR6 = 0xffff0ff0

A: <L1 writes DR6 = 1, no interception, arch.dr6 is still '0'>

B: CPU 23/KVM-5046 [001] d.... 3410.005614: kvm_exit: vcpu 23 reason PREEMPTION_TIMER
CPU 23/KVM-5046 [001] d.... 3410.005614: kvm_entry: vcpu 23

C: L0 writes DR6 = 0 (arch.dr6)
CPU 23/KVM-5046 [001] d.... 3410.005614: <hack>: Set DRs, DR6 = 0xffff0ff0

L1 => L2 nested VM-Enter
CPU 23/KVM-5046 [001] d.... 3410.005616: kvm_exit: vcpu 23 reason VMRESUME

L0 reads DR6, arch.dr6 = 0

Reported-by: John Stultz <jstultz@google.com>
Closes: https://lkml.kernel.org/r/CANDhNCq5_F3HfFYABqFGCA1bPd_%2BxgNj-iDQhH4tDk%2Bwi8iZZg%40mail.gmail.com
Fixes: 375e28ffc0cf ("KVM: X86: Set host DR6 only on VMX and for KVM_DEBUGREG_WONT_EXIT")
Fixes: d67668e9dd76 ("KVM: x86, SVM: isolate vcpu->arch.dr6 from vmcb->save.dr6")
Cc: stable@vger.kernel.org
Cc: Jim Mattson <jmattson@google.com>
Tested-by: John Stultz <jstultz@google.com>
Link: https://lore.kernel.org/r/20250125011833.3644371-1-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h    |  1 +
 arch/x86/kvm/svm/svm.c             | 13 ++++++-------
 arch/x86/kvm/vmx/main.c            |  1 +
 arch/x86/kvm/vmx/vmx.c             | 10 ++++++----
 arch/x86/kvm/vmx/x86_ops.h         |  1 +
 arch/x86/kvm/x86.c                 |  3 +++
 7 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -48,6 +48,7 @@
 KVM_X86_OP(get_gdt)
 KVM_X86_OP(set_gdt)
 KVM_X86_OP(sync_dirty_debug_regs)
+KVM_X86_OP(set_dr6)
 KVM_X86_OP(set_dr7)
 KVM_X86_OP(cache_reg)
 KVM_X86_OP(get_rflags)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1696,6 +1696,7 @@
 	void (*get_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*set_gdt)(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 	void (*sync_dirty_debug_regs)(struct kvm_vcpu *vcpu);
+	void (*set_dr6)(struct kvm_vcpu *vcpu, unsigned long value);
 	void (*set_dr7)(struct kvm_vcpu *vcpu, unsigned long value);
 	void (*cache_reg)(struct kvm_vcpu *vcpu, enum kvm_reg reg);
 	unsigned long (*get_rflags)(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -1991,11 +1991,11 @@
 		svm->asid = sd->next_asid++;
 }
 
-static void svm_set_dr6(struct vcpu_svm *svm, unsigned long value)
+static void svm_set_dr6(struct kvm_vcpu *vcpu, unsigned long value)
 {
-	struct vmcb *vmcb = svm->vmcb;
+	struct vmcb *vmcb = to_svm(vcpu)->vmcb;
 
-	if (svm->vcpu.arch.guest_state_protected)
+	if (vcpu->arch.guest_state_protected)
 		return;
 
 	if (unlikely(value != vmcb->save.dr6)) {
@@ -4247,10 +4247,8 @@
 	 * Run with all-zero DR6 unless needed, so that we can get the exact cause
 	 * of a #DB.
 	 */
-	if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT))
-		svm_set_dr6(svm, vcpu->arch.dr6);
-	else
-		svm_set_dr6(svm, DR6_ACTIVE_LOW);
+	if (likely(!(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT)))
+		svm_set_dr6(vcpu, DR6_ACTIVE_LOW);
 
 	clgi();
 	kvm_load_guest_xsave_state(vcpu);
@@ -5041,6 +5043,7 @@
 	.set_idt = svm_set_idt,
 	.get_gdt = svm_get_gdt,
 	.set_gdt = svm_set_gdt,
+	.set_dr6 = svm_set_dr6,
 	.set_dr7 = svm_set_dr7,
 	.sync_dirty_debug_regs = svm_sync_dirty_debug_regs,
 	.cache_reg = svm_cache_reg,
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -61,6 +61,7 @@
 	.set_idt = vmx_set_idt,
 	.get_gdt = vmx_get_gdt,
 	.set_gdt = vmx_set_gdt,
+	.set_dr6 = vmx_set_dr6,
 	.set_dr7 = vmx_set_dr7,
 	.sync_dirty_debug_regs = vmx_sync_dirty_debug_regs,
 	.cache_reg = vmx_cache_reg,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5648,6 +5648,12 @@
 		set_debugreg(DR6_RESERVED, 6);
 }
 
+void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val)
+{
+	lockdep_assert_irqs_disabled();
+	set_debugreg(vcpu->arch.dr6, 6);
+}
+
 void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val)
 {
 	vmcs_writel(GUEST_DR7, val);
@@ -7422,10 +7416,6 @@
 		vmcs_writel(HOST_CR4, cr4);
 		vmx->loaded_vmcs->host_state.cr4 = cr4;
 	}
-
-	/* When KVM_DEBUGREG_WONT_EXIT, dr6 is accessible in guest. */
-	if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT))
-		set_debugreg(vcpu->arch.dr6, 6);
 
 	/* When single-stepping over STI and MOV SS, we must clear the
 	 * corresponding interruptibility bits in the guest state. Otherwise
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -73,6 +73,7 @@
 void vmx_set_idt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 void vmx_get_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
 void vmx_set_gdt(struct kvm_vcpu *vcpu, struct desc_ptr *dt);
+void vmx_set_dr6(struct kvm_vcpu *vcpu, unsigned long val);
 void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val);
 void vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu);
 void vmx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10961,6 +10961,9 @@
 		set_debugreg(vcpu->arch.eff_db[1], 1);
 		set_debugreg(vcpu->arch.eff_db[2], 2);
 		set_debugreg(vcpu->arch.eff_db[3], 3);
+		/* When KVM_DEBUGREG_WONT_EXIT, dr6 is accessible in guest. */
+		if (unlikely(vcpu->arch.switch_db_regs & KVM_DEBUGREG_WONT_EXIT))
+			kvm_x86_call(set_dr6)(vcpu, vcpu->arch.dr6);
 	} else if (unlikely(hw_breakpoint_active())) {
 		set_debugreg(0, 7);
 	}