KVM: SVM: Set RFLAGS.IF=1 in C code, to get VMRUN out of the STI shadow

Enable/disable local IRQs, i.e. set/clear RFLAGS.IF, in the common
svm_vcpu_enter_exit() just after/before guest_state_{enter,exit}_irqoff()
so that VMRUN is not executed in an STI shadow. AMD CPUs have a quirk
(some would say "bug"), where the STI shadow bleeds into the guest's
int_state field if a #VMEXIT occurs during injection of an event, i.e. if
the VMRUN doesn't complete before the subsequent #VMEXIT.
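
For illustration, the latched shadow is visible as the interrupt-shadow bit of
the VMCB control area's int_state field. A minimal sketch, assuming the
int_state layout and SVM_INTERRUPT_SHADOW_MASK from arch/x86/include/asm/svm.h
(the helper name is purely illustrative, not an existing KVM function):

  /* Illustrative only: after a #VMEXIT that interrupts event injection,
   * the host's STI shadow can be left latched in the guest's int_state. */
  static bool spurious_sti_shadow(struct vmcb *vmcb)
  {
          return vmcb->control.int_state & SVM_INTERRUPT_SHADOW_MASK;
  }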

The spurious "interrupts masked" state is relatively benign, as it only
occurs during event injection and is transient. Because KVM is already
injecting an event, the guest can't be in HLT, and if KVM is querying IRQ
blocking for injection, then KVM would need to force an immediate exit
anyways since injecting multiple events is impossible.

However, because KVM copies int_state verbatim from vmcb02 to vmcb12, the
spurious STI shadow is visible to L1 when running a nested VM, which can
trip sanity checks, e.g. in VMware's VMM.
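
The propagation in question boils down to a single assignment on the nested
#VMEXIT path; a simplified sketch, using the vmcb02/vmcb12 naming from the
nested SVM code (not the full nested #VMEXIT logic):

  /* Nested #VMEXIT: whatever shadow state L0 left in vmcb02, including the
   * quirk's spurious STI shadow, is handed to L1 verbatim. */
  vmcb12->control.int_state = vmcb02->control.int_state;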

Hoist the STI+CLI all the way to C code, as the aforementioned calls to
guest_state_{enter,exit}_irqoff() already inform lockdep that IRQs are
enabled/disabled, and taking a fault on VMRUN with RFLAGS.IF=1 is already
possible. I.e. if there's kernel code that is confused by running with
RFLAGS.IF=1, then it's already a problem. In practice, since GIF=0 also
blocks NMIs, the only change in exposure to non-KVM code (relative to
surrounding VMRUN with STI+CLI) is exception handling code, and except for
the kvm_rebooting=1 case, all exceptions in the core VM-Enter/VM-Exit path
are fatal.

Use the "raw" variants to enable/disable IRQs to avoid tracing in the
"no instrumentation" code; the guest state helpers also take care of
tracing IRQ state.
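
As a rough sketch of that distinction (a hand-written comparison, not code from
this patch):

  /* Traced variant: unsuitable for noinstr code, as it performs its own
   * lockdep/IRQ-tracing updates (e.g. trace_hardirqs_on()). */
  local_irq_enable();

  /* Raw variant: only toggles RFLAGS.IF; the lockdep/tracing bookkeeping is
   * already handled by guest_state_{enter,exit}_irqoff(). */
  raw_local_irq_enable();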

Opportunistically document why KVM needs to do STI in the first place.

Reported-by: Doug Covelli <doug.covelli@broadcom.com>
Closes: https://lore.kernel.org/all/CADH9ctBs1YPmE4aCfGPNBwA10cA8RuAk2gO7542DjMZgs4uzJQ@mail.gmail.com
Fixes: f14eec0a3203 ("KVM: SVM: move more vmentry code to assembly")
Cc: stable@vger.kernel.org
Reviewed-by: Jim Mattson <jmattson@google.com>
Link: https://lore.kernel.org/r/20250224165442.2338294-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

+15 -9 (2 files changed)

arch/x86/kvm/svm/svm.c (+14 -0)
···
 
 	guest_state_enter_irqoff();
 
+	/*
+	 * Set RFLAGS.IF prior to VMRUN, as the host's RFLAGS.IF at the time of
+	 * VMRUN controls whether or not physical IRQs are masked (KVM always
+	 * runs with V_INTR_MASKING_MASK). Toggle RFLAGS.IF here to avoid the
+	 * temptation to do STI+VMRUN+CLI, as AMD CPUs bleed the STI shadow
+	 * into guest state if delivery of an event during VMRUN triggers a
+	 * #VMEXIT, and the guest_state transitions already tell lockdep that
+	 * IRQs are being enabled/disabled. Note! GIF=0 for the entirety of
+	 * this path, so IRQs aren't actually unmasked while running host code.
+	 */
+	raw_local_irq_enable();
+
 	amd_clear_divider();
 
 	if (sev_es_guest(vcpu->kvm))
···
 				      sev_es_host_save_area(sd));
 	else
 		__svm_vcpu_run(svm, spec_ctrl_intercepted);
+
+	raw_local_irq_disable();
 
 	guest_state_exit_irqoff();
 }

arch/x86/kvm/svm/vmenter.S (+1 -9)
···
 	mov VCPU_RDI(%_ASM_DI), %_ASM_DI
 
 	/* Enter guest mode */
-	sti
-
 3:	vmrun %_ASM_AX
 4:
-	cli
-
 	/* Pop @svm to RAX while it's the only available register. */
 	pop %_ASM_AX
 
···
 	mov KVM_VMCB_pa(%rax), %rax
 
 	/* Enter guest mode */
-	sti
-
 1:	vmrun %rax
-
-2:	cli
-
+2:
 	/* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */
 	FILL_RETURN_BUFFER %rax, RSB_CLEAR_LOOPS, X86_FEATURE_RSB_VMEXIT