Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/vmscape: Add conditional IBPB mitigation

VMSCAPE is a vulnerability that exploits insufficient branch predictor
isolation between a guest and a userspace hypervisor (like QEMU). Existing
mitigations already protect kernel/KVM from a malicious guest. Userspace
can additionally be protected by flushing the branch predictors after a
VMexit.

Since it is userspace that consumes the poisoned branch predictors,
conditionally issue an IBPB after a VMexit and before returning to
userspace. Workloads that frequently switch between hypervisor and
userspace will incur the most overhead from the new IBPB.

This new IBPB is not integrated with the existing IBPB sites. For
instance, a task can use the existing speculation control prctl() to
get an IBPB at context switch time. With this implementation, the
IBPB is doubled up: one at context switch and another before running
userspace.

The intent is to integrate and optimize these cases post-embargo.

[ dhansen: elaborate on suboptimal IBPB solution ]

Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Sean Christopherson <seanjc@google.com>

Authored by Pawan Gupta, committed by Dave Hansen (2f8f1734, a508cec6)

27 insertions(+)

arch/x86/include/asm/cpufeatures.h (+1)
@@ -494,6 +494,7 @@
 #define X86_FEATURE_TSA_SQ_NO		(21*32+11) /* AMD CPU not vulnerable to TSA-SQ */
 #define X86_FEATURE_TSA_L1_NO		(21*32+12) /* AMD CPU not vulnerable to TSA-L1 */
 #define X86_FEATURE_CLEAR_CPU_BUF_VM	(21*32+13) /* Clear CPU buffers using VERW before VMRUN */
+#define X86_FEATURE_IBPB_EXIT_TO_USER	(21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
 
 /*
  * BUG word(s)
arch/x86/include/asm/entry-common.h (+7)
@@ -93,6 +93,13 @@
 	 * 8 (ia32) bits.
 	 */
 	choose_random_kstack_offset(rdtsc());
+
+	/* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
+	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
+	    this_cpu_read(x86_ibpb_exit_to_user)) {
+		indirect_branch_prediction_barrier();
+		this_cpu_write(x86_ibpb_exit_to_user, false);
+	}
 }
 #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
 
arch/x86/include/asm/nospec-branch.h (+2)
@@ -530,6 +530,8 @@
 		: "memory");
 }
 
+DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+
 static inline void indirect_branch_prediction_barrier(void)
 {
 	asm_inline volatile(ALTERNATIVE("", "call write_ibpb", X86_FEATURE_IBPB)
arch/x86/kernel/cpu/bugs.c (+8)
@@ -105,6 +105,14 @@
 DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
 EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current);
 
+/*
+ * Set when the CPU has run a potentially malicious guest. An IBPB will
+ * be needed before running userspace. That IBPB will flush the branch
+ * predictor content.
+ */
+DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+
 u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
 
 static u64 __ro_after_init x86_arch_cap_msr;
arch/x86/kvm/x86.c (+9)
@@ -11008,6 +11008,15 @@
 	wrmsrq(MSR_IA32_XFD_ERR, 0);
 
 	/*
+	 * Mark this CPU as needing a branch predictor flush before running
+	 * userspace. Must be done before enabling preemption to ensure it gets
+	 * set for the CPU that actually ran the guest, and not the CPU that it
+	 * may migrate to.
+	 */
+	if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+		this_cpu_write(x86_ibpb_exit_to_user, true);
+
+	/*
 	 * Consume any pending interrupts, including the possible source of
 	 * VM-Exit on SVM and any ticks that occur between VM-Exit and now.
 	 * An instruction is required after local_irq_enable() to fully unblock