Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

KVM: Make hardware_enable_failed a local variable in the "enable all" path

Rework detecting hardware enabling errors to use a local variable in the
"enable all" path to track whether or not enabling was successful across
all CPUs. Using a global variable complicates paths that enable hardware
only on the current CPU, e.g. kvm_resume() and kvm_online_cpu().

Opportunistically add a WARN if hardware enabling fails during
kvm_resume(), KVM is all kinds of hosed if CPU0 fails to enable hardware.
The WARN is largely futile in the current code, as KVM BUG()s on spurious
faults on VMX instructions, e.g. attempting to run a vCPU on CPU if
hardware enabling fails will explode.

------------[ cut here ]------------
kernel BUG at arch/x86/kvm/x86.c:508!
invalid opcode: 0000 [#1] SMP
CPU: 3 PID: 1009 Comm: CPU 4/KVM Not tainted 6.1.0-rc1+ #11
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:kvm_spurious_fault+0xa/0x10
Call Trace:
vmx_vcpu_load_vmcs+0x192/0x230 [kvm_intel]
vmx_vcpu_load+0x16/0x60 [kvm_intel]
kvm_arch_vcpu_load+0x32/0x1f0
vcpu_load+0x2f/0x40
kvm_arch_vcpu_ioctl_run+0x19/0x9d0
kvm_vcpu_ioctl+0x271/0x660
__x64_sys_ioctl+0x80/0xb0
do_syscall_64+0x2b/0x50
entry_SYSCALL_64_after_hwframe+0x46/0xb0

But, the WARN may provide a breadcrumb to understand what went awry, and
someday KVM may fix one or both of those bugs, e.g. by finding a way to
eat spurious faults no matter the context (easier said than done due to
side effects of certain operations, e.g. Intel's VMCLEAR).

Signed-off-by: Isaku Yamahata <isaku.yamahata@intel.com>
[sean: rebase, WARN on failure in kvm_resume()]
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-Id: <20221130230934.1014142-48-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

authored by

Isaku Yamahata and committed by
Paolo Bonzini
e6fb7d6e 37d25881

+16 -19
+16 -19
virt/kvm/kvm_main.c
··· 104 104 105 105 static DEFINE_PER_CPU(bool, hardware_enabled); 106 106 static int kvm_usage_count; 107 - static atomic_t hardware_enable_failed; 108 107 109 108 static struct kmem_cache *kvm_vcpu_cache; 110 109 ··· 5093 5094 &kvm_chardev_ops, 5094 5095 }; 5095 5096 5096 - static void hardware_enable_nolock(void *junk) 5097 + static int __hardware_enable_nolock(void) 5097 5098 { 5098 5099 if (__this_cpu_read(hardware_enabled)) 5099 - return; 5100 + return 0; 5100 5101 5101 5102 if (kvm_arch_hardware_enable()) { 5102 - atomic_inc(&hardware_enable_failed); 5103 5103 pr_info("kvm: enabling virtualization on CPU%d failed\n", 5104 5104 raw_smp_processor_id()); 5105 - return; 5105 + return -EIO; 5106 5106 } 5107 5107 5108 5108 __this_cpu_write(hardware_enabled, true); 5109 + return 0; 5110 + } 5111 + 5112 + static void hardware_enable_nolock(void *failed) 5113 + { 5114 + if (__hardware_enable_nolock()) 5115 + atomic_inc(failed); 5109 5116 } 5110 5117 5111 5118 static int kvm_online_cpu(unsigned int cpu) ··· 5124 5119 * errors when scheduled to this CPU. 5125 5120 */ 5126 5121 mutex_lock(&kvm_lock); 5127 - if (kvm_usage_count) { 5128 - WARN_ON_ONCE(atomic_read(&hardware_enable_failed)); 5129 - 5130 - hardware_enable_nolock(NULL); 5131 - 5132 - if (atomic_read(&hardware_enable_failed)) { 5133 - atomic_set(&hardware_enable_failed, 0); 5134 - ret = -EIO; 5135 - } 5136 - } 5122 + if (kvm_usage_count) 5123 + ret = __hardware_enable_nolock(); 5137 5124 mutex_unlock(&kvm_lock); 5138 5125 return ret; 5139 5126 } ··· 5173 5176 5174 5177 static int hardware_enable_all(void) 5175 5178 { 5179 + atomic_t failed = ATOMIC_INIT(0); 5176 5180 int r = 0; 5177 5181 5178 5182 /* ··· 5189 5191 5190 5192 kvm_usage_count++; 5191 5193 if (kvm_usage_count == 1) { 5192 - atomic_set(&hardware_enable_failed, 0); 5193 - on_each_cpu(hardware_enable_nolock, NULL, 1); 5194 + on_each_cpu(hardware_enable_nolock, &failed, 1); 5194 5195 5195 - if (atomic_read(&hardware_enable_failed)) { 5196 + if (atomic_read(&failed)) { 5196 5197 hardware_disable_all_nolock(); 5197 5198 r = -EBUSY; 5198 5199 } ··· 5825 5828 lockdep_assert_irqs_disabled(); 5826 5829 5827 5830 if (kvm_usage_count) 5828 - hardware_enable_nolock(NULL); 5831 + WARN_ON_ONCE(__hardware_enable_nolock()); 5829 5832 } 5830 5833 5831 5834 static struct syscore_ops kvm_syscore_ops = {