Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

KVM: x86: WARN if user-return MSR notifier is registered on exit

When freeing the per-CPU user-return MSRs structures, WARN if any CPU has
a registered notifier to help detect and/or debug potential use-after-free
issues. The lifecycle of the notifiers is rather convoluted, and has
several non-obvious paths where notifiers are unregistered, i.e. isn't
exactly the most robust code possible.

The notifiers are registered on-demand in KVM, on the first WRMSR to a
tracked MSR. _Usually_ the notifier is unregistered whenever the
CPU returns to userspace. But because any given CPU isn't guaranteed to
return to userspace, e.g. the CPU could be offlined before doing so, KVM
also "drops", a.k.a. unregisters, the notifiers when virtualization is
disabled on the CPU.

Further complicating the unregister path is the fact that the calls to
disable virtualization come from common KVM, and the per-CPU calls are
guarded by a per-CPU flag (to harden _that_ code against bugs, e.g. due to
mishandling reboot). Reboot/shutdown in particular is problematic, as KVM
disables virtualization via IPI function call, i.e. from IRQ context,
instead of using the cpuhp framework, which runs in task context. I.e. on
reboot/shutdown, drop_user_return_notifiers() is called asynchronously.

Forced reboot/shutdown is the most problematic scenario, as userspace tasks
are not frozen before kvm_shutdown() is invoked, i.e. KVM could be actively
manipulating the user-return MSR lists and/or notifiers when the IPI
arrives. To a certain extent, all bets are off when userspace forces a
reboot/shutdown, but KVM should at least avoid a use-after-free, e.g. to
avoid crashing the kernel when trying to reboot.

Link: https://patch.msgid.link/20251030191528.3380553-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>

+25 -8
arch/x86/kvm/x86.c
···
 		vcpu->arch.apf.gfns[i] = ~0;
 }
 
+static int kvm_init_user_return_msrs(void)
+{
+	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
+	if (!user_return_msrs) {
+		pr_err("failed to allocate percpu user_return_msrs\n");
+		return -ENOMEM;
+	}
+	kvm_nr_uret_msrs = 0;
+	return 0;
+}
+
+static void kvm_free_user_return_msrs(void)
+{
+	int cpu;
+
+	for_each_possible_cpu(cpu)
+		WARN_ON_ONCE(per_cpu_ptr(user_return_msrs, cpu)->registered);
+
+	free_percpu(user_return_msrs);
+}
+
 static void kvm_on_user_return(struct user_return_notifier *urn)
 {
 	unsigned slot;
···
 		return -ENOMEM;
 	}
 
-	user_return_msrs = alloc_percpu(struct kvm_user_return_msrs);
-	if (!user_return_msrs) {
-		pr_err("failed to allocate percpu kvm_user_return_msrs\n");
-		r = -ENOMEM;
+	r = kvm_init_user_return_msrs();
+	if (r)
 		goto out_free_x86_emulator_cache;
-	}
-	kvm_nr_uret_msrs = 0;
 
 	r = kvm_mmu_vendor_module_init();
 	if (r)
···
 out_mmu_exit:
 	kvm_mmu_vendor_module_exit();
 out_free_percpu:
-	free_percpu(user_return_msrs);
+	kvm_free_user_return_msrs();
 out_free_x86_emulator_cache:
 	kmem_cache_destroy(x86_emulator_cache);
 	return r;
···
 #endif
 	kvm_x86_call(hardware_unsetup)();
 	kvm_mmu_vendor_module_exit();
-	free_percpu(user_return_msrs);
+	kvm_free_user_return_msrs();
 	kmem_cache_destroy(x86_emulator_cache);
 #ifdef CONFIG_KVM_XEN
 	static_key_deferred_flush(&kvm_xen_enabled);