Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'kvm-x86-asyncpf_abi-6.9' of https://github.com/kvm-x86/linux into HEAD

Guest-side KVM async #PF ABI cleanup for 6.9

Delete kvm_vcpu_pv_apf_data.enabled to fix a goof in KVM's async #PF ABI where
the enabled field pushes the size of "struct kvm_vcpu_pv_apf_data" from 64 to
68 bytes, i.e. beyond a single cache line.
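The size arithmetic can be checked with a small standalone sketch. It assumes fixed-width userspace types in place of the kernel's __u32/__u8, and the struct names apf_data_old and apf_data_new are made up for illustration; only the field layout comes from the documentation hunk in this merge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as documented before the cleanup: 'enabled' trails the padding. */
struct apf_data_old {
    uint32_t flags;
    uint32_t token;
    uint8_t  pad[56];
    uint32_t enabled;
};

/* Layout after the cleanup: flags + token + pad fill exactly 64 bytes. */
struct apf_data_new {
    uint32_t flags;
    uint32_t token;
    uint8_t  pad[56];
};

/* 4 + 4 + 56 + 4 = 68 bytes, i.e. past a 64-byte area. */
static_assert(sizeof(struct apf_data_old) == 68, "old layout is 68 bytes");
/* 4 + 4 + 56 = 64 bytes. */
static_assert(sizeof(struct apf_data_new) == 64, "new layout is 64 bytes");
/* 'token' starts at byte 4, so it occupies bytes 4-7. */
static_assert(offsetof(struct apf_data_new, token) == 4, "token at offset 4");
```

The static assertions fail at compile time if the layout ever drifts from these sizes.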

The enabled field is purely a guest-side flag that Linux-as-a-guest uses to
track whether or not the guest has enabled async #PF support. The actual flag
that is passed to the host, i.e. to KVM proper, is a single bit in a synthetic
MSR, MSR_KVM_ASYNC_PF_EN, i.e. is in a location completely unrelated to the
shared kvm_vcpu_pv_apf_data structure.
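As a rough illustration of why the enable state lives outside the shared structure, the sketch below composes an MSR value from a 64-byte aligned physical address (bits 63-6) and the enable bit (bit 0), as described in msr.rst. The macro and function names are hypothetical, and the sketch deliberately ignores the MSR's other control bits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions come from msr.rst. */
#define APF_ENABLE_BIT (UINT64_C(1) << 0)   /* bit 0: enable async #PF */
#define APF_ADDR_MASK  (~UINT64_C(0x3f))    /* bits 63-6: aligned address */

/* Build the value a guest would write to MSR_KVM_ASYNC_PF_EN. */
static inline uint64_t apf_msr_value(uint64_t pa, bool enable)
{
    uint64_t val = pa & APF_ADDR_MASK;

    if (enable)
        val |= APF_ENABLE_BIT;
    return val;
}

/* The host-visible "enabled" state is just bit 0 of the MSR value. */
static inline bool apf_msr_enabled(uint64_t msr)
{
    return (msr & APF_ENABLE_BIT) != 0;
}
```

For example, apf_msr_value(0x1000, true) yields 0x1001, and nothing in the 64-byte shared area is involved in that decision.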

Simply drop the field and use a dedicated guest-side per-CPU variable to
fix the ABI, as opposed to fixing the documentation to match reality. KVM has
never consumed kvm_vcpu_pv_apf_data.enabled, so the odds of the ABI change
breaking anything are extremely low.
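The guest-side change can be modeled in userspace as follows. This is a sketch under stated assumptions, not the kernel code: plain arrays indexed by a CPU number stand in for per-CPU variables, a plain array stands in for the MSR, and every model_* name and NR_CPUS_MODEL is hypothetical. It only shows that the flag is tracked separately from the shared 64-byte structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL 4   /* hypothetical CPU count for the model */

/* Shared area after the cleanup: no 'enabled' member. */
struct apf_data {
    uint32_t flags;
    uint32_t token;
    uint8_t  pad[56];
};

static struct apf_data apf_reason[NR_CPUS_MODEL];  /* models the per-CPU shared area */
static bool async_pf_enabled[NR_CPUS_MODEL];       /* models the new guest-only flag */
static uint64_t msr_async_pf_en[NR_CPUS_MODEL];    /* stand-in for the real MSR */

static void model_enable_apf(int cpu, uint64_t pa)
{
    msr_async_pf_en[cpu] = pa | 1;   /* bit 0 tells the host APF is on */
    async_pf_enabled[cpu] = true;    /* guest-side bookkeeping only */
}

static void model_disable_apf(int cpu)
{
    if (!async_pf_enabled[cpu])
        return;
    msr_async_pf_en[cpu] = 0;
    async_pf_enabled[cpu] = false;
}

/* Mirrors the read-and-clear pattern in the kvm.c hunk of this merge. */
static uint32_t model_read_and_reset_flags(int cpu)
{
    uint32_t flags = 0;

    if (async_pf_enabled[cpu]) {
        flags = apf_reason[cpu].flags;
        apf_reason[cpu].flags = 0;
    }
    return flags;
}
```

In this model the host never needs to read async_pf_enabled, which matches the observation that KVM never consumed the old field.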

+15 -16
+9 -10
Documentation/virt/kvm/x86/msr.rst
···
 Asynchronous page fault (APF) control MSR.

 Bits 63-6 hold 64-byte aligned physical address of a 64 byte memory area
-which must be in guest RAM and must be zeroed. This memory is expected
-to hold a copy of the following structure::
+which must be in guest RAM. This memory is expected to hold the
+following structure::

   struct kvm_vcpu_pv_apf_data {
     /* Used for 'page not present' events delivered via #PF */
···
     __u32 token;

     __u8 pad[56];
-    __u32 enabled;
   };

 Bits 5-4 of the MSR are reserved and should be zero. Bit 0 is set to 1
···
 as regular page fault, guest must reset 'flags' to '0' before it does
 something that can generate normal page fault.

-Bytes 5-7 of 64 byte memory location ('token') will be written to by the
+Bytes 4-7 of 64 byte memory location ('token') will be written to by the
 hypervisor at the time of APF 'page ready' event injection. The content
-of these bytes is a token which was previously delivered as 'page not
-present' event. The event indicates the page in now available. Guest is
-supposed to write '0' to 'token' when it is done handling 'page ready'
-event and to write 1' to MSR_KVM_ASYNC_PF_ACK after clearing the location;
-writing to the MSR forces KVM to re-scan its queue and deliver the next
-pending notification.
+of these bytes is a token which was previously delivered in CR2 as
+'page not present' event. The event indicates the page is now available.
+Guest is supposed to write '0' to 'token' when it is done handling
+'page ready' event and to write '1' to MSR_KVM_ASYNC_PF_ACK after
+clearing the location; writing to the MSR forces KVM to re-scan its
+queue and deliver the next pending notification.

 Note, MSR_KVM_ASYNC_PF_INT MSR specifying the interrupt vector for 'page
 ready' APF delivery needs to be written to before enabling APF mechanism
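The 'page ready' ordering described in the documentation text (consume the token, clear it, then acknowledge) can be sketched as a userspace model. All names here are hypothetical stand-ins: model_wake() replaces whatever wakes the waiting task, and model_write_ack_msr() replaces the write of '1' to MSR_KVM_ASYNC_PF_ACK.

```c
#include <assert.h>
#include <stdint.h>

struct apf_data {
    uint32_t flags;
    uint32_t token;     /* bytes 4-7, written by the hypervisor */
    uint8_t  pad[56];
};

static uint32_t last_woken_token;  /* records which token was handled */
static unsigned int ack_writes;    /* counts modeled ACK MSR writes */

static void model_wake(uint32_t token)
{
    last_woken_token = token;
}

static void model_write_ack_msr(uint64_t val)
{
    if (val == 1)
        ack_writes++;
}

/* Handle one 'page ready' event: read token, wake, clear, then ack. */
static void model_page_ready(struct apf_data *d)
{
    uint32_t token = d->token;

    model_wake(token);
    d->token = 0;            /* clear the location first ... */
    model_write_ack_msr(1);  /* ... then ask for the next notification */
}
```

Clearing 'token' before the acknowledgement matters in this model because the acknowledgement is what allows the next token to be written into the same location.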
-1
arch/x86/include/uapi/asm/kvm_para.h
···
     __u32 token;

     __u8 pad[56];
-    __u32 enabled;
 };

 #define KVM_PV_EOI_BIT 0
+6 -5
arch/x86/kernel/kvm.c
···

 early_param("no-steal-acc", parse_no_stealacc);

+static DEFINE_PER_CPU_READ_MOSTLY(bool, async_pf_enabled);
 static DEFINE_PER_CPU_DECRYPTED(struct kvm_vcpu_pv_apf_data, apf_reason) __aligned(64);
 DEFINE_PER_CPU_DECRYPTED(struct kvm_steal_time, steal_time) __aligned(64) __visible;
 static int has_steal_clock = 0;
···
 {
     u32 flags = 0;

-    if (__this_cpu_read(apf_reason.enabled)) {
+    if (__this_cpu_read(async_pf_enabled)) {
         flags = __this_cpu_read(apf_reason.flags);
         __this_cpu_write(apf_reason.flags, 0);
     }
···

     inc_irq_stat(irq_hv_callback_count);

-    if (__this_cpu_read(apf_reason.enabled)) {
+    if (__this_cpu_read(async_pf_enabled)) {
         token = __this_cpu_read(apf_reason.token);
         kvm_async_pf_task_wake(token);
         __this_cpu_write(apf_reason.token, 0);
···
         wrmsrl(MSR_KVM_ASYNC_PF_INT, HYPERVISOR_CALLBACK_VECTOR);

         wrmsrl(MSR_KVM_ASYNC_PF_EN, pa);
-        __this_cpu_write(apf_reason.enabled, 1);
+        __this_cpu_write(async_pf_enabled, true);
         pr_debug("setup async PF for cpu %d\n", smp_processor_id());
     }

···

 static void kvm_pv_disable_apf(void)
 {
-    if (!__this_cpu_read(apf_reason.enabled))
+    if (!__this_cpu_read(async_pf_enabled))
         return;

     wrmsrl(MSR_KVM_ASYNC_PF_EN, 0);
-    __this_cpu_write(apf_reason.enabled, 0);
+    __this_cpu_write(async_pf_enabled, false);

     pr_debug("disable async PF for cpu %d\n", smp_processor_id());
 }