Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

KVM: PPC: Book3S HV: Stop using vc->dpdes for nested KVM guests

Commit 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes")
introduced an optimization to use only vcpu->doorbell_request for SMT
emulation for Power9 and above guests, but the code for nested guests
still relies on the old way of handling doorbells. As a result, an L2
guest (see [1]) cannot boot with XICS when SMT>1. The command to
reproduce this issue is:

// To be run in L1

qemu-system-ppc64 \
-drive file=rhel.qcow2,format=qcow2 \
-m 20G \
-smp 8,cores=1,threads=8 \
-cpu host \
-nographic \
-machine pseries,ic-mode=xics -accel kvm

Fix the plumbing to utilize vcpu->doorbell_request instead of vcore->dpdes
for nested KVM guests on P9 and above.

[1] Terminology
1. L0 : PowerNV Linux running with HV privileges
2. L1 : Pseries KVM guest running on top of L0
3. L2 : Nested KVM guest running on top of L1

Fixes: 6398326b9ba1 ("KVM: PPC: Book3S HV P9: Stop using vc->dpdes")
Signed-off-by: Gautam Menghani <gautam@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://patch.msgid.link/20241109063301.105289-3-gautam@linux.ibm.com

Authored by Gautam Menghani and committed by Michael Ellerman
0d3c6b28 ed351c57

+19 -4
+9
arch/powerpc/kvm/book3s_hv.c
@@ -4286,6 +4286,15 @@
 	hvregs.hdec_expiry = time_limit;
 
 	/*
+	 * hvregs has the doorbell status, so zero it here which
+	 * enables us to receive doorbells when H_ENTER_NESTED is
+	 * in progress for this vCPU
+	 */
+
+	if (vcpu->arch.doorbell_request)
+		vcpu->arch.doorbell_request = 0;
+
+	/*
 	 * When setting DEC, we must always deal with irq_work_raise
 	 * via NMI vs setting DEC. The problem occurs right as we
 	 * switch into guest mode if a NMI hits and sets pending work
+10 -4
arch/powerpc/kvm/book3s_hv_nested.c
@@ -32,7 +32,7 @@
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
 	hr->pcr = vc->pcr | PCR_MASK;
-	hr->dpdes = vc->dpdes;
+	hr->dpdes = vcpu->arch.doorbell_request;
 	hr->hfscr = vcpu->arch.hfscr;
 	hr->tb_offset = vc->tb_offset;
 	hr->dawr0 = vcpu->arch.dawr0;
@@ -105,7 +105,7 @@
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	hr->dpdes = vc->dpdes;
+	hr->dpdes = vcpu->arch.doorbell_request;
 	hr->purr = vcpu->arch.purr;
 	hr->spurr = vcpu->arch.spurr;
 	hr->ic = vcpu->arch.ic;
@@ -143,7 +143,7 @@
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
 	vc->pcr = hr->pcr | PCR_MASK;
-	vc->dpdes = hr->dpdes;
+	vcpu->arch.doorbell_request = hr->dpdes;
 	vcpu->arch.hfscr = hr->hfscr;
 	vcpu->arch.dawr0 = hr->dawr0;
 	vcpu->arch.dawrx0 = hr->dawrx0;
@@ -170,7 +170,13 @@
 {
 	struct kvmppc_vcore *vc = vcpu->arch.vcore;
 
-	vc->dpdes = hr->dpdes;
+	/*
+	 * This L2 vCPU might have received a doorbell while H_ENTER_NESTED was being handled.
+	 * Make sure we preserve the doorbell if it was either:
+	 * a) Sent after H_ENTER_NESTED was called on this vCPU (arch.doorbell_request would be 1)
+	 * b) Doorbell was not handled and L2 exited for some other reason (hr->dpdes would be 1)
+	 */
+	vcpu->arch.doorbell_request = vcpu->arch.doorbell_request | hr->dpdes;
 	vcpu->arch.hfscr = hr->hfscr;
 	vcpu->arch.purr = hr->purr;
 	vcpu->arch.spurr = hr->spurr;
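
The doorbell handling introduced by this patch can be illustrated with a small user-space model. This is only a sketch, not kernel code: `toy_vcpu`, `toy_hv_regs`, `toy_enter_nested` and `toy_exit_nested` are hypothetical names invented for illustration, and the model assumes the doorbell state is a single 0/1 flag as described in the comments of the diff. It shows the two steps together: the pending doorbell is copied into the register image and the per-vCPU flag is cleared before entry, and on return the two sources are OR-ed so that a doorbell from either one survives.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct toy_vcpu {
	uint64_t doorbell_request;	/* models vcpu->arch.doorbell_request */
};

struct toy_hv_regs {
	uint64_t dpdes;			/* models hr->dpdes */
};

/*
 * Entry path (assumed simplification): the register image takes the
 * pending doorbell, then the per-vCPU flag is cleared so that a doorbell
 * arriving while the nested run is in progress can be told apart.
 */
static void toy_enter_nested(struct toy_vcpu *vcpu, struct toy_hv_regs *hr)
{
	assert(vcpu && hr);
	hr->dpdes = vcpu->doorbell_request;
	vcpu->doorbell_request = 0;
}

/*
 * Exit path: keep the doorbell if it was either
 *  a) raised during the nested run (vcpu->doorbell_request is 1), or
 *  b) never consumed by the guest (hr->dpdes is still 1).
 */
static void toy_exit_nested(struct toy_vcpu *vcpu, const struct toy_hv_regs *hr)
{
	assert(vcpu && hr);
	vcpu->doorbell_request |= hr->dpdes;
}
```

With a plain assignment on the exit path instead of the OR, case (a) would be lost whenever the returned `dpdes` is 0, which is the situation the comment in the last hunk describes.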