Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"Bugfixes, mostly for ARM and AMD, and more documentation.

Slightly bigger than usual because I couldn't send out what was
pending for rc4, but there is nothing worrisome going on. I have more
fixes pending for guest debugging support (gdbstub) but I will send
them next week"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (22 commits)
KVM: X86: Declare KVM_CAP_SET_GUEST_DEBUG properly
KVM: selftests: Fix build for evmcs.h
kvm: x86: Use KVM CPU capabilities to determine CR4 reserved bits
KVM: VMX: Explicitly clear RFLAGS.CF and RFLAGS.ZF in VM-Exit RSB path
docs/virt/kvm: Document configuring and running nested guests
KVM: s390: Remove false WARN_ON_ONCE for the PQAP instruction
kvm: ioapic: Restrict lazy EOI update to edge-triggered interrupts
KVM: x86: Fixes posted interrupt check for IRQs delivery modes
KVM: SVM: fill in kvm_run->debug.arch.dr[67]
KVM: nVMX: Replace a BUG_ON(1) with BUG() to squash clang warning
KVM: arm64: Fix 32bit PC wrap-around
KVM: arm64: vgic-v4: Initialize GICv4.1 even in the absence of a virtual ITS
KVM: arm64: Save/restore sp_el0 as part of __guest_enter
KVM: arm64: Delete duplicated label in invalid_vector
KVM: arm64: vgic-its: Fix memory leak on the error path of vgic_add_lpi()
KVM: arm64: vgic-v3: Retire all pending LPIs on vcpu destroy
KVM: arm: vgic-v2: Only use the virtual state when userspace accesses pending bits
KVM: arm: vgic: Only use the virtual state when userspace accesses enable bits
KVM: arm: vgic: Synchronize the whole guest on GIC{D,R}_I{S,C}ACTIVER read
KVM: arm64: PSCI: Forbid 64bit functions for 32bit guests
...

+628 -125
+2
Documentation/virt/kvm/index.rst
··· 28 28 arm/index 29 29 30 30 devices/index 31 + 32 + running-nested-guests
+276
Documentation/virt/kvm/running-nested-guests.rst
··· 1 + ============================== 2 + Running nested guests with KVM 3 + ============================== 4 + 5 + Nested virtualization is the ability to run a guest inside another 6 + guest (the guest hypervisor can be KVM-based or a different one). The straightforward 7 + example is a KVM guest that in turn runs on a KVM guest (the rest of 8 + this document is built on this example):: 9 + 10 + .----------------. .----------------. 11 + | | | | 12 + | L2 | | L2 | 13 + | (Nested Guest) | | (Nested Guest) | 14 + | | | | 15 + |----------------'--'----------------| 16 + | | 17 + | L1 (Guest Hypervisor) | 18 + | KVM (/dev/kvm) | 19 + | | 20 + .------------------------------------------------------. 21 + | L0 (Host Hypervisor) | 22 + | KVM (/dev/kvm) | 23 + |------------------------------------------------------| 24 + | Hardware (with virtualization extensions) | 25 + '------------------------------------------------------' 26 + 27 + Terminology: 28 + 29 + - L0 – level-0; the bare metal host, running KVM 30 + 31 + - L1 – level-1 guest; a VM running on L0; also called the "guest 32 + hypervisor", as it itself is capable of running KVM. 33 + 34 + - L2 – level-2 guest; a VM running on L1; this is the "nested guest" 35 + 36 + .. note:: The above diagram is modelled after the x86 architecture; 37 + s390x, ppc64 and other architectures are likely to have 38 + a different design for nesting. 39 + 40 + For example, s390x always has an LPAR (LogicalPARtition) 41 + hypervisor running on bare metal, adding another layer and 42 + resulting in at least four levels in a nested setup — L0 (bare 43 + metal, running the LPAR hypervisor), L1 (host hypervisor), L2 44 + (guest hypervisor), L3 (nested guest). 45 + 46 + This document will stick with the three-level terminology (L0, 47 + L1, and L2) for all architectures, and will largely focus on 48 + x86. 
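As a quick sanity check of the L0/L1/L2 terminology, a machine can often tell whether it is bare metal or itself a guest by looking for the ``hypervisor`` CPU flag that x86 hypervisors expose to their guests. A minimal sketch, assuming x86 and a readable ``/proc/cpuinfo``; the helper name and string-based input are illustrative, not part of the document:

```shell
#!/bin/sh
# Guess whether this machine is bare metal (L0) or a guest (L1 or
# deeper) from the CPU flags: on x86, hypervisors set a "hypervisor"
# flag that shows up in /proc/cpuinfo. Helper name is illustrative.
guess_virt_level() {
    # $1: contents of /proc/cpuinfo, passed in so the sketch is testable
    if printf '%s\n' "$1" | grep -qw hypervisor; then
        echo "guest (L1 or deeper)"
    else
        echo "bare metal (L0)"
    fi
}

# On a live system:
#   guess_virt_level "$(cat /proc/cpuinfo)"
```

Note this only distinguishes "bare metal" from "some guest"; it cannot tell L1 from L2.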
49 + 50 + 51 + Use Cases 52 + --------- 53 + 54 + There are several scenarios where nested KVM can be useful, to name a 55 + few: 56 + 57 + - As a developer, you want to test your software on different operating 58 + systems (OSes). Instead of renting multiple VMs from a Cloud 59 + Provider, using nested KVM lets you rent a large enough "guest 60 + hypervisor" (level-1 guest). This in turn allows you to create 61 + multiple nested guests (level-2 guests), running different OSes, on 62 + which you can develop and test your software. 63 + 64 + - Live migration of "guest hypervisors" and their nested guests, for 65 + load balancing, disaster recovery, etc. 66 + 67 + - VM image creation tools (e.g. ``virt-install``, etc) often run 68 + their own VM, and users expect these to work inside a VM. 69 + 70 + - Some OSes use virtualization internally for security (e.g. to let 71 + applications run safely in isolation). 72 + 73 + 74 + Enabling "nested" (x86) 75 + ----------------------- 76 + 77 + From Linux kernel v4.19 onwards, the ``nested`` KVM parameter is enabled 78 + by default for Intel and AMD. (Though your Linux distribution might 79 + override this default.) 80 + 81 + In case you are running a Linux kernel older than v4.19, to enable 82 + nesting, set the ``nested`` KVM module parameter to ``Y`` or ``1``. To 83 + persist this setting across reboots, you can add it in a config file, as 84 + shown below: 85 + 86 + 1. On the bare metal host (L0), list the kernel modules and ensure that 87 + the KVM modules are loaded:: 88 + 89 + $ lsmod | grep -i kvm 90 + kvm_intel 133627 0 91 + kvm 435079 1 kvm_intel 92 + 93 + 2. Show information for the ``kvm_intel`` module:: 94 + 95 + $ modinfo kvm_intel | grep -i nested 96 + parm: nested:bool 97 + 98 + 3. 
For the nested KVM configuration to persist across reboots, place the 99 + below in ``/etc/modprobe.d/kvm_intel.conf`` (create the file if it 100 + doesn't exist):: 101 + 102 + $ cat /etc/modprobe.d/kvm_intel.conf 103 + options kvm-intel nested=y 104 + 105 + 4. Unload and re-load the KVM Intel module:: 106 + 107 + $ sudo rmmod kvm-intel 108 + $ sudo modprobe kvm-intel 109 + 110 + 5. Verify that the ``nested`` parameter for KVM is enabled:: 111 + 112 + $ cat /sys/module/kvm_intel/parameters/nested 113 + Y 114 + 115 + For AMD hosts, the process is the same as above, except that the module 116 + name is ``kvm-amd``. 117 + 118 + 119 + Additional nested-related kernel parameters (x86) 120 + ------------------------------------------------- 121 + 122 + If your hardware is sufficiently advanced (Intel Haswell processor or 123 + higher, which has newer hardware virt extensions), the following 124 + additional features will also be enabled by default: "Shadow VMCS 125 + (Virtual Machine Control Structure)", APIC Virtualization on your bare 126 + metal host (L0). Parameters for Intel hosts:: 127 + 128 + $ cat /sys/module/kvm_intel/parameters/enable_shadow_vmcs 129 + Y 130 + 131 + $ cat /sys/module/kvm_intel/parameters/enable_apicv 132 + Y 133 + 134 + $ cat /sys/module/kvm_intel/parameters/ept 135 + Y 136 + 137 + .. note:: If you suspect your L2 (i.e. nested guest) is running slower, 138 + ensure the above are enabled (particularly 139 + ``enable_shadow_vmcs`` and ``ept``). 140 + 141 + 142 + Starting a nested guest (x86) 143 + ----------------------------- 144 + 145 + Once your bare metal host (L0) is configured for nesting, you should be 146 + able to start an L1 guest with:: 147 + 148 + $ qemu-kvm -cpu host [...] 149 + 150 + The above will pass through the host CPU's capabilities as-is to the 151 + guest; or for better live migration compatibility, use a named CPU 152 + model supported by QEMU. 
e.g.:: 153 + 154 + $ qemu-kvm -cpu Haswell-noTSX-IBRS,vmx=on 155 + 156 + The guest hypervisor will then be capable of running a 157 + nested guest with accelerated KVM. 158 + 159 + 160 + Enabling "nested" (s390x) 161 + ------------------------- 162 + 163 + 1. On the host hypervisor (L0), enable the ``nested`` parameter on 164 + s390x:: 165 + 166 + $ rmmod kvm 167 + $ modprobe kvm nested=1 168 + 169 + .. note:: On s390x, the kernel parameter ``hpage`` is mutually exclusive 170 + with the ``nested`` parameter — i.e. to be able to enable 171 + ``nested``, the ``hpage`` parameter *must* be disabled. 172 + 173 + 2. The guest hypervisor (L1) must be provided with the ``sie`` CPU 174 + feature — with QEMU, this can be done by using "host passthrough" 175 + (via the command-line ``-cpu host``). 176 + 177 + 3. Now the KVM module can be loaded in the L1 (guest hypervisor):: 178 + 179 + $ modprobe kvm 180 + 181 + 182 + Live migration with nested KVM 183 + ------------------------------ 184 + 185 + Migrating an L1 guest, with a *live* nested guest in it, to another 186 + bare metal host, works as of Linux kernel 5.3 and QEMU 4.2.0 for 187 + Intel x86 systems, and even on older versions for s390x. 188 + 189 + On AMD systems, once an L1 guest has started an L2 guest, the L1 guest 190 + should no longer be migrated or saved (refer to QEMU documentation on 191 + "savevm"/"loadvm") until the L2 guest shuts down. Attempting to migrate 192 + or save-and-load an L1 guest while an L2 guest is running will result in 193 + undefined behavior. You might see a ``kernel BUG!`` entry in ``dmesg``, a 194 + kernel 'oops', or an outright kernel panic. Such a migrated or loaded L1 195 + guest can no longer be considered stable or secure, and must be restarted. 196 + Migrating an L1 guest merely configured to support nesting, while not 197 + actually running L2 guests, is expected to function normally even on AMD 198 + systems but may fail once guests are started. 
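The AMD caveat above lends itself to a simple pre-flight check before live-migrating an L1 guest. A hedged sketch; the function name and inputs are illustrative (on a real host the vendor string would come from ``/proc/cpuinfo`` and the L2-running state from your management layer):

```shell
#!/bin/sh
# Refuse to migrate an L1 guest that has a live L2 on an AMD host,
# per the caveat above. Inputs are passed as arguments so the check
# is testable; names are illustrative.
safe_to_migrate_l1() {
    vendor="$1"      # e.g. "GenuineIntel" or "AuthenticAMD"
    l2_running="$2"  # "yes" if an L2 guest is currently running
    if [ "$vendor" = "AuthenticAMD" ] && [ "$l2_running" = "yes" ]; then
        echo "unsafe: shut down the L2 guest before migrating"
        return 1
    fi
    echo "ok to migrate"
}
```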
199 + 200 + Migrating an L2 guest is always expected to succeed, so all the following 201 + scenarios should work even on AMD systems: 202 + 203 + - Migrating a nested guest (L2) to another L1 guest on the *same* bare 204 + metal host. 205 + 206 + - Migrating a nested guest (L2) to another L1 guest on a *different* 207 + bare metal host. 208 + 209 + - Migrating a nested guest (L2) to a bare metal host. 210 + 211 + Reporting bugs from nested setups 212 + --------------------------------- 213 + 214 + Debugging "nested" problems can involve sifting through log files across 215 + L0, L1 and L2; this can result in tedious back-and-forth between the bug 216 + reporter and the bug fixer. 217 + 218 + - Mention that you are in a "nested" setup. If you are running any kind 219 + of "nesting" at all, say so. Unfortunately, this needs to be called 220 + out because when reporting bugs, people tend to forget to even 221 + *mention* that they're using nested virtualization. 222 + 223 + - Ensure you are actually running KVM on KVM. Sometimes people do not 224 + have KVM enabled for their guest hypervisor (L1), which results in 225 + them running with pure emulation, or what QEMU calls "TCG", while 226 + they think they're running nested KVM. This confuses "nested Virt" 227 + (which could also mean QEMU on KVM) with "nested KVM" (KVM on KVM). 
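One way to catch the KVM-vs-TCG confusion described above is to inspect the L1 guest's QEMU command line (libvirt logs it, as the document's information checklist notes). A sketch with the command line passed in as a string so it stays testable; the helper name is illustrative:

```shell
#!/bin/sh
# Classify a QEMU command line as KVM-accelerated or TCG (emulation).
# QEMU enables KVM via "-enable-kvm", "-accel kvm", or "accel=kvm" in
# a -machine option; anything else defaults to TCG.
qemu_accel_of() {
    case "$1" in
        *-enable-kvm*|*"accel=kvm"*|*"-accel kvm"*) echo "KVM" ;;
        *) echo "TCG (pure emulation)" ;;
    esac
}

# Typical use against a libvirt log line:
#   qemu_accel_of "$(tail -n 1 /var/log/libvirt/qemu/instance.log)"
```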
228 + 229 + Information to collect (generic) 230 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 231 + 232 + The following is not an exhaustive list, but a very good starting point: 233 + 234 + - Kernel, libvirt, and QEMU version from L0 235 + 236 + - Kernel, libvirt and QEMU version from L1 237 + 238 + - QEMU command-line of L1 -- when using libvirt, you'll find it here: 239 + ``/var/log/libvirt/qemu/instance.log`` 240 + 241 + - QEMU command-line of L2 -- as above, when using libvirt, get the 242 + complete libvirt-generated QEMU command-line 243 + 244 + - ``cat /proc/cpuinfo`` from L0 245 + 246 + - ``cat /proc/cpuinfo`` from L1 247 + 248 + - ``lscpu`` from L0 249 + 250 + - ``lscpu`` from L1 251 + 252 + - Full ``dmesg`` output from L0 253 + 254 + - Full ``dmesg`` output from L1 255 + 256 + x86-specific info to collect 257 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 258 + 259 + Both of the commands below, ``x86info`` and ``dmidecode``, should be 260 + available under the same name on most Linux distributions: 261 + 262 + - Output of: ``x86info -a`` from L0 263 + 264 + - Output of: ``x86info -a`` from L1 265 + 266 + - Output of: ``dmidecode`` from L0 267 + 268 + - Output of: ``dmidecode`` from L1 269 + 270 + s390x-specific info to collect 271 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 272 + 273 + Along with the generic details mentioned earlier, the following is 274 + also recommended: 275 + 276 + - ``/proc/sysinfo`` from L1; this will also include the info from L0
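The generic checklist above can be scripted. A minimal dry-run sketch that prints, rather than executes, the commands to run at each level, so the reporter can review them before running with appropriate privileges; the helper name is illustrative and the command set follows the list above:

```shell
#!/bin/sh
# Emit the debug-info commands from the generic checklist for one
# level (e.g. "L0" or "L1"). Dry run: the commands are printed, not
# executed, and can be piped to "sh" once reviewed.
list_debug_commands() {
    level="$1"
    cat <<EOF
# --- commands to run on $level ---
uname -r
cat /proc/cpuinfo
lscpu
dmesg
EOF
}

# Usage:
#   list_debug_commands L0
#   list_debug_commands L1
```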
+7
arch/arm64/kvm/guest.c
··· 200 200 } 201 201 202 202 memcpy((u32 *)regs + off, valp, KVM_REG_SIZE(reg->id)); 203 + 204 + if (*vcpu_cpsr(vcpu) & PSR_MODE32_BIT) { 205 + int i; 206 + 207 + for (i = 0; i < 16; i++) 208 + *vcpu_reg32(vcpu, i) = (u32)*vcpu_reg32(vcpu, i); 209 + } 203 210 out: 204 211 return err; 205 212 }
+23
arch/arm64/kvm/hyp/entry.S
··· 18 18 19 19 #define CPU_GP_REG_OFFSET(x) (CPU_GP_REGS + x) 20 20 #define CPU_XREG_OFFSET(x) CPU_GP_REG_OFFSET(CPU_USER_PT_REGS + 8*x) 21 + #define CPU_SP_EL0_OFFSET (CPU_XREG_OFFSET(30) + 8) 21 22 22 23 .text 23 24 .pushsection .hyp.text, "ax" ··· 48 47 ldp x29, lr, [\ctxt, #CPU_XREG_OFFSET(29)] 49 48 .endm 50 49 50 + .macro save_sp_el0 ctxt, tmp 51 + mrs \tmp, sp_el0 52 + str \tmp, [\ctxt, #CPU_SP_EL0_OFFSET] 53 + .endm 54 + 55 + .macro restore_sp_el0 ctxt, tmp 56 + ldr \tmp, [\ctxt, #CPU_SP_EL0_OFFSET] 57 + msr sp_el0, \tmp 58 + .endm 59 + 51 60 /* 52 61 * u64 __guest_enter(struct kvm_vcpu *vcpu, 53 62 * struct kvm_cpu_context *host_ctxt); ··· 70 59 71 60 // Store the host regs 72 61 save_callee_saved_regs x1 62 + 63 + // Save the host's sp_el0 64 + save_sp_el0 x1, x2 73 65 74 66 // Now the host state is stored if we have a pending RAS SError it must 75 67 // affect the host. If any asynchronous exception is pending we defer ··· 96 82 // as it may cause Pointer Authentication key signing mismatch errors 97 83 // when this feature is enabled for kernel code. 98 84 ptrauth_switch_to_guest x29, x0, x1, x2 85 + 86 + // Restore the guest's sp_el0 87 + restore_sp_el0 x29, x0 99 88 100 89 // Restore guest regs x0-x17 101 90 ldp x0, x1, [x29, #CPU_XREG_OFFSET(0)] ··· 147 130 // Store the guest regs x18-x29, lr 148 131 save_callee_saved_regs x1 149 132 133 + // Store the guest's sp_el0 134 + save_sp_el0 x1, x2 135 + 150 136 get_host_ctxt x2, x3 151 137 152 138 // Macro ptrauth_switch_to_guest format: ··· 158 138 // as it may cause Pointer Authentication key signing mismatch errors 159 139 // when this feature is enabled for kernel code. 160 140 ptrauth_switch_to_host x1, x2, x3, x4, x5 141 + 142 + // Restore the hosts's sp_el0 143 + restore_sp_el0 x2, x3 161 144 162 145 // Now restore the host regs 163 146 restore_callee_saved_regs x2
-1
arch/arm64/kvm/hyp/hyp-entry.S
··· 198 198 .macro invalid_vector label, target = __hyp_panic 199 199 .align 2 200 200 SYM_CODE_START(\label) 201 - \label: 202 201 b \target 203 202 SYM_CODE_END(\label) 204 203 .endm
+3 -14
arch/arm64/kvm/hyp/sysreg-sr.c
··· 15 15 /* 16 16 * Non-VHE: Both host and guest must save everything. 17 17 * 18 - * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and pstate, 19 - * which are handled as part of the el2 return state) on every switch. 18 + * VHE: Host and guest must save mdscr_el1 and sp_el0 (and the PC and 19 + * pstate, which are handled as part of the el2 return state) on every 20 + * switch (sp_el0 is being dealt with in the assembly code). 20 21 * tpidr_el0 and tpidrro_el0 only need to be switched when going 21 22 * to host userspace or a different VCPU. EL1 registers only need to be 22 23 * switched when potentially going to run a different VCPU. The latter two ··· 27 26 static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt) 28 27 { 29 28 ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1); 30 - 31 - /* 32 - * The host arm64 Linux uses sp_el0 to point to 'current' and it must 33 - * therefore be saved/restored on every entry/exit to/from the guest. 34 - */ 35 - ctxt->gp_regs.regs.sp = read_sysreg(sp_el0); 36 29 } 37 30 38 31 static void __hyp_text __sysreg_save_user_state(struct kvm_cpu_context *ctxt) ··· 94 99 static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt) 95 100 { 96 101 write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1); 97 - 98 - /* 99 - * The host arm64 Linux uses sp_el0 to point to 'current' and it must 100 - * therefore be saved/restored on every entry/exit to/from the guest. 101 - */ 102 - write_sysreg(ctxt->gp_regs.regs.sp, sp_el0); 103 102 } 104 103 105 104 static void __hyp_text __sysreg_restore_user_state(struct kvm_cpu_context *ctxt)
+1
arch/powerpc/kvm/powerpc.c
··· 521 521 case KVM_CAP_IOEVENTFD: 522 522 case KVM_CAP_DEVICE_CTRL: 523 523 case KVM_CAP_IMMEDIATE_EXIT: 524 + case KVM_CAP_SET_GUEST_DEBUG: 524 525 r = 1; 525 526 break; 526 527 case KVM_CAP_PPC_GUEST_DEBUG_SSTEP:
+1
arch/s390/kvm/kvm-s390.c
··· 545 545 case KVM_CAP_S390_AIS: 546 546 case KVM_CAP_S390_AIS_MIGRATION: 547 547 case KVM_CAP_S390_VCPU_RESETS: 548 + case KVM_CAP_SET_GUEST_DEBUG: 548 549 r = 1; 549 550 break; 550 551 case KVM_CAP_S390_HPAGE_1M:
+3 -1
arch/s390/kvm/priv.c
··· 626 626 * available for the guest are AQIC and TAPQ with the t bit set 627 627 * since we do not set IC.3 (FIII) we currently will only intercept 628 628 * the AQIC function code. 629 + * Note: running nested under z/VM can result in intercepts for other 630 + * function codes, e.g. PQAP(QCI). We do not support this and bail out. 629 631 */ 630 632 reg0 = vcpu->run->s.regs.gprs[0]; 631 633 fc = (reg0 >> 24) & 0xff; 632 - if (WARN_ON_ONCE(fc != 0x03)) 634 + if (fc != 0x03) 633 635 return -EOPNOTSUPP; 634 636 635 637 /* PQAP instruction is allowed for guest kernel only */
+2 -2
arch/x86/include/asm/kvm_host.h
··· 1663 1663 static inline bool kvm_irq_is_postable(struct kvm_lapic_irq *irq) 1664 1664 { 1665 1665 /* We can only post Fixed and LowPrio IRQs */ 1666 - return (irq->delivery_mode == dest_Fixed || 1667 - irq->delivery_mode == dest_LowestPrio); 1666 + return (irq->delivery_mode == APIC_DM_FIXED || 1667 + irq->delivery_mode == APIC_DM_LOWEST); 1668 1668 } 1669 1669 1670 1670 static inline void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
+5 -5
arch/x86/kvm/ioapic.c
··· 225 225 } 226 226 227 227 /* 228 - * AMD SVM AVIC accelerate EOI write and do not trap, 229 - * in-kernel IOAPIC will not be able to receive the EOI. 230 - * In this case, we do lazy update of the pending EOI when 231 - * trying to set IOAPIC irq. 228 + * AMD SVM AVIC accelerate EOI write iff the interrupt is edge 229 + * triggered, in which case the in-kernel IOAPIC will not be able 230 + * to receive the EOI. In this case, we do a lazy update of the 231 + * pending EOI when trying to set IOAPIC irq. 232 232 */ 233 - if (kvm_apicv_activated(ioapic->kvm)) 233 + if (edge && kvm_apicv_activated(ioapic->kvm)) 234 234 ioapic_lazy_update_eoi(ioapic, irq); 235 235 236 236 /*
+2
arch/x86/kvm/svm/svm.c
··· 1752 1752 if (svm->vcpu.guest_debug & 1753 1753 (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) { 1754 1754 kvm_run->exit_reason = KVM_EXIT_DEBUG; 1755 + kvm_run->debug.arch.dr6 = svm->vmcb->save.dr6; 1756 + kvm_run->debug.arch.dr7 = svm->vmcb->save.dr7; 1755 1757 kvm_run->debug.arch.pc = 1756 1758 svm->vmcb->save.cs.base + svm->vmcb->save.rip; 1757 1759 kvm_run->debug.arch.exception = DB_VECTOR;
+1 -1
arch/x86/kvm/vmx/nested.c
··· 5165 5165 */ 5166 5166 break; 5167 5167 default: 5168 - BUG_ON(1); 5168 + BUG(); 5169 5169 break; 5170 5170 } 5171 5171
+3
arch/x86/kvm/vmx/vmenter.S
··· 82 82 /* IMPORTANT: Stuff the RSB immediately after VM-Exit, before RET! */ 83 83 FILL_RETURN_BUFFER %_ASM_AX, RSB_CLEAR_LOOPS, X86_FEATURE_RETPOLINE 84 84 85 + /* Clear RFLAGS.CF and RFLAGS.ZF to preserve VM-Exit, i.e. !VM-Fail. */ 86 + or $1, %_ASM_AX 87 + 85 88 pop %_ASM_AX 86 89 .Lvmexit_skip_rsb: 87 90 #endif
+6 -15
arch/x86/kvm/x86.c
··· 926 926 __reserved_bits; \ 927 927 }) 928 928 929 - static u64 kvm_host_cr4_reserved_bits(struct cpuinfo_x86 *c) 930 - { 931 - u64 reserved_bits = __cr4_reserved_bits(cpu_has, c); 932 - 933 - if (kvm_cpu_cap_has(X86_FEATURE_LA57)) 934 - reserved_bits &= ~X86_CR4_LA57; 935 - 936 - if (kvm_cpu_cap_has(X86_FEATURE_UMIP)) 937 - reserved_bits &= ~X86_CR4_UMIP; 938 - 939 - return reserved_bits; 940 - } 941 - 942 929 static int kvm_valid_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) 943 930 { 944 931 if (cr4 & cr4_reserved_bits) ··· 3372 3385 case KVM_CAP_GET_MSR_FEATURES: 3373 3386 case KVM_CAP_MSR_PLATFORM_INFO: 3374 3387 case KVM_CAP_EXCEPTION_PAYLOAD: 3388 + case KVM_CAP_SET_GUEST_DEBUG: 3375 3389 r = 1; 3376 3390 break; 3377 3391 case KVM_CAP_SYNC_REGS: ··· 9663 9675 if (!kvm_cpu_cap_has(X86_FEATURE_XSAVES)) 9664 9676 supported_xss = 0; 9665 9677 9666 - cr4_reserved_bits = kvm_host_cr4_reserved_bits(&boot_cpu_data); 9678 + #define __kvm_cpu_cap_has(UNUSED_, f) kvm_cpu_cap_has(f) 9679 + cr4_reserved_bits = __cr4_reserved_bits(__kvm_cpu_cap_has, UNUSED_); 9680 + #undef __kvm_cpu_cap_has 9667 9681 9668 9682 if (kvm_has_tsc_control) { 9669 9683 /* ··· 9697 9707 9698 9708 WARN_ON(!irqs_disabled()); 9699 9709 9700 - if (kvm_host_cr4_reserved_bits(c) != cr4_reserved_bits) 9710 + if (__cr4_reserved_bits(cpu_has, c) != 9711 + __cr4_reserved_bits(cpu_has, &boot_cpu_data)) 9701 9712 return -EIO; 9702 9713 9703 9714 return ops->check_processor_compatibility();
+2 -2
tools/testing/selftests/kvm/include/evmcs.h
··· 219 219 #define HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_MASK \ 220 220 (~((1ull << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) - 1)) 221 221 222 - struct hv_enlightened_vmcs *current_evmcs; 223 - struct hv_vp_assist_page *current_vp_assist; 222 + extern struct hv_enlightened_vmcs *current_evmcs; 223 + extern struct hv_vp_assist_page *current_vp_assist; 224 224 225 225 int vcpu_enable_evmcs(struct kvm_vm *vm, int vcpu_id); 226 226
+3
tools/testing/selftests/kvm/lib/x86_64/vmx.c
··· 17 17 18 18 bool enable_evmcs; 19 19 20 + struct hv_enlightened_vmcs *current_evmcs; 21 + struct hv_vp_assist_page *current_vp_assist; 22 + 20 23 struct eptPageTableEntry { 21 24 uint64_t readable:1; 22 25 uint64_t writable:1;
+6 -2
virt/kvm/arm/hyp/aarch32.c
··· 125 125 */ 126 126 void __hyp_text kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr) 127 127 { 128 + u32 pc = *vcpu_pc(vcpu); 128 129 bool is_thumb; 129 130 130 131 is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_AA32_T_BIT); 131 132 if (is_thumb && !is_wide_instr) 132 - *vcpu_pc(vcpu) += 2; 133 + pc += 2; 133 134 else 134 - *vcpu_pc(vcpu) += 4; 135 + pc += 4; 136 + 137 + *vcpu_pc(vcpu) = pc; 138 + 135 139 kvm_adjust_itstate(vcpu); 136 140 }
+40
virt/kvm/arm/psci.c
··· 186 186 kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET); 187 187 } 188 188 189 + static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu) 190 + { 191 + int i; 192 + 193 + /* 194 + * Zero the input registers' upper 32 bits. They will be fully 195 + * zeroed on exit, so we're fine changing them in place. 196 + */ 197 + for (i = 1; i < 4; i++) 198 + vcpu_set_reg(vcpu, i, lower_32_bits(vcpu_get_reg(vcpu, i))); 199 + } 200 + 201 + static unsigned long kvm_psci_check_allowed_function(struct kvm_vcpu *vcpu, u32 fn) 202 + { 203 + switch(fn) { 204 + case PSCI_0_2_FN64_CPU_SUSPEND: 205 + case PSCI_0_2_FN64_CPU_ON: 206 + case PSCI_0_2_FN64_AFFINITY_INFO: 207 + /* Disallow these functions for 32bit guests */ 208 + if (vcpu_mode_is_32bit(vcpu)) 209 + return PSCI_RET_NOT_SUPPORTED; 210 + break; 211 + } 212 + 213 + return 0; 214 + } 215 + 189 216 static int kvm_psci_0_2_call(struct kvm_vcpu *vcpu) 190 217 { 191 218 struct kvm *kvm = vcpu->kvm; 192 219 u32 psci_fn = smccc_get_function(vcpu); 193 220 unsigned long val; 194 221 int ret = 1; 222 + 223 + val = kvm_psci_check_allowed_function(vcpu, psci_fn); 224 + if (val) 225 + goto out; 195 226 196 227 switch (psci_fn) { 197 228 case PSCI_0_2_FN_PSCI_VERSION: ··· 241 210 val = PSCI_RET_SUCCESS; 242 211 break; 243 212 case PSCI_0_2_FN_CPU_ON: 213 + kvm_psci_narrow_to_32bit(vcpu); 214 + fallthrough; 244 215 case PSCI_0_2_FN64_CPU_ON: 245 216 mutex_lock(&kvm->lock); 246 217 val = kvm_psci_vcpu_on(vcpu); 247 218 mutex_unlock(&kvm->lock); 248 219 break; 249 220 case PSCI_0_2_FN_AFFINITY_INFO: 221 + kvm_psci_narrow_to_32bit(vcpu); 222 + fallthrough; 250 223 case PSCI_0_2_FN64_AFFINITY_INFO: 251 224 val = kvm_psci_vcpu_affinity_info(vcpu); 252 225 break; ··· 291 256 break; 292 257 } 293 258 259 + out: 294 260 smccc_set_retval(vcpu, val, 0, 0, 0); 295 261 return ret; 296 262 } ··· 309 273 break; 310 274 case PSCI_1_0_FN_PSCI_FEATURES: 311 275 feature = smccc_get_arg1(vcpu); 276 + val = kvm_psci_check_allowed_function(vcpu, 
feature); 277 + if (val) 278 + break; 279 + 312 280 switch(feature) { 313 281 case PSCI_0_2_FN_PSCI_VERSION: 314 282 case PSCI_0_2_FN_CPU_SUSPEND:
+16 -3
virt/kvm/arm/vgic/vgic-init.c
··· 294 294 } 295 295 } 296 296 297 - if (vgic_has_its(kvm)) { 297 + if (vgic_has_its(kvm)) 298 298 vgic_lpi_translation_cache_init(kvm); 299 + 300 + /* 301 + * If we have GICv4.1 enabled, unconditionnaly request enable the 302 + * v4 support so that we get HW-accelerated vSGIs. Otherwise, only 303 + * enable it if we present a virtual ITS to the guest. 304 + */ 305 + if (vgic_supports_direct_msis(kvm)) { 299 306 ret = vgic_v4_init(kvm); 300 307 if (ret) 301 308 goto out; ··· 355 348 { 356 349 struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu; 357 350 351 + /* 352 + * Retire all pending LPIs on this vcpu anyway as we're 353 + * going to destroy it. 354 + */ 355 + vgic_flush_pending_lpis(vcpu); 356 + 358 357 INIT_LIST_HEAD(&vgic_cpu->ap_list_head); 359 358 } 360 359 ··· 372 359 373 360 vgic_debug_destroy(kvm); 374 361 375 - kvm_vgic_dist_destroy(kvm); 376 - 377 362 kvm_for_each_vcpu(i, vcpu, kvm) 378 363 kvm_vgic_vcpu_destroy(vcpu); 364 + 365 + kvm_vgic_dist_destroy(kvm); 379 366 } 380 367 381 368 void kvm_vgic_destroy(struct kvm *kvm)
+9 -2
virt/kvm/arm/vgic/vgic-its.c
··· 96 96 * We "cache" the configuration table entries in our struct vgic_irq's. 97 97 * However we only have those structs for mapped IRQs, so we read in 98 98 * the respective config data from memory here upon mapping the LPI. 99 + * 100 + * Should any of these fail, behave as if we couldn't create the LPI 101 + * by dropping the refcount and returning the error. 99 102 */ 100 103 ret = update_lpi_config(kvm, irq, NULL, false); 101 - if (ret) 104 + if (ret) { 105 + vgic_put_irq(kvm, irq); 102 106 return ERR_PTR(ret); 107 + } 103 108 104 109 ret = vgic_v3_lpi_sync_pending_status(kvm, irq); 105 - if (ret) 110 + if (ret) { 111 + vgic_put_irq(kvm, irq); 106 112 return ERR_PTR(ret); 113 + } 107 114 108 115 return irq; 109 116 }
+10 -6
virt/kvm/arm/vgic/vgic-mmio-v2.c
··· 409 409 NULL, vgic_mmio_uaccess_write_v2_group, 1, 410 410 VGIC_ACCESS_32bit), 411 411 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_SET, 412 - vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1, 412 + vgic_mmio_read_enable, vgic_mmio_write_senable, 413 + NULL, vgic_uaccess_write_senable, 1, 413 414 VGIC_ACCESS_32bit), 414 415 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ENABLE_CLEAR, 415 - vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1, 416 + vgic_mmio_read_enable, vgic_mmio_write_cenable, 417 + NULL, vgic_uaccess_write_cenable, 1, 416 418 VGIC_ACCESS_32bit), 417 419 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_SET, 418 - vgic_mmio_read_pending, vgic_mmio_write_spending, NULL, NULL, 1, 420 + vgic_mmio_read_pending, vgic_mmio_write_spending, 421 + NULL, vgic_uaccess_write_spending, 1, 419 422 VGIC_ACCESS_32bit), 420 423 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PENDING_CLEAR, 421 - vgic_mmio_read_pending, vgic_mmio_write_cpending, NULL, NULL, 1, 424 + vgic_mmio_read_pending, vgic_mmio_write_cpending, 425 + NULL, vgic_uaccess_write_cpending, 1, 422 426 VGIC_ACCESS_32bit), 423 427 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_SET, 424 428 vgic_mmio_read_active, vgic_mmio_write_sactive, 425 - NULL, vgic_mmio_uaccess_write_sactive, 1, 429 + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1, 426 430 VGIC_ACCESS_32bit), 427 431 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_ACTIVE_CLEAR, 428 432 vgic_mmio_read_active, vgic_mmio_write_cactive, 429 - NULL, vgic_mmio_uaccess_write_cactive, 1, 433 + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 1, 430 434 VGIC_ACCESS_32bit), 431 435 REGISTER_DESC_WITH_BITS_PER_IRQ(GIC_DIST_PRI, 432 436 vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL,
+18 -13
virt/kvm/arm/vgic/vgic-mmio-v3.c
··· 50 50 51 51 bool vgic_supports_direct_msis(struct kvm *kvm) 52 52 { 53 - return kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm); 53 + return (kvm_vgic_global_state.has_gicv4_1 || 54 + (kvm_vgic_global_state.has_gicv4 && vgic_has_its(kvm))); 54 55 } 55 56 56 57 /* ··· 539 538 vgic_mmio_read_group, vgic_mmio_write_group, NULL, NULL, 1, 540 539 VGIC_ACCESS_32bit), 541 540 REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISENABLER, 542 - vgic_mmio_read_enable, vgic_mmio_write_senable, NULL, NULL, 1, 541 + vgic_mmio_read_enable, vgic_mmio_write_senable, 542 + NULL, vgic_uaccess_write_senable, 1, 543 543 VGIC_ACCESS_32bit), 544 544 REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICENABLER, 545 - vgic_mmio_read_enable, vgic_mmio_write_cenable, NULL, NULL, 1, 545 + vgic_mmio_read_enable, vgic_mmio_write_cenable, 546 + NULL, vgic_uaccess_write_cenable, 1, 546 547 VGIC_ACCESS_32bit), 547 548 REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISPENDR, 548 549 vgic_mmio_read_pending, vgic_mmio_write_spending, ··· 556 553 VGIC_ACCESS_32bit), 557 554 REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ISACTIVER, 558 555 vgic_mmio_read_active, vgic_mmio_write_sactive, 559 - NULL, vgic_mmio_uaccess_write_sactive, 1, 556 + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 1, 560 557 VGIC_ACCESS_32bit), 561 558 REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_ICACTIVER, 562 559 vgic_mmio_read_active, vgic_mmio_write_cactive, 563 - NULL, vgic_mmio_uaccess_write_cactive, 560 + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 564 561 1, VGIC_ACCESS_32bit), 565 562 REGISTER_DESC_WITH_BITS_PER_IRQ_SHARED(GICD_IPRIORITYR, 566 563 vgic_mmio_read_priority, vgic_mmio_write_priority, NULL, NULL, ··· 612 609 REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IGROUPR0, 613 610 vgic_mmio_read_group, vgic_mmio_write_group, 4, 614 611 VGIC_ACCESS_32bit), 615 - REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ISENABLER0, 616 - vgic_mmio_read_enable, vgic_mmio_write_senable, 4, 612 + 
REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISENABLER0, 613 + vgic_mmio_read_enable, vgic_mmio_write_senable, 614 + NULL, vgic_uaccess_write_senable, 4, 617 615 VGIC_ACCESS_32bit), 618 - REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_ICENABLER0, 619 - vgic_mmio_read_enable, vgic_mmio_write_cenable, 4, 616 + REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICENABLER0, 617 + vgic_mmio_read_enable, vgic_mmio_write_cenable, 618 + NULL, vgic_uaccess_write_cenable, 4, 620 619 VGIC_ACCESS_32bit), 621 620 REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISPENDR0, 622 621 vgic_mmio_read_pending, vgic_mmio_write_spending, ··· 630 625 VGIC_ACCESS_32bit), 631 626 REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ISACTIVER0, 632 627 vgic_mmio_read_active, vgic_mmio_write_sactive, 633 - NULL, vgic_mmio_uaccess_write_sactive, 634 - 4, VGIC_ACCESS_32bit), 628 + vgic_uaccess_read_active, vgic_mmio_uaccess_write_sactive, 4, 629 + VGIC_ACCESS_32bit), 635 630 REGISTER_DESC_WITH_LENGTH_UACCESS(SZ_64K + GICR_ICACTIVER0, 636 631 vgic_mmio_read_active, vgic_mmio_write_cactive, 637 - NULL, vgic_mmio_uaccess_write_cactive, 638 - 4, VGIC_ACCESS_32bit), 632 + vgic_uaccess_read_active, vgic_mmio_uaccess_write_cactive, 4, 633 + VGIC_ACCESS_32bit), 639 634 REGISTER_DESC_WITH_LENGTH(SZ_64K + GICR_IPRIORITYR0, 640 635 vgic_mmio_read_priority, vgic_mmio_write_priority, 32, 641 636 VGIC_ACCESS_32bit | VGIC_ACCESS_8bit),
+170 -58
virt/kvm/arm/vgic/vgic-mmio.c
··· 184 184 } 185 185 } 186 186 187 + int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu, 188 + gpa_t addr, unsigned int len, 189 + unsigned long val) 190 + { 191 + u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 192 + int i; 193 + unsigned long flags; 194 + 195 + for_each_set_bit(i, &val, len * 8) { 196 + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); 197 + 198 + raw_spin_lock_irqsave(&irq->irq_lock, flags); 199 + irq->enabled = true; 200 + vgic_queue_irq_unlock(vcpu->kvm, irq, flags); 201 + 202 + vgic_put_irq(vcpu->kvm, irq); 203 + } 204 + 205 + return 0; 206 + } 207 + 208 + int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu, 209 + gpa_t addr, unsigned int len, 210 + unsigned long val) 211 + { 212 + u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 213 + int i; 214 + unsigned long flags; 215 + 216 + for_each_set_bit(i, &val, len * 8) { 217 + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); 218 + 219 + raw_spin_lock_irqsave(&irq->irq_lock, flags); 220 + irq->enabled = false; 221 + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); 222 + 223 + vgic_put_irq(vcpu->kvm, irq); 224 + } 225 + 226 + return 0; 227 + } 228 + 187 229 unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu, 188 230 gpa_t addr, unsigned int len) 189 231 { ··· 261 219 return value; 262 220 } 263 221 264 - /* Must be called with irq->irq_lock held */ 265 - static void vgic_hw_irq_spending(struct kvm_vcpu *vcpu, struct vgic_irq *irq, 266 - bool is_uaccess) 267 - { 268 - if (is_uaccess) 269 - return; 270 - 271 - irq->pending_latch = true; 272 - vgic_irq_set_phys_active(irq, true); 273 - } 274 - 275 222 static bool is_vgic_v2_sgi(struct kvm_vcpu *vcpu, struct vgic_irq *irq) 276 223 { 277 224 return (vgic_irq_is_sgi(irq->intid) && ··· 271 240 gpa_t addr, unsigned int len, 272 241 unsigned long val) 273 242 { 274 - bool is_uaccess = !kvm_get_running_vcpu(); 275 243 u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 276 244 int i; 277 245 unsigned long flags; ··· 300 270 continue; 301 
271 } 302 272 273 + irq->pending_latch = true; 303 274 if (irq->hw) 304 - vgic_hw_irq_spending(vcpu, irq, is_uaccess); 305 - else 306 - irq->pending_latch = true; 275 + vgic_irq_set_phys_active(irq, true); 276 + 307 277 vgic_queue_irq_unlock(vcpu->kvm, irq, flags); 308 278 vgic_put_irq(vcpu->kvm, irq); 309 279 } 310 280 } 311 281 312 - /* Must be called with irq->irq_lock held */ 313 - static void vgic_hw_irq_cpending(struct kvm_vcpu *vcpu, struct vgic_irq *irq, 314 - bool is_uaccess) 282 + int vgic_uaccess_write_spending(struct kvm_vcpu *vcpu, 283 + gpa_t addr, unsigned int len, 284 + unsigned long val) 315 285 { 316 - if (is_uaccess) 317 - return; 286 + u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 287 + int i; 288 + unsigned long flags; 318 289 290 + for_each_set_bit(i, &val, len * 8) { 291 + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); 292 + 293 + raw_spin_lock_irqsave(&irq->irq_lock, flags); 294 + irq->pending_latch = true; 295 + 296 + /* 297 + * GICv2 SGIs are terribly broken. We can't restore 298 + * the source of the interrupt, so just pick the vcpu 299 + * itself as the source... 
300 + */ 301 + if (is_vgic_v2_sgi(vcpu, irq)) 302 + irq->source |= BIT(vcpu->vcpu_id); 303 + 304 + vgic_queue_irq_unlock(vcpu->kvm, irq, flags); 305 + 306 + vgic_put_irq(vcpu->kvm, irq); 307 + } 308 + 309 + return 0; 310 + } 311 + 312 + /* Must be called with irq->irq_lock held */ 313 + static void vgic_hw_irq_cpending(struct kvm_vcpu *vcpu, struct vgic_irq *irq) 314 + { 319 315 irq->pending_latch = false; 320 316 321 317 /* ··· 364 308 gpa_t addr, unsigned int len, 365 309 unsigned long val) 366 310 { 367 - bool is_uaccess = !kvm_get_running_vcpu(); 368 311 u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 369 312 int i; 370 313 unsigned long flags; ··· 394 339 } 395 340 396 341 if (irq->hw) 397 - vgic_hw_irq_cpending(vcpu, irq, is_uaccess); 342 + vgic_hw_irq_cpending(vcpu, irq); 398 343 else 399 344 irq->pending_latch = false; 400 345 ··· 403 348 } 404 349 } 405 350 406 - unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu, 407 - gpa_t addr, unsigned int len) 351 + int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu, 352 + gpa_t addr, unsigned int len, 353 + unsigned long val) 354 + { 355 + u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 356 + int i; 357 + unsigned long flags; 358 + 359 + for_each_set_bit(i, &val, len * 8) { 360 + struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); 361 + 362 + raw_spin_lock_irqsave(&irq->irq_lock, flags); 363 + /* 364 + * More fun with GICv2 SGIs! If we're clearing one of them 365 + * from userspace, which source vcpu to clear? Let's not 366 + * even think of it, and blow the whole set. 
367 + */ 368 + if (is_vgic_v2_sgi(vcpu, irq)) 369 + irq->source = 0; 370 + 371 + irq->pending_latch = false; 372 + 373 + raw_spin_unlock_irqrestore(&irq->irq_lock, flags); 374 + 375 + vgic_put_irq(vcpu->kvm, irq); 376 + } 377 + 378 + return 0; 379 + } 380 + 381 + /* 382 + * If we are fiddling with an IRQ's active state, we have to make sure the IRQ 383 + * is not queued on some running VCPU's LRs, because then the change to the 384 + * active state can be overwritten when the VCPU's state is synced coming back 385 + * from the guest. 386 + * 387 + * For shared interrupts as well as GICv3 private interrupts, we have to 388 + * stop all the VCPUs because interrupts can be migrated while we don't hold 389 + * the IRQ locks and we don't want to be chasing moving targets. 390 + * 391 + * For GICv2 private interrupts we don't have to do anything because 392 + * userspace accesses to the VGIC state already require all VCPUs to be 393 + * stopped, and only the VCPU itself can modify its private interrupts 394 + * active state, which guarantees that the VCPU is not running. 
395 + */ 396 + static void vgic_access_active_prepare(struct kvm_vcpu *vcpu, u32 intid) 397 + { 398 + if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 || 399 + intid >= VGIC_NR_PRIVATE_IRQS) 400 + kvm_arm_halt_guest(vcpu->kvm); 401 + } 402 + 403 + /* See vgic_access_active_prepare */ 404 + static void vgic_access_active_finish(struct kvm_vcpu *vcpu, u32 intid) 405 + { 406 + if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 || 407 + intid >= VGIC_NR_PRIVATE_IRQS) 408 + kvm_arm_resume_guest(vcpu->kvm); 409 + } 410 + 411 + static unsigned long __vgic_mmio_read_active(struct kvm_vcpu *vcpu, 412 + gpa_t addr, unsigned int len) 408 413 { 409 414 u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 410 415 u32 value = 0; ··· 474 359 for (i = 0; i < len * 8; i++) { 475 360 struct vgic_irq *irq = vgic_get_irq(vcpu->kvm, vcpu, intid + i); 476 361 362 + /* 363 + * Even for HW interrupts, don't evaluate the HW state as 364 + * all the guest is interested in is the virtual state. 365 + */ 477 366 if (irq->active) 478 367 value |= (1U << i); 479 368 ··· 485 366 } 486 367 487 368 return value; 369 + } 370 + 371 + unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu, 372 + gpa_t addr, unsigned int len) 373 + { 374 + u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 375 + u32 val; 376 + 377 + mutex_lock(&vcpu->kvm->lock); 378 + vgic_access_active_prepare(vcpu, intid); 379 + 380 + val = __vgic_mmio_read_active(vcpu, addr, len); 381 + 382 + vgic_access_active_finish(vcpu, intid); 383 + mutex_unlock(&vcpu->kvm->lock); 384 + 385 + return val; 386 + } 387 + 388 + unsigned long vgic_uaccess_read_active(struct kvm_vcpu *vcpu, 389 + gpa_t addr, unsigned int len) 390 + { 391 + return __vgic_mmio_read_active(vcpu, addr, len); 488 392 } 489 393 490 394 /* Must be called with irq->irq_lock held */ ··· 568 426 raw_spin_unlock_irqrestore(&irq->irq_lock, flags); 569 427 } 570 428 571 - /* 572 - * If we are fiddling with an IRQ's active state, we have to make sure the IRQ 573 - * is 
not queued on some running VCPU's LRs, because then the change to the 574 - * active state can be overwritten when the VCPU's state is synced coming back 575 - * from the guest. 576 - * 577 - * For shared interrupts, we have to stop all the VCPUs because interrupts can 578 - * be migrated while we don't hold the IRQ locks and we don't want to be 579 - * chasing moving targets. 580 - * 581 - * For private interrupts we don't have to do anything because userspace 582 - * accesses to the VGIC state already require all VCPUs to be stopped, and 583 - * only the VCPU itself can modify its private interrupts active state, which 584 - * guarantees that the VCPU is not running. 585 - */ 586 - static void vgic_change_active_prepare(struct kvm_vcpu *vcpu, u32 intid) 587 - { 588 - if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 || 589 - intid > VGIC_NR_PRIVATE_IRQS) 590 - kvm_arm_halt_guest(vcpu->kvm); 591 - } 592 - 593 - /* See vgic_change_active_prepare */ 594 - static void vgic_change_active_finish(struct kvm_vcpu *vcpu, u32 intid) 595 - { 596 - if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3 || 597 - intid > VGIC_NR_PRIVATE_IRQS) 598 - kvm_arm_resume_guest(vcpu->kvm); 599 - } 600 - 601 429 static void __vgic_mmio_write_cactive(struct kvm_vcpu *vcpu, 602 430 gpa_t addr, unsigned int len, 603 431 unsigned long val) ··· 589 477 u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 590 478 591 479 mutex_lock(&vcpu->kvm->lock); 592 - vgic_change_active_prepare(vcpu, intid); 480 + vgic_access_active_prepare(vcpu, intid); 593 481 594 482 __vgic_mmio_write_cactive(vcpu, addr, len, val); 595 483 596 - vgic_change_active_finish(vcpu, intid); 484 + vgic_access_active_finish(vcpu, intid); 597 485 mutex_unlock(&vcpu->kvm->lock); 598 486 } 599 487 ··· 626 514 u32 intid = VGIC_ADDR_TO_INTID(addr, 1); 627 515 628 516 mutex_lock(&vcpu->kvm->lock); 629 - vgic_change_active_prepare(vcpu, intid); 517 + vgic_access_active_prepare(vcpu, intid); 630 518 631 519 
__vgic_mmio_write_sactive(vcpu, addr, len, val); 632 520 633 - vgic_change_active_finish(vcpu, intid); 521 + vgic_access_active_finish(vcpu, intid); 634 522 mutex_unlock(&vcpu->kvm->lock); 635 523 } 636 524
+19
virt/kvm/arm/vgic/vgic-mmio.h
··· 138 138 gpa_t addr, unsigned int len, 139 139 unsigned long val); 140 140 141 + int vgic_uaccess_write_senable(struct kvm_vcpu *vcpu, 142 + gpa_t addr, unsigned int len, 143 + unsigned long val); 144 + 145 + int vgic_uaccess_write_cenable(struct kvm_vcpu *vcpu, 146 + gpa_t addr, unsigned int len, 147 + unsigned long val); 148 + 141 149 unsigned long vgic_mmio_read_pending(struct kvm_vcpu *vcpu, 142 150 gpa_t addr, unsigned int len); 143 151 ··· 157 149 gpa_t addr, unsigned int len, 158 150 unsigned long val); 159 151 152 + int vgic_uaccess_write_spending(struct kvm_vcpu *vcpu, 153 + gpa_t addr, unsigned int len, 154 + unsigned long val); 155 + 156 + int vgic_uaccess_write_cpending(struct kvm_vcpu *vcpu, 157 + gpa_t addr, unsigned int len, 158 + unsigned long val); 159 + 160 160 unsigned long vgic_mmio_read_active(struct kvm_vcpu *vcpu, 161 + gpa_t addr, unsigned int len); 162 + 163 + unsigned long vgic_uaccess_read_active(struct kvm_vcpu *vcpu, 161 164 gpa_t addr, unsigned int len); 162 165 163 166 void vgic_mmio_write_cactive(struct kvm_vcpu *vcpu,