
Merge tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Radim Krčmář:
"All architectures:
- move `make kvmconfig` stubs from x86
- use 64 bits for debugfs stats

ARM:
- Important fixes for configurations without an in-kernel irqchip
- handle SError exceptions and present them to guests if appropriate
- proxying of GICV access at EL2 if guest mappings are unsafe
- GICv3 on AArch32 on ARMv8
- preparations for GICv3 save/restore, including ABI docs
- cleanups and minor optimizations

MIPS:
- A couple of fixes in preparation for supporting MIPS EVA host
kernels
- MIPS SMP host & TLB invalidation fixes

PPC:
- Fix the bug which caused guests to falsely report lockups
- other minor fixes
- a small optimization

s390:
- Lazy enablement of runtime instrumentation
- up to 255 CPUs for nested guests
- rework of machine check delivery
- cleanups and fixes

x86:
- IOMMU part of AMD's AVIC for vmexit-less interrupt delivery
- Hyper-V TSC page
- per-vcpu tsc_offset in debugfs
- accelerated INS/OUTS in nVMX
- cleanups and fixes"

* tag 'kvm-4.9-1' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (140 commits)
KVM: MIPS: Drop dubious EntryHi optimisation
KVM: MIPS: Invalidate TLB by regenerating ASIDs
KVM: MIPS: Split kernel/user ASID regeneration
KVM: MIPS: Drop other CPU ASIDs on guest MMU changes
KVM: arm/arm64: vgic: Don't flush/sync without a working vgic
KVM: arm64: Require in-kernel irqchip for PMU support
KVM: PPC: Book3s PR: Allow access to unprivileged MMCR2 register
KVM: PPC: Book3S PR: Support 64kB page size on POWER8E and POWER8NVL
KVM: PPC: Book3S: Remove duplicate setting of the B field in tlbie
KVM: PPC: BookE: Fix a sanity check
KVM: PPC: Book3S HV: Take out virtual core piggybacking code
KVM: PPC: Book3S: Treat VTB as a per-subcore register, not per-thread
ARM: gic-v3: Work around definition of gic_write_bpr1
KVM: nVMX: Fix the NMI IDT-vectoring handling
KVM: VMX: Enable MSR-BASED TPR shadow even if APICv is inactive
KVM: nVMX: Fix reload apic access page warning
kvmconfig: add virtio-gpu to config fragment
config: move x86 kvm_guest.config to a common location
arm64: KVM: Remove duplicating init code for setting VMID
ARM: KVM: Support vgic-v3
...

+4541 -1636
+9
Documentation/kernel-parameters.txt
··· 460 460 driver will print ACPI tables for AMD IOMMU during 461 461 IOMMU initialization. 462 462 463 + amd_iommu_intr= [HW,X86-64] 464 + Specifies one of the following AMD IOMMU interrupt 465 + remapping modes: 466 + legacy - Use legacy interrupt remapping mode. 467 + vapic - Use virtual APIC mode, which allows IOMMU 468 + to inject interrupts directly into guest. 469 + This mode requires kvm-amd.avic=1. 470 + (Default when IOMMU HW support is present.) 471 + 463 472 amijoy.map= [HW,JOY] Amiga joystick support 464 473 Map of devices attached to JOY0DAT and JOY1DAT 465 474 Format: <a>,<b>
+38
Documentation/virtual/kvm/devices/arm-vgic-its.txt
··· 1 + ARM Virtual Interrupt Translation Service (ITS) 2 + =============================================== 3 + 4 + Device types supported: 5 + KVM_DEV_TYPE_ARM_VGIC_ITS ARM Interrupt Translation Service Controller 6 + 7 + The ITS allows MSI(-X) interrupts to be injected into guests. This extension is 8 + optional. Creating a virtual ITS controller also requires a host GICv3 (see 9 + arm-vgic-v3.txt), but does not depend on having physical ITS controllers. 10 + 11 + There can be multiple ITS controllers per guest, each of them has to have 12 + a separate, non-overlapping MMIO region. 13 + 14 + 15 + Groups: 16 + KVM_DEV_ARM_VGIC_GRP_ADDR 17 + Attributes: 18 + KVM_VGIC_ITS_ADDR_TYPE (rw, 64-bit) 19 + Base address in the guest physical address space of the GICv3 ITS 20 + control register frame. 21 + This address needs to be 64K aligned and the region covers 128K. 22 + Errors: 23 + -E2BIG: Address outside of addressable IPA range 24 + -EINVAL: Incorrectly aligned address 25 + -EEXIST: Address already configured 26 + -EFAULT: Invalid user pointer for attr->addr. 27 + -ENODEV: Incorrect attribute or the ITS is not supported. 28 + 29 + 30 + KVM_DEV_ARM_VGIC_GRP_CTRL 31 + Attributes: 32 + KVM_DEV_ARM_VGIC_CTRL_INIT 33 + request the initialization of the ITS, no additional parameter in 34 + kvm_device_attr.addr. 35 + Errors: 36 + -ENXIO: ITS not properly configured as required prior to setting 37 + this attribute 38 + -ENOMEM: Memory shortage when allocating ITS internal data
+206
Documentation/virtual/kvm/devices/arm-vgic-v3.txt
··· 1 + ARM Virtual Generic Interrupt Controller v3 and later (VGICv3) 2 + ============================================================== 3 + 4 + 5 + Device types supported: 6 + KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0 7 + 8 + Only one VGIC instance may be instantiated through this API. The created VGIC 9 + will act as the VM interrupt controller, requiring emulated user-space devices 10 + to inject interrupts to the VGIC instead of directly to CPUs. It is not 11 + possible to create both a GICv3 and GICv2 on the same VM. 12 + 13 + Creating a guest GICv3 device requires a host GICv3 as well. 14 + 15 + 16 + Groups: 17 + KVM_DEV_ARM_VGIC_GRP_ADDR 18 + Attributes: 19 + KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit) 20 + Base address in the guest physical address space of the GICv3 distributor 21 + register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. 22 + This address needs to be 64K aligned and the region covers 64 KByte. 23 + 24 + KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit) 25 + Base address in the guest physical address space of the GICv3 26 + redistributor register mappings. There are two 64K pages for each 27 + VCPU and all of the redistributor pages are contiguous. 28 + Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. 29 + This address needs to be 64K aligned. 30 + Errors: 31 + -E2BIG: Address outside of addressable IPA range 32 + -EINVAL: Incorrectly aligned address 33 + -EEXIST: Address already configured 34 + -ENXIO: The group or attribute is unknown/unsupported for this device 35 + or hardware support is missing. 36 + -EFAULT: Invalid user pointer for attr->addr. 37 + 38 + 39 + 40 + KVM_DEV_ARM_VGIC_GRP_DIST_REGS 41 + KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 42 + Attributes: 43 + The attr field of kvm_device_attr encodes two values: 44 + bits: | 63 .... 32 | 31 .... 0 | 45 + values: | mpidr | offset | 46 + 47 + All distributor regs are (rw, 32-bit) and kvm_device_attr.addr points to a 48 + __u32 value. 
64-bit registers must be accessed by separately accessing the 49 + lower and higher word. 50 + 51 + Writes to read-only registers are ignored by the kernel. 52 + 53 + KVM_DEV_ARM_VGIC_GRP_DIST_REGS accesses the main distributor registers. 54 + KVM_DEV_ARM_VGIC_GRP_REDIST_REGS accesses the redistributor of the CPU 55 + specified by the mpidr. 56 + 57 + The offset is relative to the "[Re]Distributor base address" as defined 58 + in the GICv3/4 specs. Getting or setting such a register has the same 59 + effect as reading or writing the register on real hardware, except for the 60 + following registers: GICD_STATUSR, GICR_STATUSR, GICD_ISPENDR, 61 + GICR_ISPENDR0, GICD_ICPENDR, and GICR_ICPENDR0. These registers behave 62 + differently when accessed via this interface compared to their 63 + architecturally defined behavior to allow software a full view of the 64 + VGIC's internal state. 65 + 66 + The mpidr field is used to specify which 67 + redistributor is accessed. The mpidr is ignored for the distributor. 68 + 69 + The mpidr encoding is based on the affinity information in the 70 + architecture defined MPIDR, and the field is encoded as follows: 71 + | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | 72 + | Aff3 | Aff2 | Aff1 | Aff0 | 73 + 74 + Note that distributor fields are not banked, but return the same value 75 + regardless of the mpidr used to access the register. 76 + 77 + The GICD_STATUSR and GICR_STATUSR registers are architecturally defined such 78 + that a write of a clear bit has no effect, whereas a write with a set bit 79 + clears that value. To allow userspace to freely set the values of these two 80 + registers, setting the attributes with the register offsets for these two 81 + registers simply sets the non-reserved bits to the value written. 82 + 83 + 84 + Accesses (reads and writes) to the GICD_ISPENDR register region and 85 + GICR_ISPENDR0 registers get/set the value of the latched pending state for 86 + the interrupts. 
87 + 88 + This is identical to the value returned by a guest read from ISPENDR for an 89 + edge triggered interrupt, but may differ for level triggered interrupts. 90 + For edge triggered interrupts, once an interrupt becomes pending (whether 91 + because of an edge detected on the input line or because of a guest write 92 + to ISPENDR) this state is "latched", and only cleared when either the 93 + interrupt is activated or when the guest writes to ICPENDR. A level 94 + triggered interrupt may be pending either because the level input is held 95 + high by a device, or because of a guest write to the ISPENDR register. Only 96 + ISPENDR writes are latched; if the device lowers the line level then the 97 + interrupt is no longer pending unless the guest also wrote to ISPENDR, and 98 + conversely writes to ICPENDR or activations of the interrupt do not clear 99 + the pending status if the line level is still being held high. (These 100 + rules are documented in the GICv3 specification descriptions of the ICPENDR 101 + and ISPENDR registers.) For a level triggered interrupt the value accessed 102 + here is that of the latch which is set by ISPENDR and cleared by ICPENDR or 103 + interrupt activation, whereas the value returned by a guest read from 104 + ISPENDR is the logical OR of the latch value and the input line level. 105 + 106 + Raw access to the latch state is provided to userspace so that it can save 107 + and restore the entire GIC internal state (which is defined by the 108 + combination of the current input line level and the latch state, and cannot 109 + be deduced from purely the line level and the value of the ISPENDR 110 + registers). 111 + 112 + Accesses to GICD_ICPENDR register region and GICR_ICPENDR0 registers have 113 + RAZ/WI semantics, meaning that reads always return 0 and writes are always 114 + ignored. 
115 + 116 + Errors: 117 + -ENXIO: Getting or setting this register is not yet supported 118 + -EBUSY: One or more VCPUs are running 119 + 120 + 121 + KVM_DEV_ARM_VGIC_CPU_SYSREGS 122 + Attributes: 123 + The attr field of kvm_device_attr encodes two values: 124 + bits: | 63 .... 32 | 31 .... 16 | 15 .... 0 | 125 + values: | mpidr | RES | instr | 126 + 127 + The mpidr field encodes the CPU ID based on the affinity information in the 128 + architecture defined MPIDR, and the field is encoded as follows: 129 + | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | 130 + | Aff3 | Aff2 | Aff1 | Aff0 | 131 + 132 + The instr field encodes the system register to access based on the fields 133 + defined in the A64 instruction set encoding for system register access 134 + (RES means the bits are reserved for future use and should be zero): 135 + 136 + | 15 ... 14 | 13 ... 11 | 10 ... 7 | 6 ... 3 | 2 ... 0 | 137 + | Op 0 | Op1 | CRn | CRm | Op2 | 138 + 139 + All system regs accessed through this API are (rw, 64-bit) and 140 + kvm_device_attr.addr points to a __u64 value. 141 + 142 + KVM_DEV_ARM_VGIC_CPU_SYSREGS accesses the CPU interface registers for the 143 + CPU specified by the mpidr field. 144 + 145 + Errors: 146 + -ENXIO: Getting or setting this register is not yet supported 147 + -EBUSY: VCPU is running 148 + -EINVAL: Invalid mpidr supplied 149 + 150 + 151 + KVM_DEV_ARM_VGIC_GRP_NR_IRQS 152 + Attributes: 153 + A value describing the number of interrupts (SGI, PPI and SPI) for 154 + this GIC instance, ranging from 64 to 1024, in increments of 32. 155 + 156 + kvm_device_attr.addr points to a __u32 value. 157 + 158 + Errors: 159 + -EINVAL: Value set is out of the expected range 160 + -EBUSY: Value has already been set. 161 + 162 + 163 + KVM_DEV_ARM_VGIC_GRP_CTRL 164 + Attributes: 165 + KVM_DEV_ARM_VGIC_CTRL_INIT 166 + request the initialization of the VGIC, no additional parameter in 167 + kvm_device_attr.addr. 
168 + Errors: 169 + -ENXIO: VGIC not properly configured as required prior to calling 170 + this attribute 171 + -ENODEV: no online VCPU 172 + -ENOMEM: memory shortage when allocating vgic internal data 173 + 174 + 175 + KVM_DEV_ARM_VGIC_GRP_LEVEL_INFO 176 + Attributes: 177 + The attr field of kvm_device_attr encodes the following values: 178 + bits: | 63 .... 32 | 31 .... 10 | 9 .... 0 | 179 + values: | mpidr | info | vINTID | 180 + 181 + The vINTID specifies which set of IRQs is reported on. 182 + 183 + The info field specifies which information userspace wants to get or set 184 + using this interface. Currently we support the following info values: 185 + 186 + VGIC_LEVEL_INFO_LINE_LEVEL: 187 + Get/Set the input level of the IRQ line for a set of 32 contiguously 188 + numbered interrupts. 189 + vINTID must be a multiple of 32. 190 + 191 + kvm_device_attr.addr points to a __u32 value which will contain a 192 + bitmap where a set bit means the interrupt level is asserted. 193 + 194 + Bit[n] indicates the status for interrupt vINTID + n. 195 + 196 + SGIs and any interrupt with a higher ID than the number of interrupts 197 + supported, will be RAZ/WI. LPIs are always edge-triggered and are 198 + therefore not supported by this interface. 199 + 200 + PPIs are reported per VCPU as specified in the mpidr field, and SPIs are 201 + reported with the same value regardless of the mpidr specified. 202 + 203 + The mpidr field encodes the CPU ID based on the affinity information in the 204 + architecture defined MPIDR, and the field is encoded as follows: 205 + | 63 .... 56 | 55 .... 48 | 47 .... 40 | 39 .... 32 | 206 + | Aff3 | Aff2 | Aff1 | Aff0 |
+17 -35
Documentation/virtual/kvm/devices/arm-vgic.txt
··· 1 - ARM Virtual Generic Interrupt Controller (VGIC) 2 - =============================================== 1 + ARM Virtual Generic Interrupt Controller v2 (VGIC) 2 + ================================================== 3 3 4 4 Device types supported: 5 5 KVM_DEV_TYPE_ARM_VGIC_V2 ARM Generic Interrupt Controller v2.0 6 - KVM_DEV_TYPE_ARM_VGIC_V3 ARM Generic Interrupt Controller v3.0 7 - KVM_DEV_TYPE_ARM_VGIC_ITS ARM Interrupt Translation Service Controller 8 6 9 - Only one VGIC instance of the V2/V3 types above may be instantiated through 10 - either this API or the legacy KVM_CREATE_IRQCHIP api. The created VGIC will 11 - act as the VM interrupt controller, requiring emulated user-space devices to 12 - inject interrupts to the VGIC instead of directly to CPUs. 7 + Only one VGIC instance may be instantiated through either this API or the 8 + legacy KVM_CREATE_IRQCHIP API. The created VGIC will act as the VM interrupt 9 + controller, requiring emulated user-space devices to inject interrupts to the 10 + VGIC instead of directly to CPUs. 13 11 14 - Creating a guest GICv3 device requires a host GICv3 as well. 15 - GICv3 implementations with hardware compatibility support allow a guest GICv2 16 - as well. 12 + GICv3 implementations with hardware compatibility support allow creating a 13 + guest GICv2 through this interface. For information on creating a guest GICv3 14 + device and guest ITS devices, see arm-vgic-v3.txt. It is not possible to 15 + create both a GICv3 and GICv2 device on the same VM. 17 16 18 - Creating a virtual ITS controller requires a host GICv3 (but does not depend 19 - on having physical ITS controllers). 20 - There can be multiple ITS controllers per guest, each of them has to have 21 - a separate, non-overlapping MMIO region. 22 17 23 18 Groups: 24 19 KVM_DEV_ARM_VGIC_GRP_ADDR ··· 27 32 Base address in the guest physical address space of the GIC virtual cpu 28 33 interface register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V2. 
29 34 This address needs to be 4K aligned and the region covers 4 KByte. 30 - 31 - KVM_VGIC_V3_ADDR_TYPE_DIST (rw, 64-bit) 32 - Base address in the guest physical address space of the GICv3 distributor 33 - register mappings. Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. 34 - This address needs to be 64K aligned and the region covers 64 KByte. 35 - 36 - KVM_VGIC_V3_ADDR_TYPE_REDIST (rw, 64-bit) 37 - Base address in the guest physical address space of the GICv3 38 - redistributor register mappings. There are two 64K pages for each 39 - VCPU and all of the redistributor pages are contiguous. 40 - Only valid for KVM_DEV_TYPE_ARM_VGIC_V3. 41 - This address needs to be 64K aligned. 42 - 43 - KVM_VGIC_V3_ADDR_TYPE_ITS (rw, 64-bit) 44 - Base address in the guest physical address space of the GICv3 ITS 45 - control register frame. The ITS allows MSI(-X) interrupts to be 46 - injected into guests. This extension is optional. If the kernel 47 - does not support the ITS, the call returns -ENODEV. 48 - Only valid for KVM_DEV_TYPE_ARM_VGIC_ITS. 49 - This address needs to be 64K aligned and the region covers 128K. 35 + Errors: 36 + -E2BIG: Address outside of addressable IPA range 37 + -EINVAL: Incorrectly aligned address 38 + -EEXIST: Address already configured 39 + -ENXIO: The group or attribute is unknown/unsupported for this device 40 + or hardware support is missing. 41 + -EFAULT: Invalid user pointer for attr->addr. 50 42 51 43 KVM_DEV_ARM_VGIC_GRP_DIST_REGS 52 44 Attributes:
+3 -1
Documentation/virtual/kvm/devices/vcpu.txt
··· 30 30 attribute 31 31 -EBUSY: PMUv3 already initialized 32 32 33 - Request the initialization of the PMUv3. 33 + Request the initialization of the PMUv3. This must be done after creating the 34 + in-kernel irqchip. Creating a PMU with a userspace irqchip is currently not 35 + supported.
+76 -17
arch/arm/include/asm/arch_gicv3.h
··· 22 22 23 23 #include <linux/io.h> 24 24 #include <asm/barrier.h> 25 - 26 - #define __ACCESS_CP15(CRn, Op1, CRm, Op2) p15, Op1, %0, CRn, CRm, Op2 27 - #define __ACCESS_CP15_64(Op1, CRm) p15, Op1, %Q0, %R0, CRm 25 + #include <asm/cp15.h> 28 26 29 27 #define ICC_EOIR1 __ACCESS_CP15(c12, 0, c12, 1) 30 28 #define ICC_DIR __ACCESS_CP15(c12, 0, c11, 1) ··· 97 99 #define ICH_AP1R2 __AP1Rx(2) 98 100 #define ICH_AP1R3 __AP1Rx(3) 99 101 102 + /* A32-to-A64 mappings used by VGIC save/restore */ 103 + 104 + #define CPUIF_MAP(a32, a64) \ 105 + static inline void write_ ## a64(u32 val) \ 106 + { \ 107 + write_sysreg(val, a32); \ 108 + } \ 109 + static inline u32 read_ ## a64(void) \ 110 + { \ 111 + return read_sysreg(a32); \ 112 + } \ 113 + 114 + #define CPUIF_MAP_LO_HI(a32lo, a32hi, a64) \ 115 + static inline void write_ ## a64(u64 val) \ 116 + { \ 117 + write_sysreg(lower_32_bits(val), a32lo);\ 118 + write_sysreg(upper_32_bits(val), a32hi);\ 119 + } \ 120 + static inline u64 read_ ## a64(void) \ 121 + { \ 122 + u64 val = read_sysreg(a32lo); \ 123 + \ 124 + val |= (u64)read_sysreg(a32hi) << 32; \ 125 + \ 126 + return val; \ 127 + } 128 + 129 + CPUIF_MAP(ICH_HCR, ICH_HCR_EL2) 130 + CPUIF_MAP(ICH_VTR, ICH_VTR_EL2) 131 + CPUIF_MAP(ICH_MISR, ICH_MISR_EL2) 132 + CPUIF_MAP(ICH_EISR, ICH_EISR_EL2) 133 + CPUIF_MAP(ICH_ELSR, ICH_ELSR_EL2) 134 + CPUIF_MAP(ICH_VMCR, ICH_VMCR_EL2) 135 + CPUIF_MAP(ICH_AP0R3, ICH_AP0R3_EL2) 136 + CPUIF_MAP(ICH_AP0R2, ICH_AP0R2_EL2) 137 + CPUIF_MAP(ICH_AP0R1, ICH_AP0R1_EL2) 138 + CPUIF_MAP(ICH_AP0R0, ICH_AP0R0_EL2) 139 + CPUIF_MAP(ICH_AP1R3, ICH_AP1R3_EL2) 140 + CPUIF_MAP(ICH_AP1R2, ICH_AP1R2_EL2) 141 + CPUIF_MAP(ICH_AP1R1, ICH_AP1R1_EL2) 142 + CPUIF_MAP(ICH_AP1R0, ICH_AP1R0_EL2) 143 + CPUIF_MAP(ICC_HSRE, ICC_SRE_EL2) 144 + CPUIF_MAP(ICC_SRE, ICC_SRE_EL1) 145 + 146 + CPUIF_MAP_LO_HI(ICH_LR15, ICH_LRC15, ICH_LR15_EL2) 147 + CPUIF_MAP_LO_HI(ICH_LR14, ICH_LRC14, ICH_LR14_EL2) 148 + CPUIF_MAP_LO_HI(ICH_LR13, ICH_LRC13, ICH_LR13_EL2) 149 + 
CPUIF_MAP_LO_HI(ICH_LR12, ICH_LRC12, ICH_LR12_EL2) 150 + CPUIF_MAP_LO_HI(ICH_LR11, ICH_LRC11, ICH_LR11_EL2) 151 + CPUIF_MAP_LO_HI(ICH_LR10, ICH_LRC10, ICH_LR10_EL2) 152 + CPUIF_MAP_LO_HI(ICH_LR9, ICH_LRC9, ICH_LR9_EL2) 153 + CPUIF_MAP_LO_HI(ICH_LR8, ICH_LRC8, ICH_LR8_EL2) 154 + CPUIF_MAP_LO_HI(ICH_LR7, ICH_LRC7, ICH_LR7_EL2) 155 + CPUIF_MAP_LO_HI(ICH_LR6, ICH_LRC6, ICH_LR6_EL2) 156 + CPUIF_MAP_LO_HI(ICH_LR5, ICH_LRC5, ICH_LR5_EL2) 157 + CPUIF_MAP_LO_HI(ICH_LR4, ICH_LRC4, ICH_LR4_EL2) 158 + CPUIF_MAP_LO_HI(ICH_LR3, ICH_LRC3, ICH_LR3_EL2) 159 + CPUIF_MAP_LO_HI(ICH_LR2, ICH_LRC2, ICH_LR2_EL2) 160 + CPUIF_MAP_LO_HI(ICH_LR1, ICH_LRC1, ICH_LR1_EL2) 161 + CPUIF_MAP_LO_HI(ICH_LR0, ICH_LRC0, ICH_LR0_EL2) 162 + 163 + #define read_gicreg(r) read_##r() 164 + #define write_gicreg(v, r) write_##r(v) 165 + 100 166 /* Low-level accessors */ 101 167 102 168 static inline void gic_write_eoir(u32 irq) 103 169 { 104 - asm volatile("mcr " __stringify(ICC_EOIR1) : : "r" (irq)); 170 + write_sysreg(irq, ICC_EOIR1); 105 171 isb(); 106 172 } 107 173 108 174 static inline void gic_write_dir(u32 val) 109 175 { 110 - asm volatile("mcr " __stringify(ICC_DIR) : : "r" (val)); 176 + write_sysreg(val, ICC_DIR); 111 177 isb(); 112 178 } 113 179 114 180 static inline u32 gic_read_iar(void) 115 181 { 116 - u32 irqstat; 182 + u32 irqstat = read_sysreg(ICC_IAR1); 117 183 118 - asm volatile("mrc " __stringify(ICC_IAR1) : "=r" (irqstat)); 119 184 dsb(sy); 185 + 120 186 return irqstat; 121 187 } 122 188 123 189 static inline void gic_write_pmr(u32 val) 124 190 { 125 - asm volatile("mcr " __stringify(ICC_PMR) : : "r" (val)); 191 + write_sysreg(val, ICC_PMR); 126 192 } 127 193 128 194 static inline void gic_write_ctlr(u32 val) 129 195 { 130 - asm volatile("mcr " __stringify(ICC_CTLR) : : "r" (val)); 196 + write_sysreg(val, ICC_CTLR); 131 197 isb(); 132 198 } 133 199 134 200 static inline void gic_write_grpen1(u32 val) 135 201 { 136 - asm volatile("mcr " __stringify(ICC_IGRPEN1) : : "r" (val)); 202 + 
write_sysreg(val, ICC_IGRPEN1); 137 203 isb(); 138 204 } 139 205 140 206 static inline void gic_write_sgi1r(u64 val) 141 207 { 142 - asm volatile("mcrr " __stringify(ICC_SGI1R) : : "r" (val)); 208 + write_sysreg(val, ICC_SGI1R); 143 209 } 144 210 145 211 static inline u32 gic_read_sre(void) 146 212 { 147 - u32 val; 148 - 149 - asm volatile("mrc " __stringify(ICC_SRE) : "=r" (val)); 150 - return val; 213 + return read_sysreg(ICC_SRE); 151 214 } 152 215 153 216 static inline void gic_write_sre(u32 val) 154 217 { 155 - asm volatile("mcr " __stringify(ICC_SRE) : : "r" (val)); 218 + write_sysreg(val, ICC_SRE); 156 219 isb(); 157 220 } 158 221 159 222 static inline void gic_write_bpr1(u32 val) 160 223 { 161 - asm volatile("mcr " __stringify(ICC_BPR1) : : "r" (val)); 224 + write_sysreg(val, ICC_BPR1); 162 225 } 163 226 164 227 /*
+15
arch/arm/include/asm/cp15.h
··· 49 49 50 50 #ifdef CONFIG_CPU_CP15 51 51 52 + #define __ACCESS_CP15(CRn, Op1, CRm, Op2) \ 53 + "mrc", "mcr", __stringify(p15, Op1, %0, CRn, CRm, Op2), u32 54 + #define __ACCESS_CP15_64(Op1, CRm) \ 55 + "mrrc", "mcrr", __stringify(p15, Op1, %Q0, %R0, CRm), u64 56 + 57 + #define __read_sysreg(r, w, c, t) ({ \ 58 + t __val; \ 59 + asm volatile(r " " c : "=r" (__val)); \ 60 + __val; \ 61 + }) 62 + #define read_sysreg(...) __read_sysreg(__VA_ARGS__) 63 + 64 + #define __write_sysreg(v, r, w, c, t) asm volatile(w " " c : : "r" ((t)(v))) 65 + #define write_sysreg(v, ...) __write_sysreg(v, __VA_ARGS__) 66 + 52 67 extern unsigned long cr_alignment; /* defined in entry-armv.S */ 53 68 54 69 static inline unsigned long get_cr(void)
+1
arch/arm/include/asm/cputype.h
··· 55 55 56 56 #define MPIDR_LEVEL_BITS 8 57 57 #define MPIDR_LEVEL_MASK ((1 << MPIDR_LEVEL_BITS) - 1) 58 + #define MPIDR_LEVEL_SHIFT(level) (MPIDR_LEVEL_BITS * level) 58 59 59 60 #define MPIDR_AFFINITY_LEVEL(mpidr, level) \ 60 61 ((mpidr >> (MPIDR_LEVEL_BITS * level)) & MPIDR_LEVEL_MASK)
+7
arch/arm/include/asm/kvm_asm.h
··· 21 21 22 22 #include <asm/virt.h> 23 23 24 + #define ARM_EXIT_WITH_ABORT_BIT 31 25 + #define ARM_EXCEPTION_CODE(x) ((x) & ~(1U << ARM_EXIT_WITH_ABORT_BIT)) 26 + #define ARM_ABORT_PENDING(x) !!((x) & (1U << ARM_EXIT_WITH_ABORT_BIT)) 27 + 24 28 #define ARM_EXCEPTION_RESET 0 25 29 #define ARM_EXCEPTION_UNDEFINED 1 26 30 #define ARM_EXCEPTION_SOFTWARE 2 ··· 72 68 extern void __init_stage2_translation(void); 73 69 74 70 extern void __kvm_hyp_reset(unsigned long); 71 + 72 + extern u64 __vgic_v3_get_ich_vtr_el2(void); 73 + extern void __vgic_v3_init_lrs(void); 75 74 #endif 76 75 77 76 #endif /* __ARM_KVM_ASM_H__ */
+28 -7
arch/arm/include/asm/kvm_emulate.h
··· 40 40 *vcpu_reg(vcpu, reg_num) = val; 41 41 } 42 42 43 - bool kvm_condition_valid(struct kvm_vcpu *vcpu); 44 - void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr); 43 + bool kvm_condition_valid32(const struct kvm_vcpu *vcpu); 44 + void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr); 45 45 void kvm_inject_undefined(struct kvm_vcpu *vcpu); 46 + void kvm_inject_vabt(struct kvm_vcpu *vcpu); 46 47 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr); 47 48 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr); 49 + 50 + static inline bool kvm_condition_valid(const struct kvm_vcpu *vcpu) 51 + { 52 + return kvm_condition_valid32(vcpu); 53 + } 54 + 55 + static inline void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr) 56 + { 57 + kvm_skip_instr32(vcpu, is_wide_instr); 58 + } 48 59 49 60 static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu) 50 61 { 51 62 vcpu->arch.hcr = HCR_GUEST_MASK; 52 63 } 53 64 54 - static inline unsigned long vcpu_get_hcr(struct kvm_vcpu *vcpu) 65 + static inline unsigned long vcpu_get_hcr(const struct kvm_vcpu *vcpu) 55 66 { 56 67 return vcpu->arch.hcr; 57 68 } ··· 72 61 vcpu->arch.hcr = hcr; 73 62 } 74 63 75 - static inline bool vcpu_mode_is_32bit(struct kvm_vcpu *vcpu) 64 + static inline bool vcpu_mode_is_32bit(const struct kvm_vcpu *vcpu) 76 65 { 77 66 return 1; 78 67 } ··· 82 71 return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc; 83 72 } 84 73 85 - static inline unsigned long *vcpu_cpsr(struct kvm_vcpu *vcpu) 74 + static inline unsigned long *vcpu_cpsr(const struct kvm_vcpu *vcpu) 86 75 { 87 - return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_cpsr; 76 + return (unsigned long *)&vcpu->arch.ctxt.gp_regs.usr_regs.ARM_cpsr; 88 77 } 89 78 90 79 static inline void vcpu_set_thumb(struct kvm_vcpu *vcpu) ··· 104 93 return cpsr_mode > USR_MODE;; 105 94 } 106 95 107 - static inline u32 kvm_vcpu_get_hsr(struct kvm_vcpu *vcpu) 96 + static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu) 108 97 { 
109 98 return vcpu->arch.fault.hsr; 99 + } 100 + 101 + static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu) 102 + { 103 + u32 hsr = kvm_vcpu_get_hsr(vcpu); 104 + 105 + if (hsr & HSR_CV) 106 + return (hsr & HSR_COND) >> HSR_COND_SHIFT; 107 + 108 + return -1; 110 109 } 111 110 112 111 static inline unsigned long kvm_vcpu_get_hfar(struct kvm_vcpu *vcpu)
+11 -6
arch/arm/include/asm/kvm_host.h
··· 39 39 40 40 #include <kvm/arm_vgic.h> 41 41 42 + 43 + #ifdef CONFIG_ARM_GIC_V3 44 + #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS 45 + #else 42 46 #define KVM_MAX_VCPUS VGIC_V2_MAX_CPUS 47 + #endif 43 48 44 49 #define KVM_REQ_VCPU_EXIT 8 45 50 ··· 188 183 }; 189 184 190 185 struct kvm_vm_stat { 191 - u32 remote_tlb_flush; 186 + ulong remote_tlb_flush; 192 187 }; 193 188 194 189 struct kvm_vcpu_stat { 195 - u32 halt_successful_poll; 196 - u32 halt_attempted_poll; 197 - u32 halt_poll_invalid; 198 - u32 halt_wakeup; 199 - u32 hvc_exit_stat; 190 + u64 halt_successful_poll; 191 + u64 halt_attempted_poll; 192 + u64 halt_poll_invalid; 193 + u64 halt_wakeup; 194 + u64 hvc_exit_stat; 200 195 u64 wfe_exit_stat; 201 196 u64 wfi_exit_stat; 202 197 u64 mmio_exit_user;
+4 -14
arch/arm/include/asm/kvm_hyp.h
··· 20 20 21 21 #include <linux/compiler.h> 22 22 #include <linux/kvm_host.h> 23 + #include <asm/cp15.h> 23 24 #include <asm/kvm_mmu.h> 24 25 #include <asm/vfp.h> 25 26 26 27 #define __hyp_text __section(.hyp.text) notrace 27 28 28 - #define __ACCESS_CP15(CRn, Op1, CRm, Op2) \ 29 - "mrc", "mcr", __stringify(p15, Op1, %0, CRn, CRm, Op2), u32 30 - #define __ACCESS_CP15_64(Op1, CRm) \ 31 - "mrrc", "mcrr", __stringify(p15, Op1, %Q0, %R0, CRm), u64 32 29 #define __ACCESS_VFP(CRn) \ 33 30 "mrc", "mcr", __stringify(p10, 7, %0, CRn, cr0, 0), u32 34 - 35 - #define __write_sysreg(v, r, w, c, t) asm volatile(w " " c : : "r" ((t)(v))) 36 - #define write_sysreg(v, ...) __write_sysreg(v, __VA_ARGS__) 37 - 38 - #define __read_sysreg(r, w, c, t) ({ \ 39 - t __val; \ 40 - asm volatile(r " " c : "=r" (__val)); \ 41 - __val; \ 42 - }) 43 - #define read_sysreg(...) __read_sysreg(__VA_ARGS__) 44 31 45 32 #define write_special(v, r) \ 46 33 asm volatile("msr " __stringify(r) ", %0" : : "r" (v)) ··· 105 118 106 119 void __sysreg_save_state(struct kvm_cpu_context *ctxt); 107 120 void __sysreg_restore_state(struct kvm_cpu_context *ctxt); 121 + 122 + void __vgic_v3_save_state(struct kvm_vcpu *vcpu); 123 + void __vgic_v3_restore_state(struct kvm_vcpu *vcpu); 108 124 109 125 void asmlinkage __vfp_save_state(struct vfp_hard_struct *vfp); 110 126 void asmlinkage __vfp_restore_state(struct vfp_hard_struct *vfp);
+2 -26
arch/arm/include/asm/kvm_mmu.h
··· 63 63 static inline void kvm_set_pmd(pmd_t *pmd, pmd_t new_pmd) 64 64 { 65 65 *pmd = new_pmd; 66 - flush_pmd_entry(pmd); 66 + dsb(ishst); 67 67 } 68 68 69 69 static inline void kvm_set_pte(pte_t *pte, pte_t new_pte) 70 70 { 71 71 *pte = new_pte; 72 - /* 73 - * flush_pmd_entry just takes a void pointer and cleans the necessary 74 - * cache entries, so we can reuse the function for ptes. 75 - */ 76 - flush_pmd_entry(pte); 77 - } 78 - 79 - static inline void kvm_clean_pgd(pgd_t *pgd) 80 - { 81 - clean_dcache_area(pgd, PTRS_PER_S2_PGD * sizeof(pgd_t)); 82 - } 83 - 84 - static inline void kvm_clean_pmd(pmd_t *pmd) 85 - { 86 - clean_dcache_area(pmd, PTRS_PER_PMD * sizeof(pmd_t)); 87 - } 88 - 89 - static inline void kvm_clean_pmd_entry(pmd_t *pmd) 90 - { 91 - clean_pmd_entry(pmd); 92 - } 93 - 94 - static inline void kvm_clean_pte(pte_t *pte) 95 - { 96 - clean_pte_table(pte); 72 + dsb(ishst); 97 73 } 98 74 99 75 static inline pte_t kvm_s2pte_mkwrite(pte_t pte)
+7
arch/arm/include/uapi/asm/kvm.h
··· 84 84 #define KVM_VGIC_V2_DIST_SIZE 0x1000 85 85 #define KVM_VGIC_V2_CPU_SIZE 0x2000 86 86 87 + /* Supported VGICv3 address types */ 88 + #define KVM_VGIC_V3_ADDR_TYPE_DIST 2 89 + #define KVM_VGIC_V3_ADDR_TYPE_REDIST 3 90 + 91 + #define KVM_VGIC_V3_DIST_SIZE SZ_64K 92 + #define KVM_VGIC_V3_REDIST_SIZE (2 * SZ_64K) 93 + 87 94 #define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */ 88 95 #define KVM_ARM_VCPU_PSCI_0_2 1 /* CPU uses PSCI v0.2 */ 89 96
+3
arch/arm/kvm/Makefile
··· 21 21 obj-y += kvm-arm.o init.o interrupts.o 22 22 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o 23 23 obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o 24 + obj-y += $(KVM)/arm/aarch32.o 24 25 25 26 obj-y += $(KVM)/arm/vgic/vgic.o 26 27 obj-y += $(KVM)/arm/vgic/vgic-init.o 27 28 obj-y += $(KVM)/arm/vgic/vgic-irqfd.o 28 29 obj-y += $(KVM)/arm/vgic/vgic-v2.o 30 + obj-y += $(KVM)/arm/vgic/vgic-v3.o 29 31 obj-y += $(KVM)/arm/vgic/vgic-mmio.o 30 32 obj-y += $(KVM)/arm/vgic/vgic-mmio-v2.o 33 + obj-y += $(KVM)/arm/vgic/vgic-mmio-v3.o 31 34 obj-y += $(KVM)/arm/vgic/vgic-kvm-device.o 32 35 obj-y += $(KVM)/irqchip.o 33 36 obj-y += $(KVM)/arm/arch_timer.o
+14 -8
arch/arm/kvm/arm.c
··· 144 144 return ret; 145 145 } 146 146 147 + bool kvm_arch_has_vcpu_debugfs(void) 148 + { 149 + return false; 150 + } 151 + 152 + int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu) 153 + { 154 + return 0; 155 + } 156 + 147 157 int kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf) 148 158 { 149 159 return VM_FAULT_SIGBUS; ··· 1186 1176 return -ENOMEM; 1187 1177 } 1188 1178 1179 + /* set size of VMID supported by CPU */ 1180 + kvm_vmid_bits = kvm_get_vmid_bits(); 1181 + kvm_info("%d-bit VMID\n", kvm_vmid_bits); 1182 + 1189 1183 return 0; 1190 1184 } 1191 1185 ··· 1255 1241 1256 1242 static int init_vhe_mode(void) 1257 1243 { 1258 - /* set size of VMID supported by CPU */ 1259 - kvm_vmid_bits = kvm_get_vmid_bits(); 1260 - kvm_info("%d-bit VMID\n", kvm_vmid_bits); 1261 - 1262 1244 kvm_info("VHE mode initialized successfully\n"); 1263 1245 return 0; 1264 1246 } ··· 1337 1327 goto out_err; 1338 1328 } 1339 1329 } 1340 - 1341 - /* set size of VMID supported by CPU */ 1342 - kvm_vmid_bits = kvm_get_vmid_bits(); 1343 - kvm_info("%d-bit VMID\n", kvm_vmid_bits); 1344 1330 1345 1331 kvm_info("Hyp mode initialized successfully\n"); 1346 1332
+35
arch/arm/kvm/coproc.c
··· 228 228 return true; 229 229 } 230 230 231 + static bool access_gic_sgi(struct kvm_vcpu *vcpu, 232 + const struct coproc_params *p, 233 + const struct coproc_reg *r) 234 + { 235 + u64 reg; 236 + 237 + if (!p->is_write) 238 + return read_from_write_only(vcpu, p); 239 + 240 + reg = (u64)*vcpu_reg(vcpu, p->Rt2) << 32; 241 + reg |= *vcpu_reg(vcpu, p->Rt1) ; 242 + 243 + vgic_v3_dispatch_sgi(vcpu, reg); 244 + 245 + return true; 246 + } 247 + 248 + static bool access_gic_sre(struct kvm_vcpu *vcpu, 249 + const struct coproc_params *p, 250 + const struct coproc_reg *r) 251 + { 252 + if (p->is_write) 253 + return ignore_write(vcpu, p); 254 + 255 + *vcpu_reg(vcpu, p->Rt1) = vcpu->arch.vgic_cpu.vgic_v3.vgic_sre; 256 + 257 + return true; 258 + } 259 + 231 260 /* 232 261 * We could trap ID_DFR0 and tell the guest we don't support performance 233 262 * monitoring. Unfortunately the patch to make the kernel check ID_DFR0 was ··· 390 361 { CRn(10), CRm( 3), Op1( 0), Op2( 1), is32, 391 362 access_vm_reg, reset_unknown, c10_AMAIR1}, 392 363 364 + /* ICC_SGI1R */ 365 + { CRm64(12), Op1( 0), is64, access_gic_sgi}, 366 + 393 367 /* VBAR: swapped by interrupt.S. */ 394 368 { CRn(12), CRm( 0), Op1( 0), Op2( 0), is32, 395 369 NULL, reset_val, c12_VBAR, 0x00000000 }, 370 + 371 + /* ICC_SRE */ 372 + { CRn(12), CRm(12), Op1( 0), Op2(5), is32, access_gic_sre }, 396 373 397 374 /* CONTEXTIDR/TPIDRURW/TPIDRURO/TPIDRPRW: swapped by interrupt.S. */ 398 375 { CRn(13), CRm( 0), Op1( 0), Op2( 1), is32,
+12 -99
arch/arm/kvm/emulate.c
··· 161 161 } 162 162 } 163 163 164 - /* 165 - * A conditional instruction is allowed to trap, even though it 166 - * wouldn't be executed. So let's re-implement the hardware, in 167 - * software! 168 - */ 169 - bool kvm_condition_valid(struct kvm_vcpu *vcpu) 170 - { 171 - unsigned long cpsr, cond, insn; 172 - 173 - /* 174 - * Exception Code 0 can only happen if we set HCR.TGE to 1, to 175 - * catch undefined instructions, and then we won't get past 176 - * the arm_exit_handlers test anyway. 177 - */ 178 - BUG_ON(!kvm_vcpu_trap_get_class(vcpu)); 179 - 180 - /* Top two bits non-zero? Unconditional. */ 181 - if (kvm_vcpu_get_hsr(vcpu) >> 30) 182 - return true; 183 - 184 - cpsr = *vcpu_cpsr(vcpu); 185 - 186 - /* Is condition field valid? */ 187 - if ((kvm_vcpu_get_hsr(vcpu) & HSR_CV) >> HSR_CV_SHIFT) 188 - cond = (kvm_vcpu_get_hsr(vcpu) & HSR_COND) >> HSR_COND_SHIFT; 189 - else { 190 - /* This can happen in Thumb mode: examine IT state. */ 191 - unsigned long it; 192 - 193 - it = ((cpsr >> 8) & 0xFC) | ((cpsr >> 25) & 0x3); 194 - 195 - /* it == 0 => unconditional. */ 196 - if (it == 0) 197 - return true; 198 - 199 - /* The cond for this insn works out as the top 4 bits. */ 200 - cond = (it >> 4); 201 - } 202 - 203 - /* Shift makes it look like an ARM-mode instruction */ 204 - insn = cond << 28; 205 - return arm_check_condition(insn, cpsr) != ARM_OPCODE_CONDTEST_FAIL; 206 - } 207 - 208 - /** 209 - * adjust_itstate - adjust ITSTATE when emulating instructions in IT-block 210 - * @vcpu: The VCPU pointer 211 - * 212 - * When exceptions occur while instructions are executed in Thumb IF-THEN 213 - * blocks, the ITSTATE field of the CPSR is not advanced (updated), so we have 214 - * to do this little bit of work manually. The fields map like this: 215 - * 216 - * IT[7:0] -> CPSR[26:25],CPSR[15:10] 217 - */ 218 - static void kvm_adjust_itstate(struct kvm_vcpu *vcpu) 219 - { 220 - unsigned long itbits, cond; 221 - unsigned long cpsr = *vcpu_cpsr(vcpu); 222 - bool is_arm = !(cpsr & PSR_T_BIT); 223 - 224 - BUG_ON(is_arm && (cpsr & PSR_IT_MASK)); 225 - 226 - if (!(cpsr & PSR_IT_MASK)) 227 - return; 228 - 229 - cond = (cpsr & 0xe000) >> 13; 230 - itbits = (cpsr & 0x1c00) >> (10 - 2); 231 - itbits |= (cpsr & (0x3 << 25)) >> 25; 232 - 233 - /* Perform ITAdvance (see page A-52 in ARM DDI 0406C) */ 234 - if ((itbits & 0x7) == 0) 235 - itbits = cond = 0; 236 - else 237 - itbits = (itbits << 1) & 0x1f; 238 - 239 - cpsr &= ~PSR_IT_MASK; 240 - cpsr |= cond << 13; 241 - cpsr |= (itbits & 0x1c) << (10 - 2); 242 - cpsr |= (itbits & 0x3) << 25; 243 - *vcpu_cpsr(vcpu) = cpsr; 244 - } 245 - 246 - /** 247 - * kvm_skip_instr - skip a trapped instruction and proceed to the next 248 - * @vcpu: The vcpu pointer 249 - */ 250 - void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr) 251 - { 252 - bool is_thumb; 253 - 254 - is_thumb = !!(*vcpu_cpsr(vcpu) & PSR_T_BIT); 255 - if (is_thumb && !is_wide_instr) 256 - *vcpu_pc(vcpu) += 2; 257 - else 258 - *vcpu_pc(vcpu) += 4; 259 - kvm_adjust_itstate(vcpu); 260 - } 261 - 262 - 263 164 /****************************************************************************** 264 165 * Inject exceptions into the guest 265 166 */ ··· 302 401 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr) 303 402 { 304 403 inject_abt(vcpu, true, addr); 404 + } 405 + 406 + /** 407 + * kvm_inject_vabt - inject an async abort / SError into the guest 408 + * @vcpu: The VCPU to receive the exception 409 + * 410 + * It is assumed that this code is called from the VCPU thread and that the 411 + * VCPU therefore is not currently executing guest code. 412 + */ 413 + void kvm_inject_vabt(struct kvm_vcpu *vcpu) 414 + { 415 + vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VA); 305 416 }
+22 -27
arch/arm/kvm/handle_exit.c
··· 28 28 29 29 typedef int (*exit_handle_fn)(struct kvm_vcpu *, struct kvm_run *); 30 30 31 - static int handle_svc_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run) 32 - { 33 - /* SVC called from Hyp mode should never get here */ 34 - kvm_debug("SVC called from Hyp mode shouldn't go here\n"); 35 - BUG(); 36 - return -EINVAL; /* Squash warning */ 37 - } 38 - 39 31 static int handle_hvc(struct kvm_vcpu *vcpu, struct kvm_run *run) 40 32 { 41 33 int ret; ··· 49 57 { 50 58 kvm_inject_undefined(vcpu); 51 59 return 1; 52 - } 53 - 54 - static int handle_pabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run) 55 - { 56 - /* The hypervisor should never cause aborts */ 57 - kvm_err("Prefetch Abort taken from Hyp mode at %#08lx (HSR: %#08x)\n", 58 - kvm_vcpu_get_hfar(vcpu), kvm_vcpu_get_hsr(vcpu)); 59 - return -EFAULT; 60 - } 61 - 62 - static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run) 63 - { 64 - /* This is either an error in the ws. code or an external abort */ 65 - kvm_err("Data Abort taken from Hyp mode at %#08lx (HSR: %#08x)\n", 66 - kvm_vcpu_get_hfar(vcpu), kvm_vcpu_get_hsr(vcpu)); 67 - return -EFAULT; 68 60 } 69 61 70 62 /** ··· 88 112 [HSR_EC_CP14_64] = kvm_handle_cp14_access, 89 113 [HSR_EC_CP_0_13] = kvm_handle_cp_0_13_access, 90 114 [HSR_EC_CP10_ID] = kvm_handle_cp10_id, 91 - [HSR_EC_SVC_HYP] = handle_svc_hyp, 92 115 [HSR_EC_HVC] = handle_hvc, 93 116 [HSR_EC_SMC] = handle_smc, 94 117 [HSR_EC_IABT] = kvm_handle_guest_abort, 95 - [HSR_EC_IABT_HYP] = handle_pabt_hyp, 96 118 [HSR_EC_DABT] = kvm_handle_guest_abort, 97 - [HSR_EC_DABT_HYP] = handle_dabt_hyp, 98 119 }; 99 120 100 121 static exit_handle_fn kvm_get_exit_handler(struct kvm_vcpu *vcpu) ··· 117 144 { 118 145 exit_handle_fn exit_handler; 119 146 147 + if (ARM_ABORT_PENDING(exception_index)) { 148 + u8 hsr_ec = kvm_vcpu_trap_get_class(vcpu); 149 + 150 + /* 151 + * HVC/SMC already have an adjusted PC, which we need 152 + * to correct in order to return to after having 153 + * injected the abort. 154 + */ 155 + if (hsr_ec == HSR_EC_HVC || hsr_ec == HSR_EC_SMC) { 156 + u32 adj = kvm_vcpu_trap_il_is32bit(vcpu) ? 4 : 2; 157 + *vcpu_pc(vcpu) -= adj; 158 + } 159 + 160 + kvm_inject_vabt(vcpu); 161 + return 1; 162 + } 163 + 164 + exception_index = ARM_EXCEPTION_CODE(exception_index); 165 + 120 166 switch (exception_index) { 121 167 case ARM_EXCEPTION_IRQ: 122 168 return 1; ··· 152 160 exit_handler = kvm_get_exit_handler(vcpu); 153 161 154 162 return exit_handler(vcpu, run); 163 + case ARM_EXCEPTION_DATA_ABORT: 164 + kvm_inject_vabt(vcpu); 165 + return 1; 155 166 default: 156 167 kvm_pr_unimpl("Unsupported exception type: %d", 157 168 exception_index);
+1
arch/arm/kvm/hyp/Makefile
··· 5 5 KVM=../../../../virt/kvm 6 6 7 7 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o 8 + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o 8 9 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o 9 10 10 11 obj-$(CONFIG_KVM_ARM_HOST) += tlb.o
+31
arch/arm/kvm/hyp/entry.S
··· 18 18 #include <linux/linkage.h> 19 19 #include <asm/asm-offsets.h> 20 20 #include <asm/kvm_arm.h> 21 + #include <asm/kvm_asm.h> 21 22 22 23 .arch_extension virt 23 24 ··· 64 63 ldr lr, [r0, #4] 65 64 66 65 mov r0, r1 66 + mrs r1, SPSR 67 + mrs r2, ELR_hyp 68 + mrc p15, 4, r3, c5, c2, 0 @ HSR 69 + 70 + /* 71 + * Force loads and stores to complete before unmasking aborts 72 + * and forcing the delivery of the exception. This gives us a 73 + * single instruction window, which the handler will try to 74 + * match. 75 + */ 76 + dsb sy 77 + cpsie a 78 + 79 + .global abort_guest_exit_start 80 + abort_guest_exit_start: 81 + 82 + isb 83 + 84 + .global abort_guest_exit_end 85 + abort_guest_exit_end: 86 + 87 + /* 88 + * If we took an abort, r0[31] will be set, and cmp will set 89 + * the N bit in PSTATE. 90 + */ 91 + cmp r0, #0 92 + msrmi SPSR_cxsf, r1 93 + msrmi ELR_hyp, r2 94 + mcrmi p15, 4, r3, c5, c2, 0 @ HSR 95 + 67 96 bx lr 68 97 ENDPROC(__guest_exit) 69 98
+15 -1
arch/arm/kvm/hyp/hyp-entry.S
··· 81 81 invalid_vector hyp_undef ARM_EXCEPTION_UNDEFINED 82 82 invalid_vector hyp_svc ARM_EXCEPTION_SOFTWARE 83 83 invalid_vector hyp_pabt ARM_EXCEPTION_PREF_ABORT 84 - invalid_vector hyp_dabt ARM_EXCEPTION_DATA_ABORT 85 84 invalid_vector hyp_fiq ARM_EXCEPTION_FIQ 86 85 87 86 ENTRY(__hyp_do_panic) ··· 162 163 mov r1, #ARM_EXCEPTION_IRQ 163 164 load_vcpu r0 @ Load VCPU pointer to r0 164 165 b __guest_exit 166 + 167 + hyp_dabt: 168 + push {r0, r1} 169 + mrs r0, ELR_hyp 170 + ldr r1, =abort_guest_exit_start 171 + THUMB( add r1, r1, #1) 172 + cmp r0, r1 173 + ldrne r1, =abort_guest_exit_end 174 + THUMB( addne r1, r1, #1) 175 + cmpne r0, r1 176 + pop {r0, r1} 177 + bne __hyp_panic 178 + 179 + orr r0, r0, #(1 << ARM_EXIT_WITH_ABORT_BIT) 180 + eret 165 181 166 182 .ltorg 167 183
+20 -5
arch/arm/kvm/hyp/switch.c
··· 14 14 * You should have received a copy of the GNU General Public License 15 15 * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 16 */ 17 + #include <linux/jump_label.h> 17 18 18 19 #include <asm/kvm_asm.h> 19 20 #include <asm/kvm_hyp.h> ··· 55 54 { 56 55 u32 val; 57 56 57 + /* 58 + * If we pended a virtual abort, preserve it until it gets 59 + * cleared. See B1.9.9 (Virtual Abort exception) for details, 60 + * but the crucial bit is the zeroing of HCR.VA in the 61 + * pseudocode. 62 + */ 63 + if (vcpu->arch.hcr & HCR_VA) 64 + vcpu->arch.hcr = read_sysreg(HCR); 65 + 58 66 write_sysreg(0, HCR); 59 67 write_sysreg(0, HSTR); 60 68 val = read_sysreg(HDCR); ··· 84 74 write_sysreg(read_sysreg(MIDR), VPIDR); 85 75 } 86 76 77 + 87 78 static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu) 88 79 { 89 - __vgic_v2_save_state(vcpu); 80 + if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) 81 + __vgic_v3_save_state(vcpu); 82 + else 83 + __vgic_v2_save_state(vcpu); 90 84 } 91 85 92 86 static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu) 93 87 { 94 - __vgic_v2_restore_state(vcpu); 88 + if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif)) 89 + __vgic_v3_restore_state(vcpu); 90 + else 91 + __vgic_v2_restore_state(vcpu); 95 92 } 96 93 97 94 static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu) ··· 151 134 return true; 152 135 } 153 136 154 - static int __hyp_text __guest_run(struct kvm_vcpu *vcpu) 137 + int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu) 155 138 { 156 139 struct kvm_cpu_context *host_ctxt; 157 140 struct kvm_cpu_context *guest_ctxt; ··· 207 190 208 191 return exit_code; 209 192 } 210 - 211 - __alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu); 212 193 213 194 static const char * const __hyp_panic_string[] = { 214 195 [ARM_EXCEPTION_RESET] = "\nHYP panic: RST PC:%08x CPSR:%08x",
+4 -11
arch/arm/kvm/hyp/tlb.c
··· 34 34 * As v7 does not support flushing per IPA, just nuke the whole TLB 35 35 * instead, ignoring the ipa value. 36 36 */ 37 - static void __hyp_text __tlb_flush_vmid(struct kvm *kvm) 37 + void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm) 38 38 { 39 39 dsb(ishst); 40 40 ··· 50 50 write_sysreg(0, VTTBR); 51 51 } 52 52 53 - __alias(__tlb_flush_vmid) void __kvm_tlb_flush_vmid(struct kvm *kvm); 54 - 55 - static void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) 53 + void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) 56 54 { 57 - __tlb_flush_vmid(kvm); 55 + __kvm_tlb_flush_vmid(kvm); 58 56 } 59 57 60 - __alias(__tlb_flush_vmid_ipa) void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, 61 - phys_addr_t ipa); 62 - 63 - static void __hyp_text __tlb_flush_vm_context(void) 58 + void __hyp_text __kvm_flush_vm_context(void) 64 59 { 65 60 write_sysreg(0, TLBIALLNSNHIS); 66 61 write_sysreg(0, ICIALLUIS); 67 62 dsb(ish); 68 63 } 69 - 70 - __alias(__tlb_flush_vm_context) void __kvm_flush_vm_context(void);
-6
arch/arm/kvm/mmio.c
··· 126 126 int access_size; 127 127 bool sign_extend; 128 128 129 - if (kvm_vcpu_dabt_isextabt(vcpu)) { 130 - /* cache operation on I/O addr, tell guest unsupported */ 131 - kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu)); 132 - return 1; 133 - } 134 - 135 129 if (kvm_vcpu_dabt_iss1tw(vcpu)) { 136 130 /* page table accesses IO mem: tell guest to fix its TTBR */ 137 131 kvm_inject_dabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+5 -2
arch/arm/kvm/mmu.c
··· 744 744 if (!pgd) 745 745 return -ENOMEM; 746 746 747 - kvm_clean_pgd(pgd); 748 747 kvm->arch.pgd = pgd; 749 748 return 0; 750 749 } ··· 935 936 if (!cache) 936 937 return 0; /* ignore calls from kvm_set_spte_hva */ 937 938 pte = mmu_memory_cache_alloc(cache); 938 - kvm_clean_pte(pte); 939 939 pmd_populate_kernel(NULL, pmd, pte); 940 940 get_page(virt_to_page(pmd)); 941 941 } ··· 1432 1434 int ret, idx; 1433 1435 1434 1436 is_iabt = kvm_vcpu_trap_is_iabt(vcpu); 1437 + if (unlikely(!is_iabt && kvm_vcpu_dabt_isextabt(vcpu))) { 1438 + kvm_inject_vabt(vcpu); 1439 + return 1; 1440 + } 1441 + 1435 1442 fault_ipa = kvm_vcpu_get_fault_ipa(vcpu); 1436 1443 1437 1444 trace_kvm_guest_fault(*vcpu_pc(vcpu), kvm_vcpu_get_hsr(vcpu),
+13
arch/arm64/include/asm/arch_gicv3.h
··· 80 80 #include <linux/stringify.h> 81 81 #include <asm/barrier.h> 82 82 83 + #define read_gicreg(r) \ 84 + ({ \ 85 + u64 reg; \ 86 + asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \ 87 + reg; \ 88 + }) 89 + 90 + #define write_gicreg(v,r) \ 91 + do { \ 92 + u64 __val = (v); \ 93 + asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\ 94 + } while (0) 95 + 83 96 /* 84 97 * Low-level accessors 85 98 *
+2 -2
arch/arm64/include/asm/kvm_arm.h
··· 50 50 #define HCR_BSU (3 << 10) 51 51 #define HCR_BSU_IS (UL(1) << 10) 52 52 #define HCR_FB (UL(1) << 9) 53 - #define HCR_VA (UL(1) << 8) 53 + #define HCR_VSE (UL(1) << 8) 54 54 #define HCR_VI (UL(1) << 7) 55 55 #define HCR_VF (UL(1) << 6) 56 56 #define HCR_AMO (UL(1) << 5) ··· 80 80 #define HCR_GUEST_FLAGS (HCR_TSC | HCR_TSW | HCR_TWE | HCR_TWI | HCR_VM | \ 81 81 HCR_TVM | HCR_BSU_IS | HCR_FB | HCR_TAC | \ 82 82 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW) 83 - #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF) 83 + #define HCR_VIRT_EXCP_MASK (HCR_VSE | HCR_VI | HCR_VF) 84 84 #define HCR_INT_OVERRIDE (HCR_FMO | HCR_IMO) 85 85 #define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H) 86 86
+7 -2
arch/arm64/include/asm/kvm_asm.h
··· 20 20 21 21 #include <asm/virt.h> 22 22 23 + #define ARM_EXIT_WITH_SERROR_BIT 31 24 + #define ARM_EXCEPTION_CODE(x) ((x) & ~(1U << ARM_EXIT_WITH_SERROR_BIT)) 25 + #define ARM_SERROR_PENDING(x) !!((x) & (1U << ARM_EXIT_WITH_SERROR_BIT)) 26 + 23 27 #define ARM_EXCEPTION_IRQ 0 24 - #define ARM_EXCEPTION_TRAP 1 28 + #define ARM_EXCEPTION_EL1_SERROR 1 29 + #define ARM_EXCEPTION_TRAP 2 25 30 /* The hyp-stub will return this for any kvm_call_hyp() call */ 26 - #define ARM_EXCEPTION_HYP_GONE 2 31 + #define ARM_EXCEPTION_HYP_GONE 3 27 32 28 33 #define KVM_ARM64_DEBUG_DIRTY_SHIFT 0 29 34 #define KVM_ARM64_DEBUG_DIRTY (1 << KVM_ARM64_DEBUG_DIRTY_SHIFT)
+11
arch/arm64/include/asm/kvm_emulate.h
··· 38 38 void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr); 39 39 40 40 void kvm_inject_undefined(struct kvm_vcpu *vcpu); 41 + void kvm_inject_vabt(struct kvm_vcpu *vcpu); 41 42 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr); 42 43 void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr); 43 44 ··· 146 145 static inline u32 kvm_vcpu_get_hsr(const struct kvm_vcpu *vcpu) 147 146 { 148 147 return vcpu->arch.fault.esr_el2; 148 + } 149 + 150 + static inline int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu) 151 + { 152 + u32 esr = kvm_vcpu_get_hsr(vcpu); 153 + 154 + if (esr & ESR_ELx_CV) 155 + return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT; 156 + 157 + return -1; 149 158 } 150 159 151 160 static inline unsigned long kvm_vcpu_get_hfar(const struct kvm_vcpu *vcpu)
+6 -6
arch/arm64/include/asm/kvm_host.h
··· 290 290 #endif 291 291 292 292 struct kvm_vm_stat { 293 - u32 remote_tlb_flush; 293 + ulong remote_tlb_flush; 294 294 }; 295 295 296 296 struct kvm_vcpu_stat { 297 - u32 halt_successful_poll; 298 - u32 halt_attempted_poll; 299 - u32 halt_poll_invalid; 300 - u32 halt_wakeup; 301 - u32 hvc_exit_stat; 297 + u64 halt_successful_poll; 298 + u64 halt_attempted_poll; 299 + u64 halt_poll_invalid; 300 + u64 halt_wakeup; 301 + u64 hvc_exit_stat; 302 302 u64 wfe_exit_stat; 303 303 u64 wfi_exit_stat; 304 304 u64 mmio_exit_user;
+1
arch/arm64/include/asm/kvm_hyp.h
··· 123 123 124 124 void __vgic_v2_save_state(struct kvm_vcpu *vcpu); 125 125 void __vgic_v2_restore_state(struct kvm_vcpu *vcpu); 126 + int __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu); 126 127 127 128 void __vgic_v3_save_state(struct kvm_vcpu *vcpu); 128 129 void __vgic_v3_restore_state(struct kvm_vcpu *vcpu);
-6
arch/arm64/include/asm/kvm_mmu.h
··· 162 162 #define kvm_set_pte(ptep, pte) set_pte(ptep, pte) 163 163 #define kvm_set_pmd(pmdp, pmd) set_pmd(pmdp, pmd) 164 164 165 - static inline void kvm_clean_pgd(pgd_t *pgd) {} 166 - static inline void kvm_clean_pmd(pmd_t *pmd) {} 167 - static inline void kvm_clean_pmd_entry(pmd_t *pmd) {} 168 - static inline void kvm_clean_pte(pte_t *pte) {} 169 - static inline void kvm_clean_pte_entry(pte_t *pte) {} 170 - 171 165 static inline pte_t kvm_s2pte_mkwrite(pte_t pte) 172 166 { 173 167 pte_val(pte) |= PTE_S2_RDWR;
+2 -2
arch/arm64/kvm/Kconfig
··· 16 16 17 17 if VIRTUALIZATION 18 18 19 - config KVM_ARM_VGIC_V3 19 + config KVM_ARM_VGIC_V3_ITS 20 20 bool 21 21 22 22 config KVM ··· 34 34 select KVM_VFIO 35 35 select HAVE_KVM_EVENTFD 36 36 select HAVE_KVM_IRQFD 37 - select KVM_ARM_VGIC_V3 37 + select KVM_ARM_VGIC_V3_ITS 38 38 select KVM_ARM_PMU if HW_PERF_EVENTS 39 39 select HAVE_KVM_MSI 40 40 select HAVE_KVM_IRQCHIP
+2 -1
arch/arm64/kvm/Makefile
··· 16 16 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/arm.o $(ARM)/mmu.o $(ARM)/mmio.o 17 17 kvm-$(CONFIG_KVM_ARM_HOST) += $(ARM)/psci.o $(ARM)/perf.o 18 18 19 - kvm-$(CONFIG_KVM_ARM_HOST) += emulate.o inject_fault.o regmap.o 19 + kvm-$(CONFIG_KVM_ARM_HOST) += inject_fault.o regmap.o 20 20 kvm-$(CONFIG_KVM_ARM_HOST) += hyp.o hyp-init.o handle_exit.o 21 21 kvm-$(CONFIG_KVM_ARM_HOST) += guest.o debug.o reset.o sys_regs.o sys_regs_generic_v8.o 22 + kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/aarch32.o 22 23 23 24 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic.o 24 25 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic/vgic-init.o
+9 -16
arch/arm64/kvm/emulate.c virt/kvm/arm/aarch32.c
··· 22 22 */ 23 23 24 24 #include <linux/kvm_host.h> 25 - #include <asm/esr.h> 26 25 #include <asm/kvm_emulate.h> 26 + #include <asm/kvm_hyp.h> 27 + 28 + #ifndef CONFIG_ARM64 29 + #define COMPAT_PSR_T_BIT PSR_T_BIT 30 + #define COMPAT_PSR_IT_MASK PSR_IT_MASK 31 + #endif 27 32 28 33 /* 29 34 * stolen from arch/arm/kernel/opcodes.c ··· 56 51 0xFFFF, /* AL always */ 57 52 0 /* NV */ 58 53 }; 59 - 60 - static int kvm_vcpu_get_condition(const struct kvm_vcpu *vcpu) 61 - { 62 - u32 esr = kvm_vcpu_get_hsr(vcpu); 63 - 64 - if (esr & ESR_ELx_CV) 65 - return (esr & ESR_ELx_COND_MASK) >> ESR_ELx_COND_SHIFT; 66 - 67 - return -1; 68 - } 69 54 70 55 /* 71 56 * Check if a trapped instruction should have been executed or not. ··· 109 114 * 110 115 * IT[7:0] -> CPSR[26:25],CPSR[15:10] 111 116 */ 112 - static void kvm_adjust_itstate(struct kvm_vcpu *vcpu) 117 + static void __hyp_text kvm_adjust_itstate(struct kvm_vcpu *vcpu) 113 118 { 114 119 unsigned long itbits, cond; 115 120 unsigned long cpsr = *vcpu_cpsr(vcpu); 116 121 bool is_arm = !(cpsr & COMPAT_PSR_T_BIT); 117 122 118 - BUG_ON(is_arm && (cpsr & COMPAT_PSR_IT_MASK)); 119 - 120 - if (!(cpsr & COMPAT_PSR_IT_MASK)) 123 + if (is_arm || !(cpsr & COMPAT_PSR_IT_MASK)) 121 124 return; 122 125 123 126 cond = (cpsr & 0xe000) >> 13; ··· 139 146 * kvm_skip_instr - skip a trapped instruction and proceed to the next 140 147 * @vcpu: The vcpu pointer 141 148 */ 142 - void kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr) 149 + void __hyp_text kvm_skip_instr32(struct kvm_vcpu *vcpu, bool is_wide_instr) 143 150 { 144 151 bool is_thumb; 145 152
+23
arch/arm64/kvm/handle_exit.c
··· 170 170 { 171 171 exit_handle_fn exit_handler; 172 172 173 + if (ARM_SERROR_PENDING(exception_index)) { 174 + u8 hsr_ec = ESR_ELx_EC(kvm_vcpu_get_hsr(vcpu)); 175 + 176 + /* 177 + * HVC/SMC already have an adjusted PC, which we need 178 + * to correct in order to return to after having 179 + * injected the SError. 180 + */ 181 + if (hsr_ec == ESR_ELx_EC_HVC32 || hsr_ec == ESR_ELx_EC_HVC64 || 182 + hsr_ec == ESR_ELx_EC_SMC32 || hsr_ec == ESR_ELx_EC_SMC64) { 183 + u32 adj = kvm_vcpu_trap_il_is32bit(vcpu) ? 4 : 2; 184 + *vcpu_pc(vcpu) -= adj; 185 + } 186 + 187 + kvm_inject_vabt(vcpu); 188 + return 1; 189 + } 190 + 191 + exception_index = ARM_EXCEPTION_CODE(exception_index); 192 + 173 193 switch (exception_index) { 174 194 case ARM_EXCEPTION_IRQ: 195 + return 1; 196 + case ARM_EXCEPTION_EL1_SERROR: 197 + kvm_inject_vabt(vcpu); 175 198 return 1; 176 199 case ARM_EXCEPTION_TRAP: 177 200 /*
+1 -1
arch/arm64/kvm/hyp/Makefile
··· 5 5 KVM=../../../../virt/kvm 6 6 7 7 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o 8 + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v3-sr.o 8 9 obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o 9 10 10 - obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o 11 11 obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o 12 12 obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o 13 13 obj-$(CONFIG_KVM_ARM_HOST) += entry.o
+1 -3
arch/arm64/kvm/hyp/debug-sr.c
··· 131 131 vcpu->arch.debug_flags &= ~KVM_ARM64_DEBUG_DIRTY; 132 132 } 133 133 134 - static u32 __hyp_text __debug_read_mdcr_el2(void) 134 + u32 __hyp_text __kvm_get_mdcr_el2(void) 135 135 { 136 136 return read_sysreg(mdcr_el2); 137 137 } 138 - 139 - __alias(__debug_read_mdcr_el2) u32 __kvm_get_mdcr_el2(void);
+79 -47
arch/arm64/kvm/hyp/entry.S
··· 55 55 */ 56 56 ENTRY(__guest_enter) 57 57 // x0: vcpu 58 - // x1: host/guest context 59 - // x2-x18: clobbered by macros 58 + // x1: host context 59 + // x2-x17: clobbered by macros 60 + // x18: guest context 60 61 61 62 // Store the host regs 62 63 save_callee_saved_regs x1 63 64 64 - // Preserve vcpu & host_ctxt for use at exit time 65 - stp x0, x1, [sp, #-16]! 65 + // Store the host_ctxt for use at exit time 66 + str x1, [sp, #-16]! 66 67 67 - add x1, x0, #VCPU_CONTEXT 68 + add x18, x0, #VCPU_CONTEXT 68 69 69 - // Prepare x0-x1 for later restore by pushing them onto the stack 70 - ldp x2, x3, [x1, #CPU_XREG_OFFSET(0)] 71 - stp x2, x3, [sp, #-16]! 70 + // Restore guest regs x0-x17 71 + ldp x0, x1, [x18, #CPU_XREG_OFFSET(0)] 72 + ldp x2, x3, [x18, #CPU_XREG_OFFSET(2)] 73 + ldp x4, x5, [x18, #CPU_XREG_OFFSET(4)] 74 + ldp x6, x7, [x18, #CPU_XREG_OFFSET(6)] 75 + ldp x8, x9, [x18, #CPU_XREG_OFFSET(8)] 76 + ldp x10, x11, [x18, #CPU_XREG_OFFSET(10)] 77 + ldp x12, x13, [x18, #CPU_XREG_OFFSET(12)] 78 + ldp x14, x15, [x18, #CPU_XREG_OFFSET(14)] 79 + ldp x16, x17, [x18, #CPU_XREG_OFFSET(16)] 72 80 73 - // x2-x18 74 - ldp x2, x3, [x1, #CPU_XREG_OFFSET(2)] 75 - ldp x4, x5, [x1, #CPU_XREG_OFFSET(4)] 76 - ldp x6, x7, [x1, #CPU_XREG_OFFSET(6)] 77 - ldp x8, x9, [x1, #CPU_XREG_OFFSET(8)] 78 - ldp x10, x11, [x1, #CPU_XREG_OFFSET(10)] 79 - ldp x12, x13, [x1, #CPU_XREG_OFFSET(12)] 80 - ldp x14, x15, [x1, #CPU_XREG_OFFSET(14)] 81 - ldp x16, x17, [x1, #CPU_XREG_OFFSET(16)] 82 - ldr x18, [x1, #CPU_XREG_OFFSET(18)] 81 + // Restore guest regs x19-x29, lr 82 + restore_callee_saved_regs x18 83 83 84 - // x19-x29, lr 85 - restore_callee_saved_regs x1 86 - 87 - // Last bits of the 64bit state 88 - ldp x0, x1, [sp], #16 84 + // Restore guest reg x18 85 + ldr x18, [x18, #CPU_XREG_OFFSET(18)] 89 86 90 87 // Do not touch any register after this! 91 88 eret 92 89 ENDPROC(__guest_enter) 93 90 94 91 ENTRY(__guest_exit) 95 - // x0: vcpu 96 - // x1: return code 97 - // x2-x3: free 98 - // x4-x29,lr: vcpu regs 99 - // vcpu x0-x3 on the stack 92 + // x0: return code 93 + // x1: vcpu 94 + // x2-x29,lr: vcpu regs 95 + // vcpu x0-x1 on the stack 100 96 101 - add x2, x0, #VCPU_CONTEXT 97 + add x1, x1, #VCPU_CONTEXT 102 98 103 - stp x4, x5, [x2, #CPU_XREG_OFFSET(4)] 104 - stp x6, x7, [x2, #CPU_XREG_OFFSET(6)] 105 - stp x8, x9, [x2, #CPU_XREG_OFFSET(8)] 106 - stp x10, x11, [x2, #CPU_XREG_OFFSET(10)] 107 - stp x12, x13, [x2, #CPU_XREG_OFFSET(12)] 108 - stp x14, x15, [x2, #CPU_XREG_OFFSET(14)] 109 - stp x16, x17, [x2, #CPU_XREG_OFFSET(16)] 110 - str x18, [x2, #CPU_XREG_OFFSET(18)] 99 + ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN) 111 100 112 - ldp x6, x7, [sp], #16 // x2, x3 113 - ldp x4, x5, [sp], #16 // x0, x1 101 + // Store the guest regs x2 and x3 102 + stp x2, x3, [x1, #CPU_XREG_OFFSET(2)] 114 103 115 - stp x4, x5, [x2, #CPU_XREG_OFFSET(0)] 116 - stp x6, x7, [x2, #CPU_XREG_OFFSET(2)] 104 + // Retrieve the guest regs x0-x1 from the stack 105 + ldp x2, x3, [sp], #16 // x0, x1 117 106 118 - save_callee_saved_regs x2 107 + // Store the guest regs x0-x1 and x4-x18 108 + stp x2, x3, [x1, #CPU_XREG_OFFSET(0)] 109 + stp x4, x5, [x1, #CPU_XREG_OFFSET(4)] 110 + stp x6, x7, [x1, #CPU_XREG_OFFSET(6)] 111 + stp x8, x9, [x1, #CPU_XREG_OFFSET(8)] 112 + stp x10, x11, [x1, #CPU_XREG_OFFSET(10)] 113 + stp x12, x13, [x1, #CPU_XREG_OFFSET(12)] 114 + stp x14, x15, [x1, #CPU_XREG_OFFSET(14)] 115 + stp x16, x17, [x1, #CPU_XREG_OFFSET(16)] 116 + str x18, [x1, #CPU_XREG_OFFSET(18)] 119 117 120 - // Restore vcpu & host_ctxt from the stack 121 - // (preserving return code in x1) 122 - ldp x0, x2, [sp], #16 118 + // Store the guest regs x19-x29, lr 119 + save_callee_saved_regs x1 120 + 121 + // Restore the host_ctxt from the stack 122 + ldr x2, [sp], #16 123 + 123 124 // Now restore the host regs 124 125 restore_callee_saved_regs x2 125 126 126 - mov x0, x1 127 - ret 127 + // If we have a pending asynchronous abort, now is the 128 + // time to find out. From your VAXorcist book, page 666: 129 + // "Threaten me not, oh Evil one! For I speak with 130 + // the power of DEC, and I command thee to show thyself!" 131 + mrs x2, elr_el2 132 + mrs x3, esr_el2 133 + mrs x4, spsr_el2 134 + mov x5, x0 135 + 136 + dsb sy // Synchronize against in-flight ld/st 137 + msr daifclr, #4 // Unmask aborts 138 + 139 + // This is our single instruction exception window. A pending 140 + // SError is guaranteed to occur at the earliest when we unmask 141 + // it, and at the latest just after the ISB. 142 + .global abort_guest_exit_start 143 + abort_guest_exit_start: 144 + 145 + isb 146 + 147 + .global abort_guest_exit_end 148 + abort_guest_exit_end: 149 + 150 + // If the exception took place, restore the EL1 exception 151 + // context so that we can report some information. 152 + // Merge the exception code with the SError pending bit. 153 + tbz x0, #ARM_EXIT_WITH_SERROR_BIT, 1f 154 + msr elr_el2, x2 155 + msr esr_el2, x3 156 + msr spsr_el2, x4 157 + orr x0, x0, x5 158 + 1: ret 128 159 ENDPROC(__guest_exit) 129 160 130 161 ENTRY(__fpsimd_guest_restore) 162 + stp x2, x3, [sp, #-16]! 131 163 stp x4, lr, [sp, #-16]! 132 164 133 165 alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
+44 -29
arch/arm64/kvm/hyp/hyp-entry.S
··· 27 27 .text 28 28 .pushsection .hyp.text, "ax" 29 29 30 - .macro save_x0_to_x3 31 - stp x0, x1, [sp, #-16]! 32 - stp x2, x3, [sp, #-16]! 33 - .endm 34 - 35 - .macro restore_x0_to_x3 36 - ldp x2, x3, [sp], #16 37 - ldp x0, x1, [sp], #16 38 - .endm 39 - 40 30 .macro do_el2_call 41 31 /* 42 32 * Shuffle the parameters before calling the function ··· 69 79 ENDPROC(__kvm_hyp_teardown) 70 80 71 81 el1_sync: // Guest trapped into EL2 72 - save_x0_to_x3 82 + stp x0, x1, [sp, #-16]! 73 83 74 84 alternative_if_not ARM64_HAS_VIRT_HOST_EXTN 75 85 mrs x1, esr_el2 76 86 alternative_else 77 87 mrs x1, esr_el1 78 88 alternative_endif 79 - lsr x2, x1, #ESR_ELx_EC_SHIFT 89 + lsr x0, x1, #ESR_ELx_EC_SHIFT 80 90 81 - cmp x2, #ESR_ELx_EC_HVC64 91 + cmp x0, #ESR_ELx_EC_HVC64 82 92 b.ne el1_trap 83 93 84 - mrs x3, vttbr_el2 // If vttbr is valid, the 64bit guest 85 - cbnz x3, el1_trap // called HVC 94 + mrs x1, vttbr_el2 // If vttbr is valid, the 64bit guest 95 + cbnz x1, el1_trap // called HVC 86 96 87 97 /* Here, we're pretty sure the host called HVC. */ 88 - restore_x0_to_x3 98 + ldp x0, x1, [sp], #16 89 99 90 100 cmp x0, #HVC_GET_VECTORS 91 101 b.ne 1f ··· 103 113 104 114 el1_trap: 105 115 /* 106 - * x1: ESR 107 - * x2: ESR_EC 116 + * x0: ESR_EC 108 117 */ 109 118 110 119 /* Guest accessed VFP/SIMD registers, save host, restore Guest */ 111 - cmp x2, #ESR_ELx_EC_FP_ASIMD 120 + cmp x0, #ESR_ELx_EC_FP_ASIMD 112 121 b.eq __fpsimd_guest_restore 113 122 114 - mrs x0, tpidr_el2 115 - mov x1, #ARM_EXCEPTION_TRAP 123 + mrs x1, tpidr_el2 124 + mov x0, #ARM_EXCEPTION_TRAP 116 125 b __guest_exit 117 126 118 127 el1_irq: 119 - save_x0_to_x3 120 - mrs x0, tpidr_el2 121 - mov x1, #ARM_EXCEPTION_IRQ 128 + stp x0, x1, [sp, #-16]! 129 + mrs x1, tpidr_el2 130 + mov x0, #ARM_EXCEPTION_IRQ 122 131 b __guest_exit 132 + 133 + el1_error: 134 + stp x0, x1, [sp, #-16]! 135 + mrs x1, tpidr_el2 136 + mov x0, #ARM_EXCEPTION_EL1_SERROR 137 + b __guest_exit 138 + 139 + el2_error: 140 + /* 141 + * Only two possibilities: 142 + * 1) Either we come from the exit path, having just unmasked 143 + * PSTATE.A: change the return code to an EL2 fault, and 144 + * carry on, as we're already in a sane state to handle it. 145 + * 2) Or we come from anywhere else, and that's a bug: we panic. 146 + * 147 + * For (1), x0 contains the original return code and x1 doesn't 148 + * contain anything meaningful at that stage. We can reuse them 149 + * as temp registers. 150 + * For (2), who cares? 151 + */ 152 + mrs x0, elr_el2 153 + adr x1, abort_guest_exit_start 154 + cmp x0, x1 155 + adr x1, abort_guest_exit_end 156 + ccmp x0, x1, #4, ne 157 + b.ne __hyp_panic 158 + mov x0, #(1 << ARM_EXIT_WITH_SERROR_BIT) 159 + eret 123 160 124 161 ENTRY(__hyp_do_panic) 125 162 mov lr, #(PSR_F_BIT | PSR_I_BIT | PSR_A_BIT | PSR_D_BIT |\ ··· 172 155 invalid_vector el2h_sync_invalid 173 156 invalid_vector el2h_irq_invalid 174 157 invalid_vector el2h_fiq_invalid 175 - invalid_vector el2h_error_invalid 176 158 invalid_vector el1_sync_invalid 177 159 invalid_vector el1_irq_invalid 178 160 invalid_vector el1_fiq_invalid 179 - invalid_vector el1_error_invalid 180 161 181 162 .ltorg 182 163 ··· 189 174 ventry el2h_sync_invalid // Synchronous EL2h 190 175 ventry el2h_irq_invalid // IRQ EL2h 191 176 ventry el2h_fiq_invalid // FIQ EL2h 192 - ventry el2h_error_invalid // Error EL2h 177 + ventry el2_error // Error EL2h 193 178 194 179 ventry el1_sync // Synchronous 64-bit EL1 195 180 ventry el1_irq // IRQ 64-bit EL1 196 181 ventry el1_fiq_invalid // FIQ 64-bit EL1 197 182 ventry el1_error // Error 64-bit EL1 198 183 199 184 ventry el1_sync // Synchronous 32-bit EL1 200 185 ventry el1_irq // IRQ 32-bit EL1 201 186 ventry el1_fiq_invalid // FIQ 32-bit EL1 202 187 ventry el1_error // Error 32-bit EL1 203 188 ENDPROC(__kvm_hyp_vector)
+71 -13
arch/arm64/kvm/hyp/switch.c
···
   */

  #include <linux/types.h>
+ #include <linux/jump_label.h>
+
  #include <asm/kvm_asm.h>
+ #include <asm/kvm_emulate.h>
  #include <asm/kvm_hyp.h>

  static bool __hyp_text __fpsimd_enabled_nvhe(void)
···
  static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
  {
+ 	/*
+ 	 * If we pended a virtual abort, preserve it until it gets
+ 	 * cleared. See D1.14.3 (Virtual Interrupts) for details, but
+ 	 * the crucial bit is "On taking a vSError interrupt,
+ 	 * HCR_EL2.VSE is cleared to 0."
+ 	 */
+ 	if (vcpu->arch.hcr_el2 & HCR_VSE)
+ 		vcpu->arch.hcr_el2 = read_sysreg(hcr_el2);
+
  	__deactivate_traps_arch()();
  	write_sysreg(0, hstr_el2);
  	write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2);
···
  	write_sysreg(0, vttbr_el2);
  }

- static hyp_alternate_select(__vgic_call_save_state,
- 			    __vgic_v2_save_state, __vgic_v3_save_state,
- 			    ARM64_HAS_SYSREG_GIC_CPUIF);
-
- static hyp_alternate_select(__vgic_call_restore_state,
- 			    __vgic_v2_restore_state, __vgic_v3_restore_state,
- 			    ARM64_HAS_SYSREG_GIC_CPUIF);
-
  static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
  {
- 	__vgic_call_save_state()(vcpu);
+ 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+ 		__vgic_v3_save_state(vcpu);
+ 	else
+ 		__vgic_v2_save_state(vcpu);
+
  	write_sysreg(read_sysreg(hcr_el2) & ~HCR_INT_OVERRIDE, hcr_el2);
  }
···
  	val |= vcpu->arch.irq_lines;
  	write_sysreg(val, hcr_el2);

- 	__vgic_call_restore_state()(vcpu);
+ 	if (static_branch_unlikely(&kvm_vgic_global_state.gicv3_cpuif))
+ 		__vgic_v3_restore_state(vcpu);
+ 	else
+ 		__vgic_v2_restore_state(vcpu);
  }

  static bool __hyp_text __true_value(void)
···
  	return true;
  }

- static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
+ static void __hyp_text __skip_instr(struct kvm_vcpu *vcpu)
+ {
+ 	*vcpu_pc(vcpu) = read_sysreg_el2(elr);
+
+ 	if (vcpu_mode_is_32bit(vcpu)) {
+ 		vcpu->arch.ctxt.gp_regs.regs.pstate = read_sysreg_el2(spsr);
+ 		kvm_skip_instr32(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
+ 		write_sysreg_el2(vcpu->arch.ctxt.gp_regs.regs.pstate, spsr);
+ 	} else {
+ 		*vcpu_pc(vcpu) += 4;
+ 	}
+
+ 	write_sysreg_el2(*vcpu_pc(vcpu), elr);
+ }
+
+ int __hyp_text __kvm_vcpu_run(struct kvm_vcpu *vcpu)
  {
  	struct kvm_cpu_context *host_ctxt;
  	struct kvm_cpu_context *guest_ctxt;
···
  	exit_code = __guest_enter(vcpu, host_ctxt);
  	/* And we're baaack! */

+ 	/*
+ 	 * We're using the raw exception code in order to only process
+ 	 * the trap if no SError is pending. We will come back to the
+ 	 * same PC once the SError has been injected, and replay the
+ 	 * trapping instruction.
+ 	 */
  	if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu))
  		goto again;
+
+ 	if (static_branch_unlikely(&vgic_v2_cpuif_trap) &&
+ 	    exit_code == ARM_EXCEPTION_TRAP) {
+ 		bool valid;
+
+ 		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
+ 			kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+ 			kvm_vcpu_dabt_isvalid(vcpu) &&
+ 			!kvm_vcpu_dabt_isextabt(vcpu) &&
+ 			!kvm_vcpu_dabt_iss1tw(vcpu);
+
+ 		if (valid) {
+ 			int ret = __vgic_v2_perform_cpuif_access(vcpu);
+
+ 			if (ret == 1) {
+ 				__skip_instr(vcpu);
+ 				goto again;
+ 			}
+
+ 			if (ret == -1) {
+ 				/* Promote an illegal access to an SError */
+ 				__skip_instr(vcpu);
+ 				exit_code = ARM_EXCEPTION_EL1_SERROR;
+ 			}
+
+ 			/* 0 falls through to be handled out of EL2 */
+ 		}
+ 	}

  	fp_enabled = __fpsimd_enabled();
···

  	return exit_code;
  }
-
- __alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu);

  static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n";
+3 -10
arch/arm64/kvm/hyp/tlb.c
···
  #include <asm/kvm_hyp.h>

- static void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
+ void __hyp_text __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
  {
  	dsb(ishst);
···
  	write_sysreg(0, vttbr_el2);
  }

- __alias(__tlb_flush_vmid_ipa) void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm,
- 					phys_addr_t ipa);
-
- static void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
+ void __hyp_text __kvm_tlb_flush_vmid(struct kvm *kvm)
  {
  	dsb(ishst);
···
  	write_sysreg(0, vttbr_el2);
  }

- __alias(__tlb_flush_vmid) void __kvm_tlb_flush_vmid(struct kvm *kvm);
-
- static void __hyp_text __tlb_flush_vm_context(void)
+ void __hyp_text __kvm_flush_vm_context(void)
  {
  	dsb(ishst);
  	asm volatile("tlbi alle1is \n"
  		     "ic ialluis ": : );
  	dsb(ish);
  }
-
- __alias(__tlb_flush_vm_context) void __kvm_flush_vm_context(void);
+1 -16
arch/arm64/kvm/hyp/vgic-v3-sr.c → virt/kvm/arm/hyp/vgic-v3-sr.c
···
  #define vtr_to_max_lr_idx(v)	((v) & 0xf)
  #define vtr_to_nr_pri_bits(v)	(((u32)(v) >> 29) + 1)

- #define read_gicreg(r)						\
- 	({							\
- 		u64 reg;					\
- 		asm volatile("mrs_s %0, " __stringify(r) : "=r" (reg)); \
- 		reg;						\
- 	})
-
- #define write_gicreg(v,r)					\
- 	do {							\
- 		u64 __val = (v);				\
- 		asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\
- 	} while (0)
-
  static u64 __hyp_text __gic_v3_get_lr(unsigned int lr)
  {
  	switch (lr & 0xf) {
···
  		__gic_v3_set_lr(0, i);
  }

- static u64 __hyp_text __vgic_v3_read_ich_vtr_el2(void)
+ u64 __hyp_text __vgic_v3_get_ich_vtr_el2(void)
  {
  	return read_gicreg(ICH_VTR_EL2);
  }
-
- __alias(__vgic_v3_read_ich_vtr_el2) u64 __vgic_v3_get_ich_vtr_el2(void);
+12
arch/arm64/kvm/inject_fault.c
···
  	else
  		inject_undef64(vcpu);
  }
+
+ /**
+  * kvm_inject_vabt - inject an async abort / SError into the guest
+  * @vcpu: The VCPU to receive the exception
+  *
+  * It is assumed that this code is called from the VCPU thread and that the
+  * VCPU therefore is not currently executing guest code.
+  */
+ void kvm_inject_vabt(struct kvm_vcpu *vcpu)
+ {
+ 	vcpu_set_hcr(vcpu, vcpu_get_hcr(vcpu) | HCR_VSE);
+ }
+40 -23
arch/mips/include/asm/kvm_host.h
···
  #define KVM_INVALID_INST	0xdeadbeef
  #define KVM_INVALID_ADDR	0xdeadbeef

+ /*
+  * EVA has overlapping user & kernel address spaces, so user VAs may be >
+  * PAGE_OFFSET. For this reason we can't use the default KVM_HVA_ERR_BAD of
+  * PAGE_OFFSET.
+  */
+ #define KVM_HVA_ERR_BAD		(-1UL)
+ #define KVM_HVA_ERR_RO_BAD	(-2UL)
+
+ static inline bool kvm_is_error_hva(unsigned long addr)
+ {
+ 	return IS_ERR_VALUE(addr);
+ }
+
  extern atomic_t kvm_mips_instance;

  struct kvm_vm_stat {
- 	u32 remote_tlb_flush;
+ 	ulong remote_tlb_flush;
  };

  struct kvm_vcpu_stat {
- 	u32 wait_exits;
- 	u32 cache_exits;
- 	u32 signal_exits;
- 	u32 int_exits;
- 	u32 cop_unusable_exits;
- 	u32 tlbmod_exits;
- 	u32 tlbmiss_ld_exits;
- 	u32 tlbmiss_st_exits;
- 	u32 addrerr_st_exits;
- 	u32 addrerr_ld_exits;
- 	u32 syscall_exits;
- 	u32 resvd_inst_exits;
- 	u32 break_inst_exits;
- 	u32 trap_inst_exits;
- 	u32 msa_fpe_exits;
- 	u32 fpe_exits;
- 	u32 msa_disabled_exits;
- 	u32 flush_dcache_exits;
- 	u32 halt_successful_poll;
- 	u32 halt_attempted_poll;
- 	u32 halt_poll_invalid;
- 	u32 halt_wakeup;
+ 	u64 wait_exits;
+ 	u64 cache_exits;
+ 	u64 signal_exits;
+ 	u64 int_exits;
+ 	u64 cop_unusable_exits;
+ 	u64 tlbmod_exits;
+ 	u64 tlbmiss_ld_exits;
+ 	u64 tlbmiss_st_exits;
+ 	u64 addrerr_st_exits;
+ 	u64 addrerr_ld_exits;
+ 	u64 syscall_exits;
+ 	u64 resvd_inst_exits;
+ 	u64 break_inst_exits;
+ 	u64 trap_inst_exits;
+ 	u64 msa_fpe_exits;
+ 	u64 fpe_exits;
+ 	u64 msa_disabled_exits;
+ 	u64 flush_dcache_exits;
+ 	u64 halt_successful_poll;
+ 	u64 halt_attempted_poll;
+ 	u64 halt_poll_invalid;
+ 	u64 halt_wakeup;
  };

  struct kvm_arch_memory_slot {
···
  	u32 guest_user_asid[NR_CPUS];
  	u32 guest_kernel_asid[NR_CPUS];
  	struct mm_struct guest_kernel_mm, guest_user_mm;
+
+ 	/* Guest ASID of last user mode execution */
+ 	unsigned int last_user_gasid;

  	int last_sched_cpu;
+64 -14
arch/mips/kvm/emulate.c
···
  	return EMULATE_FAIL;
  }

+ /**
+  * kvm_mips_invalidate_guest_tlb() - Indicates a change in guest MMU map.
+  * @vcpu:	VCPU with changed mappings.
+  * @tlb:	TLB entry being removed.
+  *
+  * This is called to indicate a single change in guest MMU mappings, so that we
+  * can arrange TLB flushes on this and other CPUs.
+  */
+ static void kvm_mips_invalidate_guest_tlb(struct kvm_vcpu *vcpu,
+ 					  struct kvm_mips_tlb *tlb)
+ {
+ 	int cpu, i;
+ 	bool user;
+
+ 	/* No need to flush for entries which are already invalid */
+ 	if (!((tlb->tlb_lo[0] | tlb->tlb_lo[1]) & ENTRYLO_V))
+ 		return;
+ 	/* User address space doesn't need flushing for KSeg2/3 changes */
+ 	user = tlb->tlb_hi < KVM_GUEST_KSEG0;
+
+ 	preempt_disable();
+
+ 	/*
+ 	 * Probe the shadow host TLB for the entry being overwritten, if one
+ 	 * matches, invalidate it
+ 	 */
+ 	kvm_mips_host_tlb_inv(vcpu, tlb->tlb_hi);
+
+ 	/* Invalidate the whole ASID on other CPUs */
+ 	cpu = smp_processor_id();
+ 	for_each_possible_cpu(i) {
+ 		if (i == cpu)
+ 			continue;
+ 		if (user)
+ 			vcpu->arch.guest_user_asid[i] = 0;
+ 		vcpu->arch.guest_kernel_asid[i] = 0;
+ 	}
+
+ 	preempt_enable();
+ }
+
  /* Write Guest TLB Entry @ Index */
  enum emulation_result kvm_mips_emul_tlbwi(struct kvm_vcpu *vcpu)
  {
···
  	}

  	tlb = &vcpu->arch.guest_tlb[index];
- 	/*
- 	 * Probe the shadow host TLB for the entry being overwritten, if one
- 	 * matches, invalidate it
- 	 */
- 	kvm_mips_host_tlb_inv(vcpu, tlb->tlb_hi);
+
+ 	kvm_mips_invalidate_guest_tlb(vcpu, tlb);

  	tlb->tlb_mask = kvm_read_c0_guest_pagemask(cop0);
  	tlb->tlb_hi = kvm_read_c0_guest_entryhi(cop0);
···
  	tlb = &vcpu->arch.guest_tlb[index];

- 	/*
- 	 * Probe the shadow host TLB for the entry being overwritten, if one
- 	 * matches, invalidate it
- 	 */
- 	kvm_mips_host_tlb_inv(vcpu, tlb->tlb_hi);
+ 	kvm_mips_invalidate_guest_tlb(vcpu, tlb);

  	tlb->tlb_mask = kvm_read_c0_guest_pagemask(cop0);
  	tlb->tlb_hi = kvm_read_c0_guest_entryhi(cop0);
···
  	enum emulation_result er = EMULATE_DONE;
  	u32 rt, rd, sel;
  	unsigned long curr_pc;
+ 	int cpu, i;

  	/*
  	 * Update PC and hold onto current PC in case there is
···
  	} else if (rd == MIPS_CP0_TLB_HI && sel == 0) {
  		u32 nasid =
  			vcpu->arch.gprs[rt] & KVM_ENTRYHI_ASID;
- 		if ((KSEGX(vcpu->arch.gprs[rt]) != CKSEG0) &&
- 		    ((kvm_read_c0_guest_entryhi(cop0) &
+ 		if (((kvm_read_c0_guest_entryhi(cop0) &
  		      KVM_ENTRYHI_ASID) != nasid)) {
  			trace_kvm_asid_change(vcpu,
  				kvm_read_c0_guest_entryhi(cop0)
  				& KVM_ENTRYHI_ASID,
  				nasid);

- 			/* Blow away the shadow host TLBs */
- 			kvm_mips_flush_host_tlb(1);
+ 			/*
+ 			 * Regenerate/invalidate kernel MMU
+ 			 * context.
+ 			 * The user MMU context will be
+ 			 * regenerated lazily on re-entry to
+ 			 * guest user if the guest ASID actually
+ 			 * changes.
+ 			 */
+ 			preempt_disable();
+ 			cpu = smp_processor_id();
+ 			kvm_get_new_mmu_context(&vcpu->arch.guest_kernel_mm,
+ 						cpu, vcpu);
+ 			vcpu->arch.guest_kernel_asid[cpu] =
+ 				vcpu->arch.guest_kernel_mm.context.asid[cpu];
+ 			for_each_possible_cpu(i)
+ 				if (i != cpu)
+ 					vcpu->arch.guest_kernel_asid[i] = 0;
+ 			preempt_enable();
  		}
  		kvm_write_c0_guest_entryhi(cop0,
  					   vcpu->arch.gprs[rt]);
+40
arch/mips/kvm/mips.c
···
  	return 0;
  }

+ bool kvm_arch_has_vcpu_debugfs(void)
+ {
+ 	return false;
+ }
+
+ int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+ {
+ 	return 0;
+ }
+
  void kvm_mips_free_vcpus(struct kvm *kvm)
  {
  	unsigned int i;
···
  	return -ENOIOCTLCMD;
  }

+ /* Must be called with preemption disabled, just before entering guest */
+ static void kvm_mips_check_asids(struct kvm_vcpu *vcpu)
+ {
+ 	struct mips_coproc *cop0 = vcpu->arch.cop0;
+ 	int cpu = smp_processor_id();
+ 	unsigned int gasid;
+
+ 	/*
+ 	 * Lazy host ASID regeneration for guest user mode.
+ 	 * If the guest ASID has changed since the last guest usermode
+ 	 * execution, regenerate the host ASID so as to invalidate stale TLB
+ 	 * entries.
+ 	 */
+ 	if (!KVM_GUEST_KERNEL_MODE(vcpu)) {
+ 		gasid = kvm_read_c0_guest_entryhi(cop0) & KVM_ENTRYHI_ASID;
+ 		if (gasid != vcpu->arch.last_user_gasid) {
+ 			kvm_get_new_mmu_context(&vcpu->arch.guest_user_mm, cpu,
+ 						vcpu);
+ 			vcpu->arch.guest_user_asid[cpu] =
+ 				vcpu->arch.guest_user_mm.context.asid[cpu];
+ 			vcpu->arch.last_user_gasid = gasid;
+ 		}
+ 	}
+ }
+
  int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
  {
  	int r = 0;
···
  	htw_stop();

  	trace_kvm_enter(vcpu);
+
+ 	kvm_mips_check_asids(vcpu);
+
  	r = vcpu->arch.vcpu_run(run, vcpu);
  	trace_kvm_out(vcpu);
···
  	if (ret == RESUME_GUEST) {
  		trace_kvm_reenter(vcpu);
+
+ 		kvm_mips_check_asids(vcpu);

  		/*
  		 * If FPU / MSA are enabled (i.e. the guest's FPU / MSA context
+15 -3
arch/mips/kvm/mmu.c
···
  		kvm_get_new_mmu_context(&vcpu->arch.guest_kernel_mm, cpu, vcpu);
  		vcpu->arch.guest_kernel_asid[cpu] =
  			vcpu->arch.guest_kernel_mm.context.asid[cpu];
- 		kvm_get_new_mmu_context(&vcpu->arch.guest_user_mm, cpu, vcpu);
- 		vcpu->arch.guest_user_asid[cpu] =
- 			vcpu->arch.guest_user_mm.context.asid[cpu];
  		newasid++;

  		kvm_debug("[%d]: cpu_context: %#lx\n", cpu,
  			  cpu_context(cpu, current->mm));
  		kvm_debug("[%d]: Allocated new ASID for Guest Kernel: %#x\n",
  			  cpu, vcpu->arch.guest_kernel_asid[cpu]);
+ 	}
+
+ 	if ((vcpu->arch.guest_user_asid[cpu] ^ asid_cache(cpu)) &
+ 	    asid_version_mask(cpu)) {
+ 		u32 gasid = kvm_read_c0_guest_entryhi(vcpu->arch.cop0) &
+ 			    KVM_ENTRYHI_ASID;
+
+ 		kvm_get_new_mmu_context(&vcpu->arch.guest_user_mm, cpu, vcpu);
+ 		vcpu->arch.guest_user_asid[cpu] =
+ 			vcpu->arch.guest_user_mm.context.asid[cpu];
+ 		vcpu->arch.last_user_gasid = gasid;
+ 		newasid++;
+
+ 		kvm_debug("[%d]: cpu_context: %#lx\n", cpu,
+ 			  cpu_context(cpu, current->mm));
  		kvm_debug("[%d]: Allocated new ASID for Guest User: %#x\n", cpu,
  			  vcpu->arch.guest_user_asid[cpu]);
  	}
+18
arch/mips/kvm/trap_emul.c
···
  		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
  		ret = RESUME_HOST;
  	}
+ } else if (KVM_GUEST_KERNEL_MODE(vcpu)
+ 	   && (KSEGX(badvaddr) == CKSEG0 || KSEGX(badvaddr) == CKSEG1)) {
+ 	/*
+ 	 * With EVA we may get a TLB exception instead of an address
+ 	 * error when the guest performs MMIO to KSeg1 addresses.
+ 	 */
+ 	kvm_debug("Emulate %s MMIO space\n",
+ 		  store ? "Store to" : "Load from");
+ 	er = kvm_mips_emulate_inst(cause, opc, run, vcpu);
+ 	if (er == EMULATE_FAIL) {
+ 		kvm_err("Emulate %s MMIO space failed\n",
+ 			store ? "Store to" : "Load from");
+ 		run->exit_reason = KVM_EXIT_INTERNAL_ERROR;
+ 		ret = RESUME_HOST;
+ 	} else {
+ 		run->exit_reason = KVM_EXIT_MMIO;
+ 		ret = RESUME_HOST;
+ 	}
  } else {
  	kvm_err("Illegal TLB %s fault address , cause %#x, PC: %p, BadVaddr: %#lx\n",
  		store ? "ST" : "LD", cause, opc, badvaddr);
+37
arch/powerpc/include/asm/book3s/64/mmu-hash.h
···
  }

  /*
+  * This array is indexed by the LP field of the HPTE second dword.
+  * Since this field may contain some RPN bits, some entries are
+  * replicated so that we get the same value irrespective of RPN.
+  * The top 4 bits are the page size index (MMU_PAGE_*) for the
+  * actual page size, the bottom 4 bits are the base page size.
+  */
+ extern u8 hpte_page_sizes[1 << LP_BITS];
+
+ static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
+ 					     bool is_base_size)
+ {
+ 	unsigned int i, lp;
+
+ 	if (!(h & HPTE_V_LARGE))
+ 		return 1ul << 12;
+
+ 	/* Look at the 8 bit LP value */
+ 	lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
+ 	i = hpte_page_sizes[lp];
+ 	if (!i)
+ 		return 0;
+ 	if (!is_base_size)
+ 		i >>= 4;
+ 	return 1ul << mmu_psize_defs[i & 0xf].shift;
+ }
+
+ static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
+ {
+ 	return __hpte_page_size(h, l, 0);
+ }
+
+ static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
+ {
+ 	return __hpte_page_size(h, l, 1);
+ }
+
+ /*
   * The current system page and segment sizes
   */
  extern int mmu_kernel_ssize;
+29
arch/powerpc/include/asm/io.h
···
  #endif
  #endif /* __powerpc64__ */

+ /*
+  * Simple Cache inhibited accessors
+  * Unlike the DEF_MMIO_* macros, these don't include any h/w memory
+  * barriers, callers need to manage memory barriers on their own.
+  * These can only be used in hypervisor real mode.
+  */
+
+ static inline u32 _lwzcix(unsigned long addr)
+ {
+ 	u32 ret;
+
+ 	__asm__ __volatile__("lwzcix %0,0, %1"
+ 			     : "=r" (ret) : "r" (addr) : "memory");
+ 	return ret;
+ }
+
+ static inline void _stbcix(u64 addr, u8 val)
+ {
+ 	__asm__ __volatile__("stbcix %0,0,%1"
+ 			     : : "r" (val), "r" (addr) : "memory");
+ }
+
+ static inline void _stwcix(u64 addr, u32 val)
+ {
+ 	__asm__ __volatile__("stwcix %0,0,%1"
+ 			     : : "r" (val), "r" (addr) : "memory");
+ }
+
  /*
   * Low level IO stream instructions are defined out of line for now
   */
+10
arch/powerpc/include/asm/kvm_asm.h
···
  #define BOOK3S_INTERRUPT_FAC_UNAVAIL	0xf60
  #define BOOK3S_INTERRUPT_H_FAC_UNAVAIL	0xf80

+ /* book3s_hv */
+
+ /*
+  * Special trap used to indicate to host that this is a
+  * passthrough interrupt that could not be handled
+  * completely in the guest.
+  */
+ #define BOOK3S_INTERRUPT_HV_RM_HARD	0x5555
+
  #define BOOK3S_IRQPRIO_SYSTEM_RESET	0
  #define BOOK3S_IRQPRIO_DATA_SEGMENT	1
  #define BOOK3S_IRQPRIO_INST_SEGMENT	2
···
  #define RESUME_FLAG_NV		(1<<0)	/* Reload guest nonvolatile state? */
  #define RESUME_FLAG_HOST	(1<<1)	/* Resume host? */
  #define RESUME_FLAG_ARCH1	(1<<2)
+ #define RESUME_FLAG_ARCH2	(1<<3)

  #define RESUME_GUEST		0
  #define RESUME_GUEST_NV		RESUME_FLAG_NV
+39
arch/powerpc/include/asm/kvm_book3s.h
···
  	int pagesize;
  };

+ /*
+  * Struct for a virtual core.
+  * Note: entry_exit_map combines a bitmap of threads that have entered
+  * in the bottom 8 bits and a bitmap of threads that have exited in the
+  * next 8 bits. This is so that we can atomically set the entry bit
+  * iff the exit map is 0 without taking a lock.
+  */
+ struct kvmppc_vcore {
+ 	int n_runnable;
+ 	int num_threads;
+ 	int entry_exit_map;
+ 	int napping_threads;
+ 	int first_vcpuid;
+ 	u16 pcpu;
+ 	u16 last_cpu;
+ 	u8 vcore_state;
+ 	u8 in_guest;
+ 	struct kvmppc_vcore *master_vcore;
+ 	struct kvm_vcpu *runnable_threads[MAX_SMT_THREADS];
+ 	struct list_head preempt_list;
+ 	spinlock_t lock;
+ 	struct swait_queue_head wq;
+ 	spinlock_t stoltb_lock;	/* protects stolen_tb and preempt_tb */
+ 	u64 stolen_tb;
+ 	u64 preempt_tb;
+ 	struct kvm_vcpu *runner;
+ 	struct kvm *kvm;
+ 	u64 tb_offset;		/* guest timebase - host timebase */
+ 	ulong lpcr;
+ 	u32 arch_compat;
+ 	ulong pcr;
+ 	ulong dpdes;		/* doorbell state (POWER8) */
+ 	ulong vtb;		/* virtual timebase */
+ 	ulong conferring_threads;
+ 	unsigned int halt_poll_ns;
+ };
+
  struct kvmppc_vcpu_book3s {
  	struct kvmppc_sid_map sid_map[SID_MAP_NUM];
  	struct {
···
  	u64 sdr1;
  	u64 hior;
  	u64 msr_mask;
+ 	u64 vtb;
  #ifdef CONFIG_PPC_BOOK3S_32
  	u32 vsid_pool[VSID_POOL_SIZE];
  	u32 vsid_next;
···
  				    struct kvm_vcpu *vcpu);
  extern void kvmppc_copy_from_svcpu(struct kvm_vcpu *vcpu,
  				   struct kvmppc_book3s_shadow_vcpu *svcpu);
+ extern int kvm_irq_bypass;

  static inline struct kvmppc_vcpu_book3s *to_book3s(struct kvm_vcpu *vcpu)
  {
+8 -82
arch/powerpc/include/asm/kvm_book3s_64.h
···
  #ifndef __ASM_KVM_BOOK3S_64_H__
  #define __ASM_KVM_BOOK3S_64_H__

+ #include <asm/book3s/64/mmu-hash.h>
+
  #ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE
  static inline struct kvmppc_book3s_shadow_vcpu *svcpu_get(struct kvm_vcpu *vcpu)
  {
···
  	hpte[0] = cpu_to_be64(hpte_v);
  }

- static inline int __hpte_actual_psize(unsigned int lp, int psize)
- {
- 	int i, shift;
- 	unsigned int mask;
-
- 	/* start from 1 ignoring MMU_PAGE_4K */
- 	for (i = 1; i < MMU_PAGE_COUNT; i++) {
-
- 		/* invalid penc */
- 		if (mmu_psize_defs[psize].penc[i] == -1)
- 			continue;
- 		/*
- 		 * encoding bits per actual page size
- 		 * PTE LP actual page size
- 		 * rrrr rrrz >=8KB
- 		 * rrrr rrzz >=16KB
- 		 * rrrr rzzz >=32KB
- 		 * rrrr zzzz >=64KB
- 		 * .......
- 		 */
- 		shift = mmu_psize_defs[i].shift - LP_SHIFT;
- 		if (shift > LP_BITS)
- 			shift = LP_BITS;
- 		mask = (1 << shift) - 1;
- 		if ((lp & mask) == mmu_psize_defs[psize].penc[i])
- 			return i;
- 	}
- 	return -1;
- }
-
  static inline unsigned long compute_tlbie_rb(unsigned long v, unsigned long r,
  					     unsigned long pte_index)
  {
- 	int b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
+ 	int i, b_psize = MMU_PAGE_4K, a_psize = MMU_PAGE_4K;
  	unsigned int penc;
  	unsigned long rb = 0, va_low, sllp;
  	unsigned int lp = (r >> LP_SHIFT) & ((1 << LP_BITS) - 1);

  	if (v & HPTE_V_LARGE) {
- 		for (b_psize = 0; b_psize < MMU_PAGE_COUNT; b_psize++) {
-
- 			/* valid entries have a shift value */
- 			if (!mmu_psize_defs[b_psize].shift)
- 				continue;
-
- 			a_psize = __hpte_actual_psize(lp, b_psize);
- 			if (a_psize != -1)
- 				break;
- 		}
+ 		i = hpte_page_sizes[lp];
+ 		b_psize = i & 0xf;
+ 		a_psize = i >> 4;
  	}
+
  	/*
  	 * Ignore the top 14 bits of va
  	 * v have top two bits covering segment size, hence move
···
  	/* This covers 14..54 bits of va*/
  	rb = (v & ~0x7fUL) << 16;		/* AVA field */

- 	rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;	/* B field */
  	/*
  	 * AVA in v had cleared lower 23 bits. We need to derive
  	 * that from pteg index
···
  			break;
  		}
  	}
- 	rb |= (v >> 54) & 0x300;		/* B field */
+ 	rb |= (v >> HPTE_V_SSIZE_SHIFT) << 8;	/* B field */
  	return rb;
- }
-
- static inline unsigned long __hpte_page_size(unsigned long h, unsigned long l,
- 					     bool is_base_size)
- {
-
- 	int size, a_psize;
- 	/* Look at the 8 bit LP value */
- 	unsigned int lp = (l >> LP_SHIFT) & ((1 << LP_BITS) - 1);
-
- 	/* only handle 4k, 64k and 16M pages for now */
- 	if (!(h & HPTE_V_LARGE))
- 		return 1ul << 12;
- 	else {
- 		for (size = 0; size < MMU_PAGE_COUNT; size++) {
- 			/* valid entries have a shift value */
- 			if (!mmu_psize_defs[size].shift)
- 				continue;
-
- 			a_psize = __hpte_actual_psize(lp, size);
- 			if (a_psize != -1) {
- 				if (is_base_size)
- 					return 1ul << mmu_psize_defs[size].shift;
- 				return 1ul << mmu_psize_defs[a_psize].shift;
- 			}
- 		}
-
- 	}
- 	return 0;
- }
-
- static inline unsigned long hpte_page_size(unsigned long h, unsigned long l)
- {
- 	return __hpte_page_size(h, l, 0);
- }
-
- static inline unsigned long hpte_base_page_size(unsigned long h, unsigned long l)
- {
- 	return __hpte_page_size(h, l, 1);
  }

  static inline unsigned long hpte_rpn(unsigned long ptel, unsigned long psize)
+57 -67
arch/powerpc/include/asm/kvm_host.h
···
  #include <asm/cputhreads.h>
  #define KVM_MAX_VCPU_ID	(threads_per_subcore * KVM_MAX_VCORES)

+ #define __KVM_HAVE_ARCH_INTC_INITIALIZED
+
  #ifdef CONFIG_KVM_MMIO
  #define KVM_COALESCED_MMIO_PAGE_OFFSET	1
  #endif
···
  struct kvmppc_book3s_shadow_vcpu;

  struct kvm_vm_stat {
- 	u32 remote_tlb_flush;
+ 	ulong remote_tlb_flush;
  };

  struct kvm_vcpu_stat {
- 	u32 sum_exits;
- 	u32 mmio_exits;
- 	u32 signal_exits;
- 	u32 light_exits;
+ 	u64 sum_exits;
+ 	u64 mmio_exits;
+ 	u64 signal_exits;
+ 	u64 light_exits;
  	/* Account for special types of light exits: */
- 	u32 itlb_real_miss_exits;
- 	u32 itlb_virt_miss_exits;
- 	u32 dtlb_real_miss_exits;
- 	u32 dtlb_virt_miss_exits;
- 	u32 syscall_exits;
- 	u32 isi_exits;
- 	u32 dsi_exits;
- 	u32 emulated_inst_exits;
- 	u32 dec_exits;
- 	u32 ext_intr_exits;
- 	u32 halt_successful_poll;
- 	u32 halt_attempted_poll;
- 	u32 halt_poll_invalid;
- 	u32 halt_wakeup;
- 	u32 dbell_exits;
- 	u32 gdbell_exits;
- 	u32 ld;
- 	u32 st;
+ 	u64 itlb_real_miss_exits;
+ 	u64 itlb_virt_miss_exits;
+ 	u64 dtlb_real_miss_exits;
+ 	u64 dtlb_virt_miss_exits;
+ 	u64 syscall_exits;
+ 	u64 isi_exits;
+ 	u64 dsi_exits;
+ 	u64 emulated_inst_exits;
+ 	u64 dec_exits;
+ 	u64 ext_intr_exits;
+ 	u64 halt_poll_success_ns;
+ 	u64 halt_poll_fail_ns;
+ 	u64 halt_wait_ns;
+ 	u64 halt_successful_poll;
+ 	u64 halt_attempted_poll;
+ 	u64 halt_successful_wait;
+ 	u64 halt_poll_invalid;
+ 	u64 halt_wakeup;
+ 	u64 dbell_exits;
+ 	u64 gdbell_exits;
+ 	u64 ld;
+ 	u64 st;
  #ifdef CONFIG_PPC_BOOK3S
- 	u32 pf_storage;
- 	u32 pf_instruc;
- 	u32 sp_storage;
- 	u32 sp_instruc;
- 	u32 queue_intr;
- 	u32 ld_slow;
- 	u32 st_slow;
+ 	u64 pf_storage;
+ 	u64 pf_instruc;
+ 	u64 sp_storage;
+ 	u64 sp_instruc;
+ 	u64 queue_intr;
+ 	u64 ld_slow;
+ 	u64 st_slow;
  #endif
+ 	u64 pthru_all;
+ 	u64 pthru_host;
+ 	u64 pthru_bad_aff;
  };

  enum kvm_exit_types {
···
  /* XICS components, defined in book3s_xics.c */
  struct kvmppc_xics;
  struct kvmppc_icp;
+
+ struct kvmppc_passthru_irqmap;

  /*
   * The reverse mapping array has one entry for each HPTE,
···
  #endif
  #ifdef CONFIG_KVM_XICS
  	struct kvmppc_xics *xics;
+ 	struct kvmppc_passthru_irqmap *pimap;
  #endif
  	struct kvmppc_ops *kvm_ops;
  #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
  	/* This array can grow quite large, keep it at the end */
  	struct kvmppc_vcore *vcores[KVM_MAX_VCORES];
  #endif
- };
-
- /*
-  * Struct for a virtual core.
-  * Note: entry_exit_map combines a bitmap of threads that have entered
-  * in the bottom 8 bits and a bitmap of threads that have exited in the
-  * next 8 bits. This is so that we can atomically set the entry bit
-  * iff the exit map is 0 without taking a lock.
-  */
- struct kvmppc_vcore {
- 	int n_runnable;
- 	int num_threads;
- 	int entry_exit_map;
- 	int napping_threads;
- 	int first_vcpuid;
- 	u16 pcpu;
- 	u16 last_cpu;
- 	u8 vcore_state;
- 	u8 in_guest;
- 	struct kvmppc_vcore *master_vcore;
- 	struct list_head runnable_threads;
- 	struct list_head preempt_list;
- 	spinlock_t lock;
- 	struct swait_queue_head wq;
- 	spinlock_t stoltb_lock;	/* protects stolen_tb and preempt_tb */
- 	u64 stolen_tb;
- 	u64 preempt_tb;
- 	struct kvm_vcpu *runner;
- 	struct kvm *kvm;
- 	u64 tb_offset;		/* guest timebase - host timebase */
- 	ulong lpcr;
- 	u32 arch_compat;
- 	ulong pcr;
- 	ulong dpdes;		/* doorbell state (POWER8) */
- 	ulong conferring_threads;
  };

  #define VCORE_ENTRY_MAP(vc)	((vc)->entry_exit_map & 0xff)
···
  #define VCORE_SLEEPING	3
  #define VCORE_RUNNING	4
  #define VCORE_EXITING	5
+ #define VCORE_POLLING	6

  /*
   * Struct used to manage memory for a virtual processor area
···
  	u64 tb_min;	/* min time */
  	u64 tb_max;	/* max time */
  };
+
+ #ifdef CONFIG_PPC_BOOK3S_64
+ struct kvmppc_irq_map {
+ 	u32 r_hwirq;
+ 	u32 v_hwirq;
+ 	struct irq_desc *desc;
+ };
+
+ #define KVMPPC_PIRQ_MAPPED	1024
+ struct kvmppc_passthru_irqmap {
+ 	int n_mapped;
+ 	struct kvmppc_irq_map mapped[KVMPPC_PIRQ_MAPPED];
+ };
+ #endif

  # ifdef CONFIG_PPC_FSL_BOOK3E
  #define KVMPPC_BOOKE_IAC_NUM	2
···
  	ulong purr;
  	ulong spurr;
  	ulong ic;
- 	ulong vtb;
  	ulong dscr;
  	ulong amr;
  	ulong uamor;
···
  	long pgfault_index;
  	unsigned long pgfault_hpte[2];

- 	struct list_head run_list;
  	struct task_struct *run_task;
  	struct kvm_run *kvm_run;
+28
arch/powerpc/include/asm/kvm_ppc.h
···
  	long (*arch_vm_ioctl)(struct file *filp, unsigned int ioctl,
  			      unsigned long arg);
  	int (*hcall_implemented)(unsigned long hcall);
+ 	int (*irq_bypass_add_producer)(struct irq_bypass_consumer *,
+ 				       struct irq_bypass_producer *);
+ 	void (*irq_bypass_del_producer)(struct irq_bypass_consumer *,
+ 					struct irq_bypass_producer *);
  };

  extern struct kvmppc_ops *kvmppc_hv_ops;
···
  {
  	return vcpu->arch.irq_type == KVMPPC_IRQ_XICS;
  }
+
+ static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
+ 				struct kvm *kvm)
+ {
+ 	if (kvm && kvm_irq_bypass)
+ 		return kvm->arch.pimap;
+ 	return NULL;
+ }
+
  extern void kvmppc_alloc_host_rm_ops(void);
  extern void kvmppc_free_host_rm_ops(void);
+ extern void kvmppc_free_pimap(struct kvm *kvm);
+ extern int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall);
  extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
  extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
  extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
···
  extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
  				    struct kvm_vcpu *vcpu, u32 cpu);
  extern void kvmppc_xics_ipi_action(void);
+ extern void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long guest_irq,
+ 				   unsigned long host_irq);
+ extern void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long guest_irq,
+ 				   unsigned long host_irq);
+ extern long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, u32 xirr,
+ 					struct kvmppc_irq_map *irq_map,
+ 					struct kvmppc_passthru_irqmap *pimap);
  extern int h_ipi_redirect;
  #else
+ static inline struct kvmppc_passthru_irqmap *kvmppc_get_passthru_irqmap(
+ 				struct kvm *kvm)
+ 	{ return NULL; }
  static inline void kvmppc_alloc_host_rm_ops(void) {};
  static inline void kvmppc_free_host_rm_ops(void) {};
+ static inline void kvmppc_free_pimap(struct kvm *kvm) {};
+ static inline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall)
+ 	{ return 0; }
  static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
  	{ return 0; }
  static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
+1
arch/powerpc/include/asm/mmu.h
··· 271 271 #define MMU_PAGE_16G 13 272 272 #define MMU_PAGE_64G 14 273 273 274 + /* N.B. we need to change the type of hpte_page_sizes if this gets to be > 16 */ 274 275 #define MMU_PAGE_COUNT 15 275 276 276 277 #ifdef CONFIG_PPC_BOOK3S_64
+1
arch/powerpc/include/asm/opal.h
··· 67 67 int64_t opal_pci_config_write_word(uint64_t phb_id, uint64_t bus_dev_func, 68 68 uint64_t offset, uint32_t data); 69 69 int64_t opal_set_xive(uint32_t isn, uint16_t server, uint8_t priority); 70 + int64_t opal_rm_set_xive(uint32_t isn, uint16_t server, uint8_t priority); 70 71 int64_t opal_get_xive(uint32_t isn, __be16 *server, uint8_t *priority); 71 72 int64_t opal_register_exception_handler(uint64_t opal_exception, 72 73 uint64_t handler_address,
+3
arch/powerpc/include/asm/pnv-pci.h
··· 12 12 13 13 #include <linux/pci.h> 14 14 #include <linux/pci_hotplug.h> 15 + #include <linux/irq.h> 15 16 #include <misc/cxl-base.h> 16 17 #include <asm/opal-api.h> 17 18 ··· 34 33 void pnv_cxl_release_hwirqs(struct pci_dev *dev, int hwirq, int num); 35 34 int pnv_cxl_get_irq_count(struct pci_dev *dev); 36 35 struct device_node *pnv_pci_get_phb_node(struct pci_dev *dev); 36 + int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq); 37 + bool is_pnv_opal_msi(struct irq_chip *chip); 37 38 38 39 #ifdef CONFIG_CXL_BASE 39 40 int pnv_cxl_alloc_hwirq_ranges(struct cxl_irq_ranges *irqs,
+1
arch/powerpc/include/asm/reg.h
··· 737 737 #define MMCR0_FCHV 0x00000001UL /* freeze conditions in hypervisor mode */ 738 738 #define SPRN_MMCR1 798 739 739 #define SPRN_MMCR2 785 740 + #define SPRN_UMMCR2 769 740 741 #define SPRN_MMCRA 0x312 741 742 #define MMCRA_SDSYNC 0x80000000UL /* SDAR synced with SIAR */ 742 743 #define MMCRA_SDAR_DCACHE_MISS 0x40000000UL
+1 -1
arch/powerpc/kernel/asm-offsets.c
··· 506 506 DEFINE(VCPU_PURR, offsetof(struct kvm_vcpu, arch.purr)); 507 507 DEFINE(VCPU_SPURR, offsetof(struct kvm_vcpu, arch.spurr)); 508 508 DEFINE(VCPU_IC, offsetof(struct kvm_vcpu, arch.ic)); 509 - DEFINE(VCPU_VTB, offsetof(struct kvm_vcpu, arch.vtb)); 510 509 DEFINE(VCPU_DSCR, offsetof(struct kvm_vcpu, arch.dscr)); 511 510 DEFINE(VCPU_AMR, offsetof(struct kvm_vcpu, arch.amr)); 512 511 DEFINE(VCPU_UAMOR, offsetof(struct kvm_vcpu, arch.uamor)); ··· 556 557 DEFINE(VCORE_LPCR, offsetof(struct kvmppc_vcore, lpcr)); 557 558 DEFINE(VCORE_PCR, offsetof(struct kvmppc_vcore, pcr)); 558 559 DEFINE(VCORE_DPDES, offsetof(struct kvmppc_vcore, dpdes)); 560 + DEFINE(VCORE_VTB, offsetof(struct kvmppc_vcore, vtb)); 559 561 DEFINE(VCPU_SLB_E, offsetof(struct kvmppc_slb, orige)); 560 562 DEFINE(VCPU_SLB_V, offsetof(struct kvmppc_slb, origv)); 561 563 DEFINE(VCPU_SLB_SIZE, sizeof(struct kvmppc_slb));
+3
arch/powerpc/kvm/Kconfig
··· 22 22 select ANON_INODES 23 23 select HAVE_KVM_EVENTFD 24 24 select SRCU 25 + select KVM_VFIO 26 + select IRQ_BYPASS_MANAGER 27 + select HAVE_KVM_IRQ_BYPASS 25 28 26 29 config KVM_BOOK3S_HANDLER 27 30 bool
+8 -11
arch/powerpc/kvm/Makefile
··· 7 7 ccflags-y := -Ivirt/kvm -Iarch/powerpc/kvm 8 8 KVM := ../../../virt/kvm 9 9 10 - common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \ 11 - $(KVM)/eventfd.o 10 + common-objs-y = $(KVM)/kvm_main.o $(KVM)/eventfd.o 12 11 common-objs-$(CONFIG_KVM_VFIO) += $(KVM)/vfio.o 12 + common-objs-$(CONFIG_KVM_MMIO) += $(KVM)/coalesced_mmio.o 13 13 14 14 CFLAGS_e500_mmu.o := -I. 15 15 CFLAGS_e500_mmu_host.o := -I. 16 16 CFLAGS_emulate.o := -I. 17 17 CFLAGS_emulate_loadstore.o := -I. 18 18 19 - common-objs-y += powerpc.o emulate.o emulate_loadstore.o 19 + common-objs-y += powerpc.o emulate_loadstore.o 20 20 obj-$(CONFIG_KVM_EXIT_TIMING) += timing.o 21 21 obj-$(CONFIG_KVM_BOOK3S_HANDLER) += book3s_exports.o 22 22 ··· 24 24 25 25 kvm-e500-objs := \ 26 26 $(common-objs-y) \ 27 + emulate.o \ 27 28 booke.o \ 28 29 booke_emulate.o \ 29 30 booke_interrupts.o \ ··· 36 35 37 36 kvm-e500mc-objs := \ 38 37 $(common-objs-y) \ 38 + emulate.o \ 39 39 booke.o \ 40 40 booke_emulate.o \ 41 41 bookehv_interrupts.o \ ··· 63 61 book3s_32_mmu.o 64 62 65 63 ifdef CONFIG_KVM_BOOK3S_PR_POSSIBLE 66 - kvm-book3s_64-module-objs := \ 67 - $(KVM)/coalesced_mmio.o 68 - 69 64 kvm-book3s_64-builtin-objs-$(CONFIG_KVM_BOOK3S_64_HANDLER) += \ 70 65 book3s_rmhandlers.o 71 66 endif ··· 88 89 kvm-book3s_64-objs-$(CONFIG_KVM_XICS) += \ 89 90 book3s_xics.o 90 91 91 - kvm-book3s_64-module-objs += \ 92 - $(KVM)/kvm_main.o \ 93 - $(KVM)/eventfd.o \ 94 - powerpc.o \ 95 - emulate_loadstore.o \ 92 + kvm-book3s_64-module-objs := \ 93 + $(common-objs-y) \ 96 94 book3s.o \ 97 95 book3s_64_vio.o \ 98 96 book3s_rtas.o \ ··· 99 103 100 104 kvm-book3s_32-objs := \ 101 105 $(common-objs-y) \ 106 + emulate.o \ 102 107 fpu.o \ 103 108 book3s_paired_singles.o \ 104 109 book3s.o \
+7 -6
arch/powerpc/kvm/book3s.c
··· 52 52 { "dec", VCPU_STAT(dec_exits) }, 53 53 { "ext_intr", VCPU_STAT(ext_intr_exits) }, 54 54 { "queue_intr", VCPU_STAT(queue_intr) }, 55 + { "halt_poll_success_ns", VCPU_STAT(halt_poll_success_ns) }, 56 + { "halt_poll_fail_ns", VCPU_STAT(halt_poll_fail_ns) }, 57 + { "halt_wait_ns", VCPU_STAT(halt_wait_ns) }, 55 58 { "halt_successful_poll", VCPU_STAT(halt_successful_poll), }, 56 59 { "halt_attempted_poll", VCPU_STAT(halt_attempted_poll), }, 60 + { "halt_successful_wait", VCPU_STAT(halt_successful_wait) }, 57 61 { "halt_poll_invalid", VCPU_STAT(halt_poll_invalid) }, 58 62 { "halt_wakeup", VCPU_STAT(halt_wakeup) }, 59 63 { "pf_storage", VCPU_STAT(pf_storage) }, ··· 68 64 { "ld_slow", VCPU_STAT(ld_slow) }, 69 65 { "st", VCPU_STAT(st) }, 70 66 { "st_slow", VCPU_STAT(st_slow) }, 67 + { "pthru_all", VCPU_STAT(pthru_all) }, 68 + { "pthru_host", VCPU_STAT(pthru_host) }, 69 + { "pthru_bad_aff", VCPU_STAT(pthru_bad_aff) }, 71 70 { NULL } 72 71 }; 73 72 ··· 599 592 case KVM_REG_PPC_BESCR: 600 593 *val = get_reg_val(id, vcpu->arch.bescr); 601 594 break; 602 - case KVM_REG_PPC_VTB: 603 - *val = get_reg_val(id, vcpu->arch.vtb); 604 - break; 605 595 case KVM_REG_PPC_IC: 606 596 *val = get_reg_val(id, vcpu->arch.ic); 607 597 break; ··· 669 665 break; 670 666 case KVM_REG_PPC_BESCR: 671 667 vcpu->arch.bescr = set_reg_val(id, *val); 672 - break; 673 - case KVM_REG_PPC_VTB: 674 - vcpu->arch.vtb = set_reg_val(id, *val); 675 668 break; 676 669 case KVM_REG_PPC_IC: 677 670 vcpu->arch.ic = set_reg_val(id, *val);
+3 -1
arch/powerpc/kvm/book3s_emulate.c
··· 498 498 case SPRN_MMCR0: 499 499 case SPRN_MMCR1: 500 500 case SPRN_MMCR2: 501 + case SPRN_UMMCR2: 501 502 #endif 502 503 break; 503 504 unprivileged: ··· 580 579 *spr_val = vcpu->arch.spurr; 581 580 break; 582 581 case SPRN_VTB: 583 - *spr_val = vcpu->arch.vtb; 582 + *spr_val = to_book3s(vcpu)->vtb; 584 583 break; 585 584 case SPRN_IC: 586 585 *spr_val = vcpu->arch.ic; ··· 641 640 case SPRN_MMCR0: 642 641 case SPRN_MMCR1: 643 642 case SPRN_MMCR2: 643 + case SPRN_UMMCR2: 644 644 case SPRN_TIR: 645 645 #endif 646 646 *spr_val = 0;
+373 -160
arch/powerpc/kvm/book3s_hv.c
··· 53 53 #include <asm/smp.h> 54 54 #include <asm/dbell.h> 55 55 #include <asm/hmi.h> 56 + #include <asm/pnv-pci.h> 56 57 #include <linux/gfp.h> 57 58 #include <linux/vmalloc.h> 58 59 #include <linux/highmem.h> 59 60 #include <linux/hugetlb.h> 61 + #include <linux/kvm_irqfd.h> 62 + #include <linux/irqbypass.h> 60 63 #include <linux/module.h> 64 + #include <linux/compiler.h> 61 65 62 66 #include "book3s.h" 63 67 ··· 74 70 75 71 /* Used to indicate that a guest page fault needs to be handled */ 76 72 #define RESUME_PAGE_FAULT (RESUME_GUEST | RESUME_FLAG_ARCH1) 73 + /* Used to indicate that a guest passthrough interrupt needs to be handled */ 74 + #define RESUME_PASSTHROUGH (RESUME_GUEST | RESUME_FLAG_ARCH2) 77 75 78 76 /* Used as a "null" value for timebase values */ 79 77 #define TB_NIL (~(u64)0) ··· 95 89 .get = param_get_int, 96 90 }; 97 91 92 + module_param_cb(kvm_irq_bypass, &module_param_ops, &kvm_irq_bypass, 93 + S_IRUGO | S_IWUSR); 94 + MODULE_PARM_DESC(kvm_irq_bypass, "Bypass passthrough interrupt optimization"); 95 + 98 96 module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect, 99 97 S_IRUGO | S_IWUSR); 100 98 MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core"); 101 99 #endif 102 100 101 + /* Maximum halt poll interval defaults to KVM_HALT_POLL_NS_DEFAULT */ 102 + static unsigned int halt_poll_max_ns = KVM_HALT_POLL_NS_DEFAULT; 103 + module_param(halt_poll_max_ns, uint, S_IRUGO | S_IWUSR); 104 + MODULE_PARM_DESC(halt_poll_max_ns, "Maximum halt poll time in ns"); 105 + 106 + /* Factor by which the vcore halt poll interval is grown, default is to double 107 + */ 108 + static unsigned int halt_poll_ns_grow = 2; 109 + module_param(halt_poll_ns_grow, int, S_IRUGO); 110 + MODULE_PARM_DESC(halt_poll_ns_grow, "Factor halt poll time is grown by"); 111 + 112 + /* Factor by which the vcore halt poll interval is shrunk, default is to reset 113 + */ 114 + static unsigned int halt_poll_ns_shrink; 115 + 
module_param(halt_poll_ns_shrink, int, S_IRUGO); 116 + MODULE_PARM_DESC(halt_poll_ns_shrink, "Factor halt poll time is shrunk by"); 117 + 103 118 static void kvmppc_end_cede(struct kvm_vcpu *vcpu); 104 119 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); 120 + 121 + static inline struct kvm_vcpu *next_runnable_thread(struct kvmppc_vcore *vc, 122 + int *ip) 123 + { 124 + int i = *ip; 125 + struct kvm_vcpu *vcpu; 126 + 127 + while (++i < MAX_SMT_THREADS) { 128 + vcpu = READ_ONCE(vc->runnable_threads[i]); 129 + if (vcpu) { 130 + *ip = i; 131 + return vcpu; 132 + } 133 + } 134 + return NULL; 135 + } 136 + 137 + /* Used to traverse the list of runnable threads for a given vcore */ 138 + #define for_each_runnable_thread(i, vcpu, vc) \ 139 + for (i = -1; (vcpu = next_runnable_thread(vc, &i)); ) 105 140 106 141 static bool kvmppc_ipi_thread(int cpu) 107 142 { ··· 1038 991 kvmppc_core_queue_program(vcpu, SRR1_PROGILL); 1039 992 r = RESUME_GUEST; 1040 993 break; 994 + case BOOK3S_INTERRUPT_HV_RM_HARD: 995 + r = RESUME_PASSTHROUGH; 996 + break; 1041 997 default: 1042 998 kvmppc_dump_regs(vcpu); 1043 999 printk(KERN_EMERG "trap=0x%x | pc=0x%lx | msr=0x%llx\n", ··· 1198 1148 break; 1199 1149 case KVM_REG_PPC_DPDES: 1200 1150 *val = get_reg_val(id, vcpu->arch.vcore->dpdes); 1151 + break; 1152 + case KVM_REG_PPC_VTB: 1153 + *val = get_reg_val(id, vcpu->arch.vcore->vtb); 1201 1154 break; 1202 1155 case KVM_REG_PPC_DAWR: 1203 1156 *val = get_reg_val(id, vcpu->arch.dawr); ··· 1394 1341 case KVM_REG_PPC_DPDES: 1395 1342 vcpu->arch.vcore->dpdes = set_reg_val(id, *val); 1396 1343 break; 1344 + case KVM_REG_PPC_VTB: 1345 + vcpu->arch.vcore->vtb = set_reg_val(id, *val); 1346 + break; 1397 1347 case KVM_REG_PPC_DAWR: 1398 1348 vcpu->arch.dawr = set_reg_val(id, *val); 1399 1349 break; ··· 1549 1493 if (vcore == NULL) 1550 1494 return NULL; 1551 1495 1552 - INIT_LIST_HEAD(&vcore->runnable_threads); 1553 1496 spin_lock_init(&vcore->lock); 1554 1497 
spin_lock_init(&vcore->stoltb_lock); 1555 1498 init_swait_queue_head(&vcore->wq); ··· 1857 1802 vcpu->arch.state = KVMPPC_VCPU_BUSY_IN_HOST; 1858 1803 spin_unlock_irq(&vcpu->arch.tbacct_lock); 1859 1804 --vc->n_runnable; 1860 - list_del(&vcpu->arch.run_list); 1805 + WRITE_ONCE(vc->runnable_threads[vcpu->arch.ptid], NULL); 1861 1806 } 1862 1807 1863 1808 static int kvmppc_grab_hwthread(int cpu) ··· 2103 2048 vc->conferring_threads = 0; 2104 2049 } 2105 2050 2106 - /* 2107 - * See if the existing subcores can be split into 3 (or fewer) subcores 2108 - * of at most two threads each, so we can fit in another vcore. This 2109 - * assumes there are at most two subcores and at most 6 threads in total. 2110 - */ 2111 - static bool can_split_piggybacked_subcores(struct core_info *cip) 2112 - { 2113 - int sub, new_sub; 2114 - int large_sub = -1; 2115 - int thr; 2116 - int n_subcores = cip->n_subcores; 2117 - struct kvmppc_vcore *vc, *vcnext; 2118 - struct kvmppc_vcore *master_vc = NULL; 2119 - 2120 - for (sub = 0; sub < cip->n_subcores; ++sub) { 2121 - if (cip->subcore_threads[sub] <= 2) 2122 - continue; 2123 - if (large_sub >= 0) 2124 - return false; 2125 - large_sub = sub; 2126 - vc = list_first_entry(&cip->vcs[sub], struct kvmppc_vcore, 2127 - preempt_list); 2128 - if (vc->num_threads > 2) 2129 - return false; 2130 - n_subcores += (cip->subcore_threads[sub] - 1) >> 1; 2131 - } 2132 - if (large_sub < 0 || !subcore_config_ok(n_subcores + 1, 2)) 2133 - return false; 2134 - 2135 - /* 2136 - * Seems feasible, so go through and move vcores to new subcores. 2137 - * Note that when we have two or more vcores in one subcore, 2138 - * all those vcores must have only one thread each. 
2139 - */ 2140 - new_sub = cip->n_subcores; 2141 - thr = 0; 2142 - sub = large_sub; 2143 - list_for_each_entry_safe(vc, vcnext, &cip->vcs[sub], preempt_list) { 2144 - if (thr >= 2) { 2145 - list_del(&vc->preempt_list); 2146 - list_add_tail(&vc->preempt_list, &cip->vcs[new_sub]); 2147 - /* vc->num_threads must be 1 */ 2148 - if (++cip->subcore_threads[new_sub] == 1) { 2149 - cip->subcore_vm[new_sub] = vc->kvm; 2150 - init_master_vcore(vc); 2151 - master_vc = vc; 2152 - ++cip->n_subcores; 2153 - } else { 2154 - vc->master_vcore = master_vc; 2155 - ++new_sub; 2156 - } 2157 - } 2158 - thr += vc->num_threads; 2159 - } 2160 - cip->subcore_threads[large_sub] = 2; 2161 - cip->max_subcore_threads = 2; 2162 - 2163 - return true; 2164 - } 2165 - 2166 2051 static bool can_dynamic_split(struct kvmppc_vcore *vc, struct core_info *cip) 2167 2052 { 2168 2053 int n_threads = vc->num_threads; ··· 2113 2118 2114 2119 if (n_threads < cip->max_subcore_threads) 2115 2120 n_threads = cip->max_subcore_threads; 2116 - if (subcore_config_ok(cip->n_subcores + 1, n_threads)) { 2117 - cip->max_subcore_threads = n_threads; 2118 - } else if (cip->n_subcores <= 2 && cip->total_threads <= 6 && 2119 - vc->num_threads <= 2) { 2120 - /* 2121 - * We may be able to fit another subcore in by 2122 - * splitting an existing subcore with 3 or 4 2123 - * threads into two 2-thread subcores, or one 2124 - * with 5 or 6 threads into three subcores. 2125 - * We can only do this if those subcores have 2126 - * piggybacked virtual cores. 
2127 - */ 2128 - if (!can_split_piggybacked_subcores(cip)) 2129 - return false; 2130 - } else { 2121 + if (!subcore_config_ok(cip->n_subcores + 1, n_threads)) 2131 2122 return false; 2132 - } 2123 + cip->max_subcore_threads = n_threads; 2133 2124 2134 2125 sub = cip->n_subcores; 2135 2126 ++cip->n_subcores; ··· 2129 2148 return true; 2130 2149 } 2131 2150 2132 - static bool can_piggyback_subcore(struct kvmppc_vcore *pvc, 2133 - struct core_info *cip, int sub) 2134 - { 2135 - struct kvmppc_vcore *vc; 2136 - int n_thr; 2137 - 2138 - vc = list_first_entry(&cip->vcs[sub], struct kvmppc_vcore, 2139 - preempt_list); 2140 - 2141 - /* require same VM and same per-core reg values */ 2142 - if (pvc->kvm != vc->kvm || 2143 - pvc->tb_offset != vc->tb_offset || 2144 - pvc->pcr != vc->pcr || 2145 - pvc->lpcr != vc->lpcr) 2146 - return false; 2147 - 2148 - /* P8 guest with > 1 thread per core would see wrong TIR value */ 2149 - if (cpu_has_feature(CPU_FTR_ARCH_207S) && 2150 - (vc->num_threads > 1 || pvc->num_threads > 1)) 2151 - return false; 2152 - 2153 - n_thr = cip->subcore_threads[sub] + pvc->num_threads; 2154 - if (n_thr > cip->max_subcore_threads) { 2155 - if (!subcore_config_ok(cip->n_subcores, n_thr)) 2156 - return false; 2157 - cip->max_subcore_threads = n_thr; 2158 - } 2159 - 2160 - cip->total_threads += pvc->num_threads; 2161 - cip->subcore_threads[sub] = n_thr; 2162 - pvc->master_vcore = vc; 2163 - list_del(&pvc->preempt_list); 2164 - list_add_tail(&pvc->preempt_list, &cip->vcs[sub]); 2165 - 2166 - return true; 2167 - } 2168 - 2169 2151 /* 2170 2152 * Work out whether it is possible to piggyback the execution of 2171 2153 * vcore *pvc onto the execution of the other vcores described in *cip. 
··· 2136 2192 static bool can_piggyback(struct kvmppc_vcore *pvc, struct core_info *cip, 2137 2193 int target_threads) 2138 2194 { 2139 - int sub; 2140 - 2141 2195 if (cip->total_threads + pvc->num_threads > target_threads) 2142 2196 return false; 2143 - for (sub = 0; sub < cip->n_subcores; ++sub) 2144 - if (cip->subcore_threads[sub] && 2145 - can_piggyback_subcore(pvc, cip, sub)) 2146 - return true; 2147 2197 2148 - if (can_dynamic_split(pvc, cip)) 2149 - return true; 2150 - 2151 - return false; 2198 + return can_dynamic_split(pvc, cip); 2152 2199 } 2153 2200 2154 2201 static void prepare_threads(struct kvmppc_vcore *vc) 2155 2202 { 2156 - struct kvm_vcpu *vcpu, *vnext; 2203 + int i; 2204 + struct kvm_vcpu *vcpu; 2157 2205 2158 - list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads, 2159 - arch.run_list) { 2206 + for_each_runnable_thread(i, vcpu, vc) { 2160 2207 if (signal_pending(vcpu->arch.run_task)) 2161 2208 vcpu->arch.ret = -EINTR; 2162 2209 else if (vcpu->arch.vpa.update_pending || ··· 2194 2259 2195 2260 static void post_guest_process(struct kvmppc_vcore *vc, bool is_master) 2196 2261 { 2197 - int still_running = 0; 2262 + int still_running = 0, i; 2198 2263 u64 now; 2199 2264 long ret; 2200 - struct kvm_vcpu *vcpu, *vnext; 2265 + struct kvm_vcpu *vcpu; 2201 2266 2202 2267 spin_lock(&vc->lock); 2203 2268 now = get_tb(); 2204 - list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads, 2205 - arch.run_list) { 2269 + for_each_runnable_thread(i, vcpu, vc) { 2206 2270 /* cancel pending dec exception if dec is positive */ 2207 2271 if (now < vcpu->arch.dec_expires && 2208 2272 kvmppc_core_pending_dec(vcpu)) ··· 2241 2307 } 2242 2308 if (vc->n_runnable > 0 && vc->runner == NULL) { 2243 2309 /* make sure there's a candidate runner awake */ 2244 - vcpu = list_first_entry(&vc->runnable_threads, 2245 - struct kvm_vcpu, arch.run_list); 2310 + i = -1; 2311 + vcpu = next_runnable_thread(vc, &i); 2246 2312 wake_up(&vcpu->arch.cpu_run); 2247 2313 } 2248 2314 } 
··· 2295 2361 */ 2296 2362 static noinline void kvmppc_run_core(struct kvmppc_vcore *vc) 2297 2363 { 2298 - struct kvm_vcpu *vcpu, *vnext; 2364 + struct kvm_vcpu *vcpu; 2299 2365 int i; 2300 2366 int srcu_idx; 2301 2367 struct core_info core_info; ··· 2331 2397 */ 2332 2398 if ((threads_per_core > 1) && 2333 2399 ((vc->num_threads > threads_per_subcore) || !on_primary_thread())) { 2334 - list_for_each_entry_safe(vcpu, vnext, &vc->runnable_threads, 2335 - arch.run_list) { 2400 + for_each_runnable_thread(i, vcpu, vc) { 2336 2401 vcpu->arch.ret = -EBUSY; 2337 2402 kvmppc_remove_runnable(vc, vcpu); 2338 2403 wake_up(&vcpu->arch.cpu_run); ··· 2410 2477 active |= 1 << thr; 2411 2478 list_for_each_entry(pvc, &core_info.vcs[sub], preempt_list) { 2412 2479 pvc->pcpu = pcpu + thr; 2413 - list_for_each_entry(vcpu, &pvc->runnable_threads, 2414 - arch.run_list) { 2480 + for_each_runnable_thread(i, vcpu, pvc) { 2415 2481 kvmppc_start_thread(vcpu, pvc); 2416 2482 kvmppc_create_dtl_entry(vcpu, pvc); 2417 2483 trace_kvm_guest_enter(vcpu); ··· 2536 2604 finish_wait(&vcpu->arch.cpu_run, &wait); 2537 2605 } 2538 2606 2607 + static void grow_halt_poll_ns(struct kvmppc_vcore *vc) 2608 + { 2609 + /* 10us base */ 2610 + if (vc->halt_poll_ns == 0 && halt_poll_ns_grow) 2611 + vc->halt_poll_ns = 10000; 2612 + else 2613 + vc->halt_poll_ns *= halt_poll_ns_grow; 2614 + 2615 + if (vc->halt_poll_ns > halt_poll_max_ns) 2616 + vc->halt_poll_ns = halt_poll_max_ns; 2617 + } 2618 + 2619 + static void shrink_halt_poll_ns(struct kvmppc_vcore *vc) 2620 + { 2621 + if (halt_poll_ns_shrink == 0) 2622 + vc->halt_poll_ns = 0; 2623 + else 2624 + vc->halt_poll_ns /= halt_poll_ns_shrink; 2625 + } 2626 + 2627 + /* Check to see if any of the runnable vcpus on the vcore have pending 2628 + * exceptions or are no longer ceded 2629 + */ 2630 + static int kvmppc_vcore_check_block(struct kvmppc_vcore *vc) 2631 + { 2632 + struct kvm_vcpu *vcpu; 2633 + int i; 2634 + 2635 + for_each_runnable_thread(i, vcpu, vc) { 2636 + 
if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) 2637 + return 1; 2638 + } 2639 + 2640 + return 0; 2641 + } 2642 + 2539 2643 /* 2540 2644 * All the vcpus in this vcore are idle, so wait for a decrementer 2541 2645 * or external interrupt to one of the vcpus. vc->lock is held. 2542 2646 */ 2543 2647 static void kvmppc_vcore_blocked(struct kvmppc_vcore *vc) 2544 2648 { 2545 - struct kvm_vcpu *vcpu; 2649 + ktime_t cur, start_poll, start_wait; 2546 2650 int do_sleep = 1; 2651 + u64 block_ns; 2547 2652 DECLARE_SWAITQUEUE(wait); 2548 2653 2549 - prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE); 2654 + /* Poll for pending exceptions and ceded state */ 2655 + cur = start_poll = ktime_get(); 2656 + if (vc->halt_poll_ns) { 2657 + ktime_t stop = ktime_add_ns(start_poll, vc->halt_poll_ns); 2658 + ++vc->runner->stat.halt_attempted_poll; 2550 2659 2551 - /* 2552 - * Check one last time for pending exceptions and ceded state after 2553 - * we put ourselves on the wait queue 2554 - */ 2555 - list_for_each_entry(vcpu, &vc->runnable_threads, arch.run_list) { 2556 - if (vcpu->arch.pending_exceptions || !vcpu->arch.ceded) { 2557 - do_sleep = 0; 2558 - break; 2660 + vc->vcore_state = VCORE_POLLING; 2661 + spin_unlock(&vc->lock); 2662 + 2663 + do { 2664 + if (kvmppc_vcore_check_block(vc)) { 2665 + do_sleep = 0; 2666 + break; 2667 + } 2668 + cur = ktime_get(); 2669 + } while (single_task_running() && ktime_before(cur, stop)); 2670 + 2671 + spin_lock(&vc->lock); 2672 + vc->vcore_state = VCORE_INACTIVE; 2673 + 2674 + if (!do_sleep) { 2675 + ++vc->runner->stat.halt_successful_poll; 2676 + goto out; 2559 2677 } 2560 2678 } 2561 2679 2562 - if (!do_sleep) { 2680 + prepare_to_swait(&vc->wq, &wait, TASK_INTERRUPTIBLE); 2681 + 2682 + if (kvmppc_vcore_check_block(vc)) { 2563 2683 finish_swait(&vc->wq, &wait); 2564 - return; 2684 + do_sleep = 0; 2685 + /* If we polled, count this as a successful poll */ 2686 + if (vc->halt_poll_ns) 2687 + ++vc->runner->stat.halt_successful_poll; 2688 + 
goto out; 2565 2689 } 2690 + 2691 + start_wait = ktime_get(); 2566 2692 2567 2693 vc->vcore_state = VCORE_SLEEPING; 2568 2694 trace_kvmppc_vcore_blocked(vc, 0); ··· 2630 2640 spin_lock(&vc->lock); 2631 2641 vc->vcore_state = VCORE_INACTIVE; 2632 2642 trace_kvmppc_vcore_blocked(vc, 1); 2643 + ++vc->runner->stat.halt_successful_wait; 2644 + 2645 + cur = ktime_get(); 2646 + 2647 + out: 2648 + block_ns = ktime_to_ns(cur) - ktime_to_ns(start_poll); 2649 + 2650 + /* Attribute wait time */ 2651 + if (do_sleep) { 2652 + vc->runner->stat.halt_wait_ns += 2653 + ktime_to_ns(cur) - ktime_to_ns(start_wait); 2654 + /* Attribute failed poll time */ 2655 + if (vc->halt_poll_ns) 2656 + vc->runner->stat.halt_poll_fail_ns += 2657 + ktime_to_ns(start_wait) - 2658 + ktime_to_ns(start_poll); 2659 + } else { 2660 + /* Attribute successful poll time */ 2661 + if (vc->halt_poll_ns) 2662 + vc->runner->stat.halt_poll_success_ns += 2663 + ktime_to_ns(cur) - 2664 + ktime_to_ns(start_poll); 2665 + } 2666 + 2667 + /* Adjust poll time */ 2668 + if (halt_poll_max_ns) { 2669 + if (block_ns <= vc->halt_poll_ns) 2670 + ; 2671 + /* We slept and blocked for longer than the max halt time */ 2672 + else if (vc->halt_poll_ns && block_ns > halt_poll_max_ns) 2673 + shrink_halt_poll_ns(vc); 2674 + /* We slept and our poll time is too small */ 2675 + else if (vc->halt_poll_ns < halt_poll_max_ns && 2676 + block_ns < halt_poll_max_ns) 2677 + grow_halt_poll_ns(vc); 2678 + } else 2679 + vc->halt_poll_ns = 0; 2680 + 2681 + trace_kvmppc_vcore_wakeup(do_sleep, block_ns); 2633 2682 } 2634 2683 2635 2684 static int kvmppc_run_vcpu(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu) 2636 2685 { 2637 - int n_ceded; 2686 + int n_ceded, i; 2638 2687 struct kvmppc_vcore *vc; 2639 - struct kvm_vcpu *v, *vn; 2688 + struct kvm_vcpu *v; 2640 2689 2641 2690 trace_kvmppc_run_vcpu_enter(vcpu); 2642 2691 ··· 2695 2666 vcpu->arch.stolen_logged = vcore_stolen_time(vc, mftb()); 2696 2667 vcpu->arch.state = KVMPPC_VCPU_RUNNABLE; 2697 2668 
vcpu->arch.busy_preempt = TB_NIL; 2698 - list_add_tail(&vcpu->arch.run_list, &vc->runnable_threads); 2669 + WRITE_ONCE(vc->runnable_threads[vcpu->arch.ptid], vcpu); 2699 2670 ++vc->n_runnable; 2700 2671 2701 2672 /* ··· 2735 2706 kvmppc_wait_for_exec(vc, vcpu, TASK_INTERRUPTIBLE); 2736 2707 continue; 2737 2708 } 2738 - list_for_each_entry_safe(v, vn, &vc->runnable_threads, 2739 - arch.run_list) { 2709 + for_each_runnable_thread(i, v, vc) { 2740 2710 kvmppc_core_prepare_to_enter(v); 2741 2711 if (signal_pending(v->arch.run_task)) { 2742 2712 kvmppc_remove_runnable(vc, v); ··· 2748 2720 if (!vc->n_runnable || vcpu->arch.state != KVMPPC_VCPU_RUNNABLE) 2749 2721 break; 2750 2722 n_ceded = 0; 2751 - list_for_each_entry(v, &vc->runnable_threads, arch.run_list) { 2723 + for_each_runnable_thread(i, v, vc) { 2752 2724 if (!v->arch.pending_exceptions) 2753 2725 n_ceded += v->arch.ceded; 2754 2726 else ··· 2787 2759 2788 2760 if (vc->n_runnable && vc->vcore_state == VCORE_INACTIVE) { 2789 2761 /* Wake up some vcpu to run the core */ 2790 - v = list_first_entry(&vc->runnable_threads, 2791 - struct kvm_vcpu, arch.run_list); 2762 + i = -1; 2763 + v = next_runnable_thread(vc, &i); 2792 2764 wake_up(&v->arch.cpu_run); 2793 2765 } 2794 2766 ··· 2846 2818 r = kvmppc_book3s_hv_page_fault(run, vcpu, 2847 2819 vcpu->arch.fault_dar, vcpu->arch.fault_dsisr); 2848 2820 srcu_read_unlock(&vcpu->kvm->srcu, srcu_idx); 2849 - } 2821 + } else if (r == RESUME_PASSTHROUGH) 2822 + r = kvmppc_xics_rm_complete(vcpu, 0); 2850 2823 } while (is_kvmppc_resume_guest(r)); 2851 2824 2852 2825 out: ··· 3276 3247 kvmppc_free_vcores(kvm); 3277 3248 3278 3249 kvmppc_free_hpt(kvm); 3250 + 3251 + kvmppc_free_pimap(kvm); 3279 3252 } 3280 3253 3281 3254 /* We don't need to emulate any privileged instructions or dcbz */ ··· 3312 3281 3313 3282 return 0; 3314 3283 } 3284 + 3285 + #ifdef CONFIG_KVM_XICS 3286 + 3287 + void kvmppc_free_pimap(struct kvm *kvm) 3288 + { 3289 + kfree(kvm->arch.pimap); 3290 + } 3291 + 3292 
+ static struct kvmppc_passthru_irqmap *kvmppc_alloc_pimap(void) 3293 + { 3294 + return kzalloc(sizeof(struct kvmppc_passthru_irqmap), GFP_KERNEL); 3295 + } 3296 + 3297 + static int kvmppc_set_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) 3298 + { 3299 + struct irq_desc *desc; 3300 + struct kvmppc_irq_map *irq_map; 3301 + struct kvmppc_passthru_irqmap *pimap; 3302 + struct irq_chip *chip; 3303 + int i; 3304 + 3305 + if (!kvm_irq_bypass) 3306 + return 1; 3307 + 3308 + desc = irq_to_desc(host_irq); 3309 + if (!desc) 3310 + return -EIO; 3311 + 3312 + mutex_lock(&kvm->lock); 3313 + 3314 + pimap = kvm->arch.pimap; 3315 + if (pimap == NULL) { 3316 + /* First call, allocate structure to hold IRQ map */ 3317 + pimap = kvmppc_alloc_pimap(); 3318 + if (pimap == NULL) { 3319 + mutex_unlock(&kvm->lock); 3320 + return -ENOMEM; 3321 + } 3322 + kvm->arch.pimap = pimap; 3323 + } 3324 + 3325 + /* 3326 + * For now, we only support interrupts for which the EOI operation 3327 + * is an OPAL call followed by a write to XIRR, since that's 3328 + * what our real-mode EOI code does. 3329 + */ 3330 + chip = irq_data_get_irq_chip(&desc->irq_data); 3331 + if (!chip || !is_pnv_opal_msi(chip)) { 3332 + pr_warn("kvmppc_set_passthru_irq_hv: Could not assign IRQ map for (%d,%d)\n", 3333 + host_irq, guest_gsi); 3334 + mutex_unlock(&kvm->lock); 3335 + return -ENOENT; 3336 + } 3337 + 3338 + /* 3339 + * See if we already have an entry for this guest IRQ number. 3340 + * If it's mapped to a hardware IRQ number, that's an error, 3341 + * otherwise re-use this entry. 
3342 + */ 3343 + for (i = 0; i < pimap->n_mapped; i++) { 3344 + if (guest_gsi == pimap->mapped[i].v_hwirq) { 3345 + if (pimap->mapped[i].r_hwirq) { 3346 + mutex_unlock(&kvm->lock); 3347 + return -EINVAL; 3348 + } 3349 + break; 3350 + } 3351 + } 3352 + 3353 + if (i == KVMPPC_PIRQ_MAPPED) { 3354 + mutex_unlock(&kvm->lock); 3355 + return -EAGAIN; /* table is full */ 3356 + } 3357 + 3358 + irq_map = &pimap->mapped[i]; 3359 + 3360 + irq_map->v_hwirq = guest_gsi; 3361 + irq_map->desc = desc; 3362 + 3363 + /* 3364 + * Order the above two stores before the next to serialize with 3365 + * the KVM real mode handler. 3366 + */ 3367 + smp_wmb(); 3368 + irq_map->r_hwirq = desc->irq_data.hwirq; 3369 + 3370 + if (i == pimap->n_mapped) 3371 + pimap->n_mapped++; 3372 + 3373 + kvmppc_xics_set_mapped(kvm, guest_gsi, desc->irq_data.hwirq); 3374 + 3375 + mutex_unlock(&kvm->lock); 3376 + 3377 + return 0; 3378 + } 3379 + 3380 + static int kvmppc_clr_passthru_irq(struct kvm *kvm, int host_irq, int guest_gsi) 3381 + { 3382 + struct irq_desc *desc; 3383 + struct kvmppc_passthru_irqmap *pimap; 3384 + int i; 3385 + 3386 + if (!kvm_irq_bypass) 3387 + return 0; 3388 + 3389 + desc = irq_to_desc(host_irq); 3390 + if (!desc) 3391 + return -EIO; 3392 + 3393 + mutex_lock(&kvm->lock); 3394 + 3395 + if (kvm->arch.pimap == NULL) { 3396 + mutex_unlock(&kvm->lock); 3397 + return 0; 3398 + } 3399 + pimap = kvm->arch.pimap; 3400 + 3401 + for (i = 0; i < pimap->n_mapped; i++) { 3402 + if (guest_gsi == pimap->mapped[i].v_hwirq) 3403 + break; 3404 + } 3405 + 3406 + if (i == pimap->n_mapped) { 3407 + mutex_unlock(&kvm->lock); 3408 + return -ENODEV; 3409 + } 3410 + 3411 + kvmppc_xics_clr_mapped(kvm, guest_gsi, pimap->mapped[i].r_hwirq); 3412 + 3413 + /* invalidate the entry */ 3414 + pimap->mapped[i].r_hwirq = 0; 3415 + 3416 + /* 3417 + * We don't free this structure even when the count goes to 3418 + * zero. The structure is freed when we destroy the VM. 
3419 + */ 3420 + 3421 + mutex_unlock(&kvm->lock); 3422 + return 0; 3423 + } 3424 + 3425 + static int kvmppc_irq_bypass_add_producer_hv(struct irq_bypass_consumer *cons, 3426 + struct irq_bypass_producer *prod) 3427 + { 3428 + int ret = 0; 3429 + struct kvm_kernel_irqfd *irqfd = 3430 + container_of(cons, struct kvm_kernel_irqfd, consumer); 3431 + 3432 + irqfd->producer = prod; 3433 + 3434 + ret = kvmppc_set_passthru_irq(irqfd->kvm, prod->irq, irqfd->gsi); 3435 + if (ret) 3436 + pr_info("kvmppc_set_passthru_irq (irq %d, gsi %d) fails: %d\n", 3437 + prod->irq, irqfd->gsi, ret); 3438 + 3439 + return ret; 3440 + } 3441 + 3442 + static void kvmppc_irq_bypass_del_producer_hv(struct irq_bypass_consumer *cons, 3443 + struct irq_bypass_producer *prod) 3444 + { 3445 + int ret; 3446 + struct kvm_kernel_irqfd *irqfd = 3447 + container_of(cons, struct kvm_kernel_irqfd, consumer); 3448 + 3449 + irqfd->producer = NULL; 3450 + 3451 + /* 3452 + * When producer of consumer is unregistered, we change back to 3453 + * default external interrupt handling mode - KVM real mode 3454 + * will switch back to host. 3455 + */ 3456 + ret = kvmppc_clr_passthru_irq(irqfd->kvm, prod->irq, irqfd->gsi); 3457 + if (ret) 3458 + pr_warn("kvmppc_clr_passthru_irq (irq %d, gsi %d) fails: %d\n", 3459 + prod->irq, irqfd->gsi, ret); 3460 + } 3461 + #endif 3315 3462 3316 3463 static long kvm_arch_vm_ioctl_hv(struct file *filp, 3317 3464 unsigned int ioctl, unsigned long arg) ··· 3609 3400 .fast_vcpu_kick = kvmppc_fast_vcpu_kick_hv, 3610 3401 .arch_vm_ioctl = kvm_arch_vm_ioctl_hv, 3611 3402 .hcall_implemented = kvmppc_hcall_impl_hv, 3403 + #ifdef CONFIG_KVM_XICS 3404 + .irq_bypass_add_producer = kvmppc_irq_bypass_add_producer_hv, 3405 + .irq_bypass_del_producer = kvmppc_irq_bypass_del_producer_hv, 3406 + #endif 3612 3407 }; 3613 3408 3614 3409 static int kvm_init_subcore_bitmap(void)
+156
arch/powerpc/kvm/book3s_hv_builtin.c
··· 25 25 #include <asm/xics.h> 26 26 #include <asm/dbell.h> 27 27 #include <asm/cputhreads.h> 28 + #include <asm/io.h> 28 29 29 30 #define KVM_CMA_CHUNK_ORDER 18 30 31 ··· 287 286 288 287 struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv; 289 288 EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv); 289 + 290 + #ifdef CONFIG_KVM_XICS 291 + static struct kvmppc_irq_map *get_irqmap(struct kvmppc_passthru_irqmap *pimap, 292 + u32 xisr) 293 + { 294 + int i; 295 + 296 + /* 297 + * We access the mapped array here without a lock. That 298 + * is safe because we never reduce the number of entries 299 + * in the array and we never change the v_hwirq field of 300 + * an entry once it is set. 301 + * 302 + * We have also carefully ordered the stores in the writer 303 + * and the loads here in the reader, so that if we find a matching 304 + * hwirq here, the associated GSI and irq_desc fields are valid. 305 + */ 306 + for (i = 0; i < pimap->n_mapped; i++) { 307 + if (xisr == pimap->mapped[i].r_hwirq) { 308 + /* 309 + * Order subsequent reads in the caller to serialize 310 + * with the writer. 311 + */ 312 + smp_rmb(); 313 + return &pimap->mapped[i]; 314 + } 315 + } 316 + return NULL; 317 + } 318 + 319 + /* 320 + * If we have an interrupt that's not an IPI, check if we have a 321 + * passthrough adapter and if so, check if this external interrupt 322 + * is for the adapter. 323 + * We will attempt to deliver the IRQ directly to the target VCPU's 324 + * ICP, the virtual ICP (based on affinity - the xive value in ICS). 325 + * 326 + * If the delivery fails or if this is not for a passthrough adapter, 327 + * return to the host to handle this interrupt. We earlier 328 + * saved a copy of the XIRR in the PACA, it will be picked up by 329 + * the host ICP driver. 
330 + */ 331 + static int kvmppc_check_passthru(u32 xisr, __be32 xirr) 332 + { 333 + struct kvmppc_passthru_irqmap *pimap; 334 + struct kvmppc_irq_map *irq_map; 335 + struct kvm_vcpu *vcpu; 336 + 337 + vcpu = local_paca->kvm_hstate.kvm_vcpu; 338 + if (!vcpu) 339 + return 1; 340 + pimap = kvmppc_get_passthru_irqmap(vcpu->kvm); 341 + if (!pimap) 342 + return 1; 343 + irq_map = get_irqmap(pimap, xisr); 344 + if (!irq_map) 345 + return 1; 346 + 347 + /* We're handling this interrupt, generic code doesn't need to */ 348 + local_paca->kvm_hstate.saved_xirr = 0; 349 + 350 + return kvmppc_deliver_irq_passthru(vcpu, xirr, irq_map, pimap); 351 + } 352 + 353 + #else 354 + static inline int kvmppc_check_passthru(u32 xisr, __be32 xirr) 355 + { 356 + return 1; 357 + } 358 + #endif 359 + 360 + /* 361 + * Determine what sort of external interrupt is pending (if any). 362 + * Returns: 363 + * 0 if no interrupt is pending 364 + * 1 if an interrupt is pending that needs to be handled by the host 365 + * 2 Passthrough that needs completion in the host 366 + * -1 if there was a guest wakeup IPI (which has now been cleared) 367 + * -2 if there is PCI passthrough external interrupt that was handled 368 + */ 369 + 370 + long kvmppc_read_intr(void) 371 + { 372 + unsigned long xics_phys; 373 + u32 h_xirr; 374 + __be32 xirr; 375 + u32 xisr; 376 + u8 host_ipi; 377 + 378 + /* see if a host IPI is pending */ 379 + host_ipi = local_paca->kvm_hstate.host_ipi; 380 + if (host_ipi) 381 + return 1; 382 + 383 + /* Now read the interrupt from the ICP */ 384 + xics_phys = local_paca->kvm_hstate.xics_phys; 385 + if (unlikely(!xics_phys)) 386 + return 1; 387 + 388 + /* 389 + * Save XIRR for later. Since we get control in reverse endian 390 + * on LE systems, save it byte reversed and fetch it back in 391 + * host endian. Note that xirr is the value read from the 392 + * XIRR register, while h_xirr is the host endian version. 
393 + */ 394 + xirr = _lwzcix(xics_phys + XICS_XIRR); 395 + h_xirr = be32_to_cpu(xirr); 396 + local_paca->kvm_hstate.saved_xirr = h_xirr; 397 + xisr = h_xirr & 0xffffff; 398 + /* 399 + * Ensure that the store/load complete to guarantee all side 400 + * effects of loading from XIRR has completed 401 + */ 402 + smp_mb(); 403 + 404 + /* if nothing pending in the ICP */ 405 + if (!xisr) 406 + return 0; 407 + 408 + /* We found something in the ICP... 409 + * 410 + * If it is an IPI, clear the MFRR and EOI it. 411 + */ 412 + if (xisr == XICS_IPI) { 413 + _stbcix(xics_phys + XICS_MFRR, 0xff); 414 + _stwcix(xics_phys + XICS_XIRR, xirr); 415 + /* 416 + * Need to ensure side effects of above stores 417 + * complete before proceeding. 418 + */ 419 + smp_mb(); 420 + 421 + /* 422 + * We need to re-check host IPI now in case it got set in the 423 + * meantime. If it's clear, we bounce the interrupt to the 424 + * guest 425 + */ 426 + host_ipi = local_paca->kvm_hstate.host_ipi; 427 + if (unlikely(host_ipi != 0)) { 428 + /* We raced with the host, 429 + * we need to resend that IPI, bummer 430 + */ 431 + _stbcix(xics_phys + XICS_MFRR, IPI_PRIORITY); 432 + /* Let side effects complete */ 433 + smp_mb(); 434 + return 1; 435 + } 436 + 437 + /* OK, it's an IPI for us */ 438 + local_paca->kvm_hstate.saved_xirr = 0; 439 + return -1; 440 + } 441 + 442 + return kvmppc_check_passthru(xisr, xirr); 443 + }
+120
arch/powerpc/kvm/book3s_hv_rm_xics.c
··· 10 10 #include <linux/kernel.h> 11 11 #include <linux/kvm_host.h> 12 12 #include <linux/err.h> 13 + #include <linux/kernel_stat.h> 13 14 14 15 #include <asm/kvm_book3s.h> 15 16 #include <asm/kvm_ppc.h> ··· 19 18 #include <asm/debug.h> 20 19 #include <asm/synch.h> 21 20 #include <asm/cputhreads.h> 21 + #include <asm/pgtable.h> 22 22 #include <asm/ppc-opcode.h> 23 + #include <asm/pnv-pci.h> 24 + #include <asm/opal.h> 23 25 24 26 #include "book3s_xics.h" 25 27 ··· 30 26 31 27 int h_ipi_redirect = 1; 32 28 EXPORT_SYMBOL(h_ipi_redirect); 29 + int kvm_irq_bypass = 1; 30 + EXPORT_SYMBOL(kvm_irq_bypass); 33 31 34 32 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, 35 33 u32 new_irq); 34 + static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu); 36 35 37 36 /* -- ICS routines -- */ 38 37 static void ics_rm_check_resend(struct kvmppc_xics *xics, ··· 715 708 icp->rm_action |= XICS_RM_NOTIFY_EOI; 716 709 icp->rm_eoied_irq = irq; 717 710 } 711 + 712 + if (state->host_irq) { 713 + ++vcpu->stat.pthru_all; 714 + if (state->intr_cpu != -1) { 715 + int pcpu = raw_smp_processor_id(); 716 + 717 + pcpu = cpu_first_thread_sibling(pcpu); 718 + ++vcpu->stat.pthru_host; 719 + if (state->intr_cpu != pcpu) { 720 + ++vcpu->stat.pthru_bad_aff; 721 + xics_opal_rm_set_server(state->host_irq, pcpu); 722 + } 723 + state->intr_cpu = -1; 724 + } 725 + } 718 726 bail: 719 727 return check_too_hard(xics, icp); 728 + } 729 + 730 + unsigned long eoi_rc; 731 + 732 + static void icp_eoi(struct irq_chip *c, u32 hwirq, u32 xirr) 733 + { 734 + unsigned long xics_phys; 735 + int64_t rc; 736 + 737 + rc = pnv_opal_pci_msi_eoi(c, hwirq); 738 + 739 + if (rc) 740 + eoi_rc = rc; 741 + 742 + iosync(); 743 + 744 + /* EOI it */ 745 + xics_phys = local_paca->kvm_hstate.xics_phys; 746 + _stwcix(xics_phys + XICS_XIRR, xirr); 747 + } 748 + 749 + static int xics_opal_rm_set_server(unsigned int hw_irq, int server_cpu) 750 + { 751 + unsigned int mangle_cpu = 
get_hard_smp_processor_id(server_cpu) << 2; 752 + 753 + return opal_rm_set_xive(hw_irq, mangle_cpu, DEFAULT_PRIORITY); 754 + } 755 + 756 + /* 757 + * Increment a per-CPU 32-bit unsigned integer variable. 758 + * Safe to call in real-mode. Handles vmalloc'ed addresses 759 + * 760 + * ToDo: Make this work for any integral type 761 + */ 762 + 763 + static inline void this_cpu_inc_rm(unsigned int __percpu *addr) 764 + { 765 + unsigned long l; 766 + unsigned int *raddr; 767 + int cpu = smp_processor_id(); 768 + 769 + raddr = per_cpu_ptr(addr, cpu); 770 + l = (unsigned long)raddr; 771 + 772 + if (REGION_ID(l) == VMALLOC_REGION_ID) { 773 + l = vmalloc_to_phys(raddr); 774 + raddr = (unsigned int *)l; 775 + } 776 + ++*raddr; 777 + } 778 + 779 + /* 780 + * We don't try to update the flags in the irq_desc 'istate' field in 781 + * here as would happen in the normal IRQ handling path for several reasons: 782 + * - state flags represent internal IRQ state and are not expected to be 783 + * updated outside the IRQ subsystem 784 + * - more importantly, these are useful for edge triggered interrupts, 785 + * IRQ probing, etc., but we are only handling MSI/MSIx interrupts here 786 + * and these states shouldn't apply to us. 787 + * 788 + * However, we do update irq_stats - we somewhat duplicate the code in 789 + * kstat_incr_irqs_this_cpu() for this since this function is defined 790 + * in irq/internal.h which we don't want to include here. 791 + * The only difference is that desc->kstat_irqs is an allocated per CPU 792 + * variable and could have been vmalloc'ed, so we can't directly 793 + * call __this_cpu_inc() on it. The kstat structure is a static 794 + * per CPU variable and it should be accessible by real-mode KVM. 
795 + * 796 + */ 797 + static void kvmppc_rm_handle_irq_desc(struct irq_desc *desc) 798 + { 799 + this_cpu_inc_rm(desc->kstat_irqs); 800 + __this_cpu_inc(kstat.irqs_sum); 801 + } 802 + 803 + long kvmppc_deliver_irq_passthru(struct kvm_vcpu *vcpu, 804 + u32 xirr, 805 + struct kvmppc_irq_map *irq_map, 806 + struct kvmppc_passthru_irqmap *pimap) 807 + { 808 + struct kvmppc_xics *xics; 809 + struct kvmppc_icp *icp; 810 + u32 irq; 811 + 812 + irq = irq_map->v_hwirq; 813 + xics = vcpu->kvm->arch.xics; 814 + icp = vcpu->arch.icp; 815 + 816 + kvmppc_rm_handle_irq_desc(irq_map->desc); 817 + icp_rm_deliver_irq(xics, icp, irq); 818 + 819 + /* EOI the interrupt */ 820 + icp_eoi(irq_desc_get_chip(irq_map->desc), irq_map->r_hwirq, xirr); 821 + 822 + if (check_too_hard(xics, icp) == H_TOO_HARD) 823 + return 2; 824 + else 825 + return -2; 720 826 } 721 827 722 828 /* --- Non-real mode XICS-related built-in routines --- */
+109 -88
arch/powerpc/kvm/book3s_hv_rmhandlers.S
··· 221 221 li r3, 0 /* Don't wake on privileged (OS) doorbell */ 222 222 b kvm_do_nap 223 223 224 + /* 225 + * kvm_novcpu_wakeup 226 + * Entered from kvm_start_guest if kvm_hstate.napping is set 227 + * to NAPPING_NOVCPU 228 + * r2 = kernel TOC 229 + * r13 = paca 230 + */ 224 231 kvm_novcpu_wakeup: 225 232 ld r1, HSTATE_HOST_R1(r13) 226 233 ld r5, HSTATE_KVM_VCORE(r13) ··· 236 229 237 230 /* check the wake reason */ 238 231 bl kvmppc_check_wake_reason 232 + 233 + /* 234 + * Restore volatile registers since we could have called 235 + * a C routine in kvmppc_check_wake_reason. 236 + * r5 = VCORE 237 + */ 238 + ld r5, HSTATE_KVM_VCORE(r13) 239 239 240 240 /* see if any other thread is already exiting */ 241 241 lwz r0, VCORE_ENTRY_EXIT(r5) ··· 336 322 337 323 /* Check the wake reason in SRR1 to see why we got here */ 338 324 bl kvmppc_check_wake_reason 325 + /* 326 + * kvmppc_check_wake_reason could invoke a C routine, but we 327 + * have no volatile registers to restore when we return. 328 + */ 329 + 339 330 cmpdi r3, 0 340 331 bge kvm_no_guest 341 332 ··· 644 625 38: 645 626 646 627 BEGIN_FTR_SECTION 647 - /* DPDES is shared between threads */ 628 + /* DPDES and VTB are shared between threads */ 648 629 ld r8, VCORE_DPDES(r5) 630 + ld r7, VCORE_VTB(r5) 649 631 mtspr SPRN_DPDES, r8 632 + mtspr SPRN_VTB, r7 650 633 END_FTR_SECTION_IFSET(CPU_FTR_ARCH_207S) 651 634 652 635 /* Mark the subcore state as inside guest */ ··· 808 787 mtspr SPRN_CIABR, r7 809 788 mtspr SPRN_TAR, r8 810 789 ld r5, VCPU_IC(r4) 811 - ld r6, VCPU_VTB(r4) 812 - mtspr SPRN_IC, r5 813 - mtspr SPRN_VTB, r6 814 790 ld r8, VCPU_EBBHR(r4) 791 + mtspr SPRN_IC, r5 815 792 mtspr SPRN_EBBHR, r8 816 793 ld r5, VCPU_EBBRR(r4) 817 794 ld r6, VCPU_BESCR(r4) ··· 900 881 cmpwi r3, 512 /* 1 microsecond */ 901 882 blt hdec_soon 902 883 884 + deliver_guest_interrupt: 903 885 ld r6, VCPU_CTR(r4) 904 886 ld r7, VCPU_XER(r4) 905 887 ··· 915 895 mtspr SPRN_SRR0, r6 916 896 mtspr SPRN_SRR1, r7 917 897 918 - 
deliver_guest_interrupt: 919 898 /* r11 = vcpu->arch.msr & ~MSR_HV */ 920 899 rldicl r11, r11, 63 - MSR_HV_LG, 1 921 900 rotldi r11, r11, 1 + MSR_HV_LG ··· 1174 1155 * set, we know the host wants us out so let's do it now 1175 1156 */ 1176 1157 bl kvmppc_read_intr 1158 + 1159 + /* 1160 + * Restore the active volatile registers after returning from 1161 + * a C function. 1162 + */ 1163 + ld r9, HSTATE_KVM_VCPU(r13) 1164 + li r12, BOOK3S_INTERRUPT_EXTERNAL 1165 + 1166 + /* 1167 + * kvmppc_read_intr return codes: 1168 + * 1169 + * Exit to host (r3 > 0) 1170 + * 1 An interrupt is pending that needs to be handled by the host 1171 + * Exit guest and return to host by branching to guest_exit_cont 1172 + * 1173 + * 2 Passthrough that needs completion in the host 1174 + * Exit guest and return to host by branching to guest_exit_cont 1175 + * However, we also set r12 to BOOK3S_INTERRUPT_HV_RM_HARD 1176 + * to indicate to the host to complete handling the interrupt 1177 + * 1178 + * Before returning to guest, we check if any CPU is heading out 1179 + * to the host and if so, we head out also. If no CPUs are heading 1180 + * check return values <= 0. 1181 + * 1182 + * Return to guest (r3 <= 0) 1183 + * 0 No external interrupt is pending 1184 + * -1 A guest wakeup IPI (which has now been cleared) 1185 + * In either case, we return to guest to deliver any pending 1186 + * guest interrupts. 1187 + * 1188 + * -2 A PCI passthrough external interrupt was handled 1189 + * (interrupt was delivered directly to guest) 1190 + * Return to guest to deliver any pending guest interrupts. 
1191 + */ 1192 + 1193 + cmpdi r3, 1 1194 + ble 1f 1195 + 1196 + /* Return code = 2 */ 1197 + li r12, BOOK3S_INTERRUPT_HV_RM_HARD 1198 + stw r12, VCPU_TRAP(r9) 1199 + b guest_exit_cont 1200 + 1201 + 1: /* Return code <= 1 */ 1177 1202 cmpdi r3, 0 1178 1203 bgt guest_exit_cont 1179 1204 1180 - /* Check if any CPU is heading out to the host, if so head out too */ 1205 + /* Return code <= 0 */ 1181 1206 4: ld r5, HSTATE_KVM_VCORE(r13) 1182 1207 lwz r0, VCORE_ENTRY_EXIT(r5) 1183 1208 cmpwi r0, 0x100 ··· 1334 1271 stw r6, VCPU_PSPB(r9) 1335 1272 std r7, VCPU_FSCR(r9) 1336 1273 mfspr r5, SPRN_IC 1337 - mfspr r6, SPRN_VTB 1338 1274 mfspr r7, SPRN_TAR 1339 1275 std r5, VCPU_IC(r9) 1340 - std r6, VCPU_VTB(r9) 1341 1276 std r7, VCPU_TAR(r9) 1342 1277 mfspr r8, SPRN_EBBHR 1343 1278 std r8, VCPU_EBBHR(r9) ··· 1562 1501 isync 1563 1502 1564 1503 BEGIN_FTR_SECTION 1565 - /* DPDES is shared between threads */ 1504 + /* DPDES and VTB are shared between threads */ 1566 1505 mfspr r7, SPRN_DPDES 1506 + mfspr r8, SPRN_VTB 1567 1507 std r7, VCORE_DPDES(r5) 1508 + std r8, VCORE_VTB(r5) 1568 1509 /* clear DPDES so we don't get guest doorbells in the host */ 1569 1510 li r8, 0 1570 1511 mtspr SPRN_DPDES, r8 ··· 2276 2213 ld r29, VCPU_GPR(R29)(r4) 2277 2214 ld r30, VCPU_GPR(R30)(r4) 2278 2215 ld r31, VCPU_GPR(R31)(r4) 2279 - 2216 + 2280 2217 /* Check the wake reason in SRR1 to see why we got here */ 2281 2218 bl kvmppc_check_wake_reason 2219 + 2220 + /* 2221 + * Restore volatile registers since we could have called a 2222 + * C routine in kvmppc_check_wake_reason 2223 + * r4 = VCPU 2224 + * r3 tells us whether we need to return to host or not 2225 + * WARNING: it gets checked further down: 2226 + * should not modify r3 until this check is done. 
2227 + */ 2228 + ld r4, HSTATE_KVM_VCPU(r13) 2282 2229 2283 2230 /* clear our bit in vcore->napping_threads */ 2284 2231 34: ld r5,HSTATE_KVM_VCORE(r13) ··· 2303 2230 li r0,0 2304 2231 stb r0,HSTATE_NAPPING(r13) 2305 2232 2306 - /* See if the wake reason means we need to exit */ 2233 + /* See if the wake reason saved in r3 means we need to exit */ 2307 2234 stw r12, VCPU_TRAP(r4) 2308 2235 mr r9, r4 2309 2236 cmpdi r3, 0 ··· 2370 2297 * 0 if nothing needs to be done 2371 2298 * 1 if something happened that needs to be handled by the host 2372 2299 * -1 if there was a guest wakeup (IPI or msgsnd) 2300 + * -2 if we handled a PCI passthrough interrupt (returned by 2301 + * kvmppc_read_intr only) 2373 2302 * 2374 2303 * Also sets r12 to the interrupt vector for any interrupt that needs 2375 2304 * to be handled now by the host (0x500 for external interrupt), or zero. 2376 - * Modifies r0, r6, r7, r8. 2305 + * Modifies all volatile registers (since it may call a C function). 2306 + * This routine calls kvmppc_read_intr, a C function, if an external 2307 + * interrupt is pending. 2377 2308 */ 2378 2309 kvmppc_check_wake_reason: 2379 2310 mfspr r6, SPRN_SRR1 ··· 2387 2310 rlwinm r6, r6, 45-31, 0xe /* P7 wake reason field is 3 bits */ 2388 2311 ALT_FTR_SECTION_END_IFSET(CPU_FTR_ARCH_207S) 2389 2312 cmpwi r6, 8 /* was it an external interrupt? */ 2390 - li r12, BOOK3S_INTERRUPT_EXTERNAL 2391 - beq kvmppc_read_intr /* if so, see what it was */ 2313 + beq 7f /* if so, see what it was */ 2392 2314 li r3, 0 2393 2315 li r12, 0 2394 2316 cmpwi r6, 6 /* was it the decrementer? */ ··· 2426 2350 li r3, 1 2427 2351 blr 2428 2352 2429 - /* 2430 - * Determine what sort of external interrupt is pending (if any). 2431 - * Returns: 2432 - * 0 if no interrupt is pending 2433 - * 1 if an interrupt is pending that needs to be handled by the host 2434 - * -1 if there was a guest wakeup IPI (which has now been cleared) 2435 - * Modifies r0, r6, r7, r8, returns value in r3. 
2436 - */ 2437 - kvmppc_read_intr: 2438 - /* see if a host IPI is pending */ 2439 - li r3, 1 2440 - lbz r0, HSTATE_HOST_IPI(r13) 2441 - cmpwi r0, 0 2442 - bne 1f 2353 + /* external interrupt - create a stack frame so we can call C */ 2354 + 7: mflr r0 2355 + std r0, PPC_LR_STKOFF(r1) 2356 + stdu r1, -PPC_MIN_STKFRM(r1) 2357 + bl kvmppc_read_intr 2358 + nop 2359 + li r12, BOOK3S_INTERRUPT_EXTERNAL 2360 + cmpdi r3, 1 2361 + ble 1f 2443 2362 2444 - /* Now read the interrupt from the ICP */ 2445 - ld r6, HSTATE_XICS_PHYS(r13) 2446 - li r7, XICS_XIRR 2447 - cmpdi r6, 0 2448 - beq- 1f 2449 - lwzcix r0, r6, r7 2450 2363 /* 2451 - * Save XIRR for later. Since we get in in reverse endian on LE 2452 - * systems, save it byte reversed and fetch it back in host endian. 2364 + * Return code of 2 means PCI passthrough interrupt, but 2365 + * we need to return back to host to complete handling the 2366 + * interrupt. Trap reason is expected in r12 by guest 2367 + * exit code. 2453 2368 */ 2454 - li r3, HSTATE_SAVED_XIRR 2455 - STWX_BE r0, r3, r13 2456 - #ifdef __LITTLE_ENDIAN__ 2457 - lwz r3, HSTATE_SAVED_XIRR(r13) 2458 - #else 2459 - mr r3, r0 2460 - #endif 2461 - rlwinm. r3, r3, 0, 0xffffff 2462 - sync 2463 - beq 1f /* if nothing pending in the ICP */ 2464 - 2465 - /* We found something in the ICP... 2466 - * 2467 - * If it's not an IPI, stash it in the PACA and return to 2468 - * the host, we don't (yet) handle directing real external 2469 - * interrupts directly to the guest 2470 - */ 2471 - cmpwi r3, XICS_IPI /* if there is, is it an IPI? */ 2472 - bne 42f 2473 - 2474 - /* It's an IPI, clear the MFRR and EOI it */ 2475 - li r3, 0xff 2476 - li r8, XICS_MFRR 2477 - stbcix r3, r6, r8 /* clear the IPI */ 2478 - stwcix r0, r6, r7 /* EOI it */ 2479 - sync 2480 - 2481 - /* We need to re-check host IPI now in case it got set in the 2482 - * meantime. 
If it's clear, we bounce the interrupt to the 2483 - * guest 2484 - */ 2485 - lbz r0, HSTATE_HOST_IPI(r13) 2486 - cmpwi r0, 0 2487 - bne- 43f 2488 - 2489 - /* OK, it's an IPI for us */ 2490 - li r12, 0 2491 - li r3, -1 2492 - 1: blr 2493 - 2494 - 42: /* It's not an IPI and it's for the host. We saved a copy of XIRR in 2495 - * the PACA earlier, it will be picked up by the host ICP driver 2496 - */ 2497 - li r3, 1 2498 - b 1b 2499 - 2500 - 43: /* We raced with the host, we need to resend that IPI, bummer */ 2501 - li r0, IPI_PRIORITY 2502 - stbcix r0, r6, r8 /* set the IPI */ 2503 - sync 2504 - li r3, 1 2505 - b 1b 2369 + li r12, BOOK3S_INTERRUPT_HV_RM_HARD 2370 + 1: 2371 + ld r0, PPC_MIN_STKFRM+PPC_LR_STKOFF(r1) 2372 + addi r1, r1, PPC_MIN_STKFRM 2373 + mtlr r0 2374 + blr 2506 2375 2507 2376 /* 2508 2377 * Save away FP, VMX and VSX registers.
+9 -1
arch/powerpc/kvm/book3s_pr.c
··· 226 226 */ 227 227 vcpu->arch.purr += get_tb() - vcpu->arch.entry_tb; 228 228 vcpu->arch.spurr += get_tb() - vcpu->arch.entry_tb; 229 - vcpu->arch.vtb += get_vtb() - vcpu->arch.entry_vtb; 229 + to_book3s(vcpu)->vtb += get_vtb() - vcpu->arch.entry_vtb; 230 230 if (cpu_has_feature(CPU_FTR_ARCH_207S)) 231 231 vcpu->arch.ic += mfspr(SPRN_IC) - vcpu->arch.entry_ic; 232 232 svcpu->in_use = false; ··· 448 448 case PVR_POWER7: 449 449 case PVR_POWER7p: 450 450 case PVR_POWER8: 451 + case PVR_POWER8E: 452 + case PVR_POWER8NVL: 451 453 vcpu->arch.hflags |= BOOK3S_HFLAG_MULTI_PGSIZE | 452 454 BOOK3S_HFLAG_NEW_TLBIE; 453 455 break; ··· 1363 1361 case KVM_REG_PPC_HIOR: 1364 1362 *val = get_reg_val(id, to_book3s(vcpu)->hior); 1365 1363 break; 1364 + case KVM_REG_PPC_VTB: 1365 + *val = get_reg_val(id, to_book3s(vcpu)->vtb); 1366 + break; 1366 1367 case KVM_REG_PPC_LPCR: 1367 1368 case KVM_REG_PPC_LPCR_64: 1368 1369 /* ··· 1401 1396 case KVM_REG_PPC_HIOR: 1402 1397 to_book3s(vcpu)->hior = set_reg_val(id, *val); 1403 1398 to_book3s(vcpu)->hior_explicit = true; 1399 + break; 1400 + case KVM_REG_PPC_VTB: 1401 + to_book3s(vcpu)->vtb = set_reg_val(id, *val); 1404 1402 break; 1405 1403 case KVM_REG_PPC_LPCR: 1406 1404 case KVM_REG_PPC_LPCR_64:
+56 -1
arch/powerpc/kvm/book3s_xics.c
··· 99 99 return 0; 100 100 } 101 101 102 + /* Record which CPU this arrived on for passed-through interrupts */ 103 + if (state->host_irq) 104 + state->intr_cpu = raw_smp_processor_id(); 105 + 102 106 /* Attempt delivery */ 103 107 icp_deliver_irq(xics, NULL, irq); 104 108 ··· 816 812 return H_SUCCESS; 817 813 } 818 814 819 - static noinline int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall) 815 + int kvmppc_xics_rm_complete(struct kvm_vcpu *vcpu, u32 hcall) 820 816 { 821 817 struct kvmppc_xics *xics = vcpu->kvm->arch.xics; 822 818 struct kvmppc_icp *icp = vcpu->arch.icp; ··· 845 841 846 842 return H_SUCCESS; 847 843 } 844 + EXPORT_SYMBOL_GPL(kvmppc_xics_rm_complete); 848 845 849 846 int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 req) 850 847 { ··· 897 892 898 893 /* -- Initialisation code etc. -- */ 899 894 895 + static void xics_debugfs_irqmap(struct seq_file *m, 896 + struct kvmppc_passthru_irqmap *pimap) 897 + { 898 + int i; 899 + 900 + if (!pimap) 901 + return; 902 + seq_printf(m, "========\nPIRQ mappings: %d maps\n===========\n", 903 + pimap->n_mapped); 904 + for (i = 0; i < pimap->n_mapped; i++) { 905 + seq_printf(m, "r_hwirq=%x, v_hwirq=%x\n", 906 + pimap->mapped[i].r_hwirq, pimap->mapped[i].v_hwirq); 907 + } 908 + } 909 + 900 910 static int xics_debug_show(struct seq_file *m, void *private) 901 911 { 902 912 struct kvmppc_xics *xics = m->private; ··· 932 912 t_rm_reject = 0; 933 913 t_check_resend = 0; 934 914 t_reject = 0; 915 + 916 + xics_debugfs_irqmap(m, kvm->arch.pimap); 935 917 936 918 seq_printf(m, "=========\nICP state\n=========\n"); 937 919 ··· 1274 1252 { 1275 1253 struct kvmppc_xics *xics = kvm->arch.xics; 1276 1254 1255 + if (!xics) 1256 + return -ENODEV; 1277 1257 return ics_deliver_irq(xics, irq, level); 1278 1258 } 1279 1259 ··· 1442 1418 { 1443 1419 return pin; 1444 1420 } 1421 + 1422 + void kvmppc_xics_set_mapped(struct kvm *kvm, unsigned long irq, 1423 + unsigned long host_irq) 1424 + { 1425 + struct kvmppc_xics *xics = 
kvm->arch.xics; 1426 + struct kvmppc_ics *ics; 1427 + u16 idx; 1428 + 1429 + ics = kvmppc_xics_find_ics(xics, irq, &idx); 1430 + if (!ics) 1431 + return; 1432 + 1433 + ics->irq_state[idx].host_irq = host_irq; 1434 + ics->irq_state[idx].intr_cpu = -1; 1435 + } 1436 + EXPORT_SYMBOL_GPL(kvmppc_xics_set_mapped); 1437 + 1438 + void kvmppc_xics_clr_mapped(struct kvm *kvm, unsigned long irq, 1439 + unsigned long host_irq) 1440 + { 1441 + struct kvmppc_xics *xics = kvm->arch.xics; 1442 + struct kvmppc_ics *ics; 1443 + u16 idx; 1444 + 1445 + ics = kvmppc_xics_find_ics(xics, irq, &idx); 1446 + if (!ics) 1447 + return; 1448 + 1449 + ics->irq_state[idx].host_irq = 0; 1450 + } 1451 + EXPORT_SYMBOL_GPL(kvmppc_xics_clr_mapped);
+2
arch/powerpc/kvm/book3s_xics.h
··· 42 42 u8 lsi; /* level-sensitive interrupt */ 43 43 u8 asserted; /* Only for LSI */ 44 44 u8 exists; 45 + int intr_cpu; 46 + u32 host_irq; 45 47 }; 46 48 47 49 /* Atomic ICP state, updated with a single compare & swap */
+1 -1
arch/powerpc/kvm/booke.c
··· 2038 2038 if (type == KVMPPC_DEBUG_NONE) 2039 2039 continue; 2040 2040 2041 - if (type & !(KVMPPC_DEBUG_WATCH_READ | 2041 + if (type & ~(KVMPPC_DEBUG_WATCH_READ | 2042 2042 KVMPPC_DEBUG_WATCH_WRITE | 2043 2043 KVMPPC_DEBUG_BREAKPOINT)) 2044 2044 return -EINVAL;
+38 -37
arch/powerpc/kvm/e500_mmu.c
··· 743 743 char *virt; 744 744 struct page **pages; 745 745 struct tlbe_priv *privs[2] = {}; 746 - u64 *g2h_bitmap = NULL; 746 + u64 *g2h_bitmap; 747 747 size_t array_len; 748 748 u32 sets; 749 749 int num_pages, ret, i; ··· 779 779 780 780 num_pages = DIV_ROUND_UP(cfg->array + array_len - 1, PAGE_SIZE) - 781 781 cfg->array / PAGE_SIZE; 782 - pages = kmalloc(sizeof(struct page *) * num_pages, GFP_KERNEL); 782 + pages = kmalloc_array(num_pages, sizeof(*pages), GFP_KERNEL); 783 783 if (!pages) 784 784 return -ENOMEM; 785 785 786 786 ret = get_user_pages_fast(cfg->array, num_pages, 1, pages); 787 787 if (ret < 0) 788 - goto err_pages; 788 + goto free_pages; 789 789 790 790 if (ret != num_pages) { 791 791 num_pages = ret; 792 792 ret = -EFAULT; 793 - goto err_put_page; 793 + goto put_pages; 794 794 } 795 795 796 796 virt = vmap(pages, num_pages, VM_MAP, PAGE_KERNEL); 797 797 if (!virt) { 798 798 ret = -ENOMEM; 799 - goto err_put_page; 799 + goto put_pages; 800 800 } 801 801 802 - privs[0] = kzalloc(sizeof(struct tlbe_priv) * params.tlb_sizes[0], 803 - GFP_KERNEL); 804 - privs[1] = kzalloc(sizeof(struct tlbe_priv) * params.tlb_sizes[1], 805 - GFP_KERNEL); 806 - 807 - if (!privs[0] || !privs[1]) { 802 + privs[0] = kcalloc(params.tlb_sizes[0], sizeof(*privs[0]), GFP_KERNEL); 803 + if (!privs[0]) { 808 804 ret = -ENOMEM; 809 - goto err_privs; 805 + goto put_pages; 810 806 } 811 807 812 - g2h_bitmap = kzalloc(sizeof(u64) * params.tlb_sizes[1], 813 - GFP_KERNEL); 808 + privs[1] = kcalloc(params.tlb_sizes[1], sizeof(*privs[1]), GFP_KERNEL); 809 + if (!privs[1]) { 810 + ret = -ENOMEM; 811 + goto free_privs_first; 812 + } 813 + 814 + g2h_bitmap = kcalloc(params.tlb_sizes[1], 815 + sizeof(*g2h_bitmap), 816 + GFP_KERNEL); 814 817 if (!g2h_bitmap) { 815 818 ret = -ENOMEM; 816 - goto err_privs; 819 + goto free_privs_second; 817 820 } 818 821 819 822 free_gtlb(vcpu_e500); ··· 848 845 849 846 kvmppc_recalc_tlb1map_range(vcpu_e500); 850 847 return 0; 851 - 852 - err_privs: 853 - 
kfree(privs[0]); 848 + free_privs_second: 854 849 kfree(privs[1]); 855 - 856 - err_put_page: 850 + free_privs_first: 851 + kfree(privs[0]); 852 + put_pages: 857 853 for (i = 0; i < num_pages; i++) 858 854 put_page(pages[i]); 859 - 860 - err_pages: 855 + free_pages: 861 856 kfree(pages); 862 857 return ret; 863 858 } ··· 905 904 int kvmppc_e500_tlb_init(struct kvmppc_vcpu_e500 *vcpu_e500) 906 905 { 907 906 struct kvm_vcpu *vcpu = &vcpu_e500->vcpu; 908 - int entry_size = sizeof(struct kvm_book3e_206_tlb_entry); 909 - int entries = KVM_E500_TLB0_SIZE + KVM_E500_TLB1_SIZE; 910 907 911 908 if (e500_mmu_host_init(vcpu_e500)) 912 - goto err; 909 + goto free_vcpu; 913 910 914 911 vcpu_e500->gtlb_params[0].entries = KVM_E500_TLB0_SIZE; 915 912 vcpu_e500->gtlb_params[1].entries = KVM_E500_TLB1_SIZE; ··· 919 920 vcpu_e500->gtlb_params[1].ways = KVM_E500_TLB1_SIZE; 920 921 vcpu_e500->gtlb_params[1].sets = 1; 921 922 922 - vcpu_e500->gtlb_arch = kmalloc(entries * entry_size, GFP_KERNEL); 923 + vcpu_e500->gtlb_arch = kmalloc_array(KVM_E500_TLB0_SIZE + 924 + KVM_E500_TLB1_SIZE, 925 + sizeof(*vcpu_e500->gtlb_arch), 926 + GFP_KERNEL); 923 927 if (!vcpu_e500->gtlb_arch) 924 928 return -ENOMEM; 925 929 926 930 vcpu_e500->gtlb_offset[0] = 0; 927 931 vcpu_e500->gtlb_offset[1] = KVM_E500_TLB0_SIZE; 928 932 929 - vcpu_e500->gtlb_priv[0] = kzalloc(sizeof(struct tlbe_ref) * 930 - vcpu_e500->gtlb_params[0].entries, 933 + vcpu_e500->gtlb_priv[0] = kcalloc(vcpu_e500->gtlb_params[0].entries, 934 + sizeof(struct tlbe_ref), 931 935 GFP_KERNEL); 932 936 if (!vcpu_e500->gtlb_priv[0]) 933 - goto err; 937 + goto free_vcpu; 934 938 935 - vcpu_e500->gtlb_priv[1] = kzalloc(sizeof(struct tlbe_ref) * 936 - vcpu_e500->gtlb_params[1].entries, 939 + vcpu_e500->gtlb_priv[1] = kcalloc(vcpu_e500->gtlb_params[1].entries, 940 + sizeof(struct tlbe_ref), 937 941 GFP_KERNEL); 938 942 if (!vcpu_e500->gtlb_priv[1]) 939 - goto err; 943 + goto free_vcpu; 940 944 941 - vcpu_e500->g2h_tlb1_map = kzalloc(sizeof(u64) * 942 
- vcpu_e500->gtlb_params[1].entries, 945 + vcpu_e500->g2h_tlb1_map = kcalloc(vcpu_e500->gtlb_params[1].entries, 946 + sizeof(*vcpu_e500->g2h_tlb1_map), 943 947 GFP_KERNEL); 944 948 if (!vcpu_e500->g2h_tlb1_map) 945 - goto err; 949 + goto free_vcpu; 946 950 947 951 vcpu_mmu_init(vcpu, vcpu_e500->gtlb_params); 948 952 949 953 kvmppc_recalc_tlb1map_range(vcpu_e500); 950 954 return 0; 951 - 952 - err: 955 + free_vcpu: 953 956 free_gtlb(vcpu_e500); 954 957 return -1; 955 958 }
+61
arch/powerpc/kvm/powerpc.c
··· 27 27 #include <linux/slab.h> 28 28 #include <linux/file.h> 29 29 #include <linux/module.h> 30 + #include <linux/irqbypass.h> 31 + #include <linux/kvm_irqfd.h> 30 32 #include <asm/cputable.h> 31 33 #include <asm/uaccess.h> 32 34 #include <asm/kvm_ppc.h> ··· 438 436 return -EINVAL; 439 437 } 440 438 439 + bool kvm_arch_has_vcpu_debugfs(void) 440 + { 441 + return false; 442 + } 443 + 444 + int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu) 445 + { 446 + return 0; 447 + } 448 + 441 449 void kvm_arch_destroy_vm(struct kvm *kvm) 442 450 { 443 451 unsigned int i; ··· 749 737 #ifdef CONFIG_BOOKE 750 738 vcpu->arch.vrsave = mfspr(SPRN_VRSAVE); 751 739 #endif 740 + } 741 + 742 + /* 743 + * irq_bypass_add_producer and irq_bypass_del_producer are only 744 + * useful if the architecture supports PCI passthrough. 745 + * irq_bypass_stop and irq_bypass_start are not needed and so 746 + * kvm_ops are not defined for them. 747 + */ 748 + bool kvm_arch_has_irq_bypass(void) 749 + { 750 + return ((kvmppc_hv_ops && kvmppc_hv_ops->irq_bypass_add_producer) || 751 + (kvmppc_pr_ops && kvmppc_pr_ops->irq_bypass_add_producer)); 752 + } 753 + 754 + int kvm_arch_irq_bypass_add_producer(struct irq_bypass_consumer *cons, 755 + struct irq_bypass_producer *prod) 756 + { 757 + struct kvm_kernel_irqfd *irqfd = 758 + container_of(cons, struct kvm_kernel_irqfd, consumer); 759 + struct kvm *kvm = irqfd->kvm; 760 + 761 + if (kvm->arch.kvm_ops->irq_bypass_add_producer) 762 + return kvm->arch.kvm_ops->irq_bypass_add_producer(cons, prod); 763 + 764 + return 0; 765 + } 766 + 767 + void kvm_arch_irq_bypass_del_producer(struct irq_bypass_consumer *cons, 768 + struct irq_bypass_producer *prod) 769 + { 770 + struct kvm_kernel_irqfd *irqfd = 771 + container_of(cons, struct kvm_kernel_irqfd, consumer); 772 + struct kvm *kvm = irqfd->kvm; 773 + 774 + if (kvm->arch.kvm_ops->irq_bypass_del_producer) 775 + kvm->arch.kvm_ops->irq_bypass_del_producer(cons, prod); 752 776 } 753 777 754 778 static void 
kvmppc_complete_mmio_load(struct kvm_vcpu *vcpu, ··· 1213 1165 r = kvmppc_sanity_check(vcpu); 1214 1166 1215 1167 return r; 1168 + } 1169 + 1170 + bool kvm_arch_intc_initialized(struct kvm *kvm) 1171 + { 1172 + #ifdef CONFIG_KVM_MPIC 1173 + if (kvm->arch.mpic) 1174 + return true; 1175 + #endif 1176 + #ifdef CONFIG_KVM_XICS 1177 + if (kvm->arch.xics) 1178 + return true; 1179 + #endif 1180 + return false; 1216 1181 } 1217 1182 1218 1183 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
+22
arch/powerpc/kvm/trace_hv.h
··· 432 432 __entry->runner_vcpu, __entry->n_runnable, __entry->tgid) 433 433 ); 434 434 435 + TRACE_EVENT(kvmppc_vcore_wakeup, 436 + TP_PROTO(int do_sleep, __u64 ns), 437 + 438 + TP_ARGS(do_sleep, ns), 439 + 440 + TP_STRUCT__entry( 441 + __field(__u64, ns) 442 + __field(int, waited) 443 + __field(pid_t, tgid) 444 + ), 445 + 446 + TP_fast_assign( 447 + __entry->ns = ns; 448 + __entry->waited = do_sleep; 449 + __entry->tgid = current->tgid; 450 + ), 451 + 452 + TP_printk("%s time %lld ns, tgid=%d", 453 + __entry->waited ? "wait" : "poll", 454 + __entry->ns, __entry->tgid) 455 + ); 456 + 435 457 TRACE_EVENT(kvmppc_run_vcpu_enter, 436 458 TP_PROTO(struct kvm_vcpu *vcpu), 437 459
+2 -40
arch/powerpc/mm/hash_native_64.c
··· 493 493 } 494 494 #endif 495 495 496 - static inline int __hpte_actual_psize(unsigned int lp, int psize) 497 - { 498 - int i, shift; 499 - unsigned int mask; 500 - 501 - /* start from 1 ignoring MMU_PAGE_4K */ 502 - for (i = 1; i < MMU_PAGE_COUNT; i++) { 503 - 504 - /* invalid penc */ 505 - if (mmu_psize_defs[psize].penc[i] == -1) 506 - continue; 507 - /* 508 - * encoding bits per actual page size 509 - * PTE LP actual page size 510 - * rrrr rrrz >=8KB 511 - * rrrr rrzz >=16KB 512 - * rrrr rzzz >=32KB 513 - * rrrr zzzz >=64KB 514 - * ....... 515 - */ 516 - shift = mmu_psize_defs[i].shift - LP_SHIFT; 517 - if (shift > LP_BITS) 518 - shift = LP_BITS; 519 - mask = (1 << shift) - 1; 520 - if ((lp & mask) == mmu_psize_defs[psize].penc[i]) 521 - return i; 522 - } 523 - return -1; 524 - } 525 - 526 496 static void hpte_decode(struct hash_pte *hpte, unsigned long slot, 527 497 int *psize, int *apsize, int *ssize, unsigned long *vpn) 528 498 { ··· 508 538 size = MMU_PAGE_4K; 509 539 a_size = MMU_PAGE_4K; 510 540 } else { 511 - for (size = 0; size < MMU_PAGE_COUNT; size++) { 512 - 513 - /* valid entries have a shift value */ 514 - if (!mmu_psize_defs[size].shift) 515 - continue; 516 - 517 - a_size = __hpte_actual_psize(lp, size); 518 - if (a_size != -1) 519 - break; 520 - } 541 + size = hpte_page_sizes[lp] & 0xf; 542 + a_size = hpte_page_sizes[lp] >> 4; 521 543 } 522 544 /* This works for all page sizes, and for 256M and 1T segments */ 523 545 if (cpu_has_feature(CPU_FTR_ARCH_300))
+55
arch/powerpc/mm/hash_utils_64.c
··· 93 93 struct mmu_psize_def mmu_psize_defs[MMU_PAGE_COUNT]; 94 94 EXPORT_SYMBOL_GPL(mmu_psize_defs); 95 95 96 + u8 hpte_page_sizes[1 << LP_BITS]; 97 + EXPORT_SYMBOL_GPL(hpte_page_sizes); 98 + 96 99 struct hash_pte *htab_address; 97 100 unsigned long htab_size_bytes; 98 101 unsigned long htab_hash_mask; ··· 567 564 #endif /* CONFIG_HUGETLB_PAGE */ 568 565 } 569 566 567 + /* 568 + * Fill in the hpte_page_sizes[] array. 569 + * We go through the mmu_psize_defs[] array looking for all the 570 + * supported base/actual page size combinations. Each combination 571 + * has a unique pagesize encoding (penc) value in the low bits of 572 + * the LP field of the HPTE. For actual page sizes less than 1MB, 573 + * some of the upper LP bits are used for RPN bits, meaning that 574 + * we need to fill in several entries in hpte_page_sizes[]. 575 + * 576 + * In diagrammatic form, with r = RPN bits and z = page size bits: 577 + * PTE LP actual page size 578 + * rrrr rrrz >=8KB 579 + * rrrr rrzz >=16KB 580 + * rrrr rzzz >=32KB 581 + * rrrr zzzz >=64KB 582 + * ... 583 + * 584 + * The zzzz bits are implementation-specific but are chosen so that 585 + * no encoding for a larger page size uses the same value in its 586 + * low-order N bits as the encoding for the 2^(12+N) byte page size 587 + * (if it exists). 588 + */ 589 + static void init_hpte_page_sizes(void) 590 + { 591 + long int ap, bp; 592 + long int shift, penc; 593 + 594 + for (bp = 0; bp < MMU_PAGE_COUNT; ++bp) { 595 + if (!mmu_psize_defs[bp].shift) 596 + continue; /* not a supported page size */ 597 + for (ap = bp; ap < MMU_PAGE_COUNT; ++ap) { 598 + penc = mmu_psize_defs[bp].penc[ap]; 599 + if (penc == -1) 600 + continue; 601 + shift = mmu_psize_defs[ap].shift - LP_SHIFT; 602 + if (shift <= 0) 603 + continue; /* should never happen */ 604 + /* 605 + * For page sizes less than 1MB, this loop 606 + * replicates the entry for all possible values 607 + * of the rrrr bits. 
608 + */ 609 + while (penc < (1 << LP_BITS)) { 610 + hpte_page_sizes[penc] = (ap << 4) | bp; 611 + penc += 1 << shift; 612 + } 613 + } 614 + } 615 + } 616 + 570 617 static void __init htab_init_page_sizes(void) 571 618 { 619 + init_hpte_page_sizes(); 620 + 572 621 if (!debug_pagealloc_enabled()) { 573 622 /* 574 623 * Pick a size for the linear mapping. Currently, we only
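The while loop at the end of init_hpte_page_sizes() replicates one penc entry across every possible value of the high rrrr bits by stepping in units of 1 << shift. A userspace sketch of that replication under an assumed LP_BITS of 8 (the constant and helper names are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define LP_BITS 8 /* assumed here; the kernel takes it from the MMU headers */

/* Replicate one entry into every slot that shares its low "shift" bits,
 * mirroring the replication loop in init_hpte_page_sizes(). */
static void fill_lp_entries(uint8_t table[1 << LP_BITS], unsigned int penc,
                            unsigned int shift, uint8_t value)
{
    while (penc < (1u << LP_BITS)) {
        table[penc] = value;
        penc += 1u << shift;
    }
}

/* Helper for testing: fill a fresh table and read one slot back. */
static uint8_t filled_slot(unsigned int penc, unsigned int shift,
                           uint8_t value, unsigned int idx)
{
    uint8_t table[1 << LP_BITS] = { 0 };

    fill_lp_entries(table, penc, shift, value);
    return table[idx];
}
```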
+1
arch/powerpc/platforms/powernv/opal-wrappers.S
··· 208 208 OPAL_CALL(opal_pci_config_write_half_word, OPAL_PCI_CONFIG_WRITE_HALF_WORD); 209 209 OPAL_CALL(opal_pci_config_write_word, OPAL_PCI_CONFIG_WRITE_WORD); 210 210 OPAL_CALL(opal_set_xive, OPAL_SET_XIVE); 211 + OPAL_CALL_REAL(opal_rm_set_xive, OPAL_SET_XIVE); 211 212 OPAL_CALL(opal_get_xive, OPAL_GET_XIVE); 212 213 OPAL_CALL(opal_register_exception_handler, OPAL_REGISTER_OPAL_EXCEPTION_HANDLER); 213 214 OPAL_CALL(opal_pci_eeh_freeze_status, OPAL_PCI_EEH_FREEZE_STATUS);
+21 -5
arch/powerpc/platforms/powernv/pci-ioda.c
··· 2718 2718 } 2719 2719 2720 2720 #ifdef CONFIG_PCI_MSI 2721 - static void pnv_ioda2_msi_eoi(struct irq_data *d) 2721 + int64_t pnv_opal_pci_msi_eoi(struct irq_chip *chip, unsigned int hw_irq) 2722 2722 { 2723 - unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d); 2724 - struct irq_chip *chip = irq_data_get_irq_chip(d); 2725 2723 struct pnv_phb *phb = container_of(chip, struct pnv_phb, 2726 2724 ioda.irq_chip); 2727 - int64_t rc; 2728 2725 2729 - rc = opal_pci_msi_eoi(phb->opal_id, hw_irq); 2726 + return opal_pci_msi_eoi(phb->opal_id, hw_irq); 2727 + } 2728 + 2729 + static void pnv_ioda2_msi_eoi(struct irq_data *d) 2730 + { 2731 + int64_t rc; 2732 + unsigned int hw_irq = (unsigned int)irqd_to_hwirq(d); 2733 + struct irq_chip *chip = irq_data_get_irq_chip(d); 2734 + 2735 + rc = pnv_opal_pci_msi_eoi(chip, hw_irq); 2730 2736 WARN_ON_ONCE(rc); 2731 2737 2732 2738 icp_native_eoi(d); ··· 2761 2755 } 2762 2756 irq_set_chip(virq, &phb->ioda.irq_chip); 2763 2757 } 2758 + 2759 + /* 2760 + * Returns true iff chip is something that we could call 2761 + * pnv_opal_pci_msi_eoi for. 2762 + */ 2763 + bool is_pnv_opal_msi(struct irq_chip *chip) 2764 + { 2765 + return chip->irq_eoi == pnv_ioda2_msi_eoi; 2766 + } 2767 + EXPORT_SYMBOL_GPL(is_pnv_opal_msi); 2764 2768 2765 2769 static int pnv_pci_ioda_msi_setup(struct pnv_phb *phb, struct pci_dev *dev, 2766 2770 unsigned int hwirq, unsigned int virq,
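The new is_pnv_opal_msi() helper identifies the IODA2 MSI chip by comparing its irq_eoi callback pointer, which lets other code test for the chip without the chip structure itself being exported. The pattern in miniature (the struct and names below are illustrative):

```c
#include <assert.h>

struct chip {
    void (*eoi)(int hw_irq);
};

static void ioda2_eoi(int hw_irq) { (void)hw_irq; }
static void other_eoi(int hw_irq) { (void)hw_irq; }

/* Identify a chip by its callback, as is_pnv_opal_msi() does with
 * chip->irq_eoi == pnv_ioda2_msi_eoi. */
static int is_ioda2_chip(const struct chip *chip)
{
    return chip->eoi == ioda2_eoi;
}

/* Test helper: build a chip around a given callback and classify it. */
static int classify(void (*eoi)(int))
{
    struct chip chip = { .eoi = eoi };

    return is_ioda2_chip(&chip);
}
```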
+68 -68
arch/s390/include/asm/kvm_host.h
··· 28 28 29 29 #define KVM_S390_BSCA_CPU_SLOTS 64 30 30 #define KVM_S390_ESCA_CPU_SLOTS 248 31 - #define KVM_MAX_VCPUS KVM_S390_ESCA_CPU_SLOTS 31 + #define KVM_MAX_VCPUS 255 32 32 #define KVM_USER_MEM_SLOTS 32 33 33 34 34 /* ··· 245 245 } __packed; 246 246 247 247 struct kvm_vcpu_stat { 248 - u32 exit_userspace; 249 - u32 exit_null; 250 - u32 exit_external_request; 251 - u32 exit_external_interrupt; 252 - u32 exit_stop_request; 253 - u32 exit_validity; 254 - u32 exit_instruction; 255 - u32 exit_pei; 256 - u32 halt_successful_poll; 257 - u32 halt_attempted_poll; 258 - u32 halt_poll_invalid; 259 - u32 halt_wakeup; 260 - u32 instruction_lctl; 261 - u32 instruction_lctlg; 262 - u32 instruction_stctl; 263 - u32 instruction_stctg; 264 - u32 exit_program_interruption; 265 - u32 exit_instr_and_program; 266 - u32 exit_operation_exception; 267 - u32 deliver_external_call; 268 - u32 deliver_emergency_signal; 269 - u32 deliver_service_signal; 270 - u32 deliver_virtio_interrupt; 271 - u32 deliver_stop_signal; 272 - u32 deliver_prefix_signal; 273 - u32 deliver_restart_signal; 274 - u32 deliver_program_int; 275 - u32 deliver_io_int; 276 - u32 exit_wait_state; 277 - u32 instruction_pfmf; 278 - u32 instruction_stidp; 279 - u32 instruction_spx; 280 - u32 instruction_stpx; 281 - u32 instruction_stap; 282 - u32 instruction_storage_key; 283 - u32 instruction_ipte_interlock; 284 - u32 instruction_stsch; 285 - u32 instruction_chsc; 286 - u32 instruction_stsi; 287 - u32 instruction_stfl; 288 - u32 instruction_tprot; 289 - u32 instruction_sie; 290 - u32 instruction_essa; 291 - u32 instruction_sthyi; 292 - u32 instruction_sigp_sense; 293 - u32 instruction_sigp_sense_running; 294 - u32 instruction_sigp_external_call; 295 - u32 instruction_sigp_emergency; 296 - u32 instruction_sigp_cond_emergency; 297 - u32 instruction_sigp_start; 298 - u32 instruction_sigp_stop; 299 - u32 instruction_sigp_stop_store_status; 300 - u32 instruction_sigp_store_status; 301 - u32 
instruction_sigp_store_adtl_status; 302 - u32 instruction_sigp_arch; 303 - u32 instruction_sigp_prefix; 304 - u32 instruction_sigp_restart; 305 - u32 instruction_sigp_init_cpu_reset; 306 - u32 instruction_sigp_cpu_reset; 307 - u32 instruction_sigp_unknown; 308 - u32 diagnose_10; 309 - u32 diagnose_44; 310 - u32 diagnose_9c; 311 - u32 diagnose_258; 312 - u32 diagnose_308; 313 - u32 diagnose_500; 248 + u64 exit_userspace; 249 + u64 exit_null; 250 + u64 exit_external_request; 251 + u64 exit_external_interrupt; 252 + u64 exit_stop_request; 253 + u64 exit_validity; 254 + u64 exit_instruction; 255 + u64 exit_pei; 256 + u64 halt_successful_poll; 257 + u64 halt_attempted_poll; 258 + u64 halt_poll_invalid; 259 + u64 halt_wakeup; 260 + u64 instruction_lctl; 261 + u64 instruction_lctlg; 262 + u64 instruction_stctl; 263 + u64 instruction_stctg; 264 + u64 exit_program_interruption; 265 + u64 exit_instr_and_program; 266 + u64 exit_operation_exception; 267 + u64 deliver_external_call; 268 + u64 deliver_emergency_signal; 269 + u64 deliver_service_signal; 270 + u64 deliver_virtio_interrupt; 271 + u64 deliver_stop_signal; 272 + u64 deliver_prefix_signal; 273 + u64 deliver_restart_signal; 274 + u64 deliver_program_int; 275 + u64 deliver_io_int; 276 + u64 exit_wait_state; 277 + u64 instruction_pfmf; 278 + u64 instruction_stidp; 279 + u64 instruction_spx; 280 + u64 instruction_stpx; 281 + u64 instruction_stap; 282 + u64 instruction_storage_key; 283 + u64 instruction_ipte_interlock; 284 + u64 instruction_stsch; 285 + u64 instruction_chsc; 286 + u64 instruction_stsi; 287 + u64 instruction_stfl; 288 + u64 instruction_tprot; 289 + u64 instruction_sie; 290 + u64 instruction_essa; 291 + u64 instruction_sthyi; 292 + u64 instruction_sigp_sense; 293 + u64 instruction_sigp_sense_running; 294 + u64 instruction_sigp_external_call; 295 + u64 instruction_sigp_emergency; 296 + u64 instruction_sigp_cond_emergency; 297 + u64 instruction_sigp_start; 298 + u64 instruction_sigp_stop; 299 + u64 
instruction_sigp_stop_store_status; 300 + u64 instruction_sigp_store_status; 301 + u64 instruction_sigp_store_adtl_status; 302 + u64 instruction_sigp_arch; 303 + u64 instruction_sigp_prefix; 304 + u64 instruction_sigp_restart; 305 + u64 instruction_sigp_init_cpu_reset; 306 + u64 instruction_sigp_cpu_reset; 307 + u64 instruction_sigp_unknown; 308 + u64 diagnose_10; 309 + u64 diagnose_44; 310 + u64 diagnose_9c; 311 + u64 diagnose_258; 312 + u64 diagnose_308; 313 + u64 diagnose_500; 314 314 }; 315 315 316 316 #define PGM_OPERATION 0x01 ··· 577 577 }; 578 578 579 579 struct kvm_vm_stat { 580 - u32 remote_tlb_flush; 580 + ulong remote_tlb_flush; 581 581 }; 582 582 583 583 struct kvm_arch_memory_slot {
+1
arch/s390/kernel/asm-offsets.c
··· 125 125 OFFSET(__LC_STFL_FAC_LIST, lowcore, stfl_fac_list); 126 126 OFFSET(__LC_STFLE_FAC_LIST, lowcore, stfle_fac_list); 127 127 OFFSET(__LC_MCCK_CODE, lowcore, mcck_interruption_code); 128 + OFFSET(__LC_EXT_DAMAGE_CODE, lowcore, external_damage_code); 128 129 OFFSET(__LC_MCCK_FAIL_STOR_ADDR, lowcore, failing_storage_address); 129 130 OFFSET(__LC_LAST_BREAK, lowcore, breaking_event_addr); 130 131 OFFSET(__LC_RST_OLD_PSW, lowcore, restart_old_psw);
+18 -19
arch/s390/kvm/gaccess.c
··· 495 495 tec = (struct trans_exc_code_bits *)&pgm->trans_exc_code; 496 496 497 497 switch (code) { 498 + case PGM_PROTECTION: 499 + switch (prot) { 500 + case PROT_TYPE_ALC: 501 + tec->b60 = 1; 502 + /* FALL THROUGH */ 503 + case PROT_TYPE_DAT: 504 + tec->b61 = 1; 505 + break; 506 + default: /* LA and KEYC set b61 to 0, other params undefined */ 507 + return code; 508 + } 509 + /* FALL THROUGH */ 498 510 case PGM_ASCE_TYPE: 499 511 case PGM_PAGE_TRANSLATION: 500 512 case PGM_REGION_FIRST_TRANS: ··· 516 504 /* 517 505 * op_access_id only applies to MOVE_PAGE -> set bit 61 518 506 * exc_access_id has to be set to 0 for some instructions. Both 519 - * cases have to be handled by the caller. We can always store 520 - * exc_access_id, as it is undefined for non-ar cases. 507 + * cases have to be handled by the caller. 521 508 */ 522 509 tec->addr = gva >> PAGE_SHIFT; 523 510 tec->fsi = mode == GACC_STORE ? FSI_STORE : FSI_FETCH; ··· 527 516 case PGM_ASTE_VALIDITY: 528 517 case PGM_ASTE_SEQUENCE: 529 518 case PGM_EXTENDED_AUTHORITY: 519 + /* 520 + * We can always store exc_access_id, as it is 521 + * undefined for non-ar cases. It is undefined for 522 + * most DAT protection exceptions. 523 + */ 530 524 pgm->exc_access_id = ar; 531 - break; 532 - case PGM_PROTECTION: 533 - switch (prot) { 534 - case PROT_TYPE_ALC: 535 - tec->b60 = 1; 536 - /* FALL THROUGH */ 537 - case PROT_TYPE_DAT: 538 - tec->b61 = 1; 539 - tec->addr = gva >> PAGE_SHIFT; 540 - tec->fsi = mode == GACC_STORE ? FSI_STORE : FSI_FETCH; 541 - tec->as = psw_bits(vcpu->arch.sie_block->gpsw).as; 542 - /* exc_access_id is undefined for most cases */ 543 - pgm->exc_access_id = ar; 544 - break; 545 - default: /* LA and KEYC set b61 to 0, other params undefined */ 546 - break; 547 - } 548 525 break; 549 526 } 550 527 return code;
+30 -29
arch/s390/kvm/guestdbg.c
··· 206 206 int kvm_s390_import_bp_data(struct kvm_vcpu *vcpu, 207 207 struct kvm_guest_debug *dbg) 208 208 { 209 - int ret = 0, nr_wp = 0, nr_bp = 0, i, size; 209 + int ret = 0, nr_wp = 0, nr_bp = 0, i; 210 210 struct kvm_hw_breakpoint *bp_data = NULL; 211 211 struct kvm_hw_wp_info_arch *wp_info = NULL; 212 212 struct kvm_hw_bp_info_arch *bp_info = NULL; ··· 216 216 else if (dbg->arch.nr_hw_bp > MAX_BP_COUNT) 217 217 return -EINVAL; 218 218 219 - size = dbg->arch.nr_hw_bp * sizeof(struct kvm_hw_breakpoint); 220 - bp_data = kmalloc(size, GFP_KERNEL); 221 - if (!bp_data) { 222 - ret = -ENOMEM; 223 - goto error; 224 - } 225 - 226 - if (copy_from_user(bp_data, dbg->arch.hw_bp, size)) { 227 - ret = -EFAULT; 228 - goto error; 229 - } 219 + bp_data = memdup_user(dbg->arch.hw_bp, 220 + sizeof(*bp_data) * dbg->arch.nr_hw_bp); 221 + if (IS_ERR(bp_data)) 222 + return PTR_ERR(bp_data); 230 223 231 224 for (i = 0; i < dbg->arch.nr_hw_bp; i++) { 232 225 switch (bp_data[i].type) { ··· 234 241 } 235 242 } 236 243 237 - size = nr_wp * sizeof(struct kvm_hw_wp_info_arch); 238 - if (size > 0) { 239 - wp_info = kmalloc(size, GFP_KERNEL); 244 + if (nr_wp > 0) { 245 + wp_info = kmalloc_array(nr_wp, 246 + sizeof(*wp_info), 247 + GFP_KERNEL); 240 248 if (!wp_info) { 241 249 ret = -ENOMEM; 242 250 goto error; 243 251 } 244 252 } 245 - size = nr_bp * sizeof(struct kvm_hw_bp_info_arch); 246 - if (size > 0) { 247 - bp_info = kmalloc(size, GFP_KERNEL); 253 + if (nr_bp > 0) { 254 + bp_info = kmalloc_array(nr_bp, 255 + sizeof(*bp_info), 256 + GFP_KERNEL); 248 257 if (!bp_info) { 249 258 ret = -ENOMEM; 250 259 goto error; ··· 377 382 vcpu->guest_debug &= ~KVM_GUESTDBG_EXIT_PENDING; 378 383 } 379 384 385 + #define PER_CODE_MASK (PER_EVENT_MASK >> 24) 386 + #define PER_CODE_BRANCH (PER_EVENT_BRANCH >> 24) 387 + #define PER_CODE_IFETCH (PER_EVENT_IFETCH >> 24) 388 + #define PER_CODE_STORE (PER_EVENT_STORE >> 24) 389 + #define PER_CODE_STORE_REAL (PER_EVENT_STORE_REAL >> 24) 390 + 380 391 #define 
per_bp_event(code) \ 381 - (code & (PER_EVENT_IFETCH | PER_EVENT_BRANCH)) 392 + (code & (PER_CODE_IFETCH | PER_CODE_BRANCH)) 382 393 #define per_write_wp_event(code) \ 383 - (code & (PER_EVENT_STORE | PER_EVENT_STORE_REAL)) 394 + (code & (PER_CODE_STORE | PER_CODE_STORE_REAL)) 384 395 385 396 static int debug_exit_required(struct kvm_vcpu *vcpu) 386 397 { 387 - u32 perc = (vcpu->arch.sie_block->perc << 24); 398 + u8 perc = vcpu->arch.sie_block->perc; 388 399 struct kvm_debug_exit_arch *debug_exit = &vcpu->run->debug.arch; 389 400 struct kvm_hw_wp_info_arch *wp_info = NULL; 390 401 struct kvm_hw_bp_info_arch *bp_info = NULL; ··· 445 444 const u8 ilen = kvm_s390_get_ilen(vcpu); 446 445 struct kvm_s390_pgm_info pgm_info = { 447 446 .code = PGM_PER, 448 - .per_code = PER_EVENT_IFETCH >> 24, 447 + .per_code = PER_CODE_IFETCH, 449 448 .per_address = __rewind_psw(vcpu->arch.sie_block->gpsw, ilen), 450 449 }; 451 450 ··· 459 458 460 459 static void filter_guest_per_event(struct kvm_vcpu *vcpu) 461 460 { 462 - u32 perc = vcpu->arch.sie_block->perc << 24; 461 + const u8 perc = vcpu->arch.sie_block->perc; 463 462 u64 peraddr = vcpu->arch.sie_block->peraddr; 464 463 u64 addr = vcpu->arch.sie_block->gpsw.addr; 465 464 u64 cr9 = vcpu->arch.sie_block->gcr[9]; 466 465 u64 cr10 = vcpu->arch.sie_block->gcr[10]; 467 466 u64 cr11 = vcpu->arch.sie_block->gcr[11]; 468 467 /* filter all events, demanded by the guest */ 469 - u32 guest_perc = perc & cr9 & PER_EVENT_MASK; 468 + u8 guest_perc = perc & (cr9 >> 24) & PER_CODE_MASK; 470 469 471 470 if (!guest_per_enabled(vcpu)) 472 471 guest_perc = 0; 473 472 474 473 /* filter "successful-branching" events */ 475 - if (guest_perc & PER_EVENT_BRANCH && 474 + if (guest_perc & PER_CODE_BRANCH && 476 475 cr9 & PER_CONTROL_BRANCH_ADDRESS && 477 476 !in_addr_range(addr, cr10, cr11)) 478 - guest_perc &= ~PER_EVENT_BRANCH; 477 + guest_perc &= ~PER_CODE_BRANCH; 479 478 480 479 /* filter "instruction-fetching" events */ 481 - if (guest_perc & 
PER_EVENT_IFETCH && 480 + if (guest_perc & PER_CODE_IFETCH && 482 481 !in_addr_range(peraddr, cr10, cr11)) 483 - guest_perc &= ~PER_EVENT_IFETCH; 482 + guest_perc &= ~PER_CODE_IFETCH; 484 483 485 484 /* All other PER events will be given to the guest */ 486 485 /* TODO: Check altered address/address space */ 487 486 488 - vcpu->arch.sie_block->perc = guest_perc >> 24; 487 + vcpu->arch.sie_block->perc = guest_perc; 489 488 490 489 if (!guest_perc) 491 490 vcpu->arch.sie_block->iprcc &= ~PGM_PER;
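The guestdbg.c conversion replaces open-coded `size = n * sizeof(...)` allocations with memdup_user() and kmalloc_array(), both of which reject a multiplication that would overflow. An overflow check in the same spirit, in userspace terms (alloc_array is an illustrative stand-in, not a kernel API):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Overflow-checked array allocation in the spirit of kmalloc_array():
 * refuse n * size if the product would wrap, rather than silently
 * returning a too-small buffer. */
static void *alloc_array(size_t n, size_t size)
{
    if (size != 0 && n > SIZE_MAX / size)
        return NULL;
    return malloc(n * size);
}
```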
+1
arch/s390/kvm/intercept.c
··· 29 29 [0x01] = kvm_s390_handle_01, 30 30 [0x82] = kvm_s390_handle_lpsw, 31 31 [0x83] = kvm_s390_handle_diag, 32 + [0xaa] = kvm_s390_handle_aa, 32 33 [0xae] = kvm_s390_handle_sigp, 33 34 [0xb2] = kvm_s390_handle_b2, 34 35 [0xb6] = kvm_s390_handle_stctl,
+75 -23
arch/s390/kvm/interrupt.c
··· 24 24 #include <asm/sclp.h> 25 25 #include <asm/isc.h> 26 26 #include <asm/gmap.h> 27 + #include <asm/switch_to.h> 28 + #include <asm/nmi.h> 27 29 #include "kvm-s390.h" 28 30 #include "gaccess.h" 29 31 #include "trace-s390.h" ··· 42 40 if (!(atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_ECALL_PEND)) 43 41 return 0; 44 42 43 + BUG_ON(!kvm_s390_use_sca_entries()); 45 44 read_lock(&vcpu->kvm->arch.sca_lock); 46 45 if (vcpu->kvm->arch.use_esca) { 47 46 struct esca_block *sca = vcpu->kvm->arch.sca; ··· 71 68 { 72 69 int expect, rc; 73 70 71 + BUG_ON(!kvm_s390_use_sca_entries()); 74 72 read_lock(&vcpu->kvm->arch.sca_lock); 75 73 if (vcpu->kvm->arch.use_esca) { 76 74 struct esca_block *sca = vcpu->kvm->arch.sca; ··· 113 109 struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int; 114 110 int rc, expect; 115 111 112 + if (!kvm_s390_use_sca_entries()) 113 + return; 116 114 atomic_andnot(CPUSTAT_ECALL_PEND, li->cpuflags); 117 115 read_lock(&vcpu->kvm->arch.sca_lock); 118 116 if (vcpu->kvm->arch.use_esca) { ··· 406 400 return rc ? 
-EFAULT : 0; 407 401 } 408 402 403 + static int __write_machine_check(struct kvm_vcpu *vcpu, 404 + struct kvm_s390_mchk_info *mchk) 405 + { 406 + unsigned long ext_sa_addr; 407 + freg_t fprs[NUM_FPRS]; 408 + union mci mci; 409 + int rc; 410 + 411 + mci.val = mchk->mcic; 412 + /* take care of lazy register loading via vcpu load/put */ 413 + save_fpu_regs(); 414 + save_access_regs(vcpu->run->s.regs.acrs); 415 + 416 + /* Extended save area */ 417 + rc = read_guest_lc(vcpu, __LC_VX_SAVE_AREA_ADDR, &ext_sa_addr, 418 + sizeof(unsigned long)); 419 + /* Only bits 0-53 are used for address formation */ 420 + ext_sa_addr &= ~0x3ffUL; 421 + if (!rc && mci.vr && ext_sa_addr && test_kvm_facility(vcpu->kvm, 129)) { 422 + if (write_guest_abs(vcpu, ext_sa_addr, vcpu->run->s.regs.vrs, 423 + 512)) 424 + mci.vr = 0; 425 + } else { 426 + mci.vr = 0; 427 + } 428 + 429 + /* General interruption information */ 430 + rc |= put_guest_lc(vcpu, 1, (u8 __user *) __LC_AR_MODE_ID); 431 + rc |= write_guest_lc(vcpu, __LC_MCK_OLD_PSW, 432 + &vcpu->arch.sie_block->gpsw, sizeof(psw_t)); 433 + rc |= read_guest_lc(vcpu, __LC_MCK_NEW_PSW, 434 + &vcpu->arch.sie_block->gpsw, sizeof(psw_t)); 435 + rc |= put_guest_lc(vcpu, mci.val, (u64 __user *) __LC_MCCK_CODE); 436 + 437 + /* Register-save areas */ 438 + if (MACHINE_HAS_VX) { 439 + convert_vx_to_fp(fprs, (__vector128 *) vcpu->run->s.regs.vrs); 440 + rc |= write_guest_lc(vcpu, __LC_FPREGS_SAVE_AREA, fprs, 128); 441 + } else { 442 + rc |= write_guest_lc(vcpu, __LC_FPREGS_SAVE_AREA, 443 + vcpu->run->s.regs.fprs, 128); 444 + } 445 + rc |= write_guest_lc(vcpu, __LC_GPREGS_SAVE_AREA, 446 + vcpu->run->s.regs.gprs, 128); 447 + rc |= put_guest_lc(vcpu, current->thread.fpu.fpc, 448 + (u32 __user *) __LC_FP_CREG_SAVE_AREA); 449 + rc |= put_guest_lc(vcpu, vcpu->arch.sie_block->todpr, 450 + (u32 __user *) __LC_TOD_PROGREG_SAVE_AREA); 451 + rc |= put_guest_lc(vcpu, kvm_s390_get_cpu_timer(vcpu), 452 + (u64 __user *) __LC_CPU_TIMER_SAVE_AREA); 453 + rc |= 
put_guest_lc(vcpu, vcpu->arch.sie_block->ckc >> 8, 454 + (u64 __user *) __LC_CLOCK_COMP_SAVE_AREA); 455 + rc |= write_guest_lc(vcpu, __LC_AREGS_SAVE_AREA, 456 + &vcpu->run->s.regs.acrs, 64); 457 + rc |= write_guest_lc(vcpu, __LC_CREGS_SAVE_AREA, 458 + &vcpu->arch.sie_block->gcr, 128); 459 + 460 + /* Extended interruption information */ 461 + rc |= put_guest_lc(vcpu, mchk->ext_damage_code, 462 + (u32 __user *) __LC_EXT_DAMAGE_CODE); 463 + rc |= put_guest_lc(vcpu, mchk->failing_storage_address, 464 + (u64 __user *) __LC_MCCK_FAIL_STOR_ADDR); 465 + rc |= write_guest_lc(vcpu, __LC_PSW_SAVE_AREA, &mchk->fixed_logout, 466 + sizeof(mchk->fixed_logout)); 467 + return rc ? -EFAULT : 0; 468 + } 469 + 409 470 static int __must_check __deliver_machine_check(struct kvm_vcpu *vcpu) 410 471 { 411 472 struct kvm_s390_float_interrupt *fi = &vcpu->kvm->arch.float_int; 412 473 struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int; 413 474 struct kvm_s390_mchk_info mchk = {}; 414 - unsigned long adtl_status_addr; 415 475 int deliver = 0; 416 476 int rc = 0; 417 477 ··· 518 446 trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, 519 447 KVM_S390_MCHK, 520 448 mchk.cr14, mchk.mcic); 521 - 522 - rc = kvm_s390_vcpu_store_status(vcpu, 523 - KVM_S390_STORE_STATUS_PREFIXED); 524 - rc |= read_guest_lc(vcpu, __LC_VX_SAVE_AREA_ADDR, 525 - &adtl_status_addr, 526 - sizeof(unsigned long)); 527 - rc |= kvm_s390_vcpu_store_adtl_status(vcpu, 528 - adtl_status_addr); 529 - rc |= put_guest_lc(vcpu, mchk.mcic, 530 - (u64 __user *) __LC_MCCK_CODE); 531 - rc |= put_guest_lc(vcpu, mchk.failing_storage_address, 532 - (u64 __user *) __LC_MCCK_FAIL_STOR_ADDR); 533 - rc |= write_guest_lc(vcpu, __LC_PSW_SAVE_AREA, 534 - &mchk.fixed_logout, 535 - sizeof(mchk.fixed_logout)); 536 - rc |= write_guest_lc(vcpu, __LC_MCK_OLD_PSW, 537 - &vcpu->arch.sie_block->gpsw, 538 - sizeof(psw_t)); 539 - rc |= read_guest_lc(vcpu, __LC_MCK_NEW_PSW, 540 - &vcpu->arch.sie_block->gpsw, 541 - sizeof(psw_t)); 449 + rc = 
__write_machine_check(vcpu, &mchk); 542 450 } 543 - return rc ? -EFAULT : 0; 451 + return rc; 544 452 } 545 453 546 454 static int __must_check __deliver_restart(struct kvm_vcpu *vcpu)
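The new __write_machine_check() above uses a common kernel idiom: OR the return codes of many guest writes into one rc and collapse them into a single -EFAULT at the end. OR-ing two different negative errnos would mangle the value, which is why the final result is a fixed error code rather than rc itself. The shape of the idiom (write_field and the failure mask are illustrative):

```c
#include <assert.h>

#define EFAULT 14

/* Stand-in for a guest write that can fail. */
static int write_field(int should_fail)
{
    return should_fail ? -EFAULT : 0;
}

/* Accumulate failures with |= and collapse them into one error code,
 * as __write_machine_check() does with its sequence of *_guest_lc()
 * calls. Any nonzero rc means at least one write failed. */
static int write_record(int fail_mask)
{
    int rc = 0;

    rc |= write_field(fail_mask & 1);
    rc |= write_field(fail_mask & 2);
    rc |= write_field(fail_mask & 4);
    return rc ? -EFAULT : 0;
}
```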
+40 -35
arch/s390/kvm/kvm-s390.c
··· 384 384 case KVM_CAP_NR_VCPUS: 385 385 case KVM_CAP_MAX_VCPUS: 386 386 r = KVM_S390_BSCA_CPU_SLOTS; 387 - if (sclp.has_esca && sclp.has_64bscao) 387 + if (!kvm_s390_use_sca_entries()) 388 + r = KVM_MAX_VCPUS; 389 + else if (sclp.has_esca && sclp.has_64bscao) 388 390 r = KVM_S390_ESCA_CPU_SLOTS; 389 391 break; 390 392 case KVM_CAP_NR_MEMSLOTS: ··· 1500 1498 return rc; 1501 1499 } 1502 1500 1501 + bool kvm_arch_has_vcpu_debugfs(void) 1502 + { 1503 + return false; 1504 + } 1505 + 1506 + int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu) 1507 + { 1508 + return 0; 1509 + } 1510 + 1503 1511 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu) 1504 1512 { 1505 1513 VCPU_EVENT(vcpu, 3, "%s", "free cpu"); ··· 1573 1561 1574 1562 static void sca_del_vcpu(struct kvm_vcpu *vcpu) 1575 1563 { 1564 + if (!kvm_s390_use_sca_entries()) 1565 + return; 1576 1566 read_lock(&vcpu->kvm->arch.sca_lock); 1577 1567 if (vcpu->kvm->arch.use_esca) { 1578 1568 struct esca_block *sca = vcpu->kvm->arch.sca; ··· 1592 1578 1593 1579 static void sca_add_vcpu(struct kvm_vcpu *vcpu) 1594 1580 { 1581 + if (!kvm_s390_use_sca_entries()) { 1582 + struct bsca_block *sca = vcpu->kvm->arch.sca; 1583 + 1584 + /* we still need the basic sca for the ipte control */ 1585 + vcpu->arch.sie_block->scaoh = (__u32)(((__u64)sca) >> 32); 1586 + vcpu->arch.sie_block->scaol = (__u32)(__u64)sca; 1587 + } 1595 1588 read_lock(&vcpu->kvm->arch.sca_lock); 1596 1589 if (vcpu->kvm->arch.use_esca) { 1597 1590 struct esca_block *sca = vcpu->kvm->arch.sca; ··· 1679 1658 { 1680 1659 int rc; 1681 1660 1661 + if (!kvm_s390_use_sca_entries()) { 1662 + if (id < KVM_MAX_VCPUS) 1663 + return true; 1664 + return false; 1665 + } 1682 1666 if (id < KVM_S390_BSCA_CPU_SLOTS) 1683 1667 return true; 1684 1668 if (!sclp.has_esca || !sclp.has_64bscao) ··· 1972 1946 vcpu->arch.sie_block->eca |= 1; 1973 1947 if (sclp.has_sigpif) 1974 1948 vcpu->arch.sie_block->eca |= 0x10000000U; 1975 - if (test_kvm_facility(vcpu->kvm, 64)) 1976 - 
vcpu->arch.sie_block->ecb3 |= 0x01; 1977 1949 if (test_kvm_facility(vcpu->kvm, 129)) { 1978 1950 vcpu->arch.sie_block->eca |= 0x00020000; 1979 1951 vcpu->arch.sie_block->ecd |= 0x20000000; ··· 2728 2704 if (vcpu->arch.pfault_token == KVM_S390_PFAULT_TOKEN_INVALID) 2729 2705 kvm_clear_async_pf_completion_queue(vcpu); 2730 2706 } 2707 + /* 2708 + * If userspace sets the riccb (e.g. after migration) to a valid state, 2709 + * we should enable RI here instead of doing the lazy enablement. 2710 + */ 2711 + if ((kvm_run->kvm_dirty_regs & KVM_SYNC_RICCB) && 2712 + test_kvm_facility(vcpu->kvm, 64)) { 2713 + struct runtime_instr_cb *riccb = 2714 + (struct runtime_instr_cb *) &kvm_run->s.regs.riccb; 2715 + 2716 + if (riccb->valid) 2717 + vcpu->arch.sie_block->ecb3 |= 0x01; 2718 + } 2719 + 2731 2720 kvm_run->kvm_dirty_regs = 0; 2732 2721 } 2733 2722 ··· 2882 2845 save_access_regs(vcpu->run->s.regs.acrs); 2883 2846 2884 2847 return kvm_s390_store_status_unloaded(vcpu, addr); 2885 - } 2886 - 2887 - /* 2888 - * store additional status at address 2889 - */ 2890 - int kvm_s390_store_adtl_status_unloaded(struct kvm_vcpu *vcpu, 2891 - unsigned long gpa) 2892 - { 2893 - /* Only bits 0-53 are used for address formation */ 2894 - if (!(gpa & ~0x3ff)) 2895 - return 0; 2896 - 2897 - return write_guest_abs(vcpu, gpa & ~0x3ff, 2898 - (void *)&vcpu->run->s.regs.vrs, 512); 2899 - } 2900 - 2901 - int kvm_s390_vcpu_store_adtl_status(struct kvm_vcpu *vcpu, unsigned long addr) 2902 - { 2903 - if (!test_kvm_facility(vcpu->kvm, 129)) 2904 - return 0; 2905 - 2906 - /* 2907 - * The guest VXRS are in the host VXRs due to the lazy 2908 - * copying in vcpu load/put. We can simply call save_fpu_regs() 2909 - * to save the current register state because we are in the 2910 - * middle of a load/put cycle. 2911 - * 2912 - * Let's update our copies before we save it into the save area. 
2913 - */ 2914 - save_fpu_regs(); 2915 - 2916 - return kvm_s390_store_adtl_status_unloaded(vcpu, addr); 2917 2848 } 2918 2849 2919 2850 static void __disable_ibs_on_vcpu(struct kvm_vcpu *vcpu)
+11 -3
arch/s390/kvm/kvm-s390.h
··· 20 20 #include <linux/kvm_host.h> 21 21 #include <asm/facility.h> 22 22 #include <asm/processor.h> 23 + #include <asm/sclp.h> 23 24 24 25 typedef int (*intercept_handler_t)(struct kvm_vcpu *vcpu); 25 26 ··· 246 245 247 246 /* implemented in priv.c */ 248 247 int is_valid_psw(psw_t *psw); 248 + int kvm_s390_handle_aa(struct kvm_vcpu *vcpu); 249 249 int kvm_s390_handle_b2(struct kvm_vcpu *vcpu); 250 250 int kvm_s390_handle_e5(struct kvm_vcpu *vcpu); 251 251 int kvm_s390_handle_01(struct kvm_vcpu *vcpu); ··· 275 273 void kvm_s390_set_tod_clock(struct kvm *kvm, u64 tod); 276 274 long kvm_arch_fault_in_page(struct kvm_vcpu *vcpu, gpa_t gpa, int writable); 277 275 int kvm_s390_store_status_unloaded(struct kvm_vcpu *vcpu, unsigned long addr); 278 - int kvm_s390_store_adtl_status_unloaded(struct kvm_vcpu *vcpu, 279 - unsigned long addr); 280 276 int kvm_s390_vcpu_store_status(struct kvm_vcpu *vcpu, unsigned long addr); 281 - int kvm_s390_vcpu_store_adtl_status(struct kvm_vcpu *vcpu, unsigned long addr); 282 277 void kvm_s390_vcpu_start(struct kvm_vcpu *vcpu); 283 278 void kvm_s390_vcpu_stop(struct kvm_vcpu *vcpu); 284 279 void kvm_s390_vcpu_block(struct kvm_vcpu *vcpu); ··· 387 388 struct bsca_block *sca = kvm->arch.sca; /* SCA version doesn't matter */ 388 389 389 390 return &sca->ipte_control; 391 + } 392 + static inline int kvm_s390_use_sca_entries(void) 393 + { 394 + /* 395 + * Without SIGP interpretation, only SRS interpretation (if available) 396 + * might use the entries. By not setting the entries and keeping them 397 + * invalid, hardware will not access them but intercept. 398 + */ 399 + return sclp.has_sigpif; 390 400 } 391 401 #endif
+21
arch/s390/kvm/priv.c
··· 32 32 #include "kvm-s390.h" 33 33 #include "trace.h" 34 34 35 + static int handle_ri(struct kvm_vcpu *vcpu) 36 + { 37 + if (test_kvm_facility(vcpu->kvm, 64)) { 38 + vcpu->arch.sie_block->ecb3 |= 0x01; 39 + kvm_s390_retry_instr(vcpu); 40 + return 0; 41 + } else 42 + return kvm_s390_inject_program_int(vcpu, PGM_OPERATION); 43 + } 44 + 45 + int kvm_s390_handle_aa(struct kvm_vcpu *vcpu) 46 + { 47 + if ((vcpu->arch.sie_block->ipa & 0xf) <= 4) 48 + return handle_ri(vcpu); 49 + else 50 + return -EOPNOTSUPP; 51 + } 52 + 35 53 /* Handle SCK (SET CLOCK) interception */ 36 54 static int handle_set_clock(struct kvm_vcpu *vcpu) 37 55 { ··· 1111 1093 static const intercept_handler_t eb_handlers[256] = { 1112 1094 [0x2f] = handle_lctlg, 1113 1095 [0x25] = handle_stctg, 1096 + [0x60] = handle_ri, 1097 + [0x61] = handle_ri, 1098 + [0x62] = handle_ri, 1114 1099 }; 1115 1100 1116 1101 int kvm_s390_handle_eb(struct kvm_vcpu *vcpu)
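The new 0xaa entry in intercept.c and the eb_handlers additions above both rely on s390 KVM's sparse function-pointer tables indexed by an opcode byte. The dispatch pattern, reduced to its essentials (the opcode values and return codes are illustrative):

```c
#include <assert.h>
#include <stddef.h>

#define EOPNOTSUPP 95

typedef int (*handler_t)(void);

static int handle_ri(void) { return 0; }

/* Sparse dispatch table indexed by an opcode byte, in the style of
 * eb_handlers[] above; slots without a handler stay NULL. */
static const handler_t handlers[256] = {
    [0x60] = handle_ri,
    [0x61] = handle_ri,
    [0x62] = handle_ri,
};

static int dispatch(unsigned char op)
{
    return handlers[op] ? handlers[op]() : -EOPNOTSUPP;
}
```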
+1
arch/x86/configs/kvm_guest.config kernel/configs/kvm_guest.config
··· 29 29 CONFIG_SCSI_LOWLEVEL=y 30 30 CONFIG_SCSI_VIRTIO=y 31 31 CONFIG_VIRTIO_INPUT=y 32 + CONFIG_DRM_VIRTIO_GPU=y
+1 -1
arch/x86/entry/vdso/vclock_gettime.c
··· 129 129 return 0; 130 130 } 131 131 132 - ret = __pvclock_read_cycles(pvti); 132 + ret = __pvclock_read_cycles(pvti, rdtsc_ordered()); 133 133 } while (pvclock_read_retry(pvti, version)); 134 134 135 135 /* refer to vread_tsc() comment for rationale */
+40 -36
arch/x86/include/asm/kvm_host.h
··· 568 568 struct kvm_steal_time steal; 569 569 } st; 570 570 571 + u64 tsc_offset; 571 572 u64 last_guest_tsc; 572 573 u64 last_host_tsc; 573 574 u64 tsc_offset_adjustment; ··· 702 701 /* Hyper-v based guest crash (NT kernel bugcheck) parameters */ 703 702 u64 hv_crash_param[HV_X64_MSR_CRASH_PARAMS]; 704 703 u64 hv_crash_ctl; 704 + 705 + HV_REFERENCE_TSC_PAGE tsc_ref; 705 706 }; 706 707 707 708 struct kvm_arch { ··· 784 781 bool disabled_lapic_found; 785 782 786 783 /* Struct members for AVIC */ 784 + u32 avic_vm_id; 787 785 u32 ldr_mode; 788 786 struct page *avic_logical_id_table_page; 789 787 struct page *avic_physical_id_table_page; 788 + struct hlist_node hnode; 790 789 791 790 bool x2apic_format; 792 791 bool x2apic_broadcast_quirk_disabled; 793 792 }; 794 793 795 794 struct kvm_vm_stat { 796 - u32 mmu_shadow_zapped; 797 - u32 mmu_pte_write; 798 - u32 mmu_pte_updated; 799 - u32 mmu_pde_zapped; 800 - u32 mmu_flooded; 801 - u32 mmu_recycled; 802 - u32 mmu_cache_miss; 803 - u32 mmu_unsync; 804 - u32 remote_tlb_flush; 805 - u32 lpages; 795 + ulong mmu_shadow_zapped; 796 + ulong mmu_pte_write; 797 + ulong mmu_pte_updated; 798 + ulong mmu_pde_zapped; 799 + ulong mmu_flooded; 800 + ulong mmu_recycled; 801 + ulong mmu_cache_miss; 802 + ulong mmu_unsync; 803 + ulong remote_tlb_flush; 804 + ulong lpages; 806 805 }; 807 806 808 807 struct kvm_vcpu_stat { 809 - u32 pf_fixed; 810 - u32 pf_guest; 811 - u32 tlb_flush; 812 - u32 invlpg; 808 + u64 pf_fixed; 809 + u64 pf_guest; 810 + u64 tlb_flush; 811 + u64 invlpg; 813 812 814 - u32 exits; 815 - u32 io_exits; 816 - u32 mmio_exits; 817 - u32 signal_exits; 818 - u32 irq_window_exits; 819 - u32 nmi_window_exits; 820 - u32 halt_exits; 821 - u32 halt_successful_poll; 822 - u32 halt_attempted_poll; 823 - u32 halt_poll_invalid; 824 - u32 halt_wakeup; 825 - u32 request_irq_exits; 826 - u32 irq_exits; 827 - u32 host_state_reload; 828 - u32 efer_reload; 829 - u32 fpu_reload; 830 - u32 insn_emulation; 831 - u32 insn_emulation_fail; 832 
- u32 hypercalls; 833 - u32 irq_injections; 834 - u32 nmi_injections; 813 + u64 exits; 814 + u64 io_exits; 815 + u64 mmio_exits; 816 + u64 signal_exits; 817 + u64 irq_window_exits; 818 + u64 nmi_window_exits; 819 + u64 halt_exits; 820 + u64 halt_successful_poll; 821 + u64 halt_attempted_poll; 822 + u64 halt_poll_invalid; 823 + u64 halt_wakeup; 824 + u64 request_irq_exits; 825 + u64 irq_exits; 826 + u64 host_state_reload; 827 + u64 efer_reload; 828 + u64 fpu_reload; 829 + u64 insn_emulation; 830 + u64 insn_emulation_fail; 831 + u64 hypercalls; 832 + u64 irq_injections; 833 + u64 nmi_injections; 835 834 }; 836 835 837 836 struct x86_instruction_info; ··· 956 951 957 952 bool (*has_wbinvd_exit)(void); 958 953 959 - u64 (*read_tsc_offset)(struct kvm_vcpu *vcpu); 960 954 void (*write_tsc_offset)(struct kvm_vcpu *vcpu, u64 offset); 961 955 962 956 u64 (*read_l1_tsc)(struct kvm_vcpu *vcpu, u64 host_tsc);
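The wholesale u32 → u64 change in kvm_vcpu_stat exists because a 32-bit exit counter wraps after about 4.3 billion events, which a busy vCPU can reach in hours of uptime. The wraparound in two lines:

```c
#include <assert.h>
#include <stdint.h>

/* A 32-bit counter silently wraps to zero at 2^32... */
static uint32_t bump32(uint32_t c) { return c + 1; }

/* ...while the widened 64-bit counter keeps counting. */
static uint64_t bump64(uint64_t c) { return c + 1; }
```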
+3 -2
arch/x86/include/asm/pvclock.h
··· 87 87 } 88 88 89 89 static __always_inline 90 - cycle_t __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src) 90 + cycle_t __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, 91 + u64 tsc) 91 92 { 92 - u64 delta = rdtsc_ordered() - src->tsc_timestamp; 93 + u64 delta = tsc - src->tsc_timestamp; 93 94 cycle_t offset = pvclock_scale_delta(delta, src->tsc_to_system_mul, 94 95 src->tsc_shift); 95 96 return src->system_time + offset;
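__pvclock_read_cycles() now takes the TSC sample as a parameter, but still feeds the delta through pvclock_scale_delta(): a power-of-two pre-shift followed by a 32.32 fixed-point multiply. A simplified userspace model (the real kernel routine keeps a 128-bit intermediate so large deltas cannot overflow; this sketch does not):

```c
#include <assert.h>
#include <stdint.h>

/* Convert a TSC delta to nanoseconds: shift by tsc_shift, then apply
 * the 32.32 fixed-point multiplier tsc_to_system_mul. Simplified
 * model -- deltas large enough to overflow 64 bits are not handled. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;
    return (delta * (uint64_t)mul_frac) >> 32;
}
```

With mul_frac = 0x80000000 (0.5 in 32.32 fixed point), a delta of 1000 cycles scales to 500.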
+1 -1
arch/x86/kernel/pvclock.c
··· 80 80 81 81 do { 82 82 version = pvclock_read_begin(src); 83 - ret = __pvclock_read_cycles(src); 83 + ret = __pvclock_read_cycles(src, rdtsc_ordered()); 84 84 flags = src->flags; 85 85 } while (pvclock_read_retry(src, version)); 86 86
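The refactor above only threads the TSC sample through as a parameter; the conversion itself is unchanged. As a userspace sketch (not the kernel's exact implementation, which uses an asm helper), the scaling treats `tsc_to_system_mul` as a 32.32 fixed-point multiplier applied after pre-shifting the delta by `tsc_shift`:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of pvclock scaling: pre-shift the TSC delta by tsc_shift, then
 * multiply by the 32.32 fixed-point tsc_to_system_mul and keep the high
 * 64 bits, i.e. nsec = ((delta << shift) * mul_frac) >> 32. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
	if (shift < 0)
		delta >>= -shift;
	else
		delta <<= shift;
	return (uint64_t)(((unsigned __int128)delta * mul_frac) >> 32);
}
```

With a hypothetical 2 GHz TSC (0.5 ns per tick), `mul_frac = 1u << 31` and `shift = 0` turn 1000 ticks into 500 ns; a hypothetical 1 GHz part would use the same multiplier with `shift = 1`.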
+1 -1
arch/x86/kvm/Makefile
··· 13 13 14 14 kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ 15 15 i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \ 16 - hyperv.o page_track.o 16 + hyperv.o page_track.o debugfs.o 17 17 18 18 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += assigned-dev.o iommu.o 19 19
+2 -1
arch/x86/kvm/cpuid.c
··· 366 366 F(FSGSBASE) | F(BMI1) | F(HLE) | F(AVX2) | F(SMEP) | 367 367 F(BMI2) | F(ERMS) | f_invpcid | F(RTM) | f_mpx | F(RDSEED) | 368 368 F(ADX) | F(SMAP) | F(AVX512F) | F(AVX512PF) | F(AVX512ER) | 369 - F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB); 369 + F(AVX512CD) | F(CLFLUSHOPT) | F(CLWB) | F(AVX512DQ) | 370 + F(AVX512BW) | F(AVX512VL); 370 371 371 372 /* cpuid 0xD.1.eax */ 372 373 const u32 kvm_cpuid_D_1_eax_x86_features =
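The new `F()` entries correspond to feature bits in CPUID.(EAX=7,ECX=0):EBX — AVX512DQ is bit 17, AVX512BW bit 30, AVX512VL bit 31 (AVX512F, already exposed, is bit 16). A minimal decoding sketch, with a made-up EBX value:

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions in CPUID.(EAX=7,ECX=0):EBX for the relevant features. */
enum {
	BIT_AVX512F  = 16,
	BIT_AVX512DQ = 17,
	BIT_AVX512BW = 30,
	BIT_AVX512VL = 31,
};

static int cpu_has(uint32_t ebx, int bit)
{
	return (ebx >> bit) & 1;
}
```

For example, a hypothetical leaf-7 EBX of `(1u << 16) | (1u << 17) | (1u << 30) | (1u << 31)` advertises AVX512F/DQ/BW/VL and nothing else.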
+69
arch/x86/kvm/debugfs.c
··· 1 + /* 2 + * Kernel-based Virtual Machine driver for Linux 3 + * 4 + * Copyright 2016 Red Hat, Inc. and/or its affiliates. 5 + * 6 + * This work is licensed under the terms of the GNU GPL, version 2. See 7 + * the COPYING file in the top-level directory. 8 + * 9 + */ 10 + #include <linux/kvm_host.h> 11 + #include <linux/debugfs.h> 12 + 13 + bool kvm_arch_has_vcpu_debugfs(void) 14 + { 15 + return true; 16 + } 17 + 18 + static int vcpu_get_tsc_offset(void *data, u64 *val) 19 + { 20 + struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data; 21 + *val = vcpu->arch.tsc_offset; 22 + return 0; 23 + } 24 + 25 + DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_offset_fops, vcpu_get_tsc_offset, NULL, "%lld\n"); 26 + 27 + static int vcpu_get_tsc_scaling_ratio(void *data, u64 *val) 28 + { 29 + struct kvm_vcpu *vcpu = (struct kvm_vcpu *) data; 30 + *val = vcpu->arch.tsc_scaling_ratio; 31 + return 0; 32 + } 33 + 34 + DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_fops, vcpu_get_tsc_scaling_ratio, NULL, "%llu\n"); 35 + 36 + static int vcpu_get_tsc_scaling_frac_bits(void *data, u64 *val) 37 + { 38 + *val = kvm_tsc_scaling_ratio_frac_bits; 39 + return 0; 40 + } 41 + 42 + DEFINE_SIMPLE_ATTRIBUTE(vcpu_tsc_scaling_frac_fops, vcpu_get_tsc_scaling_frac_bits, NULL, "%llu\n"); 43 + 44 + int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu) 45 + { 46 + struct dentry *ret; 47 + 48 + ret = debugfs_create_file("tsc-offset", 0444, 49 + vcpu->debugfs_dentry, 50 + vcpu, &vcpu_tsc_offset_fops); 51 + if (!ret) 52 + return -ENOMEM; 53 + 54 + if (kvm_has_tsc_control) { 55 + ret = debugfs_create_file("tsc-scaling-ratio", 0444, 56 + vcpu->debugfs_dentry, 57 + vcpu, &vcpu_tsc_scaling_fops); 58 + if (!ret) 59 + return -ENOMEM; 60 + ret = debugfs_create_file("tsc-scaling-ratio-frac-bits", 0444, 61 + vcpu->debugfs_dentry, 62 + vcpu, &vcpu_tsc_scaling_frac_fops); 63 + if (!ret) 64 + return -ENOMEM; 65 + 66 + } 67 + 68 + return 0; 69 + }
+141 -16
arch/x86/kvm/hyperv.c
··· 386 386 387 387 static u64 get_time_ref_counter(struct kvm *kvm) 388 388 { 389 - return div_u64(get_kernel_ns() + kvm->arch.kvmclock_offset, 100); 389 + struct kvm_hv *hv = &kvm->arch.hyperv; 390 + struct kvm_vcpu *vcpu; 391 + u64 tsc; 392 + 393 + /* 394 + * The guest has not set up the TSC page or the clock isn't 395 + * stable, fall back to get_kvmclock_ns. 396 + */ 397 + if (!hv->tsc_ref.tsc_sequence) 398 + return div_u64(get_kvmclock_ns(kvm), 100); 399 + 400 + vcpu = kvm_get_vcpu(kvm, 0); 401 + tsc = kvm_read_l1_tsc(vcpu, rdtsc()); 402 + return mul_u64_u64_shr(tsc, hv->tsc_ref.tsc_scale, 64) 403 + + hv->tsc_ref.tsc_offset; 390 404 } 391 405 392 406 static void stimer_mark_pending(struct kvm_vcpu_hv_stimer *stimer, ··· 770 756 return 0; 771 757 } 772 758 759 + /* 760 + * The kvmclock and Hyper-V TSC page use similar formulas, and converting 761 + * between them is possible: 762 + * 763 + * kvmclock formula: 764 + * nsec = (ticks - tsc_timestamp) * tsc_to_system_mul * 2^(tsc_shift-32) 765 + * + system_time 766 + * 767 + * Hyper-V formula: 768 + * nsec/100 = ticks * scale / 2^64 + offset 769 + * 770 + * When tsc_timestamp = system_time = 0, offset is zero in the Hyper-V formula. 
771 + * By dividing the kvmclock formula by 100 and equating what's left we get: 772 + * ticks * scale / 2^64 = ticks * tsc_to_system_mul * 2^(tsc_shift-32) / 100 773 + * scale / 2^64 = tsc_to_system_mul * 2^(tsc_shift-32) / 100 774 + * scale = tsc_to_system_mul * 2^(32+tsc_shift) / 100 775 + * 776 + * Now expand the kvmclock formula and divide by 100: 777 + * nsec = ticks * tsc_to_system_mul * 2^(tsc_shift-32) 778 + * - tsc_timestamp * tsc_to_system_mul * 2^(tsc_shift-32) 779 + * + system_time 780 + * nsec/100 = ticks * tsc_to_system_mul * 2^(tsc_shift-32) / 100 781 + * - tsc_timestamp * tsc_to_system_mul * 2^(tsc_shift-32) / 100 782 + * + system_time / 100 783 + * 784 + * Replace tsc_to_system_mul * 2^(tsc_shift-32) / 100 by scale / 2^64: 785 + * nsec/100 = ticks * scale / 2^64 786 + * - tsc_timestamp * scale / 2^64 787 + * + system_time / 100 788 + * 789 + * Equate with the Hyper-V formula so that ticks * scale / 2^64 cancels out: 790 + * offset = system_time / 100 - tsc_timestamp * scale / 2^64 791 + * 792 + * These two equivalencies are implemented in this function. 793 + */ 794 + static bool compute_tsc_page_parameters(struct pvclock_vcpu_time_info *hv_clock, 795 + HV_REFERENCE_TSC_PAGE *tsc_ref) 796 + { 797 + u64 max_mul; 798 + 799 + if (!(hv_clock->flags & PVCLOCK_TSC_STABLE_BIT)) 800 + return false; 801 + 802 + /* 803 + * check if scale would overflow, if so we use the time ref counter 804 + * tsc_to_system_mul * 2^(tsc_shift+32) / 100 >= 2^64 805 + * tsc_to_system_mul / 100 >= 2^(32-tsc_shift) 806 + * tsc_to_system_mul >= 100 * 2^(32-tsc_shift) 807 + */ 808 + max_mul = 100ull << (32 - hv_clock->tsc_shift); 809 + if (hv_clock->tsc_to_system_mul >= max_mul) 810 + return false; 811 + 812 + /* 813 + * Otherwise compute the scale and offset according to the formulas 814 + * derived above. 
815 + */ 816 + tsc_ref->tsc_scale = 817 + mul_u64_u32_div(1ULL << (32 + hv_clock->tsc_shift), 818 + hv_clock->tsc_to_system_mul, 819 + 100); 820 + 821 + tsc_ref->tsc_offset = hv_clock->system_time; 822 + do_div(tsc_ref->tsc_offset, 100); 823 + tsc_ref->tsc_offset -= 824 + mul_u64_u64_shr(hv_clock->tsc_timestamp, tsc_ref->tsc_scale, 64); 825 + return true; 826 + } 827 + 828 + void kvm_hv_setup_tsc_page(struct kvm *kvm, 829 + struct pvclock_vcpu_time_info *hv_clock) 830 + { 831 + struct kvm_hv *hv = &kvm->arch.hyperv; 832 + u32 tsc_seq; 833 + u64 gfn; 834 + 835 + BUILD_BUG_ON(sizeof(tsc_seq) != sizeof(hv->tsc_ref.tsc_sequence)); 836 + BUILD_BUG_ON(offsetof(HV_REFERENCE_TSC_PAGE, tsc_sequence) != 0); 837 + 838 + if (!(hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE)) 839 + return; 840 + 841 + gfn = hv->hv_tsc_page >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT; 842 + /* 843 + * Because the TSC parameters only vary when there is a 844 + * change in the master clock, do not bother with caching. 845 + */ 846 + if (unlikely(kvm_read_guest(kvm, gfn_to_gpa(gfn), 847 + &tsc_seq, sizeof(tsc_seq)))) 848 + return; 849 + 850 + /* 851 + * While we're computing and writing the parameters, force the 852 + * guest to use the time reference count MSR. 853 + */ 854 + hv->tsc_ref.tsc_sequence = 0; 855 + if (kvm_write_guest(kvm, gfn_to_gpa(gfn), 856 + &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence))) 857 + return; 858 + 859 + if (!compute_tsc_page_parameters(hv_clock, &hv->tsc_ref)) 860 + return; 861 + 862 + /* Ensure sequence is zero before writing the rest of the struct. */ 863 + smp_wmb(); 864 + if (kvm_write_guest(kvm, gfn_to_gpa(gfn), &hv->tsc_ref, sizeof(hv->tsc_ref))) 865 + return; 866 + 867 + /* 868 + * Now switch to the TSC page mechanism by writing the sequence. 869 + */ 870 + tsc_seq++; 871 + if (tsc_seq == 0xFFFFFFFF || tsc_seq == 0) 872 + tsc_seq = 1; 873 + 874 + /* Write the struct entirely before the non-zero sequence. 
*/ 875 + smp_wmb(); 876 + 877 + hv->tsc_ref.tsc_sequence = tsc_seq; 878 + kvm_write_guest(kvm, gfn_to_gpa(gfn), 879 + &hv->tsc_ref, sizeof(hv->tsc_ref.tsc_sequence)); 880 + } 881 + 773 882 static int kvm_hv_set_msr_pw(struct kvm_vcpu *vcpu, u32 msr, u64 data, 774 883 bool host) 775 884 { ··· 930 793 mark_page_dirty(kvm, gfn); 931 794 break; 932 795 } 933 - case HV_X64_MSR_REFERENCE_TSC: { 934 - u64 gfn; 935 - HV_REFERENCE_TSC_PAGE tsc_ref; 936 - 937 - memset(&tsc_ref, 0, sizeof(tsc_ref)); 796 + case HV_X64_MSR_REFERENCE_TSC: 938 797 hv->hv_tsc_page = data; 939 - if (!(data & HV_X64_MSR_TSC_REFERENCE_ENABLE)) 940 - break; 941 - gfn = data >> HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT; 942 - if (kvm_write_guest( 943 - kvm, 944 - gfn << HV_X64_MSR_TSC_REFERENCE_ADDRESS_SHIFT, 945 - &tsc_ref, sizeof(tsc_ref))) 946 - return 1; 947 - mark_page_dirty(kvm, gfn); 798 + if (hv->hv_tsc_page & HV_X64_MSR_TSC_REFERENCE_ENABLE) 799 + kvm_make_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu); 948 800 break; 949 - } 950 801 case HV_X64_MSR_CRASH_P0 ... HV_X64_MSR_CRASH_P4: 951 802 return kvm_hv_msr_set_crash_data(vcpu, 952 803 msr - HV_X64_MSR_CRASH_P0,
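The conversion derived in the long comment above can be checked numerically. Below is a userspace sketch of the two formulas, using `unsigned __int128` where the kernel uses `mul_u64_u64_shr()`/`mul_u64_u32_div()`; the clock parameters in the usage note are made up, and only `tsc_shift >= 0` is handled for brevity:

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;

/* kvmclock: nsec = ((ticks - tsc_timestamp) << shift) * mul >> 32
 *                  + system_time                (tsc_shift >= 0 assumed) */
static uint64_t kvmclock_ns(uint64_t ticks, uint32_t mul, int shift,
			    uint64_t tsc_timestamp, uint64_t system_time)
{
	uint64_t delta = (ticks - tsc_timestamp) << shift;
	return (uint64_t)(((u128)delta * mul) >> 32) + system_time;
}

/* Hyper-V: nsec/100 = ticks * scale / 2^64 + offset, with
 *   scale  = mul * 2^(32+shift) / 100
 *   offset = system_time / 100 - tsc_timestamp * scale / 2^64 */
static uint64_t hv_scale(uint32_t mul, int shift)
{
	return (uint64_t)((((u128)1 << (32 + shift)) * mul) / 100);
}

static int64_t hv_offset(uint64_t system_time, uint64_t tsc_timestamp,
			 uint64_t scale)
{
	return (int64_t)(system_time / 100) -
	       (int64_t)(((u128)tsc_timestamp * scale) >> 64);
}

static uint64_t hv_ns100(uint64_t ticks, uint64_t scale, int64_t offset)
{
	return (uint64_t)(((u128)ticks * scale) >> 64) + (uint64_t)offset;
}
```

With a hypothetical `mul = 1u << 31`, `shift = 0` (a 2 GHz TSC), `tsc_timestamp = 200` and `system_time = 5000`, both formulas yield 99 units of 100 ns at `ticks = 10000`.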
+3
arch/x86/kvm/hyperv.h
··· 84 84 85 85 void kvm_hv_process_stimers(struct kvm_vcpu *vcpu); 86 86 87 + void kvm_hv_setup_tsc_page(struct kvm *kvm, 88 + struct pvclock_vcpu_time_info *hv_clock); 89 + 87 90 #endif
+3 -2
arch/x86/kvm/lapic.c
··· 1761 1761 if (value & MSR_IA32_APICBASE_ENABLE) { 1762 1762 kvm_apic_set_xapic_id(apic, vcpu->vcpu_id); 1763 1763 static_key_slow_dec_deferred(&apic_hw_disabled); 1764 - } else 1764 + } else { 1765 1765 static_key_slow_inc(&apic_hw_disabled.key); 1766 - recalculate_apic_map(vcpu->kvm); 1766 + recalculate_apic_map(vcpu->kvm); 1767 + } 1767 1768 } 1768 1769 1769 1770 if ((old_value ^ value) & X2APIC_ENABLE) {
+6 -6
arch/x86/kvm/mmu.c
··· 1207 1207 * 1208 1208 * Return true if tlb need be flushed. 1209 1209 */ 1210 - static bool spte_write_protect(struct kvm *kvm, u64 *sptep, bool pt_protect) 1210 + static bool spte_write_protect(u64 *sptep, bool pt_protect) 1211 1211 { 1212 1212 u64 spte = *sptep; 1213 1213 ··· 1233 1233 bool flush = false; 1234 1234 1235 1235 for_each_rmap_spte(rmap_head, &iter, sptep) 1236 - flush |= spte_write_protect(kvm, sptep, pt_protect); 1236 + flush |= spte_write_protect(sptep, pt_protect); 1237 1237 1238 1238 return flush; 1239 1239 } 1240 1240 1241 - static bool spte_clear_dirty(struct kvm *kvm, u64 *sptep) 1241 + static bool spte_clear_dirty(u64 *sptep) 1242 1242 { 1243 1243 u64 spte = *sptep; 1244 1244 ··· 1256 1256 bool flush = false; 1257 1257 1258 1258 for_each_rmap_spte(rmap_head, &iter, sptep) 1259 - flush |= spte_clear_dirty(kvm, sptep); 1259 + flush |= spte_clear_dirty(sptep); 1260 1260 1261 1261 return flush; 1262 1262 } 1263 1263 1264 - static bool spte_set_dirty(struct kvm *kvm, u64 *sptep) 1264 + static bool spte_set_dirty(u64 *sptep) 1265 1265 { 1266 1266 u64 spte = *sptep; 1267 1267 ··· 1279 1279 bool flush = false; 1280 1280 1281 1281 for_each_rmap_spte(rmap_head, &iter, sptep) 1282 - flush |= spte_set_dirty(kvm, sptep); 1282 + flush |= spte_set_dirty(sptep); 1283 1283 1284 1284 return flush; 1285 1285 }
+387 -30
arch/x86/kvm/svm.c
··· 34 34 #include <linux/sched.h> 35 35 #include <linux/trace_events.h> 36 36 #include <linux/slab.h> 37 + #include <linux/amd-iommu.h> 38 + #include <linux/hashtable.h> 37 39 38 40 #include <asm/apic.h> 39 41 #include <asm/perf_event.h> ··· 43 41 #include <asm/desc.h> 44 42 #include <asm/debugreg.h> 45 43 #include <asm/kvm_para.h> 44 + #include <asm/irq_remapping.h> 46 45 47 46 #include <asm/virtext.h> 48 47 #include "trace.h" ··· 98 95 #define AVIC_UNACCEL_ACCESS_WRITE_MASK 1 99 96 #define AVIC_UNACCEL_ACCESS_OFFSET_MASK 0xFF0 100 97 #define AVIC_UNACCEL_ACCESS_VECTOR_MASK 0xFFFFFFFF 98 + 99 + /* AVIC GATAG is encoded using VM and VCPU IDs */ 100 + #define AVIC_VCPU_ID_BITS 8 101 + #define AVIC_VCPU_ID_MASK ((1 << AVIC_VCPU_ID_BITS) - 1) 102 + 103 + #define AVIC_VM_ID_BITS 24 104 + #define AVIC_VM_ID_NR (1 << AVIC_VM_ID_BITS) 105 + #define AVIC_VM_ID_MASK ((1 << AVIC_VM_ID_BITS) - 1) 106 + 107 + #define AVIC_GATAG(x, y) (((x & AVIC_VM_ID_MASK) << AVIC_VCPU_ID_BITS) | \ 108 + (y & AVIC_VCPU_ID_MASK)) 109 + #define AVIC_GATAG_TO_VMID(x) ((x >> AVIC_VCPU_ID_BITS) & AVIC_VM_ID_MASK) 110 + #define AVIC_GATAG_TO_VCPUID(x) (x & AVIC_VCPU_ID_MASK) 101 111 102 112 static bool erratum_383_found __read_mostly; 103 113 ··· 201 185 struct page *avic_backing_page; 202 186 u64 *avic_physical_id_cache; 203 187 bool avic_is_running; 188 + 189 + /* 190 + * Per-vcpu list of struct amd_svm_iommu_ir: 191 + * This is used mainly to store interrupt remapping information used 192 + * when update the vcpu affinity. This avoids the need to scan for 193 + * IRTE and try to match ga_tag in the IOMMU driver. 194 + */ 195 + struct list_head ir_list; 196 + spinlock_t ir_list_lock; 197 + }; 198 + 199 + /* 200 + * This is a wrapper of struct amd_iommu_ir_data. 
201 + */ 202 + struct amd_svm_iommu_ir { 203 + struct list_head node; /* Used by SVM for per-vcpu ir_list */ 204 + void *data; /* Storing pointer to struct amd_ir_data */ 204 205 }; 205 206 206 207 #define AVIC_LOGICAL_ID_ENTRY_GUEST_PHYSICAL_ID_MASK (0xFF) ··· 274 241 #ifdef CONFIG_X86_LOCAL_APIC 275 242 module_param(avic, int, S_IRUGO); 276 243 #endif 244 + 245 + /* AVIC VM ID bit masks and lock */ 246 + static DECLARE_BITMAP(avic_vm_id_bitmap, AVIC_VM_ID_NR); 247 + static DEFINE_SPINLOCK(avic_vm_id_lock); 277 248 278 249 static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0); 279 250 static void svm_flush_tlb(struct kvm_vcpu *vcpu); ··· 965 928 set_msr_interception(msrpm, MSR_IA32_LASTINTTOIP, 0, 0); 966 929 } 967 930 931 + /* Note: 932 + * This hash table is used to map VM_ID to a struct kvm_arch, 933 + * when handling AMD IOMMU GALOG notification to schedule in 934 + * a particular vCPU. 935 + */ 936 + #define SVM_VM_DATA_HASH_BITS 8 937 + DECLARE_HASHTABLE(svm_vm_data_hash, SVM_VM_DATA_HASH_BITS); 938 + static spinlock_t svm_vm_data_hash_lock; 939 + 940 + /* Note: 941 + * This function is called from IOMMU driver to notify 942 + * SVM to schedule in a particular vCPU of a particular VM. 
943 + */ 944 + static int avic_ga_log_notifier(u32 ga_tag) 945 + { 946 + unsigned long flags; 947 + struct kvm_arch *ka = NULL; 948 + struct kvm_vcpu *vcpu = NULL; 949 + u32 vm_id = AVIC_GATAG_TO_VMID(ga_tag); 950 + u32 vcpu_id = AVIC_GATAG_TO_VCPUID(ga_tag); 951 + 952 + pr_debug("SVM: %s: vm_id=%#x, vcpu_id=%#x\n", __func__, vm_id, vcpu_id); 953 + 954 + spin_lock_irqsave(&svm_vm_data_hash_lock, flags); 955 + hash_for_each_possible(svm_vm_data_hash, ka, hnode, vm_id) { 956 + struct kvm *kvm = container_of(ka, struct kvm, arch); 957 + struct kvm_arch *vm_data = &kvm->arch; 958 + 959 + if (vm_data->avic_vm_id != vm_id) 960 + continue; 961 + vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id); 962 + break; 963 + } 964 + spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); 965 + 966 + if (!vcpu) 967 + return 0; 968 + 969 + /* Note: 970 + * At this point, the IOMMU should have already set the pending 971 + * bit in the vAPIC backing page. So, we just need to schedule 972 + * in the vcpu. 973 + */ 974 + if (vcpu->mode == OUTSIDE_GUEST_MODE) 975 + kvm_vcpu_wake_up(vcpu); 976 + 977 + return 0; 978 + } 979 + 968 980 static __init int svm_hardware_setup(void) 969 981 { 970 982 int cpu; ··· 1072 986 if (avic) { 1073 987 if (!npt_enabled || 1074 988 !boot_cpu_has(X86_FEATURE_AVIC) || 1075 - !IS_ENABLED(CONFIG_X86_LOCAL_APIC)) 989 + !IS_ENABLED(CONFIG_X86_LOCAL_APIC)) { 1076 990 avic = false; 1077 - else 991 + } else { 1078 992 pr_info("AVIC enabled\n"); 993 + 994 + hash_init(svm_vm_data_hash); 995 + spin_lock_init(&svm_vm_data_hash_lock); 996 + amd_iommu_register_ga_log_notifier(&avic_ga_log_notifier); 997 + } 1079 998 } 1080 999 1081 1000 return 0; ··· 1117 1026 seg->attrib = SVM_SELECTOR_P_MASK | type; 1118 1027 seg->limit = 0xffff; 1119 1028 seg->base = 0; 1120 - } 1121 - 1122 - static u64 svm_read_tsc_offset(struct kvm_vcpu *vcpu) 1123 - { 1124 - struct vcpu_svm *svm = to_svm(vcpu); 1125 - 1126 - return svm->vmcb->control.tsc_offset; 1127 1029 } 1128 1030 1129 1031 static void 
svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) ··· 1364 1280 return 0; 1365 1281 } 1366 1282 1283 + static inline int avic_get_next_vm_id(void) 1284 + { 1285 + int id; 1286 + 1287 + spin_lock(&avic_vm_id_lock); 1288 + 1289 + /* AVIC VM ID is one-based. */ 1290 + id = find_next_zero_bit(avic_vm_id_bitmap, AVIC_VM_ID_NR, 1); 1291 + if (id <= AVIC_VM_ID_MASK) 1292 + __set_bit(id, avic_vm_id_bitmap); 1293 + else 1294 + id = -EAGAIN; 1295 + 1296 + spin_unlock(&avic_vm_id_lock); 1297 + return id; 1298 + } 1299 + 1300 + static inline int avic_free_vm_id(int id) 1301 + { 1302 + if (id <= 0 || id > AVIC_VM_ID_MASK) 1303 + return -EINVAL; 1304 + 1305 + spin_lock(&avic_vm_id_lock); 1306 + __clear_bit(id, avic_vm_id_bitmap); 1307 + spin_unlock(&avic_vm_id_lock); 1308 + return 0; 1309 + } 1310 + 1367 1311 static void avic_vm_destroy(struct kvm *kvm) 1368 1312 { 1313 + unsigned long flags; 1369 1314 struct kvm_arch *vm_data = &kvm->arch; 1315 + 1316 + avic_free_vm_id(vm_data->avic_vm_id); 1370 1317 1371 1318 if (vm_data->avic_logical_id_table_page) 1372 1319 __free_page(vm_data->avic_logical_id_table_page); 1373 1320 if (vm_data->avic_physical_id_table_page) 1374 1321 __free_page(vm_data->avic_physical_id_table_page); 1322 + 1323 + spin_lock_irqsave(&svm_vm_data_hash_lock, flags); 1324 + hash_del(&vm_data->hnode); 1325 + spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); 1375 1326 } 1376 1327 1377 1328 static int avic_vm_init(struct kvm *kvm) 1378 1329 { 1379 - int err = -ENOMEM; 1330 + unsigned long flags; 1331 + int vm_id, err = -ENOMEM; 1380 1332 struct kvm_arch *vm_data = &kvm->arch; 1381 1333 struct page *p_page; 1382 1334 struct page *l_page; 1383 1335 1384 1336 if (!avic) 1385 1337 return 0; 1338 + 1339 + vm_id = avic_get_next_vm_id(); 1340 + if (vm_id < 0) 1341 + return vm_id; 1342 + vm_data->avic_vm_id = (u32)vm_id; 1386 1343 1387 1344 /* Allocating physical APIC ID table (4KB) */ 1388 1345 p_page = alloc_page(GFP_KERNEL); ··· 1441 1316 
vm_data->avic_logical_id_table_page = l_page; 1442 1317 clear_page(page_address(l_page)); 1443 1318 1319 + spin_lock_irqsave(&svm_vm_data_hash_lock, flags); 1320 + hash_add(svm_vm_data_hash, &vm_data->hnode, vm_data->avic_vm_id); 1321 + spin_unlock_irqrestore(&svm_vm_data_hash_lock, flags); 1322 + 1444 1323 return 0; 1445 1324 1446 1325 free_avic: ··· 1452 1323 return err; 1453 1324 } 1454 1325 1455 - /** 1456 - * This function is called during VCPU halt/unhalt. 1457 - */ 1458 - static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run) 1326 + static inline int 1327 + avic_update_iommu_vcpu_affinity(struct kvm_vcpu *vcpu, int cpu, bool r) 1459 1328 { 1460 - u64 entry; 1461 - int h_physical_id = kvm_cpu_get_apicid(vcpu->cpu); 1329 + int ret = 0; 1330 + unsigned long flags; 1331 + struct amd_svm_iommu_ir *ir; 1462 1332 struct vcpu_svm *svm = to_svm(vcpu); 1463 1333 1464 - if (!kvm_vcpu_apicv_active(vcpu)) 1465 - return; 1334 + if (!kvm_arch_has_assigned_device(vcpu->kvm)) 1335 + return 0; 1466 1336 1467 - svm->avic_is_running = is_run; 1337 + /* 1338 + * Here, we go through the per-vcpu ir_list to update all existing 1339 + * interrupt remapping table entry targeting this vcpu. 
1340 + */ 1341 + spin_lock_irqsave(&svm->ir_list_lock, flags); 1468 1342 1469 - /* ID = 0xff (broadcast), ID > 0xff (reserved) */ 1470 - if (WARN_ON(h_physical_id >= AVIC_MAX_PHYSICAL_ID_COUNT)) 1471 - return; 1343 + if (list_empty(&svm->ir_list)) 1344 + goto out; 1472 1345 1473 - entry = READ_ONCE(*(svm->avic_physical_id_cache)); 1474 - WARN_ON(is_run == !!(entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK)); 1475 - 1476 - entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; 1477 - if (is_run) 1478 - entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; 1479 - WRITE_ONCE(*(svm->avic_physical_id_cache), entry); 1346 + list_for_each_entry(ir, &svm->ir_list, node) { 1347 + ret = amd_iommu_update_ga(cpu, r, ir->data); 1348 + if (ret) 1349 + break; 1350 + } 1351 + out: 1352 + spin_unlock_irqrestore(&svm->ir_list_lock, flags); 1353 + return ret; 1480 1354 } 1481 1355 1482 1356 static void avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu) ··· 1506 1374 entry |= AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; 1507 1375 1508 1376 WRITE_ONCE(*(svm->avic_physical_id_cache), entry); 1377 + avic_update_iommu_vcpu_affinity(vcpu, h_physical_id, 1378 + svm->avic_is_running); 1509 1379 } 1510 1380 1511 1381 static void avic_vcpu_put(struct kvm_vcpu *vcpu) ··· 1519 1385 return; 1520 1386 1521 1387 entry = READ_ONCE(*(svm->avic_physical_id_cache)); 1388 + if (entry & AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK) 1389 + avic_update_iommu_vcpu_affinity(vcpu, -1, 0); 1390 + 1522 1391 entry &= ~AVIC_PHYSICAL_ID_ENTRY_IS_RUNNING_MASK; 1523 1392 WRITE_ONCE(*(svm->avic_physical_id_cache), entry); 1393 + } 1394 + 1395 + /** 1396 + * This function is called during VCPU halt/unhalt. 
*/ 1398 + static void avic_set_running(struct kvm_vcpu *vcpu, bool is_run) 1399 + { 1400 + struct vcpu_svm *svm = to_svm(vcpu); 1401 + 1402 + svm->avic_is_running = is_run; 1403 + if (is_run) 1404 + avic_vcpu_load(vcpu, vcpu->cpu); 1405 + else 1406 + avic_vcpu_put(vcpu); 1524 1407 } 1525 1408 1526 1409 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) ··· 1601 1450 err = avic_init_backing_page(&svm->vcpu); 1602 1451 if (err) 1603 1452 goto free_page4; 1453 + 1454 + INIT_LIST_HEAD(&svm->ir_list); 1455 + spin_lock_init(&svm->ir_list_lock); 1604 1456 } 1605 1457 1606 1458 /* We initialize this flag to true to make sure that the is_running ··· 4400 4246 kvm_vcpu_wake_up(vcpu); 4401 4247 } 4402 4248 4249 + static void svm_ir_list_del(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi) 4250 + { 4251 + unsigned long flags; 4252 + struct amd_svm_iommu_ir *cur; 4253 + 4254 + spin_lock_irqsave(&svm->ir_list_lock, flags); 4255 + list_for_each_entry(cur, &svm->ir_list, node) { 4256 + if (cur->data != pi->ir_data) 4257 + continue; 4258 + list_del(&cur->node); 4259 + kfree(cur); 4260 + break; 4261 + } 4262 + spin_unlock_irqrestore(&svm->ir_list_lock, flags); 4263 + } 4264 + 4265 + static int svm_ir_list_add(struct vcpu_svm *svm, struct amd_iommu_pi_data *pi) 4266 + { 4267 + int ret = 0; 4268 + unsigned long flags; 4269 + struct amd_svm_iommu_ir *ir; 4270 + 4271 + /** 4272 + * In some cases, the existing irte is updated and re-set, 4273 + * so we need to check here if it's already been added 4274 + * to the ir_list.
*/ 4276 + if (pi->ir_data && (pi->prev_ga_tag != 0)) { 4277 + struct kvm *kvm = svm->vcpu.kvm; 4278 + u32 vcpu_id = AVIC_GATAG_TO_VCPUID(pi->prev_ga_tag); 4279 + struct kvm_vcpu *prev_vcpu = kvm_get_vcpu_by_id(kvm, vcpu_id); 4280 + struct vcpu_svm *prev_svm; 4281 + 4282 + if (!prev_vcpu) { 4283 + ret = -EINVAL; 4284 + goto out; 4285 + } 4286 + 4287 + prev_svm = to_svm(prev_vcpu); 4288 + svm_ir_list_del(prev_svm, pi); 4289 + } 4290 + 4291 + /** 4292 + * Allocating a new amd_svm_iommu_ir, which will get 4293 + * added to the per-vcpu ir_list. 4294 + */ 4295 + ir = kzalloc(sizeof(struct amd_svm_iommu_ir), GFP_KERNEL); 4296 + if (!ir) { 4297 + ret = -ENOMEM; 4298 + goto out; 4299 + } 4300 + ir->data = pi->ir_data; 4301 + 4302 + spin_lock_irqsave(&svm->ir_list_lock, flags); 4303 + list_add(&ir->node, &svm->ir_list); 4304 + spin_unlock_irqrestore(&svm->ir_list_lock, flags); 4305 + out: 4306 + return ret; 4307 + } 4308 + 4309 + /** 4310 + * Note: 4311 + * The HW cannot support posting multicast/broadcast 4312 + * interrupts to a vCPU. So, we still use legacy interrupt 4313 + * remapping for these kinds of interrupts. 4314 + * 4315 + * For lowest-priority interrupts, we only support 4316 + * those with a single CPU as the destination, e.g. the user 4317 + * configures the interrupts via /proc/irq or uses 4318 + * irqbalance to make the interrupts single-CPU.
4319 + */ 4320 + static int 4321 + get_pi_vcpu_info(struct kvm *kvm, struct kvm_kernel_irq_routing_entry *e, 4322 + struct vcpu_data *vcpu_info, struct vcpu_svm **svm) 4323 + { 4324 + struct kvm_lapic_irq irq; 4325 + struct kvm_vcpu *vcpu = NULL; 4326 + 4327 + kvm_set_msi_irq(kvm, e, &irq); 4328 + 4329 + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) { 4330 + pr_debug("SVM: %s: use legacy intr remap mode for irq %u\n", 4331 + __func__, irq.vector); 4332 + return -1; 4333 + } 4334 + 4335 + pr_debug("SVM: %s: use GA mode for irq %u\n", __func__, 4336 + irq.vector); 4337 + *svm = to_svm(vcpu); 4338 + vcpu_info->pi_desc_addr = page_to_phys((*svm)->avic_backing_page); 4339 + vcpu_info->vector = irq.vector; 4340 + 4341 + return 0; 4342 + } 4343 + 4344 + /* 4345 + * svm_update_pi_irte - set IRTE for Posted-Interrupts 4346 + * 4347 + * @kvm: kvm 4348 + * @host_irq: host irq of the interrupt 4349 + * @guest_irq: gsi of the interrupt 4350 + * @set: set or unset PI 4351 + * returns 0 on success, < 0 on failure 4352 + */ 4353 + static int svm_update_pi_irte(struct kvm *kvm, unsigned int host_irq, 4354 + uint32_t guest_irq, bool set) 4355 + { 4356 + struct kvm_kernel_irq_routing_entry *e; 4357 + struct kvm_irq_routing_table *irq_rt; 4358 + int idx, ret = -EINVAL; 4359 + 4360 + if (!kvm_arch_has_assigned_device(kvm) || 4361 + !irq_remapping_cap(IRQ_POSTING_CAP)) 4362 + return 0; 4363 + 4364 + pr_debug("SVM: %s: host_irq=%#x, guest_irq=%#x, set=%#x\n", 4365 + __func__, host_irq, guest_irq, set); 4366 + 4367 + idx = srcu_read_lock(&kvm->irq_srcu); 4368 + irq_rt = srcu_dereference(kvm->irq_routing, &kvm->irq_srcu); 4369 + WARN_ON(guest_irq >= irq_rt->nr_rt_entries); 4370 + 4371 + hlist_for_each_entry(e, &irq_rt->map[guest_irq], link) { 4372 + struct vcpu_data vcpu_info; 4373 + struct vcpu_svm *svm = NULL; 4374 + 4375 + if (e->type != KVM_IRQ_ROUTING_MSI) 4376 + continue; 4377 + 4378 + /** 4379 + * Here, we setup with legacy mode in the following cases: 4380 + * 1. 
When we cannot target the interrupt to a specific vcpu. 4381 + * 2. Unsetting posted interrupt. 4382 + * 3. APIC virtualization is disabled for the vcpu. 4383 + */ 4384 + if (!get_pi_vcpu_info(kvm, e, &vcpu_info, &svm) && set && 4385 + kvm_vcpu_apicv_active(&svm->vcpu)) { 4386 + struct amd_iommu_pi_data pi; 4387 + 4388 + /* Try to enable guest_mode in IRTE */ 4389 + pi.base = page_to_phys(svm->avic_backing_page) & AVIC_HPA_MASK; 4390 + pi.ga_tag = AVIC_GATAG(kvm->arch.avic_vm_id, 4391 + svm->vcpu.vcpu_id); 4392 + pi.is_guest_mode = true; 4393 + pi.vcpu_data = &vcpu_info; 4394 + ret = irq_set_vcpu_affinity(host_irq, &pi); 4395 + 4396 + /** 4397 + * Here, we have successfully set up vcpu affinity in 4398 + * IOMMU guest mode. Now, we need to store the posted 4399 + * interrupt information in a per-vcpu ir_list so that 4400 + * we can reference it directly when we update vcpu 4401 + * scheduling information in the IOMMU irte. 4402 + */ 4403 + if (!ret && pi.is_guest_mode) 4404 + svm_ir_list_add(svm, &pi); 4405 + } else { 4406 + /* Use legacy mode in IRTE */ 4407 + struct amd_iommu_pi_data pi; 4408 + 4409 + /** 4410 + * Here, pi is used to: 4411 + * - Tell IOMMU to use legacy mode for this interrupt. 4412 + * - Retrieve ga_tag of prior interrupt remapping data. 4413 + */ 4414 + pi.is_guest_mode = false; 4415 + ret = irq_set_vcpu_affinity(host_irq, &pi); 4416 + 4417 + /** 4418 + * Check if the posted interrupt was previously 4419 + * set up with guest_mode by checking if the ga_tag 4420 + * was cached. If so, we need to clean up the per-vcpu 4421 + * ir_list.
4422 + */ 4423 + if (!ret && pi.prev_ga_tag) { 4424 + int id = AVIC_GATAG_TO_VCPUID(pi.prev_ga_tag); 4425 + struct kvm_vcpu *vcpu; 4426 + 4427 + vcpu = kvm_get_vcpu_by_id(kvm, id); 4428 + if (vcpu) 4429 + svm_ir_list_del(to_svm(vcpu), &pi); 4430 + } 4431 + } 4432 + 4433 + if (!ret && svm) { 4434 + trace_kvm_pi_irte_update(svm->vcpu.vcpu_id, 4435 + host_irq, e->gsi, 4436 + vcpu_info.vector, 4437 + vcpu_info.pi_desc_addr, set); 4438 + } 4439 + 4440 + if (ret < 0) { 4441 + pr_err("%s: failed to update PI IRTE\n", __func__); 4442 + goto out; 4443 + } 4444 + } 4445 + 4446 + ret = 0; 4447 + out: 4448 + srcu_read_unlock(&kvm->irq_srcu, idx); 4449 + return ret; 4450 + } 4451 + 4403 4452 static int svm_nmi_allowed(struct kvm_vcpu *vcpu) 4404 4453 { 4405 4454 struct vcpu_svm *svm = to_svm(vcpu); ··· 5421 5064 5422 5065 .has_wbinvd_exit = svm_has_wbinvd_exit, 5423 5066 5424 - .read_tsc_offset = svm_read_tsc_offset, 5425 5067 .write_tsc_offset = svm_write_tsc_offset, 5426 5068 .adjust_tsc_offset_guest = svm_adjust_tsc_offset_guest, 5427 5069 .read_l1_tsc = svm_read_l1_tsc, ··· 5434 5078 5435 5079 .pmu_ops = &amd_pmu_ops, 5436 5080 .deliver_posted_interrupt = svm_deliver_avic_intr, 5081 + .update_pi_irte = svm_update_pi_irte, 5437 5082 }; 5438 5083 5439 5084 static int __init svm_init(void)
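The `AVIC_GATAG*()` macros introduced earlier in this file pack a 24-bit VM ID into bits 31:8 of the GA tag and an 8-bit vCPU ID into bits 7:0, which is how `avic_ga_log_notifier()` and `svm_ir_list_add()` recover the target vCPU. A userspace sketch of the same packing (renamed helpers, illustrative values):

```c
#include <assert.h>
#include <stdint.h>

/* Mirror of the AVIC GA tag layout: 24-bit VM ID, 8-bit vCPU ID. */
#define VCPU_ID_BITS 8
#define VCPU_ID_MASK ((1u << VCPU_ID_BITS) - 1)
#define VM_ID_MASK   ((1u << 24) - 1)

static uint32_t ga_tag(uint32_t vm_id, uint32_t vcpu_id)
{
	return ((vm_id & VM_ID_MASK) << VCPU_ID_BITS) |
	       (vcpu_id & VCPU_ID_MASK);
}

static uint32_t ga_tag_to_vmid(uint32_t tag)
{
	return (tag >> VCPU_ID_BITS) & VM_ID_MASK;
}

static uint32_t ga_tag_to_vcpuid(uint32_t tag)
{
	return tag & VCPU_ID_MASK;
}
```

For example, VM ID `0xABCDEF` and vCPU ID `0x42` round-trip through tag `0xABCDEF42`; note the 8-bit vCPU field is why `avic_get_next_vm_id()` only needs the VM ID to be unique, not the whole tag.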
+135 -72
arch/x86/kvm/vmx.c
··· 927 927 static unsigned long *vmx_msr_bitmap_longmode; 928 928 static unsigned long *vmx_msr_bitmap_legacy_x2apic; 929 929 static unsigned long *vmx_msr_bitmap_longmode_x2apic; 930 + static unsigned long *vmx_msr_bitmap_legacy_x2apic_apicv_inactive; 931 + static unsigned long *vmx_msr_bitmap_longmode_x2apic_apicv_inactive; 930 932 static unsigned long *vmx_vmread_bitmap; 931 933 static unsigned long *vmx_vmwrite_bitmap; 932 934 ··· 941 939 static struct vmcs_config { 942 940 int size; 943 941 int order; 942 + u32 basic_cap; 944 943 u32 revision_id; 945 944 u32 pin_based_exec_ctrl; 946 945 u32 cpu_based_exec_ctrl; ··· 1216 1213 { 1217 1214 return vmcs_config.cpu_based_2nd_exec_ctrl & 1218 1215 SECONDARY_EXEC_PAUSE_LOOP_EXITING; 1216 + } 1217 + 1218 + static inline bool cpu_has_vmx_basic_inout(void) 1219 + { 1220 + return (((u64)vmcs_config.basic_cap << 32) & VMX_BASIC_INOUT); 1219 1221 } 1220 1222 1221 1223 static inline bool cpu_need_virtualize_apic_accesses(struct kvm_vcpu *vcpu) ··· 2526 2518 else if (cpu_has_secondary_exec_ctrls() && 2527 2519 (vmcs_read32(SECONDARY_VM_EXEC_CONTROL) & 2528 2520 SECONDARY_EXEC_VIRTUALIZE_X2APIC_MODE)) { 2529 - if (is_long_mode(vcpu)) 2530 - msr_bitmap = vmx_msr_bitmap_longmode_x2apic; 2531 - else 2532 - msr_bitmap = vmx_msr_bitmap_legacy_x2apic; 2521 + if (enable_apicv && kvm_vcpu_apicv_active(vcpu)) { 2522 + if (is_long_mode(vcpu)) 2523 + msr_bitmap = vmx_msr_bitmap_longmode_x2apic; 2524 + else 2525 + msr_bitmap = vmx_msr_bitmap_legacy_x2apic; 2526 + } else { 2527 + if (is_long_mode(vcpu)) 2528 + msr_bitmap = vmx_msr_bitmap_longmode_x2apic_apicv_inactive; 2529 + else 2530 + msr_bitmap = vmx_msr_bitmap_legacy_x2apic_apicv_inactive; 2531 + } 2533 2532 } else { 2534 2533 if (is_long_mode(vcpu)) 2535 2534 msr_bitmap = vmx_msr_bitmap_longmode; ··· 2616 2601 to_vmx(vcpu)->nested.vmcs01_tsc_offset : 2617 2602 vmcs_read64(TSC_OFFSET); 2618 2603 return host_tsc + tsc_offset; 2619 - } 2620 - 2621 - static u64 
vmx_read_tsc_offset(struct kvm_vcpu *vcpu) 2622 - { 2623 - return vmcs_read64(TSC_OFFSET); 2624 2604 } 2625 2605 2626 2606 /* ··· 2887 2877 *pdata = VMCS12_REVISION | VMX_BASIC_TRUE_CTLS | 2888 2878 ((u64)VMCS12_SIZE << VMX_BASIC_VMCS_SIZE_SHIFT) | 2889 2879 (VMX_BASIC_MEM_TYPE_WB << VMX_BASIC_MEM_TYPE_SHIFT); 2880 + if (cpu_has_vmx_basic_inout()) 2881 + *pdata |= VMX_BASIC_INOUT; 2890 2882 break; 2891 2883 case MSR_IA32_VMX_TRUE_PINBASED_CTLS: 2892 2884 case MSR_IA32_VMX_PINBASED_CTLS: ··· 3469 3457 return -EIO; 3470 3458 3471 3459 vmcs_conf->size = vmx_msr_high & 0x1fff; 3472 - vmcs_conf->order = get_order(vmcs_config.size); 3460 + vmcs_conf->order = get_order(vmcs_conf->size); 3461 + vmcs_conf->basic_cap = vmx_msr_high & ~0x1fff; 3473 3462 vmcs_conf->revision_id = vmx_msr_low; 3474 3463 3475 3464 vmcs_conf->pin_based_exec_ctrl = _pin_based_exec_control; ··· 4691 4678 msr, MSR_TYPE_R | MSR_TYPE_W); 4692 4679 } 4693 4680 4694 - static void vmx_enable_intercept_msr_read_x2apic(u32 msr) 4681 + static void vmx_enable_intercept_msr_read_x2apic(u32 msr, bool apicv_active) 4695 4682 { 4696 - __vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, 4697 - msr, MSR_TYPE_R); 4698 - __vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, 4699 - msr, MSR_TYPE_R); 4683 + if (apicv_active) { 4684 + __vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, 4685 + msr, MSR_TYPE_R); 4686 + __vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, 4687 + msr, MSR_TYPE_R); 4688 + } else { 4689 + __vmx_enable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv_inactive, 4690 + msr, MSR_TYPE_R); 4691 + __vmx_enable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv_inactive, 4692 + msr, MSR_TYPE_R); 4693 + } 4700 4694 } 4701 4695 4702 - static void vmx_disable_intercept_msr_read_x2apic(u32 msr) 4696 + static void vmx_disable_intercept_msr_read_x2apic(u32 msr, bool apicv_active) 4703 4697 { 4704 - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, 
4705 - msr, MSR_TYPE_R); 4706 - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, 4707 - msr, MSR_TYPE_R); 4698 + if (apicv_active) { 4699 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, 4700 + msr, MSR_TYPE_R); 4701 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, 4702 + msr, MSR_TYPE_R); 4703 + } else { 4704 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv_inactive, 4705 + msr, MSR_TYPE_R); 4706 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv_inactive, 4707 + msr, MSR_TYPE_R); 4708 + } 4708 4709 } 4709 4710 4710 - static void vmx_disable_intercept_msr_write_x2apic(u32 msr) 4711 + static void vmx_disable_intercept_msr_write_x2apic(u32 msr, bool apicv_active) 4711 4712 { 4712 - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, 4713 - msr, MSR_TYPE_W); 4714 - __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, 4715 - msr, MSR_TYPE_W); 4713 + if (apicv_active) { 4714 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic, 4715 + msr, MSR_TYPE_W); 4716 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic, 4717 + msr, MSR_TYPE_W); 4718 + } else { 4719 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_legacy_x2apic_apicv_inactive, 4720 + msr, MSR_TYPE_W); 4721 + __vmx_disable_intercept_for_msr(vmx_msr_bitmap_longmode_x2apic_apicv_inactive, 4722 + msr, MSR_TYPE_W); 4723 + } 4716 4724 } 4717 4725 4718 4726 static bool vmx_get_enable_apicv(void) ··· 5313 5279 { 5314 5280 struct vcpu_vmx *vmx = to_vmx(vcpu); 5315 5281 5316 - if (is_guest_mode(vcpu)) 5317 - return; 5282 + if (!is_guest_mode(vcpu)) { 5283 + if (!cpu_has_virtual_nmis()) { 5284 + /* 5285 + * Tracking the NMI-blocked state in software is built upon 5286 + * finding the next open IRQ window. This, in turn, depends on 5287 + * well-behaving guests: They have to keep IRQs disabled at 5288 + * least as long as the NMI handler runs. 
Otherwise we may 5289 + * cause NMI nesting, maybe breaking the guest. But as this is 5290 + * highly unlikely, we can live with the residual risk. 5291 + */ 5292 + vmx->soft_vnmi_blocked = 1; 5293 + vmx->vnmi_blocked_time = 0; 5294 + } 5318 5295 5319 - if (!cpu_has_virtual_nmis()) { 5320 - /* 5321 - * Tracking the NMI-blocked state in software is built upon 5322 - * finding the next open IRQ window. This, in turn, depends on 5323 - * well-behaving guests: They have to keep IRQs disabled at 5324 - * least as long as the NMI handler runs. Otherwise we may 5325 - * cause NMI nesting, maybe breaking the guest. But as this is 5326 - * highly unlikely, we can live with the residual risk. 5327 - */ 5328 - vmx->soft_vnmi_blocked = 1; 5329 - vmx->vnmi_blocked_time = 0; 5296 + ++vcpu->stat.nmi_injections; 5297 + vmx->nmi_known_unmasked = false; 5330 5298 } 5331 5299 5332 - ++vcpu->stat.nmi_injections; 5333 - vmx->nmi_known_unmasked = false; 5334 5300 if (vmx->rmode.vm86_active) { 5335 5301 if (kvm_inject_realmode_interrupt(vcpu, NMI_VECTOR, 0) != EMULATE_DONE) 5336 5302 kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); 5337 5303 return; 5338 5304 } 5305 + 5339 5306 vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 5340 5307 INTR_TYPE_NMI_INTR | INTR_INFO_VALID_MASK | NMI_VECTOR); 5341 5308 } ··· 6144 6109 exit_qualification = vmcs_readl(EXIT_QUALIFICATION); 6145 6110 6146 6111 gla_validity = (exit_qualification >> 7) & 0x3; 6147 - if (gla_validity != 0x3 && gla_validity != 0x1 && gla_validity != 0) { 6112 + if (gla_validity == 0x2) { 6148 6113 printk(KERN_ERR "EPT: Handling EPT violation failed!\n"); 6149 6114 printk(KERN_ERR "EPT: GPA: 0x%lx, GVA: 0x%lx\n", 6150 6115 (long unsigned int)vmcs_read64(GUEST_PHYSICAL_ADDRESS), ··· 6395 6360 if (!vmx_msr_bitmap_legacy_x2apic) 6396 6361 goto out2; 6397 6362 6363 + vmx_msr_bitmap_legacy_x2apic_apicv_inactive = 6364 + (unsigned long *)__get_free_page(GFP_KERNEL); 6365 + if (!vmx_msr_bitmap_legacy_x2apic_apicv_inactive) 6366 + goto out3; 6367 + 
6398 6368 vmx_msr_bitmap_longmode = (unsigned long *)__get_free_page(GFP_KERNEL); 6399 6369 if (!vmx_msr_bitmap_longmode) 6400 - goto out3; 6370 + goto out4; 6401 6371 6402 6372 vmx_msr_bitmap_longmode_x2apic = 6403 6373 (unsigned long *)__get_free_page(GFP_KERNEL); 6404 6374 if (!vmx_msr_bitmap_longmode_x2apic) 6405 - goto out4; 6375 + goto out5; 6376 + 6377 + vmx_msr_bitmap_longmode_x2apic_apicv_inactive = 6378 + (unsigned long *)__get_free_page(GFP_KERNEL); 6379 + if (!vmx_msr_bitmap_longmode_x2apic_apicv_inactive) 6380 + goto out6; 6406 6381 6407 6382 vmx_vmread_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); 6408 6383 if (!vmx_vmread_bitmap) 6409 - goto out6; 6384 + goto out7; 6410 6385 6411 6386 vmx_vmwrite_bitmap = (unsigned long *)__get_free_page(GFP_KERNEL); 6412 6387 if (!vmx_vmwrite_bitmap) 6413 - goto out7; 6388 + goto out8; 6414 6389 6415 6390 memset(vmx_vmread_bitmap, 0xff, PAGE_SIZE); 6416 6391 memset(vmx_vmwrite_bitmap, 0xff, PAGE_SIZE); ··· 6439 6394 6440 6395 if (setup_vmcs_config(&vmcs_config) < 0) { 6441 6396 r = -EIO; 6442 - goto out8; 6397 + goto out9; 6443 6398 } 6444 6399 6445 6400 if (boot_cpu_has(X86_FEATURE_NX)) ··· 6506 6461 vmx_msr_bitmap_legacy, PAGE_SIZE); 6507 6462 memcpy(vmx_msr_bitmap_longmode_x2apic, 6508 6463 vmx_msr_bitmap_longmode, PAGE_SIZE); 6464 + memcpy(vmx_msr_bitmap_legacy_x2apic_apicv_inactive, 6465 + vmx_msr_bitmap_legacy, PAGE_SIZE); 6466 + memcpy(vmx_msr_bitmap_longmode_x2apic_apicv_inactive, 6467 + vmx_msr_bitmap_longmode, PAGE_SIZE); 6509 6468 6510 6469 set_bit(0, vmx_vpid_bitmap); /* 0 is reserved for host */ 6511 6470 6471 + /* 6472 + * enable_apicv && kvm_vcpu_apicv_active() 6473 + */ 6512 6474 for (msr = 0x800; msr <= 0x8ff; msr++) 6513 - vmx_disable_intercept_msr_read_x2apic(msr); 6475 + vmx_disable_intercept_msr_read_x2apic(msr, true); 6514 6476 6515 6477 /* TMCCT */ 6516 - vmx_enable_intercept_msr_read_x2apic(0x839); 6478 + vmx_enable_intercept_msr_read_x2apic(0x839, true); 6517 6479 /* TPR */ 6518 - 
vmx_disable_intercept_msr_write_x2apic(0x808); 6480 + vmx_disable_intercept_msr_write_x2apic(0x808, true); 6519 6481 /* EOI */ 6520 - vmx_disable_intercept_msr_write_x2apic(0x80b); 6482 + vmx_disable_intercept_msr_write_x2apic(0x80b, true); 6521 6483 /* SELF-IPI */ 6522 - vmx_disable_intercept_msr_write_x2apic(0x83f); 6484 + vmx_disable_intercept_msr_write_x2apic(0x83f, true); 6485 + 6486 + /* 6487 + * (enable_apicv && !kvm_vcpu_apicv_active()) || 6488 + * !enable_apicv 6489 + */ 6490 + /* TPR */ 6491 + vmx_disable_intercept_msr_read_x2apic(0x808, false); 6492 + vmx_disable_intercept_msr_write_x2apic(0x808, false); 6523 6493 6524 6494 if (enable_ept) { 6525 6495 kvm_mmu_set_mask_ptes(VMX_EPT_READABLE_MASK, ··· 6581 6521 6582 6522 return alloc_kvm_area(); 6583 6523 6584 - out8: 6524 + out9: 6585 6525 free_page((unsigned long)vmx_vmwrite_bitmap); 6586 - out7: 6526 + out8: 6587 6527 free_page((unsigned long)vmx_vmread_bitmap); 6528 + out7: 6529 + free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic_apicv_inactive); 6588 6530 out6: 6589 6531 free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic); 6590 - out4: 6532 + out5: 6591 6533 free_page((unsigned long)vmx_msr_bitmap_longmode); 6534 + out4: 6535 + free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic_apicv_inactive); 6592 6536 out3: 6593 6537 free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic); 6594 6538 out2: ··· 6608 6544 static __exit void hardware_unsetup(void) 6609 6545 { 6610 6546 free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic); 6547 + free_page((unsigned long)vmx_msr_bitmap_legacy_x2apic_apicv_inactive); 6611 6548 free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic); 6549 + free_page((unsigned long)vmx_msr_bitmap_longmode_x2apic_apicv_inactive); 6612 6550 free_page((unsigned long)vmx_msr_bitmap_legacy); 6613 6551 free_page((unsigned long)vmx_msr_bitmap_longmode); 6614 6552 free_page((unsigned long)vmx_io_bitmap_b); ··· 6792 6726 { 6793 6727 /* TODO: not to reset guest simply here. 
*/ 6794 6728 kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); 6795 - pr_warn("kvm: nested vmx abort, indicator %d\n", indicator); 6729 + pr_debug_ratelimited("kvm: nested vmx abort, indicator %d\n", indicator); 6796 6730 } 6797 6731 6798 6732 static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer) ··· 7079 7013 vmx->nested.vmcs02_num = 0; 7080 7014 7081 7015 hrtimer_init(&vmx->nested.preemption_timer, CLOCK_MONOTONIC, 7082 - HRTIMER_MODE_REL); 7016 + HRTIMER_MODE_REL_PINNED); 7083 7017 vmx->nested.preemption_timer.function = vmx_preemption_timer_fn; 7084 7018 7085 7019 vmx->nested.vmxon = true; ··· 8501 8435 return; 8502 8436 } 8503 8437 8504 - /* 8505 - * There is not point to enable virtualize x2apic without enable 8506 - * apicv 8507 - */ 8508 - if (!cpu_has_vmx_virtualize_x2apic_mode() || 8509 - !kvm_vcpu_apicv_active(vcpu)) 8438 + if (!cpu_has_vmx_virtualize_x2apic_mode()) 8510 8439 return; 8511 8440 8512 8441 if (!cpu_need_tpr_shadow(vcpu)) ··· 9659 9598 maxphyaddr = cpuid_maxphyaddr(vcpu); 9660 9599 if (!IS_ALIGNED(addr, 16) || addr >> maxphyaddr || 9661 9600 (addr + count * sizeof(struct vmx_msr_entry) - 1) >> maxphyaddr) { 9662 - pr_warn_ratelimited( 9601 + pr_debug_ratelimited( 9663 9602 "nVMX: invalid MSR switch (0x%lx, %d, %llu, 0x%08llx)", 9664 9603 addr_field, maxphyaddr, count, addr); 9665 9604 return -EINVAL; ··· 9732 9671 for (i = 0; i < count; i++) { 9733 9672 if (kvm_vcpu_read_guest(vcpu, gpa + i * sizeof(e), 9734 9673 &e, sizeof(e))) { 9735 - pr_warn_ratelimited( 9674 + pr_debug_ratelimited( 9736 9675 "%s cannot read MSR entry (%u, 0x%08llx)\n", 9737 9676 __func__, i, gpa + i * sizeof(e)); 9738 9677 goto fail; 9739 9678 } 9740 9679 if (nested_vmx_load_msr_check(vcpu, &e)) { 9741 - pr_warn_ratelimited( 9680 + pr_debug_ratelimited( 9742 9681 "%s check failed (%u, 0x%x, 0x%x)\n", 9743 9682 __func__, i, e.index, e.reserved); 9744 9683 goto fail; ··· 9746 9685 msr.index = e.index; 9747 9686 msr.data = e.value; 9748 9687 if 
(kvm_set_msr(vcpu, &msr)) { 9749 - pr_warn_ratelimited( 9688 + pr_debug_ratelimited( 9750 9689 "%s cannot write MSR (%u, 0x%x, 0x%llx)\n", 9751 9690 __func__, i, e.index, e.value); 9752 9691 goto fail; ··· 9767 9706 if (kvm_vcpu_read_guest(vcpu, 9768 9707 gpa + i * sizeof(e), 9769 9708 &e, 2 * sizeof(u32))) { 9770 - pr_warn_ratelimited( 9709 + pr_debug_ratelimited( 9771 9710 "%s cannot read MSR entry (%u, 0x%08llx)\n", 9772 9711 __func__, i, gpa + i * sizeof(e)); 9773 9712 return -EINVAL; 9774 9713 } 9775 9714 if (nested_vmx_store_msr_check(vcpu, &e)) { 9776 - pr_warn_ratelimited( 9715 + pr_debug_ratelimited( 9777 9716 "%s check failed (%u, 0x%x, 0x%x)\n", 9778 9717 __func__, i, e.index, e.reserved); 9779 9718 return -EINVAL; ··· 9781 9720 msr_info.host_initiated = false; 9782 9721 msr_info.index = e.index; 9783 9722 if (kvm_get_msr(vcpu, &msr_info)) { 9784 - pr_warn_ratelimited( 9723 + pr_debug_ratelimited( 9785 9724 "%s cannot read MSR (%u, 0x%x)\n", 9786 9725 __func__, i, e.index); 9787 9726 return -EINVAL; ··· 9790 9729 gpa + i * sizeof(e) + 9791 9730 offsetof(struct vmx_msr_entry, value), 9792 9731 &msr_info.data, sizeof(msr_info.data))) { 9793 - pr_warn_ratelimited( 9732 + pr_debug_ratelimited( 9794 9733 "%s cannot write MSR (%u, 0x%x, 0x%llx)\n", 9795 9734 __func__, i, e.index, msr_info.data); 9796 9735 return -EINVAL; ··· 10561 10500 vmcs12->guest_pdptr3 = vmcs_read64(GUEST_PDPTR3); 10562 10501 } 10563 10502 10503 + if (nested_cpu_has_ept(vmcs12)) 10504 + vmcs12->guest_linear_address = vmcs_readl(GUEST_LINEAR_ADDRESS); 10505 + 10564 10506 if (nested_cpu_has_vid(vmcs12)) 10565 10507 vmcs12->guest_intr_status = vmcs_read16(GUEST_INTR_STATUS); 10566 10508 ··· 10857 10793 * We are now running in L2, mmu_notifier will force to reload the 10858 10794 * page's hpa for L2 vmcs. Need to reload it for L1 before entering L1. 
10859 10795 */ 10860 - kvm_vcpu_reload_apic_access_page(vcpu); 10796 + kvm_make_request(KVM_REQ_APIC_PAGE_RELOAD, vcpu); 10861 10797 10862 10798 /* 10863 10799 * Exiting from L2 to L1, we're now back to L1 which thinks it just ··· 11338 11274 11339 11275 .has_wbinvd_exit = cpu_has_vmx_wbinvd_exit, 11340 11276 11341 - .read_tsc_offset = vmx_read_tsc_offset, 11342 11277 .write_tsc_offset = vmx_write_tsc_offset, 11343 11278 .adjust_tsc_offset_guest = vmx_adjust_tsc_offset_guest, 11344 11279 .read_l1_tsc = vmx_read_l1_tsc,
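The vmx.c hunks above replace the two x2apic MSR bitmaps with four, selected at runtime by whether APICv is active for the vcpu and whether the guest is in long mode. The selection logic can be sketched as a small lookup over those two booleans; the bitmap names below mirror the patch loosely but the helper itself is illustrative, not kernel code.

```c
#include <stddef.h>

/* Illustrative stand-ins for the four per-mode MSR bitmaps the patch
 * allocates; in the kernel each is a full page allocated at setup. */
static unsigned long bitmap_legacy_x2apic[1];
static unsigned long bitmap_longmode_x2apic[1];
static unsigned long bitmap_legacy_x2apic_apicv_inactive[1];
static unsigned long bitmap_longmode_x2apic_apicv_inactive[1];

/* Pick the bitmap for (APICv active?, long mode?) — a sketch of the
 * branch structure vmx_set_msr_bitmap grows in this patch. */
static unsigned long *select_x2apic_bitmap(int apicv_active, int long_mode)
{
	if (apicv_active)
		return long_mode ? bitmap_longmode_x2apic
				 : bitmap_legacy_x2apic;
	return long_mode ? bitmap_longmode_x2apic_apicv_inactive
			 : bitmap_legacy_x2apic_apicv_inactive;
}
```

Keeping the APICv-inactive variants as separate pages lets hardware_setup() pre-populate each bitmap once (e.g. leaving the TPR MSR 0x808 unintercepted in both directions only for the inactive case) instead of rewriting intercepts on every APICv toggle.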
+105 -66
arch/x86/kvm/x86.c
··· 1367 1367 1368 1368 static void update_ia32_tsc_adjust_msr(struct kvm_vcpu *vcpu, s64 offset) 1369 1369 { 1370 - u64 curr_offset = kvm_x86_ops->read_tsc_offset(vcpu); 1370 + u64 curr_offset = vcpu->arch.tsc_offset; 1371 1371 vcpu->arch.ia32_tsc_adjust_msr += offset - curr_offset; 1372 1372 } 1373 1373 ··· 1413 1413 } 1414 1414 EXPORT_SYMBOL_GPL(kvm_read_l1_tsc); 1415 1415 1416 + static void kvm_vcpu_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) 1417 + { 1418 + kvm_x86_ops->write_tsc_offset(vcpu, offset); 1419 + vcpu->arch.tsc_offset = offset; 1420 + } 1421 + 1416 1422 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr) 1417 1423 { 1418 1424 struct kvm *kvm = vcpu->kvm; ··· 1431 1425 1432 1426 raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); 1433 1427 offset = kvm_compute_tsc_offset(vcpu, data); 1434 - ns = get_kernel_ns(); 1428 + ns = ktime_get_boot_ns(); 1435 1429 elapsed = ns - kvm->arch.last_tsc_nsec; 1436 1430 1437 1431 if (vcpu->arch.virtual_tsc_khz) { ··· 1528 1522 1529 1523 if (guest_cpuid_has_tsc_adjust(vcpu) && !msr->host_initiated) 1530 1524 update_ia32_tsc_adjust_msr(vcpu, offset); 1531 - kvm_x86_ops->write_tsc_offset(vcpu, offset); 1525 + kvm_vcpu_write_tsc_offset(vcpu, offset); 1532 1526 raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); 1533 1527 1534 1528 spin_lock(&kvm->arch.pvclock_gtod_sync_lock); ··· 1722 1716 #endif 1723 1717 } 1724 1718 1719 + static u64 __get_kvmclock_ns(struct kvm *kvm) 1720 + { 1721 + struct kvm_vcpu *vcpu = kvm_get_vcpu(kvm, 0); 1722 + struct kvm_arch *ka = &kvm->arch; 1723 + s64 ns; 1724 + 1725 + if (vcpu->arch.hv_clock.flags & PVCLOCK_TSC_STABLE_BIT) { 1726 + u64 tsc = kvm_read_l1_tsc(vcpu, rdtsc()); 1727 + ns = __pvclock_read_cycles(&vcpu->arch.hv_clock, tsc); 1728 + } else { 1729 + ns = ktime_get_boot_ns() + ka->kvmclock_offset; 1730 + } 1731 + 1732 + return ns; 1733 + } 1734 + 1735 + u64 get_kvmclock_ns(struct kvm *kvm) 1736 + { 1737 + unsigned long flags; 1738 + s64 ns; 1739 + 
1740 + local_irq_save(flags); 1741 + ns = __get_kvmclock_ns(kvm); 1742 + local_irq_restore(flags); 1743 + 1744 + return ns; 1745 + } 1746 + 1747 + static void kvm_setup_pvclock_page(struct kvm_vcpu *v) 1748 + { 1749 + struct kvm_vcpu_arch *vcpu = &v->arch; 1750 + struct pvclock_vcpu_time_info guest_hv_clock; 1751 + 1752 + if (unlikely(kvm_read_guest_cached(v->kvm, &vcpu->pv_time, 1753 + &guest_hv_clock, sizeof(guest_hv_clock)))) 1754 + return; 1755 + 1756 + /* This VCPU is paused, but it's legal for a guest to read another 1757 + * VCPU's kvmclock, so we really have to follow the specification where 1758 + * it says that version is odd if data is being modified, and even after 1759 + * it is consistent. 1760 + * 1761 + * Version field updates must be kept separate. This is because 1762 + * kvm_write_guest_cached might use a "rep movs" instruction, and 1763 + * writes within a string instruction are weakly ordered. So there 1764 + * are three writes overall. 1765 + * 1766 + * As a small optimization, only write the version field in the first 1767 + * and third write. The vcpu->pv_time cache is still valid, because the 1768 + * version field is the first in the struct. 
1769 + */ 1770 + BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); 1771 + 1772 + vcpu->hv_clock.version = guest_hv_clock.version + 1; 1773 + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, 1774 + &vcpu->hv_clock, 1775 + sizeof(vcpu->hv_clock.version)); 1776 + 1777 + smp_wmb(); 1778 + 1779 + /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ 1780 + vcpu->hv_clock.flags |= (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); 1781 + 1782 + if (vcpu->pvclock_set_guest_stopped_request) { 1783 + vcpu->hv_clock.flags |= PVCLOCK_GUEST_STOPPED; 1784 + vcpu->pvclock_set_guest_stopped_request = false; 1785 + } 1786 + 1787 + trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock); 1788 + 1789 + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, 1790 + &vcpu->hv_clock, 1791 + sizeof(vcpu->hv_clock)); 1792 + 1793 + smp_wmb(); 1794 + 1795 + vcpu->hv_clock.version++; 1796 + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, 1797 + &vcpu->hv_clock, 1798 + sizeof(vcpu->hv_clock.version)); 1799 + } 1800 + 1725 1801 static int kvm_guest_time_update(struct kvm_vcpu *v) 1726 1802 { 1727 1803 unsigned long flags, tgt_tsc_khz; ··· 1811 1723 struct kvm_arch *ka = &v->kvm->arch; 1812 1724 s64 kernel_ns; 1813 1725 u64 tsc_timestamp, host_tsc; 1814 - struct pvclock_vcpu_time_info guest_hv_clock; 1815 1726 u8 pvclock_flags; 1816 1727 bool use_master_clock; 1817 1728 ··· 1839 1752 } 1840 1753 if (!use_master_clock) { 1841 1754 host_tsc = rdtsc(); 1842 - kernel_ns = get_kernel_ns(); 1755 + kernel_ns = ktime_get_boot_ns(); 1843 1756 } 1844 1757 1845 1758 tsc_timestamp = kvm_read_l1_tsc(v, host_tsc); ··· 1864 1777 1865 1778 local_irq_restore(flags); 1866 1779 1867 - if (!vcpu->pv_time_enabled) 1868 - return 0; 1780 + /* With all the info we got, fill in the values */ 1869 1781 1870 1782 if (kvm_has_tsc_control) 1871 1783 tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz); ··· 1876 1790 vcpu->hw_tsc_khz = tgt_tsc_khz; 1877 1791 } 1878 1792 1879 - /* With all the info we got, fill in the values */ 
1880 1793 vcpu->hv_clock.tsc_timestamp = tsc_timestamp; 1881 1794 vcpu->hv_clock.system_time = kernel_ns + v->kvm->arch.kvmclock_offset; 1882 1795 vcpu->last_guest_tsc = tsc_timestamp; 1883 1796 1884 - if (unlikely(kvm_read_guest_cached(v->kvm, &vcpu->pv_time, 1885 - &guest_hv_clock, sizeof(guest_hv_clock)))) 1886 - return 0; 1887 - 1888 - /* This VCPU is paused, but it's legal for a guest to read another 1889 - * VCPU's kvmclock, so we really have to follow the specification where 1890 - * it says that version is odd if data is being modified, and even after 1891 - * it is consistent. 1892 - * 1893 - * Version field updates must be kept separate. This is because 1894 - * kvm_write_guest_cached might use a "rep movs" instruction, and 1895 - * writes within a string instruction are weakly ordered. So there 1896 - * are three writes overall. 1897 - * 1898 - * As a small optimization, only write the version field in the first 1899 - * and third write. The vcpu->pv_time cache is still valid, because the 1900 - * version field is the first in the struct. 
1901 - */ 1902 - BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); 1903 - 1904 - vcpu->hv_clock.version = guest_hv_clock.version + 1; 1905 - kvm_write_guest_cached(v->kvm, &vcpu->pv_time, 1906 - &vcpu->hv_clock, 1907 - sizeof(vcpu->hv_clock.version)); 1908 - 1909 - smp_wmb(); 1910 - 1911 - /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ 1912 - pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); 1913 - 1914 - if (vcpu->pvclock_set_guest_stopped_request) { 1915 - pvclock_flags |= PVCLOCK_GUEST_STOPPED; 1916 - vcpu->pvclock_set_guest_stopped_request = false; 1917 - } 1918 - 1919 1797 /* If the host uses TSC clocksource, then it is stable */ 1798 + pvclock_flags = 0; 1920 1799 if (use_master_clock) 1921 1800 pvclock_flags |= PVCLOCK_TSC_STABLE_BIT; 1922 1801 1923 1802 vcpu->hv_clock.flags = pvclock_flags; 1924 1803 1925 - trace_kvm_pvclock_update(v->vcpu_id, &vcpu->hv_clock); 1926 - 1927 - kvm_write_guest_cached(v->kvm, &vcpu->pv_time, 1928 - &vcpu->hv_clock, 1929 - sizeof(vcpu->hv_clock)); 1930 - 1931 - smp_wmb(); 1932 - 1933 - vcpu->hv_clock.version++; 1934 - kvm_write_guest_cached(v->kvm, &vcpu->pv_time, 1935 - &vcpu->hv_clock, 1936 - sizeof(vcpu->hv_clock.version)); 1804 + if (vcpu->pv_time_enabled) 1805 + kvm_setup_pvclock_page(v); 1806 + if (v == kvm_get_vcpu(v->kvm, 0)) 1807 + kvm_hv_setup_tsc_page(v->kvm, &vcpu->hv_clock); 1937 1808 return 0; 1938 1809 } 1939 1810 ··· 2789 2746 if (check_tsc_unstable()) { 2790 2747 u64 offset = kvm_compute_tsc_offset(vcpu, 2791 2748 vcpu->arch.last_guest_tsc); 2792 - kvm_x86_ops->write_tsc_offset(vcpu, offset); 2749 + kvm_vcpu_write_tsc_offset(vcpu, offset); 2793 2750 vcpu->arch.tsc_catchup = 1; 2794 2751 } 2795 2752 if (kvm_lapic_hv_timer_in_use(vcpu) && ··· 4082 4039 case KVM_SET_CLOCK: { 4083 4040 struct kvm_clock_data user_ns; 4084 4041 u64 now_ns; 4085 - s64 delta; 4086 4042 4087 4043 r = -EFAULT; 4088 4044 if (copy_from_user(&user_ns, argp, sizeof(user_ns))) ··· 4093 4051 4094 4052 r 
= 0; 4095 4053 local_irq_disable(); 4096 - now_ns = get_kernel_ns(); 4097 - delta = user_ns.clock - now_ns; 4054 + now_ns = __get_kvmclock_ns(kvm); 4055 + kvm->arch.kvmclock_offset += user_ns.clock - now_ns; 4098 4056 local_irq_enable(); 4099 - kvm->arch.kvmclock_offset = delta; 4100 4057 kvm_gen_update_masterclock(kvm); 4101 4058 break; 4102 4059 } ··· 4103 4062 struct kvm_clock_data user_ns; 4104 4063 u64 now_ns; 4105 4064 4106 - local_irq_disable(); 4107 - now_ns = get_kernel_ns(); 4108 - user_ns.clock = kvm->arch.kvmclock_offset + now_ns; 4109 - local_irq_enable(); 4065 + now_ns = get_kvmclock_ns(kvm); 4066 + user_ns.clock = now_ns; 4110 4067 user_ns.flags = 0; 4111 4068 memset(&user_ns.pad, 0, sizeof(user_ns.pad)); 4112 4069 ··· 6739 6700 6740 6701 kvm_put_guest_xcr0(vcpu); 6741 6702 6742 - /* Interrupt is enabled by handle_external_intr() */ 6743 6703 kvm_x86_ops->handle_external_intr(vcpu); 6744 6704 6745 6705 ++vcpu->stat.exits; ··· 7568 7530 * before any KVM threads can be running. Unfortunately, we can't 7569 7531 * bring the TSCs fully up to date with real time, as we aren't yet far 7570 7532 * enough into CPU bringup that we know how much real time has actually 7571 - * elapsed; our helper function, get_kernel_ns() will be using boot 7533 + * elapsed; our helper function, ktime_get_boot_ns() will be using boot 7572 7534 * variables that haven't been updated yet. 7573 7535 * 7574 7536 * So we simply find the maximum observed TSC above, then record the ··· 7803 7765 mutex_init(&kvm->arch.apic_map_lock); 7804 7766 spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); 7805 7767 7768 + kvm->arch.kvmclock_offset = -ktime_get_boot_ns(); 7806 7769 pvclock_update_vm_gtod_copy(kvm); 7807 7770 7808 7771 INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn);
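The kvm_setup_pvclock_page() code that x86.c factors out above follows the pvclock specification's versioning rule: the version field is made odd before the payload is written and even again afterwards, so a reader that observes an even, unchanged version knows it got a consistent snapshot. A minimal single-threaded model of that protocol (omitting the smp_wmb() barriers the kernel needs between the three writes, and using a hypothetical two-field struct rather than the real pvclock_vcpu_time_info layout):

```c
#include <stdint.h>

/* Toy stand-in for the shared clock page; real pvclock has more fields. */
struct clock_page {
	uint32_t version;     /* odd while an update is in flight */
	uint64_t system_time; /* the payload being published */
};

/* Writer: version++ (odd), write payload, version++ (even).
 * In the kernel each step is a separate guest write with smp_wmb()
 * ordering between them. */
static void writer_update(struct clock_page *p, uint64_t ns)
{
	p->version++;
	p->system_time = ns;
	p->version++;
}

/* Reader: retry until the version was even and unchanged across the
 * payload read. */
static uint64_t reader_get(const struct clock_page *p)
{
	uint32_t v;
	uint64_t t;

	do {
		v = p->version;
		t = p->system_time;
	} while ((v & 1) || v != p->version);
	return t;
}
```

This is why the patch can write only the version field in the first and third guest writes: version is the first member of the struct, so the cached guest mapping stays valid for the short writes.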
+1 -5
arch/x86/kvm/x86.h
··· 148 148 return kvm_register_write(vcpu, reg, val); 149 149 } 150 150 151 - static inline u64 get_kernel_ns(void) 152 - { 153 - return ktime_get_boot_ns(); 154 - } 155 - 156 151 static inline bool kvm_check_has_quirk(struct kvm *kvm, u64 quirk) 157 152 { 158 153 return !(kvm->arch.disabled_quirks & quirk); ··· 159 164 int kvm_inject_realmode_interrupt(struct kvm_vcpu *vcpu, int irq, int inc_eip); 160 165 161 166 void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr); 167 + u64 get_kvmclock_ns(struct kvm *kvm); 162 168 163 169 int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt, 164 170 gva_t addr, void *val, unsigned int bytes,
+427 -57
drivers/iommu/amd_iommu.c
··· 137 137 bool pri_tlp; /* PASID TLB required for 138 138 PPR completions */ 139 139 u32 errata; /* Bitmap for errata to apply */ 140 + bool use_vapic; /* Enable device to use vapic mode */ 140 141 }; 141 142 142 143 /* ··· 708 707 } 709 708 } 710 709 710 + #ifdef CONFIG_IRQ_REMAP 711 + static int (*iommu_ga_log_notifier)(u32); 712 + 713 + int amd_iommu_register_ga_log_notifier(int (*notifier)(u32)) 714 + { 715 + iommu_ga_log_notifier = notifier; 716 + 717 + return 0; 718 + } 719 + EXPORT_SYMBOL(amd_iommu_register_ga_log_notifier); 720 + 721 + static void iommu_poll_ga_log(struct amd_iommu *iommu) 722 + { 723 + u32 head, tail, cnt = 0; 724 + 725 + if (iommu->ga_log == NULL) 726 + return; 727 + 728 + head = readl(iommu->mmio_base + MMIO_GA_HEAD_OFFSET); 729 + tail = readl(iommu->mmio_base + MMIO_GA_TAIL_OFFSET); 730 + 731 + while (head != tail) { 732 + volatile u64 *raw; 733 + u64 log_entry; 734 + 735 + raw = (u64 *)(iommu->ga_log + head); 736 + cnt++; 737 + 738 + /* Avoid memcpy function-call overhead */ 739 + log_entry = *raw; 740 + 741 + /* Update head pointer of hardware ring-buffer */ 742 + head = (head + GA_ENTRY_SIZE) % GA_LOG_SIZE; 743 + writel(head, iommu->mmio_base + MMIO_GA_HEAD_OFFSET); 744 + 745 + /* Handle GA entry */ 746 + switch (GA_REQ_TYPE(log_entry)) { 747 + case GA_GUEST_NR: 748 + if (!iommu_ga_log_notifier) 749 + break; 750 + 751 + pr_debug("AMD-Vi: %s: devid=%#x, ga_tag=%#x\n", 752 + __func__, GA_DEVID(log_entry), 753 + GA_TAG(log_entry)); 754 + 755 + if (iommu_ga_log_notifier(GA_TAG(log_entry)) != 0) 756 + pr_err("AMD-Vi: GA log notifier failed.\n"); 757 + break; 758 + default: 759 + break; 760 + } 761 + } 762 + } 763 + #endif /* CONFIG_IRQ_REMAP */ 764 + 765 + #define AMD_IOMMU_INT_MASK \ 766 + (MMIO_STATUS_EVT_INT_MASK | \ 767 + MMIO_STATUS_PPR_INT_MASK | \ 768 + MMIO_STATUS_GALOG_INT_MASK) 769 + 711 770 irqreturn_t amd_iommu_int_thread(int irq, void *data) 712 771 { 713 772 struct amd_iommu *iommu = (struct amd_iommu *) data; 714 773 u32 
status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET); 715 774 716 - while (status & (MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK)) { 717 - /* Enable EVT and PPR interrupts again */ 718 - writel((MMIO_STATUS_EVT_INT_MASK | MMIO_STATUS_PPR_INT_MASK), 775 + while (status & AMD_IOMMU_INT_MASK) { 776 + /* Enable EVT and PPR and GA interrupts again */ 777 + writel(AMD_IOMMU_INT_MASK, 719 778 iommu->mmio_base + MMIO_STATUS_OFFSET); 720 779 721 780 if (status & MMIO_STATUS_EVT_INT_MASK) { ··· 787 726 pr_devel("AMD-Vi: Processing IOMMU PPR Log\n"); 788 727 iommu_poll_ppr_log(iommu); 789 728 } 729 + 730 + #ifdef CONFIG_IRQ_REMAP 731 + if (status & MMIO_STATUS_GALOG_INT_MASK) { 732 + pr_devel("AMD-Vi: Processing IOMMU GA Log\n"); 733 + iommu_poll_ga_log(iommu); 734 + } 735 + #endif 790 736 791 737 /* 792 738 * Hardware bug: ERBT1312 ··· 3035 2967 if (!iommu) 3036 2968 return; 3037 2969 2970 + #ifdef CONFIG_IRQ_REMAP 2971 + if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) && 2972 + (dom->type == IOMMU_DOMAIN_UNMANAGED)) 2973 + dev_data->use_vapic = 0; 2974 + #endif 2975 + 3038 2976 iommu_completion_wait(iommu); 3039 2977 } 3040 2978 ··· 3065 2991 detach_device(dev); 3066 2992 3067 2993 ret = attach_device(dev, domain); 2994 + 2995 + #ifdef CONFIG_IRQ_REMAP 2996 + if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir)) { 2997 + if (dom->type == IOMMU_DOMAIN_UNMANAGED) 2998 + dev_data->use_vapic = 1; 2999 + else 3000 + dev_data->use_vapic = 0; 3001 + } 3002 + #endif 3068 3003 3069 3004 iommu_completion_wait(iommu); 3070 3005 ··· 3613 3530 * 3614 3531 *****************************************************************************/ 3615 3532 3616 - union irte { 3617 - u32 val; 3618 - struct { 3619 - u32 valid : 1, 3620 - no_fault : 1, 3621 - int_type : 3, 3622 - rq_eoi : 1, 3623 - dm : 1, 3624 - rsvd_1 : 1, 3625 - destination : 8, 3626 - vector : 8, 3627 - rsvd_2 : 8; 3628 - } fields; 3629 - }; 3630 - 3631 - struct irq_2_irte { 3632 - u16 devid; /* Device ID for IRTE table */ 
3633 - u16 index; /* Index into IRTE table*/ 3634 - }; 3635 - 3636 - struct amd_ir_data { 3637 - struct irq_2_irte irq_2_irte; 3638 - union irte irte_entry; 3639 - union { 3640 - struct msi_msg msi_entry; 3641 - }; 3642 - }; 3643 - 3644 3533 static struct irq_chip amd_ir_chip; 3645 3534 3646 3535 #define DTE_IRQ_PHYS_ADDR_MASK (((1ULL << 45)-1) << 6) ··· 3633 3578 3634 3579 amd_iommu_dev_table[devid].data[2] = dte; 3635 3580 } 3636 - 3637 - #define IRTE_ALLOCATED (~1U) 3638 3581 3639 3582 static struct irq_remap_table *get_irq_table(u16 devid, bool ioapic) 3640 3583 { ··· 3679 3626 goto out; 3680 3627 } 3681 3628 3682 - memset(table->table, 0, MAX_IRQS_PER_TABLE * sizeof(u32)); 3629 + if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir)) 3630 + memset(table->table, 0, 3631 + MAX_IRQS_PER_TABLE * sizeof(u32)); 3632 + else 3633 + memset(table->table, 0, 3634 + (MAX_IRQS_PER_TABLE * (sizeof(u64) * 2))); 3683 3635 3684 3636 if (ioapic) { 3685 3637 int i; 3686 3638 3687 3639 for (i = 0; i < 32; ++i) 3688 - table->table[i] = IRTE_ALLOCATED; 3640 + iommu->irte_ops->set_allocated(table, i); 3689 3641 } 3690 3642 3691 3643 irq_lookup_table[devid] = table; ··· 3716 3658 struct irq_remap_table *table; 3717 3659 unsigned long flags; 3718 3660 int index, c; 3661 + struct amd_iommu *iommu = amd_iommu_rlookup_table[devid]; 3662 + 3663 + if (!iommu) 3664 + return -ENODEV; 3719 3665 3720 3666 table = get_irq_table(devid, false); 3721 3667 if (!table) ··· 3731 3669 for (c = 0, index = table->min_index; 3732 3670 index < MAX_IRQS_PER_TABLE; 3733 3671 ++index) { 3734 - if (table->table[index] == 0) 3672 + if (!iommu->irte_ops->is_allocated(table, index)) 3735 3673 c += 1; 3736 3674 else 3737 3675 c = 0; 3738 3676 3739 3677 if (c == count) { 3740 3678 for (; c != 0; --c) 3741 - table->table[index - c + 1] = IRTE_ALLOCATED; 3679 + iommu->irte_ops->set_allocated(table, index - c + 1); 3742 3680 3743 3681 index -= count - 1; 3744 3682 goto out; ··· 3753 3691 return index; 3754 3692 } 3755 3693 
-static int modify_irte(u16 devid, int index, union irte irte)
+static int modify_irte_ga(u16 devid, int index, struct irte_ga *irte,
+			  struct amd_ir_data *data)
+{
+	struct irq_remap_table *table;
+	struct amd_iommu *iommu;
+	unsigned long flags;
+	struct irte_ga *entry;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	if (iommu == NULL)
+		return -EINVAL;
+
+	table = get_irq_table(devid, false);
+	if (!table)
+		return -ENOMEM;
+
+	spin_lock_irqsave(&table->lock, flags);
+
+	entry = (struct irte_ga *)table->table;
+	entry = &entry[index];
+	entry->lo.fields_remap.valid = 0;
+	entry->hi.val = irte->hi.val;
+	entry->lo.val = irte->lo.val;
+	entry->lo.fields_remap.valid = 1;
+	if (data)
+		data->ref = entry;
+
+	spin_unlock_irqrestore(&table->lock, flags);
+
+	iommu_flush_irt(iommu, devid);
+	iommu_completion_wait(iommu);
+
+	return 0;
+}
+
+static int modify_irte(u16 devid, int index, union irte *irte)
 {
 	struct irq_remap_table *table;
 	struct amd_iommu *iommu;
···
 		return -ENOMEM;
 
 	spin_lock_irqsave(&table->lock, flags);
-	table->table[index] = irte.val;
+	table->table[index] = irte->val;
 	spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
···
 		return;
 
 	spin_lock_irqsave(&table->lock, flags);
-	table->table[index] = 0;
+	iommu->irte_ops->clear_allocated(table, index);
 	spin_unlock_irqrestore(&table->lock, flags);
 
 	iommu_flush_irt(iommu, devid);
 	iommu_completion_wait(iommu);
+}
+
+static void irte_prepare(void *entry,
+			 u32 delivery_mode, u32 dest_mode,
+			 u8 vector, u32 dest_apicid, int devid)
+{
+	union irte *irte = (union irte *) entry;
+
+	irte->val = 0;
+	irte->fields.vector = vector;
+	irte->fields.int_type = delivery_mode;
+	irte->fields.destination = dest_apicid;
+	irte->fields.dm = dest_mode;
+	irte->fields.valid = 1;
+}
+
+static void irte_ga_prepare(void *entry,
+			    u32 delivery_mode, u32 dest_mode,
+			    u8 vector, u32 dest_apicid, int devid)
+{
+	struct irte_ga *irte = (struct irte_ga *) entry;
+	struct iommu_dev_data *dev_data = search_dev_data(devid);
+
+	irte->lo.val = 0;
+	irte->hi.val = 0;
+	irte->lo.fields_remap.guest_mode = dev_data ? dev_data->use_vapic : 0;
+	irte->lo.fields_remap.int_type = delivery_mode;
+	irte->lo.fields_remap.dm = dest_mode;
+	irte->hi.fields.vector = vector;
+	irte->lo.fields_remap.destination = dest_apicid;
+	irte->lo.fields_remap.valid = 1;
+}
+
+static void irte_activate(void *entry, u16 devid, u16 index)
+{
+	union irte *irte = (union irte *) entry;
+
+	irte->fields.valid = 1;
+	modify_irte(devid, index, irte);
+}
+
+static void irte_ga_activate(void *entry, u16 devid, u16 index)
+{
+	struct irte_ga *irte = (struct irte_ga *) entry;
+
+	irte->lo.fields_remap.valid = 1;
+	modify_irte_ga(devid, index, irte, NULL);
+}
+
+static void irte_deactivate(void *entry, u16 devid, u16 index)
+{
+	union irte *irte = (union irte *) entry;
+
+	irte->fields.valid = 0;
+	modify_irte(devid, index, irte);
+}
+
+static void irte_ga_deactivate(void *entry, u16 devid, u16 index)
+{
+	struct irte_ga *irte = (struct irte_ga *) entry;
+
+	irte->lo.fields_remap.valid = 0;
+	modify_irte_ga(devid, index, irte, NULL);
+}
+
+static void irte_set_affinity(void *entry, u16 devid, u16 index,
+			      u8 vector, u32 dest_apicid)
+{
+	union irte *irte = (union irte *) entry;
+
+	irte->fields.vector = vector;
+	irte->fields.destination = dest_apicid;
+	modify_irte(devid, index, irte);
+}
+
+static void irte_ga_set_affinity(void *entry, u16 devid, u16 index,
+				 u8 vector, u32 dest_apicid)
+{
+	struct irte_ga *irte = (struct irte_ga *) entry;
+	struct iommu_dev_data *dev_data = search_dev_data(devid);
+
+	if (!dev_data || !dev_data->use_vapic) {
+		irte->hi.fields.vector = vector;
+		irte->lo.fields_remap.destination = dest_apicid;
+		irte->lo.fields_remap.guest_mode = 0;
+		modify_irte_ga(devid, index, irte, NULL);
+	}
+}
+
+#define IRTE_ALLOCATED (~1U)
+static void irte_set_allocated(struct irq_remap_table *table, int index)
+{
+	table->table[index] = IRTE_ALLOCATED;
+}
+
+static void irte_ga_set_allocated(struct irq_remap_table *table, int index)
+{
+	struct irte_ga *ptr = (struct irte_ga *)table->table;
+	struct irte_ga *irte = &ptr[index];
+
+	memset(&irte->lo.val, 0, sizeof(u64));
+	memset(&irte->hi.val, 0, sizeof(u64));
+	irte->hi.fields.vector = 0xff;
+}
+
+static bool irte_is_allocated(struct irq_remap_table *table, int index)
+{
+	union irte *ptr = (union irte *)table->table;
+	union irte *irte = &ptr[index];
+
+	return irte->val != 0;
+}
+
+static bool irte_ga_is_allocated(struct irq_remap_table *table, int index)
+{
+	struct irte_ga *ptr = (struct irte_ga *)table->table;
+	struct irte_ga *irte = &ptr[index];
+
+	return irte->hi.fields.vector != 0;
+}
+
+static void irte_clear_allocated(struct irq_remap_table *table, int index)
+{
+	table->table[index] = 0;
+}
+
+static void irte_ga_clear_allocated(struct irq_remap_table *table, int index)
+{
+	struct irte_ga *ptr = (struct irte_ga *)table->table;
+	struct irte_ga *irte = &ptr[index];
+
+	memset(&irte->lo.val, 0, sizeof(u64));
+	memset(&irte->hi.val, 0, sizeof(u64));
 }
 
 static int get_devid(struct irq_alloc_info *info)
···
 {
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
 	struct msi_msg *msg = &data->msi_entry;
-	union irte *irte = &data->irte_entry;
 	struct IO_APIC_route_entry *entry;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[devid];
+
+	if (!iommu)
+		return;
 
 	data->irq_2_irte.devid = devid;
 	data->irq_2_irte.index = index + sub_handle;
-
-	/* Setup IRTE for IOMMU */
-	irte->val = 0;
-	irte->fields.vector = irq_cfg->vector;
-	irte->fields.int_type = apic->irq_delivery_mode;
-	irte->fields.destination = irq_cfg->dest_apicid;
-	irte->fields.dm = apic->irq_dest_mode;
-	irte->fields.valid = 1;
+	iommu->irte_ops->prepare(data->entry, apic->irq_delivery_mode,
+				 apic->irq_dest_mode, irq_cfg->vector,
+				 irq_cfg->dest_apicid, devid);
 
 	switch (info->type) {
 	case X86_IRQ_ALLOC_TYPE_IOAPIC:
···
 	}
 }
 
+struct amd_irte_ops irte_32_ops = {
+	.prepare = irte_prepare,
+	.activate = irte_activate,
+	.deactivate = irte_deactivate,
+	.set_affinity = irte_set_affinity,
+	.set_allocated = irte_set_allocated,
+	.is_allocated = irte_is_allocated,
+	.clear_allocated = irte_clear_allocated,
+};
+
+struct amd_irte_ops irte_128_ops = {
+	.prepare = irte_ga_prepare,
+	.activate = irte_ga_activate,
+	.deactivate = irte_ga_deactivate,
+	.set_affinity = irte_ga_set_affinity,
+	.set_allocated = irte_ga_set_allocated,
+	.is_allocated = irte_ga_is_allocated,
+	.clear_allocated = irte_ga_clear_allocated,
+};
+
 static int irq_remapping_alloc(struct irq_domain *domain, unsigned int virq,
 			       unsigned int nr_irqs, void *arg)
 {
 	struct irq_alloc_info *info = arg;
 	struct irq_data *irq_data;
-	struct amd_ir_data *data;
+	struct amd_ir_data *data = NULL;
 	struct irq_cfg *cfg;
 	int i, ret, devid;
 	int index = -1;
···
 		if (!data)
 			goto out_free_data;
 
+		if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
+			data->entry = kzalloc(sizeof(union irte), GFP_KERNEL);
+		else
+			data->entry = kzalloc(sizeof(struct irte_ga),
+					      GFP_KERNEL);
+		if (!data->entry) {
+			kfree(data);
+			goto out_free_data;
+		}
+
 		irq_data->hwirq = (devid << 16) + i;
 		irq_data->chip_data = data;
 		irq_data->chip = &amd_ir_chip;
···
 		data = irq_data->chip_data;
 		irte_info = &data->irq_2_irte;
 		free_irte(irte_info->devid, irte_info->index);
+		kfree(data->entry);
 		kfree(data);
 	}
 }
···
 {
 	struct amd_ir_data *data = irq_data->chip_data;
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
 
-	modify_irte(irte_info->devid, irte_info->index, data->irte_entry);
+	if (iommu)
+		iommu->irte_ops->activate(data->entry, irte_info->devid,
+					  irte_info->index);
 }
 
 static void irq_remapping_deactivate(struct irq_domain *domain,
···
 {
 	struct amd_ir_data *data = irq_data->chip_data;
 	struct irq_2_irte *irte_info = &data->irq_2_irte;
-	union irte entry;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
 
-	entry.val = 0;
-	modify_irte(irte_info->devid, irte_info->index, data->irte_entry);
+	if (iommu)
+		iommu->irte_ops->deactivate(data->entry, irte_info->devid,
+					    irte_info->index);
 }
 
 static struct irq_domain_ops amd_ir_domain_ops = {
···
 	.deactivate = irq_remapping_deactivate,
 };
 
+static int amd_ir_set_vcpu_affinity(struct irq_data *data, void *vcpu_info)
+{
+	struct amd_iommu *iommu;
+	struct amd_iommu_pi_data *pi_data = vcpu_info;
+	struct vcpu_data *vcpu_pi_info = pi_data->vcpu_data;
+	struct amd_ir_data *ir_data = data->chip_data;
+	struct irte_ga *irte = (struct irte_ga *) ir_data->entry;
+	struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
+	struct iommu_dev_data *dev_data = search_dev_data(irte_info->devid);
+
+	/* Note:
+	 * This device has never been set up for guest mode.
+	 * we should not modify the IRTE
+	 */
+	if (!dev_data || !dev_data->use_vapic)
+		return 0;
+
+	pi_data->ir_data = ir_data;
+
+	/* Note:
+	 * SVM tries to set up for VAPIC mode, but we are in
+	 * legacy mode. So, we force legacy mode instead.
+	 */
+	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir)) {
+		pr_debug("AMD-Vi: %s: Fall back to using intr legacy remap\n",
+			 __func__);
+		pi_data->is_guest_mode = false;
+	}
+
+	iommu = amd_iommu_rlookup_table[irte_info->devid];
+	if (iommu == NULL)
+		return -EINVAL;
+
+	pi_data->prev_ga_tag = ir_data->cached_ga_tag;
+	if (pi_data->is_guest_mode) {
+		/* Setting */
+		irte->hi.fields.ga_root_ptr = (pi_data->base >> 12);
+		irte->hi.fields.vector = vcpu_pi_info->vector;
+		irte->lo.fields_vapic.guest_mode = 1;
+		irte->lo.fields_vapic.ga_tag = pi_data->ga_tag;
+
+		ir_data->cached_ga_tag = pi_data->ga_tag;
+	} else {
+		/* Un-Setting */
+		struct irq_cfg *cfg = irqd_cfg(data);
+
+		irte->hi.val = 0;
+		irte->lo.val = 0;
+		irte->hi.fields.vector = cfg->vector;
+		irte->lo.fields_remap.guest_mode = 0;
+		irte->lo.fields_remap.destination = cfg->dest_apicid;
+		irte->lo.fields_remap.int_type = apic->irq_delivery_mode;
+		irte->lo.fields_remap.dm = apic->irq_dest_mode;
+
+		/*
+		 * This communicates the ga_tag back to the caller
+		 * so that it can do all the necessary clean up.
+		 */
+		ir_data->cached_ga_tag = 0;
+	}
+
+	return modify_irte_ga(irte_info->devid, irte_info->index, irte, ir_data);
+}
+
 static int amd_ir_set_affinity(struct irq_data *data,
 			       const struct cpumask *mask, bool force)
 {
···
 	struct irq_2_irte *irte_info = &ir_data->irq_2_irte;
 	struct irq_cfg *cfg = irqd_cfg(data);
 	struct irq_data *parent = data->parent_data;
+	struct amd_iommu *iommu = amd_iommu_rlookup_table[irte_info->devid];
 	int ret;
+
+	if (!iommu)
+		return -ENODEV;
 
 	ret = parent->chip->irq_set_affinity(parent, mask, force);
 	if (ret < 0 || ret == IRQ_SET_MASK_OK_DONE)
···
 	 * Atomically updates the IRTE with the new destination, vector
 	 * and flushes the interrupt entry cache.
 	 */
-	ir_data->irte_entry.fields.vector = cfg->vector;
-	ir_data->irte_entry.fields.destination = cfg->dest_apicid;
-	modify_irte(irte_info->devid, irte_info->index, ir_data->irte_entry);
+	iommu->irte_ops->set_affinity(ir_data->entry, irte_info->devid,
+				      irte_info->index, cfg->vector,
+				      cfg->dest_apicid);
 
 	/*
 	 * After this point, all the interrupts will start arriving
···
 static struct irq_chip amd_ir_chip = {
 	.irq_ack = ir_ack_apic_edge,
 	.irq_set_affinity = amd_ir_set_affinity,
+	.irq_set_vcpu_affinity = amd_ir_set_vcpu_affinity,
 	.irq_compose_msi_msg = ir_compose_msi_msg,
 };
 
···
 
 	return 0;
 }
+
+int amd_iommu_update_ga(int cpu, bool is_run, void *data)
+{
+	unsigned long flags;
+	struct amd_iommu *iommu;
+	struct irq_remap_table *irt;
+	struct amd_ir_data *ir_data = (struct amd_ir_data *)data;
+	int devid = ir_data->irq_2_irte.devid;
+	struct irte_ga *entry = (struct irte_ga *) ir_data->entry;
+	struct irte_ga *ref = (struct irte_ga *) ir_data->ref;
+
+	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) ||
+	    !ref || !entry || !entry->lo.fields_vapic.guest_mode)
+		return 0;
+
+	iommu = amd_iommu_rlookup_table[devid];
+	if (!iommu)
+		return -ENODEV;
+
+	irt = get_irq_table(devid, false);
+	if (!irt)
+		return -ENODEV;
+
+	spin_lock_irqsave(&irt->lock, flags);
+
+	if (ref->lo.fields_vapic.guest_mode) {
+		if (cpu >= 0)
+			ref->lo.fields_vapic.destination = cpu;
+		ref->lo.fields_vapic.is_run = is_run;
+		barrier();
+	}
+
+	spin_unlock_irqrestore(&irt->lock, flags);
+
+	iommu_flush_irt(iommu, devid);
+	iommu_completion_wait(iommu);
+	return 0;
+}
+EXPORT_SYMBOL(amd_iommu_update_ga);
 #endif
+175 -6
drivers/iommu/amd_iommu_init.c
···
 #define ACPI_DEVFLAG_LINT1              0x80
 #define ACPI_DEVFLAG_ATSDIS             0x10000000
 
+#define LOOP_TIMEOUT	100000
 /*
  * ACPI table definitions
  *
···
 
 bool amd_iommu_dump;
 bool amd_iommu_irq_remap __read_mostly;
+
+int amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_VAPIC;
 
 static bool amd_iommu_detected;
 static bool __initdata amd_iommu_disabled;
···
 	/* Disable event logging and event interrupts */
 	iommu_feature_disable(iommu, CONTROL_EVT_INT_EN);
 	iommu_feature_disable(iommu, CONTROL_EVT_LOG_EN);
+
+	/* Disable IOMMU GA_LOG */
+	iommu_feature_disable(iommu, CONTROL_GALOG_EN);
+	iommu_feature_disable(iommu, CONTROL_GAINT_EN);
 
 	/* Disable IOMMU hardware itself */
 	iommu_feature_disable(iommu, CONTROL_IOMMU_EN);
···
 		return;
 
 	free_pages((unsigned long)iommu->ppr_log, get_order(PPR_LOG_SIZE));
+}
+
+static void free_ga_log(struct amd_iommu *iommu)
+{
+#ifdef CONFIG_IRQ_REMAP
+	if (iommu->ga_log)
+		free_pages((unsigned long)iommu->ga_log,
+			    get_order(GA_LOG_SIZE));
+	if (iommu->ga_log_tail)
+		free_pages((unsigned long)iommu->ga_log_tail,
+			    get_order(8));
+#endif
+}
+
+static int iommu_ga_log_enable(struct amd_iommu *iommu)
+{
+#ifdef CONFIG_IRQ_REMAP
+	u32 status, i;
+
+	if (!iommu->ga_log)
+		return -EINVAL;
+
+	status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+
+	/* Check if already running */
+	if (status & (MMIO_STATUS_GALOG_RUN_MASK))
+		return 0;
+
+	iommu_feature_enable(iommu, CONTROL_GAINT_EN);
+	iommu_feature_enable(iommu, CONTROL_GALOG_EN);
+
+	for (i = 0; i < LOOP_TIMEOUT; ++i) {
+		status = readl(iommu->mmio_base + MMIO_STATUS_OFFSET);
+		if (status & (MMIO_STATUS_GALOG_RUN_MASK))
+			break;
+	}
+
+	if (i >= LOOP_TIMEOUT)
+		return -EINVAL;
+#endif /* CONFIG_IRQ_REMAP */
+	return 0;
+}
+
+#ifdef CONFIG_IRQ_REMAP
+static int iommu_init_ga_log(struct amd_iommu *iommu)
+{
+	u64 entry;
+
+	if (!AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
+		return 0;
+
+	iommu->ga_log = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					get_order(GA_LOG_SIZE));
+	if (!iommu->ga_log)
+		goto err_out;
+
+	iommu->ga_log_tail = (u8 *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
+					get_order(8));
+	if (!iommu->ga_log_tail)
+		goto err_out;
+
+	entry = (u64)virt_to_phys(iommu->ga_log) | GA_LOG_SIZE_512;
+	memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_BASE_OFFSET,
+		    &entry, sizeof(entry));
+	entry = ((u64)virt_to_phys(iommu->ga_log) & 0xFFFFFFFFFFFFFULL) & ~7ULL;
+	memcpy_toio(iommu->mmio_base + MMIO_GA_LOG_TAIL_OFFSET,
+		    &entry, sizeof(entry));
+	writel(0x00, iommu->mmio_base + MMIO_GA_HEAD_OFFSET);
+	writel(0x00, iommu->mmio_base + MMIO_GA_TAIL_OFFSET);
+
+	return 0;
+err_out:
+	free_ga_log(iommu);
+	return -EINVAL;
+}
+#endif /* CONFIG_IRQ_REMAP */
+
+static int iommu_init_ga(struct amd_iommu *iommu)
+{
+	int ret = 0;
+
+#ifdef CONFIG_IRQ_REMAP
+	/* Note: We have already checked GASup from IVRS table.
+	 * Now, we need to make sure that GAMSup is set.
+	 */
+	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir) &&
+	    !iommu_feature(iommu, FEATURE_GAM_VAPIC))
+		amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY_GA;
+
+	ret = iommu_init_ga_log(iommu);
+#endif /* CONFIG_IRQ_REMAP */
+
+	return ret;
 }
 
 static void iommu_enable_gt(struct amd_iommu *iommu)
···
 	free_command_buffer(iommu);
 	free_event_buffer(iommu);
 	free_ppr_log(iommu);
+	free_ga_log(iommu);
 	iommu_unmap_mmio_space(iommu);
 }
 
···
 			iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
 		else
 			iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
+		if (((h->efr_attr & (0x1 << IOMMU_FEAT_GASUP_SHIFT)) == 0))
+			amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
 		break;
 	case 0x11:
 	case 0x40:
···
 			iommu->mmio_phys_end = MMIO_REG_END_OFFSET;
 		else
 			iommu->mmio_phys_end = MMIO_CNTR_CONF_OFFSET;
+		if (((h->efr_reg & (0x1 << IOMMU_EFR_GASUP_SHIFT)) == 0))
+			amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
 		break;
 	default:
 		return -EINVAL;
···
 {
 	int cap_ptr = iommu->cap_ptr;
 	u32 range, misc, low, high;
+	int ret;
 
 	iommu->dev = pci_get_bus_and_slot(PCI_BUS_NUM(iommu->devid),
 					  iommu->devid & 0xff);
···
 
 	if (iommu_feature(iommu, FEATURE_PPR) && alloc_ppr_log(iommu))
 		return -ENOMEM;
+
+	ret = iommu_init_ga(iommu);
+	if (ret)
+		return ret;
 
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
 		amd_iommu_np_cache = true;
···
 			dev_name(&iommu->dev->dev), iommu->cap_ptr);
 
 		if (iommu->cap & (1 << IOMMU_CAP_EFR)) {
-			pr_info("AMD-Vi: Extended features: ");
+			pr_info("AMD-Vi: Extended features (%#llx):\n",
+				iommu->features);
 			for (i = 0; i < ARRAY_SIZE(feat_str); ++i) {
 				if (iommu_feature(iommu, (1ULL << i)))
 					pr_cont(" %s", feat_str[i]);
 			}
+
+			if (iommu->features & FEATURE_GAM_VAPIC)
+				pr_cont(" GA_vAPIC");
+
 			pr_cont("\n");
 		}
 	}
-	if (irq_remapping_enabled)
+	if (irq_remapping_enabled) {
 		pr_info("AMD-Vi: Interrupt remapping enabled\n");
+		if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
+			pr_info("AMD-Vi: virtual APIC enabled\n");
+	}
 }
 
 static int __init amd_iommu_init_pci(void)
···
 
 	if (iommu->ppr_log != NULL)
 		iommu_feature_enable(iommu, CONTROL_PPFINT_EN);
+
+	iommu_ga_log_enable(iommu);
 
 	return 0;
 }
···
 			       iommu->stored_addr_lo | 1);
 }
 
+static void iommu_enable_ga(struct amd_iommu *iommu)
+{
+#ifdef CONFIG_IRQ_REMAP
+	switch (amd_iommu_guest_ir) {
+	case AMD_IOMMU_GUEST_IR_VAPIC:
+		iommu_feature_enable(iommu, CONTROL_GAM_EN);
+		/* Fall through */
+	case AMD_IOMMU_GUEST_IR_LEGACY_GA:
+		iommu_feature_enable(iommu, CONTROL_GA_EN);
+		iommu->irte_ops = &irte_128_ops;
+		break;
+	default:
+		iommu->irte_ops = &irte_32_ops;
+		break;
+	}
+#endif
+}
+
 /*
  * This function finally enables all IOMMUs found in the system after
  * they have been initialized
···
 		iommu_enable_command_buffer(iommu);
 		iommu_enable_event_buffer(iommu);
 		iommu_set_exclusion_range(iommu);
+		iommu_enable_ga(iommu);
 		iommu_enable(iommu);
 		iommu_flush_all_caches(iommu);
 	}
+
+#ifdef CONFIG_IRQ_REMAP
+	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
+		amd_iommu_irq_ops.capability |= (1 << IRQ_POSTING_CAP);
+#endif
 }
 
 static void enable_iommus_v2(void)
···
 
 	for_each_iommu(iommu)
 		iommu_disable(iommu);
+
+#ifdef CONFIG_IRQ_REMAP
+	if (AMD_IOMMU_GUEST_IR_VAPIC(amd_iommu_guest_ir))
+		amd_iommu_irq_ops.capability &= ~(1 << IRQ_POSTING_CAP);
+#endif
 }
 
 /*
···
 	struct acpi_table_header *ivrs_base;
 	acpi_size ivrs_size;
 	acpi_status status;
-	int i, ret = 0;
+	int i, remap_cache_sz, ret = 0;
 
 	if (!amd_iommu_detected)
 		return -ENODEV;
···
 		 * remapping tables.
 		 */
 		ret = -ENOMEM;
+		if (!AMD_IOMMU_GUEST_IR_GA(amd_iommu_guest_ir))
+			remap_cache_sz = MAX_IRQS_PER_TABLE * sizeof(u32);
+		else
+			remap_cache_sz = MAX_IRQS_PER_TABLE * (sizeof(u64) * 2);
 		amd_iommu_irq_cache = kmem_cache_create("irq_remap_cache",
-							MAX_IRQS_PER_TABLE * sizeof(u32),
-							IRQ_TABLE_ALIGNMENT,
-							0, NULL);
+							remap_cache_sz,
+							IRQ_TABLE_ALIGNMENT,
+							0, NULL);
 		if (!amd_iommu_irq_cache)
 			goto out;
 
···
 	return 1;
 }
 
+static int __init parse_amd_iommu_intr(char *str)
+{
+	for (; *str; ++str) {
+		if (strncmp(str, "legacy", 6) == 0) {
+			amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_LEGACY;
+			break;
+		}
+		if (strncmp(str, "vapic", 5) == 0) {
+			amd_iommu_guest_ir = AMD_IOMMU_GUEST_IR_VAPIC;
+			break;
+		}
+	}
+	return 1;
+}
+
 static int __init parse_amd_iommu_options(char *str)
 {
 	for (; *str; ++str) {
···
 
 __setup("amd_iommu_dump",	parse_amd_iommu_dump);
 __setup("amd_iommu=",		parse_amd_iommu_options);
+__setup("amd_iommu_intr=",	parse_amd_iommu_intr);
 __setup("ivrs_ioapic",		parse_ivrs_ioapic);
 __setup("ivrs_hpet",		parse_ivrs_hpet);
 __setup("ivrs_acpihid",		parse_ivrs_acpihid);
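The new parse_amd_iommu_intr() above scans the `amd_iommu_intr=` option string and takes the first recognized token ("legacy" or "vapic"), leaving the vAPIC default in place otherwise. A userspace sketch of the same scan-and-strncmp parsing; the enum and function names here are hypothetical, not the kernel's:

```c
#include <assert.h>
#include <string.h>

enum guest_ir_mode { IR_LEGACY, IR_LEGACY_GA, IR_VAPIC };

/* Mirrors the scan style of parse_amd_iommu_intr(): walk the string one
 * character at a time and stop at the first recognized token prefix. */
static enum guest_ir_mode parse_intr_opt(const char *str,
					 enum guest_ir_mode def)
{
	enum guest_ir_mode mode = def;

	for (; *str; ++str) {
		if (strncmp(str, "legacy", 6) == 0) {
			mode = IR_LEGACY;
			break;
		}
		if (strncmp(str, "vapic", 5) == 0) {
			mode = IR_VAPIC;
			break;
		}
	}
	return mode;
}
```

An unrecognized string falls through the loop and keeps the caller-supplied default, which is the same behavior the boot option has.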
+1
drivers/iommu/amd_iommu_proto.h
···
 extern void amd_iommu_disable(void);
 extern int amd_iommu_reenable(int);
 extern int amd_iommu_enable_faulting(void);
+extern int amd_iommu_guest_ir;
 
 /* IOMMUv2 specific functions */
 struct iommu_domain;
+149
drivers/iommu/amd_iommu_types.h
···
 
 #include <linux/types.h>
 #include <linux/mutex.h>
+#include <linux/msi.h>
 #include <linux/list.h>
 #include <linux/spinlock.h>
 #include <linux/pci.h>
···
 #define MMIO_EXCL_LIMIT_OFFSET  0x0028
 #define MMIO_EXT_FEATURES	0x0030
 #define MMIO_PPR_LOG_OFFSET	0x0038
+#define MMIO_GA_LOG_BASE_OFFSET	0x00e0
+#define MMIO_GA_LOG_TAIL_OFFSET	0x00e8
 #define MMIO_CMD_HEAD_OFFSET	0x2000
 #define MMIO_CMD_TAIL_OFFSET	0x2008
 #define MMIO_EVT_HEAD_OFFSET	0x2010
···
 #define MMIO_STATUS_OFFSET	0x2020
 #define MMIO_PPR_HEAD_OFFSET	0x2030
 #define MMIO_PPR_TAIL_OFFSET	0x2038
+#define MMIO_GA_HEAD_OFFSET	0x2040
+#define MMIO_GA_TAIL_OFFSET	0x2048
 #define MMIO_CNTR_CONF_OFFSET	0x4000
 #define MMIO_CNTR_REG_OFFSET	0x40000
 #define MMIO_REG_END_OFFSET	0x80000
···
 #define FEATURE_GA		(1ULL<<7)
 #define FEATURE_HE		(1ULL<<8)
 #define FEATURE_PC		(1ULL<<9)
+#define FEATURE_GAM_VAPIC	(1ULL<<21)
 
 #define FEATURE_PASID_SHIFT	32
 #define FEATURE_PASID_MASK	(0x1fULL << FEATURE_PASID_SHIFT)
···
 #define MMIO_STATUS_EVT_INT_MASK	(1 << 1)
 #define MMIO_STATUS_COM_WAIT_INT_MASK	(1 << 2)
 #define MMIO_STATUS_PPR_INT_MASK	(1 << 6)
+#define MMIO_STATUS_GALOG_RUN_MASK	(1 << 8)
+#define MMIO_STATUS_GALOG_OVERFLOW_MASK	(1 << 9)
+#define MMIO_STATUS_GALOG_INT_MASK	(1 << 10)
 
 /* event logging constants */
 #define EVENT_ENTRY_SIZE	0x10
···
 #define CONTROL_PPFINT_EN       0x0eULL
 #define CONTROL_PPR_EN          0x0fULL
 #define CONTROL_GT_EN           0x10ULL
+#define CONTROL_GA_EN           0x11ULL
+#define CONTROL_GAM_EN          0x19ULL
+#define CONTROL_GALOG_EN        0x1CULL
+#define CONTROL_GAINT_EN        0x1DULL
 
 #define CTRL_INV_TO_MASK	(7 << CONTROL_INV_TIMEOUT)
 #define CTRL_INV_TO_NONE	0
···
 #define PPR_PASID(x)		((PPR_PASID2(x) << 16) | PPR_PASID1(x))
 
 #define PPR_REQ_FAULT		0x01
+
+/* Constants for GA Log handling */
+#define GA_LOG_ENTRIES		512
+#define GA_LOG_SIZE_SHIFT	56
+#define GA_LOG_SIZE_512		(0x8ULL << GA_LOG_SIZE_SHIFT)
+#define GA_ENTRY_SIZE		8
+#define GA_LOG_SIZE		(GA_ENTRY_SIZE * GA_LOG_ENTRIES)
+
+#define GA_TAG(x)		(u32)(x & 0xffffffffULL)
+#define GA_DEVID(x)		(u16)(((x) >> 32) & 0xffffULL)
+#define GA_REQ_TYPE(x)		(((x) >> 60) & 0xfULL)
+
+#define GA_GUEST_NR		0x1
 
 #define PAGE_MODE_NONE    0x00
 #define PAGE_MODE_1_LEVEL 0x01
···
 #define IOMMU_CAP_NPCACHE 26
 #define IOMMU_CAP_EFR     27
 
+/* IOMMU Feature Reporting Field (for IVHD type 10h */
+#define IOMMU_FEAT_GASUP_SHIFT	6
+
+/* IOMMU Extended Feature Register (EFR) */
+#define IOMMU_EFR_GASUP_SHIFT	7
+
 #define MAX_DOMAIN_ID 65536
 
 /* Protection domain flags */
···
 
 struct iommu_domain;
 struct irq_domain;
+struct amd_irte_ops;
 
 /*
  * This structure contains generic data for  IOMMU protection domains
···
 	/* Base of the PPR log, if present */
 	u8 *ppr_log;
 
+	/* Base of the GA log, if present */
+	u8 *ga_log;
+
+	/* Tail of the GA log, if present */
+	u8 *ga_log_tail;
+
 	/* true if interrupts for this IOMMU are already enabled */
 	bool int_enabled;
 
···
 #ifdef CONFIG_IRQ_REMAP
 	struct irq_domain *ir_domain;
 	struct irq_domain *msi_domain;
+
+	struct amd_irte_ops *irte_ops;
 #endif
 
 	volatile u64 __aligned(8) cmd_sem;
···
 
 	return -EINVAL;
 }
+
+enum amd_iommu_intr_mode_type {
+	AMD_IOMMU_GUEST_IR_LEGACY,
+
+	/* This mode is not visible to users. It is used when
+	 * we cannot fully enable vAPIC and fallback to only support
+	 * legacy interrupt remapping via 128-bit IRTE.
+	 */
+	AMD_IOMMU_GUEST_IR_LEGACY_GA,
+	AMD_IOMMU_GUEST_IR_VAPIC,
+};
+
+#define AMD_IOMMU_GUEST_IR_GA(x)	(x == AMD_IOMMU_GUEST_IR_VAPIC || \
+					 x == AMD_IOMMU_GUEST_IR_LEGACY_GA)
+
+#define AMD_IOMMU_GUEST_IR_VAPIC(x)	(x == AMD_IOMMU_GUEST_IR_VAPIC)
+
+union irte {
+	u32 val;
+	struct {
+		u32 valid	: 1,
+		    no_fault	: 1,
+		    int_type	: 3,
+		    rq_eoi	: 1,
+		    dm		: 1,
+		    rsvd_1	: 1,
+		    destination	: 8,
+		    vector	: 8,
+		    rsvd_2	: 8;
+	} fields;
+};
+
+union irte_ga_lo {
+	u64 val;
+
+	/* For int remapping */
+	struct {
+		u64 valid	: 1,
+		    no_fault	: 1,
+		    /* ------ */
+		    int_type	: 3,
+		    rq_eoi	: 1,
+		    dm		: 1,
+		    /* ------ */
+		    guest_mode	: 1,
+		    destination	: 8,
+		    rsvd	: 48;
+	} fields_remap;
+
+	/* For guest vAPIC */
+	struct {
+		u64 valid	: 1,
+		    no_fault	: 1,
+		    /* ------ */
+		    ga_log_intr	: 1,
+		    rsvd1	: 3,
+		    is_run	: 1,
+		    /* ------ */
+		    guest_mode	: 1,
+		    destination	: 8,
+		    rsvd2	: 16,
+		    ga_tag	: 32;
+	} fields_vapic;
+};
+
+union irte_ga_hi {
+	u64 val;
+	struct {
+		u64 vector	: 8,
+		    rsvd_1	: 4,
+		    ga_root_ptr	: 40,
+		    rsvd_2	: 12;
+	} fields;
+};
+
+struct irte_ga {
+	union irte_ga_lo lo;
+	union irte_ga_hi hi;
+};
+
+struct irq_2_irte {
+	u16 devid; /* Device ID for IRTE table */
+	u16 index; /* Index into IRTE table*/
+};
+
+struct amd_ir_data {
+	u32 cached_ga_tag;
+	struct irq_2_irte irq_2_irte;
+	struct msi_msg msi_entry;
+	void *entry;    /* Pointer to union irte or struct irte_ga */
+	void *ref;      /* Pointer to the actual irte */
+};
+
+struct amd_irte_ops {
+	void (*prepare)(void *, u32, u32, u8, u32, int);
+	void (*activate)(void *, u16, u16);
+	void (*deactivate)(void *, u16, u16);
+	void (*set_affinity)(void *, u16, u16, u8, u32);
+	void *(*get)(struct irq_remap_table *, int);
+	void (*set_allocated)(struct irq_remap_table *, int);
+	bool (*is_allocated)(struct irq_remap_table *, int);
+	void (*clear_allocated)(struct irq_remap_table *, int);
+};
+
+#ifdef CONFIG_IRQ_REMAP
+extern struct amd_irte_ops irte_32_ops;
+extern struct amd_irte_ops irte_128_ops;
+#endif
 
 #endif /* _ASM_X86_AMD_IOMMU_TYPES_H */
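The GA_TAG/GA_DEVID/GA_REQ_TYPE macros above slice one 8-byte GA-log entry: the tag in bits 0-31, the device ID in bits 32-47, and the request type in bits 60-63. A userspace round-trip check of that layout follows; the ga_entry() encoder is illustrative, not from the kernel, and the macros are re-parenthesized here for standalone use:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace mirror of the GA-log decode macros from amd_iommu_types.h. */
#define GA_TAG(x)      ((uint32_t)((x) & 0xffffffffULL))
#define GA_DEVID(x)    ((uint16_t)(((x) >> 32) & 0xffffULL))
#define GA_REQ_TYPE(x) (((x) >> 60) & 0xfULL)
#define GA_GUEST_NR    0x1

/* Pack an 8-byte log entry from its fields: the inverse of the decode. */
static uint64_t ga_entry(uint64_t req_type, uint16_t devid, uint32_t tag)
{
	return (req_type << 60) | ((uint64_t)devid << 32) | tag;
}
```

The sizing constants in the same hunk are consistent with this: 512 entries of 8 bytes each give the 4 KiB GA_LOG_SIZE that iommu_init_ga_log() allocates.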
+10 -8
include/kvm/arm_vgic.h
···
 #include <linux/kvm.h>
 #include <linux/irqreturn.h>
 #include <linux/spinlock.h>
+#include <linux/static_key.h>
 #include <linux/types.h>
 #include <kvm/iodev.h>
 #include <linux/list.h>
+#include <linux/jump_label.h>
 
 #define VGIC_V3_MAX_CPUS	255
 #define VGIC_V2_MAX_CPUS	8
···
 	/* Physical address of vgic virtual cpu interface */
 	phys_addr_t		vcpu_base;
 
+	/* GICV mapping */
+	void __iomem		*vcpu_base_va;
+
 	/* virtual control interface mapping */
 	void __iomem		*vctrl_base;
 
···
 
 	/* Only needed for the legacy KVM_CREATE_IRQCHIP */
 	bool			can_emulate_gicv2;
+
+	/* GIC system register CPU interface */
+	struct static_key_false gicv3_cpuif;
 };
 
 extern struct vgic_global kvm_vgic_global_state;
···
 };
 
 struct vgic_v3_cpu_if {
-#ifdef CONFIG_KVM_ARM_VGIC_V3
 	u32		vgic_hcr;
 	u32		vgic_vmcr;
 	u32		vgic_sre;	/* Restored only, change ignored */
···
 	u32		vgic_ap0r[4];
 	u32		vgic_ap1r[4];
 	u64		vgic_lr[VGIC_V3_MAX_LRS];
-#endif
 };
 
 struct vgic_cpu {
···
 	bool lpis_enabled;
 };
 
+extern struct static_key_false vgic_v2_cpuif_trap;
+
 int kvm_vgic_addr(struct kvm *kvm, unsigned long type, u64 *addr, bool write);
 void kvm_vgic_early_init(struct kvm *kvm);
 int kvm_vgic_create(struct kvm *kvm, u32 type);
···
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu);
 void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu);
 
-#ifdef CONFIG_KVM_ARM_VGIC_V3
 void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg);
-#else
-static inline void vgic_v3_dispatch_sgi(struct kvm_vcpu *vcpu, u64 reg)
-{
-}
-#endif
 
 /**
  * kvm_vgic_get_max_vcpus - Get the maximum number of VCPUs allowed by HW
+40 -3
include/linux/amd-iommu.h
···
 
 #include <linux/types.h>
 
+/*
+ * This is mainly used to communicate information back-and-forth
+ * between SVM and IOMMU for setting up and tearing down posted
+ * interrupt
+ */
+struct amd_iommu_pi_data {
+	u32 ga_tag;
+	u32 prev_ga_tag;
+	u64 base;
+	bool is_guest_mode;
+	struct vcpu_data *vcpu_data;
+	void *ir_data;
+};
+
 #ifdef CONFIG_AMD_IOMMU
 
 struct task_struct;
···
 
 extern int amd_iommu_set_invalidate_ctx_cb(struct pci_dev *pdev,
 					   amd_iommu_invalidate_ctx cb);
-
-#else
+#else /* CONFIG_AMD_IOMMU */
 
 static inline int amd_iommu_detect(void) { return -ENODEV; }
 
-#endif
+#endif /* CONFIG_AMD_IOMMU */
+
+#if defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP)
+
+/* IOMMU AVIC Function */
+extern int amd_iommu_register_ga_log_notifier(int (*notifier)(u32));
+
+extern int
+amd_iommu_update_ga(int cpu, bool is_run, void *data);
+
+#else /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
+
+static inline int
+amd_iommu_register_ga_log_notifier(int (*notifier)(u32))
+{
+	return 0;
+}
+
+static inline int
+amd_iommu_update_ga(int cpu, bool is_run, void *data)
+{
+	return 0;
+}
+
+#endif /* defined(CONFIG_AMD_IOMMU) && defined(CONFIG_IRQ_REMAP) */
 
 #endif /* _ASM_X86_AMD_IOMMU_H */
+4
include/linux/kvm_host.h
···
 #endif
 	bool preempted;
 	struct kvm_vcpu_arch arch;
+	struct dentry *debugfs_dentry;
 };
 
 static inline int kvm_vcpu_exiting_guest_mode(struct kvm_vcpu *vcpu)
···
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu);
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu);
+
+bool kvm_arch_has_vcpu_debugfs(void);
+int kvm_arch_create_vcpu_debugfs(struct kvm_vcpu *vcpu);
 
 int kvm_arch_hardware_enable(void);
 void kvm_arch_hardware_disable(void);
+3 -14
virt/kvm/arm/arch_timer.c
···
 #include "trace.h"
 
 static struct timecounter *timecounter;
-static struct workqueue_struct *wqueue;
 static unsigned int host_vtimer_irq;
 static u32 host_vtimer_irq_flags;
 
···
 		return HRTIMER_RESTART;
 	}
 
-	queue_work(wqueue, &timer->expired);
+	schedule_work(&timer->expired);
 	return HRTIMER_NORESTART;
 }
 
···
 	if (err) {
 		kvm_err("kvm_arch_timer: can't request interrupt %d (%d)\n",
 			host_vtimer_irq, err);
-		goto out;
-	}
-
-	wqueue = create_singlethread_workqueue("kvm_arch_timer");
-	if (!wqueue) {
-		err = -ENOMEM;
-		goto out_free;
+		return err;
 	}
 
 	kvm_info("virtual timer IRQ%d\n", host_vtimer_irq);
···
 	cpuhp_setup_state(CPUHP_AP_KVM_ARM_TIMER_STARTING,
 			  "AP_KVM_ARM_TIMER_STARTING", kvm_timer_starting_cpu,
 			  kvm_timer_dying_cpu);
-	goto out;
-out_free:
-	free_percpu_irq(host_vtimer_irq, kvm_get_running_vcpus());
-out:
 	return err;
 }
 
···
 	 * VCPUs have the enabled variable set, before entering the guest, if
 	 * the arch timers are enabled.
 	 */
-	if (timecounter && wqueue)
+	if (timecounter)
 		timer->enabled = 1;
 
 	return 0;
+57
virt/kvm/arm/hyp/vgic-v2-sr.c
···
 #include <linux/irqchip/arm-gic.h>
 #include <linux/kvm_host.h>
 
+#include <asm/kvm_emulate.h>
 #include <asm/kvm_hyp.h>
 
 static void __hyp_text save_maint_int_state(struct kvm_vcpu *vcpu,
···
         writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR);
         vcpu->arch.vgic_cpu.live_lrs = live_lrs;
 }
+
+#ifdef CONFIG_ARM64
+/*
+ * __vgic_v2_perform_cpuif_access -- perform a GICV access on behalf of the
+ * guest.
+ *
+ * @vcpu: the offending vcpu
+ *
+ * Returns:
+ *  1: GICV access successfully performed
+ *  0: Not a GICV access
+ * -1: Illegal GICV access
+ */
+int __hyp_text __vgic_v2_perform_cpuif_access(struct kvm_vcpu *vcpu)
+{
+        struct kvm *kvm = kern_hyp_va(vcpu->kvm);
+        struct vgic_dist *vgic = &kvm->arch.vgic;
+        phys_addr_t fault_ipa;
+        void __iomem *addr;
+        int rd;
+
+        /* Build the full address */
+        fault_ipa = kvm_vcpu_get_fault_ipa(vcpu);
+        fault_ipa |= kvm_vcpu_get_hfar(vcpu) & GENMASK(11, 0);
+
+        /* If not for GICV, move on */
+        if (fault_ipa < vgic->vgic_cpu_base ||
+            fault_ipa >= (vgic->vgic_cpu_base + KVM_VGIC_V2_CPU_SIZE))
+                return 0;
+
+        /* Reject anything but a 32bit access */
+        if (kvm_vcpu_dabt_get_as(vcpu) != sizeof(u32))
+                return -1;
+
+        /* Not aligned? Don't bother */
+        if (fault_ipa & 3)
+                return -1;
+
+        rd = kvm_vcpu_dabt_get_rd(vcpu);
+        addr = kern_hyp_va((kern_hyp_va(&kvm_vgic_global_state))->vcpu_base_va);
+        addr += fault_ipa - vgic->vgic_cpu_base;
+
+        if (kvm_vcpu_dabt_iswrite(vcpu)) {
+                u32 data = vcpu_data_guest_to_host(vcpu,
+                                                   vcpu_get_reg(vcpu, rd),
+                                                   sizeof(u32));
+                writel_relaxed(data, addr);
+        } else {
+                u32 data = readl_relaxed(addr);
+                vcpu_set_reg(vcpu, rd, vcpu_data_host_to_guest(vcpu, data,
+                                                               sizeof(u32)));
+        }
+
+        return 1;
+}
+#endif
+8
virt/kvm/arm/pmu.c
···
         if (!kvm_arm_support_pmu_v3())
                 return -ENODEV;
 
+        /*
+         * We currently require an in-kernel VGIC to use the PMU emulation,
+         * because we do not support forwarding PMU overflow interrupts to
+         * userspace yet.
+         */
+        if (!irqchip_in_kernel(vcpu->kvm) || !vgic_initialized(vcpu->kvm))
+                return -ENODEV;
+
         if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) ||
             !kvm_arm_pmu_irq_initialized(vcpu))
                 return -ENXIO;
+4
virt/kvm/arm/vgic/vgic-init.c
···
                 break;
         case GIC_V3:
                 ret = vgic_v3_probe(gic_kvm_info);
+                if (!ret) {
+                        static_branch_enable(&kvm_vgic_global_state.gicv3_cpuif);
+                        kvm_info("GIC system register CPU interface enabled\n");
+                }
                 break;
         default:
                 ret = -ENODEV;
-6
virt/kvm/arm/vgic/vgic-irqfd.c
···
  * @ue: user api routing entry handle
  * return 0 on success, -EINVAL on errors.
  */
-#ifdef KVM_CAP_X2APIC_API
 int kvm_set_routing_entry(struct kvm *kvm,
                           struct kvm_kernel_irq_routing_entry *e,
                           const struct kvm_irq_routing_entry *ue)
-#else
-/* Remove this version and the ifdefery once merged into 4.8 */
-int kvm_set_routing_entry(struct kvm_kernel_irq_routing_entry *e,
-                          const struct kvm_irq_routing_entry *ue)
-#endif
 {
         int r = -EINVAL;
 
+85 -48
virt/kvm/arm/vgic/vgic-kvm-device.c
···
                 addr_ptr = &vgic->vgic_cpu_base;
                 alignment = SZ_4K;
                 break;
-#ifdef CONFIG_KVM_ARM_VGIC_V3
         case KVM_VGIC_V3_ADDR_TYPE_DIST:
                 type_needed = KVM_DEV_TYPE_ARM_VGIC_V3;
                 addr_ptr = &vgic->vgic_dist_base;
···
                 addr_ptr = &vgic->vgic_redist_base;
                 alignment = SZ_64K;
                 break;
-#endif
         default:
                 r = -ENODEV;
                 goto out;
···
                 ret = kvm_register_device_ops(&kvm_arm_vgic_v2_ops,
                                               KVM_DEV_TYPE_ARM_VGIC_V2);
                 break;
-#ifdef CONFIG_KVM_ARM_VGIC_V3
         case KVM_DEV_TYPE_ARM_VGIC_V3:
                 ret = kvm_register_device_ops(&kvm_arm_vgic_v3_ops,
                                               KVM_DEV_TYPE_ARM_VGIC_V3);
+
+#ifdef CONFIG_KVM_ARM_VGIC_V3_ITS
                 if (ret)
                         break;
                 ret = kvm_vgic_register_its_device();
-                break;
 #endif
+                break;
         }
 
         return ret;
 }
 
-/** vgic_attr_regs_access: allows user space to read/write VGIC registers
- *
- * @dev: kvm device handle
- * @attr: kvm device attribute
- * @reg: address the value is read or written
- * @is_write: write flag
- *
- */
-static int vgic_attr_regs_access(struct kvm_device *dev,
-                                 struct kvm_device_attr *attr,
-                                 u32 *reg, bool is_write)
-{
+struct vgic_reg_attr {
+        struct kvm_vcpu *vcpu;
         gpa_t addr;
-        int cpuid, ret, c;
-        struct kvm_vcpu *vcpu, *tmp_vcpu;
-        int vcpu_lock_idx = -1;
+};
+
+static int parse_vgic_v2_attr(struct kvm_device *dev,
+                              struct kvm_device_attr *attr,
+                              struct vgic_reg_attr *reg_attr)
+{
+        int cpuid;
 
         cpuid = (attr->attr & KVM_DEV_ARM_VGIC_CPUID_MASK) >>
                 KVM_DEV_ARM_VGIC_CPUID_SHIFT;
-        vcpu = kvm_get_vcpu(dev->kvm, cpuid);
-        addr = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
 
-        mutex_lock(&dev->kvm->lock);
+        if (cpuid >= atomic_read(&dev->kvm->online_vcpus))
+                return -EINVAL;
 
-        ret = vgic_init(dev->kvm);
-        if (ret)
-                goto out;
+        reg_attr->vcpu = kvm_get_vcpu(dev->kvm, cpuid);
+        reg_attr->addr = attr->attr & KVM_DEV_ARM_VGIC_OFFSET_MASK;
 
-        if (cpuid >= atomic_read(&dev->kvm->online_vcpus)) {
-                ret = -EINVAL;
-                goto out;
+        return 0;
+}
+
+/* unlocks vcpus from @vcpu_lock_idx and smaller */
+static void unlock_vcpus(struct kvm *kvm, int vcpu_lock_idx)
+{
+        struct kvm_vcpu *tmp_vcpu;
+
+        for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
+                tmp_vcpu = kvm_get_vcpu(kvm, vcpu_lock_idx);
+                mutex_unlock(&tmp_vcpu->mutex);
         }
+}
+
+static void unlock_all_vcpus(struct kvm *kvm)
+{
+        unlock_vcpus(kvm, atomic_read(&kvm->online_vcpus) - 1);
+}
+
+/* Returns true if all vcpus were locked, false otherwise */
+static bool lock_all_vcpus(struct kvm *kvm)
+{
+        struct kvm_vcpu *tmp_vcpu;
+        int c;
 
         /*
          * Any time a vcpu is run, vcpu_load is called which tries to grab the
···
          * that no other VCPUs are run and fiddle with the vgic state while we
          * access it.
          */
-        ret = -EBUSY;
-        kvm_for_each_vcpu(c, tmp_vcpu, dev->kvm) {
-                if (!mutex_trylock(&tmp_vcpu->mutex))
-                        goto out;
-                vcpu_lock_idx = c;
+        kvm_for_each_vcpu(c, tmp_vcpu, kvm) {
+                if (!mutex_trylock(&tmp_vcpu->mutex)) {
+                        unlock_vcpus(kvm, c - 1);
+                        return false;
+                }
+        }
+
+        return true;
+}
+
+/**
+ * vgic_attr_regs_access_v2 - allows user space to access VGIC v2 state
+ *
+ * @dev:      kvm device handle
+ * @attr:     kvm device attribute
+ * @reg:      address the value is read or written
+ * @is_write: true if userspace is writing a register
+ */
+static int vgic_attr_regs_access_v2(struct kvm_device *dev,
+                                    struct kvm_device_attr *attr,
+                                    u32 *reg, bool is_write)
+{
+        struct vgic_reg_attr reg_attr;
+        gpa_t addr;
+        struct kvm_vcpu *vcpu;
+        int ret;
+
+        ret = parse_vgic_v2_attr(dev, attr, &reg_attr);
+        if (ret)
+                return ret;
+
+        vcpu = reg_attr.vcpu;
+        addr = reg_attr.addr;
+
+        mutex_lock(&dev->kvm->lock);
+
+        ret = vgic_init(dev->kvm);
+        if (ret)
+                goto out;
+
+        if (!lock_all_vcpus(dev->kvm)) {
+                ret = -EBUSY;
+                goto out;
         }
 
         switch (attr->group) {
···
                 break;
         }
 
+        unlock_all_vcpus(dev->kvm);
 out:
-        for (; vcpu_lock_idx >= 0; vcpu_lock_idx--) {
-                tmp_vcpu = kvm_get_vcpu(dev->kvm, vcpu_lock_idx);
-                mutex_unlock(&tmp_vcpu->mutex);
-        }
-
         mutex_unlock(&dev->kvm->lock);
         return ret;
 }
-
-/* V2 ops */
 
 static int vgic_v2_set_attr(struct kvm_device *dev,
                             struct kvm_device_attr *attr)
···
                 if (get_user(reg, uaddr))
                         return -EFAULT;
 
-                return vgic_attr_regs_access(dev, attr, &reg, true);
+                return vgic_attr_regs_access_v2(dev, attr, &reg, true);
         }
 }
 
···
                 u32 __user *uaddr = (u32 __user *)(long)attr->addr;
                 u32 reg = 0;
 
-                ret = vgic_attr_regs_access(dev, attr, &reg, false);
+                ret = vgic_attr_regs_access_v2(dev, attr, &reg, false);
                 if (ret)
                         return ret;
                 return put_user(reg, uaddr);
···
         .has_attr = vgic_v2_has_attr,
 };
 
-/* V3 ops */
-
-#ifdef CONFIG_KVM_ARM_VGIC_V3
-
 static int vgic_v3_set_attr(struct kvm_device *dev,
                             struct kvm_device_attr *attr)
 {
···
         .get_attr = vgic_v3_get_attr,
         .has_attr = vgic_v3_has_attr,
 };
-
-#endif /* CONFIG_KVM_ARM_VGIC_V3 */
+5 -3
virt/kvm/arm/vgic/vgic-mmio-v3.c
···
 #include "vgic-mmio.h"
 
 /* extract @num bytes at @offset bytes offset in data */
-unsigned long extract_bytes(unsigned long data, unsigned int offset,
+unsigned long extract_bytes(u64 data, unsigned int offset,
                             unsigned int num)
 {
         return (data >> (offset * 8)) & GENMASK_ULL(num * 8 - 1, 0);
···
         return reg | ((u64)val << lower);
 }
 
+#ifdef CONFIG_KVM_ARM_VGIC_V3_ITS
 bool vgic_has_its(struct kvm *kvm)
 {
         struct vgic_dist *dist = &kvm->arch.vgic;
···
 
         return dist->has_its;
 }
+#endif
 
 static unsigned long vgic_mmio_read_v3_misc(struct kvm_vcpu *vcpu,
                                             gpa_t addr, unsigned int len)
···
         int target_vcpu_id = vcpu->vcpu_id;
         u64 value;
 
-        value = (mpidr & GENMASK(23, 0)) << 32;
+        value = (u64)(mpidr & GENMASK(23, 0)) << 32;
         value |= ((target_vcpu_id & 0xffff) << 8);
         if (target_vcpu_id == atomic_read(&vcpu->kvm->online_vcpus) - 1)
                 value |= GICR_TYPER_LAST;
···
         bool broadcast;
 
         sgi = (reg & ICC_SGI1R_SGI_ID_MASK) >> ICC_SGI1R_SGI_ID_SHIFT;
-        broadcast = reg & BIT(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
+        broadcast = reg & BIT_ULL(ICC_SGI1R_IRQ_ROUTING_MODE_BIT);
         target_cpus = (reg & ICC_SGI1R_TARGET_LIST_MASK) >> ICC_SGI1R_TARGET_LIST_SHIFT;
         mpidr = SGI_AFFINITY_LEVEL(reg, 3);
         mpidr |= SGI_AFFINITY_LEVEL(reg, 2);
-2
virt/kvm/arm/vgic/vgic-mmio.c
···
         case VGIC_V2:
                 len = vgic_v2_init_dist_iodev(io_device);
                 break;
-#ifdef CONFIG_KVM_ARM_VGIC_V3
         case VGIC_V3:
                 len = vgic_v3_init_dist_iodev(io_device);
                 break;
-#endif
         default:
                 BUG_ON(1);
         }
+1 -3
virt/kvm/arm/vgic/vgic-mmio.h
···
 void vgic_data_host_to_mmio_bus(void *buf, unsigned int len,
                                 unsigned long data);
 
-unsigned long extract_bytes(unsigned long data, unsigned int offset,
+unsigned long extract_bytes(u64 data, unsigned int offset,
                             unsigned int num);
 
 u64 update_64bit_reg(u64 reg, unsigned int offset, unsigned int len,
···
 
 unsigned int vgic_v3_init_dist_iodev(struct vgic_io_device *dev);
 
-#ifdef CONFIG_KVM_ARM_VGIC_V3
 u64 vgic_sanitise_outer_cacheability(u64 reg);
 u64 vgic_sanitise_inner_cacheability(u64 reg);
 u64 vgic_sanitise_shareability(u64 reg);
 u64 vgic_sanitise_field(u64 reg, u64 field_mask, int field_shift,
                         u64 (*sanitise_fn)(u64));
-#endif
 
 #endif
+44 -27
virt/kvm/arm/vgic/vgic-v2.c
···
                 goto out;
         }
 
-        ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
-                                    kvm_vgic_global_state.vcpu_base,
-                                    KVM_VGIC_V2_CPU_SIZE, true);
-        if (ret) {
-                kvm_err("Unable to remap VGIC CPU to VCPU\n");
-                goto out;
+        if (!static_branch_unlikely(&vgic_v2_cpuif_trap)) {
+                ret = kvm_phys_addr_ioremap(kvm, dist->vgic_cpu_base,
+                                            kvm_vgic_global_state.vcpu_base,
+                                            KVM_VGIC_V2_CPU_SIZE, true);
+                if (ret) {
+                        kvm_err("Unable to remap VGIC CPU to VCPU\n");
+                        goto out;
+                }
         }
 
         dist->ready = true;
···
         kvm_vgic_destroy(kvm);
         return ret;
 }
+
+DEFINE_STATIC_KEY_FALSE(vgic_v2_cpuif_trap);
 
 /**
  * vgic_v2_probe - probe for a GICv2 compatible interrupt controller in DT
···
                 return -ENXIO;
         }
 
-        if (!PAGE_ALIGNED(info->vcpu.start)) {
-                kvm_err("GICV physical address 0x%llx not page aligned\n",
-                        (unsigned long long)info->vcpu.start);
-                return -ENXIO;
-        }
+        if (!PAGE_ALIGNED(info->vcpu.start) ||
+            !PAGE_ALIGNED(resource_size(&info->vcpu))) {
+                kvm_info("GICV region size/alignment is unsafe, using trapping (reduced performance)\n");
+                kvm_vgic_global_state.vcpu_base_va = ioremap(info->vcpu.start,
+                                                             resource_size(&info->vcpu));
+                if (!kvm_vgic_global_state.vcpu_base_va) {
+                        kvm_err("Cannot ioremap GICV\n");
+                        return -ENOMEM;
+                }
 
-        if (!PAGE_ALIGNED(resource_size(&info->vcpu))) {
-                kvm_err("GICV size 0x%llx not a multiple of page size 0x%lx\n",
-                        (unsigned long long)resource_size(&info->vcpu),
-                        PAGE_SIZE);
-                return -ENXIO;
+                ret = create_hyp_io_mappings(kvm_vgic_global_state.vcpu_base_va,
+                                             kvm_vgic_global_state.vcpu_base_va +
+                                             resource_size(&info->vcpu),
+                                             info->vcpu.start);
+                if (ret) {
+                        kvm_err("Cannot map GICV into hyp\n");
+                        goto out;
+                }
+
+                static_branch_enable(&vgic_v2_cpuif_trap);
         }
 
         kvm_vgic_global_state.vctrl_base = ioremap(info->vctrl.start,
                                                    resource_size(&info->vctrl));
         if (!kvm_vgic_global_state.vctrl_base) {
                 kvm_err("Cannot ioremap GICH\n");
-                return -ENOMEM;
+                ret = -ENOMEM;
+                goto out;
         }
 
         vtr = readl_relaxed(kvm_vgic_global_state.vctrl_base + GICH_VTR);
         kvm_vgic_global_state.nr_lr = (vtr & 0x3f) + 1;
-
-        ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
-        if (ret) {
-                kvm_err("Cannot register GICv2 KVM device\n");
-                iounmap(kvm_vgic_global_state.vctrl_base);
-                return ret;
-        }
 
         ret = create_hyp_io_mappings(kvm_vgic_global_state.vctrl_base,
                                      kvm_vgic_global_state.vctrl_base +
···
                                      info->vctrl.start);
         if (ret) {
                 kvm_err("Cannot map VCTRL into hyp\n");
-                kvm_unregister_device_ops(KVM_DEV_TYPE_ARM_VGIC_V2);
-                iounmap(kvm_vgic_global_state.vctrl_base);
-                return ret;
+                goto out;
+        }
+
+        ret = kvm_register_vgic_device(KVM_DEV_TYPE_ARM_VGIC_V2);
+        if (ret) {
+                kvm_err("Cannot register GICv2 KVM device\n");
+                goto out;
         }
 
         kvm_vgic_global_state.can_emulate_gicv2 = true;
···
         kvm_info("vgic-v2@%llx\n", info->vctrl.start);
 
         return 0;
+out:
+        if (kvm_vgic_global_state.vctrl_base)
+                iounmap(kvm_vgic_global_state.vctrl_base);
+        if (kvm_vgic_global_state.vcpu_base_va)
+                iounmap(kvm_vgic_global_state.vcpu_base_va);
+
+        return ret;
 }
+7 -1
virt/kvm/arm/vgic/vgic.c
···
 #define DEBUG_SPINLOCK_BUG_ON(p)
 #endif
 
-struct vgic_global __section(.hyp.text) kvm_vgic_global_state;
+struct vgic_global __section(.hyp.text) kvm_vgic_global_state = {.gicv3_cpuif = STATIC_KEY_FALSE_INIT,};
 
 /*
  * Locking order is always:
···
 /* Sync back the hardware VGIC state into our emulation after a guest's run. */
 void kvm_vgic_sync_hwstate(struct kvm_vcpu *vcpu)
 {
+        if (unlikely(!vgic_initialized(vcpu->kvm)))
+                return;
+
         vgic_process_maintenance_interrupt(vcpu);
         vgic_fold_lr_state(vcpu);
         vgic_prune_ap_list(vcpu);
···
 /* Flush our emulation state into the GIC hardware before entering the guest. */
 void kvm_vgic_flush_hwstate(struct kvm_vcpu *vcpu)
 {
+        if (unlikely(!vgic_initialized(vcpu->kvm)))
+                return;
+
         spin_lock(&vcpu->arch.vgic_cpu.ap_list_lock);
         vgic_flush_lr_state(vcpu);
         spin_unlock(&vcpu->arch.vgic_cpu.ap_list_lock);
+2 -52
virt/kvm/arm/vgic/vgic.h
···
         kref_get(&irq->refcount);
 }
 
-#ifdef CONFIG_KVM_ARM_VGIC_V3
 void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu);
 void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu);
 void vgic_v3_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr);
···
 int vgic_v3_probe(const struct gic_kvm_info *info);
 int vgic_v3_map_resources(struct kvm *kvm);
 int vgic_register_redist_iodevs(struct kvm *kvm, gpa_t dist_base_address);
+
+#ifdef CONFIG_KVM_ARM_VGIC_V3_ITS
 int vgic_register_its_iodevs(struct kvm *kvm);
 bool vgic_has_its(struct kvm *kvm);
 int kvm_vgic_register_its_device(void);
 void vgic_enable_lpis(struct kvm_vcpu *vcpu);
 int vgic_its_inject_msi(struct kvm *kvm, struct kvm_msi *msi);
 #else
-static inline void vgic_v3_process_maintenance(struct kvm_vcpu *vcpu)
-{
-}
-
-static inline void vgic_v3_fold_lr_state(struct kvm_vcpu *vcpu)
-{
-}
-
-static inline void vgic_v3_populate_lr(struct kvm_vcpu *vcpu,
-                                       struct vgic_irq *irq, int lr)
-{
-}
-
-static inline void vgic_v3_clear_lr(struct kvm_vcpu *vcpu, int lr)
-{
-}
-
-static inline void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
-{
-}
-
-static inline
-void vgic_v3_set_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
-{
-}
-
-static inline
-void vgic_v3_get_vmcr(struct kvm_vcpu *vcpu, struct vgic_vmcr *vmcr)
-{
-}
-
-static inline void vgic_v3_enable(struct kvm_vcpu *vcpu)
-{
-}
-
-static inline int vgic_v3_probe(const struct gic_kvm_info *info)
-{
-        return -ENODEV;
-}
-
-static inline int vgic_v3_map_resources(struct kvm *kvm)
-{
-        return -ENODEV;
-}
-
-static inline int vgic_register_redist_iodevs(struct kvm *kvm,
-                                              gpa_t dist_base_address)
-{
-        return -ENODEV;
-}
-
 static inline int vgic_register_its_iodevs(struct kvm *kvm)
 {
         return -ENODEV;
+3 -19
virt/kvm/eventfd.c
···
 
 #ifdef CONFIG_HAVE_KVM_IRQFD
 
-static struct workqueue_struct *irqfd_cleanup_wq;
 
 static void
 irqfd_inject(struct work_struct *work)
···
 
         list_del_init(&irqfd->list);
 
-        queue_work(irqfd_cleanup_wq, &irqfd->shutdown);
+        schedule_work(&irqfd->shutdown);
 }
 
 int __attribute__((weak)) kvm_arch_set_irq_inatomic(
···
          * so that we guarantee there will not be any more interrupts on this
          * gsi once this deassign function returns.
          */
-        flush_workqueue(irqfd_cleanup_wq);
+        flush_work(&irqfd->shutdown);
 
         return 0;
 }
···
          * Block until we know all outstanding shutdown jobs have completed
          * since we do not take a kvm* reference.
          */
-        flush_workqueue(irqfd_cleanup_wq);
+        flush_work(&irqfd->shutdown);
 
 }
 
···
         spin_unlock_irq(&kvm->irqfds.lock);
 }
 
-/*
- * create a host-wide workqueue for issuing deferred shutdown requests
- * aggregated from all vm* instances. We need our own isolated single-thread
- * queue to prevent deadlock against flushing the normal work-queue.
- */
-int kvm_irqfd_init(void)
-{
-        irqfd_cleanup_wq = create_singlethread_workqueue("kvm-irqfd-cleanup");
-        if (!irqfd_cleanup_wq)
-                return -ENOMEM;
-
-        return 0;
-}
-
 void kvm_irqfd_exit(void)
 {
-        destroy_workqueue(irqfd_cleanup_wq);
 }
 #endif
 
+39 -11
virt/kvm/kvm_main.c
···
 
         debugfs_remove_recursive(kvm->debugfs_dentry);
 
-        for (i = 0; i < kvm_debugfs_num_entries; i++)
-                kfree(kvm->debugfs_stat_data[i]);
-        kfree(kvm->debugfs_stat_data);
+        if (kvm->debugfs_stat_data) {
+                for (i = 0; i < kvm_debugfs_num_entries; i++)
+                        kfree(kvm->debugfs_stat_data[i]);
+                kfree(kvm->debugfs_stat_data);
+        }
 }
 
 static int kvm_create_vm_debugfs(struct kvm *kvm, int fd)
···
 {
         struct kvm_vcpu *vcpu = filp->private_data;
 
+        debugfs_remove_recursive(vcpu->debugfs_dentry);
         kvm_put_kvm(vcpu->kvm);
         return 0;
 }
···
 static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 {
         return anon_inode_getfd("kvm-vcpu", &kvm_vcpu_fops, vcpu, O_RDWR | O_CLOEXEC);
+}
+
+static int kvm_create_vcpu_debugfs(struct kvm_vcpu *vcpu)
+{
+        char dir_name[ITOA_MAX_LEN * 2];
+        int ret;
+
+        if (!kvm_arch_has_vcpu_debugfs())
+                return 0;
+
+        if (!debugfs_initialized())
+                return 0;
+
+        snprintf(dir_name, sizeof(dir_name), "vcpu%d", vcpu->vcpu_id);
+        vcpu->debugfs_dentry = debugfs_create_dir(dir_name,
+                                                  vcpu->kvm->debugfs_dentry);
+        if (!vcpu->debugfs_dentry)
+                return -ENOMEM;
+
+        ret = kvm_arch_create_vcpu_debugfs(vcpu);
+        if (ret < 0) {
+                debugfs_remove_recursive(vcpu->debugfs_dentry);
+                return ret;
+        }
+
+        return 0;
 }
 
 /*
···
         if (r)
                 goto vcpu_destroy;
 
+        r = kvm_create_vcpu_debugfs(vcpu);
+        if (r)
+                goto vcpu_destroy;
+
         mutex_lock(&kvm->lock);
         if (kvm_get_vcpu_by_id(kvm, id)) {
                 r = -EEXIST;
···
 
 unlock_vcpu_destroy:
         mutex_unlock(&kvm->lock);
+        debugfs_remove_recursive(vcpu->debugfs_dentry);
 vcpu_destroy:
         kvm_arch_vcpu_destroy(vcpu);
 vcpu_decrement:
···
 {
         struct kvm_stat_data *stat_data = (struct kvm_stat_data *)data;
 
-        *val = *(u32 *)((void *)stat_data->kvm + stat_data->offset);
+        *val = *(ulong *)((void *)stat_data->kvm + stat_data->offset);
 
         return 0;
 }
···
         *val = 0;
 
         kvm_for_each_vcpu(i, vcpu, stat_data->kvm)
-                *val += *(u32 *)((void *)vcpu + stat_data->offset);
+                *val += *(u64 *)((void *)vcpu + stat_data->offset);
 
         return 0;
 }
···
          * kvm_arch_init makes sure there's at most one caller
          * for architectures that support multiple implementations,
          * like intel and amd on x86.
-         * kvm_arch_init must be called before kvm_irqfd_init to avoid creating
-         * conflicts in case kvm is already setup for another implementation.
          */
-        r = kvm_irqfd_init();
-        if (r)
-                goto out_irqfd;
 
         if (!zalloc_cpumask_var(&cpus_hardware_enabled, GFP_KERNEL)) {
                 r = -ENOMEM;
···
         free_cpumask_var(cpus_hardware_enabled);
 out_free_0:
         kvm_irqfd_exit();
-out_irqfd:
         kvm_arch_exit();
 out_fail:
         return r;