Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
"One of the largest releases for KVM... Hardly any generic
changes, but lots of architecture-specific updates.

ARM:
- VHE support so that we can run the kernel at EL2 on ARMv8.1 systems
- PMU support for guests
- 32bit world switch rewritten in C
- various optimizations to the vgic save/restore code.

PPC:
- enabled KVM-VFIO integration ("VFIO device")
- optimizations to speed up IPIs between vcpus
- in-kernel handling of IOMMU hypercalls
- support for dynamic DMA windows (DDW).

s390:
- provide the floating point registers via sync regs;
- separated instruction vs. data accesses
- dirty log improvements for huge guests
- bugfixes and documentation improvements.

x86:
- Hyper-V VMBus hypercall userspace exit
- alternative implementation of lowest-priority interrupts using
vector hashing (for better VT-d posted interrupt support)
- fixed guest debugging with nested virtualization
- improved interrupt tracking in the in-kernel IOAPIC
- generic infrastructure for tracking writes to guest
memory - currently its only use is to speed up the legacy shadow
paging (pre-EPT) case, but in the future it will be used for
virtual GPUs as well
- much cleanup (LAPIC, kvmclock, MMU, PIT), including ubsan fixes"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (217 commits)
KVM: x86: remove eager_fpu field of struct kvm_vcpu_arch
KVM: x86: disable MPX if host did not enable MPX XSAVE features
arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
arm64: KVM: vgic-v3: Reset LRs at boot time
arm64: KVM: vgic-v3: Do not save an LR known to be empty
arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
arm64: KVM: vgic-v3: Avoid accessing ICH registers
KVM: arm/arm64: vgic-v2: Make GICD_SGIR quicker to hit
KVM: arm/arm64: vgic-v2: Only wipe LRs on vcpu exit
KVM: arm/arm64: vgic-v2: Reset LRs at boot time
KVM: arm/arm64: vgic-v2: Do not save an LR known to be empty
KVM: arm/arm64: vgic-v2: Move GICH_ELRSR saving to its own function
KVM: arm/arm64: vgic-v2: Save maintenance interrupt state only if required
KVM: arm/arm64: vgic-v2: Avoid accessing GICH registers
KVM: s390: allocate only one DMA page per VM
KVM: s390: enable STFLE interpretation only if enabled for the guest
KVM: s390: wake up when the VCPU cpu timer expires
KVM: s390: step the VCPU timer while in enabled wait
KVM: s390: protect VCPU cpu timer with a seqcount
KVM: s390: step VCPU cpu timer during kvm_run ioctl
...

+6754 -2936
+95 -4
Documentation/virtual/kvm/api.txt
···
 
 4.80 KVM_SET_DEVICE_ATTR/KVM_GET_DEVICE_ATTR
 
-Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
-Type: device ioctl, vm ioctl
+Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
+	    KVM_CAP_VCPU_ATTRIBUTES for vcpu device
+Type: device ioctl, vm ioctl, vcpu ioctl
 Parameters: struct kvm_device_attr
 Returns: 0 on success, -1 on error
 Errors:
···
 
 4.81 KVM_HAS_DEVICE_ATTR
 
-Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device
-Type: device ioctl, vm ioctl
+Capability: KVM_CAP_DEVICE_CTRL, KVM_CAP_VM_ATTRIBUTES for vm device,
+	    KVM_CAP_VCPU_ATTRIBUTES for vcpu device
+Type: device ioctl, vm ioctl, vcpu ioctl
 Parameters: struct kvm_device_attr
 Returns: 0 on success, -1 on error
 Errors:
···
 	  Depends on KVM_CAP_ARM_EL1_32BIT (arm64 only).
 	- KVM_ARM_VCPU_PSCI_0_2: Emulate PSCI v0.2 for the CPU.
 	  Depends on KVM_CAP_ARM_PSCI_0_2.
+	- KVM_ARM_VCPU_PMU_V3: Emulate PMUv3 for the CPU.
+	  Depends on KVM_CAP_ARM_PMU_V3.
 
 
 4.83 KVM_ARM_PREFERRED_TARGET
···
 
 Queues an SMI on the thread's vcpu.
 
+4.97 KVM_CAP_PPC_MULTITCE
+
+Capability: KVM_CAP_PPC_MULTITCE
+Architectures: ppc
+Type: vm
+
+This capability means the kernel is capable of handling hypercalls
+H_PUT_TCE_INDIRECT and H_STUFF_TCE without passing those into the user
+space. This significantly accelerates DMA operations for PPC KVM guests.
+User space should expect that its handlers for these hypercalls
+are not going to be called if user space previously registered LIOBN
+in KVM (via KVM_CREATE_SPAPR_TCE or similar calls).
+
+In order to enable H_PUT_TCE_INDIRECT and H_STUFF_TCE use in the guest,
+user space might have to advertise it for the guest. For example,
+IBM pSeries (sPAPR) guest starts using them if "hcall-multi-tce" is
+present in the "ibm,hypertas-functions" device-tree property.
+
+The hypercalls mentioned above may or may not be processed successfully
+in the kernel based fast path. If they can not be handled by the kernel,
+they will get passed on to user space. So user space still has to have
+an implementation for these despite the in kernel acceleration.
+
+This capability is always enabled.
+
+4.98 KVM_CREATE_SPAPR_TCE_64
+
+Capability: KVM_CAP_SPAPR_TCE_64
+Architectures: powerpc
+Type: vm ioctl
+Parameters: struct kvm_create_spapr_tce_64 (in)
+Returns: file descriptor for manipulating the created TCE table
+
+This is an extension for KVM_CAP_SPAPR_TCE which only supports 32bit
+windows, described in 4.62 KVM_CREATE_SPAPR_TCE
+
+This capability uses extended struct in ioctl interface:
+
+/* for KVM_CAP_SPAPR_TCE_64 */
+struct kvm_create_spapr_tce_64 {
+	__u64 liobn;
+	__u32 page_shift;
+	__u32 flags;
+	__u64 offset;	/* in pages */
+	__u64 size;	/* in pages */
+};
+
+The aim of extension is to support an additional bigger DMA window with
+a variable page size.
+KVM_CREATE_SPAPR_TCE_64 receives a 64bit window size, an IOMMU page shift and
+a bus offset of the corresponding DMA window, @size and @offset are numbers
+of IOMMU pages.
+
+@flags are not used at the moment.
+
+The rest of functionality is identical to KVM_CREATE_SPAPR_TCE.
+
+4.98 KVM_REINJECT_CONTROL
+
+Capability: KVM_CAP_REINJECT_CONTROL
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_reinject_control (in)
+Returns: 0 on success,
+	 -EFAULT if struct kvm_reinject_control cannot be read,
+	 -ENXIO if KVM_CREATE_PIT or KVM_CREATE_PIT2 didn't succeed earlier.
+
+i8254 (PIT) has two modes, reinject and !reinject. The default is reinject,
+where KVM queues elapsed i8254 ticks and monitors completion of interrupt from
+vector(s) that i8254 injects. Reinject mode dequeues a tick and injects its
+interrupt whenever there isn't a pending interrupt from i8254.
+!reinject mode injects an interrupt as soon as a tick arrives.
+
+struct kvm_reinject_control {
+	__u8 pit_reinject;
+	__u8 reserved[31];
+};
+
+pit_reinject = 0 (!reinject mode) is recommended, unless running an old
+operating system that uses the PIT for timing (e.g. Linux 2.4.x).
+
 5. The kvm_run structure
 ------------------------
 
···
 
 	struct kvm_hyperv_exit {
 #define KVM_EXIT_HYPERV_SYNIC          1
+#define KVM_EXIT_HYPERV_HCALL          2
 		__u32 type;
 		union {
 			struct {
···
 				__u64 evt_page;
 				__u64 msg_page;
 			} synic;
+			struct {
+				__u64 input;
+				__u64 result;
+				__u64 params[2];
+			} hcall;
 		} u;
 	};
 	/* KVM_EXIT_HYPERV */
+2
Documentation/virtual/kvm/devices/s390_flic.txt
···
   perform a gmap translation for the guest address provided in addr,
   pin a userspace page for the translated address and add it to the
   list of mappings
+  Note: A new mapping will be created unconditionally; therefore,
+  the calling code should avoid making duplicate mappings.
 
 KVM_S390_IO_ADAPTER_UNMAP
   release a userspace page for the translated address specified in addr
+33
Documentation/virtual/kvm/devices/vcpu.txt
···
+Generic vcpu interface
+====================================
+
+The virtual cpu "device" also accepts the ioctls KVM_SET_DEVICE_ATTR,
+KVM_GET_DEVICE_ATTR, and KVM_HAS_DEVICE_ATTR. The interface uses the same struct
+kvm_device_attr as other devices, but targets VCPU-wide settings and controls.
+
+The groups and attributes per virtual cpu, if any, are architecture specific.
+
+1. GROUP: KVM_ARM_VCPU_PMU_V3_CTRL
+Architectures: ARM64
+
+1.1. ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_IRQ
+Parameters: in kvm_device_attr.addr the address for PMU overflow interrupt is a
+            pointer to an int
+Returns: -EBUSY: The PMU overflow interrupt is already set
+         -ENXIO: The overflow interrupt not set when attempting to get it
+         -ENODEV: PMUv3 not supported
+         -EINVAL: Invalid PMU overflow interrupt number supplied
+
+A value describing the PMUv3 (Performance Monitor Unit v3) overflow interrupt
+number for this vcpu. This interrupt could be a PPI or SPI, but the interrupt
+type must be same for each vcpu. As a PPI, the interrupt number is the same for
+all vcpus, while as an SPI it must be a separate number per vcpu.
+
+1.2 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_INIT
+Parameters: no additional parameter in kvm_device_attr.addr
+Returns: -ENODEV: PMUv3 not supported
+         -ENXIO: PMUv3 not properly configured as required prior to calling this
+                 attribute
+         -EBUSY: PMUv3 already initialized
+
+Request the initialization of the PMUv3.
+52
Documentation/virtual/kvm/devices/vm.txt
···
          -EFAULT if the given address is not accessible from kernel space
          -ENOMEM if not enough memory is available to process the ioctl
          0 in case of success
+
+3. GROUP: KVM_S390_VM_TOD
+Architectures: s390
+
+3.1. ATTRIBUTE: KVM_S390_VM_TOD_HIGH
+
+Allows user space to set/get the TOD clock extension (u8).
+
+Parameters: address of a buffer in user space to store the data (u8) to
+Returns:    -EFAULT if the given address is not accessible from kernel space
+            -EINVAL if setting the TOD clock extension to != 0 is not supported
+
+3.2. ATTRIBUTE: KVM_S390_VM_TOD_LOW
+
+Allows user space to set/get bits 0-63 of the TOD clock register as defined in
+the POP (u64).
+
+Parameters: address of a buffer in user space to store the data (u64) to
+Returns:    -EFAULT if the given address is not accessible from kernel space
+
+4. GROUP: KVM_S390_VM_CRYPTO
+Architectures: s390
+
+4.1. ATTRIBUTE: KVM_S390_VM_CRYPTO_ENABLE_AES_KW (w/o)
+
+Allows user space to enable aes key wrapping, including generating a new
+wrapping key.
+
+Parameters: none
+Returns:    0
+
+4.2. ATTRIBUTE: KVM_S390_VM_CRYPTO_ENABLE_DEA_KW (w/o)
+
+Allows user space to enable dea key wrapping, including generating a new
+wrapping key.
+
+Parameters: none
+Returns:    0
+
+4.3. ATTRIBUTE: KVM_S390_VM_CRYPTO_DISABLE_AES_KW (w/o)
+
+Allows user space to disable aes key wrapping, clearing the wrapping key.
+
+Parameters: none
+Returns:    0
+
+4.4. ATTRIBUTE: KVM_S390_VM_CRYPTO_DISABLE_DEA_KW (w/o)
+
+Allows user space to disable dea key wrapping, clearing the wrapping key.
+
+Parameters: none
+Returns:    0
+3 -3
Documentation/virtual/kvm/mmu.txt
···
    write-protected pages
  - the guest page must be wholly contained by a single memory slot
 
-To check the last two conditions, the mmu maintains a ->write_count set of
+To check the last two conditions, the mmu maintains a ->disallow_lpage set of
 arrays for each memory slot and large page size. Every write protected page
-causes its write_count to be incremented, thus preventing instantiation of
+causes its disallow_lpage to be incremented, thus preventing instantiation of
 a large spte. The frames at the end of an unaligned memory slot have
-artificially inflated ->write_counts so they can never be instantiated.
+artificially inflated ->disallow_lpages so they can never be instantiated.
 
 Zapping all pages (page generation count)
 =========================================
+3 -38
arch/arm/include/asm/kvm_asm.h
··· 19 19 #ifndef __ARM_KVM_ASM_H__ 20 20 #define __ARM_KVM_ASM_H__ 21 21 22 - /* 0 is reserved as an invalid value. */ 23 - #define c0_MPIDR 1 /* MultiProcessor ID Register */ 24 - #define c0_CSSELR 2 /* Cache Size Selection Register */ 25 - #define c1_SCTLR 3 /* System Control Register */ 26 - #define c1_ACTLR 4 /* Auxiliary Control Register */ 27 - #define c1_CPACR 5 /* Coprocessor Access Control */ 28 - #define c2_TTBR0 6 /* Translation Table Base Register 0 */ 29 - #define c2_TTBR0_high 7 /* TTBR0 top 32 bits */ 30 - #define c2_TTBR1 8 /* Translation Table Base Register 1 */ 31 - #define c2_TTBR1_high 9 /* TTBR1 top 32 bits */ 32 - #define c2_TTBCR 10 /* Translation Table Base Control R. */ 33 - #define c3_DACR 11 /* Domain Access Control Register */ 34 - #define c5_DFSR 12 /* Data Fault Status Register */ 35 - #define c5_IFSR 13 /* Instruction Fault Status Register */ 36 - #define c5_ADFSR 14 /* Auxilary Data Fault Status R */ 37 - #define c5_AIFSR 15 /* Auxilary Instrunction Fault Status R */ 38 - #define c6_DFAR 16 /* Data Fault Address Register */ 39 - #define c6_IFAR 17 /* Instruction Fault Address Register */ 40 - #define c7_PAR 18 /* Physical Address Register */ 41 - #define c7_PAR_high 19 /* PAR top 32 bits */ 42 - #define c9_L2CTLR 20 /* Cortex A15/A7 L2 Control Register */ 43 - #define c10_PRRR 21 /* Primary Region Remap Register */ 44 - #define c10_NMRR 22 /* Normal Memory Remap Register */ 45 - #define c12_VBAR 23 /* Vector Base Address Register */ 46 - #define c13_CID 24 /* Context ID Register */ 47 - #define c13_TID_URW 25 /* Thread ID, User R/W */ 48 - #define c13_TID_URO 26 /* Thread ID, User R/O */ 49 - #define c13_TID_PRIV 27 /* Thread ID, Privileged */ 50 - #define c14_CNTKCTL 28 /* Timer Control Register (PL1) */ 51 - #define c10_AMAIR0 29 /* Auxilary Memory Attribute Indirection Reg0 */ 52 - #define c10_AMAIR1 30 /* Auxilary Memory Attribute Indirection Reg1 */ 53 - #define NR_CP15_REGS 31 /* Number of regs (incl. 
invalid) */ 22 + #include <asm/virt.h> 54 23 55 24 #define ARM_EXCEPTION_RESET 0 56 25 #define ARM_EXCEPTION_UNDEFINED 1 ··· 55 86 extern char __kvm_hyp_init[]; 56 87 extern char __kvm_hyp_init_end[]; 57 88 58 - extern char __kvm_hyp_exit[]; 59 - extern char __kvm_hyp_exit_end[]; 60 - 61 89 extern char __kvm_hyp_vector[]; 62 - 63 - extern char __kvm_hyp_code_start[]; 64 - extern char __kvm_hyp_code_end[]; 65 90 66 91 extern void __kvm_flush_vm_context(void); 67 92 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa); 68 93 extern void __kvm_tlb_flush_vmid(struct kvm *kvm); 69 94 70 95 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu); 96 + 97 + extern void __init_stage2_translation(void); 71 98 #endif 72 99 73 100 #endif /* __ARM_KVM_ASM_H__ */
+10 -10
arch/arm/include/asm/kvm_emulate.h
··· 68 68 69 69 static inline unsigned long *vcpu_pc(struct kvm_vcpu *vcpu) 70 70 { 71 - return &vcpu->arch.regs.usr_regs.ARM_pc; 71 + return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_pc; 72 72 } 73 73 74 74 static inline unsigned long *vcpu_cpsr(struct kvm_vcpu *vcpu) 75 75 { 76 - return &vcpu->arch.regs.usr_regs.ARM_cpsr; 76 + return &vcpu->arch.ctxt.gp_regs.usr_regs.ARM_cpsr; 77 77 } 78 78 79 79 static inline void vcpu_set_thumb(struct kvm_vcpu *vcpu) ··· 83 83 84 84 static inline bool mode_has_spsr(struct kvm_vcpu *vcpu) 85 85 { 86 - unsigned long cpsr_mode = vcpu->arch.regs.usr_regs.ARM_cpsr & MODE_MASK; 86 + unsigned long cpsr_mode = vcpu->arch.ctxt.gp_regs.usr_regs.ARM_cpsr & MODE_MASK; 87 87 return (cpsr_mode > USR_MODE && cpsr_mode < SYSTEM_MODE); 88 88 } 89 89 90 90 static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu) 91 91 { 92 - unsigned long cpsr_mode = vcpu->arch.regs.usr_regs.ARM_cpsr & MODE_MASK; 92 + unsigned long cpsr_mode = vcpu->arch.ctxt.gp_regs.usr_regs.ARM_cpsr & MODE_MASK; 93 93 return cpsr_mode > USR_MODE;; 94 94 } 95 95 ··· 106 106 static inline phys_addr_t kvm_vcpu_get_fault_ipa(struct kvm_vcpu *vcpu) 107 107 { 108 108 return ((phys_addr_t)vcpu->arch.fault.hpfar & HPFAR_MASK) << 8; 109 - } 110 - 111 - static inline unsigned long kvm_vcpu_get_hyp_pc(struct kvm_vcpu *vcpu) 112 - { 113 - return vcpu->arch.fault.hyp_pc; 114 109 } 115 110 116 111 static inline bool kvm_vcpu_dabt_isvalid(struct kvm_vcpu *vcpu) ··· 136 141 static inline bool kvm_vcpu_dabt_iss1tw(struct kvm_vcpu *vcpu) 137 142 { 138 143 return kvm_vcpu_get_hsr(vcpu) & HSR_DABT_S1PTW; 144 + } 145 + 146 + static inline bool kvm_vcpu_dabt_is_cm(struct kvm_vcpu *vcpu) 147 + { 148 + return !!(kvm_vcpu_get_hsr(vcpu) & HSR_DABT_CM); 139 149 } 140 150 141 151 /* Get Access Size from a data abort */ ··· 192 192 193 193 static inline unsigned long kvm_vcpu_get_mpidr_aff(struct kvm_vcpu *vcpu) 194 194 { 195 - return vcpu->arch.cp15[c0_MPIDR] & MPIDR_HWID_BITMASK; 195 + return 
vcpu_cp15(vcpu, c0_MPIDR) & MPIDR_HWID_BITMASK; 196 196 } 197 197 198 198 static inline void kvm_vcpu_set_be(struct kvm_vcpu *vcpu)
+70 -10
arch/arm/include/asm/kvm_host.h
··· 85 85 u32 hsr; /* Hyp Syndrome Register */ 86 86 u32 hxfar; /* Hyp Data/Inst. Fault Address Register */ 87 87 u32 hpfar; /* Hyp IPA Fault Address Register */ 88 - u32 hyp_pc; /* PC when exception was taken from Hyp mode */ 89 88 }; 90 89 91 - typedef struct vfp_hard_struct kvm_cpu_context_t; 90 + /* 91 + * 0 is reserved as an invalid value. 92 + * Order should be kept in sync with the save/restore code. 93 + */ 94 + enum vcpu_sysreg { 95 + __INVALID_SYSREG__, 96 + c0_MPIDR, /* MultiProcessor ID Register */ 97 + c0_CSSELR, /* Cache Size Selection Register */ 98 + c1_SCTLR, /* System Control Register */ 99 + c1_ACTLR, /* Auxiliary Control Register */ 100 + c1_CPACR, /* Coprocessor Access Control */ 101 + c2_TTBR0, /* Translation Table Base Register 0 */ 102 + c2_TTBR0_high, /* TTBR0 top 32 bits */ 103 + c2_TTBR1, /* Translation Table Base Register 1 */ 104 + c2_TTBR1_high, /* TTBR1 top 32 bits */ 105 + c2_TTBCR, /* Translation Table Base Control R. */ 106 + c3_DACR, /* Domain Access Control Register */ 107 + c5_DFSR, /* Data Fault Status Register */ 108 + c5_IFSR, /* Instruction Fault Status Register */ 109 + c5_ADFSR, /* Auxilary Data Fault Status R */ 110 + c5_AIFSR, /* Auxilary Instrunction Fault Status R */ 111 + c6_DFAR, /* Data Fault Address Register */ 112 + c6_IFAR, /* Instruction Fault Address Register */ 113 + c7_PAR, /* Physical Address Register */ 114 + c7_PAR_high, /* PAR top 32 bits */ 115 + c9_L2CTLR, /* Cortex A15/A7 L2 Control Register */ 116 + c10_PRRR, /* Primary Region Remap Register */ 117 + c10_NMRR, /* Normal Memory Remap Register */ 118 + c12_VBAR, /* Vector Base Address Register */ 119 + c13_CID, /* Context ID Register */ 120 + c13_TID_URW, /* Thread ID, User R/W */ 121 + c13_TID_URO, /* Thread ID, User R/O */ 122 + c13_TID_PRIV, /* Thread ID, Privileged */ 123 + c14_CNTKCTL, /* Timer Control Register (PL1) */ 124 + c10_AMAIR0, /* Auxilary Memory Attribute Indirection Reg0 */ 125 + c10_AMAIR1, /* Auxilary Memory Attribute Indirection Reg1 
*/ 126 + NR_CP15_REGS /* Number of regs (incl. invalid) */ 127 + }; 128 + 129 + struct kvm_cpu_context { 130 + struct kvm_regs gp_regs; 131 + struct vfp_hard_struct vfp; 132 + u32 cp15[NR_CP15_REGS]; 133 + }; 134 + 135 + typedef struct kvm_cpu_context kvm_cpu_context_t; 92 136 93 137 struct kvm_vcpu_arch { 94 - struct kvm_regs regs; 138 + struct kvm_cpu_context ctxt; 95 139 96 140 int target; /* Processor target */ 97 141 DECLARE_BITMAP(features, KVM_VCPU_MAX_FEATURES); 98 - 99 - /* System control coprocessor (cp15) */ 100 - u32 cp15[NR_CP15_REGS]; 101 142 102 143 /* The CPU type we expose to the VM */ 103 144 u32 midr; ··· 151 110 152 111 /* Exception Information */ 153 112 struct kvm_vcpu_fault_info fault; 154 - 155 - /* Floating point registers (VFP and Advanced SIMD/NEON) */ 156 - struct vfp_hard_struct vfp_guest; 157 113 158 114 /* Host FP context */ 159 115 kvm_cpu_context_t *host_cpu_context; ··· 196 158 u64 exits; 197 159 }; 198 160 161 + #define vcpu_cp15(v,r) (v)->arch.ctxt.cp15[r] 162 + 199 163 int kvm_vcpu_preferred_target(struct kvm_vcpu_init *init); 200 164 unsigned long kvm_arm_num_regs(struct kvm_vcpu *vcpu); 201 165 int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 __user *indices); 202 166 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); 203 167 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg); 204 - u64 kvm_call_hyp(void *hypfn, ...); 168 + unsigned long kvm_call_hyp(void *hypfn, ...); 205 169 void force_vm_exit(const cpumask_t *mask); 206 170 207 171 #define KVM_ARCH_WANT_MMU_NOTIFIER ··· 260 220 kvm_call_hyp((void*)hyp_stack_ptr, vector_ptr, pgd_ptr); 261 221 } 262 222 223 + static inline void __cpu_init_stage2(void) 224 + { 225 + kvm_call_hyp(__init_stage2_translation); 226 + } 227 + 263 228 static inline int kvm_arch_dev_ioctl_check_extension(long ext) 264 229 { 265 230 return 0; ··· 287 242 static inline void kvm_arm_setup_debug(struct kvm_vcpu *vcpu) {} 288 243 static inline void 
kvm_arm_clear_debug(struct kvm_vcpu *vcpu) {} 289 244 static inline void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu) {} 245 + static inline int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu, 246 + struct kvm_device_attr *attr) 247 + { 248 + return -ENXIO; 249 + } 250 + static inline int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu, 251 + struct kvm_device_attr *attr) 252 + { 253 + return -ENXIO; 254 + } 255 + static inline int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, 256 + struct kvm_device_attr *attr) 257 + { 258 + return -ENXIO; 259 + } 290 260 291 261 #endif /* __ARM_KVM_HOST_H__ */
+139
arch/arm/include/asm/kvm_hyp.h
··· 1 + /* 2 + * Copyright (C) 2015 - ARM Ltd 3 + * Author: Marc Zyngier <marc.zyngier@arm.com> 4 + * 5 + * This program is free software; you can redistribute it and/or modify 6 + * it under the terms of the GNU General Public License version 2 as 7 + * published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it will be useful, 10 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 + * GNU General Public License for more details. 13 + * 14 + * You should have received a copy of the GNU General Public License 15 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 + */ 17 + 18 + #ifndef __ARM_KVM_HYP_H__ 19 + #define __ARM_KVM_HYP_H__ 20 + 21 + #include <linux/compiler.h> 22 + #include <linux/kvm_host.h> 23 + #include <asm/kvm_mmu.h> 24 + #include <asm/vfp.h> 25 + 26 + #define __hyp_text __section(.hyp.text) notrace 27 + 28 + #define kern_hyp_va(v) (v) 29 + #define hyp_kern_va(v) (v) 30 + 31 + #define __ACCESS_CP15(CRn, Op1, CRm, Op2) \ 32 + "mrc", "mcr", __stringify(p15, Op1, %0, CRn, CRm, Op2), u32 33 + #define __ACCESS_CP15_64(Op1, CRm) \ 34 + "mrrc", "mcrr", __stringify(p15, Op1, %Q0, %R0, CRm), u64 35 + #define __ACCESS_VFP(CRn) \ 36 + "mrc", "mcr", __stringify(p10, 7, %0, CRn, cr0, 0), u32 37 + 38 + #define __write_sysreg(v, r, w, c, t) asm volatile(w " " c : : "r" ((t)(v))) 39 + #define write_sysreg(v, ...) __write_sysreg(v, __VA_ARGS__) 40 + 41 + #define __read_sysreg(r, w, c, t) ({ \ 42 + t __val; \ 43 + asm volatile(r " " c : "=r" (__val)); \ 44 + __val; \ 45 + }) 46 + #define read_sysreg(...) 
__read_sysreg(__VA_ARGS__) 47 + 48 + #define write_special(v, r) \ 49 + asm volatile("msr " __stringify(r) ", %0" : : "r" (v)) 50 + #define read_special(r) ({ \ 51 + u32 __val; \ 52 + asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \ 53 + __val; \ 54 + }) 55 + 56 + #define TTBR0 __ACCESS_CP15_64(0, c2) 57 + #define TTBR1 __ACCESS_CP15_64(1, c2) 58 + #define VTTBR __ACCESS_CP15_64(6, c2) 59 + #define PAR __ACCESS_CP15_64(0, c7) 60 + #define CNTV_CVAL __ACCESS_CP15_64(3, c14) 61 + #define CNTVOFF __ACCESS_CP15_64(4, c14) 62 + 63 + #define MIDR __ACCESS_CP15(c0, 0, c0, 0) 64 + #define CSSELR __ACCESS_CP15(c0, 2, c0, 0) 65 + #define VPIDR __ACCESS_CP15(c0, 4, c0, 0) 66 + #define VMPIDR __ACCESS_CP15(c0, 4, c0, 5) 67 + #define SCTLR __ACCESS_CP15(c1, 0, c0, 0) 68 + #define CPACR __ACCESS_CP15(c1, 0, c0, 2) 69 + #define HCR __ACCESS_CP15(c1, 4, c1, 0) 70 + #define HDCR __ACCESS_CP15(c1, 4, c1, 1) 71 + #define HCPTR __ACCESS_CP15(c1, 4, c1, 2) 72 + #define HSTR __ACCESS_CP15(c1, 4, c1, 3) 73 + #define TTBCR __ACCESS_CP15(c2, 0, c0, 2) 74 + #define HTCR __ACCESS_CP15(c2, 4, c0, 2) 75 + #define VTCR __ACCESS_CP15(c2, 4, c1, 2) 76 + #define DACR __ACCESS_CP15(c3, 0, c0, 0) 77 + #define DFSR __ACCESS_CP15(c5, 0, c0, 0) 78 + #define IFSR __ACCESS_CP15(c5, 0, c0, 1) 79 + #define ADFSR __ACCESS_CP15(c5, 0, c1, 0) 80 + #define AIFSR __ACCESS_CP15(c5, 0, c1, 1) 81 + #define HSR __ACCESS_CP15(c5, 4, c2, 0) 82 + #define DFAR __ACCESS_CP15(c6, 0, c0, 0) 83 + #define IFAR __ACCESS_CP15(c6, 0, c0, 2) 84 + #define HDFAR __ACCESS_CP15(c6, 4, c0, 0) 85 + #define HIFAR __ACCESS_CP15(c6, 4, c0, 2) 86 + #define HPFAR __ACCESS_CP15(c6, 4, c0, 4) 87 + #define ICIALLUIS __ACCESS_CP15(c7, 0, c1, 0) 88 + #define ATS1CPR __ACCESS_CP15(c7, 0, c8, 0) 89 + #define TLBIALLIS __ACCESS_CP15(c8, 0, c3, 0) 90 + #define TLBIALLNSNHIS __ACCESS_CP15(c8, 4, c3, 4) 91 + #define PRRR __ACCESS_CP15(c10, 0, c2, 0) 92 + #define NMRR __ACCESS_CP15(c10, 0, c2, 1) 93 + #define AMAIR0 __ACCESS_CP15(c10, 0, c3, 
0) 94 + #define AMAIR1 __ACCESS_CP15(c10, 0, c3, 1) 95 + #define VBAR __ACCESS_CP15(c12, 0, c0, 0) 96 + #define CID __ACCESS_CP15(c13, 0, c0, 1) 97 + #define TID_URW __ACCESS_CP15(c13, 0, c0, 2) 98 + #define TID_URO __ACCESS_CP15(c13, 0, c0, 3) 99 + #define TID_PRIV __ACCESS_CP15(c13, 0, c0, 4) 100 + #define HTPIDR __ACCESS_CP15(c13, 4, c0, 2) 101 + #define CNTKCTL __ACCESS_CP15(c14, 0, c1, 0) 102 + #define CNTV_CTL __ACCESS_CP15(c14, 0, c3, 1) 103 + #define CNTHCTL __ACCESS_CP15(c14, 4, c1, 0) 104 + 105 + #define VFP_FPEXC __ACCESS_VFP(FPEXC) 106 + 107 + /* AArch64 compatibility macros, only for the timer so far */ 108 + #define read_sysreg_el0(r) read_sysreg(r##_el0) 109 + #define write_sysreg_el0(v, r) write_sysreg(v, r##_el0) 110 + 111 + #define cntv_ctl_el0 CNTV_CTL 112 + #define cntv_cval_el0 CNTV_CVAL 113 + #define cntvoff_el2 CNTVOFF 114 + #define cnthctl_el2 CNTHCTL 115 + 116 + void __timer_save_state(struct kvm_vcpu *vcpu); 117 + void __timer_restore_state(struct kvm_vcpu *vcpu); 118 + 119 + void __vgic_v2_save_state(struct kvm_vcpu *vcpu); 120 + void __vgic_v2_restore_state(struct kvm_vcpu *vcpu); 121 + 122 + void __sysreg_save_state(struct kvm_cpu_context *ctxt); 123 + void __sysreg_restore_state(struct kvm_cpu_context *ctxt); 124 + 125 + void asmlinkage __vfp_save_state(struct vfp_hard_struct *vfp); 126 + void asmlinkage __vfp_restore_state(struct vfp_hard_struct *vfp); 127 + static inline bool __vfp_enabled(void) 128 + { 129 + return !(read_sysreg(HCPTR) & (HCPTR_TCP(11) | HCPTR_TCP(10))); 130 + } 131 + 132 + void __hyp_text __banked_save_state(struct kvm_cpu_context *ctxt); 133 + void __hyp_text __banked_restore_state(struct kvm_cpu_context *ctxt); 134 + 135 + int asmlinkage __guest_enter(struct kvm_vcpu *vcpu, 136 + struct kvm_cpu_context *host); 137 + int asmlinkage __hyp_do_panic(const char *, int, u32); 138 + 139 + #endif /* __ARM_KVM_HYP_H__ */
+1 -1
arch/arm/include/asm/kvm_mmu.h
···
 
 static inline bool vcpu_has_cache_enabled(struct kvm_vcpu *vcpu)
 {
-	return (vcpu->arch.cp15[c1_SCTLR] & 0b101) == 0b101;
+	return (vcpu_cp15(vcpu, c1_SCTLR) & 0b101) == 0b101;
 }
 
 static inline void __coherent_cache_guest_page(struct kvm_vcpu *vcpu,
+9
arch/arm/include/asm/virt.h
···
 {
 	return !!(__boot_cpu_mode & BOOT_CPU_MODE_MISMATCH);
 }
+
+static inline bool is_kernel_in_hyp_mode(void)
+{
+	return false;
+}
+
+/* The section containing the hypervisor text */
+extern char __hyp_text_start[];
+extern char __hyp_text_end[];
 #endif
 
 #endif /* __ASSEMBLY__ */
+5 -35
arch/arm/kernel/asm-offsets.c
··· 170 170 DEFINE(CACHE_WRITEBACK_GRANULE, __CACHE_WRITEBACK_GRANULE); 171 171 BLANK(); 172 172 #ifdef CONFIG_KVM_ARM_HOST 173 - DEFINE(VCPU_KVM, offsetof(struct kvm_vcpu, kvm)); 174 - DEFINE(VCPU_MIDR, offsetof(struct kvm_vcpu, arch.midr)); 175 - DEFINE(VCPU_CP15, offsetof(struct kvm_vcpu, arch.cp15)); 176 - DEFINE(VCPU_VFP_GUEST, offsetof(struct kvm_vcpu, arch.vfp_guest)); 177 - DEFINE(VCPU_VFP_HOST, offsetof(struct kvm_vcpu, arch.host_cpu_context)); 178 - DEFINE(VCPU_REGS, offsetof(struct kvm_vcpu, arch.regs)); 179 - DEFINE(VCPU_USR_REGS, offsetof(struct kvm_vcpu, arch.regs.usr_regs)); 180 - DEFINE(VCPU_SVC_REGS, offsetof(struct kvm_vcpu, arch.regs.svc_regs)); 181 - DEFINE(VCPU_ABT_REGS, offsetof(struct kvm_vcpu, arch.regs.abt_regs)); 182 - DEFINE(VCPU_UND_REGS, offsetof(struct kvm_vcpu, arch.regs.und_regs)); 183 - DEFINE(VCPU_IRQ_REGS, offsetof(struct kvm_vcpu, arch.regs.irq_regs)); 184 - DEFINE(VCPU_FIQ_REGS, offsetof(struct kvm_vcpu, arch.regs.fiq_regs)); 185 - DEFINE(VCPU_PC, offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_pc)); 186 - DEFINE(VCPU_CPSR, offsetof(struct kvm_vcpu, arch.regs.usr_regs.ARM_cpsr)); 187 - DEFINE(VCPU_HCR, offsetof(struct kvm_vcpu, arch.hcr)); 188 - DEFINE(VCPU_IRQ_LINES, offsetof(struct kvm_vcpu, arch.irq_lines)); 189 - DEFINE(VCPU_HSR, offsetof(struct kvm_vcpu, arch.fault.hsr)); 190 - DEFINE(VCPU_HxFAR, offsetof(struct kvm_vcpu, arch.fault.hxfar)); 191 - DEFINE(VCPU_HPFAR, offsetof(struct kvm_vcpu, arch.fault.hpfar)); 192 - DEFINE(VCPU_HYP_PC, offsetof(struct kvm_vcpu, arch.fault.hyp_pc)); 193 - DEFINE(VCPU_VGIC_CPU, offsetof(struct kvm_vcpu, arch.vgic_cpu)); 194 - DEFINE(VGIC_V2_CPU_HCR, offsetof(struct vgic_cpu, vgic_v2.vgic_hcr)); 195 - DEFINE(VGIC_V2_CPU_VMCR, offsetof(struct vgic_cpu, vgic_v2.vgic_vmcr)); 196 - DEFINE(VGIC_V2_CPU_MISR, offsetof(struct vgic_cpu, vgic_v2.vgic_misr)); 197 - DEFINE(VGIC_V2_CPU_EISR, offsetof(struct vgic_cpu, vgic_v2.vgic_eisr)); 198 - DEFINE(VGIC_V2_CPU_ELRSR, offsetof(struct vgic_cpu, 
vgic_v2.vgic_elrsr)); 199 - DEFINE(VGIC_V2_CPU_APR, offsetof(struct vgic_cpu, vgic_v2.vgic_apr)); 200 - DEFINE(VGIC_V2_CPU_LR, offsetof(struct vgic_cpu, vgic_v2.vgic_lr)); 201 - DEFINE(VGIC_CPU_NR_LR, offsetof(struct vgic_cpu, nr_lr)); 202 - DEFINE(VCPU_TIMER_CNTV_CTL, offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl)); 203 - DEFINE(VCPU_TIMER_CNTV_CVAL, offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_cval)); 204 - DEFINE(KVM_TIMER_CNTVOFF, offsetof(struct kvm, arch.timer.cntvoff)); 205 - DEFINE(KVM_TIMER_ENABLED, offsetof(struct kvm, arch.timer.enabled)); 206 - DEFINE(KVM_VGIC_VCTRL, offsetof(struct kvm, arch.vgic.vctrl_base)); 207 - DEFINE(KVM_VTTBR, offsetof(struct kvm, arch.vttbr)); 173 + DEFINE(VCPU_GUEST_CTXT, offsetof(struct kvm_vcpu, arch.ctxt)); 174 + DEFINE(VCPU_HOST_CTXT, offsetof(struct kvm_vcpu, arch.host_cpu_context)); 175 + DEFINE(CPU_CTXT_VFP, offsetof(struct kvm_cpu_context, vfp)); 176 + DEFINE(CPU_CTXT_GP_REGS, offsetof(struct kvm_cpu_context, gp_regs)); 177 + DEFINE(GP_REGS_USR, offsetof(struct kvm_regs, usr_regs)); 208 178 #endif 209 179 BLANK(); 210 180 #ifdef CONFIG_VDSO
+6
arch/arm/kernel/vmlinux.lds.S
···
 		*(.proc.info.init)					\
 		VMLINUX_SYMBOL(__proc_info_end) = .;
 
+#define HYPERVISOR_TEXT						\
+		VMLINUX_SYMBOL(__hyp_text_start) = .;		\
+		*(.hyp.text)					\
+		VMLINUX_SYMBOL(__hyp_text_end) = .;
+
 #define IDMAP_TEXT							\
 		ALIGN_FUNCTION();					\
 		VMLINUX_SYMBOL(__idmap_text_start) = .;		\
···
 			TEXT_TEXT
 			SCHED_TEXT
 			LOCK_TEXT
+			HYPERVISOR_TEXT
 			KPROBES_TEXT
 			*(.gnu.warning)
 			*(.glue_7)
+1
arch/arm/kvm/Makefile
··· 17 17 KVM := ../../../virt/kvm 18 18 kvm-arm-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o $(KVM)/eventfd.o $(KVM)/vfio.o 19 19 20 + obj-$(CONFIG_KVM_ARM_HOST) += hyp/ 20 21 obj-y += kvm-arm.o init.o interrupts.o 21 22 obj-y += arm.o handle_exit.o guest.o mmu.o emulate.o reset.o 22 23 obj-y += coproc.o coproc_a15.o coproc_a7.o mmio.o psci.o perf.o
+182 -62
arch/arm/kvm/arm.c
··· 28 28 #include <linux/sched.h> 29 29 #include <linux/kvm.h> 30 30 #include <trace/events/kvm.h> 31 + #include <kvm/arm_pmu.h> 31 32 32 33 #define CREATE_TRACE_POINTS 33 34 #include "trace.h" ··· 266 265 kvm_mmu_free_memory_caches(vcpu); 267 266 kvm_timer_vcpu_terminate(vcpu); 268 267 kvm_vgic_vcpu_destroy(vcpu); 268 + kvm_pmu_vcpu_destroy(vcpu); 269 269 kmem_cache_free(kvm_vcpu_cache, vcpu); 270 270 } 271 271 ··· 322 320 vcpu->cpu = -1; 323 321 324 322 kvm_arm_set_running_vcpu(NULL); 323 + kvm_timer_vcpu_put(vcpu); 325 324 } 326 325 327 326 int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, ··· 580 577 * non-preemptible context. 581 578 */ 582 579 preempt_disable(); 580 + kvm_pmu_flush_hwstate(vcpu); 583 581 kvm_timer_flush_hwstate(vcpu); 584 582 kvm_vgic_flush_hwstate(vcpu); 585 583 ··· 597 593 if (ret <= 0 || need_new_vmid_gen(vcpu->kvm) || 598 594 vcpu->arch.power_off || vcpu->arch.pause) { 599 595 local_irq_enable(); 596 + kvm_pmu_sync_hwstate(vcpu); 600 597 kvm_timer_sync_hwstate(vcpu); 601 598 kvm_vgic_sync_hwstate(vcpu); 602 599 preempt_enable(); ··· 647 642 trace_kvm_exit(ret, kvm_vcpu_trap_get_class(vcpu), *vcpu_pc(vcpu)); 648 643 649 644 /* 650 - * We must sync the timer state before the vgic state so that 651 - * the vgic can properly sample the updated state of the 645 + * We must sync the PMU and timer state before the vgic state so 646 + * that the vgic can properly sample the updated state of the 652 647 * interrupt line. 
653 648 */ 649 + kvm_pmu_sync_hwstate(vcpu); 654 650 kvm_timer_sync_hwstate(vcpu); 655 651 656 652 kvm_vgic_sync_hwstate(vcpu); ··· 829 823 return 0; 830 824 } 831 825 826 + static int kvm_arm_vcpu_set_attr(struct kvm_vcpu *vcpu, 827 + struct kvm_device_attr *attr) 828 + { 829 + int ret = -ENXIO; 830 + 831 + switch (attr->group) { 832 + default: 833 + ret = kvm_arm_vcpu_arch_set_attr(vcpu, attr); 834 + break; 835 + } 836 + 837 + return ret; 838 + } 839 + 840 + static int kvm_arm_vcpu_get_attr(struct kvm_vcpu *vcpu, 841 + struct kvm_device_attr *attr) 842 + { 843 + int ret = -ENXIO; 844 + 845 + switch (attr->group) { 846 + default: 847 + ret = kvm_arm_vcpu_arch_get_attr(vcpu, attr); 848 + break; 849 + } 850 + 851 + return ret; 852 + } 853 + 854 + static int kvm_arm_vcpu_has_attr(struct kvm_vcpu *vcpu, 855 + struct kvm_device_attr *attr) 856 + { 857 + int ret = -ENXIO; 858 + 859 + switch (attr->group) { 860 + default: 861 + ret = kvm_arm_vcpu_arch_has_attr(vcpu, attr); 862 + break; 863 + } 864 + 865 + return ret; 866 + } 867 + 832 868 long kvm_arch_vcpu_ioctl(struct file *filp, 833 869 unsigned int ioctl, unsigned long arg) 834 870 { 835 871 struct kvm_vcpu *vcpu = filp->private_data; 836 872 void __user *argp = (void __user *)arg; 873 + struct kvm_device_attr attr; 837 874 838 875 switch (ioctl) { 839 876 case KVM_ARM_VCPU_INIT: { ··· 918 869 if (n < reg_list.n) 919 870 return -E2BIG; 920 871 return kvm_arm_copy_reg_indices(vcpu, user_list->reg); 872 + } 873 + case KVM_SET_DEVICE_ATTR: { 874 + if (copy_from_user(&attr, argp, sizeof(attr))) 875 + return -EFAULT; 876 + return kvm_arm_vcpu_set_attr(vcpu, &attr); 877 + } 878 + case KVM_GET_DEVICE_ATTR: { 879 + if (copy_from_user(&attr, argp, sizeof(attr))) 880 + return -EFAULT; 881 + return kvm_arm_vcpu_get_attr(vcpu, &attr); 882 + } 883 + case KVM_HAS_DEVICE_ATTR: { 884 + if (copy_from_user(&attr, argp, sizeof(attr))) 885 + return -EFAULT; 886 + return kvm_arm_vcpu_has_attr(vcpu, &attr); 921 887 } 922 888 default: 923 
889 return -EINVAL; ··· 1031 967 } 1032 968 } 1033 969 970 + static void cpu_init_stage2(void *dummy) 971 + { 972 + __cpu_init_stage2(); 973 + } 974 + 1034 975 static void cpu_init_hyp_mode(void *dummy) 1035 976 { 1036 977 phys_addr_t boot_pgd_ptr; ··· 1054 985 vector_ptr = (unsigned long)__kvm_hyp_vector; 1055 986 1056 987 __cpu_init_hyp_mode(boot_pgd_ptr, pgd_ptr, hyp_stack_ptr, vector_ptr); 988 + __cpu_init_stage2(); 1057 989 1058 990 kvm_arm_init_debug(); 1059 991 } ··· 1105 1035 } 1106 1036 #endif 1107 1037 1038 + static void teardown_common_resources(void) 1039 + { 1040 + free_percpu(kvm_host_cpu_state); 1041 + } 1042 + 1043 + static int init_common_resources(void) 1044 + { 1045 + kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t); 1046 + if (!kvm_host_cpu_state) { 1047 + kvm_err("Cannot allocate host CPU state\n"); 1048 + return -ENOMEM; 1049 + } 1050 + 1051 + return 0; 1052 + } 1053 + 1054 + static int init_subsystems(void) 1055 + { 1056 + int err; 1057 + 1058 + /* 1059 + * Init HYP view of VGIC 1060 + */ 1061 + err = kvm_vgic_hyp_init(); 1062 + switch (err) { 1063 + case 0: 1064 + vgic_present = true; 1065 + break; 1066 + case -ENODEV: 1067 + case -ENXIO: 1068 + vgic_present = false; 1069 + break; 1070 + default: 1071 + return err; 1072 + } 1073 + 1074 + /* 1075 + * Init HYP architected timer support 1076 + */ 1077 + err = kvm_timer_hyp_init(); 1078 + if (err) 1079 + return err; 1080 + 1081 + kvm_perf_init(); 1082 + kvm_coproc_table_init(); 1083 + 1084 + return 0; 1085 + } 1086 + 1087 + static void teardown_hyp_mode(void) 1088 + { 1089 + int cpu; 1090 + 1091 + if (is_kernel_in_hyp_mode()) 1092 + return; 1093 + 1094 + free_hyp_pgds(); 1095 + for_each_possible_cpu(cpu) 1096 + free_page(per_cpu(kvm_arm_hyp_stack_page, cpu)); 1097 + } 1098 + 1099 + static int init_vhe_mode(void) 1100 + { 1101 + /* 1102 + * Execute the init code on each CPU. 
1103 + */ 1104 + on_each_cpu(cpu_init_stage2, NULL, 1); 1105 + 1106 + /* set size of VMID supported by CPU */ 1107 + kvm_vmid_bits = kvm_get_vmid_bits(); 1108 + kvm_info("%d-bit VMID\n", kvm_vmid_bits); 1109 + 1110 + kvm_info("VHE mode initialized successfully\n"); 1111 + return 0; 1112 + } 1113 + 1108 1114 /** 1109 1115 * Inits Hyp-mode on all online CPUs 1110 1116 */ ··· 1211 1065 stack_page = __get_free_page(GFP_KERNEL); 1212 1066 if (!stack_page) { 1213 1067 err = -ENOMEM; 1214 - goto out_free_stack_pages; 1068 + goto out_err; 1215 1069 } 1216 1070 1217 1071 per_cpu(kvm_arm_hyp_stack_page, cpu) = stack_page; ··· 1220 1074 /* 1221 1075 * Map the Hyp-code called directly from the host 1222 1076 */ 1223 - err = create_hyp_mappings(__kvm_hyp_code_start, __kvm_hyp_code_end); 1077 + err = create_hyp_mappings(__hyp_text_start, __hyp_text_end); 1224 1078 if (err) { 1225 1079 kvm_err("Cannot map world-switch code\n"); 1226 - goto out_free_mappings; 1080 + goto out_err; 1227 1081 } 1228 1082 1229 1083 err = create_hyp_mappings(__start_rodata, __end_rodata); 1230 1084 if (err) { 1231 1085 kvm_err("Cannot map rodata section\n"); 1232 - goto out_free_mappings; 1086 + goto out_err; 1233 1087 } 1234 1088 1235 1089 /* ··· 1241 1095 1242 1096 if (err) { 1243 1097 kvm_err("Cannot map hyp stack\n"); 1244 - goto out_free_mappings; 1098 + goto out_err; 1245 1099 } 1246 - } 1247 - 1248 - /* 1249 - * Map the host CPU structures 1250 - */ 1251 - kvm_host_cpu_state = alloc_percpu(kvm_cpu_context_t); 1252 - if (!kvm_host_cpu_state) { 1253 - err = -ENOMEM; 1254 - kvm_err("Cannot allocate host CPU state\n"); 1255 - goto out_free_mappings; 1256 1100 } 1257 1101 1258 1102 for_each_possible_cpu(cpu) { ··· 1253 1117 1254 1118 if (err) { 1255 1119 kvm_err("Cannot map host CPU state: %d\n", err); 1256 - goto out_free_context; 1120 + goto out_err; 1257 1121 } 1258 1122 } 1259 1123 ··· 1262 1126 */ 1263 1127 on_each_cpu(cpu_init_hyp_mode, NULL, 1); 1264 1128 1265 - /* 1266 - * Init HYP view of 
VGIC 1267 - */ 1268 - err = kvm_vgic_hyp_init(); 1269 - switch (err) { 1270 - case 0: 1271 - vgic_present = true; 1272 - break; 1273 - case -ENODEV: 1274 - case -ENXIO: 1275 - vgic_present = false; 1276 - break; 1277 - default: 1278 - goto out_free_context; 1279 - } 1280 - 1281 - /* 1282 - * Init HYP architected timer support 1283 - */ 1284 - err = kvm_timer_hyp_init(); 1285 - if (err) 1286 - goto out_free_context; 1287 - 1288 1129 #ifndef CONFIG_HOTPLUG_CPU 1289 1130 free_boot_hyp_pgd(); 1290 1131 #endif 1291 1132 1292 - kvm_perf_init(); 1133 + cpu_notifier_register_begin(); 1134 + 1135 + err = __register_cpu_notifier(&hyp_init_cpu_nb); 1136 + 1137 + cpu_notifier_register_done(); 1138 + 1139 + if (err) { 1140 + kvm_err("Cannot register HYP init CPU notifier (%d)\n", err); 1141 + goto out_err; 1142 + } 1143 + 1144 + hyp_cpu_pm_init(); 1293 1145 1294 1146 /* set size of VMID supported by CPU */ 1295 1147 kvm_vmid_bits = kvm_get_vmid_bits(); ··· 1286 1162 kvm_info("Hyp mode initialized successfully\n"); 1287 1163 1288 1164 return 0; 1289 - out_free_context: 1290 - free_percpu(kvm_host_cpu_state); 1291 - out_free_mappings: 1292 - free_hyp_pgds(); 1293 - out_free_stack_pages: 1294 - for_each_possible_cpu(cpu) 1295 - free_page(per_cpu(kvm_arm_hyp_stack_page, cpu)); 1165 + 1296 1166 out_err: 1167 + teardown_hyp_mode(); 1297 1168 kvm_err("error initializing Hyp mode: %d\n", err); 1298 1169 return err; 1299 1170 } ··· 1332 1213 } 1333 1214 } 1334 1215 1335 - cpu_notifier_register_begin(); 1216 + err = init_common_resources(); 1217 + if (err) 1218 + return err; 1336 1219 1337 - err = init_hyp_mode(); 1220 + if (is_kernel_in_hyp_mode()) 1221 + err = init_vhe_mode(); 1222 + else 1223 + err = init_hyp_mode(); 1338 1224 if (err) 1339 1225 goto out_err; 1340 1226 1341 - err = __register_cpu_notifier(&hyp_init_cpu_nb); 1342 - if (err) { 1343 - kvm_err("Cannot register HYP init CPU notifier (%d)\n", err); 1344 - goto out_err; 1345 - } 1227 + err = init_subsystems(); 1228 + if 
(err) 1229 + goto out_hyp; 1346 1230 1347 - cpu_notifier_register_done(); 1348 - 1349 - hyp_cpu_pm_init(); 1350 - 1351 - kvm_coproc_table_init(); 1352 1231 return 0; 1232 + 1233 + out_hyp: 1234 + teardown_hyp_mode(); 1353 1235 out_err: 1354 - cpu_notifier_register_done(); 1236 + teardown_common_resources(); 1355 1237 return err; 1356 1238 } 1357 1239
+70 -56
arch/arm/kvm/coproc.c
··· 16 16 * along with this program; if not, write to the Free Software 17 17 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. 18 18 */ 19 + 20 + #include <linux/bsearch.h> 19 21 #include <linux/mm.h> 20 22 #include <linux/kvm_host.h> 21 23 #include <linux/uaccess.h> ··· 56 54 const struct coproc_reg *r, 57 55 u64 val) 58 56 { 59 - vcpu->arch.cp15[r->reg] = val & 0xffffffff; 60 - vcpu->arch.cp15[r->reg + 1] = val >> 32; 57 + vcpu_cp15(vcpu, r->reg) = val & 0xffffffff; 58 + vcpu_cp15(vcpu, r->reg + 1) = val >> 32; 61 59 } 62 60 63 61 static inline u64 vcpu_cp15_reg64_get(struct kvm_vcpu *vcpu, ··· 65 63 { 66 64 u64 val; 67 65 68 - val = vcpu->arch.cp15[r->reg + 1]; 66 + val = vcpu_cp15(vcpu, r->reg + 1); 69 67 val = val << 32; 70 - val = val | vcpu->arch.cp15[r->reg]; 68 + val = val | vcpu_cp15(vcpu, r->reg); 71 69 return val; 72 70 } 73 71 ··· 106 104 * vcpu_id, but we read the 'U' bit from the underlying 107 105 * hardware directly. 108 106 */ 109 - vcpu->arch.cp15[c0_MPIDR] = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) | 107 + vcpu_cp15(vcpu, c0_MPIDR) = ((read_cpuid_mpidr() & MPIDR_SMP_BITMASK) | 110 108 ((vcpu->vcpu_id >> 2) << MPIDR_LEVEL_BITS) | 111 109 (vcpu->vcpu_id & 3)); 112 110 } ··· 119 117 if (p->is_write) 120 118 return ignore_write(vcpu, p); 121 119 122 - *vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c1_ACTLR]; 120 + *vcpu_reg(vcpu, p->Rt1) = vcpu_cp15(vcpu, c1_ACTLR); 123 121 return true; 124 122 } 125 123 ··· 141 139 if (p->is_write) 142 140 return ignore_write(vcpu, p); 143 141 144 - *vcpu_reg(vcpu, p->Rt1) = vcpu->arch.cp15[c9_L2CTLR]; 142 + *vcpu_reg(vcpu, p->Rt1) = vcpu_cp15(vcpu, c9_L2CTLR); 145 143 return true; 146 144 } 147 145 ··· 158 156 ncores = min(ncores, 3U); 159 157 l2ctlr |= (ncores & 3) << 24; 160 158 161 - vcpu->arch.cp15[c9_L2CTLR] = l2ctlr; 159 + vcpu_cp15(vcpu, c9_L2CTLR) = l2ctlr; 162 160 } 163 161 164 162 static void reset_actlr(struct kvm_vcpu *vcpu, const struct coproc_reg *r) ··· 173 171 else 174 172 
actlr &= ~(1U << 6); 175 173 176 - vcpu->arch.cp15[c1_ACTLR] = actlr; 174 + vcpu_cp15(vcpu, c1_ACTLR) = actlr; 177 175 } 178 176 179 177 /* ··· 220 218 221 219 BUG_ON(!p->is_write); 222 220 223 - vcpu->arch.cp15[r->reg] = *vcpu_reg(vcpu, p->Rt1); 221 + vcpu_cp15(vcpu, r->reg) = *vcpu_reg(vcpu, p->Rt1); 224 222 if (p->is_64bit) 225 - vcpu->arch.cp15[r->reg + 1] = *vcpu_reg(vcpu, p->Rt2); 223 + vcpu_cp15(vcpu, r->reg + 1) = *vcpu_reg(vcpu, p->Rt2); 226 224 227 225 kvm_toggle_cache(vcpu, was_enabled); 228 226 return true; ··· 383 381 { CRn(15), CRm( 0), Op1( 4), Op2( 0), is32, access_cbar}, 384 382 }; 385 383 384 + static int check_reg_table(const struct coproc_reg *table, unsigned int n) 385 + { 386 + unsigned int i; 387 + 388 + for (i = 1; i < n; i++) { 389 + if (cmp_reg(&table[i-1], &table[i]) >= 0) { 390 + kvm_err("reg table %p out of order (%d)\n", table, i - 1); 391 + return 1; 392 + } 393 + } 394 + 395 + return 0; 396 + } 397 + 386 398 /* Target specific emulation tables */ 387 399 static struct kvm_coproc_target_table *target_tables[KVM_ARM_NUM_TARGETS]; 388 400 389 401 void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table) 390 402 { 391 - unsigned int i; 392 - 393 - for (i = 1; i < table->num; i++) 394 - BUG_ON(cmp_reg(&table->table[i-1], 395 - &table->table[i]) >= 0); 396 - 403 + BUG_ON(check_reg_table(table->table, table->num)); 397 404 target_tables[table->target] = table; 398 405 } 399 406 ··· 416 405 return table->table; 417 406 } 418 407 408 + #define reg_to_match_value(x) \ 409 + ({ \ 410 + unsigned long val; \ 411 + val = (x)->CRn << 11; \ 412 + val |= (x)->CRm << 7; \ 413 + val |= (x)->Op1 << 4; \ 414 + val |= (x)->Op2 << 1; \ 415 + val |= !(x)->is_64bit; \ 416 + val; \ 417 + }) 418 + 419 + static int match_reg(const void *key, const void *elt) 420 + { 421 + const unsigned long pval = (unsigned long)key; 422 + const struct coproc_reg *r = elt; 423 + 424 + return pval - reg_to_match_value(r); 425 + } 426 + 419 427 static const 
struct coproc_reg *find_reg(const struct coproc_params *params, 420 428 const struct coproc_reg table[], 421 429 unsigned int num) 422 430 { 423 - unsigned int i; 431 + unsigned long pval = reg_to_match_value(params); 424 432 425 - for (i = 0; i < num; i++) { 426 - const struct coproc_reg *r = &table[i]; 427 - 428 - if (params->is_64bit != r->is_64) 429 - continue; 430 - if (params->CRn != r->CRn) 431 - continue; 432 - if (params->CRm != r->CRm) 433 - continue; 434 - if (params->Op1 != r->Op1) 435 - continue; 436 - if (params->Op2 != r->Op2) 437 - continue; 438 - 439 - return r; 440 - } 441 - return NULL; 433 + return bsearch((void *)pval, table, num, sizeof(table[0]), match_reg); 442 434 } 443 435 444 436 static int emulate_cp15(struct kvm_vcpu *vcpu, ··· 659 645 { CRn( 0), CRm( 0), Op1( 0), Op2( 3), is32, NULL, get_TLBTR }, 660 646 { CRn( 0), CRm( 0), Op1( 0), Op2( 6), is32, NULL, get_REVIDR }, 661 647 648 + { CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR }, 649 + { CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR }, 650 + 662 651 { CRn( 0), CRm( 1), Op1( 0), Op2( 0), is32, NULL, get_ID_PFR0 }, 663 652 { CRn( 0), CRm( 1), Op1( 0), Op2( 1), is32, NULL, get_ID_PFR1 }, 664 653 { CRn( 0), CRm( 1), Op1( 0), Op2( 2), is32, NULL, get_ID_DFR0 }, ··· 677 660 { CRn( 0), CRm( 2), Op1( 0), Op2( 3), is32, NULL, get_ID_ISAR3 }, 678 661 { CRn( 0), CRm( 2), Op1( 0), Op2( 4), is32, NULL, get_ID_ISAR4 }, 679 662 { CRn( 0), CRm( 2), Op1( 0), Op2( 5), is32, NULL, get_ID_ISAR5 }, 680 - 681 - { CRn( 0), CRm( 0), Op1( 1), Op2( 1), is32, NULL, get_CLIDR }, 682 - { CRn( 0), CRm( 0), Op1( 1), Op2( 7), is32, NULL, get_AIDR }, 683 663 }; 684 664 685 665 /* ··· 915 901 if (vfpid < num_fp_regs()) { 916 902 if (KVM_REG_SIZE(id) != 8) 917 903 return -ENOENT; 918 - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid], 904 + return reg_to_user(uaddr, &vcpu->arch.ctxt.vfp.fpregs[vfpid], 919 905 id); 920 906 } 921 907 ··· 925 911 926 912 switch (vfpid) { 927 913 case 
KVM_REG_ARM_VFP_FPEXC: 928 - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id); 914 + return reg_to_user(uaddr, &vcpu->arch.ctxt.vfp.fpexc, id); 929 915 case KVM_REG_ARM_VFP_FPSCR: 930 - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id); 916 + return reg_to_user(uaddr, &vcpu->arch.ctxt.vfp.fpscr, id); 931 917 case KVM_REG_ARM_VFP_FPINST: 932 - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst, id); 918 + return reg_to_user(uaddr, &vcpu->arch.ctxt.vfp.fpinst, id); 933 919 case KVM_REG_ARM_VFP_FPINST2: 934 - return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpinst2, id); 920 + return reg_to_user(uaddr, &vcpu->arch.ctxt.vfp.fpinst2, id); 935 921 case KVM_REG_ARM_VFP_MVFR0: 936 922 val = fmrx(MVFR0); 937 923 return reg_to_user(uaddr, &val, id); ··· 959 945 if (vfpid < num_fp_regs()) { 960 946 if (KVM_REG_SIZE(id) != 8) 961 947 return -ENOENT; 962 - return reg_from_user(&vcpu->arch.vfp_guest.fpregs[vfpid], 948 + return reg_from_user(&vcpu->arch.ctxt.vfp.fpregs[vfpid], 963 949 uaddr, id); 964 950 } 965 951 ··· 969 955 970 956 switch (vfpid) { 971 957 case KVM_REG_ARM_VFP_FPEXC: 972 - return reg_from_user(&vcpu->arch.vfp_guest.fpexc, uaddr, id); 958 + return reg_from_user(&vcpu->arch.ctxt.vfp.fpexc, uaddr, id); 973 959 case KVM_REG_ARM_VFP_FPSCR: 974 - return reg_from_user(&vcpu->arch.vfp_guest.fpscr, uaddr, id); 960 + return reg_from_user(&vcpu->arch.ctxt.vfp.fpscr, uaddr, id); 975 961 case KVM_REG_ARM_VFP_FPINST: 976 - return reg_from_user(&vcpu->arch.vfp_guest.fpinst, uaddr, id); 962 + return reg_from_user(&vcpu->arch.ctxt.vfp.fpinst, uaddr, id); 977 963 case KVM_REG_ARM_VFP_FPINST2: 978 - return reg_from_user(&vcpu->arch.vfp_guest.fpinst2, uaddr, id); 964 + return reg_from_user(&vcpu->arch.ctxt.vfp.fpinst2, uaddr, id); 979 965 /* These are invariant. 
*/ 980 966 case KVM_REG_ARM_VFP_MVFR0: 981 967 if (reg_from_user(&val, uaddr, id)) ··· 1044 1030 val = vcpu_cp15_reg64_get(vcpu, r); 1045 1031 ret = reg_to_user(uaddr, &val, reg->id); 1046 1032 } else if (KVM_REG_SIZE(reg->id) == 4) { 1047 - ret = reg_to_user(uaddr, &vcpu->arch.cp15[r->reg], reg->id); 1033 + ret = reg_to_user(uaddr, &vcpu_cp15(vcpu, r->reg), reg->id); 1048 1034 } 1049 1035 1050 1036 return ret; ··· 1074 1060 if (!ret) 1075 1061 vcpu_cp15_reg64_set(vcpu, r, val); 1076 1062 } else if (KVM_REG_SIZE(reg->id) == 4) { 1077 - ret = reg_from_user(&vcpu->arch.cp15[r->reg], uaddr, reg->id); 1063 + ret = reg_from_user(&vcpu_cp15(vcpu, r->reg), uaddr, reg->id); 1078 1064 } 1079 1065 1080 1066 return ret; ··· 1110 1096 static u64 cp15_to_index(const struct coproc_reg *reg) 1111 1097 { 1112 1098 u64 val = KVM_REG_ARM | (15 << KVM_REG_ARM_COPROC_SHIFT); 1113 - if (reg->is_64) { 1099 + if (reg->is_64bit) { 1114 1100 val |= KVM_REG_SIZE_U64; 1115 1101 val |= (reg->Op1 << KVM_REG_ARM_OPC1_SHIFT); 1116 1102 /* ··· 1224 1210 unsigned int i; 1225 1211 1226 1212 /* Make sure tables are unique and in order. */ 1227 - for (i = 1; i < ARRAY_SIZE(cp15_regs); i++) 1228 - BUG_ON(cmp_reg(&cp15_regs[i-1], &cp15_regs[i]) >= 0); 1213 + BUG_ON(check_reg_table(cp15_regs, ARRAY_SIZE(cp15_regs))); 1214 + BUG_ON(check_reg_table(invariant_cp15, ARRAY_SIZE(invariant_cp15))); 1229 1215 1230 1216 /* We abuse the reset function to overwrite the table itself. */ 1231 1217 for (i = 0; i < ARRAY_SIZE(invariant_cp15); i++) ··· 1262 1248 const struct coproc_reg *table; 1263 1249 1264 1250 /* Catch someone adding a register without putting in reset entry. */ 1265 - memset(vcpu->arch.cp15, 0x42, sizeof(vcpu->arch.cp15)); 1251 + memset(vcpu->arch.ctxt.cp15, 0x42, sizeof(vcpu->arch.ctxt.cp15)); 1266 1252 1267 1253 /* Generic chip reset first (so target could override). 
*/ 1268 1254 reset_coproc_regs(vcpu, cp15_regs, ARRAY_SIZE(cp15_regs)); ··· 1271 1257 reset_coproc_regs(vcpu, table, num); 1272 1258 1273 1259 for (num = 1; num < NR_CP15_REGS; num++) 1274 - if (vcpu->arch.cp15[num] == 0x42424242) 1275 - panic("Didn't reset vcpu->arch.cp15[%zi]", num); 1260 + if (vcpu_cp15(vcpu, num) == 0x42424242) 1261 + panic("Didn't reset vcpu_cp15(vcpu, %zi)", num); 1276 1262 }
+12 -12
arch/arm/kvm/coproc.h
··· 37 37 unsigned long Op1; 38 38 unsigned long Op2; 39 39 40 - bool is_64; 40 + bool is_64bit; 41 41 42 42 /* Trapped access from guest, if non-NULL. */ 43 43 bool (*access)(struct kvm_vcpu *, ··· 47 47 /* Initialization for vcpu. */ 48 48 void (*reset)(struct kvm_vcpu *, const struct coproc_reg *); 49 49 50 - /* Index into vcpu->arch.cp15[], or 0 if we don't need to save it. */ 50 + /* Index into vcpu_cp15(vcpu, ...), or 0 if we don't need to save it. */ 51 51 unsigned long reg; 52 52 53 53 /* Value (usually reset value) */ ··· 104 104 const struct coproc_reg *r) 105 105 { 106 106 BUG_ON(!r->reg); 107 - BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15)); 108 - vcpu->arch.cp15[r->reg] = 0xdecafbad; 107 + BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.ctxt.cp15)); 108 + vcpu_cp15(vcpu, r->reg) = 0xdecafbad; 109 109 } 110 110 111 111 static inline void reset_val(struct kvm_vcpu *vcpu, const struct coproc_reg *r) 112 112 { 113 113 BUG_ON(!r->reg); 114 - BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.cp15)); 115 - vcpu->arch.cp15[r->reg] = r->val; 114 + BUG_ON(r->reg >= ARRAY_SIZE(vcpu->arch.ctxt.cp15)); 115 + vcpu_cp15(vcpu, r->reg) = r->val; 116 116 } 117 117 118 118 static inline void reset_unknown64(struct kvm_vcpu *vcpu, 119 119 const struct coproc_reg *r) 120 120 { 121 121 BUG_ON(!r->reg); 122 - BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.cp15)); 122 + BUG_ON(r->reg + 1 >= ARRAY_SIZE(vcpu->arch.ctxt.cp15)); 123 123 124 - vcpu->arch.cp15[r->reg] = 0xdecafbad; 125 - vcpu->arch.cp15[r->reg+1] = 0xd0c0ffee; 124 + vcpu_cp15(vcpu, r->reg) = 0xdecafbad; 125 + vcpu_cp15(vcpu, r->reg+1) = 0xd0c0ffee; 126 126 } 127 127 128 128 static inline int cmp_reg(const struct coproc_reg *i1, ··· 141 141 return i1->Op1 - i2->Op1; 142 142 if (i1->Op2 != i2->Op2) 143 143 return i1->Op2 - i2->Op2; 144 - return i2->is_64 - i1->is_64; 144 + return i2->is_64bit - i1->is_64bit; 145 145 } 146 146 147 147 ··· 150 150 #define CRm64(_x) .CRn = _x, .CRm = 0 151 151 #define Op1(_x) .Op1 = _x 152 152 #define Op2(_x) 
.Op2 = _x 153 - #define is64 .is_64 = true 154 - #define is32 .is_64 = false 153 + #define is64 .is_64bit = true 154 + #define is32 .is_64bit = false 155 155 156 156 bool access_vm_reg(struct kvm_vcpu *vcpu, 157 157 const struct coproc_params *p,
+17 -17
arch/arm/kvm/emulate.c
··· 112 112 */ 113 113 unsigned long *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num) 114 114 { 115 - unsigned long *reg_array = (unsigned long *)&vcpu->arch.regs; 115 + unsigned long *reg_array = (unsigned long *)&vcpu->arch.ctxt.gp_regs; 116 116 unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK; 117 117 118 118 switch (mode) { ··· 147 147 unsigned long mode = *vcpu_cpsr(vcpu) & MODE_MASK; 148 148 switch (mode) { 149 149 case SVC_MODE: 150 - return &vcpu->arch.regs.KVM_ARM_SVC_spsr; 150 + return &vcpu->arch.ctxt.gp_regs.KVM_ARM_SVC_spsr; 151 151 case ABT_MODE: 152 - return &vcpu->arch.regs.KVM_ARM_ABT_spsr; 152 + return &vcpu->arch.ctxt.gp_regs.KVM_ARM_ABT_spsr; 153 153 case UND_MODE: 154 - return &vcpu->arch.regs.KVM_ARM_UND_spsr; 154 + return &vcpu->arch.ctxt.gp_regs.KVM_ARM_UND_spsr; 155 155 case IRQ_MODE: 156 - return &vcpu->arch.regs.KVM_ARM_IRQ_spsr; 156 + return &vcpu->arch.ctxt.gp_regs.KVM_ARM_IRQ_spsr; 157 157 case FIQ_MODE: 158 - return &vcpu->arch.regs.KVM_ARM_FIQ_spsr; 158 + return &vcpu->arch.ctxt.gp_regs.KVM_ARM_FIQ_spsr; 159 159 default: 160 160 BUG(); 161 161 } ··· 266 266 267 267 static u32 exc_vector_base(struct kvm_vcpu *vcpu) 268 268 { 269 - u32 sctlr = vcpu->arch.cp15[c1_SCTLR]; 270 - u32 vbar = vcpu->arch.cp15[c12_VBAR]; 269 + u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR); 270 + u32 vbar = vcpu_cp15(vcpu, c12_VBAR); 271 271 272 272 if (sctlr & SCTLR_V) 273 273 return 0xffff0000; ··· 282 282 static void kvm_update_psr(struct kvm_vcpu *vcpu, unsigned long mode) 283 283 { 284 284 unsigned long cpsr = *vcpu_cpsr(vcpu); 285 - u32 sctlr = vcpu->arch.cp15[c1_SCTLR]; 285 + u32 sctlr = vcpu_cp15(vcpu, c1_SCTLR); 286 286 287 287 *vcpu_cpsr(vcpu) = (cpsr & ~MODE_MASK) | mode; 288 288 ··· 357 357 358 358 if (is_pabt) { 359 359 /* Set IFAR and IFSR */ 360 - vcpu->arch.cp15[c6_IFAR] = addr; 361 - is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31); 360 + vcpu_cp15(vcpu, c6_IFAR) = addr; 361 + is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31); 362 362 /* Always give debug fault for 
now - should give guest a clue */ 363 363 if (is_lpae) 364 - vcpu->arch.cp15[c5_IFSR] = 1 << 9 | 0x22; 364 + vcpu_cp15(vcpu, c5_IFSR) = 1 << 9 | 0x22; 365 365 else 366 - vcpu->arch.cp15[c5_IFSR] = 2; 366 + vcpu_cp15(vcpu, c5_IFSR) = 2; 367 367 } else { /* !iabt */ 368 368 /* Set DFAR and DFSR */ 369 - vcpu->arch.cp15[c6_DFAR] = addr; 370 - is_lpae = (vcpu->arch.cp15[c2_TTBCR] >> 31); 369 + vcpu_cp15(vcpu, c6_DFAR) = addr; 370 + is_lpae = (vcpu_cp15(vcpu, c2_TTBCR) >> 31); 371 371 /* Always give debug fault for now - should give guest a clue */ 372 372 if (is_lpae) 373 - vcpu->arch.cp15[c5_DFSR] = 1 << 9 | 0x22; 373 + vcpu_cp15(vcpu, c5_DFSR) = 1 << 9 | 0x22; 374 374 else 375 - vcpu->arch.cp15[c5_DFSR] = 2; 375 + vcpu_cp15(vcpu, c5_DFSR) = 2; 376 376 } 377 377 378 378 }
+2 -3
arch/arm/kvm/guest.c
··· 25 25 #include <asm/cputype.h> 26 26 #include <asm/uaccess.h> 27 27 #include <asm/kvm.h> 28 - #include <asm/kvm_asm.h> 29 28 #include <asm/kvm_emulate.h> 30 29 #include <asm/kvm_coproc.h> 31 30 ··· 54 55 static int get_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) 55 56 { 56 57 u32 __user *uaddr = (u32 __user *)(long)reg->addr; 57 - struct kvm_regs *regs = &vcpu->arch.regs; 58 + struct kvm_regs *regs = &vcpu->arch.ctxt.gp_regs; 58 59 u64 off; 59 60 60 61 if (KVM_REG_SIZE(reg->id) != 4) ··· 71 72 static int set_core_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg) 72 73 { 73 74 u32 __user *uaddr = (u32 __user *)(long)reg->addr; 74 - struct kvm_regs *regs = &vcpu->arch.regs; 75 + struct kvm_regs *regs = &vcpu->arch.ctxt.gp_regs; 75 76 u64 off, val; 76 77 77 78 if (KVM_REG_SIZE(reg->id) != 4)
-7
arch/arm/kvm/handle_exit.c
··· 147 147 switch (exception_index) { 148 148 case ARM_EXCEPTION_IRQ: 149 149 return 1; 150 - case ARM_EXCEPTION_UNDEFINED: 151 - kvm_err("Undefined exception in Hyp mode at: %#08lx\n", 152 - kvm_vcpu_get_hyp_pc(vcpu)); 153 - BUG(); 154 - panic("KVM: Hypervisor undefined exception!\n"); 155 - case ARM_EXCEPTION_DATA_ABORT: 156 - case ARM_EXCEPTION_PREF_ABORT: 157 150 case ARM_EXCEPTION_HVC: 158 151 /* 159 152 * See ARM ARM B1.14.1: "Hyp traps on instructions
+17
arch/arm/kvm/hyp/Makefile
··· 1 + # 2 + # Makefile for Kernel-based Virtual Machine module, HYP part 3 + # 4 + 5 + KVM=../../../../virt/kvm 6 + 7 + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o 8 + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o 9 + 10 + obj-$(CONFIG_KVM_ARM_HOST) += tlb.o 11 + obj-$(CONFIG_KVM_ARM_HOST) += cp15-sr.o 12 + obj-$(CONFIG_KVM_ARM_HOST) += vfp.o 13 + obj-$(CONFIG_KVM_ARM_HOST) += banked-sr.o 14 + obj-$(CONFIG_KVM_ARM_HOST) += entry.o 15 + obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o 16 + obj-$(CONFIG_KVM_ARM_HOST) += switch.o 17 + obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
+77
arch/arm/kvm/hyp/banked-sr.c
··· 1 + /* 2 + * Original code: 3 + * Copyright (C) 2012 - Virtual Open Systems and Columbia University 4 + * Author: Christoffer Dall <c.dall@virtualopensystems.com> 5 + * 6 + * Mostly rewritten in C by Marc Zyngier <marc.zyngier@arm.com> 7 + * 8 + * This program is free software; you can redistribute it and/or modify 9 + * it under the terms of the GNU General Public License version 2 as 10 + * published by the Free Software Foundation. 11 + * 12 + * This program is distributed in the hope that it will be useful, 13 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 14 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 15 + * GNU General Public License for more details. 16 + * 17 + * You should have received a copy of the GNU General Public License 18 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 19 + */ 20 + 21 + #include <asm/kvm_hyp.h> 22 + 23 + __asm__(".arch_extension virt"); 24 + 25 + void __hyp_text __banked_save_state(struct kvm_cpu_context *ctxt) 26 + { 27 + ctxt->gp_regs.usr_regs.ARM_sp = read_special(SP_usr); 28 + ctxt->gp_regs.usr_regs.ARM_pc = read_special(ELR_hyp); 29 + ctxt->gp_regs.usr_regs.ARM_cpsr = read_special(SPSR); 30 + ctxt->gp_regs.KVM_ARM_SVC_sp = read_special(SP_svc); 31 + ctxt->gp_regs.KVM_ARM_SVC_lr = read_special(LR_svc); 32 + ctxt->gp_regs.KVM_ARM_SVC_spsr = read_special(SPSR_svc); 33 + ctxt->gp_regs.KVM_ARM_ABT_sp = read_special(SP_abt); 34 + ctxt->gp_regs.KVM_ARM_ABT_lr = read_special(LR_abt); 35 + ctxt->gp_regs.KVM_ARM_ABT_spsr = read_special(SPSR_abt); 36 + ctxt->gp_regs.KVM_ARM_UND_sp = read_special(SP_und); 37 + ctxt->gp_regs.KVM_ARM_UND_lr = read_special(LR_und); 38 + ctxt->gp_regs.KVM_ARM_UND_spsr = read_special(SPSR_und); 39 + ctxt->gp_regs.KVM_ARM_IRQ_sp = read_special(SP_irq); 40 + ctxt->gp_regs.KVM_ARM_IRQ_lr = read_special(LR_irq); 41 + ctxt->gp_regs.KVM_ARM_IRQ_spsr = read_special(SPSR_irq); 42 + ctxt->gp_regs.KVM_ARM_FIQ_r8 = read_special(R8_fiq); 43 + 
ctxt->gp_regs.KVM_ARM_FIQ_r9 = read_special(R9_fiq); 44 + ctxt->gp_regs.KVM_ARM_FIQ_r10 = read_special(R10_fiq); 45 + ctxt->gp_regs.KVM_ARM_FIQ_fp = read_special(R11_fiq); 46 + ctxt->gp_regs.KVM_ARM_FIQ_ip = read_special(R12_fiq); 47 + ctxt->gp_regs.KVM_ARM_FIQ_sp = read_special(SP_fiq); 48 + ctxt->gp_regs.KVM_ARM_FIQ_lr = read_special(LR_fiq); 49 + ctxt->gp_regs.KVM_ARM_FIQ_spsr = read_special(SPSR_fiq); 50 + } 51 + 52 + void __hyp_text __banked_restore_state(struct kvm_cpu_context *ctxt) 53 + { 54 + write_special(ctxt->gp_regs.usr_regs.ARM_sp, SP_usr); 55 + write_special(ctxt->gp_regs.usr_regs.ARM_pc, ELR_hyp); 56 + write_special(ctxt->gp_regs.usr_regs.ARM_cpsr, SPSR_cxsf); 57 + write_special(ctxt->gp_regs.KVM_ARM_SVC_sp, SP_svc); 58 + write_special(ctxt->gp_regs.KVM_ARM_SVC_lr, LR_svc); 59 + write_special(ctxt->gp_regs.KVM_ARM_SVC_spsr, SPSR_svc); 60 + write_special(ctxt->gp_regs.KVM_ARM_ABT_sp, SP_abt); 61 + write_special(ctxt->gp_regs.KVM_ARM_ABT_lr, LR_abt); 62 + write_special(ctxt->gp_regs.KVM_ARM_ABT_spsr, SPSR_abt); 63 + write_special(ctxt->gp_regs.KVM_ARM_UND_sp, SP_und); 64 + write_special(ctxt->gp_regs.KVM_ARM_UND_lr, LR_und); 65 + write_special(ctxt->gp_regs.KVM_ARM_UND_spsr, SPSR_und); 66 + write_special(ctxt->gp_regs.KVM_ARM_IRQ_sp, SP_irq); 67 + write_special(ctxt->gp_regs.KVM_ARM_IRQ_lr, LR_irq); 68 + write_special(ctxt->gp_regs.KVM_ARM_IRQ_spsr, SPSR_irq); 69 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_r8, R8_fiq); 70 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_r9, R9_fiq); 71 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_r10, R10_fiq); 72 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_fp, R11_fiq); 73 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_ip, R12_fiq); 74 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_sp, SP_fiq); 75 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_lr, LR_fiq); 76 + write_special(ctxt->gp_regs.KVM_ARM_FIQ_spsr, SPSR_fiq); 77 + }
+84
arch/arm/kvm/hyp/cp15-sr.c
/*
 * Original code:
 * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 * Author: Christoffer Dall <c.dall@virtualopensystems.com>
 *
 * Mostly rewritten in C by Marc Zyngier <marc.zyngier@arm.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include <asm/kvm_hyp.h>

static u64 *cp15_64(struct kvm_cpu_context *ctxt, int idx)
{
	return (u64 *)(ctxt->cp15 + idx);
}

void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt)
{
	ctxt->cp15[c0_MPIDR]		= read_sysreg(VMPIDR);
	ctxt->cp15[c0_CSSELR]		= read_sysreg(CSSELR);
	ctxt->cp15[c1_SCTLR]		= read_sysreg(SCTLR);
	ctxt->cp15[c1_CPACR]		= read_sysreg(CPACR);
	*cp15_64(ctxt, c2_TTBR0)	= read_sysreg(TTBR0);
	*cp15_64(ctxt, c2_TTBR1)	= read_sysreg(TTBR1);
	ctxt->cp15[c2_TTBCR]		= read_sysreg(TTBCR);
	ctxt->cp15[c3_DACR]		= read_sysreg(DACR);
	ctxt->cp15[c5_DFSR]		= read_sysreg(DFSR);
	ctxt->cp15[c5_IFSR]		= read_sysreg(IFSR);
	ctxt->cp15[c5_ADFSR]		= read_sysreg(ADFSR);
	ctxt->cp15[c5_AIFSR]		= read_sysreg(AIFSR);
	ctxt->cp15[c6_DFAR]		= read_sysreg(DFAR);
	ctxt->cp15[c6_IFAR]		= read_sysreg(IFAR);
	*cp15_64(ctxt, c7_PAR)		= read_sysreg(PAR);
	ctxt->cp15[c10_PRRR]		= read_sysreg(PRRR);
	ctxt->cp15[c10_NMRR]		= read_sysreg(NMRR);
	ctxt->cp15[c10_AMAIR0]		= read_sysreg(AMAIR0);
	ctxt->cp15[c10_AMAIR1]		= read_sysreg(AMAIR1);
	ctxt->cp15[c12_VBAR]		= read_sysreg(VBAR);
	ctxt->cp15[c13_CID]		= read_sysreg(CID);
	ctxt->cp15[c13_TID_URW]		= read_sysreg(TID_URW);
	ctxt->cp15[c13_TID_URO]		= read_sysreg(TID_URO);
	ctxt->cp15[c13_TID_PRIV]	= read_sysreg(TID_PRIV);
	ctxt->cp15[c14_CNTKCTL]		= read_sysreg(CNTKCTL);
}

void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt)
{
	write_sysreg(ctxt->cp15[c0_MPIDR],	VMPIDR);
	write_sysreg(ctxt->cp15[c0_CSSELR],	CSSELR);
	write_sysreg(ctxt->cp15[c1_SCTLR],	SCTLR);
	write_sysreg(ctxt->cp15[c1_CPACR],	CPACR);
	write_sysreg(*cp15_64(ctxt, c2_TTBR0),	TTBR0);
	write_sysreg(*cp15_64(ctxt, c2_TTBR1),	TTBR1);
	write_sysreg(ctxt->cp15[c2_TTBCR],	TTBCR);
	write_sysreg(ctxt->cp15[c3_DACR],	DACR);
	write_sysreg(ctxt->cp15[c5_DFSR],	DFSR);
	write_sysreg(ctxt->cp15[c5_IFSR],	IFSR);
	write_sysreg(ctxt->cp15[c5_ADFSR],	ADFSR);
	write_sysreg(ctxt->cp15[c5_AIFSR],	AIFSR);
	write_sysreg(ctxt->cp15[c6_DFAR],	DFAR);
	write_sysreg(ctxt->cp15[c6_IFAR],	IFAR);
	write_sysreg(*cp15_64(ctxt, c7_PAR),	PAR);
	write_sysreg(ctxt->cp15[c10_PRRR],	PRRR);
	write_sysreg(ctxt->cp15[c10_NMRR],	NMRR);
	write_sysreg(ctxt->cp15[c10_AMAIR0],	AMAIR0);
	write_sysreg(ctxt->cp15[c10_AMAIR1],	AMAIR1);
	write_sysreg(ctxt->cp15[c12_VBAR],	VBAR);
	write_sysreg(ctxt->cp15[c13_CID],	CID);
	write_sysreg(ctxt->cp15[c13_TID_URW],	TID_URW);
	write_sysreg(ctxt->cp15[c13_TID_URO],	TID_URO);
	write_sysreg(ctxt->cp15[c13_TID_PRIV],	TID_PRIV);
	write_sysreg(ctxt->cp15[c14_CNTKCTL],	CNTKCTL);
}
+101
arch/arm/kvm/hyp/entry.S
/*
 * Copyright (C) 2016 - ARM Ltd
 * Author: Marc Zyngier <marc.zyngier@arm.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include <linux/linkage.h>
#include <asm/asm-offsets.h>
#include <asm/kvm_arm.h>

	.arch_extension	virt

	.text
	.pushsection	.hyp.text, "ax"

#define USR_REGS_OFFSET		(CPU_CTXT_GP_REGS + GP_REGS_USR)

/* int __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host) */
ENTRY(__guest_enter)
	@ Save host registers
	add	r1, r1, #(USR_REGS_OFFSET + S_R4)
	stm	r1!, {r4-r12}
	str	lr, [r1, #4]	@ Skip SP_usr (already saved)

	@ Restore guest registers
	add	r0, r0, #(VCPU_GUEST_CTXT + USR_REGS_OFFSET + S_R0)
	ldr	lr, [r0, #S_LR]
	ldm	r0, {r0-r12}

	clrex
	eret
ENDPROC(__guest_enter)

ENTRY(__guest_exit)
	/*
	 * return convention:
	 * guest r0, r1, r2 saved on the stack
	 * r0: vcpu pointer
	 * r1: exception code
	 */

	add	r2, r0, #(VCPU_GUEST_CTXT + USR_REGS_OFFSET + S_R3)
	stm	r2!, {r3-r12}
	str	lr, [r2, #4]
	add	r2, r0, #(VCPU_GUEST_CTXT + USR_REGS_OFFSET + S_R0)
	pop	{r3, r4, r5}		@ r0, r1, r2
	stm	r2, {r3-r5}

	ldr	r0, [r0, #VCPU_HOST_CTXT]
	add	r0, r0, #(USR_REGS_OFFSET + S_R4)
	ldm	r0!, {r4-r12}
	ldr	lr, [r0, #4]

	mov	r0, r1
	bx	lr
ENDPROC(__guest_exit)

/*
 * If VFPv3 support is not available, then we will not switch the VFP
 * registers; however cp10 and cp11 accesses will still trap and fall back
 * to the regular coprocessor emulation code, which currently will
 * inject an undefined exception to the guest.
 */
#ifdef CONFIG_VFPv3
ENTRY(__vfp_guest_restore)
	push	{r3, r4, lr}

	@ NEON/VFP used.  Turn on VFP access.
	mrc	p15, 4, r1, c1, c1, 2		@ HCPTR
	bic	r1, r1, #(HCPTR_TCP(10) | HCPTR_TCP(11))
	mcr	p15, 4, r1, c1, c1, 2		@ HCPTR
	isb

	@ Switch VFP/NEON hardware state to the guest's
	mov	r4, r0
	ldr	r0, [r0, #VCPU_HOST_CTXT]
	add	r0, r0, #CPU_CTXT_VFP
	bl	__vfp_save_state
	add	r0, r4, #(VCPU_GUEST_CTXT + CPU_CTXT_VFP)
	bl	__vfp_restore_state

	pop	{r3, r4, lr}
	pop	{r0, r1, r2}
	clrex
	eret
ENDPROC(__vfp_guest_restore)
#endif

	.popsection
+169
arch/arm/kvm/hyp/hyp-entry.S
/*
 * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 * Author: Christoffer Dall <c.dall@virtualopensystems.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License, version 2, as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 */

#include <linux/linkage.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>

	.arch_extension	virt

	.text
	.pushsection	.hyp.text, "ax"

.macro load_vcpu	reg
	mrc	p15, 4, \reg, c13, c0, 2	@ HTPIDR
.endm

/********************************************************************
 * Hypervisor exception vector and handlers
 *
 *
 * The KVM/ARM Hypervisor ABI is defined as follows:
 *
 * Entry to Hyp mode from the host kernel will happen _only_ when an HVC
 * instruction is issued since all traps are disabled when running the host
 * kernel as per the Hyp-mode initialization at boot time.
 *
 * HVC instructions cause a trap to the vector page + offset 0x14 (see hyp_hvc
 * below) when the HVC instruction is called from SVC mode (i.e. a guest or the
 * host kernel) and they cause a trap to the vector page + offset 0x8 when HVC
 * instructions are called from within Hyp-mode.
 *
 * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
 *    Switching to Hyp mode is done through a simple HVC #0 instruction. The
 *    exception vector code will check that the HVC comes from VMID==0.
 *    - r0 contains a pointer to a HYP function
 *    - r1, r2, and r3 contain arguments to the above function.
 *    - The HYP function will be called with its arguments in r0, r1 and r2.
 *      On HYP function return, we return directly to SVC.
 *
 * Note that the above is used to execute code in Hyp-mode from a host-kernel
 * point of view, and is a different concept from performing a world-switch and
 * executing guest code SVC mode (with a VMID != 0).
 */

	.align 5
__kvm_hyp_vector:
	.global __kvm_hyp_vector

	@ Hyp-mode exception vector
	W(b)	hyp_reset
	W(b)	hyp_undef
	W(b)	hyp_svc
	W(b)	hyp_pabt
	W(b)	hyp_dabt
	W(b)	hyp_hvc
	W(b)	hyp_irq
	W(b)	hyp_fiq

.macro invalid_vector label, cause
	.align
\label:	mov	r0, #\cause
	b	__hyp_panic
.endm

	invalid_vector	hyp_reset	ARM_EXCEPTION_RESET
	invalid_vector	hyp_undef	ARM_EXCEPTION_UNDEFINED
	invalid_vector	hyp_svc		ARM_EXCEPTION_SOFTWARE
	invalid_vector	hyp_pabt	ARM_EXCEPTION_PREF_ABORT
	invalid_vector	hyp_dabt	ARM_EXCEPTION_DATA_ABORT
	invalid_vector	hyp_fiq		ARM_EXCEPTION_FIQ

ENTRY(__hyp_do_panic)
	mrs	lr, cpsr
	bic	lr, lr, #MODE_MASK
	orr	lr, lr, #SVC_MODE
THUMB(	orr	lr, lr, #PSR_T_BIT	)
	msr	spsr_cxsf, lr
	ldr	lr, =panic
	msr	ELR_hyp, lr
	ldr	lr, =kvm_call_hyp
	clrex
	eret
ENDPROC(__hyp_do_panic)

hyp_hvc:
	/*
	 * Getting here is either because of a trap from a guest,
	 * or from executing HVC from the host kernel, which means
	 * "do something in Hyp mode".
	 */
	push	{r0, r1, r2}

	@ Check syndrome register
	mrc	p15, 4, r1, c5, c2, 0	@ HSR
	lsr	r0, r1, #HSR_EC_SHIFT
	cmp	r0, #HSR_EC_HVC
	bne	guest_trap		@ Not HVC instr.

	/*
	 * Let's check if the HVC came from VMID 0 and allow simple
	 * switch to Hyp mode
	 */
	mrrc	p15, 6, r0, r2, c2
	lsr	r2, r2, #16
	and	r2, r2, #0xff
	cmp	r2, #0
	bne	guest_trap		@ Guest called HVC

	/*
	 * Getting here means host called HVC, we shift parameters and branch
	 * to Hyp function.
	 */
	pop	{r0, r1, r2}

	/* Check for __hyp_get_vectors */
	cmp	r0, #-1
	mrceq	p15, 4, r0, c12, c0, 0	@ get HVBAR
	beq	1f

	push	{lr}

	mov	lr, r0
	mov	r0, r1
	mov	r1, r2
	mov	r2, r3

THUMB(	orr	lr, #1)
	blx	lr			@ Call the HYP function

	pop	{lr}
1:	eret

guest_trap:
	load_vcpu r0			@ Load VCPU pointer to r0

#ifdef CONFIG_VFPv3
	@ Check for a VFP access
	lsr	r1, r1, #HSR_EC_SHIFT
	cmp	r1, #HSR_EC_CP_0_13
	beq	__vfp_guest_restore
#endif

	mov	r1, #ARM_EXCEPTION_HVC
	b	__guest_exit

hyp_irq:
	push	{r0, r1, r2}
	mov	r1, #ARM_EXCEPTION_IRQ
	load_vcpu r0			@ Load VCPU pointer to r0
	b	__guest_exit

	.ltorg

	.popsection
+33
arch/arm/kvm/hyp/s2-setup.c
/*
 * Copyright (C) 2016 - ARM Ltd
 * Author: Marc Zyngier <marc.zyngier@arm.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include <linux/types.h>
#include <asm/kvm_arm.h>
#include <asm/kvm_asm.h>
#include <asm/kvm_hyp.h>

void __hyp_text __init_stage2_translation(void)
{
	u64 val;

	val = read_sysreg(VTCR) & ~VTCR_MASK;

	val |= read_sysreg(HTCR) & VTCR_HTCR_SH;
	val |= KVM_VTCR_SL0 | KVM_VTCR_T0SZ | KVM_VTCR_S;

	write_sysreg(val, VTCR);
}
+232
arch/arm/kvm/hyp/switch.c
/*
 * Copyright (C) 2015 - ARM Ltd
 * Author: Marc Zyngier <marc.zyngier@arm.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include <asm/kvm_asm.h>
#include <asm/kvm_hyp.h>

__asm__(".arch_extension	virt");

/*
 * Activate the traps, saving the host's fpexc register before
 * overwriting it. We'll restore it on VM exit.
 */
static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu, u32 *fpexc_host)
{
	u32 val;

	/*
	 * We are about to set HCPTR.TCP10/11 to trap all floating point
	 * register accesses to HYP, however, the ARM ARM clearly states that
	 * traps are only taken to HYP if the operation would not otherwise
	 * trap to SVC. Therefore, always make sure that for 32-bit guests,
	 * we set FPEXC.EN to prevent traps to SVC, when setting the TCP bits.
	 */
	val = read_sysreg(VFP_FPEXC);
	*fpexc_host = val;
	if (!(val & FPEXC_EN)) {
		write_sysreg(val | FPEXC_EN, VFP_FPEXC);
		isb();
	}

	write_sysreg(vcpu->arch.hcr | vcpu->arch.irq_lines, HCR);
	/* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */
	write_sysreg(HSTR_T(15), HSTR);
	write_sysreg(HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11), HCPTR);
	val = read_sysreg(HDCR);
	write_sysreg(val | HDCR_TPM | HDCR_TPMCR, HDCR);
}

static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu)
{
	u32 val;

	write_sysreg(0, HCR);
	write_sysreg(0, HSTR);
	val = read_sysreg(HDCR);
	write_sysreg(val & ~(HDCR_TPM | HDCR_TPMCR), HDCR);
	write_sysreg(0, HCPTR);
}

static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu)
{
	struct kvm *kvm = kern_hyp_va(vcpu->kvm);

	write_sysreg(kvm->arch.vttbr, VTTBR);
	write_sysreg(vcpu->arch.midr, VPIDR);
}

static void __hyp_text __deactivate_vm(struct kvm_vcpu *vcpu)
{
	write_sysreg(0, VTTBR);
	write_sysreg(read_sysreg(MIDR), VPIDR);
}

static void __hyp_text __vgic_save_state(struct kvm_vcpu *vcpu)
{
	__vgic_v2_save_state(vcpu);
}

static void __hyp_text __vgic_restore_state(struct kvm_vcpu *vcpu)
{
	__vgic_v2_restore_state(vcpu);
}

static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu)
{
	u32 hsr = read_sysreg(HSR);
	u8 ec = hsr >> HSR_EC_SHIFT;
	u32 hpfar, far;

	vcpu->arch.fault.hsr = hsr;

	if (ec == HSR_EC_IABT)
		far = read_sysreg(HIFAR);
	else if (ec == HSR_EC_DABT)
		far = read_sysreg(HDFAR);
	else
		return true;

	/*
	 * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
	 *
	 * Abort on the stage 2 translation for a memory access from a
	 * Non-secure PL1 or PL0 mode:
	 *
	 * For any Access flag fault or Translation fault, and also for any
	 * Permission fault on the stage 2 translation of a memory access
	 * made as part of a translation table walk for a stage 1 translation,
	 * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
	 * is UNKNOWN.
	 */
	if (!(hsr & HSR_DABT_S1PTW) && (hsr & HSR_FSC_TYPE) == FSC_PERM) {
		u64 par, tmp;

		par = read_sysreg(PAR);
		write_sysreg(far, ATS1CPR);
		isb();

		tmp = read_sysreg(PAR);
		write_sysreg(par, PAR);

		if (unlikely(tmp & 1))
			return false; /* Translation failed, back to guest */

		hpfar = ((tmp >> 12) & ((1UL << 28) - 1)) << 4;
	} else {
		hpfar = read_sysreg(HPFAR);
	}

	vcpu->arch.fault.hxfar = far;
	vcpu->arch.fault.hpfar = hpfar;
	return true;
}

static int __hyp_text __guest_run(struct kvm_vcpu *vcpu)
{
	struct kvm_cpu_context *host_ctxt;
	struct kvm_cpu_context *guest_ctxt;
	bool fp_enabled;
	u64 exit_code;
	u32 fpexc;

	vcpu = kern_hyp_va(vcpu);
	write_sysreg(vcpu, HTPIDR);

	host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
	guest_ctxt = &vcpu->arch.ctxt;

	__sysreg_save_state(host_ctxt);
	__banked_save_state(host_ctxt);

	__activate_traps(vcpu, &fpexc);
	__activate_vm(vcpu);

	__vgic_restore_state(vcpu);
	__timer_restore_state(vcpu);

	__sysreg_restore_state(guest_ctxt);
	__banked_restore_state(guest_ctxt);

	/* Jump in the fire! */
again:
	exit_code = __guest_enter(vcpu, host_ctxt);
	/* And we're baaack! */

	if (exit_code == ARM_EXCEPTION_HVC && !__populate_fault_info(vcpu))
		goto again;

	fp_enabled = __vfp_enabled();

	__banked_save_state(guest_ctxt);
	__sysreg_save_state(guest_ctxt);
	__timer_save_state(vcpu);
	__vgic_save_state(vcpu);

	__deactivate_traps(vcpu);
	__deactivate_vm(vcpu);

	__banked_restore_state(host_ctxt);
	__sysreg_restore_state(host_ctxt);

	if (fp_enabled) {
		__vfp_save_state(&guest_ctxt->vfp);
		__vfp_restore_state(&host_ctxt->vfp);
	}

	write_sysreg(fpexc, VFP_FPEXC);

	return exit_code;
}

__alias(__guest_run) int __kvm_vcpu_run(struct kvm_vcpu *vcpu);

static const char * const __hyp_panic_string[] = {
	[ARM_EXCEPTION_RESET]      = "\nHYP panic: RST   PC:%08x CPSR:%08x",
	[ARM_EXCEPTION_UNDEFINED]  = "\nHYP panic: UNDEF PC:%08x CPSR:%08x",
	[ARM_EXCEPTION_SOFTWARE]   = "\nHYP panic: SVC   PC:%08x CPSR:%08x",
	[ARM_EXCEPTION_PREF_ABORT] = "\nHYP panic: PABRT PC:%08x CPSR:%08x",
	[ARM_EXCEPTION_DATA_ABORT] = "\nHYP panic: DABRT PC:%08x ADDR:%08x",
	[ARM_EXCEPTION_IRQ]        = "\nHYP panic: IRQ   PC:%08x CPSR:%08x",
	[ARM_EXCEPTION_FIQ]        = "\nHYP panic: FIQ   PC:%08x CPSR:%08x",
	[ARM_EXCEPTION_HVC]        = "\nHYP panic: HVC   PC:%08x CPSR:%08x",
};

void __hyp_text __noreturn __hyp_panic(int cause)
{
	u32 elr = read_special(ELR_hyp);
	u32 val;

	if (cause == ARM_EXCEPTION_DATA_ABORT)
		val = read_sysreg(HDFAR);
	else
		val = read_special(SPSR);

	if (read_sysreg(VTTBR)) {
		struct kvm_vcpu *vcpu;
		struct kvm_cpu_context *host_ctxt;

		vcpu = (struct kvm_vcpu *)read_sysreg(HTPIDR);
		host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
		__deactivate_traps(vcpu);
		__deactivate_vm(vcpu);
		__sysreg_restore_state(host_ctxt);
	}

	/* Call panic for real */
	__hyp_do_panic(__hyp_panic_string[cause], elr, val);

	unreachable();
}
+70
arch/arm/kvm/hyp/tlb.c
/*
 * Original code:
 * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 * Author: Christoffer Dall <c.dall@virtualopensystems.com>
 *
 * Mostly rewritten in C by Marc Zyngier <marc.zyngier@arm.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include <asm/kvm_hyp.h>

/**
 * Flush per-VMID TLBs
 *
 * __kvm_tlb_flush_vmid(struct kvm *kvm);
 *
 * We rely on the hardware to broadcast the TLB invalidation to all CPUs
 * inside the inner-shareable domain (which is the case for all v7
 * implementations).  If we come across a non-IS SMP implementation, we'll
 * have to use an IPI based mechanism. Until then, we stick to the simple
 * hardware assisted version.
 *
 * As v7 does not support flushing per IPA, just nuke the whole TLB
 * instead, ignoring the ipa value.
 */
static void __hyp_text __tlb_flush_vmid(struct kvm *kvm)
{
	dsb(ishst);

	/* Switch to requested VMID */
	kvm = kern_hyp_va(kvm);
	write_sysreg(kvm->arch.vttbr, VTTBR);
	isb();

	write_sysreg(0, TLBIALLIS);
	dsb(ish);
	isb();

	write_sysreg(0, VTTBR);
}

__alias(__tlb_flush_vmid) void __kvm_tlb_flush_vmid(struct kvm *kvm);

static void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa)
{
	__tlb_flush_vmid(kvm);
}

__alias(__tlb_flush_vmid_ipa) void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm,
							    phys_addr_t ipa);

static void __hyp_text __tlb_flush_vm_context(void)
{
	write_sysreg(0, TLBIALLNSNHIS);
	write_sysreg(0, ICIALLUIS);
	dsb(ish);
}

__alias(__tlb_flush_vm_context) void __kvm_flush_vm_context(void);
+68
arch/arm/kvm/hyp/vfp.S
/*
 * Copyright (C) 2012 - Virtual Open Systems and Columbia University
 * Author: Christoffer Dall <c.dall@virtualopensystems.com>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 */

#include <linux/linkage.h>
#include <asm/vfpmacros.h>

	.text
	.pushsection	.hyp.text, "ax"

/* void __vfp_save_state(struct vfp_hard_struct *vfp); */
ENTRY(__vfp_save_state)
	push	{r4, r5}
	VFPFMRX	r1, FPEXC

	@ Make sure VFP is *really* enabled so we can touch the registers.
	orr	r5, r1, #FPEXC_EN
	tst	r5, #FPEXC_EX		@ Check for VFP Subarchitecture
	bic	r5, r5, #FPEXC_EX	@ FPEXC_EX disable
	VFPFMXR	FPEXC, r5
	isb

	VFPFMRX	r2, FPSCR
	beq	1f

	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
	@ we only need to save them if FPEXC_EX is set.
	VFPFMRX	r3, FPINST
	tst	r5, #FPEXC_FP2V
	VFPFMRX	r4, FPINST2, ne		@ vmrsne
1:
	VFPFSTMIA r0, r5		@ Save VFP registers
	stm	r0, {r1-r4}		@ Save FPEXC, FPSCR, FPINST, FPINST2
	pop	{r4, r5}
	bx	lr
ENDPROC(__vfp_save_state)

/*
 * void __vfp_restore_state(struct vfp_hard_struct *vfp);
 * Assume FPEXC_EN is on and FPEXC_EX is off.
 */
ENTRY(__vfp_restore_state)
	VFPFLDMIA r0, r1		@ Load VFP registers
	ldm	r0, {r0-r3}		@ Load FPEXC, FPSCR, FPINST, FPINST2

	VFPFMXR	FPSCR, r1
	tst	r0, #FPEXC_EX		@ Check for VFP Subarchitecture
	beq	1f
	VFPFMXR	FPINST, r2
	tst	r0, #FPEXC_FP2V
	VFPFMXR	FPINST2, r3, ne
1:
	VFPFMXR	FPEXC, r0		@ FPEXC	(last, in case !EN)
	bx	lr
ENDPROC(__vfp_restore_state)

	.popsection
-8
arch/arm/kvm/init.S
···
 	orr	r0, r0, r1
 	mcr	p15, 4, r0, c2, c0, 2	@ HTCR
 
-	mrc	p15, 4, r1, c2, c1, 2	@ VTCR
-	ldr	r2, =VTCR_MASK
-	bic	r1, r1, r2
-	bic	r0, r0, #(~VTCR_HTCR_SH) @ clear non-reusable HTCR bits
-	orr	r1, r0, r1
-	orr	r1, r1, #(KVM_VTCR_SL0 | KVM_VTCR_T0SZ | KVM_VTCR_S)
-	mcr	p15, 4, r1, c2, c1, 2	@ VTCR
-
 	@ Use the same memory attributes for hyp. accesses as the kernel
 	@ (copy MAIRx to HMAIRx).
 	mrc	p15, 0, r0, c10, c2, 0
+3 -477
arch/arm/kvm/interrupts.S
··· 17 17 */ 18 18 19 19 #include <linux/linkage.h> 20 - #include <linux/const.h> 21 - #include <asm/unified.h> 22 - #include <asm/page.h> 23 - #include <asm/ptrace.h> 24 - #include <asm/asm-offsets.h> 25 - #include <asm/kvm_asm.h> 26 - #include <asm/kvm_arm.h> 27 - #include <asm/vfpmacros.h> 28 - #include "interrupts_head.S" 29 20 30 21 .text 31 - 32 - __kvm_hyp_code_start: 33 - .globl __kvm_hyp_code_start 34 - 35 - /******************************************************************** 36 - * Flush per-VMID TLBs 37 - * 38 - * void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa); 39 - * 40 - * We rely on the hardware to broadcast the TLB invalidation to all CPUs 41 - * inside the inner-shareable domain (which is the case for all v7 42 - * implementations). If we come across a non-IS SMP implementation, we'll 43 - * have to use an IPI based mechanism. Until then, we stick to the simple 44 - * hardware assisted version. 45 - * 46 - * As v7 does not support flushing per IPA, just nuke the whole TLB 47 - * instead, ignoring the ipa value. 
48 - */ 49 - ENTRY(__kvm_tlb_flush_vmid_ipa) 50 - push {r2, r3} 51 - 52 - dsb ishst 53 - add r0, r0, #KVM_VTTBR 54 - ldrd r2, r3, [r0] 55 - mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR 56 - isb 57 - mcr p15, 0, r0, c8, c3, 0 @ TLBIALLIS (rt ignored) 58 - dsb ish 59 - isb 60 - mov r2, #0 61 - mov r3, #0 62 - mcrr p15, 6, r2, r3, c2 @ Back to VMID #0 63 - isb @ Not necessary if followed by eret 64 - 65 - pop {r2, r3} 66 - bx lr 67 - ENDPROC(__kvm_tlb_flush_vmid_ipa) 68 - 69 - /** 70 - * void __kvm_tlb_flush_vmid(struct kvm *kvm) - Flush per-VMID TLBs 71 - * 72 - * Reuses __kvm_tlb_flush_vmid_ipa() for ARMv7, without passing address 73 - * parameter 74 - */ 75 - 76 - ENTRY(__kvm_tlb_flush_vmid) 77 - b __kvm_tlb_flush_vmid_ipa 78 - ENDPROC(__kvm_tlb_flush_vmid) 79 - 80 - /******************************************************************** 81 - * Flush TLBs and instruction caches of all CPUs inside the inner-shareable 82 - * domain, for all VMIDs 83 - * 84 - * void __kvm_flush_vm_context(void); 85 - */ 86 - ENTRY(__kvm_flush_vm_context) 87 - mov r0, #0 @ rn parameter for c15 flushes is SBZ 88 - 89 - /* Invalidate NS Non-Hyp TLB Inner Shareable (TLBIALLNSNHIS) */ 90 - mcr p15, 4, r0, c8, c3, 4 91 - /* Invalidate instruction caches Inner Shareable (ICIALLUIS) */ 92 - mcr p15, 0, r0, c7, c1, 0 93 - dsb ish 94 - isb @ Not necessary if followed by eret 95 - 96 - bx lr 97 - ENDPROC(__kvm_flush_vm_context) 98 - 99 - 100 - /******************************************************************** 101 - * Hypervisor world-switch code 102 - * 103 - * 104 - * int __kvm_vcpu_run(struct kvm_vcpu *vcpu) 105 - */ 106 - ENTRY(__kvm_vcpu_run) 107 - @ Save the vcpu pointer 108 - mcr p15, 4, vcpu, c13, c0, 2 @ HTPIDR 109 - 110 - save_host_regs 111 - 112 - restore_vgic_state 113 - restore_timer_state 114 - 115 - @ Store hardware CP15 state and load guest state 116 - read_cp15_state store_to_vcpu = 0 117 - write_cp15_state read_from_vcpu = 1 118 - 119 - @ If the host kernel has not been 
configured with VFPv3 support, 120 - @ then it is safer if we deny guests from using it as well. 121 - #ifdef CONFIG_VFPv3 122 - @ Set FPEXC_EN so the guest doesn't trap floating point instructions 123 - VFPFMRX r2, FPEXC @ VMRS 124 - push {r2} 125 - orr r2, r2, #FPEXC_EN 126 - VFPFMXR FPEXC, r2 @ VMSR 127 - #endif 128 - 129 - @ Configure Hyp-role 130 - configure_hyp_role vmentry 131 - 132 - @ Trap coprocessor CRx accesses 133 - set_hstr vmentry 134 - set_hcptr vmentry, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)) 135 - set_hdcr vmentry 136 - 137 - @ Write configured ID register into MIDR alias 138 - ldr r1, [vcpu, #VCPU_MIDR] 139 - mcr p15, 4, r1, c0, c0, 0 140 - 141 - @ Write guest view of MPIDR into VMPIDR 142 - ldr r1, [vcpu, #CP15_OFFSET(c0_MPIDR)] 143 - mcr p15, 4, r1, c0, c0, 5 144 - 145 - @ Set up guest memory translation 146 - ldr r1, [vcpu, #VCPU_KVM] 147 - add r1, r1, #KVM_VTTBR 148 - ldrd r2, r3, [r1] 149 - mcrr p15, 6, rr_lo_hi(r2, r3), c2 @ Write VTTBR 150 - 151 - @ We're all done, just restore the GPRs and go to the guest 152 - restore_guest_regs 153 - clrex @ Clear exclusive monitor 154 - eret 155 - 156 - __kvm_vcpu_return: 157 - /* 158 - * return convention: 159 - * guest r0, r1, r2 saved on the stack 160 - * r0: vcpu pointer 161 - * r1: exception code 162 - */ 163 - save_guest_regs 164 - 165 - @ Set VMID == 0 166 - mov r2, #0 167 - mov r3, #0 168 - mcrr p15, 6, r2, r3, c2 @ Write VTTBR 169 - 170 - @ Don't trap coprocessor accesses for host kernel 171 - set_hstr vmexit 172 - set_hdcr vmexit 173 - set_hcptr vmexit, (HCPTR_TTA | HCPTR_TCP(10) | HCPTR_TCP(11)), after_vfp_restore 174 - 175 - #ifdef CONFIG_VFPv3 176 - @ Switch VFP/NEON hardware state to the host's 177 - add r7, vcpu, #VCPU_VFP_GUEST 178 - store_vfp_state r7 179 - add r7, vcpu, #VCPU_VFP_HOST 180 - ldr r7, [r7] 181 - restore_vfp_state r7 182 - 183 - after_vfp_restore: 184 - @ Restore FPEXC_EN which we clobbered on entry 185 - pop {r2} 186 - VFPFMXR FPEXC, r2 187 - #else 188 - 
after_vfp_restore: 189 - #endif 190 - 191 - @ Reset Hyp-role 192 - configure_hyp_role vmexit 193 - 194 - @ Let host read hardware MIDR 195 - mrc p15, 0, r2, c0, c0, 0 196 - mcr p15, 4, r2, c0, c0, 0 197 - 198 - @ Back to hardware MPIDR 199 - mrc p15, 0, r2, c0, c0, 5 200 - mcr p15, 4, r2, c0, c0, 5 201 - 202 - @ Store guest CP15 state and restore host state 203 - read_cp15_state store_to_vcpu = 1 204 - write_cp15_state read_from_vcpu = 0 205 - 206 - save_timer_state 207 - save_vgic_state 208 - 209 - restore_host_regs 210 - clrex @ Clear exclusive monitor 211 - #ifndef CONFIG_CPU_ENDIAN_BE8 212 - mov r0, r1 @ Return the return code 213 - mov r1, #0 @ Clear upper bits in return value 214 - #else 215 - @ r1 already has return code 216 - mov r0, #0 @ Clear upper bits in return value 217 - #endif /* CONFIG_CPU_ENDIAN_BE8 */ 218 - bx lr @ return to IOCTL 219 22 220 23 /******************************************************************** 221 24 * Call function in Hyp mode 222 25 * 223 26 * 224 - * u64 kvm_call_hyp(void *hypfn, ...); 27 + * unsigned long kvm_call_hyp(void *hypfn, ...); 225 28 * 226 29 * This is not really a variadic function in the classic C-way and care must 227 30 * be taken when calling this to ensure parameters are passed in registers ··· 35 232 * passed as r0, r1, and r2 (a maximum of 3 arguments in addition to the 36 233 * function pointer can be passed). The function being called must be mapped 37 234 * in Hyp mode (see init_hyp_mode in arch/arm/kvm/arm.c). Return values are 38 - * passed in r0 and r1. 235 + * passed in r0 (strictly 32bit). 
39 236 * 40 237 * A function pointer with a value of 0xffffffff has a special meaning, 41 238 * and is used to implement __hyp_get_vectors in the same way as in ··· 49 246 ENTRY(kvm_call_hyp) 50 247 hvc #0 51 248 bx lr 52 - 53 - /******************************************************************** 54 - * Hypervisor exception vector and handlers 55 - * 56 - * 57 - * The KVM/ARM Hypervisor ABI is defined as follows: 58 - * 59 - * Entry to Hyp mode from the host kernel will happen _only_ when an HVC 60 - * instruction is issued since all traps are disabled when running the host 61 - * kernel as per the Hyp-mode initialization at boot time. 62 - * 63 - * HVC instructions cause a trap to the vector page + offset 0x14 (see hyp_hvc 64 - * below) when the HVC instruction is called from SVC mode (i.e. a guest or the 65 - * host kernel) and they cause a trap to the vector page + offset 0x8 when HVC 66 - * instructions are called from within Hyp-mode. 67 - * 68 - * Hyp-ABI: Calling HYP-mode functions from host (in SVC mode): 69 - * Switching to Hyp mode is done through a simple HVC #0 instruction. The 70 - * exception vector code will check that the HVC comes from VMID==0 and if 71 - * so will push the necessary state (SPSR, lr_usr) on the Hyp stack. 72 - * - r0 contains a pointer to a HYP function 73 - * - r1, r2, and r3 contain arguments to the above function. 74 - * - The HYP function will be called with its arguments in r0, r1 and r2. 75 - * On HYP function return, we return directly to SVC. 76 - * 77 - * Note that the above is used to execute code in Hyp-mode from a host-kernel 78 - * point of view, and is a different concept from performing a world-switch and 79 - * executing guest code SVC mode (with a VMID != 0). 
- */
-
-/* Handle undef, svc, pabt, or dabt by crashing with a user notice */
-.macro bad_exception exception_code, panic_str
-	push	{r0-r2}
-	mrrc	p15, 6, r0, r1, c2	@ Read VTTBR
-	lsr	r1, r1, #16
-	ands	r1, r1, #0xff
-	beq	99f
-
-	load_vcpu			@ Load VCPU pointer
-	.if \exception_code == ARM_EXCEPTION_DATA_ABORT
-	mrc	p15, 4, r2, c5, c2, 0	@ HSR
-	mrc	p15, 4, r1, c6, c0, 0	@ HDFAR
-	str	r2, [vcpu, #VCPU_HSR]
-	str	r1, [vcpu, #VCPU_HxFAR]
-	.endif
-	.if \exception_code == ARM_EXCEPTION_PREF_ABORT
-	mrc	p15, 4, r2, c5, c2, 0	@ HSR
-	mrc	p15, 4, r1, c6, c0, 2	@ HIFAR
-	str	r2, [vcpu, #VCPU_HSR]
-	str	r1, [vcpu, #VCPU_HxFAR]
-	.endif
-	mov	r1, #\exception_code
-	b	__kvm_vcpu_return
-
-	@ We were in the host already. Let's craft a panic-ing return to SVC.
-99:	mrs	r2, cpsr
-	bic	r2, r2, #MODE_MASK
-	orr	r2, r2, #SVC_MODE
-THUMB(	orr	r2, r2, #PSR_T_BIT	)
-	msr	spsr_cxsf, r2
-	mrs	r1, ELR_hyp
-	ldr	r2, =panic
-	msr	ELR_hyp, r2
-	ldr	r0, =\panic_str
-	clrex				@ Clear exclusive monitor
-	eret
-.endm
-
-	.text
-
-	.align 5
-__kvm_hyp_vector:
-	.globl __kvm_hyp_vector
-
-	@ Hyp-mode exception vector
-	W(b)	hyp_reset
-	W(b)	hyp_undef
-	W(b)	hyp_svc
-	W(b)	hyp_pabt
-	W(b)	hyp_dabt
-	W(b)	hyp_hvc
-	W(b)	hyp_irq
-	W(b)	hyp_fiq
-
-	.align
-hyp_reset:
-	b	hyp_reset
-
-	.align
-hyp_undef:
-	bad_exception ARM_EXCEPTION_UNDEFINED, und_die_str
-
-	.align
-hyp_svc:
-	bad_exception ARM_EXCEPTION_HVC, svc_die_str
-
-	.align
-hyp_pabt:
-	bad_exception ARM_EXCEPTION_PREF_ABORT, pabt_die_str
-
-	.align
-hyp_dabt:
-	bad_exception ARM_EXCEPTION_DATA_ABORT, dabt_die_str
-
-	.align
-hyp_hvc:
-	/*
-	 * Getting here is either because of a trap from a guest or from calling
-	 * HVC from the host kernel, which means "switch to Hyp mode".
-	 */
-	push	{r0, r1, r2}
-
-	@ Check syndrome register
-	mrc	p15, 4, r1, c5, c2, 0	@ HSR
-	lsr	r0, r1, #HSR_EC_SHIFT
-	cmp	r0, #HSR_EC_HVC
-	bne	guest_trap		@ Not HVC instr.
-
-	/*
-	 * Let's check if the HVC came from VMID 0 and allow simple
-	 * switch to Hyp mode
-	 */
-	mrrc	p15, 6, r0, r2, c2
-	lsr	r2, r2, #16
-	and	r2, r2, #0xff
-	cmp	r2, #0
-	bne	guest_trap		@ Guest called HVC
-
-	/*
-	 * Getting here means host called HVC, we shift parameters and branch
-	 * to Hyp function.
-	 */
-	pop	{r0, r1, r2}
-
-	/* Check for __hyp_get_vectors */
-	cmp	r0, #-1
-	mrceq	p15, 4, r0, c12, c0, 0	@ get HVBAR
-	beq	1f
-
-	push	{lr}
-	mrs	lr, SPSR
-	push	{lr}
-
-	mov	lr, r0
-	mov	r0, r1
-	mov	r1, r2
-	mov	r2, r3
-
-THUMB(	orr	lr, #1)
-	blx	lr			@ Call the HYP function
-
-	pop	{lr}
-	msr	SPSR_csxf, lr
-	pop	{lr}
-1:	eret
-
-guest_trap:
-	load_vcpu			@ Load VCPU pointer to r0
-	str	r1, [vcpu, #VCPU_HSR]
-
-	@ Check if we need the fault information
-	lsr	r1, r1, #HSR_EC_SHIFT
-#ifdef CONFIG_VFPv3
-	cmp	r1, #HSR_EC_CP_0_13
-	beq	switch_to_guest_vfp
-#endif
-	cmp	r1, #HSR_EC_IABT
-	mrceq	p15, 4, r2, c6, c0, 2	@ HIFAR
-	beq	2f
-	cmp	r1, #HSR_EC_DABT
-	bne	1f
-	mrc	p15, 4, r2, c6, c0, 0	@ HDFAR
-
-2:	str	r2, [vcpu, #VCPU_HxFAR]
-
-	/*
-	 * B3.13.5 Reporting exceptions taken to the Non-secure PL2 mode:
-	 *
-	 * Abort on the stage 2 translation for a memory access from a
-	 * Non-secure PL1 or PL0 mode:
-	 *
-	 * For any Access flag fault or Translation fault, and also for any
-	 * Permission fault on the stage 2 translation of a memory access
-	 * made as part of a translation table walk for a stage 1 translation,
-	 * the HPFAR holds the IPA that caused the fault. Otherwise, the HPFAR
-	 * is UNKNOWN.
-	 */
-
-	/* Check for permission fault, and S1PTW */
-	mrc	p15, 4, r1, c5, c2, 0	@ HSR
-	and	r0, r1, #HSR_FSC_TYPE
-	cmp	r0, #FSC_PERM
-	tsteq	r1, #(1 << 7)		@ S1PTW
-	mrcne	p15, 4, r2, c6, c0, 4	@ HPFAR
-	bne	3f
-
-	/* Preserve PAR */
-	mrrc	p15, 0, r0, r1, c7	@ PAR
-	push	{r0, r1}
-
-	/* Resolve IPA using the xFAR */
-	mcr	p15, 0, r2, c7, c8, 0	@ ATS1CPR
-	isb
-	mrrc	p15, 0, r0, r1, c7	@ PAR
-	tst	r0, #1
-	bne	4f			@ Failed translation
-	ubfx	r2, r0, #12, #20
-	lsl	r2, r2, #4
-	orr	r2, r2, r1, lsl #24
-
-	/* Restore PAR */
-	pop	{r0, r1}
-	mcrr	p15, 0, r0, r1, c7	@ PAR
-
-3:	load_vcpu			@ Load VCPU pointer to r0
-	str	r2, [r0, #VCPU_HPFAR]
-
-1:	mov	r1, #ARM_EXCEPTION_HVC
-	b	__kvm_vcpu_return
-
-4:	pop	{r0, r1}		@ Failed translation, return to guest
-	mcrr	p15, 0, r0, r1, c7	@ PAR
-	clrex
-	pop	{r0, r1, r2}
-	eret
-
-/*
- * If VFPv3 support is not available, then we will not switch the VFP
- * registers; however cp10 and cp11 accesses will still trap and fallback
- * to the regular coprocessor emulation code, which currently will
- * inject an undefined exception to the guest.
- */
-#ifdef CONFIG_VFPv3
-switch_to_guest_vfp:
-	push	{r3-r7}
-
-	@ NEON/VFP used.  Turn on VFP access.
-	set_hcptr vmtrap, (HCPTR_TCP(10) | HCPTR_TCP(11))
-
-	@ Switch VFP/NEON hardware state to the guest's
-	add	r7, r0, #VCPU_VFP_HOST
-	ldr	r7, [r7]
-	store_vfp_state r7
-	add	r7, r0, #VCPU_VFP_GUEST
-	restore_vfp_state r7
-
-	pop	{r3-r7}
-	pop	{r0-r2}
-	clrex
-	eret
-#endif
-
-	.align
-hyp_irq:
-	push	{r0, r1, r2}
-	mov	r1, #ARM_EXCEPTION_IRQ
-	load_vcpu			@ Load VCPU pointer to r0
-	b	__kvm_vcpu_return
-
-	.align
-hyp_fiq:
-	b	hyp_fiq
-
-	.ltorg
-
-__kvm_hyp_code_end:
-	.globl	__kvm_hyp_code_end
-
-	.section ".rodata"
-
-und_die_str:
-	.ascii	"unexpected undefined exception in Hyp mode at: %#08x\n"
-pabt_die_str:
-	.ascii	"unexpected prefetch abort in Hyp mode at: %#08x\n"
-dabt_die_str:
-	.ascii	"unexpected data abort in Hyp mode at: %#08x\n"
-svc_die_str:
-	.ascii	"unexpected HVC/SVC trap in Hyp mode at: %#08x\n"
+ENDPROC(kvm_call_hyp)
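The Hyp-ABI described in the deleted comment block can be modelled in plain C. This is an illustrative sketch, not kernel code: `hvc_dispatch` and `add3` are made-up names standing in for the register shuffle the vector code performs (`mov lr, r0; mov r0, r1; mov r1, r2; mov r2, r3; blx lr`), i.e. the caller passes a function pointer in r0 and its arguments slide down one register before the call.

```c
#include <assert.h>

/* The HYP function sees its arguments in r0-r2. */
typedef unsigned long (*hyp_fn_t)(unsigned long, unsigned long, unsigned long);

/*
 * Model of the hyp_hvc dispatch path: r0 holds the function pointer,
 * r1-r3 hold its arguments; the vector code shifts them down by one
 * register before branching.
 */
static unsigned long hvc_dispatch(hyp_fn_t r0, unsigned long r1,
				  unsigned long r2, unsigned long r3)
{
	return r0(r1, r2, r3);
}

/* A trivial stand-in for a HYP-mode function. */
static unsigned long add3(unsigned long a, unsigned long b, unsigned long c)
{
	return a + b + c;
}
```

A host-side `kvm_call_hyp(fn, a, b, c)` then corresponds to `hvc_dispatch(fn, a, b, c)` in this model.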
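Two bit manipulations in the deleted vector code are easy to miss in assembly form: the VMID check reads VTTBR[55:48] (`lsr #16` on the high word, then `and #0xff`), and the failed-HPFAR fallback rebuilds an HPFAR-format value from the 64-bit PAR after `ATS1CPR` (`ubfx`/`lsl #4`/`orr ... lsl #24`). A sketch of both, using LPAE field layouts from the architecture (helper names are invented):

```c
#include <assert.h>
#include <stdint.h>

/* bad_exception / hyp_hvc: the current VMID lives in VTTBR[55:48]. */
static uint8_t vttbr_vmid(uint64_t vttbr)
{
	return (vttbr >> 48) & 0xff;
}

/*
 * hyp_hvc fallback path: rebuild an HPFAR-format value from the 64-bit
 * PAR: PA[31:12] goes to bits 23:4, PA[39:32] goes to bits 31:24
 * (HPFAR holds IPA[39:12] in bits 31:4).
 */
static uint32_t par_to_hpfar(uint64_t par)
{
	uint32_t lo = (uint32_t)par;		/* r0 in the asm */
	uint32_t hi = (uint32_t)(par >> 32);	/* r1 in the asm */

	return (((lo >> 12) & 0xfffff) << 4) | (hi << 24);
}
```

VMID 0 in `vttbr_vmid()` is what distinguishes a host-originated HVC from a guest trap in the vector code above.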
arch/arm/kvm/interrupts_head.S (-648)
-#include <linux/irqchip/arm-gic.h>
-#include <asm/assembler.h>
-
-#define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
-#define VCPU_USR_SP		(VCPU_USR_REG(13))
-#define VCPU_USR_LR		(VCPU_USR_REG(14))
-#define CP15_OFFSET(_cp15_reg_idx) (VCPU_CP15 + (_cp15_reg_idx * 4))
-
-/*
- * Many of these macros need to access the VCPU structure, which is always
- * held in r0. These macros should never clobber r1, as it is used to hold the
- * exception code on the return path (except of course the macro that switches
- * all the registers before the final jump to the VM).
- */
-vcpu	.req	r0		@ vcpu pointer always in r0
-
-/* Clobbers {r2-r6} */
-.macro store_vfp_state vfp_base
-	@ The VFPFMRX and VFPFMXR macros are the VMRS and VMSR instructions
-	VFPFMRX	r2, FPEXC
-	@ Make sure VFP is enabled so we can touch the registers.
-	orr	r6, r2, #FPEXC_EN
-	VFPFMXR	FPEXC, r6
-
-	VFPFMRX	r3, FPSCR
-	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
-	beq	1f
-	@ If FPEXC_EX is 0, then FPINST/FPINST2 reads are unpredictable, so
-	@ we only need to save them if FPEXC_EX is set.
-	VFPFMRX	r4, FPINST
-	tst	r2, #FPEXC_FP2V
-	VFPFMRX	r5, FPINST2, ne	@ vmrsne
-	bic	r6, r2, #FPEXC_EX	@ FPEXC_EX disable
-	VFPFMXR	FPEXC, r6
-1:
-	VFPFSTMIA \vfp_base, r6	@ Save VFP registers
-	stm	\vfp_base, {r2-r5}	@ Save FPEXC, FPSCR, FPINST, FPINST2
-.endm
-
-/* Assume FPEXC_EN is on and FPEXC_EX is off, clobbers {r2-r6} */
-.macro restore_vfp_state vfp_base
-	VFPFLDMIA \vfp_base, r6	@ Load VFP registers
-	ldm	\vfp_base, {r2-r5}	@ Load FPEXC, FPSCR, FPINST, FPINST2
-
-	VFPFMXR	FPSCR, r3
-	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
-	beq	1f
-	VFPFMXR	FPINST, r4
-	tst	r2, #FPEXC_FP2V
-	VFPFMXR	FPINST2, r5, ne
-1:
-	VFPFMXR	FPEXC, r2	@ FPEXC	(last, in case !EN)
-.endm
-
-/* These are simply for the macros to work - values don't have meaning */
-.equ usr, 0
-.equ svc, 1
-.equ abt, 2
-.equ und, 3
-.equ irq, 4
-.equ fiq, 5
-
-.macro push_host_regs_mode mode
-	mrs	r2, SP_\mode
-	mrs	r3, LR_\mode
-	mrs	r4, SPSR_\mode
-	push	{r2, r3, r4}
-.endm
-
-/*
- * Store all host persistent registers on the stack.
- * Clobbers all registers, in all modes, except r0 and r1.
- */
-.macro save_host_regs
-	/* Hyp regs. Only ELR_hyp (SPSR_hyp already saved) */
-	mrs	r2, ELR_hyp
-	push	{r2}
-
-	/* usr regs */
-	push	{r4-r12}	@ r0-r3 are always clobbered
-	mrs	r2, SP_usr
-	mov	r3, lr
-	push	{r2, r3}
-
-	push_host_regs_mode svc
-	push_host_regs_mode abt
-	push_host_regs_mode und
-	push_host_regs_mode irq
-
-	/* fiq regs */
-	mrs	r2, r8_fiq
-	mrs	r3, r9_fiq
-	mrs	r4, r10_fiq
-	mrs	r5, r11_fiq
-	mrs	r6, r12_fiq
-	mrs	r7, SP_fiq
-	mrs	r8, LR_fiq
-	mrs	r9, SPSR_fiq
-	push	{r2-r9}
-.endm
-
-.macro pop_host_regs_mode mode
-	pop	{r2, r3, r4}
-	msr	SP_\mode, r2
-	msr	LR_\mode, r3
-	msr	SPSR_\mode, r4
-.endm
-
-/*
- * Restore all host registers from the stack.
- * Clobbers all registers, in all modes, except r0 and r1.
- */
-.macro restore_host_regs
-	pop	{r2-r9}
-	msr	r8_fiq, r2
-	msr	r9_fiq, r3
-	msr	r10_fiq, r4
-	msr	r11_fiq, r5
-	msr	r12_fiq, r6
-	msr	SP_fiq, r7
-	msr	LR_fiq, r8
-	msr	SPSR_fiq, r9
-
-	pop_host_regs_mode irq
-	pop_host_regs_mode und
-	pop_host_regs_mode abt
-	pop_host_regs_mode svc
-
-	pop	{r2, r3}
-	msr	SP_usr, r2
-	mov	lr, r3
-	pop	{r4-r12}
-
-	pop	{r2}
-	msr	ELR_hyp, r2
-.endm
-
-/*
- * Restore SP, LR and SPSR for a given mode. offset is the offset of
- * this mode's registers from the VCPU base.
- *
- * Assumes vcpu pointer in vcpu reg
- *
- * Clobbers r1, r2, r3, r4.
- */
-.macro restore_guest_regs_mode mode, offset
-	add	r1, vcpu, \offset
-	ldm	r1, {r2, r3, r4}
-	msr	SP_\mode, r2
-	msr	LR_\mode, r3
-	msr	SPSR_\mode, r4
-.endm
-
-/*
- * Restore all guest registers from the vcpu struct.
- *
- * Assumes vcpu pointer in vcpu reg
- *
- * Clobbers *all* registers.
- */
-.macro restore_guest_regs
-	restore_guest_regs_mode svc, #VCPU_SVC_REGS
-	restore_guest_regs_mode abt, #VCPU_ABT_REGS
-	restore_guest_regs_mode und, #VCPU_UND_REGS
-	restore_guest_regs_mode irq, #VCPU_IRQ_REGS
-
-	add	r1, vcpu, #VCPU_FIQ_REGS
-	ldm	r1, {r2-r9}
-	msr	r8_fiq, r2
-	msr	r9_fiq, r3
-	msr	r10_fiq, r4
-	msr	r11_fiq, r5
-	msr	r12_fiq, r6
-	msr	SP_fiq, r7
-	msr	LR_fiq, r8
-	msr	SPSR_fiq, r9
-
-	@ Load return state
-	ldr	r2, [vcpu, #VCPU_PC]
-	ldr	r3, [vcpu, #VCPU_CPSR]
-	msr	ELR_hyp, r2
-	msr	SPSR_cxsf, r3
-
-	@ Load user registers
-	ldr	r2, [vcpu, #VCPU_USR_SP]
-	ldr	r3, [vcpu, #VCPU_USR_LR]
-	msr	SP_usr, r2
-	mov	lr, r3
-	add	vcpu, vcpu, #(VCPU_USR_REGS)
-	ldm	vcpu, {r0-r12}
-.endm
-
-/*
- * Save SP, LR and SPSR for a given mode. offset is the offset of
- * this mode's registers from the VCPU base.
- *
- * Assumes vcpu pointer in vcpu reg
- *
- * Clobbers r2, r3, r4, r5.
- */
-.macro save_guest_regs_mode mode, offset
-	add	r2, vcpu, \offset
-	mrs	r3, SP_\mode
-	mrs	r4, LR_\mode
-	mrs	r5, SPSR_\mode
-	stm	r2, {r3, r4, r5}
-.endm
-
-/*
- * Save all guest registers to the vcpu struct
- * Expects guest's r0, r1, r2 on the stack.
- *
- * Assumes vcpu pointer in vcpu reg
- *
- * Clobbers r2, r3, r4, r5.
- */
-.macro save_guest_regs
-	@ Store usr registers
-	add	r2, vcpu, #VCPU_USR_REG(3)
-	stm	r2, {r3-r12}
-	add	r2, vcpu, #VCPU_USR_REG(0)
-	pop	{r3, r4, r5}		@ r0, r1, r2
-	stm	r2, {r3, r4, r5}
-	mrs	r2, SP_usr
-	mov	r3, lr
-	str	r2, [vcpu, #VCPU_USR_SP]
-	str	r3, [vcpu, #VCPU_USR_LR]
-
-	@ Store return state
-	mrs	r2, ELR_hyp
-	mrs	r3, spsr
-	str	r2, [vcpu, #VCPU_PC]
-	str	r3, [vcpu, #VCPU_CPSR]
-
-	@ Store other guest registers
-	save_guest_regs_mode svc, #VCPU_SVC_REGS
-	save_guest_regs_mode abt, #VCPU_ABT_REGS
-	save_guest_regs_mode und, #VCPU_UND_REGS
-	save_guest_regs_mode irq, #VCPU_IRQ_REGS
-.endm
-
-/* Reads cp15 registers from hardware and stores them in memory
- * @store_to_vcpu: If 0, registers are written in-order to the stack,
- *		   otherwise to the VCPU struct pointed to by vcpup
- *
- * Assumes vcpu pointer in vcpu reg
- *
- * Clobbers r2 - r12
- */
-.macro read_cp15_state store_to_vcpu
-	mrc	p15, 0, r2, c1, c0, 0	@ SCTLR
-	mrc	p15, 0, r3, c1, c0, 2	@ CPACR
-	mrc	p15, 0, r4, c2, c0, 2	@ TTBCR
-	mrc	p15, 0, r5, c3, c0, 0	@ DACR
-	mrrc	p15, 0, r6, r7, c2	@ TTBR 0
-	mrrc	p15, 1, r8, r9, c2	@ TTBR 1
-	mrc	p15, 0, r10, c10, c2, 0	@ PRRR
-	mrc	p15, 0, r11, c10, c2, 1	@ NMRR
-	mrc	p15, 2, r12, c0, c0, 0	@ CSSELR
-
-	.if \store_to_vcpu == 0
-	push	{r2-r12}		@ Push CP15 registers
-	.else
-	str	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
-	str	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
-	str	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
-	str	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
-	add	r2, vcpu, #CP15_OFFSET(c2_TTBR0)
-	strd	r6, r7, [r2]
-	add	r2, vcpu, #CP15_OFFSET(c2_TTBR1)
-	strd	r8, r9, [r2]
-	str	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
-	str	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
-	str	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
-	.endif
-
-	mrc	p15, 0, r2, c13, c0, 1	@ CID
-	mrc	p15, 0, r3, c13, c0, 2	@ TID_URW
-	mrc	p15, 0, r4, c13, c0, 3	@ TID_URO
-	mrc	p15, 0, r5, c13, c0, 4	@ TID_PRIV
-	mrc	p15, 0, r6, c5, c0, 0	@ DFSR
-	mrc	p15, 0, r7, c5, c0, 1	@ IFSR
-	mrc	p15, 0, r8, c5, c1, 0	@ ADFSR
-	mrc	p15, 0, r9, c5, c1, 1	@ AIFSR
-	mrc	p15, 0, r10, c6, c0, 0	@ DFAR
-	mrc	p15, 0, r11, c6, c0, 2	@ IFAR
-	mrc	p15, 0, r12, c12, c0, 0	@ VBAR
-
-	.if \store_to_vcpu == 0
-	push	{r2-r12}		@ Push CP15 registers
-	.else
-	str	r2, [vcpu, #CP15_OFFSET(c13_CID)]
-	str	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
-	str	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
-	str	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
-	str	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
-	str	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
-	str	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
-	str	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
-	str	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
-	str	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
-	str	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
-	.endif
-
-	mrc	p15, 0, r2, c14, c1, 0	@ CNTKCTL
-	mrrc	p15, 0, r4, r5, c7	@ PAR
-	mrc	p15, 0, r6, c10, c3, 0	@ AMAIR0
-	mrc	p15, 0, r7, c10, c3, 1	@ AMAIR1
-
-	.if \store_to_vcpu == 0
-	push	{r2,r4-r7}
-	.else
-	str	r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
-	add	r12, vcpu, #CP15_OFFSET(c7_PAR)
-	strd	r4, r5, [r12]
-	str	r6, [vcpu, #CP15_OFFSET(c10_AMAIR0)]
-	str	r7, [vcpu, #CP15_OFFSET(c10_AMAIR1)]
-	.endif
-.endm
-
-/*
- * Reads cp15 registers from memory and writes them to hardware
- * @read_from_vcpu: If 0, registers are read in-order from the stack,
- *		    otherwise from the VCPU struct pointed to by vcpup
- *
- * Assumes vcpu pointer in vcpu reg
- */
-.macro write_cp15_state read_from_vcpu
-	.if \read_from_vcpu == 0
-	pop	{r2,r4-r7}
-	.else
-	ldr	r2, [vcpu, #CP15_OFFSET(c14_CNTKCTL)]
-	add	r12, vcpu, #CP15_OFFSET(c7_PAR)
-	ldrd	r4, r5, [r12]
-	ldr	r6, [vcpu, #CP15_OFFSET(c10_AMAIR0)]
-	ldr	r7, [vcpu, #CP15_OFFSET(c10_AMAIR1)]
-	.endif
-
-	mcr	p15, 0, r2, c14, c1, 0	@ CNTKCTL
-	mcrr	p15, 0, r4, r5, c7	@ PAR
-	mcr	p15, 0, r6, c10, c3, 0	@ AMAIR0
-	mcr	p15, 0, r7, c10, c3, 1	@ AMAIR1
-
-	.if \read_from_vcpu == 0
-	pop	{r2-r12}
-	.else
-	ldr	r2, [vcpu, #CP15_OFFSET(c13_CID)]
-	ldr	r3, [vcpu, #CP15_OFFSET(c13_TID_URW)]
-	ldr	r4, [vcpu, #CP15_OFFSET(c13_TID_URO)]
-	ldr	r5, [vcpu, #CP15_OFFSET(c13_TID_PRIV)]
-	ldr	r6, [vcpu, #CP15_OFFSET(c5_DFSR)]
-	ldr	r7, [vcpu, #CP15_OFFSET(c5_IFSR)]
-	ldr	r8, [vcpu, #CP15_OFFSET(c5_ADFSR)]
-	ldr	r9, [vcpu, #CP15_OFFSET(c5_AIFSR)]
-	ldr	r10, [vcpu, #CP15_OFFSET(c6_DFAR)]
-	ldr	r11, [vcpu, #CP15_OFFSET(c6_IFAR)]
-	ldr	r12, [vcpu, #CP15_OFFSET(c12_VBAR)]
-	.endif
-
-	mcr	p15, 0, r2, c13, c0, 1	@ CID
-	mcr	p15, 0, r3, c13, c0, 2	@ TID_URW
-	mcr	p15, 0, r4, c13, c0, 3	@ TID_URO
-	mcr	p15, 0, r5, c13, c0, 4	@ TID_PRIV
-	mcr	p15, 0, r6, c5, c0, 0	@ DFSR
-	mcr	p15, 0, r7, c5, c0, 1	@ IFSR
-	mcr	p15, 0, r8, c5, c1, 0	@ ADFSR
-	mcr	p15, 0, r9, c5, c1, 1	@ AIFSR
-	mcr	p15, 0, r10, c6, c0, 0	@ DFAR
-	mcr	p15, 0, r11, c6, c0, 2	@ IFAR
-	mcr	p15, 0, r12, c12, c0, 0	@ VBAR
-
-	.if \read_from_vcpu == 0
-	pop	{r2-r12}
-	.else
-	ldr	r2, [vcpu, #CP15_OFFSET(c1_SCTLR)]
-	ldr	r3, [vcpu, #CP15_OFFSET(c1_CPACR)]
-	ldr	r4, [vcpu, #CP15_OFFSET(c2_TTBCR)]
-	ldr	r5, [vcpu, #CP15_OFFSET(c3_DACR)]
-	add	r12, vcpu, #CP15_OFFSET(c2_TTBR0)
-	ldrd	r6, r7, [r12]
-	add	r12, vcpu, #CP15_OFFSET(c2_TTBR1)
-	ldrd	r8, r9, [r12]
-	ldr	r10, [vcpu, #CP15_OFFSET(c10_PRRR)]
-	ldr	r11, [vcpu, #CP15_OFFSET(c10_NMRR)]
-	ldr	r12, [vcpu, #CP15_OFFSET(c0_CSSELR)]
-	.endif
-
-	mcr	p15, 0, r2, c1, c0, 0	@ SCTLR
-	mcr	p15, 0, r3, c1, c0, 2	@ CPACR
-	mcr	p15, 0, r4, c2, c0, 2	@ TTBCR
-	mcr	p15, 0, r5, c3, c0, 0	@ DACR
-	mcrr	p15, 0, r6, r7, c2	@ TTBR 0
-	mcrr	p15, 1, r8, r9, c2	@ TTBR 1
-	mcr	p15, 0, r10, c10, c2, 0	@ PRRR
-	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
-	mcr	p15, 2, r12, c0, c0, 0	@ CSSELR
-.endm
-
-/*
- * Save the VGIC CPU state into memory
- *
- * Assumes vcpu pointer in vcpu reg
- */
-.macro save_vgic_state
-	/* Get VGIC VCTRL base into r2 */
-	ldr	r2, [vcpu, #VCPU_KVM]
-	ldr	r2, [r2, #KVM_VGIC_VCTRL]
-	cmp	r2, #0
-	beq	2f
-
-	/* Compute the address of struct vgic_cpu */
-	add	r11, vcpu, #VCPU_VGIC_CPU
-
-	/* Save all interesting registers */
-	ldr	r4, [r2, #GICH_VMCR]
-	ldr	r5, [r2, #GICH_MISR]
-	ldr	r6, [r2, #GICH_EISR0]
-	ldr	r7, [r2, #GICH_EISR1]
-	ldr	r8, [r2, #GICH_ELRSR0]
-	ldr	r9, [r2, #GICH_ELRSR1]
-	ldr	r10, [r2, #GICH_APR]
-ARM_BE8(rev	r4, r4	)
-ARM_BE8(rev	r5, r5	)
-ARM_BE8(rev	r6, r6	)
-ARM_BE8(rev	r7, r7	)
-ARM_BE8(rev	r8, r8	)
-ARM_BE8(rev	r9, r9	)
-ARM_BE8(rev	r10, r10	)
-
-	str	r4, [r11, #VGIC_V2_CPU_VMCR]
-	str	r5, [r11, #VGIC_V2_CPU_MISR]
-#ifdef CONFIG_CPU_ENDIAN_BE8
-	str	r6, [r11, #(VGIC_V2_CPU_EISR + 4)]
-	str	r7, [r11, #VGIC_V2_CPU_EISR]
-	str	r8, [r11, #(VGIC_V2_CPU_ELRSR + 4)]
-	str	r9, [r11, #VGIC_V2_CPU_ELRSR]
-#else
-	str	r6, [r11, #VGIC_V2_CPU_EISR]
-	str	r7, [r11, #(VGIC_V2_CPU_EISR + 4)]
-	str	r8, [r11, #VGIC_V2_CPU_ELRSR]
-	str	r9, [r11, #(VGIC_V2_CPU_ELRSR + 4)]
-#endif
-	str	r10, [r11, #VGIC_V2_CPU_APR]
-
-	/* Clear GICH_HCR */
-	mov	r5, #0
-	str	r5, [r2, #GICH_HCR]
-
-	/* Save list registers */
-	add	r2, r2, #GICH_LR0
-	add	r3, r11, #VGIC_V2_CPU_LR
-	ldr	r4, [r11, #VGIC_CPU_NR_LR]
-1:	ldr	r6, [r2], #4
-ARM_BE8(rev	r6, r6	)
-	str	r6, [r3], #4
-	subs	r4, r4, #1
-	bne	1b
-2:
-.endm
-
-/*
- * Restore the VGIC CPU state from memory
- *
- * Assumes vcpu pointer in vcpu reg
- */
-.macro restore_vgic_state
-	/* Get VGIC VCTRL base into r2 */
-	ldr	r2, [vcpu, #VCPU_KVM]
-	ldr	r2, [r2, #KVM_VGIC_VCTRL]
-	cmp	r2, #0
-	beq	2f
-
-	/* Compute the address of struct vgic_cpu */
-	add	r11, vcpu, #VCPU_VGIC_CPU
-
-	/* We only restore a minimal set of registers */
-	ldr	r3, [r11, #VGIC_V2_CPU_HCR]
-	ldr	r4, [r11, #VGIC_V2_CPU_VMCR]
-	ldr	r8, [r11, #VGIC_V2_CPU_APR]
-ARM_BE8(rev	r3, r3	)
-ARM_BE8(rev	r4, r4	)
-ARM_BE8(rev	r8, r8	)
-
-	str	r3, [r2, #GICH_HCR]
-	str	r4, [r2, #GICH_VMCR]
-	str	r8, [r2, #GICH_APR]
-
-	/* Restore list registers */
-	add	r2, r2, #GICH_LR0
-	add	r3, r11, #VGIC_V2_CPU_LR
-	ldr	r4, [r11, #VGIC_CPU_NR_LR]
-1:	ldr	r6, [r3], #4
-ARM_BE8(rev	r6, r6	)
-	str	r6, [r2], #4
-	subs	r4, r4, #1
-	bne	1b
-2:
-.endm
-
-#define CNTHCTL_PL1PCTEN	(1 << 0)
-#define CNTHCTL_PL1PCEN		(1 << 1)
-
-/*
- * Save the timer state onto the VCPU and allow physical timer/counter access
- * for the host.
- *
- * Assumes vcpu pointer in vcpu reg
- * Clobbers r2-r5
- */
-.macro save_timer_state
-	ldr	r4, [vcpu, #VCPU_KVM]
-	ldr	r2, [r4, #KVM_TIMER_ENABLED]
-	cmp	r2, #0
-	beq	1f
-
-	mrc	p15, 0, r2, c14, c3, 1	@ CNTV_CTL
-	str	r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
-
-	isb
-
-	mrrc	p15, 3, rr_lo_hi(r2, r3), c14	@ CNTV_CVAL
-	ldr	r4, =VCPU_TIMER_CNTV_CVAL
-	add	r5, vcpu, r4
-	strd	r2, r3, [r5]
-
-	@ Ensure host CNTVCT == CNTPCT
-	mov	r2, #0
-	mcrr	p15, 4, r2, r2, c14	@ CNTVOFF
-
-1:
-	mov	r2, #0			@ Clear ENABLE
-	mcr	p15, 0, r2, c14, c3, 1	@ CNTV_CTL
-
-	@ Allow physical timer/counter access for the host
-	mrc	p15, 4, r2, c14, c1, 0	@ CNTHCTL
-	orr	r2, r2, #(CNTHCTL_PL1PCEN | CNTHCTL_PL1PCTEN)
-	mcr	p15, 4, r2, c14, c1, 0	@ CNTHCTL
-.endm
-
-/*
- * Load the timer state from the VCPU and deny physical timer/counter access
- * for the host.
- *
- * Assumes vcpu pointer in vcpu reg
- * Clobbers r2-r5
- */
-.macro restore_timer_state
-	@ Disallow physical timer access for the guest
-	@ Physical counter access is allowed
-	mrc	p15, 4, r2, c14, c1, 0	@ CNTHCTL
-	orr	r2, r2, #CNTHCTL_PL1PCTEN
-	bic	r2, r2, #CNTHCTL_PL1PCEN
-	mcr	p15, 4, r2, c14, c1, 0	@ CNTHCTL
-
-	ldr	r4, [vcpu, #VCPU_KVM]
-	ldr	r2, [r4, #KVM_TIMER_ENABLED]
-	cmp	r2, #0
-	beq	1f
-
-	ldr	r2, [r4, #KVM_TIMER_CNTVOFF]
-	ldr	r3, [r4, #(KVM_TIMER_CNTVOFF + 4)]
-	mcrr	p15, 4, rr_lo_hi(r2, r3), c14	@ CNTVOFF
-
-	ldr	r4, =VCPU_TIMER_CNTV_CVAL
-	add	r5, vcpu, r4
-	ldrd	r2, r3, [r5]
-	mcrr	p15, 3, rr_lo_hi(r2, r3), c14	@ CNTV_CVAL
-	isb
-
-	ldr	r2, [vcpu, #VCPU_TIMER_CNTV_CTL]
-	and	r2, r2, #3
-	mcr	p15, 0, r2, c14, c3, 1	@ CNTV_CTL
-1:
-.endm
-
-.equ vmentry,	0
-.equ vmexit,	1
-
-/* Configures the HSTR (Hyp System Trap Register) on entry/return
- * (hardware reset value is 0) */
-.macro set_hstr operation
-	mrc	p15, 4, r2, c1, c1, 3
-	ldr	r3, =HSTR_T(15)
-	.if \operation == vmentry
-	orr	r2, r2, r3		@ Trap CR{15}
-	.else
-	bic	r2, r2, r3		@ Don't trap any CRx accesses
-	.endif
-	mcr	p15, 4, r2, c1, c1, 3
-.endm
-
-/* Configures the HCPTR (Hyp Coprocessor Trap Register) on entry/return
- * (hardware reset value is 0). Keep previous value in r2.
- * An ISB is emitted on vmexit/vmtrap, but executed on vmexit only if
- * VFP wasn't already enabled (always executed on vmtrap).
- * If a label is specified with vmexit, it is branched to if VFP wasn't
- * enabled.
- */
-.macro set_hcptr operation, mask, label = none
-	mrc	p15, 4, r2, c1, c1, 2
-	ldr	r3, =\mask
-	.if \operation == vmentry
-	orr	r3, r2, r3		@ Trap coproc-accesses defined in mask
-	.else
-	bic	r3, r2, r3		@ Don't trap defined coproc-accesses
-	.endif
-	mcr	p15, 4, r3, c1, c1, 2
-	.if \operation != vmentry
-	.if \operation == vmexit
-	tst	r2, #(HCPTR_TCP(10) | HCPTR_TCP(11))
-	beq	1f
-	.endif
-	isb
-	.if \label != none
-	b	\label
-	.endif
-1:
-	.endif
-.endm
-
-/* Configures the HDCR (Hyp Debug Configuration Register) on entry/return
- * (hardware reset value is 0) */
-.macro set_hdcr operation
-	mrc	p15, 4, r2, c1, c1, 1
-	ldr	r3, =(HDCR_TPM|HDCR_TPMCR)
-	.if \operation == vmentry
-	orr	r2, r2, r3		@ Trap some perfmon accesses
-	.else
-	bic	r2, r2, r3		@ Don't trap any perfmon accesses
-	.endif
-	mcr	p15, 4, r2, c1, c1, 1
-.endm
-
-/* Enable/Disable: stage-2 trans., trap interrupts, trap wfi, trap smc */
-.macro configure_hyp_role operation
-	.if \operation == vmentry
-	ldr	r2, [vcpu, #VCPU_HCR]
-	ldr	r3, [vcpu, #VCPU_IRQ_LINES]
-	orr	r2, r2, r3
-	.else
-	mov	r2, #0
-	.endif
-	mcr	p15, 4, r2, c1, c1, 0	@ HCR
-.endm
-
-.macro load_vcpu
-	mrc	p15, 4, vcpu, c13, c0, 2	@ HTPIDR
-.endm
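The timer macros above pivot on the virtual counter offset: the architecture defines the guest-visible virtual counter as the physical counter minus CNTVOFF, so `restore_timer_state` programs the per-VM offset while `save_timer_state` zeroes it so the host sees CNTVCT == CNTPCT. A minimal model of that relationship (illustrative, not kernel code):

```c
#include <assert.h>
#include <stdint.h>

/*
 * ARM generic timer: CNTVCT = CNTPCT - CNTVOFF.
 * restore_timer_state writes the guest's CNTVOFF; save_timer_state
 * writes 0 so the host's virtual and physical counters match.
 */
static uint64_t cntvct(uint64_t cntpct, uint64_t cntvoff)
{
	return cntpct - cntvoff;
}
```

The `and r2, r2, #3` in `restore_timer_state` similarly keeps only the ENABLE and IMASK bits of the saved CNTV_CTL before writing it back.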
arch/arm/kvm/mmu.c (+23)
···
 #include <asm/kvm_mmio.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_emulate.h>
+#include <asm/virt.h>

 #include "trace.h"
···
	unsigned long start = KERN_TO_HYP((unsigned long)from);
	unsigned long end = KERN_TO_HYP((unsigned long)to);

+	if (is_kernel_in_hyp_mode())
+		return 0;
+
	start = start & PAGE_MASK;
	end = PAGE_ALIGN(end);
···
 {
	unsigned long start = KERN_TO_HYP((unsigned long)from);
	unsigned long end = KERN_TO_HYP((unsigned long)to);
+
+	if (is_kernel_in_hyp_mode())
+		return 0;

	/* Check for a valid kernel IO mapping */
	if (!is_vmalloc_addr(from) || !is_vmalloc_addr(to - 1))
···
	if (is_iabt) {
		/* Prefetch Abort on I/O address */
		kvm_inject_pabt(vcpu, kvm_vcpu_get_hfar(vcpu));
+		ret = 1;
+		goto out_unlock;
+	}
+
+	/*
+	 * Check for a cache maintenance operation. Since we
+	 * ended-up here, we know it is outside of any memory
+	 * slot. But we can't find out if that is for a device,
+	 * or if the guest is just being stupid. The only thing
+	 * we know for sure is that this range cannot be cached.
+	 *
+	 * So let's assume that the guest is just being
+	 * cautious, and skip the instruction.
+	 */
+	if (kvm_vcpu_dabt_is_cm(vcpu)) {
+		kvm_skip_instr(vcpu, kvm_vcpu_trap_il_is32bit(vcpu));
		ret = 1;
		goto out_unlock;
	}
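The abort-handling hunk above now distinguishes three outcomes for a fault outside any memslot. A condensed decision model (the enum and helper name here are invented for illustration; the real code operates on a `struct kvm_vcpu` and HSR/ESR decoding):

```c
#include <assert.h>

enum abort_action { INJECT_PABT, SKIP_INSTR, EMULATE_MMIO };

/*
 * Model of the exit logic for an abort on an address outside any
 * memory slot:
 *  - instruction abort: inject a prefetch abort back into the guest;
 *  - cache maintenance op (ESR CM bit): the range cannot be cached
 *    anyway, so just skip the instruction;
 *  - otherwise: treat it as an MMIO data abort and emulate it.
 */
static enum abort_action io_mem_abort_action(int is_iabt, int is_cache_maint)
{
	if (is_iabt)
		return INJECT_PABT;
	if (is_cache_maint)
		return SKIP_INSTR;
	return EMULATE_MMIO;
}
```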
arch/arm/kvm/reset.c (+1, -1)
···
 }

	/* Reset core registers */
-	memcpy(&vcpu->arch.regs, reset_regs, sizeof(vcpu->arch.regs));
+	memcpy(&vcpu->arch.ctxt.gp_regs, reset_regs, sizeof(vcpu->arch.ctxt.gp_regs));

	/* Reset CP15 registers */
	kvm_reset_coprocs(vcpu);
arch/arm64/Kconfig (+13)
···
	  not support these instructions and requires the kernel to be
	  built with binutils >= 2.25.

+config ARM64_VHE
+	bool "Enable support for Virtualization Host Extensions (VHE)"
+	default y
+	help
+	  Virtualization Host Extensions (VHE) allow the kernel to run
+	  directly at EL2 (instead of EL1) on processors that support
+	  it. This leads to better performance for KVM, as it reduces
+	  the cost of the world switch.
+
+	  Selecting this option allows the VHE feature to be detected
+	  at runtime, and does not affect processors that do not
+	  implement this feature.
+
 endmenu

 endmenu
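Runtime VHE detection boils down to checking which exception level the kernel was entered at: the AArch64 `CurrentEL` register holds the EL in bits [3:2]. A sketch of the decode (portable model, not the actual `asm/virt.h` code, which reads the register with an `mrs`):

```c
#include <assert.h>
#include <stdint.h>

/* CurrentEL keeps the exception level in bits [3:2]. */
static int current_el(uint64_t currentel)
{
	return (currentel >> 2) & 3;
}

/* VHE in use: the kernel itself is running at EL2. */
static int kernel_runs_at_el2(uint64_t currentel)
{
	return current_el(currentel) == 2;
}
```

With VHE enabled and present, `is_kernel_in_hyp_mode()` in the kernel answers the same question, which is why the mmu.c hunks above can skip creating separate HYP mappings.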
arch/arm64/include/asm/cpufeature.h (+5, -1)
···
 #define ARM64_HAS_LSE_ATOMICS			5
 #define ARM64_WORKAROUND_CAVIUM_23154		6
 #define ARM64_WORKAROUND_834220			7
+#define ARM64_HAS_NO_HW_PREFETCH		8
+#define ARM64_HAS_UAO				9
+#define ARM64_ALT_PAN_NOT_UAO			10
+#define ARM64_HAS_VIRT_HOST_EXTN		11

-#define ARM64_NCAPS				8
+#define ARM64_NCAPS				12

 #ifndef __ASSEMBLY__
arch/arm64/include/asm/hw_breakpoint.h (+13, -5)
···
 #include <asm/cputype.h>
 #include <asm/cpufeature.h>
+#include <asm/virt.h>

 #ifdef __KERNEL__
···
	struct arch_hw_breakpoint_ctrl ctrl;
 };

+/* Privilege Levels */
+#define AARCH64_BREAKPOINT_EL1	1
+#define AARCH64_BREAKPOINT_EL0	2
+
+#define DBG_HMC_HYP		(1 << 13)
+
 static inline u32 encode_ctrl_reg(struct arch_hw_breakpoint_ctrl ctrl)
 {
-	return (ctrl.len << 5) | (ctrl.type << 3) | (ctrl.privilege << 1) |
+	u32 val = (ctrl.len << 5) | (ctrl.type << 3) | (ctrl.privilege << 1) |
		ctrl.enabled;
+
+	if (is_kernel_in_hyp_mode() && ctrl.privilege == AARCH64_BREAKPOINT_EL1)
+		val |= DBG_HMC_HYP;
+
+	return val;
 }

 static inline void decode_ctrl_reg(u32 reg,
···
 #define ARM_BREAKPOINT_LOAD	1
 #define ARM_BREAKPOINT_STORE	2
 #define AARCH64_ESR_ACCESS_MASK	(1 << 6)
-
-/* Privilege Levels */
-#define AARCH64_BREAKPOINT_EL1	1
-#define AARCH64_BREAKPOINT_EL0	2

 /* Lengths */
 #define ARM_BREAKPOINT_LEN_1	0x1
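A worked example of the `encode_ctrl_reg()` change above: with VHE the "EL1" kernel actually runs at EL2, so a kernel breakpoint must additionally set the HMC bit (bit 13) to fire in HYP mode. This standalone replica swaps the `is_kernel_in_hyp_mode()` call for a plain flag so it can run anywhere; struct and function names otherwise mirror the header:

```c
#include <assert.h>

#define DBG_HMC_HYP		(1 << 13)
#define AARCH64_BREAKPOINT_EL1	1

struct bp_ctrl { unsigned len, type, privilege, enabled; };

/* Replica of encode_ctrl_reg(): pack len/type/privilege/enabled, and
 * set HMC for EL1 breakpoints when the kernel itself runs in HYP. */
static unsigned encode_ctrl(struct bp_ctrl c, int kernel_in_hyp)
{
	unsigned val = (c.len << 5) | (c.type << 3) | (c.privilege << 1) |
		c.enabled;

	if (kernel_in_hyp && c.privilege == AARCH64_BREAKPOINT_EL1)
		val |= DBG_HMC_HYP;

	return val;
}
```

For a byte-length (`len = 0xf`) enabled EL1 breakpoint, the encoding gains `0x2000` under VHE and is unchanged otherwise.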
arch/arm64/include/asm/kvm_arm.h (+5, -1)
···
 #include <asm/types.h>

 /* Hyp Configuration Register (HCR) bits */
+#define HCR_E2H		(UL(1) << 34)
 #define HCR_ID		(UL(1) << 33)
 #define HCR_CD		(UL(1) << 32)
 #define HCR_RW_SHIFT	31
···
			 HCR_AMO | HCR_SWIO | HCR_TIDCP | HCR_RW)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 #define HCR_INT_OVERRIDE   (HCR_FMO | HCR_IMO)
-
+#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)

 /* Hyp System Control Register (SCTLR_EL2) bits */
 #define SCTLR_EL2_EE	(1 << 25)
···
	ECN(BREAKPT_LOW), ECN(BREAKPT_CUR), ECN(SOFTSTP_LOW), \
	ECN(SOFTSTP_CUR), ECN(WATCHPT_LOW), ECN(WATCHPT_CUR), \
	ECN(BKPT32), ECN(VECTOR32), ECN(BRK64)
+
+#define CPACR_EL1_FPEN		(3 << 20)
+#define CPACR_EL1_TTA		(1 << 28)

 #endif /* __ARM64_KVM_ARM_H__ */
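The new `HCR_HOST_VHE_FLAGS` is what the host runs with under VHE: RW (bit 31, from `HCR_RW_SHIFT` above), TGE (bit 27 in the architecture's HCR_EL2 layout, defined elsewhere in this header), and the new E2H (bit 34). Reproduced as plain constants to make the bit positions checkable:

```c
#include <assert.h>

/* HCR_EL2 bit positions (E2H and RW from the hunk above; TGE from the
 * architected HCR_EL2 layout). */
#define HCR_E2H	(1ULL << 34)
#define HCR_RW	(1ULL << 31)
#define HCR_TGE	(1ULL << 27)

/* Host configuration under VHE: 64-bit EL1, traps routed to EL2,
 * EL2-host mode enabled. */
#define HCR_HOST_VHE_FLAGS (HCR_RW | HCR_TGE | HCR_E2H)
```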
arch/arm64/include/asm/kvm_asm.h (+3, -3)
···
 extern char __kvm_hyp_vector[];

-#define	__kvm_hyp_code_start	__hyp_text_start
-#define	__kvm_hyp_code_end	__hyp_text_end
-
 extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
···
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);

 extern u64 __vgic_v3_get_ich_vtr_el2(void);
+extern void __vgic_v3_init_lrs(void);

 extern u32 __kvm_get_mdcr_el2(void);
+
+extern void __init_stage2_translation(void);

 #endif
arch/arm64/include/asm/kvm_emulate.h (+8)
···
 #include <asm/kvm_mmio.h>
 #include <asm/ptrace.h>
 #include <asm/cputype.h>
+#include <asm/virt.h>

 unsigned long *vcpu_reg32(const struct kvm_vcpu *vcpu, u8 reg_num);
 unsigned long *vcpu_spsr32(const struct kvm_vcpu *vcpu);
···
 static inline void vcpu_reset_hcr(struct kvm_vcpu *vcpu)
 {
	vcpu->arch.hcr_el2 = HCR_GUEST_FLAGS;
+	if (is_kernel_in_hyp_mode())
+		vcpu->arch.hcr_el2 |= HCR_E2H;
	if (test_bit(KVM_ARM_VCPU_EL1_32BIT, vcpu->arch.features))
		vcpu->arch.hcr_el2 &= ~HCR_RW;
 }
···
 static inline bool kvm_vcpu_dabt_iss1tw(const struct kvm_vcpu *vcpu)
 {
	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_S1PTW);
+}
+
+static inline bool kvm_vcpu_dabt_is_cm(const struct kvm_vcpu *vcpu)
+{
+	return !!(kvm_vcpu_get_hsr(vcpu) & ESR_ELx_CM);
 }

 static inline int kvm_vcpu_dabt_get_as(const struct kvm_vcpu *vcpu)
arch/arm64/include/asm/kvm_host.h (+33, -1)
···
 #include <linux/types.h>
 #include <linux/kvm_types.h>
 #include <asm/kvm.h>
+#include <asm/kvm_asm.h>
 #include <asm/kvm_mmio.h>
+#include <asm/kvm_perf_event.h>

 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
···
 #include <kvm/arm_vgic.h>
 #include <kvm/arm_arch_timer.h>
+#include <kvm/arm_pmu.h>

 #define KVM_MAX_VCPUS VGIC_V3_MAX_CPUS

-#define KVM_VCPU_MAX_FEATURES 3
+#define KVM_VCPU_MAX_FEATURES 4

 int __attribute_const__ kvm_target_cpu(void);
 int kvm_reset_vcpu(struct kvm_vcpu *vcpu);
···
	PAR_EL1,	/* Physical Address Register */
	MDSCR_EL1,	/* Monitor Debug System Control Register */
	MDCCINT_EL1,	/* Monitor Debug Comms Channel Interrupt Enable Reg */
+
+	/* Performance Monitors Registers */
+	PMCR_EL0,	/* Control Register */
+	PMSELR_EL0,	/* Event Counter Selection Register */
+	PMEVCNTR0_EL0,	/* Event Counter Register (0-30) */
+	PMEVCNTR30_EL0 = PMEVCNTR0_EL0 + 30,
+	PMCCNTR_EL0,	/* Cycle Counter Register */
+	PMEVTYPER0_EL0,	/* Event Type Register (0-30) */
+	PMEVTYPER30_EL0 = PMEVTYPER0_EL0 + 30,
+	PMCCFILTR_EL0,	/* Cycle Count Filter Register */
+	PMCNTENSET_EL0,	/* Count Enable Set Register */
+	PMINTENSET_EL1,	/* Interrupt Enable Set Register */
+	PMOVSSET_EL0,	/* Overflow Flag Status Set Register */
+	PMSWINC_EL0,	/* Software Increment Register */
+	PMUSERENR_EL0,	/* User Enable Register */

	/* 32bit specific registers. Keep them at the end of the range */
	DACR32_EL2,	/* Domain Access Control Register */
···
	/* VGIC state */
	struct vgic_cpu vgic_cpu;
	struct arch_timer_cpu timer_cpu;
+	struct kvm_pmu pmu;

	/*
	 * Anything that is not used directly from assembly code goes
···
 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu);
 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu);
+int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr);
+int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr);
+int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu,
+			       struct kvm_device_attr *attr);
+
+#define kvm_call_hyp(f, ...) __kvm_call_hyp(kvm_ksym_ref(f), ##__VA_ARGS__)
+
+static inline void __cpu_init_stage2(void)
+{
+	kvm_call_hyp(__init_stage2_translation);
+}

 #endif /* __ARM64_KVM_HOST_H__ */
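The PMU additions to the sysreg enum above deliberately keep the 31 event counters (and their type registers) contiguous, so counter *n* can be addressed as a base index plus *n*. A minimal model of that layout trick:

```c
#include <assert.h>

/* Model of the contiguous PMU slots in the vcpu sysreg enum above
 * (only a subset of the enumerators, same ordering). */
enum {
	PMCR_EL0,
	PMSELR_EL0,
	PMEVCNTR0_EL0,				/* counters 0..30 ... */
	PMEVCNTR30_EL0 = PMEVCNTR0_EL0 + 30,	/* ... are contiguous */
	PMCCNTR_EL0,
};

/* Index of event counter n in the register file, n = 0..30. */
static int pmevcntr_index(int n)
{
	return PMEVCNTR0_EL0 + n;
}
```

The same pattern lets a trap handler compute the slot for a `PMEVCNTRn_EL0` access arithmetically instead of with a 31-way switch.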
+181
arch/arm64/include/asm/kvm_hyp.h
··· 1 + /* 2 + * Copyright (C) 2015 - ARM Ltd 3 + * Author: Marc Zyngier <marc.zyngier@arm.com> 4 + * 5 + * This program is free software; you can redistribute it and/or modify 6 + * it under the terms of the GNU General Public License version 2 as 7 + * published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it will be useful, 10 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 + * GNU General Public License for more details. 13 + * 14 + * You should have received a copy of the GNU General Public License 15 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 + */ 17 + 18 + #ifndef __ARM64_KVM_HYP_H__ 19 + #define __ARM64_KVM_HYP_H__ 20 + 21 + #include <linux/compiler.h> 22 + #include <linux/kvm_host.h> 23 + #include <asm/kvm_mmu.h> 24 + #include <asm/kvm_perf_event.h> 25 + #include <asm/sysreg.h> 26 + 27 + #define __hyp_text __section(.hyp.text) notrace 28 + 29 + static inline unsigned long __kern_hyp_va(unsigned long v) 30 + { 31 + asm volatile(ALTERNATIVE("and %0, %0, %1", 32 + "nop", 33 + ARM64_HAS_VIRT_HOST_EXTN) 34 + : "+r" (v) : "i" (HYP_PAGE_OFFSET_MASK)); 35 + return v; 36 + } 37 + 38 + #define kern_hyp_va(v) (typeof(v))(__kern_hyp_va((unsigned long)(v))) 39 + 40 + static inline unsigned long __hyp_kern_va(unsigned long v) 41 + { 42 + u64 offset = PAGE_OFFSET - HYP_PAGE_OFFSET; 43 + asm volatile(ALTERNATIVE("add %0, %0, %1", 44 + "nop", 45 + ARM64_HAS_VIRT_HOST_EXTN) 46 + : "+r" (v) : "r" (offset)); 47 + return v; 48 + } 49 + 50 + #define hyp_kern_va(v) (typeof(v))(__hyp_kern_va((unsigned long)(v))) 51 + 52 + #define read_sysreg_elx(r,nvh,vh) \ 53 + ({ \ 54 + u64 reg; \ 55 + asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##nvh),\ 56 + "mrs_s %0, " __stringify(r##vh),\ 57 + ARM64_HAS_VIRT_HOST_EXTN) \ 58 + : "=r" (reg)); \ 59 + reg; \ 60 + }) 61 + 62 + #define write_sysreg_elx(v,r,nvh,vh) \ 63 + do { 
\ 64 + u64 __val = (u64)(v); \ 65 + asm volatile(ALTERNATIVE("msr " __stringify(r##nvh) ", %x0",\ 66 + "msr_s " __stringify(r##vh) ", %x0",\ 67 + ARM64_HAS_VIRT_HOST_EXTN) \ 68 + : : "rZ" (__val)); \ 69 + } while (0) 70 + 71 + /* 72 + * Unified accessors for registers that have a different encoding 73 + * between VHE and non-VHE. They must be specified without their "ELx" 74 + * encoding. 75 + */ 76 + #define read_sysreg_el2(r) \ 77 + ({ \ 78 + u64 reg; \ 79 + asm volatile(ALTERNATIVE("mrs %0, " __stringify(r##_EL2),\ 80 + "mrs %0, " __stringify(r##_EL1),\ 81 + ARM64_HAS_VIRT_HOST_EXTN) \ 82 + : "=r" (reg)); \ 83 + reg; \ 84 + }) 85 + 86 + #define write_sysreg_el2(v,r) \ 87 + do { \ 88 + u64 __val = (u64)(v); \ 89 + asm volatile(ALTERNATIVE("msr " __stringify(r##_EL2) ", %x0",\ 90 + "msr " __stringify(r##_EL1) ", %x0",\ 91 + ARM64_HAS_VIRT_HOST_EXTN) \ 92 + : : "rZ" (__val)); \ 93 + } while (0) 94 + 95 + #define read_sysreg_el0(r) read_sysreg_elx(r, _EL0, _EL02) 96 + #define write_sysreg_el0(v,r) write_sysreg_elx(v, r, _EL0, _EL02) 97 + #define read_sysreg_el1(r) read_sysreg_elx(r, _EL1, _EL12) 98 + #define write_sysreg_el1(v,r) write_sysreg_elx(v, r, _EL1, _EL12) 99 + 100 + /* The VHE specific system registers and their encoding */ 101 + #define sctlr_EL12 sys_reg(3, 5, 1, 0, 0) 102 + #define cpacr_EL12 sys_reg(3, 5, 1, 0, 2) 103 + #define ttbr0_EL12 sys_reg(3, 5, 2, 0, 0) 104 + #define ttbr1_EL12 sys_reg(3, 5, 2, 0, 1) 105 + #define tcr_EL12 sys_reg(3, 5, 2, 0, 2) 106 + #define afsr0_EL12 sys_reg(3, 5, 5, 1, 0) 107 + #define afsr1_EL12 sys_reg(3, 5, 5, 1, 1) 108 + #define esr_EL12 sys_reg(3, 5, 5, 2, 0) 109 + #define far_EL12 sys_reg(3, 5, 6, 0, 0) 110 + #define mair_EL12 sys_reg(3, 5, 10, 2, 0) 111 + #define amair_EL12 sys_reg(3, 5, 10, 3, 0) 112 + #define vbar_EL12 sys_reg(3, 5, 12, 0, 0) 113 + #define contextidr_EL12 sys_reg(3, 5, 13, 0, 1) 114 + #define cntkctl_EL12 sys_reg(3, 5, 14, 1, 0) 115 + #define cntp_tval_EL02 sys_reg(3, 5, 14, 2, 0) 116 + #define 
cntp_ctl_EL02 sys_reg(3, 5, 14, 2, 1) 117 + #define cntp_cval_EL02 sys_reg(3, 5, 14, 2, 2) 118 + #define cntv_tval_EL02 sys_reg(3, 5, 14, 3, 0) 119 + #define cntv_ctl_EL02 sys_reg(3, 5, 14, 3, 1) 120 + #define cntv_cval_EL02 sys_reg(3, 5, 14, 3, 2) 121 + #define spsr_EL12 sys_reg(3, 5, 4, 0, 0) 122 + #define elr_EL12 sys_reg(3, 5, 4, 0, 1) 123 + 124 + /** 125 + * hyp_alternate_select - Generates patchable code sequences that are 126 + * used to switch between two implementations of a function, depending 127 + * on the availability of a feature. 128 + * 129 + * @fname: a symbol name that will be defined as a function returning a 130 + * function pointer whose type will match @orig and @alt 131 + * @orig: A pointer to the default function, as returned by @fname when 132 + * @cond doesn't hold 133 + * @alt: A pointer to the alternate function, as returned by @fname 134 + * when @cond holds 135 + * @cond: a CPU feature (as described in asm/cpufeature.h) 136 + */ 137 + #define hyp_alternate_select(fname, orig, alt, cond) \ 138 + typeof(orig) * __hyp_text fname(void) \ 139 + { \ 140 + typeof(alt) *val = orig; \ 141 + asm volatile(ALTERNATIVE("nop \n", \ 142 + "mov %0, %1 \n", \ 143 + cond) \ 144 + : "+r" (val) : "r" (alt)); \ 145 + return val; \ 146 + } 147 + 148 + void __vgic_v2_save_state(struct kvm_vcpu *vcpu); 149 + void __vgic_v2_restore_state(struct kvm_vcpu *vcpu); 150 + 151 + void __vgic_v3_save_state(struct kvm_vcpu *vcpu); 152 + void __vgic_v3_restore_state(struct kvm_vcpu *vcpu); 153 + 154 + void __timer_save_state(struct kvm_vcpu *vcpu); 155 + void __timer_restore_state(struct kvm_vcpu *vcpu); 156 + 157 + void __sysreg_save_host_state(struct kvm_cpu_context *ctxt); 158 + void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt); 159 + void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt); 160 + void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt); 161 + void __sysreg32_save_state(struct kvm_vcpu *vcpu); 162 + void 
__sysreg32_restore_state(struct kvm_vcpu *vcpu); 163 + 164 + void __debug_save_state(struct kvm_vcpu *vcpu, 165 + struct kvm_guest_debug_arch *dbg, 166 + struct kvm_cpu_context *ctxt); 167 + void __debug_restore_state(struct kvm_vcpu *vcpu, 168 + struct kvm_guest_debug_arch *dbg, 169 + struct kvm_cpu_context *ctxt); 170 + void __debug_cond_save_host_state(struct kvm_vcpu *vcpu); 171 + void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu); 172 + 173 + void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); 174 + void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); 175 + bool __fpsimd_enabled(void); 176 + 177 + u64 __guest_enter(struct kvm_vcpu *vcpu, struct kvm_cpu_context *host_ctxt); 178 + void __noreturn __hyp_do_panic(unsigned long, ...); 179 + 180 + #endif /* __ARM64_KVM_HYP_H__ */ 181 +
+11 -1
arch/arm64/include/asm/kvm_mmu.h
··· 23 23 #include <asm/cpufeature.h> 24 24 25 25 /* 26 - * As we only have the TTBR0_EL2 register, we cannot express 26 + * As ARMv8.0 only has the TTBR0_EL2 register, we cannot express 27 27 * "negative" addresses. This makes it impossible to directly share 28 28 * mappings with the kernel. 29 29 * 30 30 * Instead, give the HYP mode its own VA region at a fixed offset from 31 31 * the kernel by just masking the top bits (which are all ones for a 32 32 * kernel address). 33 + * 34 + * ARMv8.1 (using VHE) does have a TTBR1_EL2, and doesn't use these 35 + * macros (the entire kernel runs at EL2). 33 36 */ 34 37 #define HYP_PAGE_OFFSET_SHIFT VA_BITS 35 38 #define HYP_PAGE_OFFSET_MASK ((UL(1) << HYP_PAGE_OFFSET_SHIFT) - 1) ··· 59 56 60 57 #ifdef __ASSEMBLY__ 61 58 59 + #include <asm/alternative.h> 60 + #include <asm/cpufeature.h> 61 + 62 62 /* 63 63 * Convert a kernel VA into a HYP VA. 64 64 * reg: VA to be converted. 65 65 */ 66 66 .macro kern_hyp_va reg 67 + alternative_if_not ARM64_HAS_VIRT_HOST_EXTN 67 68 and \reg, \reg, #HYP_PAGE_OFFSET_MASK 69 + alternative_else 70 + nop 71 + alternative_endif 68 72 .endm 69 73 70 74 #else
+68
arch/arm64/include/asm/kvm_perf_event.h
··· 1 + /* 2 + * Copyright (C) 2012 ARM Ltd. 3 + * 4 + * This program is free software; you can redistribute it and/or modify 5 + * it under the terms of the GNU General Public License version 2 as 6 + * published by the Free Software Foundation. 7 + * 8 + * This program is distributed in the hope that it will be useful, 9 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 10 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 11 + * GNU General Public License for more details. 12 + * 13 + * You should have received a copy of the GNU General Public License 14 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 15 + */ 16 + 17 + #ifndef __ASM_KVM_PERF_EVENT_H 18 + #define __ASM_KVM_PERF_EVENT_H 19 + 20 + #define ARMV8_PMU_MAX_COUNTERS 32 21 + #define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1) 22 + 23 + /* 24 + * Per-CPU PMCR: config reg 25 + */ 26 + #define ARMV8_PMU_PMCR_E (1 << 0) /* Enable all counters */ 27 + #define ARMV8_PMU_PMCR_P (1 << 1) /* Reset all counters */ 28 + #define ARMV8_PMU_PMCR_C (1 << 2) /* Cycle counter reset */ 29 + #define ARMV8_PMU_PMCR_D (1 << 3) /* CCNT counts every 64th cpu cycle */ 30 + #define ARMV8_PMU_PMCR_X (1 << 4) /* Export to ETM */ 31 + #define ARMV8_PMU_PMCR_DP (1 << 5) /* Disable CCNT if non-invasive debug*/ 32 + /* Determines which bit of PMCCNTR_EL0 generates an overflow */ 33 + #define ARMV8_PMU_PMCR_LC (1 << 6) 34 + #define ARMV8_PMU_PMCR_N_SHIFT 11 /* Number of counters supported */ 35 + #define ARMV8_PMU_PMCR_N_MASK 0x1f 36 + #define ARMV8_PMU_PMCR_MASK 0x7f /* Mask for writable bits */ 37 + 38 + /* 39 + * PMOVSR: counters overflow flag status reg 40 + */ 41 + #define ARMV8_PMU_OVSR_MASK 0xffffffff /* Mask for writable bits */ 42 + #define ARMV8_PMU_OVERFLOWED_MASK ARMV8_PMU_OVSR_MASK 43 + 44 + /* 45 + * PMXEVTYPER: Event selection reg 46 + */ 47 + #define ARMV8_PMU_EVTYPE_MASK 0xc80003ff /* Mask for writable bits */ 48 + #define ARMV8_PMU_EVTYPE_EVENT 0x3ff /* 
Mask for EVENT bits */ 49 + 50 + #define ARMV8_PMU_EVTYPE_EVENT_SW_INCR 0 /* Software increment event */ 51 + 52 + /* 53 + * Event filters for PMUv3 54 + */ 55 + #define ARMV8_PMU_EXCLUDE_EL1 (1 << 31) 56 + #define ARMV8_PMU_EXCLUDE_EL0 (1 << 30) 57 + #define ARMV8_PMU_INCLUDE_EL2 (1 << 27) 58 + 59 + /* 60 + * PMUSERENR: user enable reg 61 + */ 62 + #define ARMV8_PMU_USERENR_MASK 0xf /* Mask for writable bits */ 63 + #define ARMV8_PMU_USERENR_EN (1 << 0) /* PMU regs can be accessed at EL0 */ 64 + #define ARMV8_PMU_USERENR_SW (1 << 1) /* PMSWINC can be written at EL0 */ 65 + #define ARMV8_PMU_USERENR_CR (1 << 2) /* Cycle counter can be read at EL0 */ 66 + #define ARMV8_PMU_USERENR_ER (1 << 3) /* Event counter can be read at EL0 */ 67 + 68 + #endif
+10
arch/arm64/include/asm/virt.h
··· 23 23 24 24 #ifndef __ASSEMBLY__ 25 25 26 + #include <asm/ptrace.h> 27 + 26 28 /* 27 29 * __boot_cpu_mode records what mode CPUs were booted in. 28 30 * A correctly-implemented bootloader must start all CPUs in the same mode: ··· 50 48 static inline bool is_hyp_mode_mismatched(void) 51 49 { 52 50 return __boot_cpu_mode[0] != __boot_cpu_mode[1]; 51 + } 52 + 53 + static inline bool is_kernel_in_hyp_mode(void) 54 + { 55 + u64 el; 56 + 57 + asm("mrs %0, CurrentEL" : "=r" (el)); 58 + return el == CurrentEL_EL2; 53 59 } 54 60 55 61 /* The section containing the hypervisor text */
+6
arch/arm64/include/uapi/asm/kvm.h
··· 94 94 #define KVM_ARM_VCPU_POWER_OFF 0 /* CPU is started in OFF state */ 95 95 #define KVM_ARM_VCPU_EL1_32BIT 1 /* CPU running a 32bit VM */ 96 96 #define KVM_ARM_VCPU_PSCI_0_2 2 /* CPU uses PSCI v0.2 */ 97 + #define KVM_ARM_VCPU_PMU_V3 3 /* Support guest PMUv3 */ 97 98 98 99 struct kvm_vcpu_init { 99 100 __u32 target; ··· 204 203 #define KVM_DEV_ARM_VGIC_GRP_NR_IRQS 3 205 204 #define KVM_DEV_ARM_VGIC_GRP_CTRL 4 206 205 #define KVM_DEV_ARM_VGIC_CTRL_INIT 0 206 + 207 + /* Device Control API on vcpu fd */ 208 + #define KVM_ARM_VCPU_PMU_V3_CTRL 0 209 + #define KVM_ARM_VCPU_PMU_V3_IRQ 0 210 + #define KVM_ARM_VCPU_PMU_V3_INIT 1 207 211 208 212 /* KVM_IRQ_LINE irq field index values */ 209 213 #define KVM_ARM_IRQ_TYPE_SHIFT 24
-3
arch/arm64/kernel/asm-offsets.c
··· 110 110 DEFINE(CPU_USER_PT_REGS, offsetof(struct kvm_regs, regs)); 111 111 DEFINE(CPU_FP_REGS, offsetof(struct kvm_regs, fp_regs)); 112 112 DEFINE(VCPU_FPEXC32_EL2, offsetof(struct kvm_vcpu, arch.ctxt.sys_regs[FPEXC32_EL2])); 113 - DEFINE(VCPU_ESR_EL2, offsetof(struct kvm_vcpu, arch.fault.esr_el2)); 114 - DEFINE(VCPU_FAR_EL2, offsetof(struct kvm_vcpu, arch.fault.far_el2)); 115 - DEFINE(VCPU_HPFAR_EL2, offsetof(struct kvm_vcpu, arch.fault.hpfar_el2)); 116 113 DEFINE(VCPU_HOST_CONTEXT, offsetof(struct kvm_vcpu, arch.host_cpu_context)); 117 114 #endif 118 115 #ifdef CONFIG_CPU_PM
+11
arch/arm64/kernel/cpufeature.c
··· 26 26 #include <asm/cpu_ops.h> 27 27 #include <asm/processor.h> 28 28 #include <asm/sysreg.h> 29 + #include <asm/virt.h> 29 30 30 31 unsigned long elf_hwcap __read_mostly; 31 32 EXPORT_SYMBOL_GPL(elf_hwcap); ··· 622 621 return has_sre; 623 622 } 624 623 624 + static bool runs_at_el2(const struct arm64_cpu_capabilities *entry) 625 + { 626 + return is_kernel_in_hyp_mode(); 627 + } 628 + 625 629 static const struct arm64_cpu_capabilities arm64_features[] = { 626 630 { 627 631 .desc = "GIC system register CPU interface", ··· 657 651 .min_field_value = 2, 658 652 }, 659 653 #endif /* CONFIG_AS_LSE && CONFIG_ARM64_LSE_ATOMICS */ 654 + { 655 + .desc = "Virtualization Host Extensions", 656 + .capability = ARM64_HAS_VIRT_HOST_EXTN, 657 + .matches = runs_at_el2, 658 + }, 660 659 {}, 661 660 }; 662 661
+27 -1
arch/arm64/kernel/head.S
··· 30 30 #include <asm/cache.h> 31 31 #include <asm/cputype.h> 32 32 #include <asm/kernel-pgtable.h> 33 + #include <asm/kvm_arm.h> 33 34 #include <asm/memory.h> 34 35 #include <asm/pgtable-hwdef.h> 35 36 #include <asm/pgtable.h> ··· 465 464 isb 466 465 ret 467 466 467 + 2: 468 + #ifdef CONFIG_ARM64_VHE 469 + /* 470 + * Check for VHE being present. For the rest of the EL2 setup, 471 + * x2 being non-zero indicates that we do have VHE, and that the 472 + * kernel is intended to run at EL2. 473 + */ 474 + mrs x2, id_aa64mmfr1_el1 475 + ubfx x2, x2, #8, #4 476 + #else 477 + mov x2, xzr 478 + #endif 479 + 468 480 /* Hyp configuration. */ 469 - 2: mov x0, #(1 << 31) // 64-bit EL1 481 + mov x0, #HCR_RW // 64-bit EL1 482 + cbz x2, set_hcr 483 + orr x0, x0, #HCR_TGE // Enable Host Extensions 484 + orr x0, x0, #HCR_E2H 485 + set_hcr: 470 486 msr hcr_el2, x0 487 + isb 471 488 472 489 /* Generic timers. */ 473 490 mrs x0, cnthctl_el2 ··· 545 526 /* Stage-2 translation */ 546 527 msr vttbr_el2, xzr 547 528 529 + cbz x2, install_el2_stub 530 + 531 + mov w20, #BOOT_CPU_MODE_EL2 // This CPU booted in EL2 532 + isb 533 + ret 534 + 535 + install_el2_stub: 548 536 /* Hypervisor stub */ 549 537 adrp x0, __hyp_stub_vectors 550 538 add x0, x0, #:lo12:__hyp_stub_vectors
+5 -1
arch/arm64/kernel/perf_event.c
··· 20 20 */ 21 21 22 22 #include <asm/irq_regs.h> 23 + #include <asm/virt.h> 23 24 24 25 #include <linux/of.h> 25 26 #include <linux/perf/arm_pmu.h> ··· 692 691 693 692 if (attr->exclude_idle) 694 693 return -EPERM; 694 + if (is_kernel_in_hyp_mode() && 695 + attr->exclude_kernel != attr->exclude_hv) 696 + return -EINVAL; 695 697 if (attr->exclude_user) 696 698 config_base |= ARMV8_EXCLUDE_EL0; 697 - if (attr->exclude_kernel) 699 + if (!is_kernel_in_hyp_mode() && attr->exclude_kernel) 698 700 config_base |= ARMV8_EXCLUDE_EL1; 699 701 if (!attr->exclude_hv) 700 702 config_base |= ARMV8_INCLUDE_EL2;
+7
arch/arm64/kvm/Kconfig
··· 36 36 select HAVE_KVM_EVENTFD 37 37 select HAVE_KVM_IRQFD 38 38 select KVM_ARM_VGIC_V3 39 + select KVM_ARM_PMU if HW_PERF_EVENTS 39 40 ---help--- 40 41 Support hosting virtualized guest machines. 41 42 We don't support KVM with 16K page tables yet, due to the multiple ··· 48 47 bool 49 48 ---help--- 50 49 Provides host support for ARM processors. 50 + 51 + config KVM_ARM_PMU 52 + bool 53 + ---help--- 54 + Adds support for a virtual Performance Monitoring Unit (PMU) in 55 + virtual machines. 51 56 52 57 source drivers/vhost/Kconfig 53 58
+1
arch/arm64/kvm/Makefile
··· 26 26 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3.o 27 27 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/vgic-v3-emul.o 28 28 kvm-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/arch_timer.o 29 + kvm-$(CONFIG_KVM_ARM_PMU) += $(KVM)/arm/pmu.o
+51
arch/arm64/kvm/guest.c
··· 380 380 } 381 381 return 0; 382 382 } 383 + 384 + int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu, 385 + struct kvm_device_attr *attr) 386 + { 387 + int ret; 388 + 389 + switch (attr->group) { 390 + case KVM_ARM_VCPU_PMU_V3_CTRL: 391 + ret = kvm_arm_pmu_v3_set_attr(vcpu, attr); 392 + break; 393 + default: 394 + ret = -ENXIO; 395 + break; 396 + } 397 + 398 + return ret; 399 + } 400 + 401 + int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu, 402 + struct kvm_device_attr *attr) 403 + { 404 + int ret; 405 + 406 + switch (attr->group) { 407 + case KVM_ARM_VCPU_PMU_V3_CTRL: 408 + ret = kvm_arm_pmu_v3_get_attr(vcpu, attr); 409 + break; 410 + default: 411 + ret = -ENXIO; 412 + break; 413 + } 414 + 415 + return ret; 416 + } 417 + 418 + int kvm_arm_vcpu_arch_has_attr(struct kvm_vcpu *vcpu, 419 + struct kvm_device_attr *attr) 420 + { 421 + int ret; 422 + 423 + switch (attr->group) { 424 + case KVM_ARM_VCPU_PMU_V3_CTRL: 425 + ret = kvm_arm_pmu_v3_has_attr(vcpu, attr); 426 + break; 427 + default: 428 + ret = -ENXIO; 429 + break; 430 + } 431 + 432 + return ret; 433 + }
+1 -14
arch/arm64/kvm/hyp-init.S
··· 87 87 #endif 88 88 /* 89 89 * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS bits in 90 - * TCR_EL2 and VTCR_EL2. 90 + * TCR_EL2. 91 91 */ 92 92 mrs x5, ID_AA64MMFR0_EL1 93 93 bfi x4, x5, #16, #3 94 94 95 95 msr tcr_el2, x4 96 - 97 - ldr x4, =VTCR_EL2_FLAGS 98 - bfi x4, x5, #16, #3 99 - /* 100 - * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS bit in 101 - * VTCR_EL2. 102 - */ 103 - mrs x5, ID_AA64MMFR1_EL1 104 - ubfx x5, x5, #5, #1 105 - lsl x5, x5, #VTCR_EL2_VS 106 - orr x4, x4, x5 107 - 108 - msr vtcr_el2, x4 109 96 110 97 mrs x4, mair_el1 111 98 msr mair_el2, x4
+7
arch/arm64/kvm/hyp.S
··· 17 17 18 18 #include <linux/linkage.h> 19 19 20 + #include <asm/alternative.h> 20 21 #include <asm/assembler.h> 22 + #include <asm/cpufeature.h> 21 23 22 24 /* 23 25 * u64 kvm_call_hyp(void *hypfn, ...); ··· 40 38 * arch/arm64/kernel/hyp_stub.S. 41 39 */ 42 40 ENTRY(kvm_call_hyp) 41 + alternative_if_not ARM64_HAS_VIRT_HOST_EXTN 43 42 hvc #0 44 43 ret 44 + alternative_else 45 + b __vhe_hyp_call 46 + nop 47 + alternative_endif 45 48 ENDPROC(kvm_call_hyp)
+6 -2
arch/arm64/kvm/hyp/Makefile
··· 2 2 # Makefile for Kernel-based Virtual Machine module, HYP part 3 3 # 4 4 5 - obj-$(CONFIG_KVM_ARM_HOST) += vgic-v2-sr.o 5 + KVM=../../../../virt/kvm 6 + 7 + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/vgic-v2-sr.o 8 + obj-$(CONFIG_KVM_ARM_HOST) += $(KVM)/arm/hyp/timer-sr.o 9 + 6 10 obj-$(CONFIG_KVM_ARM_HOST) += vgic-v3-sr.o 7 - obj-$(CONFIG_KVM_ARM_HOST) += timer-sr.o 8 11 obj-$(CONFIG_KVM_ARM_HOST) += sysreg-sr.o 9 12 obj-$(CONFIG_KVM_ARM_HOST) += debug-sr.o 10 13 obj-$(CONFIG_KVM_ARM_HOST) += entry.o ··· 15 12 obj-$(CONFIG_KVM_ARM_HOST) += fpsimd.o 16 13 obj-$(CONFIG_KVM_ARM_HOST) += tlb.o 17 14 obj-$(CONFIG_KVM_ARM_HOST) += hyp-entry.o 15 + obj-$(CONFIG_KVM_ARM_HOST) += s2-setup.o
+1 -3
arch/arm64/kvm/hyp/debug-sr.c
··· 19 19 #include <linux/kvm_host.h> 20 20 21 21 #include <asm/kvm_asm.h> 22 - #include <asm/kvm_mmu.h> 23 - 24 - #include "hyp.h" 22 + #include <asm/kvm_hyp.h> 25 23 26 24 #define read_debug(r,n) read_sysreg(r##n##_el1) 27 25 #define write_debug(v,r,n) write_sysreg(v, r##n##_el1)
+6
arch/arm64/kvm/hyp/entry.S
··· 130 130 ENTRY(__fpsimd_guest_restore) 131 131 stp x4, lr, [sp, #-16]! 132 132 133 + alternative_if_not ARM64_HAS_VIRT_HOST_EXTN 133 134 mrs x2, cptr_el2 134 135 bic x2, x2, #CPTR_EL2_TFP 135 136 msr cptr_el2, x2 137 + alternative_else 138 + mrs x2, cpacr_el1 139 + orr x2, x2, #CPACR_EL1_FPEN 140 + msr cpacr_el1, x2 141 + alternative_endif 136 142 isb 137 143 138 144 mrs x3, tpidr_el2
+36 -73
arch/arm64/kvm/hyp/hyp-entry.S
··· 19 19 20 20 #include <asm/alternative.h> 21 21 #include <asm/assembler.h> 22 - #include <asm/asm-offsets.h> 23 22 #include <asm/cpufeature.h> 24 23 #include <asm/kvm_arm.h> 25 24 #include <asm/kvm_asm.h> ··· 37 38 ldp x0, x1, [sp], #16 38 39 .endm 39 40 41 + .macro do_el2_call 42 + /* 43 + * Shuffle the parameters before calling the function 44 + * pointed to in x0. Assumes parameters in x[1,2,3]. 45 + */ 46 + sub sp, sp, #16 47 + str lr, [sp] 48 + mov lr, x0 49 + mov x0, x1 50 + mov x1, x2 51 + mov x2, x3 52 + blr lr 53 + ldr lr, [sp] 54 + add sp, sp, #16 55 + .endm 56 + 57 + ENTRY(__vhe_hyp_call) 58 + do_el2_call 59 + /* 60 + * We used to rely on having an exception return to get 61 + * an implicit isb. In the E2H case, we don't have it anymore. 62 + * rather than changing all the leaf functions, just do it here 63 + * before returning to the rest of the kernel. 64 + */ 65 + isb 66 + ret 67 + ENDPROC(__vhe_hyp_call) 68 + 40 69 el1_sync: // Guest trapped into EL2 41 70 save_x0_to_x3 42 71 72 + alternative_if_not ARM64_HAS_VIRT_HOST_EXTN 43 73 mrs x1, esr_el2 74 + alternative_else 75 + mrs x1, esr_el1 76 + alternative_endif 44 77 lsr x2, x1, #ESR_ELx_EC_SHIFT 45 78 46 79 cmp x2, #ESR_ELx_EC_HVC64 ··· 89 58 mrs x0, vbar_el2 90 59 b 2f 91 60 92 - 1: stp lr, xzr, [sp, #-16]! 93 - 61 + 1: 94 62 /* 95 - * Compute the function address in EL2, and shuffle the parameters. 63 + * Perform the EL2 call 96 64 */ 97 65 kern_hyp_va x0 98 - mov lr, x0 99 - mov x0, x1 100 - mov x1, x2 101 - mov x2, x3 102 - blr lr 66 + do_el2_call 103 67 104 - ldp lr, xzr, [sp], #16 105 68 2: eret 106 69 107 70 el1_trap: ··· 108 83 cmp x2, #ESR_ELx_EC_FP_ASIMD 109 84 b.eq __fpsimd_guest_restore 110 85 111 - cmp x2, #ESR_ELx_EC_DABT_LOW 112 - mov x0, #ESR_ELx_EC_IABT_LOW 113 - ccmp x2, x0, #4, ne 114 - b.ne 1f // Not an abort we care about 115 - 116 - /* This is an abort. 
Check for permission fault */ 117 - alternative_if_not ARM64_WORKAROUND_834220 118 - and x2, x1, #ESR_ELx_FSC_TYPE 119 - cmp x2, #FSC_PERM 120 - b.ne 1f // Not a permission fault 121 - alternative_else 122 - nop // Use the permission fault path to 123 - nop // check for a valid S1 translation, 124 - nop // regardless of the ESR value. 125 - alternative_endif 126 - 127 - /* 128 - * Check for Stage-1 page table walk, which is guaranteed 129 - * to give a valid HPFAR_EL2. 130 - */ 131 - tbnz x1, #7, 1f // S1PTW is set 132 - 133 - /* Preserve PAR_EL1 */ 134 - mrs x3, par_el1 135 - stp x3, xzr, [sp, #-16]! 136 - 137 - /* 138 - * Permission fault, HPFAR_EL2 is invalid. 139 - * Resolve the IPA the hard way using the guest VA. 140 - * Stage-1 translation already validated the memory access rights. 141 - * As such, we can use the EL1 translation regime, and don't have 142 - * to distinguish between EL0 and EL1 access. 143 - */ 144 - mrs x2, far_el2 145 - at s1e1r, x2 146 - isb 147 - 148 - /* Read result */ 149 - mrs x3, par_el1 150 - ldp x0, xzr, [sp], #16 // Restore PAR_EL1 from the stack 151 - msr par_el1, x0 152 - tbnz x3, #0, 3f // Bail out if we failed the translation 153 - ubfx x3, x3, #12, #36 // Extract IPA 154 - lsl x3, x3, #4 // and present it like HPFAR 155 - b 2f 156 - 157 - 1: mrs x3, hpfar_el2 158 - mrs x2, far_el2 159 - 160 - 2: mrs x0, tpidr_el2 161 - str w1, [x0, #VCPU_ESR_EL2] 162 - str x2, [x0, #VCPU_FAR_EL2] 163 - str x3, [x0, #VCPU_HPFAR_EL2] 164 - 86 + mrs x0, tpidr_el2 165 87 mov x1, #ARM_EXCEPTION_TRAP 166 88 b __guest_exit 167 - 168 - /* 169 - * Translation failed. Just return to the guest and 170 - * let it fault again. Another CPU is probably playing 171 - * behind our back. 172 - */ 173 - 3: restore_x0_to_x3 174 - 175 - eret 176 89 177 90 el1_irq: 178 91 save_x0_to_x3
-90
arch/arm64/kvm/hyp/hyp.h
··· 1 - /* 2 - * Copyright (C) 2015 - ARM Ltd 3 - * Author: Marc Zyngier <marc.zyngier@arm.com> 4 - * 5 - * This program is free software; you can redistribute it and/or modify 6 - * it under the terms of the GNU General Public License version 2 as 7 - * published by the Free Software Foundation. 8 - * 9 - * This program is distributed in the hope that it will be useful, 10 - * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 - * GNU General Public License for more details. 13 - * 14 - * You should have received a copy of the GNU General Public License 15 - * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 - */ 17 - 18 - #ifndef __ARM64_KVM_HYP_H__ 19 - #define __ARM64_KVM_HYP_H__ 20 - 21 - #include <linux/compiler.h> 22 - #include <linux/kvm_host.h> 23 - #include <asm/kvm_mmu.h> 24 - #include <asm/sysreg.h> 25 - 26 - #define __hyp_text __section(.hyp.text) notrace 27 - 28 - #define kern_hyp_va(v) (typeof(v))((unsigned long)(v) & HYP_PAGE_OFFSET_MASK) 29 - #define hyp_kern_va(v) (typeof(v))((unsigned long)(v) - HYP_PAGE_OFFSET \ 30 - + PAGE_OFFSET) 31 - 32 - /** 33 - * hyp_alternate_select - Generates patchable code sequences that are 34 - * used to switch between two implementations of a function, depending 35 - * on the availability of a feature. 
36 - * 37 - * @fname: a symbol name that will be defined as a function returning a 38 - * function pointer whose type will match @orig and @alt 39 - * @orig: A pointer to the default function, as returned by @fname when 40 - * @cond doesn't hold 41 - * @alt: A pointer to the alternate function, as returned by @fname 42 - * when @cond holds 43 - * @cond: a CPU feature (as described in asm/cpufeature.h) 44 - */ 45 - #define hyp_alternate_select(fname, orig, alt, cond) \ 46 - typeof(orig) * __hyp_text fname(void) \ 47 - { \ 48 - typeof(alt) *val = orig; \ 49 - asm volatile(ALTERNATIVE("nop \n", \ 50 - "mov %0, %1 \n", \ 51 - cond) \ 52 - : "+r" (val) : "r" (alt)); \ 53 - return val; \ 54 - } 55 - 56 - void __vgic_v2_save_state(struct kvm_vcpu *vcpu); 57 - void __vgic_v2_restore_state(struct kvm_vcpu *vcpu); 58 - 59 - void __vgic_v3_save_state(struct kvm_vcpu *vcpu); 60 - void __vgic_v3_restore_state(struct kvm_vcpu *vcpu); 61 - 62 - void __timer_save_state(struct kvm_vcpu *vcpu); 63 - void __timer_restore_state(struct kvm_vcpu *vcpu); 64 - 65 - void __sysreg_save_state(struct kvm_cpu_context *ctxt); 66 - void __sysreg_restore_state(struct kvm_cpu_context *ctxt); 67 - void __sysreg32_save_state(struct kvm_vcpu *vcpu); 68 - void __sysreg32_restore_state(struct kvm_vcpu *vcpu); 69 - 70 - void __debug_save_state(struct kvm_vcpu *vcpu, 71 - struct kvm_guest_debug_arch *dbg, 72 - struct kvm_cpu_context *ctxt); 73 - void __debug_restore_state(struct kvm_vcpu *vcpu, 74 - struct kvm_guest_debug_arch *dbg, 75 - struct kvm_cpu_context *ctxt); 76 - void __debug_cond_save_host_state(struct kvm_vcpu *vcpu); 77 - void __debug_cond_restore_host_state(struct kvm_vcpu *vcpu); 78 - 79 - void __fpsimd_save_state(struct user_fpsimd_state *fp_regs); 80 - void __fpsimd_restore_state(struct user_fpsimd_state *fp_regs); 81 - static inline bool __fpsimd_enabled(void) 82 - { 83 - return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP); 84 - } 85 - 86 - u64 __guest_enter(struct kvm_vcpu *vcpu, struct 
kvm_cpu_context *host_ctxt); 87 - void __noreturn __hyp_do_panic(unsigned long, ...); 88 - 89 - #endif /* __ARM64_KVM_HYP_H__ */ 90 -
+43
arch/arm64/kvm/hyp/s2-setup.c
··· 1 + /* 2 + * Copyright (C) 2016 - ARM Ltd 3 + * Author: Marc Zyngier <marc.zyngier@arm.com> 4 + * 5 + * This program is free software; you can redistribute it and/or modify 6 + * it under the terms of the GNU General Public License version 2 as 7 + * published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it will be useful, 10 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 + * GNU General Public License for more details. 13 + * 14 + * You should have received a copy of the GNU General Public License 15 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 + */ 17 + 18 + #include <linux/types.h> 19 + #include <asm/kvm_arm.h> 20 + #include <asm/kvm_asm.h> 21 + #include <asm/kvm_hyp.h> 22 + 23 + void __hyp_text __init_stage2_translation(void) 24 + { 25 + u64 val = VTCR_EL2_FLAGS; 26 + u64 tmp; 27 + 28 + /* 29 + * Read the PARange bits from ID_AA64MMFR0_EL1 and set the PS 30 + * bits in VTCR_EL2. Amusingly, the PARange is 4 bits, while 31 + * PS is only 3. Fortunately, bit 19 is RES0 in VTCR_EL2... 32 + */ 33 + val |= (read_sysreg(id_aa64mmfr0_el1) & 7) << 16; 34 + 35 + /* 36 + * Read the VMIDBits bits from ID_AA64MMFR1_EL1 and set the VS 37 + * bit in VTCR_EL2. 38 + */ 39 + tmp = (read_sysreg(id_aa64mmfr1_el1) >> 4) & 0xf; 40 + val |= (tmp == 2) ? VTCR_EL2_VS : 0; 41 + 42 + write_sysreg(val, vtcr_el2); 43 + }
+187 -21
arch/arm64/kvm/hyp/switch.c
··· 15 15 * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 16 */ 17 17 18 - #include "hyp.h" 18 + #include <linux/types.h> 19 + #include <asm/kvm_asm.h> 20 + #include <asm/kvm_hyp.h> 21 + 22 + static bool __hyp_text __fpsimd_enabled_nvhe(void) 23 + { 24 + return !(read_sysreg(cptr_el2) & CPTR_EL2_TFP); 25 + } 26 + 27 + static bool __hyp_text __fpsimd_enabled_vhe(void) 28 + { 29 + return !!(read_sysreg(cpacr_el1) & CPACR_EL1_FPEN); 30 + } 31 + 32 + static hyp_alternate_select(__fpsimd_is_enabled, 33 + __fpsimd_enabled_nvhe, __fpsimd_enabled_vhe, 34 + ARM64_HAS_VIRT_HOST_EXTN); 35 + 36 + bool __hyp_text __fpsimd_enabled(void) 37 + { 38 + return __fpsimd_is_enabled()(); 39 + } 40 + 41 + static void __hyp_text __activate_traps_vhe(void) 42 + { 43 + u64 val; 44 + 45 + val = read_sysreg(cpacr_el1); 46 + val |= CPACR_EL1_TTA; 47 + val &= ~CPACR_EL1_FPEN; 48 + write_sysreg(val, cpacr_el1); 49 + 50 + write_sysreg(__kvm_hyp_vector, vbar_el1); 51 + } 52 + 53 + static void __hyp_text __activate_traps_nvhe(void) 54 + { 55 + u64 val; 56 + 57 + val = CPTR_EL2_DEFAULT; 58 + val |= CPTR_EL2_TTA | CPTR_EL2_TFP; 59 + write_sysreg(val, cptr_el2); 60 + } 61 + 62 + static hyp_alternate_select(__activate_traps_arch, 63 + __activate_traps_nvhe, __activate_traps_vhe, 64 + ARM64_HAS_VIRT_HOST_EXTN); 19 65 20 66 static void __hyp_text __activate_traps(struct kvm_vcpu *vcpu) 21 67 { ··· 82 36 write_sysreg(val, hcr_el2); 83 37 /* Trap on AArch32 cp15 c15 accesses (EL1 or EL0) */ 84 38 write_sysreg(1 << 15, hstr_el2); 85 - 86 - val = CPTR_EL2_DEFAULT; 87 - val |= CPTR_EL2_TTA | CPTR_EL2_TFP; 88 - write_sysreg(val, cptr_el2); 89 - 39 + /* Make sure we trap PMU access from EL0 to EL2 */ 40 + write_sysreg(ARMV8_PMU_USERENR_MASK, pmuserenr_el0); 90 41 write_sysreg(vcpu->arch.mdcr_el2, mdcr_el2); 42 + __activate_traps_arch()(); 91 43 } 44 + 45 + static void __hyp_text __deactivate_traps_vhe(void) 46 + { 47 + extern char vectors[]; /* kernel exception vectors */ 48 + 49 + 
write_sysreg(HCR_HOST_VHE_FLAGS, hcr_el2); 50 + write_sysreg(CPACR_EL1_FPEN, cpacr_el1); 51 + write_sysreg(vectors, vbar_el1); 52 + } 53 + 54 + static void __hyp_text __deactivate_traps_nvhe(void) 55 + { 56 + write_sysreg(HCR_RW, hcr_el2); 57 + write_sysreg(CPTR_EL2_DEFAULT, cptr_el2); 58 + } 59 + 60 + static hyp_alternate_select(__deactivate_traps_arch, 61 + __deactivate_traps_nvhe, __deactivate_traps_vhe, 62 + ARM64_HAS_VIRT_HOST_EXTN); 92 63 93 64 static void __hyp_text __deactivate_traps(struct kvm_vcpu *vcpu) 94 65 { 95 - write_sysreg(HCR_RW, hcr_el2); 66 + __deactivate_traps_arch()(); 96 67 write_sysreg(0, hstr_el2); 97 68 write_sysreg(read_sysreg(mdcr_el2) & MDCR_EL2_HPMN_MASK, mdcr_el2); 98 - write_sysreg(CPTR_EL2_DEFAULT, cptr_el2); 69 + write_sysreg(0, pmuserenr_el0); 99 70 } 100 71 101 72 static void __hyp_text __activate_vm(struct kvm_vcpu *vcpu) ··· 152 89 __vgic_call_restore_state()(vcpu); 153 90 } 154 91 92 + static bool __hyp_text __true_value(void) 93 + { 94 + return true; 95 + } 96 + 97 + static bool __hyp_text __false_value(void) 98 + { 99 + return false; 100 + } 101 + 102 + static hyp_alternate_select(__check_arm_834220, 103 + __false_value, __true_value, 104 + ARM64_WORKAROUND_834220); 105 + 106 + static bool __hyp_text __translate_far_to_hpfar(u64 far, u64 *hpfar) 107 + { 108 + u64 par, tmp; 109 + 110 + /* 111 + * Resolve the IPA the hard way using the guest VA. 112 + * 113 + * Stage-1 translation already validated the memory access 114 + * rights. As such, we can use the EL1 translation regime, and 115 + * don't have to distinguish between EL0 and EL1 access. 116 + * 117 + * We do need to save/restore PAR_EL1 though, as we haven't 118 + * saved the guest context yet, and we may return early... 
119 + */ 120 + par = read_sysreg(par_el1); 121 + asm volatile("at s1e1r, %0" : : "r" (far)); 122 + isb(); 123 + 124 + tmp = read_sysreg(par_el1); 125 + write_sysreg(par, par_el1); 126 + 127 + if (unlikely(tmp & 1)) 128 + return false; /* Translation failed, back to guest */ 129 + 130 + /* Convert PAR to HPFAR format */ 131 + *hpfar = ((tmp >> 12) & ((1UL << 36) - 1)) << 4; 132 + return true; 133 + } 134 + 135 + static bool __hyp_text __populate_fault_info(struct kvm_vcpu *vcpu) 136 + { 137 + u64 esr = read_sysreg_el2(esr); 138 + u8 ec = esr >> ESR_ELx_EC_SHIFT; 139 + u64 hpfar, far; 140 + 141 + vcpu->arch.fault.esr_el2 = esr; 142 + 143 + if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW) 144 + return true; 145 + 146 + far = read_sysreg_el2(far); 147 + 148 + /* 149 + * The HPFAR can be invalid if the stage 2 fault did not 150 + * happen during a stage 1 page table walk (the ESR_EL2.S1PTW 151 + * bit is clear) and one of the two following cases are true: 152 + * 1. The fault was due to a permission fault 153 + * 2. The processor carries errata 834220 154 + * 155 + * Therefore, for all non S1PTW faults where we either have a 156 + * permission fault or the errata workaround is enabled, we 157 + * resolve the IPA using the AT instruction. 
158 + */ 159 + if (!(esr & ESR_ELx_S1PTW) && 160 + (__check_arm_834220()() || (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) { 161 + if (!__translate_far_to_hpfar(far, &hpfar)) 162 + return false; 163 + } else { 164 + hpfar = read_sysreg(hpfar_el2); 165 + } 166 + 167 + vcpu->arch.fault.far_el2 = far; 168 + vcpu->arch.fault.hpfar_el2 = hpfar; 169 + return true; 170 + } 171 + 155 172 static int __hyp_text __guest_run(struct kvm_vcpu *vcpu) 156 173 { 157 174 struct kvm_cpu_context *host_ctxt; ··· 245 102 host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); 246 103 guest_ctxt = &vcpu->arch.ctxt; 247 104 248 - __sysreg_save_state(host_ctxt); 105 + __sysreg_save_host_state(host_ctxt); 249 106 __debug_cond_save_host_state(vcpu); 250 107 251 108 __activate_traps(vcpu); ··· 259 116 * to Cortex-A57 erratum #852523. 260 117 */ 261 118 __sysreg32_restore_state(vcpu); 262 - __sysreg_restore_state(guest_ctxt); 119 + __sysreg_restore_guest_state(guest_ctxt); 263 120 __debug_restore_state(vcpu, kern_hyp_va(vcpu->arch.debug_ptr), guest_ctxt); 264 121 265 122 /* Jump in the fire! */ 123 + again: 266 124 exit_code = __guest_enter(vcpu, host_ctxt); 267 125 /* And we're baaack! 
*/ 268 126 127 + if (exit_code == ARM_EXCEPTION_TRAP && !__populate_fault_info(vcpu)) 128 + goto again; 129 + 269 130 fp_enabled = __fpsimd_enabled(); 270 131 271 - __sysreg_save_state(guest_ctxt); 132 + __sysreg_save_guest_state(guest_ctxt); 272 133 __sysreg32_save_state(vcpu); 273 134 __timer_save_state(vcpu); 274 135 __vgic_save_state(vcpu); ··· 280 133 __deactivate_traps(vcpu); 281 134 __deactivate_vm(vcpu); 282 135 283 - __sysreg_restore_state(host_ctxt); 136 + __sysreg_restore_host_state(host_ctxt); 284 137 285 138 if (fp_enabled) { 286 139 __fpsimd_save_state(&guest_ctxt->gp_regs.fp_regs); ··· 297 150 298 151 static const char __hyp_panic_string[] = "HYP panic:\nPS:%08llx PC:%016llx ESR:%08llx\nFAR:%016llx HPFAR:%016llx PAR:%016llx\nVCPU:%p\n"; 299 152 300 - void __hyp_text __noreturn __hyp_panic(void) 153 + static void __hyp_text __hyp_call_panic_nvhe(u64 spsr, u64 elr, u64 par) 301 154 { 302 155 unsigned long str_va = (unsigned long)__hyp_panic_string; 303 - u64 spsr = read_sysreg(spsr_el2); 304 - u64 elr = read_sysreg(elr_el2); 156 + 157 + __hyp_do_panic(hyp_kern_va(str_va), 158 + spsr, elr, 159 + read_sysreg(esr_el2), read_sysreg_el2(far), 160 + read_sysreg(hpfar_el2), par, 161 + (void *)read_sysreg(tpidr_el2)); 162 + } 163 + 164 + static void __hyp_text __hyp_call_panic_vhe(u64 spsr, u64 elr, u64 par) 165 + { 166 + panic(__hyp_panic_string, 167 + spsr, elr, 168 + read_sysreg_el2(esr), read_sysreg_el2(far), 169 + read_sysreg(hpfar_el2), par, 170 + (void *)read_sysreg(tpidr_el2)); 171 + } 172 + 173 + static hyp_alternate_select(__hyp_call_panic, 174 + __hyp_call_panic_nvhe, __hyp_call_panic_vhe, 175 + ARM64_HAS_VIRT_HOST_EXTN); 176 + 177 + void __hyp_text __noreturn __hyp_panic(void) 178 + { 179 + u64 spsr = read_sysreg_el2(spsr); 180 + u64 elr = read_sysreg_el2(elr); 305 181 u64 par = read_sysreg(par_el1); 306 182 307 183 if (read_sysreg(vttbr_el2)) { ··· 335 165 host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context); 336 166 __deactivate_traps(vcpu); 337 
167 __deactivate_vm(vcpu); 338 - __sysreg_restore_state(host_ctxt); 168 + __sysreg_restore_host_state(host_ctxt); 339 169 } 340 170 341 171 /* Call panic for real */ 342 - __hyp_do_panic(hyp_kern_va(str_va), 343 - spsr, elr, 344 - read_sysreg(esr_el2), read_sysreg(far_el2), 345 - read_sysreg(hpfar_el2), par, 346 - (void *)read_sysreg(tpidr_el2)); 172 + __hyp_call_panic()(spsr, elr, par); 347 173 348 174 unreachable(); 349 175 }
+97 -50
arch/arm64/kvm/hyp/sysreg-sr.c
··· 19 19 #include <linux/kvm_host.h> 20 20 21 21 #include <asm/kvm_asm.h> 22 - #include <asm/kvm_mmu.h> 22 + #include <asm/kvm_hyp.h> 23 23 24 - #include "hyp.h" 24 + /* Yes, this does nothing, on purpose */ 25 + static void __hyp_text __sysreg_do_nothing(struct kvm_cpu_context *ctxt) { } 25 26 26 - /* ctxt is already in the HYP VA space */ 27 - void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) 27 + /* 28 + * Non-VHE: Both host and guest must save everything. 29 + * 30 + * VHE: Host must save tpidr*_el[01], actlr_el1, sp0, pc, pstate, and 31 + * guest must save everything. 32 + */ 33 + 34 + static void __hyp_text __sysreg_save_common_state(struct kvm_cpu_context *ctxt) 28 35 { 29 - ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2); 30 - ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1); 31 - ctxt->sys_regs[SCTLR_EL1] = read_sysreg(sctlr_el1); 32 36 ctxt->sys_regs[ACTLR_EL1] = read_sysreg(actlr_el1); 33 - ctxt->sys_regs[CPACR_EL1] = read_sysreg(cpacr_el1); 34 - ctxt->sys_regs[TTBR0_EL1] = read_sysreg(ttbr0_el1); 35 - ctxt->sys_regs[TTBR1_EL1] = read_sysreg(ttbr1_el1); 36 - ctxt->sys_regs[TCR_EL1] = read_sysreg(tcr_el1); 37 - ctxt->sys_regs[ESR_EL1] = read_sysreg(esr_el1); 38 - ctxt->sys_regs[AFSR0_EL1] = read_sysreg(afsr0_el1); 39 - ctxt->sys_regs[AFSR1_EL1] = read_sysreg(afsr1_el1); 40 - ctxt->sys_regs[FAR_EL1] = read_sysreg(far_el1); 41 - ctxt->sys_regs[MAIR_EL1] = read_sysreg(mair_el1); 42 - ctxt->sys_regs[VBAR_EL1] = read_sysreg(vbar_el1); 43 - ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg(contextidr_el1); 44 37 ctxt->sys_regs[TPIDR_EL0] = read_sysreg(tpidr_el0); 45 38 ctxt->sys_regs[TPIDRRO_EL0] = read_sysreg(tpidrro_el0); 46 39 ctxt->sys_regs[TPIDR_EL1] = read_sysreg(tpidr_el1); 47 - ctxt->sys_regs[AMAIR_EL1] = read_sysreg(amair_el1); 48 - ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg(cntkctl_el1); 40 + ctxt->gp_regs.regs.sp = read_sysreg(sp_el0); 41 + ctxt->gp_regs.regs.pc = read_sysreg_el2(elr); 42 + ctxt->gp_regs.regs.pstate = 
read_sysreg_el2(spsr); 43 + } 44 + 45 + static void __hyp_text __sysreg_save_state(struct kvm_cpu_context *ctxt) 46 + { 47 + ctxt->sys_regs[MPIDR_EL1] = read_sysreg(vmpidr_el2); 48 + ctxt->sys_regs[CSSELR_EL1] = read_sysreg(csselr_el1); 49 + ctxt->sys_regs[SCTLR_EL1] = read_sysreg_el1(sctlr); 50 + ctxt->sys_regs[CPACR_EL1] = read_sysreg_el1(cpacr); 51 + ctxt->sys_regs[TTBR0_EL1] = read_sysreg_el1(ttbr0); 52 + ctxt->sys_regs[TTBR1_EL1] = read_sysreg_el1(ttbr1); 53 + ctxt->sys_regs[TCR_EL1] = read_sysreg_el1(tcr); 54 + ctxt->sys_regs[ESR_EL1] = read_sysreg_el1(esr); 55 + ctxt->sys_regs[AFSR0_EL1] = read_sysreg_el1(afsr0); 56 + ctxt->sys_regs[AFSR1_EL1] = read_sysreg_el1(afsr1); 57 + ctxt->sys_regs[FAR_EL1] = read_sysreg_el1(far); 58 + ctxt->sys_regs[MAIR_EL1] = read_sysreg_el1(mair); 59 + ctxt->sys_regs[VBAR_EL1] = read_sysreg_el1(vbar); 60 + ctxt->sys_regs[CONTEXTIDR_EL1] = read_sysreg_el1(contextidr); 61 + ctxt->sys_regs[AMAIR_EL1] = read_sysreg_el1(amair); 62 + ctxt->sys_regs[CNTKCTL_EL1] = read_sysreg_el1(cntkctl); 49 63 ctxt->sys_regs[PAR_EL1] = read_sysreg(par_el1); 50 64 ctxt->sys_regs[MDSCR_EL1] = read_sysreg(mdscr_el1); 51 65 52 - ctxt->gp_regs.regs.sp = read_sysreg(sp_el0); 53 - ctxt->gp_regs.regs.pc = read_sysreg(elr_el2); 54 - ctxt->gp_regs.regs.pstate = read_sysreg(spsr_el2); 55 66 ctxt->gp_regs.sp_el1 = read_sysreg(sp_el1); 56 - ctxt->gp_regs.elr_el1 = read_sysreg(elr_el1); 57 - ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg(spsr_el1); 67 + ctxt->gp_regs.elr_el1 = read_sysreg_el1(elr); 68 + ctxt->gp_regs.spsr[KVM_SPSR_EL1]= read_sysreg_el1(spsr); 58 69 } 59 70 60 - void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt) 71 + static hyp_alternate_select(__sysreg_call_save_host_state, 72 + __sysreg_save_state, __sysreg_do_nothing, 73 + ARM64_HAS_VIRT_HOST_EXTN); 74 + 75 + void __hyp_text __sysreg_save_host_state(struct kvm_cpu_context *ctxt) 61 76 { 62 - write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2); 63 - 
write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1); 64 - write_sysreg(ctxt->sys_regs[SCTLR_EL1], sctlr_el1); 77 + __sysreg_call_save_host_state()(ctxt); 78 + __sysreg_save_common_state(ctxt); 79 + } 80 + 81 + void __hyp_text __sysreg_save_guest_state(struct kvm_cpu_context *ctxt) 82 + { 83 + __sysreg_save_state(ctxt); 84 + __sysreg_save_common_state(ctxt); 85 + } 86 + 87 + static void __hyp_text __sysreg_restore_common_state(struct kvm_cpu_context *ctxt) 88 + { 65 89 write_sysreg(ctxt->sys_regs[ACTLR_EL1], actlr_el1); 66 - write_sysreg(ctxt->sys_regs[CPACR_EL1], cpacr_el1); 67 - write_sysreg(ctxt->sys_regs[TTBR0_EL1], ttbr0_el1); 68 - write_sysreg(ctxt->sys_regs[TTBR1_EL1], ttbr1_el1); 69 - write_sysreg(ctxt->sys_regs[TCR_EL1], tcr_el1); 70 - write_sysreg(ctxt->sys_regs[ESR_EL1], esr_el1); 71 - write_sysreg(ctxt->sys_regs[AFSR0_EL1], afsr0_el1); 72 - write_sysreg(ctxt->sys_regs[AFSR1_EL1], afsr1_el1); 73 - write_sysreg(ctxt->sys_regs[FAR_EL1], far_el1); 74 - write_sysreg(ctxt->sys_regs[MAIR_EL1], mair_el1); 75 - write_sysreg(ctxt->sys_regs[VBAR_EL1], vbar_el1); 76 - write_sysreg(ctxt->sys_regs[CONTEXTIDR_EL1], contextidr_el1); 77 90 write_sysreg(ctxt->sys_regs[TPIDR_EL0], tpidr_el0); 78 91 write_sysreg(ctxt->sys_regs[TPIDRRO_EL0], tpidrro_el0); 79 92 write_sysreg(ctxt->sys_regs[TPIDR_EL1], tpidr_el1); 80 - write_sysreg(ctxt->sys_regs[AMAIR_EL1], amair_el1); 81 - write_sysreg(ctxt->sys_regs[CNTKCTL_EL1], cntkctl_el1); 82 - write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1); 83 - write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1); 93 + write_sysreg(ctxt->gp_regs.regs.sp, sp_el0); 94 + write_sysreg_el2(ctxt->gp_regs.regs.pc, elr); 95 + write_sysreg_el2(ctxt->gp_regs.regs.pstate, spsr); 96 + } 84 97 85 - write_sysreg(ctxt->gp_regs.regs.sp, sp_el0); 86 - write_sysreg(ctxt->gp_regs.regs.pc, elr_el2); 87 - write_sysreg(ctxt->gp_regs.regs.pstate, spsr_el2); 88 - write_sysreg(ctxt->gp_regs.sp_el1, sp_el1); 89 - write_sysreg(ctxt->gp_regs.elr_el1, elr_el1); 90 - 
write_sysreg(ctxt->gp_regs.spsr[KVM_SPSR_EL1], spsr_el1); 98 + static void __hyp_text __sysreg_restore_state(struct kvm_cpu_context *ctxt) 99 + { 100 + write_sysreg(ctxt->sys_regs[MPIDR_EL1], vmpidr_el2); 101 + write_sysreg(ctxt->sys_regs[CSSELR_EL1], csselr_el1); 102 + write_sysreg_el1(ctxt->sys_regs[SCTLR_EL1], sctlr); 103 + write_sysreg_el1(ctxt->sys_regs[CPACR_EL1], cpacr); 104 + write_sysreg_el1(ctxt->sys_regs[TTBR0_EL1], ttbr0); 105 + write_sysreg_el1(ctxt->sys_regs[TTBR1_EL1], ttbr1); 106 + write_sysreg_el1(ctxt->sys_regs[TCR_EL1], tcr); 107 + write_sysreg_el1(ctxt->sys_regs[ESR_EL1], esr); 108 + write_sysreg_el1(ctxt->sys_regs[AFSR0_EL1], afsr0); 109 + write_sysreg_el1(ctxt->sys_regs[AFSR1_EL1], afsr1); 110 + write_sysreg_el1(ctxt->sys_regs[FAR_EL1], far); 111 + write_sysreg_el1(ctxt->sys_regs[MAIR_EL1], mair); 112 + write_sysreg_el1(ctxt->sys_regs[VBAR_EL1], vbar); 113 + write_sysreg_el1(ctxt->sys_regs[CONTEXTIDR_EL1],contextidr); 114 + write_sysreg_el1(ctxt->sys_regs[AMAIR_EL1], amair); 115 + write_sysreg_el1(ctxt->sys_regs[CNTKCTL_EL1], cntkctl); 116 + write_sysreg(ctxt->sys_regs[PAR_EL1], par_el1); 117 + write_sysreg(ctxt->sys_regs[MDSCR_EL1], mdscr_el1); 118 + 119 + write_sysreg(ctxt->gp_regs.sp_el1, sp_el1); 120 + write_sysreg_el1(ctxt->gp_regs.elr_el1, elr); 121 + write_sysreg_el1(ctxt->gp_regs.spsr[KVM_SPSR_EL1],spsr); 122 + } 123 + 124 + static hyp_alternate_select(__sysreg_call_restore_host_state, 125 + __sysreg_restore_state, __sysreg_do_nothing, 126 + ARM64_HAS_VIRT_HOST_EXTN); 127 + 128 + void __hyp_text __sysreg_restore_host_state(struct kvm_cpu_context *ctxt) 129 + { 130 + __sysreg_call_restore_host_state()(ctxt); 131 + __sysreg_restore_common_state(ctxt); 132 + } 133 + 134 + void __hyp_text __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt) 135 + { 136 + __sysreg_restore_state(ctxt); 137 + __sysreg_restore_common_state(ctxt); 91 138 } 92 139 93 140 void __hyp_text __sysreg32_save_state(struct kvm_vcpu *vcpu)
+6 -8
arch/arm64/kvm/hyp/timer-sr.c → virt/kvm/arm/hyp/timer-sr.c
··· 19 19 #include <linux/compiler.h> 20 20 #include <linux/kvm_host.h> 21 21 22 - #include <asm/kvm_mmu.h> 23 - 24 - #include "hyp.h" 22 + #include <asm/kvm_hyp.h> 25 23 26 24 /* vcpu is already in the HYP VA space */ 27 25 void __hyp_text __timer_save_state(struct kvm_vcpu *vcpu) ··· 29 31 u64 val; 30 32 31 33 if (kvm->arch.timer.enabled) { 32 - timer->cntv_ctl = read_sysreg(cntv_ctl_el0); 33 - timer->cntv_cval = read_sysreg(cntv_cval_el0); 34 + timer->cntv_ctl = read_sysreg_el0(cntv_ctl); 35 + timer->cntv_cval = read_sysreg_el0(cntv_cval); 34 36 } 35 37 36 38 /* Disable the virtual timer */ 37 - write_sysreg(0, cntv_ctl_el0); 39 + write_sysreg_el0(0, cntv_ctl); 38 40 39 41 /* Allow physical timer/counter access for the host */ 40 42 val = read_sysreg(cnthctl_el2); ··· 62 64 63 65 if (kvm->arch.timer.enabled) { 64 66 write_sysreg(kvm->arch.timer.cntvoff, cntvoff_el2); 65 - write_sysreg(timer->cntv_cval, cntv_cval_el0); 67 + write_sysreg_el0(timer->cntv_cval, cntv_cval); 66 68 isb(); 67 - write_sysreg(timer->cntv_ctl, cntv_ctl_el0); 69 + write_sysreg_el0(timer->cntv_ctl, cntv_ctl); 68 70 } 69 71 }
+1 -1
arch/arm64/kvm/hyp/tlb.c
··· 15 15 * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 16 */ 17 17 18 - #include "hyp.h" 18 + #include <asm/kvm_hyp.h> 19 19 20 20 static void __hyp_text __tlb_flush_vmid_ipa(struct kvm *kvm, phys_addr_t ipa) 21 21 {
-84
arch/arm64/kvm/hyp/vgic-v2-sr.c
··· 1 - /* 2 - * Copyright (C) 2012-2015 - ARM Ltd 3 - * Author: Marc Zyngier <marc.zyngier@arm.com> 4 - * 5 - * This program is free software; you can redistribute it and/or modify 6 - * it under the terms of the GNU General Public License version 2 as 7 - * published by the Free Software Foundation. 8 - * 9 - * This program is distributed in the hope that it will be useful, 10 - * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 - * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 - * GNU General Public License for more details. 13 - * 14 - * You should have received a copy of the GNU General Public License 15 - * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 - */ 17 - 18 - #include <linux/compiler.h> 19 - #include <linux/irqchip/arm-gic.h> 20 - #include <linux/kvm_host.h> 21 - 22 - #include <asm/kvm_mmu.h> 23 - 24 - #include "hyp.h" 25 - 26 - /* vcpu is already in the HYP VA space */ 27 - void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu) 28 - { 29 - struct kvm *kvm = kern_hyp_va(vcpu->kvm); 30 - struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 31 - struct vgic_dist *vgic = &kvm->arch.vgic; 32 - void __iomem *base = kern_hyp_va(vgic->vctrl_base); 33 - u32 eisr0, eisr1, elrsr0, elrsr1; 34 - int i, nr_lr; 35 - 36 - if (!base) 37 - return; 38 - 39 - nr_lr = vcpu->arch.vgic_cpu.nr_lr; 40 - cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR); 41 - cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR); 42 - eisr0 = readl_relaxed(base + GICH_EISR0); 43 - elrsr0 = readl_relaxed(base + GICH_ELRSR0); 44 - if (unlikely(nr_lr > 32)) { 45 - eisr1 = readl_relaxed(base + GICH_EISR1); 46 - elrsr1 = readl_relaxed(base + GICH_ELRSR1); 47 - } else { 48 - eisr1 = elrsr1 = 0; 49 - } 50 - #ifdef CONFIG_CPU_BIG_ENDIAN 51 - cpu_if->vgic_eisr = ((u64)eisr0 << 32) | eisr1; 52 - cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1; 53 - #else 54 - cpu_if->vgic_eisr = ((u64)eisr1 << 32) | eisr0; 55 - 
cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0; 56 - #endif 57 - cpu_if->vgic_apr = readl_relaxed(base + GICH_APR); 58 - 59 - writel_relaxed(0, base + GICH_HCR); 60 - 61 - for (i = 0; i < nr_lr; i++) 62 - cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4)); 63 - } 64 - 65 - /* vcpu is already in the HYP VA space */ 66 - void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu) 67 - { 68 - struct kvm *kvm = kern_hyp_va(vcpu->kvm); 69 - struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 70 - struct vgic_dist *vgic = &kvm->arch.vgic; 71 - void __iomem *base = kern_hyp_va(vgic->vctrl_base); 72 - int i, nr_lr; 73 - 74 - if (!base) 75 - return; 76 - 77 - writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR); 78 - writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR); 79 - writel_relaxed(cpu_if->vgic_apr, base + GICH_APR); 80 - 81 - nr_lr = vcpu->arch.vgic_cpu.nr_lr; 82 - for (i = 0; i < nr_lr; i++) 83 - writel_relaxed(cpu_if->vgic_lr[i], base + GICH_LR0 + (i * 4)); 84 - }
+225 -116
arch/arm64/kvm/hyp/vgic-v3-sr.c
··· 19 19 #include <linux/irqchip/arm-gic-v3.h> 20 20 #include <linux/kvm_host.h> 21 21 22 - #include <asm/kvm_mmu.h> 23 - 24 - #include "hyp.h" 22 + #include <asm/kvm_hyp.h> 25 23 26 24 #define vtr_to_max_lr_idx(v) ((v) & 0xf) 27 25 #define vtr_to_nr_pri_bits(v) (((u32)(v) >> 29) + 1) ··· 37 39 asm volatile("msr_s " __stringify(r) ", %0" : : "r" (__val));\ 38 40 } while (0) 39 41 40 - /* vcpu is already in the HYP VA space */ 42 + static u64 __hyp_text __gic_v3_get_lr(unsigned int lr) 43 + { 44 + switch (lr & 0xf) { 45 + case 0: 46 + return read_gicreg(ICH_LR0_EL2); 47 + case 1: 48 + return read_gicreg(ICH_LR1_EL2); 49 + case 2: 50 + return read_gicreg(ICH_LR2_EL2); 51 + case 3: 52 + return read_gicreg(ICH_LR3_EL2); 53 + case 4: 54 + return read_gicreg(ICH_LR4_EL2); 55 + case 5: 56 + return read_gicreg(ICH_LR5_EL2); 57 + case 6: 58 + return read_gicreg(ICH_LR6_EL2); 59 + case 7: 60 + return read_gicreg(ICH_LR7_EL2); 61 + case 8: 62 + return read_gicreg(ICH_LR8_EL2); 63 + case 9: 64 + return read_gicreg(ICH_LR9_EL2); 65 + case 10: 66 + return read_gicreg(ICH_LR10_EL2); 67 + case 11: 68 + return read_gicreg(ICH_LR11_EL2); 69 + case 12: 70 + return read_gicreg(ICH_LR12_EL2); 71 + case 13: 72 + return read_gicreg(ICH_LR13_EL2); 73 + case 14: 74 + return read_gicreg(ICH_LR14_EL2); 75 + case 15: 76 + return read_gicreg(ICH_LR15_EL2); 77 + } 78 + 79 + unreachable(); 80 + } 81 + 82 + static void __hyp_text __gic_v3_set_lr(u64 val, int lr) 83 + { 84 + switch (lr & 0xf) { 85 + case 0: 86 + write_gicreg(val, ICH_LR0_EL2); 87 + break; 88 + case 1: 89 + write_gicreg(val, ICH_LR1_EL2); 90 + break; 91 + case 2: 92 + write_gicreg(val, ICH_LR2_EL2); 93 + break; 94 + case 3: 95 + write_gicreg(val, ICH_LR3_EL2); 96 + break; 97 + case 4: 98 + write_gicreg(val, ICH_LR4_EL2); 99 + break; 100 + case 5: 101 + write_gicreg(val, ICH_LR5_EL2); 102 + break; 103 + case 6: 104 + write_gicreg(val, ICH_LR6_EL2); 105 + break; 106 + case 7: 107 + write_gicreg(val, ICH_LR7_EL2); 108 + break; 109 + 
case 8: 110 + write_gicreg(val, ICH_LR8_EL2); 111 + break; 112 + case 9: 113 + write_gicreg(val, ICH_LR9_EL2); 114 + break; 115 + case 10: 116 + write_gicreg(val, ICH_LR10_EL2); 117 + break; 118 + case 11: 119 + write_gicreg(val, ICH_LR11_EL2); 120 + break; 121 + case 12: 122 + write_gicreg(val, ICH_LR12_EL2); 123 + break; 124 + case 13: 125 + write_gicreg(val, ICH_LR13_EL2); 126 + break; 127 + case 14: 128 + write_gicreg(val, ICH_LR14_EL2); 129 + break; 130 + case 15: 131 + write_gicreg(val, ICH_LR15_EL2); 132 + break; 133 + } 134 + } 135 + 136 + static void __hyp_text save_maint_int_state(struct kvm_vcpu *vcpu, int nr_lr) 137 + { 138 + struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3; 139 + int i; 140 + bool expect_mi; 141 + 142 + expect_mi = !!(cpu_if->vgic_hcr & ICH_HCR_UIE); 143 + 144 + for (i = 0; i < nr_lr; i++) { 145 + if (!(vcpu->arch.vgic_cpu.live_lrs & (1UL << i))) 146 + continue; 147 + 148 + expect_mi |= (!(cpu_if->vgic_lr[i] & ICH_LR_HW) && 149 + (cpu_if->vgic_lr[i] & ICH_LR_EOI)); 150 + } 151 + 152 + if (expect_mi) { 153 + cpu_if->vgic_misr = read_gicreg(ICH_MISR_EL2); 154 + 155 + if (cpu_if->vgic_misr & ICH_MISR_EOI) 156 + cpu_if->vgic_eisr = read_gicreg(ICH_EISR_EL2); 157 + else 158 + cpu_if->vgic_eisr = 0; 159 + } else { 160 + cpu_if->vgic_misr = 0; 161 + cpu_if->vgic_eisr = 0; 162 + } 163 + } 164 + 41 165 void __hyp_text __vgic_v3_save_state(struct kvm_vcpu *vcpu) 42 166 { 43 167 struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3; 44 168 u64 val; 45 - u32 max_lr_idx, nr_pri_bits; 46 169 47 170 /* 48 171 * Make sure stores to the GIC via the memory mapped interface ··· 172 53 dsb(st); 173 54 174 55 cpu_if->vgic_vmcr = read_gicreg(ICH_VMCR_EL2); 175 - cpu_if->vgic_misr = read_gicreg(ICH_MISR_EL2); 176 - cpu_if->vgic_eisr = read_gicreg(ICH_EISR_EL2); 177 - cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2); 178 56 179 - write_gicreg(0, ICH_HCR_EL2); 180 - val = read_gicreg(ICH_VTR_EL2); 181 - max_lr_idx = vtr_to_max_lr_idx(val); 182 
- nr_pri_bits = vtr_to_nr_pri_bits(val); 57 + if (vcpu->arch.vgic_cpu.live_lrs) { 58 + int i; 59 + u32 max_lr_idx, nr_pri_bits; 183 60 184 - switch (max_lr_idx) { 185 - case 15: 186 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)] = read_gicreg(ICH_LR15_EL2); 187 - case 14: 188 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)] = read_gicreg(ICH_LR14_EL2); 189 - case 13: 190 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)] = read_gicreg(ICH_LR13_EL2); 191 - case 12: 192 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)] = read_gicreg(ICH_LR12_EL2); 193 - case 11: 194 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)] = read_gicreg(ICH_LR11_EL2); 195 - case 10: 196 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)] = read_gicreg(ICH_LR10_EL2); 197 - case 9: 198 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)] = read_gicreg(ICH_LR9_EL2); 199 - case 8: 200 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)] = read_gicreg(ICH_LR8_EL2); 201 - case 7: 202 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)] = read_gicreg(ICH_LR7_EL2); 203 - case 6: 204 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)] = read_gicreg(ICH_LR6_EL2); 205 - case 5: 206 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)] = read_gicreg(ICH_LR5_EL2); 207 - case 4: 208 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)] = read_gicreg(ICH_LR4_EL2); 209 - case 3: 210 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)] = read_gicreg(ICH_LR3_EL2); 211 - case 2: 212 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)] = read_gicreg(ICH_LR2_EL2); 213 - case 1: 214 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)] = read_gicreg(ICH_LR1_EL2); 215 - case 0: 216 - cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)] = read_gicreg(ICH_LR0_EL2); 217 - } 61 + cpu_if->vgic_elrsr = read_gicreg(ICH_ELSR_EL2); 218 62 219 - switch (nr_pri_bits) { 220 - case 7: 221 - cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2); 222 - cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2); 223 - case 6: 224 - cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2); 225 - default: 226 - cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2); 227 - } 63 + write_gicreg(0, ICH_HCR_EL2); 64 + val = read_gicreg(ICH_VTR_EL2); 65 + 
max_lr_idx = vtr_to_max_lr_idx(val); 66 + nr_pri_bits = vtr_to_nr_pri_bits(val); 228 67 229 - switch (nr_pri_bits) { 230 - case 7: 231 - cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2); 232 - cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2); 233 - case 6: 234 - cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2); 235 - default: 236 - cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2); 68 + save_maint_int_state(vcpu, max_lr_idx + 1); 69 + 70 + for (i = 0; i <= max_lr_idx; i++) { 71 + if (!(vcpu->arch.vgic_cpu.live_lrs & (1UL << i))) 72 + continue; 73 + 74 + if (cpu_if->vgic_elrsr & (1 << i)) { 75 + cpu_if->vgic_lr[i] &= ~ICH_LR_STATE; 76 + continue; 77 + } 78 + 79 + cpu_if->vgic_lr[i] = __gic_v3_get_lr(i); 80 + __gic_v3_set_lr(0, i); 81 + } 82 + 83 + switch (nr_pri_bits) { 84 + case 7: 85 + cpu_if->vgic_ap0r[3] = read_gicreg(ICH_AP0R3_EL2); 86 + cpu_if->vgic_ap0r[2] = read_gicreg(ICH_AP0R2_EL2); 87 + case 6: 88 + cpu_if->vgic_ap0r[1] = read_gicreg(ICH_AP0R1_EL2); 89 + default: 90 + cpu_if->vgic_ap0r[0] = read_gicreg(ICH_AP0R0_EL2); 91 + } 92 + 93 + switch (nr_pri_bits) { 94 + case 7: 95 + cpu_if->vgic_ap1r[3] = read_gicreg(ICH_AP1R3_EL2); 96 + cpu_if->vgic_ap1r[2] = read_gicreg(ICH_AP1R2_EL2); 97 + case 6: 98 + cpu_if->vgic_ap1r[1] = read_gicreg(ICH_AP1R1_EL2); 99 + default: 100 + cpu_if->vgic_ap1r[0] = read_gicreg(ICH_AP1R0_EL2); 101 + } 102 + 103 + vcpu->arch.vgic_cpu.live_lrs = 0; 104 + } else { 105 + cpu_if->vgic_misr = 0; 106 + cpu_if->vgic_eisr = 0; 107 + cpu_if->vgic_elrsr = 0xffff; 108 + cpu_if->vgic_ap0r[0] = 0; 109 + cpu_if->vgic_ap0r[1] = 0; 110 + cpu_if->vgic_ap0r[2] = 0; 111 + cpu_if->vgic_ap0r[3] = 0; 112 + cpu_if->vgic_ap1r[0] = 0; 113 + cpu_if->vgic_ap1r[1] = 0; 114 + cpu_if->vgic_ap1r[2] = 0; 115 + cpu_if->vgic_ap1r[3] = 0; 237 116 } 238 117 239 118 val = read_gicreg(ICC_SRE_EL2); ··· 245 128 struct vgic_v3_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v3; 246 129 u64 val; 247 130 u32 max_lr_idx, nr_pri_bits; 131 + u16 live_lrs = 0; 132 + int i; 248 
133 249 134 /* 250 135 * VFIQEn is RES1 if ICC_SRE_EL1.SRE is 1. This causes a ··· 259 140 write_gicreg(cpu_if->vgic_sre, ICC_SRE_EL1); 260 141 isb(); 261 142 262 - write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2); 263 - write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2); 264 - 265 143 val = read_gicreg(ICH_VTR_EL2); 266 144 max_lr_idx = vtr_to_max_lr_idx(val); 267 145 nr_pri_bits = vtr_to_nr_pri_bits(val); 268 146 269 - switch (nr_pri_bits) { 270 - case 7: 271 - write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2); 272 - write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2); 273 - case 6: 274 - write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2); 275 - default: 276 - write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2); 147 + for (i = 0; i <= max_lr_idx; i++) { 148 + if (cpu_if->vgic_lr[i] & ICH_LR_STATE) 149 + live_lrs |= (1 << i); 277 150 } 278 151 279 - switch (nr_pri_bits) { 280 - case 7: 281 - write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2); 282 - write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2); 283 - case 6: 284 - write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2); 285 - default: 286 - write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2); 287 - } 152 + write_gicreg(cpu_if->vgic_vmcr, ICH_VMCR_EL2); 288 153 289 - switch (max_lr_idx) { 290 - case 15: 291 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(15)], ICH_LR15_EL2); 292 - case 14: 293 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(14)], ICH_LR14_EL2); 294 - case 13: 295 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(13)], ICH_LR13_EL2); 296 - case 12: 297 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(12)], ICH_LR12_EL2); 298 - case 11: 299 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(11)], ICH_LR11_EL2); 300 - case 10: 301 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(10)], ICH_LR10_EL2); 302 - case 9: 303 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(9)], ICH_LR9_EL2); 304 - case 8: 305 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(8)], ICH_LR8_EL2); 306 - case 7: 307 - 
write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(7)], ICH_LR7_EL2); 308 - case 6: 309 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(6)], ICH_LR6_EL2); 310 - case 5: 311 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(5)], ICH_LR5_EL2); 312 - case 4: 313 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(4)], ICH_LR4_EL2); 314 - case 3: 315 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(3)], ICH_LR3_EL2); 316 - case 2: 317 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(2)], ICH_LR2_EL2); 318 - case 1: 319 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(1)], ICH_LR1_EL2); 320 - case 0: 321 - write_gicreg(cpu_if->vgic_lr[VGIC_V3_LR_INDEX(0)], ICH_LR0_EL2); 154 + if (live_lrs) { 155 + write_gicreg(cpu_if->vgic_hcr, ICH_HCR_EL2); 156 + 157 + switch (nr_pri_bits) { 158 + case 7: 159 + write_gicreg(cpu_if->vgic_ap0r[3], ICH_AP0R3_EL2); 160 + write_gicreg(cpu_if->vgic_ap0r[2], ICH_AP0R2_EL2); 161 + case 6: 162 + write_gicreg(cpu_if->vgic_ap0r[1], ICH_AP0R1_EL2); 163 + default: 164 + write_gicreg(cpu_if->vgic_ap0r[0], ICH_AP0R0_EL2); 165 + } 166 + 167 + switch (nr_pri_bits) { 168 + case 7: 169 + write_gicreg(cpu_if->vgic_ap1r[3], ICH_AP1R3_EL2); 170 + write_gicreg(cpu_if->vgic_ap1r[2], ICH_AP1R2_EL2); 171 + case 6: 172 + write_gicreg(cpu_if->vgic_ap1r[1], ICH_AP1R1_EL2); 173 + default: 174 + write_gicreg(cpu_if->vgic_ap1r[0], ICH_AP1R0_EL2); 175 + } 176 + 177 + for (i = 0; i <= max_lr_idx; i++) { 178 + if (!(live_lrs & (1 << i))) 179 + continue; 180 + 181 + __gic_v3_set_lr(cpu_if->vgic_lr[i], i); 182 + } 322 183 } 323 184 324 185 /* ··· 308 209 */ 309 210 isb(); 310 211 dsb(sy); 212 + vcpu->arch.vgic_cpu.live_lrs = live_lrs; 311 213 312 214 /* 313 215 * Prevent the guest from touching the GIC system registers if ··· 318 218 write_gicreg(read_gicreg(ICC_SRE_EL2) & ~ICC_SRE_EL2_ENABLE, 319 219 ICC_SRE_EL2); 320 220 } 221 + } 222 + 223 + void __hyp_text __vgic_v3_init_lrs(void) 224 + { 225 + int max_lr_idx = vtr_to_max_lr_idx(read_gicreg(ICH_VTR_EL2)); 226 + int i; 227 + 228 + 
for (i = 0; i <= max_lr_idx; i++) 229 + __gic_v3_set_lr(0, i); 321 230 } 322 231 323 232 static u64 __hyp_text __vgic_v3_read_ich_vtr_el2(void)
+7
arch/arm64/kvm/reset.c
··· 77 77 case KVM_CAP_GUEST_DEBUG_HW_WPS: 78 78 r = get_num_wrps(); 79 79 break; 80 + case KVM_CAP_ARM_PMU_V3: 81 + r = kvm_arm_support_pmu_v3(); 82 + break; 80 83 case KVM_CAP_SET_GUEST_DEBUG: 84 + case KVM_CAP_VCPU_ATTRIBUTES: 81 85 r = 1; 82 86 break; 83 87 default: ··· 123 119 124 120 /* Reset system registers */ 125 121 kvm_reset_sys_regs(vcpu); 122 + 123 + /* Reset PMU */ 124 + kvm_pmu_vcpu_reset(vcpu); 126 125 127 126 /* Reset timer */ 128 127 return kvm_timer_vcpu_reset(vcpu, cpu_vtimer_irq);
+562 -47
arch/arm64/kvm/sys_regs.c
··· 20 20 * along with this program. If not, see <http://www.gnu.org/licenses/>. 21 21 */ 22 22 23 + #include <linux/bsearch.h> 23 24 #include <linux/kvm_host.h> 24 25 #include <linux/mm.h> 25 26 #include <linux/uaccess.h> ··· 35 34 #include <asm/kvm_emulate.h> 36 35 #include <asm/kvm_host.h> 37 36 #include <asm/kvm_mmu.h> 37 + #include <asm/perf_event.h> 38 38 39 39 #include <trace/events/kvm.h> 40 40 ··· 441 439 vcpu_sys_reg(vcpu, MPIDR_EL1) = (1ULL << 31) | mpidr; 442 440 } 443 441 442 + static void reset_pmcr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) 443 + { 444 + u64 pmcr, val; 445 + 446 + asm volatile("mrs %0, pmcr_el0\n" : "=r" (pmcr)); 447 + /* Writable bits of PMCR_EL0 (ARMV8_PMU_PMCR_MASK) is reset to UNKNOWN 448 + * except PMCR.E resetting to zero. 449 + */ 450 + val = ((pmcr & ~ARMV8_PMU_PMCR_MASK) 451 + | (ARMV8_PMU_PMCR_MASK & 0xdecafbad)) & (~ARMV8_PMU_PMCR_E); 452 + vcpu_sys_reg(vcpu, PMCR_EL0) = val; 453 + } 454 + 455 + static bool pmu_access_el0_disabled(struct kvm_vcpu *vcpu) 456 + { 457 + u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0); 458 + 459 + return !((reg & ARMV8_PMU_USERENR_EN) || vcpu_mode_priv(vcpu)); 460 + } 461 + 462 + static bool pmu_write_swinc_el0_disabled(struct kvm_vcpu *vcpu) 463 + { 464 + u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0); 465 + 466 + return !((reg & (ARMV8_PMU_USERENR_SW | ARMV8_PMU_USERENR_EN)) 467 + || vcpu_mode_priv(vcpu)); 468 + } 469 + 470 + static bool pmu_access_cycle_counter_el0_disabled(struct kvm_vcpu *vcpu) 471 + { 472 + u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0); 473 + 474 + return !((reg & (ARMV8_PMU_USERENR_CR | ARMV8_PMU_USERENR_EN)) 475 + || vcpu_mode_priv(vcpu)); 476 + } 477 + 478 + static bool pmu_access_event_counter_el0_disabled(struct kvm_vcpu *vcpu) 479 + { 480 + u64 reg = vcpu_sys_reg(vcpu, PMUSERENR_EL0); 481 + 482 + return !((reg & (ARMV8_PMU_USERENR_ER | ARMV8_PMU_USERENR_EN)) 483 + || vcpu_mode_priv(vcpu)); 484 + } 485 + 486 + static bool access_pmcr(struct kvm_vcpu *vcpu, struct 
sys_reg_params *p, 487 + const struct sys_reg_desc *r) 488 + { 489 + u64 val; 490 + 491 + if (!kvm_arm_pmu_v3_ready(vcpu)) 492 + return trap_raz_wi(vcpu, p, r); 493 + 494 + if (pmu_access_el0_disabled(vcpu)) 495 + return false; 496 + 497 + if (p->is_write) { 498 + /* Only update writeable bits of PMCR */ 499 + val = vcpu_sys_reg(vcpu, PMCR_EL0); 500 + val &= ~ARMV8_PMU_PMCR_MASK; 501 + val |= p->regval & ARMV8_PMU_PMCR_MASK; 502 + vcpu_sys_reg(vcpu, PMCR_EL0) = val; 503 + kvm_pmu_handle_pmcr(vcpu, val); 504 + } else { 505 + /* PMCR.P & PMCR.C are RAZ */ 506 + val = vcpu_sys_reg(vcpu, PMCR_EL0) 507 + & ~(ARMV8_PMU_PMCR_P | ARMV8_PMU_PMCR_C); 508 + p->regval = val; 509 + } 510 + 511 + return true; 512 + } 513 + 514 + static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 515 + const struct sys_reg_desc *r) 516 + { 517 + if (!kvm_arm_pmu_v3_ready(vcpu)) 518 + return trap_raz_wi(vcpu, p, r); 519 + 520 + if (pmu_access_event_counter_el0_disabled(vcpu)) 521 + return false; 522 + 523 + if (p->is_write) 524 + vcpu_sys_reg(vcpu, PMSELR_EL0) = p->regval; 525 + else 526 + /* return PMSELR.SEL field */ 527 + p->regval = vcpu_sys_reg(vcpu, PMSELR_EL0) 528 + & ARMV8_PMU_COUNTER_MASK; 529 + 530 + return true; 531 + } 532 + 533 + static bool access_pmceid(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 534 + const struct sys_reg_desc *r) 535 + { 536 + u64 pmceid; 537 + 538 + if (!kvm_arm_pmu_v3_ready(vcpu)) 539 + return trap_raz_wi(vcpu, p, r); 540 + 541 + BUG_ON(p->is_write); 542 + 543 + if (pmu_access_el0_disabled(vcpu)) 544 + return false; 545 + 546 + if (!(p->Op2 & 1)) 547 + asm volatile("mrs %0, pmceid0_el0\n" : "=r" (pmceid)); 548 + else 549 + asm volatile("mrs %0, pmceid1_el0\n" : "=r" (pmceid)); 550 + 551 + p->regval = pmceid; 552 + 553 + return true; 554 + } 555 + 556 + static bool pmu_counter_idx_valid(struct kvm_vcpu *vcpu, u64 idx) 557 + { 558 + u64 pmcr, val; 559 + 560 + pmcr = vcpu_sys_reg(vcpu, PMCR_EL0); 561 + val = (pmcr >> 
ARMV8_PMU_PMCR_N_SHIFT) & ARMV8_PMU_PMCR_N_MASK; 562 + if (idx >= val && idx != ARMV8_PMU_CYCLE_IDX) 563 + return false; 564 + 565 + return true; 566 + } 567 + 568 + static bool access_pmu_evcntr(struct kvm_vcpu *vcpu, 569 + struct sys_reg_params *p, 570 + const struct sys_reg_desc *r) 571 + { 572 + u64 idx; 573 + 574 + if (!kvm_arm_pmu_v3_ready(vcpu)) 575 + return trap_raz_wi(vcpu, p, r); 576 + 577 + if (r->CRn == 9 && r->CRm == 13) { 578 + if (r->Op2 == 2) { 579 + /* PMXEVCNTR_EL0 */ 580 + if (pmu_access_event_counter_el0_disabled(vcpu)) 581 + return false; 582 + 583 + idx = vcpu_sys_reg(vcpu, PMSELR_EL0) 584 + & ARMV8_PMU_COUNTER_MASK; 585 + } else if (r->Op2 == 0) { 586 + /* PMCCNTR_EL0 */ 587 + if (pmu_access_cycle_counter_el0_disabled(vcpu)) 588 + return false; 589 + 590 + idx = ARMV8_PMU_CYCLE_IDX; 591 + } else { 592 + BUG(); 593 + } 594 + } else if (r->CRn == 14 && (r->CRm & 12) == 8) { 595 + /* PMEVCNTRn_EL0 */ 596 + if (pmu_access_event_counter_el0_disabled(vcpu)) 597 + return false; 598 + 599 + idx = ((r->CRm & 3) << 3) | (r->Op2 & 7); 600 + } else { 601 + BUG(); 602 + } 603 + 604 + if (!pmu_counter_idx_valid(vcpu, idx)) 605 + return false; 606 + 607 + if (p->is_write) { 608 + if (pmu_access_el0_disabled(vcpu)) 609 + return false; 610 + 611 + kvm_pmu_set_counter_value(vcpu, idx, p->regval); 612 + } else { 613 + p->regval = kvm_pmu_get_counter_value(vcpu, idx); 614 + } 615 + 616 + return true; 617 + } 618 + 619 + static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 620 + const struct sys_reg_desc *r) 621 + { 622 + u64 idx, reg; 623 + 624 + if (!kvm_arm_pmu_v3_ready(vcpu)) 625 + return trap_raz_wi(vcpu, p, r); 626 + 627 + if (pmu_access_el0_disabled(vcpu)) 628 + return false; 629 + 630 + if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) { 631 + /* PMXEVTYPER_EL0 */ 632 + idx = vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK; 633 + reg = PMEVTYPER0_EL0 + idx; 634 + } else if (r->CRn == 14 && (r->CRm & 12) == 12) { 635 + idx 
= ((r->CRm & 3) << 3) | (r->Op2 & 7); 636 + if (idx == ARMV8_PMU_CYCLE_IDX) 637 + reg = PMCCFILTR_EL0; 638 + else 639 + /* PMEVTYPERn_EL0 */ 640 + reg = PMEVTYPER0_EL0 + idx; 641 + } else { 642 + BUG(); 643 + } 644 + 645 + if (!pmu_counter_idx_valid(vcpu, idx)) 646 + return false; 647 + 648 + if (p->is_write) { 649 + kvm_pmu_set_counter_event_type(vcpu, p->regval, idx); 650 + vcpu_sys_reg(vcpu, reg) = p->regval & ARMV8_PMU_EVTYPE_MASK; 651 + } else { 652 + p->regval = vcpu_sys_reg(vcpu, reg) & ARMV8_PMU_EVTYPE_MASK; 653 + } 654 + 655 + return true; 656 + } 657 + 658 + static bool access_pmcnten(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 659 + const struct sys_reg_desc *r) 660 + { 661 + u64 val, mask; 662 + 663 + if (!kvm_arm_pmu_v3_ready(vcpu)) 664 + return trap_raz_wi(vcpu, p, r); 665 + 666 + if (pmu_access_el0_disabled(vcpu)) 667 + return false; 668 + 669 + mask = kvm_pmu_valid_counter_mask(vcpu); 670 + if (p->is_write) { 671 + val = p->regval & mask; 672 + if (r->Op2 & 0x1) { 673 + /* accessing PMCNTENSET_EL0 */ 674 + vcpu_sys_reg(vcpu, PMCNTENSET_EL0) |= val; 675 + kvm_pmu_enable_counter(vcpu, val); 676 + } else { 677 + /* accessing PMCNTENCLR_EL0 */ 678 + vcpu_sys_reg(vcpu, PMCNTENSET_EL0) &= ~val; 679 + kvm_pmu_disable_counter(vcpu, val); 680 + } 681 + } else { 682 + p->regval = vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask; 683 + } 684 + 685 + return true; 686 + } 687 + 688 + static bool access_pminten(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 689 + const struct sys_reg_desc *r) 690 + { 691 + u64 mask = kvm_pmu_valid_counter_mask(vcpu); 692 + 693 + if (!kvm_arm_pmu_v3_ready(vcpu)) 694 + return trap_raz_wi(vcpu, p, r); 695 + 696 + if (!vcpu_mode_priv(vcpu)) 697 + return false; 698 + 699 + if (p->is_write) { 700 + u64 val = p->regval & mask; 701 + 702 + if (r->Op2 & 0x1) 703 + /* accessing PMINTENSET_EL1 */ 704 + vcpu_sys_reg(vcpu, PMINTENSET_EL1) |= val; 705 + else 706 + /* accessing PMINTENCLR_EL1 */ 707 + vcpu_sys_reg(vcpu, PMINTENSET_EL1) &= ~val; 
708 + } else { 709 + p->regval = vcpu_sys_reg(vcpu, PMINTENSET_EL1) & mask; 710 + } 711 + 712 + return true; 713 + } 714 + 715 + static bool access_pmovs(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 716 + const struct sys_reg_desc *r) 717 + { 718 + u64 mask = kvm_pmu_valid_counter_mask(vcpu); 719 + 720 + if (!kvm_arm_pmu_v3_ready(vcpu)) 721 + return trap_raz_wi(vcpu, p, r); 722 + 723 + if (pmu_access_el0_disabled(vcpu)) 724 + return false; 725 + 726 + if (p->is_write) { 727 + if (r->CRm & 0x2) 728 + /* accessing PMOVSSET_EL0 */ 729 + kvm_pmu_overflow_set(vcpu, p->regval & mask); 730 + else 731 + /* accessing PMOVSCLR_EL0 */ 732 + vcpu_sys_reg(vcpu, PMOVSSET_EL0) &= ~(p->regval & mask); 733 + } else { 734 + p->regval = vcpu_sys_reg(vcpu, PMOVSSET_EL0) & mask; 735 + } 736 + 737 + return true; 738 + } 739 + 740 + static bool access_pmswinc(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 741 + const struct sys_reg_desc *r) 742 + { 743 + u64 mask; 744 + 745 + if (!kvm_arm_pmu_v3_ready(vcpu)) 746 + return trap_raz_wi(vcpu, p, r); 747 + 748 + if (pmu_write_swinc_el0_disabled(vcpu)) 749 + return false; 750 + 751 + if (p->is_write) { 752 + mask = kvm_pmu_valid_counter_mask(vcpu); 753 + kvm_pmu_software_increment(vcpu, p->regval & mask); 754 + return true; 755 + } 756 + 757 + return false; 758 + } 759 + 760 + static bool access_pmuserenr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 761 + const struct sys_reg_desc *r) 762 + { 763 + if (!kvm_arm_pmu_v3_ready(vcpu)) 764 + return trap_raz_wi(vcpu, p, r); 765 + 766 + if (p->is_write) { 767 + if (!vcpu_mode_priv(vcpu)) 768 + return false; 769 + 770 + vcpu_sys_reg(vcpu, PMUSERENR_EL0) = p->regval 771 + & ARMV8_PMU_USERENR_MASK; 772 + } else { 773 + p->regval = vcpu_sys_reg(vcpu, PMUSERENR_EL0) 774 + & ARMV8_PMU_USERENR_MASK; 775 + } 776 + 777 + return true; 778 + } 779 + 444 780 /* Silly macro to expand the DBG{BCR,BVR,WVR,WCR}n_EL1 registers in one go */ 445 781 #define DBG_BCR_BVR_WCR_WVR_EL1(n) \ 446 782 /* DBGBVRn_EL1 
*/ \ ··· 793 453 /* DBGWCRn_EL1 */ \ 794 454 { Op0(0b10), Op1(0b000), CRn(0b0000), CRm((n)), Op2(0b111), \ 795 455 trap_wcr, reset_wcr, n, 0, get_wcr, set_wcr } 456 + 457 + /* Macro to expand the PMEVCNTRn_EL0 register */ 458 + #define PMU_PMEVCNTR_EL0(n) \ 459 + /* PMEVCNTRn_EL0 */ \ 460 + { Op0(0b11), Op1(0b011), CRn(0b1110), \ 461 + CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ 462 + access_pmu_evcntr, reset_unknown, (PMEVCNTR0_EL0 + n), } 463 + 464 + /* Macro to expand the PMEVTYPERn_EL0 register */ 465 + #define PMU_PMEVTYPER_EL0(n) \ 466 + /* PMEVTYPERn_EL0 */ \ 467 + { Op0(0b11), Op1(0b011), CRn(0b1110), \ 468 + CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ 469 + access_pmu_evtyper, reset_unknown, (PMEVTYPER0_EL0 + n), } 796 470 797 471 /* 798 472 * Architected system registers. ··· 937 583 938 584 /* PMINTENSET_EL1 */ 939 585 { Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b001), 940 - trap_raz_wi }, 586 + access_pminten, reset_unknown, PMINTENSET_EL1 }, 941 587 /* PMINTENCLR_EL1 */ 942 588 { Op0(0b11), Op1(0b000), CRn(0b1001), CRm(0b1110), Op2(0b010), 943 - trap_raz_wi }, 589 + access_pminten, NULL, PMINTENSET_EL1 }, 944 590 945 591 /* MAIR_EL1 */ 946 592 { Op0(0b11), Op1(0b000), CRn(0b1010), CRm(0b0010), Op2(0b000), ··· 977 623 978 624 /* PMCR_EL0 */ 979 625 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b000), 980 - trap_raz_wi }, 626 + access_pmcr, reset_pmcr, }, 981 627 /* PMCNTENSET_EL0 */ 982 628 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b001), 983 - trap_raz_wi }, 629 + access_pmcnten, reset_unknown, PMCNTENSET_EL0 }, 984 630 /* PMCNTENCLR_EL0 */ 985 631 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b010), 986 - trap_raz_wi }, 632 + access_pmcnten, NULL, PMCNTENSET_EL0 }, 987 633 /* PMOVSCLR_EL0 */ 988 634 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b011), 989 - trap_raz_wi }, 635 + access_pmovs, NULL, PMOVSSET_EL0 }, 990 636 /* PMSWINC_EL0 */ 991 637 { Op0(0b11), Op1(0b011), 
CRn(0b1001), CRm(0b1100), Op2(0b100), 992 - trap_raz_wi }, 638 + access_pmswinc, reset_unknown, PMSWINC_EL0 }, 993 639 /* PMSELR_EL0 */ 994 640 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b101), 995 - trap_raz_wi }, 641 + access_pmselr, reset_unknown, PMSELR_EL0 }, 996 642 /* PMCEID0_EL0 */ 997 643 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b110), 998 - trap_raz_wi }, 644 + access_pmceid }, 999 645 /* PMCEID1_EL0 */ 1000 646 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1100), Op2(0b111), 1001 - trap_raz_wi }, 647 + access_pmceid }, 1002 648 /* PMCCNTR_EL0 */ 1003 649 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b000), 1004 - trap_raz_wi }, 650 + access_pmu_evcntr, reset_unknown, PMCCNTR_EL0 }, 1005 651 /* PMXEVTYPER_EL0 */ 1006 652 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b001), 1007 - trap_raz_wi }, 653 + access_pmu_evtyper }, 1008 654 /* PMXEVCNTR_EL0 */ 1009 655 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1101), Op2(0b010), 1010 - trap_raz_wi }, 1011 - /* PMUSERENR_EL0 */ 656 + access_pmu_evcntr }, 657 + /* PMUSERENR_EL0 658 + * This register resets as unknown in 64bit mode while it resets as zero 659 + * in 32bit mode. Here we choose to reset it as zero for consistency. 
660 + */ 1012 661 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b000), 1013 - trap_raz_wi }, 662 + access_pmuserenr, reset_val, PMUSERENR_EL0, 0 }, 1014 663 /* PMOVSSET_EL0 */ 1015 664 { Op0(0b11), Op1(0b011), CRn(0b1001), CRm(0b1110), Op2(0b011), 1016 - trap_raz_wi }, 665 + access_pmovs, reset_unknown, PMOVSSET_EL0 }, 1017 666 1018 667 /* TPIDR_EL0 */ 1019 668 { Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b010), ··· 1024 667 /* TPIDRRO_EL0 */ 1025 668 { Op0(0b11), Op1(0b011), CRn(0b1101), CRm(0b0000), Op2(0b011), 1026 669 NULL, reset_unknown, TPIDRRO_EL0 }, 670 + 671 + /* PMEVCNTRn_EL0 */ 672 + PMU_PMEVCNTR_EL0(0), 673 + PMU_PMEVCNTR_EL0(1), 674 + PMU_PMEVCNTR_EL0(2), 675 + PMU_PMEVCNTR_EL0(3), 676 + PMU_PMEVCNTR_EL0(4), 677 + PMU_PMEVCNTR_EL0(5), 678 + PMU_PMEVCNTR_EL0(6), 679 + PMU_PMEVCNTR_EL0(7), 680 + PMU_PMEVCNTR_EL0(8), 681 + PMU_PMEVCNTR_EL0(9), 682 + PMU_PMEVCNTR_EL0(10), 683 + PMU_PMEVCNTR_EL0(11), 684 + PMU_PMEVCNTR_EL0(12), 685 + PMU_PMEVCNTR_EL0(13), 686 + PMU_PMEVCNTR_EL0(14), 687 + PMU_PMEVCNTR_EL0(15), 688 + PMU_PMEVCNTR_EL0(16), 689 + PMU_PMEVCNTR_EL0(17), 690 + PMU_PMEVCNTR_EL0(18), 691 + PMU_PMEVCNTR_EL0(19), 692 + PMU_PMEVCNTR_EL0(20), 693 + PMU_PMEVCNTR_EL0(21), 694 + PMU_PMEVCNTR_EL0(22), 695 + PMU_PMEVCNTR_EL0(23), 696 + PMU_PMEVCNTR_EL0(24), 697 + PMU_PMEVCNTR_EL0(25), 698 + PMU_PMEVCNTR_EL0(26), 699 + PMU_PMEVCNTR_EL0(27), 700 + PMU_PMEVCNTR_EL0(28), 701 + PMU_PMEVCNTR_EL0(29), 702 + PMU_PMEVCNTR_EL0(30), 703 + /* PMEVTYPERn_EL0 */ 704 + PMU_PMEVTYPER_EL0(0), 705 + PMU_PMEVTYPER_EL0(1), 706 + PMU_PMEVTYPER_EL0(2), 707 + PMU_PMEVTYPER_EL0(3), 708 + PMU_PMEVTYPER_EL0(4), 709 + PMU_PMEVTYPER_EL0(5), 710 + PMU_PMEVTYPER_EL0(6), 711 + PMU_PMEVTYPER_EL0(7), 712 + PMU_PMEVTYPER_EL0(8), 713 + PMU_PMEVTYPER_EL0(9), 714 + PMU_PMEVTYPER_EL0(10), 715 + PMU_PMEVTYPER_EL0(11), 716 + PMU_PMEVTYPER_EL0(12), 717 + PMU_PMEVTYPER_EL0(13), 718 + PMU_PMEVTYPER_EL0(14), 719 + PMU_PMEVTYPER_EL0(15), 720 + PMU_PMEVTYPER_EL0(16), 721 + 
PMU_PMEVTYPER_EL0(17), 722 + PMU_PMEVTYPER_EL0(18), 723 + PMU_PMEVTYPER_EL0(19), 724 + PMU_PMEVTYPER_EL0(20), 725 + PMU_PMEVTYPER_EL0(21), 726 + PMU_PMEVTYPER_EL0(22), 727 + PMU_PMEVTYPER_EL0(23), 728 + PMU_PMEVTYPER_EL0(24), 729 + PMU_PMEVTYPER_EL0(25), 730 + PMU_PMEVTYPER_EL0(26), 731 + PMU_PMEVTYPER_EL0(27), 732 + PMU_PMEVTYPER_EL0(28), 733 + PMU_PMEVTYPER_EL0(29), 734 + PMU_PMEVTYPER_EL0(30), 735 + /* PMCCFILTR_EL0 736 + * This register resets as unknown in 64bit mode while it resets as zero 737 + * in 32bit mode. Here we choose to reset it as zero for consistency. 738 + */ 739 + { Op0(0b11), Op1(0b011), CRn(0b1110), CRm(0b1111), Op2(0b111), 740 + access_pmu_evtyper, reset_val, PMCCFILTR_EL0, 0 }, 1027 741 1028 742 /* DACR32_EL2 */ 1029 743 { Op0(0b11), Op1(0b100), CRn(0b0011), CRm(0b0000), Op2(0b000), ··· 1285 857 { Op1( 0), CRm( 2), .access = trap_raz_wi }, 1286 858 }; 1287 859 860 + /* Macro to expand the PMEVCNTRn register */ 861 + #define PMU_PMEVCNTR(n) \ 862 + /* PMEVCNTRn */ \ 863 + { Op1(0), CRn(0b1110), \ 864 + CRm((0b1000 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ 865 + access_pmu_evcntr } 866 + 867 + /* Macro to expand the PMEVTYPERn register */ 868 + #define PMU_PMEVTYPER(n) \ 869 + /* PMEVTYPERn */ \ 870 + { Op1(0), CRn(0b1110), \ 871 + CRm((0b1100 | (((n) >> 3) & 0x3))), Op2(((n) & 0x7)), \ 872 + access_pmu_evtyper } 873 + 1288 874 /* 1289 875 * Trapped cp15 registers. 
TTBR0/TTBR1 get a double encoding, 1290 876 * depending on the way they are accessed (as a 32bit or a 64bit ··· 1327 885 { Op1( 0), CRn( 7), CRm(14), Op2( 2), access_dcsw }, 1328 886 1329 887 /* PMU */ 1330 - { Op1( 0), CRn( 9), CRm(12), Op2( 0), trap_raz_wi }, 1331 - { Op1( 0), CRn( 9), CRm(12), Op2( 1), trap_raz_wi }, 1332 - { Op1( 0), CRn( 9), CRm(12), Op2( 2), trap_raz_wi }, 1333 - { Op1( 0), CRn( 9), CRm(12), Op2( 3), trap_raz_wi }, 1334 - { Op1( 0), CRn( 9), CRm(12), Op2( 5), trap_raz_wi }, 1335 - { Op1( 0), CRn( 9), CRm(12), Op2( 6), trap_raz_wi }, 1336 - { Op1( 0), CRn( 9), CRm(12), Op2( 7), trap_raz_wi }, 1337 - { Op1( 0), CRn( 9), CRm(13), Op2( 0), trap_raz_wi }, 1338 - { Op1( 0), CRn( 9), CRm(13), Op2( 1), trap_raz_wi }, 1339 - { Op1( 0), CRn( 9), CRm(13), Op2( 2), trap_raz_wi }, 1340 - { Op1( 0), CRn( 9), CRm(14), Op2( 0), trap_raz_wi }, 1341 - { Op1( 0), CRn( 9), CRm(14), Op2( 1), trap_raz_wi }, 1342 - { Op1( 0), CRn( 9), CRm(14), Op2( 2), trap_raz_wi }, 888 + { Op1( 0), CRn( 9), CRm(12), Op2( 0), access_pmcr }, 889 + { Op1( 0), CRn( 9), CRm(12), Op2( 1), access_pmcnten }, 890 + { Op1( 0), CRn( 9), CRm(12), Op2( 2), access_pmcnten }, 891 + { Op1( 0), CRn( 9), CRm(12), Op2( 3), access_pmovs }, 892 + { Op1( 0), CRn( 9), CRm(12), Op2( 4), access_pmswinc }, 893 + { Op1( 0), CRn( 9), CRm(12), Op2( 5), access_pmselr }, 894 + { Op1( 0), CRn( 9), CRm(12), Op2( 6), access_pmceid }, 895 + { Op1( 0), CRn( 9), CRm(12), Op2( 7), access_pmceid }, 896 + { Op1( 0), CRn( 9), CRm(13), Op2( 0), access_pmu_evcntr }, 897 + { Op1( 0), CRn( 9), CRm(13), Op2( 1), access_pmu_evtyper }, 898 + { Op1( 0), CRn( 9), CRm(13), Op2( 2), access_pmu_evcntr }, 899 + { Op1( 0), CRn( 9), CRm(14), Op2( 0), access_pmuserenr }, 900 + { Op1( 0), CRn( 9), CRm(14), Op2( 1), access_pminten }, 901 + { Op1( 0), CRn( 9), CRm(14), Op2( 2), access_pminten }, 902 + { Op1( 0), CRn( 9), CRm(14), Op2( 3), access_pmovs }, 1343 903 1344 904 { Op1( 0), CRn(10), CRm( 2), Op2( 0), access_vm_reg, NULL, c10_PRRR 
}, 1345 905 { Op1( 0), CRn(10), CRm( 2), Op2( 1), access_vm_reg, NULL, c10_NMRR }, ··· 1352 908 { Op1( 0), CRn(12), CRm(12), Op2( 5), trap_raz_wi }, 1353 909 1354 910 { Op1( 0), CRn(13), CRm( 0), Op2( 1), access_vm_reg, NULL, c13_CID }, 911 + 912 + /* PMEVCNTRn */ 913 + PMU_PMEVCNTR(0), 914 + PMU_PMEVCNTR(1), 915 + PMU_PMEVCNTR(2), 916 + PMU_PMEVCNTR(3), 917 + PMU_PMEVCNTR(4), 918 + PMU_PMEVCNTR(5), 919 + PMU_PMEVCNTR(6), 920 + PMU_PMEVCNTR(7), 921 + PMU_PMEVCNTR(8), 922 + PMU_PMEVCNTR(9), 923 + PMU_PMEVCNTR(10), 924 + PMU_PMEVCNTR(11), 925 + PMU_PMEVCNTR(12), 926 + PMU_PMEVCNTR(13), 927 + PMU_PMEVCNTR(14), 928 + PMU_PMEVCNTR(15), 929 + PMU_PMEVCNTR(16), 930 + PMU_PMEVCNTR(17), 931 + PMU_PMEVCNTR(18), 932 + PMU_PMEVCNTR(19), 933 + PMU_PMEVCNTR(20), 934 + PMU_PMEVCNTR(21), 935 + PMU_PMEVCNTR(22), 936 + PMU_PMEVCNTR(23), 937 + PMU_PMEVCNTR(24), 938 + PMU_PMEVCNTR(25), 939 + PMU_PMEVCNTR(26), 940 + PMU_PMEVCNTR(27), 941 + PMU_PMEVCNTR(28), 942 + PMU_PMEVCNTR(29), 943 + PMU_PMEVCNTR(30), 944 + /* PMEVTYPERn */ 945 + PMU_PMEVTYPER(0), 946 + PMU_PMEVTYPER(1), 947 + PMU_PMEVTYPER(2), 948 + PMU_PMEVTYPER(3), 949 + PMU_PMEVTYPER(4), 950 + PMU_PMEVTYPER(5), 951 + PMU_PMEVTYPER(6), 952 + PMU_PMEVTYPER(7), 953 + PMU_PMEVTYPER(8), 954 + PMU_PMEVTYPER(9), 955 + PMU_PMEVTYPER(10), 956 + PMU_PMEVTYPER(11), 957 + PMU_PMEVTYPER(12), 958 + PMU_PMEVTYPER(13), 959 + PMU_PMEVTYPER(14), 960 + PMU_PMEVTYPER(15), 961 + PMU_PMEVTYPER(16), 962 + PMU_PMEVTYPER(17), 963 + PMU_PMEVTYPER(18), 964 + PMU_PMEVTYPER(19), 965 + PMU_PMEVTYPER(20), 966 + PMU_PMEVTYPER(21), 967 + PMU_PMEVTYPER(22), 968 + PMU_PMEVTYPER(23), 969 + PMU_PMEVTYPER(24), 970 + PMU_PMEVTYPER(25), 971 + PMU_PMEVTYPER(26), 972 + PMU_PMEVTYPER(27), 973 + PMU_PMEVTYPER(28), 974 + PMU_PMEVTYPER(29), 975 + PMU_PMEVTYPER(30), 976 + /* PMCCFILTR */ 977 + { Op1(0), CRn(14), CRm(15), Op2(7), access_pmu_evtyper }, 1355 978 }; 1356 979 1357 980 static const struct sys_reg_desc cp15_64_regs[] = { 1358 981 { Op1( 0), CRn( 0), CRm( 2), Op2( 
0), access_vm_reg, NULL, c2_TTBR0 }, 982 + { Op1( 0), CRn( 0), CRm( 9), Op2( 0), access_pmu_evcntr }, 1359 983 { Op1( 0), CRn( 0), CRm(12), Op2( 0), access_gic_sgi }, 1360 984 { Op1( 1), CRn( 0), CRm( 2), Op2( 0), access_vm_reg, NULL, c2_TTBR1 }, 1361 985 }; ··· 1454 942 } 1455 943 } 1456 944 945 + #define reg_to_match_value(x) \ 946 + ({ \ 947 + unsigned long val; \ 948 + val = (x)->Op0 << 14; \ 949 + val |= (x)->Op1 << 11; \ 950 + val |= (x)->CRn << 7; \ 951 + val |= (x)->CRm << 3; \ 952 + val |= (x)->Op2; \ 953 + val; \ 954 + }) 955 + 956 + static int match_sys_reg(const void *key, const void *elt) 957 + { 958 + const unsigned long pval = (unsigned long)key; 959 + const struct sys_reg_desc *r = elt; 960 + 961 + return pval - reg_to_match_value(r); 962 + } 963 + 1457 964 static const struct sys_reg_desc *find_reg(const struct sys_reg_params *params, 1458 965 const struct sys_reg_desc table[], 1459 966 unsigned int num) 1460 967 { 1461 - unsigned int i; 968 + unsigned long pval = reg_to_match_value(params); 1462 969 1463 - for (i = 0; i < num; i++) { 1464 - const struct sys_reg_desc *r = &table[i]; 1465 - 1466 - if (params->Op0 != r->Op0) 1467 - continue; 1468 - if (params->Op1 != r->Op1) 1469 - continue; 1470 - if (params->CRn != r->CRn) 1471 - continue; 1472 - if (params->CRm != r->CRm) 1473 - continue; 1474 - if (params->Op2 != r->Op2) 1475 - continue; 1476 - 1477 - return r; 1478 - } 1479 - return NULL; 970 + return bsearch((void *)pval, table, num, sizeof(table[0]), match_sys_reg); 1480 971 } 1481 972 1482 973 int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run)
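The sys_regs.c hunk above replaces the linear `find_reg()` scan with `bsearch()` over a key built by packing Op0/Op1/CRn/CRm/Op2 into one integer, so a table sorted by encoding is sorted by key. A minimal user-space sketch of the same technique (struct and table entries here are illustrative, not the kernel's):

```c
#include <stdlib.h>

/* Illustrative stand-in for struct sys_reg_desc: only the encoding fields. */
struct reg_desc {
	unsigned int Op0, Op1, CRn, CRm, Op2;
	const char *name;
};

/* Pack the five encoding fields into one comparable key, mirroring
 * reg_to_match_value(): Op0 occupies the most significant bits, so a
 * table sorted by (Op0, Op1, CRn, CRm, Op2) is sorted by key. */
static unsigned long reg_key(const struct reg_desc *r)
{
	return ((unsigned long)r->Op0 << 14) | (r->Op1 << 11) |
	       (r->CRn << 7) | (r->CRm << 3) | r->Op2;
}

static int match_reg(const void *key, const void *elt)
{
	unsigned long pval = (unsigned long)key;
	unsigned long rval = reg_key(elt);

	return (pval > rval) - (pval < rval);
}

/* Must be sorted by encoding for bsearch() to work. */
static const struct reg_desc table[] = {
	{ 2, 0, 0,  0, 5, "DBGDTRRX_EL0" },
	{ 3, 0, 9, 14, 1, "PMINTENSET_EL1" },
	{ 3, 3, 9, 12, 0, "PMCR_EL0" },
	{ 3, 3, 9, 13, 0, "PMCCNTR_EL0" },
};

static const struct reg_desc *find_reg(const struct reg_desc *params)
{
	unsigned long pval = reg_key(params);

	return bsearch((void *)pval, table, sizeof(table) / sizeof(table[0]),
		       sizeof(table[0]), match_reg);
}
```

The kernel's `match_sys_reg()` returns `pval - reg_to_match_value(r)` directly; that is safe there because the packed key fits in 17 bits, so the subtraction cannot overflow an `int`.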
-2
arch/powerpc/include/asm/kvm_book3s_64.h
···
 }
 #endif

-#define SPAPR_TCE_SHIFT	12
-
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #define KVM_DEFAULT_HPT_ORDER	24	/* 16MB HPT by default */
 #endif
+4 -1
arch/powerpc/include/asm/kvm_host.h
···
 	struct list_head list;
 	struct kvm *kvm;
 	u64 liobn;
-	u32 window_size;
+	struct rcu_head rcu;
+	u32 page_shift;
+	u64 offset;		/* in pages */
+	u64 size;		/* window size in pages */
 	struct page *pages[0];
 };
+50 -1
arch/powerpc/include/asm/kvm_ppc.h
···
 extern int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu);

 extern long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm,
-				struct kvm_create_spapr_tce *args);
+				struct kvm_create_spapr_tce_64 *args);
+extern struct kvmppc_spapr_tce_table *kvmppc_find_table(
+		struct kvm_vcpu *vcpu, unsigned long liobn);
+extern long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt,
+		unsigned long ioba, unsigned long npages);
+extern long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *tt,
+		unsigned long tce);
+extern long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa,
+		unsigned long *ua, unsigned long **prmap);
+extern void kvmppc_tce_put(struct kvmppc_spapr_tce_table *tt,
+		unsigned long idx, unsigned long tce);
 extern long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba, unsigned long tce);
+extern long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_list, unsigned long npages);
+extern long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu,
+		unsigned long liobn, unsigned long ioba,
+		unsigned long tce_value, unsigned long npages);
 extern long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn,
 			     unsigned long ioba);
 extern struct page *kvm_alloc_hpt(unsigned long nr_pages);
···
 {
 	return vcpu->arch.irq_type == KVMPPC_IRQ_XICS;
 }
+extern void kvmppc_alloc_host_rm_ops(void);
+extern void kvmppc_free_host_rm_ops(void);
 extern void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu);
 extern int kvmppc_xics_create_icp(struct kvm_vcpu *vcpu, unsigned long server);
 extern int kvm_vm_ioctl_xics_irq(struct kvm *kvm, struct kvm_irq_level *args);
···
 extern int kvmppc_xics_set_icp(struct kvm_vcpu *vcpu, u64 icpval);
 extern int kvmppc_xics_connect_vcpu(struct kvm_device *dev,
 			struct kvm_vcpu *vcpu, u32 cpu);
+extern void kvmppc_xics_ipi_action(void);
+extern int h_ipi_redirect;
 #else
+static inline void kvmppc_alloc_host_rm_ops(void) {};
+static inline void kvmppc_free_host_rm_ops(void) {};
 static inline int kvmppc_xics_enabled(struct kvm_vcpu *vcpu)
 { return 0; }
 static inline void kvmppc_xics_free_icp(struct kvm_vcpu *vcpu) { }
···
 static inline int kvmppc_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd)
 { return 0; }
 #endif
+
+/*
+ * Host-side operations we want to set up while running in real
+ * mode in the guest operating on the xics.
+ * Currently only VCPU wakeup is supported.
+ */
+
+union kvmppc_rm_state {
+	unsigned long raw;
+	struct {
+		u32 in_host;
+		u32 rm_action;
+	};
+};
+
+struct kvmppc_host_rm_core {
+	union kvmppc_rm_state rm_state;
+	void *rm_data;
+	char pad[112];
+};
+
+struct kvmppc_host_rm_ops {
+	struct kvmppc_host_rm_core *rm_core;
+	void (*vcpu_kick)(struct kvm_vcpu *vcpu);
+};
+
+extern struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv;

 static inline unsigned long kvmppc_get_epr(struct kvm_vcpu *vcpu)
 {
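The new `kvmppc_host_rm_core` carries 112 bytes of padding so each core's entry fills 128 bytes, keeping per-core real-mode state on separate cache lines (POWER8's L1 line is 128 bytes) and avoiding false sharing between cores; the union lets `in_host` and `rm_action` be read or written together as one word. A user-space sketch of that layout (sizes assume an LP64 build; names are illustrative):

```c
#include <stdint.h>

/* Sketch of the new host real-mode structures (kernel u32/unsigned long
 * spelled as fixed-width types here). */
union rm_state {
	uint64_t raw;			/* both fields as a single word */
	struct {
		uint32_t in_host;	/* core is running in host context */
		uint32_t rm_action;	/* action requested by real-mode code */
	};
};

struct host_rm_core {
	union rm_state rm_state;
	void *rm_data;
	char pad[112];	/* pad to 128 bytes so adjacent per-core entries
			 * never share a cache line */
};
```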
+3
arch/powerpc/include/asm/pgtable.h
···
 	}
 	return __find_linux_pte_or_hugepte(pgdir, ea, is_thp, shift);
 }
+
+unsigned long vmalloc_to_phys(void *vmalloc_addr);
+
 #endif /* __ASSEMBLY__ */

 #endif /* _ASM_POWERPC_PGTABLE_H */
+4
arch/powerpc/include/asm/smp.h
···
 #define PPC_MSG_TICK_BROADCAST	2
 #define PPC_MSG_DEBUGGER_BREAK	3

+/* This is only used by the powernv kernel */
+#define PPC_MSG_RM_HOST_ACTION	4
+
 /* for irq controllers that have dedicated ipis per message (4) */
 extern int smp_request_message_ipi(int virq, int message);
 extern const char *smp_ipi_name[];
···
 /* for irq controllers with only a single ipi */
 extern void smp_muxed_ipi_set_data(int cpu, unsigned long data);
 extern void smp_muxed_ipi_message_pass(int cpu, int msg);
+extern void smp_muxed_ipi_set_message(int cpu, int msg);
 extern irqreturn_t smp_ipi_demux(void);

 void smp_init_pSeries(void);
+1
arch/powerpc/include/asm/xics.h
···
 #ifdef CONFIG_PPC_ICP_NATIVE
 extern int icp_native_init(void);
 extern void icp_native_flush_interrupt(void);
+extern void icp_native_cause_ipi_rm(int cpu);
 #else
 static inline int icp_native_init(void) { return -ENODEV; }
 #endif
+9
arch/powerpc/include/uapi/asm/kvm.h
···
 	__u32 window_size;
 };

+/* for KVM_CAP_SPAPR_TCE_64 */
+struct kvm_create_spapr_tce_64 {
+	__u64 liobn;
+	__u32 page_shift;
+	__u32 flags;
+	__u64 offset;	/* in pages */
+	__u64 size;	/* in pages */
+};
+
 /* for KVM_ALLOCATE_RMA */
 struct kvm_allocate_rma {
 	__u64 rma_size;
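The new `kvm_create_spapr_tce_64` describes the DMA window purely in pages: `size` IOMMU pages of `1 << page_shift` bytes each, one 8-byte TCE per page. The backing-store arithmetic done later in book3s_64_vio.c can be sketched like this (user-space names, 4K host pages assumed):

```c
/* Number of 4K host pages needed to hold the TCE table for a window of
 * `window_pages` IOMMU pages: one u64 TCE per IOMMU page, rounded up to
 * whole pages.  Mirrors the kvmppc_tce_pages() logic (sketch only). */
#define HOST_PAGE_SIZE	4096UL

static unsigned long tce_backing_pages(unsigned long window_pages)
{
	unsigned long bytes = window_pages * 8;	/* sizeof(u64) per TCE */

	return (bytes + HOST_PAGE_SIZE - 1) / HOST_PAGE_SIZE;
}
```

For example, a 2GB window of 64K IOMMU pages has 32768 entries and needs 64 host pages of TCEs; the old 16MB/4K default needed only 8.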
+23 -5
arch/powerpc/kernel/smp.c
···
 #ifdef CONFIG_PPC_SMP_MUXED_IPI
 struct cpu_messages {
-	int messages;			/* current messages */
+	long messages;			/* current messages */
 	unsigned long data;		/* data for cause ipi */
 };
 static DEFINE_PER_CPU_SHARED_ALIGNED(struct cpu_messages, ipi_message);
···
 	info->data = data;
 }

-void smp_muxed_ipi_message_pass(int cpu, int msg)
+void smp_muxed_ipi_set_message(int cpu, int msg)
 {
 	struct cpu_messages *info = &per_cpu(ipi_message, cpu);
 	char *message = (char *)&info->messages;
···
 	 */
 	smp_mb();
 	message[msg] = 1;
+}
+
+void smp_muxed_ipi_message_pass(int cpu, int msg)
+{
+	struct cpu_messages *info = &per_cpu(ipi_message, cpu);
+
+	smp_muxed_ipi_set_message(cpu, msg);
 	/*
 	 * cause_ipi functions are required to include a full barrier
 	 * before doing whatever causes the IPI.
···
 }

 #ifdef __BIG_ENDIAN__
-#define IPI_MESSAGE(A) (1 << (24 - 8 * (A)))
+#define IPI_MESSAGE(A) (1uL << ((BITS_PER_LONG - 8) - 8 * (A)))
 #else
-#define IPI_MESSAGE(A) (1 << (8 * (A)))
+#define IPI_MESSAGE(A) (1uL << (8 * (A)))
 #endif

 irqreturn_t smp_ipi_demux(void)
 {
 	struct cpu_messages *info = this_cpu_ptr(&ipi_message);
-	unsigned int all;
+	unsigned long all;

 	mb();	/* order any irq clear */

 	do {
 		all = xchg(&info->messages, 0);
+#if defined(CONFIG_KVM_XICS) && defined(CONFIG_KVM_BOOK3S_HV_POSSIBLE)
+		/*
+		 * Must check for PPC_MSG_RM_HOST_ACTION messages
+		 * before PPC_MSG_CALL_FUNCTION messages because when
+		 * a VM is destroyed, we call kick_all_cpus_sync()
+		 * to ensure that any pending PPC_MSG_RM_HOST_ACTION
+		 * messages have completed before we free any VCPUs.
+		 */
+		if (all & IPI_MESSAGE(PPC_MSG_RM_HOST_ACTION))
+			kvmppc_xics_ipi_action();
+#endif
 		if (all & IPI_MESSAGE(PPC_MSG_CALL_FUNCTION))
 			generic_smp_call_function_interrupt();
 		if (all & IPI_MESSAGE(PPC_MSG_RESCHEDULE))
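The smp.c change widens `messages` from `int` to `long` so a fifth message type (`PPC_MSG_RM_HOST_ACTION`) fits: each message type owns one byte of the word, senders set their byte with a plain store, and the receiver drains everything pending with a single `xchg()`. A user-space sketch of that encoding (little-endian layout, matching the kernel's `#else` branch; names are illustrative):

```c
/* Each message type owns one byte of a 64-bit word, so up to eight
 * types fit; MSG_RM_HOST_ACTION is the fifth. */
enum {
	MSG_CALL_FUNCTION,
	MSG_RESCHEDULE,
	MSG_TICK_BROADCAST,
	MSG_DEBUGGER_BREAK,
	MSG_RM_HOST_ACTION,
};

#define IPI_MESSAGE(A)	(1UL << (8 * (A)))

static unsigned long messages;

/* Sender side: mark one message pending with a single byte store,
 * as the kernel does through a char * alias of the word. */
static void set_message(int msg)
{
	((unsigned char *)&messages)[msg] = 1;
}

/* Receiver side: atomically grab and clear every pending message. */
static unsigned long demux(void)
{
	return __atomic_exchange_n(&messages, 0UL, __ATOMIC_SEQ_CST);
}
```

The atomic grab-and-clear is what lets the new `PPC_MSG_RM_HOST_ACTION` check run before the `PPC_MSG_CALL_FUNCTION` check on one consistent snapshot of the pending set.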
+1 -1
arch/powerpc/kvm/Makefile
···
 KVM := ../../../virt/kvm

 common-objs-y = $(KVM)/kvm_main.o $(KVM)/coalesced_mmio.o \
-		$(KVM)/eventfd.o
+		$(KVM)/eventfd.o $(KVM)/vfio.o

 CFLAGS_e500_mmu.o := -I.
 CFLAGS_e500_mmu_host.o := -I.
+1 -1
arch/powerpc/kvm/book3s.c
···
 {

 #ifdef CONFIG_PPC64
-	INIT_LIST_HEAD(&kvm->arch.spapr_tce_tables);
+	INIT_LIST_HEAD_RCU(&kvm->arch.spapr_tce_tables);
 	INIT_LIST_HEAD(&kvm->arch.rtas_tokens);
 #endif
+137 -21
arch/powerpc/kvm/book3s_64_vio.c
··· 14 14 * 15 15 * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com> 16 16 * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com> 17 + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com> 17 18 */ 18 19 19 20 #include <linux/types.h> ··· 37 36 #include <asm/ppc-opcode.h> 38 37 #include <asm/kvm_host.h> 39 38 #include <asm/udbg.h> 39 + #include <asm/iommu.h> 40 + #include <asm/tce.h> 40 41 41 - #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) 42 - 43 - static long kvmppc_stt_npages(unsigned long window_size) 42 + static unsigned long kvmppc_tce_pages(unsigned long iommu_pages) 44 43 { 45 - return ALIGN((window_size >> SPAPR_TCE_SHIFT) 46 - * sizeof(u64), PAGE_SIZE) / PAGE_SIZE; 44 + return ALIGN(iommu_pages * sizeof(u64), PAGE_SIZE) / PAGE_SIZE; 47 45 } 48 46 49 - static void release_spapr_tce_table(struct kvmppc_spapr_tce_table *stt) 47 + static unsigned long kvmppc_stt_pages(unsigned long tce_pages) 50 48 { 51 - struct kvm *kvm = stt->kvm; 52 - int i; 49 + unsigned long stt_bytes = sizeof(struct kvmppc_spapr_tce_table) + 50 + (tce_pages * sizeof(struct page *)); 53 51 54 - mutex_lock(&kvm->lock); 55 - list_del(&stt->list); 56 - for (i = 0; i < kvmppc_stt_npages(stt->window_size); i++) 52 + return tce_pages + ALIGN(stt_bytes, PAGE_SIZE) / PAGE_SIZE; 53 + } 54 + 55 + static long kvmppc_account_memlimit(unsigned long stt_pages, bool inc) 56 + { 57 + long ret = 0; 58 + 59 + if (!current || !current->mm) 60 + return ret; /* process exited */ 61 + 62 + down_write(&current->mm->mmap_sem); 63 + 64 + if (inc) { 65 + unsigned long locked, lock_limit; 66 + 67 + locked = current->mm->locked_vm + stt_pages; 68 + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; 69 + if (locked > lock_limit && !capable(CAP_IPC_LOCK)) 70 + ret = -ENOMEM; 71 + else 72 + current->mm->locked_vm += stt_pages; 73 + } else { 74 + if (WARN_ON_ONCE(stt_pages > current->mm->locked_vm)) 75 + stt_pages = current->mm->locked_vm; 76 + 77 + current->mm->locked_vm -= 
stt_pages; 78 + } 79 + 80 + pr_debug("[%d] RLIMIT_MEMLOCK KVM %c%ld %ld/%ld%s\n", current->pid, 81 + inc ? '+' : '-', 82 + stt_pages << PAGE_SHIFT, 83 + current->mm->locked_vm << PAGE_SHIFT, 84 + rlimit(RLIMIT_MEMLOCK), 85 + ret ? " - exceeded" : ""); 86 + 87 + up_write(&current->mm->mmap_sem); 88 + 89 + return ret; 90 + } 91 + 92 + static void release_spapr_tce_table(struct rcu_head *head) 93 + { 94 + struct kvmppc_spapr_tce_table *stt = container_of(head, 95 + struct kvmppc_spapr_tce_table, rcu); 96 + unsigned long i, npages = kvmppc_tce_pages(stt->size); 97 + 98 + for (i = 0; i < npages; i++) 57 99 __free_page(stt->pages[i]); 58 - kfree(stt); 59 - mutex_unlock(&kvm->lock); 60 100 61 - kvm_put_kvm(kvm); 101 + kfree(stt); 62 102 } 63 103 64 104 static int kvm_spapr_tce_fault(struct vm_area_struct *vma, struct vm_fault *vmf) ··· 107 65 struct kvmppc_spapr_tce_table *stt = vma->vm_file->private_data; 108 66 struct page *page; 109 67 110 - if (vmf->pgoff >= kvmppc_stt_npages(stt->window_size)) 68 + if (vmf->pgoff >= kvmppc_tce_pages(stt->size)) 111 69 return VM_FAULT_SIGBUS; 112 70 113 71 page = stt->pages[vmf->pgoff]; ··· 130 88 { 131 89 struct kvmppc_spapr_tce_table *stt = filp->private_data; 132 90 133 - release_spapr_tce_table(stt); 91 + list_del_rcu(&stt->list); 92 + 93 + kvm_put_kvm(stt->kvm); 94 + 95 + kvmppc_account_memlimit( 96 + kvmppc_stt_pages(kvmppc_tce_pages(stt->size)), false); 97 + call_rcu(&stt->rcu, release_spapr_tce_table); 98 + 134 99 return 0; 135 100 } 136 101 ··· 147 98 }; 148 99 149 100 long kvm_vm_ioctl_create_spapr_tce(struct kvm *kvm, 150 - struct kvm_create_spapr_tce *args) 101 + struct kvm_create_spapr_tce_64 *args) 151 102 { 152 103 struct kvmppc_spapr_tce_table *stt = NULL; 153 - long npages; 104 + unsigned long npages, size; 154 105 int ret = -ENOMEM; 155 106 int i; 107 + 108 + if (!args->size) 109 + return -EINVAL; 156 110 157 111 /* Check this LIOBN hasn't been previously allocated */ 158 112 list_for_each_entry(stt, 
&kvm->arch.spapr_tce_tables, list) { ··· 163 111 return -EBUSY; 164 112 } 165 113 166 - npages = kvmppc_stt_npages(args->window_size); 114 + size = args->size; 115 + npages = kvmppc_tce_pages(size); 116 + ret = kvmppc_account_memlimit(kvmppc_stt_pages(npages), true); 117 + if (ret) { 118 + stt = NULL; 119 + goto fail; 120 + } 167 121 168 122 stt = kzalloc(sizeof(*stt) + npages * sizeof(struct page *), 169 123 GFP_KERNEL); ··· 177 119 goto fail; 178 120 179 121 stt->liobn = args->liobn; 180 - stt->window_size = args->window_size; 122 + stt->page_shift = args->page_shift; 123 + stt->offset = args->offset; 124 + stt->size = size; 181 125 stt->kvm = kvm; 182 126 183 127 for (i = 0; i < npages; i++) { ··· 191 131 kvm_get_kvm(kvm); 192 132 193 133 mutex_lock(&kvm->lock); 194 - list_add(&stt->list, &kvm->arch.spapr_tce_tables); 134 + list_add_rcu(&stt->list, &kvm->arch.spapr_tce_tables); 195 135 196 136 mutex_unlock(&kvm->lock); 197 137 ··· 208 148 } 209 149 return ret; 210 150 } 151 + 152 + long kvmppc_h_put_tce_indirect(struct kvm_vcpu *vcpu, 153 + unsigned long liobn, unsigned long ioba, 154 + unsigned long tce_list, unsigned long npages) 155 + { 156 + struct kvmppc_spapr_tce_table *stt; 157 + long i, ret = H_SUCCESS, idx; 158 + unsigned long entry, ua = 0; 159 + u64 __user *tces, tce; 160 + 161 + stt = kvmppc_find_table(vcpu, liobn); 162 + if (!stt) 163 + return H_TOO_HARD; 164 + 165 + entry = ioba >> stt->page_shift; 166 + /* 167 + * SPAPR spec says that the maximum size of the list is 512 TCEs 168 + * so the whole table fits in 4K page 169 + */ 170 + if (npages > 512) 171 + return H_PARAMETER; 172 + 173 + if (tce_list & (SZ_4K - 1)) 174 + return H_PARAMETER; 175 + 176 + ret = kvmppc_ioba_validate(stt, ioba, npages); 177 + if (ret != H_SUCCESS) 178 + return ret; 179 + 180 + idx = srcu_read_lock(&vcpu->kvm->srcu); 181 + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, NULL)) { 182 + ret = H_TOO_HARD; 183 + goto unlock_exit; 184 + } 185 + tces = (u64 __user *) ua; 186 + 
187 + for (i = 0; i < npages; ++i) { 188 + if (get_user(tce, tces + i)) { 189 + ret = H_TOO_HARD; 190 + goto unlock_exit; 191 + } 192 + tce = be64_to_cpu(tce); 193 + 194 + ret = kvmppc_tce_validate(stt, tce); 195 + if (ret != H_SUCCESS) 196 + goto unlock_exit; 197 + 198 + kvmppc_tce_put(stt, entry + i, tce); 199 + } 200 + 201 + unlock_exit: 202 + srcu_read_unlock(&vcpu->kvm->srcu, idx); 203 + 204 + return ret; 205 + } 206 + EXPORT_SYMBOL_GPL(kvmppc_h_put_tce_indirect);
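The new `kvmppc_h_put_tce_indirect()` above gates the guest-supplied TCE list with three independent checks before touching the table: the list may hold at most 512 TCEs (so it fits in one 4K page), the list address must be 4K aligned, and each entry is validated before it is put. A minimal standalone restatement of that gating logic (names and the exact `H_PARAMETER` usage mirror the diff; this is a sketch, not the kernel code):

```c
#include <stdint.h>

/* H_SUCCESS/H_PARAMETER values as in asm/hvcall.h; sketch only. */
enum { H_SUCCESS = 0, H_PARAMETER = -4 };

/* Mirror of the list checks in kvmppc_h_put_tce_indirect():
 * the whole TCE list must fit in, and be aligned to, a 4K page. */
static long check_tce_list(uint64_t tce_list, uint64_t npages)
{
    if (npages > 512)            /* SPAPR caps the list at 512 entries */
        return H_PARAMETER;
    if (tce_list & (4096 - 1))   /* list page must be 4K aligned */
        return H_PARAMETER;
    return H_SUCCESS;
}
```

A caller would run this before dereferencing the list, exactly as the hcall does before `kvmppc_ioba_validate()` and the per-entry `kvmppc_tce_validate()`.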
+285 -39
arch/powerpc/kvm/book3s_64_vio_hv.c
··· 14 14 * 15 15 * Copyright 2010 Paul Mackerras, IBM Corp. <paulus@au1.ibm.com> 16 16 * Copyright 2011 David Gibson, IBM Corporation <dwg@au1.ibm.com> 17 + * Copyright 2016 Alexey Kardashevskiy, IBM Corporation <aik@au1.ibm.com> 17 18 */ 18 19 19 20 #include <linux/types.h> ··· 31 30 #include <asm/kvm_ppc.h> 32 31 #include <asm/kvm_book3s.h> 33 32 #include <asm/mmu-hash64.h> 33 + #include <asm/mmu_context.h> 34 34 #include <asm/hvcall.h> 35 35 #include <asm/synch.h> 36 36 #include <asm/ppc-opcode.h> 37 37 #include <asm/kvm_host.h> 38 38 #include <asm/udbg.h> 39 + #include <asm/iommu.h> 40 + #include <asm/tce.h> 41 + #include <asm/iommu.h> 39 42 40 43 #define TCES_PER_PAGE (PAGE_SIZE / sizeof(u64)) 41 44 42 - /* WARNING: This will be called in real-mode on HV KVM and virtual 45 + /* 46 + * Finds a TCE table descriptor by LIOBN. 47 + * 48 + * WARNING: This will be called in real or virtual mode on HV KVM and virtual 43 49 * mode on PR KVM 44 50 */ 45 - long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, 46 - unsigned long ioba, unsigned long tce) 51 + struct kvmppc_spapr_tce_table *kvmppc_find_table(struct kvm_vcpu *vcpu, 52 + unsigned long liobn) 47 53 { 48 54 struct kvm *kvm = vcpu->kvm; 49 55 struct kvmppc_spapr_tce_table *stt; 56 + 57 + list_for_each_entry_lockless(stt, &kvm->arch.spapr_tce_tables, list) 58 + if (stt->liobn == liobn) 59 + return stt; 60 + 61 + return NULL; 62 + } 63 + EXPORT_SYMBOL_GPL(kvmppc_find_table); 64 + 65 + /* 66 + * Validates IO address. 
67 + * 68 + * WARNING: This will be called in real-mode on HV KVM and virtual 69 + * mode on PR KVM 70 + */ 71 + long kvmppc_ioba_validate(struct kvmppc_spapr_tce_table *stt, 72 + unsigned long ioba, unsigned long npages) 73 + { 74 + unsigned long mask = (1ULL << stt->page_shift) - 1; 75 + unsigned long idx = ioba >> stt->page_shift; 76 + 77 + if ((ioba & mask) || (idx < stt->offset) || 78 + (idx - stt->offset + npages > stt->size) || 79 + (idx + npages < idx)) 80 + return H_PARAMETER; 81 + 82 + return H_SUCCESS; 83 + } 84 + EXPORT_SYMBOL_GPL(kvmppc_ioba_validate); 85 + 86 + /* 87 + * Validates TCE address. 88 + * At the moment flags and page mask are validated. 89 + * As the host kernel does not access those addresses (just puts them 90 + * to the table and user space is supposed to process them), we can skip 91 + * checking other things (such as TCE is a guest RAM address or the page 92 + * was actually allocated). 93 + * 94 + * WARNING: This will be called in real-mode on HV KVM and virtual 95 + * mode on PR KVM 96 + */ 97 + long kvmppc_tce_validate(struct kvmppc_spapr_tce_table *stt, unsigned long tce) 98 + { 99 + unsigned long page_mask = ~((1ULL << stt->page_shift) - 1); 100 + unsigned long mask = ~(page_mask | TCE_PCI_WRITE | TCE_PCI_READ); 101 + 102 + if (tce & mask) 103 + return H_PARAMETER; 104 + 105 + return H_SUCCESS; 106 + } 107 + EXPORT_SYMBOL_GPL(kvmppc_tce_validate); 108 + 109 + /* Note on the use of page_address() in real mode, 110 + * 111 + * It is safe to use page_address() in real mode on ppc64 because 112 + * page_address() is always defined as lowmem_page_address() 113 + * which returns __va(PFN_PHYS(page_to_pfn(page))) which is arithmetic 114 + * operation and does not access page struct. 115 + * 116 + * Theoretically page_address() could be defined different 117 + * but either WANT_PAGE_VIRTUAL or HASHED_PAGE_VIRTUAL 118 + * would have to be enabled. 
119 + * WANT_PAGE_VIRTUAL is never enabled on ppc32/ppc64, 120 + * HASHED_PAGE_VIRTUAL could be enabled for ppc32 only and only 121 + * if CONFIG_HIGHMEM is defined. As CONFIG_SPARSEMEM_VMEMMAP 122 + * is not expected to be enabled on ppc32, page_address() 123 + * is safe for ppc32 as well. 124 + * 125 + * WARNING: This will be called in real-mode on HV KVM and virtual 126 + * mode on PR KVM 127 + */ 128 + static u64 *kvmppc_page_address(struct page *page) 129 + { 130 + #if defined(HASHED_PAGE_VIRTUAL) || defined(WANT_PAGE_VIRTUAL) 131 + #error TODO: fix to avoid page_address() here 132 + #endif 133 + return (u64 *) page_address(page); 134 + } 135 + 136 + /* 137 + * Handles TCE requests for emulated devices. 138 + * Puts guest TCE values to the table and expects user space to convert them. 139 + * Called in both real and virtual modes. 140 + * Cannot fail so kvmppc_tce_validate must be called before it. 141 + * 142 + * WARNING: This will be called in real-mode on HV KVM and virtual 143 + * mode on PR KVM 144 + */ 145 + void kvmppc_tce_put(struct kvmppc_spapr_tce_table *stt, 146 + unsigned long idx, unsigned long tce) 147 + { 148 + struct page *page; 149 + u64 *tbl; 150 + 151 + idx -= stt->offset; 152 + page = stt->pages[idx / TCES_PER_PAGE]; 153 + tbl = kvmppc_page_address(page); 154 + 155 + tbl[idx % TCES_PER_PAGE] = tce; 156 + } 157 + EXPORT_SYMBOL_GPL(kvmppc_tce_put); 158 + 159 + long kvmppc_gpa_to_ua(struct kvm *kvm, unsigned long gpa, 160 + unsigned long *ua, unsigned long **prmap) 161 + { 162 + unsigned long gfn = gpa >> PAGE_SHIFT; 163 + struct kvm_memory_slot *memslot; 164 + 165 + memslot = search_memslots(kvm_memslots(kvm), gfn); 166 + if (!memslot) 167 + return -EINVAL; 168 + 169 + *ua = __gfn_to_hva_memslot(memslot, gfn) | 170 + (gpa & ~(PAGE_MASK | TCE_PCI_READ | TCE_PCI_WRITE)); 171 + 172 + #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 173 + if (prmap) 174 + *prmap = &memslot->arch.rmap[gfn - memslot->base_gfn]; 175 + #endif 176 + 177 + return 0; 178 + } 179 + 
EXPORT_SYMBOL_GPL(kvmppc_gpa_to_ua); 180 + 181 + #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 182 + long kvmppc_h_put_tce(struct kvm_vcpu *vcpu, unsigned long liobn, 183 + unsigned long ioba, unsigned long tce) 184 + { 185 + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); 186 + long ret; 50 187 51 188 /* udbg_printf("H_PUT_TCE(): liobn=0x%lx ioba=0x%lx, tce=0x%lx\n", */ 52 189 /* liobn, ioba, tce); */ 53 190 54 - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { 55 - if (stt->liobn == liobn) { 56 - unsigned long idx = ioba >> SPAPR_TCE_SHIFT; 57 - struct page *page; 58 - u64 *tbl; 191 + if (!stt) 192 + return H_TOO_HARD; 59 193 60 - /* udbg_printf("H_PUT_TCE: liobn 0x%lx => stt=%p window_size=0x%x\n", */ 61 - /* liobn, stt, stt->window_size); */ 62 - if (ioba >= stt->window_size) 63 - return H_PARAMETER; 194 + ret = kvmppc_ioba_validate(stt, ioba, 1); 195 + if (ret != H_SUCCESS) 196 + return ret; 64 197 65 - page = stt->pages[idx / TCES_PER_PAGE]; 66 - tbl = (u64 *)page_address(page); 198 + ret = kvmppc_tce_validate(stt, tce); 199 + if (ret != H_SUCCESS) 200 + return ret; 67 201 68 - /* FIXME: Need to validate the TCE itself */ 69 - /* udbg_printf("tce @ %p\n", &tbl[idx % TCES_PER_PAGE]); */ 70 - tbl[idx % TCES_PER_PAGE] = tce; 71 - return H_SUCCESS; 72 - } 73 - } 202 + kvmppc_tce_put(stt, ioba >> stt->page_shift, tce); 74 203 75 - /* Didn't find the liobn, punt it to userspace */ 76 - return H_TOO_HARD; 204 + return H_SUCCESS; 77 205 } 78 206 EXPORT_SYMBOL_GPL(kvmppc_h_put_tce); 207 + 208 + static long kvmppc_rm_ua_to_hpa(struct kvm_vcpu *vcpu, 209 + unsigned long ua, unsigned long *phpa) 210 + { 211 + pte_t *ptep, pte; 212 + unsigned shift = 0; 213 + 214 + ptep = __find_linux_pte_or_hugepte(vcpu->arch.pgdir, ua, NULL, &shift); 215 + if (!ptep || !pte_present(*ptep)) 216 + return -ENXIO; 217 + pte = *ptep; 218 + 219 + if (!shift) 220 + shift = PAGE_SHIFT; 221 + 222 + /* Avoid handling anything potentially complicated in realmode */ 223 + 
if (shift > PAGE_SHIFT) 224 + return -EAGAIN; 225 + 226 + if (!pte_young(pte)) 227 + return -EAGAIN; 228 + 229 + *phpa = (pte_pfn(pte) << PAGE_SHIFT) | (ua & ((1ULL << shift) - 1)) | 230 + (ua & ~PAGE_MASK); 231 + 232 + return 0; 233 + } 234 + 235 + long kvmppc_rm_h_put_tce_indirect(struct kvm_vcpu *vcpu, 236 + unsigned long liobn, unsigned long ioba, 237 + unsigned long tce_list, unsigned long npages) 238 + { 239 + struct kvmppc_spapr_tce_table *stt; 240 + long i, ret = H_SUCCESS; 241 + unsigned long tces, entry, ua = 0; 242 + unsigned long *rmap = NULL; 243 + 244 + stt = kvmppc_find_table(vcpu, liobn); 245 + if (!stt) 246 + return H_TOO_HARD; 247 + 248 + entry = ioba >> stt->page_shift; 249 + /* 250 + * The spec says that the maximum size of the list is 512 TCEs 251 + * so the whole table addressed resides in 4K page 252 + */ 253 + if (npages > 512) 254 + return H_PARAMETER; 255 + 256 + if (tce_list & (SZ_4K - 1)) 257 + return H_PARAMETER; 258 + 259 + ret = kvmppc_ioba_validate(stt, ioba, npages); 260 + if (ret != H_SUCCESS) 261 + return ret; 262 + 263 + if (kvmppc_gpa_to_ua(vcpu->kvm, tce_list, &ua, &rmap)) 264 + return H_TOO_HARD; 265 + 266 + rmap = (void *) vmalloc_to_phys(rmap); 267 + 268 + /* 269 + * Synchronize with the MMU notifier callbacks in 270 + * book3s_64_mmu_hv.c (kvm_unmap_hva_hv etc.). 271 + * While we have the rmap lock, code running on other CPUs 272 + * cannot finish unmapping the host real page that backs 273 + * this guest real page, so we are OK to access the host 274 + * real page. 
275 + */ 276 + lock_rmap(rmap); 277 + if (kvmppc_rm_ua_to_hpa(vcpu, ua, &tces)) { 278 + ret = H_TOO_HARD; 279 + goto unlock_exit; 280 + } 281 + 282 + for (i = 0; i < npages; ++i) { 283 + unsigned long tce = be64_to_cpu(((u64 *)tces)[i]); 284 + 285 + ret = kvmppc_tce_validate(stt, tce); 286 + if (ret != H_SUCCESS) 287 + goto unlock_exit; 288 + 289 + kvmppc_tce_put(stt, entry + i, tce); 290 + } 291 + 292 + unlock_exit: 293 + unlock_rmap(rmap); 294 + 295 + return ret; 296 + } 297 + 298 + long kvmppc_h_stuff_tce(struct kvm_vcpu *vcpu, 299 + unsigned long liobn, unsigned long ioba, 300 + unsigned long tce_value, unsigned long npages) 301 + { 302 + struct kvmppc_spapr_tce_table *stt; 303 + long i, ret; 304 + 305 + stt = kvmppc_find_table(vcpu, liobn); 306 + if (!stt) 307 + return H_TOO_HARD; 308 + 309 + ret = kvmppc_ioba_validate(stt, ioba, npages); 310 + if (ret != H_SUCCESS) 311 + return ret; 312 + 313 + /* Check permission bits only to allow userspace poison TCE for debug */ 314 + if (tce_value & (TCE_PCI_WRITE | TCE_PCI_READ)) 315 + return H_PARAMETER; 316 + 317 + for (i = 0; i < npages; ++i, ioba += (1ULL << stt->page_shift)) 318 + kvmppc_tce_put(stt, ioba >> stt->page_shift, tce_value); 319 + 320 + return H_SUCCESS; 321 + } 322 + EXPORT_SYMBOL_GPL(kvmppc_h_stuff_tce); 79 323 80 324 long kvmppc_h_get_tce(struct kvm_vcpu *vcpu, unsigned long liobn, 81 325 unsigned long ioba) 82 326 { 83 - struct kvm *kvm = vcpu->kvm; 84 - struct kvmppc_spapr_tce_table *stt; 327 + struct kvmppc_spapr_tce_table *stt = kvmppc_find_table(vcpu, liobn); 328 + long ret; 329 + unsigned long idx; 330 + struct page *page; 331 + u64 *tbl; 85 332 86 - list_for_each_entry(stt, &kvm->arch.spapr_tce_tables, list) { 87 - if (stt->liobn == liobn) { 88 - unsigned long idx = ioba >> SPAPR_TCE_SHIFT; 89 - struct page *page; 90 - u64 *tbl; 333 + if (!stt) 334 + return H_TOO_HARD; 91 335 92 - if (ioba >= stt->window_size) 93 - return H_PARAMETER; 336 + ret = kvmppc_ioba_validate(stt, ioba, 1); 337 + if 
(ret != H_SUCCESS) 338 + return ret; 94 339 95 - page = stt->pages[idx / TCES_PER_PAGE]; 96 - tbl = (u64 *)page_address(page); 340 + idx = (ioba >> stt->page_shift) - stt->offset; 341 + page = stt->pages[idx / TCES_PER_PAGE]; 342 + tbl = (u64 *)page_address(page); 97 343 98 - vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE]; 99 - return H_SUCCESS; 100 - } 101 - } 344 + vcpu->arch.gpr[4] = tbl[idx % TCES_PER_PAGE]; 102 345 103 - /* Didn't find the liobn, punt it to userspace */ 104 - return H_TOO_HARD; 346 + return H_SUCCESS; 105 347 } 106 348 EXPORT_SYMBOL_GPL(kvmppc_h_get_tce); 349 + 350 + #endif /* KVM_BOOK3S_HV_POSSIBLE */
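`kvmppc_ioba_validate()` above rejects an unaligned ioba, an access below or beyond the DMA window, and an `npages` value that wraps around. The arithmetic can be restated as a self-contained sketch (field names copied from the descriptor; the window values in the usage below are hypothetical):

```c
#include <stdint.h>

struct tce_window {            /* the three fields kvmppc_ioba_validate reads */
    uint32_t page_shift;       /* IOMMU page size, e.g. 12 for 4K */
    uint64_t offset;           /* window start, in IOMMU pages */
    uint64_t size;             /* window length, in IOMMU pages */
};

/* Returns 0 on success, -1 where the kernel returns H_PARAMETER. */
static int ioba_validate(const struct tce_window *w,
                         uint64_t ioba, uint64_t npages)
{
    uint64_t mask = (1ULL << w->page_shift) - 1;
    uint64_t idx  = ioba >> w->page_shift;

    if ((ioba & mask) ||                       /* not IOMMU-page aligned */
        idx < w->offset ||                     /* below the DMA window   */
        idx - w->offset + npages > w->size ||  /* past the end of it     */
        idx + npages < idx)                    /* npages wrapped around  */
        return -1;
    return 0;
}
```

For a 4K window of 256 pages starting at offset 0, page 255 is the last valid entry, so a 2-page access starting there fails, as does any ioba with nonzero low bits.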
+191 -1
arch/powerpc/kvm/book3s_hv.c
··· 81 81 module_param(target_smt_mode, int, S_IRUGO | S_IWUSR); 82 82 MODULE_PARM_DESC(target_smt_mode, "Target threads per core (0 = max)"); 83 83 84 + #ifdef CONFIG_KVM_XICS 85 + static struct kernel_param_ops module_param_ops = { 86 + .set = param_set_int, 87 + .get = param_get_int, 88 + }; 89 + 90 + module_param_cb(h_ipi_redirect, &module_param_ops, &h_ipi_redirect, 91 + S_IRUGO | S_IWUSR); 92 + MODULE_PARM_DESC(h_ipi_redirect, "Redirect H_IPI wakeup to a free host core"); 93 + #endif 94 + 84 95 static void kvmppc_end_cede(struct kvm_vcpu *vcpu); 85 96 static int kvmppc_hv_setup_htab_rma(struct kvm_vcpu *vcpu); 86 97 ··· 779 768 if (kvmppc_xics_enabled(vcpu)) { 780 769 ret = kvmppc_xics_hcall(vcpu, req); 781 770 break; 782 - } /* fallthrough */ 771 + } 772 + return RESUME_HOST; 773 + case H_PUT_TCE: 774 + ret = kvmppc_h_put_tce(vcpu, kvmppc_get_gpr(vcpu, 4), 775 + kvmppc_get_gpr(vcpu, 5), 776 + kvmppc_get_gpr(vcpu, 6)); 777 + if (ret == H_TOO_HARD) 778 + return RESUME_HOST; 779 + break; 780 + case H_PUT_TCE_INDIRECT: 781 + ret = kvmppc_h_put_tce_indirect(vcpu, kvmppc_get_gpr(vcpu, 4), 782 + kvmppc_get_gpr(vcpu, 5), 783 + kvmppc_get_gpr(vcpu, 6), 784 + kvmppc_get_gpr(vcpu, 7)); 785 + if (ret == H_TOO_HARD) 786 + return RESUME_HOST; 787 + break; 788 + case H_STUFF_TCE: 789 + ret = kvmppc_h_stuff_tce(vcpu, kvmppc_get_gpr(vcpu, 4), 790 + kvmppc_get_gpr(vcpu, 5), 791 + kvmppc_get_gpr(vcpu, 6), 792 + kvmppc_get_gpr(vcpu, 7)); 793 + if (ret == H_TOO_HARD) 794 + return RESUME_HOST; 795 + break; 783 796 default: 784 797 return RESUME_HOST; 785 798 } ··· 2314 2279 } 2315 2280 2316 2281 /* 2282 + * Clear core from the list of active host cores as we are about to 2283 + * enter the guest. Only do this if it is the primary thread of the 2284 + * core (not if a subcore) that is entering the guest. 
2285 + */ 2286 + static inline void kvmppc_clear_host_core(int cpu) 2287 + { 2288 + int core; 2289 + 2290 + if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu)) 2291 + return; 2292 + /* 2293 + * Memory barrier can be omitted here as we will do a smp_wmb() 2294 + * later in kvmppc_start_thread and we need ensure that state is 2295 + * visible to other CPUs only after we enter guest. 2296 + */ 2297 + core = cpu >> threads_shift; 2298 + kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 0; 2299 + } 2300 + 2301 + /* 2302 + * Advertise this core as an active host core since we exited the guest 2303 + * Only need to do this if it is the primary thread of the core that is 2304 + * exiting. 2305 + */ 2306 + static inline void kvmppc_set_host_core(int cpu) 2307 + { 2308 + int core; 2309 + 2310 + if (!kvmppc_host_rm_ops_hv || cpu_thread_in_core(cpu)) 2311 + return; 2312 + 2313 + /* 2314 + * Memory barrier can be omitted here because we do a spin_unlock 2315 + * immediately after this which provides the memory barrier. 2316 + */ 2317 + core = cpu >> threads_shift; 2318 + kvmppc_host_rm_ops_hv->rm_core[core].rm_state.in_host = 1; 2319 + } 2320 + 2321 + /* 2317 2322 * Run a set of guest threads on a physical core. 2318 2323 * Called with vc->lock held. 
2319 2324 */ ··· 2465 2390 } 2466 2391 } 2467 2392 2393 + kvmppc_clear_host_core(pcpu); 2394 + 2468 2395 /* Start all the threads */ 2469 2396 active = 0; 2470 2397 for (sub = 0; sub < core_info.n_subcores; ++sub) { ··· 2562 2485 if (sip && sip->napped[i]) 2563 2486 kvmppc_ipi_thread(pcpu + i); 2564 2487 } 2488 + 2489 + kvmppc_set_host_core(pcpu); 2565 2490 2566 2491 spin_unlock(&vc->lock); 2567 2492 ··· 3062 2983 goto out_srcu; 3063 2984 } 3064 2985 2986 + #ifdef CONFIG_KVM_XICS 2987 + static int kvmppc_cpu_notify(struct notifier_block *self, unsigned long action, 2988 + void *hcpu) 2989 + { 2990 + unsigned long cpu = (long)hcpu; 2991 + 2992 + switch (action) { 2993 + case CPU_UP_PREPARE: 2994 + case CPU_UP_PREPARE_FROZEN: 2995 + kvmppc_set_host_core(cpu); 2996 + break; 2997 + 2998 + #ifdef CONFIG_HOTPLUG_CPU 2999 + case CPU_DEAD: 3000 + case CPU_DEAD_FROZEN: 3001 + case CPU_UP_CANCELED: 3002 + case CPU_UP_CANCELED_FROZEN: 3003 + kvmppc_clear_host_core(cpu); 3004 + break; 3005 + #endif 3006 + default: 3007 + break; 3008 + } 3009 + 3010 + return NOTIFY_OK; 3011 + } 3012 + 3013 + static struct notifier_block kvmppc_cpu_notifier = { 3014 + .notifier_call = kvmppc_cpu_notify, 3015 + }; 3016 + 3017 + /* 3018 + * Allocate a per-core structure for managing state about which cores are 3019 + * running in the host versus the guest and for exchanging data between 3020 + * real mode KVM and CPU running in the host. 3021 + * This is only done for the first VM. 3022 + * The allocated structure stays even if all VMs have stopped. 3023 + * It is only freed when the kvm-hv module is unloaded. 3024 + * It's OK for this routine to fail, we just don't support host 3025 + * core operations like redirecting H_IPI wakeups. 3026 + */ 3027 + void kvmppc_alloc_host_rm_ops(void) 3028 + { 3029 + struct kvmppc_host_rm_ops *ops; 3030 + unsigned long l_ops; 3031 + int cpu, core; 3032 + int size; 3033 + 3034 + /* Not the first time here ? 
*/ 3035 + if (kvmppc_host_rm_ops_hv != NULL) 3036 + return; 3037 + 3038 + ops = kzalloc(sizeof(struct kvmppc_host_rm_ops), GFP_KERNEL); 3039 + if (!ops) 3040 + return; 3041 + 3042 + size = cpu_nr_cores() * sizeof(struct kvmppc_host_rm_core); 3043 + ops->rm_core = kzalloc(size, GFP_KERNEL); 3044 + 3045 + if (!ops->rm_core) { 3046 + kfree(ops); 3047 + return; 3048 + } 3049 + 3050 + get_online_cpus(); 3051 + 3052 + for (cpu = 0; cpu < nr_cpu_ids; cpu += threads_per_core) { 3053 + if (!cpu_online(cpu)) 3054 + continue; 3055 + 3056 + core = cpu >> threads_shift; 3057 + ops->rm_core[core].rm_state.in_host = 1; 3058 + } 3059 + 3060 + ops->vcpu_kick = kvmppc_fast_vcpu_kick_hv; 3061 + 3062 + /* 3063 + * Make the contents of the kvmppc_host_rm_ops structure visible 3064 + * to other CPUs before we assign it to the global variable. 3065 + * Do an atomic assignment (no locks used here), but if someone 3066 + * beats us to it, just free our copy and return. 3067 + */ 3068 + smp_wmb(); 3069 + l_ops = (unsigned long) ops; 3070 + 3071 + if (cmpxchg64((unsigned long *)&kvmppc_host_rm_ops_hv, 0, l_ops)) { 3072 + put_online_cpus(); 3073 + kfree(ops->rm_core); 3074 + kfree(ops); 3075 + return; 3076 + } 3077 + 3078 + register_cpu_notifier(&kvmppc_cpu_notifier); 3079 + 3080 + put_online_cpus(); 3081 + } 3082 + 3083 + void kvmppc_free_host_rm_ops(void) 3084 + { 3085 + if (kvmppc_host_rm_ops_hv) { 3086 + unregister_cpu_notifier(&kvmppc_cpu_notifier); 3087 + kfree(kvmppc_host_rm_ops_hv->rm_core); 3088 + kfree(kvmppc_host_rm_ops_hv); 3089 + kvmppc_host_rm_ops_hv = NULL; 3090 + } 3091 + } 3092 + #endif 3093 + 3065 3094 static int kvmppc_core_init_vm_hv(struct kvm *kvm) 3066 3095 { 3067 3096 unsigned long lpcr, lpid; ··· 3181 2994 if ((long)lpid < 0) 3182 2995 return -ENOMEM; 3183 2996 kvm->arch.lpid = lpid; 2997 + 2998 + kvmppc_alloc_host_rm_ops(); 3184 2999 3185 3000 /* 3186 3001 * Since we don't flush the TLB when tearing down a VM, ··· 3417 3228 3418 3229 static void 
kvmppc_book3s_exit_hv(void) 3419 3230 { 3231 + kvmppc_free_host_rm_ops(); 3420 3232 kvmppc_hv_ops = NULL; 3421 3233 } 3422 3234
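`kvmppc_alloc_host_rm_ops()` above publishes the freshly built structure with a single `cmpxchg64` so that two VMs created concurrently cannot both install their copy; the loser frees its allocation and keeps using the winner's. The same one-time-publish idiom in portable C11 (a hedged sketch using `stdatomic`, not the kernel's primitives):

```c
#include <stdatomic.h>
#include <stdlib.h>

static _Atomic(void *) global_ops;   /* stands in for kvmppc_host_rm_ops_hv */

/* Install ops exactly once.  Returns the winning pointer; a losing
 * caller frees its copy, mirroring the kfree() path in the kernel. */
static void *publish_once(void *ops)
{
    void *expected = NULL;

    /* The release ordering plays the role of the kernel's smp_wmb()
     * before the assignment: the structure's contents become visible
     * to other threads before the pointer does. */
    if (atomic_compare_exchange_strong_explicit(&global_ops, &expected, ops,
                                                memory_order_acq_rel,
                                                memory_order_acquire))
        return ops;           /* we won: our copy is now the global one */

    free(ops);                /* we lost: drop our copy, use the winner's */
    return expected;
}
```

As in the kernel version, failure is benign: the structure simply stays as whichever caller got there first, and it is only torn down at module unload.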
+3
arch/powerpc/kvm/book3s_hv_builtin.c
··· 283 283 kvmhv_interrupt_vcore(vc, ee); 284 284 } 285 285 } 286 + 287 + struct kvmppc_host_rm_ops *kvmppc_host_rm_ops_hv; 288 + EXPORT_SYMBOL_GPL(kvmppc_host_rm_ops_hv);
+128 -3
arch/powerpc/kvm/book3s_hv_rm_xics.c
··· 17 17 #include <asm/xics.h> 18 18 #include <asm/debug.h> 19 19 #include <asm/synch.h> 20 + #include <asm/cputhreads.h> 20 21 #include <asm/ppc-opcode.h> 21 22 22 23 #include "book3s_xics.h" 23 24 24 25 #define DEBUG_PASSUP 26 + 27 + int h_ipi_redirect = 1; 28 + EXPORT_SYMBOL(h_ipi_redirect); 25 29 26 30 static void icp_rm_deliver_irq(struct kvmppc_xics *xics, struct kvmppc_icp *icp, 27 31 u32 new_irq); ··· 54 50 55 51 /* -- ICP routines -- */ 56 52 53 + #ifdef CONFIG_SMP 54 + static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) 55 + { 56 + int hcpu; 57 + 58 + hcpu = hcore << threads_shift; 59 + kvmppc_host_rm_ops_hv->rm_core[hcore].rm_data = vcpu; 60 + smp_muxed_ipi_set_message(hcpu, PPC_MSG_RM_HOST_ACTION); 61 + icp_native_cause_ipi_rm(hcpu); 62 + } 63 + #else 64 + static inline void icp_send_hcore_msg(int hcore, struct kvm_vcpu *vcpu) { } 65 + #endif 66 + 67 + /* 68 + * We start the search from our current CPU Id in the core map 69 + * and go in a circle until we get back to our ID looking for a 70 + * core that is running in host context and that hasn't already 71 + * been targeted for another rm_host_ops. 72 + * 73 + * In the future, could consider using a fairer algorithm (one 74 + * that distributes the IPIs better) 75 + * 76 + * Returns -1, if no CPU could be found in the host 77 + * Else, returns a CPU Id which has been reserved for use 78 + */ 79 + static inline int grab_next_hostcore(int start, 80 + struct kvmppc_host_rm_core *rm_core, int max, int action) 81 + { 82 + bool success; 83 + int core; 84 + union kvmppc_rm_state old, new; 85 + 86 + for (core = start + 1; core < max; core++) { 87 + old = new = READ_ONCE(rm_core[core].rm_state); 88 + 89 + if (!old.in_host || old.rm_action) 90 + continue; 91 + 92 + /* Try to grab this host core if not taken already. 
*/ 93 + new.rm_action = action; 94 + 95 + success = cmpxchg64(&rm_core[core].rm_state.raw, 96 + old.raw, new.raw) == old.raw; 97 + if (success) { 98 + /* 99 + * Make sure that the store to the rm_action is made 100 + * visible before we return to caller (and the 101 + * subsequent store to rm_data) to synchronize with 102 + * the IPI handler. 103 + */ 104 + smp_wmb(); 105 + return core; 106 + } 107 + } 108 + 109 + return -1; 110 + } 111 + 112 + static inline int find_available_hostcore(int action) 113 + { 114 + int core; 115 + int my_core = smp_processor_id() >> threads_shift; 116 + struct kvmppc_host_rm_core *rm_core = kvmppc_host_rm_ops_hv->rm_core; 117 + 118 + core = grab_next_hostcore(my_core, rm_core, cpu_nr_cores(), action); 119 + if (core == -1) 120 + core = grab_next_hostcore(core, rm_core, my_core, action); 121 + 122 + return core; 123 + } 124 + 57 125 static void icp_rm_set_vcpu_irq(struct kvm_vcpu *vcpu, 58 126 struct kvm_vcpu *this_vcpu) 59 127 { 60 128 struct kvmppc_icp *this_icp = this_vcpu->arch.icp; 61 129 int cpu; 130 + int hcore; 62 131 63 132 /* Mark the target VCPU as having an interrupt pending */ 64 133 vcpu->stat.queue_intr++; ··· 143 66 return; 144 67 } 145 68 146 - /* Check if the core is loaded, if not, too hard */ 69 + /* 70 + * Check if the core is loaded, 71 + * if not, find an available host core to post to wake the VCPU, 72 + * if we can't find one, set up state to eventually return too hard. 
73 + */ 147 74 cpu = vcpu->arch.thread_cpu; 148 75 if (cpu < 0 || cpu >= nr_cpu_ids) { 149 - this_icp->rm_action |= XICS_RM_KICK_VCPU; 150 - this_icp->rm_kick_target = vcpu; 76 + hcore = -1; 77 + if (kvmppc_host_rm_ops_hv && h_ipi_redirect) 78 + hcore = find_available_hostcore(XICS_RM_KICK_VCPU); 79 + if (hcore != -1) { 80 + icp_send_hcore_msg(hcore, vcpu); 81 + } else { 82 + this_icp->rm_action |= XICS_RM_KICK_VCPU; 83 + this_icp->rm_kick_target = vcpu; 84 + } 151 85 return; 152 86 } 153 87 ··· 710 622 } 711 623 bail: 712 624 return check_too_hard(xics, icp); 625 + } 626 + 627 + /* --- Non-real mode XICS-related built-in routines --- */ 628 + 629 + /** 630 + * Host Operations poked by RM KVM 631 + */ 632 + static void rm_host_ipi_action(int action, void *data) 633 + { 634 + switch (action) { 635 + case XICS_RM_KICK_VCPU: 636 + kvmppc_host_rm_ops_hv->vcpu_kick(data); 637 + break; 638 + default: 639 + WARN(1, "Unexpected rm_action=%d data=%p\n", action, data); 640 + break; 641 + } 642 + 643 + } 644 + 645 + void kvmppc_xics_ipi_action(void) 646 + { 647 + int core; 648 + unsigned int cpu = smp_processor_id(); 649 + struct kvmppc_host_rm_core *rm_corep; 650 + 651 + core = cpu >> threads_shift; 652 + rm_corep = &kvmppc_host_rm_ops_hv->rm_core[core]; 653 + 654 + if (rm_corep->rm_data) { 655 + rm_host_ipi_action(rm_corep->rm_state.rm_action, 656 + rm_corep->rm_data); 657 + /* Order these stores against the real mode KVM */ 658 + rm_corep->rm_data = NULL; 659 + smp_wmb(); 660 + rm_corep->rm_state.rm_action = 0; 661 + } 713 662 }
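`find_available_hostcore()` above scans the core map circularly: first the cores after the caller's own core, then, on failure, wrapping around to the cores before it. The wrap-around can be exercised in isolation with a plain array standing in for the kernel's `rm_state` union (single-threaded sketch; the kernel claims a core with `cmpxchg64` rather than a plain store):

```c
/* Each entry mimics rm_state: a core is grabbable when it runs in host
 * context (in_host) and has no action already pending (rm_action == 0). */
struct core_state { int in_host; int rm_action; };

static int grab_next(int start, struct core_state *cores, int max, int action)
{
    for (int core = start + 1; core < max; core++) {
        if (!cores[core].in_host || cores[core].rm_action)
            continue;
        cores[core].rm_action = action;   /* kernel: cmpxchg64 on rm_state */
        return core;
    }
    return -1;
}

/* Circular search: (my_core, ncores) first, then wrap to [0, my_core). */
static int find_available(struct core_state *cores, int ncores,
                          int my_core, int action)
{
    int core = grab_next(my_core, cores, ncores, action);
    if (core == -1)
        core = grab_next(-1, cores, my_core, action);
    return core;
}
```

With core 2 as the caller and core 3 busy, the search wraps and claims core 0; once every in-host core carries an action, it reports -1, which in the kernel falls back to the slow `XICS_RM_KICK_VCPU` path.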
+2 -2
arch/powerpc/kvm/book3s_hv_rmhandlers.S
··· 2020 2020 .long 0 /* 0x12c */ 2021 2021 .long 0 /* 0x130 */ 2022 2022 .long DOTSYM(kvmppc_h_set_xdabr) - hcall_real_table 2023 - .long 0 /* 0x138 */ 2024 - .long 0 /* 0x13c */ 2023 + .long DOTSYM(kvmppc_h_stuff_tce) - hcall_real_table 2024 + .long DOTSYM(kvmppc_rm_h_put_tce_indirect) - hcall_real_table 2025 2025 .long 0 /* 0x140 */ 2026 2026 .long 0 /* 0x144 */ 2027 2027 .long 0 /* 0x148 */
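The two new entries replace the `0x138` and `0x13c` placeholders in `hcall_real_table`, which the real-mode dispatcher indexes directly by hypercall token (one 32-bit slot per token, tokens being multiples of 4). Those slots correspond to `H_STUFF_TCE` (0x138) and `H_PUT_TCE_INDIRECT` (0x13C) from `asm/hvcall.h`. A quick consistency check, with the token values restated locally:

```c
/* Hypercall tokens as defined in arch/powerpc/include/asm/hvcall.h. */
enum {
    H_PUT_TCE          = 0x20,
    H_STUFF_TCE        = 0x138,
    H_PUT_TCE_INDIRECT = 0x13C,
};

/* hcall_real_table holds one .long per hypercall number, spaced 4
 * apart, so the slot index is simply the token divided by 4. */
static int hcall_slot(int token) { return token / 4; }
```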
+35
arch/powerpc/kvm/book3s_pr_papr.c
··· 280 280 return EMULATE_DONE; 281 281 } 282 282 283 + static int kvmppc_h_pr_put_tce_indirect(struct kvm_vcpu *vcpu) 284 + { 285 + unsigned long liobn = kvmppc_get_gpr(vcpu, 4); 286 + unsigned long ioba = kvmppc_get_gpr(vcpu, 5); 287 + unsigned long tce = kvmppc_get_gpr(vcpu, 6); 288 + unsigned long npages = kvmppc_get_gpr(vcpu, 7); 289 + long rc; 290 + 291 + rc = kvmppc_h_put_tce_indirect(vcpu, liobn, ioba, 292 + tce, npages); 293 + if (rc == H_TOO_HARD) 294 + return EMULATE_FAIL; 295 + kvmppc_set_gpr(vcpu, 3, rc); 296 + return EMULATE_DONE; 297 + } 298 + 299 + static int kvmppc_h_pr_stuff_tce(struct kvm_vcpu *vcpu) 300 + { 301 + unsigned long liobn = kvmppc_get_gpr(vcpu, 4); 302 + unsigned long ioba = kvmppc_get_gpr(vcpu, 5); 303 + unsigned long tce_value = kvmppc_get_gpr(vcpu, 6); 304 + unsigned long npages = kvmppc_get_gpr(vcpu, 7); 305 + long rc; 306 + 307 + rc = kvmppc_h_stuff_tce(vcpu, liobn, ioba, tce_value, npages); 308 + if (rc == H_TOO_HARD) 309 + return EMULATE_FAIL; 310 + kvmppc_set_gpr(vcpu, 3, rc); 311 + return EMULATE_DONE; 312 + } 313 + 283 314 static int kvmppc_h_pr_xics_hcall(struct kvm_vcpu *vcpu, u32 cmd) 284 315 { 285 316 long rc = kvmppc_xics_hcall(vcpu, cmd); ··· 337 306 return kvmppc_h_pr_bulk_remove(vcpu); 338 307 case H_PUT_TCE: 339 308 return kvmppc_h_pr_put_tce(vcpu); 309 + case H_PUT_TCE_INDIRECT: 310 + return kvmppc_h_pr_put_tce_indirect(vcpu); 311 + case H_STUFF_TCE: 312 + return kvmppc_h_pr_stuff_tce(vcpu); 340 313 case H_CEDE: 341 314 kvmppc_set_msr_fast(vcpu, kvmppc_get_msr(vcpu) | MSR_EE); 342 315 kvm_vcpu_block(vcpu);
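In PR mode the hypercalls arrive via emulation, so each wrapper above unpacks its arguments from GPR4-GPR7 with `kvmppc_get_gpr()` and writes the return code into GPR3 before resuming the guest. That marshalling convention can be stated as a tiny sketch (the register file and handler here are stand-ins, not KVM's types):

```c
#include <stdint.h>

struct vcpu { uint64_t gpr[8]; };  /* stand-in: only r0..r7 modeled */

typedef long (*hcall_fn)(uint64_t, uint64_t, uint64_t, uint64_t);

/* PAPR hcall ABI as the wrappers above use it:
 * arguments in r4..r7, return code written back into r3. */
static void dispatch_tce_hcall(struct vcpu *v, hcall_fn fn)
{
    long rc = fn(v->gpr[4], v->gpr[5], v->gpr[6], v->gpr[7]);
    v->gpr[3] = (uint64_t)rc;
}

/* Toy handler standing in for e.g. kvmppc_h_stuff_tce(). */
static long sum_args(uint64_t a, uint64_t b, uint64_t c, uint64_t d)
{
    return (long)(a + b + c + d);
}
```

The real wrappers additionally map `H_TOO_HARD` to `EMULATE_FAIL` so that the hypercall is retried in, or punted to, the host.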
+37 -1
arch/powerpc/kvm/powerpc.c
··· 33 33 #include <asm/tlbflush.h> 34 34 #include <asm/cputhreads.h> 35 35 #include <asm/irqflags.h> 36 + #include <asm/iommu.h> 36 37 #include "timing.h" 37 38 #include "irq.h" 38 39 #include "../mm/mmu_decl.h" ··· 438 437 unsigned int i; 439 438 struct kvm_vcpu *vcpu; 440 439 440 + #ifdef CONFIG_KVM_XICS 441 + /* 442 + * We call kick_all_cpus_sync() to ensure that all 443 + * CPUs have executed any pending IPIs before we 444 + * continue and free VCPUs structures below. 445 + */ 446 + if (is_kvmppc_hv_enabled(kvm)) 447 + kick_all_cpus_sync(); 448 + #endif 449 + 441 450 kvm_for_each_vcpu(i, vcpu, kvm) 442 451 kvm_arch_vcpu_free(vcpu); 443 452 ··· 520 509 521 510 #ifdef CONFIG_PPC_BOOK3S_64 522 511 case KVM_CAP_SPAPR_TCE: 512 + case KVM_CAP_SPAPR_TCE_64: 523 513 case KVM_CAP_PPC_ALLOC_HTAB: 524 514 case KVM_CAP_PPC_RTAS: 525 515 case KVM_CAP_PPC_FIXUP_HCALL: ··· 579 567 break; 580 568 #ifdef CONFIG_PPC_BOOK3S_64 581 569 case KVM_CAP_PPC_GET_SMMU_INFO: 570 + r = 1; 571 + break; 572 + case KVM_CAP_SPAPR_MULTITCE: 582 573 r = 1; 583 574 break; 584 575 #endif ··· 1346 1331 break; 1347 1332 } 1348 1333 #ifdef CONFIG_PPC_BOOK3S_64 1334 + case KVM_CREATE_SPAPR_TCE_64: { 1335 + struct kvm_create_spapr_tce_64 create_tce_64; 1336 + 1337 + r = -EFAULT; 1338 + if (copy_from_user(&create_tce_64, argp, sizeof(create_tce_64))) 1339 + goto out; 1340 + if (create_tce_64.flags) { 1341 + r = -EINVAL; 1342 + goto out; 1343 + } 1344 + r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce_64); 1345 + goto out; 1346 + } 1349 1347 case KVM_CREATE_SPAPR_TCE: { 1350 1348 struct kvm_create_spapr_tce create_tce; 1349 + struct kvm_create_spapr_tce_64 create_tce_64; 1351 1350 1352 1351 r = -EFAULT; 1353 1352 if (copy_from_user(&create_tce, argp, sizeof(create_tce))) 1354 1353 goto out; 1355 - r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce); 1354 + 1355 + create_tce_64.liobn = create_tce.liobn; 1356 + create_tce_64.page_shift = IOMMU_PAGE_SHIFT_4K; 1357 + create_tce_64.offset = 0; 1358 + 
create_tce_64.size = create_tce.window_size >> 1359 + IOMMU_PAGE_SHIFT_4K; 1360 + create_tce_64.flags = 0; 1361 + r = kvm_vm_ioctl_create_spapr_tce(kvm, &create_tce_64); 1356 1362 goto out; 1357 1363 } 1358 1364 case KVM_PPC_GET_SMMU_INFO: {
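The legacy `KVM_CREATE_SPAPR_TCE` ioctl is now routed through the 64-bit path by converting `window_size` (bytes) into a page count at the fixed 4K IOMMU page size, with a zero offset and no flags. The conversion arithmetic, restated as a sketch with a local copy of the 64-bit layout:

```c
#include <stdint.h>

#define IOMMU_PAGE_SHIFT_4K 12

struct create_tce_64 {        /* the fields the compat path fills in */
    uint64_t liobn, offset, size;
    uint32_t page_shift, flags;
};

/* Map the old (liobn, window_size) request onto the 64-bit layout,
 * the way the KVM_CREATE_SPAPR_TCE case in the diff does. */
static struct create_tce_64 convert_legacy(uint64_t liobn,
                                           uint32_t window_size)
{
    struct create_tce_64 c = {
        .liobn      = liobn,
        .page_shift = IOMMU_PAGE_SHIFT_4K,
        .offset     = 0,
        .size       = window_size >> IOMMU_PAGE_SHIFT_4K,
        .flags      = 0,
    };
    return c;
}
```

A 1GB legacy window therefore becomes 262144 IOMMU pages of 4K each, starting at offset 0.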
+8
arch/powerpc/mm/pgtable.c
··· 243 243 } 244 244 #endif /* CONFIG_DEBUG_VM */ 245 245 246 + unsigned long vmalloc_to_phys(void *va) 247 + { 248 + unsigned long pfn = vmalloc_to_pfn(va); 249 + 250 + BUG_ON(!pfn); 251 + return __pa(pfn_to_kaddr(pfn)) + offset_in_page(va); 252 + } 253 + EXPORT_SYMBOL_GPL(vmalloc_to_phys);
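The promoted `vmalloc_to_phys()` translates a vmalloc address in two steps: resolve the backing page's PFN, then add back the sub-page offset. Only the offset arithmetic is portable enough to demonstrate outside the kernel (the page size and helper names below are assumptions of this sketch):

```c
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_SIZE  (1UL << PAGE_SHIFT)

/* offset_in_page(): the low bits lost in the va -> pfn round trip. */
static uintptr_t offset_in_page(uintptr_t va) { return va & (PAGE_SIZE - 1); }

/* Given the physical base of the page backing va, rebuild the full
 * physical address the way vmalloc_to_phys() does. */
static uintptr_t phys_of(uintptr_t page_phys, uintptr_t va)
{
    return page_phys + offset_in_page(va);
}
```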
-8
arch/powerpc/perf/hv-24x7.c
··· 493 493 } 494 494 } 495 495 496 - static unsigned long vmalloc_to_phys(void *v) 497 - { 498 - struct page *p = vmalloc_to_page(v); 499 - 500 - BUG_ON(!p); 501 - return page_to_phys(p) + offset_in_page(v); 502 - } 503 - 504 496 /* */ 505 497 struct event_uniq { 506 498 struct rb_node node;
+21
arch/powerpc/sysdev/xics/icp-native.c
··· 159 159 icp_native_set_qirr(cpu, IPI_PRIORITY); 160 160 } 161 161 162 + #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE 163 + void icp_native_cause_ipi_rm(int cpu) 164 + { 165 + /* 166 + * Currently not used to send IPIs to another CPU 167 + * on the same core. Only caller is KVM real mode. 168 + * Need the physical address of the XICS to be 169 + * previously saved in kvm_hstate in the paca. 170 + */ 171 + unsigned long xics_phys; 172 + 173 + /* 174 + * Just like the cause_ipi functions, it is required to 175 + * include a full barrier (out8 includes a sync) before 176 + * causing the IPI. 177 + */ 178 + xics_phys = paca[cpu].kvm_hstate.xics_phys; 179 + out_rm8((u8 *)(xics_phys + XICS_MFRR), IPI_PRIORITY); 180 + } 181 + #endif 182 + 162 183 /* 163 184 * Called when an interrupt is received on an off-line CPU to 164 185 * clear the interrupt, so that the CPU can go back to nap mode.
arch/s390/include/asm/kvm_host.h (+26 -15)
··· 20 20 #include <linux/kvm_types.h> 21 21 #include <linux/kvm_host.h> 22 22 #include <linux/kvm.h> 23 + #include <linux/seqlock.h> 23 24 #include <asm/debug.h> 24 25 #include <asm/cpu.h> 25 26 #include <asm/fpu/api.h> ··· 230 229 __u8 data[256]; 231 230 } __packed; 232 231 233 - struct kvm_s390_vregs { 234 - __vector128 vrs[32]; 235 - __u8 reserved200[512]; /* for future vector expansion */ 236 - } __packed; 237 - 238 232 struct sie_page { 239 233 struct kvm_s390_sie_block sie_block; 240 234 __u8 reserved200[1024]; /* 0x0200 */ 241 235 struct kvm_s390_itdb itdb; /* 0x0600 */ 242 - __u8 reserved700[1280]; /* 0x0700 */ 243 - struct kvm_s390_vregs vregs; /* 0x0c00 */ 236 + __u8 reserved700[2304]; /* 0x0700 */ 244 237 } __packed; 245 238 246 239 struct kvm_vcpu_stat { ··· 553 558 unsigned long pfault_token; 554 559 unsigned long pfault_select; 555 560 unsigned long pfault_compare; 561 + bool cputm_enabled; 562 + /* 563 + * The seqcount protects updates to cputm_start and sie_block.cputm, 564 + * this way we can have non-blocking reads with consistent values. 565 + * Only the owning VCPU thread (vcpu->cpu) is allowed to change these 566 + * values and to start/stop/enable/disable cpu timer accounting. 
567 + */ 568 + seqcount_t cputm_seqcount; 569 + __u64 cputm_start; 556 570 }; 557 571 558 572 struct kvm_vm_stat { ··· 600 596 #define S390_ARCH_FAC_MASK_SIZE_U64 \ 601 597 (S390_ARCH_FAC_MASK_SIZE_BYTE / sizeof(u64)) 602 598 603 - struct kvm_s390_fac { 604 - /* facility list requested by guest */ 605 - __u64 list[S390_ARCH_FAC_LIST_SIZE_U64]; 606 - /* facility mask supported by kvm & hosting machine */ 607 - __u64 mask[S390_ARCH_FAC_LIST_SIZE_U64]; 608 - }; 609 - 610 599 struct kvm_s390_cpu_model { 611 - struct kvm_s390_fac *fac; 600 + /* facility mask supported by kvm & hosting machine */ 601 + __u64 fac_mask[S390_ARCH_FAC_LIST_SIZE_U64]; 602 + /* facility list requested by guest (in dma page) */ 603 + __u64 *fac_list; 612 604 struct cpuid cpu_id; 613 605 unsigned short ibc; 614 606 }; ··· 622 622 __u8 aes_wrapping_key_mask[32]; /* 0x0060 */ 623 623 __u8 reserved80[128]; /* 0x0080 */ 624 624 }; 625 + 626 + /* 627 + * sie_page2 has to be allocated as DMA because fac_list and crycb need 628 + * 31bit addresses in the sie control block. 629 + */ 630 + struct sie_page2 { 631 + __u64 fac_list[S390_ARCH_FAC_LIST_SIZE_U64]; /* 0x0000 */ 632 + struct kvm_s390_crypto_cb crycb; /* 0x0800 */ 633 + u8 reserved900[0x1000 - 0x900]; /* 0x0900 */ 634 + } __packed; 625 635 626 636 struct kvm_arch{ 627 637 void *sca; ··· 653 643 int ipte_lock_count; 654 644 struct mutex ipte_mutex; 655 645 spinlock_t start_stop_lock; 646 + struct sie_page2 *sie_page2; 656 647 struct kvm_s390_cpu_model model; 657 648 struct kvm_s390_crypto crypto; 658 649 u64 epoch;
arch/s390/include/uapi/asm/kvm.h (+6 -2)
··· 154 154 #define KVM_SYNC_PFAULT (1UL << 5) 155 155 #define KVM_SYNC_VRS (1UL << 6) 156 156 #define KVM_SYNC_RICCB (1UL << 7) 157 + #define KVM_SYNC_FPRS (1UL << 8) 157 158 /* definition of registers in kvm_run */ 158 159 struct kvm_sync_regs { 159 160 __u64 prefix; /* prefix register */ ··· 169 168 __u64 pft; /* pfault token [PFAULT] */ 170 169 __u64 pfs; /* pfault select [PFAULT] */ 171 170 __u64 pfc; /* pfault compare [PFAULT] */ 172 - __u64 vrs[32][2]; /* vector registers */ 171 + union { 172 + __u64 vrs[32][2]; /* vector registers (KVM_SYNC_VRS) */ 173 + __u64 fprs[16]; /* fp registers (KVM_SYNC_FPRS) */ 174 + }; 173 175 __u8 reserved[512]; /* for future vector expansion */ 174 - __u32 fpc; /* only valid with vector registers */ 176 + __u32 fpc; /* valid on KVM_SYNC_VRS or KVM_SYNC_FPRS */ 175 177 __u8 padding[52]; /* riccb needs to be 64byte aligned */ 176 178 __u8 riccb[64]; /* runtime instrumentation controls block */ 177 179 };
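The new union in `kvm_sync_regs` reuses the same storage for either the 32 vector registers (KVM_SYNC_VRS) or the 16 floating-point registers (KVM_SYNC_FPRS); only one member is valid, selected by the sync flag, since on s390 an fp register is just the leftmost 64 bits of the corresponding vector register. A sketch of the flag-gated access pattern (struct and accessor are simplified illustrations, not the real uapi layout):

```c
#include <stdint.h>

#define KVM_SYNC_VRS  (1UL << 6)
#define KVM_SYNC_FPRS (1UL << 8)

/* simplified stand-in for the fp part of struct kvm_sync_regs */
struct sync_fp {
    uint64_t valid_regs;          /* which union member is live */
    union {
        uint64_t vrs[32][2];      /* vector registers (KVM_SYNC_VRS) */
        uint64_t fprs[16];        /* fp registers (KVM_SYNC_FPRS) */
    };
};

/* hypothetical accessor: fpr i is the high half of vr i when the
 * vector registers were synced, else it lives in fprs[] directly */
static uint64_t get_fpr(const struct sync_fp *s, int i)
{
    if (s->valid_regs & KVM_SYNC_VRS)
        return s->vrs[i][0];
    return s->fprs[i];
}
```

Userspace therefore checks `kvm_valid_regs` and reads exactly one of the two views, never both.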
arch/s390/include/uapi/asm/sie.h (+1)
··· 7 7 { 0x9c, "DIAG (0x9c) time slice end directed" }, \ 8 8 { 0x204, "DIAG (0x204) logical-cpu utilization" }, \ 9 9 { 0x258, "DIAG (0x258) page-reference services" }, \ 10 + { 0x288, "DIAG (0x288) watchdog functions" }, \ 10 11 { 0x308, "DIAG (0x308) ipl functions" }, \ 11 12 { 0x500, "DIAG (0x500) KVM virtio functions" }, \ 12 13 { 0x501, "DIAG (0x501) KVM breakpoint" }
arch/s390/kvm/gaccess.c (+30 -27)
··· 373 373 } 374 374 375 375 static int ar_translation(struct kvm_vcpu *vcpu, union asce *asce, ar_t ar, 376 - int write) 376 + enum gacc_mode mode) 377 377 { 378 378 union alet alet; 379 379 struct ale ale; ··· 454 454 } 455 455 } 456 456 457 - if (ale.fo == 1 && write) 457 + if (ale.fo == 1 && mode == GACC_STORE) 458 458 return PGM_PROTECTION; 459 459 460 460 asce->val = aste.asce; ··· 477 477 }; 478 478 479 479 static int get_vcpu_asce(struct kvm_vcpu *vcpu, union asce *asce, 480 - ar_t ar, int write) 480 + ar_t ar, enum gacc_mode mode) 481 481 { 482 482 int rc; 483 - psw_t *psw = &vcpu->arch.sie_block->gpsw; 483 + struct psw_bits psw = psw_bits(vcpu->arch.sie_block->gpsw); 484 484 struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm; 485 485 struct trans_exc_code_bits *tec_bits; 486 486 487 487 memset(pgm, 0, sizeof(*pgm)); 488 488 tec_bits = (struct trans_exc_code_bits *)&pgm->trans_exc_code; 489 - tec_bits->fsi = write ? FSI_STORE : FSI_FETCH; 490 - tec_bits->as = psw_bits(*psw).as; 489 + tec_bits->fsi = mode == GACC_STORE ? FSI_STORE : FSI_FETCH; 490 + tec_bits->as = psw.as; 491 491 492 - if (!psw_bits(*psw).t) { 492 + if (!psw.t) { 493 493 asce->val = 0; 494 494 asce->r = 1; 495 495 return 0; 496 496 } 497 497 498 - switch (psw_bits(vcpu->arch.sie_block->gpsw).as) { 498 + if (mode == GACC_IFETCH) 499 + psw.as = psw.as == PSW_AS_HOME ? 
PSW_AS_HOME : PSW_AS_PRIMARY; 500 + 501 + switch (psw.as) { 499 502 case PSW_AS_PRIMARY: 500 503 asce->val = vcpu->arch.sie_block->gcr[1]; 501 504 return 0; ··· 509 506 asce->val = vcpu->arch.sie_block->gcr[13]; 510 507 return 0; 511 508 case PSW_AS_ACCREG: 512 - rc = ar_translation(vcpu, asce, ar, write); 509 + rc = ar_translation(vcpu, asce, ar, mode); 513 510 switch (rc) { 514 511 case PGM_ALEN_TRANSLATION: 515 512 case PGM_ALE_SEQUENCE: ··· 541 538 * @gva: guest virtual address 542 539 * @gpa: points to where guest physical (absolute) address should be stored 543 540 * @asce: effective asce 544 - * @write: indicates if access is a write access 541 + * @mode: indicates the access mode to be used 545 542 * 546 543 * Translate a guest virtual address into a guest absolute address by means 547 544 * of dynamic address translation as specified by the architecture. ··· 557 554 */ 558 555 static unsigned long guest_translate(struct kvm_vcpu *vcpu, unsigned long gva, 559 556 unsigned long *gpa, const union asce asce, 560 - int write) 557 + enum gacc_mode mode) 561 558 { 562 559 union vaddress vaddr = {.addr = gva}; 563 560 union raddress raddr = {.addr = gva}; ··· 702 699 real_address: 703 700 raddr.addr = kvm_s390_real_to_abs(vcpu, raddr.addr); 704 701 absolute_address: 705 - if (write && dat_protection) 702 + if (mode == GACC_STORE && dat_protection) 706 703 return PGM_PROTECTION; 707 704 if (kvm_is_error_gpa(vcpu->kvm, raddr.addr)) 708 705 return PGM_ADDRESSING; ··· 731 728 732 729 static int guest_page_range(struct kvm_vcpu *vcpu, unsigned long ga, 733 730 unsigned long *pages, unsigned long nr_pages, 734 - const union asce asce, int write) 731 + const union asce asce, enum gacc_mode mode) 735 732 { 736 733 struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm; 737 734 psw_t *psw = &vcpu->arch.sie_block->gpsw; ··· 743 740 while (nr_pages) { 744 741 ga = kvm_s390_logical_to_effective(vcpu, ga); 745 742 tec_bits->addr = ga >> PAGE_SHIFT; 746 - if (write && lap_enabled && 
is_low_address(ga)) { 743 + if (mode == GACC_STORE && lap_enabled && is_low_address(ga)) { 747 744 pgm->code = PGM_PROTECTION; 748 745 return pgm->code; 749 746 } 750 747 ga &= PAGE_MASK; 751 748 if (psw_bits(*psw).t) { 752 - rc = guest_translate(vcpu, ga, pages, asce, write); 749 + rc = guest_translate(vcpu, ga, pages, asce, mode); 753 750 if (rc < 0) 754 751 return rc; 755 752 if (rc == PGM_PROTECTION) ··· 771 768 } 772 769 773 770 int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data, 774 - unsigned long len, int write) 771 + unsigned long len, enum gacc_mode mode) 775 772 { 776 773 psw_t *psw = &vcpu->arch.sie_block->gpsw; 777 774 unsigned long _len, nr_pages, gpa, idx; ··· 783 780 784 781 if (!len) 785 782 return 0; 786 - rc = get_vcpu_asce(vcpu, &asce, ar, write); 783 + rc = get_vcpu_asce(vcpu, &asce, ar, mode); 787 784 if (rc) 788 785 return rc; 789 786 nr_pages = (((ga & ~PAGE_MASK) + len - 1) >> PAGE_SHIFT) + 1; ··· 795 792 need_ipte_lock = psw_bits(*psw).t && !asce.r; 796 793 if (need_ipte_lock) 797 794 ipte_lock(vcpu); 798 - rc = guest_page_range(vcpu, ga, pages, nr_pages, asce, write); 795 + rc = guest_page_range(vcpu, ga, pages, nr_pages, asce, mode); 799 796 for (idx = 0; idx < nr_pages && !rc; idx++) { 800 797 gpa = *(pages + idx) + (ga & ~PAGE_MASK); 801 798 _len = min(PAGE_SIZE - (gpa & ~PAGE_MASK), len); 802 - if (write) 799 + if (mode == GACC_STORE) 803 800 rc = kvm_write_guest(vcpu->kvm, gpa, data, _len); 804 801 else 805 802 rc = kvm_read_guest(vcpu->kvm, gpa, data, _len); ··· 815 812 } 816 813 817 814 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, 818 - void *data, unsigned long len, int write) 815 + void *data, unsigned long len, enum gacc_mode mode) 819 816 { 820 817 unsigned long _len, gpa; 821 818 int rc = 0; ··· 823 820 while (len && !rc) { 824 821 gpa = kvm_s390_real_to_abs(vcpu, gra); 825 822 _len = min(PAGE_SIZE - (gpa & ~PAGE_MASK), len); 826 - if (write) 823 + if (mode) 827 824 rc = 
write_guest_abs(vcpu, gpa, data, _len); 828 825 else 829 826 rc = read_guest_abs(vcpu, gpa, data, _len); ··· 844 841 * has to take care of this. 845 842 */ 846 843 int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar, 847 - unsigned long *gpa, int write) 844 + unsigned long *gpa, enum gacc_mode mode) 848 845 { 849 846 struct kvm_s390_pgm_info *pgm = &vcpu->arch.pgm; 850 847 psw_t *psw = &vcpu->arch.sie_block->gpsw; ··· 854 851 855 852 gva = kvm_s390_logical_to_effective(vcpu, gva); 856 853 tec = (struct trans_exc_code_bits *)&pgm->trans_exc_code; 857 - rc = get_vcpu_asce(vcpu, &asce, ar, write); 854 + rc = get_vcpu_asce(vcpu, &asce, ar, mode); 858 855 tec->addr = gva >> PAGE_SHIFT; 859 856 if (rc) 860 857 return rc; 861 858 if (is_low_address(gva) && low_address_protection_enabled(vcpu, asce)) { 862 - if (write) { 859 + if (mode == GACC_STORE) { 863 860 rc = pgm->code = PGM_PROTECTION; 864 861 return rc; 865 862 } 866 863 } 867 864 868 865 if (psw_bits(*psw).t && !asce.r) { /* Use DAT? */ 869 - rc = guest_translate(vcpu, gva, gpa, asce, write); 866 + rc = guest_translate(vcpu, gva, gpa, asce, mode); 870 867 if (rc > 0) { 871 868 if (rc == PGM_PROTECTION) 872 869 tec->b61 = 1; ··· 886 883 * check_gva_range - test a range of guest virtual addresses for accessibility 887 884 */ 888 885 int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar, 889 - unsigned long length, int is_write) 886 + unsigned long length, enum gacc_mode mode) 890 887 { 891 888 unsigned long gpa; 892 889 unsigned long currlen; ··· 895 892 ipte_lock(vcpu); 896 893 while (length > 0 && !rc) { 897 894 currlen = min(length, PAGE_SIZE - (gva % PAGE_SIZE)); 898 - rc = guest_translate_address(vcpu, gva, ar, &gpa, is_write); 895 + rc = guest_translate_address(vcpu, gva, ar, &gpa, mode); 899 896 gva += currlen; 900 897 length -= currlen; 901 898 }
arch/s390/kvm/gaccess.h (+32 -6)
··· 155 155 return kvm_read_guest(vcpu->kvm, gpa, data, len); 156 156 } 157 157 158 + enum gacc_mode { 159 + GACC_FETCH, 160 + GACC_STORE, 161 + GACC_IFETCH, 162 + }; 163 + 158 164 int guest_translate_address(struct kvm_vcpu *vcpu, unsigned long gva, 159 - ar_t ar, unsigned long *gpa, int write); 165 + ar_t ar, unsigned long *gpa, enum gacc_mode mode); 160 166 int check_gva_range(struct kvm_vcpu *vcpu, unsigned long gva, ar_t ar, 161 - unsigned long length, int is_write); 167 + unsigned long length, enum gacc_mode mode); 162 168 163 169 int access_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data, 164 - unsigned long len, int write); 170 + unsigned long len, enum gacc_mode mode); 165 171 166 172 int access_guest_real(struct kvm_vcpu *vcpu, unsigned long gra, 167 - void *data, unsigned long len, int write); 173 + void *data, unsigned long len, enum gacc_mode mode); 168 174 169 175 /** 170 176 * write_guest - copy data from kernel space to guest space ··· 221 215 int write_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data, 222 216 unsigned long len) 223 217 { 224 - return access_guest(vcpu, ga, ar, data, len, 1); 218 + return access_guest(vcpu, ga, ar, data, len, GACC_STORE); 225 219 } 226 220 227 221 /** ··· 241 235 int read_guest(struct kvm_vcpu *vcpu, unsigned long ga, ar_t ar, void *data, 242 236 unsigned long len) 243 237 { 244 - return access_guest(vcpu, ga, ar, data, len, 0); 238 + return access_guest(vcpu, ga, ar, data, len, GACC_FETCH); 239 + } 240 + 241 + /** 242 + * read_guest_instr - copy instruction data from guest space to kernel space 243 + * @vcpu: virtual cpu 244 + * @data: destination address in kernel space 245 + * @len: number of bytes to copy 246 + * 247 + * Copy @len bytes from the current psw address (guest space) to @data (kernel 248 + * space). 
249 + * 250 + * The behaviour of read_guest_instr is identical to read_guest, except that 251 + * instruction data will be read from primary space when in home-space or 252 + * address-space mode. 253 + */ 254 + static inline __must_check 255 + int read_guest_instr(struct kvm_vcpu *vcpu, void *data, unsigned long len) 256 + { 257 + return access_guest(vcpu, vcpu->arch.sie_block->gpsw.addr, 0, data, len, 258 + GACC_IFETCH); 245 259 } 246 260 247 261 /**
arch/s390/kvm/intercept.c (+47 -31)
··· 38 38 [0xeb] = kvm_s390_handle_eb, 39 39 }; 40 40 41 - void kvm_s390_rewind_psw(struct kvm_vcpu *vcpu, int ilc) 41 + u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu) 42 42 { 43 43 struct kvm_s390_sie_block *sie_block = vcpu->arch.sie_block; 44 + u8 ilen = 0; 44 45 45 - /* Use the length of the EXECUTE instruction if necessary */ 46 - if (sie_block->icptstatus & 1) { 47 - ilc = (sie_block->icptstatus >> 4) & 0x6; 48 - if (!ilc) 49 - ilc = 4; 46 + switch (vcpu->arch.sie_block->icptcode) { 47 + case ICPT_INST: 48 + case ICPT_INSTPROGI: 49 + case ICPT_OPEREXC: 50 + case ICPT_PARTEXEC: 51 + case ICPT_IOINST: 52 + /* instruction only stored for these icptcodes */ 53 + ilen = insn_length(vcpu->arch.sie_block->ipa >> 8); 54 + /* Use the length of the EXECUTE instruction if necessary */ 55 + if (sie_block->icptstatus & 1) { 56 + ilen = (sie_block->icptstatus >> 4) & 0x6; 57 + if (!ilen) 58 + ilen = 4; 59 + } 60 + break; 61 + case ICPT_PROGI: 62 + /* bit 1+2 of pgmilc are the ilc, so we directly get ilen */ 63 + ilen = vcpu->arch.sie_block->pgmilc & 0x6; 64 + break; 50 65 } 51 - sie_block->gpsw.addr = __rewind_psw(sie_block->gpsw, ilc); 66 + return ilen; 52 67 } 53 68 54 69 static int handle_noop(struct kvm_vcpu *vcpu) ··· 136 121 return -EOPNOTSUPP; 137 122 } 138 123 139 - static void __extract_prog_irq(struct kvm_vcpu *vcpu, 140 - struct kvm_s390_pgm_info *pgm_info) 124 + static int inject_prog_on_prog_intercept(struct kvm_vcpu *vcpu) 141 125 { 142 - memset(pgm_info, 0, sizeof(struct kvm_s390_pgm_info)); 143 - pgm_info->code = vcpu->arch.sie_block->iprcc; 126 + struct kvm_s390_pgm_info pgm_info = { 127 + .code = vcpu->arch.sie_block->iprcc, 128 + /* the PSW has already been rewound */ 129 + .flags = KVM_S390_PGM_FLAGS_NO_REWIND, 130 + }; 144 131 145 132 switch (vcpu->arch.sie_block->iprcc & ~PGM_PER) { 146 133 case PGM_AFX_TRANSLATION: ··· 155 138 case PGM_PRIMARY_AUTHORITY: 156 139 case PGM_SECONDARY_AUTHORITY: 157 140 case PGM_SPACE_SWITCH: 158 - pgm_info->trans_exc_code 
= vcpu->arch.sie_block->tecmc; 141 + pgm_info.trans_exc_code = vcpu->arch.sie_block->tecmc; 159 142 break; 160 143 case PGM_ALEN_TRANSLATION: 161 144 case PGM_ALE_SEQUENCE: ··· 163 146 case PGM_ASTE_SEQUENCE: 164 147 case PGM_ASTE_VALIDITY: 165 148 case PGM_EXTENDED_AUTHORITY: 166 - pgm_info->exc_access_id = vcpu->arch.sie_block->eai; 149 + pgm_info.exc_access_id = vcpu->arch.sie_block->eai; 167 150 break; 168 151 case PGM_ASCE_TYPE: 169 152 case PGM_PAGE_TRANSLATION: ··· 171 154 case PGM_REGION_SECOND_TRANS: 172 155 case PGM_REGION_THIRD_TRANS: 173 156 case PGM_SEGMENT_TRANSLATION: 174 - pgm_info->trans_exc_code = vcpu->arch.sie_block->tecmc; 175 - pgm_info->exc_access_id = vcpu->arch.sie_block->eai; 176 - pgm_info->op_access_id = vcpu->arch.sie_block->oai; 157 + pgm_info.trans_exc_code = vcpu->arch.sie_block->tecmc; 158 + pgm_info.exc_access_id = vcpu->arch.sie_block->eai; 159 + pgm_info.op_access_id = vcpu->arch.sie_block->oai; 177 160 break; 178 161 case PGM_MONITOR: 179 - pgm_info->mon_class_nr = vcpu->arch.sie_block->mcn; 180 - pgm_info->mon_code = vcpu->arch.sie_block->tecmc; 162 + pgm_info.mon_class_nr = vcpu->arch.sie_block->mcn; 163 + pgm_info.mon_code = vcpu->arch.sie_block->tecmc; 181 164 break; 182 165 case PGM_VECTOR_PROCESSING: 183 166 case PGM_DATA: 184 - pgm_info->data_exc_code = vcpu->arch.sie_block->dxc; 167 + pgm_info.data_exc_code = vcpu->arch.sie_block->dxc; 185 168 break; 186 169 case PGM_PROTECTION: 187 - pgm_info->trans_exc_code = vcpu->arch.sie_block->tecmc; 188 - pgm_info->exc_access_id = vcpu->arch.sie_block->eai; 170 + pgm_info.trans_exc_code = vcpu->arch.sie_block->tecmc; 171 + pgm_info.exc_access_id = vcpu->arch.sie_block->eai; 189 172 break; 190 173 default: 191 174 break; 192 175 } 193 176 194 177 if (vcpu->arch.sie_block->iprcc & PGM_PER) { 195 - pgm_info->per_code = vcpu->arch.sie_block->perc; 196 - pgm_info->per_atmid = vcpu->arch.sie_block->peratmid; 197 - pgm_info->per_address = vcpu->arch.sie_block->peraddr; 198 - 
pgm_info->per_access_id = vcpu->arch.sie_block->peraid; 178 + pgm_info.per_code = vcpu->arch.sie_block->perc; 179 + pgm_info.per_atmid = vcpu->arch.sie_block->peratmid; 180 + pgm_info.per_address = vcpu->arch.sie_block->peraddr; 181 + pgm_info.per_access_id = vcpu->arch.sie_block->peraid; 199 182 } 183 + return kvm_s390_inject_prog_irq(vcpu, &pgm_info); 200 184 } 201 185 202 186 /* ··· 226 208 227 209 static int handle_prog(struct kvm_vcpu *vcpu) 228 210 { 229 - struct kvm_s390_pgm_info pgm_info; 230 211 psw_t psw; 231 212 int rc; 232 213 ··· 251 234 if (rc) 252 235 return rc; 253 236 254 - __extract_prog_irq(vcpu, &pgm_info); 255 - return kvm_s390_inject_prog_irq(vcpu, &pgm_info); 237 + return inject_prog_on_prog_intercept(vcpu); 256 238 } 257 239 258 240 /** ··· 318 302 319 303 /* Make sure that the source is paged-in */ 320 304 rc = guest_translate_address(vcpu, vcpu->run->s.regs.gprs[reg2], 321 - reg2, &srcaddr, 0); 305 + reg2, &srcaddr, GACC_FETCH); 322 306 if (rc) 323 307 return kvm_s390_inject_prog_cond(vcpu, rc); 324 308 rc = kvm_arch_fault_in_page(vcpu, srcaddr, 0); ··· 327 311 328 312 /* Make sure that the destination is paged-in */ 329 313 rc = guest_translate_address(vcpu, vcpu->run->s.regs.gprs[reg1], 330 - reg1, &dstaddr, 1); 314 + reg1, &dstaddr, GACC_STORE); 331 315 if (rc) 332 316 return kvm_s390_inject_prog_cond(vcpu, rc); 333 317 rc = kvm_arch_fault_in_page(vcpu, dstaddr, 1); 334 318 if (rc != 0) 335 319 return rc; 336 320 337 - kvm_s390_rewind_psw(vcpu, 4); 321 + kvm_s390_retry_instr(vcpu); 338 322 339 323 return 0; 340 324 }
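The new `kvm_s390_get_ilen` derives the instruction length from the first opcode byte: on s390 the two leftmost bits of the opcode encode the length (00 is 2 bytes, 01/10 are 4, 11 is 6). The kernel's `insn_length` helper computes this with a small arithmetic trick rather than a branch:

```c
/* s390 instruction length from the first opcode byte: the two
 * leftmost bits encode it (00 -> 2, 01/10 -> 4, 11 -> 6) */
static int insn_length(unsigned char code)
{
    return ((((int)code + 64) >> 7) + 1) << 1;
}
```

Adding 64 and shifting right by 7 maps the top two bits 00/01/10/11 to 0/1/1/2, which then scales to 2/4/4/6.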
arch/s390/kvm/interrupt.c (+55 -38)
··· 182 182 183 183 static int cpu_timer_irq_pending(struct kvm_vcpu *vcpu) 184 184 { 185 - return (vcpu->arch.sie_block->cputm >> 63) && 186 - cpu_timer_interrupts_enabled(vcpu); 185 + if (!cpu_timer_interrupts_enabled(vcpu)) 186 + return 0; 187 + return kvm_s390_get_cpu_timer(vcpu) >> 63; 187 188 } 188 189 189 190 static inline int is_ioirq(unsigned long irq_type) ··· 334 333 set_intercept_indicators_ext(vcpu); 335 334 set_intercept_indicators_mchk(vcpu); 336 335 set_intercept_indicators_stop(vcpu); 337 - } 338 - 339 - static u16 get_ilc(struct kvm_vcpu *vcpu) 340 - { 341 - switch (vcpu->arch.sie_block->icptcode) { 342 - case ICPT_INST: 343 - case ICPT_INSTPROGI: 344 - case ICPT_OPEREXC: 345 - case ICPT_PARTEXEC: 346 - case ICPT_IOINST: 347 - /* last instruction only stored for these icptcodes */ 348 - return insn_length(vcpu->arch.sie_block->ipa >> 8); 349 - case ICPT_PROGI: 350 - return vcpu->arch.sie_block->pgmilc; 351 - default: 352 - return 0; 353 - } 354 336 } 355 337 356 338 static int __must_check __deliver_cpu_timer(struct kvm_vcpu *vcpu) ··· 572 588 struct kvm_s390_local_interrupt *li = &vcpu->arch.local_int; 573 589 struct kvm_s390_pgm_info pgm_info; 574 590 int rc = 0, nullifying = false; 575 - u16 ilc = get_ilc(vcpu); 591 + u16 ilen; 576 592 577 593 spin_lock(&li->lock); 578 594 pgm_info = li->irq.pgm; ··· 580 596 memset(&li->irq.pgm, 0, sizeof(pgm_info)); 581 597 spin_unlock(&li->lock); 582 598 583 - VCPU_EVENT(vcpu, 3, "deliver: program irq code 0x%x, ilc:%d", 584 - pgm_info.code, ilc); 599 + ilen = pgm_info.flags & KVM_S390_PGM_FLAGS_ILC_MASK; 600 + VCPU_EVENT(vcpu, 3, "deliver: program irq code 0x%x, ilen:%d", 601 + pgm_info.code, ilen); 585 602 vcpu->stat.deliver_program_int++; 586 603 trace_kvm_s390_deliver_interrupt(vcpu->vcpu_id, KVM_S390_PROGRAM_INT, 587 604 pgm_info.code, 0); ··· 666 681 (u8 *) __LC_PER_ACCESS_ID); 667 682 } 668 683 669 - if (nullifying && vcpu->arch.sie_block->icptcode == ICPT_INST) 670 - kvm_s390_rewind_psw(vcpu, ilc); 
684 + if (nullifying && !(pgm_info.flags & KVM_S390_PGM_FLAGS_NO_REWIND)) 685 + kvm_s390_rewind_psw(vcpu, ilen); 671 686 672 - rc |= put_guest_lc(vcpu, ilc, (u16 *) __LC_PGM_ILC); 687 + /* bit 1+2 of the target are the ilc, so we can directly use ilen */ 688 + rc |= put_guest_lc(vcpu, ilen, (u16 *) __LC_PGM_ILC); 673 689 rc |= put_guest_lc(vcpu, vcpu->arch.sie_block->gbea, 674 690 (u64 *) __LC_LAST_BREAK); 675 691 rc |= put_guest_lc(vcpu, pgm_info.code, ··· 909 923 return ckc_irq_pending(vcpu) || cpu_timer_irq_pending(vcpu); 910 924 } 911 925 926 + static u64 __calculate_sltime(struct kvm_vcpu *vcpu) 927 + { 928 + u64 now, cputm, sltime = 0; 929 + 930 + if (ckc_interrupts_enabled(vcpu)) { 931 + now = kvm_s390_get_tod_clock_fast(vcpu->kvm); 932 + sltime = tod_to_ns(vcpu->arch.sie_block->ckc - now); 933 + /* already expired or overflow? */ 934 + if (!sltime || vcpu->arch.sie_block->ckc <= now) 935 + return 0; 936 + if (cpu_timer_interrupts_enabled(vcpu)) { 937 + cputm = kvm_s390_get_cpu_timer(vcpu); 938 + /* already expired? */ 939 + if (cputm >> 63) 940 + return 0; 941 + return min(sltime, tod_to_ns(cputm)); 942 + } 943 + } else if (cpu_timer_interrupts_enabled(vcpu)) { 944 + sltime = kvm_s390_get_cpu_timer(vcpu); 945 + /* already expired? 
*/ 946 + if (sltime >> 63) 947 + return 0; 948 + } 949 + return sltime; 950 + } 951 + 912 952 int kvm_s390_handle_wait(struct kvm_vcpu *vcpu) 913 953 { 914 - u64 now, sltime; 954 + u64 sltime; 915 955 916 956 vcpu->stat.exit_wait_state++; 917 957 ··· 950 938 return -EOPNOTSUPP; /* disabled wait */ 951 939 } 952 940 953 - if (!ckc_interrupts_enabled(vcpu)) { 941 + if (!ckc_interrupts_enabled(vcpu) && 942 + !cpu_timer_interrupts_enabled(vcpu)) { 954 943 VCPU_EVENT(vcpu, 3, "%s", "enabled wait w/o timer"); 955 944 __set_cpu_idle(vcpu); 956 945 goto no_timer; 957 946 } 958 947 959 - now = kvm_s390_get_tod_clock_fast(vcpu->kvm); 960 - sltime = tod_to_ns(vcpu->arch.sie_block->ckc - now); 961 - 962 - /* underflow */ 963 - if (vcpu->arch.sie_block->ckc < now) 948 + sltime = __calculate_sltime(vcpu); 949 + if (!sltime) 964 950 return 0; 965 951 966 952 __set_cpu_idle(vcpu); 967 953 hrtimer_start(&vcpu->arch.ckc_timer, ktime_set (0, sltime) , HRTIMER_MODE_REL); 968 - VCPU_EVENT(vcpu, 4, "enabled wait via clock comparator: %llu ns", sltime); 954 + VCPU_EVENT(vcpu, 4, "enabled wait: %llu ns", sltime); 969 955 no_timer: 970 956 srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); 971 957 kvm_vcpu_block(vcpu); ··· 990 980 enum hrtimer_restart kvm_s390_idle_wakeup(struct hrtimer *timer) 991 981 { 992 982 struct kvm_vcpu *vcpu; 993 - u64 now, sltime; 983 + u64 sltime; 994 984 995 985 vcpu = container_of(timer, struct kvm_vcpu, arch.ckc_timer); 996 - now = kvm_s390_get_tod_clock_fast(vcpu->kvm); 997 - sltime = tod_to_ns(vcpu->arch.sie_block->ckc - now); 986 + sltime = __calculate_sltime(vcpu); 998 987 999 988 /* 1000 989 * If the monotonic clock runs faster than the tod clock we might be 1001 990 * woken up too early and have to go back to sleep to avoid deadlocks. 
1002 991 */ 1003 - if (vcpu->arch.sie_block->ckc > now && 1004 - hrtimer_forward_now(timer, ns_to_ktime(sltime))) 992 + if (sltime && hrtimer_forward_now(timer, ns_to_ktime(sltime))) 1005 993 return HRTIMER_RESTART; 1006 994 kvm_s390_vcpu_wakeup(vcpu); 1007 995 return HRTIMER_NORESTART; ··· 1067 1059 trace_kvm_s390_inject_vcpu(vcpu->vcpu_id, KVM_S390_PROGRAM_INT, 1068 1060 irq->u.pgm.code, 0); 1069 1061 1062 + if (!(irq->u.pgm.flags & KVM_S390_PGM_FLAGS_ILC_VALID)) { 1063 + /* auto detection if no valid ILC was given */ 1064 + irq->u.pgm.flags &= ~KVM_S390_PGM_FLAGS_ILC_MASK; 1065 + irq->u.pgm.flags |= kvm_s390_get_ilen(vcpu); 1066 + irq->u.pgm.flags |= KVM_S390_PGM_FLAGS_ILC_VALID; 1067 + } 1068 + 1070 1069 if (irq->u.pgm.code == PGM_PER) { 1071 1070 li->irq.pgm.code |= PGM_PER; 1071 + li->irq.pgm.flags = irq->u.pgm.flags; 1072 1072 /* only modify PER related information */ 1073 1073 li->irq.pgm.per_address = irq->u.pgm.per_address; 1074 1074 li->irq.pgm.per_code = irq->u.pgm.per_code; ··· 1085 1069 } else if (!(irq->u.pgm.code & PGM_PER)) { 1086 1070 li->irq.pgm.code = (li->irq.pgm.code & PGM_PER) | 1087 1071 irq->u.pgm.code; 1072 + li->irq.pgm.flags = irq->u.pgm.flags; 1088 1073 /* only modify non-PER information */ 1089 1074 li->irq.pgm.trans_exc_code = irq->u.pgm.trans_exc_code; 1090 1075 li->irq.pgm.mon_code = irq->u.pgm.mon_code;
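`__calculate_sltime` above picks the nearest pending deadline among the two enabled guest timer sources, the clock comparator and the CPU timer, and returns 0 when either has already expired (the CPU timer counts down, so a set bit 63 means it went negative). A simplified model with the TOD-to-ns conversion omitted and hypothetical names, values already in ns:

```c
#include <stdint.h>

/* simplified model of __calculate_sltime(): shortest pending timeout
 * among the enabled timer sources; 0 means "already expired" */
static uint64_t calc_sltime(int ckc_on, int cputm_on,
                            uint64_t now, uint64_t ckc, uint64_t cputm)
{
    uint64_t sltime = 0;

    if (ckc_on) {
        if (ckc <= now)
            return 0;              /* clock comparator already hit */
        sltime = ckc - now;
        if (cputm_on) {
            if (cputm >> 63)       /* cpu timer negative: expired */
                return 0;
            return sltime < cputm ? sltime : cputm;
        }
    } else if (cputm_on) {
        sltime = cputm;
        if (sltime >> 63)
            return 0;
    }
    return sltime;
}
```

This is what lets `kvm_s390_handle_wait` arm a single hrtimer for whichever guest timer fires first, instead of handling only the clock comparator as before.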
arch/s390/kvm/kvm-s390.c (+172 -63)
··· 158 158 kvm->arch.epoch -= *delta; 159 159 kvm_for_each_vcpu(i, vcpu, kvm) { 160 160 vcpu->arch.sie_block->epoch -= *delta; 161 + if (vcpu->arch.cputm_enabled) 162 + vcpu->arch.cputm_start += *delta; 161 163 } 162 164 } 163 165 return NOTIFY_OK; ··· 276 274 unsigned long address; 277 275 struct gmap *gmap = kvm->arch.gmap; 278 276 279 - down_read(&gmap->mm->mmap_sem); 280 277 /* Loop over all guest pages */ 281 278 last_gfn = memslot->base_gfn + memslot->npages; 282 279 for (cur_gfn = memslot->base_gfn; cur_gfn <= last_gfn; cur_gfn++) { ··· 283 282 284 283 if (gmap_test_and_clear_dirty(address, gmap)) 285 284 mark_page_dirty(kvm, cur_gfn); 285 + if (fatal_signal_pending(current)) 286 + return; 287 + cond_resched(); 286 288 } 287 - up_read(&gmap->mm->mmap_sem); 288 289 } 289 290 290 291 /* Section: vm related */ ··· 355 352 if (atomic_read(&kvm->online_vcpus)) { 356 353 r = -EBUSY; 357 354 } else if (MACHINE_HAS_VX) { 358 - set_kvm_facility(kvm->arch.model.fac->mask, 129); 359 - set_kvm_facility(kvm->arch.model.fac->list, 129); 355 + set_kvm_facility(kvm->arch.model.fac_mask, 129); 356 + set_kvm_facility(kvm->arch.model.fac_list, 129); 360 357 r = 0; 361 358 } else 362 359 r = -EINVAL; ··· 370 367 if (atomic_read(&kvm->online_vcpus)) { 371 368 r = -EBUSY; 372 369 } else if (test_facility(64)) { 373 - set_kvm_facility(kvm->arch.model.fac->mask, 64); 374 - set_kvm_facility(kvm->arch.model.fac->list, 64); 370 + set_kvm_facility(kvm->arch.model.fac_mask, 64); 371 + set_kvm_facility(kvm->arch.model.fac_list, 64); 375 372 r = 0; 376 373 } 377 374 mutex_unlock(&kvm->lock); ··· 654 651 memcpy(&kvm->arch.model.cpu_id, &proc->cpuid, 655 652 sizeof(struct cpuid)); 656 653 kvm->arch.model.ibc = proc->ibc; 657 - memcpy(kvm->arch.model.fac->list, proc->fac_list, 654 + memcpy(kvm->arch.model.fac_list, proc->fac_list, 658 655 S390_ARCH_FAC_LIST_SIZE_BYTE); 659 656 } else 660 657 ret = -EFAULT; ··· 688 685 } 689 686 memcpy(&proc->cpuid, &kvm->arch.model.cpu_id, sizeof(struct 
cpuid)); 690 687 proc->ibc = kvm->arch.model.ibc; 691 - memcpy(&proc->fac_list, kvm->arch.model.fac->list, S390_ARCH_FAC_LIST_SIZE_BYTE); 688 + memcpy(&proc->fac_list, kvm->arch.model.fac_list, 689 + S390_ARCH_FAC_LIST_SIZE_BYTE); 692 690 if (copy_to_user((void __user *)attr->addr, proc, sizeof(*proc))) 693 691 ret = -EFAULT; 694 692 kfree(proc); ··· 709 705 } 710 706 get_cpu_id((struct cpuid *) &mach->cpuid); 711 707 mach->ibc = sclp.ibc; 712 - memcpy(&mach->fac_mask, kvm->arch.model.fac->mask, 708 + memcpy(&mach->fac_mask, kvm->arch.model.fac_mask, 713 709 S390_ARCH_FAC_LIST_SIZE_BYTE); 714 710 memcpy((unsigned long *)&mach->fac_list, S390_lowcore.stfle_fac_list, 715 711 S390_ARCH_FAC_LIST_SIZE_BYTE); ··· 1086 1082 cpu_id->version = 0xff; 1087 1083 } 1088 1084 1089 - static int kvm_s390_crypto_init(struct kvm *kvm) 1085 + static void kvm_s390_crypto_init(struct kvm *kvm) 1090 1086 { 1091 1087 if (!test_kvm_facility(kvm, 76)) 1092 - return 0; 1088 + return; 1093 1089 1094 - kvm->arch.crypto.crycb = kzalloc(sizeof(*kvm->arch.crypto.crycb), 1095 - GFP_KERNEL | GFP_DMA); 1096 - if (!kvm->arch.crypto.crycb) 1097 - return -ENOMEM; 1098 - 1090 + kvm->arch.crypto.crycb = &kvm->arch.sie_page2->crycb; 1099 1091 kvm_s390_set_crycb_format(kvm); 1100 1092 1101 1093 /* Enable AES/DEA protected key functions by default */ ··· 1101 1101 sizeof(kvm->arch.crypto.crycb->aes_wrapping_key_mask)); 1102 1102 get_random_bytes(kvm->arch.crypto.crycb->dea_wrapping_key_mask, 1103 1103 sizeof(kvm->arch.crypto.crycb->dea_wrapping_key_mask)); 1104 - 1105 - return 0; 1106 1104 } 1107 1105 1108 1106 static void sca_dispose(struct kvm *kvm) ··· 1154 1156 if (!kvm->arch.dbf) 1155 1157 goto out_err; 1156 1158 1157 - /* 1158 - * The architectural maximum amount of facilities is 16 kbit. To store 1159 - * this amount, 2 kbyte of memory is required. Thus we need a full 1160 - * page to hold the guest facility list (arch.model.fac->list) and the 1161 - * facility mask (arch.model.fac->mask). 
-	 * Its address size has to be
-	 * 31 bits and word aligned.
-	 */
-	kvm->arch.model.fac =
-		(struct kvm_s390_fac *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
-	if (!kvm->arch.model.fac)
+	kvm->arch.sie_page2 =
+		(struct sie_page2 *) get_zeroed_page(GFP_KERNEL | GFP_DMA);
+	if (!kvm->arch.sie_page2)
 		goto out_err;
 
 	/* Populate the facility mask initially. */
-	memcpy(kvm->arch.model.fac->mask, S390_lowcore.stfle_fac_list,
+	memcpy(kvm->arch.model.fac_mask, S390_lowcore.stfle_fac_list,
 	       S390_ARCH_FAC_LIST_SIZE_BYTE);
 	for (i = 0; i < S390_ARCH_FAC_LIST_SIZE_U64; i++) {
 		if (i < kvm_s390_fac_list_mask_size())
-			kvm->arch.model.fac->mask[i] &= kvm_s390_fac_list_mask[i];
+			kvm->arch.model.fac_mask[i] &= kvm_s390_fac_list_mask[i];
 		else
-			kvm->arch.model.fac->mask[i] = 0UL;
+			kvm->arch.model.fac_mask[i] = 0UL;
 	}
 
 	/* Populate the facility list initially. */
-	memcpy(kvm->arch.model.fac->list, kvm->arch.model.fac->mask,
+	kvm->arch.model.fac_list = kvm->arch.sie_page2->fac_list;
+	memcpy(kvm->arch.model.fac_list, kvm->arch.model.fac_mask,
 	       S390_ARCH_FAC_LIST_SIZE_BYTE);
 
 	kvm_s390_get_cpu_id(&kvm->arch.model.cpu_id);
 	kvm->arch.model.ibc = sclp.ibc & 0x0fff;
 
-	if (kvm_s390_crypto_init(kvm) < 0)
-		goto out_err;
+	kvm_s390_crypto_init(kvm);
 
 	spin_lock_init(&kvm->arch.float_int.lock);
 	for (i = 0; i < FIRQ_LIST_COUNT; i++)
···
 	return 0;
 out_err:
-	kfree(kvm->arch.crypto.crycb);
-	free_page((unsigned long)kvm->arch.model.fac);
+	free_page((unsigned long)kvm->arch.sie_page2);
 	debug_unregister(kvm->arch.dbf);
 	sca_dispose(kvm);
 	KVM_EVENT(3, "creation of vm failed: %d", rc);
···
 void kvm_arch_destroy_vm(struct kvm *kvm)
 {
 	kvm_free_vcpus(kvm);
-	free_page((unsigned long)kvm->arch.model.fac);
 	sca_dispose(kvm);
 	debug_unregister(kvm->arch.dbf);
-	kfree(kvm->arch.crypto.crycb);
+	free_page((unsigned long)kvm->arch.sie_page2);
 	if (!kvm_is_ucontrol(kvm))
 		gmap_free(kvm->arch.gmap);
 	kvm_s390_destroy_adapters(kvm);
···
 				    KVM_SYNC_PFAULT;
 	if (test_kvm_facility(vcpu->kvm, 64))
 		vcpu->run->kvm_valid_regs |= KVM_SYNC_RICCB;
-	if (test_kvm_facility(vcpu->kvm, 129))
+	/* fprs can be synchronized via vrs, even if the guest has no vx. With
+	 * MACHINE_HAS_VX, (load|store)_fpu_regs() will work with vrs format.
+	 */
+	if (MACHINE_HAS_VX)
 		vcpu->run->kvm_valid_regs |= KVM_SYNC_VRS;
+	else
+		vcpu->run->kvm_valid_regs |= KVM_SYNC_FPRS;
 
 	if (kvm_is_ucontrol(vcpu->kvm))
 		return __kvm_ucontrol_vcpu_init(vcpu);
 
 	return 0;
+}
+
+/* needs disabled preemption to protect from TOD sync and vcpu_load/put */
+static void __start_cpu_timer_accounting(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(vcpu->arch.cputm_start != 0);
+	raw_write_seqcount_begin(&vcpu->arch.cputm_seqcount);
+	vcpu->arch.cputm_start = get_tod_clock_fast();
+	raw_write_seqcount_end(&vcpu->arch.cputm_seqcount);
+}
+
+/* needs disabled preemption to protect from TOD sync and vcpu_load/put */
+static void __stop_cpu_timer_accounting(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(vcpu->arch.cputm_start == 0);
+	raw_write_seqcount_begin(&vcpu->arch.cputm_seqcount);
+	vcpu->arch.sie_block->cputm -= get_tod_clock_fast() - vcpu->arch.cputm_start;
+	vcpu->arch.cputm_start = 0;
+	raw_write_seqcount_end(&vcpu->arch.cputm_seqcount);
+}
+
+/* needs disabled preemption to protect from TOD sync and vcpu_load/put */
+static void __enable_cpu_timer_accounting(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(vcpu->arch.cputm_enabled);
+	vcpu->arch.cputm_enabled = true;
+	__start_cpu_timer_accounting(vcpu);
+}
+
+/* needs disabled preemption to protect from TOD sync and vcpu_load/put */
+static void __disable_cpu_timer_accounting(struct kvm_vcpu *vcpu)
+{
+	WARN_ON_ONCE(!vcpu->arch.cputm_enabled);
+	__stop_cpu_timer_accounting(vcpu);
+	vcpu->arch.cputm_enabled = false;
+}
+
+static void enable_cpu_timer_accounting(struct kvm_vcpu *vcpu)
+{
+	preempt_disable(); /* protect from TOD sync and vcpu_load/put */
+	__enable_cpu_timer_accounting(vcpu);
+	preempt_enable();
+}
+
+static void disable_cpu_timer_accounting(struct kvm_vcpu *vcpu)
+{
+	preempt_disable(); /* protect from TOD sync and vcpu_load/put */
+	__disable_cpu_timer_accounting(vcpu);
+	preempt_enable();
+}
+
+/* set the cpu timer - may only be called from the VCPU thread itself */
+void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm)
+{
+	preempt_disable(); /* protect from TOD sync and vcpu_load/put */
+	raw_write_seqcount_begin(&vcpu->arch.cputm_seqcount);
+	if (vcpu->arch.cputm_enabled)
+		vcpu->arch.cputm_start = get_tod_clock_fast();
+	vcpu->arch.sie_block->cputm = cputm;
+	raw_write_seqcount_end(&vcpu->arch.cputm_seqcount);
+	preempt_enable();
+}
+
+/* update and get the cpu timer - can also be called from other VCPU threads */
+__u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu)
+{
+	unsigned int seq;
+	__u64 value;
+
+	if (unlikely(!vcpu->arch.cputm_enabled))
+		return vcpu->arch.sie_block->cputm;
+
+	preempt_disable(); /* protect from TOD sync and vcpu_load/put */
+	do {
+		seq = raw_read_seqcount(&vcpu->arch.cputm_seqcount);
+		/*
+		 * If the writer would ever execute a read in the critical
+		 * section, e.g. in irq context, we have a deadlock.
+		 */
+		WARN_ON_ONCE((seq & 1) && smp_processor_id() == vcpu->cpu);
+		value = vcpu->arch.sie_block->cputm;
+		/* if cputm_start is 0, accounting is being started/stopped */
+		if (likely(vcpu->arch.cputm_start))
+			value -= get_tod_clock_fast() - vcpu->arch.cputm_start;
+	} while (read_seqcount_retry(&vcpu->arch.cputm_seqcount, seq & ~1));
+	preempt_enable();
+	return value;
 }
 
 void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
···
 	vcpu->arch.host_fpregs.fpc = current->thread.fpu.fpc;
 	vcpu->arch.host_fpregs.regs = current->thread.fpu.regs;
 
-	/* Depending on MACHINE_HAS_VX, data stored to vrs either
-	 * has vector register or floating point register format.
-	 */
-	current->thread.fpu.regs = vcpu->run->s.regs.vrs;
+	if (MACHINE_HAS_VX)
+		current->thread.fpu.regs = vcpu->run->s.regs.vrs;
+	else
+		current->thread.fpu.regs = vcpu->run->s.regs.fprs;
 	current->thread.fpu.fpc = vcpu->run->s.regs.fpc;
 	if (test_fp_ctl(current->thread.fpu.fpc))
 		/* User space provided an invalid FPC, let's clear it */
···
 	restore_access_regs(vcpu->run->s.regs.acrs);
 	gmap_enable(vcpu->arch.gmap);
 	atomic_or(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
+	if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
+		__start_cpu_timer_accounting(vcpu);
+	vcpu->cpu = cpu;
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+	vcpu->cpu = -1;
+	if (vcpu->arch.cputm_enabled && !is_vcpu_idle(vcpu))
+		__stop_cpu_timer_accounting(vcpu);
 	atomic_andnot(CPUSTAT_RUNNING, &vcpu->arch.sie_block->cpuflags);
 	gmap_disable(vcpu->arch.gmap);
···
 	vcpu->arch.sie_block->gpsw.mask = 0UL;
 	vcpu->arch.sie_block->gpsw.addr = 0UL;
 	kvm_s390_set_prefix(vcpu, 0);
-	vcpu->arch.sie_block->cputm = 0UL;
+	kvm_s390_set_cpu_timer(vcpu, 0);
 	vcpu->arch.sie_block->ckc = 0UL;
 	vcpu->arch.sie_block->todpr = 0;
 	memset(vcpu->arch.sie_block->gcr, 0, 16 * sizeof(__u64));
···
 	vcpu->arch.cpu_id = model->cpu_id;
 	vcpu->arch.sie_block->ibc = model->ibc;
-	vcpu->arch.sie_block->fac = (int) (long) model->fac->list;
+	if (test_kvm_facility(vcpu->kvm, 7))
+		vcpu->arch.sie_block->fac = (u32)(u64) model->fac_list;
 }
 
 int kvm_arch_vcpu_setup(struct kvm_vcpu *vcpu)
···
 	vcpu->arch.local_int.float_int = &kvm->arch.float_int;
 	vcpu->arch.local_int.wq = &vcpu->wq;
 	vcpu->arch.local_int.cpuflags = &vcpu->arch.sie_block->cpuflags;
+	seqcount_init(&vcpu->arch.cputm_seqcount);
 
 	rc = kvm_vcpu_init(vcpu, kvm, id);
 	if (rc)
···
 			     (u64 __user *)reg->addr);
 		break;
 	case KVM_REG_S390_CPU_TIMER:
-		r = put_user(vcpu->arch.sie_block->cputm,
+		r = put_user(kvm_s390_get_cpu_timer(vcpu),
 			     (u64 __user *)reg->addr);
 		break;
 	case KVM_REG_S390_CLOCK_COMP:
···
 					   struct kvm_one_reg *reg)
 {
 	int r = -EINVAL;
+	__u64 val;
 
 	switch (reg->id) {
 	case KVM_REG_S390_TODPR:
···
 			     (u64 __user *)reg->addr);
 		break;
 	case KVM_REG_S390_CPU_TIMER:
-		r = get_user(vcpu->arch.sie_block->cputm,
-			     (u64 __user *)reg->addr);
+		r = get_user(val, (u64 __user *)reg->addr);
+		if (!r)
+			kvm_s390_set_cpu_timer(vcpu, val);
 		break;
 	case KVM_REG_S390_CLOCK_COMP:
 		r = get_user(vcpu->arch.sie_block->ckc,
···
 static int vcpu_post_run_fault_in_sie(struct kvm_vcpu *vcpu)
 {
-	psw_t *psw = &vcpu->arch.sie_block->gpsw;
-	u8 opcode;
+	struct kvm_s390_pgm_info pgm_info = {
+		.code = PGM_ADDRESSING,
+	};
+	u8 opcode, ilen;
 	int rc;
 
 	VCPU_EVENT(vcpu, 3, "%s", "fault in sie instruction");
···
 	 * to look up the current opcode to get the length of the instruction
 	 * to be able to forward the PSW.
 	 */
-	rc = read_guest(vcpu, psw->addr, 0, &opcode, 1);
-	if (rc)
-		return kvm_s390_inject_prog_cond(vcpu, rc);
-	psw->addr = __rewind_psw(*psw, -insn_length(opcode));
-
-	return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+	rc = read_guest_instr(vcpu, &opcode, 1);
+	ilen = insn_length(opcode);
+	if (rc < 0) {
+		return rc;
+	} else if (rc) {
+		/* Instruction-Fetching Exceptions - we can't detect the ilen.
+		 * Forward by arbitrary ilc, injection will take care of
+		 * nullification if necessary.
+		 */
+		pgm_info = vcpu->arch.pgm;
+		ilen = 4;
+	}
+	pgm_info.flags = ilen | KVM_S390_PGM_FLAGS_ILC_VALID;
+	kvm_s390_forward_psw(vcpu, ilen);
+	return kvm_s390_inject_prog_irq(vcpu, &pgm_info);
 }
 
 static int vcpu_post_run(struct kvm_vcpu *vcpu, int exit_reason)
···
 	 */
 	local_irq_disable();
 	__kvm_guest_enter();
+	__disable_cpu_timer_accounting(vcpu);
 	local_irq_enable();
 	exit_reason = sie64a(vcpu->arch.sie_block,
 			     vcpu->run->s.regs.gprs);
 	local_irq_disable();
+	__enable_cpu_timer_accounting(vcpu);
 	__kvm_guest_exit();
 	local_irq_enable();
 	vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
···
 		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
 	}
 	if (kvm_run->kvm_dirty_regs & KVM_SYNC_ARCH0) {
-		vcpu->arch.sie_block->cputm = kvm_run->s.regs.cputm;
+		kvm_s390_set_cpu_timer(vcpu, kvm_run->s.regs.cputm);
 		vcpu->arch.sie_block->ckc = kvm_run->s.regs.ckc;
 		vcpu->arch.sie_block->todpr = kvm_run->s.regs.todpr;
 		vcpu->arch.sie_block->pp = kvm_run->s.regs.pp;
···
 	kvm_run->psw_addr = vcpu->arch.sie_block->gpsw.addr;
 	kvm_run->s.regs.prefix = kvm_s390_get_prefix(vcpu);
 	memcpy(&kvm_run->s.regs.crs, &vcpu->arch.sie_block->gcr, 128);
-	kvm_run->s.regs.cputm = vcpu->arch.sie_block->cputm;
+	kvm_run->s.regs.cputm = kvm_s390_get_cpu_timer(vcpu);
 	kvm_run->s.regs.ckc = vcpu->arch.sie_block->ckc;
 	kvm_run->s.regs.todpr = vcpu->arch.sie_block->todpr;
 	kvm_run->s.regs.pp = vcpu->arch.sie_block->pp;
···
 	}
 
 	sync_regs(vcpu, kvm_run);
+	enable_cpu_timer_accounting(vcpu);
 
 	might_fault();
 	rc = __vcpu_run(vcpu);
···
 		rc = 0;
 	}
 
+	disable_cpu_timer_accounting(vcpu);
 	store_regs(vcpu, kvm_run);
 
 	if (vcpu->sigset_active)
···
 	unsigned char archmode = 1;
 	freg_t fprs[NUM_FPRS];
 	unsigned int px;
-	u64 clkcomp;
+	u64 clkcomp, cputm;
 	int rc;
 
 	px = kvm_s390_get_prefix(vcpu);
···
 				     fprs, 128);
 	} else {
 		rc = write_guest_abs(vcpu, gpa + __LC_FPREGS_SAVE_AREA,
-				     vcpu->run->s.regs.vrs, 128);
+				     vcpu->run->s.regs.fprs, 128);
 	}
 	rc |= write_guest_abs(vcpu, gpa + __LC_GPREGS_SAVE_AREA,
 			      vcpu->run->s.regs.gprs, 128);
···
 			      &vcpu->run->s.regs.fpc, 4);
 	rc |= write_guest_abs(vcpu, gpa + __LC_TOD_PROGREG_SAVE_AREA,
 			      &vcpu->arch.sie_block->todpr, 4);
+	cputm = kvm_s390_get_cpu_timer(vcpu);
 	rc |= write_guest_abs(vcpu, gpa + __LC_CPU_TIMER_SAVE_AREA,
-			      &vcpu->arch.sie_block->cputm, 8);
+			      &cputm, 8);
 	clkcomp = vcpu->arch.sie_block->ckc >> 8;
 	rc |= write_guest_abs(vcpu, gpa + __LC_CLOCK_COMP_SAVE_AREA,
 			      &clkcomp, 8);
···
 	switch (mop->op) {
 	case KVM_S390_MEMOP_LOGICAL_READ:
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
-			r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size, false);
+			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
+					    mop->size, GACC_FETCH);
 			break;
 		}
 		r = read_guest(vcpu, mop->gaddr, mop->ar, tmpbuf, mop->size);
···
 		break;
 	case KVM_S390_MEMOP_LOGICAL_WRITE:
 		if (mop->flags & KVM_S390_MEMOP_F_CHECK_ONLY) {
-			r = check_gva_range(vcpu, mop->gaddr, mop->ar, mop->size, true);
+			r = check_gva_range(vcpu, mop->gaddr, mop->ar,
+					    mop->size, GACC_STORE);
 			break;
 		}
 		if (copy_from_user(tmpbuf, uaddr, mop->size)) {
+25 -3
arch/s390/kvm/kvm-s390.h
···
 #include <linux/kvm.h>
 #include <linux/kvm_host.h>
 #include <asm/facility.h>
+#include <asm/processor.h>
 
 typedef int (*intercept_handler_t)(struct kvm_vcpu *vcpu);
···
 static inline int is_vcpu_stopped(struct kvm_vcpu *vcpu)
 {
 	return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_STOPPED;
+}
+
+static inline int is_vcpu_idle(struct kvm_vcpu *vcpu)
+{
+	return atomic_read(&vcpu->arch.sie_block->cpuflags) & CPUSTAT_WAIT;
 }
 
 static inline int kvm_is_ucontrol(struct kvm *kvm)
···
 /* test availability of facility in a kvm instance */
 static inline int test_kvm_facility(struct kvm *kvm, unsigned long nr)
 {
-	return __test_facility(nr, kvm->arch.model.fac->mask) &&
-		__test_facility(nr, kvm->arch.model.fac->list);
+	return __test_facility(nr, kvm->arch.model.fac_mask) &&
+		__test_facility(nr, kvm->arch.model.fac_list);
 }
 
 static inline int set_kvm_facility(u64 *fac_list, unsigned long nr)
···
 int kvm_s390_mask_adapter(struct kvm *kvm, unsigned int id, bool masked);
 
 /* implemented in intercept.c */
-void kvm_s390_rewind_psw(struct kvm_vcpu *vcpu, int ilc);
+u8 kvm_s390_get_ilen(struct kvm_vcpu *vcpu);
 int kvm_handle_sie_intercept(struct kvm_vcpu *vcpu);
+static inline void kvm_s390_rewind_psw(struct kvm_vcpu *vcpu, int ilen)
+{
+	struct kvm_s390_sie_block *sie_block = vcpu->arch.sie_block;
+
+	sie_block->gpsw.addr = __rewind_psw(sie_block->gpsw, ilen);
+}
+static inline void kvm_s390_forward_psw(struct kvm_vcpu *vcpu, int ilen)
+{
+	kvm_s390_rewind_psw(vcpu, -ilen);
+}
+static inline void kvm_s390_retry_instr(struct kvm_vcpu *vcpu)
+{
+	kvm_s390_rewind_psw(vcpu, kvm_s390_get_ilen(vcpu));
+}
 
 /* implemented in priv.c */
 int is_valid_psw(psw_t *psw);
···
 void kvm_s390_vcpu_unsetup_cmma(struct kvm_vcpu *vcpu);
 unsigned long kvm_s390_fac_list_mask_size(void);
 extern unsigned long kvm_s390_fac_list_mask[];
+void kvm_s390_set_cpu_timer(struct kvm_vcpu *vcpu, __u64 cputm);
+__u64 kvm_s390_get_cpu_timer(struct kvm_vcpu *vcpu);
 
 /* implemented in diag.c */
 int kvm_s390_handle_diag(struct kvm_vcpu *vcpu);
+8 -7
arch/s390/kvm/priv.c
···
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_PSTATE)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
 
-	kvm_s390_rewind_psw(vcpu, 4);
+	kvm_s390_retry_instr(vcpu);
 	VCPU_EVENT(vcpu, 4, "%s", "retrying storage key operation");
 	return 0;
 }
···
 	if (psw_bits(vcpu->arch.sie_block->gpsw).p)
 		return kvm_s390_inject_program_int(vcpu, PGM_PRIVILEGED_OP);
 	wait_event(vcpu->kvm->arch.ipte_wq, !ipte_lock_held(vcpu));
-	kvm_s390_rewind_psw(vcpu, 4);
+	kvm_s390_retry_instr(vcpu);
 	VCPU_EVENT(vcpu, 4, "%s", "retrying ipte interlock operation");
 	return 0;
 }
···
 	 * We need to shift the lower 32 facility bits (bit 0-31) from a u64
 	 * into a u32 memory representation. They will remain bits 0-31.
 	 */
-	fac = *vcpu->kvm->arch.model.fac->list >> 32;
+	fac = *vcpu->kvm->arch.model.fac_list >> 32;
 	rc = write_guest_lc(vcpu, offsetof(struct lowcore, stfl_fac_list),
 			    &fac, sizeof(fac));
 	if (rc)
···
 	if (((vcpu->arch.sie_block->ipb & 0xf0000000) >> 28) > 6)
 		return kvm_s390_inject_program_int(vcpu, PGM_SPECIFICATION);
 
-	/* Rewind PSW to repeat the ESSA instruction */
-	kvm_s390_rewind_psw(vcpu, 4);
+	/* Retry the ESSA instruction */
+	kvm_s390_retry_instr(vcpu);
 	vcpu->arch.sie_block->cbrlo &= PAGE_MASK;	/* reset nceo */
 	cbrlo = phys_to_virt(vcpu->arch.sie_block->cbrlo);
 	down_read(&gmap->mm->mmap_sem);
···
 		return -EOPNOTSUPP;
 	if (vcpu->arch.sie_block->gpsw.mask & PSW_MASK_DAT)
 		ipte_lock(vcpu);
-	ret = guest_translate_address(vcpu, address1, ar, &gpa, 1);
+	ret = guest_translate_address(vcpu, address1, ar, &gpa, GACC_STORE);
 	if (ret == PGM_PROTECTION) {
 		/* Write protected? Try again with read-only... */
 		cc = 1;
-		ret = guest_translate_address(vcpu, address1, ar, &gpa, 0);
+		ret = guest_translate_address(vcpu, address1, ar, &gpa,
+					      GACC_FETCH);
 	}
 	if (ret) {
 		if (ret == PGM_ADDRESSING || ret == PGM_TRANSLATION_SPEC) {
+20 -11
arch/x86/include/asm/kvm_host.h
···
 #include <asm/mtrr.h>
 #include <asm/msr-index.h>
 #include <asm/asm.h>
+#include <asm/kvm_page_track.h>
 
 #define KVM_MAX_VCPUS 255
 #define KVM_SOFT_MAX_VCPUS 160
···
 	void *objects[KVM_NR_MEM_OBJS];
 };
 
+/*
+ * the pages used as guest page table on soft mmu are tracked by
+ * kvm_memory_slot.arch.gfn_track which is 16 bits, so the role bits used
+ * by indirect shadow page can not be more than 15 bits.
+ *
+ * Currently, we used 14 bits that are @level, @cr4_pae, @quadrant, @access,
+ * @nxe, @cr0_wp, @smep_andnot_wp and @smap_andnot_wp.
+ */
 union kvm_mmu_page_role {
 	unsigned word;
 	struct {
···
 #endif
 
 	/* Number of writes since the last time traversal visited this page.  */
-	int write_flooding_count;
+	atomic_t write_flooding_count;
 };
 
 struct kvm_pio_request {
···
 	struct rsvd_bits_validate guest_rsvd_check;
 
-	/*
-	 * Bitmap: bit set = last pte in walk
-	 * index[0:1]: level (zero-based)
-	 * index[2]: pte.ps
-	 */
-	u8 last_pte_bitmap;
+	/* Can have large pages at levels 2..last_nonleaf_level-1. */
+	u8 last_nonleaf_level;
 
 	bool nx;
···
 	struct kvm_mmu_memory_cache mmu_page_header_cache;
 
 	struct fpu guest_fpu;
-	bool eager_fpu;
 	u64 xcr0;
 	u64 guest_supported_xcr0;
 	u32 guest_xstate_size;
···
 };
 
 struct kvm_lpage_info {
-	int write_count;
+	int disallow_lpage;
 };
 
 struct kvm_arch_memory_slot {
 	struct kvm_rmap_head *rmap[KVM_NR_PAGE_SIZES];
 	struct kvm_lpage_info *lpage_info[KVM_NR_PAGE_SIZES - 1];
+	unsigned short *gfn_track[KVM_PAGE_TRACK_MAX];
 };
 
 /*
···
 	 */
 	struct list_head active_mmu_pages;
 	struct list_head zapped_obsolete_pages;
+	struct kvm_page_track_notifier_node mmu_sp_tracker;
+	struct kvm_page_track_notifier_head track_notifier_head;
 
 	struct list_head assigned_dev_head;
 	struct iommu_domain *iommu_domain;
···
 	bool irqchip_split;
 	u8 nr_reserved_ioapic_pins;
+
+	bool disabled_lapic_found;
 };
 
 struct kvm_vm_stat {
···
 void kvm_mmu_destroy(struct kvm_vcpu *vcpu);
 int kvm_mmu_create(struct kvm_vcpu *vcpu);
 void kvm_mmu_setup(struct kvm_vcpu *vcpu);
+void kvm_mmu_init_vm(struct kvm *kvm);
+void kvm_mmu_uninit_vm(struct kvm *kvm);
 void kvm_mmu_set_mask_ptes(u64 user_mask, u64 accessed_mask,
 		u64 dirty_mask, u64 nx_mask, u64 x_mask);
···
 void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 
-void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
-		       const u8 *new, int bytes);
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
 int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
+61
arch/x86/include/asm/kvm_page_track.h
···
+#ifndef _ASM_X86_KVM_PAGE_TRACK_H
+#define _ASM_X86_KVM_PAGE_TRACK_H
+
+enum kvm_page_track_mode {
+	KVM_PAGE_TRACK_WRITE,
+	KVM_PAGE_TRACK_MAX,
+};
+
+/*
+ * The notifier represented by @kvm_page_track_notifier_node is linked into
+ * the head which will be notified when guest is triggering the track event.
+ *
+ * Write access on the head is protected by kvm->mmu_lock, read access
+ * is protected by track_srcu.
+ */
+struct kvm_page_track_notifier_head {
+	struct srcu_struct track_srcu;
+	struct hlist_head track_notifier_list;
+};
+
+struct kvm_page_track_notifier_node {
+	struct hlist_node node;
+
+	/*
+	 * It is called when guest is writing the write-tracked page
+	 * and write emulation is finished at that time.
+	 *
+	 * @vcpu: the vcpu where the write access happened.
+	 * @gpa: the physical address written by guest.
+	 * @new: the data was written to the address.
+	 * @bytes: the written length.
+	 */
+	void (*track_write)(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			    int bytes);
+};
+
+void kvm_page_track_init(struct kvm *kvm);
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+				 struct kvm_memory_slot *dont);
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+				  unsigned long npages);
+
+void kvm_slot_page_track_add_page(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  enum kvm_page_track_mode mode);
+void kvm_slot_page_track_remove_page(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     enum kvm_page_track_mode mode);
+bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
+			      enum kvm_page_track_mode mode);
+
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+				 struct kvm_page_track_notifier_node *n);
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+				   struct kvm_page_track_notifier_node *n);
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes);
+#endif
+3 -1
arch/x86/include/uapi/asm/hyperv.h
···
 		(~((1ull << HV_X64_MSR_HYPERCALL_PAGE_ADDRESS_SHIFT) - 1))
 
 /* Declare the various hypercall operations. */
-#define HV_X64_HV_NOTIFY_LONG_SPIN_WAIT		0x0008
+#define HVCALL_NOTIFY_LONG_SPIN_WAIT		0x0008
+#define HVCALL_POST_MESSAGE			0x005c
+#define HVCALL_SIGNAL_EVENT			0x005d
 
 #define HV_X64_MSR_APIC_ASSIST_PAGE_ENABLE	0x00000001
 #define HV_X64_MSR_APIC_ASSIST_PAGE_ADDRESS_SHIFT	12
+2 -1
arch/x86/kvm/Makefile
···
 kvm-y			+= x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
 			   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o mtrr.o \
-			   hyperv.o
+			   hyperv.o page_track.o
 
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)	+= assigned-dev.o iommu.o
+
 kvm-intel-y		+= vmx.o pmu_intel.o
 kvm-amd-y		+= svm.o pmu_amd.o
+4 -10
arch/x86/kvm/assigned-dev.c
···
 static struct kvm_assigned_dev_kernel *kvm_find_assigned_dev(struct list_head *head,
 						      int assigned_dev_id)
 {
-	struct list_head *ptr;
 	struct kvm_assigned_dev_kernel *match;
 
-	list_for_each(ptr, head) {
-		match = list_entry(ptr, struct kvm_assigned_dev_kernel, list);
+	list_for_each_entry(match, head, list) {
 		if (match->assigned_dev_id == assigned_dev_id)
 			return match;
 	}
···
 void kvm_free_all_assigned_devices(struct kvm *kvm)
 {
-	struct list_head *ptr, *ptr2;
-	struct kvm_assigned_dev_kernel *assigned_dev;
+	struct kvm_assigned_dev_kernel *assigned_dev, *tmp;
 
-	list_for_each_safe(ptr, ptr2, &kvm->arch.assigned_dev_head) {
-		assigned_dev = list_entry(ptr,
-					  struct kvm_assigned_dev_kernel,
-					  list);
-
+	list_for_each_entry_safe(assigned_dev, tmp,
+				 &kvm->arch.assigned_dev_head, list) {
 		kvm_free_assigned_device(kvm, assigned_dev);
 	}
 }
+10 -4
arch/x86/kvm/cpuid.c
···
 	return ret;
 }
 
+bool kvm_mpx_supported(void)
+{
+	return ((host_xcr0 & (XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR))
+		 && kvm_x86_ops->mpx_supported());
+}
+EXPORT_SYMBOL_GPL(kvm_mpx_supported);
+
 u64 kvm_supported_xcr0(void)
 {
 	u64 xcr0 = KVM_SUPPORTED_XCR0 & host_xcr0;
 
-	if (!kvm_x86_ops->mpx_supported())
+	if (!kvm_mpx_supported())
 		xcr0 &= ~(XFEATURE_MASK_BNDREGS | XFEATURE_MASK_BNDCSR);
 
 	return xcr0;
···
 	if (best && (best->eax & (F(XSAVES) | F(XSAVEC))))
 		best->ebx = xstate_required_size(vcpu->arch.xcr0, true);
 
-	vcpu->arch.eager_fpu = use_eager_fpu() || guest_cpuid_has_mpx(vcpu);
-	if (vcpu->arch.eager_fpu)
+	if (use_eager_fpu())
 		kvm_x86_ops->fpu_activate(vcpu);
 
 	/*
···
 #endif
 	unsigned f_rdtscp = kvm_x86_ops->rdtscp_supported() ? F(RDTSCP) : 0;
 	unsigned f_invpcid = kvm_x86_ops->invpcid_supported() ? F(INVPCID) : 0;
-	unsigned f_mpx = kvm_x86_ops->mpx_supported() ? F(MPX) : 0;
+	unsigned f_mpx = kvm_mpx_supported() ? F(MPX) : 0;
 	unsigned f_xsaves = kvm_x86_ops->xsaves_supported() ? F(XSAVES) : 0;
 
 	/* cpuid 1.edx */
+1 -8
arch/x86/kvm/cpuid.h
···
 #include <asm/cpu.h>
 
 int kvm_update_cpuid(struct kvm_vcpu *vcpu);
+bool kvm_mpx_supported(void);
 struct kvm_cpuid_entry2 *kvm_find_cpuid_entry(struct kvm_vcpu *vcpu,
 					      u32 function, u32 index);
 int kvm_dev_ioctl_get_cpuid(struct kvm_cpuid2 *cpuid,
···
 	best = kvm_find_cpuid_entry(vcpu, 7, 0);
 	return best && (best->ebx & bit(X86_FEATURE_RTM));
-}
-
-static inline bool guest_cpuid_has_mpx(struct kvm_vcpu *vcpu)
-{
-	struct kvm_cpuid_entry2 *best;
-
-	best = kvm_find_cpuid_entry(vcpu, 7, 0);
-	return best && (best->ebx & bit(X86_FEATURE_MPX));
 }
 
 static inline bool guest_cpuid_has_pcommit(struct kvm_vcpu *vcpu)
+41 -9
arch/x86/kvm/hyperv.c
···
 	return kvm->arch.hyperv.hv_hypercall & HV_X64_MSR_HYPERCALL_ENABLE;
 }
 
+static void kvm_hv_hypercall_set_result(struct kvm_vcpu *vcpu, u64 result)
+{
+	bool longmode;
+
+	longmode = is_64_bit_mode(vcpu);
+	if (longmode)
+		kvm_register_write(vcpu, VCPU_REGS_RAX, result);
+	else {
+		kvm_register_write(vcpu, VCPU_REGS_RDX, result >> 32);
+		kvm_register_write(vcpu, VCPU_REGS_RAX, result & 0xffffffff);
+	}
+}
+
+static int kvm_hv_hypercall_complete_userspace(struct kvm_vcpu *vcpu)
+{
+	struct kvm_run *run = vcpu->run;
+
+	kvm_hv_hypercall_set_result(vcpu, run->hyperv.u.hcall.result);
+	return 1;
+}
+
 int kvm_hv_hypercall(struct kvm_vcpu *vcpu)
 {
 	u64 param, ingpa, outgpa, ret;
···
 	 */
 	if (kvm_x86_ops->get_cpl(vcpu) != 0 || !is_protmode(vcpu)) {
 		kvm_queue_exception(vcpu, UD_VECTOR);
-		return 0;
+		return 1;
 	}
 
 	longmode = is_64_bit_mode(vcpu);
···
 	trace_kvm_hv_hypercall(code, fast, rep_cnt, rep_idx, ingpa, outgpa);
 
+	/* Hypercall continuation is not supported yet */
+	if (rep_cnt || rep_idx) {
+		res = HV_STATUS_INVALID_HYPERCALL_CODE;
+		goto set_result;
+	}
+
 	switch (code) {
-	case HV_X64_HV_NOTIFY_LONG_SPIN_WAIT:
+	case HVCALL_NOTIFY_LONG_SPIN_WAIT:
 		kvm_vcpu_on_spin(vcpu);
 		break;
+	case HVCALL_POST_MESSAGE:
+	case HVCALL_SIGNAL_EVENT:
+		vcpu->run->exit_reason = KVM_EXIT_HYPERV;
+		vcpu->run->hyperv.type = KVM_EXIT_HYPERV_HCALL;
+		vcpu->run->hyperv.u.hcall.input = param;
+		vcpu->run->hyperv.u.hcall.params[0] = ingpa;
+		vcpu->run->hyperv.u.hcall.params[1] = outgpa;
+		vcpu->arch.complete_userspace_io =
+				kvm_hv_hypercall_complete_userspace;
+		return 0;
 	default:
 		res = HV_STATUS_INVALID_HYPERCALL_CODE;
 		break;
 	}
 
+set_result:
 	ret = res | (((u64)rep_done & 0xfff) << 32);
-	if (longmode) {
-		kvm_register_write(vcpu, VCPU_REGS_RAX, ret);
-	} else {
-		kvm_register_write(vcpu, VCPU_REGS_RDX, ret >> 32);
-		kvm_register_write(vcpu, VCPU_REGS_RAX, ret & 0xffffffff);
-	}
-
+	kvm_hv_hypercall_set_result(vcpu, ret);
 	return 1;
 }
+149 -197
arch/x86/kvm/i8254.c
···
 #define RW_STATE_WORD0 3
 #define RW_STATE_WORD1 4
 
-/* Compute with 96 bit intermediate result: (a*b)/c */
-static u64 muldiv64(u64 a, u32 b, u32 c)
+static void pit_set_gate(struct kvm_pit *pit, int channel, u32 val)
 {
-	union {
-		u64 ll;
-		struct {
-			u32 low, high;
-		} l;
-	} u, res;
-	u64 rl, rh;
-
-	u.ll = a;
-	rl = (u64)u.l.low * (u64)b;
-	rh = (u64)u.l.high * (u64)b;
-	rh += (rl >> 32);
-	res.l.high = div64_u64(rh, c);
-	res.l.low = div64_u64(((mod_64(rh, c) << 32) + (rl & 0xffffffff)), c);
-	return res.ll;
-}
-
-static void pit_set_gate(struct kvm *kvm, int channel, u32 val)
-{
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
-
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 
 	switch (c->mode) {
 	default:
···
 	c->gate = val;
 }
 
-static int pit_get_gate(struct kvm *kvm, int channel)
+static int pit_get_gate(struct kvm_pit *pit, int channel)
 {
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
-	return kvm->arch.vpit->pit_state.channels[channel].gate;
+	return pit->pit_state.channels[channel].gate;
 }
 
-static s64 __kpit_elapsed(struct kvm *kvm)
+static s64 __kpit_elapsed(struct kvm_pit *pit)
 {
 	s64 elapsed;
 	ktime_t remaining;
-	struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state;
+	struct kvm_kpit_state *ps = &pit->pit_state;
 
 	if (!ps->period)
 		return 0;
···
 	return elapsed;
 }
 
-static s64 kpit_elapsed(struct kvm *kvm, struct kvm_kpit_channel_state *c,
+static s64 kpit_elapsed(struct kvm_pit *pit, struct kvm_kpit_channel_state *c,
 			int channel)
 {
 	if (channel == 0)
-		return __kpit_elapsed(kvm);
+		return __kpit_elapsed(pit);
 
 	return ktime_to_ns(ktime_sub(ktime_get(), c->count_load_time));
 }
 
-static int pit_get_count(struct kvm *kvm, int channel)
+static int pit_get_count(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 	s64 d, t;
 	int counter;
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
-	t = kpit_elapsed(kvm, c, channel);
-	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
+	t = kpit_elapsed(pit, c, channel);
+	d = mul_u64_u32_div(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
 	switch (c->mode) {
 	case 0:
···
 	return counter;
 }
 
-static int pit_get_out(struct kvm *kvm, int channel)
+static int pit_get_out(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 	s64 d, t;
 	int out;
 
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
-
-	t = kpit_elapsed(kvm, c, channel);
-	d = muldiv64(t, KVM_PIT_FREQ, NSEC_PER_SEC);
+	t = kpit_elapsed(pit, c, channel);
+	d = mul_u64_u32_div(t, KVM_PIT_FREQ, NSEC_PER_SEC);
 
 	switch (c->mode) {
 	default:
···
 	return out;
 }
 
-static void pit_latch_count(struct kvm *kvm, int channel)
+static void pit_latch_count(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
-
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 
 	if (!c->count_latched) {
-		c->latched_count = pit_get_count(kvm, channel);
+		c->latched_count = pit_get_count(pit, channel);
 		c->count_latched = c->rw_mode;
 	}
 }
 
-static void pit_latch_status(struct kvm *kvm, int channel)
+static void pit_latch_status(struct kvm_pit *pit, int channel)
 {
-	struct kvm_kpit_channel_state *c =
-		&kvm->arch.vpit->pit_state.channels[channel];
-
-	WARN_ON(!mutex_is_locked(&kvm->arch.vpit->pit_state.lock));
+	struct kvm_kpit_channel_state *c = &pit->pit_state.channels[channel];
 
 	if (!c->status_latched) {
 		/* TODO: Return NULL COUNT (bit 6). */
-		c->status = ((pit_get_out(kvm, channel) << 7) |
+		c->status = ((pit_get_out(pit, channel) << 7) |
 				(c->rw_mode << 4) |
 				(c->mode << 1) |
 				c->bcd);
···
 	}
 }
 
+static inline struct kvm_pit *pit_state_to_pit(struct kvm_kpit_state *ps)
+{
+	return container_of(ps, struct kvm_pit, pit_state);
+}
+
 static void kvm_pit_ack_irq(struct kvm_irq_ack_notifier *kian)
 {
 	struct kvm_kpit_state *ps = container_of(kian, struct kvm_kpit_state,
 						 irq_ack_notifier);
-	int value;
+	struct kvm_pit *pit = pit_state_to_pit(ps);
 
-	spin_lock(&ps->inject_lock);
-	value = atomic_dec_return(&ps->pending);
-	if (value < 0)
-		/* spurious acks can be generated if, for example, the
-		 * PIC is being reset.  Handle it gracefully here
-		 */
-		atomic_inc(&ps->pending);
-	else if (value > 0)
-		/* in this case, we had multiple outstanding pit interrupts
-		 * that we needed to inject.  Reinject
-		 */
-		queue_kthread_work(&ps->pit->worker, &ps->pit->expired);
-	ps->irq_ack = 1;
-	spin_unlock(&ps->inject_lock);
+	atomic_set(&ps->irq_ack, 1);
+	/* irq_ack should be set before pending is read.  Order accesses with
+	 * inc(pending) in pit_timer_fn and xchg(irq_ack, 0) in pit_do_work.
+	 */
+	smp_mb();
+	if (atomic_dec_if_positive(&ps->pending) > 0)
+		queue_kthread_work(&pit->worker, &pit->expired);
 }
 
 void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
···
 	struct kvm_vcpu *vcpu;
 	int i;
 	struct kvm_kpit_state *ps = &pit->pit_state;
-	int inject = 0;
 
-	/* Try to inject pending interrupts when
-	 * last one has been acked.
+	if (atomic_read(&ps->reinject) && !atomic_xchg(&ps->irq_ack, 0))
+		return;
+
+	kvm_set_irq(kvm, pit->irq_source_id, 0, 1, false);
+	kvm_set_irq(kvm, pit->irq_source_id, 0, 0, false);
+
+	/*
+	 * Provides NMI watchdog support via Virtual Wire mode.
+	 * The route is: PIT -> LVT0 in NMI mode.
+	 *
+	 * Note: Our Virtual Wire implementation does not follow
+	 * the MP specification.  We propagate a PIT interrupt to all
+	 * VCPUs and only when LVT0 is in NMI mode.  The interrupt can
+	 * also be simultaneously delivered through PIC and IOAPIC.
 	 */
-	spin_lock(&ps->inject_lock);
-	if (ps->irq_ack) {
-		ps->irq_ack = 0;
-		inject = 1;
-	}
-	spin_unlock(&ps->inject_lock);
-	if (inject) {
-		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 1, false);
-		kvm_set_irq(kvm, kvm->arch.vpit->irq_source_id, 0, 0, false);
-
-		/*
-		 * Provides NMI watchdog support via Virtual Wire mode.
-		 * The route is: PIT -> PIC -> LVT0 in NMI mode.
-		 *
-		 * Note: Our Virtual Wire implementation is simplified, only
-		 * propagating PIT interrupts to all VCPUs when they have set
-		 * LVT0 to NMI delivery. Other PIC interrupts are just sent to
-		 * VCPU0, and only if its LVT0 is in EXTINT mode.
269 - */ 270 - if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0) 271 - kvm_for_each_vcpu(i, vcpu, kvm) 272 - kvm_apic_nmi_wd_deliver(vcpu); 273 - } 301 + if (atomic_read(&kvm->arch.vapics_in_nmi_mode) > 0) 302 + kvm_for_each_vcpu(i, vcpu, kvm) 303 + kvm_apic_nmi_wd_deliver(vcpu); 274 304 } 275 305 276 306 static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) 277 307 { 278 308 struct kvm_kpit_state *ps = container_of(data, struct kvm_kpit_state, timer); 279 - struct kvm_pit *pt = ps->kvm->arch.vpit; 309 + struct kvm_pit *pt = pit_state_to_pit(ps); 280 310 281 - if (ps->reinject || !atomic_read(&ps->pending)) { 311 + if (atomic_read(&ps->reinject)) 282 312 atomic_inc(&ps->pending); 283 - queue_kthread_work(&pt->worker, &pt->expired); 284 - } 313 + 314 + queue_kthread_work(&pt->worker, &pt->expired); 285 315 286 316 if (ps->is_periodic) { 287 317 hrtimer_add_expires_ns(&ps->timer, ps->period); ··· 281 329 return HRTIMER_NORESTART; 282 330 } 283 331 284 - static void create_pit_timer(struct kvm *kvm, u32 val, int is_period) 332 + static inline void kvm_pit_reset_reinject(struct kvm_pit *pit) 285 333 { 286 - struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state; 334 + atomic_set(&pit->pit_state.pending, 0); 335 + atomic_set(&pit->pit_state.irq_ack, 1); 336 + } 337 + 338 + void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject) 339 + { 340 + struct kvm_kpit_state *ps = &pit->pit_state; 341 + struct kvm *kvm = pit->kvm; 342 + 343 + if (atomic_read(&ps->reinject) == reinject) 344 + return; 345 + 346 + if (reinject) { 347 + /* The initial state is preserved while ps->reinject == 0. 
*/ 348 + kvm_pit_reset_reinject(pit); 349 + kvm_register_irq_ack_notifier(kvm, &ps->irq_ack_notifier); 350 + kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier); 351 + } else { 352 + kvm_unregister_irq_ack_notifier(kvm, &ps->irq_ack_notifier); 353 + kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier); 354 + } 355 + 356 + atomic_set(&ps->reinject, reinject); 357 + } 358 + 359 + static void create_pit_timer(struct kvm_pit *pit, u32 val, int is_period) 360 + { 361 + struct kvm_kpit_state *ps = &pit->pit_state; 362 + struct kvm *kvm = pit->kvm; 287 363 s64 interval; 288 364 289 365 if (!ioapic_in_kernel(kvm) || 290 366 ps->flags & KVM_PIT_FLAGS_HPET_LEGACY) 291 367 return; 292 368 293 - interval = muldiv64(val, NSEC_PER_SEC, KVM_PIT_FREQ); 369 + interval = mul_u64_u32_div(val, NSEC_PER_SEC, KVM_PIT_FREQ); 294 370 295 371 pr_debug("create pit timer, interval is %llu nsec\n", interval); 296 372 297 373 /* TODO The new value only affected after the retriggered */ 298 374 hrtimer_cancel(&ps->timer); 299 - flush_kthread_work(&ps->pit->expired); 375 + flush_kthread_work(&pit->expired); 300 376 ps->period = interval; 301 377 ps->is_periodic = is_period; 302 378 303 - ps->timer.function = pit_timer_fn; 304 - ps->kvm = ps->pit->kvm; 305 - 306 - atomic_set(&ps->pending, 0); 307 - ps->irq_ack = 1; 379 + kvm_pit_reset_reinject(pit); 308 380 309 381 /* 310 382 * Do not allow the guest to program periodic timers with small ··· 351 375 HRTIMER_MODE_ABS); 352 376 } 353 377 354 - static void pit_load_count(struct kvm *kvm, int channel, u32 val) 378 + static void pit_load_count(struct kvm_pit *pit, int channel, u32 val) 355 379 { 356 - struct kvm_kpit_state *ps = &kvm->arch.vpit->pit_state; 357 - 358 - WARN_ON(!mutex_is_locked(&ps->lock)); 380 + struct kvm_kpit_state *ps = &pit->pit_state; 359 381 360 382 pr_debug("load_count val is %d, channel is %d\n", val, channel); 361 383 ··· 378 404 case 1: 379 405 /* FIXME: enhance mode 4 precision */ 380 406 case 4: 381 - 
create_pit_timer(kvm, val, 0); 407 + create_pit_timer(pit, val, 0); 382 408 break; 383 409 case 2: 384 410 case 3: 385 - create_pit_timer(kvm, val, 1); 411 + create_pit_timer(pit, val, 1); 386 412 break; 387 413 default: 388 - destroy_pit_timer(kvm->arch.vpit); 414 + destroy_pit_timer(pit); 389 415 } 390 416 } 391 417 392 - void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start) 418 + void kvm_pit_load_count(struct kvm_pit *pit, int channel, u32 val, 419 + int hpet_legacy_start) 393 420 { 394 421 u8 saved_mode; 422 + 423 + WARN_ON_ONCE(!mutex_is_locked(&pit->pit_state.lock)); 424 + 395 425 if (hpet_legacy_start) { 396 426 /* save existing mode for later reenablement */ 397 427 WARN_ON(channel != 0); 398 - saved_mode = kvm->arch.vpit->pit_state.channels[0].mode; 399 - kvm->arch.vpit->pit_state.channels[0].mode = 0xff; /* disable timer */ 400 - pit_load_count(kvm, channel, val); 401 - kvm->arch.vpit->pit_state.channels[0].mode = saved_mode; 428 + saved_mode = pit->pit_state.channels[0].mode; 429 + pit->pit_state.channels[0].mode = 0xff; /* disable timer */ 430 + pit_load_count(pit, channel, val); 431 + pit->pit_state.channels[0].mode = saved_mode; 402 432 } else { 403 - pit_load_count(kvm, channel, val); 433 + pit_load_count(pit, channel, val); 404 434 } 405 435 } 406 436 ··· 430 452 { 431 453 struct kvm_pit *pit = dev_to_pit(this); 432 454 struct kvm_kpit_state *pit_state = &pit->pit_state; 433 - struct kvm *kvm = pit->kvm; 434 455 int channel, access; 435 456 struct kvm_kpit_channel_state *s; 436 457 u32 val = *(u32 *) data; ··· 453 476 s = &pit_state->channels[channel]; 454 477 if (val & (2 << channel)) { 455 478 if (!(val & 0x20)) 456 - pit_latch_count(kvm, channel); 479 + pit_latch_count(pit, channel); 457 480 if (!(val & 0x10)) 458 - pit_latch_status(kvm, channel); 481 + pit_latch_status(pit, channel); 459 482 } 460 483 } 461 484 } else { ··· 463 486 s = &pit_state->channels[channel]; 464 487 access = (val >> 4) & 
KVM_PIT_CHANNEL_MASK; 465 488 if (access == 0) { 466 - pit_latch_count(kvm, channel); 489 + pit_latch_count(pit, channel); 467 490 } else { 468 491 s->rw_mode = access; 469 492 s->read_state = access; ··· 480 503 switch (s->write_state) { 481 504 default: 482 505 case RW_STATE_LSB: 483 - pit_load_count(kvm, addr, val); 506 + pit_load_count(pit, addr, val); 484 507 break; 485 508 case RW_STATE_MSB: 486 - pit_load_count(kvm, addr, val << 8); 509 + pit_load_count(pit, addr, val << 8); 487 510 break; 488 511 case RW_STATE_WORD0: 489 512 s->write_latch = val; 490 513 s->write_state = RW_STATE_WORD1; 491 514 break; 492 515 case RW_STATE_WORD1: 493 - pit_load_count(kvm, addr, s->write_latch | (val << 8)); 516 + pit_load_count(pit, addr, s->write_latch | (val << 8)); 494 517 s->write_state = RW_STATE_WORD0; 495 518 break; 496 519 } ··· 506 529 { 507 530 struct kvm_pit *pit = dev_to_pit(this); 508 531 struct kvm_kpit_state *pit_state = &pit->pit_state; 509 - struct kvm *kvm = pit->kvm; 510 532 int ret, count; 511 533 struct kvm_kpit_channel_state *s; 512 534 if (!pit_in_range(addr)) ··· 542 566 switch (s->read_state) { 543 567 default: 544 568 case RW_STATE_LSB: 545 - count = pit_get_count(kvm, addr); 569 + count = pit_get_count(pit, addr); 546 570 ret = count & 0xff; 547 571 break; 548 572 case RW_STATE_MSB: 549 - count = pit_get_count(kvm, addr); 573 + count = pit_get_count(pit, addr); 550 574 ret = (count >> 8) & 0xff; 551 575 break; 552 576 case RW_STATE_WORD0: 553 - count = pit_get_count(kvm, addr); 577 + count = pit_get_count(pit, addr); 554 578 ret = count & 0xff; 555 579 s->read_state = RW_STATE_WORD1; 556 580 break; 557 581 case RW_STATE_WORD1: 558 - count = pit_get_count(kvm, addr); 582 + count = pit_get_count(pit, addr); 559 583 ret = (count >> 8) & 0xff; 560 584 s->read_state = RW_STATE_WORD0; 561 585 break; ··· 576 600 { 577 601 struct kvm_pit *pit = speaker_to_pit(this); 578 602 struct kvm_kpit_state *pit_state = &pit->pit_state; 579 - struct kvm *kvm = 
pit->kvm; 580 603 u32 val = *(u32 *) data; 581 604 if (addr != KVM_SPEAKER_BASE_ADDRESS) 582 605 return -EOPNOTSUPP; 583 606 584 607 mutex_lock(&pit_state->lock); 585 608 pit_state->speaker_data_on = (val >> 1) & 1; 586 - pit_set_gate(kvm, 2, val & 1); 609 + pit_set_gate(pit, 2, val & 1); 587 610 mutex_unlock(&pit_state->lock); 588 611 return 0; 589 612 } ··· 593 618 { 594 619 struct kvm_pit *pit = speaker_to_pit(this); 595 620 struct kvm_kpit_state *pit_state = &pit->pit_state; 596 - struct kvm *kvm = pit->kvm; 597 621 unsigned int refresh_clock; 598 622 int ret; 599 623 if (addr != KVM_SPEAKER_BASE_ADDRESS) ··· 602 628 refresh_clock = ((unsigned int)ktime_to_ns(ktime_get()) >> 14) & 1; 603 629 604 630 mutex_lock(&pit_state->lock); 605 - ret = ((pit_state->speaker_data_on << 1) | pit_get_gate(kvm, 2) | 606 - (pit_get_out(kvm, 2) << 5) | (refresh_clock << 4)); 631 + ret = ((pit_state->speaker_data_on << 1) | pit_get_gate(pit, 2) | 632 + (pit_get_out(pit, 2) << 5) | (refresh_clock << 4)); 607 633 if (len > sizeof(ret)) 608 634 len = sizeof(ret); 609 635 memcpy(data, (char *)&ret, len); ··· 611 637 return 0; 612 638 } 613 639 614 - void kvm_pit_reset(struct kvm_pit *pit) 640 + static void kvm_pit_reset(struct kvm_pit *pit) 615 641 { 616 642 int i; 617 643 struct kvm_kpit_channel_state *c; 618 644 619 - mutex_lock(&pit->pit_state.lock); 620 645 pit->pit_state.flags = 0; 621 646 for (i = 0; i < 3; i++) { 622 647 c = &pit->pit_state.channels[i]; 623 648 c->mode = 0xff; 624 649 c->gate = (i != 2); 625 - pit_load_count(pit->kvm, i, 0); 650 + pit_load_count(pit, i, 0); 626 651 } 627 - mutex_unlock(&pit->pit_state.lock); 628 652 629 - atomic_set(&pit->pit_state.pending, 0); 630 - pit->pit_state.irq_ack = 1; 653 + kvm_pit_reset_reinject(pit); 631 654 } 632 655 633 656 static void pit_mask_notifer(struct kvm_irq_mask_notifier *kimn, bool mask) 634 657 { 635 658 struct kvm_pit *pit = container_of(kimn, struct kvm_pit, mask_notifier); 636 659 637 - if (!mask) { 638 - 
atomic_set(&pit->pit_state.pending, 0); 639 - pit->pit_state.irq_ack = 1; 640 - } 660 + if (!mask) 661 + kvm_pit_reset_reinject(pit); 641 662 } 642 663 643 664 static const struct kvm_io_device_ops pit_dev_ops = { ··· 659 690 return NULL; 660 691 661 692 pit->irq_source_id = kvm_request_irq_source_id(kvm); 662 - if (pit->irq_source_id < 0) { 663 - kfree(pit); 664 - return NULL; 665 - } 693 + if (pit->irq_source_id < 0) 694 + goto fail_request; 666 695 667 696 mutex_init(&pit->pit_state.lock); 668 - mutex_lock(&pit->pit_state.lock); 669 - spin_lock_init(&pit->pit_state.inject_lock); 670 697 671 698 pid = get_pid(task_tgid(current)); 672 699 pid_nr = pid_vnr(pid); ··· 671 706 init_kthread_worker(&pit->worker); 672 707 pit->worker_task = kthread_run(kthread_worker_fn, &pit->worker, 673 708 "kvm-pit/%d", pid_nr); 674 - if (IS_ERR(pit->worker_task)) { 675 - mutex_unlock(&pit->pit_state.lock); 676 - kvm_free_irq_source_id(kvm, pit->irq_source_id); 677 - kfree(pit); 678 - return NULL; 679 - } 709 + if (IS_ERR(pit->worker_task)) 710 + goto fail_kthread; 711 + 680 712 init_kthread_work(&pit->expired, pit_do_work); 681 713 682 - kvm->arch.vpit = pit; 683 714 pit->kvm = kvm; 684 715 685 716 pit_state = &pit->pit_state; 686 - pit_state->pit = pit; 687 717 hrtimer_init(&pit_state->timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); 718 + pit_state->timer.function = pit_timer_fn; 719 + 688 720 pit_state->irq_ack_notifier.gsi = 0; 689 721 pit_state->irq_ack_notifier.irq_acked = kvm_pit_ack_irq; 690 - kvm_register_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier); 691 - pit_state->reinject = true; 692 - mutex_unlock(&pit->pit_state.lock); 722 + pit->mask_notifier.func = pit_mask_notifer; 693 723 694 724 kvm_pit_reset(pit); 695 725 696 - pit->mask_notifier.func = pit_mask_notifer; 697 - kvm_register_irq_mask_notifier(kvm, 0, &pit->mask_notifier); 726 + kvm_pit_set_reinject(pit, true); 698 727 699 728 kvm_iodevice_init(&pit->dev, &pit_dev_ops); 700 729 ret = kvm_io_bus_register_dev(kvm, 
KVM_PIO_BUS, KVM_PIT_BASE_ADDRESS, 701 730 KVM_PIT_MEM_LENGTH, &pit->dev); 702 731 if (ret < 0) 703 - goto fail; 732 + goto fail_register_pit; 704 733 705 734 if (flags & KVM_PIT_SPEAKER_DUMMY) { 706 735 kvm_iodevice_init(&pit->speaker_dev, &speaker_dev_ops); ··· 702 743 KVM_SPEAKER_BASE_ADDRESS, 4, 703 744 &pit->speaker_dev); 704 745 if (ret < 0) 705 - goto fail_unregister; 746 + goto fail_register_speaker; 706 747 } 707 748 708 749 return pit; 709 750 710 - fail_unregister: 751 + fail_register_speaker: 711 752 kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->dev); 712 - 713 - fail: 714 - kvm_unregister_irq_mask_notifier(kvm, 0, &pit->mask_notifier); 715 - kvm_unregister_irq_ack_notifier(kvm, &pit_state->irq_ack_notifier); 716 - kvm_free_irq_source_id(kvm, pit->irq_source_id); 753 + fail_register_pit: 754 + kvm_pit_set_reinject(pit, false); 717 755 kthread_stop(pit->worker_task); 756 + fail_kthread: 757 + kvm_free_irq_source_id(kvm, pit->irq_source_id); 758 + fail_request: 718 759 kfree(pit); 719 760 return NULL; 720 761 } 721 762 722 763 void kvm_free_pit(struct kvm *kvm) 723 764 { 724 - struct hrtimer *timer; 765 + struct kvm_pit *pit = kvm->arch.vpit; 725 766 726 - if (kvm->arch.vpit) { 727 - kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &kvm->arch.vpit->dev); 728 - kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, 729 - &kvm->arch.vpit->speaker_dev); 730 - kvm_unregister_irq_mask_notifier(kvm, 0, 731 - &kvm->arch.vpit->mask_notifier); 732 - kvm_unregister_irq_ack_notifier(kvm, 733 - &kvm->arch.vpit->pit_state.irq_ack_notifier); 734 - mutex_lock(&kvm->arch.vpit->pit_state.lock); 735 - timer = &kvm->arch.vpit->pit_state.timer; 736 - hrtimer_cancel(timer); 737 - flush_kthread_work(&kvm->arch.vpit->expired); 738 - kthread_stop(kvm->arch.vpit->worker_task); 739 - kvm_free_irq_source_id(kvm, kvm->arch.vpit->irq_source_id); 740 - mutex_unlock(&kvm->arch.vpit->pit_state.lock); 741 - kfree(kvm->arch.vpit); 767 + if (pit) { 768 + kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, 
&pit->dev); 769 + kvm_io_bus_unregister_dev(kvm, KVM_PIO_BUS, &pit->speaker_dev); 770 + kvm_pit_set_reinject(pit, false); 771 + hrtimer_cancel(&pit->pit_state.timer); 772 + flush_kthread_work(&pit->expired); 773 + kthread_stop(pit->worker_task); 774 + kvm_free_irq_source_id(kvm, pit->irq_source_id); 775 + kfree(pit); 742 776 } 743 777 }
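The i8254.c changes above drop the open-coded 96-bit `muldiv64()` in favor of the kernel's `mul_u64_u32_div()` helper. Outside the kernel, the same computation can be sketched with a compiler-provided 128-bit intermediate (this assumes GCC/Clang's `unsigned __int128` extension; the kernel helper exists precisely because not every target has one):

```c
#include <stdint.h>

/* (a * b) / c with a 128-bit intermediate, so a * b may exceed 64 bits.
 * Stand-in for the kernel's mul_u64_u32_div(a, b, c); NOT the kernel code. */
static uint64_t muldiv64(uint64_t a, uint32_t b, uint32_t c)
{
	return (uint64_t)(((unsigned __int128)a * b) / c);
}
```

This is the shape of computation `pit_get_count()` needs for `d = t * KVM_PIT_FREQ / NSEC_PER_SEC`, where the product can overflow 64 bits for large elapsed times.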
+9 -8
arch/x86/kvm/i8254.h
··· 22 22 }; 23 23 24 24 struct kvm_kpit_state { 25 + /* All members before "struct mutex lock" are protected by the lock. */ 25 26 struct kvm_kpit_channel_state channels[3]; 26 27 u32 flags; 27 28 bool is_periodic; 28 29 s64 period; /* unit: ns */ 29 30 struct hrtimer timer; 30 - atomic_t pending; /* accumulated triggered timers */ 31 - bool reinject; 32 - struct kvm *kvm; 33 31 u32 speaker_data_on; 32 + 34 33 struct mutex lock; 35 - struct kvm_pit *pit; 36 - spinlock_t inject_lock; 37 - unsigned long irq_ack; 34 + atomic_t reinject; 35 + atomic_t pending; /* accumulated triggered timers */ 36 + atomic_t irq_ack; 38 37 struct kvm_irq_ack_notifier irq_ack_notifier; 39 38 }; 40 39 ··· 56 57 #define KVM_MAX_PIT_INTR_INTERVAL HZ / 100 57 58 #define KVM_PIT_CHANNEL_MASK 0x3 58 59 59 - void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start); 60 60 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags); 61 61 void kvm_free_pit(struct kvm *kvm); 62 - void kvm_pit_reset(struct kvm_pit *pit); 62 + 63 + void kvm_pit_load_count(struct kvm_pit *pit, int channel, u32 val, 64 + int hpet_legacy_start); 65 + void kvm_pit_set_reinject(struct kvm_pit *pit, bool reinject); 63 66 64 67 #endif
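The header change mirrors the i8254.c rework: the `inject_lock` spinlock and plain `irq_ack` flag become lock-free atomics. A rough user-space C11 model of the new protocol (function names echo the kernel's `pit_timer_fn`, `kvm_pit_ack_irq`, and `pit_do_work`, but this is a single-process sketch, not the kernel implementation):

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_int pending;  /* accumulated timer ticks */
static atomic_int irq_ack;  /* 1 = last injection was acked */

static void pit_reset_reinject(void)   /* kvm_pit_reset_reinject analog */
{
	atomic_store(&pending, 0);
	atomic_store(&irq_ack, 1);
}

static void timer_tick(void)           /* pit_timer_fn analog */
{
	atomic_fetch_add(&pending, 1);  /* then queue the work item */
}

static bool ack_irq(void)              /* kvm_pit_ack_irq analog */
{
	atomic_store(&irq_ack, 1);
	/* irq_ack must be visible before pending is read; pairs with the
	 * inc(pending) / xchg(irq_ack) ordering on the other side. */
	atomic_thread_fence(memory_order_seq_cst);

	/* atomic_dec_if_positive analog */
	int old = atomic_load(&pending);
	while (old > 0 &&
	       !atomic_compare_exchange_weak(&pending, &old, old - 1))
		;
	return old > 0;                 /* true: requeue injection work */
}

static bool try_inject(void)           /* pit_do_work analog, reinject mode */
{
	/* only inject once the previous interrupt has been acked */
	return atomic_exchange(&irq_ack, 0) == 1;
}
```

The spurious-ack case the old code handled with `atomic_inc()` after a negative decrement simply falls out of `atomic_dec_if_positive()`: an ack with nothing pending leaves the counter at zero.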
+21 -9
arch/x86/kvm/ioapic.c
··· 94 94 static void rtc_irq_eoi_tracking_reset(struct kvm_ioapic *ioapic) 95 95 { 96 96 ioapic->rtc_status.pending_eoi = 0; 97 - bitmap_zero(ioapic->rtc_status.dest_map, KVM_MAX_VCPUS); 97 + bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPUS); 98 98 } 99 99 100 100 static void kvm_rtc_eoi_tracking_restore_all(struct kvm_ioapic *ioapic); ··· 117 117 return; 118 118 119 119 new_val = kvm_apic_pending_eoi(vcpu, e->fields.vector); 120 - old_val = test_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map); 120 + old_val = test_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map.map); 121 121 122 122 if (new_val == old_val) 123 123 return; 124 124 125 125 if (new_val) { 126 - __set_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map); 126 + __set_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map.map); 127 127 ioapic->rtc_status.pending_eoi++; 128 128 } else { 129 - __clear_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map); 129 + __clear_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map.map); 130 130 ioapic->rtc_status.pending_eoi--; 131 131 rtc_status_pending_eoi_check_valid(ioapic); 132 132 } ··· 156 156 157 157 static void rtc_irq_eoi(struct kvm_ioapic *ioapic, struct kvm_vcpu *vcpu) 158 158 { 159 - if (test_and_clear_bit(vcpu->vcpu_id, ioapic->rtc_status.dest_map)) { 159 + if (test_and_clear_bit(vcpu->vcpu_id, 160 + ioapic->rtc_status.dest_map.map)) { 160 161 --ioapic->rtc_status.pending_eoi; 161 162 rtc_status_pending_eoi_check_valid(ioapic); 162 163 } ··· 237 236 void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu, ulong *ioapic_handled_vectors) 238 237 { 239 238 struct kvm_ioapic *ioapic = vcpu->kvm->arch.vioapic; 239 + struct dest_map *dest_map = &ioapic->rtc_status.dest_map; 240 240 union kvm_ioapic_redirect_entry *e; 241 241 int index; 242 242 243 243 spin_lock(&ioapic->lock); 244 + 245 + /* Make sure we see any missing RTC EOI */ 246 + if (test_bit(vcpu->vcpu_id, dest_map->map)) 247 + __set_bit(dest_map->vectors[vcpu->vcpu_id], 248 + ioapic_handled_vectors); 249 + 244 250 for 
(index = 0; index < IOAPIC_NUM_PINS; index++) { 245 251 e = &ioapic->redirtbl[index]; 246 252 if (e->fields.trig_mode == IOAPIC_LEVEL_TRIG || ··· 354 346 */ 355 347 BUG_ON(ioapic->rtc_status.pending_eoi != 0); 356 348 ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe, 357 - ioapic->rtc_status.dest_map); 349 + &ioapic->rtc_status.dest_map); 358 350 ioapic->rtc_status.pending_eoi = (ret < 0 ? 0 : ret); 359 351 } else 360 352 ret = kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe, NULL); ··· 415 407 static void __kvm_ioapic_update_eoi(struct kvm_vcpu *vcpu, 416 408 struct kvm_ioapic *ioapic, int vector, int trigger_mode) 417 409 { 418 - int i; 410 + struct dest_map *dest_map = &ioapic->rtc_status.dest_map; 419 411 struct kvm_lapic *apic = vcpu->arch.apic; 412 + int i; 413 + 414 + /* RTC special handling */ 415 + if (test_bit(vcpu->vcpu_id, dest_map->map) && 416 + vector == dest_map->vectors[vcpu->vcpu_id]) 417 + rtc_irq_eoi(ioapic, vcpu); 420 418 421 419 for (i = 0; i < IOAPIC_NUM_PINS; i++) { 422 420 union kvm_ioapic_redirect_entry *ent = &ioapic->redirtbl[i]; ··· 430 416 if (ent->fields.vector != vector) 431 417 continue; 432 418 433 - if (i == RTC_GSI) 434 - rtc_irq_eoi(ioapic, vcpu); 435 419 /* 436 420 * We are dropping lock while calling ack notifiers because ack 437 421 * notifier callbacks for assigned devices call into IOAPIC
+15 -2
arch/x86/kvm/ioapic.h
··· 40 40 #define RTC_GSI -1U 41 41 #endif 42 42 43 + struct dest_map { 44 + /* vcpu bitmap where IRQ has been sent */ 45 + DECLARE_BITMAP(map, KVM_MAX_VCPUS); 46 + 47 + /* 48 + * Vector sent to a given vcpu, only valid when 49 + * the vcpu's bit in map is set 50 + */ 51 + u8 vectors[KVM_MAX_VCPUS]; 52 + }; 53 + 54 + 43 55 struct rtc_status { 44 56 int pending_eoi; 45 - DECLARE_BITMAP(dest_map, KVM_MAX_VCPUS); 57 + struct dest_map dest_map; 46 58 }; 47 59 48 60 union kvm_ioapic_redirect_entry { ··· 130 118 int level, bool line_status); 131 119 void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id); 132 120 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, 133 - struct kvm_lapic_irq *irq, unsigned long *dest_map); 121 + struct kvm_lapic_irq *irq, 122 + struct dest_map *dest_map); 134 123 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); 135 124 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state); 136 125 void kvm_ioapic_scan_entry(struct kvm_vcpu *vcpu,
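The new `struct dest_map` pairs the old destination bitmap with a per-vCPU record of which vector was sent, which is what lets the EOI path match the exact vector instead of assuming `RTC_GSI`. A minimal user-space model of the two operations the diff performs (recording at injection time in `__apic_accept_irq`, matching at EOI time in `__kvm_ioapic_update_eoi`); `SKETCH_MAX_VCPUS` and the single-word bitmap are simplifications, not kernel definitions:

```c
#include <stdbool.h>

#define SKETCH_MAX_VCPUS 64  /* stand-in for KVM_MAX_VCPUS, <= 64 here */

struct dest_map {
	unsigned long map;                        /* one bit per vcpu */
	unsigned char vectors[SKETCH_MAX_VCPUS];  /* valid only when bit set */
};

/* injection side: remember which vector went to this vcpu */
static void dest_map_record(struct dest_map *dm, int vcpu, unsigned char vec)
{
	dm->map |= 1UL << vcpu;
	dm->vectors[vcpu] = vec;
}

/* EOI side: does this EOI match the vector recorded for this vcpu? */
static bool dest_map_matches(const struct dest_map *dm, int vcpu,
			     unsigned char vec)
{
	return (dm->map & (1UL << vcpu)) && dm->vectors[vcpu] == vec;
}
```

The same record is what `kvm_ioapic_scan_entry()` consults to re-arm a missing RTC EOI vector in `ioapic_handled_vectors`.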
+6 -3
arch/x86/kvm/irq.c
··· 33 33 */ 34 34 int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu) 35 35 { 36 - return apic_has_pending_timer(vcpu); 36 + if (lapic_in_kernel(vcpu)) 37 + return apic_has_pending_timer(vcpu); 38 + 39 + return 0; 37 40 } 38 41 EXPORT_SYMBOL(kvm_cpu_has_pending_timer); 39 42 ··· 140 137 141 138 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu) 142 139 { 143 - kvm_inject_apic_timer_irqs(vcpu); 144 - /* TODO: PIT, RTC etc. */ 140 + if (lapic_in_kernel(vcpu)) 141 + kvm_inject_apic_timer_irqs(vcpu); 145 142 } 146 143 EXPORT_SYMBOL_GPL(kvm_inject_pending_timer_irqs); 147 144
-8
arch/x86/kvm/irq.h
··· 109 109 return ret; 110 110 } 111 111 112 - static inline int lapic_in_kernel(struct kvm_vcpu *vcpu) 113 - { 114 - /* Same as irqchip_in_kernel(vcpu->kvm), but with less 115 - * pointer chasing and no unnecessary memory barriers. 116 - */ 117 - return vcpu->arch.apic != NULL; 118 - } 119 - 120 112 void kvm_pic_reset(struct kvm_kpic_state *s); 121 113 122 114 void kvm_inject_pending_timer_irqs(struct kvm_vcpu *vcpu);
+22 -5
arch/x86/kvm/irq_comm.c
··· 34 34 #include "lapic.h" 35 35 36 36 #include "hyperv.h" 37 + #include "x86.h" 37 38 38 39 static int kvm_set_pic_irq(struct kvm_kernel_irq_routing_entry *e, 39 40 struct kvm *kvm, int irq_source_id, int level, ··· 54 53 } 55 54 56 55 int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src, 57 - struct kvm_lapic_irq *irq, unsigned long *dest_map) 56 + struct kvm_lapic_irq *irq, struct dest_map *dest_map) 58 57 { 59 58 int i, r = -1; 60 59 struct kvm_vcpu *vcpu, *lowest = NULL; 60 + unsigned long dest_vcpu_bitmap[BITS_TO_LONGS(KVM_MAX_VCPUS)]; 61 + unsigned int dest_vcpus = 0; 61 62 62 63 if (irq->dest_mode == 0 && irq->dest_id == 0xff && 63 64 kvm_lowest_prio_delivery(irq)) { ··· 69 66 70 67 if (kvm_irq_delivery_to_apic_fast(kvm, src, irq, &r, dest_map)) 71 68 return r; 69 + 70 + memset(dest_vcpu_bitmap, 0, sizeof(dest_vcpu_bitmap)); 72 71 73 72 kvm_for_each_vcpu(i, vcpu, kvm) { 74 73 if (!kvm_apic_present(vcpu)) ··· 85 80 r = 0; 86 81 r += kvm_apic_set_irq(vcpu, irq, dest_map); 87 82 } else if (kvm_lapic_enabled(vcpu)) { 88 - if (!lowest) 89 - lowest = vcpu; 90 - else if (kvm_apic_compare_prio(vcpu, lowest) < 0) 91 - lowest = vcpu; 83 + if (!kvm_vector_hashing_enabled()) { 84 + if (!lowest) 85 + lowest = vcpu; 86 + else if (kvm_apic_compare_prio(vcpu, lowest) < 0) 87 + lowest = vcpu; 88 + } else { 89 + __set_bit(i, dest_vcpu_bitmap); 90 + dest_vcpus++; 91 + } 92 92 } 93 + } 94 + 95 + if (dest_vcpus != 0) { 96 + int idx = kvm_vector_to_index(irq->vector, dest_vcpus, 97 + dest_vcpu_bitmap, KVM_MAX_VCPUS); 98 + 99 + lowest = kvm_get_vcpu(kvm, idx); 93 100 } 94 101 95 102 if (lowest)
+112 -46
arch/x86/kvm/lapic.c
··· 281 281 struct kvm_cpuid_entry2 *feat; 282 282 u32 v = APIC_VERSION; 283 283 284 - if (!kvm_vcpu_has_lapic(vcpu)) 284 + if (!lapic_in_kernel(vcpu)) 285 285 return; 286 286 287 287 feat = kvm_find_cpuid_entry(apic->vcpu, 0x1, 0); ··· 475 475 476 476 int kvm_lapic_find_highest_irr(struct kvm_vcpu *vcpu) 477 477 { 478 - int highest_irr; 479 - 480 478 /* This may race with setting of irr in __apic_accept_irq() and 481 479 * value returned may be wrong, but kvm_vcpu_kick() in __apic_accept_irq 482 480 * will cause vmexit immediately and the value will be recalculated 483 481 * on the next vmentry. 484 482 */ 485 - if (!kvm_vcpu_has_lapic(vcpu)) 486 - return 0; 487 - highest_irr = apic_find_highest_irr(vcpu->arch.apic); 488 - 489 - return highest_irr; 483 + return apic_find_highest_irr(vcpu->arch.apic); 490 484 } 491 485 492 486 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, 493 487 int vector, int level, int trig_mode, 494 - unsigned long *dest_map); 488 + struct dest_map *dest_map); 495 489 496 490 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, 497 - unsigned long *dest_map) 491 + struct dest_map *dest_map) 498 492 { 499 493 struct kvm_lapic *apic = vcpu->arch.apic; 500 494 ··· 669 675 } 670 676 } 671 677 678 + int kvm_vector_to_index(u32 vector, u32 dest_vcpus, 679 + const unsigned long *bitmap, u32 bitmap_size) 680 + { 681 + u32 mod; 682 + int i, idx = -1; 683 + 684 + mod = vector % dest_vcpus; 685 + 686 + for (i = 0; i <= mod; i++) { 687 + idx = find_next_bit(bitmap, bitmap_size, idx + 1); 688 + BUG_ON(idx == bitmap_size); 689 + } 690 + 691 + return idx; 692 + } 693 + 694 + static void kvm_apic_disabled_lapic_found(struct kvm *kvm) 695 + { 696 + if (!kvm->arch.disabled_lapic_found) { 697 + kvm->arch.disabled_lapic_found = true; 698 + printk(KERN_INFO 699 + "Disabled LAPIC found during irq injection\n"); 700 + } 701 + } 702 + 672 703 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, 673 - 
struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map) 704 + struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map) 674 705 { 675 706 struct kvm_apic_map *map; 676 707 unsigned long bitmap = 1; ··· 746 727 747 728 dst = map->logical_map[cid]; 748 729 749 - if (kvm_lowest_prio_delivery(irq)) { 730 + if (!kvm_lowest_prio_delivery(irq)) 731 + goto set_irq; 732 + 733 + if (!kvm_vector_hashing_enabled()) { 750 734 int l = -1; 751 735 for_each_set_bit(i, &bitmap, 16) { 752 736 if (!dst[i]) 753 737 continue; 754 738 if (l < 0) 755 739 l = i; 756 - else if (kvm_apic_compare_prio(dst[i]->vcpu, dst[l]->vcpu) < 0) 740 + else if (kvm_apic_compare_prio(dst[i]->vcpu, 741 + dst[l]->vcpu) < 0) 757 742 l = i; 758 743 } 759 - 760 744 bitmap = (l >= 0) ? 1 << l : 0; 745 + } else { 746 + int idx; 747 + unsigned int dest_vcpus; 748 + 749 + dest_vcpus = hweight16(bitmap); 750 + if (dest_vcpus == 0) 751 + goto out; 752 + 753 + idx = kvm_vector_to_index(irq->vector, 754 + dest_vcpus, &bitmap, 16); 755 + 756 + if (!dst[idx]) { 757 + kvm_apic_disabled_lapic_found(kvm); 758 + goto out; 759 + } 760 + 761 + bitmap = (idx >= 0) ? 1 << idx : 0; 761 762 } 762 763 } 763 764 765 + set_irq: 764 766 for_each_set_bit(i, &bitmap, 16) { 765 767 if (!dst[i]) 766 768 continue; ··· 794 754 return ret; 795 755 } 796 756 757 + /* 758 + * This routine tries to handler interrupts in posted mode, here is how 759 + * it deals with different cases: 760 + * - For single-destination interrupts, handle it in posted mode 761 + * - Else if vector hashing is enabled and it is a lowest-priority 762 + * interrupt, handle it in posted mode and use the following mechanism 763 + * to find the destinaiton vCPU. 764 + * 1. For lowest-priority interrupts, store all the possible 765 + * destination vCPUs in an array. 766 + * 2. Use "guest vector % max number of destination vCPUs" to find 767 + * the right destination vCPU in the array for the lowest-priority 768 + * interrupt. 
769 + * - Otherwise, use remapped mode to inject the interrupt. 770 + */ 797 771 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq, 798 772 struct kvm_vcpu **dest_vcpu) 799 773 { ··· 849 795 if (cid >= ARRAY_SIZE(map->logical_map)) 850 796 goto out; 851 797 852 - for_each_set_bit(i, &bitmap, 16) { 853 - dst = map->logical_map[cid][i]; 854 - if (++r == 2) 798 + if (kvm_vector_hashing_enabled() && 799 + kvm_lowest_prio_delivery(irq)) { 800 + int idx; 801 + unsigned int dest_vcpus; 802 + 803 + dest_vcpus = hweight16(bitmap); 804 + if (dest_vcpus == 0) 805 + goto out; 806 + 807 + idx = kvm_vector_to_index(irq->vector, dest_vcpus, 808 + &bitmap, 16); 809 + 810 + dst = map->logical_map[cid][idx]; 811 + if (!dst) { 812 + kvm_apic_disabled_lapic_found(kvm); 813 + goto out; 814 + } 815 + 816 + *dest_vcpu = dst->vcpu; 817 + } else { 818 + for_each_set_bit(i, &bitmap, 16) { 819 + dst = map->logical_map[cid][i]; 820 + if (++r == 2) 821 + goto out; 822 + } 823 + 824 + if (dst && kvm_apic_present(dst->vcpu)) 825 + *dest_vcpu = dst->vcpu; 826 + else 855 827 goto out; 856 828 } 857 - 858 - if (dst && kvm_apic_present(dst->vcpu)) 859 - *dest_vcpu = dst->vcpu; 860 - else 861 - goto out; 862 829 } 863 830 864 831 ret = true; ··· 894 819 */ 895 820 static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode, 896 821 int vector, int level, int trig_mode, 897 - unsigned long *dest_map) 822 + struct dest_map *dest_map) 898 823 { 899 824 int result = 0; 900 825 struct kvm_vcpu *vcpu = apic->vcpu; ··· 914 839 915 840 result = 1; 916 841 917 - if (dest_map) 918 - __set_bit(vcpu->vcpu_id, dest_map); 842 + if (dest_map) { 843 + __set_bit(vcpu->vcpu_id, dest_map->map); 844 + dest_map->vectors[vcpu->vcpu_id] = vector; 845 + } 919 846 920 847 if (apic_test_vector(vector, apic->regs + APIC_TMR) != !!trig_mode) { 921 848 if (trig_mode) ··· 1316 1239 struct kvm_lapic *apic = vcpu->arch.apic; 1317 1240 u64 guest_tsc, tsc_deadline; 1318 1241 1319 - if 
(!kvm_vcpu_has_lapic(vcpu)) 1242 + if (!lapic_in_kernel(vcpu)) 1320 1243 return; 1321 1244 1322 1245 if (apic->lapic_timer.expired_tscdeadline == 0) ··· 1592 1515 1593 1516 void kvm_lapic_set_eoi(struct kvm_vcpu *vcpu) 1594 1517 { 1595 - if (kvm_vcpu_has_lapic(vcpu)) 1596 - apic_reg_write(vcpu->arch.apic, APIC_EOI, 0); 1518 + apic_reg_write(vcpu->arch.apic, APIC_EOI, 0); 1597 1519 } 1598 1520 EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi); 1599 1521 ··· 1642 1566 { 1643 1567 struct kvm_lapic *apic = vcpu->arch.apic; 1644 1568 1645 - if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) || 1569 + if (!lapic_in_kernel(vcpu) || apic_lvtt_oneshot(apic) || 1646 1570 apic_lvtt_period(apic)) 1647 1571 return 0; 1648 1572 ··· 1653 1577 { 1654 1578 struct kvm_lapic *apic = vcpu->arch.apic; 1655 1579 1656 - if (!kvm_vcpu_has_lapic(vcpu) || apic_lvtt_oneshot(apic) || 1580 + if (!lapic_in_kernel(vcpu) || apic_lvtt_oneshot(apic) || 1657 1581 apic_lvtt_period(apic)) 1658 1582 return; 1659 1583 ··· 1666 1590 { 1667 1591 struct kvm_lapic *apic = vcpu->arch.apic; 1668 1592 1669 - if (!kvm_vcpu_has_lapic(vcpu)) 1670 - return; 1671 - 1672 1593 apic_set_tpr(apic, ((cr8 & 0x0f) << 4) 1673 1594 | (kvm_apic_get_reg(apic, APIC_TASKPRI) & 4)); 1674 1595 } ··· 1673 1600 u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu) 1674 1601 { 1675 1602 u64 tpr; 1676 - 1677 - if (!kvm_vcpu_has_lapic(vcpu)) 1678 - return 0; 1679 1603 1680 1604 tpr = (u64) kvm_apic_get_reg(vcpu->arch.apic, APIC_TASKPRI); 1681 1605 ··· 1798 1728 { 1799 1729 struct kvm_lapic *apic = vcpu->arch.apic; 1800 1730 1801 - if (kvm_vcpu_has_lapic(vcpu) && apic_enabled(apic) && 1802 - apic_lvt_enabled(apic, APIC_LVTT)) 1731 + if (apic_enabled(apic) && apic_lvt_enabled(apic, APIC_LVTT)) 1803 1732 return atomic_read(&apic->lapic_timer.pending); 1804 1733 1805 1734 return 0; ··· 1895 1826 struct kvm_lapic *apic = vcpu->arch.apic; 1896 1827 int highest_irr; 1897 1828 1898 - if (!kvm_vcpu_has_lapic(vcpu) || !apic_enabled(apic)) 1829 + if 
(!apic_enabled(apic)) 1899 1830 return -1; 1900 1831 1901 1832 apic_update_ppr(apic); ··· 1922 1853 void kvm_inject_apic_timer_irqs(struct kvm_vcpu *vcpu) 1923 1854 { 1924 1855 struct kvm_lapic *apic = vcpu->arch.apic; 1925 - 1926 - if (!kvm_vcpu_has_lapic(vcpu)) 1927 - return; 1928 1856 1929 1857 if (atomic_read(&apic->lapic_timer.pending) > 0) { 1930 1858 kvm_apic_local_deliver(apic, APIC_LVTT); ··· 1998 1932 { 1999 1933 struct hrtimer *timer; 2000 1934 2001 - if (!kvm_vcpu_has_lapic(vcpu)) 1935 + if (!lapic_in_kernel(vcpu)) 2002 1936 return; 2003 1937 2004 1938 timer = &vcpu->arch.apic->lapic_timer.timer; ··· 2171 2105 { 2172 2106 struct kvm_lapic *apic = vcpu->arch.apic; 2173 2107 2174 - if (!kvm_vcpu_has_lapic(vcpu)) 2108 + if (!lapic_in_kernel(vcpu)) 2175 2109 return 1; 2176 2110 2177 2111 /* if this is ICR write vector before command */ ··· 2185 2119 struct kvm_lapic *apic = vcpu->arch.apic; 2186 2120 u32 low, high = 0; 2187 2121 2188 - if (!kvm_vcpu_has_lapic(vcpu)) 2122 + if (!lapic_in_kernel(vcpu)) 2189 2123 return 1; 2190 2124 2191 2125 if (apic_reg_read(apic, reg, 4, &low)) ··· 2217 2151 u8 sipi_vector; 2218 2152 unsigned long pe; 2219 2153 2220 - if (!kvm_vcpu_has_lapic(vcpu) || !apic->pending_events) 2154 + if (!lapic_in_kernel(vcpu) || !apic->pending_events) 2221 2155 return; 2222 2156 2223 2157 /*
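The new lowest-priority path in `kvm_intr_is_single_vcpu_fast()` above hashes the interrupt vector onto the set of candidate vcpus (`kvm_vector_to_index(irq->vector, dest_vcpus, &bitmap, 16)`) instead of round-robin arbitration, so a given vector always lands on the same vcpu. A minimal user-space sketch of that selection, assuming the same "vector mod popcount, then take the n-th set bit" hash as the patch (names and the 16-bit width here are illustrative):

```c
#include <assert.h>

/* Count the set bits in a 16-bit logical-destination bitmap. */
static int popcount16(unsigned bitmap)
{
	int n = 0, i;

	for (i = 0; i < 16; i++)
		n += (bitmap >> i) & 1;
	return n;
}

/* Sketch of vector hashing for lowest-priority delivery: pick the
 * (vector % popcount(bitmap))-th set bit as the destination, so
 * different vectors spread across the candidate vcpus. */
static int vector_to_index(unsigned vector, unsigned bitmap)
{
	unsigned dest_vcpus = popcount16(bitmap);
	unsigned mod, i;

	if (dest_vcpus == 0)
		return -1;		/* no candidate vcpu */

	mod = vector % dest_vcpus;	/* the hash */
	for (i = 0; i < 16; i++) {
		if (!((bitmap >> i) & 1))
			continue;
		if (mod-- == 0)
			return i;	/* the mod-th set bit */
	}
	return -1;			/* unreachable when dest_vcpus > 0 */
}
```

With candidates {1, 3} (bitmap `0xa`), an even vector maps to bit 1 and an odd one to bit 3; this deterministic spreading is what makes the result usable for VT-d posted interrupts, where the destination must be computed once up front.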
+11 -6
arch/x86/kvm/lapic.h
··· 42 42 unsigned long pending_events; 43 43 unsigned int sipi_vector; 44 44 }; 45 + 46 + struct dest_map; 47 + 45 48 int kvm_create_lapic(struct kvm_vcpu *vcpu); 46 49 void kvm_free_lapic(struct kvm_vcpu *vcpu); 47 50 ··· 63 60 void __kvm_apic_update_irr(u32 *pir, void *regs); 64 61 void kvm_apic_update_irr(struct kvm_vcpu *vcpu, u32 *pir); 65 62 int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, 66 - unsigned long *dest_map); 63 + struct dest_map *dest_map); 67 64 int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type); 68 65 69 66 bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src, 70 - struct kvm_lapic_irq *irq, int *r, unsigned long *dest_map); 67 + struct kvm_lapic_irq *irq, int *r, struct dest_map *dest_map); 71 68 72 69 u64 kvm_get_apic_base(struct kvm_vcpu *vcpu); 73 70 int kvm_set_apic_base(struct kvm_vcpu *vcpu, struct msr_data *msr_info); ··· 106 103 107 104 extern struct static_key kvm_no_apic_vcpu; 108 105 109 - static inline bool kvm_vcpu_has_lapic(struct kvm_vcpu *vcpu) 106 + static inline bool lapic_in_kernel(struct kvm_vcpu *vcpu) 110 107 { 111 108 if (static_key_false(&kvm_no_apic_vcpu)) 112 109 return vcpu->arch.apic; ··· 133 130 134 131 static inline bool kvm_apic_present(struct kvm_vcpu *vcpu) 135 132 { 136 - return kvm_vcpu_has_lapic(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); 133 + return lapic_in_kernel(vcpu) && kvm_apic_hw_enabled(vcpu->arch.apic); 137 134 } 138 135 139 136 static inline int kvm_lapic_enabled(struct kvm_vcpu *vcpu) ··· 153 150 154 151 static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu) 155 152 { 156 - return kvm_vcpu_has_lapic(vcpu) && vcpu->arch.apic->pending_events; 153 + return lapic_in_kernel(vcpu) && vcpu->arch.apic->pending_events; 157 154 } 158 155 159 156 static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq) ··· 164 161 165 162 static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu) 166 163 { 167 - return 
kvm_vcpu_has_lapic(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); 164 + return lapic_in_kernel(vcpu) && test_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); 168 165 } 169 166 170 167 static inline int kvm_apic_id(struct kvm_lapic *apic) ··· 178 175 179 176 bool kvm_intr_is_single_vcpu_fast(struct kvm *kvm, struct kvm_lapic_irq *irq, 180 177 struct kvm_vcpu **dest_vcpu); 178 + int kvm_vector_to_index(u32 vector, u32 dest_vcpus, 179 + const unsigned long *bitmap, u32 bitmap_size); 181 180 #endif
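Throughout these hunks, `__apic_accept_irq()` stops taking a bare vcpu bitmap and starts filling a `struct dest_map`, recording not just *which* vcpus an interrupt reached but also *which vector* each one got. A toy shape of that bookkeeping (the bound and field names here are assumptions for the sketch, not the kernel's actual layout):

```c
#include <assert.h>

#define NR_VCPUS 64			/* assumed bound for the sketch */

struct dest_map {
	unsigned long map;		 /* one bit per targeted vcpu */
	unsigned char vectors[NR_VCPUS]; /* vector delivered to each vcpu */
};

/* Mirrors the two lines the patch adds to __apic_accept_irq(). */
static void dest_map_record(struct dest_map *m, int vcpu_id,
			    unsigned char vector)
{
	m->map |= 1UL << vcpu_id;
	m->vectors[vcpu_id] = vector;
}
```

Recording the vector alongside the bit lets later consumers of the map check whether a recorded destination still refers to the interrupt they care about, rather than just that *something* was delivered there.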
+290 -212
arch/x86/kvm/mmu.c
··· 41 41 #include <asm/cmpxchg.h> 42 42 #include <asm/io.h> 43 43 #include <asm/vmx.h> 44 + #include <asm/kvm_page_track.h> 44 45 45 46 /* 46 47 * When setting this variable to true it enables Two-Dimensional-Paging ··· 777 776 return &slot->arch.lpage_info[level - 2][idx]; 778 777 } 779 778 779 + static void update_gfn_disallow_lpage_count(struct kvm_memory_slot *slot, 780 + gfn_t gfn, int count) 781 + { 782 + struct kvm_lpage_info *linfo; 783 + int i; 784 + 785 + for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) { 786 + linfo = lpage_info_slot(gfn, slot, i); 787 + linfo->disallow_lpage += count; 788 + WARN_ON(linfo->disallow_lpage < 0); 789 + } 790 + } 791 + 792 + void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn) 793 + { 794 + update_gfn_disallow_lpage_count(slot, gfn, 1); 795 + } 796 + 797 + void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn) 798 + { 799 + update_gfn_disallow_lpage_count(slot, gfn, -1); 800 + } 801 + 780 802 static void account_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) 781 803 { 782 804 struct kvm_memslots *slots; 783 805 struct kvm_memory_slot *slot; 784 - struct kvm_lpage_info *linfo; 785 806 gfn_t gfn; 786 - int i; 787 807 808 + kvm->arch.indirect_shadow_pages++; 788 809 gfn = sp->gfn; 789 810 slots = kvm_memslots_for_spte_role(kvm, sp->role); 790 811 slot = __gfn_to_memslot(slots, gfn); 791 - for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) { 792 - linfo = lpage_info_slot(gfn, slot, i); 793 - linfo->write_count += 1; 794 - } 795 - kvm->arch.indirect_shadow_pages++; 812 + 813 + /* the non-leaf shadow pages are keeping readonly. 
*/ 814 + if (sp->role.level > PT_PAGE_TABLE_LEVEL) 815 + return kvm_slot_page_track_add_page(kvm, slot, gfn, 816 + KVM_PAGE_TRACK_WRITE); 817 + 818 + kvm_mmu_gfn_disallow_lpage(slot, gfn); 796 819 } 797 820 798 821 static void unaccount_shadowed(struct kvm *kvm, struct kvm_mmu_page *sp) 799 822 { 800 823 struct kvm_memslots *slots; 801 824 struct kvm_memory_slot *slot; 802 - struct kvm_lpage_info *linfo; 803 825 gfn_t gfn; 804 - int i; 805 826 827 + kvm->arch.indirect_shadow_pages--; 806 828 gfn = sp->gfn; 807 829 slots = kvm_memslots_for_spte_role(kvm, sp->role); 808 830 slot = __gfn_to_memslot(slots, gfn); 809 - for (i = PT_DIRECTORY_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) { 810 - linfo = lpage_info_slot(gfn, slot, i); 811 - linfo->write_count -= 1; 812 - WARN_ON(linfo->write_count < 0); 813 - } 814 - kvm->arch.indirect_shadow_pages--; 831 + if (sp->role.level > PT_PAGE_TABLE_LEVEL) 832 + return kvm_slot_page_track_remove_page(kvm, slot, gfn, 833 + KVM_PAGE_TRACK_WRITE); 834 + 835 + kvm_mmu_gfn_allow_lpage(slot, gfn); 815 836 } 816 837 817 - static int __has_wrprotected_page(gfn_t gfn, int level, 818 - struct kvm_memory_slot *slot) 838 + static bool __mmu_gfn_lpage_is_disallowed(gfn_t gfn, int level, 839 + struct kvm_memory_slot *slot) 819 840 { 820 841 struct kvm_lpage_info *linfo; 821 842 822 843 if (slot) { 823 844 linfo = lpage_info_slot(gfn, slot, level); 824 - return linfo->write_count; 845 + return !!linfo->disallow_lpage; 825 846 } 826 847 827 - return 1; 848 + return true; 828 849 } 829 850 830 - static int has_wrprotected_page(struct kvm_vcpu *vcpu, gfn_t gfn, int level) 851 + static bool mmu_gfn_lpage_is_disallowed(struct kvm_vcpu *vcpu, gfn_t gfn, 852 + int level) 831 853 { 832 854 struct kvm_memory_slot *slot; 833 855 834 856 slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); 835 - return __has_wrprotected_page(gfn, level, slot); 857 + return __mmu_gfn_lpage_is_disallowed(gfn, level, slot); 836 858 } 837 859 838 860 static int host_mapping_level(struct kvm 
*kvm, gfn_t gfn) ··· 921 897 max_level = min(kvm_x86_ops->get_lpage_level(), host_level); 922 898 923 899 for (level = PT_DIRECTORY_LEVEL; level <= max_level; ++level) 924 - if (__has_wrprotected_page(large_gfn, level, slot)) 900 + if (__mmu_gfn_lpage_is_disallowed(large_gfn, level, slot)) 925 901 break; 926 902 927 903 return level - 1; ··· 1347 1323 kvm_mmu_write_protect_pt_masked(kvm, slot, gfn_offset, mask); 1348 1324 } 1349 1325 1350 - static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn) 1326 + bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm, 1327 + struct kvm_memory_slot *slot, u64 gfn) 1351 1328 { 1352 - struct kvm_memory_slot *slot; 1353 1329 struct kvm_rmap_head *rmap_head; 1354 1330 int i; 1355 1331 bool write_protected = false; 1356 1332 1357 - slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); 1358 - 1359 1333 for (i = PT_PAGE_TABLE_LEVEL; i <= PT_MAX_HUGEPAGE_LEVEL; ++i) { 1360 1334 rmap_head = __gfn_to_rmap(gfn, i, slot); 1361 - write_protected |= __rmap_write_protect(vcpu->kvm, rmap_head, true); 1335 + write_protected |= __rmap_write_protect(kvm, rmap_head, true); 1362 1336 } 1363 1337 1364 1338 return write_protected; 1339 + } 1340 + 1341 + static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn) 1342 + { 1343 + struct kvm_memory_slot *slot; 1344 + 1345 + slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn); 1346 + return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn); 1365 1347 } 1366 1348 1367 1349 static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head) ··· 1784 1754 static int nonpaging_sync_page(struct kvm_vcpu *vcpu, 1785 1755 struct kvm_mmu_page *sp) 1786 1756 { 1787 - return 1; 1757 + return 0; 1788 1758 } 1789 1759 1790 1760 static void nonpaging_invlpg(struct kvm_vcpu *vcpu, gva_t gva) ··· 1870 1840 return nr_unsync_leaf; 1871 1841 } 1872 1842 1843 + #define INVALID_INDEX (-1) 1844 + 1873 1845 static int mmu_unsync_walk(struct kvm_mmu_page *sp, 1874 1846 struct kvm_mmu_pages *pvec) 1875 1847 { 1848 + pvec->nr = 0; 
1876 1849 if (!sp->unsync_children) 1877 1850 return 0; 1878 1851 1879 - mmu_pages_add(pvec, sp, 0); 1852 + mmu_pages_add(pvec, sp, INVALID_INDEX); 1880 1853 return __mmu_unsync_walk(sp, pvec); 1881 1854 } 1882 1855 ··· 1916 1883 if ((_sp)->role.direct || (_sp)->role.invalid) {} else 1917 1884 1918 1885 /* @sp->gfn should be write-protected at the call site */ 1919 - static int __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 1920 - struct list_head *invalid_list, bool clear_unsync) 1886 + static bool __kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 1887 + struct list_head *invalid_list) 1921 1888 { 1922 1889 if (sp->role.cr4_pae != !!is_pae(vcpu)) { 1923 1890 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); 1924 - return 1; 1891 + return false; 1925 1892 } 1926 1893 1927 - if (clear_unsync) 1928 - kvm_unlink_unsync_page(vcpu->kvm, sp); 1929 - 1930 - if (vcpu->arch.mmu.sync_page(vcpu, sp)) { 1894 + if (vcpu->arch.mmu.sync_page(vcpu, sp) == 0) { 1931 1895 kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list); 1932 - return 1; 1896 + return false; 1933 1897 } 1934 1898 1935 - kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); 1936 - return 0; 1899 + return true; 1937 1900 } 1938 1901 1939 - static int kvm_sync_page_transient(struct kvm_vcpu *vcpu, 1940 - struct kvm_mmu_page *sp) 1902 + static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu, 1903 + struct list_head *invalid_list, 1904 + bool remote_flush, bool local_flush) 1941 1905 { 1942 - LIST_HEAD(invalid_list); 1943 - int ret; 1906 + if (!list_empty(invalid_list)) { 1907 + kvm_mmu_commit_zap_page(vcpu->kvm, invalid_list); 1908 + return; 1909 + } 1944 1910 1945 - ret = __kvm_sync_page(vcpu, sp, &invalid_list, false); 1946 - if (ret) 1947 - kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 1948 - 1949 - return ret; 1911 + if (remote_flush) 1912 + kvm_flush_remote_tlbs(vcpu->kvm); 1913 + else if (local_flush) 1914 + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); 1950 1915 } 1951 1916 1952 
1917 #ifdef CONFIG_KVM_MMU_AUDIT ··· 1954 1923 static void mmu_audit_disable(void) { } 1955 1924 #endif 1956 1925 1957 - static int kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 1926 + static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp, 1958 1927 struct list_head *invalid_list) 1959 1928 { 1960 - return __kvm_sync_page(vcpu, sp, invalid_list, true); 1929 + kvm_unlink_unsync_page(vcpu->kvm, sp); 1930 + return __kvm_sync_page(vcpu, sp, invalid_list); 1961 1931 } 1962 1932 1963 1933 /* @gfn should be write-protected at the call site */ 1964 - static void kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) 1934 + static bool kvm_sync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, 1935 + struct list_head *invalid_list) 1965 1936 { 1966 1937 struct kvm_mmu_page *s; 1967 - LIST_HEAD(invalid_list); 1968 - bool flush = false; 1938 + bool ret = false; 1969 1939 1970 1940 for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) { 1971 1941 if (!s->unsync) 1972 1942 continue; 1973 1943 1974 1944 WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); 1975 - kvm_unlink_unsync_page(vcpu->kvm, s); 1976 - if ((s->role.cr4_pae != !!is_pae(vcpu)) || 1977 - (vcpu->arch.mmu.sync_page(vcpu, s))) { 1978 - kvm_mmu_prepare_zap_page(vcpu->kvm, s, &invalid_list); 1979 - continue; 1980 - } 1981 - flush = true; 1945 + ret |= kvm_sync_page(vcpu, s, invalid_list); 1982 1946 } 1983 1947 1984 - kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 1985 - if (flush) 1986 - kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); 1948 + return ret; 1987 1949 } 1988 1950 1989 1951 struct mmu_page_path { 1990 - struct kvm_mmu_page *parent[PT64_ROOT_LEVEL-1]; 1991 - unsigned int idx[PT64_ROOT_LEVEL-1]; 1952 + struct kvm_mmu_page *parent[PT64_ROOT_LEVEL]; 1953 + unsigned int idx[PT64_ROOT_LEVEL]; 1992 1954 }; 1993 1955 1994 1956 #define for_each_sp(pvec, sp, parents, i) \ 1995 - for (i = mmu_pages_next(&pvec, &parents, -1), \ 1996 - sp = pvec.page[i].sp; \ 1957 + for (i = mmu_pages_first(&pvec, &parents); \ 
1997 1958 i < pvec.nr && ({ sp = pvec.page[i].sp; 1;}); \ 1998 1959 i = mmu_pages_next(&pvec, &parents, i)) 1999 1960 ··· 1997 1974 1998 1975 for (n = i+1; n < pvec->nr; n++) { 1999 1976 struct kvm_mmu_page *sp = pvec->page[n].sp; 1977 + unsigned idx = pvec->page[n].idx; 1978 + int level = sp->role.level; 2000 1979 2001 - if (sp->role.level == PT_PAGE_TABLE_LEVEL) { 2002 - parents->idx[0] = pvec->page[n].idx; 2003 - return n; 2004 - } 1980 + parents->idx[level-1] = idx; 1981 + if (level == PT_PAGE_TABLE_LEVEL) 1982 + break; 2005 1983 2006 - parents->parent[sp->role.level-2] = sp; 2007 - parents->idx[sp->role.level-1] = pvec->page[n].idx; 1984 + parents->parent[level-2] = sp; 2008 1985 } 2009 1986 2010 1987 return n; 1988 + } 1989 + 1990 + static int mmu_pages_first(struct kvm_mmu_pages *pvec, 1991 + struct mmu_page_path *parents) 1992 + { 1993 + struct kvm_mmu_page *sp; 1994 + int level; 1995 + 1996 + if (pvec->nr == 0) 1997 + return 0; 1998 + 1999 + WARN_ON(pvec->page[0].idx != INVALID_INDEX); 2000 + 2001 + sp = pvec->page[0].sp; 2002 + level = sp->role.level; 2003 + WARN_ON(level == PT_PAGE_TABLE_LEVEL); 2004 + 2005 + parents->parent[level-2] = sp; 2006 + 2007 + /* Also set up a sentinel. Further entries in pvec are all 2008 + * children of sp, so this element is never overwritten. 
2009 + */ 2010 + parents->parent[level-1] = NULL; 2011 + return mmu_pages_next(pvec, parents, 0); 2011 2012 } 2012 2013 2013 2014 static void mmu_pages_clear_parents(struct mmu_page_path *parents) ··· 2041 1994 2042 1995 do { 2043 1996 unsigned int idx = parents->idx[level]; 2044 - 2045 1997 sp = parents->parent[level]; 2046 1998 if (!sp) 2047 1999 return; 2048 2000 2001 + WARN_ON(idx == INVALID_INDEX); 2049 2002 clear_unsync_child_bit(sp, idx); 2050 2003 level++; 2051 - } while (level < PT64_ROOT_LEVEL-1 && !sp->unsync_children); 2052 - } 2053 - 2054 - static void kvm_mmu_pages_init(struct kvm_mmu_page *parent, 2055 - struct mmu_page_path *parents, 2056 - struct kvm_mmu_pages *pvec) 2057 - { 2058 - parents->parent[parent->role.level-1] = NULL; 2059 - pvec->nr = 0; 2004 + } while (!sp->unsync_children); 2060 2005 } 2061 2006 2062 2007 static void mmu_sync_children(struct kvm_vcpu *vcpu, ··· 2059 2020 struct mmu_page_path parents; 2060 2021 struct kvm_mmu_pages pages; 2061 2022 LIST_HEAD(invalid_list); 2023 + bool flush = false; 2062 2024 2063 - kvm_mmu_pages_init(parent, &parents, &pages); 2064 2025 while (mmu_unsync_walk(parent, &pages)) { 2065 2026 bool protected = false; 2066 2027 2067 2028 for_each_sp(pages, sp, parents, i) 2068 2029 protected |= rmap_write_protect(vcpu, sp->gfn); 2069 2030 2070 - if (protected) 2031 + if (protected) { 2071 2032 kvm_flush_remote_tlbs(vcpu->kvm); 2033 + flush = false; 2034 + } 2072 2035 2073 2036 for_each_sp(pages, sp, parents, i) { 2074 - kvm_sync_page(vcpu, sp, &invalid_list); 2037 + flush |= kvm_sync_page(vcpu, sp, &invalid_list); 2075 2038 mmu_pages_clear_parents(&parents); 2076 2039 } 2077 - kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list); 2078 - cond_resched_lock(&vcpu->kvm->mmu_lock); 2079 - kvm_mmu_pages_init(parent, &parents, &pages); 2040 + if (need_resched() || spin_needbreak(&vcpu->kvm->mmu_lock)) { 2041 + kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush); 2042 + cond_resched_lock(&vcpu->kvm->mmu_lock); 2043 
+ flush = false; 2044 + } 2080 2045 } 2046 + 2047 + kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush); 2081 2048 } 2082 2049 2083 2050 static void __clear_sp_write_flooding_count(struct kvm_mmu_page *sp) 2084 2051 { 2085 - sp->write_flooding_count = 0; 2052 + atomic_set(&sp->write_flooding_count, 0); 2086 2053 } 2087 2054 2088 2055 static void clear_sp_write_flooding_count(u64 *spte) ··· 2114 2069 unsigned quadrant; 2115 2070 struct kvm_mmu_page *sp; 2116 2071 bool need_sync = false; 2072 + bool flush = false; 2073 + LIST_HEAD(invalid_list); 2117 2074 2118 2075 role = vcpu->arch.mmu.base_role; 2119 2076 role.level = level; ··· 2139 2092 if (sp->role.word != role.word) 2140 2093 continue; 2141 2094 2142 - if (sp->unsync && kvm_sync_page_transient(vcpu, sp)) 2143 - break; 2095 + if (sp->unsync) { 2096 + /* The page is good, but __kvm_sync_page might still end 2097 + * up zapping it. If so, break in order to rebuild it. 2098 + */ 2099 + if (!__kvm_sync_page(vcpu, sp, &invalid_list)) 2100 + break; 2101 + 2102 + WARN_ON(!list_empty(&invalid_list)); 2103 + kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); 2104 + } 2144 2105 2145 2106 if (sp->unsync_children) 2146 2107 kvm_make_request(KVM_REQ_MMU_SYNC, vcpu); ··· 2167 2112 hlist_add_head(&sp->hash_link, 2168 2113 &vcpu->kvm->arch.mmu_page_hash[kvm_page_table_hashfn(gfn)]); 2169 2114 if (!direct) { 2170 - if (rmap_write_protect(vcpu, gfn)) 2171 - kvm_flush_remote_tlbs(vcpu->kvm); 2172 - if (level > PT_PAGE_TABLE_LEVEL && need_sync) 2173 - kvm_sync_pages(vcpu, gfn); 2174 - 2115 + /* 2116 + * we should do write protection before syncing pages 2117 + * otherwise the content of the synced shadow page may 2118 + * be inconsistent with guest page table. 
2119 + */ 2175 2120 account_shadowed(vcpu->kvm, sp); 2121 + if (level == PT_PAGE_TABLE_LEVEL && 2122 + rmap_write_protect(vcpu, gfn)) 2123 + kvm_flush_remote_tlbs(vcpu->kvm); 2124 + 2125 + if (level > PT_PAGE_TABLE_LEVEL && need_sync) 2126 + flush |= kvm_sync_pages(vcpu, gfn, &invalid_list); 2176 2127 } 2177 2128 sp->mmu_valid_gen = vcpu->kvm->arch.mmu_valid_gen; 2178 2129 clear_page(sp->spt); 2179 2130 trace_kvm_mmu_get_page(sp, true); 2131 + 2132 + kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush); 2180 2133 return sp; 2181 2134 } 2182 2135 ··· 2332 2269 if (parent->role.level == PT_PAGE_TABLE_LEVEL) 2333 2270 return 0; 2334 2271 2335 - kvm_mmu_pages_init(parent, &parents, &pages); 2336 2272 while (mmu_unsync_walk(parent, &pages)) { 2337 2273 struct kvm_mmu_page *sp; 2338 2274 ··· 2340 2278 mmu_pages_clear_parents(&parents); 2341 2279 zapped++; 2342 2280 } 2343 - kvm_mmu_pages_init(parent, &parents, &pages); 2344 2281 } 2345 2282 2346 2283 return zapped; ··· 2415 2354 if (list_empty(&kvm->arch.active_mmu_pages)) 2416 2355 return false; 2417 2356 2418 - sp = list_entry(kvm->arch.active_mmu_pages.prev, 2419 - struct kvm_mmu_page, link); 2357 + sp = list_last_entry(&kvm->arch.active_mmu_pages, 2358 + struct kvm_mmu_page, link); 2420 2359 kvm_mmu_prepare_zap_page(kvm, sp, invalid_list); 2421 2360 2422 2361 return true; ··· 2469 2408 } 2470 2409 EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page); 2471 2410 2472 - static void __kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) 2411 + static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp) 2473 2412 { 2474 2413 trace_kvm_mmu_unsync_page(sp); 2475 2414 ++vcpu->kvm->stat.mmu_unsync; ··· 2478 2417 kvm_mmu_mark_parents_unsync(sp); 2479 2418 } 2480 2419 2481 - static void kvm_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn) 2420 + static bool mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, 2421 + bool can_unsync) 2482 2422 { 2483 - struct kvm_mmu_page *s; 2423 + struct kvm_mmu_page *sp; 
2484 2424 2485 - for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) { 2486 - if (s->unsync) 2487 - continue; 2488 - WARN_ON(s->role.level != PT_PAGE_TABLE_LEVEL); 2489 - __kvm_unsync_page(vcpu, s); 2490 - } 2491 - } 2425 + if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE)) 2426 + return true; 2492 2427 2493 - static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn, 2494 - bool can_unsync) 2495 - { 2496 - struct kvm_mmu_page *s; 2497 - bool need_unsync = false; 2498 - 2499 - for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) { 2428 + for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) { 2500 2429 if (!can_unsync) 2501 - return 1; 2430 + return true; 2502 2431 2503 - if (s->role.level != PT_PAGE_TABLE_LEVEL) 2504 - return 1; 2432 + if (sp->unsync) 2433 + continue; 2505 2434 2506 - if (!s->unsync) 2507 - need_unsync = true; 2435 + WARN_ON(sp->role.level != PT_PAGE_TABLE_LEVEL); 2436 + kvm_unsync_page(vcpu, sp); 2508 2437 } 2509 - if (need_unsync) 2510 - kvm_unsync_pages(vcpu, gfn); 2511 - return 0; 2438 + 2439 + return false; 2512 2440 } 2513 2441 2514 2442 static bool kvm_is_mmio_pfn(kvm_pfn_t pfn) ··· 2553 2503 * be fixed if guest refault. 
2554 2504 */ 2555 2505 if (level > PT_PAGE_TABLE_LEVEL && 2556 - has_wrprotected_page(vcpu, gfn, level)) 2506 + mmu_gfn_lpage_is_disallowed(vcpu, gfn, level)) 2557 2507 goto done; 2558 2508 2559 2509 spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE; ··· 2818 2768 if (!is_error_noslot_pfn(pfn) && !kvm_is_reserved_pfn(pfn) && 2819 2769 level == PT_PAGE_TABLE_LEVEL && 2820 2770 PageTransCompound(pfn_to_page(pfn)) && 2821 - !has_wrprotected_page(vcpu, gfn, PT_DIRECTORY_LEVEL)) { 2771 + !mmu_gfn_lpage_is_disallowed(vcpu, gfn, PT_DIRECTORY_LEVEL)) { 2822 2772 unsigned long mask; 2823 2773 /* 2824 2774 * mmu_notifier_retry was successful and we hold the ··· 2846 2796 static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn, 2847 2797 kvm_pfn_t pfn, unsigned access, int *ret_val) 2848 2798 { 2849 - bool ret = true; 2850 - 2851 2799 /* The pfn is invalid, report the error! */ 2852 2800 if (unlikely(is_error_pfn(pfn))) { 2853 2801 *ret_val = kvm_handle_bad_page(vcpu, gfn, pfn); 2854 - goto exit; 2802 + return true; 2855 2803 } 2856 2804 2857 2805 if (unlikely(is_noslot_pfn(pfn))) 2858 2806 vcpu_cache_mmio_info(vcpu, gva, gfn, access); 2859 2807 2860 - ret = false; 2861 - exit: 2862 - return ret; 2808 + return false; 2863 2809 } 2864 2810 2865 2811 static bool page_fault_can_be_fast(u32 error_code) ··· 3319 3273 return __is_rsvd_bits_set(&mmu->shadow_zero_check, spte, level); 3320 3274 } 3321 3275 3322 - static bool quickly_check_mmio_pf(struct kvm_vcpu *vcpu, u64 addr, bool direct) 3276 + static bool mmio_info_in_cache(struct kvm_vcpu *vcpu, u64 addr, bool direct) 3323 3277 { 3324 3278 if (direct) 3325 3279 return vcpu_match_mmio_gpa(vcpu, addr); ··· 3378 3332 u64 spte; 3379 3333 bool reserved; 3380 3334 3381 - if (quickly_check_mmio_pf(vcpu, addr, direct)) 3335 + if (mmio_info_in_cache(vcpu, addr, direct)) 3382 3336 return RET_MMIO_PF_EMULATE; 3383 3337 3384 3338 reserved = walk_shadow_page_get_mmio_spte(vcpu, addr, &spte); ··· 3408 3362 } 3409 3363 
EXPORT_SYMBOL_GPL(handle_mmio_page_fault); 3410 3364 3365 + static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu, 3366 + u32 error_code, gfn_t gfn) 3367 + { 3368 + if (unlikely(error_code & PFERR_RSVD_MASK)) 3369 + return false; 3370 + 3371 + if (!(error_code & PFERR_PRESENT_MASK) || 3372 + !(error_code & PFERR_WRITE_MASK)) 3373 + return false; 3374 + 3375 + /* 3376 + * guest is writing the page which is write tracked which can 3377 + * not be fixed by page fault handler. 3378 + */ 3379 + if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE)) 3380 + return true; 3381 + 3382 + return false; 3383 + } 3384 + 3385 + static void shadow_page_table_clear_flood(struct kvm_vcpu *vcpu, gva_t addr) 3386 + { 3387 + struct kvm_shadow_walk_iterator iterator; 3388 + u64 spte; 3389 + 3390 + if (!VALID_PAGE(vcpu->arch.mmu.root_hpa)) 3391 + return; 3392 + 3393 + walk_shadow_page_lockless_begin(vcpu); 3394 + for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) { 3395 + clear_sp_write_flooding_count(iterator.sptep); 3396 + if (!is_shadow_present_pte(spte)) 3397 + break; 3398 + } 3399 + walk_shadow_page_lockless_end(vcpu); 3400 + } 3401 + 3411 3402 static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gva_t gva, 3412 3403 u32 error_code, bool prefault) 3413 3404 { 3414 - gfn_t gfn; 3405 + gfn_t gfn = gva >> PAGE_SHIFT; 3415 3406 int r; 3416 3407 3417 3408 pgprintk("%s: gva %lx error %x\n", __func__, gva, error_code); 3418 3409 3419 - if (unlikely(error_code & PFERR_RSVD_MASK)) { 3420 - r = handle_mmio_page_fault(vcpu, gva, true); 3421 - 3422 - if (likely(r != RET_MMIO_PF_INVALID)) 3423 - return r; 3424 - } 3410 + if (page_fault_handle_page_track(vcpu, error_code, gfn)) 3411 + return 1; 3425 3412 3426 3413 r = mmu_topup_memory_caches(vcpu); 3427 3414 if (r) ··· 3462 3383 3463 3384 MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); 3464 3385 3465 - gfn = gva >> PAGE_SHIFT; 3466 3386 3467 3387 return nonpaging_map(vcpu, gva & PAGE_MASK, 3468 3388 error_code, gfn, 
prefault); ··· 3538 3460 3539 3461 MMU_WARN_ON(!VALID_PAGE(vcpu->arch.mmu.root_hpa)); 3540 3462 3541 - if (unlikely(error_code & PFERR_RSVD_MASK)) { 3542 - r = handle_mmio_page_fault(vcpu, gpa, true); 3543 - 3544 - if (likely(r != RET_MMIO_PF_INVALID)) 3545 - return r; 3546 - } 3463 + if (page_fault_handle_page_track(vcpu, error_code, gfn)) 3464 + return 1; 3547 3465 3548 3466 r = mmu_topup_memory_caches(vcpu); 3549 3467 if (r) ··· 3632 3558 return false; 3633 3559 } 3634 3560 3635 - static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gpte) 3561 + static inline bool is_last_gpte(struct kvm_mmu *mmu, 3562 + unsigned level, unsigned gpte) 3636 3563 { 3637 - unsigned index; 3564 + /* 3565 + * PT_PAGE_TABLE_LEVEL always terminates. The RHS has bit 7 set 3566 + * iff level <= PT_PAGE_TABLE_LEVEL, which for our purpose means 3567 + * level == PT_PAGE_TABLE_LEVEL; set PT_PAGE_SIZE_MASK in gpte then. 3568 + */ 3569 + gpte |= level - PT_PAGE_TABLE_LEVEL - 1; 3638 3570 3639 - index = level - 1; 3640 - index |= (gpte & PT_PAGE_SIZE_MASK) >> (PT_PAGE_SIZE_SHIFT - 2); 3641 - return mmu->last_pte_bitmap & (1 << index); 3571 + /* 3572 + * The RHS has bit 7 set iff level < mmu->last_nonleaf_level. 3573 + * If it is clear, there are no large pages at this level, so clear 3574 + * PT_PAGE_SIZE_MASK in gpte if that is the case. 
3575 + */ 3576 + gpte &= level - mmu->last_nonleaf_level; 3577 + 3578 + return gpte & PT_PAGE_SIZE_MASK; 3642 3579 } 3643 3580 3644 3581 #define PTTYPE_EPT 18 /* arbitrary */ ··· 3923 3838 } 3924 3839 } 3925 3840 3926 - static void update_last_pte_bitmap(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) 3841 + static void update_last_nonleaf_level(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu) 3927 3842 { 3928 - u8 map; 3929 - unsigned level, root_level = mmu->root_level; 3930 - const unsigned ps_set_index = 1 << 2; /* bit 2 of index: ps */ 3843 + unsigned root_level = mmu->root_level; 3931 3844 3932 - if (root_level == PT32E_ROOT_LEVEL) 3933 - --root_level; 3934 - /* PT_PAGE_TABLE_LEVEL always terminates */ 3935 - map = 1 | (1 << ps_set_index); 3936 - for (level = PT_DIRECTORY_LEVEL; level <= root_level; ++level) { 3937 - if (level <= PT_PDPE_LEVEL 3938 - && (mmu->root_level >= PT32E_ROOT_LEVEL || is_pse(vcpu))) 3939 - map |= 1 << (ps_set_index | (level - 1)); 3940 - } 3941 - mmu->last_pte_bitmap = map; 3845 + mmu->last_nonleaf_level = root_level; 3846 + if (root_level == PT32_ROOT_LEVEL && is_pse(vcpu)) 3847 + mmu->last_nonleaf_level++; 3942 3848 } 3943 3849 3944 3850 static void paging64_init_context_common(struct kvm_vcpu *vcpu, ··· 3941 3865 3942 3866 reset_rsvds_bits_mask(vcpu, context); 3943 3867 update_permission_bitmask(vcpu, context, false); 3944 - update_last_pte_bitmap(vcpu, context); 3868 + update_last_nonleaf_level(vcpu, context); 3945 3869 3946 3870 MMU_WARN_ON(!is_pae(vcpu)); 3947 3871 context->page_fault = paging64_page_fault; ··· 3968 3892 3969 3893 reset_rsvds_bits_mask(vcpu, context); 3970 3894 update_permission_bitmask(vcpu, context, false); 3971 - update_last_pte_bitmap(vcpu, context); 3895 + update_last_nonleaf_level(vcpu, context); 3972 3896 3973 3897 context->page_fault = paging32_page_fault; 3974 3898 context->gva_to_gpa = paging32_gva_to_gpa; ··· 4026 3950 } 4027 3951 4028 3952 update_permission_bitmask(vcpu, context, false); 4029 - 
update_last_pte_bitmap(vcpu, context); 3953 + update_last_nonleaf_level(vcpu, context); 4030 3954 reset_tdp_shadow_zero_bits_mask(vcpu, context); 4031 3955 } 4032 3956 ··· 4132 4056 } 4133 4057 4134 4058 update_permission_bitmask(vcpu, g_context, false); 4135 - update_last_pte_bitmap(vcpu, g_context); 4059 + update_last_nonleaf_level(vcpu, g_context); 4136 4060 } 4137 4061 4138 4062 static void init_kvm_mmu(struct kvm_vcpu *vcpu) ··· 4203 4127 return (old & ~new & PT64_PERM_MASK) != 0; 4204 4128 } 4205 4129 4206 - static void mmu_pte_write_flush_tlb(struct kvm_vcpu *vcpu, bool zap_page, 4207 - bool remote_flush, bool local_flush) 4208 - { 4209 - if (zap_page) 4210 - return; 4211 - 4212 - if (remote_flush) 4213 - kvm_flush_remote_tlbs(vcpu->kvm); 4214 - else if (local_flush) 4215 - kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu); 4216 - } 4217 - 4218 4130 static u64 mmu_pte_write_fetch_gpte(struct kvm_vcpu *vcpu, gpa_t *gpa, 4219 4131 const u8 *new, int *bytes) 4220 4132 { ··· 4252 4188 if (sp->role.level == PT_PAGE_TABLE_LEVEL) 4253 4189 return false; 4254 4190 4255 - return ++sp->write_flooding_count >= 3; 4191 + atomic_inc(&sp->write_flooding_count); 4192 + return atomic_read(&sp->write_flooding_count) >= 3; 4256 4193 } 4257 4194 4258 4195 /* ··· 4315 4250 return spte; 4316 4251 } 4317 4252 4318 - void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, 4319 - const u8 *new, int bytes) 4253 + static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa, 4254 + const u8 *new, int bytes) 4320 4255 { 4321 4256 gfn_t gfn = gpa >> PAGE_SHIFT; 4322 4257 struct kvm_mmu_page *sp; 4323 4258 LIST_HEAD(invalid_list); 4324 4259 u64 entry, gentry, *spte; 4325 4260 int npte; 4326 - bool remote_flush, local_flush, zap_page; 4261 + bool remote_flush, local_flush; 4327 4262 union kvm_mmu_page_role mask = { }; 4328 4263 4329 4264 mask.cr0_wp = 1; ··· 4340 4275 if (!ACCESS_ONCE(vcpu->kvm->arch.indirect_shadow_pages)) 4341 4276 return; 4342 4277 4343 - zap_page = remote_flush = 
-	local_flush = false;
+	remote_flush = local_flush = false;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
···
 	for_each_gfn_indirect_valid_sp(vcpu->kvm, sp, gfn) {
 		if (detect_write_misaligned(sp, gpa, bytes) ||
 		      detect_write_flooding(sp)) {
-			zap_page |= !!kvm_mmu_prepare_zap_page(vcpu->kvm, sp,
-							       &invalid_list);
+			kvm_mmu_prepare_zap_page(vcpu->kvm, sp, &invalid_list);
 			++vcpu->kvm->stat.mmu_flooded;
 			continue;
 		}
···
 			++spte;
 		}
 	}
-	mmu_pte_write_flush_tlb(vcpu, zap_page, remote_flush, local_flush);
-	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
+	kvm_mmu_flush_or_zap(vcpu, &invalid_list, remote_flush, local_flush);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	spin_unlock(&vcpu->kvm->mmu_lock);
 }
···
 	kvm_mmu_commit_zap_page(vcpu->kvm, &invalid_list);
 }
 
-static bool is_mmio_page_fault(struct kvm_vcpu *vcpu, gva_t addr)
-{
-	if (vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu))
-		return vcpu_match_mmio_gpa(vcpu, addr);
-
-	return vcpu_match_mmio_gva(vcpu, addr);
-}
-
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gva_t cr2, u32 error_code,
 		       void *insn, int insn_len)
 {
 	int r, emulation_type = EMULTYPE_RETRY;
 	enum emulation_result er;
+	bool direct = vcpu->arch.mmu.direct_map || mmu_is_nested(vcpu);
+
+	if (unlikely(error_code & PFERR_RSVD_MASK)) {
+		r = handle_mmio_page_fault(vcpu, cr2, direct);
+		if (r == RET_MMIO_PF_EMULATE) {
+			emulation_type = 0;
+			goto emulate;
+		}
+		if (r == RET_MMIO_PF_RETRY)
+			return 1;
+		if (r < 0)
+			return r;
+	}
 
 	r = vcpu->arch.mmu.page_fault(vcpu, cr2, error_code, false);
 	if (r < 0)
-		goto out;
+		return r;
+	if (!r)
+		return 1;
 
-	if (!r) {
-		r = 1;
-		goto out;
-	}
-
-	if (is_mmio_page_fault(vcpu, cr2))
+	if (mmio_info_in_cache(vcpu, cr2, direct))
 		emulation_type = 0;
-
+emulate:
 	er = x86_emulate_instruction(vcpu, cr2, emulation_type, insn, insn_len);
 
 	switch (er) {
···
 	default:
 		BUG();
 	}
-out:
-	return r;
 }
 EXPORT_SYMBOL_GPL(kvm_mmu_page_fault);
···
 	MMU_WARN_ON(VALID_PAGE(vcpu->arch.mmu.root_hpa));
 
 	init_kvm_mmu(vcpu);
+}
+
+void kvm_mmu_init_vm(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	node->track_write = kvm_mmu_pte_write;
+	kvm_page_track_register_notifier(kvm, node);
+}
+
+void kvm_mmu_uninit_vm(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_node *node = &kvm->arch.mmu_sp_tracker;
+
+	kvm_page_track_unregister_notifier(kvm, node);
 }
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
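The rewritten kvm_mmu_page_fault() above replaces the old goto-out error handling with a flat dispatch: a reserved-bit fault is routed to the MMIO handler first, and only unresolved faults fall through to instruction emulation. The control flow can be sketched as a small pure function; the enum names below are hypothetical stand-ins for KVM's RET_MMIO_PF_* values and are not the kernel's identifiers.

```c
#include <assert.h>

/* Hypothetical stand-ins for the RET_MMIO_PF_* values used by the patch. */
enum mmio_pf_result {
    MMIO_PF_ERROR   = -1,  /* propagate the error to the caller */
    MMIO_PF_RETRY   = 0,   /* let the guest retry the access */
    MMIO_PF_EMULATE = 1,   /* emulate the instruction immediately */
};

enum fault_action {
    ACTION_RETURN_ERROR,
    ACTION_RETURN_TO_GUEST,
    ACTION_EMULATE_NO_RETRY,     /* emulation_type = 0 */
    ACTION_EMULATE_WITH_RETRY,   /* emulation_type = EMULTYPE_RETRY */
};

/* Mirrors the control flow of the rewritten kvm_mmu_page_fault():
 * reserved-bit faults go to the MMIO handler first; a cached MMIO
 * translation also disables the emulation fast-retry path. */
static enum fault_action classify_fault(int rsvd_bit_set,
                                        enum mmio_pf_result mmio_r,
                                        int cached_mmio)
{
    if (rsvd_bit_set) {
        if (mmio_r == MMIO_PF_EMULATE)
            return ACTION_EMULATE_NO_RETRY;
        if (mmio_r == MMIO_PF_RETRY)
            return ACTION_RETURN_TO_GUEST;
        if (mmio_r < 0)
            return ACTION_RETURN_ERROR;
        /* other results fall through to the normal fault path */
    }
    return cached_mmio ? ACTION_EMULATE_NO_RETRY : ACTION_EMULATE_WITH_RETRY;
}
```

The `goto emulate` in the patch corresponds to the early ACTION_EMULATE_NO_RETRY return here: once the MMIO handler says "emulate", the regular page-fault handler is skipped entirely.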
+5
arch/x86/kvm/mmu.h
···
 void kvm_mmu_invalidate_zap_all_pages(struct kvm *kvm);
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end);
+
+void kvm_mmu_gfn_disallow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+void kvm_mmu_gfn_allow_lpage(struct kvm_memory_slot *slot, gfn_t gfn);
+bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
+				    struct kvm_memory_slot *slot, u64 gfn);
 #endif
+222
arch/x86/kvm/page_track.c
···
+/*
+ * Support KVM guest page tracking
+ *
+ * This feature allows us to track page access in guest. Currently, only
+ * write access is tracked.
+ *
+ * Copyright(C) 2015 Intel Corporation.
+ *
+ * Author:
+ *   Xiao Guangrong <guangrong.xiao@linux.intel.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include <linux/kvm_host.h>
+#include <asm/kvm_host.h>
+#include <asm/kvm_page_track.h>
+
+#include "mmu.h"
+
+void kvm_page_track_free_memslot(struct kvm_memory_slot *free,
+				 struct kvm_memory_slot *dont)
+{
+	int i;
+
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++)
+		if (!dont || free->arch.gfn_track[i] !=
+		      dont->arch.gfn_track[i]) {
+			kvfree(free->arch.gfn_track[i]);
+			free->arch.gfn_track[i] = NULL;
+		}
+}
+
+int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+				  unsigned long npages)
+{
+	int i;
+
+	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+		slot->arch.gfn_track[i] = kvm_kvzalloc(npages *
+					sizeof(*slot->arch.gfn_track[i]));
+		if (!slot->arch.gfn_track[i])
+			goto track_free;
+	}
+
+	return 0;
+
+track_free:
+	kvm_page_track_free_memslot(slot, NULL);
+	return -ENOMEM;
+}
+
+static inline bool page_track_mode_is_valid(enum kvm_page_track_mode mode)
+{
+	if (mode < 0 || mode >= KVM_PAGE_TRACK_MAX)
+		return false;
+
+	return true;
+}
+
+static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
+			     enum kvm_page_track_mode mode, short count)
+{
+	int index, val;
+
+	index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+	val = slot->arch.gfn_track[mode][index];
+
+	if (WARN_ON(val + count < 0 || val + count > USHRT_MAX))
+		return;
+
+	slot->arch.gfn_track[mode][index] += count;
+}
+
+/*
+ * add guest page to the tracking pool so that corresponding access on that
+ * page will be intercepted.
+ *
+ * It should be called under the protection both of mmu-lock and kvm->srcu
+ * or kvm->slots_lock.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @slot: the @gfn belongs to.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_slot_page_track_add_page(struct kvm *kvm,
+				  struct kvm_memory_slot *slot, gfn_t gfn,
+				  enum kvm_page_track_mode mode)
+{
+
+	if (WARN_ON(!page_track_mode_is_valid(mode)))
+		return;
+
+	update_gfn_track(slot, gfn, mode, 1);
+
+	/*
+	 * new track stops large page mapping for the
+	 * tracked page.
+	 */
+	kvm_mmu_gfn_disallow_lpage(slot, gfn);
+
+	if (mode == KVM_PAGE_TRACK_WRITE)
+		if (kvm_mmu_slot_gfn_write_protect(kvm, slot, gfn))
+			kvm_flush_remote_tlbs(kvm);
+}
+
+/*
+ * remove the guest page from the tracking pool which stops the interception
+ * of corresponding access on that page. It is the opposed operation of
+ * kvm_slot_page_track_add_page().
+ *
+ * It should be called under the protection both of mmu-lock and kvm->srcu
+ * or kvm->slots_lock.
+ *
+ * @kvm: the guest instance we are interested in.
+ * @slot: the @gfn belongs to.
+ * @gfn: the guest page.
+ * @mode: tracking mode, currently only write track is supported.
+ */
+void kvm_slot_page_track_remove_page(struct kvm *kvm,
+				     struct kvm_memory_slot *slot, gfn_t gfn,
+				     enum kvm_page_track_mode mode)
+{
+	if (WARN_ON(!page_track_mode_is_valid(mode)))
+		return;
+
+	update_gfn_track(slot, gfn, mode, -1);
+
+	/*
+	 * allow large page mapping for the tracked page
+	 * after the tracker is gone.
+	 */
+	kvm_mmu_gfn_allow_lpage(slot, gfn);
+}
+
+/*
+ * check if the corresponding access on the specified guest page is tracked.
+ */
+bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
+			      enum kvm_page_track_mode mode)
+{
+	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	int index = gfn_to_index(gfn, slot->base_gfn, PT_PAGE_TABLE_LEVEL);
+
+	if (WARN_ON(!page_track_mode_is_valid(mode)))
+		return false;
+
+	return !!ACCESS_ONCE(slot->arch.gfn_track[mode][index]);
+}
+
+void kvm_page_track_init(struct kvm *kvm)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+	init_srcu_struct(&head->track_srcu);
+	INIT_HLIST_HEAD(&head->track_notifier_list);
+}
+
+/*
+ * register the notifier so that event interception for the tracked guest
+ * pages can be received.
+ */
+void
+kvm_page_track_register_notifier(struct kvm *kvm,
+				 struct kvm_page_track_notifier_node *n)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+
+	spin_lock(&kvm->mmu_lock);
+	hlist_add_head_rcu(&n->node, &head->track_notifier_list);
+	spin_unlock(&kvm->mmu_lock);
+}
+
+/*
+ * stop receiving the event interception. It is the opposed operation of
+ * kvm_page_track_register_notifier().
+ */
+void
+kvm_page_track_unregister_notifier(struct kvm *kvm,
+				   struct kvm_page_track_notifier_node *n)
+{
+	struct kvm_page_track_notifier_head *head;
+
+	head = &kvm->arch.track_notifier_head;
+
+	spin_lock(&kvm->mmu_lock);
+	hlist_del_rcu(&n->node);
+	spin_unlock(&kvm->mmu_lock);
+	synchronize_srcu(&head->track_srcu);
+}
+
+/*
+ * Notify the node that write access is intercepted and write emulation is
+ * finished at this time.
+ *
+ * The node should figure out if the written page is the one that node is
+ * interested in by itself.
+ */
+void kvm_page_track_write(struct kvm_vcpu *vcpu, gpa_t gpa, const u8 *new,
+			  int bytes)
+{
+	struct kvm_page_track_notifier_head *head;
+	struct kvm_page_track_notifier_node *n;
+	int idx;
+
+	head = &vcpu->kvm->arch.track_notifier_head;
+
+	if (hlist_empty(&head->track_notifier_list))
+		return;
+
+	idx = srcu_read_lock(&head->track_srcu);
+	hlist_for_each_entry_rcu(n, &head->track_notifier_list, node)
+		if (n->track_write)
+			n->track_write(vcpu, gpa, new, bytes);
+	srcu_read_unlock(&head->track_srcu, idx);
+}
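The core of the new page_track.c is update_gfn_track(): one unsigned-short counter per gfn per tracking mode, incremented on add and decremented on remove, with a page considered tracked while the counter is nonzero. A toy model of that counting scheme, with a fixed-size array standing in for slot->arch.gfn_track (the function names mirror the kernel's, the data layout here is a simplification):

```c
#include <assert.h>
#include <limits.h>

#define NPAGES 8

/* Toy model of slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE]: one u16
 * counter per gfn; the page is write-tracked while the counter != 0. */
static unsigned short write_track[NPAGES];

/* Mirrors update_gfn_track(): refuse adjustments that would underflow
 * or overflow the counter (the kernel WARNs and bails out), otherwise
 * apply the signed delta. */
static int update_gfn_track(unsigned long gfn, short count)
{
    int val = write_track[gfn];

    if (val + count < 0 || val + count > USHRT_MAX)
        return -1;              /* WARN_ON() case in the kernel */
    write_track[gfn] += count;
    return 0;
}

/* Mirrors kvm_page_track_is_active(): nonzero counter means tracked. */
static int page_track_is_active(unsigned long gfn)
{
    return write_track[gfn] != 0;
}
```

The refcount is what lets several independent trackers (e.g. the MMU's shadow-page tracker and a future virtual-GPU user) watch the same gfn: the page only becomes writable again when the last tracker removes itself.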
+18 -17
arch/x86/kvm/paging_tmpl.h
···
 		((gpte & VMX_EPT_EXECUTABLE_MASK) ? ACC_EXEC_MASK : 0) |
 		ACC_USER_MASK;
 #else
-	access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
-	access &= ~(gpte >> PT64_NX_SHIFT);
+	BUILD_BUG_ON(ACC_EXEC_MASK != PT_PRESENT_MASK);
+	BUILD_BUG_ON(ACC_EXEC_MASK != 1);
+	access = gpte & (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK);
+	/* Combine NX with P (which is set here) to get ACC_EXEC_MASK. */
+	access ^= (gpte >> PT64_NX_SHIFT);
 #endif
 
 	return access;
···
 	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
 
-	if (unlikely(error_code & PFERR_RSVD_MASK)) {
-		r = handle_mmio_page_fault(vcpu, addr, mmu_is_nested(vcpu));
-		if (likely(r != RET_MMIO_PF_INVALID))
-			return r;
-
-		/*
-		 * page fault with PFEC.RSVD = 1 is caused by shadow
-		 * page fault, should not be used to walk guest page
-		 * table.
-		 */
-		error_code &= ~PFERR_RSVD_MASK;
-	};
-
 	r = mmu_topup_memory_caches(vcpu);
 	if (r)
 		return r;
+
+	/*
+	 * If PFEC.RSVD is set, this is a shadow page fault.
+	 * The bit needs to be cleared before walking guest page tables.
+	 */
+	error_code &= ~PFERR_RSVD_MASK;
 
 	/*
 	 * Look up the guest pte for the faulting address.
···
 		inject_page_fault(vcpu, &walker.fault);
 
 		return 0;
+	}
+
+	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
+		shadow_page_table_clear_flood(vcpu, addr);
+		return 1;
 	}
 
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
···
 	if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
 				       sizeof(pt_element_t)))
-		return -EINVAL;
+		return 0;
 
 	if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
 		vcpu->kvm->tlbs_dirty++;
···
 				 host_writable);
 	}
 
-	return !nr_present;
+	return nr_present;
 }
 
 #undef pt_element_t
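The reworked FNAME(gpte_access) relies on a bit trick that the two BUILD_BUG_ONs pin down: ACC_EXEC_MASK equals PT_PRESENT_MASK equals 1, and since the gpte is known to be present (P = 1), XOR-ing in the NX bit yields `P ^ NX = !NX` in bit 0, which is exactly the execute permission. A standalone check of the trick, using the standard x86 page-table bit positions (P = bit 0, W = bit 1, U = bit 2, NX = bit 63):

```c
#include <assert.h>
#include <stdint.h>

#define PT_PRESENT_MASK  (1ull << 0)
#define PT_WRITABLE_MASK (1ull << 1)
#define PT_USER_MASK     (1ull << 2)
#define PT64_NX_SHIFT    63

#define ACC_EXEC_MASK  1u  /* the trick requires this to equal PT_PRESENT_MASK */
#define ACC_WRITE_MASK ((unsigned)PT_WRITABLE_MASK)
#define ACC_USER_MASK  ((unsigned)PT_USER_MASK)

/* Mirrors the patched #else branch of FNAME(gpte_access): keep W/U/P,
 * then fold NX into bit 0, turning P into the execute permission. */
static unsigned gpte_access(uint64_t gpte)
{
    unsigned access;

    access = (unsigned)(gpte &
        (PT_WRITABLE_MASK | PT_USER_MASK | PT_PRESENT_MASK));
    /* For a present gpte, bit 0 becomes 1 ^ NX, i.e. !NX. */
    access ^= (unsigned)(gpte >> PT64_NX_SHIFT);
    return access;
}
```

The old code achieved the same result with an OR followed by an AND-NOT; the XOR form saves an instruction because P is already in place.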
+1 -1
arch/x86/kvm/pmu.c
···
 void kvm_pmu_deliver_pmi(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->arch.apic)
+	if (lapic_in_kernel(vcpu))
 		kvm_apic_local_deliver(vcpu->arch.apic, APIC_LVTPC);
 }
 
+1 -2
arch/x86/kvm/svm.c
···
 static int vmmcall_interception(struct vcpu_svm *svm)
 {
 	svm->next_rip = kvm_rip_read(&svm->vcpu) + 3;
-	kvm_emulate_hypercall(&svm->vcpu);
-	return 1;
+	return kvm_emulate_hypercall(&svm->vcpu);
 }
 
 static unsigned long nested_svm_get_tdp_cr3(struct kvm_vcpu *vcpu)
+8 -4
arch/x86/kvm/trace.h
···
 * Tracepoint for VT-d posted-interrupts.
 */
 TRACE_EVENT(kvm_pi_irte_update,
-	TP_PROTO(unsigned int vcpu_id, unsigned int gsi,
-		 unsigned int gvec, u64 pi_desc_addr, bool set),
-	TP_ARGS(vcpu_id, gsi, gvec, pi_desc_addr, set),
+	TP_PROTO(unsigned int host_irq, unsigned int vcpu_id,
+		 unsigned int gsi, unsigned int gvec,
+		 u64 pi_desc_addr, bool set),
+	TP_ARGS(host_irq, vcpu_id, gsi, gvec, pi_desc_addr, set),
 
 	TP_STRUCT__entry(
+		__field(	unsigned int,	host_irq	)
 		__field(	unsigned int,	vcpu_id		)
 		__field(	unsigned int,	gsi		)
 		__field(	unsigned int,	gvec		)
···
 	),
 
 	TP_fast_assign(
+		__entry->host_irq	= host_irq;
 		__entry->vcpu_id	= vcpu_id;
 		__entry->gsi		= gsi;
 		__entry->gvec		= gvec;
···
 		__entry->set		= set;
 	),
 
-	TP_printk("VT-d PI is %s for this irq, vcpu %u, gsi: 0x%x, "
+	TP_printk("VT-d PI is %s for irq %u, vcpu %u, gsi: 0x%x, "
 		  "gvec: 0x%x, pi_desc_addr: 0x%llx",
 		  __entry->set ? "enabled and being updated" : "disabled",
+		  __entry->host_irq,
 		  __entry->vcpu_id,
 		  __entry->gsi,
 		  __entry->gvec,
+53 -32
arch/x86/kvm/vmx.c
··· 863 863 static u64 construct_eptp(unsigned long root_hpa); 864 864 static void kvm_cpu_vmxon(u64 addr); 865 865 static void kvm_cpu_vmxoff(void); 866 - static bool vmx_mpx_supported(void); 867 866 static bool vmx_xsaves_supported(void); 868 867 static int vmx_set_tss_addr(struct kvm *kvm, unsigned int addr); 869 868 static void vmx_set_segment(struct kvm_vcpu *vcpu, ··· 962 963 MSR_EFER, MSR_TSC_AUX, MSR_STAR, 963 964 }; 964 965 965 - static inline bool is_page_fault(u32 intr_info) 966 + static inline bool is_exception_n(u32 intr_info, u8 vector) 966 967 { 967 968 return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK | 968 969 INTR_INFO_VALID_MASK)) == 969 - (INTR_TYPE_HARD_EXCEPTION | PF_VECTOR | INTR_INFO_VALID_MASK); 970 + (INTR_TYPE_HARD_EXCEPTION | vector | INTR_INFO_VALID_MASK); 971 + } 972 + 973 + static inline bool is_debug(u32 intr_info) 974 + { 975 + return is_exception_n(intr_info, DB_VECTOR); 976 + } 977 + 978 + static inline bool is_breakpoint(u32 intr_info) 979 + { 980 + return is_exception_n(intr_info, BP_VECTOR); 981 + } 982 + 983 + static inline bool is_page_fault(u32 intr_info) 984 + { 985 + return is_exception_n(intr_info, PF_VECTOR); 970 986 } 971 987 972 988 static inline bool is_no_device(u32 intr_info) 973 989 { 974 - return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK | 975 - INTR_INFO_VALID_MASK)) == 976 - (INTR_TYPE_HARD_EXCEPTION | NM_VECTOR | INTR_INFO_VALID_MASK); 990 + return is_exception_n(intr_info, NM_VECTOR); 977 991 } 978 992 979 993 static inline bool is_invalid_opcode(u32 intr_info) 980 994 { 981 - return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK | 982 - INTR_INFO_VALID_MASK)) == 983 - (INTR_TYPE_HARD_EXCEPTION | UD_VECTOR | INTR_INFO_VALID_MASK); 995 + return is_exception_n(intr_info, UD_VECTOR); 984 996 } 985 997 986 998 static inline bool is_external_interrupt(u32 intr_info) ··· 2615 2605 VM_EXIT_LOAD_IA32_EFER | VM_EXIT_SAVE_IA32_EFER | 2616 2606 
VM_EXIT_SAVE_VMX_PREEMPTION_TIMER | VM_EXIT_ACK_INTR_ON_EXIT; 2617 2607 2618 - if (vmx_mpx_supported()) 2608 + if (kvm_mpx_supported()) 2619 2609 vmx->nested.nested_vmx_exit_ctls_high |= VM_EXIT_CLEAR_BNDCFGS; 2620 2610 2621 2611 /* We support free control of debug control saving. */ ··· 2636 2626 VM_ENTRY_LOAD_IA32_PAT; 2637 2627 vmx->nested.nested_vmx_entry_ctls_high |= 2638 2628 (VM_ENTRY_ALWAYSON_WITHOUT_TRUE_MSR | VM_ENTRY_LOAD_IA32_EFER); 2639 - if (vmx_mpx_supported()) 2629 + if (kvm_mpx_supported()) 2640 2630 vmx->nested.nested_vmx_entry_ctls_high |= VM_ENTRY_LOAD_BNDCFGS; 2641 2631 2642 2632 /* We support free control of debug control loading. */ ··· 2880 2870 msr_info->data = vmcs_readl(GUEST_SYSENTER_ESP); 2881 2871 break; 2882 2872 case MSR_IA32_BNDCFGS: 2883 - if (!vmx_mpx_supported()) 2873 + if (!kvm_mpx_supported()) 2884 2874 return 1; 2885 2875 msr_info->data = vmcs_read64(GUEST_BNDCFGS); 2886 2876 break; ··· 2957 2947 vmcs_writel(GUEST_SYSENTER_ESP, data); 2958 2948 break; 2959 2949 case MSR_IA32_BNDCFGS: 2960 - if (!vmx_mpx_supported()) 2950 + if (!kvm_mpx_supported()) 2961 2951 return 1; 2962 2952 vmcs_write64(GUEST_BNDCFGS, data); 2963 2953 break; ··· 3430 3420 for (i = j = 0; i < max_shadow_read_write_fields; i++) { 3431 3421 switch (shadow_read_write_fields[i]) { 3432 3422 case GUEST_BNDCFGS: 3433 - if (!vmx_mpx_supported()) 3423 + if (!kvm_mpx_supported()) 3434 3424 continue; 3435 3425 break; 3436 3426 default: ··· 5639 5629 } 5640 5630 5641 5631 if (vcpu->guest_debug == 0) { 5642 - u32 cpu_based_vm_exec_control; 5643 - 5644 - cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); 5645 - cpu_based_vm_exec_control &= ~CPU_BASED_MOV_DR_EXITING; 5646 - vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control); 5632 + vmcs_clear_bits(CPU_BASED_VM_EXEC_CONTROL, 5633 + CPU_BASED_MOV_DR_EXITING); 5647 5634 5648 5635 /* 5649 5636 * No more DR vmexits; force a reload of the debug registers ··· 5677 5670 5678 5671 static void 
vmx_sync_dirty_debug_regs(struct kvm_vcpu *vcpu) 5679 5672 { 5680 - u32 cpu_based_vm_exec_control; 5681 - 5682 5673 get_debugreg(vcpu->arch.db[0], 0); 5683 5674 get_debugreg(vcpu->arch.db[1], 1); 5684 5675 get_debugreg(vcpu->arch.db[2], 2); ··· 5685 5680 vcpu->arch.dr7 = vmcs_readl(GUEST_DR7); 5686 5681 5687 5682 vcpu->arch.switch_db_regs &= ~KVM_DEBUGREG_WONT_EXIT; 5688 - 5689 - cpu_based_vm_exec_control = vmcs_read32(CPU_BASED_VM_EXEC_CONTROL); 5690 - cpu_based_vm_exec_control |= CPU_BASED_MOV_DR_EXITING; 5691 - vmcs_write32(CPU_BASED_VM_EXEC_CONTROL, cpu_based_vm_exec_control); 5683 + vmcs_set_bits(CPU_BASED_VM_EXEC_CONTROL, CPU_BASED_MOV_DR_EXITING); 5692 5684 } 5693 5685 5694 5686 static void vmx_set_dr7(struct kvm_vcpu *vcpu, unsigned long val) ··· 5770 5768 5771 5769 static int handle_vmcall(struct kvm_vcpu *vcpu) 5772 5770 { 5773 - kvm_emulate_hypercall(vcpu); 5774 - return 1; 5771 + return kvm_emulate_hypercall(vcpu); 5775 5772 } 5776 5773 5777 5774 static int handle_invd(struct kvm_vcpu *vcpu) ··· 6457 6456 6458 6457 if (vmx->nested.vmcs02_num >= max(VMCS02_POOL_SIZE, 1)) { 6459 6458 /* Recycle the least recently used VMCS. 
*/ 6460 - item = list_entry(vmx->nested.vmcs02_pool.prev, 6461 - struct vmcs02_list, list); 6459 + item = list_last_entry(&vmx->nested.vmcs02_pool, 6460 + struct vmcs02_list, list); 6462 6461 item->vmptr = vmx->nested.current_vmptr; 6463 6462 list_move(&item->list, &vmx->nested.vmcs02_pool); 6464 6463 return &item->vmcs02; ··· 7773 7772 return enable_ept; 7774 7773 else if (is_no_device(intr_info) && 7775 7774 !(vmcs12->guest_cr0 & X86_CR0_TS)) 7775 + return false; 7776 + else if (is_debug(intr_info) && 7777 + vcpu->guest_debug & 7778 + (KVM_GUESTDBG_SINGLESTEP | KVM_GUESTDBG_USE_HW_BP)) 7779 + return false; 7780 + else if (is_breakpoint(intr_info) && 7781 + vcpu->guest_debug & KVM_GUESTDBG_USE_SW_BP) 7776 7782 return false; 7777 7783 return vmcs12->exception_bitmap & 7778 7784 (1u << (intr_info & INTR_INFO_VECTOR_MASK)); ··· 10285 10277 vmcs12->guest_sysenter_cs = vmcs_read32(GUEST_SYSENTER_CS); 10286 10278 vmcs12->guest_sysenter_esp = vmcs_readl(GUEST_SYSENTER_ESP); 10287 10279 vmcs12->guest_sysenter_eip = vmcs_readl(GUEST_SYSENTER_EIP); 10288 - if (vmx_mpx_supported()) 10280 + if (kvm_mpx_supported()) 10289 10281 vmcs12->guest_bndcfgs = vmcs_read64(GUEST_BNDCFGS); 10290 10282 if (nested_cpu_has_xsaves(vmcs12)) 10291 10283 vmcs12->xss_exit_bitmap = vmcs_read64(XSS_EXIT_BITMAP); ··· 10793 10785 */ 10794 10786 10795 10787 kvm_set_msi_irq(e, &irq); 10796 - if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) 10788 + if (!kvm_intr_is_single_vcpu(kvm, &irq, &vcpu)) { 10789 + /* 10790 + * Make sure the IRTE is in remapped mode if 10791 + * we don't handle it in posted mode. 
10792 + */ 10793 + ret = irq_set_vcpu_affinity(host_irq, NULL); 10794 + if (ret < 0) { 10795 + printk(KERN_INFO 10796 + "failed to back to remapped mode, irq: %u\n", 10797 + host_irq); 10798 + goto out; 10799 + } 10800 + 10797 10801 continue; 10802 + } 10798 10803 10799 10804 vcpu_info.pi_desc_addr = __pa(vcpu_to_pi_desc(vcpu)); 10800 10805 vcpu_info.vector = irq.vector; 10801 10806 10802 - trace_kvm_pi_irte_update(vcpu->vcpu_id, e->gsi, 10807 + trace_kvm_pi_irte_update(vcpu->vcpu_id, host_irq, e->gsi, 10803 10808 vcpu_info.vector, vcpu_info.pi_desc_addr, set); 10804 10809 10805 10810 if (set)
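The vmx.c cleanup collapses the repeated intr_info comparisons into one is_exception_n() helper. The encoding it tests is the VM-exit interruption-information field from the Intel SDM: bits 7:0 hold the vector, bits 10:8 the event type (3 = hardware exception), and bit 31 the valid flag. A standalone sketch of the helper with those masks spelled out:

```c
#include <assert.h>
#include <stdint.h>

/* VM-exit interruption-information layout (Intel SDM): bits 7:0 vector,
 * bits 10:8 type, bit 31 valid. Values mirror the kernel's definitions. */
#define INTR_INFO_VECTOR_MASK     0xffu
#define INTR_INFO_INTR_TYPE_MASK  0x700u
#define INTR_INFO_VALID_MASK      (1u << 31)
#define INTR_TYPE_HARD_EXCEPTION  (3u << 8)

#define DB_VECTOR 1
#define BP_VECTOR 3
#define PF_VECTOR 14

/* Mirrors the new is_exception_n() helper: a single masked comparison
 * checks the vector, the event type and the valid bit at once. */
static int is_exception_n(uint32_t intr_info, uint8_t vector)
{
    return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VECTOR_MASK |
                         INTR_INFO_VALID_MASK)) ==
           (INTR_TYPE_HARD_EXCEPTION | vector | INTR_INFO_VALID_MASK);
}
```

With the helper in place, is_page_fault(), is_no_device(), is_invalid_opcode() and the new is_debug()/is_breakpoint() each shrink to one line, which is what makes the nested-guest #DB/#BP routing fix later in the diff cheap to add.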
+95 -63
arch/x86/kvm/x86.c
··· 123 123 unsigned int __read_mostly lapic_timer_advance_ns = 0; 124 124 module_param(lapic_timer_advance_ns, uint, S_IRUGO | S_IWUSR); 125 125 126 + static bool __read_mostly vector_hashing = true; 127 + module_param(vector_hashing, bool, S_IRUGO); 128 + 126 129 static bool __read_mostly backwards_tsc_observed = false; 127 130 128 131 #define KVM_NR_SHARED_MSRS 16 ··· 1199 1196 1200 1197 static uint32_t div_frac(uint32_t dividend, uint32_t divisor) 1201 1198 { 1202 - uint32_t quotient, remainder; 1203 - 1204 - /* Don't try to replace with do_div(), this one calculates 1205 - * "(dividend << 32) / divisor" */ 1206 - __asm__ ( "divl %4" 1207 - : "=a" (quotient), "=d" (remainder) 1208 - : "0" (0), "1" (dividend), "r" (divisor) ); 1209 - return quotient; 1199 + do_shl32_div32(dividend, divisor); 1200 + return dividend; 1210 1201 } 1211 1202 1212 - static void kvm_get_time_scale(uint32_t scaled_khz, uint32_t base_khz, 1203 + static void kvm_get_time_scale(uint64_t scaled_hz, uint64_t base_hz, 1213 1204 s8 *pshift, u32 *pmultiplier) 1214 1205 { 1215 1206 uint64_t scaled64; ··· 1211 1214 uint64_t tps64; 1212 1215 uint32_t tps32; 1213 1216 1214 - tps64 = base_khz * 1000LL; 1215 - scaled64 = scaled_khz * 1000LL; 1217 + tps64 = base_hz; 1218 + scaled64 = scaled_hz; 1216 1219 while (tps64 > scaled64*2 || tps64 & 0xffffffff00000000ULL) { 1217 1220 tps64 >>= 1; 1218 1221 shift--; ··· 1230 1233 *pshift = shift; 1231 1234 *pmultiplier = div_frac(scaled64, tps32); 1232 1235 1233 - pr_debug("%s: base_khz %u => %u, shift %d, mul %u\n", 1234 - __func__, base_khz, scaled_khz, shift, *pmultiplier); 1236 + pr_debug("%s: base_hz %llu => %llu, shift %d, mul %u\n", 1237 + __func__, base_hz, scaled_hz, shift, *pmultiplier); 1235 1238 } 1236 1239 1237 1240 #ifdef CONFIG_X86_64 ··· 1290 1293 return 0; 1291 1294 } 1292 1295 1293 - static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 this_tsc_khz) 1296 + static int kvm_set_tsc_khz(struct kvm_vcpu *vcpu, u32 user_tsc_khz) 1294 1297 { 1295 
1298 u32 thresh_lo, thresh_hi; 1296 1299 int use_scaling = 0; 1297 1300 1298 1301 /* tsc_khz can be zero if TSC calibration fails */ 1299 - if (this_tsc_khz == 0) { 1302 + if (user_tsc_khz == 0) { 1300 1303 /* set tsc_scaling_ratio to a safe value */ 1301 1304 vcpu->arch.tsc_scaling_ratio = kvm_default_tsc_scaling_ratio; 1302 1305 return -1; 1303 1306 } 1304 1307 1305 1308 /* Compute a scale to convert nanoseconds in TSC cycles */ 1306 - kvm_get_time_scale(this_tsc_khz, NSEC_PER_SEC / 1000, 1309 + kvm_get_time_scale(user_tsc_khz * 1000LL, NSEC_PER_SEC, 1307 1310 &vcpu->arch.virtual_tsc_shift, 1308 1311 &vcpu->arch.virtual_tsc_mult); 1309 - vcpu->arch.virtual_tsc_khz = this_tsc_khz; 1312 + vcpu->arch.virtual_tsc_khz = user_tsc_khz; 1310 1313 1311 1314 /* 1312 1315 * Compute the variation in TSC rate which is acceptable ··· 1316 1319 */ 1317 1320 thresh_lo = adjust_tsc_khz(tsc_khz, -tsc_tolerance_ppm); 1318 1321 thresh_hi = adjust_tsc_khz(tsc_khz, tsc_tolerance_ppm); 1319 - if (this_tsc_khz < thresh_lo || this_tsc_khz > thresh_hi) { 1320 - pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", this_tsc_khz, thresh_lo, thresh_hi); 1322 + if (user_tsc_khz < thresh_lo || user_tsc_khz > thresh_hi) { 1323 + pr_debug("kvm: requested TSC rate %u falls outside tolerance [%u,%u]\n", user_tsc_khz, thresh_lo, thresh_hi); 1321 1324 use_scaling = 1; 1322 1325 } 1323 - return set_tsc_khz(vcpu, this_tsc_khz, use_scaling); 1326 + return set_tsc_khz(vcpu, user_tsc_khz, use_scaling); 1324 1327 } 1325 1328 1326 1329 static u64 compute_guest_tsc(struct kvm_vcpu *vcpu, s64 kernel_ns) ··· 1713 1716 1714 1717 static int kvm_guest_time_update(struct kvm_vcpu *v) 1715 1718 { 1716 - unsigned long flags, this_tsc_khz, tgt_tsc_khz; 1719 + unsigned long flags, tgt_tsc_khz; 1717 1720 struct kvm_vcpu_arch *vcpu = &v->arch; 1718 1721 struct kvm_arch *ka = &v->kvm->arch; 1719 1722 s64 kernel_ns; ··· 1739 1742 1740 1743 /* Keep irq disabled to prevent changes to the clock */ 1741 
1744 local_irq_save(flags); 1742 - this_tsc_khz = __this_cpu_read(cpu_tsc_khz); 1743 - if (unlikely(this_tsc_khz == 0)) { 1745 + tgt_tsc_khz = __this_cpu_read(cpu_tsc_khz); 1746 + if (unlikely(tgt_tsc_khz == 0)) { 1744 1747 local_irq_restore(flags); 1745 1748 kvm_make_request(KVM_REQ_CLOCK_UPDATE, v); 1746 1749 return 1; ··· 1775 1778 if (!vcpu->pv_time_enabled) 1776 1779 return 0; 1777 1780 1778 - if (unlikely(vcpu->hw_tsc_khz != this_tsc_khz)) { 1779 - tgt_tsc_khz = kvm_has_tsc_control ? 1780 - vcpu->virtual_tsc_khz : this_tsc_khz; 1781 - kvm_get_time_scale(NSEC_PER_SEC / 1000, tgt_tsc_khz, 1781 + if (kvm_has_tsc_control) 1782 + tgt_tsc_khz = kvm_scale_tsc(v, tgt_tsc_khz); 1783 + 1784 + if (unlikely(vcpu->hw_tsc_khz != tgt_tsc_khz)) { 1785 + kvm_get_time_scale(NSEC_PER_SEC, tgt_tsc_khz * 1000LL, 1782 1786 &vcpu->hv_clock.tsc_shift, 1783 1787 &vcpu->hv_clock.tsc_to_system_mul); 1784 - vcpu->hw_tsc_khz = this_tsc_khz; 1788 + vcpu->hw_tsc_khz = tgt_tsc_khz; 1785 1789 } 1786 1790 1787 1791 /* With all the info we got, fill in the values */ ··· 2985 2987 kvm_x86_ops->set_nmi_mask(vcpu, events->nmi.masked); 2986 2988 2987 2989 if (events->flags & KVM_VCPUEVENT_VALID_SIPI_VECTOR && 2988 - kvm_vcpu_has_lapic(vcpu)) 2990 + lapic_in_kernel(vcpu)) 2989 2991 vcpu->arch.apic->sipi_vector = events->sipi_vector; 2990 2992 2991 2993 if (events->flags & KVM_VCPUEVENT_VALID_SMM) { ··· 2998 3000 vcpu->arch.hflags |= HF_SMM_INSIDE_NMI_MASK; 2999 3001 else 3000 3002 vcpu->arch.hflags &= ~HF_SMM_INSIDE_NMI_MASK; 3001 - if (kvm_vcpu_has_lapic(vcpu)) { 3003 + if (lapic_in_kernel(vcpu)) { 3002 3004 if (events->smi.latched_init) 3003 3005 set_bit(KVM_APIC_INIT, &vcpu->arch.apic->pending_events); 3004 3006 else ··· 3238 3240 switch (ioctl) { 3239 3241 case KVM_GET_LAPIC: { 3240 3242 r = -EINVAL; 3241 - if (!vcpu->arch.apic) 3243 + if (!lapic_in_kernel(vcpu)) 3242 3244 goto out; 3243 3245 u.lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); 3244 3246 ··· 3256 3258 } 3257 3259 case 
KVM_SET_LAPIC: { 3258 3260 r = -EINVAL; 3259 - if (!vcpu->arch.apic) 3261 + if (!lapic_in_kernel(vcpu)) 3260 3262 goto out; 3261 3263 u.lapic = memdup_user(argp, sizeof(*u.lapic)); 3262 3264 if (IS_ERR(u.lapic)) ··· 3603 3605 3604 3606 static int kvm_vm_ioctl_get_pit(struct kvm *kvm, struct kvm_pit_state *ps) 3605 3607 { 3606 - mutex_lock(&kvm->arch.vpit->pit_state.lock); 3607 - memcpy(ps, &kvm->arch.vpit->pit_state, sizeof(struct kvm_pit_state)); 3608 - mutex_unlock(&kvm->arch.vpit->pit_state.lock); 3608 + struct kvm_kpit_state *kps = &kvm->arch.vpit->pit_state; 3609 + 3610 + BUILD_BUG_ON(sizeof(*ps) != sizeof(kps->channels)); 3611 + 3612 + mutex_lock(&kps->lock); 3613 + memcpy(ps, &kps->channels, sizeof(*ps)); 3614 + mutex_unlock(&kps->lock); 3609 3615 return 0; 3610 3616 } 3611 3617 3612 3618 static int kvm_vm_ioctl_set_pit(struct kvm *kvm, struct kvm_pit_state *ps) 3613 3619 { 3614 3620 int i; 3615 - mutex_lock(&kvm->arch.vpit->pit_state.lock); 3616 - memcpy(&kvm->arch.vpit->pit_state, ps, sizeof(struct kvm_pit_state)); 3621 + struct kvm_pit *pit = kvm->arch.vpit; 3622 + 3623 + mutex_lock(&pit->pit_state.lock); 3624 + memcpy(&pit->pit_state.channels, ps, sizeof(*ps)); 3617 3625 for (i = 0; i < 3; i++) 3618 - kvm_pit_load_count(kvm, i, ps->channels[i].count, 0); 3619 - mutex_unlock(&kvm->arch.vpit->pit_state.lock); 3626 + kvm_pit_load_count(pit, i, ps->channels[i].count, 0); 3627 + mutex_unlock(&pit->pit_state.lock); 3620 3628 return 0; 3621 3629 } 3622 3630 ··· 3642 3638 int start = 0; 3643 3639 int i; 3644 3640 u32 prev_legacy, cur_legacy; 3645 - mutex_lock(&kvm->arch.vpit->pit_state.lock); 3646 - prev_legacy = kvm->arch.vpit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY; 3641 + struct kvm_pit *pit = kvm->arch.vpit; 3642 + 3643 + mutex_lock(&pit->pit_state.lock); 3644 + prev_legacy = pit->pit_state.flags & KVM_PIT_FLAGS_HPET_LEGACY; 3647 3645 cur_legacy = ps->flags & KVM_PIT_FLAGS_HPET_LEGACY; 3648 3646 if (!prev_legacy && cur_legacy) 3649 3647 start = 1; 3650 
- memcpy(&kvm->arch.vpit->pit_state.channels, &ps->channels, 3651 - sizeof(kvm->arch.vpit->pit_state.channels)); 3652 - kvm->arch.vpit->pit_state.flags = ps->flags; 3648 + memcpy(&pit->pit_state.channels, &ps->channels, 3649 + sizeof(pit->pit_state.channels)); 3650 + pit->pit_state.flags = ps->flags; 3653 3651 for (i = 0; i < 3; i++) 3654 - kvm_pit_load_count(kvm, i, kvm->arch.vpit->pit_state.channels[i].count, 3652 + kvm_pit_load_count(pit, i, pit->pit_state.channels[i].count, 3655 3653 start && i == 0); 3656 - mutex_unlock(&kvm->arch.vpit->pit_state.lock); 3654 + mutex_unlock(&pit->pit_state.lock); 3657 3655 return 0; 3658 3656 } 3659 3657 3660 3658 static int kvm_vm_ioctl_reinject(struct kvm *kvm, 3661 3659 struct kvm_reinject_control *control) 3662 3660 { 3663 - if (!kvm->arch.vpit) 3661 + struct kvm_pit *pit = kvm->arch.vpit; 3662 + 3663 + if (!pit) 3664 3664 return -ENXIO; 3665 - mutex_lock(&kvm->arch.vpit->pit_state.lock); 3666 - kvm->arch.vpit->pit_state.reinject = control->pit_reinject; 3667 - mutex_unlock(&kvm->arch.vpit->pit_state.lock); 3665 + 3666 + /* pit->pit_state.lock was overloaded to prevent userspace from getting 3667 + * an inconsistent state after running multiple KVM_REINJECT_CONTROL 3668 + * ioctls in parallel. Use a separate lock if that ioctl isn't rare. 
3669 + */ 3670 + mutex_lock(&pit->pit_state.lock); 3671 + kvm_pit_set_reinject(pit, control->pit_reinject); 3672 + mutex_unlock(&pit->pit_state.lock); 3673 + 3668 3674 return 0; 3669 3675 } 3670 3676 ··· 4107 4093 4108 4094 do { 4109 4095 n = min(len, 8); 4110 - if (!(vcpu->arch.apic && 4096 + if (!(lapic_in_kernel(vcpu) && 4111 4097 !kvm_iodevice_write(vcpu, &vcpu->arch.apic->dev, addr, n, v)) 4112 4098 && kvm_io_bus_write(vcpu, KVM_MMIO_BUS, addr, n, v)) 4113 4099 break; ··· 4127 4113 4128 4114 do { 4129 4115 n = min(len, 8); 4130 - if (!(vcpu->arch.apic && 4116 + if (!(lapic_in_kernel(vcpu) && 4131 4117 !kvm_iodevice_read(vcpu, &vcpu->arch.apic->dev, 4132 4118 addr, n, v)) 4133 4119 && kvm_io_bus_read(vcpu, KVM_MMIO_BUS, addr, n, v)) ··· 4360 4346 ret = kvm_vcpu_write_guest(vcpu, gpa, val, bytes); 4361 4347 if (ret < 0) 4362 4348 return 0; 4363 - kvm_mmu_pte_write(vcpu, gpa, val, bytes); 4349 + kvm_page_track_write(vcpu, gpa, val, bytes); 4364 4350 return 1; 4365 4351 } 4366 4352 ··· 4618 4604 return X86EMUL_CMPXCHG_FAILED; 4619 4605 4620 4606 kvm_vcpu_mark_page_dirty(vcpu, gpa >> PAGE_SHIFT); 4621 - kvm_mmu_pte_write(vcpu, gpa, new, bytes); 4607 + kvm_page_track_write(vcpu, gpa, new, bytes); 4622 4608 4623 4609 return X86EMUL_CONTINUE; 4624 4610 ··· 6024 6010 if (!kvm_x86_ops->update_cr8_intercept) 6025 6011 return; 6026 6012 6027 - if (!vcpu->arch.apic) 6013 + if (!lapic_in_kernel(vcpu)) 6028 6014 return; 6029 6015 6030 6016 if (vcpu->arch.apicv_active) ··· 7052 7038 int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, 7053 7039 struct kvm_mp_state *mp_state) 7054 7040 { 7055 - if (!kvm_vcpu_has_lapic(vcpu) && 7041 + if (!lapic_in_kernel(vcpu) && 7056 7042 mp_state->mp_state != KVM_MP_STATE_RUNNABLE) 7057 7043 return -EINVAL; 7058 7044 ··· 7328 7314 * Every 255 times fpu_counter rolls over to 0; a guest that uses 7329 7315 * the FPU in bursts will revert to loading it on demand. 
7330 7316 */ 7331 - if (!vcpu->arch.eager_fpu) { 7317 + if (!use_eager_fpu()) { 7332 7318 if (++vcpu->fpu_counter < 5) 7333 7319 kvm_make_request(KVM_REQ_DEACTIVATE_FPU, vcpu); 7334 7320 } ··· 7607 7593 } 7608 7594 7609 7595 struct static_key kvm_no_apic_vcpu __read_mostly; 7596 + EXPORT_SYMBOL_GPL(kvm_no_apic_vcpu); 7610 7597 7611 7598 int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu) 7612 7599 { ··· 7739 7724 INIT_DELAYED_WORK(&kvm->arch.kvmclock_update_work, kvmclock_update_fn); 7740 7725 INIT_DELAYED_WORK(&kvm->arch.kvmclock_sync_work, kvmclock_sync_fn); 7741 7726 7727 + kvm_page_track_init(kvm); 7728 + kvm_mmu_init_vm(kvm); 7729 + 7742 7730 return 0; 7743 7731 } 7744 7732 ··· 7868 7850 kfree(kvm->arch.vioapic); 7869 7851 kvm_free_vcpus(kvm); 7870 7852 kfree(rcu_dereference_check(kvm->arch.apic_map, 1)); 7853 + kvm_mmu_uninit_vm(kvm); 7871 7854 } 7872 7855 7873 7856 void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free, ··· 7890 7871 free->arch.lpage_info[i - 1] = NULL; 7891 7872 } 7892 7873 } 7874 + 7875 + kvm_page_track_free_memslot(free, dont); 7893 7876 } 7894 7877 7895 7878 int kvm_arch_create_memslot(struct kvm *kvm, struct kvm_memory_slot *slot, ··· 7900 7879 int i; 7901 7880 7902 7881 for (i = 0; i < KVM_NR_PAGE_SIZES; ++i) { 7882 + struct kvm_lpage_info *linfo; 7903 7883 unsigned long ugfn; 7904 7884 int lpages; 7905 7885 int level = i + 1; ··· 7915 7893 if (i == 0) 7916 7894 continue; 7917 7895 7918 - slot->arch.lpage_info[i - 1] = kvm_kvzalloc(lpages * 7919 - sizeof(*slot->arch.lpage_info[i - 1])); 7920 - if (!slot->arch.lpage_info[i - 1]) 7896 + linfo = kvm_kvzalloc(lpages * sizeof(*linfo)); 7897 + if (!linfo) 7921 7898 goto out_free; 7922 7899 7900 + slot->arch.lpage_info[i - 1] = linfo; 7901 + 7923 7902 if (slot->base_gfn & (KVM_PAGES_PER_HPAGE(level) - 1)) 7924 - slot->arch.lpage_info[i - 1][0].write_count = 1; 7903 + linfo[0].disallow_lpage = 1; 7925 7904 if ((slot->base_gfn + npages) & (KVM_PAGES_PER_HPAGE(level) - 1)) 7926 - 
slot->arch.lpage_info[i - 1][lpages - 1].write_count = 1; 7905 + linfo[lpages - 1].disallow_lpage = 1; 7927 7906 ugfn = slot->userspace_addr >> PAGE_SHIFT; 7928 7907 /* 7929 7908 * If the gfn and userspace address are not aligned wrt each ··· 7936 7913 unsigned long j; 7937 7914 7938 7915 for (j = 0; j < lpages; ++j) 7939 - slot->arch.lpage_info[i - 1][j].write_count = 1; 7916 + linfo[j].disallow_lpage = 1; 7940 7917 } 7941 7918 } 7919 + 7920 + if (kvm_page_track_create_memslot(slot, npages)) 7921 + goto out_free; 7942 7922 7943 7923 return 0; 7944 7924 ··· 8395 8369 8396 8370 return kvm_x86_ops->update_pi_irte(kvm, host_irq, guest_irq, set); 8397 8371 } 8372 + 8373 + bool kvm_vector_hashing_enabled(void) 8374 + { 8375 + return vector_hashing; 8376 + } 8377 + EXPORT_SYMBOL_GPL(kvm_vector_hashing_enabled); 8398 8378 8399 8379 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_exit); 8400 8380 EXPORT_TRACEPOINT_SYMBOL_GPL(kvm_fast_mmio);
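The x86.c hunk above exports `kvm_vector_hashing_enabled()`, the switch for the alternative lowest-priority interrupt delivery mentioned in the pull: instead of round-robin, the target vCPU is picked by hashing the interrupt vector over the candidate destinations, so a given vector always lands on the same vCPU (which suits VT-d posted interrupts). A minimal userspace model of that steering idea; the function name and modulo hash are illustrative, not the kernel's exact code:

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of vector-hashed lowest-priority delivery:
 * the destination index depends only on the vector and the number
 * of candidate vCPUs, so repeated interrupts on one vector are
 * steered to a stable target instead of rotating round-robin. */
static size_t pick_lowest_prio_target(uint8_t vector, size_t nr_candidates)
{
	if (nr_candidates == 0)
		return 0; /* no destinations; caller must handle this */
	return vector % nr_candidates;
}
```

The stability property is the point: delivering vector 0x31 twice to the same 4-vCPU set picks the same target both times.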
+16
arch/x86/kvm/x86.h
··· 179 179 int kvm_mtrr_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata); 180 180 bool kvm_mtrr_check_gfn_range_consistency(struct kvm_vcpu *vcpu, gfn_t gfn, 181 181 int page_num); 182 + bool kvm_vector_hashing_enabled(void); 182 183 183 184 #define KVM_SUPPORTED_XCR0 (XFEATURE_MASK_FP | XFEATURE_MASK_SSE \ 184 185 | XFEATURE_MASK_YMM | XFEATURE_MASK_BNDREGS \ ··· 193 192 extern unsigned int lapic_timer_advance_ns; 194 193 195 194 extern struct static_key kvm_no_apic_vcpu; 195 + 196 + /* Same "calling convention" as do_div: 197 + * - divide (n << 32) by base 198 + * - put result in n 199 + * - return remainder 200 + */ 201 + #define do_shl32_div32(n, base) \ 202 + ({ \ 203 + u32 __quot, __rem; \ 204 + asm("divl %2" : "=a" (__quot), "=d" (__rem) \ 205 + : "rm" (base), "0" (0), "1" ((u32) n)); \ 206 + n = __quot; \ 207 + __rem; \ 208 + }) 209 + 196 210 #endif
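The `do_shl32_div32()` macro added to x86.h documents its own contract: divide `(n << 32)` by `base`, leave the quotient in `n`, return the remainder, mirroring `do_div()`. The kernel version is a single `divl`; a portable C model of the same arithmetic (with the caveat that, like `divl`, the quotient must fit in 32 bits):

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of do_shl32_div32(): divide (n << 32) by base,
 * store the 32-bit quotient back through n, return the remainder.
 * The kernel macro does this with one divl instruction; here we
 * use 64-bit arithmetic instead. */
static uint32_t shl32_div32(uint32_t *n, uint32_t base)
{
	uint64_t dividend = (uint64_t)*n << 32;
	uint32_t rem = (uint32_t)(dividend % base);

	*n = (uint32_t)(dividend / base); /* caller ensures this fits */
	return rem;
}

/* Convenience wrappers so each half of the contract can be
 * checked independently. */
static uint32_t shl32_quot(uint32_t n, uint32_t base)
{
	shl32_div32(&n, base);
	return n;
}

static uint32_t shl32_rem(uint32_t n, uint32_t base)
{
	return shl32_div32(&n, base);
}
```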
+59 -37
drivers/clocksource/arm_arch_timer.c
··· 75 75 76 76 static struct clock_event_device __percpu *arch_timer_evt; 77 77 78 - static bool arch_timer_use_virtual = true; 78 + static enum ppi_nr arch_timer_uses_ppi = VIRT_PPI; 79 79 static bool arch_timer_c3stop; 80 80 static bool arch_timer_mem_use_virtual; 81 81 ··· 271 271 clk->name = "arch_sys_timer"; 272 272 clk->rating = 450; 273 273 clk->cpumask = cpumask_of(smp_processor_id()); 274 - if (arch_timer_use_virtual) { 275 - clk->irq = arch_timer_ppi[VIRT_PPI]; 274 + clk->irq = arch_timer_ppi[arch_timer_uses_ppi]; 275 + switch (arch_timer_uses_ppi) { 276 + case VIRT_PPI: 276 277 clk->set_state_shutdown = arch_timer_shutdown_virt; 277 278 clk->set_state_oneshot_stopped = arch_timer_shutdown_virt; 278 279 clk->set_next_event = arch_timer_set_next_event_virt; 279 - } else { 280 - clk->irq = arch_timer_ppi[PHYS_SECURE_PPI]; 280 + break; 281 + case PHYS_SECURE_PPI: 282 + case PHYS_NONSECURE_PPI: 283 + case HYP_PPI: 281 284 clk->set_state_shutdown = arch_timer_shutdown_phys; 282 285 clk->set_state_oneshot_stopped = arch_timer_shutdown_phys; 283 286 clk->set_next_event = arch_timer_set_next_event_phys; 287 + break; 288 + default: 289 + BUG(); 284 290 } 285 291 } else { 286 292 clk->features |= CLOCK_EVT_FEAT_DYNIRQ; ··· 356 350 arch_timer_set_cntkctl(cntkctl); 357 351 } 358 352 353 + static bool arch_timer_has_nonsecure_ppi(void) 354 + { 355 + return (arch_timer_uses_ppi == PHYS_SECURE_PPI && 356 + arch_timer_ppi[PHYS_NONSECURE_PPI]); 357 + } 358 + 359 359 static int arch_timer_setup(struct clock_event_device *clk) 360 360 { 361 361 __arch_timer_setup(ARCH_CP15_TIMER, clk); 362 362 363 - if (arch_timer_use_virtual) 364 - enable_percpu_irq(arch_timer_ppi[VIRT_PPI], 0); 365 - else { 366 - enable_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI], 0); 367 - if (arch_timer_ppi[PHYS_NONSECURE_PPI]) 368 - enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0); 369 - } 363 + enable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], 0); 364 + 365 + if 
(arch_timer_has_nonsecure_ppi()) 366 + enable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 0); 370 367 371 368 arch_counter_set_user_access(); 372 369 if (IS_ENABLED(CONFIG_ARM_ARCH_TIMER_EVTSTREAM)) ··· 411 402 (unsigned long)arch_timer_rate / 1000000, 412 403 (unsigned long)(arch_timer_rate / 10000) % 100, 413 404 type & ARCH_CP15_TIMER ? 414 - arch_timer_use_virtual ? "virt" : "phys" : 405 + (arch_timer_uses_ppi == VIRT_PPI) ? "virt" : "phys" : 415 406 "", 416 407 type == (ARCH_CP15_TIMER | ARCH_MEM_TIMER) ? "/" : "", 417 408 type & ARCH_MEM_TIMER ? ··· 481 472 482 473 /* Register the CP15 based counter if we have one */ 483 474 if (type & ARCH_CP15_TIMER) { 484 - if (IS_ENABLED(CONFIG_ARM64) || arch_timer_use_virtual) 475 + if (IS_ENABLED(CONFIG_ARM64) || arch_timer_uses_ppi == VIRT_PPI) 485 476 arch_timer_read_counter = arch_counter_get_cntvct; 486 477 else 487 478 arch_timer_read_counter = arch_counter_get_cntpct; ··· 511 502 pr_debug("arch_timer_teardown disable IRQ%d cpu #%d\n", 512 503 clk->irq, smp_processor_id()); 513 504 514 - if (arch_timer_use_virtual) 515 - disable_percpu_irq(arch_timer_ppi[VIRT_PPI]); 516 - else { 517 - disable_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI]); 518 - if (arch_timer_ppi[PHYS_NONSECURE_PPI]) 519 - disable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI]); 520 - } 505 + disable_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi]); 506 + if (arch_timer_has_nonsecure_ppi()) 507 + disable_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI]); 521 508 522 509 clk->set_state_shutdown(clk); 523 510 } ··· 579 574 goto out; 580 575 } 581 576 582 - if (arch_timer_use_virtual) { 583 - ppi = arch_timer_ppi[VIRT_PPI]; 577 + ppi = arch_timer_ppi[arch_timer_uses_ppi]; 578 + switch (arch_timer_uses_ppi) { 579 + case VIRT_PPI: 584 580 err = request_percpu_irq(ppi, arch_timer_handler_virt, 585 581 "arch_timer", arch_timer_evt); 586 - } else { 587 - ppi = arch_timer_ppi[PHYS_SECURE_PPI]; 582 + break; 583 + case PHYS_SECURE_PPI: 584 + case 
PHYS_NONSECURE_PPI: 588 585 err = request_percpu_irq(ppi, arch_timer_handler_phys, 589 586 "arch_timer", arch_timer_evt); 590 587 if (!err && arch_timer_ppi[PHYS_NONSECURE_PPI]) { ··· 597 590 free_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI], 598 591 arch_timer_evt); 599 592 } 593 + break; 594 + case HYP_PPI: 595 + err = request_percpu_irq(ppi, arch_timer_handler_phys, 596 + "arch_timer", arch_timer_evt); 597 + break; 598 + default: 599 + BUG(); 600 600 } 601 601 602 602 if (err) { ··· 628 614 out_unreg_notify: 629 615 unregister_cpu_notifier(&arch_timer_cpu_nb); 630 616 out_free_irq: 631 - if (arch_timer_use_virtual) 632 - free_percpu_irq(arch_timer_ppi[VIRT_PPI], arch_timer_evt); 633 - else { 634 - free_percpu_irq(arch_timer_ppi[PHYS_SECURE_PPI], 617 + free_percpu_irq(arch_timer_ppi[arch_timer_uses_ppi], arch_timer_evt); 618 + if (arch_timer_has_nonsecure_ppi()) 619 + free_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 635 620 arch_timer_evt); 636 - if (arch_timer_ppi[PHYS_NONSECURE_PPI]) 637 - free_percpu_irq(arch_timer_ppi[PHYS_NONSECURE_PPI], 638 - arch_timer_evt); 639 - } 640 621 641 622 out_free: 642 623 free_percpu(arch_timer_evt); ··· 718 709 * 719 710 * If no interrupt provided for virtual timer, we'll have to 720 711 * stick to the physical timer. It'd better be accessible... 712 + * 713 + * On ARMv8.1 with VH extensions, the kernel runs in HYP. VHE 714 + * accesses to CNTP_*_EL1 registers are silently redirected to 715 + * their CNTHP_*_EL2 counterparts, and use a different PPI 716 + * number. 
721 717 */ 722 718 if (is_hyp_mode_available() || !arch_timer_ppi[VIRT_PPI]) { 723 - arch_timer_use_virtual = false; 719 + bool has_ppi; 724 720 725 - if (!arch_timer_ppi[PHYS_SECURE_PPI] || 726 - !arch_timer_ppi[PHYS_NONSECURE_PPI]) { 721 + if (is_kernel_in_hyp_mode()) { 722 + arch_timer_uses_ppi = HYP_PPI; 723 + has_ppi = !!arch_timer_ppi[HYP_PPI]; 724 + } else { 725 + arch_timer_uses_ppi = PHYS_SECURE_PPI; 726 + has_ppi = (!!arch_timer_ppi[PHYS_SECURE_PPI] || 727 + !!arch_timer_ppi[PHYS_NONSECURE_PPI]); 728 + } 729 + 730 + if (!has_ppi) { 727 731 pr_warn("arch_timer: No interrupt available, giving up\n"); 728 732 return; 729 733 } ··· 769 747 */ 770 748 if (IS_ENABLED(CONFIG_ARM) && 771 749 of_property_read_bool(np, "arm,cpu-registers-not-fw-configured")) 772 - arch_timer_use_virtual = false; 750 + arch_timer_uses_ppi = PHYS_SECURE_PPI; 773 751 774 752 arch_timer_init(); 775 753 }
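The arch_timer.c rework replaces the `arch_timer_use_virtual` bool with an `enum ppi_nr`, because VHE adds a third option: when the kernel itself runs at EL2, CNTP accesses are redirected to CNTHP and a dedicated hypervisor PPI is used. A sketch of the resulting selection logic; the enum ordering mirrors the driver's `ppi_nr`, but the helper and its flag arguments are illustrative:

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the driver's ppi_nr ordering (assumed). */
enum ppi_nr {
	PHYS_SECURE_PPI,
	PHYS_NONSECURE_PPI,
	VIRT_PPI,
	HYP_PPI,
	MAX_TIMER_PPI
};

/* Hypothetical model of the PPI choice in arch_timer_init():
 * prefer the virtual timer unless a hypervisor is (or could be)
 * using it, in which case a VHE kernel takes the HYP PPI and a
 * non-VHE kernel falls back to the physical timer PPIs. */
static enum ppi_nr select_timer_ppi(bool hyp_mode_available,
				    bool kernel_in_hyp,
				    bool have_virt_ppi)
{
	if (!hyp_mode_available && have_virt_ppi)
		return VIRT_PPI;
	return kernel_in_hyp ? HYP_PPI : PHYS_SECURE_PPI;
}
```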
-6
drivers/hv/hyperv_vmbus.h
··· 256 256 u8 rsvdz4[1984]; 257 257 }; 258 258 259 - /* Declare the various hypercall operations. */ 260 - enum hv_call_code { 261 - HVCALL_POST_MESSAGE = 0x005c, 262 - HVCALL_SIGNAL_EVENT = 0x005d, 263 - }; 264 - 265 259 /* Definition of the hv_post_message hypercall input structure. */ 266 260 struct hv_input_post_message { 267 261 union hv_connection_id connectionid;
+5
include/kvm/arm_arch_timer.h
··· 55 55 56 56 /* VGIC mapping */ 57 57 struct irq_phys_map *map; 58 + 59 + /* Active IRQ state caching */ 60 + bool active_cleared_last; 58 61 }; 59 62 60 63 int kvm_timer_hyp_init(void); ··· 76 73 bool kvm_timer_should_fire(struct kvm_vcpu *vcpu); 77 74 void kvm_timer_schedule(struct kvm_vcpu *vcpu); 78 75 void kvm_timer_unschedule(struct kvm_vcpu *vcpu); 76 + 77 + void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu); 79 78 80 79 #endif
+110
include/kvm/arm_pmu.h
··· 1 + /* 2 + * Copyright (C) 2015 Linaro Ltd. 3 + * Author: Shannon Zhao <shannon.zhao@linaro.org> 4 + * 5 + * This program is free software; you can redistribute it and/or modify 6 + * it under the terms of the GNU General Public License version 2 as 7 + * published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it will be useful, 10 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 + * GNU General Public License for more details. 13 + * 14 + * You should have received a copy of the GNU General Public License 15 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 + */ 17 + 18 + #ifndef __ASM_ARM_KVM_PMU_H 19 + #define __ASM_ARM_KVM_PMU_H 20 + 21 + #ifdef CONFIG_KVM_ARM_PMU 22 + 23 + #include <linux/perf_event.h> 24 + #include <asm/perf_event.h> 25 + 26 + #define ARMV8_PMU_CYCLE_IDX (ARMV8_PMU_MAX_COUNTERS - 1) 27 + 28 + struct kvm_pmc { 29 + u8 idx; /* index into the pmu->pmc array */ 30 + struct perf_event *perf_event; 31 + u64 bitmask; 32 + }; 33 + 34 + struct kvm_pmu { 35 + int irq_num; 36 + struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS]; 37 + bool ready; 38 + bool irq_level; 39 + }; 40 + 41 + #define kvm_arm_pmu_v3_ready(v) ((v)->arch.pmu.ready) 42 + #define kvm_arm_pmu_irq_initialized(v) ((v)->arch.pmu.irq_num >= VGIC_NR_SGIS) 43 + u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx); 44 + void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val); 45 + u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu); 46 + void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu); 47 + void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu); 48 + void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val); 49 + void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val); 50 + void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val); 51 + void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu); 52 + void 
kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu); 53 + void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val); 54 + void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val); 55 + void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data, 56 + u64 select_idx); 57 + bool kvm_arm_support_pmu_v3(void); 58 + int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, 59 + struct kvm_device_attr *attr); 60 + int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, 61 + struct kvm_device_attr *attr); 62 + int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, 63 + struct kvm_device_attr *attr); 64 + #else 65 + struct kvm_pmu { 66 + }; 67 + 68 + #define kvm_arm_pmu_v3_ready(v) (false) 69 + #define kvm_arm_pmu_irq_initialized(v) (false) 70 + static inline u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, 71 + u64 select_idx) 72 + { 73 + return 0; 74 + } 75 + static inline void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, 76 + u64 select_idx, u64 val) {} 77 + static inline u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu) 78 + { 79 + return 0; 80 + } 81 + static inline void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) {} 82 + static inline void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) {} 83 + static inline void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) {} 84 + static inline void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) {} 85 + static inline void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) {} 86 + static inline void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu) {} 87 + static inline void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu) {} 88 + static inline void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val) {} 89 + static inline void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val) {} 90 + static inline void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, 91 + u64 data, u64 select_idx) {} 92 + static inline bool kvm_arm_support_pmu_v3(void) { return false; } 93 + static inline int kvm_arm_pmu_v3_set_attr(struct 
kvm_vcpu *vcpu, 94 + struct kvm_device_attr *attr) 95 + { 96 + return -ENXIO; 97 + } 98 + static inline int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, 99 + struct kvm_device_attr *attr) 100 + { 101 + return -ENXIO; 102 + } 103 + static inline int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, 104 + struct kvm_device_attr *attr) 105 + { 106 + return -ENXIO; 107 + } 108 + #endif 109 + 110 + #endif
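The new arm_pmu.h places the cycle counter at the highest index (`ARMV8_PMU_CYCLE_IDX = ARMV8_PMU_MAX_COUNTERS - 1`), and pmu.c's `kvm_pmu_valid_counter_mask()` builds the usable-counter bitmap from PMCR_EL0.N plus that always-present cycle counter. A model of the mask computation, assuming the ARMv8 maximum of 32 counters (the constant names here are illustrative):

```c
#include <assert.h>
#include <stdint.h>

#define PMU_MAX_COUNTERS 32                     /* assumed ARMv8 max */
#define PMU_CYCLE_IDX    (PMU_MAX_COUNTERS - 1) /* cycle counter slot */

/* Model of kvm_pmu_valid_counter_mask(): PMCR_EL0.N event counters
 * occupy the low bits; the cycle counter bit is always set even
 * when N is zero. GENMASK(n - 1, 0) is (1 << n) - 1. */
static uint64_t valid_counter_mask(uint32_t pmcr_n)
{
	uint64_t cycle = 1ULL << PMU_CYCLE_IDX;

	if (pmcr_n == 0)
		return cycle;
	return ((1ULL << pmcr_n) - 1) | cycle;
}
```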
+2 -6
include/kvm/arm_vgic.h
··· 279 279 u32 vgic_lr[VGIC_V2_MAX_LRS]; 280 280 }; 281 281 282 - /* 283 - * LRs are stored in reverse order in memory. make sure we index them 284 - * correctly. 285 - */ 286 - #define VGIC_V3_LR_INDEX(lr) (VGIC_V3_MAX_LRS - 1 - lr) 287 - 288 282 struct vgic_v3_cpu_if { 289 283 #ifdef CONFIG_KVM_ARM_VGIC_V3 290 284 u32 vgic_hcr; ··· 315 321 316 322 /* Protected by the distributor's irq_phys_map_lock */ 317 323 struct list_head irq_phys_map_list; 324 + 325 + u64 live_lrs; 318 326 }; 319 327 320 328 #define LR_EMPTY 0xff
+5 -4
include/trace/events/kvm.h
··· 359 359 #endif 360 360 361 361 TRACE_EVENT(kvm_halt_poll_ns, 362 - TP_PROTO(bool grow, unsigned int vcpu_id, int new, int old), 362 + TP_PROTO(bool grow, unsigned int vcpu_id, unsigned int new, 363 + unsigned int old), 363 364 TP_ARGS(grow, vcpu_id, new, old), 364 365 365 366 TP_STRUCT__entry( 366 367 __field(bool, grow) 367 368 __field(unsigned int, vcpu_id) 368 - __field(int, new) 369 - __field(int, old) 369 + __field(unsigned int, new) 370 + __field(unsigned int, old) 370 371 ), 371 372 372 373 TP_fast_assign( ··· 377 376 __entry->old = old; 378 377 ), 379 378 380 - TP_printk("vcpu %u: halt_poll_ns %d (%s %d)", 379 + TP_printk("vcpu %u: halt_poll_ns %u (%s %u)", 381 380 __entry->vcpu_id, 382 381 __entry->new, 383 382 __entry->grow ? "grow" : "shrink",
+18 -1
include/uapi/linux/kvm.h
··· 157 157 158 158 struct kvm_hyperv_exit { 159 159 #define KVM_EXIT_HYPERV_SYNIC 1 160 + #define KVM_EXIT_HYPERV_HCALL 2 160 161 __u32 type; 161 162 union { 162 163 struct { ··· 166 165 __u64 evt_page; 167 166 __u64 msg_page; 168 167 } synic; 168 + struct { 169 + __u64 input; 170 + __u64 result; 171 + __u64 params[2]; 172 + } hcall; 169 173 } u; 170 174 }; 171 175 ··· 547 541 __u8 exc_access_id; 548 542 __u8 per_access_id; 549 543 __u8 op_access_id; 550 - __u8 pad[3]; 544 + #define KVM_S390_PGM_FLAGS_ILC_VALID 0x01 545 + #define KVM_S390_PGM_FLAGS_ILC_0 0x02 546 + #define KVM_S390_PGM_FLAGS_ILC_1 0x04 547 + #define KVM_S390_PGM_FLAGS_ILC_MASK 0x06 548 + #define KVM_S390_PGM_FLAGS_NO_REWIND 0x08 549 + __u8 flags; 550 + __u8 pad[2]; 551 551 }; 552 552 553 553 struct kvm_s390_prefix_info { ··· 862 850 #define KVM_CAP_IOEVENTFD_ANY_LENGTH 122 863 851 #define KVM_CAP_HYPERV_SYNIC 123 864 852 #define KVM_CAP_S390_RI 124 853 + #define KVM_CAP_SPAPR_TCE_64 125 854 + #define KVM_CAP_ARM_PMU_V3 126 855 + #define KVM_CAP_VCPU_ATTRIBUTES 127 865 856 866 857 #ifdef KVM_CAP_IRQ_ROUTING 867 858 ··· 1157 1142 /* Available with KVM_CAP_PPC_ALLOC_HTAB */ 1158 1143 #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32) 1159 1144 #define KVM_CREATE_SPAPR_TCE _IOW(KVMIO, 0xa8, struct kvm_create_spapr_tce) 1145 + #define KVM_CREATE_SPAPR_TCE_64 _IOW(KVMIO, 0xa8, \ 1146 + struct kvm_create_spapr_tce_64) 1160 1147 /* Available with KVM_CAP_RMA */ 1161 1148 #define KVM_ALLOCATE_RMA _IOR(KVMIO, 0xa9, struct kvm_allocate_rma) 1162 1149 /* Available with KVM_CAP_PPC_HTAB_FD */
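The s390 uapi change turns one pad byte of `kvm_s390_pgm_info` into a `flags` field: bit 0 says the instruction-length code (ILC) is valid, and bits 1-2 (masked by `KVM_S390_PGM_FLAGS_ILC_MASK = 0x06`) carry the ILC itself. A sketch of how userspace might decode it; the shift by 1 follows directly from the mask value, and the helper names are invented for illustration:

```c
#include <assert.h>
#include <stdint.h>
#include <stdbool.h>

#define KVM_S390_PGM_FLAGS_ILC_VALID 0x01
#define KVM_S390_PGM_FLAGS_ILC_MASK  0x06

/* Extract the ILC from the new flags byte; returns false when the
 * ILC bits are not valid for this program interruption. */
static bool pgm_ilc(uint8_t flags, uint8_t *ilc)
{
	if (!(flags & KVM_S390_PGM_FLAGS_ILC_VALID))
		return false;
	*ilc = (flags & KVM_S390_PGM_FLAGS_ILC_MASK) >> 1;
	return true;
}

/* Test helper: ILC value, or -1 when invalid. */
static int pgm_ilc_or_neg(uint8_t flags)
{
	uint8_t ilc;

	return pgm_ilc(flags, &ilc) ? ilc : -1;
}
```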
+31
virt/kvm/arm/arch_timer.c
··· 34 34 static struct workqueue_struct *wqueue; 35 35 static unsigned int host_vtimer_irq; 36 36 37 + void kvm_timer_vcpu_put(struct kvm_vcpu *vcpu) 38 + { 39 + vcpu->arch.timer_cpu.active_cleared_last = false; 40 + } 41 + 37 42 static cycle_t kvm_phys_timer_read(void) 38 43 { 39 44 return timecounter->cc->read(timecounter->cc); ··· 135 130 136 131 BUG_ON(!vgic_initialized(vcpu->kvm)); 137 132 133 + timer->active_cleared_last = false; 138 134 timer->irq.level = new_level; 139 135 trace_kvm_timer_update_irq(vcpu->vcpu_id, timer->map->virt_irq, 140 136 timer->irq.level); ··· 251 245 else 252 246 phys_active = false; 253 247 248 + /* 249 + * We want to avoid hitting the (re)distributor as much as 250 + * possible, as this is a potentially expensive MMIO access 251 + * (not to mention locks in the irq layer), and a solution for 252 + * this is to cache the "active" state in memory. 253 + * 254 + * Things to consider: we cannot cache an "active set" state, 255 + * because the HW can change this behind our back (it becomes 256 + * "clear" in the HW). We must then restrict the caching to 257 + * the "clear" state. 258 + * 259 + * The cache is invalidated on: 260 + * - vcpu put, indicating that the HW cannot be trusted to be 261 + * in a sane state on the next vcpu load, 262 + * - any change in the interrupt state 263 + * 264 + * Usage conditions: 265 + * - cached value is "active clear" 266 + * - value to be programmed is "active clear" 267 + */ 268 + if (timer->active_cleared_last && !phys_active) 269 + return; 270 + 254 271 ret = irq_set_irqchip_state(timer->map->irq, 255 272 IRQCHIP_STATE_ACTIVE, 256 273 phys_active); 257 274 WARN_ON(ret); 275 + 276 + timer->active_cleared_last = !phys_active; 258 277 } 259 278 260 279 /**
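The comment in the arch_timer.c hunk spells out the caching rule: only the "active clear" state may be cached (hardware can flip "active set" to clear behind our back), and the cache is invalidated on vcpu put and on any interrupt-state change. A small model of the resulting skip-the-MMIO decision; this is an illustrative helper, not the kernel function:

```c
#include <assert.h>
#include <stdbool.h>

/* Model of the "active clear" cache: elide the expensive irqchip
 * write when the line was left inactive last time and is being
 * programmed inactive again; otherwise perform the write and
 * remember whether it left the line clear. */
static bool timer_can_skip_irq_write(bool *active_cleared_last,
				     bool phys_active)
{
	if (*active_cleared_last && !phys_active)
		return true;                     /* cache hit */
	*active_cleared_last = !phys_active;     /* state after the write */
	return false;                            /* must touch hardware */
}

/* Test helper: write-clear, write-clear again, then write-active;
 * only the middle write should be elided. */
static int skips_in_sequence(void)
{
	bool cached = false; /* as after kvm_timer_vcpu_put() */
	int skips = 0;

	skips += timer_can_skip_irq_write(&cached, false);
	skips += timer_can_skip_irq_write(&cached, false);
	skips += timer_can_skip_irq_write(&cached, true);
	return skips;
}
```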
+170
virt/kvm/arm/hyp/vgic-v2-sr.c
··· 1 + /* 2 + * Copyright (C) 2012-2015 - ARM Ltd 3 + * Author: Marc Zyngier <marc.zyngier@arm.com> 4 + * 5 + * This program is free software; you can redistribute it and/or modify 6 + * it under the terms of the GNU General Public License version 2 as 7 + * published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it will be useful, 10 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 + * GNU General Public License for more details. 13 + * 14 + * You should have received a copy of the GNU General Public License 15 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 + */ 17 + 18 + #include <linux/compiler.h> 19 + #include <linux/irqchip/arm-gic.h> 20 + #include <linux/kvm_host.h> 21 + 22 + #include <asm/kvm_hyp.h> 23 + 24 + static void __hyp_text save_maint_int_state(struct kvm_vcpu *vcpu, 25 + void __iomem *base) 26 + { 27 + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 28 + int nr_lr = vcpu->arch.vgic_cpu.nr_lr; 29 + u32 eisr0, eisr1; 30 + int i; 31 + bool expect_mi; 32 + 33 + expect_mi = !!(cpu_if->vgic_hcr & GICH_HCR_UIE); 34 + 35 + for (i = 0; i < nr_lr; i++) { 36 + if (!(vcpu->arch.vgic_cpu.live_lrs & (1UL << i))) 37 + continue; 38 + 39 + expect_mi |= (!(cpu_if->vgic_lr[i] & GICH_LR_HW) && 40 + (cpu_if->vgic_lr[i] & GICH_LR_EOI)); 41 + } 42 + 43 + if (expect_mi) { 44 + cpu_if->vgic_misr = readl_relaxed(base + GICH_MISR); 45 + 46 + if (cpu_if->vgic_misr & GICH_MISR_EOI) { 47 + eisr0 = readl_relaxed(base + GICH_EISR0); 48 + if (unlikely(nr_lr > 32)) 49 + eisr1 = readl_relaxed(base + GICH_EISR1); 50 + else 51 + eisr1 = 0; 52 + } else { 53 + eisr0 = eisr1 = 0; 54 + } 55 + } else { 56 + cpu_if->vgic_misr = 0; 57 + eisr0 = eisr1 = 0; 58 + } 59 + 60 + #ifdef CONFIG_CPU_BIG_ENDIAN 61 + cpu_if->vgic_eisr = ((u64)eisr0 << 32) | eisr1; 62 + #else 63 + cpu_if->vgic_eisr = ((u64)eisr1 << 32) | eisr0; 64 
+ #endif 65 + } 66 + 67 + static void __hyp_text save_elrsr(struct kvm_vcpu *vcpu, void __iomem *base) 68 + { 69 + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 70 + int nr_lr = vcpu->arch.vgic_cpu.nr_lr; 71 + u32 elrsr0, elrsr1; 72 + 73 + elrsr0 = readl_relaxed(base + GICH_ELRSR0); 74 + if (unlikely(nr_lr > 32)) 75 + elrsr1 = readl_relaxed(base + GICH_ELRSR1); 76 + else 77 + elrsr1 = 0; 78 + 79 + #ifdef CONFIG_CPU_BIG_ENDIAN 80 + cpu_if->vgic_elrsr = ((u64)elrsr0 << 32) | elrsr1; 81 + #else 82 + cpu_if->vgic_elrsr = ((u64)elrsr1 << 32) | elrsr0; 83 + #endif 84 + } 85 + 86 + static void __hyp_text save_lrs(struct kvm_vcpu *vcpu, void __iomem *base) 87 + { 88 + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 89 + int nr_lr = vcpu->arch.vgic_cpu.nr_lr; 90 + int i; 91 + 92 + for (i = 0; i < nr_lr; i++) { 93 + if (!(vcpu->arch.vgic_cpu.live_lrs & (1UL << i))) 94 + continue; 95 + 96 + if (cpu_if->vgic_elrsr & (1UL << i)) { 97 + cpu_if->vgic_lr[i] &= ~GICH_LR_STATE; 98 + continue; 99 + } 100 + 101 + cpu_if->vgic_lr[i] = readl_relaxed(base + GICH_LR0 + (i * 4)); 102 + writel_relaxed(0, base + GICH_LR0 + (i * 4)); 103 + } 104 + } 105 + 106 + /* vcpu is already in the HYP VA space */ 107 + void __hyp_text __vgic_v2_save_state(struct kvm_vcpu *vcpu) 108 + { 109 + struct kvm *kvm = kern_hyp_va(vcpu->kvm); 110 + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 111 + struct vgic_dist *vgic = &kvm->arch.vgic; 112 + void __iomem *base = kern_hyp_va(vgic->vctrl_base); 113 + 114 + if (!base) 115 + return; 116 + 117 + cpu_if->vgic_vmcr = readl_relaxed(base + GICH_VMCR); 118 + 119 + if (vcpu->arch.vgic_cpu.live_lrs) { 120 + cpu_if->vgic_apr = readl_relaxed(base + GICH_APR); 121 + 122 + save_maint_int_state(vcpu, base); 123 + save_elrsr(vcpu, base); 124 + save_lrs(vcpu, base); 125 + 126 + writel_relaxed(0, base + GICH_HCR); 127 + 128 + vcpu->arch.vgic_cpu.live_lrs = 0; 129 + } else { 130 + cpu_if->vgic_eisr = 0; 131 + cpu_if->vgic_elrsr = 
~0UL; 132 + cpu_if->vgic_misr = 0; 133 + cpu_if->vgic_apr = 0; 134 + } 135 + } 136 + 137 + /* vcpu is already in the HYP VA space */ 138 + void __hyp_text __vgic_v2_restore_state(struct kvm_vcpu *vcpu) 139 + { 140 + struct kvm *kvm = kern_hyp_va(vcpu->kvm); 141 + struct vgic_v2_cpu_if *cpu_if = &vcpu->arch.vgic_cpu.vgic_v2; 142 + struct vgic_dist *vgic = &kvm->arch.vgic; 143 + void __iomem *base = kern_hyp_va(vgic->vctrl_base); 144 + int i, nr_lr; 145 + u64 live_lrs = 0; 146 + 147 + if (!base) 148 + return; 149 + 150 + nr_lr = vcpu->arch.vgic_cpu.nr_lr; 151 + 152 + for (i = 0; i < nr_lr; i++) 153 + if (cpu_if->vgic_lr[i] & GICH_LR_STATE) 154 + live_lrs |= 1UL << i; 155 + 156 + if (live_lrs) { 157 + writel_relaxed(cpu_if->vgic_hcr, base + GICH_HCR); 158 + writel_relaxed(cpu_if->vgic_apr, base + GICH_APR); 159 + for (i = 0; i < nr_lr; i++) { 160 + if (!(live_lrs & (1UL << i))) 161 + continue; 162 + 163 + writel_relaxed(cpu_if->vgic_lr[i], 164 + base + GICH_LR0 + (i * 4)); 165 + } 166 + } 167 + 168 + writel_relaxed(cpu_if->vgic_vmcr, base + GICH_VMCR); 169 + vcpu->arch.vgic_cpu.live_lrs = live_lrs; 170 + }
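`save_elrsr()` and `save_maint_int_state()` above both read a status register as two 32-bit halves (ELRSR0/ELRSR1, EISR0/EISR1) and fold them into one u64, with the halves swapped under `CONFIG_CPU_BIG_ENDIAN`. A portable sketch of the little-endian combine, using invented parameter names:

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian fold used by the GICv2 save path: the second
 * register supplies LRs 32-63 in the high half, the first supplies
 * LRs 0-31 in the low half. Big-endian builds swap the two. */
static uint64_t combine_status_le(uint32_t lo_reg, uint32_t hi_reg)
{
	return ((uint64_t)hi_reg << 32) | lo_reg;
}
```

With at most 32 LRs (the common case, where the second read is skipped and treated as 0), the result is just the zero-extended first register.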
+529
virt/kvm/arm/pmu.c
··· 1 + /* 2 + * Copyright (C) 2015 Linaro Ltd. 3 + * Author: Shannon Zhao <shannon.zhao@linaro.org> 4 + * 5 + * This program is free software; you can redistribute it and/or modify 6 + * it under the terms of the GNU General Public License version 2 as 7 + * published by the Free Software Foundation. 8 + * 9 + * This program is distributed in the hope that it will be useful, 10 + * but WITHOUT ANY WARRANTY; without even the implied warranty of 11 + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the 12 + * GNU General Public License for more details. 13 + * 14 + * You should have received a copy of the GNU General Public License 15 + * along with this program. If not, see <http://www.gnu.org/licenses/>. 16 + */ 17 + 18 + #include <linux/cpu.h> 19 + #include <linux/kvm.h> 20 + #include <linux/kvm_host.h> 21 + #include <linux/perf_event.h> 22 + #include <linux/uaccess.h> 23 + #include <asm/kvm_emulate.h> 24 + #include <kvm/arm_pmu.h> 25 + #include <kvm/arm_vgic.h> 26 + 27 + /** 28 + * kvm_pmu_get_counter_value - get PMU counter value 29 + * @vcpu: The vcpu pointer 30 + * @select_idx: The counter index 31 + */ 32 + u64 kvm_pmu_get_counter_value(struct kvm_vcpu *vcpu, u64 select_idx) 33 + { 34 + u64 counter, reg, enabled, running; 35 + struct kvm_pmu *pmu = &vcpu->arch.pmu; 36 + struct kvm_pmc *pmc = &pmu->pmc[select_idx]; 37 + 38 + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) 39 + ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; 40 + counter = vcpu_sys_reg(vcpu, reg); 41 + 42 + /* The real counter value is equal to the value of counter register plus 43 + * the value perf event counts. 
44 + */ 45 + if (pmc->perf_event) 46 + counter += perf_event_read_value(pmc->perf_event, &enabled, 47 + &running); 48 + 49 + return counter & pmc->bitmask; 50 + } 51 + 52 + /** 53 + * kvm_pmu_set_counter_value - set PMU counter value 54 + * @vcpu: The vcpu pointer 55 + * @select_idx: The counter index 56 + * @val: The counter value 57 + */ 58 + void kvm_pmu_set_counter_value(struct kvm_vcpu *vcpu, u64 select_idx, u64 val) 59 + { 60 + u64 reg; 61 + 62 + reg = (select_idx == ARMV8_PMU_CYCLE_IDX) 63 + ? PMCCNTR_EL0 : PMEVCNTR0_EL0 + select_idx; 64 + vcpu_sys_reg(vcpu, reg) += (s64)val - kvm_pmu_get_counter_value(vcpu, select_idx); 65 + } 66 + 67 + /** 68 + * kvm_pmu_stop_counter - stop PMU counter 69 + * @pmc: The PMU counter pointer 70 + * 71 + * If this counter has been configured to monitor some event, release it here. 72 + */ 73 + static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc) 74 + { 75 + u64 counter, reg; 76 + 77 + if (pmc->perf_event) { 78 + counter = kvm_pmu_get_counter_value(vcpu, pmc->idx); 79 + reg = (pmc->idx == ARMV8_PMU_CYCLE_IDX) 80 + ? 
PMCCNTR_EL0 : PMEVCNTR0_EL0 + pmc->idx; 81 + vcpu_sys_reg(vcpu, reg) = counter; 82 + perf_event_disable(pmc->perf_event); 83 + perf_event_release_kernel(pmc->perf_event); 84 + pmc->perf_event = NULL; 85 + } 86 + } 87 + 88 + /** 89 + * kvm_pmu_vcpu_reset - reset pmu state for cpu 90 + * @vcpu: The vcpu pointer 91 + * 92 + */ 93 + void kvm_pmu_vcpu_reset(struct kvm_vcpu *vcpu) 94 + { 95 + int i; 96 + struct kvm_pmu *pmu = &vcpu->arch.pmu; 97 + 98 + for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { 99 + kvm_pmu_stop_counter(vcpu, &pmu->pmc[i]); 100 + pmu->pmc[i].idx = i; 101 + pmu->pmc[i].bitmask = 0xffffffffUL; 102 + } 103 + } 104 + 105 + /** 106 + * kvm_pmu_vcpu_destroy - free perf event of PMU for cpu 107 + * @vcpu: The vcpu pointer 108 + * 109 + */ 110 + void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) 111 + { 112 + int i; 113 + struct kvm_pmu *pmu = &vcpu->arch.pmu; 114 + 115 + for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { 116 + struct kvm_pmc *pmc = &pmu->pmc[i]; 117 + 118 + if (pmc->perf_event) { 119 + perf_event_disable(pmc->perf_event); 120 + perf_event_release_kernel(pmc->perf_event); 121 + pmc->perf_event = NULL; 122 + } 123 + } 124 + } 125 + 126 + u64 kvm_pmu_valid_counter_mask(struct kvm_vcpu *vcpu) 127 + { 128 + u64 val = vcpu_sys_reg(vcpu, PMCR_EL0) >> ARMV8_PMU_PMCR_N_SHIFT; 129 + 130 + val &= ARMV8_PMU_PMCR_N_MASK; 131 + if (val == 0) 132 + return BIT(ARMV8_PMU_CYCLE_IDX); 133 + else 134 + return GENMASK(val - 1, 0) | BIT(ARMV8_PMU_CYCLE_IDX); 135 + } 136 + 137 + /** 138 + * kvm_pmu_enable_counter - enable selected PMU counter 139 + * @vcpu: The vcpu pointer 140 + * @val: the value guest writes to PMCNTENSET register 141 + * 142 + * Call perf_event_enable to start counting the perf event 143 + */ 144 + void kvm_pmu_enable_counter(struct kvm_vcpu *vcpu, u64 val) 145 + { 146 + int i; 147 + struct kvm_pmu *pmu = &vcpu->arch.pmu; 148 + struct kvm_pmc *pmc; 149 + 150 + if (!(vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) || !val) 151 + return; 152 + 153 + 
for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { 154 + if (!(val & BIT(i))) 155 + continue; 156 + 157 + pmc = &pmu->pmc[i]; 158 + if (pmc->perf_event) { 159 + perf_event_enable(pmc->perf_event); 160 + if (pmc->perf_event->state != PERF_EVENT_STATE_ACTIVE) 161 + kvm_debug("fail to enable perf event\n"); 162 + } 163 + } 164 + } 165 + 166 + /** 167 + * kvm_pmu_disable_counter - disable selected PMU counter 168 + * @vcpu: The vcpu pointer 169 + * @val: the value guest writes to PMCNTENCLR register 170 + * 171 + * Call perf_event_disable to stop counting the perf event 172 + */ 173 + void kvm_pmu_disable_counter(struct kvm_vcpu *vcpu, u64 val) 174 + { 175 + int i; 176 + struct kvm_pmu *pmu = &vcpu->arch.pmu; 177 + struct kvm_pmc *pmc; 178 + 179 + if (!val) 180 + return; 181 + 182 + for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { 183 + if (!(val & BIT(i))) 184 + continue; 185 + 186 + pmc = &pmu->pmc[i]; 187 + if (pmc->perf_event) 188 + perf_event_disable(pmc->perf_event); 189 + } 190 + } 191 + 192 + static u64 kvm_pmu_overflow_status(struct kvm_vcpu *vcpu) 193 + { 194 + u64 reg = 0; 195 + 196 + if ((vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E)) 197 + reg = vcpu_sys_reg(vcpu, PMOVSSET_EL0); 198 + reg &= vcpu_sys_reg(vcpu, PMCNTENSET_EL0); 199 + reg &= vcpu_sys_reg(vcpu, PMINTENSET_EL1); 200 + reg &= kvm_pmu_valid_counter_mask(vcpu); 201 + 202 + return reg; 203 + } 204 + 205 + /** 206 + * kvm_pmu_overflow_set - set PMU overflow interrupt 207 + * @vcpu: The vcpu pointer 208 + * @val: the value guest writes to PMOVSSET register 209 + */ 210 + void kvm_pmu_overflow_set(struct kvm_vcpu *vcpu, u64 val) 211 + { 212 + u64 reg; 213 + 214 + if (val == 0) 215 + return; 216 + 217 + vcpu_sys_reg(vcpu, PMOVSSET_EL0) |= val; 218 + reg = kvm_pmu_overflow_status(vcpu); 219 + if (reg != 0) 220 + kvm_vcpu_kick(vcpu); 221 + } 222 + 223 + static void kvm_pmu_update_state(struct kvm_vcpu *vcpu) 224 + { 225 + struct kvm_pmu *pmu = &vcpu->arch.pmu; 226 + bool overflow; 227 + 228 + if 
(!kvm_arm_pmu_v3_ready(vcpu))
+		return;
+
+	overflow = !!kvm_pmu_overflow_status(vcpu);
+	if (pmu->irq_level != overflow) {
+		pmu->irq_level = overflow;
+		kvm_vgic_inject_irq(vcpu->kvm, vcpu->vcpu_id,
+				    pmu->irq_num, overflow);
+	}
+}
+
+/**
+ * kvm_pmu_flush_hwstate - flush pmu state to cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the PMU has overflowed while we were running in the host, and inject
+ * an interrupt if that was the case.
+ */
+void kvm_pmu_flush_hwstate(struct kvm_vcpu *vcpu)
+{
+	kvm_pmu_update_state(vcpu);
+}
+
+/**
+ * kvm_pmu_sync_hwstate - sync pmu state from cpu
+ * @vcpu: The vcpu pointer
+ *
+ * Check if the PMU has overflowed while we were running in the guest, and
+ * inject an interrupt if that was the case.
+ */
+void kvm_pmu_sync_hwstate(struct kvm_vcpu *vcpu)
+{
+	kvm_pmu_update_state(vcpu);
+}
+
+static inline struct kvm_vcpu *kvm_pmc_to_vcpu(struct kvm_pmc *pmc)
+{
+	struct kvm_pmu *pmu;
+	struct kvm_vcpu_arch *vcpu_arch;
+
+	pmc -= pmc->idx;
+	pmu = container_of(pmc, struct kvm_pmu, pmc[0]);
+	vcpu_arch = container_of(pmu, struct kvm_vcpu_arch, pmu);
+	return container_of(vcpu_arch, struct kvm_vcpu, arch);
+}
+
+/**
+ * When perf event overflows, call kvm_pmu_overflow_set to set overflow status.
+ */
+static void kvm_pmu_perf_overflow(struct perf_event *perf_event,
+				  struct perf_sample_data *data,
+				  struct pt_regs *regs)
+{
+	struct kvm_pmc *pmc = perf_event->overflow_handler_context;
+	struct kvm_vcpu *vcpu = kvm_pmc_to_vcpu(pmc);
+	int idx = pmc->idx;
+
+	kvm_pmu_overflow_set(vcpu, BIT(idx));
+}
+
+/**
+ * kvm_pmu_software_increment - do software increment
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMSWINC register
+ */
+void kvm_pmu_software_increment(struct kvm_vcpu *vcpu, u64 val)
+{
+	int i;
+	u64 type, enable, reg;
+
+	if (val == 0)
+		return;
+
+	enable = vcpu_sys_reg(vcpu, PMCNTENSET_EL0);
+	for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++) {
+		if (!(val & BIT(i)))
+			continue;
+		type = vcpu_sys_reg(vcpu, PMEVTYPER0_EL0 + i)
+		       & ARMV8_PMU_EVTYPE_EVENT;
+		if ((type == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+		    && (enable & BIT(i))) {
+			reg = vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) + 1;
+			reg = lower_32_bits(reg);
+			vcpu_sys_reg(vcpu, PMEVCNTR0_EL0 + i) = reg;
+			if (!reg)
+				kvm_pmu_overflow_set(vcpu, BIT(i));
+		}
+	}
+}
+
+/**
+ * kvm_pmu_handle_pmcr - handle PMCR register
+ * @vcpu: The vcpu pointer
+ * @val: the value guest writes to PMCR register
+ */
+void kvm_pmu_handle_pmcr(struct kvm_vcpu *vcpu, u64 val)
+{
+	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	struct kvm_pmc *pmc;
+	u64 mask;
+	int i;
+
+	mask = kvm_pmu_valid_counter_mask(vcpu);
+	if (val & ARMV8_PMU_PMCR_E) {
+		kvm_pmu_enable_counter(vcpu,
+			vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & mask);
+	} else {
+		kvm_pmu_disable_counter(vcpu, mask);
+	}
+
+	if (val & ARMV8_PMU_PMCR_C)
+		kvm_pmu_set_counter_value(vcpu, ARMV8_PMU_CYCLE_IDX, 0);
+
+	if (val & ARMV8_PMU_PMCR_P) {
+		for (i = 0; i < ARMV8_PMU_CYCLE_IDX; i++)
+			kvm_pmu_set_counter_value(vcpu, i, 0);
+	}
+
+	if (val & ARMV8_PMU_PMCR_LC) {
+		pmc = &pmu->pmc[ARMV8_PMU_CYCLE_IDX];
+		pmc->bitmask = 0xffffffffffffffffUL;
+	}
+}
+
+static bool kvm_pmu_counter_is_enabled(struct kvm_vcpu *vcpu, u64 select_idx)
+{
+	return (vcpu_sys_reg(vcpu, PMCR_EL0) & ARMV8_PMU_PMCR_E) &&
+	       (vcpu_sys_reg(vcpu, PMCNTENSET_EL0) & BIT(select_idx));
+}
+
+/**
+ * kvm_pmu_set_counter_event_type - set selected counter to monitor some event
+ * @vcpu: The vcpu pointer
+ * @data: The data guest writes to PMXEVTYPER_EL0
+ * @select_idx: The number of selected counter
+ *
+ * When OS accesses PMXEVTYPER_EL0, that means it wants to set a PMC to count an
+ * event with given hardware event number. Here we call perf_event API to
+ * emulate this action and create a kernel perf event for it.
+ */
+void kvm_pmu_set_counter_event_type(struct kvm_vcpu *vcpu, u64 data,
+				    u64 select_idx)
+{
+	struct kvm_pmu *pmu = &vcpu->arch.pmu;
+	struct kvm_pmc *pmc = &pmu->pmc[select_idx];
+	struct perf_event *event;
+	struct perf_event_attr attr;
+	u64 eventsel, counter;
+
+	kvm_pmu_stop_counter(vcpu, pmc);
+	eventsel = data & ARMV8_PMU_EVTYPE_EVENT;
+
+	/* Software increment event doesn't need to be backed by a perf event */
+	if (eventsel == ARMV8_PMU_EVTYPE_EVENT_SW_INCR)
+		return;
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.type = PERF_TYPE_RAW;
+	attr.size = sizeof(attr);
+	attr.pinned = 1;
+	attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, select_idx);
+	attr.exclude_user = data & ARMV8_PMU_EXCLUDE_EL0 ? 1 : 0;
+	attr.exclude_kernel = data & ARMV8_PMU_EXCLUDE_EL1 ? 1 : 0;
+	attr.exclude_hv = 1; /* Don't count EL2 events */
+	attr.exclude_host = 1; /* Don't count host events */
+	attr.config = eventsel;
+
+	counter = kvm_pmu_get_counter_value(vcpu, select_idx);
+	/* The initial sample period (overflow count) of an event. */
+	attr.sample_period = (-counter) & pmc->bitmask;
+
+	event = perf_event_create_kernel_counter(&attr, -1, current,
+						 kvm_pmu_perf_overflow, pmc);
+	if (IS_ERR(event)) {
+		pr_err_once("kvm: pmu event creation failed %ld\n",
+			    PTR_ERR(event));
+		return;
+	}
+
+	pmc->perf_event = event;
+}
+
+bool kvm_arm_support_pmu_v3(void)
+{
+	/*
+	 * Check if HW_PERF_EVENTS are supported by checking the number of
+	 * hardware performance counters. This could ensure the presence of
+	 * a physical PMU and CONFIG_PERF_EVENT is selected.
+	 */
+	return (perf_num_counters() > 0);
+}
+
+static int kvm_arm_pmu_v3_init(struct kvm_vcpu *vcpu)
+{
+	if (!kvm_arm_support_pmu_v3())
+		return -ENODEV;
+
+	if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features) ||
+	    !kvm_arm_pmu_irq_initialized(vcpu))
+		return -ENXIO;
+
+	if (kvm_arm_pmu_v3_ready(vcpu))
+		return -EBUSY;
+
+	kvm_pmu_vcpu_reset(vcpu);
+	vcpu->arch.pmu.ready = true;
+
+	return 0;
+}
+
+static bool irq_is_valid(struct kvm *kvm, int irq, bool is_ppi)
+{
+	int i;
+	struct kvm_vcpu *vcpu;
+
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (!kvm_arm_pmu_irq_initialized(vcpu))
+			continue;
+
+		if (is_ppi) {
+			if (vcpu->arch.pmu.irq_num != irq)
+				return false;
+		} else {
+			if (vcpu->arch.pmu.irq_num == irq)
+				return false;
+		}
+	}
+
+	return true;
+}
+
+int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_PMU_V3_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+			return -ENODEV;
+
+		if (get_user(irq, uaddr))
+			return -EFAULT;
+
+		/*
+		 * The PMU overflow interrupt could be a PPI or SPI, but for one
+		 * VM the interrupt type must be same for each vcpu. As a PPI,
+		 * the interrupt number is the same for all vcpus, while as an
+		 * SPI it must be a separate number per vcpu.
+		 */
+		if (irq < VGIC_NR_SGIS || irq >= vcpu->kvm->arch.vgic.nr_irqs ||
+		    !irq_is_valid(vcpu->kvm, irq, irq < VGIC_NR_PRIVATE_IRQS))
+			return -EINVAL;
+
+		if (kvm_arm_pmu_irq_initialized(vcpu))
+			return -EBUSY;
+
+		kvm_debug("Set kvm ARM PMU irq: %d\n", irq);
+		vcpu->arch.pmu.irq_num = irq;
+		return 0;
+	}
+	case KVM_ARM_VCPU_PMU_V3_INIT:
+		return kvm_arm_pmu_v3_init(vcpu);
+	}
+
+	return -ENXIO;
+}
+
+int kvm_arm_pmu_v3_get_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_PMU_V3_IRQ: {
+		int __user *uaddr = (int __user *)(long)attr->addr;
+		int irq;
+
+		if (!test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+			return -ENODEV;
+
+		if (!kvm_arm_pmu_irq_initialized(vcpu))
+			return -ENXIO;
+
+		irq = vcpu->arch.pmu.irq_num;
+		return put_user(irq, uaddr);
+	}
+	}
+
+	return -ENXIO;
+}
+
+int kvm_arm_pmu_v3_has_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr)
+{
+	switch (attr->attr) {
+	case KVM_ARM_VCPU_PMU_V3_IRQ:
+	case KVM_ARM_VCPU_PMU_V3_INIT:
+		if (kvm_arm_support_pmu_v3() &&
+		    test_bit(KVM_ARM_VCPU_PMU_V3, vcpu->arch.features))
+			return 0;
+	}
+
+	return -ENXIO;
+}
virt/kvm/arm/vgic-v2-emul.c (+5 -5)
···
 static const struct vgic_io_range vgic_dist_ranges[] = {
 	{
+		.base = GIC_DIST_SOFTINT,
+		.len = 4,
+		.handle_mmio = handle_mmio_sgi_reg,
+	},
+	{
 		.base = GIC_DIST_CTRL,
 		.len = 12,
 		.bits_per_irq = 0,
···
 		.len = VGIC_MAX_IRQS / 4,
 		.bits_per_irq = 2,
 		.handle_mmio = handle_mmio_cfg_reg,
-	},
-	{
-		.base = GIC_DIST_SOFTINT,
-		.len = 4,
-		.handle_mmio = handle_mmio_sgi_reg,
 	},
 	{
 		.base = GIC_DIST_SGI_PENDING_CLEAR,
virt/kvm/arm/vgic-v2.c (+12)
···
 static struct vgic_params vgic_v2_params;
 
+static void vgic_cpu_init_lrs(void *params)
+{
+	struct vgic_params *vgic = params;
+	int i;
+
+	for (i = 0; i < vgic->nr_lr; i++)
+		writel_relaxed(0, vgic->vctrl_base + GICH_LR0 + (i * 4));
+}
+
 /**
  * vgic_v2_probe - probe for a GICv2 compatible interrupt controller in DT
  * @node: pointer to the DT node
···
 	vgic->type = VGIC_V2;
 	vgic->max_gic_vcpus = VGIC_V2_MAX_CPUS;
+
+	on_each_cpu(vgic_cpu_init_lrs, vgic, 1);
+
 	*ops = &vgic_v2_ops;
 	*params = vgic;
 	goto out;
virt/kvm/arm/vgic-v3.c (+9 -2)
···
 static struct vgic_lr vgic_v3_get_lr(const struct kvm_vcpu *vcpu, int lr)
 {
 	struct vgic_lr lr_desc;
-	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)];
+	u64 val = vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr];
 
 	if (vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3)
 		lr_desc.irq = val & ICH_LR_VIRTUALID_MASK;
···
 		lr_val |= ((u64)lr_desc.hwirq) << ICH_LR_PHYS_ID_SHIFT;
 	}
 
-	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[VGIC_V3_LR_INDEX(lr)] = lr_val;
+	vcpu->arch.vgic_cpu.vgic_v3.vgic_lr[lr] = lr_val;
 
 	if (!(lr_desc.state & LR_STATE_MASK))
 		vcpu->arch.vgic_cpu.vgic_v3.vgic_elrsr |= (1U << lr);
···
 static struct vgic_params vgic_v3_params;
 
+static void vgic_cpu_init_lrs(void *params)
+{
+	kvm_call_hyp(__vgic_v3_init_lrs);
+}
+
 /**
  * vgic_v3_probe - probe for a GICv3 compatible interrupt controller in DT
  * @node: pointer to the DT node
···
 	kvm_info("%s@%llx IRQ%d\n", vgic_node->name,
 		 vcpu_res.start, vgic->maint_irq);
+
+	on_each_cpu(vgic_cpu_init_lrs, vgic, 1);
 
 	*ops = &vgic_v3_ops;
 	*params = vgic;
virt/kvm/async_pf.c (+4 -4)
···
 	/* cancel outstanding work queue item */
 	while (!list_empty(&vcpu->async_pf.queue)) {
 		struct kvm_async_pf *work =
-			list_entry(vcpu->async_pf.queue.next,
-				   typeof(*work), queue);
+			list_first_entry(&vcpu->async_pf.queue,
+					 typeof(*work), queue);
 		list_del(&work->queue);
 
 #ifdef CONFIG_KVM_ASYNC_PF_SYNC
···
 	spin_lock(&vcpu->async_pf.lock);
 	while (!list_empty(&vcpu->async_pf.done)) {
 		struct kvm_async_pf *work =
-			list_entry(vcpu->async_pf.done.next,
-				   typeof(*work), link);
+			list_first_entry(&vcpu->async_pf.done,
+					 typeof(*work), link);
 		list_del(&work->link);
 		kmem_cache_free(async_pf_cache, work);
 	}
virt/kvm/kvm_main.c (+21 -16)
···
 /* Default doubles per-vcpu halt_poll_ns. */
 static unsigned int halt_poll_ns_grow = 2;
-module_param(halt_poll_ns_grow, int, S_IRUGO);
+module_param(halt_poll_ns_grow, uint, S_IRUGO | S_IWUSR);
 
 /* Default resets per-vcpu halt_poll_ns. */
 static unsigned int halt_poll_ns_shrink;
-module_param(halt_poll_ns_shrink, int, S_IRUGO);
+module_param(halt_poll_ns_shrink, uint, S_IRUGO | S_IWUSR);
 
 /*
  * Ordering of locks:
···
 static void kvm_destroy_devices(struct kvm *kvm)
 {
-	struct list_head *node, *tmp;
+	struct kvm_device *dev, *tmp;
 
-	list_for_each_safe(node, tmp, &kvm->devices) {
-		struct kvm_device *dev =
-			list_entry(node, struct kvm_device, vm_node);
-
-		list_del(node);
+	list_for_each_entry_safe(dev, tmp, &kvm->devices, vm_node) {
+		list_del(&dev->vm_node);
 		dev->ops->destroy(dev);
 	}
 }
···
 {
 	unsigned long addr = __gfn_to_hva_many(slot, gfn, NULL, write_fault);
 
-	if (addr == KVM_HVA_ERR_RO_BAD)
+	if (addr == KVM_HVA_ERR_RO_BAD) {
+		if (writable)
+			*writable = false;
 		return KVM_PFN_ERR_RO_FAULT;
+	}
 
-	if (kvm_is_error_hva(addr))
+	if (kvm_is_error_hva(addr)) {
+		if (writable)
+			*writable = false;
 		return KVM_PFN_NOSLOT;
+	}
 
 	/* Do not map writable pfn in the readonly memslot. */
 	if (writable && memslot_is_readonly(slot)) {
···
 static void grow_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
-	int old, val;
+	unsigned int old, val, grow;
 
 	old = val = vcpu->halt_poll_ns;
+	grow = READ_ONCE(halt_poll_ns_grow);
 	/* 10us base */
-	if (val == 0 && halt_poll_ns_grow)
+	if (val == 0 && grow)
 		val = 10000;
 	else
-		val *= halt_poll_ns_grow;
+		val *= grow;
 
 	if (val > halt_poll_ns)
 		val = halt_poll_ns;
···
 static void shrink_halt_poll_ns(struct kvm_vcpu *vcpu)
 {
-	int old, val;
+	unsigned int old, val, shrink;
 
 	old = val = vcpu->halt_poll_ns;
-	if (halt_poll_ns_shrink == 0)
+	shrink = READ_ONCE(halt_poll_ns_shrink);
+	if (shrink == 0)
 		val = 0;
 	else
-		val /= halt_poll_ns_shrink;
+		val /= shrink;
 
 	vcpu->halt_poll_ns = val;
 	trace_kvm_halt_poll_ns_shrink(vcpu->vcpu_id, val, old);