Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'kvmarm-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 updates for 5.18

- Proper emulation of the OSLock feature of the debug architecture

- Scalability improvements for the MMU lock when dirty logging is on

- New VMID allocator, which will eventually help with SVA in VMs

- Better support for PMUs in heterogeneous systems

- PSCI 1.1 support, enabling SYSTEM_RESET2

- Implement CONFIG_DEBUG_LIST at EL2

- Make CONFIG_ARM64_ERRATUM_2077057 default y

- Reduce the overhead of VM exit when no interrupt is pending

- Remove traces of 32bit ARM host support from the documentation

- Updated vgic selftests

- Various cleanups, doc updates and spelling fixes

+865 -311
+48 -44
Documentation/virt/kvm/api.rst
··· 417 417 ----------------- 418 418 419 419 :Capability: basic 420 - :Architectures: all except ARM, arm64 420 + :Architectures: all except arm64 421 421 :Type: vcpu ioctl 422 422 :Parameters: struct kvm_regs (out) 423 423 :Returns: 0 on success, -1 on error ··· 450 450 ----------------- 451 451 452 452 :Capability: basic 453 - :Architectures: all except ARM, arm64 453 + :Architectures: all except arm64 454 454 :Type: vcpu ioctl 455 455 :Parameters: struct kvm_regs (in) 456 456 :Returns: 0 on success, -1 on error ··· 824 824 ----------------------- 825 825 826 826 :Capability: KVM_CAP_IRQCHIP, KVM_CAP_S390_IRQCHIP (s390) 827 - :Architectures: x86, ARM, arm64, s390 827 + :Architectures: x86, arm64, s390 828 828 :Type: vm ioctl 829 829 :Parameters: none 830 830 :Returns: 0 on success, -1 on error ··· 833 833 On x86, creates a virtual ioapic, a virtual PIC (two PICs, nested), and sets up 834 834 future vcpus to have a local APIC. IRQ routing for GSIs 0-15 is set to both 835 835 PIC and IOAPIC; GSI 16-23 only go to the IOAPIC. 836 - On ARM/arm64, a GICv2 is created. Any other GIC versions require the usage of 836 + On arm64, a GICv2 is created. Any other GIC versions require the usage of 837 837 KVM_CREATE_DEVICE, which also supports creating a GICv2. Using 838 838 KVM_CREATE_DEVICE is preferred over KVM_CREATE_IRQCHIP for GICv2. 839 839 On s390, a dummy irq routing table is created. ··· 846 846 ----------------- 847 847 848 848 :Capability: KVM_CAP_IRQCHIP 849 - :Architectures: x86, arm, arm64 849 + :Architectures: x86, arm64 850 850 :Type: vm ioctl 851 851 :Parameters: struct kvm_irq_level 852 852 :Returns: 0 on success, -1 on error ··· 870 870 of course). 871 871 872 872 873 - ARM/arm64 can signal an interrupt either at the CPU level, or at the 873 + arm64 can signal an interrupt either at the CPU level, or at the 874 874 in-kernel irqchip (GIC), and for in-kernel irqchip can tell the GIC to 875 875 use PPIs designated for specific cpus. 
The irq field is interpreted 876 876 like this:: ··· 896 896 identified as (256 * vcpu2_index + vcpu_index). Otherwise, vcpu2_index 897 897 must be zero. 898 898 899 - Note that on arm/arm64, the KVM_CAP_IRQCHIP capability only conditions 899 + Note that on arm64, the KVM_CAP_IRQCHIP capability only conditions 900 900 injection of interrupts for the in-kernel irqchip. KVM_IRQ_LINE can always 901 901 be used for a userspace interrupt controller. 902 902 ··· 1087 1087 1088 1088 :Capability: KVM_CAP_VCPU_EVENTS 1089 1089 :Extended by: KVM_CAP_INTR_SHADOW 1090 - :Architectures: x86, arm, arm64 1090 + :Architectures: x86, arm64 1091 1091 :Type: vcpu ioctl 1092 1092 :Parameters: struct kvm_vcpu_event (out) 1093 1093 :Returns: 0 on success, -1 on error ··· 1146 1146 fields contain a valid state. This bit will be set whenever 1147 1147 KVM_CAP_EXCEPTION_PAYLOAD is enabled. 1148 1148 1149 - ARM/ARM64: 1150 - ^^^^^^^^^^ 1149 + ARM64: 1150 + ^^^^^^ 1151 1151 1152 1152 If the guest accesses a device that is being emulated by the host kernel in 1153 1153 such a way that a real device would generate a physical SError, KVM may make ··· 1206 1206 1207 1207 :Capability: KVM_CAP_VCPU_EVENTS 1208 1208 :Extended by: KVM_CAP_INTR_SHADOW 1209 - :Architectures: x86, arm, arm64 1209 + :Architectures: x86, arm64 1210 1210 :Type: vcpu ioctl 1211 1211 :Parameters: struct kvm_vcpu_event (in) 1212 1212 :Returns: 0 on success, -1 on error ··· 1241 1241 exception_has_payload, exception_payload, and exception.pending fields 1242 1242 contain a valid state and shall be written into the VCPU. 1243 1243 1244 - ARM/ARM64: 1245 - ^^^^^^^^^^ 1244 + ARM64: 1245 + ^^^^^^ 1246 1246 1247 1247 User space may need to inject several types of events to the guest. 
1248 1248 ··· 1449 1449 --------------------- 1450 1450 1451 1451 :Capability: KVM_CAP_MP_STATE 1452 - :Architectures: x86, s390, arm, arm64, riscv 1452 + :Architectures: x86, s390, arm64, riscv 1453 1453 :Type: vcpu ioctl 1454 1454 :Parameters: struct kvm_mp_state (out) 1455 1455 :Returns: 0 on success; -1 on error ··· 1467 1467 1468 1468 ========================== =============================================== 1469 1469 KVM_MP_STATE_RUNNABLE the vcpu is currently running 1470 - [x86,arm/arm64,riscv] 1470 + [x86,arm64,riscv] 1471 1471 KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP) 1472 1472 which has not yet received an INIT signal [x86] 1473 1473 KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is ··· 1476 1476 is waiting for an interrupt [x86] 1477 1477 KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector 1478 1478 accessible via KVM_GET_VCPU_EVENTS) [x86] 1479 - KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv] 1479 + KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm64,riscv] 1480 1480 KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390] 1481 1481 KVM_MP_STATE_OPERATING the vcpu is operating (running or halted) 1482 1482 [s390] ··· 1488 1488 in-kernel irqchip, the multiprocessing state must be maintained by userspace on 1489 1489 these architectures. 1490 1490 1491 - For arm/arm64/riscv: 1492 - ^^^^^^^^^^^^^^^^^^^^ 1491 + For arm64/riscv: 1492 + ^^^^^^^^^^^^^^^^ 1493 1493 1494 1494 The only states that are valid are KVM_MP_STATE_STOPPED and 1495 1495 KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. 
··· 1498 1498 --------------------- 1499 1499 1500 1500 :Capability: KVM_CAP_MP_STATE 1501 - :Architectures: x86, s390, arm, arm64, riscv 1501 + :Architectures: x86, s390, arm64, riscv 1502 1502 :Type: vcpu ioctl 1503 1503 :Parameters: struct kvm_mp_state (in) 1504 1504 :Returns: 0 on success; -1 on error ··· 1510 1510 in-kernel irqchip, the multiprocessing state must be maintained by userspace on 1511 1511 these architectures. 1512 1512 1513 - For arm/arm64/riscv: 1514 - ^^^^^^^^^^^^^^^^^^^^ 1513 + For arm64/riscv: 1514 + ^^^^^^^^^^^^^^^^ 1515 1515 1516 1516 The only states that are valid are KVM_MP_STATE_STOPPED and 1517 1517 KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. ··· 1780 1780 ------------------------ 1781 1781 1782 1782 :Capability: KVM_CAP_IRQ_ROUTING 1783 - :Architectures: x86 s390 arm arm64 1783 + :Architectures: x86 s390 arm64 1784 1784 :Type: vm ioctl 1785 1785 :Parameters: struct kvm_irq_routing (in) 1786 1786 :Returns: 0 on success, -1 on error 1787 1787 1788 1788 Sets the GSI routing table entries, overwriting any previously set entries. 1789 1789 1790 - On arm/arm64, GSI routing has the following limitation: 1790 + On arm64, GSI routing has the following limitation: 1791 1791 1792 1792 - GSI routing does not apply to KVM_IRQ_LINE but only to KVM_IRQFD. 1793 1793 ··· 2855 2855 ------------------- 2856 2856 2857 2857 :Capability: KVM_CAP_SIGNAL_MSI 2858 - :Architectures: x86 arm arm64 2858 + :Architectures: x86 arm64 2859 2859 :Type: vm ioctl 2860 2860 :Parameters: struct kvm_msi (in) 2861 2861 :Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error ··· 3043 3043 -------------- 3044 3044 3045 3045 :Capability: KVM_CAP_IRQFD 3046 - :Architectures: x86 s390 arm arm64 3046 + :Architectures: x86 s390 arm64 3047 3047 :Type: vm ioctl 3048 3048 :Parameters: struct kvm_irqfd (in) 3049 3049 :Returns: 0 on success, -1 on error ··· 3069 3069 irqfd. 
The KVM_IRQFD_FLAG_RESAMPLE is only necessary on assignment 3070 3070 and need not be specified with KVM_IRQFD_FLAG_DEASSIGN. 3071 3071 3072 - On arm/arm64, gsi routing being supported, the following can happen: 3072 + On arm64, gsi routing being supported, the following can happen: 3073 3073 3074 3074 - in case no routing entry is associated to this gsi, injection fails 3075 3075 - in case the gsi is associated to an irqchip routing entry, ··· 3325 3325 ---------------------- 3326 3326 3327 3327 :Capability: basic 3328 - :Architectures: arm, arm64 3328 + :Architectures: arm64 3329 3329 :Type: vcpu ioctl 3330 3330 :Parameters: struct kvm_vcpu_init (in) 3331 3331 :Returns: 0 on success; -1 on error ··· 3423 3423 ----------------------------- 3424 3424 3425 3425 :Capability: basic 3426 - :Architectures: arm, arm64 3426 + :Architectures: arm64 3427 3427 :Type: vm ioctl 3428 3428 :Parameters: struct kvm_vcpu_init (out) 3429 3429 :Returns: 0 on success; -1 on error ··· 3452 3452 --------------------- 3453 3453 3454 3454 :Capability: basic 3455 - :Architectures: arm, arm64, mips 3455 + :Architectures: arm64, mips 3456 3456 :Type: vcpu ioctl 3457 3457 :Parameters: struct kvm_reg_list (in/out) 3458 3458 :Returns: 0 on success; -1 on error ··· 3479 3479 ----------------------------------------- 3480 3480 3481 3481 :Capability: KVM_CAP_ARM_SET_DEVICE_ADDR 3482 - :Architectures: arm, arm64 3482 + :Architectures: arm64 3483 3483 :Type: vm ioctl 3484 3484 :Parameters: struct kvm_arm_device_address (in) 3485 3485 :Returns: 0 on success, -1 on error ··· 3506 3506 to know about. The id field is an architecture specific identifier for a 3507 3507 specific device. 3508 3508 3509 - ARM/arm64 divides the id field into two parts, a device id and an 3509 + arm64 divides the id field into two parts, a device id and an 3510 3510 address type id specific to the individual device:: 3511 3511 3512 3512 bits: | 63 ... 32 | 31 ... 16 | 15 ... 
0 | 3513 3513 field: | 0x00000000 | device id | addr type id | 3514 3514 3515 - ARM/arm64 currently only require this when using the in-kernel GIC 3515 + arm64 currently only require this when using the in-kernel GIC 3516 3516 support for the hardware VGIC features, using KVM_ARM_DEVICE_VGIC_V2 3517 3517 as the device id. When setting the base address for the guest's 3518 3518 mapping of the VGIC virtual CPU and distributor interface, the ioctl ··· 4794 4794 ------------------------------------ 4795 4795 4796 4796 :Capability: KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 4797 - :Architectures: x86, arm, arm64, mips 4797 + :Architectures: x86, arm64, mips 4798 4798 :Type: vm ioctl 4799 4799 :Parameters: struct kvm_clear_dirty_log (in) 4800 4800 :Returns: 0 on success, -1 on error ··· 4906 4906 4.119 KVM_ARM_VCPU_FINALIZE 4907 4907 --------------------------- 4908 4908 4909 - :Architectures: arm, arm64 4909 + :Architectures: arm64 4910 4910 :Type: vcpu ioctl 4911 4911 :Parameters: int feature (in) 4912 4912 :Returns: 0 on success, -1 on error ··· 5988 5988 5989 5989 If exit_reason is KVM_EXIT_SYSTEM_EVENT then the vcpu has triggered 5990 5990 a system-level event using some architecture specific mechanism (hypercall 5991 - or some special instruction). In case of ARM/ARM64, this is triggered using 5991 + or some special instruction). In case of ARM64, this is triggered using 5992 5992 HVC instruction based PSCI call from the vcpu. The 'type' field describes 5993 5993 the system-level event type. The 'flags' field describes architecture 5994 5994 specific flags for the system-level event. ··· 6006 6006 has requested a crash condition maintenance. Userspace can choose 6007 6007 to ignore the request, or to gather VM memory core dump and/or 6008 6008 reset/shutdown of the VM. 6009 + 6010 + Valid flags are: 6011 + 6012 + - KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (arm64 only) -- the guest issued 6013 + a SYSTEM_RESET2 call according to v1.1 of the PSCI specification. 
6009 6014 6010 6015 :: 6011 6016 ··· 6086 6081 __u64 fault_ipa; 6087 6082 } arm_nisv; 6088 6083 6089 - Used on arm and arm64 systems. If a guest accesses memory not in a memslot, 6084 + Used on arm64 systems. If a guest accesses memory not in a memslot, 6090 6085 KVM will typically return to userspace and ask it to do MMIO emulation on its 6091 6086 behalf. However, for certain classes of instructions, no instruction decode 6092 6087 (direction, length of memory access) is provided, and fetching and decoding ··· 6103 6098 Userspace implementations can query for KVM_CAP_ARM_NISV_TO_USER, and enable 6104 6099 this capability at VM creation. Once this is done, these types of errors will 6105 6100 instead return to userspace with KVM_EXIT_ARM_NISV, with the valid bits from 6106 - the HSR (arm) and ESR_EL2 (arm64) in the esr_iss field, and the faulting IPA 6107 - in the fault_ipa field. Userspace can either fix up the access if it's 6108 - actually an I/O access by decoding the instruction from guest memory (if it's 6109 - very brave) and continue executing the guest, or it can decide to suspend, 6110 - dump, or restart the guest. 6101 + the ESR_EL2 in the esr_iss field, and the faulting IPA in the fault_ipa field. 6102 + Userspace can either fix up the access if it's actually an I/O access by 6103 + decoding the instruction from guest memory (if it's very brave) and continue 6104 + executing the guest, or it can decide to suspend, dump, or restart the guest. 
6111 6105 6112 6106 Note that KVM does not skip the faulting instruction as it does for 6113 6107 KVM_EXIT_MMIO, but userspace has to emulate any change to the processing state ··· 6813 6809 6814 6810 7.18 KVM_CAP_MANUAL_DIRTY_LOG_PROTECT2 6815 6811 6816 - :Architectures: x86, arm, arm64, mips 6812 + :Architectures: x86, arm64, mips 6817 6813 :Parameters: args[0] whether feature should be enabled or not 6818 6814 6819 6815 Valid flags are:: ··· 7210 7206 8.9 KVM_CAP_ARM_USER_IRQ 7211 7207 ------------------------ 7212 7208 7213 - :Architectures: arm, arm64 7209 + :Architectures: arm64 7214 7210 7215 7211 This capability, if KVM_CHECK_EXTENSION indicates that it is available, means 7216 7212 that if userspace creates a VM without an in-kernel interrupt controller, it ··· 7337 7333 8.19 KVM_CAP_ARM_INJECT_SERROR_ESR 7338 7334 ---------------------------------- 7339 7335 7340 - :Architectures: arm, arm64 7336 + :Architectures: arm64 7341 7337 7342 7338 This capability indicates that userspace can specify (via the 7343 7339 KVM_SET_VCPU_EVENTS ioctl) the syndrome value reported to the guest when it
+34 -2
Documentation/virt/kvm/devices/vcpu.rst
··· 70 70 -ENODEV PMUv3 not supported or GIC not initialized 71 71 -ENXIO PMUv3 not properly configured or in-kernel irqchip not 72 72 configured as required prior to calling this attribute 73 - -EBUSY PMUv3 already initialized 73 + -EBUSY PMUv3 already initialized or a VCPU has already run 74 74 -EINVAL Invalid filter range 75 75 ======= ====================================================== 76 76 ··· 104 104 isn't strictly speaking an event. Filtering the cycle counter is possible 105 105 using event 0x11 (CPU_CYCLES). 106 106 107 + 1.4 ATTRIBUTE: KVM_ARM_VCPU_PMU_V3_SET_PMU 108 + ------------------------------------------ 109 + 110 + :Parameters: in kvm_device_attr.addr the address of an int representing the PMU 111 + identifier. 112 + 113 + :Returns: 114 + 115 + ======= ==================================================== 116 + -EBUSY PMUv3 already initialized, a VCPU has already run or 117 + an event filter has already been set 118 + -EFAULT Error accessing the PMU identifier 119 + -ENXIO PMU not found 120 + -ENODEV PMUv3 not supported or GIC not initialized 121 + -ENOMEM Could not allocate memory 122 + ======= ==================================================== 123 + 124 + Request that the VCPU uses the specified hardware PMU when creating guest events 125 + for the purpose of PMU emulation. The PMU identifier can be read from the "type" 126 + file for the desired PMU instance under /sys/devices (or, equivalently, 127 + /sys/bus/event_source). This attribute is particularly useful on heterogeneous 128 + systems where there are at least two CPU PMUs on the system. The PMU that is set 129 + for one VCPU will be used by all the other VCPUs. It isn't possible to set a PMU 130 + if a PMU event filter is already present. 131 + 132 + Note that KVM will not make any attempts to run the VCPU on the physical CPUs 133 + associated with the PMU specified by this attribute. This is entirely left to 134 + userspace. 
However, attempting to run the VCPU on a physical CPU not supported 135 + by the PMU will fail and KVM_RUN will return with 136 + exit_reason = KVM_EXIT_FAIL_ENTRY and populate the fail_entry struct by setting 137 + the hardware_entry_failure_reason field to KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED and 138 + the cpu field to the processor id. 107 139 108 140 2. GROUP: KVM_ARM_VCPU_TIMER_CTRL 109 141 ================================= 110 142 111 - :Architectures: ARM, ARM64 143 + :Architectures: ARM64 112 144 113 145 2.1. ATTRIBUTES: KVM_ARM_VCPU_TIMER_IRQ_VTIMER, KVM_ARM_VCPU_TIMER_IRQ_PTIMER 114 146 -----------------------------------------------------------------------------
+1
arch/arm64/Kconfig
··· 682 682 683 683 config ARM64_ERRATUM_2077057 684 684 bool "Cortex-A510: 2077057: workaround software-step corrupting SPSR_EL2" 685 + default y 685 686 help 686 687 This option adds the workaround for ARM Cortex-A510 erratum 2077057. 687 688 Affected Cortex-A510 may corrupt SPSR_EL2 when a step exception is
+36 -9
arch/arm64/include/asm/kvm_host.h
··· 50 50 #define KVM_DIRTY_LOG_MANUAL_CAPS (KVM_DIRTY_LOG_MANUAL_PROTECT_ENABLE | \ 51 51 KVM_DIRTY_LOG_INITIALLY_SET) 52 52 53 + #define KVM_HAVE_MMU_RWLOCK 54 + 53 55 /* 54 56 * Mode of operation configurable with kvm-arm.mode early param. 55 57 * See Documentation/admin-guide/kernel-parameters.txt for more information. ··· 73 71 void kvm_arm_vcpu_destroy(struct kvm_vcpu *vcpu); 74 72 75 73 struct kvm_vmid { 76 - /* The VMID generation used for the virt. memory system */ 77 - u64 vmid_gen; 78 - u32 vmid; 74 + atomic64_t id; 79 75 }; 80 76 81 77 struct kvm_s2_mmu { ··· 122 122 * should) opt in to this feature if KVM_CAP_ARM_NISV_TO_USER is 123 123 * supported. 124 124 */ 125 - bool return_nisv_io_abort_to_user; 125 + #define KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER 0 126 + /* Memory Tagging Extension enabled for the guest */ 127 + #define KVM_ARCH_FLAG_MTE_ENABLED 1 128 + /* At least one vCPU has run in the VM */ 129 + #define KVM_ARCH_FLAG_HAS_RAN_ONCE 2 130 + unsigned long flags; 126 131 127 132 /* 128 133 * VM-wide PMU filter, implemented as a bitmap and big enough for 129 134 * up to 2^10 events (ARMv8.0) or 2^16 events (ARMv8.1+). 
130 135 */ 131 136 unsigned long *pmu_filter; 132 - unsigned int pmuver; 137 + struct arm_pmu *arm_pmu; 138 + 139 + cpumask_var_t supported_cpus; 133 140 134 141 u8 pfr0_csv2; 135 142 u8 pfr0_csv3; 136 - 137 - /* Memory Tagging Extension enabled for the guest */ 138 - bool mte_enabled; 139 143 }; 140 144 141 145 struct kvm_vcpu_fault_info { ··· 175 171 PAR_EL1, /* Physical Address Register */ 176 172 MDSCR_EL1, /* Monitor Debug System Control Register */ 177 173 MDCCINT_EL1, /* Monitor Debug Comms Channel Interrupt Enable Reg */ 174 + OSLSR_EL1, /* OS Lock Status Register */ 178 175 DISR_EL1, /* Deferred Interrupt Status Register */ 179 176 180 177 /* Performance Monitors Registers */ ··· 440 435 #define KVM_ARM64_DEBUG_STATE_SAVE_SPE (1 << 12) /* Save SPE context if active */ 441 436 #define KVM_ARM64_DEBUG_STATE_SAVE_TRBE (1 << 13) /* Save TRBE context if active */ 442 437 #define KVM_ARM64_FP_FOREIGN_FPSTATE (1 << 14) 438 + #define KVM_ARM64_ON_UNSUPPORTED_CPU (1 << 15) /* Physical CPU not in supported_cpus */ 443 439 444 440 #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | \ 445 441 KVM_GUESTDBG_USE_SW_BP | \ ··· 458 452 #else 459 453 #define vcpu_has_ptrauth(vcpu) false 460 454 #endif 455 + 456 + #define vcpu_on_unsupported_cpu(vcpu) \ 457 + ((vcpu)->arch.flags & KVM_ARM64_ON_UNSUPPORTED_CPU) 458 + 459 + #define vcpu_set_on_unsupported_cpu(vcpu) \ 460 + ((vcpu)->arch.flags |= KVM_ARM64_ON_UNSUPPORTED_CPU) 461 + 462 + #define vcpu_clear_on_unsupported_cpu(vcpu) \ 463 + ((vcpu)->arch.flags &= ~KVM_ARM64_ON_UNSUPPORTED_CPU) 461 464 462 465 #define vcpu_gp_regs(v) (&(v)->arch.ctxt.regs) 463 466 ··· 707 692 int kvm_arm_pvtime_has_attr(struct kvm_vcpu *vcpu, 708 693 struct kvm_device_attr *attr); 709 694 695 + extern unsigned int kvm_arm_vmid_bits; 696 + int kvm_arm_vmid_alloc_init(void); 697 + void kvm_arm_vmid_alloc_free(void); 698 + void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid); 699 + void kvm_arm_vmid_clear_active(void); 700 + 710 701 static inline 
void kvm_arm_pvtime_vcpu_init(struct kvm_vcpu_arch *vcpu_arch) 711 702 { 712 703 vcpu_arch->steal.base = GPA_INVALID; ··· 746 725 void kvm_arm_setup_debug(struct kvm_vcpu *vcpu); 747 726 void kvm_arm_clear_debug(struct kvm_vcpu *vcpu); 748 727 void kvm_arm_reset_debug_ptr(struct kvm_vcpu *vcpu); 728 + 729 + #define kvm_vcpu_os_lock_enabled(vcpu) \ 730 + (!!(__vcpu_sys_reg(vcpu, OSLSR_EL1) & SYS_OSLSR_OSLK)) 731 + 749 732 int kvm_arm_vcpu_arch_set_attr(struct kvm_vcpu *vcpu, 750 733 struct kvm_device_attr *attr); 751 734 int kvm_arm_vcpu_arch_get_attr(struct kvm_vcpu *vcpu, ··· 811 786 #define kvm_arm_vcpu_sve_finalized(vcpu) \ 812 787 ((vcpu)->arch.flags & KVM_ARM64_VCPU_SVE_FINALIZED) 813 788 814 - #define kvm_has_mte(kvm) (system_supports_mte() && (kvm)->arch.mte_enabled) 789 + #define kvm_has_mte(kvm) \ 790 + (system_supports_mte() && \ 791 + test_bit(KVM_ARCH_FLAG_MTE_ENABLED, &(kvm)->arch.flags)) 815 792 #define kvm_vcpu_has_pmu(vcpu) \ 816 793 (test_bit(KVM_ARM_VCPU_PMU_V3, (vcpu)->arch.features)) 817 794
+3 -1
arch/arm64/include/asm/kvm_mmu.h
··· 115 115 #include <asm/cache.h> 116 116 #include <asm/cacheflush.h> 117 117 #include <asm/mmu_context.h> 118 + #include <asm/kvm_host.h> 118 119 119 120 void kvm_update_va_mask(struct alt_instr *alt, 120 121 __le32 *origptr, __le32 *updptr, int nr_inst); ··· 267 266 u64 cnp = system_supports_cnp() ? VTTBR_CNP_BIT : 0; 268 267 269 268 baddr = mmu->pgd_phys; 270 - vmid_field = (u64)READ_ONCE(vmid->vmid) << VTTBR_VMID_SHIFT; 269 + vmid_field = atomic64_read(&vmid->id) << VTTBR_VMID_SHIFT; 270 + vmid_field &= VTTBR_VMID_MASK(kvm_arm_vmid_bits); 271 271 return kvm_phys_to_vttbr(baddr) | vmid_field | cnp; 272 272 } 273 273
+8
arch/arm64/include/asm/sysreg.h
··· 128 128 #define SYS_DBGWVRn_EL1(n) sys_reg(2, 0, 0, n, 6) 129 129 #define SYS_DBGWCRn_EL1(n) sys_reg(2, 0, 0, n, 7) 130 130 #define SYS_MDRAR_EL1 sys_reg(2, 0, 1, 0, 0) 131 + 131 132 #define SYS_OSLAR_EL1 sys_reg(2, 0, 1, 0, 4) 133 + #define SYS_OSLAR_OSLK BIT(0) 134 + 132 135 #define SYS_OSLSR_EL1 sys_reg(2, 0, 1, 1, 4) 136 + #define SYS_OSLSR_OSLM_MASK (BIT(3) | BIT(0)) 137 + #define SYS_OSLSR_OSLM_NI 0 138 + #define SYS_OSLSR_OSLM_IMPLEMENTED BIT(3) 139 + #define SYS_OSLSR_OSLK BIT(1) 140 + 133 141 #define SYS_OSDLR_EL1 sys_reg(2, 0, 1, 3, 4) 134 142 #define SYS_DBGPRCR_EL1 sys_reg(2, 0, 1, 4, 4) 135 143 #define SYS_DBGCLAIMSET_EL1 sys_reg(2, 0, 7, 8, 6)
+11
arch/arm64/include/uapi/asm/kvm.h
··· 362 362 #define KVM_ARM_VCPU_PMU_V3_IRQ 0 363 363 #define KVM_ARM_VCPU_PMU_V3_INIT 1 364 364 #define KVM_ARM_VCPU_PMU_V3_FILTER 2 365 + #define KVM_ARM_VCPU_PMU_V3_SET_PMU 3 365 366 #define KVM_ARM_VCPU_TIMER_CTRL 1 366 367 #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0 367 368 #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1 ··· 413 412 #define KVM_PSCI_RET_NI PSCI_RET_NOT_SUPPORTED 414 413 #define KVM_PSCI_RET_INVAL PSCI_RET_INVALID_PARAMS 415 414 #define KVM_PSCI_RET_DENIED PSCI_RET_DENIED 415 + 416 + /* arm64-specific kvm_run::system_event flags */ 417 + /* 418 + * Reset caused by a PSCI v1.1 SYSTEM_RESET2 call. 419 + * Valid only when the system event has a type of KVM_SYSTEM_EVENT_RESET. 420 + */ 421 + #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (1ULL << 0) 422 + 423 + /* run->fail_entry.hardware_entry_failure_reason codes. */ 424 + #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0) 416 425 417 426 #endif 418 427
+7 -1
arch/arm64/kernel/fpsimd.c
··· 348 348 349 349 /* 350 350 * Ensure FPSIMD/SVE storage in memory for the loaded context is up to 351 - * date with respect to the CPU registers. 351 + * date with respect to the CPU registers. Note carefully that the 352 + * current context is the context last bound to the CPU stored in 353 + * last, if KVM is involved this may be the guest VM context rather 354 + * than the host thread for the VM pointed to by current. This means 355 + * that we must always reference the state storage via last rather 356 + * than via current, other than the TIF_ flags which KVM will 357 + * carefully maintain for us. 352 358 */ 353 359 static void fpsimd_save(void) 354 360 {
+3
arch/arm64/kernel/image-vars.h
··· 79 79 /* Kernel symbol used by icache_is_vpipt(). */ 80 80 KVM_NVHE_ALIAS(__icache_flags); 81 81 82 + /* VMID bits set by the KVM VMID allocator */ 83 + KVM_NVHE_ALIAS(kvm_arm_vmid_bits); 84 + 82 85 /* Kernel symbols needed for cpus_have_final/const_caps checks. */ 83 86 KVM_NVHE_ALIAS(arm64_const_caps_ready); 84 87 KVM_NVHE_ALIAS(cpu_hwcap_keys);
+1 -1
arch/arm64/kvm/Makefile
··· 14 14 inject_fault.o va_layout.o handle_exit.o \ 15 15 guest.o debug.o reset.o sys_regs.o \ 16 16 vgic-sys-reg-v3.o fpsimd.o pmu.o pkvm.o \ 17 - arch_timer.o trng.o\ 17 + arch_timer.o trng.o vmid.o \ 18 18 vgic/vgic.o vgic/vgic-init.o \ 19 19 vgic/vgic-irqfd.o vgic/vgic-v2.o \ 20 20 vgic/vgic-v3.o vgic/vgic-v4.o \
+47 -95
arch/arm64/kvm/arm.c
··· 53 53 unsigned long kvm_arm_hyp_percpu_base[NR_CPUS]; 54 54 DECLARE_KVM_NVHE_PER_CPU(struct kvm_nvhe_init_params, kvm_init_params); 55 55 56 - /* The VMID used in the VTTBR */ 57 - static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1); 58 - static u32 kvm_next_vmid; 59 - static DEFINE_SPINLOCK(kvm_vmid_lock); 60 - 61 56 static bool vgic_present; 62 57 63 58 static DEFINE_PER_CPU(unsigned char, kvm_arm_hardware_enabled); ··· 84 89 switch (cap->cap) { 85 90 case KVM_CAP_ARM_NISV_TO_USER: 86 91 r = 0; 87 - kvm->arch.return_nisv_io_abort_to_user = true; 92 + set_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER, 93 + &kvm->arch.flags); 88 94 break; 89 95 case KVM_CAP_ARM_MTE: 90 96 mutex_lock(&kvm->lock); ··· 93 97 r = -EINVAL; 94 98 } else { 95 99 r = 0; 96 - kvm->arch.mte_enabled = true; 100 + set_bit(KVM_ARCH_FLAG_MTE_ENABLED, &kvm->arch.flags); 97 101 } 98 102 mutex_unlock(&kvm->lock); 99 103 break; ··· 146 150 if (ret) 147 151 goto out_free_stage2_pgd; 148 152 153 + if (!zalloc_cpumask_var(&kvm->arch.supported_cpus, GFP_KERNEL)) 154 + goto out_free_stage2_pgd; 155 + cpumask_copy(kvm->arch.supported_cpus, cpu_possible_mask); 156 + 149 157 kvm_vgic_early_init(kvm); 150 158 151 159 /* The maximum number of VCPUs is limited by the host's GIC model */ ··· 176 176 void kvm_arch_destroy_vm(struct kvm *kvm) 177 177 { 178 178 bitmap_free(kvm->arch.pmu_filter); 179 + free_cpumask_var(kvm->arch.supported_cpus); 179 180 180 181 kvm_vgic_destroy(kvm); 181 182 ··· 412 411 if (vcpu_has_ptrauth(vcpu)) 413 412 vcpu_ptrauth_disable(vcpu); 414 413 kvm_arch_vcpu_load_debug_state_flags(vcpu); 414 + 415 + if (!cpumask_test_cpu(smp_processor_id(), vcpu->kvm->arch.supported_cpus)) 416 + vcpu_set_on_unsupported_cpu(vcpu); 415 417 } 416 418 417 419 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu) ··· 426 422 kvm_timer_vcpu_put(vcpu); 427 423 kvm_vgic_put(vcpu); 428 424 kvm_vcpu_pmu_restore_host(vcpu); 425 + kvm_arm_vmid_clear_active(); 429 426 427 + vcpu_clear_on_unsupported_cpu(vcpu); 430 428 
vcpu->cpu = -1; 431 429 } 432 430 ··· 495 489 } 496 490 #endif 497 491 498 - /* Just ensure a guest exit from a particular CPU */ 499 - static void exit_vm_noop(void *info) 500 - { 501 - } 502 - 503 - void force_vm_exit(const cpumask_t *mask) 504 - { 505 - preempt_disable(); 506 - smp_call_function_many(mask, exit_vm_noop, NULL, true); 507 - preempt_enable(); 508 - } 509 - 510 - /** 511 - * need_new_vmid_gen - check that the VMID is still valid 512 - * @vmid: The VMID to check 513 - * 514 - * return true if there is a new generation of VMIDs being used 515 - * 516 - * The hardware supports a limited set of values with the value zero reserved 517 - * for the host, so we check if an assigned value belongs to a previous 518 - * generation, which requires us to assign a new value. If we're the first to 519 - * use a VMID for the new generation, we must flush necessary caches and TLBs 520 - * on all CPUs. 521 - */ 522 - static bool need_new_vmid_gen(struct kvm_vmid *vmid) 523 - { 524 - u64 current_vmid_gen = atomic64_read(&kvm_vmid_gen); 525 - smp_rmb(); /* Orders read of kvm_vmid_gen and kvm->arch.vmid */ 526 - return unlikely(READ_ONCE(vmid->vmid_gen) != current_vmid_gen); 527 - } 528 - 529 - /** 530 - * update_vmid - Update the vmid with a valid VMID for the current generation 531 - * @vmid: The stage-2 VMID information struct 532 - */ 533 - static void update_vmid(struct kvm_vmid *vmid) 534 - { 535 - if (!need_new_vmid_gen(vmid)) 536 - return; 537 - 538 - spin_lock(&kvm_vmid_lock); 539 - 540 - /* 541 - * We need to re-check the vmid_gen here to ensure that if another vcpu 542 - * already allocated a valid vmid for this vm, then this vcpu should 543 - * use the same vmid. 544 - */ 545 - if (!need_new_vmid_gen(vmid)) { 546 - spin_unlock(&kvm_vmid_lock); 547 - return; 548 - } 549 - 550 - /* First user of a new VMID generation? 
*/ 551 - if (unlikely(kvm_next_vmid == 0)) { 552 - atomic64_inc(&kvm_vmid_gen); 553 - kvm_next_vmid = 1; 554 - 555 - /* 556 - * On SMP we know no other CPUs can use this CPU's or each 557 - * other's VMID after force_vm_exit returns since the 558 - * kvm_vmid_lock blocks them from reentry to the guest. 559 - */ 560 - force_vm_exit(cpu_all_mask); 561 - /* 562 - * Now broadcast TLB + ICACHE invalidation over the inner 563 - * shareable domain to make sure all data structures are 564 - * clean. 565 - */ 566 - kvm_call_hyp(__kvm_flush_vm_context); 567 - } 568 - 569 - WRITE_ONCE(vmid->vmid, kvm_next_vmid); 570 - kvm_next_vmid++; 571 - kvm_next_vmid &= (1 << kvm_get_vmid_bits()) - 1; 572 - 573 - smp_wmb(); 574 - WRITE_ONCE(vmid->vmid_gen, atomic64_read(&kvm_vmid_gen)); 575 - 576 - spin_unlock(&kvm_vmid_lock); 577 - } 578 - 579 492 static int kvm_vcpu_initialized(struct kvm_vcpu *vcpu) 580 493 { 581 494 return vcpu->arch.target >= 0; ··· 558 633 */ 559 634 if (kvm_vm_is_protected(kvm)) 560 635 kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu); 636 + 637 + mutex_lock(&kvm->lock); 638 + set_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags); 639 + mutex_unlock(&kvm->lock); 561 640 562 641 return ret; 563 642 } ··· 721 792 } 722 793 } 723 794 795 + if (unlikely(vcpu_on_unsupported_cpu(vcpu))) { 796 + run->exit_reason = KVM_EXIT_FAIL_ENTRY; 797 + run->fail_entry.hardware_entry_failure_reason = KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED; 798 + run->fail_entry.cpu = smp_processor_id(); 799 + *ret = 0; 800 + return true; 801 + } 802 + 724 803 return kvm_request_pending(vcpu) || 725 - need_new_vmid_gen(&vcpu->arch.hw_mmu->vmid) || 726 804 xfer_to_guest_mode_work_pending(); 727 805 } 728 806 ··· 791 855 if (!ret) 792 856 ret = 1; 793 857 794 - update_vmid(&vcpu->arch.hw_mmu->vmid); 795 - 796 858 check_vcpu_requests(vcpu); 797 859 798 860 /* ··· 799 865 * non-preemptible context. 
800 866 */ 801 867 preempt_disable(); 868 + 869 + /* 870 + * The VMID allocator only tracks active VMIDs per 871 + * physical CPU, and therefore the VMID allocated may not be 872 + * preserved on VMID roll-over if the task was preempted, 873 + * making a thread's VMID inactive. So we need to call 874 + * kvm_arm_vmid_update() in non-preemptible context. 875 + */ 876 + kvm_arm_vmid_update(&vcpu->arch.hw_mmu->vmid); 802 877 803 878 kvm_pmu_flush_hwstate(vcpu); 804 879 ··· 888 945 * context synchronization event) is necessary to ensure that 889 946 * pending interrupts are taken. 890 947 */ 891 - local_irq_enable(); 892 - isb(); 893 - local_irq_disable(); 948 + if (ARM_EXCEPTION_CODE(ret) == ARM_EXCEPTION_IRQ) { 949 + local_irq_enable(); 950 + isb(); 951 + local_irq_disable(); 952 + } 894 953 895 954 guest_timing_exit_irqoff(); 896 955 ··· 1690 1745 1691 1746 /* 1692 1747 * Copy the MPIDR <-> logical CPU ID mapping to hyp. 1693 - * Only copy the set of online CPUs whose features have been chacked 1693 1748 * Only copy the set of online CPUs whose features have been checked 1694 1749 * against the finalized system capabilities. The hypervisor will not 1695 1750 * allow any other CPUs from the `possible` set to boot. 1696 1751 */ ··· 2106 2161 if (err) 2107 2162 return err; 2108 2163 2164 + err = kvm_arm_vmid_alloc_init(); 2165 + if (err) { 2166 + kvm_err("Failed to initialize VMID allocator.\n"); 2167 + return err; 2168 + } 2169 + 2109 2170 if (!in_hyp_mode) { 2110 2171 err = init_hyp_mode(); 2111 2172 if (err) ··· 2151 2200 if (!in_hyp_mode) 2152 2201 teardown_hyp_mode(); 2153 2202 out_err: 2203 + kvm_arm_vmid_alloc_free(); 2154 2204 return err; 2155 2205 } 2156 2206
+22 -4
arch/arm64/kvm/debug.c
··· 105 105 * - Userspace is using the hardware to debug the guest 106 106 * (KVM_GUESTDBG_USE_HW is set). 107 107 * - The guest is not using debug (KVM_ARM64_DEBUG_DIRTY is clear). 108 + * - The guest has enabled the OS Lock (debug exceptions are blocked). 108 109 */ 109 110 if ((vcpu->guest_debug & KVM_GUESTDBG_USE_HW) || 110 - !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY)) 111 + !(vcpu->arch.flags & KVM_ARM64_DEBUG_DIRTY) || 112 + kvm_vcpu_os_lock_enabled(vcpu)) 111 113 vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA; 112 114 113 115 trace_kvm_arm_set_dreg32("MDCR_EL2", vcpu->arch.mdcr_el2); ··· 162 160 163 161 kvm_arm_setup_mdcr_el2(vcpu); 164 162 165 - /* Is Guest debugging in effect? */ 166 - if (vcpu->guest_debug) { 163 + /* Check if we need to use the debug registers. */ 164 + if (vcpu->guest_debug || kvm_vcpu_os_lock_enabled(vcpu)) { 167 165 /* Save guest debug state */ 168 166 save_guest_debug_regs(vcpu); 169 167 ··· 225 223 trace_kvm_arm_set_regset("WAPTS", get_num_wrps(), 226 224 &vcpu->arch.debug_ptr->dbg_wcr[0], 227 225 &vcpu->arch.debug_ptr->dbg_wvr[0]); 226 + 227 + /* 228 + * The OS Lock blocks debug exceptions in all ELs when it is 229 + * enabled. If the guest has enabled the OS Lock, constrain its 230 + * effects to the guest. Emulate the behavior by clearing 231 + * MDSCR_EL1.MDE. In so doing, we ensure that host debug 232 + * exceptions are unaffected by guest configuration of the OS 233 + * Lock. 234 + */ 235 + } else if (kvm_vcpu_os_lock_enabled(vcpu)) { 236 + mdscr = vcpu_read_sys_reg(vcpu, MDSCR_EL1); 237 + mdscr &= ~DBG_MDSCR_MDE; 238 + vcpu_write_sys_reg(vcpu, mdscr, MDSCR_EL1); 228 239 } 229 240 } 230 241 ··· 259 244 { 260 245 trace_kvm_arm_clear_debug(vcpu->guest_debug); 261 246 262 - if (vcpu->guest_debug) { 247 + /* 248 + * Restore the guest's debug registers if we were using them. 249 + */ 250 + if (vcpu->guest_debug || kvm_vcpu_os_lock_enabled(vcpu)) { 263 251 restore_guest_debug_regs(vcpu); 264 252 265 253 /*
+10 -4
arch/arm64/kvm/fpsimd.c
··· 84 84 vcpu->arch.flags |= KVM_ARM64_HOST_SVE_ENABLED; 85 85 } 86 86 87 + /* 88 + * Called just before entering the guest once we are no longer 89 + * preemptable. Syncs the host's TIF_FOREIGN_FPSTATE with the KVM 90 + * mirror of the flag used by the hypervisor. 91 + */ 87 92 void kvm_arch_vcpu_ctxflush_fp(struct kvm_vcpu *vcpu) 88 93 { 89 94 if (test_thread_flag(TIF_FOREIGN_FPSTATE)) ··· 98 93 } 99 94 100 95 /* 101 - * If the guest FPSIMD state was loaded, update the host's context 102 - * tracking data mark the CPU FPSIMD regs as dirty and belonging to vcpu 103 - * so that they will be written back if the kernel clobbers them due to 104 - * kernel-mode NEON before re-entry into the guest. 96 + * Called just after exiting the guest. If the guest FPSIMD state 97 + * was loaded, update the host's context tracking data mark the CPU 98 + * FPSIMD regs as dirty and belonging to vcpu so that they will be 99 + * written back if the kernel clobbers them due to kernel-mode NEON 100 + * before re-entry into the guest. 105 101 */ 106 102 void kvm_arch_vcpu_ctxsync_fp(struct kvm_vcpu *vcpu) 107 103 {
+1 -1
arch/arm64/kvm/guest.c
··· 282 282 break; 283 283 284 284 /* 285 - * Otherwide, this is a priviledged mode, and *all* the 285 + * Otherwise, this is a privileged mode, and *all* the 286 286 * registers must be narrowed to 32bit. 287 287 */ 288 288 default:
+1 -1
arch/arm64/kvm/handle_exit.c
··· 248 248 case ARM_EXCEPTION_HYP_GONE: 249 249 /* 250 250 * EL2 has been reset to the hyp-stub. This happens when a guest 251 - * is pre-emptied by kvm_reboot()'s shutdown call. 251 + * is pre-empted by kvm_reboot()'s shutdown call. 252 252 */ 253 253 run->exit_reason = KVM_EXIT_FAIL_ENTRY; 254 254 return 0;
+4
arch/arm64/kvm/hyp/include/hyp/switch.h
··· 173 173 return false; 174 174 175 175 /* Valid trap. Switch the context: */ 176 + 177 + /* First disable enough traps to allow us to update the registers */ 176 178 if (has_vhe()) { 177 179 reg = CPACR_EL1_FPEN; 178 180 if (sve_guest) ··· 190 188 } 191 189 isb(); 192 190 191 + /* Write out the host state if it's in the registers */ 193 192 if (vcpu->arch.flags & KVM_ARM64_FP_HOST) { 194 193 __fpsimd_save_state(vcpu->arch.host_fpsimd_state); 195 194 vcpu->arch.flags &= ~KVM_ARM64_FP_HOST; 196 195 } 197 196 197 + /* Restore the guest state */ 198 198 if (sve_guest) 199 199 __hyp_sve_restore_guest(vcpu); 200 200 else
+2 -1
arch/arm64/kvm/hyp/nvhe/Makefile
··· 13 13 lib-objs := $(addprefix ../../../lib/, $(lib-objs)) 14 14 15 15 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \ 16 - hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \ 16 + hyp-main.o hyp-smp.o psci-relay.o early_alloc.o page_alloc.o \ 17 17 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o 18 18 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \ 19 19 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o 20 + obj-$(CONFIG_DEBUG_LIST) += list_debug.o 20 21 obj-y += $(lib-objs) 21 22 22 23 ##
+54
arch/arm64/kvm/hyp/nvhe/list_debug.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2022 - Google LLC 4 + * Author: Keir Fraser <keirf@google.com> 5 + */ 6 + 7 + #include <linux/list.h> 8 + #include <linux/bug.h> 9 + 10 + static inline __must_check bool nvhe_check_data_corruption(bool v) 11 + { 12 + return v; 13 + } 14 + 15 + #define NVHE_CHECK_DATA_CORRUPTION(condition) \ 16 + nvhe_check_data_corruption(({ \ 17 + bool corruption = unlikely(condition); \ 18 + if (corruption) { \ 19 + if (IS_ENABLED(CONFIG_BUG_ON_DATA_CORRUPTION)) { \ 20 + BUG_ON(1); \ 21 + } else \ 22 + WARN_ON(1); \ 23 + } \ 24 + corruption; \ 25 + })) 26 + 27 + /* The predicates checked here are taken from lib/list_debug.c. */ 28 + 29 + bool __list_add_valid(struct list_head *new, struct list_head *prev, 30 + struct list_head *next) 31 + { 32 + if (NVHE_CHECK_DATA_CORRUPTION(next->prev != prev) || 33 + NVHE_CHECK_DATA_CORRUPTION(prev->next != next) || 34 + NVHE_CHECK_DATA_CORRUPTION(new == prev || new == next)) 35 + return false; 36 + 37 + return true; 38 + } 39 + 40 + bool __list_del_entry_valid(struct list_head *entry) 41 + { 42 + struct list_head *prev, *next; 43 + 44 + prev = entry->prev; 45 + next = entry->next; 46 + 47 + if (NVHE_CHECK_DATA_CORRUPTION(next == LIST_POISON1) || 48 + NVHE_CHECK_DATA_CORRUPTION(prev == LIST_POISON2) || 49 + NVHE_CHECK_DATA_CORRUPTION(prev->next != entry) || 50 + NVHE_CHECK_DATA_CORRUPTION(next->prev != entry)) 51 + return false; 52 + 53 + return true; 54 + }
+1 -2
arch/arm64/kvm/hyp/nvhe/mem_protect.c
··· 138 138 139 139 mmu->pgd_phys = __hyp_pa(host_kvm.pgt.pgd); 140 140 mmu->pgt = &host_kvm.pgt; 141 - WRITE_ONCE(mmu->vmid.vmid_gen, 0); 142 - WRITE_ONCE(mmu->vmid.vmid, 0); 141 + atomic64_set(&mmu->vmid.id, 0); 143 142 144 143 return 0; 145 144 }
+2 -2
arch/arm64/kvm/hyp/nvhe/page_alloc.c
··· 102 102 * Only the first struct hyp_page of a high-order page (otherwise known 103 103 * as the 'head') should have p->order set. The non-head pages should 104 104 * have p->order = HYP_NO_ORDER. Here @p may no longer be the head 105 - * after coallescing, so make sure to mark it HYP_NO_ORDER proactively. 105 + * after coalescing, so make sure to mark it HYP_NO_ORDER proactively. 106 106 */ 107 107 p->order = HYP_NO_ORDER; 108 108 for (; (order + 1) < pool->max_order; order++) { ··· 110 110 if (!buddy) 111 111 break; 112 112 113 - /* Take the buddy out of its list, and coallesce with @p */ 113 + /* Take the buddy out of its list, and coalesce with @p */ 114 114 page_remove_from_list(buddy); 115 115 buddy->order = HYP_NO_ORDER; 116 116 p = min(p, buddy);
-22
arch/arm64/kvm/hyp/nvhe/stub.c
··· 1 - // SPDX-License-Identifier: GPL-2.0-only 2 - /* 3 - * Stubs for out-of-line function calls caused by re-using kernel 4 - * infrastructure at EL2. 5 - * 6 - * Copyright (C) 2020 - Google LLC 7 - */ 8 - 9 - #include <linux/list.h> 10 - 11 - #ifdef CONFIG_DEBUG_LIST 12 - bool __list_add_valid(struct list_head *new, struct list_head *prev, 13 - struct list_head *next) 14 - { 15 - return true; 16 - } 17 - 18 - bool __list_del_entry_valid(struct list_head *entry) 19 - { 20 - return true; 21 - } 22 - #endif
+2 -1
arch/arm64/kvm/mmio.c
··· 135 135 * volunteered to do so, and bail out otherwise. 136 136 */ 137 137 if (!kvm_vcpu_dabt_isvalid(vcpu)) { 138 - if (vcpu->kvm->arch.return_nisv_io_abort_to_user) { 138 + if (test_bit(KVM_ARCH_FLAG_RETURN_NISV_IO_ABORT_TO_USER, 139 + &vcpu->kvm->arch.flags)) { 139 140 run->exit_reason = KVM_EXIT_ARM_NISV; 140 141 run->arm_nisv.esr_iss = kvm_vcpu_dabt_iss_nisv_sanitized(vcpu); 141 142 run->arm_nisv.fault_ipa = fault_ipa;
+32 -20
arch/arm64/kvm/mmu.c
··· 58 58 break; 59 59 60 60 if (resched && next != end) 61 - cond_resched_lock(&kvm->mmu_lock); 61 + cond_resched_rwlock_write(&kvm->mmu_lock); 62 62 } while (addr = next, addr != end); 63 63 64 64 return ret; ··· 179 179 struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu); 180 180 phys_addr_t end = start + size; 181 181 182 - assert_spin_locked(&kvm->mmu_lock); 182 + lockdep_assert_held_write(&kvm->mmu_lock); 183 183 WARN_ON(size & ~PAGE_MASK); 184 184 WARN_ON(stage2_apply_range(kvm, start, end, kvm_pgtable_stage2_unmap, 185 185 may_block)); ··· 213 213 int idx, bkt; 214 214 215 215 idx = srcu_read_lock(&kvm->srcu); 216 - spin_lock(&kvm->mmu_lock); 216 + write_lock(&kvm->mmu_lock); 217 217 218 218 slots = kvm_memslots(kvm); 219 219 kvm_for_each_memslot(memslot, bkt, slots) 220 220 stage2_flush_memslot(kvm, memslot); 221 221 222 - spin_unlock(&kvm->mmu_lock); 222 + write_unlock(&kvm->mmu_lock); 223 223 srcu_read_unlock(&kvm->srcu, idx); 224 224 } 225 225 ··· 615 615 }; 616 616 617 617 /** 618 - * kvm_init_stage2_mmu - Initialise a S2 MMU strucrure 618 + * kvm_init_stage2_mmu - Initialise a S2 MMU structure 619 619 * @kvm: The pointer to the KVM structure 620 620 * @mmu: The pointer to the s2 MMU structure 621 621 * ··· 653 653 654 654 mmu->pgt = pgt; 655 655 mmu->pgd_phys = __pa(pgt->pgd); 656 - WRITE_ONCE(mmu->vmid.vmid_gen, 0); 657 656 return 0; 658 657 659 658 out_destroy_pgtable: ··· 719 720 720 721 idx = srcu_read_lock(&kvm->srcu); 721 722 mmap_read_lock(current->mm); 722 - spin_lock(&kvm->mmu_lock); 723 + write_lock(&kvm->mmu_lock); 723 724 724 725 slots = kvm_memslots(kvm); 725 726 kvm_for_each_memslot(memslot, bkt, slots) 726 727 stage2_unmap_memslot(kvm, memslot); 727 728 728 - spin_unlock(&kvm->mmu_lock); 729 + write_unlock(&kvm->mmu_lock); 729 730 mmap_read_unlock(current->mm); 730 731 srcu_read_unlock(&kvm->srcu, idx); 731 732 } ··· 735 736 struct kvm *kvm = kvm_s2_mmu_to_kvm(mmu); 736 737 struct kvm_pgtable *pgt = NULL; 737 738 738 - spin_lock(&kvm->mmu_lock); 
739 + write_lock(&kvm->mmu_lock); 739 740 pgt = mmu->pgt; 740 741 if (pgt) { 741 742 mmu->pgd_phys = 0; 742 743 mmu->pgt = NULL; 743 744 free_percpu(mmu->last_vcpu_ran); 744 745 } 745 - spin_unlock(&kvm->mmu_lock); 746 + write_unlock(&kvm->mmu_lock); 746 747 747 748 if (pgt) { 748 749 kvm_pgtable_stage2_destroy(pgt); ··· 782 783 if (ret) 783 784 break; 784 785 785 - spin_lock(&kvm->mmu_lock); 786 + write_lock(&kvm->mmu_lock); 786 787 ret = kvm_pgtable_stage2_map(pgt, addr, PAGE_SIZE, pa, prot, 787 788 &cache); 788 - spin_unlock(&kvm->mmu_lock); 789 + write_unlock(&kvm->mmu_lock); 789 790 if (ret) 790 791 break; 791 792 ··· 833 834 start = memslot->base_gfn << PAGE_SHIFT; 834 835 end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT; 835 836 836 - spin_lock(&kvm->mmu_lock); 837 + write_lock(&kvm->mmu_lock); 837 838 stage2_wp_range(&kvm->arch.mmu, start, end); 838 - spin_unlock(&kvm->mmu_lock); 839 + write_unlock(&kvm->mmu_lock); 839 840 kvm_flush_remote_tlbs(kvm); 840 841 } 841 842 ··· 1079 1080 gfn_t gfn; 1080 1081 kvm_pfn_t pfn; 1081 1082 bool logging_active = memslot_is_logging(memslot); 1083 + bool logging_perm_fault = false; 1082 1084 unsigned long fault_level = kvm_vcpu_trap_get_fault_level(vcpu); 1083 1085 unsigned long vma_pagesize, fault_granule; 1084 1086 enum kvm_pgtable_prot prot = KVM_PGTABLE_PROT_R; ··· 1114 1114 if (logging_active) { 1115 1115 force_pte = true; 1116 1116 vma_shift = PAGE_SHIFT; 1117 + logging_perm_fault = (fault_status == FSC_PERM && write_fault); 1117 1118 } else { 1118 1119 vma_shift = get_vma_page_shift(vma, hva); 1119 1120 } ··· 1213 1212 if (exec_fault && device) 1214 1213 return -ENOEXEC; 1215 1214 1216 - spin_lock(&kvm->mmu_lock); 1215 + /* 1216 + * To reduce MMU contentions and enhance concurrency during dirty 1217 + * logging, only acquire read lock for permission 1218 + * relaxation. 
1219 + */ 1220 + if (logging_perm_fault) 1221 + read_lock(&kvm->mmu_lock); 1222 + else 1223 + write_lock(&kvm->mmu_lock); 1217 1224 pgt = vcpu->arch.hw_mmu->pgt; 1218 1225 if (mmu_notifier_retry(kvm, mmu_seq)) 1219 1226 goto out_unlock; ··· 1280 1271 } 1281 1272 1282 1273 out_unlock: 1283 - spin_unlock(&kvm->mmu_lock); 1274 + if (logging_perm_fault) 1275 + read_unlock(&kvm->mmu_lock); 1276 + else 1277 + write_unlock(&kvm->mmu_lock); 1284 1278 kvm_set_pfn_accessed(pfn); 1285 1279 kvm_release_pfn_clean(pfn); 1286 1280 return ret != -EAGAIN ? ret : 0; ··· 1298 1286 1299 1287 trace_kvm_access_fault(fault_ipa); 1300 1288 1301 - spin_lock(&vcpu->kvm->mmu_lock); 1289 + write_lock(&vcpu->kvm->mmu_lock); 1302 1290 mmu = vcpu->arch.hw_mmu; 1303 1291 kpte = kvm_pgtable_stage2_mkyoung(mmu->pgt, fault_ipa); 1304 - spin_unlock(&vcpu->kvm->mmu_lock); 1292 + write_unlock(&vcpu->kvm->mmu_lock); 1305 1293 1306 1294 pte = __pte(kpte); 1307 1295 if (pte_valid(pte)) ··· 1704 1692 gpa_t gpa = slot->base_gfn << PAGE_SHIFT; 1705 1693 phys_addr_t size = slot->npages << PAGE_SHIFT; 1706 1694 1707 - spin_lock(&kvm->mmu_lock); 1695 + write_lock(&kvm->mmu_lock); 1708 1696 unmap_stage2_range(&kvm->arch.mmu, gpa, size); 1709 - spin_unlock(&kvm->mmu_lock); 1697 + write_unlock(&kvm->mmu_lock); 1710 1698 } 1711 1699 1712 1700 /*
+110 -31
arch/arm64/kvm/pmu-emul.c
··· 7 7 #include <linux/cpu.h> 8 8 #include <linux/kvm.h> 9 9 #include <linux/kvm_host.h> 10 + #include <linux/list.h> 10 11 #include <linux/perf_event.h> 11 12 #include <linux/perf/arm_pmu.h> 12 13 #include <linux/uaccess.h> ··· 17 16 18 17 DEFINE_STATIC_KEY_FALSE(kvm_arm_pmu_available); 19 18 19 + static LIST_HEAD(arm_pmus); 20 + static DEFINE_MUTEX(arm_pmus_lock); 21 + 20 22 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx); 21 23 static void kvm_pmu_update_pmc_chained(struct kvm_vcpu *vcpu, u64 select_idx); 22 24 static void kvm_pmu_stop_counter(struct kvm_vcpu *vcpu, struct kvm_pmc *pmc); ··· 28 24 29 25 static u32 kvm_pmu_event_mask(struct kvm *kvm) 30 26 { 31 - switch (kvm->arch.pmuver) { 27 + unsigned int pmuver; 28 + 29 + pmuver = kvm->arch.arm_pmu->pmuver; 30 + 31 + switch (pmuver) { 32 32 case ID_AA64DFR0_PMUVER_8_0: 33 33 return GENMASK(9, 0); 34 34 case ID_AA64DFR0_PMUVER_8_1: ··· 41 33 case ID_AA64DFR0_PMUVER_8_7: 42 34 return GENMASK(15, 0); 43 35 default: /* Shouldn't be here, just for sanity */ 44 - WARN_ONCE(1, "Unknown PMU version %d\n", kvm->arch.pmuver); 36 + WARN_ONCE(1, "Unknown PMU version %d\n", pmuver); 45 37 return 0; 46 38 } 47 39 } ··· 608 600 */ 609 601 static void kvm_pmu_create_perf_event(struct kvm_vcpu *vcpu, u64 select_idx) 610 602 { 603 + struct arm_pmu *arm_pmu = vcpu->kvm->arch.arm_pmu; 611 604 struct kvm_pmu *pmu = &vcpu->arch.pmu; 612 605 struct kvm_pmc *pmc; 613 606 struct perf_event *event; ··· 645 636 return; 646 637 647 638 memset(&attr, 0, sizeof(struct perf_event_attr)); 648 - attr.type = PERF_TYPE_RAW; 639 + attr.type = arm_pmu->pmu.type; 649 640 attr.size = sizeof(attr); 650 641 attr.pinned = 1; 651 642 attr.disabled = !kvm_pmu_counter_is_enabled(vcpu, pmc->idx); ··· 754 745 755 746 void kvm_host_pmu_init(struct arm_pmu *pmu) 756 747 { 757 - if (pmu->pmuver != 0 && pmu->pmuver != ID_AA64DFR0_PMUVER_IMP_DEF && 758 - !kvm_arm_support_pmu_v3() && !is_protected_kvm_enabled()) 748 + struct 
arm_pmu_entry *entry; 749 + 750 + if (pmu->pmuver == 0 || pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF || 751 + is_protected_kvm_enabled()) 752 + return; 753 + 754 + mutex_lock(&arm_pmus_lock); 755 + 756 + entry = kmalloc(sizeof(*entry), GFP_KERNEL); 757 + if (!entry) 758 + goto out_unlock; 759 + 760 + entry->arm_pmu = pmu; 761 + list_add_tail(&entry->entry, &arm_pmus); 762 + 763 + if (list_is_singular(&arm_pmus)) 759 764 static_branch_enable(&kvm_arm_pmu_available); 765 + 766 + out_unlock: 767 + mutex_unlock(&arm_pmus_lock); 760 768 } 761 769 762 - static int kvm_pmu_probe_pmuver(void) 770 + static struct arm_pmu *kvm_pmu_probe_armpmu(void) 763 771 { 764 772 struct perf_event_attr attr = { }; 765 773 struct perf_event *event; 766 - struct arm_pmu *pmu; 767 - int pmuver = ID_AA64DFR0_PMUVER_IMP_DEF; 774 + struct arm_pmu *pmu = NULL; 768 775 769 776 /* 770 777 * Create a dummy event that only counts user cycles. As we'll never ··· 805 780 if (IS_ERR(event)) { 806 781 pr_err_once("kvm: pmu event creation failed %ld\n", 807 782 PTR_ERR(event)); 808 - return ID_AA64DFR0_PMUVER_IMP_DEF; 783 + return NULL; 809 784 } 810 785 811 786 if (event->pmu) { 812 787 pmu = to_arm_pmu(event->pmu); 813 - if (pmu->pmuver) 814 - pmuver = pmu->pmuver; 788 + if (pmu->pmuver == 0 || 789 + pmu->pmuver == ID_AA64DFR0_PMUVER_IMP_DEF) 790 + pmu = NULL; 815 791 } 816 792 817 793 perf_event_disable(event); 818 794 perf_event_release_kernel(event); 819 795 820 - return pmuver; 796 + return pmu; 821 797 } 822 798 823 799 u64 kvm_pmu_get_pmceid(struct kvm_vcpu *vcpu, bool pmceid1) ··· 836 810 * Don't advertise STALL_SLOT, as PMMIR_EL0 is handled 837 811 * as RAZ 838 812 */ 839 - if (vcpu->kvm->arch.pmuver >= ID_AA64DFR0_PMUVER_8_4) 813 + if (vcpu->kvm->arch.arm_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_4) 840 814 val &= ~BIT_ULL(ARMV8_PMUV3_PERFCTR_STALL_SLOT - 32); 841 815 base = 32; 842 816 } ··· 948 922 return true; 949 923 } 950 924 925 + static int kvm_arm_pmu_v3_set_pmu(struct kvm_vcpu *vcpu, int 
pmu_id) 926 + { 927 + struct kvm *kvm = vcpu->kvm; 928 + struct arm_pmu_entry *entry; 929 + struct arm_pmu *arm_pmu; 930 + int ret = -ENXIO; 931 + 932 + mutex_lock(&kvm->lock); 933 + mutex_lock(&arm_pmus_lock); 934 + 935 + list_for_each_entry(entry, &arm_pmus, entry) { 936 + arm_pmu = entry->arm_pmu; 937 + if (arm_pmu->pmu.type == pmu_id) { 938 + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags) || 939 + (kvm->arch.pmu_filter && kvm->arch.arm_pmu != arm_pmu)) { 940 + ret = -EBUSY; 941 + break; 942 + } 943 + 944 + kvm->arch.arm_pmu = arm_pmu; 945 + cpumask_copy(kvm->arch.supported_cpus, &arm_pmu->supported_cpus); 946 + ret = 0; 947 + break; 948 + } 949 + } 950 + 951 + mutex_unlock(&arm_pmus_lock); 952 + mutex_unlock(&kvm->lock); 953 + return ret; 954 + } 955 + 951 956 int kvm_arm_pmu_v3_set_attr(struct kvm_vcpu *vcpu, struct kvm_device_attr *attr) 952 957 { 958 + struct kvm *kvm = vcpu->kvm; 959 + 953 960 if (!kvm_vcpu_has_pmu(vcpu)) 954 961 return -ENODEV; 955 962 956 963 if (vcpu->arch.pmu.created) 957 964 return -EBUSY; 958 965 959 - if (!vcpu->kvm->arch.pmuver) 960 - vcpu->kvm->arch.pmuver = kvm_pmu_probe_pmuver(); 961 - 962 - if (vcpu->kvm->arch.pmuver == ID_AA64DFR0_PMUVER_IMP_DEF) 963 - return -ENODEV; 966 + mutex_lock(&kvm->lock); 967 + if (!kvm->arch.arm_pmu) { 968 + /* No PMU set, get the default one */ 969 + kvm->arch.arm_pmu = kvm_pmu_probe_armpmu(); 970 + if (!kvm->arch.arm_pmu) { 971 + mutex_unlock(&kvm->lock); 972 + return -ENODEV; 973 + } 974 + } 975 + mutex_unlock(&kvm->lock); 964 976 965 977 switch (attr->attr) { 966 978 case KVM_ARM_VCPU_PMU_V3_IRQ: { 967 979 int __user *uaddr = (int __user *)(long)attr->addr; 968 980 int irq; 969 981 970 - if (!irqchip_in_kernel(vcpu->kvm)) 982 + if (!irqchip_in_kernel(kvm)) 971 983 return -EINVAL; 972 984 973 985 if (get_user(irq, uaddr)) ··· 1015 951 if (!(irq_is_ppi(irq) || irq_is_spi(irq))) 1016 952 return -EINVAL; 1017 953 1018 - if (!pmu_irq_is_valid(vcpu->kvm, irq)) 954 + if 
(!pmu_irq_is_valid(kvm, irq)) 1019 955 return -EINVAL; 1020 956 1021 957 if (kvm_arm_pmu_irq_initialized(vcpu)) ··· 1030 966 struct kvm_pmu_event_filter filter; 1031 967 int nr_events; 1032 968 1033 - nr_events = kvm_pmu_event_mask(vcpu->kvm) + 1; 969 + nr_events = kvm_pmu_event_mask(kvm) + 1; 1034 970 1035 971 uaddr = (struct kvm_pmu_event_filter __user *)(long)attr->addr; 1036 972 ··· 1042 978 filter.action != KVM_PMU_EVENT_DENY)) 1043 979 return -EINVAL; 1044 980 1045 - mutex_lock(&vcpu->kvm->lock); 981 + mutex_lock(&kvm->lock); 1046 982 1047 - if (!vcpu->kvm->arch.pmu_filter) { 1048 - vcpu->kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); 1049 - if (!vcpu->kvm->arch.pmu_filter) { 1050 - mutex_unlock(&vcpu->kvm->lock); 983 + if (test_bit(KVM_ARCH_FLAG_HAS_RAN_ONCE, &kvm->arch.flags)) { 984 + mutex_unlock(&kvm->lock); 985 + return -EBUSY; 986 + } 987 + 988 + if (!kvm->arch.pmu_filter) { 989 + kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); 990 + if (!kvm->arch.pmu_filter) { 991 + mutex_unlock(&kvm->lock); 1051 992 return -ENOMEM; 1052 993 } 1053 994 ··· 1063 994 * events, the default is to allow. 
1064 995 */ 1065 996 if (filter.action == KVM_PMU_EVENT_ALLOW) 1066 - bitmap_zero(vcpu->kvm->arch.pmu_filter, nr_events); 997 + bitmap_zero(kvm->arch.pmu_filter, nr_events); 1067 998 else 1068 - bitmap_fill(vcpu->kvm->arch.pmu_filter, nr_events); 999 + bitmap_fill(kvm->arch.pmu_filter, nr_events); 1069 1000 } 1070 1001 1071 1002 if (filter.action == KVM_PMU_EVENT_ALLOW) 1072 - bitmap_set(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents); 1003 + bitmap_set(kvm->arch.pmu_filter, filter.base_event, filter.nevents); 1073 1004 else 1074 - bitmap_clear(vcpu->kvm->arch.pmu_filter, filter.base_event, filter.nevents); 1005 + bitmap_clear(kvm->arch.pmu_filter, filter.base_event, filter.nevents); 1075 1006 1076 - mutex_unlock(&vcpu->kvm->lock); 1007 + mutex_unlock(&kvm->lock); 1077 1008 1078 1009 return 0; 1010 + } 1011 + case KVM_ARM_VCPU_PMU_V3_SET_PMU: { 1012 + int __user *uaddr = (int __user *)(long)attr->addr; 1013 + int pmu_id; 1014 + 1015 + if (get_user(pmu_id, uaddr)) 1016 + return -EFAULT; 1017 + 1018 + return kvm_arm_pmu_v3_set_pmu(vcpu, pmu_id); 1079 1019 } 1080 1020 case KVM_ARM_VCPU_PMU_V3_INIT: 1081 1021 return kvm_arm_pmu_v3_init(vcpu); ··· 1123 1045 case KVM_ARM_VCPU_PMU_V3_IRQ: 1124 1046 case KVM_ARM_VCPU_PMU_V3_INIT: 1125 1047 case KVM_ARM_VCPU_PMU_V3_FILTER: 1048 + case KVM_ARM_VCPU_PMU_V3_SET_PMU: 1126 1049 if (kvm_vcpu_has_pmu(vcpu)) 1127 1050 return 0; 1128 1051 }
+52 -14
arch/arm64/kvm/psci.c
··· 84 84 if (!vcpu) 85 85 return PSCI_RET_INVALID_PARAMS; 86 86 if (!vcpu->arch.power_off) { 87 - if (kvm_psci_version(source_vcpu, kvm) != KVM_ARM_PSCI_0_1) 87 + if (kvm_psci_version(source_vcpu) != KVM_ARM_PSCI_0_1) 88 88 return PSCI_RET_ALREADY_ON; 89 89 else 90 90 return PSCI_RET_INVALID_PARAMS; ··· 161 161 return PSCI_0_2_AFFINITY_LEVEL_OFF; 162 162 } 163 163 164 - static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type) 164 + static void kvm_prepare_system_event(struct kvm_vcpu *vcpu, u32 type, u64 flags) 165 165 { 166 166 unsigned long i; 167 167 struct kvm_vcpu *tmp; ··· 181 181 182 182 memset(&vcpu->run->system_event, 0, sizeof(vcpu->run->system_event)); 183 183 vcpu->run->system_event.type = type; 184 + vcpu->run->system_event.flags = flags; 184 185 vcpu->run->exit_reason = KVM_EXIT_SYSTEM_EVENT; 185 186 } 186 187 187 188 static void kvm_psci_system_off(struct kvm_vcpu *vcpu) 188 189 { 189 - kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN); 190 + kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0); 190 191 } 191 192 192 193 static void kvm_psci_system_reset(struct kvm_vcpu *vcpu) 193 194 { 194 - kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET); 195 + kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0); 196 + } 197 + 198 + static void kvm_psci_system_reset2(struct kvm_vcpu *vcpu) 199 + { 200 + kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 201 + KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2); 195 202 } 196 203 197 204 static void kvm_psci_narrow_to_32bit(struct kvm_vcpu *vcpu) ··· 311 304 return ret; 312 305 } 313 306 314 - static int kvm_psci_1_0_call(struct kvm_vcpu *vcpu) 307 + static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) 315 308 { 316 309 u32 psci_fn = smccc_get_function(vcpu); 317 - u32 feature; 310 + u32 arg; 318 311 unsigned long val; 319 312 int ret = 1; 320 313 314 + if (minor > 1) 315 + return -EINVAL; 316 + 321 317 switch(psci_fn) { 322 318 case PSCI_0_2_FN_PSCI_VERSION: 323 - val = 
KVM_ARM_PSCI_1_0; 319 + val = minor == 0 ? KVM_ARM_PSCI_1_0 : KVM_ARM_PSCI_1_1; 324 320 break; 325 321 case PSCI_1_0_FN_PSCI_FEATURES: 326 - feature = smccc_get_arg1(vcpu); 327 - val = kvm_psci_check_allowed_function(vcpu, feature); 322 + arg = smccc_get_arg1(vcpu); 323 + val = kvm_psci_check_allowed_function(vcpu, arg); 328 324 if (val) 329 325 break; 330 326 331 - switch(feature) { 327 + switch(arg) { 332 328 case PSCI_0_2_FN_PSCI_VERSION: 333 329 case PSCI_0_2_FN_CPU_SUSPEND: 334 330 case PSCI_0_2_FN64_CPU_SUSPEND: ··· 347 337 case ARM_SMCCC_VERSION_FUNC_ID: 348 338 val = 0; 349 339 break; 340 + case PSCI_1_1_FN_SYSTEM_RESET2: 341 + case PSCI_1_1_FN64_SYSTEM_RESET2: 342 + if (minor >= 1) { 343 + val = 0; 344 + break; 345 + } 346 + fallthrough; 350 347 default: 351 348 val = PSCI_RET_NOT_SUPPORTED; 352 349 break; 353 350 } 354 351 break; 352 + case PSCI_1_1_FN_SYSTEM_RESET2: 353 + kvm_psci_narrow_to_32bit(vcpu); 354 + fallthrough; 355 + case PSCI_1_1_FN64_SYSTEM_RESET2: 356 + if (minor >= 1) { 357 + arg = smccc_get_arg1(vcpu); 358 + 359 + if (arg <= PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET || 360 + arg >= PSCI_1_1_RESET_TYPE_VENDOR_START) { 361 + kvm_psci_system_reset2(vcpu); 362 + vcpu_set_reg(vcpu, 0, PSCI_RET_INTERNAL_FAILURE); 363 + return 0; 364 + } 365 + 366 + val = PSCI_RET_INVALID_PARAMS; 367 + break; 368 + } 369 + fallthrough; 355 370 default: 356 371 return kvm_psci_0_2_call(vcpu); 357 372 } ··· 426 391 */ 427 392 int kvm_psci_call(struct kvm_vcpu *vcpu) 428 393 { 429 - switch (kvm_psci_version(vcpu, vcpu->kvm)) { 394 + switch (kvm_psci_version(vcpu)) { 395 + case KVM_ARM_PSCI_1_1: 396 + return kvm_psci_1_x_call(vcpu, 1); 430 397 case KVM_ARM_PSCI_1_0: 431 - return kvm_psci_1_0_call(vcpu); 398 + return kvm_psci_1_x_call(vcpu, 0); 432 399 case KVM_ARM_PSCI_0_2: 433 400 return kvm_psci_0_2_call(vcpu); 434 401 case KVM_ARM_PSCI_0_1: 435 402 return kvm_psci_0_1_call(vcpu); 436 403 default: 437 404 return -EINVAL; 438 - }; 405 + } 439 406 } 440 407 441 408 int 
kvm_arm_get_fw_num_regs(struct kvm_vcpu *vcpu) ··· 507 470 508 471 switch (reg->id) { 509 472 case KVM_REG_ARM_PSCI_VERSION: 510 - val = kvm_psci_version(vcpu, vcpu->kvm); 473 + val = kvm_psci_version(vcpu); 511 474 break; 512 475 case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_1: 513 476 case KVM_REG_ARM_SMCCC_ARCH_WORKAROUND_2: ··· 547 510 return 0; 548 511 case KVM_ARM_PSCI_0_2: 549 512 case KVM_ARM_PSCI_1_0: 513 + case KVM_ARM_PSCI_1_1: 550 514 if (!wants_02) 551 515 return -EINVAL; 552 516 vcpu->kvm->arch.psci_version = val;
+57 -17
arch/arm64/kvm/sys_regs.c
··· 44 44 * 64bit interface. 45 45 */ 46 46 47 + static int reg_from_user(u64 *val, const void __user *uaddr, u64 id); 48 + static int reg_to_user(void __user *uaddr, const u64 *val, u64 id); 49 + static u64 sys_reg_to_index(const struct sys_reg_desc *reg); 50 + 47 51 static bool read_from_write_only(struct kvm_vcpu *vcpu, 48 52 struct sys_reg_params *params, 49 53 const struct sys_reg_desc *r) ··· 291 287 return trap_raz_wi(vcpu, p, r); 292 288 } 293 289 290 + static bool trap_oslar_el1(struct kvm_vcpu *vcpu, 291 + struct sys_reg_params *p, 292 + const struct sys_reg_desc *r) 293 + { 294 + u64 oslsr; 295 + 296 + if (!p->is_write) 297 + return read_from_write_only(vcpu, p, r); 298 + 299 + /* Forward the OSLK bit to OSLSR */ 300 + oslsr = __vcpu_sys_reg(vcpu, OSLSR_EL1) & ~SYS_OSLSR_OSLK; 301 + if (p->regval & SYS_OSLAR_OSLK) 302 + oslsr |= SYS_OSLSR_OSLK; 303 + 304 + __vcpu_sys_reg(vcpu, OSLSR_EL1) = oslsr; 305 + return true; 306 + } 307 + 294 308 static bool trap_oslsr_el1(struct kvm_vcpu *vcpu, 295 309 struct sys_reg_params *p, 296 310 const struct sys_reg_desc *r) 297 311 { 298 - if (p->is_write) { 299 - return ignore_write(vcpu, p); 300 - } else { 301 - p->regval = (1 << 3); 302 - return true; 303 - } 312 + if (p->is_write) 313 + return write_to_read_only(vcpu, p, r); 314 + 315 + p->regval = __vcpu_sys_reg(vcpu, r->reg); 316 + return true; 317 + } 318 + 319 + static int set_oslsr_el1(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 320 + const struct kvm_one_reg *reg, void __user *uaddr) 321 + { 322 + u64 id = sys_reg_to_index(rd); 323 + u64 val; 324 + int err; 325 + 326 + err = reg_from_user(&val, uaddr, id); 327 + if (err) 328 + return err; 329 + 330 + /* 331 + * The only modifiable bit is the OSLK bit. Refuse the write if 332 + * userspace attempts to change any other bit in the register. 
333 + */ 334 + if ((val ^ rd->val) & ~SYS_OSLSR_OSLK) 335 + return -EINVAL; 336 + 337 + __vcpu_sys_reg(vcpu, rd->reg) = val; 338 + return 0; 304 339 } 305 340 306 341 static bool trap_dbgauthstatus_el1(struct kvm_vcpu *vcpu, ··· 1207 1164 return __access_id_reg(vcpu, p, r, true); 1208 1165 } 1209 1166 1210 - static int reg_from_user(u64 *val, const void __user *uaddr, u64 id); 1211 - static int reg_to_user(void __user *uaddr, const u64 *val, u64 id); 1212 - static u64 sys_reg_to_index(const struct sys_reg_desc *reg); 1213 - 1214 1167 /* Visibility overrides for SVE-specific control registers */ 1215 1168 static unsigned int sve_visibility(const struct kvm_vcpu *vcpu, 1216 1169 const struct sys_reg_desc *rd) ··· 1457 1418 * Debug handling: We do trap most, if not all debug related system 1458 1419 * registers. The implementation is good enough to ensure that a guest 1459 1420 * can use these with minimal performance degradation. The drawback is 1460 - * that we don't implement any of the external debug, none of the 1461 - * OSlock protocol. This should be revisited if we ever encounter a 1462 - * more demanding guest... 1421 + * that we don't implement any of the external debug architecture. 1422 + * This should be revisited if we ever encounter a more demanding 1423 + * guest... 
1463 1424 */ 1464 1425 static const struct sys_reg_desc sys_reg_descs[] = { 1465 1426 { SYS_DESC(SYS_DC_ISW), access_dcsw }, ··· 1486 1447 DBG_BCR_BVR_WCR_WVR_EL1(15), 1487 1448 1488 1449 { SYS_DESC(SYS_MDRAR_EL1), trap_raz_wi }, 1489 - { SYS_DESC(SYS_OSLAR_EL1), trap_raz_wi }, 1490 - { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1 }, 1450 + { SYS_DESC(SYS_OSLAR_EL1), trap_oslar_el1 }, 1451 + { SYS_DESC(SYS_OSLSR_EL1), trap_oslsr_el1, reset_val, OSLSR_EL1, 1452 + SYS_OSLSR_OSLM_IMPLEMENTED, .set_user = set_oslsr_el1, }, 1491 1453 { SYS_DESC(SYS_OSDLR_EL1), trap_raz_wi }, 1492 1454 { SYS_DESC(SYS_DBGPRCR_EL1), trap_raz_wi }, 1493 1455 { SYS_DESC(SYS_DBGCLAIMSET_EL1), trap_raz_wi }, ··· 1960 1920 1961 1921 DBGBXVR(0), 1962 1922 /* DBGOSLAR */ 1963 - { Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_raz_wi }, 1923 + { Op1( 0), CRn( 1), CRm( 0), Op2( 4), trap_oslar_el1 }, 1964 1924 DBGBXVR(1), 1965 1925 /* DBGOSLSR */ 1966 - { Op1( 0), CRn( 1), CRm( 1), Op2( 4), trap_oslsr_el1 }, 1926 + { Op1( 0), CRn( 1), CRm( 1), Op2( 4), trap_oslsr_el1, NULL, OSLSR_EL1 }, 1967 1927 DBGBXVR(2), 1968 1928 DBGBXVR(3), 1969 1929 /* DBGOSDLR */
+1 -1
arch/arm64/kvm/vgic/vgic.c
··· 37 37 * If you need to take multiple locks, always take the upper lock first, 38 38 * then the lower ones, e.g. first take the its_lock, then the irq_lock. 39 39 * If you are already holding a lock and need to take a higher one, you 40 - * have to drop the lower ranking lock first and re-aquire it after having 40 + * have to drop the lower ranking lock first and re-acquire it after having 41 41 * taken the upper one. 42 42 * 43 43 * When taking more than one ap_list_lock at the same time, always take the
+196
arch/arm64/kvm/vmid.c
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * VMID allocator. 4 + * 5 + * Based on Arm64 ASID allocator algorithm. 6 + * Please refer arch/arm64/mm/context.c for detailed 7 + * comments on algorithm. 8 + * 9 + * Copyright (C) 2002-2003 Deep Blue Solutions Ltd, all rights reserved. 10 + * Copyright (C) 2012 ARM Ltd. 11 + */ 12 + 13 + #include <linux/bitfield.h> 14 + #include <linux/bitops.h> 15 + 16 + #include <asm/kvm_asm.h> 17 + #include <asm/kvm_mmu.h> 18 + 19 + unsigned int kvm_arm_vmid_bits; 20 + static DEFINE_RAW_SPINLOCK(cpu_vmid_lock); 21 + 22 + static atomic64_t vmid_generation; 23 + static unsigned long *vmid_map; 24 + 25 + static DEFINE_PER_CPU(atomic64_t, active_vmids); 26 + static DEFINE_PER_CPU(u64, reserved_vmids); 27 + 28 + #define VMID_MASK (~GENMASK(kvm_arm_vmid_bits - 1, 0)) 29 + #define VMID_FIRST_VERSION (1UL << kvm_arm_vmid_bits) 30 + 31 + #define NUM_USER_VMIDS VMID_FIRST_VERSION 32 + #define vmid2idx(vmid) ((vmid) & ~VMID_MASK) 33 + #define idx2vmid(idx) vmid2idx(idx) 34 + 35 + /* 36 + * As vmid #0 is always reserved, we will never allocate one 37 + * as below and can be treated as invalid. This is used to 38 + * set the active_vmids on vCPU schedule out. 39 + */ 40 + #define VMID_ACTIVE_INVALID VMID_FIRST_VERSION 41 + 42 + #define vmid_gen_match(vmid) \ 43 + (!(((vmid) ^ atomic64_read(&vmid_generation)) >> kvm_arm_vmid_bits)) 44 + 45 + static void flush_context(void) 46 + { 47 + int cpu; 48 + u64 vmid; 49 + 50 + bitmap_clear(vmid_map, 0, NUM_USER_VMIDS); 51 + 52 + for_each_possible_cpu(cpu) { 53 + vmid = atomic64_xchg_relaxed(&per_cpu(active_vmids, cpu), 0); 54 + 55 + /* Preserve reserved VMID */ 56 + if (vmid == 0) 57 + vmid = per_cpu(reserved_vmids, cpu); 58 + __set_bit(vmid2idx(vmid), vmid_map); 59 + per_cpu(reserved_vmids, cpu) = vmid; 60 + } 61 + 62 + /* 63 + * Unlike ASID allocator, we expect less frequent rollover in 64 + * case of VMIDs. Hence, instead of marking the CPU as
65 + * flush_pending and issuing a local context invalidation on 66 + * the next context-switch, we broadcast TLB flush + I-cache 67 + * invalidation over the inner shareable domain on rollover. 68 + */ 69 + kvm_call_hyp(__kvm_flush_vm_context); 70 + } 71 + 72 + static bool check_update_reserved_vmid(u64 vmid, u64 newvmid) 73 + { 74 + int cpu; 75 + bool hit = false; 76 + 77 + /* 78 + * Iterate over the set of reserved VMIDs looking for a match 79 + * and update to use newvmid (i.e. the same VMID in the current 80 + * generation). 81 + */ 82 + for_each_possible_cpu(cpu) { 83 + if (per_cpu(reserved_vmids, cpu) == vmid) { 84 + hit = true; 85 + per_cpu(reserved_vmids, cpu) = newvmid; 86 + } 87 + } 88 + 89 + return hit; 90 + } 91 + 92 + static u64 new_vmid(struct kvm_vmid *kvm_vmid) 93 + { 94 + static u32 cur_idx = 1; 95 + u64 vmid = atomic64_read(&kvm_vmid->id); 96 + u64 generation = atomic64_read(&vmid_generation); 97 + 98 + if (vmid != 0) { 99 + u64 newvmid = generation | (vmid & ~VMID_MASK); 100 + 101 + if (check_update_reserved_vmid(vmid, newvmid)) { 102 + atomic64_set(&kvm_vmid->id, newvmid); 103 + return newvmid; 104 + } 105 + 106 + if (!__test_and_set_bit(vmid2idx(vmid), vmid_map)) { 107 + atomic64_set(&kvm_vmid->id, newvmid); 108 + return newvmid; 109 + } 110 + } 111 + 112 + vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, cur_idx); 113 + if (vmid != NUM_USER_VMIDS) 114 + goto set_vmid; 115 + 116 + /* We're out of VMIDs, so increment the global generation count */ 117 + generation = atomic64_add_return_relaxed(VMID_FIRST_VERSION, 118 + &vmid_generation); 119 + flush_context(); 120 + 121 + /* We have more VMIDs than CPUs, so this will always succeed */ 122 + vmid = find_next_zero_bit(vmid_map, NUM_USER_VMIDS, 1); 123 + 124 + set_vmid: 125 + __set_bit(vmid, vmid_map); 126 + cur_idx = vmid; 127 + vmid = idx2vmid(vmid) | generation; 128 + atomic64_set(&kvm_vmid->id, vmid); 129 + return vmid; 130 + } 131 + 132 + /* Called from vCPU sched out with preemption disabled */
133 + void kvm_arm_vmid_clear_active(void) 134 + { 135 + atomic64_set(this_cpu_ptr(&active_vmids), VMID_ACTIVE_INVALID); 136 + } 137 + 138 + void kvm_arm_vmid_update(struct kvm_vmid *kvm_vmid) 139 + { 140 + unsigned long flags; 141 + u64 vmid, old_active_vmid; 142 + 143 + vmid = atomic64_read(&kvm_vmid->id); 144 + 145 + /* 146 + * Please refer comments in check_and_switch_context() in 147 + * arch/arm64/mm/context.c. 148 + * 149 + * Unlike ASID allocator, we set the active_vmids to 150 + * VMID_ACTIVE_INVALID on vCPU schedule out to avoid 151 + * reserving the VMID space needlessly on rollover. 152 + * Hence explicitly check here for a "!= 0" to 153 + * handle the sync with a concurrent rollover. 154 + */ 155 + old_active_vmid = atomic64_read(this_cpu_ptr(&active_vmids)); 156 + if (old_active_vmid != 0 && vmid_gen_match(vmid) && 157 + 0 != atomic64_cmpxchg_relaxed(this_cpu_ptr(&active_vmids), 158 + old_active_vmid, vmid)) 159 + return; 160 + 161 + raw_spin_lock_irqsave(&cpu_vmid_lock, flags); 162 + 163 + /* Check that our VMID belongs to the current generation. */ 164 + vmid = atomic64_read(&kvm_vmid->id); 165 + if (!vmid_gen_match(vmid)) 166 + vmid = new_vmid(kvm_vmid); 167 + 168 + atomic64_set(this_cpu_ptr(&active_vmids), vmid); 169 + raw_spin_unlock_irqrestore(&cpu_vmid_lock, flags); 170 + } 171 + 172 + /* 173 + * Initialize the VMID allocator 174 + */ 175 + int kvm_arm_vmid_alloc_init(void) 176 + { 177 + kvm_arm_vmid_bits = kvm_get_vmid_bits(); 178 + 179 + /* 180 + * Expect allocation after rollover to fail if we don't have 181 + * at least one more VMID than CPUs. VMID #0 is always reserved. 182 + */
183 + WARN_ON(NUM_USER_VMIDS - 1 <= num_possible_cpus()); 184 + atomic64_set(&vmid_generation, VMID_FIRST_VERSION); 185 + vmid_map = kcalloc(BITS_TO_LONGS(NUM_USER_VMIDS), 186 + sizeof(*vmid_map), GFP_KERNEL); 187 + if (!vmid_map) 188 + return -ENOMEM; 189 + 190 + return 0; 191 + } 192 + 193 + void kvm_arm_vmid_alloc_free(void) 194 + { 195 + kfree(vmid_map); 196 + }
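The new allocator tags every VMID with a generation counter held in the bits above kvm_arm_vmid_bits, so a stale id can be detected with one XOR-and-shift, as vmid_gen_match() does. A minimal user-space model of that encoding (assuming a fixed 16-bit VMID space for illustration; the real code derives the width from the CPU at init time):

```c
#include <assert.h>
#include <stdint.h>

#define VMID_BITS          16ULL
#define VMID_FIRST_VERSION (1ULL << VMID_BITS)      /* generation increment */
#define VMID_IDX_MASK      (VMID_FIRST_VERSION - 1) /* low bits index the bitmap */

/* A full id is the current generation (always a multiple of
 * VMID_FIRST_VERSION) OR-ed with a bitmap index, as in new_vmid(). */
static uint64_t make_vmid(uint64_t generation, uint64_t idx)
{
	return generation | (idx & VMID_IDX_MASK);
}

/* Mirror of vmid_gen_match(): the id is current iff every bit above
 * VMID_BITS agrees with the global generation counter. */
static int gen_match(uint64_t vmid, uint64_t generation)
{
	return !((vmid ^ generation) >> VMID_BITS);
}
```

After a rollover bumps the generation by VMID_FIRST_VERSION, every previously handed-out id fails gen_match() and is reallocated under the lock.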
+5
include/kvm/arm_pmu.h
··· 29 29 struct irq_work overflow_work; 30 30 }; 31 31 32 + struct arm_pmu_entry { 33 + struct list_head entry; 34 + struct arm_pmu *arm_pmu; 35 + }; 36 + 32 37 DECLARE_STATIC_KEY_FALSE(kvm_arm_pmu_available); 33 38 34 39 static __always_inline bool kvm_arm_support_pmu_v3(void)
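struct arm_pmu_entry added above is an intrusive list node: the list_head lives inside the entry and the owning structure is recovered with container_of(). A stand-alone sketch of that pattern (list_add() and container_of() are reimplemented here for illustration, and struct arm_pmu is reduced to a stub):

```c
#include <assert.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

static void INIT_LIST_HEAD(struct list_head *h) { h->next = h->prev = h; }

static void list_add(struct list_head *n, struct list_head *h)
{
	n->next = h->next;
	n->prev = h;
	h->next->prev = n;
	h->next = n;
}

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct arm_pmu { int num_events; };	/* stub of the real struct */

struct arm_pmu_entry {
	struct list_head entry;
	struct arm_pmu *arm_pmu;
};

/* Walk a list of entries and return the first PMU, or NULL if empty. */
static struct arm_pmu *first_pmu(struct list_head *head)
{
	if (head->next == head)
		return NULL;
	return container_of(head->next, struct arm_pmu_entry, entry)->arm_pmu;
}
```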
+3 -6
include/kvm/arm_psci.h
··· 13 13 #define KVM_ARM_PSCI_0_1 PSCI_VERSION(0, 1) 14 14 #define KVM_ARM_PSCI_0_2 PSCI_VERSION(0, 2) 15 15 #define KVM_ARM_PSCI_1_0 PSCI_VERSION(1, 0) 16 + #define KVM_ARM_PSCI_1_1 PSCI_VERSION(1, 1) 16 17 17 - #define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_0 18 + #define KVM_ARM_PSCI_LATEST KVM_ARM_PSCI_1_1 18 19 19 - /* 20 - * We need the KVM pointer independently from the vcpu as we can call 21 - * this from HYP, and need to apply kern_hyp_va on it... 22 - */ 23 - static inline int kvm_psci_version(struct kvm_vcpu *vcpu, struct kvm *kvm) 20 + static inline int kvm_psci_version(struct kvm_vcpu *vcpu) 24 21 { 25 22 /* 26 23 * Our PSCI implementation stays the same across versions from
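The KVM_ARM_PSCI_1_1 value above is built with the PSCI_VERSION() macro from include/uapi/linux/psci.h, which packs the major number into bits [31:16] and the minor into bits [15:0]. A small sketch of the encode/decode (the two helper functions are our own names, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

#define PSCI_VERSION_MAJOR_SHIFT 16
#define PSCI_VERSION_MINOR_MASK  ((1U << PSCI_VERSION_MAJOR_SHIFT) - 1)

/* Same packing as the uapi PSCI_VERSION() macro. */
#define PSCI_VERSION(maj, min) \
	((((uint32_t)(maj)) << PSCI_VERSION_MAJOR_SHIFT) | \
	 ((min) & PSCI_VERSION_MINOR_MASK))

static uint32_t psci_version_major(uint32_t ver)
{
	return ver >> PSCI_VERSION_MAJOR_SHIFT;
}

static uint32_t psci_version_minor(uint32_t ver)
{
	return ver & PSCI_VERSION_MINOR_MASK;
}
```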
+1 -1
include/linux/perf_event.h
··· 864 864 #define PERF_NR_CONTEXTS 4 865 865 866 866 /** 867 - * struct perf_event_cpu_context - per cpu event context structure 867 + * struct perf_cpu_context - per cpu event context structure 868 868 */ 869 869 struct perf_cpu_context { 870 870 struct perf_event_context ctx;
+4
include/uapi/linux/psci.h
··· 82 82 #define PSCI_0_2_TOS_UP_NO_MIGRATE 1 83 83 #define PSCI_0_2_TOS_MP 2 84 84 85 + /* PSCI v1.1 reset type encoding for SYSTEM_RESET2 */ 86 + #define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET 0 87 + #define PSCI_1_1_RESET_TYPE_VENDOR_START 0x80000000U 88 + 85 89 /* PSCI version decoding (independent of PSCI version) */ 86 90 #define PSCI_VERSION_MAJOR_SHIFT 16 87 91 #define PSCI_VERSION_MINOR_MASK \
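The two new constants encode how SYSTEM_RESET2 interprets its reset_type argument: with bit 31 clear the type is architectural (0 being SYSTEM_WARM_RESET, the only one PSCI 1.1 defines), with bit 31 set it is vendor-specific. A sketch of that classification using the uapi names:

```c
#include <assert.h>
#include <stdint.h>

#define PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET 0
#define PSCI_1_1_RESET_TYPE_VENDOR_START      0x80000000U

/* Bit 31 selects the vendor-specific namespace. */
static int reset_type_is_vendor(uint32_t reset_type)
{
	return (reset_type & PSCI_1_1_RESET_TYPE_VENDOR_START) != 0;
}

/* The only architectural type PSCI 1.1 defines is the warm reset. */
static int reset_type_is_warm_reset(uint32_t reset_type)
{
	return reset_type == PSCI_1_1_RESET_TYPE_SYSTEM_WARM_RESET;
}
```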
+1
tools/arch/arm64/include/uapi/asm/kvm.h
··· 362 362 #define KVM_ARM_VCPU_PMU_V3_IRQ 0 363 363 #define KVM_ARM_VCPU_PMU_V3_INIT 1 364 364 #define KVM_ARM_VCPU_PMU_V3_FILTER 2 365 + #define KVM_ARM_VCPU_PMU_V3_SET_PMU 3 365 366 #define KVM_ARM_VCPU_TIMER_CTRL 1 366 367 #define KVM_ARM_VCPU_TIMER_IRQ_VTIMER 0 367 368 #define KVM_ARM_VCPU_TIMER_IRQ_PTIMER 1
+56 -2
tools/testing/selftests/kvm/aarch64/debug-exceptions.c
··· 23 23 #define SPSR_D (1 << 9) 24 24 #define SPSR_SS (1 << 21) 25 25 26 - extern unsigned char sw_bp, hw_bp, bp_svc, bp_brk, hw_wp, ss_start; 26 + extern unsigned char sw_bp, sw_bp2, hw_bp, hw_bp2, bp_svc, bp_brk, hw_wp, ss_start; 27 27 static volatile uint64_t sw_bp_addr, hw_bp_addr; 28 28 static volatile uint64_t wp_addr, wp_data_addr; 29 29 static volatile uint64_t svc_addr; ··· 45 45 write_sysreg(0, dbgwcr0_el1); 46 46 write_sysreg(0, dbgwvr0_el1); 47 47 isb(); 48 + } 49 + 50 + static void enable_os_lock(void) 51 + { 52 + write_sysreg(1, oslar_el1); 53 + isb(); 54 + 55 + GUEST_ASSERT(read_sysreg(oslsr_el1) & 2); 48 56 } 49 57 50 58 static void install_wp(uint64_t addr) ··· 107 99 GUEST_SYNC(0); 108 100 109 101 /* Software-breakpoint */ 102 + reset_debug_state(); 110 103 asm volatile("sw_bp: brk #0"); 111 104 GUEST_ASSERT_EQ(sw_bp_addr, PC(sw_bp)); ··· 160 151 GUEST_ASSERT_EQ(ss_addr[0], PC(ss_start)); 161 152 GUEST_ASSERT_EQ(ss_addr[1], PC(ss_start) + 4); 162 153 GUEST_ASSERT_EQ(ss_addr[2], PC(ss_start) + 8); 154 + 155 + GUEST_SYNC(6); 156 + 157 + /* OS Lock does not block software-breakpoint */ 158 + reset_debug_state(); 159 + enable_os_lock(); 160 + sw_bp_addr = 0; 161 + asm volatile("sw_bp2: brk #0"); 162 + GUEST_ASSERT_EQ(sw_bp_addr, PC(sw_bp2)); 163 + 164 + GUEST_SYNC(7); 165 + 166 + /* OS Lock blocking hardware-breakpoint */ 167 + reset_debug_state(); 168 + enable_os_lock(); 169 + install_hw_bp(PC(hw_bp2)); 170 + hw_bp_addr = 0; 171 + asm volatile("hw_bp2: nop"); 172 + GUEST_ASSERT_EQ(hw_bp_addr, 0); 173 + 174 + GUEST_SYNC(8); 175 + 176 + /* OS Lock blocking watchpoint */ 177 + reset_debug_state(); 178 + enable_os_lock(); 179 + write_data = '\0'; 180 + wp_data_addr = 0; 181 + install_wp(PC(write_data)); 182 + write_data = 'x'; 183 + GUEST_ASSERT_EQ(write_data, 'x'); 184 + GUEST_ASSERT_EQ(wp_data_addr, 0); 185 + 186 + GUEST_SYNC(9); 187 + 188 + /* OS Lock blocking single-step */ 189 + reset_debug_state(); 190 + enable_os_lock(); 191 + ss_addr[0] = 0;
192 + install_ss(); 193 + ss_idx = 0; 194 + asm volatile("mrs x0, esr_el1\n\t" 195 + "add x0, x0, #1\n\t" 196 + "msr daifset, #8\n\t" 197 + : : : "x0"); 198 + GUEST_ASSERT_EQ(ss_addr[0], 0); 163 199 164 200 GUEST_DONE(); 165 201 } ··· 277 223 vm_install_sync_handler(vm, VECTOR_SYNC_CURRENT, 278 224 ESR_EC_SVC64, guest_svc_handler); 279 225 280 - for (stage = 0; stage < 7; stage++) { 226 + for (stage = 0; stage < 11; stage++) { 281 227 vcpu_run(vm, VCPU_ID); 282 228 283 229 switch (get_ucall(vm, VCPU_ID, &uc)) {
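The new selftest stages encode the architectural behaviour of the OS Lock: once OSLAR_EL1 is written with 1, OSLSR_EL1.OSLK (bit 1) reads back set, and hardware breakpoints, watchpoints and single-step stop generating debug exceptions, while BRK-based software breakpoints still fire. A toy model of that state machine (register names kept, everything else simplified):

```c
#include <assert.h>
#include <stdint.h>

#define OSLSR_EL1_OSLK (1U << 1)	/* OS Lock status bit */

struct debug_regs { uint32_t oslsr_el1; };

/* Writing OSLAR_EL1 with bit 0 set locks; clearing it unlocks. */
static void write_oslar_el1(struct debug_regs *r, uint32_t key)
{
	if (key & 1)
		r->oslsr_el1 |= OSLSR_EL1_OSLK;
	else
		r->oslsr_el1 &= ~OSLSR_EL1_OSLK;
}

/* HW breakpoints, watchpoints and single-step are gated by the lock... */
static int hw_debug_event_fires(const struct debug_regs *r)
{
	return !(r->oslsr_el1 & OSLSR_EL1_OSLK);
}

/* ...but the BRK instruction always traps, lock or no lock. */
static int sw_breakpoint_fires(const struct debug_regs *r)
{
	(void)r;
	return 1;
}
```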
+1
tools/testing/selftests/kvm/aarch64/get-reg-list.c
··· 760 760 ARM64_SYS_REG(2, 0, 0, 15, 5), 761 761 ARM64_SYS_REG(2, 0, 0, 15, 6), 762 762 ARM64_SYS_REG(2, 0, 0, 15, 7), 763 + ARM64_SYS_REG(2, 0, 1, 1, 4), /* OSLSR_EL1 */ 763 764 ARM64_SYS_REG(2, 4, 0, 7, 0), /* DBGVCR32_EL2 */ 764 765 ARM64_SYS_REG(3, 0, 0, 0, 5), /* MPIDR_EL1 */ 765 766 ARM64_SYS_REG(3, 0, 0, 1, 0), /* ID_PFR0_EL1 */
+26 -19
tools/testing/selftests/kvm/aarch64/vgic_irq.c
··· 306 306 uint32_t prio, intid, ap1r; 307 307 int i; 308 308 309 - /* Set the priorities of the first (KVM_NUM_PRIOS - 1) IRQs 309 + /* 310 + * Set the priorities of the first (KVM_NUM_PRIOS - 1) IRQs 310 311 * in descending order, so intid+1 can preempt intid. 311 312 */ 312 313 for (i = 0, prio = (num - 1) * 8; i < num; i++, prio -= 8) { ··· 316 315 gic_set_priority(intid, prio); 317 316 } 318 317 319 - /* In a real migration, KVM would restore all GIC state before running 318 + /* 319 + * In a real migration, KVM would restore all GIC state before running 320 320 * guest code. 321 321 */ 322 322 for (i = 0; i < num; i++) { ··· 474 472 guest_restore_active(args, MIN_SPI, 4, f->cmd); 475 473 } 476 474 477 - static void guest_code(struct test_args args) 475 + static void guest_code(struct test_args *args) 478 476 { 479 - uint32_t i, nr_irqs = args.nr_irqs; 480 - bool level_sensitive = args.level_sensitive; 477 + uint32_t i, nr_irqs = args->nr_irqs; 478 + bool level_sensitive = args->level_sensitive; 481 479 struct kvm_inject_desc *f, *inject_fns; 482 480 483 481 gic_init(GIC_V3, 1, dist, redist); ··· 486 484 gic_irq_enable(i); 487 485 488 486 for (i = MIN_SPI; i < nr_irqs; i++) 489 - gic_irq_set_config(i, !args.level_sensitive); 487 + gic_irq_set_config(i, !level_sensitive); 490 488 491 - gic_set_eoi_split(args.eoi_split); 489 + gic_set_eoi_split(args->eoi_split); 492 490 493 - reset_priorities(&args); 491 + reset_priorities(args); 494 492 gic_set_priority_mask(CPU_PRIO_MASK); 495 493 496 494 inject_fns = level_sensitive ? inject_level_fns ··· 499 497 local_irq_enable(); 500 498 501 499 /* Start the tests. */
502 - for_each_supported_inject_fn(&args, inject_fns, f) { 503 - test_injection(&args, f); 504 - test_preemption(&args, f); 505 - test_injection_failure(&args, f); 500 + for_each_supported_inject_fn(args, inject_fns, f) { 501 + test_injection(args, f); 502 + test_preemption(args, f); 503 + test_injection_failure(args, f); 506 504 } 507 505 508 - /* Restore the active state of IRQs. This would happen when live 506 + /* 507 + * Restore the active state of IRQs. This would happen when live 509 508 * migrating IRQs in the middle of being handled. 510 509 */ 511 - for_each_supported_activate_fn(&args, set_active_fns, f) 510 + for_each_supported_activate_fn(args, set_active_fns, f) 512 511 test_restore_active(&args, f); 513 512 514 513 GUEST_DONE(); 515 514 } ··· 576 573 kvm_gsi_routing_write(vm, routing); 577 574 } else { 578 575 ret = _kvm_gsi_routing_write(vm, routing); 579 - /* The kernel only checks for KVM_IRQCHIP_NUM_PINS. */ 580 - if (intid >= KVM_IRQCHIP_NUM_PINS) 576 + /* The kernel only checks e->irqchip.pin >= KVM_IRQCHIP_NUM_PINS */ 577 + if (((uint64_t)intid + num - 1 - MIN_SPI) >= KVM_IRQCHIP_NUM_PINS) 581 578 TEST_ASSERT(ret != 0 && errno == EINVAL, 582 579 "Bad intid %u did not cause KVM_SET_GSI_ROUTING " 583 580 "error: rc: %i errno: %i", intid, ret, errno); ··· 742 739 int gic_fd; 743 740 struct kvm_vm *vm; 744 741 struct kvm_inject_args inject_args; 742 + vm_vaddr_t args_gva; 745 743 746 744 struct test_args args = { 747 745 .nr_irqs = nr_irqs, ··· 761 757 vcpu_init_descriptor_tables(vm, VCPU_ID); 762 758 763 759 /* Setup the guest args page (so it gets the args). */
764 - vcpu_args_set(vm, 0, 1, args); 760 + args_gva = vm_vaddr_alloc_page(vm); 761 + memcpy(addr_gva2hva(vm, args_gva), &args, sizeof(args)); 762 + vcpu_args_set(vm, 0, 1, args_gva); 765 763 766 764 gic_fd = vgic_v3_setup(vm, 1, nr_irqs, 767 765 GICD_BASE_GPA, GICR_BASE_GPA); ··· 847 841 } 848 842 } 849 843 850 - /* If the user just specified nr_irqs and/or gic_version, then run all 844 + /* 845 + * If the user just specified nr_irqs and/or gic_version, then run all 851 846 * combinations. 852 847 */ 853 848 if (default_args) {
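The corrected selftest assertion mirrors what the kernel actually validates: each routing entry's irqchip.pin (intid minus MIN_SPI) must be below KVM_IRQCHIP_NUM_PINS, so a batch of num entries only fails once its last pin crosses the limit. A small model of that predicate (the constant values here are illustrative, not the real arm64 ones):

```c
#include <assert.h>
#include <stdint.h>

#define MIN_SPI              32
#define KVM_IRQCHIP_NUM_PINS 992	/* illustrative value */

/* 1 if KVM_SET_GSI_ROUTING would reject a range of `num` routing
 * entries whose first interrupt id is `intid`: only the last entry's
 * pin needs to be out of bounds. */
static int routing_range_rejected(uint32_t intid, uint32_t num)
{
	return ((uint64_t)intid + num - 1 - MIN_SPI) >= KVM_IRQCHIP_NUM_PINS;
}
```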
+10
tools/testing/selftests/kvm/dirty_log_perf_test.c
··· 18 18 #include "test_util.h" 19 19 #include "perf_test_util.h" 20 20 #include "guest_modes.h" 21 + #ifdef __aarch64__ 22 + #include "aarch64/vgic.h" 23 + 24 + #define GICD_BASE_GPA 0x8000000ULL 25 + #define GICR_BASE_GPA 0x80A0000ULL 26 + #endif 21 27 22 28 /* How many host loops to run by default (one KVM_GET_DIRTY_LOG for each loop)*/ 23 29 #define TEST_HOST_LOOP_N 2UL ··· 205 199 cap.args[0] = dirty_log_manual_caps; 206 200 vm_enable_cap(vm, &cap); 207 201 } 202 + 203 + #ifdef __aarch64__ 204 + vgic_v3_setup(vm, nr_vcpus, 64, GICD_BASE_GPA, GICR_BASE_GPA); 205 + #endif 208 206 209 207 /* Start the iterations */ 210 208 iteration = 0;
+7 -5
tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
··· 19 19 unsigned int nr_spis; 20 20 }; 21 21 22 - #define sgi_base_from_redist(redist_base) (redist_base + SZ_64K) 22 + #define sgi_base_from_redist(redist_base) (redist_base + SZ_64K) 23 23 #define DIST_BIT (1U << 31) 24 24 25 25 enum gicv3_intid_range { ··· 105 105 { 106 106 uint32_t val; 107 107 108 - /* All other fields are read-only, so no need to read CTLR first. In 108 + /* 109 + * All other fields are read-only, so no need to read CTLR first. In 109 110 * fact, the kernel does the same. 110 111 */ 111 112 val = split ? (1U << 1) : 0; ··· 160 159 uint32_t cpu_or_dist; 161 160 162 161 GUEST_ASSERT(bits_per_field <= reg_bits); 163 - GUEST_ASSERT(*val < (1U << bits_per_field)); 164 - /* Some registers like IROUTER are 64 bit long. Those are currently not 165 - * supported by readl nor writel, so just asserting here until then. 162 + GUEST_ASSERT(!write || *val < (1U << bits_per_field)); 163 + /* 164 + * This function does not support 64 bit accesses. Just asserting here 165 + * until we implement readq/writeq. 166 166 */ 167 167 GUEST_ASSERT(reg_bits == 32); 168 168
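The register accessor patched above addresses banks of GIC registers in which several per-INTID fields share one 32-bit register (e.g. IPRIORITYR packs four 8-bit priorities per register, ISENABLER thirty-two 1-bit enables). The index arithmetic can be sketched on its own, with a hypothetical helper name of our choosing:

```c
#include <assert.h>
#include <stdint.h>

struct field_loc {
	uint32_t byte_offset;	/* offset of the 32-bit register in the bank */
	uint32_t shift;		/* bit position of the field inside it */
};

/* Locate the field for `intid` in a bank of reg_bits-wide registers,
 * each holding reg_bits / bits_per_field consecutive fields. */
static struct field_loc locate_field(uint32_t intid,
				     uint32_t reg_bits, uint32_t bits_per_field)
{
	uint32_t fields_per_reg = reg_bits / bits_per_field;
	struct field_loc loc = {
		.byte_offset = (intid / fields_per_reg) * (reg_bits / 8),
		.shift = (intid % fields_per_reg) * bits_per_field,
	};
	return loc;
}
```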
+5 -4
tools/testing/selftests/kvm/lib/aarch64/vgic.c
··· 140 140 uint64_t val; 141 141 bool intid_is_private = INTID_IS_SGI(intid) || INTID_IS_PPI(intid); 142 142 143 - /* Check that the addr part of the attr is within 32 bits. */ 144 - assert(attr <= KVM_DEV_ARM_VGIC_OFFSET_MASK); 145 - 146 143 uint32_t group = intid_is_private ? KVM_DEV_ARM_VGIC_GRP_REDIST_REGS 147 144 : KVM_DEV_ARM_VGIC_GRP_DIST_REGS; 148 145 ··· 149 152 attr += SZ_64K; 150 153 } 151 154 152 - /* All calls will succeed, even with invalid intid's, as long as the 155 + /* Check that the addr part of the attr is within 32 bits. */ 156 + assert((attr & ~KVM_DEV_ARM_VGIC_OFFSET_MASK) == 0); 157 + 158 + /* 159 + * All calls will succeed, even with invalid intid's, as long as the 153 160 * addr part of the attr is within 32 bits (checked above). An invalid 154 161 * intid will just make the read/writes point to above the intended 155 162 * register space (i.e., ICPENDR after ISPENDR).