Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM updates from Paolo Bonzini:
"ARM:

- More progress on the protected VM front, now with the full fixed
feature set as well as the limitation of some hypercalls after
initialisation.

- Cleanup of the RAZ/WI sysreg handling, which was pointlessly
complicated

- Fixes for the vgic placement in the IPA space, together with a
bunch of selftests

- More memcg accounting of the memory allocated on behalf of a guest

- Timer and vgic selftests

- Workarounds for the Apple M1 broken vgic implementation

- KConfig cleanups

- New kvmarm.mode=none option, for those who really dislike us

RISC-V:

- New KVM port

x86:

- New API to control TSC offset from userspace

- TSC scaling for nested hypervisors on SVM

- Switch masterclock protection from raw_spin_lock to seqcount

- Clean up function prototypes in the page fault code and avoid
repeated memslot lookups

- Convey the exit reason to userspace on emulation failure

- Configure time between NX page recovery iterations

- Expose Predictive Store Forwarding Disable CPUID leaf

- Allocate page tracking data structures lazily (if the i915 KVM-GT
functionality is not compiled in)

- Cleanups, fixes and optimizations for the shadow MMU code

s390:

- SIGP fixes

- initial preparations for lazy destroy of secure VMs

- storage key improvements/fixes

- Log the guest CPNC

Starting from this release, KVM-PPC patches will come from Michael
Ellerman's PPC tree"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (227 commits)
RISC-V: KVM: fix boolreturn.cocci warnings
RISC-V: KVM: remove unneeded semicolon
RISC-V: KVM: Fix GPA passed to __kvm_riscv_hfence_gvma_xyz() functions
RISC-V: KVM: Factor-out FP virtualization into separate sources
KVM: s390: add debug statement for diag 318 CPNC data
KVM: s390: pv: properly handle page flags for protected guests
KVM: s390: Fix handle_sske page fault handling
KVM: x86: SGX must obey the KVM_INTERNAL_ERROR_EMULATION protocol
KVM: x86: On emulation failure, convey the exit reason, etc. to userspace
KVM: x86: Get exit_reason as part of kvm_x86_ops.get_exit_info
KVM: x86: Clarify the kvm_run.emulation_failure structure layout
KVM: s390: Add a routine for setting userspace CPU state
KVM: s390: Simplify SIGP Set Arch handling
KVM: s390: pv: avoid stalls when making pages secure
KVM: s390: pv: avoid stalls for kvm_s390_pv_init_vm
KVM: s390: pv: avoid double free of sida page
KVM: s390: pv: add macros for UVC CC values
s390/mm: optimize reset_guest_reference_bit()
s390/mm: optimize set_guest_storage_key()
s390/mm: no need for pte_alloc_map_lock() if we know the pmd is present
...

+11653 -1759
+13 -2
Documentation/admin-guide/kernel-parameters.txt
··· 2353 2353 [KVM] Controls how many 4KiB pages are periodically zapped 2354 2354 back to huge pages. 0 disables the recovery, otherwise if 2355 2355 the value is N KVM will zap 1/Nth of the 4KiB pages every 2356 - minute. The default is 60. 2356 + period (see below). The default is 60. 2357 + 2358 + kvm.nx_huge_pages_recovery_period_ms= 2359 + [KVM] Controls the time period at which KVM zaps 4KiB pages 2360 + back to huge pages. If the value is a non-zero N, KVM will 2361 + zap a portion (see ratio above) of the pages every N msecs. 2362 + If the value is 0 (the default), KVM will pick a period based 2363 + on the ratio, such that a page is zapped after 1 hour on average. 2357 2364 2358 2365 kvm-amd.nested= [KVM,AMD] Allow nested virtualization in KVM/SVM. 2359 2366 Default is 1 (enabled) ··· 2372 2365 kvm-arm.mode= 2373 2366 [KVM,ARM] Select one of KVM/arm64's modes of operation. 2374 2367 2368 + none: Forcefully disable KVM. 2369 + 2375 2370 nvhe: Standard nVHE-based mode, without support for 2376 2371 protected guests. 2377 2372 ··· 2381 2372 state is kept private from the host. 2382 2373 Not valid if the kernel is running in EL2. 2383 2374 2384 - Defaults to VHE/nVHE based on hardware support. 2375 + Defaults to VHE/nVHE based on hardware support. Setting 2376 + mode to "protected" will disable kexec and hibernation 2377 + for the host. 2385 2378 2386 2379 kvm-arm.vgic_v3_group0_trap= 2387 2380 [KVM,ARM] Trap guest accesses to GICv3 group-0
+223 -18
Documentation/virt/kvm/api.rst
··· 532 532 ------------------ 533 533 534 534 :Capability: basic 535 - :Architectures: x86, ppc, mips 535 + :Architectures: x86, ppc, mips, riscv 536 536 :Type: vcpu ioctl 537 537 :Parameters: struct kvm_interrupt (in) 538 538 :Returns: 0 on success, negative on failure. ··· 598 598 599 599 Queues an external interrupt to be injected into the virtual CPU. A negative 600 600 interrupt number dequeues the interrupt. 601 + 602 + This is an asynchronous vcpu ioctl and can be invoked from any thread. 603 + 604 + RISC-V: 605 + ^^^^^^^ 606 + 607 + Queues an external interrupt to be injected into the virtual CPU. This ioctl 608 + is overloaded with two different irq values: 609 + 610 + a) KVM_INTERRUPT_SET 611 + 612 + This sets an external interrupt for a virtual CPU, which will be 613 + delivered once the CPU is ready to receive it. 614 + 615 + b) KVM_INTERRUPT_UNSET 616 + 617 + This clears a pending external interrupt for a virtual CPU. 601 618 602 619 This is an asynchronous vcpu ioctl and can be invoked from any thread. 603 620 ··· 1010 993 When KVM_CAP_ADJUST_CLOCK is passed to KVM_CHECK_EXTENSION, it returns the 1011 994 set of bits that KVM can return in struct kvm_clock_data's flag member. 1012 995 1013 - The only flag defined now is KVM_CLOCK_TSC_STABLE. If set, the returned 1014 - value is the exact kvmclock value seen by all VCPUs at the instant 1015 - when KVM_GET_CLOCK was called. If clear, the returned value is simply 1016 - CLOCK_MONOTONIC plus a constant offset; the offset can be modified 1017 - with KVM_SET_CLOCK. KVM will try to make all VCPUs follow this clock, 1018 - but the exact value read by each VCPU could differ, because the host 1019 - TSC is not stable. 996 + The following flags are defined: 997 + 998 + KVM_CLOCK_TSC_STABLE 999 + If set, the returned value is the exact kvmclock 1000 + value seen by all VCPUs at the instant when KVM_GET_CLOCK was called. 
1001 + If clear, the returned value is simply CLOCK_MONOTONIC plus a constant 1002 + offset; the offset can be modified with KVM_SET_CLOCK. KVM will try 1003 + to make all VCPUs follow this clock, but the exact value read by each 1004 + VCPU could differ, because the host TSC is not stable. 1005 + 1006 + KVM_CLOCK_REALTIME 1007 + If set, the `realtime` field in the kvm_clock_data 1008 + structure is populated with the value of the host's real time 1009 + clocksource at the instant when KVM_GET_CLOCK was called. If clear, 1010 + the `realtime` field does not contain a value. 1011 + 1012 + KVM_CLOCK_HOST_TSC 1013 + If set, the `host_tsc` field in the kvm_clock_data 1014 + structure is populated with the value of the host's timestamp counter (TSC) 1015 + at the instant when KVM_GET_CLOCK was called. If clear, the `host_tsc` field 1016 + does not contain a value. 1020 1017 1021 1018 :: 1022 1019 1023 1020 struct kvm_clock_data { 1024 1021 __u64 clock; /* kvmclock current value */ 1025 1022 __u32 flags; 1026 - __u32 pad[9]; 1023 + __u32 pad0; 1024 + __u64 realtime; 1025 + __u64 host_tsc; 1026 + __u32 pad[4]; 1027 1027 }; 1028 1028 1029 1029 ··· 1057 1023 In conjunction with KVM_GET_CLOCK, it is used to ensure monotonicity on scenarios 1058 1024 such as migration. 1059 1025 1026 + The following flags can be passed: 1027 + 1028 + KVM_CLOCK_REALTIME 1029 + If set, KVM will compare the value of the `realtime` field 1030 + with the value of the host's real time clocksource at the instant when 1031 + KVM_SET_CLOCK was called. The difference in elapsed time is added to the final 1032 + kvmclock value that will be provided to guests. 1033 + 1034 + Other flags returned by ``KVM_GET_CLOCK`` are accepted but ignored. 
1035 + 1060 1036 :: 1061 1037 1062 1038 struct kvm_clock_data { 1063 1039 __u64 clock; /* kvmclock current value */ 1064 1040 __u32 flags; 1065 - __u32 pad[9]; 1041 + __u32 pad0; 1042 + __u64 realtime; 1043 + __u64 host_tsc; 1044 + __u32 pad[4]; 1066 1045 }; 1067 1046 1068 1047 ··· 1446 1399 --------------------- 1447 1400 1448 1401 :Capability: KVM_CAP_MP_STATE 1449 - :Architectures: x86, s390, arm, arm64 1402 + :Architectures: x86, s390, arm, arm64, riscv 1450 1403 :Type: vcpu ioctl 1451 1404 :Parameters: struct kvm_mp_state (out) 1452 1405 :Returns: 0 on success; -1 on error ··· 1463 1416 Possible values are: 1464 1417 1465 1418 ========================== =============================================== 1466 - KVM_MP_STATE_RUNNABLE the vcpu is currently running [x86,arm/arm64] 1419 + KVM_MP_STATE_RUNNABLE the vcpu is currently running 1420 + [x86,arm/arm64,riscv] 1467 1421 KVM_MP_STATE_UNINITIALIZED the vcpu is an application processor (AP) 1468 1422 which has not yet received an INIT signal [x86] 1469 1423 KVM_MP_STATE_INIT_RECEIVED the vcpu has received an INIT signal, and is ··· 1473 1425 is waiting for an interrupt [x86] 1474 1426 KVM_MP_STATE_SIPI_RECEIVED the vcpu has just received a SIPI (vector 1475 1427 accessible via KVM_GET_VCPU_EVENTS) [x86] 1476 - KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64] 1428 + KVM_MP_STATE_STOPPED the vcpu is stopped [s390,arm/arm64,riscv] 1477 1429 KVM_MP_STATE_CHECK_STOP the vcpu is in a special error state [s390] 1478 1430 KVM_MP_STATE_OPERATING the vcpu is operating (running or halted) 1479 1431 [s390] ··· 1485 1437 in-kernel irqchip, the multiprocessing state must be maintained by userspace on 1486 1438 these architectures. 1487 1439 1488 - For arm/arm64: 1489 - ^^^^^^^^^^^^^^ 1440 + For arm/arm64/riscv: 1441 + ^^^^^^^^^^^^^^^^^^^^ 1490 1442 1491 1443 The only states that are valid are KVM_MP_STATE_STOPPED and 1492 1444 KVM_MP_STATE_RUNNABLE which reflect if the vcpu is paused or not. 
··· 1495 1447 --------------------- 1496 1448 1497 1449 :Capability: KVM_CAP_MP_STATE 1498 - :Architectures: x86, s390, arm, arm64 1450 + :Architectures: x86, s390, arm, arm64, riscv 1499 1451 :Type: vcpu ioctl 1500 1452 :Parameters: struct kvm_mp_state (in) 1501 1453 :Returns: 0 on success; -1 on error ··· 1507 1459 in-kernel irqchip, the multiprocessing state must be maintained by userspace on 1508 1460 these architectures. 1509 1461 1510 - For arm/arm64: 1511 - ^^^^^^^^^^^^^^ 1462 + For arm/arm64/riscv: 1463 + ^^^^^^^^^^^^^^^^^^^^ 1512 1464 1513 1465 The only states that are valid are KVM_MP_STATE_STOPPED and 1514 1466 KVM_MP_STATE_RUNNABLE which reflect if the vcpu should be paused or not. ··· 2624 2576 following id bit patterns:: 2625 2577 2626 2578 0x7020 0000 0003 02 <0:3> <reg:5> 2579 + 2580 + RISC-V registers are mapped using the lower 32 bits. The upper 8 bits of 2581 + those 32 bits encode the register group type. 2582 + 2583 + RISC-V config registers are meant for configuring a Guest VCPU and have 2584 + the following id bit patterns:: 2585 + 2586 + 0x8020 0000 01 <index into the kvm_riscv_config struct:24> (32bit Host) 2587 + 0x8030 0000 01 <index into the kvm_riscv_config struct:24> (64bit Host) 2588 + 2589 + Following are the RISC-V config registers: 2590 + 2591 + ======================= ========= ============================================= 2592 + Encoding Register Description 2593 + ======================= ========= ============================================= 2594 + 0x80x0 0000 0100 0000 isa ISA feature bitmap of Guest VCPU 2595 + ======================= ========= ============================================= 2596 + 2597 + The isa config register can be read at any time but can only be written before 2598 + a Guest VCPU runs. By default it contains the ISA feature bits of the 2599 + underlying host. 
2600 + 2601 + RISC-V core registers represent the general execution state of a Guest VCPU 2602 + and have the following id bit patterns:: 2603 + 2604 + 0x8020 0000 02 <index into the kvm_riscv_core struct:24> (32bit Host) 2605 + 0x8030 0000 02 <index into the kvm_riscv_core struct:24> (64bit Host) 2606 + 2607 + Following are the RISC-V core registers: 2608 + 2609 + ======================= ========= ============================================= 2610 + Encoding Register Description 2611 + ======================= ========= ============================================= 2612 + 0x80x0 0000 0200 0000 regs.pc Program counter 2613 + 0x80x0 0000 0200 0001 regs.ra Return address 2614 + 0x80x0 0000 0200 0002 regs.sp Stack pointer 2615 + 0x80x0 0000 0200 0003 regs.gp Global pointer 2616 + 0x80x0 0000 0200 0004 regs.tp Task pointer 2617 + 0x80x0 0000 0200 0005 regs.t0 Caller saved register 0 2618 + 0x80x0 0000 0200 0006 regs.t1 Caller saved register 1 2619 + 0x80x0 0000 0200 0007 regs.t2 Caller saved register 2 2620 + 0x80x0 0000 0200 0008 regs.s0 Callee saved register 0 2621 + 0x80x0 0000 0200 0009 regs.s1 Callee saved register 1 2622 + 0x80x0 0000 0200 000a regs.a0 Function argument (or return value) 0 2623 + 0x80x0 0000 0200 000b regs.a1 Function argument (or return value) 1 2624 + 0x80x0 0000 0200 000c regs.a2 Function argument 2 2625 + 0x80x0 0000 0200 000d regs.a3 Function argument 3 2626 + 0x80x0 0000 0200 000e regs.a4 Function argument 4 2627 + 0x80x0 0000 0200 000f regs.a5 Function argument 5 2628 + 0x80x0 0000 0200 0010 regs.a6 Function argument 6 2629 + 0x80x0 0000 0200 0011 regs.a7 Function argument 7 2630 + 0x80x0 0000 0200 0012 regs.s2 Callee saved register 2 2631 + 0x80x0 0000 0200 0013 regs.s3 Callee saved register 3 2632 + 0x80x0 0000 0200 0014 regs.s4 Callee saved register 4 2633 + 0x80x0 0000 0200 0015 regs.s5 Callee saved register 5 2634 + 0x80x0 0000 0200 0016 regs.s6 Callee saved register 6 2635 + 0x80x0 0000 0200 0017 regs.s7 Callee saved register 7 2636 
+ 0x80x0 0000 0200 0018 regs.s8 Callee saved register 8 2637 + 0x80x0 0000 0200 0019 regs.s9 Callee saved register 9 2638 + 0x80x0 0000 0200 001a regs.s10 Callee saved register 10 2639 + 0x80x0 0000 0200 001b regs.s11 Callee saved register 11 2640 + 0x80x0 0000 0200 001c regs.t3 Caller saved register 3 2641 + 0x80x0 0000 0200 001d regs.t4 Caller saved register 4 2642 + 0x80x0 0000 0200 001e regs.t5 Caller saved register 5 2643 + 0x80x0 0000 0200 001f regs.t6 Caller saved register 6 2644 + 0x80x0 0000 0200 0020 mode Privilege mode (1 = S-mode or 0 = U-mode) 2645 + ======================= ========= ============================================= 2646 + 2647 + RISC-V csr registers represent the supervisor mode control/status registers 2648 + of a Guest VCPU and have the following id bit patterns:: 2649 + 2650 + 0x8020 0000 03 <index into the kvm_riscv_csr struct:24> (32bit Host) 2651 + 0x8030 0000 03 <index into the kvm_riscv_csr struct:24> (64bit Host) 2652 + 2653 + Following are the RISC-V csr registers: 2654 + 2655 + ======================= ========= ============================================= 2656 + Encoding Register Description 2657 + ======================= ========= ============================================= 2658 + 0x80x0 0000 0300 0000 sstatus Supervisor status 2659 + 0x80x0 0000 0300 0001 sie Supervisor interrupt enable 2660 + 0x80x0 0000 0300 0002 stvec Supervisor trap vector base 2661 + 0x80x0 0000 0300 0003 sscratch Supervisor scratch register 2662 + 0x80x0 0000 0300 0004 sepc Supervisor exception program counter 2663 + 0x80x0 0000 0300 0005 scause Supervisor trap cause 2664 + 0x80x0 0000 0300 0006 stval Supervisor bad address or instruction 2665 + 0x80x0 0000 0300 0007 sip Supervisor interrupt pending 2666 + 0x80x0 0000 0300 0008 satp Supervisor address translation and protection 2667 + ======================= ========= ============================================= 2668 + 2669 + RISC-V timer registers represent the timer state of a Guest VCPU and have 2670 + the following id bit patterns:: 2671 + 2672 + 0x8030 0000 04 <index into the kvm_riscv_timer struct:24> 2673 + 2674 + Following are the RISC-V timer registers: 2675 + 2676 + ======================= ========= ============================================= 2677 + Encoding Register Description 2678 + ======================= ========= ============================================= 2679 + 0x8030 0000 0400 0000 frequency Time base frequency (read-only) 2680 + 0x8030 0000 0400 0001 time Time value visible to Guest 2681 + 0x8030 0000 0400 0002 compare Time compare programmed by Guest 2682 + 0x8030 0000 0400 0003 state Time compare state (1 = ON or 0 = OFF) 2683 + ======================= ========= ============================================= 2684 + 2685 + RISC-V F-extension registers represent the single precision floating point 2686 + state of a Guest VCPU and have the following id bit patterns:: 2687 + 2688 + 0x8020 0000 05 <index into the __riscv_f_ext_state struct:24> 2689 + 2690 + Following are the RISC-V F-extension registers: 2691 + 2692 + ======================= ========= ============================================= 2693 + Encoding Register Description 2694 + ======================= ========= ============================================= 2695 + 0x8020 0000 0500 0000 f[0] Floating point register 0 2696 + ... 
2697 + 0x8020 0000 0500 001f f[31] Floating point register 31 2698 + 0x8020 0000 0500 0020 fcsr Floating point control and status register 2699 + ======================= ========= ============================================= 2700 + 2701 + RISC-V D-extension registers represent the double precision floating point 2702 + state of a Guest VCPU and have the following id bit patterns:: 2703 + 2704 + 0x8020 0000 06 <index into the __riscv_d_ext_state struct:24> (fcsr) 2705 + 0x8030 0000 06 <index into the __riscv_d_ext_state struct:24> (non-fcsr) 2706 + 2707 + Following are the RISC-V D-extension registers: 2708 + 2709 + ======================= ========= ============================================= 2710 + Encoding Register Description 2711 + ======================= ========= ============================================= 2712 + 0x8030 0000 0600 0000 f[0] Floating point register 0 2713 + ... 2714 + 0x8030 0000 0600 001f f[31] Floating point register 31 2715 + 0x8020 0000 0600 0020 fcsr Floating point control and status register 2716 + ======================= ========= ============================================= 2627 2717 2628 2718 2629 2719 4.69 KVM_GET_ONE_REG ··· 6033 5847 - KVM_EXIT_XEN_HCALL -- synchronously notify user-space about Xen hypercall. 6034 5848 Userspace is expected to place the hypercall result into the appropriate 6035 5849 field before invoking KVM_RUN again. 5850 + 5851 + :: 5852 + 5853 + /* KVM_EXIT_RISCV_SBI */ 5854 + struct { 5855 + unsigned long extension_id; 5856 + unsigned long function_id; 5857 + unsigned long args[6]; 5858 + unsigned long ret[2]; 5859 + } riscv_sbi; 5860 + An exit reason of KVM_EXIT_RISCV_SBI indicates that the VCPU has made an 5861 + SBI call which is not handled by the KVM RISC-V kernel module. The details 5862 + of the SBI call are available in the 'riscv_sbi' member of the kvm_run 5863 + structure. The 'extension_id' field of 'riscv_sbi' holds the SBI extension 5864 + ID, while the 'function_id' field holds the function ID within that 5865 + extension. The 'args' array holds the parameters of the SBI call and the 5866 + 'ret' array holds its return values; userspace should update the return 5867 + values before resuming the VCPU. For more details, refer to the RISC-V SBI 5868 + specification at https://github.com/riscv/riscv-sbi-doc. 6036 5869 6037 5870 :: 6038 5871
+70
Documentation/virt/kvm/devices/vcpu.rst
··· 161 161 base address must be 64 byte aligned and exist within a valid guest memory 162 162 region. See Documentation/virt/kvm/arm/pvtime.rst for more information 163 163 including the layout of the stolen time structure. 164 + 165 + 4. GROUP: KVM_VCPU_TSC_CTRL 166 + =========================== 167 + 168 + :Architectures: x86 169 + 170 + 4.1 ATTRIBUTE: KVM_VCPU_TSC_OFFSET 171 + 172 + :Parameters: 64-bit unsigned TSC offset 173 + 174 + Returns: 175 + 176 + ======= ====================================== 177 + -EFAULT Error reading/writing the provided 178 + parameter address. 179 + -ENXIO Attribute not supported 180 + ======= ====================================== 181 + 182 + Specifies the guest's TSC offset relative to the host's TSC. The guest's 183 + TSC is then derived by the following equation: 184 + 185 + guest_tsc = host_tsc + KVM_VCPU_TSC_OFFSET 186 + 187 + This attribute is useful to adjust the guest's TSC on live migration, 188 + so that the TSC counts the time during which the VM was paused. The 189 + following describes a possible algorithm to use for this purpose. 190 + 191 + From the source VMM process: 192 + 193 + 1. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_src), 194 + kvmclock nanoseconds (guest_src), and host CLOCK_REALTIME nanoseconds 195 + (host_src). 196 + 197 + 2. Read the KVM_VCPU_TSC_OFFSET attribute for every vCPU to record the 198 + guest TSC offset (ofs_src[i]). 199 + 200 + 3. Invoke the KVM_GET_TSC_KHZ ioctl to record the frequency of the 201 + guest's TSC (freq). 202 + 203 + From the destination VMM process: 204 + 205 + 4. Invoke the KVM_SET_CLOCK ioctl, providing the source nanoseconds from 206 + kvmclock (guest_src) and CLOCK_REALTIME (host_src) in their respective 207 + fields. Ensure that the KVM_CLOCK_REALTIME flag is set in the provided 208 + structure. 209 + 210 + KVM will advance the VM's kvmclock to account for elapsed time since 211 + recording the clock values. 
Note that this will cause problems in 212 + the guest (e.g., timeouts) unless CLOCK_REALTIME is synchronized 213 + between the source and destination, and a reasonably short time passes 214 + between the source pausing the VMs and the destination executing 215 + steps 4-7. 216 + 217 + 5. Invoke the KVM_GET_CLOCK ioctl to record the host TSC (tsc_dest) and 218 + kvmclock nanoseconds (guest_dest). 219 + 220 + 6. Adjust the guest TSC offsets for every vCPU to account for (1) time 221 + elapsed since recording state and (2) difference in TSCs between the 222 + source and destination machine: 223 + 224 + ofs_dst[i] = ofs_src[i] - 225 + (guest_src - guest_dest) * freq + 226 + (tsc_src - tsc_dest) 227 + 228 + ("ofs[i] + tsc - guest * freq" is the guest TSC value corresponding to 229 + a time of 0 in kvmclock. The above formula ensures that it is the 230 + same on the destination as it was on the source). 231 + 232 + 7. Write the KVM_VCPU_TSC_OFFSET attribute for every vCPU with the 233 + respective value derived in the previous step.
+1 -1
Documentation/virt/kvm/devices/xics.rst
··· 22 22 Errors: 23 23 24 24 ======= ========================================== 25 - -EINVAL Value greater than KVM_MAX_VCPU_ID. 25 + -EINVAL Value greater than KVM_MAX_VCPU_IDS. 26 26 -EFAULT Invalid user pointer for attr->addr. 27 27 -EBUSY A vcpu is already connected to the device. 28 28 ======= ==========================================
+1 -1
Documentation/virt/kvm/devices/xive.rst
··· 91 91 Errors: 92 92 93 93 ======= ========================================== 94 - -EINVAL Value greater than KVM_MAX_VCPU_ID. 94 + -EINVAL Value greater than KVM_MAX_VCPU_IDS. 95 95 -EFAULT Invalid user pointer for attr->addr. 96 96 -EBUSY A vCPU is already connected to the device. 97 97 ======= ==========================================
+12
MAINTAINERS
··· 10342 10342 F: arch/powerpc/kernel/kvm* 10343 10343 F: arch/powerpc/kvm/ 10344 10344 10345 + KERNEL VIRTUAL MACHINE FOR RISC-V (KVM/riscv) 10346 + M: Anup Patel <anup.patel@wdc.com> 10347 + R: Atish Patra <atish.patra@wdc.com> 10348 + L: kvm@vger.kernel.org 10349 + L: kvm-riscv@lists.infradead.org 10350 + L: linux-riscv@lists.infradead.org 10351 + S: Maintained 10352 + T: git git://github.com/kvm-riscv/linux.git 10353 + F: arch/riscv/include/asm/kvm* 10354 + F: arch/riscv/include/uapi/asm/kvm* 10355 + F: arch/riscv/kvm/ 10356 + 10345 10357 KERNEL VIRTUAL MACHINE for s390 (KVM/s390) 10346 10358 M: Christian Borntraeger <borntraeger@de.ibm.com> 10347 10359 M: Janosch Frank <frankja@linux.ibm.com>
+1
arch/arm64/Kconfig
··· 185 185 select HAVE_GCC_PLUGINS 186 186 select HAVE_HW_BREAKPOINT if PERF_EVENTS 187 187 select HAVE_IRQ_TIME_ACCOUNTING 188 + select HAVE_KVM 188 189 select HAVE_NMI 189 190 select HAVE_PATA_PLATFORM 190 191 select HAVE_PERF_EVENTS
+1
arch/arm64/include/asm/kvm_arm.h
··· 295 295 #define MDCR_EL2_HPMFZO (UL(1) << 29) 296 296 #define MDCR_EL2_MTPME (UL(1) << 28) 297 297 #define MDCR_EL2_TDCC (UL(1) << 27) 298 + #define MDCR_EL2_HLP (UL(1) << 26) 298 299 #define MDCR_EL2_HCCD (UL(1) << 23) 299 300 #define MDCR_EL2_TTRF (UL(1) << 19) 300 301 #define MDCR_EL2_HPMD (UL(1) << 17)
+28 -20
arch/arm64/include/asm/kvm_asm.h
··· 44 44 #define KVM_HOST_SMCCC_FUNC(name) KVM_HOST_SMCCC_ID(__KVM_HOST_SMCCC_FUNC_##name) 45 45 46 46 #define __KVM_HOST_SMCCC_FUNC___kvm_hyp_init 0 47 - #define __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run 1 48 - #define __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context 2 49 - #define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa 3 50 - #define __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid 4 51 - #define __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context 5 52 - #define __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff 6 53 - #define __KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs 7 54 - #define __KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config 8 55 - #define __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr 9 56 - #define __KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr 10 57 - #define __KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs 11 58 - #define __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 12 59 - #define __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs 13 60 - #define __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs 14 61 - #define __KVM_HOST_SMCCC_FUNC___pkvm_init 15 62 - #define __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp 16 63 - #define __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping 17 64 - #define __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector 18 65 - #define __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize 19 66 - #define __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc 20 67 47 68 48 #ifndef __ASSEMBLY__ 69 49 70 50 #include <linux/mm.h> 51 + 52 + enum __kvm_host_smccc_func { 53 + /* Hypercalls available only prior to pKVM finalisation */ 54 + /* __KVM_HOST_SMCCC_FUNC___kvm_hyp_init */ 55 + __KVM_HOST_SMCCC_FUNC___kvm_get_mdcr_el2 = __KVM_HOST_SMCCC_FUNC___kvm_hyp_init + 1, 56 + __KVM_HOST_SMCCC_FUNC___pkvm_init, 57 + __KVM_HOST_SMCCC_FUNC___pkvm_create_private_mapping, 58 + __KVM_HOST_SMCCC_FUNC___pkvm_cpu_set_vector, 59 + __KVM_HOST_SMCCC_FUNC___kvm_enable_ssbs, 60 + __KVM_HOST_SMCCC_FUNC___vgic_v3_init_lrs, 61 + __KVM_HOST_SMCCC_FUNC___vgic_v3_get_gic_config, 62 + __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize, 63 + 64 + /* Hypercalls 
available after pKVM finalisation */ 65 + __KVM_HOST_SMCCC_FUNC___pkvm_host_share_hyp, 66 + __KVM_HOST_SMCCC_FUNC___kvm_adjust_pc, 67 + __KVM_HOST_SMCCC_FUNC___kvm_vcpu_run, 68 + __KVM_HOST_SMCCC_FUNC___kvm_flush_vm_context, 69 + __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid_ipa, 70 + __KVM_HOST_SMCCC_FUNC___kvm_tlb_flush_vmid, 71 + __KVM_HOST_SMCCC_FUNC___kvm_flush_cpu_context, 72 + __KVM_HOST_SMCCC_FUNC___kvm_timer_set_cntvoff, 73 + __KVM_HOST_SMCCC_FUNC___vgic_v3_read_vmcr, 74 + __KVM_HOST_SMCCC_FUNC___vgic_v3_write_vmcr, 75 + __KVM_HOST_SMCCC_FUNC___vgic_v3_save_aprs, 76 + __KVM_HOST_SMCCC_FUNC___vgic_v3_restore_aprs, 77 + __KVM_HOST_SMCCC_FUNC___pkvm_vcpu_init_traps, 78 + }; 71 79 72 80 #define DECLARE_KVM_VHE_SYM(sym) extern char sym[] 73 81 #define DECLARE_KVM_NVHE_SYM(sym) extern char kvm_nvhe_sym(sym)[]
+4 -1
arch/arm64/include/asm/kvm_emulate.h
··· 396 396 if (vcpu_mode_is_32bit(vcpu)) 397 397 return !!(*vcpu_cpsr(vcpu) & PSR_AA32_E_BIT); 398 398 399 - return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & (1 << 25)); 399 + if (vcpu_mode_priv(vcpu)) 400 + return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & SCTLR_ELx_EE); 401 + else 402 + return !!(vcpu_read_sys_reg(vcpu, SCTLR_EL1) & SCTLR_EL1_E0E); 400 403 } 401 404 402 405 static inline unsigned long vcpu_data_guest_to_host(struct kvm_vcpu *vcpu,
+3 -1
arch/arm64/include/asm/kvm_host.h
··· 58 58 enum kvm_mode { 59 59 KVM_MODE_DEFAULT, 60 60 KVM_MODE_PROTECTED, 61 + KVM_MODE_NONE, 61 62 }; 62 63 enum kvm_mode kvm_get_mode(void); 63 64 ··· 772 771 773 772 #define __KVM_HAVE_ARCH_VM_ALLOC 774 773 struct kvm *kvm_arch_alloc_vm(void); 775 - void kvm_arch_free_vm(struct kvm *kvm); 776 774 777 775 int kvm_arm_setup_stage2(struct kvm *kvm, unsigned long type); 778 776 ··· 779 779 { 780 780 return false; 781 781 } 782 + 783 + void kvm_init_protected_traps(struct kvm_vcpu *vcpu); 782 784 783 785 int kvm_arm_vcpu_finalize(struct kvm_vcpu *vcpu, int feature); 784 786 bool kvm_arm_vcpu_is_finalized(struct kvm_vcpu *vcpu);
+5
arch/arm64/include/asm/kvm_hyp.h
··· 115 115 void __noreturn __host_enter(struct kvm_cpu_context *host_ctxt); 116 116 #endif 117 117 118 + extern u64 kvm_nvhe_sym(id_aa64pfr0_el1_sys_val); 119 + extern u64 kvm_nvhe_sym(id_aa64pfr1_el1_sys_val); 120 + extern u64 kvm_nvhe_sym(id_aa64isar0_el1_sys_val); 121 + extern u64 kvm_nvhe_sym(id_aa64isar1_el1_sys_val); 118 122 extern u64 kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val); 119 123 extern u64 kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val); 124 + extern u64 kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val); 120 125 121 126 #endif /* __ARM64_KVM_HYP_H__ */
+3
arch/arm64/include/asm/sysreg.h
··· 1160 1160 #define ICH_HCR_TC (1 << 10) 1161 1161 #define ICH_HCR_TALL0 (1 << 11) 1162 1162 #define ICH_HCR_TALL1 (1 << 12) 1163 + #define ICH_HCR_TDIR (1 << 14) 1163 1164 #define ICH_HCR_EOIcount_SHIFT 27 1164 1165 #define ICH_HCR_EOIcount_MASK (0x1f << ICH_HCR_EOIcount_SHIFT) 1165 1166 ··· 1193 1192 #define ICH_VTR_SEIS_MASK (1 << ICH_VTR_SEIS_SHIFT) 1194 1193 #define ICH_VTR_A3V_SHIFT 21 1195 1194 #define ICH_VTR_A3V_MASK (1 << ICH_VTR_A3V_SHIFT) 1195 + #define ICH_VTR_TDS_SHIFT 19 1196 + #define ICH_VTR_TDS_MASK (1 << ICH_VTR_TDS_SHIFT) 1196 1197 1197 1198 #define ARM64_FEATURE_FIELD_BITS 4 1198 1199
+2 -1
arch/arm64/kernel/smp.c
··· 1128 1128 { 1129 1129 bool smp_spin_tables = (num_possible_cpus() > 1 && !have_cpu_die()); 1130 1130 1131 - return !!cpus_stuck_in_kernel || smp_spin_tables; 1131 + return !!cpus_stuck_in_kernel || smp_spin_tables || 1132 + is_protected_kvm_enabled(); 1132 1133 }
+3 -7
arch/arm64/kvm/Kconfig
··· 4 4 # 5 5 6 6 source "virt/lib/Kconfig" 7 + source "virt/kvm/Kconfig" 7 8 8 9 menuconfig VIRTUALIZATION 9 10 bool "Virtualization" ··· 20 19 21 20 menuconfig KVM 22 21 bool "Kernel-based Virtual Machine (KVM) support" 23 - depends on OF 22 + depends on HAVE_KVM 24 23 select MMU_NOTIFIER 25 24 select PREEMPT_NOTIFIERS 26 25 select HAVE_KVM_CPU_RELAX_INTERCEPT ··· 44 43 45 44 If unsure, say N. 46 45 47 - if KVM 48 - 49 - source "virt/kvm/Kconfig" 50 - 51 46 config NVHE_EL2_DEBUG 52 47 bool "Debug mode for non-VHE EL2 object" 48 + depends on KVM 53 49 help 54 50 Say Y here to enable the debug mode for the non-VHE KVM EL2 object. 55 51 Failure reports will BUG() in the hypervisor. This is intended for 56 52 local EL2 hypervisor development. 57 53 58 54 If unsure, say N. 59 - 60 - endif # KVM 61 55 62 56 endif # VIRTUALIZATION
+73 -31
arch/arm64/kvm/arm.c
···
 struct kvm *kvm_arch_alloc_vm(void)
 {
-	if (!has_vhe())
-		return kzalloc(sizeof(struct kvm), GFP_KERNEL);
+	size_t sz = sizeof(struct kvm);

-	return vzalloc(sizeof(struct kvm));
-}
-
-void kvm_arch_free_vm(struct kvm *kvm)
-{
 	if (!has_vhe())
-		kfree(kvm);
-	else
-		vfree(kvm);
+		return kzalloc(sz, GFP_KERNEL_ACCOUNT);
+
+	return __vmalloc(sz, GFP_KERNEL_ACCOUNT | __GFP_HIGHMEM | __GFP_ZERO);
 }

 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
···
 		return ret;

 	ret = kvm_arm_pmu_v3_enable(vcpu);
+
+	/*
+	 * Initialize traps for protected VMs.
+	 * NOTE: Move to run in EL2 directly, rather than via a hypercall, once
+	 * the code is in place for first run initialization at EL2.
+	 */
+	if (kvm_vm_is_protected(kvm))
+		kvm_call_hyp_nvhe(__pkvm_vcpu_init_traps, vcpu);

 	return ret;
 }
···
 	kvm_call_hyp_nvhe(__pkvm_cpu_set_vector, data->slot);
 }

-static void cpu_hyp_reinit(void)
+static void cpu_hyp_init_context(void)
 {
 	kvm_init_host_cpu_context(&this_cpu_ptr_hyp_sym(kvm_host_data)->host_ctxt);

-	cpu_hyp_reset();
+	if (!is_kernel_in_hyp_mode())
+		cpu_init_hyp_mode();
+}
+
+static void cpu_hyp_init_features(void)
+{
+	cpu_set_hyp_vector();
+	kvm_arm_init_debug();

 	if (is_kernel_in_hyp_mode())
 		kvm_timer_init_vhe();
-	else
-		cpu_init_hyp_mode();
-
-	cpu_set_hyp_vector();
-
-	kvm_arm_init_debug();

 	if (vgic_present)
 		kvm_vgic_init_cpu_hardware();
+}
+
+static void cpu_hyp_reinit(void)
+{
+	cpu_hyp_reset();
+	cpu_hyp_init_context();
+	cpu_hyp_init_features();
 }

 static void _kvm_arch_hardware_enable(void *discard)
···
 	int ret;

 	preempt_disable();
-	hyp_install_host_vector();
+	cpu_hyp_init_context();
 	ret = kvm_call_hyp_nvhe(__pkvm_init, hyp_mem_base, hyp_mem_size,
 				num_possible_cpus(), kern_hyp_va(per_cpu_base),
 				hyp_va_bits);
+	cpu_hyp_init_features();
+
+	/*
+	 * The stub hypercalls are now disabled, so set our local flag to
+	 * prevent a later re-init attempt in kvm_arch_hardware_enable().
+	 */
+	__this_cpu_write(kvm_arm_hardware_enabled, 1);
 	preempt_enable();

 	return ret;
···
 	void *addr = phys_to_virt(hyp_mem_base);
 	int ret;

+	kvm_nvhe_sym(id_aa64pfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR0_EL1);
+	kvm_nvhe_sym(id_aa64pfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1);
+	kvm_nvhe_sym(id_aa64isar0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR0_EL1);
+	kvm_nvhe_sym(id_aa64isar1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64ISAR1_EL1);
 	kvm_nvhe_sym(id_aa64mmfr0_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR0_EL1);
 	kvm_nvhe_sym(id_aa64mmfr1_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR1_EL1);
+	kvm_nvhe_sym(id_aa64mmfr2_el1_sys_val) = read_sanitised_ftr_reg(SYS_ID_AA64MMFR2_EL1);

 	ret = create_hyp_mappings(addr, addr + hyp_mem_size, PAGE_HYP);
 	if (ret)
···
 	return err;
 }

-static void _kvm_host_prot_finalize(void *discard)
+static void _kvm_host_prot_finalize(void *arg)
 {
-	WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize));
+	int *err = arg;
+
+	if (WARN_ON(kvm_call_hyp_nvhe(__pkvm_prot_finalize)))
+		WRITE_ONCE(*err, -EINVAL);
+}
+
+static int pkvm_drop_host_privileges(void)
+{
+	int ret = 0;
+
+	/*
+	 * Flip the static key upfront as that may no longer be possible
+	 * once the host stage 2 is installed.
+	 */
+	static_branch_enable(&kvm_protected_mode_initialized);
+	on_each_cpu(_kvm_host_prot_finalize, &ret, 1);
+	return ret;
 }

 static int finalize_hyp_mode(void)
···
 	 * None of other sections should ever be introspected.
 	 */
 	kmemleak_free_part(__hyp_bss_start, __hyp_bss_end - __hyp_bss_start);
-
-	/*
-	 * Flip the static key upfront as that may no longer be possible
-	 * once the host stage 2 is installed.
-	 */
-	static_branch_enable(&kvm_protected_mode_initialized);
-	on_each_cpu(_kvm_host_prot_finalize, NULL, 1);
-
-	return 0;
+	return pkvm_drop_host_privileges();
 }

 struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
···
 	if (!is_hyp_mode_available()) {
 		kvm_info("HYP mode not available\n");
+		return -ENODEV;
+	}
+
+	if (kvm_get_mode() == KVM_MODE_NONE) {
+		kvm_info("KVM disabled from command line\n");
 		return -ENODEV;
 	}
···
 		return 0;
 	}

-	if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode()))
+	if (strcmp(arg, "nvhe") == 0 && !WARN_ON(is_kernel_in_hyp_mode())) {
+		kvm_mode = KVM_MODE_DEFAULT;
 		return 0;
+	}
+
+	if (strcmp(arg, "none") == 0) {
+		kvm_mode = KVM_MODE_NONE;
+		return 0;
+	}

 	return -EINVAL;
 }
+75
arch/arm64/kvm/hyp/include/hyp/fault.h
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2015 - ARM Ltd
+ * Author: Marc Zyngier <marc.zyngier@arm.com>
+ */
+
+#ifndef __ARM64_KVM_HYP_FAULT_H__
+#define __ARM64_KVM_HYP_FAULT_H__
+
+#include <asm/kvm_asm.h>
+#include <asm/kvm_emulate.h>
+#include <asm/kvm_hyp.h>
+#include <asm/kvm_mmu.h>
+
+static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
+{
+	u64 par, tmp;
+
+	/*
+	 * Resolve the IPA the hard way using the guest VA.
+	 *
+	 * Stage-1 translation already validated the memory access
+	 * rights. As such, we can use the EL1 translation regime, and
+	 * don't have to distinguish between EL0 and EL1 access.
+	 *
+	 * We do need to save/restore PAR_EL1 though, as we haven't
+	 * saved the guest context yet, and we may return early...
+	 */
+	par = read_sysreg_par();
+	if (!__kvm_at("s1e1r", far))
+		tmp = read_sysreg_par();
+	else
+		tmp = SYS_PAR_EL1_F; /* back to the guest */
+	write_sysreg(par, par_el1);
+
+	if (unlikely(tmp & SYS_PAR_EL1_F))
+		return false; /* Translation failed, back to guest */
+
+	/* Convert PAR to HPFAR format */
+	*hpfar = PAR_TO_HPFAR(tmp);
+	return true;
+}
+
+static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
+{
+	u64 hpfar, far;
+
+	far = read_sysreg_el2(SYS_FAR);
+
+	/*
+	 * The HPFAR can be invalid if the stage 2 fault did not
+	 * happen during a stage 1 page table walk (the ESR_EL2.S1PTW
+	 * bit is clear) and one of the two following cases are true:
+	 * 1. The fault was due to a permission fault
+	 * 2. The processor carries errata 834220
+	 *
+	 * Therefore, for all non S1PTW faults where we either have a
+	 * permission fault or the errata workaround is enabled, we
+	 * resolve the IPA using the AT instruction.
+	 */
+	if (!(esr & ESR_ELx_S1PTW) &&
+	    (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
+	     (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
+		if (!__translate_far_to_hpfar(far, &hpfar))
+			return false;
+	} else {
+		hpfar = read_sysreg(hpfar_el2);
+	}
+
+	fault->far_el2 = far;
+	fault->hpfar_el2 = hpfar;
+	return true;
+}
+
+#endif
+97 -138
arch/arm64/kvm/hyp/include/hyp/switch.h
···
 #define __ARM64_KVM_HYP_SWITCH_H__

 #include <hyp/adjust_pc.h>
+#include <hyp/fault.h>

 #include <linux/arm-smccc.h>
 #include <linux/kvm_host.h>
···
 	}
 }

-static inline bool __translate_far_to_hpfar(u64 far, u64 *hpfar)
-{
-	u64 par, tmp;
-
-	/*
-	 * Resolve the IPA the hard way using the guest VA.
-	 *
-	 * Stage-1 translation already validated the memory access
-	 * rights. As such, we can use the EL1 translation regime, and
-	 * don't have to distinguish between EL0 and EL1 access.
-	 *
-	 * We do need to save/restore PAR_EL1 though, as we haven't
-	 * saved the guest context yet, and we may return early...
-	 */
-	par = read_sysreg_par();
-	if (!__kvm_at("s1e1r", far))
-		tmp = read_sysreg_par();
-	else
-		tmp = SYS_PAR_EL1_F; /* back to the guest */
-	write_sysreg(par, par_el1);
-
-	if (unlikely(tmp & SYS_PAR_EL1_F))
-		return false; /* Translation failed, back to guest */
-
-	/* Convert PAR to HPFAR format */
-	*hpfar = PAR_TO_HPFAR(tmp);
-	return true;
-}
-
-static inline bool __get_fault_info(u64 esr, struct kvm_vcpu_fault_info *fault)
-{
-	u64 hpfar, far;
-
-	far = read_sysreg_el2(SYS_FAR);
-
-	/*
-	 * The HPFAR can be invalid if the stage 2 fault did not
-	 * happen during a stage 1 page table walk (the ESR_EL2.S1PTW
-	 * bit is clear) and one of the two following cases are true:
-	 * 1. The fault was due to a permission fault
-	 * 2. The processor carries errata 834220
-	 *
-	 * Therefore, for all non S1PTW faults where we either have a
-	 * permission fault or the errata workaround is enabled, we
-	 * resolve the IPA using the AT instruction.
-	 */
-	if (!(esr & ESR_ELx_S1PTW) &&
-	    (cpus_have_final_cap(ARM64_WORKAROUND_834220) ||
-	     (esr & ESR_ELx_FSC_TYPE) == FSC_PERM)) {
-		if (!__translate_far_to_hpfar(far, &hpfar))
-			return false;
-	} else {
-		hpfar = read_sysreg(hpfar_el2);
-	}
-
-	fault->far_el2 = far;
-	fault->hpfar_el2 = hpfar;
-	return true;
-}
-
 static inline bool __populate_fault_info(struct kvm_vcpu *vcpu)
 {
-	u8 ec;
-	u64 esr;
-
-	esr = vcpu->arch.fault.esr_el2;
-	ec = ESR_ELx_EC(esr);
-
-	if (ec != ESR_ELx_EC_DABT_LOW && ec != ESR_ELx_EC_IABT_LOW)
-		return true;
-
-	return __get_fault_info(esr, &vcpu->arch.fault);
+	return __get_fault_info(vcpu->arch.fault.esr_el2, &vcpu->arch.fault);
 }

 static inline void __hyp_sve_save_host(struct kvm_vcpu *vcpu)
···
 	write_sysreg_el1(__vcpu_sys_reg(vcpu, ZCR_EL1), SYS_ZCR);
 }

-/* Check for an FPSIMD/SVE trap and handle as appropriate */
-static inline bool __hyp_handle_fpsimd(struct kvm_vcpu *vcpu)
+/*
+ * We trap the first access to the FP/SIMD to save the host context and
+ * restore the guest context lazily.
+ * If FP/SIMD is not implemented, handle the trap and inject an undefined
+ * instruction exception to the guest. Similarly for trapped SVE accesses.
+ */
+static bool kvm_hyp_handle_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	bool sve_guest, sve_host;
 	u8 esr_ec;
···
 	}

 	esr_ec = kvm_vcpu_trap_get_class(vcpu);
-	if (esr_ec != ESR_ELx_EC_FP_ASIMD &&
-	    esr_ec != ESR_ELx_EC_SVE)
-		return false;

 	/* Don't handle SVE traps for non-SVE vcpus here: */
 	if (!sve_guest && esr_ec != ESR_ELx_EC_FP_ASIMD)
···
 static inline bool esr_is_ptrauth_trap(u32 esr)
 {
-	u32 ec = ESR_ELx_EC(esr);
-
-	if (ec == ESR_ELx_EC_PAC)
-		return true;
-
-	if (ec != ESR_ELx_EC_SYS64)
-		return false;
-
 	switch (esr_sys64_to_sysreg(esr)) {
 	case SYS_APIAKEYLO_EL1:
 	case SYS_APIAKEYHI_EL1:
···
 DECLARE_PER_CPU(struct kvm_cpu_context, kvm_hyp_ctxt);

-static inline bool __hyp_handle_ptrauth(struct kvm_vcpu *vcpu)
+static bool kvm_hyp_handle_ptrauth(struct kvm_vcpu *vcpu, u64 *exit_code)
 {
 	struct kvm_cpu_context *ctxt;
 	u64 val;

-	if (!vcpu_has_ptrauth(vcpu) ||
-	    !esr_is_ptrauth_trap(kvm_vcpu_get_esr(vcpu)))
+	if (!vcpu_has_ptrauth(vcpu))
 		return false;

 	ctxt = this_cpu_ptr(&kvm_hyp_ctxt);
···
 	write_sysreg(val, hcr_el2);

 	return true;
+}
+
+static bool kvm_hyp_handle_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) &&
+	    handle_tx2_tvm(vcpu))
+		return true;
+
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
+	    __vgic_v3_perform_cpuif_access(vcpu) == 1)
+		return true;
+
+	if (esr_is_ptrauth_trap(kvm_vcpu_get_esr(vcpu)))
+		return kvm_hyp_handle_ptrauth(vcpu, exit_code);
+
+	return false;
+}
+
+static bool kvm_hyp_handle_cp15_32(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
+	    __vgic_v3_perform_cpuif_access(vcpu) == 1)
+		return true;
+
+	return false;
+}
+
+static bool kvm_hyp_handle_iabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (!__populate_fault_info(vcpu))
+		return true;
+
+	return false;
+}
+
+static bool kvm_hyp_handle_dabt_low(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	if (!__populate_fault_info(vcpu))
+		return true;
+
+	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
+		bool valid;
+
+		valid = kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
+			kvm_vcpu_dabt_isvalid(vcpu) &&
+			!kvm_vcpu_abt_issea(vcpu) &&
+			!kvm_vcpu_abt_iss1tw(vcpu);
+
+		if (valid) {
+			int ret = __vgic_v2_perform_cpuif_access(vcpu);
+
+			if (ret == 1)
+				return true;
+
+			/* Promote an illegal access to an SError.*/
+			if (ret == -1)
+				*exit_code = ARM_EXCEPTION_EL1_SERROR;
+		}
+	}
+
+	return false;
+}
+
+typedef bool (*exit_handler_fn)(struct kvm_vcpu *, u64 *);
+
+static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu);
+
+/*
+ * Allow the hypervisor to handle the exit with an exit handler if it has one.
+ *
+ * Returns true if the hypervisor handled the exit, and control should go back
+ * to the guest, or false if it hasn't.
+ */
+static inline bool kvm_hyp_handle_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
+{
+	const exit_handler_fn *handlers = kvm_get_exit_handler_array(vcpu);
+	exit_handler_fn fn;
+
+	fn = handlers[kvm_vcpu_trap_get_class(vcpu)];
+
+	if (fn)
+		return fn(vcpu, exit_code);
+
+	return false;
 }

 /*
···
 	if (*exit_code != ARM_EXCEPTION_TRAP)
 		goto exit;

-	if (cpus_have_final_cap(ARM64_WORKAROUND_CAVIUM_TX2_219_TVM) &&
-	    kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 &&
-	    handle_tx2_tvm(vcpu))
+	/* Check if there's an exit handler and allow it to handle the exit. */
+	if (kvm_hyp_handle_exit(vcpu, exit_code))
 		goto guest;
-
-	/*
-	 * We trap the first access to the FP/SIMD to save the host context
-	 * and restore the guest context lazily.
-	 * If FP/SIMD is not implemented, handle the trap and inject an
-	 * undefined instruction exception to the guest.
-	 * Similarly for trapped SVE accesses.
-	 */
-	if (__hyp_handle_fpsimd(vcpu))
-		goto guest;
-
-	if (__hyp_handle_ptrauth(vcpu))
-		goto guest;
-
-	if (!__populate_fault_info(vcpu))
-		goto guest;
-
-	if (static_branch_unlikely(&vgic_v2_cpuif_trap)) {
-		bool valid;
-
-		valid = kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_DABT_LOW &&
-			kvm_vcpu_trap_get_fault_type(vcpu) == FSC_FAULT &&
-			kvm_vcpu_dabt_isvalid(vcpu) &&
-			!kvm_vcpu_abt_issea(vcpu) &&
-			!kvm_vcpu_abt_iss1tw(vcpu);
-
-		if (valid) {
-			int ret = __vgic_v2_perform_cpuif_access(vcpu);
-
-			if (ret == 1)
-				goto guest;
-
-			/* Promote an illegal access to an SError.*/
-			if (ret == -1)
-				*exit_code = ARM_EXCEPTION_EL1_SERROR;
-
-			goto exit;
-		}
-	}
-
-	if (static_branch_unlikely(&vgic_v3_cpuif_trap) &&
-	    (kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_SYS64 ||
-	     kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_CP15_32)) {
-		int ret = __vgic_v3_perform_cpuif_access(vcpu);
-
-		if (ret == 1)
-			goto guest;
-	}
-
 exit:
 	/* Return to the host kernel and handle the exit */
 	return false;
+200
arch/arm64/kvm/hyp/include/nvhe/fixed_config.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba <tabba@google.com>
+ */
+
+#ifndef __ARM64_KVM_FIXED_CONFIG_H__
+#define __ARM64_KVM_FIXED_CONFIG_H__
+
+#include <asm/sysreg.h>
+
+/*
+ * This file contains definitions for features to be allowed or restricted for
+ * guest virtual machines, depending on the mode KVM is running in and on the
+ * type of guest that is running.
+ *
+ * The ALLOW masks represent a bitmask of feature fields that are allowed
+ * without any restrictions as long as they are supported by the system.
+ *
+ * The RESTRICT_UNSIGNED masks, if present, represent unsigned fields for
+ * features that are restricted to support at most the specified feature.
+ *
+ * If a feature field is not present in either, then it is not supported.
+ *
+ * The approach taken for protected VMs is to allow features that are:
+ * - Needed by common Linux distributions (e.g., floating point)
+ * - Trivial to support, e.g., supporting the feature does not introduce or
+ *   require tracking of additional state in KVM
+ * - Cannot be trapped or prevent the guest from using anyway
+ */
+
+/*
+ * Allow for protected VMs:
+ * - Floating-point and Advanced SIMD
+ * - Data Independent Timing
+ */
+#define PVM_ID_AA64PFR0_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64PFR0_FP) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR0_ASIMD) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR0_DIT) \
+	)
+
+/*
+ * Restrict to the following *unsigned* features for protected VMs:
+ * - AArch64 guests only (no support for AArch32 guests):
+ *	AArch32 adds complexity in trap handling, emulation, condition codes,
+ *	etc...
+ * - RAS (v1)
+ *	Supported by KVM
+ */
+#define PVM_ID_AA64PFR0_RESTRICT_UNSIGNED (\
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL0), ID_AA64PFR0_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1), ID_AA64PFR0_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL2), ID_AA64PFR0_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_EL3), ID_AA64PFR0_ELx_64BIT_ONLY) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_RAS), ID_AA64PFR0_RAS_V1) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Branch Target Identification
+ * - Speculative Store Bypassing
+ */
+#define PVM_ID_AA64PFR1_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64PFR1_BT) | \
+	ARM64_FEATURE_MASK(ID_AA64PFR1_SSBS) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Mixed-endian
+ * - Distinction between Secure and Non-secure Memory
+ * - Mixed-endian at EL0 only
+ * - Non-context synchronizing exception entry and exit
+ */
+#define PVM_ID_AA64MMFR0_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_BIGENDEL) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_SNSMEM) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_BIGENDEL0) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR0_EXS) \
+	)
+
+/*
+ * Restrict to the following *unsigned* features for protected VMs:
+ * - 40-bit IPA
+ * - 16-bit ASID
+ */
+#define PVM_ID_AA64MMFR0_RESTRICT_UNSIGNED (\
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_PARANGE), ID_AA64MMFR0_PARANGE_40) | \
+	FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64MMFR0_ASID), ID_AA64MMFR0_ASID_16) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Hardware translation table updates to Access flag and Dirty state
+ * - Number of VMID bits from CPU
+ * - Hierarchical Permission Disables
+ * - Privileged Access Never
+ * - SError interrupt exceptions from speculative reads
+ * - Enhanced Translation Synchronization
+ */
+#define PVM_ID_AA64MMFR1_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_HADBS) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_VMIDBITS) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_HPD) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_PAN) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_SPECSEI) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR1_ETS) \
+	)
+
+/*
+ * Allow for protected VMs:
+ * - Common not Private translations
+ * - User Access Override
+ * - IESB bit in the SCTLR_ELx registers
+ * - Unaligned single-copy atomicity and atomic functions
+ * - ESR_ELx.EC value on an exception by read access to feature ID space
+ * - TTL field in address operations.
+ * - Break-before-make sequences when changing translation block size
+ * - E0PDx mechanism
+ */
+#define PVM_ID_AA64MMFR2_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_CNP) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_UAO) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_IESB) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_AT) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_IDS) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_TTL) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_BBM) | \
+	ARM64_FEATURE_MASK(ID_AA64MMFR2_E0PD) \
+	)
+
+/*
+ * No support for Scalable Vectors for protected VMs:
+ *	Requires additional support from KVM, e.g., context-switching and
+ *	trapping at EL2
+ */
+#define PVM_ID_AA64ZFR0_ALLOW (0ULL)
+
+/*
+ * No support for debug, including breakpoints, and watchpoints for protected
+ * VMs:
+ *	The Arm architecture mandates support for at least the Armv8 debug
+ *	architecture, which would include at least 2 hardware breakpoints and
+ *	watchpoints. Providing that support to protected guests adds
+ *	considerable state and complexity. Therefore, the reserved value of 0 is
+ *	used for debug-related fields.
+ */
+#define PVM_ID_AA64DFR0_ALLOW (0ULL)
+#define PVM_ID_AA64DFR1_ALLOW (0ULL)
+
+/*
+ * No support for implementation defined features.
+ */
+#define PVM_ID_AA64AFR0_ALLOW (0ULL)
+#define PVM_ID_AA64AFR1_ALLOW (0ULL)
+
+/*
+ * No restrictions on instructions implemented in AArch64.
+ */
+#define PVM_ID_AA64ISAR0_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_AES) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_SHA1) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_SHA2) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_CRC32) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_ATOMICS) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_RDM) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_SHA3) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_SM3) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_SM4) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_DP) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_FHM) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_TS) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_TLB) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR0_RNDR) \
+	)
+
+#define PVM_ID_AA64ISAR1_ALLOW (\
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_DPB) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_APA) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_API) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_JSCVT) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_FCMA) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_LRCPC) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_GPA) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_GPI) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_FRINTTS) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_SB) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_SPECRES) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_BF16) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_DGH) | \
+	ARM64_FEATURE_MASK(ID_AA64ISAR1_I8MM) \
+	)
+
+u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id);
+bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code);
+bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code);
+int kvm_check_pvm_sysreg_table(void);
+
+#endif /* __ARM64_KVM_FIXED_CONFIG_H__ */
+2
arch/arm64/kvm/hyp/include/nvhe/trap_handler.h
···
 #define DECLARE_REG(type, name, ctxt, reg)	\
 	type name = (type)cpu_reg(ctxt, (reg))

+void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu);
+
 #endif /* __ARM64_KVM_NVHE_TRAP_HANDLER_H__ */
+1 -1
arch/arm64/kvm/hyp/nvhe/Makefile
···
 obj-y := timer-sr.o sysreg-sr.o debug-sr.o switch.o tlb.o hyp-init.o host.o \
	 hyp-main.o hyp-smp.o psci-relay.o early_alloc.o stub.o page_alloc.o \
-	 cache.o setup.o mm.o mem_protect.o
+	 cache.o setup.o mm.o mem_protect.o sys_regs.o pkvm.o
 obj-y += ../vgic-v3-sr.o ../aarch32.o ../vgic-v2-cpuif-proxy.o ../entry.o \
	 ../fpsimd.o ../hyp-entry.o ../exception.o ../pgtable.o
 obj-y += $(lib-objs)
+17 -9
arch/arm64/kvm/hyp/nvhe/host.S
···
	b	__host_enter_for_panic
 SYM_FUNC_END(__hyp_do_panic)

-.macro host_el1_sync_vect
-	.align 7
-.L__vect_start\@:
-	stp	x0, x1, [sp, #-16]!
-	mrs	x0, esr_el2
-	lsr	x0, x0, #ESR_ELx_EC_SHIFT
-	cmp	x0, #ESR_ELx_EC_HVC64
-	b.ne	__host_exit
-
+SYM_FUNC_START(__host_hvc)
	ldp	x0, x1, [sp]		// Don't fixup the stack yet
+
+	/* No stub for you, sonny Jim */
+alternative_if ARM64_KVM_PROTECTED_MODE
+	b	__host_exit
+alternative_else_nop_endif

	/* Check for a stub HVC call */
	cmp	x0, #HVC_STUB_HCALL_NR
···
	ldr	x5, =__kvm_handle_stub_hvc
	hyp_pa	x5, x6
	br	x5
+SYM_FUNC_END(__host_hvc)
+
+.macro host_el1_sync_vect
+	.align 7
+.L__vect_start\@:
+	stp	x0, x1, [sp, #-16]!
+	mrs	x0, esr_el2
+	lsr	x0, x0, #ESR_ELx_EC_SHIFT
+	cmp	x0, #ESR_ELx_EC_HVC64
+	b.eq	__host_hvc
+	b	__host_exit
 .L__vect_end\@:
 .if ((.L__vect_end\@ - .L__vect_start\@) > 0x80)
	.error "host_el1_sync_vect larger than vector entry"
+36 -12
arch/arm64/kvm/hyp/nvhe/hyp-main.c
···
  * Author: Andrew Scull <ascull@google.com>
  */

-#include <hyp/switch.h>
+#include <hyp/adjust_pc.h>

 #include <asm/pgtable-types.h>
 #include <asm/kvm_asm.h>
···
 {
 	cpu_reg(host_ctxt, 1) = __pkvm_prot_finalize();
 }
+
+static void handle___pkvm_vcpu_init_traps(struct kvm_cpu_context *host_ctxt)
+{
+	DECLARE_REG(struct kvm_vcpu *, vcpu, host_ctxt, 1);
+
+	__pkvm_vcpu_init_traps(kern_hyp_va(vcpu));
+}
+
 typedef void (*hcall_t)(struct kvm_cpu_context *);

 #define HANDLE_FUNC(x)	[__KVM_HOST_SMCCC_FUNC_##x] = (hcall_t)handle_##x

 static const hcall_t host_hcall[] = {
-	HANDLE_FUNC(__kvm_vcpu_run),
+	/* ___kvm_hyp_init */
+	HANDLE_FUNC(__kvm_get_mdcr_el2),
+	HANDLE_FUNC(__pkvm_init),
+	HANDLE_FUNC(__pkvm_create_private_mapping),
+	HANDLE_FUNC(__pkvm_cpu_set_vector),
+	HANDLE_FUNC(__kvm_enable_ssbs),
+	HANDLE_FUNC(__vgic_v3_init_lrs),
+	HANDLE_FUNC(__vgic_v3_get_gic_config),
+	HANDLE_FUNC(__pkvm_prot_finalize),
+
+	HANDLE_FUNC(__pkvm_host_share_hyp),
 	HANDLE_FUNC(__kvm_adjust_pc),
+	HANDLE_FUNC(__kvm_vcpu_run),
 	HANDLE_FUNC(__kvm_flush_vm_context),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid_ipa),
 	HANDLE_FUNC(__kvm_tlb_flush_vmid),
 	HANDLE_FUNC(__kvm_flush_cpu_context),
 	HANDLE_FUNC(__kvm_timer_set_cntvoff),
-	HANDLE_FUNC(__kvm_enable_ssbs),
-	HANDLE_FUNC(__vgic_v3_get_gic_config),
 	HANDLE_FUNC(__vgic_v3_read_vmcr),
 	HANDLE_FUNC(__vgic_v3_write_vmcr),
-	HANDLE_FUNC(__vgic_v3_init_lrs),
-	HANDLE_FUNC(__kvm_get_mdcr_el2),
 	HANDLE_FUNC(__vgic_v3_save_aprs),
 	HANDLE_FUNC(__vgic_v3_restore_aprs),
-	HANDLE_FUNC(__pkvm_init),
-	HANDLE_FUNC(__pkvm_cpu_set_vector),
-	HANDLE_FUNC(__pkvm_host_share_hyp),
-	HANDLE_FUNC(__pkvm_create_private_mapping),
-	HANDLE_FUNC(__pkvm_prot_finalize),
+	HANDLE_FUNC(__pkvm_vcpu_init_traps),
 };

 static void handle_host_hcall(struct kvm_cpu_context *host_ctxt)
 {
 	DECLARE_REG(unsigned long, id, host_ctxt, 0);
+	unsigned long hcall_min = 0;
 	hcall_t hfn;
+
+	/*
+	 * If pKVM has been initialised then reject any calls to the
+	 * early "privileged" hypercalls. Note that we cannot reject
+	 * calls to __pkvm_prot_finalize for two reasons: (1) The static
+	 * key used to determine initialisation must be toggled prior to
+	 * finalisation and (2) finalisation is performed on a per-CPU
+	 * basis. This is all fine, however, since __pkvm_prot_finalize
+	 * returns -EPERM after the first call for a given CPU.
+	 */
+	if (static_branch_unlikely(&kvm_protected_mode_initialized))
+		hcall_min = __KVM_HOST_SMCCC_FUNC___pkvm_prot_finalize;

 	id -= KVM_HOST_SMCCC_ID(0);

-	if (unlikely(id >= ARRAY_SIZE(host_hcall)))
+	if (unlikely(id < hcall_min || id >= ARRAY_SIZE(host_hcall)))
 		goto inval;

 	hfn = host_hcall[id];
+4 -7
arch/arm64/kvm/hyp/nvhe/mem_protect.c
···
 #include <asm/kvm_pgtable.h>
 #include <asm/stage2_pgtable.h>

-#include <hyp/switch.h>
+#include <hyp/fault.h>

 #include <nvhe/gfp.h>
 #include <nvhe/memory.h>
···
 struct host_kvm host_kvm;

 static struct hyp_pool host_s2_pool;
-
-/*
- * Copies of the host's CPU features registers holding sanitized values.
- */
-u64 id_aa64mmfr0_el1_sys_val;
-u64 id_aa64mmfr1_el1_sys_val;

 const u8 pkvm_hyp_id = 1;
···
 {
 	struct kvm_s2_mmu *mmu = &host_kvm.arch.mmu;
 	struct kvm_nvhe_init_params *params = this_cpu_ptr(&kvm_init_params);
+
+	if (params->hcr_el2 & HCR_VM)
+		return -EPERM;

 	params->vttbr = kvm_get_vttbr(mmu);
 	params->vtcr = host_kvm.arch.vtcr;
+185
arch/arm64/kvm/hyp/nvhe/pkvm.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2021 Google LLC
+ * Author: Fuad Tabba <tabba@google.com>
+ */
+
+#include <linux/kvm_host.h>
+#include <linux/mm.h>
+#include <nvhe/fixed_config.h>
+#include <nvhe/trap_handler.h>
+
+/*
+ * Set trap register values based on features in ID_AA64PFR0.
+ */
+static void pvm_init_traps_aa64pfr0(struct kvm_vcpu *vcpu)
+{
+	const u64 feature_ids = pvm_read_id_reg(vcpu, SYS_ID_AA64PFR0_EL1);
+	u64 hcr_set = HCR_RW;
+	u64 hcr_clear = 0;
+	u64 cptr_set = 0;
+
+	/* Protected KVM does not support AArch32 guests. */
+	BUILD_BUG_ON(FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL0),
+		PVM_ID_AA64PFR0_RESTRICT_UNSIGNED) != ID_AA64PFR0_ELx_64BIT_ONLY);
+	BUILD_BUG_ON(FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1),
+		PVM_ID_AA64PFR0_RESTRICT_UNSIGNED) != ID_AA64PFR0_ELx_64BIT_ONLY);
+
+	/*
+	 * Linux guests assume support for floating-point and Advanced SIMD. Do
+	 * not change the trapping behavior for these from the KVM default.
+	 */
+	BUILD_BUG_ON(!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_FP),
+				PVM_ID_AA64PFR0_ALLOW));
+	BUILD_BUG_ON(!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_ASIMD),
+				PVM_ID_AA64PFR0_ALLOW));
+
+	/* Trap RAS unless all current versions are supported */
+	if (FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_RAS), feature_ids) <
+	    ID_AA64PFR0_RAS_V1P1) {
+		hcr_set |= HCR_TERR | HCR_TEA;
+		hcr_clear |= HCR_FIEN;
+	}
+
+	/* Trap AMU */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_AMU), feature_ids)) {
+		hcr_clear |= HCR_AMVOFFEN;
+		cptr_set |= CPTR_EL2_TAM;
+	}
+
+	/* Trap SVE */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_SVE), feature_ids))
+		cptr_set |= CPTR_EL2_TZ;
+
+	vcpu->arch.hcr_el2 |= hcr_set;
+	vcpu->arch.hcr_el2 &= ~hcr_clear;
+	vcpu->arch.cptr_el2 |= cptr_set;
+}
+
+/*
+ * Set trap register values based on features in ID_AA64PFR1.
+ */
+static void pvm_init_traps_aa64pfr1(struct kvm_vcpu *vcpu)
+{
+	const u64 feature_ids = pvm_read_id_reg(vcpu, SYS_ID_AA64PFR1_EL1);
+	u64 hcr_set = 0;
+	u64 hcr_clear = 0;
+
+	/* Memory Tagging: Trap and Treat as Untagged if not supported. */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), feature_ids)) {
+		hcr_set |= HCR_TID5;
+		hcr_clear |= HCR_DCT | HCR_ATA;
+	}
+
+	vcpu->arch.hcr_el2 |= hcr_set;
+	vcpu->arch.hcr_el2 &= ~hcr_clear;
+}
+
+/*
+ * Set trap register values based on features in ID_AA64DFR0.
+ */
+static void pvm_init_traps_aa64dfr0(struct kvm_vcpu *vcpu)
+{
+	const u64 feature_ids = pvm_read_id_reg(vcpu, SYS_ID_AA64DFR0_EL1);
+	u64 mdcr_set = 0;
+	u64 mdcr_clear = 0;
+	u64 cptr_set = 0;
+
+	/* Trap/constrain PMU */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_PMUVER), feature_ids)) {
+		mdcr_set |= MDCR_EL2_TPM | MDCR_EL2_TPMCR;
+		mdcr_clear |= MDCR_EL2_HPME | MDCR_EL2_MTPME |
+			      MDCR_EL2_HPMN_MASK;
+	}
+
+	/* Trap Debug */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_DEBUGVER), feature_ids))
+		mdcr_set |= MDCR_EL2_TDRA | MDCR_EL2_TDA | MDCR_EL2_TDE;
+
+	/* Trap OS Double Lock */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_DOUBLELOCK), feature_ids))
+		mdcr_set |= MDCR_EL2_TDOSA;
+
+	/* Trap SPE */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_PMSVER), feature_ids)) {
+		mdcr_set |= MDCR_EL2_TPMS;
+		mdcr_clear |= MDCR_EL2_E2PB_MASK << MDCR_EL2_E2PB_SHIFT;
+	}
+
+	/* Trap Trace Filter */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_TRACE_FILT), feature_ids))
+		mdcr_set |= MDCR_EL2_TTRF;
+
+	/* Trap Trace */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64DFR0_TRACEVER), feature_ids))
+		cptr_set |= CPTR_EL2_TTA;
+
+	vcpu->arch.mdcr_el2 |= mdcr_set;
+	vcpu->arch.mdcr_el2 &= ~mdcr_clear;
+	vcpu->arch.cptr_el2 |= cptr_set;
+}
+
+/*
+ * Set trap register values based on features in ID_AA64MMFR0.
+ */
+static void pvm_init_traps_aa64mmfr0(struct kvm_vcpu *vcpu)
+{
+	const u64 feature_ids = pvm_read_id_reg(vcpu, SYS_ID_AA64MMFR0_EL1);
+	u64 mdcr_set = 0;
+
+	/* Trap Debug Communications Channel registers */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR0_FGT), feature_ids))
+		mdcr_set |= MDCR_EL2_TDCC;
+
+	vcpu->arch.mdcr_el2 |= mdcr_set;
+}
+
+/*
+ * Set trap register values based on features in ID_AA64MMFR1.
+ */
+static void pvm_init_traps_aa64mmfr1(struct kvm_vcpu *vcpu)
+{
+	const u64 feature_ids = pvm_read_id_reg(vcpu, SYS_ID_AA64MMFR1_EL1);
+	u64 hcr_set = 0;
+
+	/* Trap LOR */
+	if (!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64MMFR1_LOR), feature_ids))
+		hcr_set |= HCR_TLOR;
+
+	vcpu->arch.hcr_el2 |= hcr_set;
+}
+
+/*
+ * Set baseline trap register values.
+ */
+static void pvm_init_trap_regs(struct kvm_vcpu *vcpu)
+{
+	const u64 hcr_trap_feat_regs = HCR_TID3;
+	const u64 hcr_trap_impdef = HCR_TACR | HCR_TIDCP | HCR_TID1;
+
+	/*
+	 * Always trap:
+	 * - Feature id registers: to control features exposed to guests
+	 * - Implementation-defined features
+	 */
+	vcpu->arch.hcr_el2 |= hcr_trap_feat_regs | hcr_trap_impdef;
+
+	/* Clear res0 and set res1 bits to trap potential new features. */
+	vcpu->arch.hcr_el2 &= ~(HCR_RES0);
+	vcpu->arch.mdcr_el2 &= ~(MDCR_EL2_RES0);
+	vcpu->arch.cptr_el2 |= CPTR_NVHE_EL2_RES1;
+	vcpu->arch.cptr_el2 &= ~(CPTR_NVHE_EL2_RES0);
+}
+
+/*
+ * Initialize trap register values for protected VMs.
+ */
+void __pkvm_vcpu_init_traps(struct kvm_vcpu *vcpu)
+{
+	pvm_init_trap_regs(vcpu);
+	pvm_init_traps_aa64pfr0(vcpu);
+	pvm_init_traps_aa64pfr1(vcpu);
+	pvm_init_traps_aa64dfr0(vcpu);
+	pvm_init_traps_aa64mmfr0(vcpu);
+	pvm_init_traps_aa64mmfr1(vcpu);
+}
+3
arch/arm64/kvm/hyp/nvhe/setup.c
··· 10 10 #include <asm/kvm_pgtable.h> 11 11 12 12 #include <nvhe/early_alloc.h> 13 + #include <nvhe/fixed_config.h> 13 14 #include <nvhe/gfp.h> 14 15 #include <nvhe/memory.h> 15 16 #include <nvhe/mem_protect.h> ··· 260 259 void *virt = hyp_phys_to_virt(phys); 261 260 void (*fn)(phys_addr_t params_pa, void *finalize_fn_va); 262 261 int ret; 262 + 263 + BUG_ON(kvm_check_pvm_sysreg_table()); 263 264 264 265 if (!PAGE_ALIGNED(phys) || !PAGE_ALIGNED(size)) 265 266 return -EINVAL;
+99
arch/arm64/kvm/hyp/nvhe/switch.c
··· 27 27 #include <asm/processor.h> 28 28 #include <asm/thread_info.h> 29 29 30 + #include <nvhe/fixed_config.h> 30 31 #include <nvhe/mem_protect.h> 31 32 32 33 /* Non-VHE specific context */ ··· 159 158 write_sysreg(pmu->events_host, pmcntenset_el0); 160 159 } 161 160 161 + /** 162 + * Handler for protected VM MSR, MRS or System instruction execution in AArch64. 163 + * 164 + * Returns true if the hypervisor has handled the exit, and control should go 165 + * back to the guest, or false if it hasn't. 166 + */ 167 + static bool kvm_handle_pvm_sys64(struct kvm_vcpu *vcpu, u64 *exit_code) 168 + { 169 + /* 170 + * Make sure we handle the exit for workarounds and ptrauth 171 + * before the pKVM handling, as the latter could decide to 172 + * UNDEF. 173 + */ 174 + return (kvm_hyp_handle_sysreg(vcpu, exit_code) || 175 + kvm_handle_pvm_sysreg(vcpu, exit_code)); 176 + } 177 + 178 + /** 179 + * Handler for protected floating-point and Advanced SIMD accesses. 180 + * 181 + * Returns true if the hypervisor has handled the exit, and control should go 182 + * back to the guest, or false if it hasn't. 183 + */ 184 + static bool kvm_handle_pvm_fpsimd(struct kvm_vcpu *vcpu, u64 *exit_code) 185 + { 186 + /* Linux guests assume support for floating-point and Advanced SIMD. */ 187 + BUILD_BUG_ON(!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_FP), 188 + PVM_ID_AA64PFR0_ALLOW)); 189 + BUILD_BUG_ON(!FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_ASIMD), 190 + PVM_ID_AA64PFR0_ALLOW)); 191 + 192 + return kvm_hyp_handle_fpsimd(vcpu, exit_code); 193 + } 194 + 195 + static const exit_handler_fn hyp_exit_handlers[] = { 196 + [0 ... 
ESR_ELx_EC_MAX] = NULL, 197 + [ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32, 198 + [ESR_ELx_EC_SYS64] = kvm_hyp_handle_sysreg, 199 + [ESR_ELx_EC_SVE] = kvm_hyp_handle_fpsimd, 200 + [ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd, 201 + [ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low, 202 + [ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low, 203 + [ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth, 204 + }; 205 + 206 + static const exit_handler_fn pvm_exit_handlers[] = { 207 + [0 ... ESR_ELx_EC_MAX] = NULL, 208 + [ESR_ELx_EC_SYS64] = kvm_handle_pvm_sys64, 209 + [ESR_ELx_EC_SVE] = kvm_handle_pvm_restricted, 210 + [ESR_ELx_EC_FP_ASIMD] = kvm_handle_pvm_fpsimd, 211 + [ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low, 212 + [ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low, 213 + [ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth, 214 + }; 215 + 216 + static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu) 217 + { 218 + if (unlikely(kvm_vm_is_protected(kern_hyp_va(vcpu->kvm)))) 219 + return pvm_exit_handlers; 220 + 221 + return hyp_exit_handlers; 222 + } 223 + 224 + /* 225 + * Some guests (e.g., protected VMs) are not allowed to run in AArch32. 226 + * The ARMv8 architecture does not give the hypervisor a mechanism to prevent a 227 + * guest from dropping to AArch32 EL0 if implemented by the CPU. If the 228 + * hypervisor spots a guest in such a state, ensure it is handled, and don't 229 + * trust the host to spot or fix it. The check below is based on the one in 230 + * kvm_arch_vcpu_ioctl_run(). 231 + * 232 + * Returns false if the guest ran in AArch32 when it shouldn't have, and 233 + * thus should exit to the host, or true if the guest run loop can continue. 
234 + */ 235 + static bool handle_aarch32_guest(struct kvm_vcpu *vcpu, u64 *exit_code) 236 + { 237 + struct kvm *kvm = kern_hyp_va(vcpu->kvm); 238 + 239 + if (kvm_vm_is_protected(kvm) && vcpu_mode_is_32bit(vcpu)) { 240 + /* 241 + * As we have caught the guest red-handed, decide that it isn't 242 + * fit for purpose anymore by making the vcpu invalid. The VMM 243 + * can try and fix it by re-initializing the vcpu with 244 + * KVM_ARM_VCPU_INIT, however, this is likely not possible for 245 + * protected VMs. 246 + */ 247 + vcpu->arch.target = -1; 248 + *exit_code &= BIT(ARM_EXIT_WITH_SERROR_BIT); 249 + *exit_code |= ARM_EXCEPTION_IL; 250 + return false; 251 + } 252 + 253 + return true; 254 + } 255 + 162 256 /* Switch to the guest for legacy non-VHE systems */ 163 257 int __kvm_vcpu_run(struct kvm_vcpu *vcpu) 164 258 { ··· 315 219 do { 316 220 /* Jump in the fire! */ 317 221 exit_code = __guest_enter(vcpu); 222 + 223 + if (unlikely(!handle_aarch32_guest(vcpu, &exit_code))) 224 + break; 318 225 319 226 /* And we're baaack! */ 320 227 } while (fixup_guest_exit(vcpu, &exit_code));
+487
arch/arm64/kvm/hyp/nvhe/sys_regs.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2021 Google LLC 4 + * Author: Fuad Tabba <tabba@google.com> 5 + */ 6 + 7 + #include <linux/irqchip/arm-gic-v3.h> 8 + 9 + #include <asm/kvm_asm.h> 10 + #include <asm/kvm_mmu.h> 11 + 12 + #include <hyp/adjust_pc.h> 13 + 14 + #include <nvhe/fixed_config.h> 15 + 16 + #include "../../sys_regs.h" 17 + 18 + /* 19 + * Copies of the host's CPU features registers holding sanitized values at hyp. 20 + */ 21 + u64 id_aa64pfr0_el1_sys_val; 22 + u64 id_aa64pfr1_el1_sys_val; 23 + u64 id_aa64isar0_el1_sys_val; 24 + u64 id_aa64isar1_el1_sys_val; 25 + u64 id_aa64mmfr0_el1_sys_val; 26 + u64 id_aa64mmfr1_el1_sys_val; 27 + u64 id_aa64mmfr2_el1_sys_val; 28 + 29 + /* 30 + * Inject an unknown/undefined exception to an AArch64 guest while most of its 31 + * sysregs are live. 32 + */ 33 + static void inject_undef64(struct kvm_vcpu *vcpu) 34 + { 35 + u32 esr = (ESR_ELx_EC_UNKNOWN << ESR_ELx_EC_SHIFT); 36 + 37 + *vcpu_pc(vcpu) = read_sysreg_el2(SYS_ELR); 38 + *vcpu_cpsr(vcpu) = read_sysreg_el2(SYS_SPSR); 39 + 40 + vcpu->arch.flags |= (KVM_ARM64_EXCEPT_AA64_EL1 | 41 + KVM_ARM64_EXCEPT_AA64_ELx_SYNC | 42 + KVM_ARM64_PENDING_EXCEPTION); 43 + 44 + __kvm_adjust_pc(vcpu); 45 + 46 + write_sysreg_el1(esr, SYS_ESR); 47 + write_sysreg_el1(read_sysreg_el2(SYS_ELR), SYS_ELR); 48 + write_sysreg_el2(*vcpu_pc(vcpu), SYS_ELR); 49 + write_sysreg_el2(*vcpu_cpsr(vcpu), SYS_SPSR); 50 + } 51 + 52 + /* 53 + * Returns the restricted features values of the feature register based on the 54 + * limitations in restrict_fields. 55 + * A feature id field value of 0b0000 does not impose any restrictions. 56 + * Note: Use only for unsigned feature field values. 
57 + */ 58 + static u64 get_restricted_features_unsigned(u64 sys_reg_val, 59 + u64 restrict_fields) 60 + { 61 + u64 value = 0UL; 62 + u64 mask = GENMASK_ULL(ARM64_FEATURE_FIELD_BITS - 1, 0); 63 + 64 + /* 65 + * According to the Arm Architecture Reference Manual, feature fields 66 + * use increasing values to indicate increases in functionality. 67 + * Iterate over the restricted feature fields and calculate the minimum 68 + * unsigned value between the one supported by the system, and what the 69 + * value is being restricted to. 70 + */ 71 + while (sys_reg_val && restrict_fields) { 72 + value |= min(sys_reg_val & mask, restrict_fields & mask); 73 + sys_reg_val &= ~mask; 74 + restrict_fields &= ~mask; 75 + mask <<= ARM64_FEATURE_FIELD_BITS; 76 + } 77 + 78 + return value; 79 + } 80 + 81 + /* 82 + * Functions that return the value of feature id registers for protected VMs 83 + * based on allowed features, system features, and KVM support. 84 + */ 85 + 86 + static u64 get_pvm_id_aa64pfr0(const struct kvm_vcpu *vcpu) 87 + { 88 + const struct kvm *kvm = (const struct kvm *)kern_hyp_va(vcpu->kvm); 89 + u64 set_mask = 0; 90 + u64 allow_mask = PVM_ID_AA64PFR0_ALLOW; 91 + 92 + if (!vcpu_has_sve(vcpu)) 93 + allow_mask &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_SVE); 94 + 95 + set_mask |= get_restricted_features_unsigned(id_aa64pfr0_el1_sys_val, 96 + PVM_ID_AA64PFR0_RESTRICT_UNSIGNED); 97 + 98 + /* Spectre and Meltdown mitigation in KVM */ 99 + set_mask |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_CSV2), 100 + (u64)kvm->arch.pfr0_csv2); 101 + set_mask |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_CSV3), 102 + (u64)kvm->arch.pfr0_csv3); 103 + 104 + return (id_aa64pfr0_el1_sys_val & allow_mask) | set_mask; 105 + } 106 + 107 + static u64 get_pvm_id_aa64pfr1(const struct kvm_vcpu *vcpu) 108 + { 109 + const struct kvm *kvm = (const struct kvm *)kern_hyp_va(vcpu->kvm); 110 + u64 allow_mask = PVM_ID_AA64PFR1_ALLOW; 111 + 112 + if (!kvm_has_mte(kvm)) 113 + allow_mask &= 
~ARM64_FEATURE_MASK(ID_AA64PFR1_MTE); 114 + 115 + return id_aa64pfr1_el1_sys_val & allow_mask; 116 + } 117 + 118 + static u64 get_pvm_id_aa64zfr0(const struct kvm_vcpu *vcpu) 119 + { 120 + /* 121 + * No support for Scalable Vectors, therefore, hyp has no sanitized 122 + * copy of the feature id register. 123 + */ 124 + BUILD_BUG_ON(PVM_ID_AA64ZFR0_ALLOW != 0ULL); 125 + return 0; 126 + } 127 + 128 + static u64 get_pvm_id_aa64dfr0(const struct kvm_vcpu *vcpu) 129 + { 130 + /* 131 + * No support for debug, including breakpoints, and watchpoints, 132 + * therefore, pKVM has no sanitized copy of the feature id register. 133 + */ 134 + BUILD_BUG_ON(PVM_ID_AA64DFR0_ALLOW != 0ULL); 135 + return 0; 136 + } 137 + 138 + static u64 get_pvm_id_aa64dfr1(const struct kvm_vcpu *vcpu) 139 + { 140 + /* 141 + * No support for debug, therefore, hyp has no sanitized copy of the 142 + * feature id register. 143 + */ 144 + BUILD_BUG_ON(PVM_ID_AA64DFR1_ALLOW != 0ULL); 145 + return 0; 146 + } 147 + 148 + static u64 get_pvm_id_aa64afr0(const struct kvm_vcpu *vcpu) 149 + { 150 + /* 151 + * No support for implementation defined features, therefore, hyp has no 152 + * sanitized copy of the feature id register. 153 + */ 154 + BUILD_BUG_ON(PVM_ID_AA64AFR0_ALLOW != 0ULL); 155 + return 0; 156 + } 157 + 158 + static u64 get_pvm_id_aa64afr1(const struct kvm_vcpu *vcpu) 159 + { 160 + /* 161 + * No support for implementation defined features, therefore, hyp has no 162 + * sanitized copy of the feature id register. 
163 + */ 164 + BUILD_BUG_ON(PVM_ID_AA64AFR1_ALLOW != 0ULL); 165 + return 0; 166 + } 167 + 168 + static u64 get_pvm_id_aa64isar0(const struct kvm_vcpu *vcpu) 169 + { 170 + return id_aa64isar0_el1_sys_val & PVM_ID_AA64ISAR0_ALLOW; 171 + } 172 + 173 + static u64 get_pvm_id_aa64isar1(const struct kvm_vcpu *vcpu) 174 + { 175 + u64 allow_mask = PVM_ID_AA64ISAR1_ALLOW; 176 + 177 + if (!vcpu_has_ptrauth(vcpu)) 178 + allow_mask &= ~(ARM64_FEATURE_MASK(ID_AA64ISAR1_APA) | 179 + ARM64_FEATURE_MASK(ID_AA64ISAR1_API) | 180 + ARM64_FEATURE_MASK(ID_AA64ISAR1_GPA) | 181 + ARM64_FEATURE_MASK(ID_AA64ISAR1_GPI)); 182 + 183 + return id_aa64isar1_el1_sys_val & allow_mask; 184 + } 185 + 186 + static u64 get_pvm_id_aa64mmfr0(const struct kvm_vcpu *vcpu) 187 + { 188 + u64 set_mask; 189 + 190 + set_mask = get_restricted_features_unsigned(id_aa64mmfr0_el1_sys_val, 191 + PVM_ID_AA64MMFR0_RESTRICT_UNSIGNED); 192 + 193 + return (id_aa64mmfr0_el1_sys_val & PVM_ID_AA64MMFR0_ALLOW) | set_mask; 194 + } 195 + 196 + static u64 get_pvm_id_aa64mmfr1(const struct kvm_vcpu *vcpu) 197 + { 198 + return id_aa64mmfr1_el1_sys_val & PVM_ID_AA64MMFR1_ALLOW; 199 + } 200 + 201 + static u64 get_pvm_id_aa64mmfr2(const struct kvm_vcpu *vcpu) 202 + { 203 + return id_aa64mmfr2_el1_sys_val & PVM_ID_AA64MMFR2_ALLOW; 204 + } 205 + 206 + /* Read a sanitized cpufeature ID register by its encoding */ 207 + u64 pvm_read_id_reg(const struct kvm_vcpu *vcpu, u32 id) 208 + { 209 + switch (id) { 210 + case SYS_ID_AA64PFR0_EL1: 211 + return get_pvm_id_aa64pfr0(vcpu); 212 + case SYS_ID_AA64PFR1_EL1: 213 + return get_pvm_id_aa64pfr1(vcpu); 214 + case SYS_ID_AA64ZFR0_EL1: 215 + return get_pvm_id_aa64zfr0(vcpu); 216 + case SYS_ID_AA64DFR0_EL1: 217 + return get_pvm_id_aa64dfr0(vcpu); 218 + case SYS_ID_AA64DFR1_EL1: 219 + return get_pvm_id_aa64dfr1(vcpu); 220 + case SYS_ID_AA64AFR0_EL1: 221 + return get_pvm_id_aa64afr0(vcpu); 222 + case SYS_ID_AA64AFR1_EL1: 223 + return get_pvm_id_aa64afr1(vcpu); 224 + case SYS_ID_AA64ISAR0_EL1: 225 + 
return get_pvm_id_aa64isar0(vcpu); 226 + case SYS_ID_AA64ISAR1_EL1: 227 + return get_pvm_id_aa64isar1(vcpu); 228 + case SYS_ID_AA64MMFR0_EL1: 229 + return get_pvm_id_aa64mmfr0(vcpu); 230 + case SYS_ID_AA64MMFR1_EL1: 231 + return get_pvm_id_aa64mmfr1(vcpu); 232 + case SYS_ID_AA64MMFR2_EL1: 233 + return get_pvm_id_aa64mmfr2(vcpu); 234 + default: 235 + /* 236 + * Should never happen because all cases are covered in 237 + * pvm_sys_reg_descs[]. 238 + */ 239 + WARN_ON(1); 240 + break; 241 + } 242 + 243 + return 0; 244 + } 245 + 246 + static u64 read_id_reg(const struct kvm_vcpu *vcpu, 247 + struct sys_reg_desc const *r) 248 + { 249 + return pvm_read_id_reg(vcpu, reg_to_encoding(r)); 250 + } 251 + 252 + /* Handler to RAZ/WI sysregs */ 253 + static bool pvm_access_raz_wi(struct kvm_vcpu *vcpu, struct sys_reg_params *p, 254 + const struct sys_reg_desc *r) 255 + { 256 + if (!p->is_write) 257 + p->regval = 0; 258 + 259 + return true; 260 + } 261 + 262 + /* 263 + * Accessor for AArch32 feature id registers. 264 + * 265 + * The value of these registers is "unknown" according to the spec if AArch32 266 + * isn't supported. 267 + */ 268 + static bool pvm_access_id_aarch32(struct kvm_vcpu *vcpu, 269 + struct sys_reg_params *p, 270 + const struct sys_reg_desc *r) 271 + { 272 + if (p->is_write) { 273 + inject_undef64(vcpu); 274 + return false; 275 + } 276 + 277 + /* 278 + * No support for AArch32 guests, therefore, pKVM has no sanitized copy 279 + * of AArch32 feature id registers. 280 + */ 281 + BUILD_BUG_ON(FIELD_GET(ARM64_FEATURE_MASK(ID_AA64PFR0_EL1), 282 + PVM_ID_AA64PFR0_RESTRICT_UNSIGNED) > ID_AA64PFR0_ELx_64BIT_ONLY); 283 + 284 + return pvm_access_raz_wi(vcpu, p, r); 285 + } 286 + 287 + /* 288 + * Accessor for AArch64 feature id registers. 289 + * 290 + * If access is allowed, set the regval to the protected VM's view of the 291 + * register and return true. 292 + * Otherwise, inject an undefined exception and return false. 
293 + */ 294 + static bool pvm_access_id_aarch64(struct kvm_vcpu *vcpu, 295 + struct sys_reg_params *p, 296 + const struct sys_reg_desc *r) 297 + { 298 + if (p->is_write) { 299 + inject_undef64(vcpu); 300 + return false; 301 + } 302 + 303 + p->regval = read_id_reg(vcpu, r); 304 + return true; 305 + } 306 + 307 + static bool pvm_gic_read_sre(struct kvm_vcpu *vcpu, 308 + struct sys_reg_params *p, 309 + const struct sys_reg_desc *r) 310 + { 311 + /* pVMs only support GICv3. 'nuf said. */ 312 + if (!p->is_write) 313 + p->regval = ICC_SRE_EL1_DIB | ICC_SRE_EL1_DFB | ICC_SRE_EL1_SRE; 314 + 315 + return true; 316 + } 317 + 318 + /* Mark the specified system register as an AArch32 feature id register. */ 319 + #define AARCH32(REG) { SYS_DESC(REG), .access = pvm_access_id_aarch32 } 320 + 321 + /* Mark the specified system register as an AArch64 feature id register. */ 322 + #define AARCH64(REG) { SYS_DESC(REG), .access = pvm_access_id_aarch64 } 323 + 324 + /* Mark the specified system register as Read-As-Zero/Write-Ignored */ 325 + #define RAZ_WI(REG) { SYS_DESC(REG), .access = pvm_access_raz_wi } 326 + 327 + /* Mark the specified system register as not being handled in hyp. */ 328 + #define HOST_HANDLED(REG) { SYS_DESC(REG), .access = NULL } 329 + 330 + /* 331 + * Architected system registers. 332 + * Important: Must be sorted ascending by Op0, Op1, CRn, CRm, Op2 333 + * 334 + * NOTE: Anything not explicitly listed here is *restricted by default*, i.e., 335 + * it will lead to injecting an exception into the guest. 336 + */ 337 + static const struct sys_reg_desc pvm_sys_reg_descs[] = { 338 + /* Cache maintenance by set/way operations are restricted. */ 339 + 340 + /* Debug and Trace Registers are restricted. 
*/ 341 + 342 + /* AArch64 mappings of the AArch32 ID registers */ 343 + /* CRm=1 */ 344 + AARCH32(SYS_ID_PFR0_EL1), 345 + AARCH32(SYS_ID_PFR1_EL1), 346 + AARCH32(SYS_ID_DFR0_EL1), 347 + AARCH32(SYS_ID_AFR0_EL1), 348 + AARCH32(SYS_ID_MMFR0_EL1), 349 + AARCH32(SYS_ID_MMFR1_EL1), 350 + AARCH32(SYS_ID_MMFR2_EL1), 351 + AARCH32(SYS_ID_MMFR3_EL1), 352 + 353 + /* CRm=2 */ 354 + AARCH32(SYS_ID_ISAR0_EL1), 355 + AARCH32(SYS_ID_ISAR1_EL1), 356 + AARCH32(SYS_ID_ISAR2_EL1), 357 + AARCH32(SYS_ID_ISAR3_EL1), 358 + AARCH32(SYS_ID_ISAR4_EL1), 359 + AARCH32(SYS_ID_ISAR5_EL1), 360 + AARCH32(SYS_ID_MMFR4_EL1), 361 + AARCH32(SYS_ID_ISAR6_EL1), 362 + 363 + /* CRm=3 */ 364 + AARCH32(SYS_MVFR0_EL1), 365 + AARCH32(SYS_MVFR1_EL1), 366 + AARCH32(SYS_MVFR2_EL1), 367 + AARCH32(SYS_ID_PFR2_EL1), 368 + AARCH32(SYS_ID_DFR1_EL1), 369 + AARCH32(SYS_ID_MMFR5_EL1), 370 + 371 + /* AArch64 ID registers */ 372 + /* CRm=4 */ 373 + AARCH64(SYS_ID_AA64PFR0_EL1), 374 + AARCH64(SYS_ID_AA64PFR1_EL1), 375 + AARCH64(SYS_ID_AA64ZFR0_EL1), 376 + AARCH64(SYS_ID_AA64DFR0_EL1), 377 + AARCH64(SYS_ID_AA64DFR1_EL1), 378 + AARCH64(SYS_ID_AA64AFR0_EL1), 379 + AARCH64(SYS_ID_AA64AFR1_EL1), 380 + AARCH64(SYS_ID_AA64ISAR0_EL1), 381 + AARCH64(SYS_ID_AA64ISAR1_EL1), 382 + AARCH64(SYS_ID_AA64MMFR0_EL1), 383 + AARCH64(SYS_ID_AA64MMFR1_EL1), 384 + AARCH64(SYS_ID_AA64MMFR2_EL1), 385 + 386 + /* Scalable Vector Registers are restricted. */ 387 + 388 + RAZ_WI(SYS_ERRIDR_EL1), 389 + RAZ_WI(SYS_ERRSELR_EL1), 390 + RAZ_WI(SYS_ERXFR_EL1), 391 + RAZ_WI(SYS_ERXCTLR_EL1), 392 + RAZ_WI(SYS_ERXSTATUS_EL1), 393 + RAZ_WI(SYS_ERXADDR_EL1), 394 + RAZ_WI(SYS_ERXMISC0_EL1), 395 + RAZ_WI(SYS_ERXMISC1_EL1), 396 + 397 + /* Performance Monitoring Registers are restricted. */ 398 + 399 + /* Limited Ordering Regions Registers are restricted. 
*/ 400 + 401 + HOST_HANDLED(SYS_ICC_SGI1R_EL1), 402 + HOST_HANDLED(SYS_ICC_ASGI1R_EL1), 403 + HOST_HANDLED(SYS_ICC_SGI0R_EL1), 404 + { SYS_DESC(SYS_ICC_SRE_EL1), .access = pvm_gic_read_sre, }, 405 + 406 + HOST_HANDLED(SYS_CCSIDR_EL1), 407 + HOST_HANDLED(SYS_CLIDR_EL1), 408 + HOST_HANDLED(SYS_CSSELR_EL1), 409 + HOST_HANDLED(SYS_CTR_EL0), 410 + 411 + /* Performance Monitoring Registers are restricted. */ 412 + 413 + /* Activity Monitoring Registers are restricted. */ 414 + 415 + HOST_HANDLED(SYS_CNTP_TVAL_EL0), 416 + HOST_HANDLED(SYS_CNTP_CTL_EL0), 417 + HOST_HANDLED(SYS_CNTP_CVAL_EL0), 418 + 419 + /* Performance Monitoring Registers are restricted. */ 420 + }; 421 + 422 + /* 423 + * Checks that the sysreg table is unique and in-order. 424 + * 425 + * Returns 0 if the table is consistent, or 1 otherwise. 426 + */ 427 + int kvm_check_pvm_sysreg_table(void) 428 + { 429 + unsigned int i; 430 + 431 + for (i = 1; i < ARRAY_SIZE(pvm_sys_reg_descs); i++) { 432 + if (cmp_sys_reg(&pvm_sys_reg_descs[i-1], &pvm_sys_reg_descs[i]) >= 0) 433 + return 1; 434 + } 435 + 436 + return 0; 437 + } 438 + 439 + /* 440 + * Handler for protected VM MSR, MRS or System instruction execution. 441 + * 442 + * Returns true if the hypervisor has handled the exit, and control should go 443 + * back to the guest, or false if it hasn't, to be handled by the host. 444 + */ 445 + bool kvm_handle_pvm_sysreg(struct kvm_vcpu *vcpu, u64 *exit_code) 446 + { 447 + const struct sys_reg_desc *r; 448 + struct sys_reg_params params; 449 + unsigned long esr = kvm_vcpu_get_esr(vcpu); 450 + int Rt = kvm_vcpu_sys_get_rt(vcpu); 451 + 452 + params = esr_sys64_to_params(esr); 453 + params.regval = vcpu_get_reg(vcpu, Rt); 454 + 455 + r = find_reg(&params, pvm_sys_reg_descs, ARRAY_SIZE(pvm_sys_reg_descs)); 456 + 457 + /* Undefined (RESTRICTED). 
*/ 458 + if (r == NULL) { 459 + inject_undef64(vcpu); 460 + return true; 461 + } 462 + 463 + /* Handled by the host (HOST_HANDLED) */ 464 + if (r->access == NULL) 465 + return false; 466 + 467 + /* Handled by hyp: skip instruction if instructed to do so. */ 468 + if (r->access(vcpu, &params, r)) 469 + __kvm_skip_instr(vcpu); 470 + 471 + if (!params.is_write) 472 + vcpu_set_reg(vcpu, Rt, params.regval); 473 + 474 + return true; 475 + } 476 + 477 + /** 478 + * Handler for protected VM restricted exceptions. 479 + * 480 + * Inject an undefined exception into the guest and return true to indicate that 481 + * the hypervisor has handled the exit, and control should go back to the guest. 482 + */ 483 + bool kvm_handle_pvm_restricted(struct kvm_vcpu *vcpu, u64 *exit_code) 484 + { 485 + inject_undef64(vcpu); 486 + return true; 487 + }
+8 -14
arch/arm64/kvm/hyp/vgic-v3-sr.c
··· 695 695 goto spurious; 696 696 697 697 lr_val &= ~ICH_LR_STATE; 698 - /* No active state for LPIs */ 699 - if ((lr_val & ICH_LR_VIRTUAL_ID_MASK) <= VGIC_MAX_SPI) 700 - lr_val |= ICH_LR_ACTIVE_BIT; 698 + lr_val |= ICH_LR_ACTIVE_BIT; 701 699 __gic_v3_set_lr(lr_val, lr); 702 700 __vgic_v3_set_active_priority(lr_prio, vmcr, grp); 703 701 vcpu_set_reg(vcpu, rt, lr_val & ICH_LR_VIRTUAL_ID_MASK); ··· 762 764 /* Drop priority in any case */ 763 765 act_prio = __vgic_v3_clear_highest_active_priority(); 764 766 765 - /* If EOIing an LPI, no deactivate to be performed */ 766 - if (vid >= VGIC_MIN_LPI) 767 - return; 768 - 769 - /* EOImode == 1, nothing to be done here */ 770 - if (vmcr & ICH_VMCR_EOIM_MASK) 771 - return; 772 - 773 767 lr = __vgic_v3_find_active_lr(vcpu, vid, &lr_val); 774 768 if (lr == -1) { 775 - __vgic_v3_bump_eoicount(); 769 + /* Do not bump EOIcount for LPIs that aren't in the LRs */ 770 + if (!(vid >= VGIC_MIN_LPI)) 771 + __vgic_v3_bump_eoicount(); 776 772 return; 777 773 } 774 + 775 + /* EOImode == 1 and not an LPI, nothing to be done here */ 776 + if ((vmcr & ICH_VMCR_EOIM_MASK) && !(vid >= VGIC_MIN_LPI)) 777 + return; 778 778 779 779 lr_prio = (lr_val & ICH_LR_PRIORITY_MASK) >> ICH_LR_PRIORITY_SHIFT; 780 780 ··· 983 987 val = ((vtr >> 29) & 7) << ICC_CTLR_EL1_PRI_BITS_SHIFT; 984 988 /* IDbits */ 985 989 val |= ((vtr >> 23) & 7) << ICC_CTLR_EL1_ID_BITS_SHIFT; 986 - /* SEIS */ 987 - val |= ((vtr >> 22) & 1) << ICC_CTLR_EL1_SEIS_SHIFT; 988 990 /* A3V */ 989 991 val |= ((vtr >> 21) & 1) << ICC_CTLR_EL1_A3V_SHIFT; 990 992 /* EOImode */
+16
arch/arm64/kvm/hyp/vhe/switch.c
··· 96 96 __deactivate_traps_common(vcpu); 97 97 } 98 98 99 + static const exit_handler_fn hyp_exit_handlers[] = { 100 + [0 ... ESR_ELx_EC_MAX] = NULL, 101 + [ESR_ELx_EC_CP15_32] = kvm_hyp_handle_cp15_32, 102 + [ESR_ELx_EC_SYS64] = kvm_hyp_handle_sysreg, 103 + [ESR_ELx_EC_SVE] = kvm_hyp_handle_fpsimd, 104 + [ESR_ELx_EC_FP_ASIMD] = kvm_hyp_handle_fpsimd, 105 + [ESR_ELx_EC_IABT_LOW] = kvm_hyp_handle_iabt_low, 106 + [ESR_ELx_EC_DABT_LOW] = kvm_hyp_handle_dabt_low, 107 + [ESR_ELx_EC_PAC] = kvm_hyp_handle_ptrauth, 108 + }; 109 + 110 + static const exit_handler_fn *kvm_get_exit_handler_array(struct kvm_vcpu *vcpu) 111 + { 112 + return hyp_exit_handlers; 113 + } 114 + 99 115 /* Switch to the guest for VHE systems running in EL2 */ 100 116 static int __kvm_vcpu_run_vhe(struct kvm_vcpu *vcpu) 101 117 {
+1 -1
arch/arm64/kvm/mmu.c
··· 512 512 return -EINVAL; 513 513 } 514 514 515 - pgt = kzalloc(sizeof(*pgt), GFP_KERNEL); 515 + pgt = kzalloc(sizeof(*pgt), GFP_KERNEL_ACCOUNT); 516 516 if (!pgt) 517 517 return -ENOMEM; 518 518
+1 -1
arch/arm64/kvm/pmu-emul.c
··· 978 978 mutex_lock(&vcpu->kvm->lock); 979 979 980 980 if (!vcpu->kvm->arch.pmu_filter) { 981 - vcpu->kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL); 981 + vcpu->kvm->arch.pmu_filter = bitmap_alloc(nr_events, GFP_KERNEL_ACCOUNT); 982 982 if (!vcpu->kvm->arch.pmu_filter) { 983 983 mutex_unlock(&vcpu->kvm->lock); 984 984 return -ENOMEM;
+1 -1
arch/arm64/kvm/reset.c
··· 106 106 vl > SVE_VL_ARCH_MAX)) 107 107 return -EIO; 108 108 109 - buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL); 109 + buf = kzalloc(SVE_SIG_REGS_SIZE(sve_vq_from_vl(vl)), GFP_KERNEL_ACCOUNT); 110 110 if (!buf) 111 111 return -ENOMEM; 112 112
+25 -18
arch/arm64/kvm/sys_regs.c
··· 1064 1064 struct sys_reg_desc const *r, bool raz) 1065 1065 { 1066 1066 u32 id = reg_to_encoding(r); 1067 - u64 val = raz ? 0 : read_sanitised_ftr_reg(id); 1067 + u64 val; 1068 + 1069 + if (raz) 1070 + return 0; 1071 + 1072 + val = read_sanitised_ftr_reg(id); 1068 1073 1069 1074 switch (id) { 1070 1075 case SYS_ID_AA64PFR0_EL1: ··· 1080 1075 val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_CSV2), (u64)vcpu->kvm->arch.pfr0_csv2); 1081 1076 val &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_CSV3); 1082 1077 val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_CSV3), (u64)vcpu->kvm->arch.pfr0_csv3); 1078 + if (irqchip_in_kernel(vcpu->kvm) && 1079 + vcpu->kvm->arch.vgic.vgic_model == KVM_DEV_TYPE_ARM_VGIC_V3) { 1080 + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR0_GIC); 1081 + val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR0_GIC), 1); 1082 + } 1083 1083 break; 1084 1084 case SYS_ID_AA64PFR1_EL1: 1085 - val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_MTE); 1086 - if (kvm_has_mte(vcpu->kvm)) { 1087 - u64 pfr, mte; 1088 - 1089 - pfr = read_sanitised_ftr_reg(SYS_ID_AA64PFR1_EL1); 1090 - mte = cpuid_feature_extract_unsigned_field(pfr, ID_AA64PFR1_MTE_SHIFT); 1091 - val |= FIELD_PREP(ARM64_FEATURE_MASK(ID_AA64PFR1_MTE), mte); 1092 - } 1085 + if (!kvm_has_mte(vcpu->kvm)) 1086 + val &= ~ARM64_FEATURE_MASK(ID_AA64PFR1_MTE); 1093 1087 break; 1094 1088 case SYS_ID_AA64ISAR1_EL1: 1095 1089 if (!vcpu_has_ptrauth(vcpu)) ··· 1272 1268 return __set_id_reg(vcpu, rd, uaddr, raz); 1273 1269 } 1274 1270 1275 - static int get_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1276 - const struct kvm_one_reg *reg, void __user *uaddr) 1277 - { 1278 - return __get_id_reg(vcpu, rd, uaddr, true); 1279 - } 1280 - 1281 1271 static int set_raz_id_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, 1282 1272 const struct kvm_one_reg *reg, void __user *uaddr) 1283 1273 { 1284 1274 return __set_id_reg(vcpu, rd, uaddr, true); 1275 + } 1276 + 1277 + static int get_raz_reg(struct kvm_vcpu *vcpu, const struct 
sys_reg_desc *rd, 1278 + const struct kvm_one_reg *reg, void __user *uaddr) 1279 + { 1280 + const u64 id = sys_reg_to_index(rd); 1281 + const u64 val = 0; 1282 + 1283 + return reg_to_user(uaddr, &val, id); 1285 1284 } 1286 1285 1287 1286 static int set_wi_reg(struct kvm_vcpu *vcpu, const struct sys_reg_desc *rd, ··· 1395 1388 #define ID_UNALLOCATED(crm, op2) { \ 1396 1389 Op0(3), Op1(0), CRn(0), CRm(crm), Op2(op2), \ 1397 1390 .access = access_raz_id_reg, \ 1398 - .get_user = get_raz_id_reg, \ 1391 + .get_user = get_raz_reg, \ 1399 1392 .set_user = set_raz_id_reg, \ 1400 1393 } 1401 1394 ··· 1407 1400 #define ID_HIDDEN(name) { \ 1408 1401 SYS_DESC(SYS_##name), \ 1409 1402 .access = access_raz_id_reg, \ 1410 - .get_user = get_raz_id_reg, \ 1403 + .get_user = get_raz_reg, \ 1411 1404 .set_user = set_raz_id_reg, \ 1412 1405 } 1413 1406 ··· 1649 1642 * previously (and pointlessly) advertised in the past... 1650 1643 */ 1651 1644 { PMU_SYS_REG(SYS_PMSWINC_EL0), 1652 - .get_user = get_raz_id_reg, .set_user = set_wi_reg, 1645 + .get_user = get_raz_reg, .set_user = set_wi_reg, 1653 1646 .access = access_pmswinc, .reset = NULL }, 1654 1647 { PMU_SYS_REG(SYS_PMSELR_EL0), 1655 1648 .access = access_pmselr, .reset = reset_pmselr, .reg = PMSELR_EL0 },
+1 -1
arch/arm64/kvm/vgic/vgic-init.c
··· 134 134 struct kvm_vcpu *vcpu0 = kvm_get_vcpu(kvm, 0); 135 135 int i; 136 136 137 - dist->spis = kcalloc(nr_spis, sizeof(struct vgic_irq), GFP_KERNEL); 137 + dist->spis = kcalloc(nr_spis, sizeof(struct vgic_irq), GFP_KERNEL_ACCOUNT); 138 138 if (!dist->spis) 139 139 return -ENOMEM; 140 140
+1 -1
arch/arm64/kvm/vgic/vgic-irqfd.c
··· 139 139 u32 nr = dist->nr_spis; 140 140 int i, ret; 141 141 142 - entries = kcalloc(nr, sizeof(*entries), GFP_KERNEL); 142 + entries = kcalloc(nr, sizeof(*entries), GFP_KERNEL_ACCOUNT); 143 143 if (!entries) 144 144 return -ENOMEM; 145 145
+9 -9
arch/arm64/kvm/vgic/vgic-its.c
··· 48 48 if (irq) 49 49 return irq; 50 50 51 - irq = kzalloc(sizeof(struct vgic_irq), GFP_KERNEL); 51 + irq = kzalloc(sizeof(struct vgic_irq), GFP_KERNEL_ACCOUNT); 52 52 if (!irq) 53 53 return ERR_PTR(-ENOMEM); 54 54 ··· 332 332 * we must be careful not to overrun the array. 333 333 */ 334 334 irq_count = READ_ONCE(dist->lpi_list_count); 335 - intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL); 335 + intids = kmalloc_array(irq_count, sizeof(intids[0]), GFP_KERNEL_ACCOUNT); 336 336 if (!intids) 337 337 return -ENOMEM; 338 338 ··· 985 985 if (!vgic_its_check_id(its, its->baser_coll_table, coll_id, NULL)) 986 986 return E_ITS_MAPC_COLLECTION_OOR; 987 987 988 - collection = kzalloc(sizeof(*collection), GFP_KERNEL); 988 + collection = kzalloc(sizeof(*collection), GFP_KERNEL_ACCOUNT); 989 989 if (!collection) 990 990 return -ENOMEM; 991 991 ··· 1029 1029 { 1030 1030 struct its_ite *ite; 1031 1031 1032 - ite = kzalloc(sizeof(*ite), GFP_KERNEL); 1032 + ite = kzalloc(sizeof(*ite), GFP_KERNEL_ACCOUNT); 1033 1033 if (!ite) 1034 1034 return ERR_PTR(-ENOMEM); 1035 1035 ··· 1150 1150 { 1151 1151 struct its_device *device; 1152 1152 1153 - device = kzalloc(sizeof(*device), GFP_KERNEL); 1153 + device = kzalloc(sizeof(*device), GFP_KERNEL_ACCOUNT); 1154 1154 if (!device) 1155 1155 return ERR_PTR(-ENOMEM); 1156 1156 ··· 1847 1847 struct vgic_translation_cache_entry *cte; 1848 1848 1849 1849 /* An allocation failure is not fatal */ 1850 - cte = kzalloc(sizeof(*cte), GFP_KERNEL); 1850 + cte = kzalloc(sizeof(*cte), GFP_KERNEL_ACCOUNT); 1851 1851 if (WARN_ON(!cte)) 1852 1852 break; 1853 1853 ··· 1888 1888 if (type != KVM_DEV_TYPE_ARM_VGIC_ITS) 1889 1889 return -ENODEV; 1890 1890 1891 - its = kzalloc(sizeof(struct vgic_its), GFP_KERNEL); 1891 + its = kzalloc(sizeof(struct vgic_its), GFP_KERNEL_ACCOUNT); 1892 1892 if (!its) 1893 1893 return -ENOMEM; 1894 1894 ··· 2710 2710 if (copy_from_user(&addr, uaddr, sizeof(addr))) 2711 2711 return -EFAULT; 2712 2712 2713 - ret = 
vgic_check_ioaddr(dev->kvm, &its->vgic_its_base, 2714 - addr, SZ_64K); 2713 + ret = vgic_check_iorange(dev->kvm, its->vgic_its_base, 2714 + addr, SZ_64K, KVM_VGIC_V3_ITS_SIZE); 2715 2715 if (ret) 2716 2716 return ret; 2717 2717
+16 -9
arch/arm64/kvm/vgic/vgic-kvm-device.c
··· 14 14 15 15 /* common helpers */ 16 16 17 - int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr, 18 - phys_addr_t addr, phys_addr_t alignment) 17 + int vgic_check_iorange(struct kvm *kvm, phys_addr_t ioaddr, 18 + phys_addr_t addr, phys_addr_t alignment, 19 + phys_addr_t size) 19 20 { 20 - if (addr & ~kvm_phys_mask(kvm)) 21 - return -E2BIG; 21 + if (!IS_VGIC_ADDR_UNDEF(ioaddr)) 22 + return -EEXIST; 22 23 23 - if (!IS_ALIGNED(addr, alignment)) 24 + if (!IS_ALIGNED(addr, alignment) || !IS_ALIGNED(size, alignment)) 24 25 return -EINVAL; 25 26 26 - if (!IS_VGIC_ADDR_UNDEF(*ioaddr)) 27 - return -EEXIST; 27 + if (addr + size < addr) 28 + return -EINVAL; 29 + 30 + if (addr & ~kvm_phys_mask(kvm) || addr + size > kvm_phys_size(kvm)) 31 + return -E2BIG; 28 32 29 33 return 0; 30 34 } ··· 61 57 { 62 58 int r = 0; 63 59 struct vgic_dist *vgic = &kvm->arch.vgic; 64 - phys_addr_t *addr_ptr, alignment; 60 + phys_addr_t *addr_ptr, alignment, size; 65 61 u64 undef_value = VGIC_ADDR_UNDEF; 66 62 67 63 mutex_lock(&kvm->lock); ··· 70 66 r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2); 71 67 addr_ptr = &vgic->vgic_dist_base; 72 68 alignment = SZ_4K; 69 + size = KVM_VGIC_V2_DIST_SIZE; 73 70 break; 74 71 case KVM_VGIC_V2_ADDR_TYPE_CPU: 75 72 r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V2); 76 73 addr_ptr = &vgic->vgic_cpu_base; 77 74 alignment = SZ_4K; 75 + size = KVM_VGIC_V2_CPU_SIZE; 78 76 break; 79 77 case KVM_VGIC_V3_ADDR_TYPE_DIST: 80 78 r = vgic_check_type(kvm, KVM_DEV_TYPE_ARM_VGIC_V3); 81 79 addr_ptr = &vgic->vgic_dist_base; 82 80 alignment = SZ_64K; 81 + size = KVM_VGIC_V3_DIST_SIZE; 83 82 break; 84 83 case KVM_VGIC_V3_ADDR_TYPE_REDIST: { 85 84 struct vgic_redist_region *rdreg; ··· 147 140 goto out; 148 141 149 142 if (write) { 150 - r = vgic_check_ioaddr(kvm, addr_ptr, *addr, alignment); 143 + r = vgic_check_iorange(kvm, *addr_ptr, *addr, alignment, size); 151 144 if (!r) 152 145 *addr_ptr = *addr; 153 146 } else {
+5 -3
arch/arm64/kvm/vgic/vgic-mmio-v3.c
···
 	struct vgic_dist *d = &kvm->arch.vgic;
 	struct vgic_redist_region *rdreg;
 	struct list_head *rd_regions = &d->rd_regions;
-	size_t size = count * KVM_VGIC_V3_REDIST_SIZE;
+	int nr_vcpus = atomic_read(&kvm->online_vcpus);
+	size_t size = count ? count * KVM_VGIC_V3_REDIST_SIZE
+			    : nr_vcpus * KVM_VGIC_V3_REDIST_SIZE;
 	int ret;
 
 	/* cross the end of memory ? */
···
 	if (vgic_v3_rdist_overlap(kvm, base, size))
 		return -EINVAL;
 
-	rdreg = kzalloc(sizeof(*rdreg), GFP_KERNEL);
+	rdreg = kzalloc(sizeof(*rdreg), GFP_KERNEL_ACCOUNT);
 	if (!rdreg)
 		return -ENOMEM;
 
 	rdreg->base = VGIC_ADDR_UNDEF;
 
-	ret = vgic_check_ioaddr(kvm, &rdreg->base, base, SZ_64K);
+	ret = vgic_check_iorange(kvm, rdreg->base, base, SZ_64K, size);
 	if (ret)
 		goto free;
 
+22 -5
arch/arm64/kvm/vgic/vgic-v3.c
···
 static bool group0_trap;
 static bool group1_trap;
 static bool common_trap;
+static bool dir_trap;
 static bool gicv4_enable;
 
 void vgic_v3_set_underflow(struct kvm_vcpu *vcpu)
···
 		vgic_v3->vgic_hcr |= ICH_HCR_TALL1;
 	if (common_trap)
 		vgic_v3->vgic_hcr |= ICH_HCR_TC;
+	if (dir_trap)
+		vgic_v3->vgic_hcr |= ICH_HCR_TDIR;
 }
 
 int vgic_v3_lpi_sync_pending_status(struct kvm *kvm, struct vgic_irq *irq)
···
 		return false;
 
 	list_for_each_entry(rdreg, &d->rd_regions, list) {
-		if (rdreg->base + vgic_v3_rd_region_size(kvm, rdreg) <
-		    rdreg->base)
+		size_t sz = vgic_v3_rd_region_size(kvm, rdreg);
+
+		if (vgic_check_iorange(kvm, VGIC_ADDR_UNDEF,
+				       rdreg->base, SZ_64K, sz))
 			return false;
 	}
 
···
 		group1_trap = true;
 	}
 
-	if (group0_trap || group1_trap || common_trap) {
-		kvm_info("GICv3 sysreg trapping enabled ([%s%s%s], reduced performance)\n",
+	if (kvm_vgic_global_state.ich_vtr_el2 & ICH_VTR_SEIS_MASK) {
+		kvm_info("GICv3 with locally generated SEI\n");
+
+		group0_trap = true;
+		group1_trap = true;
+		if (ich_vtr_el2 & ICH_VTR_TDS_MASK)
+			dir_trap = true;
+		else
+			common_trap = true;
+	}
+
+	if (group0_trap || group1_trap || common_trap | dir_trap) {
+		kvm_info("GICv3 sysreg trapping enabled ([%s%s%s%s], reduced performance)\n",
 			 group0_trap ? "G0" : "",
 			 group1_trap ? "G1" : "",
-			 common_trap ? "C" : "");
+			 common_trap ? "C" : "",
+			 dir_trap ? "D" : "");
 		static_branch_enable(&vgic_v3_cpuif_trap);
 	}
 
+1 -1
arch/arm64/kvm/vgic/vgic-v4.c
···
 	nr_vcpus = atomic_read(&kvm->online_vcpus);
 
 	dist->its_vm.vpes = kcalloc(nr_vcpus, sizeof(*dist->its_vm.vpes),
-				    GFP_KERNEL);
+				    GFP_KERNEL_ACCOUNT);
 	if (!dist->its_vm.vpes)
 		return -ENOMEM;
 
+3 -2
arch/arm64/kvm/vgic/vgic.h
···
 void vgic_irq_handle_resampling(struct vgic_irq *irq,
 				bool lr_deactivated, bool lr_pending);
 
-int vgic_check_ioaddr(struct kvm *kvm, phys_addr_t *ioaddr,
-		      phys_addr_t addr, phys_addr_t alignment);
+int vgic_check_iorange(struct kvm *kvm, phys_addr_t ioaddr,
+		       phys_addr_t addr, phys_addr_t alignment,
+		       phys_addr_t size);
 
 void vgic_v2_fold_lr_state(struct kvm_vcpu *vcpu);
 void vgic_v2_populate_lr(struct kvm_vcpu *vcpu, struct vgic_irq *irq, int lr);
+1 -1
arch/mips/kvm/mips.c
···
 		r = KVM_MAX_VCPUS;
 		break;
 	case KVM_CAP_MAX_VCPU_ID:
-		r = KVM_MAX_VCPU_ID;
+		r = KVM_MAX_VCPU_IDS;
 		break;
 	case KVM_CAP_MIPS_FPU:
 		/* We don't handle systems with inconsistent cpu_has_fpu */
+1 -1
arch/powerpc/include/asm/kvm_book3s.h
···
 #define SPLIT_HACK_OFFS	0xfb000000
 
 /*
- * This packs a VCPU ID from the [0..KVM_MAX_VCPU_ID) space down to the
+ * This packs a VCPU ID from the [0..KVM_MAX_VCPU_IDS) space down to the
  * [0..KVM_MAX_VCPUS) space, using knowledge of the guest's core stride
  * (but not its actual threading mode, which is not available) to avoid
  * collisions.
+2 -2
arch/powerpc/include/asm/kvm_host.h
···
 
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
 #include <asm/kvm_book3s_asm.h>		/* for MAX_SMT_THREADS */
-#define KVM_MAX_VCPU_ID		(MAX_SMT_THREADS * KVM_MAX_VCORES)
+#define KVM_MAX_VCPU_IDS	(MAX_SMT_THREADS * KVM_MAX_VCORES)
 #define KVM_MAX_NESTED_GUESTS	KVMPPC_NR_LPIDS
 
 #else
-#define KVM_MAX_VCPU_ID		KVM_MAX_VCPUS
+#define KVM_MAX_VCPU_IDS	KVM_MAX_VCPUS
 #endif /* CONFIG_KVM_BOOK3S_HV_POSSIBLE */
 
 #define __KVM_HAVE_ARCH_INTC_INITIALIZED
+1 -1
arch/powerpc/kvm/book3s_xive.c
···
 
 	pr_devel("%s nr_servers=%u\n", __func__, nr_servers);
 
-	if (!nr_servers || nr_servers > KVM_MAX_VCPU_ID)
+	if (!nr_servers || nr_servers > KVM_MAX_VCPU_IDS)
 		return -EINVAL;
 
 	mutex_lock(&xive->lock);
+1 -1
arch/powerpc/kvm/powerpc.c
···
 		r = KVM_MAX_VCPUS;
 		break;
 	case KVM_CAP_MAX_VCPU_ID:
-		r = KVM_MAX_VCPU_ID;
+		r = KVM_MAX_VCPU_IDS;
 		break;
 #ifdef CONFIG_PPC_BOOK3S_64
 	case KVM_CAP_PPC_GET_SMMU_INFO:
+2
arch/riscv/Kconfig
···
 source "kernel/power/Kconfig"
 
 endmenu
+
+source "arch/riscv/kvm/Kconfig"
+1
arch/riscv/Makefile
···
 head-y := arch/riscv/kernel/head.o
 
 core-$(CONFIG_RISCV_ERRATA_ALTERNATIVE) += arch/riscv/errata/
+core-$(CONFIG_KVM) += arch/riscv/kvm/
 
 libs-y += arch/riscv/lib/
 libs-$(CONFIG_EFI_STUB)	+= $(objtree)/drivers/firmware/efi/libstub/lib.a
+87
arch/riscv/include/asm/csr.h
···
 
 /* Interrupt causes (minus the high bit) */
 #define IRQ_S_SOFT		1
+#define IRQ_VS_SOFT		2
 #define IRQ_M_SOFT		3
 #define IRQ_S_TIMER		5
+#define IRQ_VS_TIMER		6
 #define IRQ_M_TIMER		7
 #define IRQ_S_EXT		9
+#define IRQ_VS_EXT		10
 #define IRQ_M_EXT		11
 
 /* Exception causes */
 #define EXC_INST_MISALIGNED	0
 #define EXC_INST_ACCESS		1
+#define EXC_INST_ILLEGAL	2
 #define EXC_BREAKPOINT		3
 #define EXC_LOAD_ACCESS		5
 #define EXC_STORE_ACCESS	7
 #define EXC_SYSCALL		8
+#define EXC_HYPERVISOR_SYSCALL		9
+#define EXC_SUPERVISOR_SYSCALL		10
 #define EXC_INST_PAGE_FAULT	12
 #define EXC_LOAD_PAGE_FAULT	13
 #define EXC_STORE_PAGE_FAULT	15
+#define EXC_INST_GUEST_PAGE_FAULT	20
+#define EXC_LOAD_GUEST_PAGE_FAULT	21
+#define EXC_VIRTUAL_INST_FAULT		22
+#define EXC_STORE_GUEST_PAGE_FAULT	23
 
 /* PMP configuration */
 #define PMP_R			0x01
···
 #define PMP_A_NA4		0x10
 #define PMP_A_NAPOT		0x18
 #define PMP_L			0x80
+
+/* HSTATUS flags */
+#ifdef CONFIG_64BIT
+#define HSTATUS_VSXL		_AC(0x300000000, UL)
+#define HSTATUS_VSXL_SHIFT	32
+#endif
+#define HSTATUS_VTSR		_AC(0x00400000, UL)
+#define HSTATUS_VTW		_AC(0x00200000, UL)
+#define HSTATUS_VTVM		_AC(0x00100000, UL)
+#define HSTATUS_VGEIN		_AC(0x0003f000, UL)
+#define HSTATUS_VGEIN_SHIFT	12
+#define HSTATUS_HU		_AC(0x00000200, UL)
+#define HSTATUS_SPVP		_AC(0x00000100, UL)
+#define HSTATUS_SPV		_AC(0x00000080, UL)
+#define HSTATUS_GVA		_AC(0x00000040, UL)
+#define HSTATUS_VSBE		_AC(0x00000020, UL)
+
+/* HGATP flags */
+#define HGATP_MODE_OFF		_AC(0, UL)
+#define HGATP_MODE_SV32X4	_AC(1, UL)
+#define HGATP_MODE_SV39X4	_AC(8, UL)
+#define HGATP_MODE_SV48X4	_AC(9, UL)
+
+#define HGATP32_MODE_SHIFT	31
+#define HGATP32_VMID_SHIFT	22
+#define HGATP32_VMID_MASK	_AC(0x1FC00000, UL)
+#define HGATP32_PPN		_AC(0x003FFFFF, UL)
+
+#define HGATP64_MODE_SHIFT	60
+#define HGATP64_VMID_SHIFT	44
+#define HGATP64_VMID_MASK	_AC(0x03FFF00000000000, UL)
+#define HGATP64_PPN		_AC(0x00000FFFFFFFFFFF, UL)
+
+#define HGATP_PAGE_SHIFT	12
+
+#ifdef CONFIG_64BIT
+#define HGATP_PPN		HGATP64_PPN
+#define HGATP_VMID_SHIFT	HGATP64_VMID_SHIFT
+#define HGATP_VMID_MASK		HGATP64_VMID_MASK
+#define HGATP_MODE_SHIFT	HGATP64_MODE_SHIFT
+#else
+#define HGATP_PPN		HGATP32_PPN
+#define HGATP_VMID_SHIFT	HGATP32_VMID_SHIFT
+#define HGATP_VMID_MASK		HGATP32_VMID_MASK
+#define HGATP_MODE_SHIFT	HGATP32_MODE_SHIFT
+#endif
+
+/* VSIP & HVIP relation */
+#define VSIP_TO_HVIP_SHIFT	(IRQ_VS_SOFT - IRQ_S_SOFT)
+#define VSIP_VALID_MASK		((_AC(1, UL) << IRQ_S_SOFT) | \
+				 (_AC(1, UL) << IRQ_S_TIMER) | \
+				 (_AC(1, UL) << IRQ_S_EXT))
 
 /* symbolic CSR names: */
 #define CSR_CYCLE		0xc00
···
 #define CSR_STVAL		0x143
 #define CSR_SIP			0x144
 #define CSR_SATP		0x180
+
+#define CSR_VSSTATUS		0x200
+#define CSR_VSIE		0x204
+#define CSR_VSTVEC		0x205
+#define CSR_VSSCRATCH		0x240
+#define CSR_VSEPC		0x241
+#define CSR_VSCAUSE		0x242
+#define CSR_VSTVAL		0x243
+#define CSR_VSIP		0x244
+#define CSR_VSATP		0x280
+
+#define CSR_HSTATUS		0x600
+#define CSR_HEDELEG		0x602
+#define CSR_HIDELEG		0x603
+#define CSR_HIE			0x604
+#define CSR_HTIMEDELTA		0x605
+#define CSR_HCOUNTEREN		0x606
+#define CSR_HGEIE		0x607
+#define CSR_HTIMEDELTAH		0x615
+#define CSR_HTVAL		0x643
+#define CSR_HIP			0x644
+#define CSR_HVIP		0x645
+#define CSR_HTINST		0x64a
+#define CSR_HGATP		0x680
+#define CSR_HGEIP		0xe12
 
 #define CSR_MSTATUS		0x300
 #define CSR_MISA		0x301
+264
arch/riscv/include/asm/kvm_host.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Anup Patel <anup.patel@wdc.com>
+ */
+
+#ifndef __RISCV_KVM_HOST_H__
+#define __RISCV_KVM_HOST_H__
+
+#include <linux/types.h>
+#include <linux/kvm.h>
+#include <linux/kvm_types.h>
+#include <asm/kvm_vcpu_fp.h>
+#include <asm/kvm_vcpu_timer.h>
+
+#ifdef CONFIG_64BIT
+#define KVM_MAX_VCPUS			(1U << 16)
+#else
+#define KVM_MAX_VCPUS			(1U << 9)
+#endif
+
+#define KVM_HALT_POLL_NS_DEFAULT	500000
+
+#define KVM_VCPU_MAX_FEATURES		0
+
+#define KVM_REQ_SLEEP \
+	KVM_ARCH_REQ_FLAGS(0, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
+#define KVM_REQ_VCPU_RESET		KVM_ARCH_REQ(1)
+#define KVM_REQ_UPDATE_HGATP		KVM_ARCH_REQ(2)
+
+struct kvm_vm_stat {
+	struct kvm_vm_stat_generic generic;
+};
+
+struct kvm_vcpu_stat {
+	struct kvm_vcpu_stat_generic generic;
+	u64 ecall_exit_stat;
+	u64 wfi_exit_stat;
+	u64 mmio_exit_user;
+	u64 mmio_exit_kernel;
+	u64 exits;
+};
+
+struct kvm_arch_memory_slot {
+};
+
+struct kvm_vmid {
+	/*
+	 * Writes to vmid_version and vmid happen with vmid_lock held
+	 * whereas reads happen without any lock held.
+	 */
+	unsigned long vmid_version;
+	unsigned long vmid;
+};
+
+struct kvm_arch {
+	/* stage2 vmid */
+	struct kvm_vmid vmid;
+
+	/* stage2 page table */
+	pgd_t *pgd;
+	phys_addr_t pgd_phys;
+
+	/* Guest Timer */
+	struct kvm_guest_timer timer;
+};
+
+struct kvm_mmio_decode {
+	unsigned long insn;
+	int insn_len;
+	int len;
+	int shift;
+	int return_handled;
+};
+
+struct kvm_sbi_context {
+	int return_handled;
+};
+
+#define KVM_MMU_PAGE_CACHE_NR_OBJS	32
+
+struct kvm_mmu_page_cache {
+	int nobjs;
+	void *objects[KVM_MMU_PAGE_CACHE_NR_OBJS];
+};
+
+struct kvm_cpu_trap {
+	unsigned long sepc;
+	unsigned long scause;
+	unsigned long stval;
+	unsigned long htval;
+	unsigned long htinst;
+};
+
+struct kvm_cpu_context {
+	unsigned long zero;
+	unsigned long ra;
+	unsigned long sp;
+	unsigned long gp;
+	unsigned long tp;
+	unsigned long t0;
+	unsigned long t1;
+	unsigned long t2;
+	unsigned long s0;
+	unsigned long s1;
+	unsigned long a0;
+	unsigned long a1;
+	unsigned long a2;
+	unsigned long a3;
+	unsigned long a4;
+	unsigned long a5;
+	unsigned long a6;
+	unsigned long a7;
+	unsigned long s2;
+	unsigned long s3;
+	unsigned long s4;
+	unsigned long s5;
+	unsigned long s6;
+	unsigned long s7;
+	unsigned long s8;
+	unsigned long s9;
+	unsigned long s10;
+	unsigned long s11;
+	unsigned long t3;
+	unsigned long t4;
+	unsigned long t5;
+	unsigned long t6;
+	unsigned long sepc;
+	unsigned long sstatus;
+	unsigned long hstatus;
+	union __riscv_fp_state fp;
+};
+
+struct kvm_vcpu_csr {
+	unsigned long vsstatus;
+	unsigned long vsie;
+	unsigned long vstvec;
+	unsigned long vsscratch;
+	unsigned long vsepc;
+	unsigned long vscause;
+	unsigned long vstval;
+	unsigned long hvip;
+	unsigned long vsatp;
+	unsigned long scounteren;
+};
+
+struct kvm_vcpu_arch {
+	/* VCPU ran at least once */
+	bool ran_atleast_once;
+
+	/* ISA feature bits (similar to MISA) */
+	unsigned long isa;
+
+	/* SSCRATCH, STVEC, and SCOUNTEREN of Host */
+	unsigned long host_sscratch;
+	unsigned long host_stvec;
+	unsigned long host_scounteren;
+
+	/* CPU context of Host */
+	struct kvm_cpu_context host_context;
+
+	/* CPU context of Guest VCPU */
+	struct kvm_cpu_context guest_context;
+
+	/* CPU CSR context of Guest VCPU */
+	struct kvm_vcpu_csr guest_csr;
+
+	/* CPU context upon Guest VCPU reset */
+	struct kvm_cpu_context guest_reset_context;
+
+	/* CPU CSR context upon Guest VCPU reset */
+	struct kvm_vcpu_csr guest_reset_csr;
+
+	/*
+	 * VCPU interrupts
+	 *
+	 * We have a lockless approach for tracking pending VCPU interrupts
+	 * implemented using atomic bitops. The irqs_pending bitmap represent
+	 * pending interrupts whereas irqs_pending_mask represent bits changed
+	 * in irqs_pending. Our approach is modeled around multiple producer
+	 * and single consumer problem where the consumer is the VCPU itself.
+	 */
+	unsigned long irqs_pending;
+	unsigned long irqs_pending_mask;
+
+	/* VCPU Timer */
+	struct kvm_vcpu_timer timer;
+
+	/* MMIO instruction details */
+	struct kvm_mmio_decode mmio_decode;
+
+	/* SBI context */
+	struct kvm_sbi_context sbi_context;
+
+	/* Cache pages needed to program page tables with spinlock held */
+	struct kvm_mmu_page_cache mmu_page_cache;
+
+	/* VCPU power-off state */
+	bool power_off;
+
+	/* Don't run the VCPU (blocked) */
+	bool pause;
+
+	/* SRCU lock index for in-kernel run loop */
+	int srcu_idx;
+};
+
+static inline void kvm_arch_hardware_unsetup(void) {}
+static inline void kvm_arch_sync_events(struct kvm *kvm) {}
+static inline void kvm_arch_sched_in(struct kvm_vcpu *vcpu, int cpu) {}
+static inline void kvm_arch_vcpu_block_finish(struct kvm_vcpu *vcpu) {}
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+
+void __kvm_riscv_hfence_gvma_vmid_gpa(unsigned long gpa_divby_4,
+				      unsigned long vmid);
+void __kvm_riscv_hfence_gvma_vmid(unsigned long vmid);
+void __kvm_riscv_hfence_gvma_gpa(unsigned long gpa_divby_4);
+void __kvm_riscv_hfence_gvma_all(void);
+
+int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
+			 struct kvm_memory_slot *memslot,
+			 gpa_t gpa, unsigned long hva, bool is_write);
+void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu);
+int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_free_pgd(struct kvm *kvm);
+void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu);
+void kvm_riscv_stage2_mode_detect(void);
+unsigned long kvm_riscv_stage2_mode(void);
+
+void kvm_riscv_stage2_vmid_detect(void);
+unsigned long kvm_riscv_stage2_vmid_bits(void);
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm);
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid);
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu);
+
+void __kvm_riscv_unpriv_trap(void);
+
+unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu,
+					 bool read_insn,
+					 unsigned long guest_addr,
+					 struct kvm_cpu_trap *trap);
+void kvm_riscv_vcpu_trap_redirect(struct kvm_vcpu *vcpu,
+				  struct kvm_cpu_trap *trap);
+int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
+			struct kvm_cpu_trap *trap);
+
+void __kvm_riscv_switch_to(struct kvm_vcpu_arch *vcpu_arch);
+
+int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq);
+void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu);
+bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask);
+void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu);
+
+int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+#endif /* __RISCV_KVM_HOST_H__ */
+7
arch/riscv/include/asm/kvm_types.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _ASM_RISCV_KVM_TYPES_H
+#define _ASM_RISCV_KVM_TYPES_H
+
+#define KVM_ARCH_NR_OBJS_PER_MEMORY_CACHE	40
+
+#endif /* _ASM_RISCV_KVM_TYPES_H */
+59
arch/riscv/include/asm/kvm_vcpu_fp.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2021 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Atish Patra <atish.patra@wdc.com>
+ *	Anup Patel <anup.patel@wdc.com>
+ */
+
+#ifndef __KVM_VCPU_RISCV_FP_H
+#define __KVM_VCPU_RISCV_FP_H
+
+#include <linux/types.h>
+
+struct kvm_cpu_context;
+
+#ifdef CONFIG_FPU
+void __kvm_riscv_fp_f_save(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_f_restore(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_d_save(struct kvm_cpu_context *context);
+void __kvm_riscv_fp_d_restore(struct kvm_cpu_context *context);
+
+void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx,
+				  unsigned long isa);
+void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx,
+				     unsigned long isa);
+void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx);
+void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx);
+#else
+static inline void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu)
+{
+}
+static inline void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx,
+						unsigned long isa)
+{
+}
+static inline void kvm_riscv_vcpu_guest_fp_restore(
+					struct kvm_cpu_context *cntx,
+					unsigned long isa)
+{
+}
+static inline void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx)
+{
+}
+static inline void kvm_riscv_vcpu_host_fp_restore(
+					struct kvm_cpu_context *cntx)
+{
+}
+#endif
+
+int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
+			      const struct kvm_one_reg *reg,
+			      unsigned long rtype);
+int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
+			      const struct kvm_one_reg *reg,
+			      unsigned long rtype);
+
+#endif
+44
arch/riscv/include/asm/kvm_vcpu_timer.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Atish Patra <atish.patra@wdc.com>
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include <linux/hrtimer.h>
+
+struct kvm_guest_timer {
+	/* Mult & Shift values to get nanoseconds from cycles */
+	u32 nsec_mult;
+	u32 nsec_shift;
+	/* Time delta value */
+	u64 time_delta;
+};
+
+struct kvm_vcpu_timer {
+	/* Flag for whether init is done */
+	bool init_done;
+	/* Flag for whether timer event is configured */
+	bool next_set;
+	/* Next timer event cycles */
+	u64 next_cycles;
+	/* Underlying hrtimer instance */
+	struct hrtimer hrt;
+};
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles);
+int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
+				 const struct kvm_one_reg *reg);
+int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
+				 const struct kvm_one_reg *reg);
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu);
+int kvm_riscv_guest_timer_init(struct kvm *kvm);
+
+#endif
+128
arch/riscv/include/uapi/asm/kvm.h
···
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Anup Patel <anup.patel@wdc.com>
+ */
+
+#ifndef __LINUX_KVM_RISCV_H
+#define __LINUX_KVM_RISCV_H
+
+#ifndef __ASSEMBLY__
+
+#include <linux/types.h>
+#include <asm/ptrace.h>
+
+#define __KVM_HAVE_READONLY_MEM
+
+#define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+
+#define KVM_INTERRUPT_SET	-1U
+#define KVM_INTERRUPT_UNSET	-2U
+
+/* for KVM_GET_REGS and KVM_SET_REGS */
+struct kvm_regs {
+};
+
+/* for KVM_GET_FPU and KVM_SET_FPU */
+struct kvm_fpu {
+};
+
+/* KVM Debug exit structure */
+struct kvm_debug_exit_arch {
+};
+
+/* for KVM_SET_GUEST_DEBUG */
+struct kvm_guest_debug_arch {
+};
+
+/* definition of registers in kvm_run */
+struct kvm_sync_regs {
+};
+
+/* for KVM_GET_SREGS and KVM_SET_SREGS */
+struct kvm_sregs {
+};
+
+/* CONFIG registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_config {
+	unsigned long isa;
+};
+
+/* CORE registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_core {
+	struct user_regs_struct regs;
+	unsigned long mode;
+};
+
+/* Possible privilege modes for kvm_riscv_core */
+#define KVM_RISCV_MODE_S	1
+#define KVM_RISCV_MODE_U	0
+
+/* CSR registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_csr {
+	unsigned long sstatus;
+	unsigned long sie;
+	unsigned long stvec;
+	unsigned long sscratch;
+	unsigned long sepc;
+	unsigned long scause;
+	unsigned long stval;
+	unsigned long sip;
+	unsigned long satp;
+	unsigned long scounteren;
+};
+
+/* TIMER registers for KVM_GET_ONE_REG and KVM_SET_ONE_REG */
+struct kvm_riscv_timer {
+	__u64 frequency;
+	__u64 time;
+	__u64 compare;
+	__u64 state;
+};
+
+/* Possible states for kvm_riscv_timer */
+#define KVM_RISCV_TIMER_STATE_OFF	0
+#define KVM_RISCV_TIMER_STATE_ON	1
+
+#define KVM_REG_SIZE(id)		\
+	(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
+
+/* If you need to interpret the index values, here is the key: */
+#define KVM_REG_RISCV_TYPE_MASK		0x00000000FF000000
+#define KVM_REG_RISCV_TYPE_SHIFT	24
+
+/* Config registers are mapped as type 1 */
+#define KVM_REG_RISCV_CONFIG		(0x01 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CONFIG_REG(name)	\
+	(offsetof(struct kvm_riscv_config, name) / sizeof(unsigned long))
+
+/* Core registers are mapped as type 2 */
+#define KVM_REG_RISCV_CORE		(0x02 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CORE_REG(name)	\
+	(offsetof(struct kvm_riscv_core, name) / sizeof(unsigned long))
+
+/* Control and status registers are mapped as type 3 */
+#define KVM_REG_RISCV_CSR		(0x03 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_CSR_REG(name)	\
+	(offsetof(struct kvm_riscv_csr, name) / sizeof(unsigned long))
+
+/* Timer registers are mapped as type 4 */
+#define KVM_REG_RISCV_TIMER		(0x04 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_TIMER_REG(name)	\
+	(offsetof(struct kvm_riscv_timer, name) / sizeof(__u64))
+
+/* F extension registers are mapped as type 5 */
+#define KVM_REG_RISCV_FP_F		(0x05 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_F_REG(name)	\
+	(offsetof(struct __riscv_f_ext_state, name) / sizeof(__u32))
+
+/* D extension registers are mapped as type 6 */
+#define KVM_REG_RISCV_FP_D		(0x06 << KVM_REG_RISCV_TYPE_SHIFT)
+#define KVM_REG_RISCV_FP_D_REG(name)	\
+	(offsetof(struct __riscv_d_ext_state, name) / sizeof(__u64))
+
+#endif
+
+#endif /* __LINUX_KVM_RISCV_H */
+156
arch/riscv/kernel/asm-offsets.c
···
 #define GENERATING_ASM_OFFSETS
 
 #include <linux/kbuild.h>
+#include <linux/mm.h>
 #include <linux/sched.h>
+#include <asm/kvm_host.h>
 #include <asm/thread_info.h>
 #include <asm/ptrace.h>
 
···
 	OFFSET(PT_STATUS, pt_regs, status);
 	OFFSET(PT_BADADDR, pt_regs, badaddr);
 	OFFSET(PT_CAUSE, pt_regs, cause);
+
+	OFFSET(KVM_ARCH_GUEST_ZERO, kvm_vcpu_arch, guest_context.zero);
+	OFFSET(KVM_ARCH_GUEST_RA, kvm_vcpu_arch, guest_context.ra);
+	OFFSET(KVM_ARCH_GUEST_SP, kvm_vcpu_arch, guest_context.sp);
+	OFFSET(KVM_ARCH_GUEST_GP, kvm_vcpu_arch, guest_context.gp);
+	OFFSET(KVM_ARCH_GUEST_TP, kvm_vcpu_arch, guest_context.tp);
+	OFFSET(KVM_ARCH_GUEST_T0, kvm_vcpu_arch, guest_context.t0);
+	OFFSET(KVM_ARCH_GUEST_T1, kvm_vcpu_arch, guest_context.t1);
+	OFFSET(KVM_ARCH_GUEST_T2, kvm_vcpu_arch, guest_context.t2);
+	OFFSET(KVM_ARCH_GUEST_S0, kvm_vcpu_arch, guest_context.s0);
+	OFFSET(KVM_ARCH_GUEST_S1, kvm_vcpu_arch, guest_context.s1);
+	OFFSET(KVM_ARCH_GUEST_A0, kvm_vcpu_arch, guest_context.a0);
+	OFFSET(KVM_ARCH_GUEST_A1, kvm_vcpu_arch, guest_context.a1);
+	OFFSET(KVM_ARCH_GUEST_A2, kvm_vcpu_arch, guest_context.a2);
+	OFFSET(KVM_ARCH_GUEST_A3, kvm_vcpu_arch, guest_context.a3);
+	OFFSET(KVM_ARCH_GUEST_A4, kvm_vcpu_arch, guest_context.a4);
+	OFFSET(KVM_ARCH_GUEST_A5, kvm_vcpu_arch, guest_context.a5);
+	OFFSET(KVM_ARCH_GUEST_A6, kvm_vcpu_arch, guest_context.a6);
+	OFFSET(KVM_ARCH_GUEST_A7, kvm_vcpu_arch, guest_context.a7);
+	OFFSET(KVM_ARCH_GUEST_S2, kvm_vcpu_arch, guest_context.s2);
+	OFFSET(KVM_ARCH_GUEST_S3, kvm_vcpu_arch, guest_context.s3);
+	OFFSET(KVM_ARCH_GUEST_S4, kvm_vcpu_arch, guest_context.s4);
+	OFFSET(KVM_ARCH_GUEST_S5, kvm_vcpu_arch, guest_context.s5);
+	OFFSET(KVM_ARCH_GUEST_S6, kvm_vcpu_arch, guest_context.s6);
+	OFFSET(KVM_ARCH_GUEST_S7, kvm_vcpu_arch, guest_context.s7);
+	OFFSET(KVM_ARCH_GUEST_S8, kvm_vcpu_arch, guest_context.s8);
+	OFFSET(KVM_ARCH_GUEST_S9, kvm_vcpu_arch, guest_context.s9);
+	OFFSET(KVM_ARCH_GUEST_S10, kvm_vcpu_arch, guest_context.s10);
+	OFFSET(KVM_ARCH_GUEST_S11, kvm_vcpu_arch, guest_context.s11);
+	OFFSET(KVM_ARCH_GUEST_T3, kvm_vcpu_arch, guest_context.t3);
+	OFFSET(KVM_ARCH_GUEST_T4, kvm_vcpu_arch, guest_context.t4);
+	OFFSET(KVM_ARCH_GUEST_T5, kvm_vcpu_arch, guest_context.t5);
+	OFFSET(KVM_ARCH_GUEST_T6, kvm_vcpu_arch, guest_context.t6);
+	OFFSET(KVM_ARCH_GUEST_SEPC, kvm_vcpu_arch, guest_context.sepc);
+	OFFSET(KVM_ARCH_GUEST_SSTATUS, kvm_vcpu_arch, guest_context.sstatus);
+	OFFSET(KVM_ARCH_GUEST_HSTATUS, kvm_vcpu_arch, guest_context.hstatus);
+	OFFSET(KVM_ARCH_GUEST_SCOUNTEREN, kvm_vcpu_arch, guest_csr.scounteren);
+
+	OFFSET(KVM_ARCH_HOST_ZERO, kvm_vcpu_arch, host_context.zero);
+	OFFSET(KVM_ARCH_HOST_RA, kvm_vcpu_arch, host_context.ra);
+	OFFSET(KVM_ARCH_HOST_SP, kvm_vcpu_arch, host_context.sp);
+	OFFSET(KVM_ARCH_HOST_GP, kvm_vcpu_arch, host_context.gp);
+	OFFSET(KVM_ARCH_HOST_TP, kvm_vcpu_arch, host_context.tp);
+	OFFSET(KVM_ARCH_HOST_T0, kvm_vcpu_arch, host_context.t0);
+	OFFSET(KVM_ARCH_HOST_T1, kvm_vcpu_arch, host_context.t1);
+	OFFSET(KVM_ARCH_HOST_T2, kvm_vcpu_arch, host_context.t2);
+	OFFSET(KVM_ARCH_HOST_S0, kvm_vcpu_arch, host_context.s0);
+	OFFSET(KVM_ARCH_HOST_S1, kvm_vcpu_arch, host_context.s1);
+	OFFSET(KVM_ARCH_HOST_A0, kvm_vcpu_arch, host_context.a0);
+	OFFSET(KVM_ARCH_HOST_A1, kvm_vcpu_arch, host_context.a1);
+	OFFSET(KVM_ARCH_HOST_A2, kvm_vcpu_arch, host_context.a2);
+	OFFSET(KVM_ARCH_HOST_A3, kvm_vcpu_arch, host_context.a3);
+	OFFSET(KVM_ARCH_HOST_A4, kvm_vcpu_arch, host_context.a4);
+	OFFSET(KVM_ARCH_HOST_A5, kvm_vcpu_arch, host_context.a5);
+	OFFSET(KVM_ARCH_HOST_A6, kvm_vcpu_arch, host_context.a6);
+	OFFSET(KVM_ARCH_HOST_A7, kvm_vcpu_arch, host_context.a7);
+	OFFSET(KVM_ARCH_HOST_S2, kvm_vcpu_arch, host_context.s2);
+	OFFSET(KVM_ARCH_HOST_S3, kvm_vcpu_arch, host_context.s3);
+	OFFSET(KVM_ARCH_HOST_S4, kvm_vcpu_arch, host_context.s4);
+	OFFSET(KVM_ARCH_HOST_S5, kvm_vcpu_arch, host_context.s5);
+	OFFSET(KVM_ARCH_HOST_S6, kvm_vcpu_arch, host_context.s6);
+	OFFSET(KVM_ARCH_HOST_S7, kvm_vcpu_arch, host_context.s7);
+	OFFSET(KVM_ARCH_HOST_S8, kvm_vcpu_arch, host_context.s8);
+	OFFSET(KVM_ARCH_HOST_S9, kvm_vcpu_arch, host_context.s9);
+	OFFSET(KVM_ARCH_HOST_S10, kvm_vcpu_arch, host_context.s10);
+	OFFSET(KVM_ARCH_HOST_S11, kvm_vcpu_arch, host_context.s11);
+	OFFSET(KVM_ARCH_HOST_T3, kvm_vcpu_arch, host_context.t3);
+	OFFSET(KVM_ARCH_HOST_T4, kvm_vcpu_arch, host_context.t4);
+	OFFSET(KVM_ARCH_HOST_T5, kvm_vcpu_arch, host_context.t5);
+	OFFSET(KVM_ARCH_HOST_T6, kvm_vcpu_arch, host_context.t6);
+	OFFSET(KVM_ARCH_HOST_SEPC, kvm_vcpu_arch, host_context.sepc);
+	OFFSET(KVM_ARCH_HOST_SSTATUS, kvm_vcpu_arch, host_context.sstatus);
+	OFFSET(KVM_ARCH_HOST_HSTATUS, kvm_vcpu_arch, host_context.hstatus);
+	OFFSET(KVM_ARCH_HOST_SSCRATCH, kvm_vcpu_arch, host_sscratch);
+	OFFSET(KVM_ARCH_HOST_STVEC, kvm_vcpu_arch, host_stvec);
+	OFFSET(KVM_ARCH_HOST_SCOUNTEREN, kvm_vcpu_arch, host_scounteren);
+
+	OFFSET(KVM_ARCH_TRAP_SEPC, kvm_cpu_trap, sepc);
+	OFFSET(KVM_ARCH_TRAP_SCAUSE, kvm_cpu_trap, scause);
+	OFFSET(KVM_ARCH_TRAP_STVAL, kvm_cpu_trap, stval);
+	OFFSET(KVM_ARCH_TRAP_HTVAL, kvm_cpu_trap, htval);
+	OFFSET(KVM_ARCH_TRAP_HTINST, kvm_cpu_trap, htinst);
+
+	/* F extension */
+
+	OFFSET(KVM_ARCH_FP_F_F0, kvm_cpu_context, fp.f.f[0]);
+	OFFSET(KVM_ARCH_FP_F_F1, kvm_cpu_context, fp.f.f[1]);
+	OFFSET(KVM_ARCH_FP_F_F2, kvm_cpu_context, fp.f.f[2]);
+	OFFSET(KVM_ARCH_FP_F_F3, kvm_cpu_context, fp.f.f[3]);
+	OFFSET(KVM_ARCH_FP_F_F4, kvm_cpu_context, fp.f.f[4]);
+	OFFSET(KVM_ARCH_FP_F_F5, kvm_cpu_context, fp.f.f[5]);
+	OFFSET(KVM_ARCH_FP_F_F6, kvm_cpu_context, fp.f.f[6]);
+	OFFSET(KVM_ARCH_FP_F_F7, kvm_cpu_context, fp.f.f[7]);
+	OFFSET(KVM_ARCH_FP_F_F8, kvm_cpu_context, fp.f.f[8]);
+	OFFSET(KVM_ARCH_FP_F_F9, kvm_cpu_context, fp.f.f[9]);
+	OFFSET(KVM_ARCH_FP_F_F10, kvm_cpu_context, fp.f.f[10]);
+	OFFSET(KVM_ARCH_FP_F_F11, kvm_cpu_context, fp.f.f[11]);
+	OFFSET(KVM_ARCH_FP_F_F12, kvm_cpu_context, fp.f.f[12]);
+	OFFSET(KVM_ARCH_FP_F_F13, kvm_cpu_context, fp.f.f[13]);
+	OFFSET(KVM_ARCH_FP_F_F14, kvm_cpu_context, fp.f.f[14]);
+	OFFSET(KVM_ARCH_FP_F_F15, kvm_cpu_context, fp.f.f[15]);
+	OFFSET(KVM_ARCH_FP_F_F16, kvm_cpu_context, fp.f.f[16]);
+	OFFSET(KVM_ARCH_FP_F_F17, kvm_cpu_context, fp.f.f[17]);
+	OFFSET(KVM_ARCH_FP_F_F18, kvm_cpu_context, fp.f.f[18]);
+	OFFSET(KVM_ARCH_FP_F_F19, kvm_cpu_context, fp.f.f[19]);
+	OFFSET(KVM_ARCH_FP_F_F20, kvm_cpu_context, fp.f.f[20]);
+	OFFSET(KVM_ARCH_FP_F_F21, kvm_cpu_context, fp.f.f[21]);
+	OFFSET(KVM_ARCH_FP_F_F22, kvm_cpu_context, fp.f.f[22]);
+	OFFSET(KVM_ARCH_FP_F_F23, kvm_cpu_context, fp.f.f[23]);
+	OFFSET(KVM_ARCH_FP_F_F24, kvm_cpu_context, fp.f.f[24]);
+	OFFSET(KVM_ARCH_FP_F_F25, kvm_cpu_context, fp.f.f[25]);
+	OFFSET(KVM_ARCH_FP_F_F26, kvm_cpu_context, fp.f.f[26]);
+	OFFSET(KVM_ARCH_FP_F_F27, kvm_cpu_context, fp.f.f[27]);
+	OFFSET(KVM_ARCH_FP_F_F28, kvm_cpu_context, fp.f.f[28]);
+	OFFSET(KVM_ARCH_FP_F_F29, kvm_cpu_context, fp.f.f[29]);
+	OFFSET(KVM_ARCH_FP_F_F30, kvm_cpu_context, fp.f.f[30]);
+	OFFSET(KVM_ARCH_FP_F_F31, kvm_cpu_context, fp.f.f[31]);
+	OFFSET(KVM_ARCH_FP_F_FCSR, kvm_cpu_context, fp.f.fcsr);
+
+	/* D extension */
+
+	OFFSET(KVM_ARCH_FP_D_F0, kvm_cpu_context, fp.d.f[0]);
+	OFFSET(KVM_ARCH_FP_D_F1, kvm_cpu_context, fp.d.f[1]);
+	OFFSET(KVM_ARCH_FP_D_F2, kvm_cpu_context, fp.d.f[2]);
+	OFFSET(KVM_ARCH_FP_D_F3, kvm_cpu_context, fp.d.f[3]);
+	OFFSET(KVM_ARCH_FP_D_F4, kvm_cpu_context, fp.d.f[4]);
+	OFFSET(KVM_ARCH_FP_D_F5, kvm_cpu_context, fp.d.f[5]);
+	OFFSET(KVM_ARCH_FP_D_F6, kvm_cpu_context, fp.d.f[6]);
+	OFFSET(KVM_ARCH_FP_D_F7, kvm_cpu_context, fp.d.f[7]);
+	OFFSET(KVM_ARCH_FP_D_F8, kvm_cpu_context, fp.d.f[8]);
+	OFFSET(KVM_ARCH_FP_D_F9, kvm_cpu_context, fp.d.f[9]);
+	OFFSET(KVM_ARCH_FP_D_F10, kvm_cpu_context, fp.d.f[10]);
+	OFFSET(KVM_ARCH_FP_D_F11, kvm_cpu_context, fp.d.f[11]);
+	OFFSET(KVM_ARCH_FP_D_F12, kvm_cpu_context, fp.d.f[12]);
+	OFFSET(KVM_ARCH_FP_D_F13, kvm_cpu_context, fp.d.f[13]);
+	OFFSET(KVM_ARCH_FP_D_F14, kvm_cpu_context, fp.d.f[14]);
+	OFFSET(KVM_ARCH_FP_D_F15, kvm_cpu_context, fp.d.f[15]);
+	OFFSET(KVM_ARCH_FP_D_F16, kvm_cpu_context, fp.d.f[16]);
+	OFFSET(KVM_ARCH_FP_D_F17, kvm_cpu_context, fp.d.f[17]);
+	OFFSET(KVM_ARCH_FP_D_F18, kvm_cpu_context, fp.d.f[18]);
+	OFFSET(KVM_ARCH_FP_D_F19, kvm_cpu_context, fp.d.f[19]);
+	OFFSET(KVM_ARCH_FP_D_F20, kvm_cpu_context, fp.d.f[20]);
+	OFFSET(KVM_ARCH_FP_D_F21, kvm_cpu_context, fp.d.f[21]);
+	OFFSET(KVM_ARCH_FP_D_F22, kvm_cpu_context, fp.d.f[22]);
+	OFFSET(KVM_ARCH_FP_D_F23, kvm_cpu_context, fp.d.f[23]);
+	OFFSET(KVM_ARCH_FP_D_F24, kvm_cpu_context, fp.d.f[24]);
+	OFFSET(KVM_ARCH_FP_D_F25, kvm_cpu_context, fp.d.f[25]);
+	OFFSET(KVM_ARCH_FP_D_F26, kvm_cpu_context, fp.d.f[26]);
+	OFFSET(KVM_ARCH_FP_D_F27, kvm_cpu_context, fp.d.f[27]);
+	OFFSET(KVM_ARCH_FP_D_F28, kvm_cpu_context, fp.d.f[28]);
+	OFFSET(KVM_ARCH_FP_D_F29, kvm_cpu_context, fp.d.f[29]);
+	OFFSET(KVM_ARCH_FP_D_F30, kvm_cpu_context, fp.d.f[30]);
+	OFFSET(KVM_ARCH_FP_D_F31, kvm_cpu_context, fp.d.f[31]);
+	OFFSET(KVM_ARCH_FP_D_FCSR, kvm_cpu_context, fp.d.fcsr);
 
 	/*
 	 * THREAD_{F,X}* might be larger than a S-type offset can handle, but
arch/riscv/kvm/Kconfig (new file, +35 lines)

# SPDX-License-Identifier: GPL-2.0
#
# KVM configuration
#

source "virt/kvm/Kconfig"

menuconfig VIRTUALIZATION
	bool "Virtualization"
	help
	  Say Y here to get to see options for using your Linux host to run
	  other operating systems inside virtual machines (guests).
	  This option alone does not add any kernel code.

	  If you say N, all options in this submenu will be skipped and
	  disabled.

if VIRTUALIZATION

config KVM
	tristate "Kernel-based Virtual Machine (KVM) support (EXPERIMENTAL)"
	depends on RISCV_SBI && MMU
	select MMU_NOTIFIER
	select PREEMPT_NOTIFIERS
	select KVM_MMIO
	select KVM_GENERIC_DIRTYLOG_READ_PROTECT
	select HAVE_KVM_VCPU_ASYNC_IOCTL
	select HAVE_KVM_EVENTFD
	select SRCU
	help
	  Support hosting virtualized guest machines.

	  If unsure, say N.

endif # VIRTUALIZATION
arch/riscv/kvm/Makefile (new file, +26 lines)

# SPDX-License-Identifier: GPL-2.0
#
# Makefile for RISC-V KVM support
#

ccflags-y += -I $(srctree)/$(src)

KVM := ../../../virt/kvm

obj-$(CONFIG_KVM) += kvm.o

kvm-y += $(KVM)/kvm_main.o
kvm-y += $(KVM)/coalesced_mmio.o
kvm-y += $(KVM)/binary_stats.o
kvm-y += $(KVM)/eventfd.o
kvm-y += main.o
kvm-y += vm.o
kvm-y += vmid.o
kvm-y += tlb.o
kvm-y += mmu.o
kvm-y += vcpu.o
kvm-y += vcpu_exit.o
kvm-y += vcpu_fp.o
kvm-y += vcpu_switch.o
kvm-y += vcpu_sbi.o
kvm-y += vcpu_timer.o
arch/riscv/kvm/main.c (new file, +118 lines)

// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *     Anup Patel <anup.patel@wdc.com>
 */

#include <linux/errno.h>
#include <linux/err.h>
#include <linux/module.h>
#include <linux/kvm_host.h>
#include <asm/csr.h>
#include <asm/hwcap.h>
#include <asm/sbi.h>

long kvm_arch_dev_ioctl(struct file *filp,
			unsigned int ioctl, unsigned long arg)
{
	return -EINVAL;
}

int kvm_arch_check_processor_compat(void *opaque)
{
	return 0;
}

int kvm_arch_hardware_setup(void *opaque)
{
	return 0;
}

int kvm_arch_hardware_enable(void)
{
	unsigned long hideleg, hedeleg;

	hedeleg = 0;
	hedeleg |= (1UL << EXC_INST_MISALIGNED);
	hedeleg |= (1UL << EXC_BREAKPOINT);
	hedeleg |= (1UL << EXC_SYSCALL);
	hedeleg |= (1UL << EXC_INST_PAGE_FAULT);
	hedeleg |= (1UL << EXC_LOAD_PAGE_FAULT);
	hedeleg |= (1UL << EXC_STORE_PAGE_FAULT);
	csr_write(CSR_HEDELEG, hedeleg);

	hideleg = 0;
	hideleg |= (1UL << IRQ_VS_SOFT);
	hideleg |= (1UL << IRQ_VS_TIMER);
	hideleg |= (1UL << IRQ_VS_EXT);
	csr_write(CSR_HIDELEG, hideleg);

	csr_write(CSR_HCOUNTEREN, -1UL);

	csr_write(CSR_HVIP, 0);

	return 0;
}

void kvm_arch_hardware_disable(void)
{
	csr_write(CSR_HEDELEG, 0);
	csr_write(CSR_HIDELEG, 0);
}

int kvm_arch_init(void *opaque)
{
	const char *str;

	if (!riscv_isa_extension_available(NULL, h)) {
		kvm_info("hypervisor extension not available\n");
		return -ENODEV;
	}

	if (sbi_spec_is_0_1()) {
		kvm_info("require SBI v0.2 or higher\n");
		return -ENODEV;
	}

	if (sbi_probe_extension(SBI_EXT_RFENCE) <= 0) {
		kvm_info("require SBI RFENCE extension\n");
		return -ENODEV;
	}

	kvm_riscv_stage2_mode_detect();

	kvm_riscv_stage2_vmid_detect();

	kvm_info("hypervisor extension available\n");

	switch (kvm_riscv_stage2_mode()) {
	case HGATP_MODE_SV32X4:
		str = "Sv32x4";
		break;
	case HGATP_MODE_SV39X4:
		str = "Sv39x4";
		break;
	case HGATP_MODE_SV48X4:
		str = "Sv48x4";
		break;
	default:
		return -ENODEV;
	}
	kvm_info("using %s G-stage page table format\n", str);

	kvm_info("VMID %ld bits available\n", kvm_riscv_stage2_vmid_bits());

	return 0;
}

void kvm_arch_exit(void)
{
}

static int riscv_kvm_init(void)
{
	return kvm_init(NULL, sizeof(struct kvm_vcpu), 0, THIS_MODULE);
}
module_init(riscv_kvm_init);
arch/riscv/kvm/mmu.c (new file, +802 lines)

// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *     Anup Patel <anup.patel@wdc.com>
 */

#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/hugetlb.h>
#include <linux/module.h>
#include <linux/uaccess.h>
#include <linux/vmalloc.h>
#include <linux/kvm_host.h>
#include <linux/sched/signal.h>
#include <asm/csr.h>
#include <asm/page.h>
#include <asm/pgtable.h>
#include <asm/sbi.h>

#ifdef CONFIG_64BIT
static unsigned long stage2_mode = (HGATP_MODE_SV39X4 << HGATP_MODE_SHIFT);
static unsigned long stage2_pgd_levels = 3;
#define stage2_index_bits	9
#else
static unsigned long stage2_mode = (HGATP_MODE_SV32X4 << HGATP_MODE_SHIFT);
static unsigned long stage2_pgd_levels = 2;
#define stage2_index_bits	10
#endif

#define stage2_pgd_xbits	2
#define stage2_pgd_size	(1UL << (HGATP_PAGE_SHIFT + stage2_pgd_xbits))
#define stage2_gpa_bits	(HGATP_PAGE_SHIFT + \
			 (stage2_pgd_levels * stage2_index_bits) + \
			 stage2_pgd_xbits)
#define stage2_gpa_size	((gpa_t)(1ULL << stage2_gpa_bits))

#define stage2_pte_leaf(__ptep)	\
	(pte_val(*(__ptep)) & (_PAGE_READ | _PAGE_WRITE | _PAGE_EXEC))

static inline unsigned long stage2_pte_index(gpa_t addr, u32 level)
{
	unsigned long mask;
	unsigned long shift = HGATP_PAGE_SHIFT + (stage2_index_bits * level);

	if (level == (stage2_pgd_levels - 1))
		mask = (PTRS_PER_PTE * (1UL << stage2_pgd_xbits)) - 1;
	else
		mask = PTRS_PER_PTE - 1;

	return (addr >> shift) & mask;
}

static inline unsigned long stage2_pte_page_vaddr(pte_t pte)
{
	return (unsigned long)pfn_to_virt(pte_val(pte) >> _PAGE_PFN_SHIFT);
}

static int stage2_page_size_to_level(unsigned long page_size, u32 *out_level)
{
	u32 i;
	unsigned long psz = 1UL << 12;

	for (i = 0; i < stage2_pgd_levels; i++) {
		if (page_size == (psz << (i * stage2_index_bits))) {
			*out_level = i;
			return 0;
		}
	}

	return -EINVAL;
}

static int stage2_level_to_page_size(u32 level, unsigned long *out_pgsize)
{
	if (stage2_pgd_levels < level)
		return -EINVAL;

	*out_pgsize = 1UL << (12 + (level * stage2_index_bits));

	return 0;
}

static int stage2_cache_topup(struct kvm_mmu_page_cache *pcache,
			      int min, int max)
{
	void *page;

	BUG_ON(max > KVM_MMU_PAGE_CACHE_NR_OBJS);
	if (pcache->nobjs >= min)
		return 0;
	while (pcache->nobjs < max) {
		page = (void *)__get_free_page(GFP_KERNEL | __GFP_ZERO);
		if (!page)
			return -ENOMEM;
		pcache->objects[pcache->nobjs++] = page;
	}

	return 0;
}

static void stage2_cache_flush(struct kvm_mmu_page_cache *pcache)
{
	while (pcache && pcache->nobjs)
		free_page((unsigned long)pcache->objects[--pcache->nobjs]);
}

static void *stage2_cache_alloc(struct kvm_mmu_page_cache *pcache)
{
	void *p;

	if (!pcache)
		return NULL;

	BUG_ON(!pcache->nobjs);
	p = pcache->objects[--pcache->nobjs];

	return p;
}

static bool stage2_get_leaf_entry(struct kvm *kvm, gpa_t addr,
				  pte_t **ptepp, u32 *ptep_level)
{
	pte_t *ptep;
	u32 current_level = stage2_pgd_levels - 1;

	*ptep_level = current_level;
	ptep = (pte_t *)kvm->arch.pgd;
	ptep = &ptep[stage2_pte_index(addr, current_level)];
	while (ptep && pte_val(*ptep)) {
		if (stage2_pte_leaf(ptep)) {
			*ptep_level = current_level;
			*ptepp = ptep;
			return true;
		}

		if (current_level) {
			current_level--;
			*ptep_level = current_level;
			ptep = (pte_t *)stage2_pte_page_vaddr(*ptep);
			ptep = &ptep[stage2_pte_index(addr, current_level)];
		} else {
			ptep = NULL;
		}
	}

	return false;
}

static void stage2_remote_tlb_flush(struct kvm *kvm, u32 level, gpa_t addr)
{
	struct cpumask hmask;
	unsigned long size = PAGE_SIZE;
	struct kvm_vmid *vmid = &kvm->arch.vmid;

	if (stage2_level_to_page_size(level, &size))
		return;
	addr &= ~(size - 1);

	/*
	 * TODO: Instead of cpu_online_mask, we should only target CPUs
	 * where the Guest/VM is running.
	 */
	preempt_disable();
	riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
	sbi_remote_hfence_gvma_vmid(cpumask_bits(&hmask), addr, size,
				    READ_ONCE(vmid->vmid));
	preempt_enable();
}

static int stage2_set_pte(struct kvm *kvm, u32 level,
			  struct kvm_mmu_page_cache *pcache,
			  gpa_t addr, const pte_t *new_pte)
{
	u32 current_level = stage2_pgd_levels - 1;
	pte_t *next_ptep = (pte_t *)kvm->arch.pgd;
	pte_t *ptep = &next_ptep[stage2_pte_index(addr, current_level)];

	if (current_level < level)
		return -EINVAL;

	while (current_level != level) {
		if (stage2_pte_leaf(ptep))
			return -EEXIST;

		if (!pte_val(*ptep)) {
			next_ptep = stage2_cache_alloc(pcache);
			if (!next_ptep)
				return -ENOMEM;
			*ptep = pfn_pte(PFN_DOWN(__pa(next_ptep)),
					__pgprot(_PAGE_TABLE));
		} else {
			if (stage2_pte_leaf(ptep))
				return -EEXIST;
			next_ptep = (pte_t *)stage2_pte_page_vaddr(*ptep);
		}

		current_level--;
		ptep = &next_ptep[stage2_pte_index(addr, current_level)];
	}

	*ptep = *new_pte;
	if (stage2_pte_leaf(ptep))
		stage2_remote_tlb_flush(kvm, current_level, addr);

	return 0;
}

static int stage2_map_page(struct kvm *kvm,
			   struct kvm_mmu_page_cache *pcache,
			   gpa_t gpa, phys_addr_t hpa,
			   unsigned long page_size,
			   bool page_rdonly, bool page_exec)
{
	int ret;
	u32 level = 0;
	pte_t new_pte;
	pgprot_t prot;

	ret = stage2_page_size_to_level(page_size, &level);
	if (ret)
		return ret;

	/*
	 * A RISC-V implementation can choose to either:
	 * 1) Update 'A' and 'D' PTE bits in hardware
	 * 2) Generate page fault when 'A' and/or 'D' bits are not set
	 *    PTE so that software can update these bits.
	 *
	 * We support both options mentioned above. To achieve this, we
	 * always set 'A' and 'D' PTE bits at time of creating stage2
	 * mapping. To support KVM dirty page logging with both options
	 * mentioned above, we will write-protect stage2 PTEs to track
	 * dirty pages.
	 */

	if (page_exec) {
		if (page_rdonly)
			prot = PAGE_READ_EXEC;
		else
			prot = PAGE_WRITE_EXEC;
	} else {
		if (page_rdonly)
			prot = PAGE_READ;
		else
			prot = PAGE_WRITE;
	}
	new_pte = pfn_pte(PFN_DOWN(hpa), prot);
	new_pte = pte_mkdirty(new_pte);

	return stage2_set_pte(kvm, level, pcache, gpa, &new_pte);
}

enum stage2_op {
	STAGE2_OP_NOP = 0,	/* Nothing */
	STAGE2_OP_CLEAR,	/* Clear/Unmap */
	STAGE2_OP_WP,		/* Write-protect */
};

static void stage2_op_pte(struct kvm *kvm, gpa_t addr,
			  pte_t *ptep, u32 ptep_level, enum stage2_op op)
{
	int i, ret;
	pte_t *next_ptep;
	u32 next_ptep_level;
	unsigned long next_page_size, page_size;

	ret = stage2_level_to_page_size(ptep_level, &page_size);
	if (ret)
		return;

	BUG_ON(addr & (page_size - 1));

	if (!pte_val(*ptep))
		return;

	if (ptep_level && !stage2_pte_leaf(ptep)) {
		next_ptep = (pte_t *)stage2_pte_page_vaddr(*ptep);
		next_ptep_level = ptep_level - 1;
		ret = stage2_level_to_page_size(next_ptep_level,
						&next_page_size);
		if (ret)
			return;

		if (op == STAGE2_OP_CLEAR)
			set_pte(ptep, __pte(0));
		for (i = 0; i < PTRS_PER_PTE; i++)
			stage2_op_pte(kvm, addr + i * next_page_size,
				      &next_ptep[i], next_ptep_level, op);
		if (op == STAGE2_OP_CLEAR)
			put_page(virt_to_page(next_ptep));
	} else {
		if (op == STAGE2_OP_CLEAR)
			set_pte(ptep, __pte(0));
		else if (op == STAGE2_OP_WP)
			set_pte(ptep, __pte(pte_val(*ptep) & ~_PAGE_WRITE));
		stage2_remote_tlb_flush(kvm, ptep_level, addr);
	}
}

static void stage2_unmap_range(struct kvm *kvm, gpa_t start,
			       gpa_t size, bool may_block)
{
	int ret;
	pte_t *ptep;
	u32 ptep_level;
	bool found_leaf;
	unsigned long page_size;
	gpa_t addr = start, end = start + size;

	while (addr < end) {
		found_leaf = stage2_get_leaf_entry(kvm, addr,
						   &ptep, &ptep_level);
		ret = stage2_level_to_page_size(ptep_level, &page_size);
		if (ret)
			break;

		if (!found_leaf)
			goto next;

		if (!(addr & (page_size - 1)) && ((end - addr) >= page_size))
			stage2_op_pte(kvm, addr, ptep,
				      ptep_level, STAGE2_OP_CLEAR);

next:
		addr += page_size;

		/*
		 * If the range is too large, release the kvm->mmu_lock
		 * to prevent starvation and lockup detector warnings.
		 */
		if (may_block && addr < end)
			cond_resched_lock(&kvm->mmu_lock);
	}
}

static void stage2_wp_range(struct kvm *kvm, gpa_t start, gpa_t end)
{
	int ret;
	pte_t *ptep;
	u32 ptep_level;
	bool found_leaf;
	gpa_t addr = start;
	unsigned long page_size;

	while (addr < end) {
		found_leaf = stage2_get_leaf_entry(kvm, addr,
						   &ptep, &ptep_level);
		ret = stage2_level_to_page_size(ptep_level, &page_size);
		if (ret)
			break;

		if (!found_leaf)
			goto next;

		if (!(addr & (page_size - 1)) && ((end - addr) >= page_size))
			stage2_op_pte(kvm, addr, ptep,
				      ptep_level, STAGE2_OP_WP);

next:
		addr += page_size;
	}
}

static void stage2_wp_memory_region(struct kvm *kvm, int slot)
{
	struct kvm_memslots *slots = kvm_memslots(kvm);
	struct kvm_memory_slot *memslot = id_to_memslot(slots, slot);
	phys_addr_t start = memslot->base_gfn << PAGE_SHIFT;
	phys_addr_t end = (memslot->base_gfn + memslot->npages) << PAGE_SHIFT;

	spin_lock(&kvm->mmu_lock);
	stage2_wp_range(kvm, start, end);
	spin_unlock(&kvm->mmu_lock);
	kvm_flush_remote_tlbs(kvm);
}

static int stage2_ioremap(struct kvm *kvm, gpa_t gpa, phys_addr_t hpa,
			  unsigned long size, bool writable)
{
	pte_t pte;
	int ret = 0;
	unsigned long pfn;
	phys_addr_t addr, end;
	struct kvm_mmu_page_cache pcache = { 0, };

	end = (gpa + size + PAGE_SIZE - 1) & PAGE_MASK;
	pfn = __phys_to_pfn(hpa);

	for (addr = gpa; addr < end; addr += PAGE_SIZE) {
		pte = pfn_pte(pfn, PAGE_KERNEL);

		if (!writable)
			pte = pte_wrprotect(pte);

		ret = stage2_cache_topup(&pcache,
					 stage2_pgd_levels,
					 KVM_MMU_PAGE_CACHE_NR_OBJS);
		if (ret)
			goto out;

		spin_lock(&kvm->mmu_lock);
		ret = stage2_set_pte(kvm, 0, &pcache, addr, &pte);
		spin_unlock(&kvm->mmu_lock);
		if (ret)
			goto out;

		pfn++;
	}

out:
	stage2_cache_flush(&pcache);
	return ret;
}

void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
					     struct kvm_memory_slot *slot,
					     gfn_t gfn_offset,
					     unsigned long mask)
{
	phys_addr_t base_gfn = slot->base_gfn + gfn_offset;
	phys_addr_t start = (base_gfn + __ffs(mask)) << PAGE_SHIFT;
	phys_addr_t end = (base_gfn + __fls(mask) + 1) << PAGE_SHIFT;

	stage2_wp_range(kvm, start, end);
}

void kvm_arch_sync_dirty_log(struct kvm *kvm, struct kvm_memory_slot *memslot)
{
}

void kvm_arch_flush_remote_tlbs_memslot(struct kvm *kvm,
					const struct kvm_memory_slot *memslot)
{
	kvm_flush_remote_tlbs(kvm);
}

void kvm_arch_free_memslot(struct kvm *kvm, struct kvm_memory_slot *free)
{
}

void kvm_arch_memslots_updated(struct kvm *kvm, u64 gen)
{
}

void kvm_arch_flush_shadow_all(struct kvm *kvm)
{
	kvm_riscv_stage2_free_pgd(kvm);
}

void kvm_arch_flush_shadow_memslot(struct kvm *kvm,
				   struct kvm_memory_slot *slot)
{
}

void kvm_arch_commit_memory_region(struct kvm *kvm,
				const struct kvm_userspace_memory_region *mem,
				struct kvm_memory_slot *old,
				const struct kvm_memory_slot *new,
				enum kvm_mr_change change)
{
	/*
	 * At this point memslot has been committed and there is an
	 * allocated dirty_bitmap[], dirty pages will be tracked while
	 * the memory slot is write protected.
	 */
	if (change != KVM_MR_DELETE && mem->flags & KVM_MEM_LOG_DIRTY_PAGES)
		stage2_wp_memory_region(kvm, mem->slot);
}

int kvm_arch_prepare_memory_region(struct kvm *kvm,
				struct kvm_memory_slot *memslot,
				const struct kvm_userspace_memory_region *mem,
				enum kvm_mr_change change)
{
	hva_t hva = mem->userspace_addr;
	hva_t reg_end = hva + mem->memory_size;
	bool writable = !(mem->flags & KVM_MEM_READONLY);
	int ret = 0;

	if (change != KVM_MR_CREATE && change != KVM_MR_MOVE &&
	    change != KVM_MR_FLAGS_ONLY)
		return 0;

	/*
	 * Prevent userspace from creating a memory region outside of the GPA
	 * space addressable by the KVM guest GPA space.
	 */
	if ((memslot->base_gfn + memslot->npages) >=
	    (stage2_gpa_size >> PAGE_SHIFT))
		return -EFAULT;

	mmap_read_lock(current->mm);

	/*
	 * A memory region could potentially cover multiple VMAs, and
	 * any holes between them, so iterate over all of them to find
	 * out if we can map any of them right now.
	 *
	 *     +--------------------------------------------+
	 * +---------------+----------------+   +----------------+
	 * |   : VMA 1     |      VMA 2     |   |    VMA 3  :    |
	 * +---------------+----------------+   +----------------+
	 *     |               memory region                |
	 *     +--------------------------------------------+
	 */
	do {
		struct vm_area_struct *vma = find_vma(current->mm, hva);
		hva_t vm_start, vm_end;

		if (!vma || vma->vm_start >= reg_end)
			break;

		/*
		 * Mapping a read-only VMA is only allowed if the
		 * memory region is configured as read-only.
		 */
		if (writable && !(vma->vm_flags & VM_WRITE)) {
			ret = -EPERM;
			break;
		}

		/* Take the intersection of this VMA with the memory region */
		vm_start = max(hva, vma->vm_start);
		vm_end = min(reg_end, vma->vm_end);

		if (vma->vm_flags & VM_PFNMAP) {
			gpa_t gpa = mem->guest_phys_addr +
				    (vm_start - mem->userspace_addr);
			phys_addr_t pa;

			pa = (phys_addr_t)vma->vm_pgoff << PAGE_SHIFT;
			pa += vm_start - vma->vm_start;

			/* IO region dirty page logging not allowed */
			if (memslot->flags & KVM_MEM_LOG_DIRTY_PAGES) {
				ret = -EINVAL;
				goto out;
			}

			ret = stage2_ioremap(kvm, gpa, pa,
					     vm_end - vm_start, writable);
			if (ret)
				break;
		}
		hva = vm_end;
	} while (hva < reg_end);

	if (change == KVM_MR_FLAGS_ONLY)
		goto out;

	spin_lock(&kvm->mmu_lock);
	if (ret)
		stage2_unmap_range(kvm, mem->guest_phys_addr,
				   mem->memory_size, false);
	spin_unlock(&kvm->mmu_lock);

out:
	mmap_read_unlock(current->mm);
	return ret;
}

bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
{
	if (!kvm->arch.pgd)
		return false;

	stage2_unmap_range(kvm, range->start << PAGE_SHIFT,
			   (range->end - range->start) << PAGE_SHIFT,
			   range->may_block);
	return false;
}

bool kvm_set_spte_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	int ret;
	kvm_pfn_t pfn = pte_pfn(range->pte);

	if (!kvm->arch.pgd)
		return false;

	WARN_ON(range->end - range->start != 1);

	ret = stage2_map_page(kvm, NULL, range->start << PAGE_SHIFT,
			      __pfn_to_phys(pfn), PAGE_SIZE, true, true);
	if (ret) {
		kvm_debug("Failed to map stage2 page (error %d)\n", ret);
		return true;
	}

	return false;
}

bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	pte_t *ptep;
	u32 ptep_level = 0;
	u64 size = (range->end - range->start) << PAGE_SHIFT;

	if (!kvm->arch.pgd)
		return false;

	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);

	if (!stage2_get_leaf_entry(kvm, range->start << PAGE_SHIFT,
				   &ptep, &ptep_level))
		return false;

	return ptep_test_and_clear_young(NULL, 0, ptep);
}

bool kvm_test_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
{
	pte_t *ptep;
	u32 ptep_level = 0;
	u64 size = (range->end - range->start) << PAGE_SHIFT;

	if (!kvm->arch.pgd)
		return false;

	WARN_ON(size != PAGE_SIZE && size != PMD_SIZE && size != PGDIR_SIZE);

	if (!stage2_get_leaf_entry(kvm, range->start << PAGE_SHIFT,
				   &ptep, &ptep_level))
		return false;

	return pte_young(*ptep);
}

int kvm_riscv_stage2_map(struct kvm_vcpu *vcpu,
			 struct kvm_memory_slot *memslot,
			 gpa_t gpa, unsigned long hva, bool is_write)
{
	int ret;
	kvm_pfn_t hfn;
	bool writeable;
	short vma_pageshift;
	gfn_t gfn = gpa >> PAGE_SHIFT;
	struct vm_area_struct *vma;
	struct kvm *kvm = vcpu->kvm;
	struct kvm_mmu_page_cache *pcache = &vcpu->arch.mmu_page_cache;
	bool logging = (memslot->dirty_bitmap &&
			!(memslot->flags & KVM_MEM_READONLY)) ? true : false;
	unsigned long vma_pagesize, mmu_seq;

	mmap_read_lock(current->mm);

	vma = find_vma_intersection(current->mm, hva, hva + 1);
	if (unlikely(!vma)) {
		kvm_err("Failed to find VMA for hva 0x%lx\n", hva);
		mmap_read_unlock(current->mm);
		return -EFAULT;
	}

	if (is_vm_hugetlb_page(vma))
		vma_pageshift = huge_page_shift(hstate_vma(vma));
	else
		vma_pageshift = PAGE_SHIFT;
	vma_pagesize = 1ULL << vma_pageshift;
	if (logging || (vma->vm_flags & VM_PFNMAP))
		vma_pagesize = PAGE_SIZE;

	if (vma_pagesize == PMD_SIZE || vma_pagesize == PGDIR_SIZE)
		gfn = (gpa & huge_page_mask(hstate_vma(vma))) >> PAGE_SHIFT;

	mmap_read_unlock(current->mm);

	if (vma_pagesize != PGDIR_SIZE &&
	    vma_pagesize != PMD_SIZE &&
	    vma_pagesize != PAGE_SIZE) {
		kvm_err("Invalid VMA page size 0x%lx\n", vma_pagesize);
		return -EFAULT;
	}

	/* We need minimum second+third level pages */
	ret = stage2_cache_topup(pcache, stage2_pgd_levels,
				 KVM_MMU_PAGE_CACHE_NR_OBJS);
	if (ret) {
		kvm_err("Failed to topup stage2 cache\n");
		return ret;
	}

	mmu_seq = kvm->mmu_notifier_seq;

	hfn = gfn_to_pfn_prot(kvm, gfn, is_write, &writeable);
	if (hfn == KVM_PFN_ERR_HWPOISON) {
		send_sig_mceerr(BUS_MCEERR_AR, (void __user *)hva,
				vma_pageshift, current);
		return 0;
	}
	if (is_error_noslot_pfn(hfn))
		return -EFAULT;

	/*
	 * If logging is active then we allow writable pages only
	 * for write faults.
	 */
	if (logging && !is_write)
		writeable = false;

	spin_lock(&kvm->mmu_lock);

	if (mmu_notifier_retry(kvm, mmu_seq))
		goto out_unlock;

	if (writeable) {
		kvm_set_pfn_dirty(hfn);
		mark_page_dirty(kvm, gfn);
		ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
				      vma_pagesize, false, true);
	} else {
		ret = stage2_map_page(kvm, pcache, gpa, hfn << PAGE_SHIFT,
				      vma_pagesize, true, true);
	}

	if (ret)
		kvm_err("Failed to map in stage2\n");

out_unlock:
	spin_unlock(&kvm->mmu_lock);
	kvm_set_pfn_accessed(hfn);
	kvm_release_pfn_clean(hfn);
	return ret;
}

void kvm_riscv_stage2_flush_cache(struct kvm_vcpu *vcpu)
{
	stage2_cache_flush(&vcpu->arch.mmu_page_cache);
}

int kvm_riscv_stage2_alloc_pgd(struct kvm *kvm)
{
	struct page *pgd_page;

	if (kvm->arch.pgd != NULL) {
		kvm_err("kvm_arch already initialized?\n");
		return -EINVAL;
	}

	pgd_page = alloc_pages(GFP_KERNEL | __GFP_ZERO,
			       get_order(stage2_pgd_size));
	if (!pgd_page)
		return -ENOMEM;
	kvm->arch.pgd = page_to_virt(pgd_page);
	kvm->arch.pgd_phys = page_to_phys(pgd_page);

	return 0;
}

void kvm_riscv_stage2_free_pgd(struct kvm *kvm)
{
	void *pgd = NULL;

	spin_lock(&kvm->mmu_lock);
	if (kvm->arch.pgd) {
		stage2_unmap_range(kvm, 0UL, stage2_gpa_size, false);
		pgd = READ_ONCE(kvm->arch.pgd);
		kvm->arch.pgd = NULL;
		kvm->arch.pgd_phys = 0;
	}
	spin_unlock(&kvm->mmu_lock);

	if (pgd)
		free_pages((unsigned long)pgd, get_order(stage2_pgd_size));
}

void kvm_riscv_stage2_update_hgatp(struct kvm_vcpu *vcpu)
{
	unsigned long hgatp = stage2_mode;
	struct kvm_arch *k = &vcpu->kvm->arch;

	hgatp |= (READ_ONCE(k->vmid.vmid) << HGATP_VMID_SHIFT) &
		 HGATP_VMID_MASK;
	hgatp |= (k->pgd_phys >> PAGE_SHIFT) & HGATP_PPN;

	csr_write(CSR_HGATP, hgatp);

	if (!kvm_riscv_stage2_vmid_bits())
		__kvm_riscv_hfence_gvma_all();
}

void kvm_riscv_stage2_mode_detect(void)
{
#ifdef CONFIG_64BIT
	/* Try Sv48x4 stage2 mode */
	csr_write(CSR_HGATP, HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
	if ((csr_read(CSR_HGATP) >> HGATP_MODE_SHIFT) == HGATP_MODE_SV48X4) {
		stage2_mode = (HGATP_MODE_SV48X4 << HGATP_MODE_SHIFT);
		stage2_pgd_levels = 4;
	}
	csr_write(CSR_HGATP, 0);

	__kvm_riscv_hfence_gvma_all();
#endif
}

unsigned long kvm_riscv_stage2_mode(void)
{
	return stage2_mode >> HGATP_MODE_SHIFT;
}
arch/riscv/kvm/tlb.S (new file, +74 lines)

/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *     Anup Patel <anup.patel@wdc.com>
 */

#include <linux/linkage.h>
#include <asm/asm.h>

	.text
	.altmacro
	.option norelax

/*
 * Instruction encoding of hfence.gvma is:
 * HFENCE.GVMA rs1, rs2
 * HFENCE.GVMA zero, rs2
 * HFENCE.GVMA rs1
 * HFENCE.GVMA
 *
 * rs1!=zero and rs2!=zero ==> HFENCE.GVMA rs1, rs2
 * rs1==zero and rs2!=zero ==> HFENCE.GVMA zero, rs2
 * rs1!=zero and rs2==zero ==> HFENCE.GVMA rs1
 * rs1==zero and rs2==zero ==> HFENCE.GVMA
 *
 * Instruction encoding of HFENCE.GVMA is:
 * 0110001 rs2(5) rs1(5) 000 00000 1110011
 */

ENTRY(__kvm_riscv_hfence_gvma_vmid_gpa)
	/*
	 * rs1 = a0 (GPA >> 2)
	 * rs2 = a1 (VMID)
	 * HFENCE.GVMA a0, a1
	 * 0110001 01011 01010 000 00000 1110011
	 */
	.word 0x62b50073
	ret
ENDPROC(__kvm_riscv_hfence_gvma_vmid_gpa)

ENTRY(__kvm_riscv_hfence_gvma_vmid)
	/*
	 * rs1 = zero
	 * rs2 = a0 (VMID)
	 * HFENCE.GVMA zero, a0
	 * 0110001 01010 00000 000 00000 1110011
	 */
	.word 0x62a00073
	ret
ENDPROC(__kvm_riscv_hfence_gvma_vmid)

ENTRY(__kvm_riscv_hfence_gvma_gpa)
	/*
	 * rs1 = a0 (GPA >> 2)
	 * rs2 = zero
	 * HFENCE.GVMA a0
	 * 0110001 00000 01010 000 00000 1110011
	 */
	.word 0x62050073
	ret
ENDPROC(__kvm_riscv_hfence_gvma_gpa)

ENTRY(__kvm_riscv_hfence_gvma_all)
	/*
	 * rs1 = zero
	 * rs2 = zero
	 * HFENCE.GVMA
	 * 0110001 00000 00000 000 00000 1110011
	 */
	.word 0x62000073
	ret
ENDPROC(__kvm_riscv_hfence_gvma_all)
+825
arch/riscv/kvm/vcpu.c
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *	Anup Patel <anup.patel@wdc.com>
 */

#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kdebug.h>
#include <linux/module.h>
#include <linux/percpu.h>
#include <linux/uaccess.h>
#include <linux/vmalloc.h>
#include <linux/sched/signal.h>
#include <linux/fs.h>
#include <linux/kvm_host.h>
#include <asm/csr.h>
#include <asm/hwcap.h>

const struct _kvm_stats_desc kvm_vcpu_stats_desc[] = {
	KVM_GENERIC_VCPU_STATS(),
	STATS_DESC_COUNTER(VCPU, ecall_exit_stat),
	STATS_DESC_COUNTER(VCPU, wfi_exit_stat),
	STATS_DESC_COUNTER(VCPU, mmio_exit_user),
	STATS_DESC_COUNTER(VCPU, mmio_exit_kernel),
	STATS_DESC_COUNTER(VCPU, exits)
};

const struct kvm_stats_header kvm_vcpu_stats_header = {
	.name_size = KVM_STATS_NAME_SIZE,
	.num_desc = ARRAY_SIZE(kvm_vcpu_stats_desc),
	.id_offset = sizeof(struct kvm_stats_header),
	.desc_offset = sizeof(struct kvm_stats_header) + KVM_STATS_NAME_SIZE,
	.data_offset = sizeof(struct kvm_stats_header) + KVM_STATS_NAME_SIZE +
		       sizeof(kvm_vcpu_stats_desc),
};

#define KVM_RISCV_ISA_ALLOWED	(riscv_isa_extension_mask(a) | \
				 riscv_isa_extension_mask(c) | \
				 riscv_isa_extension_mask(d) | \
				 riscv_isa_extension_mask(f) | \
				 riscv_isa_extension_mask(i) | \
				 riscv_isa_extension_mask(m) | \
				 riscv_isa_extension_mask(s) | \
				 riscv_isa_extension_mask(u))

static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
	struct kvm_vcpu_csr *reset_csr = &vcpu->arch.guest_reset_csr;
	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
	struct kvm_cpu_context *reset_cntx = &vcpu->arch.guest_reset_context;

	memcpy(csr, reset_csr, sizeof(*csr));

	memcpy(cntx, reset_cntx, sizeof(*cntx));

	kvm_riscv_vcpu_fp_reset(vcpu);

	kvm_riscv_vcpu_timer_reset(vcpu);

	WRITE_ONCE(vcpu->arch.irqs_pending, 0);
	WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);
}

int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id)
{
	return 0;
}

int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
{
	struct kvm_cpu_context *cntx;

	/* Mark this VCPU never ran */
	vcpu->arch.ran_atleast_once = false;

	/* Setup ISA features available to VCPU */
	vcpu->arch.isa = riscv_isa_extension_base(NULL) & KVM_RISCV_ISA_ALLOWED;

	/* Setup reset state of shadow SSTATUS and HSTATUS CSRs */
	cntx = &vcpu->arch.guest_reset_context;
	cntx->sstatus = SR_SPP | SR_SPIE;
	cntx->hstatus = 0;
	cntx->hstatus |= HSTATUS_VTW;
	cntx->hstatus |= HSTATUS_SPVP;
	cntx->hstatus |= HSTATUS_SPV;

	/* Setup VCPU timer */
	kvm_riscv_vcpu_timer_init(vcpu);

	/* Reset VCPU */
	kvm_riscv_reset_vcpu(vcpu);

	return 0;
}

void kvm_arch_vcpu_postcreate(struct kvm_vcpu *vcpu)
{
}

void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
{
	/* Cleanup VCPU timer */
	kvm_riscv_vcpu_timer_deinit(vcpu);

	/* Flush the pages pre-allocated for Stage2 page table mappings */
	kvm_riscv_stage2_flush_cache(vcpu);
}

int kvm_cpu_has_pending_timer(struct kvm_vcpu *vcpu)
{
	return kvm_riscv_vcpu_has_interrupts(vcpu, 1UL << IRQ_VS_TIMER);
}

void kvm_arch_vcpu_blocking(struct kvm_vcpu *vcpu)
{
}

void kvm_arch_vcpu_unblocking(struct kvm_vcpu *vcpu)
{
}

int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu)
{
	return (kvm_riscv_vcpu_has_interrupts(vcpu, -1UL) &&
		!vcpu->arch.power_off && !vcpu->arch.pause);
}

int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
{
	return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
}

bool kvm_arch_vcpu_in_kernel(struct kvm_vcpu *vcpu)
{
	return (vcpu->arch.guest_context.sstatus & SR_SPP) ? true : false;
}

vm_fault_t kvm_arch_vcpu_fault(struct kvm_vcpu *vcpu, struct vm_fault *vmf)
{
	return VM_FAULT_SIGBUS;
}

static int kvm_riscv_vcpu_get_reg_config(struct kvm_vcpu *vcpu,
					 const struct kvm_one_reg *reg)
{
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    KVM_REG_RISCV_CONFIG);
	unsigned long reg_val;

	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
		return -EINVAL;

	switch (reg_num) {
	case KVM_REG_RISCV_CONFIG_REG(isa):
		reg_val = vcpu->arch.isa;
		break;
	default:
		return -EINVAL;
	}

	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	return 0;
}

static int kvm_riscv_vcpu_set_reg_config(struct kvm_vcpu *vcpu,
					 const struct kvm_one_reg *reg)
{
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    KVM_REG_RISCV_CONFIG);
	unsigned long reg_val;

	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
		return -EINVAL;

	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	switch (reg_num) {
	case KVM_REG_RISCV_CONFIG_REG(isa):
		if (!vcpu->arch.ran_atleast_once) {
			vcpu->arch.isa = reg_val;
			vcpu->arch.isa &= riscv_isa_extension_base(NULL);
			vcpu->arch.isa &= KVM_RISCV_ISA_ALLOWED;
			kvm_riscv_vcpu_fp_reset(vcpu);
		} else {
			return -EOPNOTSUPP;
		}
		break;
	default:
		return -EINVAL;
	}

	return 0;
}

static int kvm_riscv_vcpu_get_reg_core(struct kvm_vcpu *vcpu,
				       const struct kvm_one_reg *reg)
{
	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    KVM_REG_RISCV_CORE);
	unsigned long reg_val;

	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
		return -EINVAL;
	if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
		return -EINVAL;

	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
		reg_val = cntx->sepc;
	else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
		 reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
		reg_val = ((unsigned long *)cntx)[reg_num];
	else if (reg_num == KVM_REG_RISCV_CORE_REG(mode))
		reg_val = (cntx->sstatus & SR_SPP) ?
				KVM_RISCV_MODE_S : KVM_RISCV_MODE_U;
	else
		return -EINVAL;

	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	return 0;
}

static int kvm_riscv_vcpu_set_reg_core(struct kvm_vcpu *vcpu,
				       const struct kvm_one_reg *reg)
{
	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    KVM_REG_RISCV_CORE);
	unsigned long reg_val;

	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
		return -EINVAL;
	if (reg_num >= sizeof(struct kvm_riscv_core) / sizeof(unsigned long))
		return -EINVAL;

	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	if (reg_num == KVM_REG_RISCV_CORE_REG(regs.pc))
		cntx->sepc = reg_val;
	else if (KVM_REG_RISCV_CORE_REG(regs.pc) < reg_num &&
		 reg_num <= KVM_REG_RISCV_CORE_REG(regs.t6))
		((unsigned long *)cntx)[reg_num] = reg_val;
	else if (reg_num == KVM_REG_RISCV_CORE_REG(mode)) {
		if (reg_val == KVM_RISCV_MODE_S)
			cntx->sstatus |= SR_SPP;
		else
			cntx->sstatus &= ~SR_SPP;
	} else
		return -EINVAL;

	return 0;
}

static int kvm_riscv_vcpu_get_reg_csr(struct kvm_vcpu *vcpu,
				      const struct kvm_one_reg *reg)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    KVM_REG_RISCV_CSR);
	unsigned long reg_val;

	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
		return -EINVAL;
	if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
		return -EINVAL;

	if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
		kvm_riscv_vcpu_flush_interrupts(vcpu);
		reg_val = (csr->hvip >> VSIP_TO_HVIP_SHIFT) & VSIP_VALID_MASK;
	} else
		reg_val = ((unsigned long *)csr)[reg_num];

	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	return 0;
}

static int kvm_riscv_vcpu_set_reg_csr(struct kvm_vcpu *vcpu,
				      const struct kvm_one_reg *reg)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    KVM_REG_RISCV_CSR);
	unsigned long reg_val;

	if (KVM_REG_SIZE(reg->id) != sizeof(unsigned long))
		return -EINVAL;
	if (reg_num >= sizeof(struct kvm_riscv_csr) / sizeof(unsigned long))
		return -EINVAL;

	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	if (reg_num == KVM_REG_RISCV_CSR_REG(sip)) {
		reg_val &= VSIP_VALID_MASK;
		reg_val <<= VSIP_TO_HVIP_SHIFT;
	}

	((unsigned long *)csr)[reg_num] = reg_val;

	if (reg_num == KVM_REG_RISCV_CSR_REG(sip))
		WRITE_ONCE(vcpu->arch.irqs_pending_mask, 0);

	return 0;
}

static int kvm_riscv_vcpu_set_reg(struct kvm_vcpu *vcpu,
				  const struct kvm_one_reg *reg)
{
	if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG)
		return kvm_riscv_vcpu_set_reg_config(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE)
		return kvm_riscv_vcpu_set_reg_core(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
		return kvm_riscv_vcpu_set_reg_csr(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER)
		return kvm_riscv_vcpu_set_reg_timer(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F)
		return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
						 KVM_REG_RISCV_FP_F);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
		return kvm_riscv_vcpu_set_reg_fp(vcpu, reg,
						 KVM_REG_RISCV_FP_D);

	return -EINVAL;
}

static int kvm_riscv_vcpu_get_reg(struct kvm_vcpu *vcpu,
				  const struct kvm_one_reg *reg)
{
	if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CONFIG)
		return kvm_riscv_vcpu_get_reg_config(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CORE)
		return kvm_riscv_vcpu_get_reg_core(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_CSR)
		return kvm_riscv_vcpu_get_reg_csr(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_TIMER)
		return kvm_riscv_vcpu_get_reg_timer(vcpu, reg);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_F)
		return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
						 KVM_REG_RISCV_FP_F);
	else if ((reg->id & KVM_REG_RISCV_TYPE_MASK) == KVM_REG_RISCV_FP_D)
		return kvm_riscv_vcpu_get_reg_fp(vcpu, reg,
						 KVM_REG_RISCV_FP_D);

	return -EINVAL;
}

long kvm_arch_vcpu_async_ioctl(struct file *filp,
			       unsigned int ioctl, unsigned long arg)
{
	struct kvm_vcpu *vcpu = filp->private_data;
	void __user *argp = (void __user *)arg;

	if (ioctl == KVM_INTERRUPT) {
		struct kvm_interrupt irq;

		if (copy_from_user(&irq, argp, sizeof(irq)))
			return -EFAULT;

		if (irq.irq == KVM_INTERRUPT_SET)
			return kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_EXT);
		else
			return kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_EXT);
	}

	return -ENOIOCTLCMD;
}

long kvm_arch_vcpu_ioctl(struct file *filp,
			 unsigned int ioctl, unsigned long arg)
{
	struct kvm_vcpu *vcpu = filp->private_data;
	void __user *argp = (void __user *)arg;
	long r = -EINVAL;

	switch (ioctl) {
	case KVM_SET_ONE_REG:
	case KVM_GET_ONE_REG: {
		struct kvm_one_reg reg;

		r = -EFAULT;
		if (copy_from_user(&reg, argp, sizeof(reg)))
			break;

		if (ioctl == KVM_SET_ONE_REG)
			r = kvm_riscv_vcpu_set_reg(vcpu, &reg);
		else
			r = kvm_riscv_vcpu_get_reg(vcpu, &reg);
		break;
	}
	default:
		break;
	}

	return r;
}

int kvm_arch_vcpu_ioctl_get_sregs(struct kvm_vcpu *vcpu,
				  struct kvm_sregs *sregs)
{
	return -EINVAL;
}

int kvm_arch_vcpu_ioctl_set_sregs(struct kvm_vcpu *vcpu,
				  struct kvm_sregs *sregs)
{
	return -EINVAL;
}

int kvm_arch_vcpu_ioctl_get_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
	return -EINVAL;
}

int kvm_arch_vcpu_ioctl_set_fpu(struct kvm_vcpu *vcpu, struct kvm_fpu *fpu)
{
	return -EINVAL;
}

int kvm_arch_vcpu_ioctl_translate(struct kvm_vcpu *vcpu,
				  struct kvm_translation *tr)
{
	return -EINVAL;
}

int kvm_arch_vcpu_ioctl_get_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
	return -EINVAL;
}

int kvm_arch_vcpu_ioctl_set_regs(struct kvm_vcpu *vcpu, struct kvm_regs *regs)
{
	return -EINVAL;
}

void kvm_riscv_vcpu_flush_interrupts(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;
	unsigned long mask, val;

	if (READ_ONCE(vcpu->arch.irqs_pending_mask)) {
		mask = xchg_acquire(&vcpu->arch.irqs_pending_mask, 0);
		val = READ_ONCE(vcpu->arch.irqs_pending) & mask;

		csr->hvip &= ~mask;
		csr->hvip |= val;
	}
}

void kvm_riscv_vcpu_sync_interrupts(struct kvm_vcpu *vcpu)
{
	unsigned long hvip;
	struct kvm_vcpu_arch *v = &vcpu->arch;
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;

	/* Read current HVIP and VSIE CSRs */
	csr->vsie = csr_read(CSR_VSIE);

	/* Sync up HVIP.VSSIP bit changes done by the Guest */
	hvip = csr_read(CSR_HVIP);
	if ((csr->hvip ^ hvip) & (1UL << IRQ_VS_SOFT)) {
		if (hvip & (1UL << IRQ_VS_SOFT)) {
			if (!test_and_set_bit(IRQ_VS_SOFT,
					      &v->irqs_pending_mask))
				set_bit(IRQ_VS_SOFT, &v->irqs_pending);
		} else {
			if (!test_and_set_bit(IRQ_VS_SOFT,
					      &v->irqs_pending_mask))
				clear_bit(IRQ_VS_SOFT, &v->irqs_pending);
		}
	}
}

int kvm_riscv_vcpu_set_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
	if (irq != IRQ_VS_SOFT &&
	    irq != IRQ_VS_TIMER &&
	    irq != IRQ_VS_EXT)
		return -EINVAL;

	set_bit(irq, &vcpu->arch.irqs_pending);
	smp_mb__before_atomic();
	set_bit(irq, &vcpu->arch.irqs_pending_mask);

	kvm_vcpu_kick(vcpu);

	return 0;
}

int kvm_riscv_vcpu_unset_interrupt(struct kvm_vcpu *vcpu, unsigned int irq)
{
	if (irq != IRQ_VS_SOFT &&
	    irq != IRQ_VS_TIMER &&
	    irq != IRQ_VS_EXT)
		return -EINVAL;

	clear_bit(irq, &vcpu->arch.irqs_pending);
	smp_mb__before_atomic();
	set_bit(irq, &vcpu->arch.irqs_pending_mask);

	return 0;
}

bool kvm_riscv_vcpu_has_interrupts(struct kvm_vcpu *vcpu, unsigned long mask)
{
	unsigned long ie = ((vcpu->arch.guest_csr.vsie & VSIP_VALID_MASK)
			    << VSIP_TO_HVIP_SHIFT) & mask;

	return (READ_ONCE(vcpu->arch.irqs_pending) & ie) ? true : false;
}

void kvm_riscv_vcpu_power_off(struct kvm_vcpu *vcpu)
{
	vcpu->arch.power_off = true;
	kvm_make_request(KVM_REQ_SLEEP, vcpu);
	kvm_vcpu_kick(vcpu);
}

void kvm_riscv_vcpu_power_on(struct kvm_vcpu *vcpu)
{
	vcpu->arch.power_off = false;
	kvm_vcpu_wake_up(vcpu);
}

int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	if (vcpu->arch.power_off)
		mp_state->mp_state = KVM_MP_STATE_STOPPED;
	else
		mp_state->mp_state = KVM_MP_STATE_RUNNABLE;

	return 0;
}

int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
				    struct kvm_mp_state *mp_state)
{
	int ret = 0;

	switch (mp_state->mp_state) {
	case KVM_MP_STATE_RUNNABLE:
		vcpu->arch.power_off = false;
		break;
	case KVM_MP_STATE_STOPPED:
		kvm_riscv_vcpu_power_off(vcpu);
		break;
	default:
		ret = -EINVAL;
	}

	return ret;
}

int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,
					struct kvm_guest_debug *dbg)
{
	/* TODO: to be implemented later */
	return -EINVAL;
}

void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;

	csr_write(CSR_VSSTATUS, csr->vsstatus);
	csr_write(CSR_VSIE, csr->vsie);
	csr_write(CSR_VSTVEC, csr->vstvec);
	csr_write(CSR_VSSCRATCH, csr->vsscratch);
	csr_write(CSR_VSEPC, csr->vsepc);
	csr_write(CSR_VSCAUSE, csr->vscause);
	csr_write(CSR_VSTVAL, csr->vstval);
	csr_write(CSR_HVIP, csr->hvip);
	csr_write(CSR_VSATP, csr->vsatp);

	kvm_riscv_stage2_update_hgatp(vcpu);

	kvm_riscv_vcpu_timer_restore(vcpu);

	kvm_riscv_vcpu_host_fp_save(&vcpu->arch.host_context);
	kvm_riscv_vcpu_guest_fp_restore(&vcpu->arch.guest_context,
					vcpu->arch.isa);

	vcpu->cpu = cpu;
}

void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;

	vcpu->cpu = -1;

	kvm_riscv_vcpu_guest_fp_save(&vcpu->arch.guest_context,
				     vcpu->arch.isa);
	kvm_riscv_vcpu_host_fp_restore(&vcpu->arch.host_context);

	csr_write(CSR_HGATP, 0);

	csr->vsstatus = csr_read(CSR_VSSTATUS);
	csr->vsie = csr_read(CSR_VSIE);
	csr->vstvec = csr_read(CSR_VSTVEC);
	csr->vsscratch = csr_read(CSR_VSSCRATCH);
	csr->vsepc = csr_read(CSR_VSEPC);
	csr->vscause = csr_read(CSR_VSCAUSE);
	csr->vstval = csr_read(CSR_VSTVAL);
	csr->hvip = csr_read(CSR_HVIP);
	csr->vsatp = csr_read(CSR_VSATP);
}

static void kvm_riscv_check_vcpu_requests(struct kvm_vcpu *vcpu)
{
	struct rcuwait *wait = kvm_arch_vcpu_get_wait(vcpu);

	if (kvm_request_pending(vcpu)) {
		if (kvm_check_request(KVM_REQ_SLEEP, vcpu)) {
			rcuwait_wait_event(wait,
				(!vcpu->arch.power_off) && (!vcpu->arch.pause),
				TASK_INTERRUPTIBLE);

			if (vcpu->arch.power_off || vcpu->arch.pause) {
				/*
				 * Awakened to handle a signal; request that
				 * we sleep again later.
				 */
				kvm_make_request(KVM_REQ_SLEEP, vcpu);
			}
		}

		if (kvm_check_request(KVM_REQ_VCPU_RESET, vcpu))
			kvm_riscv_reset_vcpu(vcpu);

		if (kvm_check_request(KVM_REQ_UPDATE_HGATP, vcpu))
			kvm_riscv_stage2_update_hgatp(vcpu);

		if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
			__kvm_riscv_hfence_gvma_all();
	}
}

static void kvm_riscv_update_hvip(struct kvm_vcpu *vcpu)
{
	struct kvm_vcpu_csr *csr = &vcpu->arch.guest_csr;

	csr_write(CSR_HVIP, csr->hvip);
}

int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu)
{
	int ret;
	struct kvm_cpu_trap trap;
	struct kvm_run *run = vcpu->run;

	/* Mark that this VCPU ran at least once */
	vcpu->arch.ran_atleast_once = true;

	vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);

	/* Process MMIO value returned from user-space */
	if (run->exit_reason == KVM_EXIT_MMIO) {
		ret = kvm_riscv_vcpu_mmio_return(vcpu, vcpu->run);
		if (ret) {
			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
			return ret;
		}
	}

	/* Process SBI value returned from user-space */
	if (run->exit_reason == KVM_EXIT_RISCV_SBI) {
		ret = kvm_riscv_vcpu_sbi_return(vcpu, vcpu->run);
		if (ret) {
			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
			return ret;
		}
	}

	if (run->immediate_exit) {
		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
		return -EINTR;
	}

	vcpu_load(vcpu);

	kvm_sigset_activate(vcpu);

	ret = 1;
	run->exit_reason = KVM_EXIT_UNKNOWN;
	while (ret > 0) {
		/* Check conditions before entering the guest */
		cond_resched();

		kvm_riscv_stage2_vmid_update(vcpu);

		kvm_riscv_check_vcpu_requests(vcpu);

		preempt_disable();

		local_irq_disable();

		/*
		 * Exit if we have a signal pending so that we can deliver
		 * the signal to user space.
		 */
		if (signal_pending(current)) {
			ret = -EINTR;
			run->exit_reason = KVM_EXIT_INTR;
		}

		/*
		 * Ensure we set mode to IN_GUEST_MODE after we disable
		 * interrupts and before the final VCPU requests check.
		 * See the comment in kvm_vcpu_exiting_guest_mode() and
		 * Documentation/virtual/kvm/vcpu-requests.rst
		 */
		vcpu->mode = IN_GUEST_MODE;

		srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
		smp_mb__after_srcu_read_unlock();

		/*
		 * VCPU interrupts might have been updated asynchronously,
		 * so update them in HW.
		 */
		kvm_riscv_vcpu_flush_interrupts(vcpu);

		/* Update HVIP CSR for current CPU */
		kvm_riscv_update_hvip(vcpu);

		if (ret <= 0 ||
		    kvm_riscv_stage2_vmid_ver_changed(&vcpu->kvm->arch.vmid) ||
		    kvm_request_pending(vcpu)) {
			vcpu->mode = OUTSIDE_GUEST_MODE;
			local_irq_enable();
			preempt_enable();
			vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
			continue;
		}

		guest_enter_irqoff();

		__kvm_riscv_switch_to(&vcpu->arch);

		vcpu->mode = OUTSIDE_GUEST_MODE;
		vcpu->stat.exits++;

		/*
		 * Save SCAUSE, STVAL, HTVAL, and HTINST because we might
		 * get an interrupt between __kvm_riscv_switch_to() and
		 * local_irq_enable() which can potentially change CSRs.
		 */
		trap.sepc = vcpu->arch.guest_context.sepc;
		trap.scause = csr_read(CSR_SCAUSE);
		trap.stval = csr_read(CSR_STVAL);
		trap.htval = csr_read(CSR_HTVAL);
		trap.htinst = csr_read(CSR_HTINST);

		/* Sync up interrupt state with HW */
		kvm_riscv_vcpu_sync_interrupts(vcpu);

		/*
		 * We may have taken a host interrupt in VS/VU-mode (i.e.
		 * while executing the guest). This interrupt is still
		 * pending, as we haven't serviced it yet!
		 *
		 * We're now back in HS-mode with interrupts disabled
		 * so enabling the interrupts now will have the effect
		 * of taking the interrupt again, in HS-mode this time.
		 */
		local_irq_enable();

		/*
		 * We do local_irq_enable() before calling guest_exit() so
		 * that if a timer interrupt hits while running the guest
		 * we account that tick as being spent in the guest. We
		 * enable preemption after calling guest_exit() so that if
		 * we get preempted we make sure ticks after that are not
		 * counted as guest time.
		 */
		guest_exit();

		preempt_enable();

		vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);

		ret = kvm_riscv_vcpu_exit(vcpu, run, &trap);
	}

	kvm_sigset_deactivate(vcpu);

	vcpu_put(vcpu);

	srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);

	return ret;
}
+701
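The irqs_pending / irqs_pending_mask pair in vcpu.c above implements a small lock-free handshake: producers (kvm_riscv_vcpu_set_interrupt() and friends) flip a bit in irqs_pending and then flag the same bit in irqs_pending_mask, and kvm_riscv_vcpu_flush_interrupts() atomically consumes the mask and folds only the flagged bits into the shadow hvip value. The toy model below sketches that dance in plain Python; the class and method names are illustrative only, and plain integer assignment stands in for the kernel's WRITE_ONCE/xchg_acquire atomics.

```python
# Toy model of the irqs_pending/irqs_pending_mask handshake from
# arch/riscv/kvm/vcpu.c (illustrative names, no real atomicity).
class VcpuIrqModel:
    def __init__(self):
        self.pending = 0   # models vcpu->arch.irqs_pending
        self.mask = 0      # models vcpu->arch.irqs_pending_mask
        self.hvip = 0      # models csr->hvip shadow value

    def set_interrupt(self, irq):
        # set the pending bit first, then flag it as changed
        self.pending |= 1 << irq
        self.mask |= 1 << irq

    def unset_interrupt(self, irq):
        # clearing also flags the bit, so flush() propagates the clear
        self.pending &= ~(1 << irq)
        self.mask |= 1 << irq

    def flush(self):
        # stands in for xchg_acquire(&irqs_pending_mask, 0)
        if self.mask:
            m, self.mask = self.mask, 0
            val = self.pending & m
            # only bits flagged in the mask are updated in hvip
            self.hvip = (self.hvip & ~m) | val

m = VcpuIrqModel()
m.set_interrupt(6)        # e.g. the VS timer interrupt bit
m.flush()
assert m.hvip == 1 << 6   # flagged set propagated to hvip
m.unset_interrupt(6)
m.flush()
assert m.hvip == 0        # flagged clear propagated too
```

The point of the mask is that flush() must distinguish "bit 6 is 0 because it was cleared" from "bit 6 is 0 and untouched": only flagged bits may overwrite hvip, which is why both set and unset paths flag the mask.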
arch/riscv/kvm/vcpu_exit.c
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *	Anup Patel <anup.patel@wdc.com>
 */

#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
#include <asm/csr.h>

#define INSN_OPCODE_MASK	0x007c
#define INSN_OPCODE_SHIFT	2
#define INSN_OPCODE_SYSTEM	28

#define INSN_MASK_WFI		0xffffffff
#define INSN_MATCH_WFI		0x10500073

#define INSN_MATCH_LB		0x3
#define INSN_MASK_LB		0x707f
#define INSN_MATCH_LH		0x1003
#define INSN_MASK_LH		0x707f
#define INSN_MATCH_LW		0x2003
#define INSN_MASK_LW		0x707f
#define INSN_MATCH_LD		0x3003
#define INSN_MASK_LD		0x707f
#define INSN_MATCH_LBU		0x4003
#define INSN_MASK_LBU		0x707f
#define INSN_MATCH_LHU		0x5003
#define INSN_MASK_LHU		0x707f
#define INSN_MATCH_LWU		0x6003
#define INSN_MASK_LWU		0x707f
#define INSN_MATCH_SB		0x23
#define INSN_MASK_SB		0x707f
#define INSN_MATCH_SH		0x1023
#define INSN_MASK_SH		0x707f
#define INSN_MATCH_SW		0x2023
#define INSN_MASK_SW		0x707f
#define INSN_MATCH_SD		0x3023
#define INSN_MASK_SD		0x707f

#define INSN_MATCH_C_LD		0x6000
#define INSN_MASK_C_LD		0xe003
#define INSN_MATCH_C_SD		0xe000
#define INSN_MASK_C_SD		0xe003
#define INSN_MATCH_C_LW		0x4000
#define INSN_MASK_C_LW		0xe003
#define INSN_MATCH_C_SW		0xc000
#define INSN_MASK_C_SW		0xe003
#define INSN_MATCH_C_LDSP	0x6002
#define INSN_MASK_C_LDSP	0xe003
#define INSN_MATCH_C_SDSP	0xe002
#define INSN_MASK_C_SDSP	0xe003
#define INSN_MATCH_C_LWSP	0x4002
#define INSN_MASK_C_LWSP	0xe003
#define INSN_MATCH_C_SWSP	0xc002
#define INSN_MASK_C_SWSP	0xe003

#define INSN_16BIT_MASK		0x3

#define INSN_IS_16BIT(insn)	(((insn) & INSN_16BIT_MASK) != INSN_16BIT_MASK)

#define INSN_LEN(insn)		(INSN_IS_16BIT(insn) ? 2 : 4)

#ifdef CONFIG_64BIT
#define LOG_REGBYTES		3
#else
#define LOG_REGBYTES		2
#endif
#define REGBYTES		(1 << LOG_REGBYTES)

#define SH_RD			7
#define SH_RS1			15
#define SH_RS2			20
#define SH_RS2C			2

#define RV_X(x, s, n)		(((x) >> (s)) & ((1 << (n)) - 1))
#define RVC_LW_IMM(x)		((RV_X(x, 6, 1) << 2) | \
				 (RV_X(x, 10, 3) << 3) | \
				 (RV_X(x, 5, 1) << 6))
#define RVC_LD_IMM(x)		((RV_X(x, 10, 3) << 3) | \
				 (RV_X(x, 5, 2) << 6))
#define RVC_LWSP_IMM(x)		((RV_X(x, 4, 3) << 2) | \
				 (RV_X(x, 12, 1) << 5) | \
				 (RV_X(x, 2, 2) << 6))
#define RVC_LDSP_IMM(x)		((RV_X(x, 5, 2) << 3) | \
				 (RV_X(x, 12, 1) << 5) | \
				 (RV_X(x, 2, 3) << 6))
#define RVC_SWSP_IMM(x)		((RV_X(x, 9, 4) << 2) | \
				 (RV_X(x, 7, 2) << 6))
#define RVC_SDSP_IMM(x)		((RV_X(x, 10, 3) << 3) | \
				 (RV_X(x, 7, 3) << 6))
#define RVC_RS1S(insn)		(8 + RV_X(insn, SH_RD, 3))
#define RVC_RS2S(insn)		(8 + RV_X(insn, SH_RS2C, 3))
#define RVC_RS2(insn)		RV_X(insn, SH_RS2C, 5)

#define SHIFT_RIGHT(x, y)		\
	((y) < 0 ? ((x) << -(y)) : ((x) >> (y)))

#define REG_MASK			\
	((1 << (5 + LOG_REGBYTES)) - (1 << LOG_REGBYTES))

#define REG_OFFSET(insn, pos)		\
	(SHIFT_RIGHT((insn), (pos) - LOG_REGBYTES) & REG_MASK)

#define REG_PTR(insn, pos, regs)	\
	((ulong *)((ulong)(regs) + REG_OFFSET(insn, pos)))

#define GET_RM(insn)		(((insn) >> 12) & 7)

#define GET_RS1(insn, regs)	(*REG_PTR(insn, SH_RS1, regs))
#define GET_RS2(insn, regs)	(*REG_PTR(insn, SH_RS2, regs))
#define GET_RS1S(insn, regs)	(*REG_PTR(RVC_RS1S(insn), 0, regs))
#define GET_RS2S(insn, regs)	(*REG_PTR(RVC_RS2S(insn), 0, regs))
#define GET_RS2C(insn, regs)	(*REG_PTR(insn, SH_RS2C, regs))
#define GET_SP(regs)		(*REG_PTR(2, 0, regs))
#define SET_RD(insn, regs, val)	(*REG_PTR(insn, SH_RD, regs) = (val))
#define IMM_I(insn)		((s32)(insn) >> 20)
#define IMM_S(insn)		(((s32)(insn) >> 25 << 5) | \
				 (s32)(((insn) >> 7) & 0x1f))
#define MASK_FUNCT3		0x7000

static int truly_illegal_insn(struct kvm_vcpu *vcpu,
			      struct kvm_run *run,
			      ulong insn)
{
	struct kvm_cpu_trap utrap = { 0 };

	/* Redirect trap to Guest VCPU */
	utrap.sepc = vcpu->arch.guest_context.sepc;
	utrap.scause = EXC_INST_ILLEGAL;
	utrap.stval = insn;
	kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);

	return 1;
}

static int system_opcode_insn(struct kvm_vcpu *vcpu,
			      struct kvm_run *run,
			      ulong insn)
{
	if ((insn & INSN_MASK_WFI) == INSN_MATCH_WFI) {
		vcpu->stat.wfi_exit_stat++;
		if (!kvm_arch_vcpu_runnable(vcpu)) {
			srcu_read_unlock(&vcpu->kvm->srcu, vcpu->arch.srcu_idx);
			kvm_vcpu_block(vcpu);
			vcpu->arch.srcu_idx = srcu_read_lock(&vcpu->kvm->srcu);
			kvm_clear_request(KVM_REQ_UNHALT, vcpu);
		}
		vcpu->arch.guest_context.sepc += INSN_LEN(insn);
		return 1;
	}

	return truly_illegal_insn(vcpu, run, insn);
}

static int virtual_inst_fault(struct kvm_vcpu *vcpu, struct kvm_run *run,
			      struct kvm_cpu_trap *trap)
{
	unsigned long insn = trap->stval;
	struct kvm_cpu_trap utrap = { 0 };
	struct kvm_cpu_context *ct;

	if (unlikely(INSN_IS_16BIT(insn))) {
		if (insn == 0) {
			ct = &vcpu->arch.guest_context;
			insn = kvm_riscv_vcpu_unpriv_read(vcpu, true,
							  ct->sepc,
							  &utrap);
			if (utrap.scause) {
				utrap.sepc = ct->sepc;
				kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
				return 1;
			}
		}
		if (INSN_IS_16BIT(insn))
			return truly_illegal_insn(vcpu, run, insn);
	}

	switch ((insn & INSN_OPCODE_MASK) >> INSN_OPCODE_SHIFT) {
	case INSN_OPCODE_SYSTEM:
		return system_opcode_insn(vcpu, run, insn);
	default:
		return truly_illegal_insn(vcpu, run, insn);
	}
}

static int emulate_load(struct kvm_vcpu *vcpu, struct kvm_run *run,
			unsigned long fault_addr, unsigned long htinst)
{
	u8 data_buf[8];
	unsigned long insn;
	int shift = 0, len = 0, insn_len = 0;
	struct kvm_cpu_trap utrap = { 0 };
	struct kvm_cpu_context *ct = &vcpu->arch.guest_context;

	/* Determine trapped instruction */
	if (htinst & 0x1) {
		/*
		 * Bit[0] == 1 implies trapped instruction value is
		 * transformed instruction or custom instruction.
		 */
		insn = htinst | INSN_16BIT_MASK;
		insn_len = (htinst & BIT(1)) ? INSN_LEN(insn) : 2;
	} else {
		/*
		 * Bit[0] == 0 implies trapped instruction value is
		 * zero or special value.
		 */
		insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, ct->sepc,
						  &utrap);
		if (utrap.scause) {
			/* Redirect trap if we failed to read instruction */
			utrap.sepc = ct->sepc;
			kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
			return 1;
		}
		insn_len = INSN_LEN(insn);
	}

	/* Decode length of MMIO and shift */
	if ((insn & INSN_MASK_LW) == INSN_MATCH_LW) {
		len = 4;
		shift = 8 * (sizeof(ulong) - len);
	} else if ((insn & INSN_MASK_LB) == INSN_MATCH_LB) {
		len = 1;
		shift = 8 * (sizeof(ulong) - len);
	} else if ((insn & INSN_MASK_LBU) == INSN_MATCH_LBU) {
		len = 1;
		shift = 8 * (sizeof(ulong) - len);
#ifdef CONFIG_64BIT
	} else if ((insn & INSN_MASK_LD) == INSN_MATCH_LD) {
		len = 8;
		shift = 8 * (sizeof(ulong) - len);
	} else if ((insn & INSN_MASK_LWU) == INSN_MATCH_LWU) {
		len = 4;
#endif
	} else if ((insn & INSN_MASK_LH) == INSN_MATCH_LH) {
		len = 2;
		shift = 8 * (sizeof(ulong) - len);
	} else if ((insn & INSN_MASK_LHU) == INSN_MATCH_LHU) {
		len = 2;
#ifdef CONFIG_64BIT
	} else if ((insn & INSN_MASK_C_LD) == INSN_MATCH_C_LD) {
		len = 8;
		shift = 8 * (sizeof(ulong) - len);
		insn = RVC_RS2S(insn) << SH_RD;
	} else if ((insn & INSN_MASK_C_LDSP) == INSN_MATCH_C_LDSP &&
		   ((insn >> SH_RD) & 0x1f)) {
		len = 8;
		shift = 8 * (sizeof(ulong) - len);
#endif
	} else if ((insn & INSN_MASK_C_LW) == INSN_MATCH_C_LW) {
		len = 4;
		shift = 8 * (sizeof(ulong) - len);
		insn = RVC_RS2S(insn) << SH_RD;
	} else if ((insn & INSN_MASK_C_LWSP) == INSN_MATCH_C_LWSP &&
		   ((insn >> SH_RD) & 0x1f)) {
		len = 4;
		shift = 8 * (sizeof(ulong) - len);
	} else {
		return -EOPNOTSUPP;
	}

	/* Fault address should be aligned to length of MMIO */
	if (fault_addr & (len - 1))
		return -EIO;

	/* Save instruction decode info */
	vcpu->arch.mmio_decode.insn = insn;
	vcpu->arch.mmio_decode.insn_len = insn_len;
	vcpu->arch.mmio_decode.shift = shift;
	vcpu->arch.mmio_decode.len = len;
	vcpu->arch.mmio_decode.return_handled = 0;

	/* Update MMIO details in kvm_run struct */
	run->mmio.is_write = false;
	run->mmio.phys_addr = fault_addr;
	run->mmio.len = len;

	/* Try to handle MMIO access in the kernel */
	if (!kvm_io_bus_read(vcpu, KVM_MMIO_BUS, fault_addr, len, data_buf)) {
		/* Successfully handled MMIO access in the kernel so resume */
		memcpy(run->mmio.data, data_buf, len);
		vcpu->stat.mmio_exit_kernel++;
		kvm_riscv_vcpu_mmio_return(vcpu, run);
		return 1;
	}

	/* Exit to userspace for MMIO emulation */
	vcpu->stat.mmio_exit_user++;
	run->exit_reason = KVM_EXIT_MMIO;

	return 0;
}

static int emulate_store(struct kvm_vcpu *vcpu, struct kvm_run *run,
			 unsigned long fault_addr, unsigned long htinst)
{
	u8 data8;
	u16 data16;
	u32 data32;
	u64 data64;
	ulong data;
	unsigned long insn;
	int len = 0, insn_len = 0;
	struct kvm_cpu_trap utrap = { 0 };
	struct kvm_cpu_context *ct = &vcpu->arch.guest_context;

	/* Determine trapped instruction */
	if (htinst & 0x1) {
		/*
		 * Bit[0] == 1 implies trapped instruction value is
		 * transformed instruction or custom instruction.
		 */
		insn = htinst | INSN_16BIT_MASK;
		insn_len = (htinst & BIT(1)) ? INSN_LEN(insn) : 2;
	} else {
		/*
		 * Bit[0] == 0 implies trapped instruction value is
		 * zero or special value.
		 */
		insn = kvm_riscv_vcpu_unpriv_read(vcpu, true, ct->sepc,
						  &utrap);
		if (utrap.scause) {
			/* Redirect trap if we failed to read instruction */
			utrap.sepc = ct->sepc;
			kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
			return 1;
		}
		insn_len = INSN_LEN(insn);
	}

	data = GET_RS2(insn, &vcpu->arch.guest_context);
	data8 = data16 = data32 = data64 = data;

	if ((insn & INSN_MASK_SW) == INSN_MATCH_SW) {
		len = 4;
	} else if ((insn & INSN_MASK_SB) == INSN_MATCH_SB) {
		len = 1;
#ifdef CONFIG_64BIT
	} else if ((insn & INSN_MASK_SD) == INSN_MATCH_SD) {
		len = 8;
#endif
	} else if ((insn & INSN_MASK_SH) == INSN_MATCH_SH) {
		len = 2;
#ifdef CONFIG_64BIT
	} else if ((insn & INSN_MASK_C_SD) == INSN_MATCH_C_SD) {
		len = 8;
		data64 = GET_RS2S(insn, &vcpu->arch.guest_context);
	} else if ((insn & INSN_MASK_C_SDSP) == INSN_MATCH_C_SDSP &&
		   ((insn >> SH_RD) & 0x1f)) {
		len = 8;
		data64 = GET_RS2C(insn, &vcpu->arch.guest_context);
#endif
	} else if ((insn & INSN_MASK_C_SW) == INSN_MATCH_C_SW) {
		len = 4;
		data32 = GET_RS2S(insn, &vcpu->arch.guest_context);
	} else if ((insn & INSN_MASK_C_SWSP) == INSN_MATCH_C_SWSP &&
		   ((insn >> SH_RD) & 0x1f)) {
		len = 4;
		data32 = GET_RS2C(insn, &vcpu->arch.guest_context);
	} else {
		return -EOPNOTSUPP;
	}

	/* Fault address should be aligned to length of MMIO */
	if (fault_addr & (len - 1))
		return -EIO;

	/* Save instruction decode info */
	vcpu->arch.mmio_decode.insn = insn;
	vcpu->arch.mmio_decode.insn_len = insn_len;
	vcpu->arch.mmio_decode.shift = 0;
	vcpu->arch.mmio_decode.len = len;
	vcpu->arch.mmio_decode.return_handled = 0;

	/* Copy data to kvm_run instance */
	switch (len) {
	case 1:
		*((u8 *)run->mmio.data) = data8;
		break;
	case
2: 387 + *((u16 *)run->mmio.data) = data16; 388 + break; 389 + case 4: 390 + *((u32 *)run->mmio.data) = data32; 391 + break; 392 + case 8: 393 + *((u64 *)run->mmio.data) = data64; 394 + break; 395 + default: 396 + return -EOPNOTSUPP; 397 + } 398 + 399 + /* Update MMIO details in kvm_run struct */ 400 + run->mmio.is_write = true; 401 + run->mmio.phys_addr = fault_addr; 402 + run->mmio.len = len; 403 + 404 + /* Try to handle MMIO access in the kernel */ 405 + if (!kvm_io_bus_write(vcpu, KVM_MMIO_BUS, 406 + fault_addr, len, run->mmio.data)) { 407 + /* Successfully handled MMIO access in the kernel so resume */ 408 + vcpu->stat.mmio_exit_kernel++; 409 + kvm_riscv_vcpu_mmio_return(vcpu, run); 410 + return 1; 411 + } 412 + 413 + /* Exit to userspace for MMIO emulation */ 414 + vcpu->stat.mmio_exit_user++; 415 + run->exit_reason = KVM_EXIT_MMIO; 416 + 417 + return 0; 418 + } 419 + 420 + static int stage2_page_fault(struct kvm_vcpu *vcpu, struct kvm_run *run, 421 + struct kvm_cpu_trap *trap) 422 + { 423 + struct kvm_memory_slot *memslot; 424 + unsigned long hva, fault_addr; 425 + bool writeable; 426 + gfn_t gfn; 427 + int ret; 428 + 429 + fault_addr = (trap->htval << 2) | (trap->stval & 0x3); 430 + gfn = fault_addr >> PAGE_SHIFT; 431 + memslot = gfn_to_memslot(vcpu->kvm, gfn); 432 + hva = gfn_to_hva_memslot_prot(memslot, gfn, &writeable); 433 + 434 + if (kvm_is_error_hva(hva) || 435 + (trap->scause == EXC_STORE_GUEST_PAGE_FAULT && !writeable)) { 436 + switch (trap->scause) { 437 + case EXC_LOAD_GUEST_PAGE_FAULT: 438 + return emulate_load(vcpu, run, fault_addr, 439 + trap->htinst); 440 + case EXC_STORE_GUEST_PAGE_FAULT: 441 + return emulate_store(vcpu, run, fault_addr, 442 + trap->htinst); 443 + default: 444 + return -EOPNOTSUPP; 445 + }; 446 + } 447 + 448 + ret = kvm_riscv_stage2_map(vcpu, memslot, fault_addr, hva, 449 + (trap->scause == EXC_STORE_GUEST_PAGE_FAULT) ? 
true : false); 450 + if (ret < 0) 451 + return ret; 452 + 453 + return 1; 454 + } 455 + 456 + /** 457 + * kvm_riscv_vcpu_unpriv_read -- Read machine word from Guest memory 458 + * 459 + * @vcpu: The VCPU pointer 460 + * @read_insn: Flag representing whether we are reading instruction 461 + * @guest_addr: Guest address to read 462 + * @trap: Output pointer to trap details 463 + */ 464 + unsigned long kvm_riscv_vcpu_unpriv_read(struct kvm_vcpu *vcpu, 465 + bool read_insn, 466 + unsigned long guest_addr, 467 + struct kvm_cpu_trap *trap) 468 + { 469 + register unsigned long taddr asm("a0") = (unsigned long)trap; 470 + register unsigned long ttmp asm("a1"); 471 + register unsigned long val asm("t0"); 472 + register unsigned long tmp asm("t1"); 473 + register unsigned long addr asm("t2") = guest_addr; 474 + unsigned long flags; 475 + unsigned long old_stvec, old_hstatus; 476 + 477 + local_irq_save(flags); 478 + 479 + old_hstatus = csr_swap(CSR_HSTATUS, vcpu->arch.guest_context.hstatus); 480 + old_stvec = csr_swap(CSR_STVEC, (ulong)&__kvm_riscv_unpriv_trap); 481 + 482 + if (read_insn) { 483 + /* 484 + * HLVX.HU instruction 485 + * 0110010 00011 rs1 100 rd 1110011 486 + */ 487 + asm volatile ("\n" 488 + ".option push\n" 489 + ".option norvc\n" 490 + "add %[ttmp], %[taddr], 0\n" 491 + /* 492 + * HLVX.HU %[val], (%[addr]) 493 + * HLVX.HU t0, (t2) 494 + * 0110010 00011 00111 100 00101 1110011 495 + */ 496 + ".word 0x6433c2f3\n" 497 + "andi %[tmp], %[val], 3\n" 498 + "addi %[tmp], %[tmp], -3\n" 499 + "bne %[tmp], zero, 2f\n" 500 + "addi %[addr], %[addr], 2\n" 501 + /* 502 + * HLVX.HU %[tmp], (%[addr]) 503 + * HLVX.HU t1, (t2) 504 + * 0110010 00011 00111 100 00110 1110011 505 + */ 506 + ".word 0x6433c373\n" 507 + "sll %[tmp], %[tmp], 16\n" 508 + "add %[val], %[val], %[tmp]\n" 509 + "2:\n" 510 + ".option pop" 511 + : [val] "=&r" (val), [tmp] "=&r" (tmp), 512 + [taddr] "+&r" (taddr), [ttmp] "+&r" (ttmp), 513 + [addr] "+&r" (addr) : : "memory"); 514 + 515 + if (trap->scause == 
EXC_LOAD_PAGE_FAULT) 516 + trap->scause = EXC_INST_PAGE_FAULT; 517 + } else { 518 + /* 519 + * HLV.D instruction 520 + * 0110110 00000 rs1 100 rd 1110011 521 + * 522 + * HLV.W instruction 523 + * 0110100 00000 rs1 100 rd 1110011 524 + */ 525 + asm volatile ("\n" 526 + ".option push\n" 527 + ".option norvc\n" 528 + "add %[ttmp], %[taddr], 0\n" 529 + #ifdef CONFIG_64BIT 530 + /* 531 + * HLV.D %[val], (%[addr]) 532 + * HLV.D t0, (t2) 533 + * 0110110 00000 00111 100 00101 1110011 534 + */ 535 + ".word 0x6c03c2f3\n" 536 + #else 537 + /* 538 + * HLV.W %[val], (%[addr]) 539 + * HLV.W t0, (t2) 540 + * 0110100 00000 00111 100 00101 1110011 541 + */ 542 + ".word 0x6803c2f3\n" 543 + #endif 544 + ".option pop" 545 + : [val] "=&r" (val), 546 + [taddr] "+&r" (taddr), [ttmp] "+&r" (ttmp) 547 + : [addr] "r" (addr) : "memory"); 548 + } 549 + 550 + csr_write(CSR_STVEC, old_stvec); 551 + csr_write(CSR_HSTATUS, old_hstatus); 552 + 553 + local_irq_restore(flags); 554 + 555 + return val; 556 + } 557 + 558 + /** 559 + * kvm_riscv_vcpu_trap_redirect -- Redirect trap to Guest 560 + * 561 + * @vcpu: The VCPU pointer 562 + * @trap: Trap details 563 + */ 564 + void kvm_riscv_vcpu_trap_redirect(struct kvm_vcpu *vcpu, 565 + struct kvm_cpu_trap *trap) 566 + { 567 + unsigned long vsstatus = csr_read(CSR_VSSTATUS); 568 + 569 + /* Change Guest SSTATUS.SPP bit */ 570 + vsstatus &= ~SR_SPP; 571 + if (vcpu->arch.guest_context.sstatus & SR_SPP) 572 + vsstatus |= SR_SPP; 573 + 574 + /* Change Guest SSTATUS.SPIE bit */ 575 + vsstatus &= ~SR_SPIE; 576 + if (vsstatus & SR_SIE) 577 + vsstatus |= SR_SPIE; 578 + 579 + /* Clear Guest SSTATUS.SIE bit */ 580 + vsstatus &= ~SR_SIE; 581 + 582 + /* Update Guest SSTATUS */ 583 + csr_write(CSR_VSSTATUS, vsstatus); 584 + 585 + /* Update Guest SCAUSE, STVAL, and SEPC */ 586 + csr_write(CSR_VSCAUSE, trap->scause); 587 + csr_write(CSR_VSTVAL, trap->stval); 588 + csr_write(CSR_VSEPC, trap->sepc); 589 + 590 + /* Set Guest PC to Guest exception vector */ 591 + 
vcpu->arch.guest_context.sepc = csr_read(CSR_VSTVEC); 592 + } 593 + 594 + /** 595 + * kvm_riscv_vcpu_mmio_return -- Handle MMIO loads after user space emulation 596 + * or in-kernel IO emulation 597 + * 598 + * @vcpu: The VCPU pointer 599 + * @run: The VCPU run struct containing the mmio data 600 + */ 601 + int kvm_riscv_vcpu_mmio_return(struct kvm_vcpu *vcpu, struct kvm_run *run) 602 + { 603 + u8 data8; 604 + u16 data16; 605 + u32 data32; 606 + u64 data64; 607 + ulong insn; 608 + int len, shift; 609 + 610 + if (vcpu->arch.mmio_decode.return_handled) 611 + return 0; 612 + 613 + vcpu->arch.mmio_decode.return_handled = 1; 614 + insn = vcpu->arch.mmio_decode.insn; 615 + 616 + if (run->mmio.is_write) 617 + goto done; 618 + 619 + len = vcpu->arch.mmio_decode.len; 620 + shift = vcpu->arch.mmio_decode.shift; 621 + 622 + switch (len) { 623 + case 1: 624 + data8 = *((u8 *)run->mmio.data); 625 + SET_RD(insn, &vcpu->arch.guest_context, 626 + (ulong)data8 << shift >> shift); 627 + break; 628 + case 2: 629 + data16 = *((u16 *)run->mmio.data); 630 + SET_RD(insn, &vcpu->arch.guest_context, 631 + (ulong)data16 << shift >> shift); 632 + break; 633 + case 4: 634 + data32 = *((u32 *)run->mmio.data); 635 + SET_RD(insn, &vcpu->arch.guest_context, 636 + (ulong)data32 << shift >> shift); 637 + break; 638 + case 8: 639 + data64 = *((u64 *)run->mmio.data); 640 + SET_RD(insn, &vcpu->arch.guest_context, 641 + (ulong)data64 << shift >> shift); 642 + break; 643 + default: 644 + return -EOPNOTSUPP; 645 + } 646 + 647 + done: 648 + /* Move to next instruction */ 649 + vcpu->arch.guest_context.sepc += vcpu->arch.mmio_decode.insn_len; 650 + 651 + return 0; 652 + } 653 + 654 + /* 655 + * Return > 0 to return to guest, < 0 on error, 0 (and set exit_reason) on 656 + * proper exit to userspace. 
657 + */ 658 + int kvm_riscv_vcpu_exit(struct kvm_vcpu *vcpu, struct kvm_run *run, 659 + struct kvm_cpu_trap *trap) 660 + { 661 + int ret; 662 + 663 + /* If we got host interrupt then do nothing */ 664 + if (trap->scause & CAUSE_IRQ_FLAG) 665 + return 1; 666 + 667 + /* Handle guest traps */ 668 + ret = -EFAULT; 669 + run->exit_reason = KVM_EXIT_UNKNOWN; 670 + switch (trap->scause) { 671 + case EXC_VIRTUAL_INST_FAULT: 672 + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) 673 + ret = virtual_inst_fault(vcpu, run, trap); 674 + break; 675 + case EXC_INST_GUEST_PAGE_FAULT: 676 + case EXC_LOAD_GUEST_PAGE_FAULT: 677 + case EXC_STORE_GUEST_PAGE_FAULT: 678 + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) 679 + ret = stage2_page_fault(vcpu, run, trap); 680 + break; 681 + case EXC_SUPERVISOR_SYSCALL: 682 + if (vcpu->arch.guest_context.hstatus & HSTATUS_SPV) 683 + ret = kvm_riscv_vcpu_sbi_ecall(vcpu, run); 684 + break; 685 + default: 686 + break; 687 + } 688 + 689 + /* Print details in-case of error */ 690 + if (ret < 0) { 691 + kvm_err("VCPU exit error %d\n", ret); 692 + kvm_err("SEPC=0x%lx SSTATUS=0x%lx HSTATUS=0x%lx\n", 693 + vcpu->arch.guest_context.sepc, 694 + vcpu->arch.guest_context.sstatus, 695 + vcpu->arch.guest_context.hstatus); 696 + kvm_err("SCAUSE=0x%lx STVAL=0x%lx HTVAL=0x%lx HTINST=0x%lx\n", 697 + trap->scause, trap->stval, trap->htval, trap->htinst); 698 + } 699 + 700 + return ret; 701 + }
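A few decode details in the hunk above are easy to check in isolation: `INSN_LEN()` follows the standard RISC-V length encoding (bits [1:0] == 0b11 means a 32-bit instruction, otherwise a compressed 16-bit one), `stage2_page_fault()` rebuilds the guest physical fault address from `htval` (bits [XLEN-1:2]) and the low two bits of `stval`, and the emulation paths reject accesses that are not naturally aligned. The standalone sketch below mirrors those three details; the helper names (`insn_len_of`, `gstage_fault_addr`, `mmio_misaligned`) are hypothetical, not kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* RISC-V length encoding: bits [1:0] == 0b11 -> 32-bit instruction,
 * otherwise (for the cases the handler sees) a 16-bit compressed one. */
static int insn_len_of(uint32_t insn)
{
	return ((insn & 0x3) == 0x3) ? 4 : 2;
}

/* Guest physical fault address: htval carries bits [XLEN-1:2] of the GPA,
 * stval supplies the low two bits (as in stage2_page_fault()). */
static uint64_t gstage_fault_addr(uint64_t htval, uint64_t stval)
{
	return (htval << 2) | (stval & 0x3);
}

/* MMIO accesses must be naturally aligned: len is a power of two, so
 * any set bit in fault_addr & (len - 1) means a misaligned access. */
static int mmio_misaligned(uint64_t fault_addr, int len)
{
	return (fault_addr & (uint64_t)(len - 1)) != 0;
}
```

For example, the `.word 0x6433c2f3` opcode used earlier for HLVX.HU has low bits 0b11 and so decodes as a 4-byte instruction, while `c.jr ra` (0x8082) decodes as 2 bytes.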
arch/riscv/kvm/vcpu_fp.c
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) 2021 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *     Atish Patra <atish.patra@wdc.com>
 *     Anup Patel <anup.patel@wdc.com>
 */

#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
#include <linux/uaccess.h>

#ifdef CONFIG_FPU
void kvm_riscv_vcpu_fp_reset(struct kvm_vcpu *vcpu)
{
	unsigned long isa = vcpu->arch.isa;
	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;

	cntx->sstatus &= ~SR_FS;
	if (riscv_isa_extension_available(&isa, f) ||
	    riscv_isa_extension_available(&isa, d))
		cntx->sstatus |= SR_FS_INITIAL;
	else
		cntx->sstatus |= SR_FS_OFF;
}

void kvm_riscv_vcpu_fp_clean(struct kvm_cpu_context *cntx)
{
	cntx->sstatus &= ~SR_FS;
	cntx->sstatus |= SR_FS_CLEAN;
}

void kvm_riscv_vcpu_guest_fp_save(struct kvm_cpu_context *cntx,
				  unsigned long isa)
{
	if ((cntx->sstatus & SR_FS) == SR_FS_DIRTY) {
		if (riscv_isa_extension_available(&isa, d))
			__kvm_riscv_fp_d_save(cntx);
		else if (riscv_isa_extension_available(&isa, f))
			__kvm_riscv_fp_f_save(cntx);
		kvm_riscv_vcpu_fp_clean(cntx);
	}
}

void kvm_riscv_vcpu_guest_fp_restore(struct kvm_cpu_context *cntx,
				     unsigned long isa)
{
	if ((cntx->sstatus & SR_FS) != SR_FS_OFF) {
		if (riscv_isa_extension_available(&isa, d))
			__kvm_riscv_fp_d_restore(cntx);
		else if (riscv_isa_extension_available(&isa, f))
			__kvm_riscv_fp_f_restore(cntx);
		kvm_riscv_vcpu_fp_clean(cntx);
	}
}

void kvm_riscv_vcpu_host_fp_save(struct kvm_cpu_context *cntx)
{
	/* No need to check host sstatus as it can be modified outside */
	if (riscv_isa_extension_available(NULL, d))
		__kvm_riscv_fp_d_save(cntx);
	else if (riscv_isa_extension_available(NULL, f))
		__kvm_riscv_fp_f_save(cntx);
}

void kvm_riscv_vcpu_host_fp_restore(struct kvm_cpu_context *cntx)
{
	if (riscv_isa_extension_available(NULL, d))
		__kvm_riscv_fp_d_restore(cntx);
	else if (riscv_isa_extension_available(NULL, f))
		__kvm_riscv_fp_f_restore(cntx);
}
#endif

int kvm_riscv_vcpu_get_reg_fp(struct kvm_vcpu *vcpu,
			      const struct kvm_one_reg *reg,
			      unsigned long rtype)
{
	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
	unsigned long isa = vcpu->arch.isa;
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    rtype);
	void *reg_val;

	if ((rtype == KVM_REG_RISCV_FP_F) &&
	    riscv_isa_extension_available(&isa, f)) {
		if (KVM_REG_SIZE(reg->id) != sizeof(u32))
			return -EINVAL;
		if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
			reg_val = &cntx->fp.f.fcsr;
		else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
			 reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
			reg_val = &cntx->fp.f.f[reg_num];
		else
			return -EINVAL;
	} else if ((rtype == KVM_REG_RISCV_FP_D) &&
		   riscv_isa_extension_available(&isa, d)) {
		if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
			if (KVM_REG_SIZE(reg->id) != sizeof(u32))
				return -EINVAL;
			reg_val = &cntx->fp.d.fcsr;
		} else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
			   reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
			if (KVM_REG_SIZE(reg->id) != sizeof(u64))
				return -EINVAL;
			reg_val = &cntx->fp.d.f[reg_num];
		} else
			return -EINVAL;
	} else
		return -EINVAL;

	if (copy_to_user(uaddr, reg_val, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	return 0;
}

int kvm_riscv_vcpu_set_reg_fp(struct kvm_vcpu *vcpu,
			      const struct kvm_one_reg *reg,
			      unsigned long rtype)
{
	struct kvm_cpu_context *cntx = &vcpu->arch.guest_context;
	unsigned long isa = vcpu->arch.isa;
	unsigned long __user *uaddr =
			(unsigned long __user *)(unsigned long)reg->addr;
	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
					    KVM_REG_SIZE_MASK |
					    rtype);
	void *reg_val;

	if ((rtype == KVM_REG_RISCV_FP_F) &&
	    riscv_isa_extension_available(&isa, f)) {
		if (KVM_REG_SIZE(reg->id) != sizeof(u32))
			return -EINVAL;
		if (reg_num == KVM_REG_RISCV_FP_F_REG(fcsr))
			reg_val = &cntx->fp.f.fcsr;
		else if ((KVM_REG_RISCV_FP_F_REG(f[0]) <= reg_num) &&
			 reg_num <= KVM_REG_RISCV_FP_F_REG(f[31]))
			reg_val = &cntx->fp.f.f[reg_num];
		else
			return -EINVAL;
	} else if ((rtype == KVM_REG_RISCV_FP_D) &&
		   riscv_isa_extension_available(&isa, d)) {
		if (reg_num == KVM_REG_RISCV_FP_D_REG(fcsr)) {
			if (KVM_REG_SIZE(reg->id) != sizeof(u32))
				return -EINVAL;
			reg_val = &cntx->fp.d.fcsr;
		} else if ((KVM_REG_RISCV_FP_D_REG(f[0]) <= reg_num) &&
			   reg_num <= KVM_REG_RISCV_FP_D_REG(f[31])) {
			if (KVM_REG_SIZE(reg->id) != sizeof(u64))
				return -EINVAL;
			reg_val = &cntx->fp.d.f[reg_num];
		} else
			return -EINVAL;
	} else
		return -EINVAL;

	if (copy_from_user(reg_val, uaddr, KVM_REG_SIZE(reg->id)))
		return -EFAULT;

	return 0;
}
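The lazy save/restore in vcpu_fp.c keys off the two-bit `sstatus.FS` field (bits [14:13], values Off/Initial/Clean/Dirty per the RISC-V privileged spec): guest FP registers are written back only when the guest actually dirtied them, and the field is dropped back to Clean afterwards. A standalone sketch of that state check, using the standard FS encodings; the helper names are hypothetical:

```c
#include <assert.h>

/* sstatus.FS encodings from the RISC-V privileged spec (bits [14:13]). */
#define SR_FS         0x6000UL
#define SR_FS_OFF     0x0000UL
#define SR_FS_INITIAL 0x2000UL
#define SR_FS_CLEAN   0x4000UL
#define SR_FS_DIRTY   0x6000UL

/* Mirrors the guard in kvm_riscv_vcpu_guest_fp_save(): only a dirty
 * FP unit needs its registers written back to memory. */
static int fp_needs_save(unsigned long sstatus)
{
	return (sstatus & SR_FS) == SR_FS_DIRTY;
}

/* Mirrors kvm_riscv_vcpu_fp_clean(): clear the FS field, then mark the
 * in-register FP state as matching what was saved. */
static unsigned long fp_clean(unsigned long sstatus)
{
	return (sstatus & ~SR_FS) | SR_FS_CLEAN;
}
```

This is why `kvm_riscv_vcpu_guest_fp_restore()` checks `!= SR_FS_OFF` rather than `== SR_FS_DIRTY`: any enabled state (Initial, Clean, or Dirty) means the guest may read FP registers, so they must hold guest values on entry.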
arch/riscv/kvm/vcpu_sbi.c
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *     Atish Patra <atish.patra@wdc.com>
 */

#include <linux/errno.h>
#include <linux/err.h>
#include <linux/kvm_host.h>
#include <asm/csr.h>
#include <asm/sbi.h>
#include <asm/kvm_vcpu_timer.h>

#define SBI_VERSION_MAJOR 0
#define SBI_VERSION_MINOR 1

static void kvm_riscv_vcpu_sbi_forward(struct kvm_vcpu *vcpu,
				       struct kvm_run *run)
{
	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;

	vcpu->arch.sbi_context.return_handled = 0;
	vcpu->stat.ecall_exit_stat++;
	run->exit_reason = KVM_EXIT_RISCV_SBI;
	run->riscv_sbi.extension_id = cp->a7;
	run->riscv_sbi.function_id = cp->a6;
	run->riscv_sbi.args[0] = cp->a0;
	run->riscv_sbi.args[1] = cp->a1;
	run->riscv_sbi.args[2] = cp->a2;
	run->riscv_sbi.args[3] = cp->a3;
	run->riscv_sbi.args[4] = cp->a4;
	run->riscv_sbi.args[5] = cp->a5;
	run->riscv_sbi.ret[0] = cp->a0;
	run->riscv_sbi.ret[1] = cp->a1;
}

int kvm_riscv_vcpu_sbi_return(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;

	/* Handle SBI return only once */
	if (vcpu->arch.sbi_context.return_handled)
		return 0;
	vcpu->arch.sbi_context.return_handled = 1;

	/* Update return values */
	cp->a0 = run->riscv_sbi.ret[0];
	cp->a1 = run->riscv_sbi.ret[1];

	/* Move to next instruction */
	vcpu->arch.guest_context.sepc += 4;

	return 0;
}

#ifdef CONFIG_RISCV_SBI_V01

static void kvm_sbi_system_shutdown(struct kvm_vcpu *vcpu,
				    struct kvm_run *run, u32 type)
{
	int i;
	struct kvm_vcpu *tmp;

	kvm_for_each_vcpu(i, tmp, vcpu->kvm)
		tmp->arch.power_off = true;
	kvm_make_all_cpus_request(vcpu->kvm, KVM_REQ_SLEEP);

	memset(&run->system_event, 0, sizeof(run->system_event));
	run->system_event.type = type;
	run->exit_reason = KVM_EXIT_SYSTEM_EVENT;
}

int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	ulong hmask;
	int i, ret = 1;
	u64 next_cycle;
	struct kvm_vcpu *rvcpu;
	bool next_sepc = true;
	struct cpumask cm, hm;
	struct kvm *kvm = vcpu->kvm;
	struct kvm_cpu_trap utrap = { 0 };
	struct kvm_cpu_context *cp = &vcpu->arch.guest_context;

	if (!cp)
		return -EINVAL;

	switch (cp->a7) {
	case SBI_EXT_0_1_CONSOLE_GETCHAR:
	case SBI_EXT_0_1_CONSOLE_PUTCHAR:
		/*
		 * The CONSOLE_GETCHAR/CONSOLE_PUTCHAR SBI calls cannot be
		 * handled in kernel so we forward these to user-space
		 */
		kvm_riscv_vcpu_sbi_forward(vcpu, run);
		next_sepc = false;
		ret = 0;
		break;
	case SBI_EXT_0_1_SET_TIMER:
#if __riscv_xlen == 32
		next_cycle = ((u64)cp->a1 << 32) | (u64)cp->a0;
#else
		next_cycle = (u64)cp->a0;
#endif
		kvm_riscv_vcpu_timer_next_event(vcpu, next_cycle);
		break;
	case SBI_EXT_0_1_CLEAR_IPI:
		kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_SOFT);
		break;
	case SBI_EXT_0_1_SEND_IPI:
		if (cp->a0)
			hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0,
							   &utrap);
		else
			hmask = (1UL << atomic_read(&kvm->online_vcpus)) - 1;
		if (utrap.scause) {
			utrap.sepc = cp->sepc;
			kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
			next_sepc = false;
			break;
		}
		for_each_set_bit(i, &hmask, BITS_PER_LONG) {
			rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i);
			kvm_riscv_vcpu_set_interrupt(rvcpu, IRQ_VS_SOFT);
		}
		break;
	case SBI_EXT_0_1_SHUTDOWN:
		kvm_sbi_system_shutdown(vcpu, run, KVM_SYSTEM_EVENT_SHUTDOWN);
		next_sepc = false;
		ret = 0;
		break;
	case SBI_EXT_0_1_REMOTE_FENCE_I:
	case SBI_EXT_0_1_REMOTE_SFENCE_VMA:
	case SBI_EXT_0_1_REMOTE_SFENCE_VMA_ASID:
		if (cp->a0)
			hmask = kvm_riscv_vcpu_unpriv_read(vcpu, false, cp->a0,
							   &utrap);
		else
			hmask = (1UL << atomic_read(&kvm->online_vcpus)) - 1;
		if (utrap.scause) {
			utrap.sepc = cp->sepc;
			kvm_riscv_vcpu_trap_redirect(vcpu, &utrap);
			next_sepc = false;
			break;
		}
		cpumask_clear(&cm);
		for_each_set_bit(i, &hmask, BITS_PER_LONG) {
			rvcpu = kvm_get_vcpu_by_id(vcpu->kvm, i);
			if (rvcpu->cpu < 0)
				continue;
			cpumask_set_cpu(rvcpu->cpu, &cm);
		}
		riscv_cpuid_to_hartid_mask(&cm, &hm);
		if (cp->a7 == SBI_EXT_0_1_REMOTE_FENCE_I)
			sbi_remote_fence_i(cpumask_bits(&hm));
		else if (cp->a7 == SBI_EXT_0_1_REMOTE_SFENCE_VMA)
			sbi_remote_hfence_vvma(cpumask_bits(&hm),
					       cp->a1, cp->a2);
		else
			sbi_remote_hfence_vvma_asid(cpumask_bits(&hm),
						    cp->a1, cp->a2, cp->a3);
		break;
	default:
		/* Return error for unsupported SBI calls */
		cp->a0 = SBI_ERR_NOT_SUPPORTED;
		break;
	}

	if (next_sepc)
		cp->sepc += 4;

	return ret;
}

#else

int kvm_riscv_vcpu_sbi_ecall(struct kvm_vcpu *vcpu, struct kvm_run *run)
{
	kvm_riscv_vcpu_sbi_forward(vcpu, run);
	return 0;
}

#endif
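Two arithmetic details of the SBI v0.1 handler above are worth pinning down: on RV32 the 64-bit `SET_TIMER` deadline arrives split across `a0` (low word) and `a1` (high word), and the IPI/fence calls fall back to a "all online VCPUs" bitmask of `(1UL << online_vcpus) - 1` when the guest passes a NULL hart-mask pointer. A standalone sketch (helper names are hypothetical, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* SBI_EXT_0_1_SET_TIMER on RV32: the 64-bit cycle deadline is passed as
 * two 32-bit halves, a0 = low word, a1 = high word. */
static uint64_t sbi_timer_cycles_rv32(uint32_t a0, uint32_t a1)
{
	return ((uint64_t)a1 << 32) | (uint64_t)a0;
}

/* SEND_IPI / REMOTE_FENCE_* with a NULL hart-mask pointer (a0 == 0):
 * target every online VCPU, one mask bit per VCPU id. */
static unsigned long all_vcpus_mask(unsigned int online_vcpus)
{
	return (1UL << online_vcpus) - 1;
}
```

Note that when the hart-mask pointer is non-NULL, the handler instead reads the mask word out of guest memory with `kvm_riscv_vcpu_unpriv_read()`, which is why a failed read redirects a page fault back into the guest.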
arch/riscv/kvm/vcpu_switch.S
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * Copyright (C) 2019 Western Digital Corporation or its affiliates.
 *
 * Authors:
 *     Anup Patel <anup.patel@wdc.com>
 */

#include <linux/linkage.h>
#include <asm/asm.h>
#include <asm/asm-offsets.h>
#include <asm/csr.h>

	.text
	.altmacro
	.option norelax

ENTRY(__kvm_riscv_switch_to)
	/* Save Host GPRs (except A0 and T0-T6) */
	REG_S	ra, (KVM_ARCH_HOST_RA)(a0)
	REG_S	sp, (KVM_ARCH_HOST_SP)(a0)
	REG_S	gp, (KVM_ARCH_HOST_GP)(a0)
	REG_S	tp, (KVM_ARCH_HOST_TP)(a0)
	REG_S	s0, (KVM_ARCH_HOST_S0)(a0)
	REG_S	s1, (KVM_ARCH_HOST_S1)(a0)
	REG_S	a1, (KVM_ARCH_HOST_A1)(a0)
	REG_S	a2, (KVM_ARCH_HOST_A2)(a0)
	REG_S	a3, (KVM_ARCH_HOST_A3)(a0)
	REG_S	a4, (KVM_ARCH_HOST_A4)(a0)
	REG_S	a5, (KVM_ARCH_HOST_A5)(a0)
	REG_S	a6, (KVM_ARCH_HOST_A6)(a0)
	REG_S	a7, (KVM_ARCH_HOST_A7)(a0)
	REG_S	s2, (KVM_ARCH_HOST_S2)(a0)
	REG_S	s3, (KVM_ARCH_HOST_S3)(a0)
	REG_S	s4, (KVM_ARCH_HOST_S4)(a0)
	REG_S	s5, (KVM_ARCH_HOST_S5)(a0)
	REG_S	s6, (KVM_ARCH_HOST_S6)(a0)
	REG_S	s7, (KVM_ARCH_HOST_S7)(a0)
	REG_S	s8, (KVM_ARCH_HOST_S8)(a0)
	REG_S	s9, (KVM_ARCH_HOST_S9)(a0)
	REG_S	s10, (KVM_ARCH_HOST_S10)(a0)
	REG_S	s11, (KVM_ARCH_HOST_S11)(a0)

	/* Save Host and Restore Guest SSTATUS */
	REG_L	t0, (KVM_ARCH_GUEST_SSTATUS)(a0)
	csrrw	t0, CSR_SSTATUS, t0
	REG_S	t0, (KVM_ARCH_HOST_SSTATUS)(a0)

	/* Save Host and Restore Guest HSTATUS */
	REG_L	t1, (KVM_ARCH_GUEST_HSTATUS)(a0)
	csrrw	t1, CSR_HSTATUS, t1
	REG_S	t1, (KVM_ARCH_HOST_HSTATUS)(a0)

	/* Save Host and Restore Guest SCOUNTEREN */
	REG_L	t2, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)
	csrrw	t2, CSR_SCOUNTEREN, t2
	REG_S	t2, (KVM_ARCH_HOST_SCOUNTEREN)(a0)

	/* Save Host SSCRATCH and change it to struct kvm_vcpu_arch pointer */
	csrrw	t3, CSR_SSCRATCH, a0
	REG_S	t3, (KVM_ARCH_HOST_SSCRATCH)(a0)

	/* Save Host STVEC and change it to return path */
	la	t4, __kvm_switch_return
	csrrw	t4, CSR_STVEC, t4
	REG_S	t4, (KVM_ARCH_HOST_STVEC)(a0)

	/* Restore Guest SEPC */
	REG_L	t0, (KVM_ARCH_GUEST_SEPC)(a0)
	csrw	CSR_SEPC, t0

	/* Restore Guest GPRs (except A0) */
	REG_L	ra, (KVM_ARCH_GUEST_RA)(a0)
	REG_L	sp, (KVM_ARCH_GUEST_SP)(a0)
	REG_L	gp, (KVM_ARCH_GUEST_GP)(a0)
	REG_L	tp, (KVM_ARCH_GUEST_TP)(a0)
	REG_L	t0, (KVM_ARCH_GUEST_T0)(a0)
	REG_L	t1, (KVM_ARCH_GUEST_T1)(a0)
	REG_L	t2, (KVM_ARCH_GUEST_T2)(a0)
	REG_L	s0, (KVM_ARCH_GUEST_S0)(a0)
	REG_L	s1, (KVM_ARCH_GUEST_S1)(a0)
	REG_L	a1, (KVM_ARCH_GUEST_A1)(a0)
	REG_L	a2, (KVM_ARCH_GUEST_A2)(a0)
	REG_L	a3, (KVM_ARCH_GUEST_A3)(a0)
	REG_L	a4, (KVM_ARCH_GUEST_A4)(a0)
	REG_L	a5, (KVM_ARCH_GUEST_A5)(a0)
	REG_L	a6, (KVM_ARCH_GUEST_A6)(a0)
	REG_L	a7, (KVM_ARCH_GUEST_A7)(a0)
	REG_L	s2, (KVM_ARCH_GUEST_S2)(a0)
	REG_L	s3, (KVM_ARCH_GUEST_S3)(a0)
	REG_L	s4, (KVM_ARCH_GUEST_S4)(a0)
	REG_L	s5, (KVM_ARCH_GUEST_S5)(a0)
	REG_L	s6, (KVM_ARCH_GUEST_S6)(a0)
	REG_L	s7, (KVM_ARCH_GUEST_S7)(a0)
	REG_L	s8, (KVM_ARCH_GUEST_S8)(a0)
	REG_L	s9, (KVM_ARCH_GUEST_S9)(a0)
	REG_L	s10, (KVM_ARCH_GUEST_S10)(a0)
	REG_L	s11, (KVM_ARCH_GUEST_S11)(a0)
	REG_L	t3, (KVM_ARCH_GUEST_T3)(a0)
	REG_L	t4, (KVM_ARCH_GUEST_T4)(a0)
	REG_L	t5, (KVM_ARCH_GUEST_T5)(a0)
	REG_L	t6, (KVM_ARCH_GUEST_T6)(a0)

	/* Restore Guest A0 */
	REG_L	a0, (KVM_ARCH_GUEST_A0)(a0)

	/* Resume Guest */
	sret

	/* Back to Host */
	.align 2
__kvm_switch_return:
	/* Swap Guest A0 with SSCRATCH */
	csrrw	a0, CSR_SSCRATCH, a0

	/* Save Guest GPRs (except A0) */
	REG_S	ra, (KVM_ARCH_GUEST_RA)(a0)
	REG_S	sp, (KVM_ARCH_GUEST_SP)(a0)
	REG_S	gp, (KVM_ARCH_GUEST_GP)(a0)
	REG_S	tp, (KVM_ARCH_GUEST_TP)(a0)
	REG_S	t0, (KVM_ARCH_GUEST_T0)(a0)
	REG_S	t1, (KVM_ARCH_GUEST_T1)(a0)
	REG_S	t2, (KVM_ARCH_GUEST_T2)(a0)
	REG_S	s0, (KVM_ARCH_GUEST_S0)(a0)
	REG_S	s1, (KVM_ARCH_GUEST_S1)(a0)
	REG_S	a1, (KVM_ARCH_GUEST_A1)(a0)
	REG_S	a2, (KVM_ARCH_GUEST_A2)(a0)
	REG_S	a3, (KVM_ARCH_GUEST_A3)(a0)
	REG_S	a4, (KVM_ARCH_GUEST_A4)(a0)
	REG_S	a5, (KVM_ARCH_GUEST_A5)(a0)
	REG_S	a6, (KVM_ARCH_GUEST_A6)(a0)
	REG_S	a7, (KVM_ARCH_GUEST_A7)(a0)
	REG_S	s2, (KVM_ARCH_GUEST_S2)(a0)
	REG_S	s3, (KVM_ARCH_GUEST_S3)(a0)
	REG_S	s4, (KVM_ARCH_GUEST_S4)(a0)
	REG_S	s5, (KVM_ARCH_GUEST_S5)(a0)
	REG_S	s6, (KVM_ARCH_GUEST_S6)(a0)
	REG_S	s7, (KVM_ARCH_GUEST_S7)(a0)
	REG_S	s8, (KVM_ARCH_GUEST_S8)(a0)
	REG_S	s9, (KVM_ARCH_GUEST_S9)(a0)
	REG_S	s10, (KVM_ARCH_GUEST_S10)(a0)
	REG_S	s11, (KVM_ARCH_GUEST_S11)(a0)
	REG_S	t3, (KVM_ARCH_GUEST_T3)(a0)
	REG_S	t4, (KVM_ARCH_GUEST_T4)(a0)
	REG_S	t5, (KVM_ARCH_GUEST_T5)(a0)
	REG_S	t6, (KVM_ARCH_GUEST_T6)(a0)

	/* Save Guest SEPC */
	csrr	t0, CSR_SEPC
	REG_S	t0, (KVM_ARCH_GUEST_SEPC)(a0)

	/* Restore Host STVEC */
	REG_L	t1, (KVM_ARCH_HOST_STVEC)(a0)
	csrw	CSR_STVEC, t1

	/* Save Guest A0 and Restore Host SSCRATCH */
	REG_L	t2, (KVM_ARCH_HOST_SSCRATCH)(a0)
	csrrw	t2, CSR_SSCRATCH, t2
	REG_S	t2, (KVM_ARCH_GUEST_A0)(a0)

	/* Save Guest and Restore Host SCOUNTEREN */
	REG_L	t3, (KVM_ARCH_HOST_SCOUNTEREN)(a0)
	csrrw	t3, CSR_SCOUNTEREN, t3
	REG_S	t3, (KVM_ARCH_GUEST_SCOUNTEREN)(a0)

	/* Save Guest and Restore Host HSTATUS */
	REG_L	t4, (KVM_ARCH_HOST_HSTATUS)(a0)
	csrrw	t4, CSR_HSTATUS, t4
	REG_S	t4, (KVM_ARCH_GUEST_HSTATUS)(a0)

	/* Save Guest and Restore Host SSTATUS */
	REG_L	t5, (KVM_ARCH_HOST_SSTATUS)(a0)
	csrrw	t5, CSR_SSTATUS, t5
	REG_S	t5, (KVM_ARCH_GUEST_SSTATUS)(a0)

	/* Restore Host GPRs (except A0 and T0-T6) */
	REG_L	ra, (KVM_ARCH_HOST_RA)(a0)
	REG_L	sp, (KVM_ARCH_HOST_SP)(a0)
	REG_L	gp, (KVM_ARCH_HOST_GP)(a0)
	REG_L	tp, (KVM_ARCH_HOST_TP)(a0)
	REG_L	s0, (KVM_ARCH_HOST_S0)(a0)
	REG_L	s1, (KVM_ARCH_HOST_S1)(a0)
	REG_L	a1, (KVM_ARCH_HOST_A1)(a0)
	REG_L	a2, (KVM_ARCH_HOST_A2)(a0)
	REG_L	a3, (KVM_ARCH_HOST_A3)(a0)
	REG_L	a4, (KVM_ARCH_HOST_A4)(a0)
	REG_L	a5, (KVM_ARCH_HOST_A5)(a0)
	REG_L	a6, (KVM_ARCH_HOST_A6)(a0)
	REG_L	a7, (KVM_ARCH_HOST_A7)(a0)
	REG_L	s2, (KVM_ARCH_HOST_S2)(a0)
	REG_L	s3, (KVM_ARCH_HOST_S3)(a0)
	REG_L	s4, (KVM_ARCH_HOST_S4)(a0)
	REG_L	s5, (KVM_ARCH_HOST_S5)(a0)
	REG_L	s6, (KVM_ARCH_HOST_S6)(a0)
	REG_L	s7, (KVM_ARCH_HOST_S7)(a0)
	REG_L	s8, (KVM_ARCH_HOST_S8)(a0)
	REG_L	s9, (KVM_ARCH_HOST_S9)(a0)
	REG_L	s10, (KVM_ARCH_HOST_S10)(a0)
	REG_L	s11, (KVM_ARCH_HOST_S11)(a0)

	/* Return to C code */
	ret
ENDPROC(__kvm_riscv_switch_to)

ENTRY(__kvm_riscv_unpriv_trap)
	/*
	 * We assume that faulting unpriv load/store instruction is
	 * 4-byte long and blindly increment SEPC by 4.
	 *
	 * The trap details will be saved at address pointed by 'A0'
	 * register and we use 'A1' register as temporary.
	 */
	csrr	a1, CSR_SEPC
	REG_S	a1, (KVM_ARCH_TRAP_SEPC)(a0)
	addi	a1, a1, 4
	csrw	CSR_SEPC, a1
	csrr	a1, CSR_SCAUSE
	REG_S	a1, (KVM_ARCH_TRAP_SCAUSE)(a0)
	csrr	a1, CSR_STVAL
	REG_S	a1, (KVM_ARCH_TRAP_STVAL)(a0)
	csrr	a1, CSR_HTVAL
	REG_S	a1, (KVM_ARCH_TRAP_HTVAL)(a0)
	csrr	a1, CSR_HTINST
	REG_S	a1, (KVM_ARCH_TRAP_HTINST)(a0)
	sret
ENDPROC(__kvm_riscv_unpriv_trap)

#ifdef CONFIG_FPU
	.align 3
	.global __kvm_riscv_fp_f_save
__kvm_riscv_fp_f_save:
	csrr	t2, CSR_SSTATUS
	li	t1, SR_FS
	csrs	CSR_SSTATUS, t1
	frcsr	t0
	fsw	f0, KVM_ARCH_FP_F_F0(a0)
	fsw	f1, KVM_ARCH_FP_F_F1(a0)
	fsw	f2, KVM_ARCH_FP_F_F2(a0)
	fsw	f3, KVM_ARCH_FP_F_F3(a0)
	fsw	f4, KVM_ARCH_FP_F_F4(a0)
	fsw	f5, KVM_ARCH_FP_F_F5(a0)
	fsw	f6, KVM_ARCH_FP_F_F6(a0)
	fsw	f7, KVM_ARCH_FP_F_F7(a0)
	fsw	f8, KVM_ARCH_FP_F_F8(a0)
	fsw	f9, KVM_ARCH_FP_F_F9(a0)
	fsw	f10, KVM_ARCH_FP_F_F10(a0)
	fsw	f11, KVM_ARCH_FP_F_F11(a0)
	fsw	f12, KVM_ARCH_FP_F_F12(a0)
	fsw	f13, KVM_ARCH_FP_F_F13(a0)
	fsw	f14, KVM_ARCH_FP_F_F14(a0)
	fsw	f15, KVM_ARCH_FP_F_F15(a0)
	fsw	f16, KVM_ARCH_FP_F_F16(a0)
	fsw	f17, KVM_ARCH_FP_F_F17(a0)
	fsw	f18, KVM_ARCH_FP_F_F18(a0)
	fsw	f19, KVM_ARCH_FP_F_F19(a0)
	fsw	f20, KVM_ARCH_FP_F_F20(a0)
	fsw	f21, KVM_ARCH_FP_F_F21(a0)
	fsw	f22, KVM_ARCH_FP_F_F22(a0)
	fsw	f23, KVM_ARCH_FP_F_F23(a0)
	fsw	f24, KVM_ARCH_FP_F_F24(a0)
	fsw	f25, KVM_ARCH_FP_F_F25(a0)
	fsw	f26, KVM_ARCH_FP_F_F26(a0)
	fsw	f27, KVM_ARCH_FP_F_F27(a0)
	fsw	f28, KVM_ARCH_FP_F_F28(a0)
	fsw	f29, KVM_ARCH_FP_F_F29(a0)
	fsw	f30, KVM_ARCH_FP_F_F30(a0)
	fsw	f31, KVM_ARCH_FP_F_F31(a0)
	sw	t0, KVM_ARCH_FP_F_FCSR(a0)
	csrw	CSR_SSTATUS, t2
	ret

	.align 3
	.global __kvm_riscv_fp_d_save
__kvm_riscv_fp_d_save:
	csrr	t2, CSR_SSTATUS
	li	t1, SR_FS
	csrs	CSR_SSTATUS, t1
	frcsr	t0
	fsd	f0, KVM_ARCH_FP_D_F0(a0)
	fsd	f1, KVM_ARCH_FP_D_F1(a0)
	fsd	f2, KVM_ARCH_FP_D_F2(a0)
	fsd	f3, KVM_ARCH_FP_D_F3(a0)
	fsd	f4, KVM_ARCH_FP_D_F4(a0)
	fsd	f5, KVM_ARCH_FP_D_F5(a0)
	fsd	f6, KVM_ARCH_FP_D_F6(a0)
	fsd	f7, KVM_ARCH_FP_D_F7(a0)
	fsd	f8, KVM_ARCH_FP_D_F8(a0)
	fsd	f9, KVM_ARCH_FP_D_F9(a0)
	fsd	f10, KVM_ARCH_FP_D_F10(a0)
	fsd	f11, KVM_ARCH_FP_D_F11(a0)
	fsd	f12, KVM_ARCH_FP_D_F12(a0)
	fsd	f13, KVM_ARCH_FP_D_F13(a0)
	fsd	f14, KVM_ARCH_FP_D_F14(a0)
	fsd	f15, KVM_ARCH_FP_D_F15(a0)
	fsd	f16, KVM_ARCH_FP_D_F16(a0)
	fsd	f17, KVM_ARCH_FP_D_F17(a0)
	fsd	f18, KVM_ARCH_FP_D_F18(a0)
	fsd	f19, KVM_ARCH_FP_D_F19(a0)
	fsd	f20, KVM_ARCH_FP_D_F20(a0)
	fsd	f21, KVM_ARCH_FP_D_F21(a0)
	fsd	f22, KVM_ARCH_FP_D_F22(a0)
	fsd	f23, KVM_ARCH_FP_D_F23(a0)
	fsd	f24, KVM_ARCH_FP_D_F24(a0)
	fsd	f25, KVM_ARCH_FP_D_F25(a0)
	fsd	f26, KVM_ARCH_FP_D_F26(a0)
	fsd	f27, KVM_ARCH_FP_D_F27(a0)
	fsd	f28, KVM_ARCH_FP_D_F28(a0)
	fsd	f29, KVM_ARCH_FP_D_F29(a0)
	fsd	f30, KVM_ARCH_FP_D_F30(a0)
	fsd	f31, KVM_ARCH_FP_D_F31(a0)
	sw	t0, KVM_ARCH_FP_D_FCSR(a0)
	csrw	CSR_SSTATUS, t2
	ret

	.align 3
	.global __kvm_riscv_fp_f_restore
__kvm_riscv_fp_f_restore:
	csrr	t2, CSR_SSTATUS
	li	t1, SR_FS
	lw	t0, KVM_ARCH_FP_F_FCSR(a0)
	csrs	CSR_SSTATUS, t1
	flw	f0, KVM_ARCH_FP_F_F0(a0)
	flw	f1, KVM_ARCH_FP_F_F1(a0)
	flw	f2, KVM_ARCH_FP_F_F2(a0)
	flw	f3, KVM_ARCH_FP_F_F3(a0)
	flw	f4, KVM_ARCH_FP_F_F4(a0)
	flw	f5, KVM_ARCH_FP_F_F5(a0)
	flw	f6, KVM_ARCH_FP_F_F6(a0)
	flw	f7, KVM_ARCH_FP_F_F7(a0)
	flw	f8, KVM_ARCH_FP_F_F8(a0)
	flw	f9, KVM_ARCH_FP_F_F9(a0)
	flw	f10, KVM_ARCH_FP_F_F10(a0)
	flw	f11, KVM_ARCH_FP_F_F11(a0)
	flw	f12, KVM_ARCH_FP_F_F12(a0)
	flw	f13, KVM_ARCH_FP_F_F13(a0)
	flw	f14, KVM_ARCH_FP_F_F14(a0)
	flw	f15, KVM_ARCH_FP_F_F15(a0)
	flw	f16, KVM_ARCH_FP_F_F16(a0)
	flw	f17, KVM_ARCH_FP_F_F17(a0)
	flw	f18, KVM_ARCH_FP_F_F18(a0)
	flw	f19, KVM_ARCH_FP_F_F19(a0)
	flw	f20, KVM_ARCH_FP_F_F20(a0)
	flw	f21, KVM_ARCH_FP_F_F21(a0)
	flw	f22, KVM_ARCH_FP_F_F22(a0)
	flw	f23, KVM_ARCH_FP_F_F23(a0)
	flw	f24, KVM_ARCH_FP_F_F24(a0)
	flw	f25, KVM_ARCH_FP_F_F25(a0)
	flw	f26, KVM_ARCH_FP_F_F26(a0)
	flw	f27, KVM_ARCH_FP_F_F27(a0)
	flw	f28, KVM_ARCH_FP_F_F28(a0)
	flw	f29, KVM_ARCH_FP_F_F29(a0)
	flw	f30, KVM_ARCH_FP_F_F30(a0)
	flw	f31, KVM_ARCH_FP_F_F31(a0)
	fscsr	t0
	csrw	CSR_SSTATUS, t2
	ret

	.align 3
	.global __kvm_riscv_fp_d_restore
__kvm_riscv_fp_d_restore:
	csrr	t2, CSR_SSTATUS
	li	t1, SR_FS
	lw	t0, KVM_ARCH_FP_D_FCSR(a0)
	csrs	CSR_SSTATUS, t1
	fld	f0, KVM_ARCH_FP_D_F0(a0)
	fld	f1, KVM_ARCH_FP_D_F1(a0)
	fld	f2, KVM_ARCH_FP_D_F2(a0)
	fld	f3, KVM_ARCH_FP_D_F3(a0)
	fld	f4, KVM_ARCH_FP_D_F4(a0)
	fld	f5, KVM_ARCH_FP_D_F5(a0)
	fld	f6, KVM_ARCH_FP_D_F6(a0)
	fld	f7, KVM_ARCH_FP_D_F7(a0)
	fld	f8, KVM_ARCH_FP_D_F8(a0)
	fld	f9, KVM_ARCH_FP_D_F9(a0)
	fld	f10, KVM_ARCH_FP_D_F10(a0)
	fld	f11, KVM_ARCH_FP_D_F11(a0)
	fld	f12, KVM_ARCH_FP_D_F12(a0)
	fld	f13, KVM_ARCH_FP_D_F13(a0)
	fld	f14, KVM_ARCH_FP_D_F14(a0)
	fld	f15, KVM_ARCH_FP_D_F15(a0)
	fld	f16, KVM_ARCH_FP_D_F16(a0)
	fld	f17, KVM_ARCH_FP_D_F17(a0)
	fld	f18, KVM_ARCH_FP_D_F18(a0)
	fld	f19, KVM_ARCH_FP_D_F19(a0)
	fld	f20, KVM_ARCH_FP_D_F20(a0)
	fld	f21, KVM_ARCH_FP_D_F21(a0)
	fld	f22, KVM_ARCH_FP_D_F22(a0)
	fld	f23, KVM_ARCH_FP_D_F23(a0)
	fld	f24, KVM_ARCH_FP_D_F24(a0)
	fld	f25, KVM_ARCH_FP_D_F25(a0)
	fld	f26, KVM_ARCH_FP_D_F26(a0)
	fld	f27, KVM_ARCH_FP_D_F27(a0)
	fld	f28, KVM_ARCH_FP_D_F28(a0)
	fld	f29, KVM_ARCH_FP_D_F29(a0)
395 + fld f30, KVM_ARCH_FP_D_F30(a0) 396 + fld f31, KVM_ARCH_FP_D_F31(a0) 397 + fscsr t0 398 + csrw CSR_SSTATUS, t2 399 + ret 400 + #endif
+225
arch/riscv/kvm/vcpu_timer.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Atish Patra <atish.patra@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <linux/uaccess.h>
+#include <clocksource/timer-riscv.h>
+#include <asm/csr.h>
+#include <asm/delay.h>
+#include <asm/kvm_vcpu_timer.h>
+
+static u64 kvm_riscv_current_cycles(struct kvm_guest_timer *gt)
+{
+	return get_cycles64() + gt->time_delta;
+}
+
+static u64 kvm_riscv_delta_cycles2ns(u64 cycles,
+				     struct kvm_guest_timer *gt,
+				     struct kvm_vcpu_timer *t)
+{
+	unsigned long flags;
+	u64 cycles_now, cycles_delta, delta_ns;
+
+	local_irq_save(flags);
+	cycles_now = kvm_riscv_current_cycles(gt);
+	if (cycles_now < cycles)
+		cycles_delta = cycles - cycles_now;
+	else
+		cycles_delta = 0;
+	delta_ns = (cycles_delta * gt->nsec_mult) >> gt->nsec_shift;
+	local_irq_restore(flags);
+
+	return delta_ns;
+}
+
+static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
+{
+	u64 delta_ns;
+	struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
+	struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
+	struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+
+	if (kvm_riscv_current_cycles(gt) < t->next_cycles) {
+		delta_ns = kvm_riscv_delta_cycles2ns(t->next_cycles, gt, t);
+		hrtimer_forward_now(&t->hrt, ktime_set(0, delta_ns));
+		return HRTIMER_RESTART;
+	}
+
+	t->next_set = false;
+	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_VS_TIMER);
+
+	return HRTIMER_NORESTART;
+}
+
+static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
+{
+	if (!t->init_done || !t->next_set)
+		return -EINVAL;
+
+	hrtimer_cancel(&t->hrt);
+	t->next_set = false;
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu, u64 ncycles)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+	struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+	u64 delta_ns;
+
+	if (!t->init_done)
+		return -EINVAL;
+
+	kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_VS_TIMER);
+
+	delta_ns = kvm_riscv_delta_cycles2ns(ncycles, gt, t);
+	t->next_cycles = ncycles;
+	hrtimer_start(&t->hrt, ktime_set(0, delta_ns), HRTIMER_MODE_REL);
+	t->next_set = true;
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_get_reg_timer(struct kvm_vcpu *vcpu,
+				 const struct kvm_one_reg *reg)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+	struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+	u64 __user *uaddr = (u64 __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_TIMER);
+	u64 reg_val;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+		return -EINVAL;
+	if (reg_num >= sizeof(struct kvm_riscv_timer) / sizeof(u64))
+		return -EINVAL;
+
+	switch (reg_num) {
+	case KVM_REG_RISCV_TIMER_REG(frequency):
+		reg_val = riscv_timebase;
+		break;
+	case KVM_REG_RISCV_TIMER_REG(time):
+		reg_val = kvm_riscv_current_cycles(gt);
+		break;
+	case KVM_REG_RISCV_TIMER_REG(compare):
+		reg_val = t->next_cycles;
+		break;
+	case KVM_REG_RISCV_TIMER_REG(state):
+		reg_val = (t->next_set) ? KVM_RISCV_TIMER_STATE_ON :
+					  KVM_RISCV_TIMER_STATE_OFF;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	if (copy_to_user(uaddr, &reg_val, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_set_reg_timer(struct kvm_vcpu *vcpu,
+				 const struct kvm_one_reg *reg)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+	struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+	u64 __user *uaddr = (u64 __user *)(unsigned long)reg->addr;
+	unsigned long reg_num = reg->id & ~(KVM_REG_ARCH_MASK |
+					    KVM_REG_SIZE_MASK |
+					    KVM_REG_RISCV_TIMER);
+	u64 reg_val;
+	int ret = 0;
+
+	if (KVM_REG_SIZE(reg->id) != sizeof(u64))
+		return -EINVAL;
+	if (reg_num >= sizeof(struct kvm_riscv_timer) / sizeof(u64))
+		return -EINVAL;
+
+	if (copy_from_user(&reg_val, uaddr, KVM_REG_SIZE(reg->id)))
+		return -EFAULT;
+
+	switch (reg_num) {
+	case KVM_REG_RISCV_TIMER_REG(frequency):
+		ret = -EOPNOTSUPP;
+		break;
+	case KVM_REG_RISCV_TIMER_REG(time):
+		gt->time_delta = reg_val - get_cycles64();
+		break;
+	case KVM_REG_RISCV_TIMER_REG(compare):
+		t->next_cycles = reg_val;
+		break;
+	case KVM_REG_RISCV_TIMER_REG(state):
+		if (reg_val == KVM_RISCV_TIMER_STATE_ON)
+			ret = kvm_riscv_vcpu_timer_next_event(vcpu, reg_val);
+		else
+			ret = kvm_riscv_vcpu_timer_cancel(t);
+		break;
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	return ret;
+}
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+
+	if (t->init_done)
+		return -EINVAL;
+
+	hrtimer_init(&t->hrt, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	t->hrt.function = kvm_riscv_vcpu_hrtimer_expired;
+	t->init_done = true;
+	t->next_set = false;
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	ret = kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+	vcpu->arch.timer.init_done = false;
+
+	return ret;
+}
+
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu)
+{
+	return kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+}
+
+void kvm_riscv_vcpu_timer_restore(struct kvm_vcpu *vcpu)
+{
+	struct kvm_guest_timer *gt = &vcpu->kvm->arch.timer;
+
+#ifdef CONFIG_64BIT
+	csr_write(CSR_HTIMEDELTA, gt->time_delta);
+#else
+	csr_write(CSR_HTIMEDELTA, (u32)(gt->time_delta));
+	csr_write(CSR_HTIMEDELTAH, (u32)(gt->time_delta >> 32));
+#endif
+}
+
+int kvm_riscv_guest_timer_init(struct kvm *kvm)
+{
+	struct kvm_guest_timer *gt = &kvm->arch.timer;
+
+	riscv_cs_get_mult_shift(&gt->nsec_mult, &gt->nsec_shift);
+	gt->time_delta = -get_cycles64();
+
+	return 0;
+}
+97
arch/riscv/kvm/vm.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/uaccess.h>
+#include <linux/kvm_host.h>
+
+const struct _kvm_stats_desc kvm_vm_stats_desc[] = {
+	KVM_GENERIC_VM_STATS()
+};
+static_assert(ARRAY_SIZE(kvm_vm_stats_desc) ==
+	      sizeof(struct kvm_vm_stat) / sizeof(u64));
+
+const struct kvm_stats_header kvm_vm_stats_header = {
+	.name_size = KVM_STATS_NAME_SIZE,
+	.num_desc = ARRAY_SIZE(kvm_vm_stats_desc),
+	.id_offset = sizeof(struct kvm_stats_header),
+	.desc_offset = sizeof(struct kvm_stats_header) + KVM_STATS_NAME_SIZE,
+	.data_offset = sizeof(struct kvm_stats_header) + KVM_STATS_NAME_SIZE +
+		       sizeof(kvm_vm_stats_desc),
+};
+
+int kvm_arch_init_vm(struct kvm *kvm, unsigned long type)
+{
+	int r;
+
+	r = kvm_riscv_stage2_alloc_pgd(kvm);
+	if (r)
+		return r;
+
+	r = kvm_riscv_stage2_vmid_init(kvm);
+	if (r) {
+		kvm_riscv_stage2_free_pgd(kvm);
+		return r;
+	}
+
+	return kvm_riscv_guest_timer_init(kvm);
+}
+
+void kvm_arch_destroy_vm(struct kvm *kvm)
+{
+	int i;
+
+	for (i = 0; i < KVM_MAX_VCPUS; ++i) {
+		if (kvm->vcpus[i]) {
+			kvm_vcpu_destroy(kvm->vcpus[i]);
+			kvm->vcpus[i] = NULL;
+		}
+	}
+	atomic_set(&kvm->online_vcpus, 0);
+}
+
+int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
+{
+	int r;
+
+	switch (ext) {
+	case KVM_CAP_IOEVENTFD:
+	case KVM_CAP_DEVICE_CTRL:
+	case KVM_CAP_USER_MEMORY:
+	case KVM_CAP_SYNC_MMU:
+	case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
+	case KVM_CAP_ONE_REG:
+	case KVM_CAP_READONLY_MEM:
+	case KVM_CAP_MP_STATE:
+	case KVM_CAP_IMMEDIATE_EXIT:
+		r = 1;
+		break;
+	case KVM_CAP_NR_VCPUS:
+		r = num_online_cpus();
+		break;
+	case KVM_CAP_MAX_VCPUS:
+		r = KVM_MAX_VCPUS;
+		break;
+	case KVM_CAP_NR_MEMSLOTS:
+		r = KVM_USER_MEM_SLOTS;
+		break;
+	default:
+		r = 0;
+		break;
+	}
+
+	return r;
+}
+
+long kvm_arch_vm_ioctl(struct file *filp,
+		       unsigned int ioctl, unsigned long arg)
+{
+	return -EINVAL;
+}
+120
arch/riscv/kvm/vmid.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Anup Patel <anup.patel@wdc.com>
+ */
+
+#include <linux/bitops.h>
+#include <linux/cpumask.h>
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/module.h>
+#include <linux/kvm_host.h>
+#include <asm/csr.h>
+#include <asm/sbi.h>
+
+static unsigned long vmid_version = 1;
+static unsigned long vmid_next;
+static unsigned long vmid_bits;
+static DEFINE_SPINLOCK(vmid_lock);
+
+void kvm_riscv_stage2_vmid_detect(void)
+{
+	unsigned long old;
+
+	/* Figure-out number of VMID bits in HW */
+	old = csr_read(CSR_HGATP);
+	csr_write(CSR_HGATP, old | HGATP_VMID_MASK);
+	vmid_bits = csr_read(CSR_HGATP);
+	vmid_bits = (vmid_bits & HGATP_VMID_MASK) >> HGATP_VMID_SHIFT;
+	vmid_bits = fls_long(vmid_bits);
+	csr_write(CSR_HGATP, old);
+
+	/* We polluted local TLB so flush all guest TLB */
+	__kvm_riscv_hfence_gvma_all();
+
+	/* We don't use VMID bits if they are not sufficient */
+	if ((1UL << vmid_bits) < num_possible_cpus())
+		vmid_bits = 0;
+}
+
+unsigned long kvm_riscv_stage2_vmid_bits(void)
+{
+	return vmid_bits;
+}
+
+int kvm_riscv_stage2_vmid_init(struct kvm *kvm)
+{
+	/* Mark the initial VMID and VMID version invalid */
+	kvm->arch.vmid.vmid_version = 0;
+	kvm->arch.vmid.vmid = 0;
+
+	return 0;
+}
+
+bool kvm_riscv_stage2_vmid_ver_changed(struct kvm_vmid *vmid)
+{
+	if (!vmid_bits)
+		return false;
+
+	return unlikely(READ_ONCE(vmid->vmid_version) !=
+			READ_ONCE(vmid_version));
+}
+
+void kvm_riscv_stage2_vmid_update(struct kvm_vcpu *vcpu)
+{
+	int i;
+	struct kvm_vcpu *v;
+	struct cpumask hmask;
+	struct kvm_vmid *vmid = &vcpu->kvm->arch.vmid;
+
+	if (!kvm_riscv_stage2_vmid_ver_changed(vmid))
+		return;
+
+	spin_lock(&vmid_lock);
+
+	/*
+	 * We need to re-check the vmid_version here in case another
+	 * vcpu already allocated a valid vmid for this vm.
+	 */
+	if (!kvm_riscv_stage2_vmid_ver_changed(vmid)) {
+		spin_unlock(&vmid_lock);
+		return;
+	}
+
+	/* First user of a new VMID version? */
+	if (unlikely(vmid_next == 0)) {
+		WRITE_ONCE(vmid_version, READ_ONCE(vmid_version) + 1);
+		vmid_next = 1;
+
+		/*
+		 * We ran out of VMIDs, so we increment vmid_version and
+		 * start assigning VMIDs from 1.
+		 *
+		 * This also means the existing VMID assignments for all
+		 * Guest instances are invalid and we have to force VMID
+		 * re-assignment for all Guest instances. The Guest
+		 * instances that were not running will automatically
+		 * pick up new VMIDs because they will call
+		 * kvm_riscv_stage2_vmid_update() whenever they enter the
+		 * in-kernel run loop. For Guest instances that are already
+		 * running, we force VM exits on all host CPUs using IPI and
+		 * flush all Guest TLBs.
+		 */
+		riscv_cpuid_to_hartid_mask(cpu_online_mask, &hmask);
+		sbi_remote_hfence_gvma(cpumask_bits(&hmask), 0, 0);
+	}
+
+	vmid->vmid = vmid_next;
+	vmid_next++;
+	vmid_next &= (1 << vmid_bits) - 1;
+
+	WRITE_ONCE(vmid->vmid_version, READ_ONCE(vmid_version));
+
+	spin_unlock(&vmid_lock);
+
+	/* Request stage2 page table update for all VCPUs */
+	kvm_for_each_vcpu(i, v, vcpu->kvm)
+		kvm_make_request(KVM_REQ_UPDATE_HGATP, v);
+}
+6 -3
arch/s390/include/asm/pgtable.h
···
	pte_t res;

	res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
+	/* At this point the reference through the mapping is still present */
	if (mm_is_protected(mm) && pte_present(res))
-		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+		uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
	return res;
}
···
	pte_t res;

	res = ptep_xchg_direct(vma->vm_mm, addr, ptep, __pte(_PAGE_INVALID));
+	/* At this point the reference through the mapping is still present */
	if (mm_is_protected(vma->vm_mm) && pte_present(res))
-		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+		uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
	return res;
}
···
	} else {
		res = ptep_xchg_lazy(mm, addr, ptep, __pte(_PAGE_INVALID));
	}
+	/* At this point the reference through the mapping is still present */
	if (mm_is_protected(mm) && pte_present(res))
-		uv_convert_from_secure(pte_val(res) & PAGE_MASK);
+		uv_convert_owned_from_secure(pte_val(res) & PAGE_MASK);
	return res;
}
+13 -2
arch/s390/include/asm/uv.h
···
#include <asm/page.h>
#include <asm/gmap.h>

+#define UVC_CC_OK	0
+#define UVC_CC_ERROR	1
+#define UVC_CC_BUSY	2
+#define UVC_CC_PARTIAL	3
+
#define UVC_RC_EXECUTED		0x0001
#define UVC_RC_INV_CMD		0x0002
#define UVC_RC_INV_STATE	0x0003
···
}

int gmap_make_secure(struct gmap *gmap, unsigned long gaddr, void *uvcb);
-int uv_destroy_page(unsigned long paddr);
+int uv_destroy_owned_page(unsigned long paddr);
int uv_convert_from_secure(unsigned long paddr);
+int uv_convert_owned_from_secure(unsigned long paddr);
int gmap_convert_to_secure(struct gmap *gmap, unsigned long gaddr);

void setup_uv(void);
···
#define is_prot_virt_host() 0
static inline void setup_uv(void) {}

-static inline int uv_destroy_page(unsigned long paddr)
+static inline int uv_destroy_owned_page(unsigned long paddr)
{
	return 0;
}

static inline int uv_convert_from_secure(unsigned long paddr)
+{
+	return 0;
+}
+
+static inline int uv_convert_owned_from_secure(unsigned long paddr)
{
	return 0;
}
+57 -8
arch/s390/kernel/uv.c
···
 *
 * @paddr: Absolute host address of page to be destroyed
 */
-int uv_destroy_page(unsigned long paddr)
+static int uv_destroy_page(unsigned long paddr)
{
	struct uv_cb_cfs uvcb = {
		.header.cmd = UVC_CMD_DESTR_SEC_STOR,
···
}

/*
+ * The caller must already hold a reference to the page
+ */
+int uv_destroy_owned_page(unsigned long paddr)
+{
+	struct page *page = phys_to_page(paddr);
+	int rc;
+
+	get_page(page);
+	rc = uv_destroy_page(paddr);
+	if (!rc)
+		clear_bit(PG_arch_1, &page->flags);
+	put_page(page);
+	return rc;
+}
+
+/*
 * Requests the Ultravisor to encrypt a guest page and make it
 * accessible to the host for paging (export).
 *
···
	if (uv_call(0, (u64)&uvcb))
		return -EINVAL;
	return 0;
+}
+
+/*
+ * The caller must already hold a reference to the page
+ */
+int uv_convert_owned_from_secure(unsigned long paddr)
+{
+	struct page *page = phys_to_page(paddr);
+	int rc;
+
+	get_page(page);
+	rc = uv_convert_from_secure(paddr);
+	if (!rc)
+		clear_bit(PG_arch_1, &page->flags);
+	put_page(page);
+	return rc;
}

/*
···
{
	pte_t entry = READ_ONCE(*ptep);
	struct page *page;
-	int expected, rc = 0;
+	int expected, cc = 0;

	if (!pte_present(entry))
		return -ENXIO;
···
	if (!page_ref_freeze(page, expected))
		return -EBUSY;
	set_bit(PG_arch_1, &page->flags);
-	rc = uv_call(0, (u64)uvcb);
+	/*
+	 * If the UVC does not succeed or fail immediately, we don't want to
+	 * loop for long, or we might get stall notifications.
+	 * On the other hand, this is a complex scenario and we are holding a lot of
+	 * locks, so we can't easily sleep and reschedule. We try only once,
+	 * and if the UVC returned busy or partial completion, we return
+	 * -EAGAIN and we let the callers deal with it.
+	 */
+	cc = __uv_call(0, (u64)uvcb);
	page_ref_unfreeze(page, expected);
-	/* Return -ENXIO if the page was not mapped, -EINVAL otherwise */
-	if (rc)
-		rc = uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
-	return rc;
+	/*
+	 * Return -ENXIO if the page was not mapped, -EINVAL for other errors.
+	 * If busy or partially completed, return -EAGAIN.
+	 */
+	if (cc == UVC_CC_OK)
+		return 0;
+	else if (cc == UVC_CC_BUSY || cc == UVC_CC_PARTIAL)
+		return -EAGAIN;
+	return uvcb->rc == 0x10a ? -ENXIO : -EINVAL;
}

/*
···
	uaddr = __gmap_translate(gmap, gaddr);
	if (IS_ERR_VALUE(uaddr))
		goto out;
-	vma = find_vma(gmap->mm, uaddr);
+	vma = vma_lookup(gmap->mm, uaddr);
	if (!vma)
		goto out;
	/*
···
	mmap_read_unlock(gmap->mm);

	if (rc == -EAGAIN) {
+		/*
+		 * If we are here because the UVC returned busy or partial
+		 * completion, this is just a useless check, but it is safe.
+		 */
		wait_on_page_writeback(page);
	} else if (rc == -EBUSY) {
		/*
+5
arch/s390/kvm/intercept.c
···
 */
	if (rc == -EINVAL)
		return 0;
+	/*
+	 * If we got -EAGAIN here, we simply return it. It will eventually
+	 * get propagated all the way to userspace, which should then try
+	 * again.
+	 */
	return rc;
}
+4 -3
arch/s390/kvm/kvm-s390.c
···
	case KVM_S390_PV_COMMAND: {
		struct kvm_pv_cmd args;

-		/* protvirt means user sigp */
-		kvm->arch.user_cpu_state_ctrl = 1;
+		/* protvirt means user cpu state */
+		kvm_s390_set_user_cpu_state_ctrl(kvm);
		r = 0;
		if (!is_prot_virt_host()) {
			r = -EINVAL;
···
	vcpu_load(vcpu);

	/* user space knows about this interface - let it control the state */
-	vcpu->kvm->arch.user_cpu_state_ctrl = 1;
+	kvm_s390_set_user_cpu_state_ctrl(vcpu->kvm);

	switch (mp_state->mp_state) {
	case KVM_MP_STATE_STOPPED:
···
	if (kvm_run->kvm_dirty_regs & KVM_SYNC_DIAG318) {
		vcpu->arch.diag318_info.val = kvm_run->s.regs.diag318;
		vcpu->arch.sie_block->cpnc = vcpu->arch.diag318_info.cpnc;
+		VCPU_EVENT(vcpu, 3, "setting cpnc to %d", vcpu->arch.diag318_info.cpnc);
	}
	/*
	 * If userspace sets the riccb (e.g. after migration) to a valid state,
+9
arch/s390/kvm/kvm-s390.h
···
	return kvm->arch.user_cpu_state_ctrl != 0;
}

+static inline void kvm_s390_set_user_cpu_state_ctrl(struct kvm *kvm)
+{
+	if (kvm->arch.user_cpu_state_ctrl)
+		return;
+
+	VM_EVENT(kvm, 3, "%s", "ENABLE: Userspace CPU state control");
+	kvm->arch.user_cpu_state_ctrl = 1;
+}
+
/* implemented in pv.c */
int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
int kvm_s390_pv_create_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc);
+2
arch/s390/kvm/priv.c
···
	mmap_read_unlock(current->mm);
	if (rc == -EFAULT)
		return kvm_s390_inject_program_int(vcpu, PGM_ADDRESSING);
+	if (rc == -EAGAIN)
+		continue;
	if (rc < 0)
		return rc;
	start += PAGE_SIZE;
+10 -11
arch/s390/kvm/pv.c
···

int kvm_s390_pv_destroy_cpu(struct kvm_vcpu *vcpu, u16 *rc, u16 *rrc)
{
-	int cc = 0;
+	int cc;

-	if (kvm_s390_pv_cpu_get_handle(vcpu)) {
-		cc = uv_cmd_nodata(kvm_s390_pv_cpu_get_handle(vcpu),
-				   UVC_CMD_DESTROY_SEC_CPU, rc, rrc);
+	if (!kvm_s390_pv_cpu_get_handle(vcpu))
+		return 0;

-		KVM_UV_EVENT(vcpu->kvm, 3,
-			     "PROTVIRT DESTROY VCPU %d: rc %x rrc %x",
-			     vcpu->vcpu_id, *rc, *rrc);
-		WARN_ONCE(cc, "protvirt destroy cpu failed rc %x rrc %x",
-			  *rc, *rrc);
-	}
+	cc = uv_cmd_nodata(kvm_s390_pv_cpu_get_handle(vcpu), UVC_CMD_DESTROY_SEC_CPU, rc, rrc);
+
+	KVM_UV_EVENT(vcpu->kvm, 3, "PROTVIRT DESTROY VCPU %d: rc %x rrc %x",
+		     vcpu->vcpu_id, *rc, *rrc);
+	WARN_ONCE(cc, "protvirt destroy cpu failed rc %x rrc %x", *rc, *rrc);
+
	/* Intended memory leak for something that should never happen. */
	if (!cc)
		free_pages(vcpu->arch.pv.stor_base,
···
	uvcb.conf_base_stor_origin = (u64)kvm->arch.pv.stor_base;
	uvcb.conf_virt_stor_origin = (u64)kvm->arch.pv.stor_var;

-	cc = uv_call(0, (u64)&uvcb);
+	cc = uv_call_sched(0, (u64)&uvcb);
	*rc = uvcb.header.rc;
	*rrc = uvcb.header.rrc;
	KVM_UV_EVENT(kvm, 3, "PROTVIRT CREATE VM: handle %llx len %llx rc %x rrc %x",
+1 -13
arch/s390/kvm/sigp.c
···
static int __sigp_set_arch(struct kvm_vcpu *vcpu, u32 parameter,
			   u64 *status_reg)
{
-	unsigned int i;
-	struct kvm_vcpu *v;
-	bool all_stopped = true;
-
-	kvm_for_each_vcpu(i, v, vcpu->kvm) {
-		if (v == vcpu)
-			continue;
-		if (!is_vcpu_stopped(v))
-			all_stopped = false;
-	}
-
	*status_reg &= 0xffffffff00000000UL;

	/* Reject set arch order, with czam we're always in z/Arch mode. */
-	*status_reg |= (all_stopped ? SIGP_STATUS_INVALID_PARAMETER :
-					SIGP_STATUS_INCORRECT_STATE);
+	*status_reg |= SIGP_STATUS_INVALID_PARAMETER;
	return SIGP_CC_STATUS_STORED;
}
+12 -3
arch/s390/mm/gmap.c
···
 */
void __gmap_zap(struct gmap *gmap, unsigned long gaddr)
{
+	struct vm_area_struct *vma;
	unsigned long vmaddr;
	spinlock_t *ptl;
	pte_t *ptep;
···
			  gaddr >> PMD_SHIFT);
	if (vmaddr) {
		vmaddr |= gaddr & ~PMD_MASK;
+
+		vma = vma_lookup(gmap->mm, vmaddr);
+		if (!vma || is_vm_hugetlb_page(vma))
+			return;
+
		/* Get pointer to the page table entry */
		ptep = get_locked_pte(gmap->mm, vmaddr, &ptl);
-		if (likely(ptep))
+		if (likely(ptep)) {
			ptep_zap_unused(gmap->mm, vmaddr, ptep, 0);
-		pte_unmap_unlock(ptep, ptl);
+			pte_unmap_unlock(ptep, ptl);
+		}
	}
}
EXPORT_SYMBOL_GPL(__gmap_zap);
···
{
	pte_t pte = READ_ONCE(*ptep);

+	/* There is a reference through the mapping */
	if (pte_present(pte))
-		WARN_ON_ONCE(uv_destroy_page(pte_val(pte) & PAGE_MASK));
+		WARN_ON_ONCE(uv_destroy_owned_page(pte_val(pte) & PAGE_MASK));
+
	return 0;
}
+77 -32
arch/s390/mm/pgtable.c
···
}

#ifdef CONFIG_PGSTE
-static pmd_t *pmd_alloc_map(struct mm_struct *mm, unsigned long addr)
+static int pmd_lookup(struct mm_struct *mm, unsigned long addr, pmd_t **pmdp)
{
+	struct vm_area_struct *vma;
	pgd_t *pgd;
	p4d_t *p4d;
	pud_t *pud;
-	pmd_t *pmd;
+
+	/* We need a valid VMA, otherwise this is clearly a fault. */
+	vma = vma_lookup(mm, addr);
+	if (!vma)
+		return -EFAULT;

	pgd = pgd_offset(mm, addr);
-	p4d = p4d_alloc(mm, pgd, addr);
-	if (!p4d)
-		return NULL;
-	pud = pud_alloc(mm, p4d, addr);
-	if (!pud)
-		return NULL;
-	pmd = pmd_alloc(mm, pud, addr);
-	return pmd;
+	if (!pgd_present(*pgd))
+		return -ENOENT;
+
+	p4d = p4d_offset(pgd, addr);
+	if (!p4d_present(*p4d))
+		return -ENOENT;
+
+	pud = pud_offset(p4d, addr);
+	if (!pud_present(*pud))
+		return -ENOENT;
+
+	/* Large PUDs are not supported yet. */
+	if (pud_large(*pud))
+		return -EFAULT;
+
+	*pmdp = pmd_offset(pud, addr);
+	return 0;
}
#endif
···
	pmd_t *pmdp;
	pte_t *ptep;

-	pmdp = pmd_alloc_map(mm, addr);
-	if (unlikely(!pmdp))
+	/*
+	 * If we don't have a PTE table and if there is no huge page mapped,
+	 * we can ignore attempts to set the key to 0, because it already is 0.
+	 */
+	switch (pmd_lookup(mm, addr, &pmdp)) {
+	case -ENOENT:
+		return key ? -EFAULT : 0;
+	case 0:
+		break;
+	default:
		return -EFAULT;
+	}

	ptl = pmd_lock(mm, pmdp);
	if (!pmd_present(*pmdp)) {
		spin_unlock(ptl);
-		return -EFAULT;
+		return key ? -EFAULT : 0;
	}

	if (pmd_large(*pmdp)) {
···
	}
	spin_unlock(ptl);

-	ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-
+	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	new = old = pgste_get_lock(ptep);
	pgste_val(new) &= ~(PGSTE_GR_BIT | PGSTE_GC_BIT |
			    PGSTE_ACC_BITS | PGSTE_FP_BIT);
···
	pte_t *ptep;
	int cc = 0;

-	pmdp = pmd_alloc_map(mm, addr);
-	if (unlikely(!pmdp))
+	/*
+	 * If we don't have a PTE table and if there is no huge page mapped,
+	 * the storage key is 0 and there is nothing for us to do.
+	 */
+	switch (pmd_lookup(mm, addr, &pmdp)) {
+	case -ENOENT:
+		return 0;
+	case 0:
+		break;
+	default:
		return -EFAULT;
+	}

	ptl = pmd_lock(mm, pmdp);
	if (!pmd_present(*pmdp)) {
		spin_unlock(ptl);
-		return -EFAULT;
+		return 0;
	}

	if (pmd_large(*pmdp)) {
···
	}
	spin_unlock(ptl);

-	ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-
+	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	new = old = pgste_get_lock(ptep);
	/* Reset guest reference bit only */
	pgste_val(new) &= ~PGSTE_GR_BIT;
···
	pmd_t *pmdp;
	pte_t *ptep;

-	pmdp = pmd_alloc_map(mm, addr);
-	if (unlikely(!pmdp))
+	/*
+	 * If we don't have a PTE table and if there is no huge page mapped,
+	 * the storage key is 0.
+	 */
+	*key = 0;
+
+	switch (pmd_lookup(mm, addr, &pmdp)) {
+	case -ENOENT:
+		return 0;
+	case 0:
+		break;
+	default:
		return -EFAULT;
+	}

	ptl = pmd_lock(mm, pmdp);
	if (!pmd_present(*pmdp)) {
-		/* Not yet mapped memory has a zero key */
		spin_unlock(ptl);
-		*key = 0;
		return 0;
	}
···
	}
	spin_unlock(ptl);

-	ptep = pte_alloc_map_lock(mm, pmdp, addr, &ptl);
-	if (unlikely(!ptep))
-		return -EFAULT;
-
+	ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl);
	pgste = pgste_get_lock(ptep);
	*key = (pgste_val(pgste) & (PGSTE_ACC_BITS | PGSTE_FP_BIT)) >> 56;
	paddr = pte_val(*ptep) & PAGE_MASK;
···
int pgste_perform_essa(struct mm_struct *mm, unsigned long hva, int orc,
		       unsigned long *oldpte, unsigned long *oldpgste)
{
+	struct vm_area_struct *vma;
	unsigned long pgstev;
	spinlock_t *ptl;
	pgste_t pgste;
···
	WARN_ON_ONCE(orc > ESSA_MAX);
	if (unlikely(orc > ESSA_MAX))
		return -EINVAL;
+
+	vma = vma_lookup(mm, hva);
+	if (!vma || is_vm_hugetlb_page(vma))
+		return -EFAULT;
	ptep = get_locked_pte(mm, hva, &ptl);
	if (unlikely(!ptep))
		return -EFAULT;
···
int set_pgste_bits(struct mm_struct *mm, unsigned long hva,
		   unsigned long bits, unsigned long value)
{
+	struct vm_area_struct *vma;
	spinlock_t *ptl;
	pgste_t new;
	pte_t *ptep;

+	vma = vma_lookup(mm, hva);
+	if (!vma || is_vm_hugetlb_page(vma))
+		return -EFAULT;
	ptep = get_locked_pte(mm, hva, &ptl);
	if (unlikely(!ptep))
		return -EFAULT;
···
 */
int get_pgste(struct mm_struct *mm, unsigned long hva, unsigned long *pgstep)
{
+	struct vm_area_struct *vma;
	spinlock_t *ptl;
	pte_t *ptep;

+	vma = vma_lookup(mm, hva);
+	if (!vma || is_vm_hugetlb_page(vma))
+		return -EFAULT;
	ptep = get_locked_pte(mm, hva, &ptl);
	if (unlikely(!ptep))
		return -EFAULT;
+30 -18
arch/x86/include/asm/kvm_host.h
··· 50 50
  * so ratio of 4 should be enough.
  */
 #define KVM_VCPU_ID_RATIO 4
-#define KVM_MAX_VCPU_ID (KVM_MAX_VCPUS * KVM_VCPU_ID_RATIO)
+#define KVM_MAX_VCPU_IDS (KVM_MAX_VCPUS * KVM_VCPU_ID_RATIO)

 /* memory slots that are not exposed to userspace */
 #define KVM_PRIVATE_MEM_SLOTS 3
··· 407 407
 #define KVM_HAVE_MMU_RWLOCK

 struct kvm_mmu_page;
+struct kvm_page_fault;

 /*
  * x86 supports 4 paging modes (5-level 64-bit, 4-level 64-bit, 3-level 32-bit,
··· 417 416
 struct kvm_mmu {
	unsigned long (*get_guest_pgd)(struct kvm_vcpu *vcpu);
	u64 (*get_pdptr)(struct kvm_vcpu *vcpu, int index);
-	int (*page_fault)(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 err,
-			  bool prefault);
+	int (*page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
	void (*inject_page_fault)(struct kvm_vcpu *vcpu,
				  struct x86_exception *fault);
	gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gpa_t gva_or_gpa,
··· 499 499
	u64 fixed_ctr_ctrl;
	u64 global_ctrl;
	u64 global_status;
-	u64 global_ovf_ctrl;
	u64 counter_bitmask[2];
	u64 global_ctrl_mask;
	u64 global_ovf_ctrl_mask;
··· 580 581
	struct kvm_hyperv_exit exit;
	struct kvm_vcpu_hv_stimer stimer[HV_SYNIC_STIMER_COUNT];
	DECLARE_BITMAP(stimer_pending_bitmap, HV_SYNIC_STIMER_COUNT);
-	cpumask_t tlb_flush;
	bool enforce_cpuid;
	struct {
		u32 features_eax; /* HYPERV_CPUID_FEATURES.EAX */
··· 1071 1073
	atomic_t apic_map_dirty;

	/* Protects apic_access_memslot_enabled and apicv_inhibit_reasons */
-	struct mutex apicv_update_lock;
+	struct rw_semaphore apicv_update_lock;

	bool apic_access_memslot_enabled;
	unsigned long apicv_inhibit_reasons;
··· 1085 1087
	unsigned long irq_sources_bitmap;
	s64 kvmclock_offset;
+
+	/*
+	 * This also protects nr_vcpus_matched_tsc which is read from a
+	 * preemption-disabled region, so it must be a raw spinlock.
+	 */
	raw_spinlock_t tsc_write_lock;
	u64 last_tsc_nsec;
	u64 last_tsc_write;
	u32 last_tsc_khz;
+	u64 last_tsc_offset;
	u64 cur_tsc_nsec;
	u64 cur_tsc_write;
	u64 cur_tsc_offset;
	u64 cur_tsc_generation;
	int nr_vcpus_matched_tsc;

-	raw_spinlock_t pvclock_gtod_sync_lock;
+	seqcount_raw_spinlock_t pvclock_sc;
	bool use_master_clock;
	u64 master_kernel_ns;
	u64 master_cycle_now;
··· 1211 1207
 #endif /* CONFIG_X86_64 */

	/*
-	 * If set, rmaps have been allocated for all memslots and should be
-	 * allocated for any newly created or modified memslots.
+	 * If set, at least one shadow root has been allocated. This flag
+	 * is used as one input when determining whether certain memslot
+	 * related allocations are necessary.
	 */
-	bool memslots_have_rmaps;
+	bool shadow_root_allocated;

 #if IS_ENABLED(CONFIG_HYPERV)
	hpa_t hv_root_tdp;
··· 1301 1296
 }

 struct kvm_x86_ops {
+	const char *name;
+
	int (*hardware_enable)(void);
	void (*hardware_disable)(void);
	void (*hardware_unsetup)(void);
··· 1412 1405
	void (*write_tsc_multiplier)(struct kvm_vcpu *vcpu, u64 multiplier);

	/*
-	 * Retrieve somewhat arbitrary exit information. Intended to be used
-	 * only from within tracepoints to avoid VMREADs when tracing is off.
+	 * Retrieve somewhat arbitrary exit information. Intended to
+	 * be used only from within tracepoints or error paths.
	 */
-	void (*get_exit_info)(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2,
+	void (*get_exit_info)(struct kvm_vcpu *vcpu, u32 *reason,
+			      u64 *info1, u64 *info2,
			      u32 *exit_int_info, u32 *exit_int_info_err_code);

	int (*check_intercept)(struct kvm_vcpu *vcpu,
··· 1549 1541
 {
	return __vmalloc(kvm_x86_ops.vm_size, GFP_KERNEL_ACCOUNT | __GFP_ZERO);
 }
+
+#define __KVM_HAVE_ARCH_VM_FREE
 void kvm_arch_free_vm(struct kvm *kvm);

 #define __KVM_HAVE_ARCH_FLUSH_REMOTE_TLB
··· 1667 1657
 int kvm_emulate_instruction(struct kvm_vcpu *vcpu, int emulation_type);
 int kvm_emulate_instruction_from_buffer(struct kvm_vcpu *vcpu,
					void *insn, int insn_len);
+void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu,
+					  u64 *data, u8 ndata);
+void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu);

 void kvm_enable_efer_bits(u64);
 bool kvm_valid_efer(struct kvm_vcpu *vcpu, u64 efer);
··· 1726 1713
 void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault);
 bool kvm_inject_emulated_page_fault(struct kvm_vcpu *vcpu,
				    struct x86_exception *fault);
-int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
-			    gfn_t gfn, void *data, int offset, int len,
-			    u32 access);
 bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl);
 bool kvm_require_dr(struct kvm_vcpu *vcpu, int dr);
··· 1874 1864
 unsigned long kvm_get_linear_rip(struct kvm_vcpu *vcpu);
 bool kvm_is_linear_rip(struct kvm_vcpu *vcpu, unsigned long linear_rip);

-void kvm_make_mclock_inprogress_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request(struct kvm *kvm);
 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm,
				       unsigned long *vcpu_bitmap);
··· 1942 1933

 int kvm_cpu_dirty_log_size(void);

-int alloc_all_memslots_rmaps(struct kvm *kvm);
+int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
+
+#define KVM_CLOCK_VALID_FLAGS \
+	(KVM_CLOCK_TSC_STABLE | KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)

 #endif /* _ASM_X86_KVM_HOST_H */
+8 -3
arch/x86/include/asm/kvm_page_track.h
··· 49 49
 int kvm_page_track_init(struct kvm *kvm);
 void kvm_page_track_cleanup(struct kvm *kvm);

+bool kvm_page_track_write_tracking_enabled(struct kvm *kvm);
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot);
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot);
-int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+int kvm_page_track_create_memslot(struct kvm *kvm,
+				  struct kvm_memory_slot *slot,
				  unsigned long npages);

 void kvm_slot_page_track_add_page(struct kvm *kvm,
··· 63 59
 void kvm_slot_page_track_remove_page(struct kvm *kvm,
				     struct kvm_memory_slot *slot, gfn_t gfn,
				     enum kvm_page_track_mode mode);
-bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
-			      enum kvm_page_track_mode mode);
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
+				   struct kvm_memory_slot *slot, gfn_t gfn,
+				   enum kvm_page_track_mode mode);

 void
 kvm_page_track_register_notifier(struct kvm *kvm,
+4
arch/x86/include/uapi/asm/kvm.h
··· 504 504
 #define KVM_PMU_EVENT_ALLOW 0
 #define KVM_PMU_EVENT_DENY 1

+/* for KVM_{GET,SET,HAS}_DEVICE_ATTR */
+#define KVM_VCPU_TSC_CTRL	0 /* control group for the timestamp counter (TSC) */
+#define KVM_VCPU_TSC_OFFSET	0 /* attribute for the TSC offset */
+
 #endif /* _ASM_X86_KVM_H */
+3 -1
arch/x86/kernel/irq.c
··· 291 291
 {
	if (handler)
		kvm_posted_intr_wakeup_handler = handler;
-	else
+	else {
		kvm_posted_intr_wakeup_handler = dummy_handler;
+		synchronize_rcu();
+	}
 }
 EXPORT_SYMBOL_GPL(kvm_set_posted_intr_wakeup_handler);

+3
arch/x86/kvm/Kconfig
··· 129 129
	  This option adds a R/W kVM module parameter 'mmu_audit', which allows
	  auditing of KVM MMU events at runtime.

+config KVM_EXTERNAL_WRITE_TRACKING
+	bool
+
 endif # VIRTUALIZATION
+9 -1
arch/x86/kvm/cpuid.c
··· 53 53
	return ret;
 }

+/*
+ * This one is tied to SSB in the user API, and not
+ * visible in /proc/cpuinfo.
+ */
+#define KVM_X86_FEATURE_PSFD	(13*32+28) /* Predictive Store Forwarding Disable */
+
 #define F feature_bit
 #define SF(name) (boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0)
+

 static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
	struct kvm_cpuid_entry2 *entries, int nent, u32 function, u32 index)
··· 507 500
	kvm_cpu_cap_mask(CPUID_8000_0008_EBX,
		F(CLZERO) | F(XSAVEERPTR) |
		F(WBNOINVD) | F(AMD_IBPB) | F(AMD_IBRS) | F(AMD_SSBD) | F(VIRT_SSBD) |
-		F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON)
+		F(AMD_SSB_NO) | F(AMD_STIBP) | F(AMD_STIBP_ALWAYS_ON) |
+		__feature_bit(KVM_X86_FEATURE_PSFD)
	);

	/*
+5
arch/x86/kvm/emulate.c
··· 4222 4222
	if (enable_vmware_backdoor && is_vmware_backdoor_pmc(rcx))
		return X86EMUL_CONTINUE;

+	/*
+	 * If CR4.PCE is set, the SDM requires CPL=0 or CR0.PE=0. The CR0.PE
+	 * check however is unnecessary because CPL is always 0 outside
+	 * protected mode.
+	 */
	if ((!(cr4 & X86_CR4_PCE) && ctxt->ops->cpl(ctxt)) ||
	    ctxt->ops->check_pmc(ctxt, rcx))
		return emulate_gp(ctxt, 0);
+11 -11
arch/x86/kvm/hyperv.c
··· 112 112
	if (!!auto_eoi_old == !!auto_eoi_new)
		return;

-	mutex_lock(&vcpu->kvm->arch.apicv_update_lock);
+	down_write(&vcpu->kvm->arch.apicv_update_lock);

	if (auto_eoi_new)
		hv->synic_auto_eoi_used++;
··· 123 123
				    !hv->synic_auto_eoi_used,
				    APICV_INHIBIT_REASON_HYPERV);

-	mutex_unlock(&vcpu->kvm->arch.apicv_update_lock);
+	up_write(&vcpu->kvm->arch.apicv_update_lock);
 }

 static int synic_set_sint(struct kvm_vcpu_hv_synic *synic, int sint,
··· 1754 1754
	int i;
	gpa_t gpa;
	struct kvm *kvm = vcpu->kvm;
-	struct kvm_vcpu_hv *hv_vcpu = to_hv_vcpu(vcpu);
	struct hv_tlb_flush_ex flush_ex;
	struct hv_tlb_flush flush;
	u64 vp_bitmap[KVM_HV_MAX_SPARSE_VCPU_SET_BITS];
··· 1835 1836
		}
	}

-	cpumask_clear(&hv_vcpu->tlb_flush);
-
-	vcpu_mask = all_cpus ? NULL :
-		sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
-					vp_bitmap, vcpu_bitmap);
-
	/*
	 * vcpu->arch.cr3 may not be up-to-date for running vCPUs so we can't
	 * analyze it here, flush TLB regardless of the specified address space.
	 */
-	kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST,
-				    NULL, vcpu_mask, &hv_vcpu->tlb_flush);
+	if (all_cpus) {
+		kvm_make_all_cpus_request(kvm, KVM_REQ_TLB_FLUSH_GUEST);
+	} else {
+		vcpu_mask = sparse_set_to_vcpu_mask(kvm, sparse_banks, valid_bank_mask,
+						    vp_bitmap, vcpu_bitmap);
+
+		kvm_make_vcpus_request_mask(kvm, KVM_REQ_TLB_FLUSH_GUEST,
+					    vcpu_mask);
+	}

 ret_success:
	/* We always do full TLB flush, set 'Reps completed' = 'Rep Count' */
+1 -1
arch/x86/kvm/ioapic.c
··· 96 96
 static void rtc_irq_eoi_tracking_reset(struct kvm_ioapic *ioapic)
 {
	ioapic->rtc_status.pending_eoi = 0;
-	bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_ID + 1);
+	bitmap_zero(ioapic->rtc_status.dest_map.map, KVM_MAX_VCPU_IDS);
 }

 static void kvm_rtc_eoi_tracking_restore_all(struct kvm_ioapic *ioapic);
+2 -2
arch/x86/kvm/ioapic.h
··· 39 39

 struct dest_map {
	/* vcpu bitmap where IRQ has been sent */
-	DECLARE_BITMAP(map, KVM_MAX_VCPU_ID + 1);
+	DECLARE_BITMAP(map, KVM_MAX_VCPU_IDS);

	/*
	 * Vector sent to a given vcpu, only valid when
	 * the vcpu's bit in map is set
	 */
-	u8 vectors[KVM_MAX_VCPU_ID + 1];
+	u8 vectors[KVM_MAX_VCPU_IDS];
 };

+100 -14
arch/x86/kvm/mmu.h
··· 44 44
 #define PT32_ROOT_LEVEL 2
 #define PT32E_ROOT_LEVEL 3

-#define KVM_MMU_CR4_ROLE_BITS (X86_CR4_PGE | X86_CR4_PSE | X86_CR4_PAE | \
-			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE | \
-			       X86_CR4_LA57)
+#define KVM_MMU_CR4_ROLE_BITS (X86_CR4_PSE | X86_CR4_PAE | X86_CR4_LA57 | \
+			       X86_CR4_SMEP | X86_CR4_SMAP | X86_CR4_PKE)

 #define KVM_MMU_CR0_ROLE_BITS (X86_CR0_PG | X86_CR0_WP)

··· 79 80
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu);
+void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu);

 static inline int kvm_mmu_reload(struct kvm_vcpu *vcpu)
 {
··· 114 114
					      vcpu->arch.mmu->shadow_root_level);
 }

-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		       bool prefault);
+struct kvm_page_fault {
+	/* arguments to kvm_mmu_do_page_fault.  */
+	const gpa_t addr;
+	const u32 error_code;
+	const bool prefetch;
+
+	/* Derived from error_code.  */
+	const bool exec;
+	const bool write;
+	const bool present;
+	const bool rsvd;
+	const bool user;
+
+	/* Derived from mmu and global state.  */
+	const bool is_tdp;
+	const bool nx_huge_page_workaround_enabled;
+
+	/*
+	 * Whether a >4KB mapping can be created or is forbidden due to NX
+	 * hugepages.
+	 */
+	bool huge_page_disallowed;
+
+	/*
+	 * Maximum page size that can be created for this fault; input to
+	 * FNAME(fetch), __direct_map and kvm_tdp_mmu_map.
+	 */
+	u8 max_level;
+
+	/*
+	 * Page size that can be created based on the max_level and the
+	 * page size used by the host mapping.
+	 */
+	u8 req_level;
+
+	/*
+	 * Page size that will be created based on the req_level and
+	 * huge_page_disallowed.
+	 */
+	u8 goal_level;
+
+	/* Shifted addr, or result of guest page table walk if addr is a gva.  */
+	gfn_t gfn;
+
+	/* The memslot containing gfn. May be NULL. */
+	struct kvm_memory_slot *slot;
+
+	/* Outputs of kvm_faultin_pfn.  */
+	kvm_pfn_t pfn;
+	hva_t hva;
+	bool map_writable;
+};
+
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+
+extern int nx_huge_pages;
+static inline bool is_nx_huge_page_enabled(void)
+{
+	return READ_ONCE(nx_huge_pages);
+}

 static inline int kvm_mmu_do_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa,
-					u32 err, bool prefault)
+					u32 err, bool prefetch)
 {
+	struct kvm_page_fault fault = {
+		.addr = cr2_or_gpa,
+		.error_code = err,
+		.exec = err & PFERR_FETCH_MASK,
+		.write = err & PFERR_WRITE_MASK,
+		.present = err & PFERR_PRESENT_MASK,
+		.rsvd = err & PFERR_RSVD_MASK,
+		.user = err & PFERR_USER_MASK,
+		.prefetch = prefetch,
+		.is_tdp = likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault),
+		.nx_huge_page_workaround_enabled = is_nx_huge_page_enabled(),
+
+		.max_level = KVM_MAX_HUGEPAGE_LEVEL,
+		.req_level = PG_LEVEL_4K,
+		.goal_level = PG_LEVEL_4K,
+	};
 #ifdef CONFIG_RETPOLINE
-	if (likely(vcpu->arch.mmu->page_fault == kvm_tdp_page_fault))
-		return kvm_tdp_page_fault(vcpu, cr2_or_gpa, err, prefault);
+	if (fault.is_tdp)
+		return kvm_tdp_page_fault(vcpu, &fault);
 #endif
-	return vcpu->arch.mmu->page_fault(vcpu, cr2_or_gpa, err, prefault);
+	return vcpu->arch.mmu->page_fault(vcpu, &fault);
 }

 /*
··· 304 230
 int kvm_mmu_post_init_vm(struct kvm *kvm);
 void kvm_mmu_pre_destroy_vm(struct kvm *kvm);

-static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
+static inline bool kvm_shadow_root_allocated(struct kvm *kvm)
 {
	/*
-	 * Read memslot_have_rmaps before rmap pointers. Hence, threads reading
-	 * memslots_have_rmaps in any lock context are guaranteed to see the
-	 * pointers. Pairs with smp_store_release in alloc_all_memslots_rmaps.
+	 * Read shadow_root_allocated before related pointers. Hence, threads
+	 * reading shadow_root_allocated in any lock context are guaranteed to
+	 * see the pointers. Pairs with smp_store_release in
+	 * mmu_first_shadow_root_alloc.
	 */
-	return smp_load_acquire(&kvm->arch.memslots_have_rmaps);
+	return smp_load_acquire(&kvm->arch.shadow_root_allocated);
+}
+
+#ifdef CONFIG_X86_64
+static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
+#else
+static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
+#endif
+
+static inline bool kvm_memslots_have_rmaps(struct kvm *kvm)
+{
+	return !is_tdp_mmu_enabled(kvm) || kvm_shadow_root_allocated(kvm);
 }

 static inline gfn_t gfn_to_index(gfn_t gfn, gfn_t base_gfn, int level)
+363 -343
arch/x86/kvm/mmu/mmu.c
··· 58 58
 extern bool itlb_multihit_kvm_mitigation;

 int __read_mostly nx_huge_pages = -1;
+static uint __read_mostly nx_huge_pages_recovery_period_ms;
 #ifdef CONFIG_PREEMPT_RT
 /* Recovery can cause latency spikes, disable it for PREEMPT_RT.  */
 static uint __read_mostly nx_huge_pages_recovery_ratio = 0;
··· 67 66
 #endif

 static int set_nx_huge_pages(const char *val, const struct kernel_param *kp);
-static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp);
+static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp);

 static const struct kernel_param_ops nx_huge_pages_ops = {
	.set = set_nx_huge_pages,
	.get = param_get_bool,
 };

-static const struct kernel_param_ops nx_huge_pages_recovery_ratio_ops = {
-	.set = set_nx_huge_pages_recovery_ratio,
+static const struct kernel_param_ops nx_huge_pages_recovery_param_ops = {
+	.set = set_nx_huge_pages_recovery_param,
	.get = param_get_uint,
 };

 module_param_cb(nx_huge_pages, &nx_huge_pages_ops, &nx_huge_pages, 0644);
 __MODULE_PARM_TYPE(nx_huge_pages, "bool");
-module_param_cb(nx_huge_pages_recovery_ratio, &nx_huge_pages_recovery_ratio_ops,
+module_param_cb(nx_huge_pages_recovery_ratio, &nx_huge_pages_recovery_param_ops,
		&nx_huge_pages_recovery_ratio, 0644);
 __MODULE_PARM_TYPE(nx_huge_pages_recovery_ratio, "uint");
+module_param_cb(nx_huge_pages_recovery_period_ms, &nx_huge_pages_recovery_param_ops,
+		&nx_huge_pages_recovery_period_ms, 0644);
+__MODULE_PARM_TYPE(nx_huge_pages_recovery_period_ms, "uint");

 static bool __read_mostly force_flush_and_sync_on_reuse;
 module_param_named(flush_on_reuse, force_flush_and_sync_on_reuse, bool, 0644);
··· 1075 1071
	return kvm_mmu_memory_cache_nr_free_objects(mc);
 }

-static int rmap_add(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
-{
-	struct kvm_memory_slot *slot;
-	struct kvm_mmu_page *sp;
-	struct kvm_rmap_head *rmap_head;
-
-	sp = sptep_to_sp(spte);
-	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
-	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
-	return pte_list_add(vcpu, spte, rmap_head);
-}
-
-
 static void rmap_remove(struct kvm *kvm, u64 *spte)
 {
	struct kvm_memslots *slots;
··· 1087 1097
	gfn = kvm_mmu_page_get_gfn(sp, spte - sp->spt);

	/*
-	 * Unlike rmap_add and rmap_recycle, rmap_remove does not run in the
-	 * context of a vCPU so have to determine which memslots to use based
-	 * on context information in sp->role.
+	 * Unlike rmap_add, rmap_remove does not run in the context of a vCPU
+	 * so we have to determine which memslots to use based on context
+	 * information in sp->role.
	 */
	slots = kvm_memslots_for_spte_role(kvm, sp->role);
··· 1629 1639

 #define RMAP_RECYCLE_THRESHOLD 1000

-static void rmap_recycle(struct kvm_vcpu *vcpu, u64 *spte, gfn_t gfn)
+static void rmap_add(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+		     u64 *spte, gfn_t gfn)
 {
-	struct kvm_memory_slot *slot;
-	struct kvm_rmap_head *rmap_head;
	struct kvm_mmu_page *sp;
+	struct kvm_rmap_head *rmap_head;
+	int rmap_count;

	sp = sptep_to_sp(spte);
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	kvm_mmu_page_set_gfn(sp, spte - sp->spt, gfn);
	rmap_head = gfn_to_rmap(gfn, sp->role.level, slot);
+	rmap_count = pte_list_add(vcpu, spte, rmap_head);

-	kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, __pte(0));
-	kvm_flush_remote_tlbs_with_address(vcpu->kvm, sp->gfn,
-					   KVM_PAGES_PER_HPAGE(sp->role.level));
+	if (rmap_count > RMAP_RECYCLE_THRESHOLD) {
+		kvm_unmap_rmapp(vcpu->kvm, rmap_head, NULL, gfn, sp->role.level, __pte(0));
+		kvm_flush_remote_tlbs_with_address(
+				vcpu->kvm, sp->gfn, KVM_PAGES_PER_HPAGE(sp->role.level));
+	}
 }

 bool kvm_age_gfn(struct kvm *kvm, struct kvm_gfn_range *range)
··· 1789 1795
 static int nonpaging_sync_page(struct kvm_vcpu *vcpu,
			       struct kvm_mmu_page *sp)
 {
-	return 0;
+	return -1;
 }

 #define KVM_PAGE_ARRAY_NR 16
··· 1903 1909
 static bool kvm_sync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
			  struct list_head *invalid_list)
 {
-	if (vcpu->arch.mmu->sync_page(vcpu, sp) == 0) {
+	int ret = vcpu->arch.mmu->sync_page(vcpu, sp);
+
+	if (ret < 0) {
		kvm_mmu_prepare_zap_page(vcpu->kvm, sp, invalid_list);
		return false;
	}

-	return true;
+	return !!ret;
 }

 static bool kvm_mmu_remote_flush_or_zap(struct kvm *kvm,
··· 1925 1929
	else
		kvm_flush_remote_tlbs(kvm);
	return true;
-}
-
-static void kvm_mmu_flush_or_zap(struct kvm_vcpu *vcpu,
-				 struct list_head *invalid_list,
-				 bool remote_flush, bool local_flush)
-{
-	if (kvm_mmu_remote_flush_or_zap(vcpu->kvm, invalid_list, remote_flush))
-		return;
-
-	if (local_flush)
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 }

 #ifdef CONFIG_KVM_MMU_AUDIT
··· 2029 2044
		protected |= rmap_write_protect(vcpu, sp->gfn);

	if (protected) {
-		kvm_flush_remote_tlbs(vcpu->kvm);
+		kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, true);
		flush = false;
	}

··· 2039 2054
			mmu_pages_clear_parents(&parents);
		}
		if (need_resched() || rwlock_needbreak(&vcpu->kvm->mmu_lock)) {
-			kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+			kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
			if (!can_yield) {
				kvm_make_request(KVM_REQ_MMU_SYNC, vcpu);
				return -EINTR;
··· 2050 2065
		}
	}

-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, false, flush);
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
	return 0;
 }

··· 2134 2149
			break;

		WARN_ON(!list_empty(&invalid_list));
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+		kvm_flush_remote_tlbs(vcpu->kvm);
	}

	__clear_sp_write_flooding_count(sp);
··· 2214 2229
 static void __shadow_walk_next(struct kvm_shadow_walk_iterator *iterator,
			       u64 spte)
 {
-	if (is_last_spte(spte, iterator->level)) {
+	if (!is_shadow_present_pte(spte) || is_last_spte(spte, iterator->level)) {
		iterator->level = 0;
		return;
	}
··· 2576 2591
  * were marked unsync (or if there is no shadow page), -EPERM if the SPTE must
  * be write-protected.
  */
-int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync)
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+			    gfn_t gfn, bool can_unsync, bool prefetch)
 {
	struct kvm_mmu_page *sp;
	bool locked = false;
··· 2587 2601
	 * track machinery is used to write-protect upper-level shadow pages,
	 * i.e. this guards the role.level == 4K assertion below!
	 */
-	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu, slot, gfn, KVM_PAGE_TRACK_WRITE))
		return -EPERM;

	/*
··· 2602 2616

		if (sp->unsync)
			continue;
+
+		if (prefetch)
+			return -EEXIST;

		/*
		 * TDP MMU page faults require an additional spinlock as they
··· 2669 2680
	 *                      (sp->unsync = true)
	 *
	 * The write barrier below ensures that 1.1 happens before 1.2 and thus
-	 * the situation in 2.4 does not arise. The implicit barrier in 2.2
-	 * pairs with this write barrier.
+	 * the situation in 2.4 does not arise. It pairs with the read barrier
+	 * in is_unsync_root(), placed between 2.1's load of SPTE.W and 2.3.
	 */
	smp_wmb();

	return 0;
 }

-static int set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-		    unsigned int pte_access, int level,
-		    gfn_t gfn, kvm_pfn_t pfn, bool speculative,
-		    bool can_unsync, bool host_writable)
+static int mmu_set_spte(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+			u64 *sptep, unsigned int pte_access, gfn_t gfn,
+			kvm_pfn_t pfn, struct kvm_page_fault *fault)
 {
-	u64 spte;
-	struct kvm_mmu_page *sp;
-	int ret;
-
-	sp = sptep_to_sp(sptep);
-
-	ret = make_spte(vcpu, pte_access, level, gfn, pfn, *sptep, speculative,
-			can_unsync, host_writable, sp_ad_disabled(sp), &spte);
-
-	if (spte & PT_WRITABLE_MASK)
-		kvm_vcpu_mark_page_dirty(vcpu, gfn);
-
-	if (*sptep == spte)
-		ret |= SET_SPTE_SPURIOUS;
-	else if (mmu_spte_update(sptep, spte))
-		ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
-	return ret;
-}
-
-static int mmu_set_spte(struct kvm_vcpu *vcpu, u64 *sptep,
-			unsigned int pte_access, bool write_fault, int level,
-			gfn_t gfn, kvm_pfn_t pfn, bool speculative,
-			bool host_writable)
-{
+	struct kvm_mmu_page *sp = sptep_to_sp(sptep);
+	int level = sp->role.level;
	int was_rmapped = 0;
-	int rmap_count;
-	int set_spte_ret;
	int ret = RET_PF_FIXED;
	bool flush = false;
+	bool wrprot;
+	u64 spte;
+
+	/* Prefetching always gets a writable pfn. */
+	bool host_writable = !fault || fault->map_writable;
+	bool prefetch = !fault || fault->prefetch;
+	bool write_fault = fault && fault->write;

	pgprintk("%s: spte %llx write_fault %d gfn %llx\n", __func__,
		 *sptep, write_fault, gfn);
··· 2723 2752
		was_rmapped = 1;
	}

-	set_spte_ret = set_spte(vcpu, sptep, pte_access, level, gfn, pfn,
-				speculative, true, host_writable);
-	if (set_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
-		if (write_fault)
-			ret = RET_PF_EMULATE;
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
+	wrprot = make_spte(vcpu, sp, slot, pte_access, gfn, pfn, *sptep, prefetch,
+			   true, host_writable, &spte);
+
+	if (*sptep == spte) {
+		ret = RET_PF_SPURIOUS;
+	} else {
+		trace_kvm_mmu_set_spte(level, gfn, sptep);
+		flush |= mmu_spte_update(sptep, spte);
	}

-	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH || flush)
+	if (wrprot) {
+		if (write_fault)
+			ret = RET_PF_EMULATE;
+	}
+
+	if (flush)
		kvm_flush_remote_tlbs_with_address(vcpu->kvm, gfn,
				KVM_PAGES_PER_HPAGE(level));

-	/*
-	 * The fault is fully spurious if and only if the new SPTE and old SPTE
-	 * are identical, and emulation is not required.
-	 */
-	if ((set_spte_ret & SET_SPTE_SPURIOUS) && ret == RET_PF_FIXED) {
-		WARN_ON_ONCE(!was_rmapped);
-		return RET_PF_SPURIOUS;
-	}
-
	pgprintk("%s: setting spte %llx\n", __func__, *sptep);
-	trace_kvm_mmu_set_spte(level, gfn, sptep);

	if (!was_rmapped) {
+		WARN_ON_ONCE(ret == RET_PF_SPURIOUS);
		kvm_update_page_stats(vcpu->kvm, level, 1);
-		rmap_count = rmap_add(vcpu, sptep, gfn);
-		if (rmap_count > RMAP_RECYCLE_THRESHOLD)
-			rmap_recycle(vcpu, sptep, gfn);
+		rmap_add(vcpu, slot, sptep, gfn);
	}

	return ret;
-}
-
-static kvm_pfn_t pte_prefetch_gfn_to_pfn(struct kvm_vcpu *vcpu, gfn_t gfn,
-					 bool no_dirty_log)
-{
-	struct kvm_memory_slot *slot;
-
-	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, no_dirty_log);
-	if (!slot)
-		return KVM_PFN_ERR_FAULT;
-
-	return gfn_to_pfn_memslot_atomic(slot, gfn);
 }

 static int direct_pte_prefetch_many(struct kvm_vcpu *vcpu,
··· 2773 2818
		return -1;

	for (i = 0; i < ret; i++, gfn++, start++) {
-		mmu_set_spte(vcpu, start, access, false, sp->role.level, gfn,
-			     page_to_pfn(pages[i]), true, true);
+		mmu_set_spte(vcpu, slot, start, access, gfn,
+			     page_to_pfn(pages[i]), NULL);
		put_page(pages[i]);
	}

··· 2797 2842
			if (!start)
				continue;
			if (direct_pte_prefetch_many(vcpu, sp, start, spte) < 0)
-				break;
+				return;
			start = NULL;
		} else if (!start)
			start = spte;
	}
+	if (start)
+		direct_pte_prefetch_many(vcpu, sp, start, spte);
 }

 static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
··· 2881 2924
	return min(host_level, max_level);
 }

-int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
-			    int max_level, kvm_pfn_t *pfnp,
-			    bool huge_page_disallowed, int *req_level)
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	struct kvm_memory_slot *slot;
-	kvm_pfn_t pfn = *pfnp;
+	struct kvm_memory_slot *slot = fault->slot;
	kvm_pfn_t mask;
-	int level;

-	*req_level = PG_LEVEL_4K;
+	fault->huge_page_disallowed = fault->exec && fault->nx_huge_page_workaround_enabled;

-	if (unlikely(max_level == PG_LEVEL_4K))
-		return PG_LEVEL_4K;
+	if (unlikely(fault->max_level == PG_LEVEL_4K))
+		return;

-	if (is_error_noslot_pfn(pfn) || kvm_is_reserved_pfn(pfn))
-		return PG_LEVEL_4K;
+	if (is_error_noslot_pfn(fault->pfn) || kvm_is_reserved_pfn(fault->pfn))
+		return;

-	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn, true);
-	if (!slot)
-		return PG_LEVEL_4K;
+	if (kvm_slot_dirty_track_enabled(slot))
+		return;

	/*
	 * Enforce the iTLB multihit workaround after capturing the requested
	 * level, which will be used to do precise, accurate accounting.
	 */
-	*req_level = level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level);
-	if (level == PG_LEVEL_4K || huge_page_disallowed)
-		return PG_LEVEL_4K;
+	fault->req_level = kvm_mmu_max_mapping_level(vcpu->kvm, slot,
+						     fault->gfn, fault->pfn,
+						     fault->max_level);
+	if (fault->req_level == PG_LEVEL_4K || fault->huge_page_disallowed)
+		return;

	/*
	 * mmu_notifier_retry() was successful and mmu_lock is held, so
	 * the pmd can't be split from under us.
	 */
-	mask = KVM_PAGES_PER_HPAGE(level) - 1;
-	VM_BUG_ON((gfn & mask) != (pfn & mask));
-	*pfnp = pfn & ~mask;
-
-	return level;
+	fault->goal_level = fault->req_level;
+	mask = KVM_PAGES_PER_HPAGE(fault->goal_level) - 1;
+	VM_BUG_ON((fault->gfn & mask) != (fault->pfn & mask));
+	fault->pfn &= ~mask;
 }

-void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
-				kvm_pfn_t *pfnp, int *goal_levelp)
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level)
 {
-	int level = *goal_levelp;
-
-	if (cur_level == level && level > PG_LEVEL_4K &&
+	if (cur_level > PG_LEVEL_4K &&
+	    cur_level == fault->goal_level &&
	    is_shadow_present_pte(spte) &&
	    !is_large_pte(spte)) {
		/*
··· 2930 2979
		 * patching back for them into pfn the next 9 bits of
		 * the address.
		 */
-		u64 page_mask = KVM_PAGES_PER_HPAGE(level) -
-				KVM_PAGES_PER_HPAGE(level - 1);
-		*pfnp |= gfn & page_mask;
-		(*goal_levelp)--;
+		u64 page_mask = KVM_PAGES_PER_HPAGE(cur_level) -
+				KVM_PAGES_PER_HPAGE(cur_level - 1);
+		fault->pfn |= fault->gfn & page_mask;
+		fault->goal_level--;
	}
 }

-static int __direct_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-			int map_writable, int max_level, kvm_pfn_t pfn,
-			bool prefault, bool is_tdp)
+static int __direct_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
-	bool write = error_code & PFERR_WRITE_MASK;
-	bool exec = error_code & PFERR_FETCH_MASK;
-	bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
	struct kvm_shadow_walk_iterator it;
	struct kvm_mmu_page *sp;
-	int level, req_level, ret;
-	gfn_t gfn = gpa >> PAGE_SHIFT;
-	gfn_t base_gfn = gfn;
+	int ret;
+	gfn_t base_gfn = fault->gfn;

-	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
-					huge_page_disallowed, &req_level);
+	kvm_mmu_hugepage_adjust(vcpu, fault);

-	trace_kvm_mmu_spte_requested(gpa, level, pfn);
-	for_each_shadow_entry(vcpu, gpa, it) {
+	trace_kvm_mmu_spte_requested(fault);
+	for_each_shadow_entry(vcpu, fault->addr, it) {
		/*
		 * We cannot overwrite existing page tables with an NX
		 * large page, as the leaf could be executable.
		 */
-		if (nx_huge_page_workaround_enabled)
-			disallowed_hugepage_adjust(*it.sptep, gfn, it.level,
-						   &pfn, &level);
+		if (fault->nx_huge_page_workaround_enabled)
+			disallowed_hugepage_adjust(fault, *it.sptep, it.level);

-		base_gfn = gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
-		if (it.level == level)
+		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+		if (it.level == fault->goal_level)
			break;

		drop_large_spte(vcpu, it.sptep);
··· 2967 3025
				      it.level - 1, true, ACC_ALL);

		link_shadow_page(vcpu, it.sptep, sp);
-		if (is_tdp && huge_page_disallowed &&
-		    req_level >= it.level)
+		if (fault->is_tdp && fault->huge_page_disallowed &&
+		    fault->req_level >= it.level)
			account_huge_nx_page(vcpu->kvm, sp);
	}

-	ret = mmu_set_spte(vcpu, it.sptep, ACC_ALL,
-			   write, level, base_gfn, pfn, prefault,
-			   map_writable);
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
+	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, ACC_ALL,
+			   base_gfn, fault->pfn, fault);
	if (ret == RET_PF_SPURIOUS)
		return ret;

··· 3008 3064
	return -EFAULT;
 }

-static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, gva_t gva, gfn_t gfn,
-				kvm_pfn_t pfn, unsigned int access,
-				int *ret_val)
+static bool handle_abnormal_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+				unsigned int access, int *ret_val)
 {
	/* The pfn is invalid, report the error! */
-	if (unlikely(is_error_pfn(pfn))) {
-		*ret_val = kvm_handle_bad_page(vcpu, gfn, pfn);
+	if (unlikely(is_error_pfn(fault->pfn))) {
+		*ret_val = kvm_handle_bad_page(vcpu, fault->gfn, fault->pfn);
		return true;
	}

-	if (unlikely(is_noslot_pfn(pfn))) {
-		vcpu_cache_mmio_info(vcpu, gva, gfn,
+	if (unlikely(!fault->slot)) {
+		gva_t gva = fault->is_tdp ? 0 : fault->addr;
+
+		vcpu_cache_mmio_info(vcpu, gva, fault->gfn,
				     access & shadow_mmio_access_mask);
		/*
		 * If MMIO caching is disabled, emulate immediately without
··· 3036 3091
	return false;
 }

-static bool page_fault_can_be_fast(u32 error_code)
+static bool page_fault_can_be_fast(struct kvm_page_fault *fault)
 {
	/*
	 * Do not fix the mmio spte with invalid generation number which
	 * need to be updated by slow page fault path.
	 */
-	if (unlikely(error_code & PFERR_RSVD_MASK))
+	if (fault->rsvd)
		return false;

	/* See if the page fault is due to an NX violation */
-	if (unlikely(((error_code & (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))
-		      == (PFERR_FETCH_MASK | PFERR_PRESENT_MASK))))
+	if (unlikely(fault->exec && fault->present))
		return false;

	/*
··· 3063 3119
	 * accesses to a present page.
 	 */
 
-	return shadow_acc_track_mask != 0 ||
-	       ((error_code & (PFERR_WRITE_MASK | PFERR_PRESENT_MASK))
-	       == (PFERR_WRITE_MASK | PFERR_PRESENT_MASK));
+	return shadow_acc_track_mask != 0 || (fault->write && fault->present);
 }
 
 /*
···
 * someone else modified the SPTE from its original value.
 */
 static bool
-fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+fast_pf_fix_direct_spte(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 			u64 *sptep, u64 old_spte, u64 new_spte)
 {
-	gfn_t gfn;
-
-	WARN_ON(!sp->role.direct);
-
 	/*
 	 * Theoretically we could also set dirty bit (and flush TLB) here in
 	 * order to eliminate unnecessary PML logging. See comments in
···
 	if (cmpxchg64(sptep, old_spte, new_spte) != old_spte)
 		return false;
 
-	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte)) {
-		/*
-		 * The gfn of direct spte is stable since it is
-		 * calculated by sp->gfn.
-		 */
-		gfn = kvm_mmu_page_get_gfn(sp, sptep - sp->spt);
-		kvm_vcpu_mark_page_dirty(vcpu, gfn);
-	}
+	if (is_writable_pte(new_spte) && !is_writable_pte(old_spte))
+		mark_page_dirty_in_slot(vcpu->kvm, fault->slot, fault->gfn);
 
 	return true;
 }
 
-static bool is_access_allowed(u32 fault_err_code, u64 spte)
+static bool is_access_allowed(struct kvm_page_fault *fault, u64 spte)
 {
-	if (fault_err_code & PFERR_FETCH_MASK)
+	if (fault->exec)
 		return is_executable_pte(spte);
 
-	if (fault_err_code & PFERR_WRITE_MASK)
+	if (fault->write)
 		return is_writable_pte(spte);
 
 	/* Fault was on Read access */
···
 	for_each_shadow_entry_lockless(vcpu, gpa, iterator, old_spte) {
 		sptep = iterator.sptep;
 		*spte = old_spte;
-
-		if (!is_shadow_present_pte(old_spte))
-			break;
 	}
 
 	return sptep;
···
 /*
  * Returns one of RET_PF_INVALID, RET_PF_FIXED or RET_PF_SPURIOUS.
  */
-static int fast_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code)
+static int fast_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	struct kvm_mmu_page *sp;
 	int ret = RET_PF_INVALID;
···
 	u64 *sptep = NULL;
 	uint retry_count = 0;
 
-	if (!page_fault_can_be_fast(error_code))
+	if (!page_fault_can_be_fast(fault))
 		return ret;
 
 	walk_shadow_page_lockless_begin(vcpu);
···
 		u64 new_spte;
 
 		if (is_tdp_mmu(vcpu->arch.mmu))
-			sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, gpa, &spte);
+			sptep = kvm_tdp_mmu_fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
 		else
-			sptep = fast_pf_get_last_sptep(vcpu, gpa, &spte);
+			sptep = fast_pf_get_last_sptep(vcpu, fault->addr, &spte);
 
 		if (!is_shadow_present_pte(spte))
 			break;
···
 		 * Need not check the access of upper level table entries since
 		 * they are always ACC_ALL.
 		 */
-		if (is_access_allowed(error_code, spte)) {
+		if (is_access_allowed(fault, spte)) {
 			ret = RET_PF_SPURIOUS;
 			break;
 		}
···
 		 * be removed in the fast path only if the SPTE was
 		 * write-protected for dirty-logging or access tracking.
 		 */
-		if ((error_code & PFERR_WRITE_MASK) &&
+		if (fault->write &&
 		    spte_can_locklessly_be_made_writable(spte)) {
 			new_spte |= PT_WRITABLE_MASK;
 
···
 
 		/* Verify that the fault can be handled in the fast path */
 		if (new_spte == spte ||
-		    !is_access_allowed(error_code, new_spte))
+		    !is_access_allowed(fault, new_spte))
 			break;
 
 		/*
···
 		 * since the gfn is not stable for indirect shadow page. See
 		 * Documentation/virt/kvm/locking.rst to get more detail.
 		 */
-		if (fast_pf_fix_direct_spte(vcpu, sp, sptep, spte, new_spte)) {
+		if (fast_pf_fix_direct_spte(vcpu, fault, sptep, spte, new_spte)) {
 			ret = RET_PF_FIXED;
 			break;
 		}
···
 
 	} while (true);
 
-	trace_fast_page_fault(vcpu, gpa, error_code, sptep, spte, ret);
+	trace_fast_page_fault(vcpu, fault, sptep, spte, ret);
 	walk_shadow_page_lockless_end(vcpu);
 
 	return ret;
···
 	return r;
 }
 
+static int mmu_first_shadow_root_alloc(struct kvm *kvm)
+{
+	struct kvm_memslots *slots;
+	struct kvm_memory_slot *slot;
+	int r = 0, i;
+
+	/*
+	 * Check if this is the first shadow root being allocated before
+	 * taking the lock.
+	 */
+	if (kvm_shadow_root_allocated(kvm))
+		return 0;
+
+	mutex_lock(&kvm->slots_arch_lock);
+
+	/* Recheck, under the lock, whether this is the first shadow root. */
+	if (kvm_shadow_root_allocated(kvm))
+		goto out_unlock;
+
+	/*
+	 * Check if anything actually needs to be allocated, e.g. all metadata
+	 * will be allocated upfront if TDP is disabled.
+	 */
+	if (kvm_memslots_have_rmaps(kvm) &&
+	    kvm_page_track_write_tracking_enabled(kvm))
+		goto out_success;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(slot, slots) {
+			/*
+			 * Both of these functions are no-ops if the target is
+			 * already allocated, so unconditionally calling both
+			 * is safe.  Intentionally do NOT free allocations on
+			 * failure to avoid having to track which allocations
+			 * were made now versus when the memslot was created.
+			 * The metadata is guaranteed to be freed when the slot
+			 * is freed, and will be kept/used if userspace retries
+			 * KVM_RUN instead of killing the VM.
+			 */
+			r = memslot_rmap_alloc(slot, slot->npages);
+			if (r)
+				goto out_unlock;
+			r = kvm_page_track_write_tracking_alloc(slot);
+			if (r)
+				goto out_unlock;
+		}
+	}
+
+	/*
+	 * Ensure that shadow_root_allocated becomes true strictly after
+	 * all the related pointers are set.
+	 */
+out_success:
+	smp_store_release(&kvm->arch.shadow_root_allocated, true);
+
+out_unlock:
+	mutex_unlock(&kvm->slots_arch_lock);
+	return r;
+}
+
 static int mmu_alloc_shadow_roots(struct kvm_vcpu *vcpu)
 {
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
···
 		}
 	}
 
-	r = alloc_all_memslots_rmaps(vcpu->kvm);
+	r = mmu_first_shadow_root_alloc(vcpu->kvm);
 	if (r)
 		return r;
 
···
 #endif
 }
 
+static bool is_unsync_root(hpa_t root)
+{
+	struct kvm_mmu_page *sp;
+
+	if (!VALID_PAGE(root))
+		return false;
+
+	/*
+	 * The read barrier orders the CPU's read of SPTE.W during the page table
+	 * walk before the reads of sp->unsync/sp->unsync_children here.
+	 *
+	 * Even if another CPU was marking the SP as unsync-ed simultaneously,
+	 * any guest page table changes are not guaranteed to be visible anyway
+	 * until this VCPU issues a TLB flush strictly after those changes are
+	 * made.  We only need to ensure that the other CPU sets these flags
+	 * before any actual changes to the page tables are made.  The comments
+	 * in mmu_try_to_unsync_pages() describe what could go wrong if this
+	 * requirement isn't satisfied.
+	 */
+	smp_rmb();
+	sp = to_shadow_page(root);
+	if (sp->unsync || sp->unsync_children)
+		return true;
+
+	return false;
+}
+
 void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
 {
 	int i;
···
 		hpa_t root = vcpu->arch.mmu->root_hpa;
 		sp = to_shadow_page(root);
 
-		/*
-		 * Even if another CPU was marking the SP as unsync-ed
-		 * simultaneously, any guest page table changes are not
-		 * guaranteed to be visible anyway until this VCPU issues a TLB
-		 * flush strictly after those changes are made. We only need to
-		 * ensure that the other CPU sets these flags before any actual
-		 * changes to the page tables are made. The comments in
-		 * mmu_try_to_unsync_pages() describe what could go wrong if
-		 * this requirement isn't satisfied.
-		 */
-		if (!smp_load_acquire(&sp->unsync) &&
-		    !smp_load_acquire(&sp->unsync_children))
+		if (!is_unsync_root(root))
 			return;
 
 		write_lock(&vcpu->kvm->mmu_lock);
···
 
 	kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
 	write_unlock(&vcpu->kvm->mmu_lock);
+}
+
+void kvm_mmu_sync_prev_roots(struct kvm_vcpu *vcpu)
+{
+	unsigned long roots_to_free = 0;
+	int i;
+
+	for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++)
+		if (is_unsync_root(vcpu->arch.mmu->prev_roots[i].hpa))
+			roots_to_free |= KVM_MMU_ROOT_PREVIOUS(i);
+
+	/* sync prev_roots by simply freeing them */
+	kvm_mmu_free_roots(vcpu, vcpu->arch.mmu, roots_to_free);
 }
 
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr,
···
 		spte = mmu_spte_get_lockless(iterator.sptep);
 
 		sptes[leaf] = spte;
-
-		if (!is_shadow_present_pte(spte))
-			break;
 	}
 
 	return leaf;
···
 }
 
 static bool page_fault_handle_page_track(struct kvm_vcpu *vcpu,
-					 u32 error_code, gfn_t gfn)
+					 struct kvm_page_fault *fault)
 {
-	if (unlikely(error_code & PFERR_RSVD_MASK))
+	if (unlikely(fault->rsvd))
 		return false;
 
-	if (!(error_code & PFERR_PRESENT_MASK) ||
-	    !(error_code & PFERR_WRITE_MASK))
+	if (!fault->present || !fault->write)
 		return false;
 
 	/*
 	 * guest is writing the page which is write tracked which can
 	 * not be fixed by page fault handler.
 	 */
-	if (kvm_page_track_is_active(vcpu, gfn, KVM_PAGE_TRACK_WRITE))
+	if (kvm_slot_page_track_is_active(vcpu, fault->slot, fault->gfn, KVM_PAGE_TRACK_WRITE))
 		return true;
 
 	return false;
···
 	u64 spte;
 
 	walk_shadow_page_lockless_begin(vcpu);
-	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte) {
+	for_each_shadow_entry_lockless(vcpu, addr, iterator, spte)
 		clear_sp_write_flooding_count(iterator.sptep);
-		if (!is_shadow_present_pte(spte))
-			break;
-	}
 	walk_shadow_page_lockless_end(vcpu);
 }
···
 				  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
 
-static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, bool prefault, gfn_t gfn,
-			    gpa_t cr2_or_gpa, kvm_pfn_t *pfn, hva_t *hva,
-			    bool write, bool *writable, int *r)
+static bool kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault, int *r)
 {
-	struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+	struct kvm_memory_slot *slot = fault->slot;
 	bool async;
 
 	/*
···
 	if (!kvm_is_visible_memslot(slot)) {
 		/* Don't expose private memslots to L2. */
 		if (is_guest_mode(vcpu)) {
-			*pfn = KVM_PFN_NOSLOT;
-			*writable = false;
+			fault->slot = NULL;
+			fault->pfn = KVM_PFN_NOSLOT;
+			fault->map_writable = false;
 			return false;
 		}
 		/*
···
 	}
 
 	async = false;
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, &async,
-				    write, writable, hva);
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
+					  fault->write, &fault->map_writable,
+					  &fault->hva);
 	if (!async)
 		return false; /* *pfn has correct page already */
 
-	if (!prefault && kvm_can_do_async_pf(vcpu)) {
-		trace_kvm_try_async_get_page(cr2_or_gpa, gfn);
-		if (kvm_find_async_pf_gfn(vcpu, gfn)) {
-			trace_kvm_async_pf_doublefault(cr2_or_gpa, gfn);
+	if (!fault->prefetch && kvm_can_do_async_pf(vcpu)) {
+		trace_kvm_try_async_get_page(fault->addr, fault->gfn);
+		if (kvm_find_async_pf_gfn(vcpu, fault->gfn)) {
+			trace_kvm_async_pf_doublefault(fault->addr, fault->gfn);
 			kvm_make_request(KVM_REQ_APF_HALT, vcpu);
 			goto out_retry;
-		} else if (kvm_arch_setup_async_pf(vcpu, cr2_or_gpa, gfn))
+		} else if (kvm_arch_setup_async_pf(vcpu, fault->addr, fault->gfn))
 			goto out_retry;
 	}
 
-	*pfn = __gfn_to_pfn_memslot(slot, gfn, false, NULL,
-				    write, writable, hva);
+	fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, NULL,
+					  fault->write, &fault->map_writable,
+					  &fault->hva);
+	return false;
 
 out_retry:
 	*r = RET_PF_RETRY;
 	return true;
 }
 
-static int direct_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-			     bool prefault, int max_level, bool is_tdp)
+static int direct_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
 	bool is_tdp_mmu_fault = is_tdp_mmu(vcpu->arch.mmu);
-	bool write = error_code & PFERR_WRITE_MASK;
-	bool map_writable;
 
-	gfn_t gfn = gpa >> PAGE_SHIFT;
 	unsigned long mmu_seq;
-	kvm_pfn_t pfn;
-	hva_t hva;
 	int r;
 
-	if (page_fault_handle_page_track(vcpu, error_code, gfn))
+	fault->gfn = fault->addr >> PAGE_SHIFT;
+	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+
+	if (page_fault_handle_page_track(vcpu, fault))
 		return RET_PF_EMULATE;
 
-	r = fast_page_fault(vcpu, gpa, error_code);
+	r = fast_page_fault(vcpu, fault);
 	if (r != RET_PF_INVALID)
 		return r;
 
···
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (kvm_faultin_pfn(vcpu, prefault, gfn, gpa, &pfn, &hva,
-			    write, &map_writable, &r))
+	if (kvm_faultin_pfn(vcpu, fault, &r))
 		return r;
 
-	if (handle_abnormal_pfn(vcpu, is_tdp ? 0 : gpa, gfn, pfn, ACC_ALL, &r))
+	if (handle_abnormal_pfn(vcpu, fault, ACC_ALL, &r))
 		return r;
 
 	r = RET_PF_RETRY;
···
 	else
 		write_lock(&vcpu->kvm->mmu_lock);
 
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
 		goto out_unlock;
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
 
 	if (is_tdp_mmu_fault)
-		r = kvm_tdp_mmu_map(vcpu, gpa, error_code, map_writable, max_level,
-				    pfn, prefault);
+		r = kvm_tdp_mmu_map(vcpu, fault);
 	else
-		r = __direct_map(vcpu, gpa, error_code, map_writable, max_level, pfn,
-				 prefault, is_tdp);
+		r = __direct_map(vcpu, fault);
 
 out_unlock:
 	if (is_tdp_mmu_fault)
 		read_unlock(&vcpu->kvm->mmu_lock);
 	else
 		write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
 
-static int nonpaging_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa,
-				u32 error_code, bool prefault)
+static int nonpaging_page_fault(struct kvm_vcpu *vcpu,
+				struct kvm_page_fault *fault)
 {
-	pgprintk("%s: gva %lx error %x\n", __func__, gpa, error_code);
+	pgprintk("%s: gva %lx error %x\n", __func__, fault->addr, fault->error_code);
 
 	/* This path builds a PAE pagetable, we can map 2mb pages at maximum. */
-	return direct_page_fault(vcpu, gpa & PAGE_MASK, error_code, prefault,
-				 PG_LEVEL_2M, false);
+	fault->max_level = PG_LEVEL_2M;
+	return direct_page_fault(vcpu, fault);
 }
 
 int kvm_handle_page_fault(struct kvm_vcpu *vcpu, u64 error_code,
···
 }
 EXPORT_SYMBOL_GPL(kvm_handle_page_fault);
 
-int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		       bool prefault)
+int kvm_tdp_page_fault(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	int max_level;
-
-	for (max_level = KVM_MAX_HUGEPAGE_LEVEL;
-	     max_level > PG_LEVEL_4K;
-	     max_level--) {
-		int page_num = KVM_PAGES_PER_HPAGE(max_level);
-		gfn_t base = (gpa >> PAGE_SHIFT) & ~(page_num - 1);
+	while (fault->max_level > PG_LEVEL_4K) {
+		int page_num = KVM_PAGES_PER_HPAGE(fault->max_level);
+		gfn_t base = (fault->addr >> PAGE_SHIFT) & ~(page_num - 1);
 
 		if (kvm_mtrr_check_gfn_range_consistency(vcpu, base, page_num))
 			break;
+
+		--fault->max_level;
 	}
 
-	return direct_page_fault(vcpu, gpa, error_code, prefault,
-				 max_level, true);
+	return direct_page_fault(vcpu, fault);
 }
 
 static void nonpaging_init_context(struct kvm_mmu *context)
···
 }
 
 static bool sync_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, gfn_t gfn,
-			   unsigned int access, int *nr_present)
+			   unsigned int access)
 {
 	if (unlikely(is_mmio_spte(*sptep))) {
 		if (gfn != get_mmio_spte_gfn(*sptep)) {
···
 			return true;
 		}
 
-		(*nr_present)++;
 		mark_mmio_spte(vcpu, sptep, gfn, access);
 		return true;
 	}
···
 	LIST_HEAD(invalid_list);
 	u64 entry, gentry, *spte;
 	int npte;
-	bool remote_flush, local_flush;
+	bool flush = false;
 
 	/*
 	 * If we don't have indirect shadow pages, it means no page is
···
 	 */
 	if (!READ_ONCE(vcpu->kvm->arch.indirect_shadow_pages))
 		return;
-
-	remote_flush = local_flush = false;
 
 	pgprintk("%s: gpa %llx bytes %d\n", __func__, gpa, bytes);
 
···
 		if (!spte)
 			continue;
 
-		local_flush = true;
 		while (npte--) {
 			entry = *spte;
 			mmu_page_zap_pte(vcpu->kvm, sp, spte, NULL);
 			if (gentry && sp->role.level != PG_LEVEL_4K)
 				++vcpu->kvm->stat.mmu_pde_zapped;
 			if (need_remote_flush(entry, *spte))
-				remote_flush = true;
+				flush = true;
 			++spte;
 		}
 	}
-	kvm_mmu_flush_or_zap(vcpu, &invalid_list, remote_flush, local_flush);
+	kvm_mmu_remote_flush_or_zap(vcpu->kvm, &invalid_list, flush);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PTE_WRITE);
 	write_unlock(&vcpu->kvm->mmu_lock);
 }
···
 }
 
 static __always_inline bool
-slot_handle_leaf(struct kvm *kvm, const struct kvm_memory_slot *memslot,
-		 slot_level_handler fn, bool flush_on_yield)
+slot_handle_level_4k(struct kvm *kvm, const struct kvm_memory_slot *memslot,
+		     slot_level_handler fn, bool flush_on_yield)
 {
 	return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
 				 PG_LEVEL_4K, flush_on_yield);
···
 
 	spin_lock_init(&kvm->arch.mmu_unsync_pages_lock);
 
-	if (!kvm_mmu_init_tdp_mmu(kvm))
-		/*
-		 * No smp_load/store wrappers needed here as we are in
-		 * VM init and there cannot be any memslots / other threads
-		 * accessing this struct kvm yet.
-		 */
-		kvm->arch.memslots_have_rmaps = true;
+	kvm_mmu_init_tdp_mmu(kvm);
 
 	node->track_write = kvm_mmu_pte_write;
 	node->track_flush_slot = kvm_mmu_invalidate_zap_pages_in_memslot;
···
 	kvm_mmu_uninit_tdp_mmu(kvm);
 }
 
+static bool __kvm_zap_rmaps(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
+{
+	const struct kvm_memory_slot *memslot;
+	struct kvm_memslots *slots;
+	bool flush = false;
+	gfn_t start, end;
+	int i;
+
+	if (!kvm_memslots_have_rmaps(kvm))
+		return flush;
+
+	for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
+		slots = __kvm_memslots(kvm, i);
+		kvm_for_each_memslot(memslot, slots) {
+			start = max(gfn_start, memslot->base_gfn);
+			end = min(gfn_end, memslot->base_gfn + memslot->npages);
+			if (start >= end)
+				continue;
+
+			flush = slot_handle_level_range(kvm, memslot, kvm_zap_rmapp,
+							PG_LEVEL_4K, KVM_MAX_HUGEPAGE_LEVEL,
+							start, end - 1, true, flush);
+		}
+	}
+
+	return flush;
+}
+
 /*
  * Invalidate (zap) SPTEs that cover GFNs from gfn_start and up to gfn_end
  * (not including it)
  */
 void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 {
-	struct kvm_memslots *slots;
-	struct kvm_memory_slot *memslot;
+	bool flush;
 	int i;
-	bool flush = false;
 
 	write_lock(&kvm->mmu_lock);
 
 	kvm_inc_notifier_count(kvm, gfn_start, gfn_end);
 
-	if (kvm_memslots_have_rmaps(kvm)) {
-		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) {
-			slots = __kvm_memslots(kvm, i);
-			kvm_for_each_memslot(memslot, slots) {
-				gfn_t start, end;
-
-				start = max(gfn_start, memslot->base_gfn);
-				end = min(gfn_end, memslot->base_gfn + memslot->npages);
-				if (start >= end)
-					continue;
-
-				flush = slot_handle_level_range(kvm,
-						(const struct kvm_memory_slot *) memslot,
-						kvm_zap_rmapp, PG_LEVEL_4K,
-						KVM_MAX_HUGEPAGE_LEVEL, start,
-						end - 1, true, flush);
-			}
-		}
-		if (flush)
-			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
-							   gfn_end - gfn_start);
-	}
+	flush = __kvm_zap_rmaps(kvm, gfn_start, gfn_end);
 
 	if (is_tdp_mmu_enabled(kvm)) {
 		for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++)
 			flush = kvm_tdp_mmu_zap_gfn_range(kvm, i, gfn_start,
 							  gfn_end, flush);
-		if (flush)
-			kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
-							   gfn_end - gfn_start);
 	}
 
 	if (flush)
-		kvm_flush_remote_tlbs_with_address(kvm, gfn_start, gfn_end);
+		kvm_flush_remote_tlbs_with_address(kvm, gfn_start,
+						   gfn_end - gfn_start);
 
 	kvm_dec_notifier_count(kvm, gfn_start, gfn_end);
 
···
 
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		flush = slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
+		/*
+		 * Zap only 4k SPTEs since the legacy MMU only supports dirty
+		 * logging at a 4k granularity and never creates collapsible
+		 * 2m SPTEs during dirty logging.
+		 */
+		flush = slot_handle_level_4k(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
 		if (flush)
 			kvm_arch_flush_remote_tlbs_memslot(kvm, slot);
 		write_unlock(&kvm->mmu_lock);
···
 
 	if (kvm_memslots_have_rmaps(kvm)) {
 		write_lock(&kvm->mmu_lock);
-		flush = slot_handle_leaf(kvm, memslot, __rmap_clear_dirty,
-					 false);
+		/*
+		 * Clear dirty bits only on 4k SPTEs since the legacy MMU only
+		 * support dirty logging at a 4k granularity.
+		 */
+		flush = slot_handle_level_4k(kvm, memslot, __rmap_clear_dirty, false);
 		write_unlock(&kvm->mmu_lock);
 	}
···
 	mmu_audit_disable();
 }
 
-static int set_nx_huge_pages_recovery_ratio(const char *val, const struct kernel_param *kp)
+static int set_nx_huge_pages_recovery_param(const char *val, const struct kernel_param *kp)
 {
-	unsigned int old_val;
+	bool was_recovery_enabled, is_recovery_enabled;
+	uint old_period, new_period;
 	int err;
 
-	old_val = nx_huge_pages_recovery_ratio;
+	was_recovery_enabled = nx_huge_pages_recovery_ratio;
+	old_period = nx_huge_pages_recovery_period_ms;
+
 	err = param_set_uint(val, kp);
 	if (err)
 		return err;
 
-	if (READ_ONCE(nx_huge_pages) &&
-	    !old_val && nx_huge_pages_recovery_ratio) {
+	is_recovery_enabled = nx_huge_pages_recovery_ratio;
+	new_period = nx_huge_pages_recovery_period_ms;
+
+	if (READ_ONCE(nx_huge_pages) && is_recovery_enabled &&
+	    (!was_recovery_enabled || old_period > new_period)) {
 		struct kvm *kvm;
 
 		mutex_lock(&kvm_lock);
···
 
 static long get_nx_lpage_recovery_timeout(u64 start_time)
 {
-	return READ_ONCE(nx_huge_pages) && READ_ONCE(nx_huge_pages_recovery_ratio)
-		? start_time + 60 * HZ - get_jiffies_64()
+	uint ratio = READ_ONCE(nx_huge_pages_recovery_ratio);
+	uint period = READ_ONCE(nx_huge_pages_recovery_period_ms);
+
+	if (!period && ratio) {
+		/* Make sure the period is not less than one second. */
+		ratio = min(ratio, 3600u);
+		period = 60 * 60 * 1000 / ratio;
+	}
+
+	return READ_ONCE(nx_huge_pages) && ratio
+		? start_time + msecs_to_jiffies(period) - get_jiffies_64()
 		: MAX_SCHEDULE_TIMEOUT;
 }
 
arch/x86/kvm/mmu/mmu_internal.h (+4 -17)
···
 		kvm_x86_ops.cpu_dirty_log_size;
 }
 
-extern int nx_huge_pages;
-static inline bool is_nx_huge_page_enabled(void)
-{
-	return READ_ONCE(nx_huge_pages);
-}
-
-int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, gfn_t gfn, bool can_unsync);
+int mmu_try_to_unsync_pages(struct kvm_vcpu *vcpu, struct kvm_memory_slot *slot,
+			    gfn_t gfn, bool can_unsync, bool prefetch);
 
 void kvm_mmu_gfn_disallow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
 void kvm_mmu_gfn_allow_lpage(const struct kvm_memory_slot *slot, gfn_t gfn);
···
 	RET_PF_SPURIOUS,
 };
 
-/* Bits which may be returned by set_spte() */
-#define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
-#define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
-#define SET_SPTE_SPURIOUS		BIT(2)
-
 int kvm_mmu_max_mapping_level(struct kvm *kvm,
 			      const struct kvm_memory_slot *slot, gfn_t gfn,
 			      kvm_pfn_t pfn, int max_level);
-int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
-			    int max_level, kvm_pfn_t *pfnp,
-			    bool huge_page_disallowed, int *req_level);
-void disallowed_hugepage_adjust(u64 spte, gfn_t gfn, int cur_level,
-				kvm_pfn_t *pfnp, int *goal_levelp);
+void kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
+void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_level);
 
 void *mmu_memory_cache_alloc(struct kvm_mmu_memory_cache *mc);
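The thread running through all of these hunks is that the scattered `error_code`, `gfn`, `pfn`, `max_level`, `prefault`, … parameters are consolidated into one `struct kvm_page_fault`, with the error-code bits decoded once into booleans (`fault->write`, `fault->exec`, `fault->present`, `fault->rsvd`). A standalone sketch of that decode step, using the architectural x86 #PF error-code bit positions; the struct here is an illustrative subset, not the kernel's actual layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86 #PF error-code bits, as used by the PFERR_* masks. */
#define PFERR_PRESENT_MASK (1u << 0)
#define PFERR_WRITE_MASK   (1u << 1)
#define PFERR_RSVD_MASK    (1u << 3)
#define PFERR_FETCH_MASK   (1u << 4)

/* Illustrative subset of the consolidated fault structure. */
struct kvm_page_fault {
	uint64_t addr;
	uint32_t error_code;
	bool write, exec, present, rsvd;
};

/* Decompose the error code once, instead of re-testing masks everywhere. */
static void page_fault_init(struct kvm_page_fault *f, uint64_t addr, uint32_t ec)
{
	f->addr = addr;
	f->error_code = ec;
	f->write = ec & PFERR_WRITE_MASK;
	f->exec = ec & PFERR_FETCH_MASK;
	f->present = ec & PFERR_PRESENT_MASK;
	f->rsvd = ec & PFERR_RSVD_MASK;
}

int main(void)
{
	struct kvm_page_fault f;

	page_fault_init(&f, 0x1000, PFERR_PRESENT_MASK | PFERR_WRITE_MASK);
	assert(f.write && f.present && !f.exec && !f.rsvd);

	page_fault_init(&f, 0x2000, PFERR_FETCH_MASK);
	assert(f.exec && !f.present && !f.write);

	printf("ok\n");
	return 0;
}
```

This is why, for example, `page_fault_can_be_fast()` above shrinks from mask arithmetic to `fault->exec && fault->present`.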
arch/x86/kvm/mmu/mmutrace.h (+9 -9)
···
 
 TRACE_EVENT(
 	fast_page_fault,
-	TP_PROTO(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u32 error_code,
+	TP_PROTO(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
 		 u64 *sptep, u64 old_spte, int ret),
-	TP_ARGS(vcpu, cr2_or_gpa, error_code, sptep, old_spte, ret),
+	TP_ARGS(vcpu, fault, sptep, old_spte, ret),
 
 	TP_STRUCT__entry(
 		__field(int, vcpu_id)
···
 
 	TP_fast_assign(
 		__entry->vcpu_id = vcpu->vcpu_id;
-		__entry->cr2_or_gpa = cr2_or_gpa;
-		__entry->error_code = error_code;
+		__entry->cr2_or_gpa = fault->addr;
+		__entry->error_code = fault->error_code;
 		__entry->sptep = sptep;
 		__entry->old_spte = old_spte;
 		__entry->new_spte = *sptep;
···
 
 TRACE_EVENT(
 	kvm_mmu_spte_requested,
-	TP_PROTO(gpa_t addr, int level, kvm_pfn_t pfn),
-	TP_ARGS(addr, level, pfn),
+	TP_PROTO(struct kvm_page_fault *fault),
+	TP_ARGS(fault),
 
 	TP_STRUCT__entry(
 		__field(u64, gfn)
···
 	),
 
 	TP_fast_assign(
-		__entry->gfn = addr >> PAGE_SHIFT;
-		__entry->pfn = pfn | (__entry->gfn & (KVM_PAGES_PER_HPAGE(level) - 1));
-		__entry->level = level;
+		__entry->gfn = fault->gfn;
+		__entry->pfn = fault->pfn | (fault->gfn & (KVM_PAGES_PER_HPAGE(fault->goal_level) - 1));
+		__entry->level = fault->goal_level;
 	),
 
 	TP_printk("gfn %llx pfn %llx level %d",
arch/x86/kvm/mmu/page_track.c (+43 -6)
···
 #include "mmu.h"
 #include "mmu_internal.h"
 
+bool kvm_page_track_write_tracking_enabled(struct kvm *kvm)
+{
+	return IS_ENABLED(CONFIG_KVM_EXTERNAL_WRITE_TRACKING) ||
+	       !tdp_enabled || kvm_shadow_root_allocated(kvm);
+}
+
 void kvm_page_track_free_memslot(struct kvm_memory_slot *slot)
 {
 	int i;
···
 	}
 }
 
-int kvm_page_track_create_memslot(struct kvm_memory_slot *slot,
+int kvm_page_track_create_memslot(struct kvm *kvm,
+				  struct kvm_memory_slot *slot,
 				  unsigned long npages)
 {
-	int  i;
+	int i;
 
 	for (i = 0; i < KVM_PAGE_TRACK_MAX; i++) {
+		if (i == KVM_PAGE_TRACK_WRITE &&
+		    !kvm_page_track_write_tracking_enabled(kvm))
+			continue;
+
 		slot->arch.gfn_track[i] =
 			kvcalloc(npages, sizeof(*slot->arch.gfn_track[i]),
 				 GFP_KERNEL_ACCOUNT);
···
 		return false;
 
 	return true;
+}
+
+int kvm_page_track_write_tracking_alloc(struct kvm_memory_slot *slot)
+{
+	unsigned short *gfn_track;
+
+	if (slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE])
+		return 0;
+
+	gfn_track = kvcalloc(slot->npages, sizeof(*gfn_track), GFP_KERNEL_ACCOUNT);
+	if (gfn_track == NULL)
+		return -ENOMEM;
+
+	slot->arch.gfn_track[KVM_PAGE_TRACK_WRITE] = gfn_track;
+	return 0;
 }
 
 static void update_gfn_track(struct kvm_memory_slot *slot, gfn_t gfn,
···
 	if (WARN_ON(!page_track_mode_is_valid(mode)))
 		return;
 
+	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
+		    !kvm_page_track_write_tracking_enabled(kvm)))
+		return;
+
 	update_gfn_track(slot, gfn, mode, 1);
 
 	/*
···
 	if (WARN_ON(!page_track_mode_is_valid(mode)))
 		return;
 
+	if (WARN_ON(mode == KVM_PAGE_TRACK_WRITE &&
+		    !kvm_page_track_write_tracking_enabled(kvm)))
+		return;
+
 	update_gfn_track(slot, gfn, mode, -1);
 
 	/*
···
 /*
  * check if the corresponding access on the specified guest page is tracked.
  */
-bool kvm_page_track_is_active(struct kvm_vcpu *vcpu, gfn_t gfn,
-			      enum kvm_page_track_mode mode)
+bool kvm_slot_page_track_is_active(struct kvm_vcpu *vcpu,
+				   struct kvm_memory_slot *slot, gfn_t gfn,
+				   enum kvm_page_track_mode mode)
 {
-	struct kvm_memory_slot *slot;
 	int index;
 
 	if (WARN_ON(!page_track_mode_is_valid(mode)))
 		return false;
 
-	slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
 	if (!slot)
+		return false;
+
+	if (mode == KVM_PAGE_TRACK_WRITE &&
+	    !kvm_page_track_write_tracking_enabled(vcpu->kvm))
 		return false;
 
 	index = gfn_to_index(gfn, slot->base_gfn, PG_LEVEL_4K);
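The page_track.c hunks above implement the "allocate page tracking data structures lazily" item from the merge message: the per-gfn write-tracking counters are only allocated once write tracking is actually needed, and the allocation helper is idempotent. A generic sketch of that pattern, with simplified names and types (not the kernel's):

```c
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a memslot with write tracking. */
struct memslot {
	unsigned long npages;
	unsigned short *write_track; /* one counter per gfn, or NULL if unused */
};

/*
 * Allocate the tracking array on first use. Safe to call repeatedly:
 * an existing array is kept, mirroring kvm_page_track_write_tracking_alloc().
 * Returns 0 on success, -1 on allocation failure.
 */
static int write_tracking_alloc(struct memslot *slot)
{
	unsigned short *track;

	if (slot->write_track)  /* already allocated: nothing to do */
		return 0;

	track = calloc(slot->npages, sizeof(*track));
	if (!track)
		return -1;

	slot->write_track = track;
	return 0;
}
```

The kernel version additionally skips the allocation at memslot creation when the TDP MMU makes write tracking unnecessary (the `kvm_page_track_write_tracking_enabled()` check), which is what saves the memory when i915 KVM-GT is not compiled in.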
arch/x86/kvm/mmu/paging_tmpl.h (+78 -92)
···
 FNAME(prefetch_gpte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
 		     u64 *spte, pt_element_t gpte, bool no_dirty_log)
 {
+	struct kvm_memory_slot *slot;
 	unsigned pte_access;
 	gfn_t gfn;
 	kvm_pfn_t pfn;
···
 	gfn = gpte_to_gfn(gpte);
 	pte_access = sp->role.access & FNAME(gpte_access)(gpte);
 	FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
-	pfn = pte_prefetch_gfn_to_pfn(vcpu, gfn,
+
+	slot = gfn_to_memslot_dirty_bitmap(vcpu, gfn,
 			no_dirty_log && (pte_access & ACC_WRITE_MASK));
+	if (!slot)
+		return false;
+
+	pfn = gfn_to_pfn_memslot_atomic(slot, gfn);
 	if (is_error_pfn(pfn))
 		return false;
 
-	/*
-	 * we call mmu_set_spte() with host_writable = true because
-	 * pte_prefetch_gfn_to_pfn always gets a writable pfn.
-	 */
-	mmu_set_spte(vcpu, spte, pte_access, false, PG_LEVEL_4K, gfn, pfn,
-		     true, true);
-
+	mmu_set_spte(vcpu, slot, spte, pte_access, gfn, pfn, NULL);
 	kvm_release_pfn_clean(pfn);
 	return true;
-}
-
-static void FNAME(update_pte)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
-			      u64 *spte, const void *pte)
-{
-	pt_element_t gpte = *(const pt_element_t *)pte;
-
-	FNAME(prefetch_gpte)(vcpu, sp, spte, gpte, false);
 }
 
 static bool FNAME(gpte_changed)(struct kvm_vcpu *vcpu,
···
  * If the guest tries to write a write-protected page, we need to
  * emulate this operation, return 1 to indicate this case.
  */
-static int FNAME(fetch)(struct kvm_vcpu *vcpu, gpa_t addr,
-			struct guest_walker *gw, u32 error_code,
-			int max_level, kvm_pfn_t pfn, bool map_writable,
-			bool prefault)
+static int FNAME(fetch)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault,
+			struct guest_walker *gw)
 {
-	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
-	bool write_fault = error_code & PFERR_WRITE_MASK;
-	bool exec = error_code & PFERR_FETCH_MASK;
-	bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
 	struct kvm_mmu_page *sp = NULL;
 	struct kvm_shadow_walk_iterator it;
 	unsigned int direct_access, access;
-	int top_level, level, req_level, ret;
-	gfn_t base_gfn = gw->gfn;
+	int top_level, ret;
+	gfn_t base_gfn = fault->gfn;
 
+	WARN_ON_ONCE(gw->gfn != base_gfn);
 	direct_access = gw->pte_access;
 
 	top_level = vcpu->arch.mmu->root_level;
···
 	if (WARN_ON(!VALID_PAGE(vcpu->arch.mmu->root_hpa)))
 		goto out_gpte_changed;
 
-	for (shadow_walk_init(&it, vcpu, addr);
+	for (shadow_walk_init(&it, vcpu, fault->addr);
 	     shadow_walk_okay(&it) && it.level > gw->level;
 	     shadow_walk_next(&it)) {
 		gfn_t table_gfn;
···
 		if (!is_shadow_present_pte(*it.sptep)) {
 			table_gfn = gw->table_gfn[it.level - 2];
 			access = gw->pt_access[it.level - 2];
-			sp = kvm_mmu_get_page(vcpu, table_gfn, addr,
+			sp = kvm_mmu_get_page(vcpu, table_gfn, fault->addr,
 					      it.level-1, false, access);
 			/*
 			 * We must synchronize the pagetable before linking it
···
 		link_shadow_page(vcpu, it.sptep, sp);
 	}
 
-	level = kvm_mmu_hugepage_adjust(vcpu, gw->gfn, max_level, &pfn,
-					huge_page_disallowed, &req_level);
+	kvm_mmu_hugepage_adjust(vcpu, fault);
 
-	trace_kvm_mmu_spte_requested(addr, gw->level, pfn);
+	trace_kvm_mmu_spte_requested(fault);
 
 	for (; shadow_walk_okay(&it); shadow_walk_next(&it)) {
 		clear_sp_write_flooding_count(it.sptep);
···
 		 * We cannot overwrite existing page tables with an NX
 		 * large page, as the leaf could be executable.
 		 */
-		if (nx_huge_page_workaround_enabled)
-			disallowed_hugepage_adjust(*it.sptep, gw->gfn, it.level,
-						   &pfn, &level);
+		if (fault->nx_huge_page_workaround_enabled)
+			disallowed_hugepage_adjust(fault, *it.sptep, it.level);
 
-		base_gfn = gw->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
-		if (it.level == level)
+		base_gfn = fault->gfn & ~(KVM_PAGES_PER_HPAGE(it.level) - 1);
+		if (it.level == fault->goal_level)
 			break;
 
 		validate_direct_spte(vcpu, it.sptep, direct_access);
···
 		drop_large_spte(vcpu, it.sptep);
 
 		if (!is_shadow_present_pte(*it.sptep)) {
-			sp = kvm_mmu_get_page(vcpu, base_gfn, addr,
+			sp = kvm_mmu_get_page(vcpu, base_gfn, fault->addr,
 					      it.level - 1, true, direct_access);
 			link_shadow_page(vcpu, it.sptep, sp);
-			if (huge_page_disallowed && req_level >= it.level)
+			if (fault->huge_page_disallowed &&
+			    fault->req_level >= it.level)
 				account_huge_nx_page(vcpu->kvm, sp);
 		}
 	}
 
-	ret = mmu_set_spte(vcpu, it.sptep, gw->pte_access, write_fault,
-			   it.level, base_gfn, pfn, prefault, map_writable);
+	if (WARN_ON_ONCE(it.level != fault->goal_level))
+		return -EFAULT;
+
+	ret = mmu_set_spte(vcpu, fault->slot, it.sptep, gw->pte_access,
+			   base_gfn, fault->pfn, fault);
 	if (ret == RET_PF_SPURIOUS)
 		return ret;
 
···
  * Returns: 1 if we need to emulate the instruction, 0 otherwise, or
  *	    a negative value on error.
  */
-static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gpa_t addr, u32 error_code,
-			     bool prefault)
+static int FNAME(page_fault)(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	bool write_fault = error_code & PFERR_WRITE_MASK;
-	bool user_fault = error_code & PFERR_USER_MASK;
 	struct guest_walker walker;
 	int r;
-	kvm_pfn_t pfn;
-	hva_t hva;
 	unsigned long mmu_seq;
-	bool map_writable, is_self_change_mapping;
-	int max_level;
+	bool is_self_change_mapping;
 
-	pgprintk("%s: addr %lx err %x\n", __func__, addr, error_code);
-
-	/*
-	 * If PFEC.RSVD is set, this is a shadow page fault.
-	 * The bit needs to be cleared before walking guest page tables.
-	 */
-	error_code &= ~PFERR_RSVD_MASK;
+	pgprintk("%s: addr %lx err %x\n", __func__, fault->addr, fault->error_code);
+	WARN_ON_ONCE(fault->is_tdp);
 
 	/*
 	 * Look up the guest pte for the faulting address.
+	 * If PFEC.RSVD is set, this is a shadow page fault.
+	 * The bit needs to be cleared before walking guest page tables.
 	 */
-	r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
+	r = FNAME(walk_addr)(&walker, vcpu, fault->addr,
+			     fault->error_code & ~PFERR_RSVD_MASK);
 
 	/*
 	 * The page is not mapped by the guest.  Let the guest handle it.
 	 */
 	if (!r) {
 		pgprintk("%s: guest page fault\n", __func__);
-		if (!prefault)
+		if (!fault->prefetch)
 			kvm_inject_emulated_page_fault(vcpu, &walker.fault);
 
 		return RET_PF_RETRY;
 	}
 
-	if (page_fault_handle_page_track(vcpu, error_code, walker.gfn)) {
-		shadow_page_table_clear_flood(vcpu, addr);
+	fault->gfn = walker.gfn;
+	fault->slot = kvm_vcpu_gfn_to_memslot(vcpu, fault->gfn);
+
+	if (page_fault_handle_page_track(vcpu, fault)) {
+		shadow_page_table_clear_flood(vcpu, fault->addr);
 		return RET_PF_EMULATE;
 	}
 
···
 	vcpu->arch.write_fault_to_shadow_pgtable = false;
 
 	is_self_change_mapping = FNAME(is_self_change_mapping)(vcpu,
-	      &walker, user_fault, &vcpu->arch.write_fault_to_shadow_pgtable);
+	      &walker, fault->user, &vcpu->arch.write_fault_to_shadow_pgtable);
 
 	if (is_self_change_mapping)
-		max_level = PG_LEVEL_4K;
+		fault->max_level = PG_LEVEL_4K;
 	else
-		max_level = walker.level;
+		fault->max_level = walker.level;
 
 	mmu_seq = vcpu->kvm->mmu_notifier_seq;
 	smp_rmb();
 
-	if (kvm_faultin_pfn(vcpu, prefault, walker.gfn, addr, &pfn, &hva,
-			    write_fault, &map_writable, &r))
+	if (kvm_faultin_pfn(vcpu, fault, &r))
 		return r;
 
-	if (handle_abnormal_pfn(vcpu, addr, walker.gfn, pfn, walker.pte_access, &r))
+	if (handle_abnormal_pfn(vcpu, fault, walker.pte_access, &r))
 		return r;
 
 	/*
 	 * Do not change pte_access if the pfn is a mmio page, otherwise
 	 * we will cache the incorrect access into mmio spte.
 	 */
-	if (write_fault && !(walker.pte_access & ACC_WRITE_MASK) &&
-	    !is_cr0_wp(vcpu->arch.mmu) && !user_fault && !is_noslot_pfn(pfn)) {
+	if (fault->write && !(walker.pte_access & ACC_WRITE_MASK) &&
+	    !is_cr0_wp(vcpu->arch.mmu) && !fault->user && fault->slot) {
 		walker.pte_access |= ACC_WRITE_MASK;
 		walker.pte_access &= ~ACC_USER_MASK;
 
···
 
 	r = RET_PF_RETRY;
 	write_lock(&vcpu->kvm->mmu_lock);
-	if (!is_noslot_pfn(pfn) && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, hva))
+	if (fault->slot && mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva))
 		goto out_unlock;
 
 	kvm_mmu_audit(vcpu, AUDIT_PRE_PAGE_FAULT);
 	r = make_mmu_pages_available(vcpu);
 	if (r)
 		goto out_unlock;
-	r = FNAME(fetch)(vcpu, addr, &walker, error_code, max_level, pfn,
-			 map_writable, prefault);
+	r = FNAME(fetch)(vcpu, fault, &walker);
 	kvm_mmu_audit(vcpu, AUDIT_POST_PAGE_FAULT);
 
 out_unlock:
 	write_unlock(&vcpu->kvm->mmu_lock);
-	kvm_release_pfn_clean(pfn);
+	kvm_release_pfn_clean(fault->pfn);
 	return r;
 }
 
···
 				    sizeof(pt_element_t)))
 			break;
 
-		FNAME(update_pte)(vcpu, sp, sptep, &gpte);
+		FNAME(prefetch_gpte)(vcpu, sp, sptep, gpte, false);
 	}
 
-		if (!is_shadow_present_pte(*sptep) || !sp->unsync_children)
+		if (!sp->unsync_children)
 			break;
 	}
 	write_unlock(&vcpu->kvm->mmu_lock);
···
  * Using the cached information from sp->gfns is safe because:
  * - The spte has a reference to the struct page, so the pfn for a given gfn
  *   can't change unless all sptes pointing to it are nuked first.
+ *
+ * Returns
+ * < 0: the sp should be zapped
+ *   0: the sp is synced and no tlb flushing is required
+ * > 0: the sp is synced and tlb flushing is required
  */
 static int FNAME(sync_page)(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
 	union kvm_mmu_page_role mmu_role = vcpu->arch.mmu->mmu_role.base;
-	int i, nr_present = 0;
+	int i;
 	bool host_writable;
 	gpa_t first_pte_gpa;
-	int set_spte_ret = 0;
+	bool flush = false;
 
 	/*
 	 * Ignore various flags when verifying that it's safe to sync a shadow
···
 	 */
 	if (WARN_ON_ONCE(sp->role.direct ||
 			 (sp->role.word ^ mmu_role.word) & ~sync_role_ign.word))
-		return 0;
+		return -1;
 
 	first_pte_gpa = FNAME(get_level1_sp_gpa)(sp);
 
 	for (i = 0; i < PT64_ENT_PER_PAGE; i++) {
+		u64 *sptep, spte;
+		struct kvm_memory_slot *slot;
 		unsigned pte_access;
 		pt_element_t gpte;
 		gpa_t pte_gpa;
···
 
 		if (kvm_vcpu_read_guest_atomic(vcpu, pte_gpa, &gpte,
 					       sizeof(pt_element_t)))
-			return 0;
+			return -1;
 
 		if (FNAME(prefetch_invalid_gpte)(vcpu, sp, &sp->spt[i], gpte)) {
-			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
+			flush = true;
 			continue;
 		}
 
···
 		pte_access &= FNAME(gpte_access)(gpte);
 		FNAME(protect_clean_gpte)(vcpu->arch.mmu, &pte_access, gpte);
 
-		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access,
-		      &nr_present))
+		if (sync_mmio_spte(vcpu, &sp->spt[i], gfn, pte_access))
 			continue;
 
 		if (gfn != sp->gfns[i]) {
 			drop_spte(vcpu->kvm, &sp->spt[i]);
-			set_spte_ret |= SET_SPTE_NEED_REMOTE_TLB_FLUSH;
+			flush = true;
 			continue;
 		}
 
-		nr_present++;
+		sptep = &sp->spt[i];
+		spte = *sptep;
+		host_writable = spte & shadow_host_writable_mask;
+		slot = kvm_vcpu_gfn_to_memslot(vcpu, gfn);
+		make_spte(vcpu, sp, slot, pte_access, gfn,
+			  spte_to_pfn(spte), spte, true, false,
+			  host_writable, &spte);
 
-		host_writable = sp->spt[i] & shadow_host_writable_mask;
-
-		set_spte_ret |= set_spte(vcpu, &sp->spt[i],
-					 pte_access, PG_LEVEL_4K,
-					 gfn, spte_to_pfn(sp->spt[i]),
-					 true, false, host_writable);
+		flush |= mmu_spte_update(sptep, spte);
 	}
 
-	if (set_spte_ret & SET_SPTE_NEED_REMOTE_TLB_FLUSH)
-		kvm_flush_remote_tlbs(vcpu->kvm);
-
-	return nr_present;
+	return flush;
 }
 
 #undef pt_element_t
arch/x86/kvm/mmu/spte.c (+21 -13)
···
 					      E820_TYPE_RAM);
 }
 
-int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
-	      gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
-	      bool can_unsync, bool host_writable, bool ad_disabled,
-	      u64 *new_spte)
+bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+	       struct kvm_memory_slot *slot,
+	       unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
+	       u64 old_spte, bool prefetch, bool can_unsync,
+	       bool host_writable, u64 *new_spte)
 {
+	int level = sp->role.level;
 	u64 spte = SPTE_MMU_PRESENT_MASK;
-	int ret = 0;
+	bool wrprot = false;
 
-	if (ad_disabled)
+	if (sp->role.ad_disabled)
 		spte |= SPTE_TDP_AD_DISABLED_MASK;
 	else if (kvm_vcpu_ad_need_write_protect(vcpu))
 		spte |= SPTE_TDP_AD_WRPROT_ONLY_MASK;
···
 	 * read access.  See FNAME(gpte_access) in paging_tmpl.h.
 	 */
 	spte |= shadow_present_mask;
-	if (!speculative)
+	if (!prefetch)
 		spte |= spte_shadow_accessed_mask(spte);
 
 	if (level > PG_LEVEL_4K && (pte_access & ACC_EXEC_MASK) &&
···
 	 * is responsibility of kvm_mmu_get_page / kvm_mmu_sync_roots.
 	 * Same reasoning can be applied to dirty page accounting.
 	 */
-	if (!can_unsync && is_writable_pte(old_spte))
+	if (is_writable_pte(old_spte))
 		goto out;
 
 	/*
···
 	 * e.g. it's write-tracked (upper-level SPs) or has one or more
 	 * shadow pages and unsync'ing pages is not allowed.
 	 */
-	if (mmu_try_to_unsync_pages(vcpu, gfn, can_unsync)) {
+	if (mmu_try_to_unsync_pages(vcpu, slot, gfn, can_unsync, prefetch)) {
 		pgprintk("%s: found shadow page for %llx, marking ro\n",
 			 __func__, gfn);
-		ret |= SET_SPTE_WRITE_PROTECTED_PT;
+		wrprot = true;
 		pte_access &= ~ACC_WRITE_MASK;
 		spte &= ~(PT_WRITABLE_MASK | shadow_mmu_writable_mask);
 	}
···
 	if (pte_access & ACC_WRITE_MASK)
 		spte |= spte_shadow_dirty_mask(spte);
 
-	if (speculative)
+out:
+	if (prefetch)
 		spte = mark_spte_for_access_track(spte);
 
-out:
 	WARN_ONCE(is_rsvd_spte(&vcpu->arch.mmu->shadow_zero_check, spte, level),
 		  "spte = 0x%llx, level = %d, rsvd bits = 0x%llx", spte, level,
 		  get_rsvd_bits(&vcpu->arch.mmu->shadow_zero_check, spte, level));
 
+	if ((spte & PT_WRITABLE_MASK) && kvm_slot_dirty_track_enabled(slot)) {
+		/* Enforced by kvm_mmu_hugepage_adjust. */
+		WARN_ON(level > PG_LEVEL_4K);
+		mark_page_dirty_in_slot(vcpu->kvm, slot, gfn);
+	}
+
 	*new_spte = spte;
-	return ret;
+	return wrprot;
 }
 
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled)
arch/x86/kvm/mmu/spte.h (+6 -15)
···
 static __always_inline bool is_rsvd_spte(struct rsvd_bits_validate *rsvd_check,
 					 u64 spte, int level)
 {
-	/*
-	 * Use a bitwise-OR instead of a logical-OR to aggregate the reserved
-	 * bits and EPT's invalid memtype/XWR checks to avoid an extra Jcc
-	 * (this is extremely unlikely to be short-circuited as true).
-	 */
-	return __is_bad_mt_xwr(rsvd_check, spte) |
+	return __is_bad_mt_xwr(rsvd_check, spte) ||
 	       __is_rsvd_bits_set(rsvd_check, spte, level);
 }
 
···
 	return gen;
 }
 
-/* Bits which may be returned by set_spte() */
-#define SET_SPTE_WRITE_PROTECTED_PT	BIT(0)
-#define SET_SPTE_NEED_REMOTE_TLB_FLUSH	BIT(1)
-#define SET_SPTE_SPURIOUS		BIT(2)
-
-int make_spte(struct kvm_vcpu *vcpu, unsigned int pte_access, int level,
-	      gfn_t gfn, kvm_pfn_t pfn, u64 old_spte, bool speculative,
-	      bool can_unsync, bool host_writable, bool ad_disabled,
-	      u64 *new_spte);
+bool make_spte(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp,
+	       struct kvm_memory_slot *slot,
+	       unsigned int pte_access, gfn_t gfn, kvm_pfn_t pfn,
+	       u64 old_spte, bool prefetch, bool can_unsync,
+	       bool host_writable, u64 *new_spte);
 u64 make_nonleaf_spte(u64 *child_pt, bool ad_disabled);
 u64 make_mmio_spte(struct kvm_vcpu *vcpu, u64 gfn, unsigned int access);
 u64 mark_spte_for_access_track(u64 spte);
arch/x86/kvm/mmu/tdp_mmu.c (+34 -85)
···
 	role.direct = true;
 	role.gpte_is_8_bytes = true;
 	role.access = ACC_ALL;
+	role.ad_disabled = !shadow_accessed_mask;
 
 	return role;
 }
···
 }
 
 /*
- * tdp_mmu_set_spte_atomic_no_dirty_log - Set a TDP MMU SPTE atomically
- * and handle the associated bookkeeping, but do not mark the page dirty
+ * tdp_mmu_set_spte_atomic - Set a TDP MMU SPTE atomically
+ * and handle the associated bookkeeping.  Do not mark the page dirty
  * in KVM's dirty bitmaps.
  *
  * @kvm: kvm instance
···
  * Returns: true if the SPTE was set, false if it was not. If false is returned,
  *	    this function will have no side-effects.
  */
-static inline bool tdp_mmu_set_spte_atomic_no_dirty_log(struct kvm *kvm,
-							struct tdp_iter *iter,
-							u64 new_spte)
+static inline bool tdp_mmu_set_spte_atomic(struct kvm *kvm,
+					   struct tdp_iter *iter,
+					   u64 new_spte)
 {
 	lockdep_assert_held_read(&kvm->mmu_lock);
 
···
 	return true;
 }
 
-/*
- * tdp_mmu_map_set_spte_atomic - Set a leaf TDP MMU SPTE atomically to resolve a
- * TDP page fault.
- *
- * @vcpu: The vcpu instance that took the TDP page fault.
- * @iter: a tdp_iter instance currently on the SPTE that should be set
- * @new_spte: The value the SPTE should be set to
- *
- * Returns: true if the SPTE was set, false if it was not. If false is returned,
- * this function will have no side-effects.
- */
-static inline bool tdp_mmu_map_set_spte_atomic(struct kvm_vcpu *vcpu,
-					       struct tdp_iter *iter,
-					       u64 new_spte)
-{
-	struct kvm *kvm = vcpu->kvm;
-
-	if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, new_spte))
-		return false;
-
-	/*
-	 * Use kvm_vcpu_gfn_to_memslot() instead of going through
-	 * handle_changed_spte_dirty_log() to leverage vcpu->last_used_slot.
-	 */
-	if (is_writable_pte(new_spte)) {
-		struct kvm_memory_slot *slot = kvm_vcpu_gfn_to_memslot(vcpu, iter->gfn);
-
-		if (slot && kvm_slot_dirty_track_enabled(slot)) {
-			/* Enforced by kvm_mmu_hugepage_adjust. */
-			WARN_ON_ONCE(iter->level > PG_LEVEL_4K);
-			mark_page_dirty_in_slot(kvm, slot, iter->gfn);
-		}
-	}
-
-	return true;
-}
-
 static inline bool tdp_mmu_zap_spte_atomic(struct kvm *kvm,
 					   struct tdp_iter *iter)
 {
···
 	 * immediately installing a present entry in its place
 	 * before the TLBs are flushed.
 	 */
-	if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, iter, REMOVED_SPTE))
+	if (!tdp_mmu_set_spte_atomic(kvm, iter, REMOVED_SPTE))
 		return false;
 
 	kvm_flush_remote_tlbs_with_address(kvm, iter->gfn,
···
 * Installs a last-level SPTE to handle a TDP page fault.
 * (NPT/EPT violation/misconfiguration)
 */
-static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu, int write,
-					   int map_writable,
-					   struct tdp_iter *iter,
-					   kvm_pfn_t pfn, bool prefault)
+static int tdp_mmu_map_handle_target_level(struct kvm_vcpu *vcpu,
+					   struct kvm_page_fault *fault,
+					   struct tdp_iter *iter)
 {
+	struct kvm_mmu_page *sp = sptep_to_sp(iter->sptep);
 	u64 new_spte;
 	int ret = RET_PF_FIXED;
-	int make_spte_ret = 0;
+	bool wrprot = false;
 
-	if (unlikely(is_noslot_pfn(pfn)))
+	WARN_ON(sp->role.level != fault->goal_level);
+	if (unlikely(!fault->slot))
 		new_spte = make_mmio_spte(vcpu, iter->gfn, ACC_ALL);
 	else
-		make_spte_ret = make_spte(vcpu, ACC_ALL, iter->level, iter->gfn,
-					 pfn, iter->old_spte, prefault, true,
-					 map_writable, !shadow_accessed_mask,
-					 &new_spte);
+		wrprot = make_spte(vcpu, sp, fault->slot, ACC_ALL, iter->gfn,
+				   fault->pfn, iter->old_spte, fault->prefetch, true,
+				   fault->map_writable, &new_spte);
 
 	if (new_spte == iter->old_spte)
 		ret = RET_PF_SPURIOUS;
-	else if (!tdp_mmu_map_set_spte_atomic(vcpu, iter, new_spte))
+	else if (!tdp_mmu_set_spte_atomic(vcpu->kvm, iter, new_spte))
 		return RET_PF_RETRY;
 
 	/*
···
 	 * protected, emulation is needed. If the emulation was skipped,
 	 * the vCPU would have the same fault again.
 	 */
-	if (make_spte_ret & SET_SPTE_WRITE_PROTECTED_PT) {
-		if (write)
+	if (wrprot) {
+		if (fault->write)
 			ret = RET_PF_EMULATE;
-		kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu);
 	}
 
 	/* If a MMIO SPTE is installed, the MMIO will need to be emulated. */
···
 * Handle a TDP page fault (NPT/EPT violation/misconfiguration) by installing
 * page tables and SPTEs to translate the faulting guest physical address.
 */
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		    int map_writable, int max_level, kvm_pfn_t pfn,
-		    bool prefault)
+int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
-	bool nx_huge_page_workaround_enabled = is_nx_huge_page_enabled();
-	bool write = error_code & PFERR_WRITE_MASK;
-	bool exec = error_code & PFERR_FETCH_MASK;
-	bool huge_page_disallowed = exec && nx_huge_page_workaround_enabled;
 	struct kvm_mmu *mmu = vcpu->arch.mmu;
 	struct tdp_iter iter;
 	struct kvm_mmu_page *sp;
 	u64 *child_pt;
 	u64 new_spte;
 	int ret;
-	gfn_t gfn = gpa >> PAGE_SHIFT;
-	int level;
-	int req_level;
 
-	level = kvm_mmu_hugepage_adjust(vcpu, gfn, max_level, &pfn,
-					huge_page_disallowed, &req_level);
+	kvm_mmu_hugepage_adjust(vcpu, fault);
 
-	trace_kvm_mmu_spte_requested(gpa, level, pfn);
+	trace_kvm_mmu_spte_requested(fault);
 
 	rcu_read_lock();
 
-	tdp_mmu_for_each_pte(iter, mmu, gfn, gfn + 1) {
-		if (nx_huge_page_workaround_enabled)
-			disallowed_hugepage_adjust(iter.old_spte, gfn,
-						   iter.level, &pfn, &level);
+	tdp_mmu_for_each_pte(iter, mmu, fault->gfn, fault->gfn + 1) {
+		if (fault->nx_huge_page_workaround_enabled)
+			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
-		if (iter.level == level)
+		if (iter.level == fault->goal_level)
 			break;
 
 		/*
···
 		new_spte = make_nonleaf_spte(child_pt,
 					     !shadow_accessed_mask);
 
-		if (tdp_mmu_set_spte_atomic_no_dirty_log(vcpu->kvm, &iter, new_spte)) {
+		if (tdp_mmu_set_spte_atomic(vcpu->kvm, &iter, new_spte)) {
 			tdp_mmu_link_page(vcpu->kvm, sp,
-					  huge_page_disallowed &&
-					  req_level >= iter.level);
+					  fault->huge_page_disallowed &&
+					  fault->req_level >= iter.level);
 
 			trace_kvm_mmu_get_page(sp, true);
 		} else {
···
 		}
 	}
 
-	if (iter.level != level) {
+	if (iter.level != fault->goal_level) {
 		rcu_read_unlock();
 		return RET_PF_RETRY;
 	}
 
-	ret = tdp_mmu_map_handle_target_level(vcpu, write, map_writable, &iter,
-					      pfn, prefault);
+	ret = tdp_mmu_map_handle_target_level(vcpu, fault, &iter);
 	rcu_read_unlock();
 
 	return ret;
···
 
 		new_spte = iter.old_spte & ~PT_WRITABLE_MASK;
 
-		if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, &iter,
-							  new_spte)) {
+		if (!tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) {
 			/*
 			 * The iter must explicitly re-read the SPTE because
 			 * the atomic cmpxchg failed.
···
 			continue;
 		}
 
-		if (!tdp_mmu_set_spte_atomic_no_dirty_log(kvm, &iter,
-							  new_spte)) {
+		if (!tdp_mmu_set_spte_atomic(kvm, &iter, new_spte)) {
 			/*
 			 * The iter must explicitly re-read the SPTE because
 			 * the atomic cmpxchg failed.
arch/x86/kvm/mmu/tdp_mmu.h (+1 -5)
···
 void kvm_tdp_mmu_invalidate_all_roots(struct kvm *kvm);
 void kvm_tdp_mmu_zap_invalidated_roots(struct kvm *kvm);
 
-int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, gpa_t gpa, u32 error_code,
-		    int map_writable, int max_level, kvm_pfn_t pfn,
-		    bool prefault);
+int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault);
 
 bool kvm_tdp_mmu_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range,
 				 bool flush);
···
 #ifdef CONFIG_X86_64
 bool kvm_mmu_init_tdp_mmu(struct kvm *kvm);
 void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm);
-static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return kvm->arch.tdp_mmu_enabled; }
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return sp->tdp_mmu_page; }
 
 static inline bool is_tdp_mmu(struct kvm_mmu *mmu)
···
 #else
 static inline bool kvm_mmu_init_tdp_mmu(struct kvm *kvm) { return false; }
 static inline void kvm_mmu_uninit_tdp_mmu(struct kvm *kvm) {}
-static inline bool is_tdp_mmu_enabled(struct kvm *kvm) { return false; }
 static inline bool is_tdp_mmu_page(struct kvm_mmu_page *sp) { return false; }
 static inline bool is_tdp_mmu(struct kvm_mmu *mmu) { return false; }
 #endif
arch/x86/kvm/svm/nested.c (+42 -10)
···
 	       kvm_vcpu_is_legal_gpa(vcpu, addr + size - 1);
 }
 
+static bool nested_svm_check_tlb_ctl(struct kvm_vcpu *vcpu, u8 tlb_ctl)
+{
+	/* Nested FLUSHBYASID is not supported yet.  */
+	switch(tlb_ctl) {
+	case TLB_CONTROL_DO_NOTHING:
+	case TLB_CONTROL_FLUSH_ALL_ASID:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static bool nested_vmcb_check_controls(struct kvm_vcpu *vcpu,
 				       struct vmcb_control_area *control)
 {
···
 		return false;
 	if (CC(!nested_svm_check_bitmap_pa(vcpu, control->iopm_base_pa,
 					   IOPM_SIZE)))
 		return false;
 
+	if (CC(!nested_svm_check_tlb_ctl(vcpu, control->tlb_ctl)))
+		return false;
+
 	return true;
···
 	if (nested_npt_enabled(svm))
 		nested_svm_init_mmu_context(vcpu);
 
-	svm->vmcb->control.tsc_offset = vcpu->arch.tsc_offset =
-		vcpu->arch.l1_tsc_offset + svm->nested.ctl.tsc_offset;
+	vcpu->arch.tsc_offset = kvm_calc_nested_tsc_offset(
+			vcpu->arch.l1_tsc_offset,
+			svm->nested.ctl.tsc_offset,
+			svm->tsc_ratio_msr);
+
+	svm->vmcb->control.tsc_offset = vcpu->arch.tsc_offset;
+
+	if (svm->tsc_ratio_msr != kvm_default_tsc_scaling_ratio) {
+		WARN_ON(!svm->tsc_scaling_enabled);
+		nested_svm_update_tsc_ratio_msr(vcpu);
+	}
 
 	svm->vmcb->control.int_ctl =
 		(svm->nested.ctl.int_ctl & int_ctl_vmcb12_bits) |
···
 	svm->vmcb->control.int_state = svm->nested.ctl.int_state;
 	svm->vmcb->control.event_inj = svm->nested.ctl.event_inj;
 	svm->vmcb->control.event_inj_err = svm->nested.ctl.event_inj_err;
-
-	svm->vmcb->control.pause_filter_count = svm->nested.ctl.pause_filter_count;
-	svm->vmcb->control.pause_filter_thresh = svm->nested.ctl.pause_filter_thresh;
 
 	nested_svm_transition_tlb_flush(vcpu);
 
···
 	vmcb12->control.event_inj = svm->nested.ctl.event_inj;
 	vmcb12->control.event_inj_err = svm->nested.ctl.event_inj_err;
 
-	vmcb12->control.pause_filter_count =
-		svm->vmcb->control.pause_filter_count;
-	vmcb12->control.pause_filter_thresh =
-		svm->vmcb->control.pause_filter_thresh;
-
 	nested_svm_copy_common_state(svm->nested.vmcb02.ptr, svm->vmcb01.ptr);
 
 	svm_switch_vmcb(svm, &svm->vmcb01);
···
 	if (svm->vmcb->control.tsc_offset != svm->vcpu.arch.tsc_offset) {
 		svm->vmcb->control.tsc_offset = svm->vcpu.arch.tsc_offset;
 		vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS);
 	}
 
+	if (svm->tsc_ratio_msr != kvm_default_tsc_scaling_ratio) {
+		WARN_ON(!svm->tsc_scaling_enabled);
+		vcpu->arch.tsc_scaling_ratio = vcpu->arch.l1_tsc_scaling_ratio;
+		svm_write_tsc_multiplier(vcpu, vcpu->arch.tsc_scaling_ratio);
+	}
+
 	svm->nested.ctl.nested_cr3 = 0;
···
 	}
 
 	return NESTED_EXIT_CONTINUE;
+}
+
+void nested_svm_update_tsc_ratio_msr(struct kvm_vcpu *vcpu)
+{
+	struct vcpu_svm *svm = to_svm(vcpu);
+
+	vcpu->arch.tsc_scaling_ratio =
+		kvm_calc_nested_tsc_multiplier(vcpu->arch.l1_tsc_scaling_ratio,
+					       svm->tsc_ratio_msr);
+	svm_write_tsc_multiplier(vcpu, vcpu->arch.tsc_scaling_ratio);
 }
 
 static int svm_get_nested_state(struct kvm_vcpu *vcpu,
+3 -3
arch/x86/kvm/svm/sev.c
··· 2652 2652 set_msr_interception(vcpu, svm->msrpm, MSR_IA32_LASTINTTOIP, 1, 1); 2653 2653 } 2654 2654 2655 - void sev_es_create_vcpu(struct vcpu_svm *svm) 2655 + void sev_es_vcpu_reset(struct vcpu_svm *svm) 2656 2656 { 2657 2657 /* 2658 - * Set the GHCB MSR value as per the GHCB specification when creating 2659 - * a vCPU for an SEV-ES guest. 2658 + * Set the GHCB MSR value as per the GHCB specification when emulating 2659 + * vCPU RESET for an SEV-ES guest. 2660 2660 */ 2661 2661 set_ghcb_msr(svm, GHCB_MSR_SEV_INFO(GHCB_VERSION_MAX, 2662 2662 GHCB_VERSION_MIN,
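`sev_es_vcpu_reset()` above seeds the GHCB MSR with the SEV information response. A sketch of that packing, with field positions as given by the GHCB specification's MSR protocol (macro and function names here are illustrative, not KVM's):

```c
#include <assert.h>
#include <stdint.h>

#define GHCB_MSR_SEV_INFO_RESP 0x001ULL   /* GHCBInfo response code, bits 11:0 */

/*
 * SEV information response: bits 63:48 carry the maximum supported GHCB
 * protocol version, 47:32 the minimum, 31:24 the guest's C-bit position,
 * and the low 12 bits hold the response code.
 */
uint64_t ghcb_msr_sev_info(uint64_t ver_max, uint64_t ver_min, uint64_t cbit)
{
	return ((ver_max & 0xffff) << 48) | ((ver_min & 0xffff) << 32) |
	       ((cbit & 0xff) << 24) | GHCB_MSR_SEV_INFO_RESP;
}
```

The rename from `sev_es_create_vcpu()` matters because this value must be re-established on every emulated RESET, not only at vCPU creation.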
+115 -53
arch/x86/kvm/svm/svm.c
··· 188 188 static int vgif = true; 189 189 module_param(vgif, int, 0444); 190 190 191 + /* enable/disable LBR virtualization */ 192 + static int lbrv = true; 193 + module_param(lbrv, int, 0444); 194 + 195 + static int tsc_scaling = true; 196 + module_param(tsc_scaling, int, 0444); 197 + 191 198 /* 192 199 * enable / disable AVIC. Because the defaults differ for APICv 193 200 * support between VMX and SVM we cannot use module_param_named. ··· 475 468 static void svm_hardware_disable(void) 476 469 { 477 470 /* Make sure we clean up behind us */ 478 - if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) 471 + if (tsc_scaling) 479 472 wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT); 480 473 481 474 cpu_svm_disable(); ··· 518 511 wrmsrl(MSR_VM_HSAVE_PA, __sme_page_pa(sd->save_area)); 519 512 520 513 if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) { 514 + /* 515 + * Set the default value, even if we don't use TSC scaling 516 + * to avoid having stale value in the msr 517 + */ 521 518 wrmsrl(MSR_AMD64_TSC_RATIO, TSC_RATIO_DEFAULT); 522 519 __this_cpu_write(current_tsc_ratio, TSC_RATIO_DEFAULT); 523 520 } ··· 942 931 if (npt_enabled) 943 932 kvm_cpu_cap_set(X86_FEATURE_NPT); 944 933 934 + if (tsc_scaling) 935 + kvm_cpu_cap_set(X86_FEATURE_TSCRATEMSR); 936 + 945 937 /* Nested VM can receive #VMEXIT instead of triggering #GP */ 946 938 kvm_cpu_cap_set(X86_FEATURE_SVME_ADDR_CHK); 947 939 } ··· 992 978 if (boot_cpu_has(X86_FEATURE_FXSR_OPT)) 993 979 kvm_enable_efer_bits(EFER_FFXSR); 994 980 995 - if (boot_cpu_has(X86_FEATURE_TSCRATEMSR)) { 996 - kvm_has_tsc_control = true; 997 - kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX; 998 - kvm_tsc_scaling_ratio_frac_bits = 32; 981 + if (tsc_scaling) { 982 + if (!boot_cpu_has(X86_FEATURE_TSCRATEMSR)) { 983 + tsc_scaling = false; 984 + } else { 985 + pr_info("TSC scaling supported\n"); 986 + kvm_has_tsc_control = true; 987 + kvm_max_tsc_scaling_ratio = TSC_RATIO_MAX; 988 + kvm_tsc_scaling_ratio_frac_bits = 32; 989 + } 999 990 } 1000 991 1001 992 
tsc_aux_uret_slot = kvm_add_user_return_msr(MSR_TSC_AUX); ··· 1080 1061 pr_info("Virtual GIF supported\n"); 1081 1062 } 1082 1063 1064 + if (lbrv) { 1065 + if (!boot_cpu_has(X86_FEATURE_LBRV)) 1066 + lbrv = false; 1067 + else 1068 + pr_info("LBR virtualization supported\n"); 1069 + } 1070 + 1083 1071 svm_set_cpu_caps(); 1084 1072 1085 1073 /* ··· 1137 1111 1138 1112 static u64 svm_get_l2_tsc_multiplier(struct kvm_vcpu *vcpu) 1139 1113 { 1140 - return kvm_default_tsc_scaling_ratio; 1114 + struct vcpu_svm *svm = to_svm(vcpu); 1115 + 1116 + return svm->tsc_ratio_msr; 1141 1117 } 1142 1118 1143 1119 static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) ··· 1151 1123 vmcb_mark_dirty(svm->vmcb, VMCB_INTERCEPTS); 1152 1124 } 1153 1125 1154 - static void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier) 1126 + void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier) 1155 1127 { 1156 1128 wrmsrl(MSR_AMD64_TSC_RATIO, multiplier); 1157 1129 } ··· 1177 1149 svm_clr_intercept(svm, INTERCEPT_RDTSCP); 1178 1150 else 1179 1151 svm_set_intercept(svm, INTERCEPT_RDTSCP); 1152 + } 1153 + } 1154 + 1155 + static inline void init_vmcb_after_set_cpuid(struct kvm_vcpu *vcpu) 1156 + { 1157 + struct vcpu_svm *svm = to_svm(vcpu); 1158 + 1159 + if (guest_cpuid_is_intel(vcpu)) { 1160 + /* 1161 + * We must intercept SYSENTER_EIP and SYSENTER_ESP 1162 + * accesses because the processor only stores 32 bits. 1163 + * For the same reason we cannot use virtual VMLOAD/VMSAVE. 1164 + */ 1165 + svm_set_intercept(svm, INTERCEPT_VMLOAD); 1166 + svm_set_intercept(svm, INTERCEPT_VMSAVE); 1167 + svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK; 1168 + 1169 + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0); 1170 + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0); 1171 + } else { 1172 + /* 1173 + * If hardware supports Virtual VMLOAD VMSAVE then enable it 1174 + * in VMCB and clear intercepts to avoid #VMEXIT. 
1175 + */ 1176 + if (vls) { 1177 + svm_clr_intercept(svm, INTERCEPT_VMLOAD); 1178 + svm_clr_intercept(svm, INTERCEPT_VMSAVE); 1179 + svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK; 1180 + } 1181 + /* No need to intercept these MSRs */ 1182 + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1); 1183 + set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1); 1180 1184 } 1181 1185 } 1182 1186 ··· 1358 1298 } 1359 1299 1360 1300 svm_hv_init_vmcb(svm->vmcb); 1301 + init_vmcb_after_set_cpuid(vcpu); 1361 1302 1362 1303 vmcb_mark_all_dirty(svm->vmcb); 1363 1304 1364 1305 enable_gif(svm); 1306 + } 1365 1307 1308 + static void __svm_vcpu_reset(struct kvm_vcpu *vcpu) 1309 + { 1310 + struct vcpu_svm *svm = to_svm(vcpu); 1311 + 1312 + svm_vcpu_init_msrpm(vcpu, svm->msrpm); 1313 + 1314 + svm_init_osvw(vcpu); 1315 + vcpu->arch.microcode_version = 0x01000065; 1316 + svm->tsc_ratio_msr = kvm_default_tsc_scaling_ratio; 1317 + 1318 + if (sev_es_guest(vcpu->kvm)) 1319 + sev_es_vcpu_reset(svm); 1366 1320 } 1367 1321 1368 1322 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) ··· 1387 1313 svm->virt_spec_ctrl = 0; 1388 1314 1389 1315 init_vmcb(vcpu); 1316 + 1317 + if (!init_event) 1318 + __svm_vcpu_reset(vcpu); 1390 1319 } 1391 1320 1392 1321 void svm_switch_vmcb(struct vcpu_svm *svm, struct kvm_vmcb_info *target_vmcb) ··· 1449 1372 1450 1373 svm->vmcb01.ptr = page_address(vmcb01_page); 1451 1374 svm->vmcb01.pa = __sme_set(page_to_pfn(vmcb01_page) << PAGE_SHIFT); 1375 + svm_switch_vmcb(svm, &svm->vmcb01); 1452 1376 1453 1377 if (vmsa_page) 1454 1378 svm->vmsa = page_address(vmsa_page); 1455 1379 1456 1380 svm->guest_state_loaded = false; 1457 - 1458 - svm_switch_vmcb(svm, &svm->vmcb01); 1459 - init_vmcb(vcpu); 1460 - 1461 - svm_vcpu_init_msrpm(vcpu, svm->msrpm); 1462 - 1463 - svm_init_osvw(vcpu); 1464 - vcpu->arch.microcode_version = 0x01000065; 1465 - 1466 - if (sev_es_guest(vcpu->kvm)) 1467 - /* Perform SEV-ES specific 
VMCB creation updates */ 1468 - sev_es_create_vcpu(svm); 1469 1381 1470 1382 return 0; 1471 1383 ··· 1515 1449 vmsave(__sme_page_pa(sd->save_area)); 1516 1450 } 1517 1451 1518 - if (static_cpu_has(X86_FEATURE_TSCRATEMSR)) { 1452 + if (tsc_scaling) { 1519 1453 u64 tsc_ratio = vcpu->arch.tsc_scaling_ratio; 1520 1454 if (tsc_ratio != __this_cpu_read(current_tsc_ratio)) { 1521 1455 __this_cpu_write(current_tsc_ratio, tsc_ratio); ··· 2725 2659 struct vcpu_svm *svm = to_svm(vcpu); 2726 2660 2727 2661 switch (msr_info->index) { 2662 + case MSR_AMD64_TSC_RATIO: 2663 + if (!msr_info->host_initiated && !svm->tsc_scaling_enabled) 2664 + return 1; 2665 + msr_info->data = svm->tsc_ratio_msr; 2666 + break; 2728 2667 case MSR_STAR: 2729 2668 msr_info->data = svm->vmcb01.ptr->save.star; 2730 2669 break; ··· 2879 2808 u32 ecx = msr->index; 2880 2809 u64 data = msr->data; 2881 2810 switch (ecx) { 2811 + case MSR_AMD64_TSC_RATIO: 2812 + if (!msr->host_initiated && !svm->tsc_scaling_enabled) 2813 + return 1; 2814 + 2815 + if (data & TSC_RATIO_RSVD) 2816 + return 1; 2817 + 2818 + svm->tsc_ratio_msr = data; 2819 + 2820 + if (svm->tsc_scaling_enabled && is_guest_mode(vcpu)) 2821 + nested_svm_update_tsc_ratio_msr(vcpu); 2822 + 2823 + break; 2882 2824 case MSR_IA32_CR_PAT: 2883 2825 if (!kvm_mtrr_valid(vcpu, MSR_IA32_CR_PAT, data)) 2884 2826 return 1; ··· 3004 2920 svm->tsc_aux = data; 3005 2921 break; 3006 2922 case MSR_IA32_DEBUGCTLMSR: 3007 - if (!boot_cpu_has(X86_FEATURE_LBRV)) { 2923 + if (!lbrv) { 3008 2924 vcpu_unimpl(vcpu, "%s: MSR_IA32_DEBUGCTL 0x%llx, nop\n", 3009 2925 __func__, data); 3010 2926 break; ··· 3364 3280 return svm_exit_handlers[exit_code](vcpu); 3365 3281 } 3366 3282 3367 - static void svm_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2, 3283 + static void svm_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, 3284 + u64 *info1, u64 *info2, 3368 3285 u32 *intr_info, u32 *error_code) 3369 3286 { 3370 3287 struct vmcb_control_area *control = 
&to_svm(vcpu)->vmcb->control; 3371 3288 3289 + *reason = control->exit_code; 3372 3290 *info1 = control->exit_info_1; 3373 3291 *info2 = control->exit_info_2; 3374 3292 *intr_info = control->exit_int_info; ··· 3387 3301 struct kvm_run *kvm_run = vcpu->run; 3388 3302 u32 exit_code = svm->vmcb->control.exit_code; 3389 3303 3390 - trace_kvm_exit(exit_code, vcpu, KVM_ISA_SVM); 3304 + trace_kvm_exit(vcpu, KVM_ISA_SVM); 3391 3305 3392 3306 /* SEV-ES guests must use the CR write traps to track CR registers. */ 3393 3307 if (!sev_es_guest(vcpu->kvm)) { ··· 3400 3314 if (is_guest_mode(vcpu)) { 3401 3315 int vmexit; 3402 3316 3403 - trace_kvm_nested_vmexit(exit_code, vcpu, KVM_ISA_SVM); 3317 + trace_kvm_nested_vmexit(vcpu, KVM_ISA_SVM); 3404 3318 3405 3319 vmexit = nested_svm_exit_special(svm); 3406 3320 ··· 3868 3782 3869 3783 pre_svm_run(vcpu); 3870 3784 3871 - WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu)); 3872 - 3873 3785 sync_lapic_to_cr8(vcpu); 3874 3786 3875 3787 if (unlikely(svm->asid != svm->vmcb->control.asid)) { ··· 4087 4003 svm->nrips_enabled = kvm_cpu_cap_has(X86_FEATURE_NRIPS) && 4088 4004 guest_cpuid_has(vcpu, X86_FEATURE_NRIPS); 4089 4005 4006 + svm->tsc_scaling_enabled = tsc_scaling && guest_cpuid_has(vcpu, X86_FEATURE_TSCRATEMSR); 4007 + 4090 4008 svm_recalc_instruction_intercepts(vcpu, svm); 4091 4009 4092 4010 /* For sev guests, the memory encryption bit is not reserved in CR3. */ ··· 4115 4029 kvm_request_apicv_update(vcpu->kvm, false, 4116 4030 APICV_INHIBIT_REASON_NESTED); 4117 4031 } 4118 - 4119 - if (guest_cpuid_is_intel(vcpu)) { 4120 - /* 4121 - * We must intercept SYSENTER_EIP and SYSENTER_ESP 4122 - * accesses because the processor only stores 32 bits. 4123 - * For the same reason we cannot use virtual VMLOAD/VMSAVE. 
4124 - */ 4125 - svm_set_intercept(svm, INTERCEPT_VMLOAD); 4126 - svm_set_intercept(svm, INTERCEPT_VMSAVE); 4127 - svm->vmcb->control.virt_ext &= ~VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK; 4128 - 4129 - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 0, 0); 4130 - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 0, 0); 4131 - } else { 4132 - /* 4133 - * If hardware supports Virtual VMLOAD VMSAVE then enable it 4134 - * in VMCB and clear intercepts to avoid #VMEXIT. 4135 - */ 4136 - if (vls) { 4137 - svm_clr_intercept(svm, INTERCEPT_VMLOAD); 4138 - svm_clr_intercept(svm, INTERCEPT_VMSAVE); 4139 - svm->vmcb->control.virt_ext |= VIRTUAL_VMLOAD_VMSAVE_ENABLE_MASK; 4140 - } 4141 - /* No need to intercept these MSRs */ 4142 - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_EIP, 1, 1); 4143 - set_msr_interception(vcpu, svm->msrpm, MSR_IA32_SYSENTER_ESP, 1, 1); 4144 - } 4032 + init_vmcb_after_set_cpuid(vcpu); 4145 4033 } 4146 4034 4147 4035 static bool svm_has_wbinvd_exit(void) ··· 4582 4522 } 4583 4523 4584 4524 static struct kvm_x86_ops svm_x86_ops __initdata = { 4525 + .name = "kvm_amd", 4526 + 4585 4527 .hardware_unsetup = svm_hardware_teardown, 4586 4528 .hardware_enable = svm_hardware_enable, 4587 4529 .hardware_disable = svm_hardware_disable,
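The new `MSR_AMD64_TSC_RATIO` read/write cases above gate guest access on the per-vCPU `tsc_scaling_enabled` flag and reject writes with reserved bits set (the 8.32 ratio occupies bits 39:0, so bits 63:40 are reserved). The acceptance logic, factored out as a sketch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ratio lives in bits 39:0 of the MSR; bits 63:40 are reserved. */
#define TSC_RATIO_RSVD 0xffffff0000000000ULL

/*
 * Mirror of the write-side checks: a guest write is rejected outright if
 * the vCPU wasn't given TSCRATEMSR; host-initiated writes (state restore)
 * bypass that gate, but reserved bits always fail.
 */
bool tsc_ratio_write_ok(bool host_initiated, bool tsc_scaling_enabled,
			uint64_t data)
{
	if (!host_initiated && !tsc_scaling_enabled)
		return false;
	return (data & TSC_RATIO_RSVD) == 0;
}
```

This is the same pattern the `tsc_scaling` and `lbrv` module parameters follow elsewhere in the file: a feature is advertised only when both the module option and the CPU capability agree.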
+7 -2
arch/x86/kvm/svm/svm.h
··· 140 140 u64 next_rip; 141 141 142 142 u64 spec_ctrl; 143 + 144 + u64 tsc_ratio_msr; 143 145 /* 144 146 * Contains guest-controlled bits of VIRT_SPEC_CTRL, which will be 145 147 * translated into the appropriate L2_CFG bits on the host to ··· 162 160 unsigned long int3_rip; 163 161 164 162 /* cached guest cpuid flags for faster access */ 165 - bool nrips_enabled : 1; 163 + bool nrips_enabled : 1; 164 + bool tsc_scaling_enabled : 1; 166 165 167 166 u32 ldr_reg; 168 167 u32 dfr_reg; ··· 486 483 int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, 487 484 bool has_error_code, u32 error_code); 488 485 int nested_svm_exit_special(struct vcpu_svm *svm); 486 + void nested_svm_update_tsc_ratio_msr(struct kvm_vcpu *vcpu); 487 + void svm_write_tsc_multiplier(struct kvm_vcpu *vcpu, u64 multiplier); 489 488 void nested_load_control_from_vmcb12(struct vcpu_svm *svm, 490 489 struct vmcb_control_area *control); 491 490 void nested_sync_control_from_vmcb02(struct vcpu_svm *svm); ··· 567 562 int sev_handle_vmgexit(struct kvm_vcpu *vcpu); 568 563 int sev_es_string_io(struct vcpu_svm *svm, int size, unsigned int port, int in); 569 564 void sev_es_init_vmcb(struct vcpu_svm *svm); 570 - void sev_es_create_vcpu(struct vcpu_svm *svm); 565 + void sev_es_vcpu_reset(struct vcpu_svm *svm); 571 566 void sev_vcpu_deliver_sipi_vector(struct kvm_vcpu *vcpu, u8 vector); 572 567 void sev_es_prepare_guest_switch(struct vcpu_svm *svm, unsigned int cpu); 573 568 void sev_es_unmap_ghcb(struct vcpu_svm *svm);
+5 -4
arch/x86/kvm/trace.h
··· 288 288 289 289 #define TRACE_EVENT_KVM_EXIT(name) \ 290 290 TRACE_EVENT(name, \ 291 - TP_PROTO(unsigned int exit_reason, struct kvm_vcpu *vcpu, u32 isa), \ 292 - TP_ARGS(exit_reason, vcpu, isa), \ 291 + TP_PROTO(struct kvm_vcpu *vcpu, u32 isa), \ 292 + TP_ARGS(vcpu, isa), \ 293 293 \ 294 294 TP_STRUCT__entry( \ 295 295 __field( unsigned int, exit_reason ) \ ··· 303 303 ), \ 304 304 \ 305 305 TP_fast_assign( \ 306 - __entry->exit_reason = exit_reason; \ 307 306 __entry->guest_rip = kvm_rip_read(vcpu); \ 308 307 __entry->isa = isa; \ 309 308 __entry->vcpu_id = vcpu->vcpu_id; \ 310 - static_call(kvm_x86_get_exit_info)(vcpu, &__entry->info1, \ 309 + static_call(kvm_x86_get_exit_info)(vcpu, \ 310 + &__entry->exit_reason, \ 311 + &__entry->info1, \ 311 312 &__entry->info2, \ 312 313 &__entry->intr_info, \ 313 314 &__entry->error_code); \
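The `trace.h` change above drops `exit_reason` from the tracepoint prototype and lets the `kvm_x86_get_exit_info` callback fill it alongside the other fields. A minimal model of that out-parameter callback shape (struct and names invented for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the per-vCPU exit state a vendor module keeps. */
struct exit_state {
	uint32_t exit_code;
	uint64_t exit_info_1;
	uint64_t exit_info_2;
};

/*
 * Shape of the reworked callback: the caller passes out-pointers for
 * everything, including the reason the tracepoint used to receive as a
 * plain argument.
 */
void get_exit_info(const struct exit_state *s, uint32_t *reason,
		   uint64_t *info1, uint64_t *info2)
{
	*reason = s->exit_code;
	*info1 = s->exit_info_1;
	*info2 = s->exit_info_2;
}

/* Helper for the checks below: returns just the reported reason. */
uint32_t exit_reason_of(const struct exit_state *s)
{
	uint32_t reason;
	uint64_t info1, info2;

	get_exit_info(s, &reason, &info1, &info2);
	return reason;
}
```

Fetching the reason through the same callback keeps the trace event and the exit-info plumbing from drifting apart, which is why both SVM and VMX grow a `u32 *reason` out-parameter in this series.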
+32 -31
arch/x86/kvm/vmx/nested.c
··· 191 191 * failValid writes the error number to the current VMCS, which 192 192 * can't be done if there isn't a current VMCS. 193 193 */ 194 - if (vmx->nested.current_vmptr == -1ull && 194 + if (vmx->nested.current_vmptr == INVALID_GPA && 195 195 !evmptr_is_valid(vmx->nested.hv_evmcs_vmptr)) 196 196 return nested_vmx_failInvalid(vcpu); 197 197 ··· 218 218 static void vmx_disable_shadow_vmcs(struct vcpu_vmx *vmx) 219 219 { 220 220 secondary_exec_controls_clearbit(vmx, SECONDARY_EXEC_SHADOW_VMCS); 221 - vmcs_write64(VMCS_LINK_POINTER, -1ull); 221 + vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA); 222 222 vmx->nested.need_vmcs12_to_shadow_sync = false; 223 223 } 224 224 ··· 290 290 291 291 vmx->nested.vmxon = false; 292 292 vmx->nested.smm.vmxon = false; 293 + vmx->nested.vmxon_ptr = INVALID_GPA; 293 294 free_vpid(vmx->nested.vpid02); 294 295 vmx->nested.posted_intr_nv = -1; 295 - vmx->nested.current_vmptr = -1ull; 296 + vmx->nested.current_vmptr = INVALID_GPA; 296 297 if (enable_shadow_vmcs) { 297 298 vmx_disable_shadow_vmcs(vmx); 298 299 vmcs_clear(vmx->vmcs01.shadow_vmcs); ··· 710 709 struct vmcs12 *shadow; 711 710 712 711 if (!nested_cpu_has_shadow_vmcs(vmcs12) || 713 - vmcs12->vmcs_link_pointer == -1ull) 712 + vmcs12->vmcs_link_pointer == INVALID_GPA) 714 713 return; 715 714 716 715 shadow = get_shadow_vmcs12(vcpu); ··· 728 727 struct vcpu_vmx *vmx = to_vmx(vcpu); 729 728 730 729 if (!nested_cpu_has_shadow_vmcs(vmcs12) || 731 - vmcs12->vmcs_link_pointer == -1ull) 730 + vmcs12->vmcs_link_pointer == INVALID_GPA) 732 731 return; 733 732 734 733 kvm_write_guest(vmx->vcpu.kvm, vmcs12->vmcs_link_pointer, ··· 1995 1994 } 1996 1995 1997 1996 if (unlikely(evmcs_gpa != vmx->nested.hv_evmcs_vmptr)) { 1998 - vmx->nested.current_vmptr = -1ull; 1997 + vmx->nested.current_vmptr = INVALID_GPA; 1999 1998 2000 1999 nested_release_evmcs(vcpu); 2001 2000 ··· 2179 2178 } 2180 2179 2181 2180 if (cpu_has_vmx_encls_vmexit()) 2182 - vmcs_write64(ENCLS_EXITING_BITMAP, -1ull); 2181 + 
vmcs_write64(ENCLS_EXITING_BITMAP, INVALID_GPA); 2183 2182 2184 2183 /* 2185 2184 * Set the MSR load/store lists to match L0's settings. Only the ··· 2198 2197 { 2199 2198 prepare_vmcs02_constant_state(vmx); 2200 2199 2201 - vmcs_write64(VMCS_LINK_POINTER, -1ull); 2200 + vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA); 2202 2201 2203 2202 if (enable_vpid) { 2204 2203 if (nested_cpu_has_vpid(vmcs12) && vmx->nested.vpid02) ··· 2950 2949 struct vmcs12 *shadow; 2951 2950 struct kvm_host_map map; 2952 2951 2953 - if (vmcs12->vmcs_link_pointer == -1ull) 2952 + if (vmcs12->vmcs_link_pointer == INVALID_GPA) 2954 2953 return 0; 2955 2954 2956 2955 if (CC(!page_address_valid(vcpu, vmcs12->vmcs_link_pointer))) ··· 3217 3216 * Write an illegal value to VIRTUAL_APIC_PAGE_ADDR to 3218 3217 * force VM-Entry to fail. 3219 3218 */ 3220 - vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, -1ull); 3219 + vmcs_write64(VIRTUAL_APIC_PAGE_ADDR, INVALID_GPA); 3221 3220 } 3222 3221 } 3223 3222 ··· 3528 3527 } 3529 3528 3530 3529 if (CC(!evmptr_is_valid(vmx->nested.hv_evmcs_vmptr) && 3531 - vmx->nested.current_vmptr == -1ull)) 3530 + vmx->nested.current_vmptr == INVALID_GPA)) 3532 3531 return nested_vmx_failInvalid(vcpu); 3533 3532 3534 3533 vmcs12 = get_vmcs12(vcpu); ··· 4976 4975 { 4977 4976 struct vcpu_vmx *vmx = to_vmx(vcpu); 4978 4977 4979 - if (vmx->nested.current_vmptr == -1ull) 4978 + if (vmx->nested.current_vmptr == INVALID_GPA) 4980 4979 return; 4981 4980 4982 4981 copy_vmcs02_to_vmcs12_rare(vcpu, get_vmcs12(vcpu)); ··· 4996 4995 4997 4996 kvm_mmu_free_roots(vcpu, &vcpu->arch.guest_mmu, KVM_MMU_ROOTS_ALL); 4998 4997 4999 - vmx->nested.current_vmptr = -1ull; 4998 + vmx->nested.current_vmptr = INVALID_GPA; 5000 4999 } 5001 5000 5002 5001 /* Emulate the VMXOFF instruction */ ··· 5091 5090 return 1; 5092 5091 5093 5092 /* 5094 - * In VMX non-root operation, when the VMCS-link pointer is -1ull, 5093 + * In VMX non-root operation, when the VMCS-link pointer is INVALID_GPA, 5095 5094 * any VMREAD sets 
the ALU flags for VMfailInvalid. 5096 5095 */ 5097 - if (vmx->nested.current_vmptr == -1ull || 5096 + if (vmx->nested.current_vmptr == INVALID_GPA || 5098 5097 (is_guest_mode(vcpu) && 5099 - get_vmcs12(vcpu)->vmcs_link_pointer == -1ull)) 5098 + get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA)) 5100 5099 return nested_vmx_failInvalid(vcpu); 5101 5100 5102 5101 /* Decode instruction info and find the field to read */ ··· 5183 5182 return 1; 5184 5183 5185 5184 /* 5186 - * In VMX non-root operation, when the VMCS-link pointer is -1ull, 5185 + * In VMX non-root operation, when the VMCS-link pointer is INVALID_GPA, 5187 5186 * any VMWRITE sets the ALU flags for VMfailInvalid. 5188 5187 */ 5189 - if (vmx->nested.current_vmptr == -1ull || 5188 + if (vmx->nested.current_vmptr == INVALID_GPA || 5190 5189 (is_guest_mode(vcpu) && 5191 - get_vmcs12(vcpu)->vmcs_link_pointer == -1ull)) 5190 + get_vmcs12(vcpu)->vmcs_link_pointer == INVALID_GPA)) 5192 5191 return nested_vmx_failInvalid(vcpu); 5193 5192 5194 5193 if (instr_info & BIT(10)) ··· 5631 5630 gpa_t bitmap, last_bitmap; 5632 5631 u8 b; 5633 5632 5634 - last_bitmap = (gpa_t)-1; 5633 + last_bitmap = INVALID_GPA; 5635 5634 b = -1; 5636 5635 5637 5636 while (size > 0) { ··· 6066 6065 goto reflect_vmexit; 6067 6066 } 6068 6067 6069 - trace_kvm_nested_vmexit(exit_reason.full, vcpu, KVM_ISA_VMX); 6068 + trace_kvm_nested_vmexit(vcpu, KVM_ISA_VMX); 6070 6069 6071 6070 /* If L0 (KVM) wants the exit, it trumps L1's desires. 
*/ 6072 6071 if (nested_vmx_l0_wants_exit(vcpu, exit_reason)) ··· 6107 6106 .format = KVM_STATE_NESTED_FORMAT_VMX, 6108 6107 .size = sizeof(kvm_state), 6109 6108 .hdr.vmx.flags = 0, 6110 - .hdr.vmx.vmxon_pa = -1ull, 6111 - .hdr.vmx.vmcs12_pa = -1ull, 6109 + .hdr.vmx.vmxon_pa = INVALID_GPA, 6110 + .hdr.vmx.vmcs12_pa = INVALID_GPA, 6112 6111 .hdr.vmx.preemption_timer_deadline = 0, 6113 6112 }; 6114 6113 struct kvm_vmx_nested_state_data __user *user_vmx_nested_state = ··· 6134 6133 6135 6134 if (is_guest_mode(vcpu) && 6136 6135 nested_cpu_has_shadow_vmcs(vmcs12) && 6137 - vmcs12->vmcs_link_pointer != -1ull) 6136 + vmcs12->vmcs_link_pointer != INVALID_GPA) 6138 6137 kvm_state.size += sizeof(user_vmx_nested_state->shadow_vmcs12); 6139 6138 } 6140 6139 ··· 6210 6209 return -EFAULT; 6211 6210 6212 6211 if (nested_cpu_has_shadow_vmcs(vmcs12) && 6213 - vmcs12->vmcs_link_pointer != -1ull) { 6212 + vmcs12->vmcs_link_pointer != INVALID_GPA) { 6214 6213 if (copy_to_user(user_vmx_nested_state->shadow_vmcs12, 6215 6214 get_shadow_vmcs12(vcpu), VMCS12_SIZE)) 6216 6215 return -EFAULT; ··· 6245 6244 if (kvm_state->format != KVM_STATE_NESTED_FORMAT_VMX) 6246 6245 return -EINVAL; 6247 6246 6248 - if (kvm_state->hdr.vmx.vmxon_pa == -1ull) { 6247 + if (kvm_state->hdr.vmx.vmxon_pa == INVALID_GPA) { 6249 6248 if (kvm_state->hdr.vmx.smm.flags) 6250 6249 return -EINVAL; 6251 6250 6252 - if (kvm_state->hdr.vmx.vmcs12_pa != -1ull) 6251 + if (kvm_state->hdr.vmx.vmcs12_pa != INVALID_GPA) 6253 6252 return -EINVAL; 6254 6253 6255 6254 /* ··· 6303 6302 6304 6303 vmx_leave_nested(vcpu); 6305 6304 6306 - if (kvm_state->hdr.vmx.vmxon_pa == -1ull) 6305 + if (kvm_state->hdr.vmx.vmxon_pa == INVALID_GPA) 6307 6306 return 0; 6308 6307 6309 6308 vmx->nested.vmxon_ptr = kvm_state->hdr.vmx.vmxon_pa; ··· 6316 6315 /* See vmx_has_valid_vmcs12. 
*/ 6317 6316 if ((kvm_state->flags & KVM_STATE_NESTED_GUEST_MODE) || 6318 6317 (kvm_state->flags & KVM_STATE_NESTED_EVMCS) || 6319 - (kvm_state->hdr.vmx.vmcs12_pa != -1ull)) 6318 + (kvm_state->hdr.vmx.vmcs12_pa != INVALID_GPA)) 6320 6319 return -EINVAL; 6321 6320 else 6322 6321 return 0; 6323 6322 } 6324 6323 6325 - if (kvm_state->hdr.vmx.vmcs12_pa != -1ull) { 6324 + if (kvm_state->hdr.vmx.vmcs12_pa != INVALID_GPA) { 6326 6325 if (kvm_state->hdr.vmx.vmcs12_pa == kvm_state->hdr.vmx.vmxon_pa || 6327 6326 !page_address_valid(vcpu, kvm_state->hdr.vmx.vmcs12_pa)) 6328 6327 return -EINVAL; ··· 6367 6366 6368 6367 ret = -EINVAL; 6369 6368 if (nested_cpu_has_shadow_vmcs(vmcs12) && 6370 - vmcs12->vmcs_link_pointer != -1ull) { 6369 + vmcs12->vmcs_link_pointer != INVALID_GPA) { 6371 6370 struct vmcs12 *shadow_vmcs12 = get_shadow_vmcs12(vcpu); 6372 6371 6373 6372 if (kvm_state->size <
+2 -4
arch/x86/kvm/vmx/pmu_intel.c
··· 365 365 msr_info->data = pmu->global_ctrl; 366 366 return 0; 367 367 case MSR_CORE_PERF_GLOBAL_OVF_CTRL: 368 - msr_info->data = pmu->global_ovf_ctrl; 368 + msr_info->data = 0; 369 369 return 0; 370 370 default: 371 371 if ((pmc = get_gp_pmc(pmu, msr, MSR_IA32_PERFCTR0)) || ··· 423 423 if (!(data & pmu->global_ovf_ctrl_mask)) { 424 424 if (!msr_info->host_initiated) 425 425 pmu->global_status &= ~data; 426 - pmu->global_ovf_ctrl = data; 427 426 return 0; 428 427 } 429 428 break; ··· 587 588 pmc->counter = 0; 588 589 } 589 590 590 - pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 591 - pmu->global_ovf_ctrl = 0; 591 + pmu->fixed_ctr_ctrl = pmu->global_ctrl = pmu->global_status = 0; 592 592 593 593 intel_pmu_release_guest_lbr_event(vcpu); 594 594 }
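The `pmu_intel.c` hunks stop caching `MSR_CORE_PERF_GLOBAL_OVF_CTRL`: reads now return 0, and a valid write only acknowledges (clears) `GLOBAL_STATUS` bits. A sketch of the resulting write path (struct shape is illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct pmu_state {
	uint64_t global_status;
	uint64_t global_ovf_ctrl_mask;   /* bits reserved for this vCPU model */
};

/*
 * GLOBAL_OVF_CTRL after the patch: no cached value. A write with any
 * reserved bit set is rejected; otherwise a guest write clears the
 * matching GLOBAL_STATUS bits, while a host-initiated restore does not.
 */
bool write_global_ovf_ctrl(struct pmu_state *pmu, bool host_initiated,
			   uint64_t data)
{
	if (data & pmu->global_ovf_ctrl_mask)
		return false;
	if (!host_initiated)
		pmu->global_status &= ~data;
	return true;
}

/* Helper for the checks below: GLOBAL_STATUS left behind after a write. */
uint64_t status_after_write(uint64_t status, uint64_t mask,
			    bool host_initiated, uint64_t data)
{
	struct pmu_state pmu = { status, mask };

	write_global_ovf_ctrl(&pmu, host_initiated, data);
	return pmu.global_status;
}
```

Dropping the cached field is safe because the MSR is write-to-clear by design; there was never meaningful state to read back.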
+5 -11
arch/x86/kvm/vmx/sgx.c
··· 53 53 static void sgx_handle_emulation_failure(struct kvm_vcpu *vcpu, u64 addr, 54 54 unsigned int size) 55 55 { 56 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 57 - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 58 - vcpu->run->internal.ndata = 2; 59 - vcpu->run->internal.data[0] = addr; 60 - vcpu->run->internal.data[1] = size; 56 + uint64_t data[2] = { addr, size }; 57 + 58 + __kvm_prepare_emulation_failure_exit(vcpu, data, ARRAY_SIZE(data)); 61 59 } 62 60 63 61 static int sgx_read_hva(struct kvm_vcpu *vcpu, unsigned long hva, void *data, ··· 110 112 * but the error code isn't (yet) plumbed through the ENCLS helpers. 111 113 */ 112 114 if (trapnr == PF_VECTOR && !boot_cpu_has(X86_FEATURE_SGX2)) { 113 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 114 - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 115 - vcpu->run->internal.ndata = 0; 115 + kvm_prepare_emulation_failure_exit(vcpu); 116 116 return 0; 117 117 } 118 118 ··· 151 155 sgx_12_0 = kvm_find_cpuid_entry(vcpu, 0x12, 0); 152 156 sgx_12_1 = kvm_find_cpuid_entry(vcpu, 0x12, 1); 153 157 if (!sgx_12_0 || !sgx_12_1) { 154 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 155 - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 156 - vcpu->run->internal.ndata = 0; 158 + kvm_prepare_emulation_failure_exit(vcpu); 157 159 return 0; 158 160 } 159 161
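The `sgx.c` hunks collapse the hand-rolled `KVM_EXIT_INTERNAL_ERROR` setup into the new `kvm_prepare_emulation_failure_exit()` helpers, one variant taking an optional payload array. A rough model of what the common helper fills in (struct layout simplified; `KVM_INTERNAL_ERROR_EMULATION` is 1 in the uAPI):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define KVM_INTERNAL_ERROR_EMULATION 1

struct internal_exit {
	uint32_t suberror;
	uint32_t ndata;
	uint64_t data[16];
};

/*
 * Model of the common helper: every emulation-failure exit gets the same
 * suberror, plus an optional array of payload words for userspace.
 */
void prepare_emulation_failure_exit(struct internal_exit *exit,
				    const uint64_t *data, uint32_t ndata)
{
	exit->suberror = KVM_INTERNAL_ERROR_EMULATION;
	exit->ndata = ndata;
	if (ndata)
		memcpy(exit->data, data, ndata * sizeof(*data));
}

/* Helper for the check below: fill an exit and pack fields into one word. */
uint64_t demo_pack(uint64_t addr, uint64_t size)
{
	struct internal_exit exit;
	uint64_t data[2] = { addr, size };

	prepare_emulation_failure_exit(&exit, data, 2);
	return ((uint64_t)exit.suberror << 32) | ((uint64_t)exit.ndata << 16) |
	       exit.data[1];
}
```

Centralizing the fill-in is what lets the series convey richer exit reasons to userspace from one place instead of three near-identical open-coded blocks.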
+71 -65
arch/x86/kvm/vmx/vmx.c
··· 1059 1059 rdmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); 1060 1060 if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { 1061 1061 wrmsrl(MSR_IA32_RTIT_CTL, 0); 1062 - pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range); 1063 - pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range); 1062 + pt_save_msr(&vmx->pt_desc.host, vmx->pt_desc.num_address_ranges); 1063 + pt_load_msr(&vmx->pt_desc.guest, vmx->pt_desc.num_address_ranges); 1064 1064 } 1065 1065 } 1066 1066 ··· 1070 1070 return; 1071 1071 1072 1072 if (vmx->pt_desc.guest.ctl & RTIT_CTL_TRACEEN) { 1073 - pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.addr_range); 1074 - pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.addr_range); 1073 + pt_save_msr(&vmx->pt_desc.guest, vmx->pt_desc.num_address_ranges); 1074 + pt_load_msr(&vmx->pt_desc.host, vmx->pt_desc.num_address_ranges); 1075 1075 } 1076 1076 1077 - /* Reload host state (IA32_RTIT_CTL will be cleared on VM exit). */ 1078 - wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); 1077 + /* 1078 + * KVM requires VM_EXIT_CLEAR_IA32_RTIT_CTL to expose PT to the guest, 1079 + * i.e. RTIT_CTL is always cleared on VM-Exit. Restore it if necessary. 1080 + */ 1081 + if (vmx->pt_desc.host.ctl) 1082 + wrmsrl(MSR_IA32_RTIT_CTL, vmx->pt_desc.host.ctl); 1079 1083 } 1080 1084 1081 1085 void vmx_set_host_fs_gs(struct vmcs_host_state *host, u16 fs_sel, u16 gs_sel, ··· 1460 1456 * cause a #GP fault. 
1461 1457 */ 1462 1458 value = (data & RTIT_CTL_ADDR0) >> RTIT_CTL_ADDR0_OFFSET; 1463 - if ((value && (vmx->pt_desc.addr_range < 1)) || (value > 2)) 1459 + if ((value && (vmx->pt_desc.num_address_ranges < 1)) || (value > 2)) 1464 1460 return 1; 1465 1461 value = (data & RTIT_CTL_ADDR1) >> RTIT_CTL_ADDR1_OFFSET; 1466 - if ((value && (vmx->pt_desc.addr_range < 2)) || (value > 2)) 1462 + if ((value && (vmx->pt_desc.num_address_ranges < 2)) || (value > 2)) 1467 1463 return 1; 1468 1464 value = (data & RTIT_CTL_ADDR2) >> RTIT_CTL_ADDR2_OFFSET; 1469 - if ((value && (vmx->pt_desc.addr_range < 3)) || (value > 2)) 1465 + if ((value && (vmx->pt_desc.num_address_ranges < 3)) || (value > 2)) 1470 1466 return 1; 1471 1467 value = (data & RTIT_CTL_ADDR3) >> RTIT_CTL_ADDR3_OFFSET; 1472 - if ((value && (vmx->pt_desc.addr_range < 4)) || (value > 2)) 1468 + if ((value && (vmx->pt_desc.num_address_ranges < 4)) || (value > 2)) 1473 1469 return 1; 1474 1470 1475 1471 return 0; ··· 1890 1886 case MSR_IA32_RTIT_ADDR0_A ... 
MSR_IA32_RTIT_ADDR3_B: 1891 1887 index = msr_info->index - MSR_IA32_RTIT_ADDR0_A; 1892 1888 if (!vmx_pt_mode_is_host_guest() || 1893 - (index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps, 1894 - PT_CAP_num_address_ranges))) 1889 + (index >= 2 * vmx->pt_desc.num_address_ranges)) 1895 1890 return 1; 1896 1891 if (index % 2) 1897 1892 msr_info->data = vmx->pt_desc.guest.addr_b[index / 2]; ··· 2205 2202 if (!pt_can_write_msr(vmx)) 2206 2203 return 1; 2207 2204 index = msr_info->index - MSR_IA32_RTIT_ADDR0_A; 2208 - if (index >= 2 * intel_pt_validate_cap(vmx->pt_desc.caps, 2209 - PT_CAP_num_address_ranges)) 2205 + if (index >= 2 * vmx->pt_desc.num_address_ranges) 2210 2206 return 1; 2211 2207 if (is_noncanonical_address(data, vcpu)) 2212 2208 return 1; ··· 3881 3879 vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_BASE, MSR_TYPE_RW, flag); 3882 3880 vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_OUTPUT_MASK, MSR_TYPE_RW, flag); 3883 3881 vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_CR3_MATCH, MSR_TYPE_RW, flag); 3884 - for (i = 0; i < vmx->pt_desc.addr_range; i++) { 3882 + for (i = 0; i < vmx->pt_desc.num_address_ranges; i++) { 3885 3883 vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_A + i * 2, MSR_TYPE_RW, flag); 3886 3884 vmx_set_intercept_for_msr(vcpu, MSR_IA32_RTIT_ADDR0_B + i * 2, MSR_TYPE_RW, flag); 3887 3885 } ··· 4330 4328 4331 4329 #define VMX_XSS_EXIT_BITMAP 0 4332 4330 4333 - /* 4334 - * Noting that the initialization of Guest-state Area of VMCS is in 4335 - * vmx_vcpu_reset(). 
4336 - */ 4337 4331 static void init_vmcs(struct vcpu_vmx *vmx) 4338 4332 { 4339 4333 if (nested) ··· 4338 4340 if (cpu_has_vmx_msr_bitmap()) 4339 4341 vmcs_write64(MSR_BITMAP, __pa(vmx->vmcs01.msr_bitmap)); 4340 4342 4341 - vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */ 4343 + vmcs_write64(VMCS_LINK_POINTER, INVALID_GPA); /* 22.3.1.5 */ 4342 4344 4343 4345 /* Control */ 4344 4346 pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx)); ··· 4434 4436 vmx_setup_uret_msrs(vmx); 4435 4437 } 4436 4438 4439 + static void __vmx_vcpu_reset(struct kvm_vcpu *vcpu) 4440 + { 4441 + struct vcpu_vmx *vmx = to_vmx(vcpu); 4442 + 4443 + init_vmcs(vmx); 4444 + 4445 + if (nested) 4446 + memcpy(&vmx->nested.msrs, &vmcs_config.nested, sizeof(vmx->nested.msrs)); 4447 + 4448 + vcpu_setup_sgx_lepubkeyhash(vcpu); 4449 + 4450 + vmx->nested.posted_intr_nv = -1; 4451 + vmx->nested.vmxon_ptr = INVALID_GPA; 4452 + vmx->nested.current_vmptr = INVALID_GPA; 4453 + vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID; 4454 + 4455 + vcpu->arch.microcode_version = 0x100000000ULL; 4456 + vmx->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED; 4457 + 4458 + /* 4459 + * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR 4460 + * or POSTED_INTR_WAKEUP_VECTOR. 
4461 + */ 4462 + vmx->pi_desc.nv = POSTED_INTR_VECTOR; 4463 + vmx->pi_desc.sn = 1; 4464 + } 4465 + 4437 4466 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) 4438 4467 { 4439 4468 struct vcpu_vmx *vmx = to_vmx(vcpu); 4469 + 4470 + if (!init_event) 4471 + __vmx_vcpu_reset(vcpu); 4440 4472 4441 4473 vmx->rmode.vm86_active = 0; 4442 4474 vmx->spec_ctrl = 0; ··· 4477 4449 kvm_set_cr8(vcpu, 0); 4478 4450 4479 4451 vmx_segment_cache_clear(vmx); 4452 + kvm_register_mark_available(vcpu, VCPU_EXREG_SEGMENTS); 4480 4453 4481 4454 seg_setup(VCPU_SREG_CS); 4482 4455 vmcs_write16(GUEST_CS_SELECTOR, 0xf000); ··· 5408 5379 5409 5380 if (vmx->emulation_required && !vmx->rmode.vm86_active && 5410 5381 vcpu->arch.exception.pending) { 5411 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 5412 - vcpu->run->internal.suberror = 5413 - KVM_INTERNAL_ERROR_EMULATION; 5414 - vcpu->run->internal.ndata = 0; 5382 + kvm_prepare_emulation_failure_exit(vcpu); 5415 5383 return 0; 5416 5384 } 5417 5385 ··· 5659 5633 static const int kvm_vmx_max_exit_handlers = 5660 5634 ARRAY_SIZE(kvm_vmx_exit_handlers); 5661 5635 5662 - static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2, 5636 + static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u32 *reason, 5637 + u64 *info1, u64 *info2, 5663 5638 u32 *intr_info, u32 *error_code) 5664 5639 { 5665 5640 struct vcpu_vmx *vmx = to_vmx(vcpu); 5666 5641 5642 + *reason = vmx->exit_reason.full; 5667 5643 *info1 = vmx_get_exit_qual(vcpu); 5668 5644 if (!(vmx->exit_reason.failed_vmentry)) { 5669 5645 *info2 = vmx->idt_vectoring_info; ··· 6434 6406 case MSR_IA32_VMX_BASIC ... MSR_IA32_VMX_VMFUNC: 6435 6407 return nested; 6436 6408 case MSR_AMD64_VIRT_SPEC_CTRL: 6409 + case MSR_AMD64_TSC_RATIO: 6437 6410 /* This is AMD only. 
*/ 6438 6411 return false; 6439 6412 default: ··· 6811 6782 if (likely(!vmx->exit_reason.failed_vmentry)) 6812 6783 vmx->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD); 6813 6784 6814 - trace_kvm_exit(vmx->exit_reason.full, vcpu, KVM_ISA_VMX); 6785 + trace_kvm_exit(vcpu, KVM_ISA_VMX); 6815 6786 6816 6787 if (unlikely(vmx->exit_reason.failed_vmentry)) 6817 6788 return EXIT_FASTPATH_NONE; ··· 6842 6813 { 6843 6814 struct vmx_uret_msr *tsx_ctrl; 6844 6815 struct vcpu_vmx *vmx; 6845 - int i, cpu, err; 6816 + int i, err; 6846 6817 6847 6818 BUILD_BUG_ON(offsetof(struct vcpu_vmx, vcpu) != 0); 6848 6819 vmx = to_vmx(vcpu); ··· 6863 6834 goto free_vpid; 6864 6835 } 6865 6836 6866 - for (i = 0; i < kvm_nr_uret_msrs; ++i) { 6867 - vmx->guest_uret_msrs[i].data = 0; 6837 + for (i = 0; i < kvm_nr_uret_msrs; ++i) 6868 6838 vmx->guest_uret_msrs[i].mask = -1ull; 6869 - } 6870 6839 if (boot_cpu_has(X86_FEATURE_RTM)) { 6871 6840 /* 6872 6841 * TSX_CTRL_CPUID_CLEAR is handled in the CPUID interception. 
··· 6901 6874 } 6902 6875 6903 6876 vmx->loaded_vmcs = &vmx->vmcs01; 6904 - cpu = get_cpu(); 6905 - vmx_vcpu_load(vcpu, cpu); 6906 - vcpu->cpu = cpu; 6907 - init_vmcs(vmx); 6908 - vmx_vcpu_put(vcpu); 6909 - put_cpu(); 6877 + 6910 6878 if (cpu_need_virtualize_apic_accesses(vcpu)) { 6911 6879 err = alloc_apic_access_page(vcpu->kvm); 6912 6880 if (err) ··· 6913 6891 if (err) 6914 6892 goto free_vmcs; 6915 6893 } 6916 - 6917 - if (nested) 6918 - memcpy(&vmx->nested.msrs, &vmcs_config.nested, sizeof(vmx->nested.msrs)); 6919 - else 6920 - memset(&vmx->nested.msrs, 0, sizeof(vmx->nested.msrs)); 6921 - 6922 - vcpu_setup_sgx_lepubkeyhash(vcpu); 6923 - 6924 - vmx->nested.posted_intr_nv = -1; 6925 - vmx->nested.current_vmptr = -1ull; 6926 - vmx->nested.hv_evmcs_vmptr = EVMPTR_INVALID; 6927 - 6928 - vcpu->arch.microcode_version = 0x100000000ULL; 6929 - vmx->msr_ia32_feature_control_valid_bits = FEAT_CTL_LOCKED; 6930 - 6931 - /* 6932 - * Enforce invariant: pi_desc.nv is always either POSTED_INTR_VECTOR 6933 - * or POSTED_INTR_WAKEUP_VECTOR. 
6934 - */ 6935 - vmx->pi_desc.nv = POSTED_INTR_VECTOR; 6936 - vmx->pi_desc.sn = 1; 6937 6894 6938 6895 return 0; 6939 6896 ··· 7128 7127 } 7129 7128 7130 7129 /* Get the number of configurable Address Ranges for filtering */ 7131 - vmx->pt_desc.addr_range = intel_pt_validate_cap(vmx->pt_desc.caps, 7130 + vmx->pt_desc.num_address_ranges = intel_pt_validate_cap(vmx->pt_desc.caps, 7132 7131 PT_CAP_num_address_ranges); 7133 7132 7134 7133 /* Initialize and clear the no dependency bits */ 7135 7134 vmx->pt_desc.ctl_bitmask = ~(RTIT_CTL_TRACEEN | RTIT_CTL_OS | 7136 - RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC); 7135 + RTIT_CTL_USR | RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC | 7136 + RTIT_CTL_BRANCH_EN); 7137 7137 7138 7138 /* 7139 7139 * If CPUID.(EAX=14H,ECX=0):EBX[0]=1 CR3Filter can be set otherwise ··· 7152 7150 RTIT_CTL_CYC_THRESH | RTIT_CTL_PSB_FREQ); 7153 7151 7154 7152 /* 7155 - * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn BranchEn and 7156 - * MTCFreq can be set 7153 + * If CPUID.(EAX=14H,ECX=0):EBX[3]=1 MTCEn and MTCFreq can be set 7157 7154 */ 7158 7155 if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_mtc)) 7159 7156 vmx->pt_desc.ctl_bitmask &= ~(RTIT_CTL_MTC_EN | 7160 - RTIT_CTL_BRANCH_EN | RTIT_CTL_MTC_RANGE); 7157 + RTIT_CTL_MTC_RANGE); 7161 7158 7162 7159 /* If CPUID.(EAX=14H,ECX=0):EBX[4]=1 FUPonPTW and PTWEn can be set */ 7163 7160 if (intel_pt_validate_cap(vmx->pt_desc.caps, PT_CAP_ptwrite)) ··· 7176 7175 vmx->pt_desc.ctl_bitmask &= ~RTIT_CTL_FABRIC_EN; 7177 7176 7178 7177 /* unmask address range configure area */ 7179 - for (i = 0; i < vmx->pt_desc.addr_range; i++) 7178 + for (i = 0; i < vmx->pt_desc.num_address_ranges; i++) 7180 7179 vmx->pt_desc.ctl_bitmask &= ~(0xfULL << (32 + i * 4)); 7181 7180 } 7182 7181 ··· 7552 7551 7553 7552 static void hardware_unsetup(void) 7554 7553 { 7554 + kvm_set_posted_intr_wakeup_handler(NULL); 7555 + 7555 7556 if (nested) 7556 7557 nested_vmx_hardware_unsetup(); 7557 7558 ··· 7569 7566 } 7570 7567 7571 7568 static 
struct kvm_x86_ops vmx_x86_ops __initdata = { 7569 + .name = "kvm_intel", 7570 + 7572 7571 .hardware_unsetup = hardware_unsetup, 7573 7572 7574 7573 .hardware_enable = hardware_enable, ··· 7884 7879 vmx_x86_ops.request_immediate_exit = __kvm_request_immediate_exit; 7885 7880 } 7886 7881 7887 - kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); 7888 - 7889 7882 kvm_mce_cap_supported |= MCG_LMCE_P; 7890 7883 7891 7884 if (pt_mode != PT_MODE_SYSTEM && pt_mode != PT_MODE_HOST_GUEST) ··· 7907 7904 r = alloc_kvm_area(); 7908 7905 if (r) 7909 7906 nested_vmx_hardware_unsetup(); 7907 + 7908 + kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler); 7909 + 7910 7910 return r; 7911 7911 } 7912 7912
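The vmx.c hunks above move `kvm_set_posted_intr_wakeup_handler(pi_wakeup_handler)` out of `vmx_init()` and to the end of `hardware_setup()`, and clear the handler first in `hardware_unsetup()`, so the wakeup callback is never live around partially torn-down state. A minimal userspace sketch of that register-last / unregister-first pattern (all names and the fail-early step are illustrative, not KVM's exact code):

```c
#include <assert.h>
#include <stddef.h>

/* Global callback slot, standing in for the posted-interrupt wakeup
 * handler pointer that the real code publishes via
 * kvm_set_posted_intr_wakeup_handler(). */
static void (*wakeup_handler)(void);

static void set_wakeup_handler(void (*fn)(void)) { wakeup_handler = fn; }

static void pi_wakeup(void) { /* would kick a blocked vCPU */ }

static int setup_step_ok; /* stands in for alloc_kvm_area() etc. succeeding */

static int hardware_setup(void)
{
    if (!setup_step_ok)
        return -1;                 /* fail before the handler goes live */
    set_wakeup_handler(pi_wakeup); /* last step: publish the handler */
    return 0;
}

static void hardware_unsetup(void)
{
    set_wakeup_handler(NULL);      /* first step: no callbacks after this */
    /* ...free vmcs areas, nested state, etc... */
}
```

The invariant is simply that the callback pointer is non-NULL only while every resource it might touch is valid.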
arch/x86/kvm/vmx/vmx.h (+1, -1)
··· 62 62 63 63 struct pt_desc { 64 64 u64 ctl_bitmask; 65 - u32 addr_range; 65 + u32 num_address_ranges; 66 66 u32 caps[PT_CPUID_REGS_NUM * PT_CPUID_LEAVES]; 67 67 struct pt_ctx host; 68 68 struct pt_ctx guest;
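The renamed `num_address_ranges` field feeds the unmask loop in `update_intel_pt_cfg()` above, which clears one 4-bit RTIT_CTL address-range field per configurable range, starting at bit 32. A standalone sketch of that bit manipulation (the helper name is illustrative):

```c
#include <stdint.h>

/* Clear the 4-bit RTIT_CTL ADDRn_CFG field for each configurable Intel PT
 * address range; the fields are packed consecutively starting at bit 32,
 * mirroring the loop in the diff above. */
static uint64_t unmask_addr_ranges(uint64_t ctl_bitmask,
                                   uint32_t num_address_ranges)
{
    for (uint32_t i = 0; i < num_address_ranges; i++)
        ctl_bitmask &= ~(0xfULL << (32 + i * 4));
    return ctl_bitmask;
}
```

With two ranges, bits 32-39 end up cleared; with four, bits 32-47.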
arch/x86/kvm/x86.c (+503, -297)
··· 790 790 } 791 791 EXPORT_SYMBOL_GPL(kvm_require_dr); 792 792 793 - /* 794 - * This function will be used to read from the physical memory of the currently 795 - * running guest. The difference to kvm_vcpu_read_guest_page is that this function 796 - * can read from guest physical or from the guest's guest physical memory. 797 - */ 798 - int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, 799 - gfn_t ngfn, void *data, int offset, int len, 800 - u32 access) 801 - { 802 - struct x86_exception exception; 803 - gfn_t real_gfn; 804 - gpa_t ngpa; 805 - 806 - ngpa = gfn_to_gpa(ngfn); 807 - real_gfn = mmu->translate_gpa(vcpu, ngpa, access, &exception); 808 - if (real_gfn == UNMAPPED_GVA) 809 - return -EFAULT; 810 - 811 - real_gfn = gpa_to_gfn(real_gfn); 812 - 813 - return kvm_vcpu_read_guest_page(vcpu, real_gfn, data, offset, len); 814 - } 815 - EXPORT_SYMBOL_GPL(kvm_read_guest_page_mmu); 816 - 817 793 static inline u64 pdptr_rsvd_bits(struct kvm_vcpu *vcpu) 818 794 { 819 795 return vcpu->arch.reserved_gpa_bits | rsvd_bits(5, 8) | rsvd_bits(1, 2); ··· 801 825 int load_pdptrs(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, unsigned long cr3) 802 826 { 803 827 gfn_t pdpt_gfn = cr3 >> PAGE_SHIFT; 804 - unsigned offset = ((cr3 & (PAGE_SIZE-1)) >> 5) << 2; 828 + gpa_t real_gpa; 805 829 int i; 806 830 int ret; 807 831 u64 pdpte[ARRAY_SIZE(mmu->pdptrs)]; 808 832 809 - ret = kvm_read_guest_page_mmu(vcpu, mmu, pdpt_gfn, pdpte, 810 - offset * sizeof(u64), sizeof(pdpte), 811 - PFERR_USER_MASK|PFERR_WRITE_MASK); 812 - if (ret < 0) { 813 - ret = 0; 814 - goto out; 815 - } 833 + /* 834 + * If the MMU is nested, CR3 holds an L2 GPA and needs to be translated 835 + * to an L1 GPA. 836 + */ 837 + real_gpa = mmu->translate_gpa(vcpu, gfn_to_gpa(pdpt_gfn), 838 + PFERR_USER_MASK | PFERR_WRITE_MASK, NULL); 839 + if (real_gpa == UNMAPPED_GVA) 840 + return 0; 841 + 842 + /* Note the offset, PDPTRs are 32 byte aligned when using PAE paging. 
*/ 843 + ret = kvm_vcpu_read_guest_page(vcpu, gpa_to_gfn(real_gpa), pdpte, 844 + cr3 & GENMASK(11, 5), sizeof(pdpte)); 845 + if (ret < 0) 846 + return 0; 847 + 816 848 for (i = 0; i < ARRAY_SIZE(pdpte); ++i) { 817 849 if ((pdpte[i] & PT_PRESENT_MASK) && 818 850 (pdpte[i] & pdptr_rsvd_bits(vcpu))) { 819 - ret = 0; 820 - goto out; 851 + return 0; 821 852 } 822 853 } 823 - ret = 1; 824 854 825 855 memcpy(mmu->pdptrs, pdpte, sizeof(mmu->pdptrs)); 826 856 kvm_register_mark_dirty(vcpu, VCPU_EXREG_PDPTR); 827 857 vcpu->arch.pdptrs_from_userspace = false; 828 858 829 - out: 830 - 831 - return ret; 859 + return 1; 832 860 } 833 861 EXPORT_SYMBOL_GPL(load_pdptrs); 834 862 ··· 973 993 /* 974 994 * Do not allow the guest to set bits that we do not support 975 995 * saving. However, xcr0 bit 0 is always set, even if the 976 - * emulated CPU does not support XSAVE (see fx_init). 996 + * emulated CPU does not support XSAVE (see kvm_vcpu_reset()). 977 997 */ 978 998 valid_bits = vcpu->arch.guest_supported_xcr0 | XFEATURE_MASK_FP; 979 999 if (xcr0 & ~valid_bits) ··· 1022 1042 1023 1043 void kvm_post_set_cr4(struct kvm_vcpu *vcpu, unsigned long old_cr4, unsigned long cr4) 1024 1044 { 1025 - if (((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) || 1026 - (!(cr4 & X86_CR4_PCIDE) && (old_cr4 & X86_CR4_PCIDE))) 1045 + /* 1046 + * If any role bit is changed, the MMU needs to be reset. 1047 + * 1048 + * If CR4.PCIDE is changed 1 -> 0, the guest TLB must be flushed. 1049 + * If CR4.PCIDE is changed 0 -> 1, there is no need to flush the TLB 1050 + * according to the SDM; however, stale prev_roots could be reused 1051 + * incorrectly in the future after a MOV to CR3 with NOFLUSH=1, so we 1052 + * free them all. KVM_REQ_MMU_RELOAD is fit for the both cases; it 1053 + * is slow, but changing CR4.PCIDE is a rare case. 1054 + * 1055 + * If CR4.PGE is changed, the guest TLB must be flushed. 
1056 + * 1057 + * Note: resetting MMU is a superset of KVM_REQ_MMU_RELOAD and 1058 + * KVM_REQ_MMU_RELOAD is a superset of KVM_REQ_TLB_FLUSH_GUEST, hence 1059 + * the usage of "else if". 1060 + */ 1061 + if ((cr4 ^ old_cr4) & KVM_MMU_CR4_ROLE_BITS) 1027 1062 kvm_mmu_reset_context(vcpu); 1063 + else if ((cr4 ^ old_cr4) & X86_CR4_PCIDE) 1064 + kvm_make_request(KVM_REQ_MMU_RELOAD, vcpu); 1065 + else if ((cr4 ^ old_cr4) & X86_CR4_PGE) 1066 + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); 1028 1067 } 1029 1068 EXPORT_SYMBOL_GPL(kvm_post_set_cr4); 1030 1069 ··· 1091 1092 int i; 1092 1093 1093 1094 /* 1095 + * MOV CR3 and INVPCID are usually not intercepted when using TDP, but 1096 + * this is reachable when running EPT=1 and unrestricted_guest=0, and 1097 + * also via the emulator. KVM's TDP page tables are not in the scope of 1098 + * the invalidation, but the guest's TLB entries need to be flushed as 1099 + * the CPU may have cached entries in its TLB for the target PCID. 1100 + */ 1101 + if (unlikely(tdp_enabled)) { 1102 + kvm_make_request(KVM_REQ_TLB_FLUSH_GUEST, vcpu); 1103 + return; 1104 + } 1105 + 1106 + /* 1094 1107 * If neither the current CR3 nor any of the prev_roots use the given 1095 1108 * PCID, then nothing needs to be done here because a resync will 1096 1109 * happen anyway before switching to any other CR3. ··· 1111 1100 kvm_make_request(KVM_REQ_MMU_SYNC, vcpu); 1112 1101 kvm_make_request(KVM_REQ_TLB_FLUSH_CURRENT, vcpu); 1113 1102 } 1103 + 1104 + /* 1105 + * If PCID is disabled, there is no need to free prev_roots even if the 1106 + * PCIDs for them are also 0, because MOV to CR3 always flushes the TLB 1107 + * with PCIDE=0. 
1108 + */ 1109 + if (!kvm_read_cr4_bits(vcpu, X86_CR4_PCIDE)) 1110 + return; 1114 1111 1115 1112 for (i = 0; i < KVM_MMU_NUM_PREV_ROOTS; i++) 1116 1113 if (kvm_get_pcid(vcpu, mmu->prev_roots[i].pgd) == pcid) ··· 1400 1381 MSR_PLATFORM_INFO, 1401 1382 MSR_MISC_FEATURES_ENABLES, 1402 1383 MSR_AMD64_VIRT_SPEC_CTRL, 1384 + MSR_AMD64_TSC_RATIO, 1403 1385 MSR_IA32_POWER_CTL, 1404 1386 MSR_IA32_UCODE_REV, 1405 1387 ··· 2474 2454 return check_tsc_unstable(); 2475 2455 } 2476 2456 2457 + /* 2458 + * Infers attempts to synchronize the guest's tsc from host writes. Sets the 2459 + * offset for the vcpu and tracks the TSC matching generation that the vcpu 2460 + * participates in. 2461 + */ 2462 + static void __kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 offset, u64 tsc, 2463 + u64 ns, bool matched) 2464 + { 2465 + struct kvm *kvm = vcpu->kvm; 2466 + 2467 + lockdep_assert_held(&kvm->arch.tsc_write_lock); 2468 + 2469 + /* 2470 + * We also track th most recent recorded KHZ, write and time to 2471 + * allow the matching interval to be extended at each write. 2472 + */ 2473 + kvm->arch.last_tsc_nsec = ns; 2474 + kvm->arch.last_tsc_write = tsc; 2475 + kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; 2476 + kvm->arch.last_tsc_offset = offset; 2477 + 2478 + vcpu->arch.last_guest_tsc = tsc; 2479 + 2480 + kvm_vcpu_write_tsc_offset(vcpu, offset); 2481 + 2482 + if (!matched) { 2483 + /* 2484 + * We split periods of matched TSC writes into generations. 2485 + * For each generation, we track the original measured 2486 + * nanosecond time, offset, and write, so if TSCs are in 2487 + * sync, we can match exact offset, and if not, we can match 2488 + * exact software computation in compute_guest_tsc() 2489 + * 2490 + * These values are tracked in kvm->arch.cur_xxx variables. 
2491 + */ 2492 + kvm->arch.cur_tsc_generation++; 2493 + kvm->arch.cur_tsc_nsec = ns; 2494 + kvm->arch.cur_tsc_write = tsc; 2495 + kvm->arch.cur_tsc_offset = offset; 2496 + kvm->arch.nr_vcpus_matched_tsc = 0; 2497 + } else if (vcpu->arch.this_tsc_generation != kvm->arch.cur_tsc_generation) { 2498 + kvm->arch.nr_vcpus_matched_tsc++; 2499 + } 2500 + 2501 + /* Keep track of which generation this VCPU has synchronized to */ 2502 + vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; 2503 + vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; 2504 + vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; 2505 + 2506 + kvm_track_tsc_matching(vcpu); 2507 + } 2508 + 2477 2509 static void kvm_synchronize_tsc(struct kvm_vcpu *vcpu, u64 data) 2478 2510 { 2479 2511 struct kvm *kvm = vcpu->kvm; 2480 2512 u64 offset, ns, elapsed; 2481 2513 unsigned long flags; 2482 - bool matched; 2483 - bool already_matched; 2514 + bool matched = false; 2484 2515 bool synchronizing = false; 2485 2516 2486 2517 raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); ··· 2577 2506 offset = kvm_compute_l1_tsc_offset(vcpu, data); 2578 2507 } 2579 2508 matched = true; 2580 - already_matched = (vcpu->arch.this_tsc_generation == kvm->arch.cur_tsc_generation); 2581 - } else { 2582 - /* 2583 - * We split periods of matched TSC writes into generations. 2584 - * For each generation, we track the original measured 2585 - * nanosecond time, offset, and write, so if TSCs are in 2586 - * sync, we can match exact offset, and if not, we can match 2587 - * exact software computation in compute_guest_tsc() 2588 - * 2589 - * These values are tracked in kvm->arch.cur_xxx variables. 
2590 - */ 2591 - kvm->arch.cur_tsc_generation++; 2592 - kvm->arch.cur_tsc_nsec = ns; 2593 - kvm->arch.cur_tsc_write = data; 2594 - kvm->arch.cur_tsc_offset = offset; 2595 - matched = false; 2596 2509 } 2597 2510 2598 - /* 2599 - * We also track th most recent recorded KHZ, write and time to 2600 - * allow the matching interval to be extended at each write. 2601 - */ 2602 - kvm->arch.last_tsc_nsec = ns; 2603 - kvm->arch.last_tsc_write = data; 2604 - kvm->arch.last_tsc_khz = vcpu->arch.virtual_tsc_khz; 2605 - 2606 - vcpu->arch.last_guest_tsc = data; 2607 - 2608 - /* Keep track of which generation this VCPU has synchronized to */ 2609 - vcpu->arch.this_tsc_generation = kvm->arch.cur_tsc_generation; 2610 - vcpu->arch.this_tsc_nsec = kvm->arch.cur_tsc_nsec; 2611 - vcpu->arch.this_tsc_write = kvm->arch.cur_tsc_write; 2612 - 2613 - kvm_vcpu_write_tsc_offset(vcpu, offset); 2511 + __kvm_synchronize_tsc(vcpu, offset, data, ns, matched); 2614 2512 raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); 2615 - 2616 - raw_spin_lock_irqsave(&kvm->arch.pvclock_gtod_sync_lock, flags); 2617 - if (!matched) { 2618 - kvm->arch.nr_vcpus_matched_tsc = 0; 2619 - } else if (!already_matched) { 2620 - kvm->arch.nr_vcpus_matched_tsc++; 2621 - } 2622 - 2623 - kvm_track_tsc_matching(vcpu); 2624 - raw_spin_unlock_irqrestore(&kvm->arch.pvclock_gtod_sync_lock, flags); 2625 2513 } 2626 2514 2627 2515 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu, ··· 2768 2738 int vclock_mode; 2769 2739 bool host_tsc_clocksource, vcpus_matched; 2770 2740 2741 + lockdep_assert_held(&kvm->arch.tsc_write_lock); 2771 2742 vcpus_matched = (ka->nr_vcpus_matched_tsc + 1 == 2772 2743 atomic_read(&kvm->online_vcpus)); 2773 2744 ··· 2793 2762 #endif 2794 2763 } 2795 2764 2796 - void kvm_make_mclock_inprogress_request(struct kvm *kvm) 2765 + static void kvm_make_mclock_inprogress_request(struct kvm *kvm) 2797 2766 { 2798 2767 kvm_make_all_cpus_request(kvm, KVM_REQ_MCLOCK_INPROGRESS); 2799 2768 } 
2800 2769 2801 - static void kvm_gen_update_masterclock(struct kvm *kvm) 2770 + static void __kvm_start_pvclock_update(struct kvm *kvm) 2802 2771 { 2803 - #ifdef CONFIG_X86_64 2804 - int i; 2805 - struct kvm_vcpu *vcpu; 2806 - struct kvm_arch *ka = &kvm->arch; 2807 - unsigned long flags; 2772 + raw_spin_lock_irq(&kvm->arch.tsc_write_lock); 2773 + write_seqcount_begin(&kvm->arch.pvclock_sc); 2774 + } 2808 2775 2809 - kvm_hv_invalidate_tsc_page(kvm); 2810 - 2776 + static void kvm_start_pvclock_update(struct kvm *kvm) 2777 + { 2811 2778 kvm_make_mclock_inprogress_request(kvm); 2812 2779 2813 2780 /* no guest entries from this point */ 2814 - raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); 2815 - pvclock_update_vm_gtod_copy(kvm); 2816 - raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); 2781 + __kvm_start_pvclock_update(kvm); 2782 + } 2817 2783 2784 + static void kvm_end_pvclock_update(struct kvm *kvm) 2785 + { 2786 + struct kvm_arch *ka = &kvm->arch; 2787 + struct kvm_vcpu *vcpu; 2788 + int i; 2789 + 2790 + write_seqcount_end(&ka->pvclock_sc); 2791 + raw_spin_unlock_irq(&ka->tsc_write_lock); 2818 2792 kvm_for_each_vcpu(i, vcpu, kvm) 2819 2793 kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); 2820 2794 2821 2795 /* guest entries allowed */ 2822 2796 kvm_for_each_vcpu(i, vcpu, kvm) 2823 2797 kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); 2824 - #endif 2825 2798 } 2826 2799 2827 - u64 get_kvmclock_ns(struct kvm *kvm) 2800 + static void kvm_update_masterclock(struct kvm *kvm) 2801 + { 2802 + kvm_hv_invalidate_tsc_page(kvm); 2803 + kvm_start_pvclock_update(kvm); 2804 + pvclock_update_vm_gtod_copy(kvm); 2805 + kvm_end_pvclock_update(kvm); 2806 + } 2807 + 2808 + /* Called within read_seqcount_begin/retry for kvm->pvclock_sc. 
*/ 2809 + static void __get_kvmclock(struct kvm *kvm, struct kvm_clock_data *data) 2828 2810 { 2829 2811 struct kvm_arch *ka = &kvm->arch; 2830 2812 struct pvclock_vcpu_time_info hv_clock; 2831 - unsigned long flags; 2832 - u64 ret; 2833 - 2834 - raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); 2835 - if (!ka->use_master_clock) { 2836 - raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); 2837 - return get_kvmclock_base_ns() + ka->kvmclock_offset; 2838 - } 2839 - 2840 - hv_clock.tsc_timestamp = ka->master_cycle_now; 2841 - hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; 2842 - raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); 2843 2813 2844 2814 /* both __this_cpu_read() and rdtsc() should be on the same cpu */ 2845 2815 get_cpu(); 2846 2816 2847 - if (__this_cpu_read(cpu_tsc_khz)) { 2817 + data->flags = 0; 2818 + if (ka->use_master_clock && __this_cpu_read(cpu_tsc_khz)) { 2819 + #ifdef CONFIG_X86_64 2820 + struct timespec64 ts; 2821 + 2822 + if (kvm_get_walltime_and_clockread(&ts, &data->host_tsc)) { 2823 + data->realtime = ts.tv_nsec + NSEC_PER_SEC * ts.tv_sec; 2824 + data->flags |= KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC; 2825 + } else 2826 + #endif 2827 + data->host_tsc = rdtsc(); 2828 + 2829 + data->flags |= KVM_CLOCK_TSC_STABLE; 2830 + hv_clock.tsc_timestamp = ka->master_cycle_now; 2831 + hv_clock.system_time = ka->master_kernel_ns + ka->kvmclock_offset; 2848 2832 kvm_get_time_scale(NSEC_PER_SEC, __this_cpu_read(cpu_tsc_khz) * 1000LL, 2849 2833 &hv_clock.tsc_shift, 2850 2834 &hv_clock.tsc_to_system_mul); 2851 - ret = __pvclock_read_cycles(&hv_clock, rdtsc()); 2852 - } else 2853 - ret = get_kvmclock_base_ns() + ka->kvmclock_offset; 2835 + data->clock = __pvclock_read_cycles(&hv_clock, data->host_tsc); 2836 + } else { 2837 + data->clock = get_kvmclock_base_ns() + ka->kvmclock_offset; 2838 + } 2854 2839 2855 2840 put_cpu(); 2841 + } 2856 2842 2857 - return ret; 2843 + static void get_kvmclock(struct kvm *kvm, 
struct kvm_clock_data *data) 2844 + { 2845 + struct kvm_arch *ka = &kvm->arch; 2846 + unsigned seq; 2847 + 2848 + do { 2849 + seq = read_seqcount_begin(&ka->pvclock_sc); 2850 + __get_kvmclock(kvm, data); 2851 + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); 2852 + } 2853 + 2854 + u64 get_kvmclock_ns(struct kvm *kvm) 2855 + { 2856 + struct kvm_clock_data data; 2857 + 2858 + get_kvmclock(kvm, &data); 2859 + return data.clock; 2858 2860 } 2859 2861 2860 2862 static void kvm_setup_pvclock_page(struct kvm_vcpu *v, ··· 2952 2888 static int kvm_guest_time_update(struct kvm_vcpu *v) 2953 2889 { 2954 2890 unsigned long flags, tgt_tsc_khz; 2891 + unsigned seq; 2955 2892 struct kvm_vcpu_arch *vcpu = &v->arch; 2956 2893 struct kvm_arch *ka = &v->kvm->arch; 2957 2894 s64 kernel_ns; ··· 2967 2902 * If the host uses TSC clock, then passthrough TSC as stable 2968 2903 * to the guest. 2969 2904 */ 2970 - raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); 2971 - use_master_clock = ka->use_master_clock; 2972 - if (use_master_clock) { 2973 - host_tsc = ka->master_cycle_now; 2974 - kernel_ns = ka->master_kernel_ns; 2975 - } 2976 - raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); 2905 + do { 2906 + seq = read_seqcount_begin(&ka->pvclock_sc); 2907 + use_master_clock = ka->use_master_clock; 2908 + if (use_master_clock) { 2909 + host_tsc = ka->master_cycle_now; 2910 + kernel_ns = ka->master_kernel_ns; 2911 + } 2912 + } while (read_seqcount_retry(&ka->pvclock_sc, seq)); 2977 2913 2978 2914 /* Keep irq disabled to prevent changes to the clock */ 2979 2915 local_irq_save(flags); ··· 3245 3179 ++vcpu->stat.tlb_flush; 3246 3180 3247 3181 if (!tdp_enabled) { 3248 - /* 3182 + /* 3249 3183 * A TLB flush on behalf of the guest is equivalent to 3250 3184 * INVPCID(all), toggling CR4.PGE, etc., which requires 3251 - * a forced sync of the shadow page tables. 
Unload the 3252 - * entire MMU here and the subsequent load will sync the 3253 - * shadow page tables, and also flush the TLB. 3185 + * a forced sync of the shadow page tables. Ensure all the 3186 + * roots are synced and the guest TLB in hardware is clean. 3254 3187 */ 3255 - kvm_mmu_unload(vcpu); 3256 - return; 3188 + kvm_mmu_sync_roots(vcpu); 3189 + kvm_mmu_sync_prev_roots(vcpu); 3257 3190 } 3258 3191 3259 3192 static_call(kvm_x86_tlb_flush_guest)(vcpu); ··· 4093 4028 case KVM_CAP_VM_COPY_ENC_CONTEXT_FROM: 4094 4029 case KVM_CAP_SREGS2: 4095 4030 case KVM_CAP_EXIT_ON_EMULATION_FAILURE: 4031 + case KVM_CAP_VCPU_ATTRIBUTES: 4096 4032 r = 1; 4097 4033 break; 4098 4034 case KVM_CAP_EXIT_HYPERCALL: ··· 4114 4048 r = KVM_SYNC_X86_VALID_FIELDS; 4115 4049 break; 4116 4050 case KVM_CAP_ADJUST_CLOCK: 4117 - r = KVM_CLOCK_TSC_STABLE; 4051 + r = KVM_CLOCK_VALID_FLAGS; 4118 4052 break; 4119 4053 case KVM_CAP_X86_DISABLE_EXITS: 4120 4054 r |= KVM_X86_DISABLE_EXITS_HLT | KVM_X86_DISABLE_EXITS_PAUSE | ··· 4143 4077 r = KVM_MAX_VCPUS; 4144 4078 break; 4145 4079 case KVM_CAP_MAX_VCPU_ID: 4146 - r = KVM_MAX_VCPU_ID; 4080 + r = KVM_MAX_VCPU_IDS; 4147 4081 break; 4148 4082 case KVM_CAP_PV_MMU: /* obsolete */ 4149 4083 r = 0; ··· 4841 4775 return 0; 4842 4776 } 4843 4777 4778 + static int kvm_arch_tsc_has_attr(struct kvm_vcpu *vcpu, 4779 + struct kvm_device_attr *attr) 4780 + { 4781 + int r; 4782 + 4783 + switch (attr->attr) { 4784 + case KVM_VCPU_TSC_OFFSET: 4785 + r = 0; 4786 + break; 4787 + default: 4788 + r = -ENXIO; 4789 + } 4790 + 4791 + return r; 4792 + } 4793 + 4794 + static int kvm_arch_tsc_get_attr(struct kvm_vcpu *vcpu, 4795 + struct kvm_device_attr *attr) 4796 + { 4797 + u64 __user *uaddr = (u64 __user *)(unsigned long)attr->addr; 4798 + int r; 4799 + 4800 + if ((u64)(unsigned long)uaddr != attr->addr) 4801 + return -EFAULT; 4802 + 4803 + switch (attr->attr) { 4804 + case KVM_VCPU_TSC_OFFSET: 4805 + r = -EFAULT; 4806 + if (put_user(vcpu->arch.l1_tsc_offset, uaddr)) 4807 + 
break; 4808 + r = 0; 4809 + break; 4810 + default: 4811 + r = -ENXIO; 4812 + } 4813 + 4814 + return r; 4815 + } 4816 + 4817 + static int kvm_arch_tsc_set_attr(struct kvm_vcpu *vcpu, 4818 + struct kvm_device_attr *attr) 4819 + { 4820 + u64 __user *uaddr = (u64 __user *)(unsigned long)attr->addr; 4821 + struct kvm *kvm = vcpu->kvm; 4822 + int r; 4823 + 4824 + if ((u64)(unsigned long)uaddr != attr->addr) 4825 + return -EFAULT; 4826 + 4827 + switch (attr->attr) { 4828 + case KVM_VCPU_TSC_OFFSET: { 4829 + u64 offset, tsc, ns; 4830 + unsigned long flags; 4831 + bool matched; 4832 + 4833 + r = -EFAULT; 4834 + if (get_user(offset, uaddr)) 4835 + break; 4836 + 4837 + raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); 4838 + 4839 + matched = (vcpu->arch.virtual_tsc_khz && 4840 + kvm->arch.last_tsc_khz == vcpu->arch.virtual_tsc_khz && 4841 + kvm->arch.last_tsc_offset == offset); 4842 + 4843 + tsc = kvm_scale_tsc(vcpu, rdtsc(), vcpu->arch.l1_tsc_scaling_ratio) + offset; 4844 + ns = get_kvmclock_base_ns(); 4845 + 4846 + __kvm_synchronize_tsc(vcpu, offset, tsc, ns, matched); 4847 + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); 4848 + 4849 + r = 0; 4850 + break; 4851 + } 4852 + default: 4853 + r = -ENXIO; 4854 + } 4855 + 4856 + return r; 4857 + } 4858 + 4859 + static int kvm_vcpu_ioctl_device_attr(struct kvm_vcpu *vcpu, 4860 + unsigned int ioctl, 4861 + void __user *argp) 4862 + { 4863 + struct kvm_device_attr attr; 4864 + int r; 4865 + 4866 + if (copy_from_user(&attr, argp, sizeof(attr))) 4867 + return -EFAULT; 4868 + 4869 + if (attr.group != KVM_VCPU_TSC_CTRL) 4870 + return -ENXIO; 4871 + 4872 + switch (ioctl) { 4873 + case KVM_HAS_DEVICE_ATTR: 4874 + r = kvm_arch_tsc_has_attr(vcpu, &attr); 4875 + break; 4876 + case KVM_GET_DEVICE_ATTR: 4877 + r = kvm_arch_tsc_get_attr(vcpu, &attr); 4878 + break; 4879 + case KVM_SET_DEVICE_ATTR: 4880 + r = kvm_arch_tsc_set_attr(vcpu, &attr); 4881 + break; 4882 + } 4883 + 4884 + return r; 4885 + } 4886 + 4844 4887 static 
int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu, 4845 4888 struct kvm_enable_cap *cap) 4846 4889 { ··· 5404 5229 r = __set_sregs2(vcpu, u.sregs2); 5405 5230 break; 5406 5231 } 5232 + case KVM_HAS_DEVICE_ATTR: 5233 + case KVM_GET_DEVICE_ATTR: 5234 + case KVM_SET_DEVICE_ATTR: 5235 + r = kvm_vcpu_ioctl_device_attr(vcpu, ioctl, argp); 5236 + break; 5407 5237 default: 5408 5238 r = -EINVAL; 5409 5239 } ··· 5892 5712 } 5893 5713 #endif /* CONFIG_HAVE_KVM_PM_NOTIFIER */ 5894 5714 5715 + static int kvm_vm_ioctl_get_clock(struct kvm *kvm, void __user *argp) 5716 + { 5717 + struct kvm_clock_data data = { 0 }; 5718 + 5719 + get_kvmclock(kvm, &data); 5720 + if (copy_to_user(argp, &data, sizeof(data))) 5721 + return -EFAULT; 5722 + 5723 + return 0; 5724 + } 5725 + 5726 + static int kvm_vm_ioctl_set_clock(struct kvm *kvm, void __user *argp) 5727 + { 5728 + struct kvm_arch *ka = &kvm->arch; 5729 + struct kvm_clock_data data; 5730 + u64 now_raw_ns; 5731 + 5732 + if (copy_from_user(&data, argp, sizeof(data))) 5733 + return -EFAULT; 5734 + 5735 + /* 5736 + * Only KVM_CLOCK_REALTIME is used, but allow passing the 5737 + * result of KVM_GET_CLOCK back to KVM_SET_CLOCK. 5738 + */ 5739 + if (data.flags & ~KVM_CLOCK_VALID_FLAGS) 5740 + return -EINVAL; 5741 + 5742 + kvm_hv_invalidate_tsc_page(kvm); 5743 + kvm_start_pvclock_update(kvm); 5744 + pvclock_update_vm_gtod_copy(kvm); 5745 + 5746 + /* 5747 + * This pairs with kvm_guest_time_update(): when masterclock is 5748 + * in use, we use master_kernel_ns + kvmclock_offset to set 5749 + * unsigned 'system_time' so if we use get_kvmclock_ns() (which 5750 + * is slightly ahead) here we risk going negative on unsigned 5751 + * 'system_time' when 'data.clock' is very small. 5752 + */ 5753 + if (data.flags & KVM_CLOCK_REALTIME) { 5754 + u64 now_real_ns = ktime_get_real_ns(); 5755 + 5756 + /* 5757 + * Avoid stepping the kvmclock backwards. 
5758 + */ 5759 + if (now_real_ns > data.realtime) 5760 + data.clock += now_real_ns - data.realtime; 5761 + } 5762 + 5763 + if (ka->use_master_clock) 5764 + now_raw_ns = ka->master_kernel_ns; 5765 + else 5766 + now_raw_ns = get_kvmclock_base_ns(); 5767 + ka->kvmclock_offset = data.clock - now_raw_ns; 5768 + kvm_end_pvclock_update(kvm); 5769 + return 0; 5770 + } 5771 + 5895 5772 long kvm_arch_vm_ioctl(struct file *filp, 5896 5773 unsigned int ioctl, unsigned long arg) 5897 5774 { ··· 6192 5955 break; 6193 5956 } 6194 5957 #endif 6195 - case KVM_SET_CLOCK: { 6196 - struct kvm_arch *ka = &kvm->arch; 6197 - struct kvm_clock_data user_ns; 6198 - u64 now_ns; 6199 - 6200 - r = -EFAULT; 6201 - if (copy_from_user(&user_ns, argp, sizeof(user_ns))) 6202 - goto out; 6203 - 6204 - r = -EINVAL; 6205 - if (user_ns.flags) 6206 - goto out; 6207 - 6208 - r = 0; 6209 - /* 6210 - * TODO: userspace has to take care of races with VCPU_RUN, so 6211 - * kvm_gen_update_masterclock() can be cut down to locked 6212 - * pvclock_update_vm_gtod_copy(). 6213 - */ 6214 - kvm_gen_update_masterclock(kvm); 6215 - 6216 - /* 6217 - * This pairs with kvm_guest_time_update(): when masterclock is 6218 - * in use, we use master_kernel_ns + kvmclock_offset to set 6219 - * unsigned 'system_time' so if we use get_kvmclock_ns() (which 6220 - * is slightly ahead) here we risk going negative on unsigned 6221 - * 'system_time' when 'user_ns.clock' is very small. 
6222 - */ 6223 - raw_spin_lock_irq(&ka->pvclock_gtod_sync_lock); 6224 - if (kvm->arch.use_master_clock) 6225 - now_ns = ka->master_kernel_ns; 6226 - else 6227 - now_ns = get_kvmclock_base_ns(); 6228 - ka->kvmclock_offset = user_ns.clock - now_ns; 6229 - raw_spin_unlock_irq(&ka->pvclock_gtod_sync_lock); 6230 - 6231 - kvm_make_all_cpus_request(kvm, KVM_REQ_CLOCK_UPDATE); 5958 + case KVM_SET_CLOCK: 5959 + r = kvm_vm_ioctl_set_clock(kvm, argp); 6232 5960 break; 6233 - } 6234 - case KVM_GET_CLOCK: { 6235 - struct kvm_clock_data user_ns; 6236 - u64 now_ns; 6237 - 6238 - now_ns = get_kvmclock_ns(kvm); 6239 - user_ns.clock = now_ns; 6240 - user_ns.flags = kvm->arch.use_master_clock ? KVM_CLOCK_TSC_STABLE : 0; 6241 - memset(&user_ns.pad, 0, sizeof(user_ns.pad)); 6242 - 6243 - r = -EFAULT; 6244 - if (copy_to_user(argp, &user_ns, sizeof(user_ns))) 6245 - goto out; 6246 - r = 0; 5961 + case KVM_GET_CLOCK: 5962 + r = kvm_vm_ioctl_get_clock(kvm, argp); 6247 5963 break; 6248 - } 6249 5964 case KVM_MEMORY_ENCRYPT_OP: { 6250 5965 r = -ENOTTY; 6251 5966 if (kvm_x86_ops.mem_enc_op) ··· 7564 7375 } 7565 7376 EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt); 7566 7377 7567 - static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu) 7378 + static void prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data, 7379 + u8 ndata, u8 *insn_bytes, u8 insn_size) 7568 7380 { 7569 - struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; 7570 - u32 insn_size = ctxt->fetch.end - ctxt->fetch.data; 7571 7381 struct kvm_run *run = vcpu->run; 7382 + u64 info[5]; 7383 + u8 info_start; 7384 + 7385 + /* 7386 + * Zero the whole array used to retrieve the exit info, as casting to 7387 + * u32 for select entries will leave some chunks uninitialized. 
7388 + */ 7389 + memset(&info, 0, sizeof(info)); 7390 + 7391 + static_call(kvm_x86_get_exit_info)(vcpu, (u32 *)&info[0], &info[1], 7392 + &info[2], (u32 *)&info[3], 7393 + (u32 *)&info[4]); 7572 7394 7573 7395 run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 7574 7396 run->emulation_failure.suberror = KVM_INTERNAL_ERROR_EMULATION; 7575 - run->emulation_failure.ndata = 0; 7397 + 7398 + /* 7399 + * There's currently space for 13 entries, but 5 are used for the exit 7400 + * reason and info. Restrict to 4 to reduce the maintenance burden 7401 + * when expanding kvm_run.emulation_failure in the future. 7402 + */ 7403 + if (WARN_ON_ONCE(ndata > 4)) 7404 + ndata = 4; 7405 + 7406 + /* Always include the flags as a 'data' entry. */ 7407 + info_start = 1; 7576 7408 run->emulation_failure.flags = 0; 7577 7409 7578 7410 if (insn_size) { 7579 - run->emulation_failure.ndata = 3; 7411 + BUILD_BUG_ON((sizeof(run->emulation_failure.insn_size) + 7412 + sizeof(run->emulation_failure.insn_bytes) != 16)); 7413 + info_start += 2; 7580 7414 run->emulation_failure.flags |= 7581 7415 KVM_INTERNAL_ERROR_EMULATION_FLAG_INSTRUCTION_BYTES; 7582 7416 run->emulation_failure.insn_size = insn_size; 7583 7417 memset(run->emulation_failure.insn_bytes, 0x90, 7584 7418 sizeof(run->emulation_failure.insn_bytes)); 7585 - memcpy(run->emulation_failure.insn_bytes, 7586 - ctxt->fetch.data, insn_size); 7419 + memcpy(run->emulation_failure.insn_bytes, insn_bytes, insn_size); 7587 7420 } 7421 + 7422 + memcpy(&run->internal.data[info_start], info, sizeof(info)); 7423 + memcpy(&run->internal.data[info_start + ARRAY_SIZE(info)], data, 7424 + ndata * sizeof(data[0])); 7425 + 7426 + run->emulation_failure.ndata = info_start + ARRAY_SIZE(info) + ndata; 7588 7427 } 7428 + 7429 + static void prepare_emulation_ctxt_failure_exit(struct kvm_vcpu *vcpu) 7430 + { 7431 + struct x86_emulate_ctxt *ctxt = vcpu->arch.emulate_ctxt; 7432 + 7433 + prepare_emulation_failure_exit(vcpu, NULL, 0, ctxt->fetch.data, 7434 + ctxt->fetch.end 
- ctxt->fetch.data); 7435 + } 7436 + 7437 + void __kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu, u64 *data, 7438 + u8 ndata) 7439 + { 7440 + prepare_emulation_failure_exit(vcpu, data, ndata, NULL, 0); 7441 + } 7442 + EXPORT_SYMBOL_GPL(__kvm_prepare_emulation_failure_exit); 7443 + 7444 + void kvm_prepare_emulation_failure_exit(struct kvm_vcpu *vcpu) 7445 + { 7446 + __kvm_prepare_emulation_failure_exit(vcpu, NULL, 0); 7447 + } 7448 + EXPORT_SYMBOL_GPL(kvm_prepare_emulation_failure_exit); 7589 7449 7590 7450 static int handle_emulation_failure(struct kvm_vcpu *vcpu, int emulation_type) 7591 7451 { ··· 7650 7412 7651 7413 if (kvm->arch.exit_on_emulation_error || 7652 7414 (emulation_type & EMULTYPE_SKIP)) { 7653 - prepare_emulation_failure_exit(vcpu); 7415 + prepare_emulation_ctxt_failure_exit(vcpu); 7654 7416 return 0; 7655 7417 } 7656 7418 7657 7419 kvm_queue_exception(vcpu, UD_VECTOR); 7658 7420 7659 7421 if (!is_guest_mode(vcpu) && static_call(kvm_x86_get_cpl)(vcpu) == 0) { 7660 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 7661 - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 7662 - vcpu->run->internal.ndata = 0; 7422 + prepare_emulation_ctxt_failure_exit(vcpu); 7663 7423 return 0; 7664 7424 } 7665 7425 ··· 8257 8021 static void kvm_hyperv_tsc_notifier(void) 8258 8022 { 8259 8023 struct kvm *kvm; 8260 - struct kvm_vcpu *vcpu; 8261 8024 int cpu; 8262 - unsigned long flags; 8263 8025 8264 8026 mutex_lock(&kvm_lock); 8265 8027 list_for_each_entry(kvm, &vm_list, vm_list) 8266 8028 kvm_make_mclock_inprogress_request(kvm); 8267 8029 8030 + /* no guest entries from this point */ 8268 8031 hyperv_stop_tsc_emulation(); 8269 8032 8270 8033 /* TSC frequency always matches when on Hyper-V */ ··· 8272 8037 kvm_max_guest_tsc_khz = tsc_khz; 8273 8038 8274 8039 list_for_each_entry(kvm, &vm_list, vm_list) { 8275 - struct kvm_arch *ka = &kvm->arch; 8276 - 8277 - raw_spin_lock_irqsave(&ka->pvclock_gtod_sync_lock, flags); 8040 + 
__kvm_start_pvclock_update(kvm); 8278 8041 pvclock_update_vm_gtod_copy(kvm); 8279 - raw_spin_unlock_irqrestore(&ka->pvclock_gtod_sync_lock, flags); 8280 - 8281 - kvm_for_each_vcpu(cpu, vcpu, kvm) 8282 - kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); 8283 - 8284 - kvm_for_each_vcpu(cpu, vcpu, kvm) 8285 - kvm_clear_request(KVM_REQ_MCLOCK_INPROGRESS, vcpu); 8042 + kvm_end_pvclock_update(kvm); 8286 8043 } 8044 + 8287 8045 mutex_unlock(&kvm_lock); 8288 8046 } 8289 8047 #endif ··· 8517 8289 int r; 8518 8290 8519 8291 if (kvm_x86_ops.hardware_enable) { 8520 - printk(KERN_ERR "kvm: already loaded the other module\n"); 8292 + pr_err("kvm: already loaded vendor module '%s'\n", kvm_x86_ops.name); 8521 8293 r = -EEXIST; 8522 8294 goto out; 8523 8295 } 8524 8296 8525 8297 if (!ops->cpu_has_kvm_support()) { 8526 - pr_err_ratelimited("kvm: no hardware support\n"); 8298 + pr_err_ratelimited("kvm: no hardware support for '%s'\n", 8299 + ops->runtime_ops->name); 8527 8300 r = -EOPNOTSUPP; 8528 8301 goto out; 8529 8302 } 8530 8303 if (ops->disabled_by_bios()) { 8531 - pr_err_ratelimited("kvm: disabled by bios\n"); 8304 + pr_err_ratelimited("kvm: support for '%s' disabled by bios\n", 8305 + ops->runtime_ops->name); 8532 8306 r = -EOPNOTSUPP; 8533 8307 goto out; 8534 8308 } ··· 8715 8485 8716 8486 static void kvm_apicv_init(struct kvm *kvm) 8717 8487 { 8718 - mutex_init(&kvm->arch.apicv_update_lock); 8488 + init_rwsem(&kvm->arch.apicv_update_lock); 8719 8489 8720 8490 if (enable_apicv) 8721 8491 clear_bit(APICV_INHIBIT_REASON_DISABLE, ··· 9370 9140 void kvm_make_scan_ioapic_request_mask(struct kvm *kvm, 9371 9141 unsigned long *vcpu_bitmap) 9372 9142 { 9373 - cpumask_var_t cpus; 9374 - 9375 - zalloc_cpumask_var(&cpus, GFP_ATOMIC); 9376 - 9377 - kvm_make_vcpus_request_mask(kvm, KVM_REQ_SCAN_IOAPIC, 9378 - NULL, vcpu_bitmap, cpus); 9379 - 9380 - free_cpumask_var(cpus); 9143 + kvm_make_vcpus_request_mask(kvm, KVM_REQ_SCAN_IOAPIC, vcpu_bitmap); 9381 9144 } 9382 9145 9383 9146 void 
kvm_make_scan_ioapic_request(struct kvm *kvm) ··· 9385 9162 if (!lapic_in_kernel(vcpu)) 9386 9163 return; 9387 9164 9388 - mutex_lock(&vcpu->kvm->arch.apicv_update_lock); 9165 + down_read(&vcpu->kvm->arch.apicv_update_lock); 9389 9166 9390 9167 activate = kvm_apicv_activated(vcpu->kvm); 9391 9168 if (vcpu->arch.apicv_active == activate) ··· 9405 9182 kvm_make_request(KVM_REQ_EVENT, vcpu); 9406 9183 9407 9184 out: 9408 - mutex_unlock(&vcpu->kvm->arch.apicv_update_lock); 9185 + up_read(&vcpu->kvm->arch.apicv_update_lock); 9409 9186 } 9410 9187 EXPORT_SYMBOL_GPL(kvm_vcpu_update_apicv); 9411 9188 9412 9189 void __kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit) 9413 9190 { 9414 9191 unsigned long old, new; 9192 + 9193 + lockdep_assert_held_write(&kvm->arch.apicv_update_lock); 9415 9194 9416 9195 if (!kvm_x86_ops.check_apicv_inhibit_reasons || 9417 9196 !static_call(kvm_x86_check_apicv_inhibit_reasons)(bit)) ··· 9428 9203 9429 9204 if (!!old != !!new) { 9430 9205 trace_kvm_apicv_update_request(activate, bit); 9206 + /* 9207 + * Kick all vCPUs before setting apicv_inhibit_reasons to avoid 9208 + * false positives in the sanity check WARN in svm_vcpu_run(). 9209 + * This task will wait for all vCPUs to ack the kick IRQ before 9210 + * updating apicv_inhibit_reasons, and all other vCPUs will 9211 + * block on acquiring apicv_update_lock so that vCPUs can't 9212 + * redo svm_vcpu_run() without seeing the new inhibit state. 9213 + * 9214 + * Note, holding apicv_update_lock and taking it in the read 9215 + * side (handling the request) also prevents other vCPUs from 9216 + * servicing the request with a stale apicv_inhibit_reasons. 
9217 + */ 9431 9218 kvm_make_all_cpus_request(kvm, KVM_REQ_APICV_UPDATE); 9432 9219 kvm->arch.apicv_inhibit_reasons = new; 9433 9220 if (new) { ··· 9453 9216 9454 9217 void kvm_request_apicv_update(struct kvm *kvm, bool activate, ulong bit) 9455 9218 { 9456 - mutex_lock(&kvm->arch.apicv_update_lock); 9219 + down_write(&kvm->arch.apicv_update_lock); 9457 9220 __kvm_request_apicv_update(kvm, activate, bit); 9458 - mutex_unlock(&kvm->arch.apicv_update_lock); 9221 + up_write(&kvm->arch.apicv_update_lock); 9459 9222 } 9460 9223 EXPORT_SYMBOL_GPL(kvm_request_apicv_update); 9461 9224 ··· 9567 9330 if (kvm_check_request(KVM_REQ_MIGRATE_TIMER, vcpu)) 9568 9331 __kvm_migrate_timers(vcpu); 9569 9332 if (kvm_check_request(KVM_REQ_MASTERCLOCK_UPDATE, vcpu)) 9570 - kvm_gen_update_masterclock(vcpu->kvm); 9333 + kvm_update_masterclock(vcpu->kvm); 9571 9334 if (kvm_check_request(KVM_REQ_GLOBAL_CLOCK_UPDATE, vcpu)) 9572 9335 kvm_gen_kvmclock_update(vcpu); 9573 9336 if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, vcpu)) { ··· 9774 9537 } 9775 9538 9776 9539 for (;;) { 9540 + /* 9541 + * Assert that vCPU vs. VM APICv state is consistent. An APICv 9542 + * update must kick and wait for all vCPUs before toggling the 9543 + * per-VM state, and responding vCPUs must wait for the update 9544 + * to complete before servicing KVM_REQ_APICV_UPDATE.
9545 + */ 9546 + WARN_ON_ONCE(kvm_apicv_activated(vcpu->kvm) != kvm_vcpu_apicv_active(vcpu)); 9547 + 9777 9548 exit_fastpath = static_call(kvm_x86_run)(vcpu); 9778 9549 if (likely(exit_fastpath != EXIT_FASTPATH_REENTER_GUEST)) 9779 9550 break; ··· 10730 10485 return 0; 10731 10486 } 10732 10487 10733 - static void fx_init(struct kvm_vcpu *vcpu) 10734 - { 10735 - /* 10736 - * Ensure guest xcr0 is valid for loading 10737 - */ 10738 - vcpu->arch.xcr0 = XFEATURE_MASK_FP; 10739 - 10740 - vcpu->arch.cr0 |= X86_CR0_ET; 10741 - } 10742 - 10743 10488 int kvm_arch_vcpu_precreate(struct kvm *kvm, unsigned int id) 10744 10489 { 10745 10490 if (kvm_check_tsc_unstable() && atomic_read(&kvm->online_vcpus) != 0) ··· 10790 10555 pr_err("kvm: failed to allocate vcpu's fpu\n"); 10791 10556 goto free_emulate_ctxt; 10792 10557 } 10793 - 10794 - fx_init(vcpu); 10795 10558 10796 10559 vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu); 10797 10560 vcpu->arch.reserved_gpa_bits = kvm_vcpu_reserved_gpa_bits_raw(vcpu); ··· 10887 10654 10888 10655 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) 10889 10656 { 10657 + struct kvm_cpuid_entry2 *cpuid_0x1; 10890 10658 unsigned long old_cr0 = kvm_read_cr0(vcpu); 10891 10659 unsigned long new_cr0; 10892 - u32 eax, dummy; 10660 + 10661 + /* 10662 + * Several of the "set" flows, e.g. ->set_cr0(), read other registers 10663 + * to handle side effects. RESET emulation hits those flows and relies 10664 + * on emulated/virtualized registers, including those that are loaded 10665 + * into hardware, to be zeroed at vCPU creation. Use CRs as a sentinel 10666 + * to detect improper or missing initialization. 10667 + */ 10668 + WARN_ON_ONCE(!init_event && 10669 + (old_cr0 || kvm_read_cr3(vcpu) || kvm_read_cr4(vcpu))); 10893 10670 10894 10671 kvm_lapic_reset(vcpu, init_event); 10895 10672 ··· 10958 10715 vcpu->arch.xcr0 = XFEATURE_MASK_FP; 10959 10716 } 10960 10717 10718 + /* All GPRs except RDX (handled below) are zeroed on RESET/INIT. 
*/ 10961 10719 memset(vcpu->arch.regs, 0, sizeof(vcpu->arch.regs)); 10962 - vcpu->arch.regs_avail = ~0; 10963 - vcpu->arch.regs_dirty = ~0; 10720 + kvm_register_mark_dirty(vcpu, VCPU_REGS_RSP); 10964 10721 10965 10722 /* 10966 10723 * Fall back to KVM's default Family/Model/Stepping of 0x600 (P6/Athlon) 10967 10724 * if no CPUID match is found. Note, it's impossible to get a match at 10968 10725 * RESET since KVM emulates RESET before exposing the vCPU to userspace, 10969 - * i.e. it'simpossible for kvm_cpuid() to find a valid entry on RESET. 10970 - * But, go through the motions in case that's ever remedied. 10726 + * i.e. it's impossible for kvm_find_cpuid_entry() to find a valid entry 10727 + * on RESET. But, go through the motions in case that's ever remedied. 10971 10728 */ 10972 - eax = 1; 10973 - if (!kvm_cpuid(vcpu, &eax, &dummy, &dummy, &dummy, true)) 10974 - eax = 0x600; 10975 - kvm_rdx_write(vcpu, eax); 10729 + cpuid_0x1 = kvm_find_cpuid_entry(vcpu, 1, 0); 10730 + kvm_rdx_write(vcpu, cpuid_0x1 ? 
cpuid_0x1->eax : 0x600); 10976 10731 10977 10732 vcpu->arch.ia32_xss = 0; 10978 10733 ··· 11222 10981 void kvm_arch_free_vm(struct kvm *kvm) 11223 10982 { 11224 10983 kfree(to_kvm_hv(kvm)->hv_pa_pg); 11225 - vfree(kvm); 10984 + __kvm_arch_free_vm(kvm); 11226 10985 } 11227 10986 11228 10987 11229 10988 int kvm_arch_init_vm(struct kvm *kvm, unsigned long type) 11230 10989 { 11231 10990 int ret; 10991 + unsigned long flags; 11232 10992 11233 10993 if (type) 11234 10994 return -EINVAL; ··· 11253 11011 11254 11012 raw_spin_lock_init(&kvm->arch.tsc_write_lock); 11255 11013 mutex_init(&kvm->arch.apic_map_lock); 11256 - raw_spin_lock_init(&kvm->arch.pvclock_gtod_sync_lock); 11257 - 11014 + seqcount_raw_spinlock_init(&kvm->arch.pvclock_sc, &kvm->arch.tsc_write_lock); 11258 11015 kvm->arch.kvmclock_offset = -get_kvmclock_base_ns(); 11016 + 11017 + raw_spin_lock_irqsave(&kvm->arch.tsc_write_lock, flags); 11259 11018 pvclock_update_vm_gtod_copy(kvm); 11019 + raw_spin_unlock_irqrestore(&kvm->arch.tsc_write_lock, flags); 11260 11020 11261 11021 kvm->arch.guest_can_read_msr_platform_info = true; 11262 11022 ··· 11455 11211 kvm_page_track_free_memslot(slot); 11456 11212 } 11457 11213 11458 - static int memslot_rmap_alloc(struct kvm_memory_slot *slot, 11459 - unsigned long npages) 11214 + int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages) 11460 11215 { 11461 11216 const int sz = sizeof(*slot->arch.rmap[0]); 11462 11217 int i; ··· 11474 11231 } 11475 11232 } 11476 11233 11477 - return 0; 11478 - } 11479 - 11480 - int alloc_all_memslots_rmaps(struct kvm *kvm) 11481 - { 11482 - struct kvm_memslots *slots; 11483 - struct kvm_memory_slot *slot; 11484 - int r, i; 11485 - 11486 - /* 11487 - * Check if memslots alreday have rmaps early before acquiring 11488 - * the slots_arch_lock below. 
11489 - */ 11490 - if (kvm_memslots_have_rmaps(kvm)) 11491 - return 0; 11492 - 11493 - mutex_lock(&kvm->slots_arch_lock); 11494 - 11495 - /* 11496 - * Read memslots_have_rmaps again, under the slots arch lock, 11497 - * before allocating the rmaps 11498 - */ 11499 - if (kvm_memslots_have_rmaps(kvm)) { 11500 - mutex_unlock(&kvm->slots_arch_lock); 11501 - return 0; 11502 - } 11503 - 11504 - for (i = 0; i < KVM_ADDRESS_SPACE_NUM; i++) { 11505 - slots = __kvm_memslots(kvm, i); 11506 - kvm_for_each_memslot(slot, slots) { 11507 - r = memslot_rmap_alloc(slot, slot->npages); 11508 - if (r) { 11509 - mutex_unlock(&kvm->slots_arch_lock); 11510 - return r; 11511 - } 11512 - } 11513 - } 11514 - 11515 - /* 11516 - * Ensure that memslots_have_rmaps becomes true strictly after 11517 - * all the rmap pointers are set. 11518 - */ 11519 - smp_store_release(&kvm->arch.memslots_have_rmaps, true); 11520 - mutex_unlock(&kvm->slots_arch_lock); 11521 11234 return 0; 11522 11235 } 11523 11236 ··· 11527 11328 } 11528 11329 } 11529 11330 11530 - if (kvm_page_track_create_memslot(slot, npages)) 11331 + if (kvm_page_track_create_memslot(kvm, slot, npages)) 11531 11332 goto out_free; 11532 11333 11533 11334 return 0; ··· 12125 11926 return static_call(kvm_x86_update_pi_irte)(kvm, host_irq, guest_irq, set); 12126 11927 } 12127 11928 11929 + bool kvm_arch_irqfd_route_changed(struct kvm_kernel_irq_routing_entry *old, 11930 + struct kvm_kernel_irq_routing_entry *new) 11931 + { 11932 + if (new->type != KVM_IRQ_ROUTING_MSI) 11933 + return true; 11934 + 11935 + return !!memcmp(&old->msi, &new->msi, sizeof(new->msi)); 11936 + } 11937 + 12128 11938 bool kvm_vector_hashing_enabled(void) 12129 11939 { 12130 11940 return vector_hashing; ··· 12215 12007 * doesn't seem to be a real use-case behind such requests, just return 12216 12008 * KVM_EXIT_INTERNAL_ERROR for now. 
12217 12009 */ 12218 - vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; 12219 - vcpu->run->internal.suberror = KVM_INTERNAL_ERROR_EMULATION; 12220 - vcpu->run->internal.ndata = 0; 12010 + kvm_prepare_emulation_failure_exit(vcpu); 12221 12011 12222 12012 return 0; 12223 12013 }
-2
arch/x86/kvm/x86.h
··· 343 343 344 344 extern int pi_inject_timer; 345 345 346 - extern struct static_key kvm_no_apic_vcpu; 347 - 348 346 extern bool report_ignored_msrs; 349 347 350 348 static inline u64 nsec_to_cycles(struct kvm_vcpu *vcpu, u64 nsec)
+9
drivers/clocksource/timer-riscv.c
··· 13 13 #include <linux/delay.h> 14 14 #include <linux/irq.h> 15 15 #include <linux/irqdomain.h> 16 + #include <linux/module.h> 16 17 #include <linux/sched_clock.h> 17 18 #include <linux/io-64-nonatomic-lo-hi.h> 18 19 #include <linux/interrupt.h> 19 20 #include <linux/of_irq.h> 21 + #include <clocksource/timer-riscv.h> 20 22 #include <asm/smp.h> 21 23 #include <asm/sbi.h> 22 24 #include <asm/timex.h> ··· 80 78 disable_percpu_irq(riscv_clock_event_irq); 81 79 return 0; 82 80 } 81 + 82 + void riscv_cs_get_mult_shift(u32 *mult, u32 *shift) 83 + { 84 + *mult = riscv_clocksource.mult; 85 + *shift = riscv_clocksource.shift; 86 + } 87 + EXPORT_SYMBOL_GPL(riscv_cs_get_mult_shift); 83 88 84 89 /* called directly from the low-level interrupt handler */ 85 90 static irqreturn_t riscv_timer_interrupt(int irq, void *dev_id)
+1
drivers/gpu/drm/i915/Kconfig
··· 126 126 depends on DRM_I915_GVT 127 127 depends on KVM 128 128 depends on VFIO_MDEV 129 + select KVM_EXTERNAL_WRITE_TRACKING 129 130 default n 130 131 help 131 132 Choose this option if you want to enable KVMGT support for
+16
include/clocksource/timer-riscv.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Copyright (C) 2019 Western Digital Corporation or its affiliates. 4 + * 5 + * Authors: 6 + * Atish Patra <atish.patra@wdc.com> 7 + */ 8 + 9 + #ifndef __TIMER_RISCV_H 10 + #define __TIMER_RISCV_H 11 + 12 + #include <linux/types.h> 13 + 14 + extern void riscv_cs_get_mult_shift(u32 *mult, u32 *shift); 15 + 16 + #endif
+13 -5
include/linux/kvm_host.h
··· 39 39 #include <asm/kvm_host.h> 40 40 #include <linux/kvm_dirty_ring.h> 41 41 42 - #ifndef KVM_MAX_VCPU_ID 43 - #define KVM_MAX_VCPU_ID KVM_MAX_VCPUS 42 + #ifndef KVM_MAX_VCPU_IDS 43 + #define KVM_MAX_VCPU_IDS KVM_MAX_VCPUS 44 44 #endif 45 45 46 46 /* ··· 160 160 #define KVM_ARCH_REQ(nr) KVM_ARCH_REQ_FLAGS(nr, 0) 161 161 162 162 bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req, 163 - struct kvm_vcpu *except, 164 - unsigned long *vcpu_bitmap, cpumask_var_t tmp); 163 + unsigned long *vcpu_bitmap); 165 164 bool kvm_make_all_cpus_request(struct kvm *kvm, unsigned int req); 166 165 bool kvm_make_all_cpus_request_except(struct kvm *kvm, unsigned int req, 167 166 struct kvm_vcpu *except); ··· 1081 1082 { 1082 1083 return kzalloc(sizeof(struct kvm), GFP_KERNEL); 1083 1084 } 1085 + #endif 1084 1086 1087 + static inline void __kvm_arch_free_vm(struct kvm *kvm) 1088 + { 1089 + kvfree(kvm); 1090 + } 1091 + 1092 + #ifndef __KVM_HAVE_ARCH_VM_FREE 1085 1093 static inline void kvm_arch_free_vm(struct kvm *kvm) 1086 1094 { 1087 - kfree(kvm); 1095 + __kvm_arch_free_vm(kvm); 1088 1096 } 1089 1097 #endif 1090 1098 ··· 1771 1765 void kvm_arch_irq_bypass_start(struct irq_bypass_consumer *); 1772 1766 int kvm_arch_update_irqfd_routing(struct kvm *kvm, unsigned int host_irq, 1773 1767 uint32_t guest_irq, bool set); 1768 + bool kvm_arch_irqfd_route_changed(struct kvm_kernel_irq_routing_entry *, 1769 + struct kvm_kernel_irq_routing_entry *); 1774 1770 #endif /* CONFIG_HAVE_KVM_IRQ_BYPASS */ 1775 1771 1776 1772 #ifdef CONFIG_HAVE_KVM_INVALID_WAKEUPS
+26 -3
include/uapi/linux/kvm.h
··· 269 269 #define KVM_EXIT_AP_RESET_HOLD 32 270 270 #define KVM_EXIT_X86_BUS_LOCK 33 271 271 #define KVM_EXIT_XEN 34 272 + #define KVM_EXIT_RISCV_SBI 35 272 273 273 274 /* For KVM_EXIT_INTERNAL_ERROR */ 274 275 /* Emulate instruction failed. */ ··· 398 397 * "ndata" is correct, that new fields are enumerated in "flags", 399 398 * and that each flag enumerates fields that are 64-bit aligned 400 399 * and sized (so that ndata+internal.data[] is valid/accurate). 400 + * 401 + * Space beyond the defined fields may be used to store arbitrary 402 + * debug information relating to the emulation failure. It is 403 + * accounted for in "ndata" but the format is unspecified and is 404 + * not represented in "flags". Any such information is *not* ABI! 401 405 */ 402 406 struct { 403 407 __u32 suberror; 404 408 __u32 ndata; 405 409 __u64 flags; 406 - __u8 insn_size; 407 - __u8 insn_bytes[15]; 410 + union { 411 + struct { 412 + __u8 insn_size; 413 + __u8 insn_bytes[15]; 414 + }; 415 + }; 416 + /* Arbitrary debug data may follow. */ 408 417 } emulation_failure; 409 418 /* KVM_EXIT_OSI */ 410 419 struct { ··· 480 469 } msr; 481 470 /* KVM_EXIT_XEN */ 482 471 struct kvm_xen_exit xen; 472 + /* KVM_EXIT_RISCV_SBI */ 473 + struct { 474 + unsigned long extension_id; 475 + unsigned long function_id; 476 + unsigned long args[6]; 477 + unsigned long ret[2]; 478 + } riscv_sbi; 483 479 /* Fix the size of the union. */ 484 480 char padding[256]; 485 481 }; ··· 1241 1223 1242 1224 /* Do not use 1, KVM_CHECK_EXTENSION returned it before we had flags. */ 1243 1225 #define KVM_CLOCK_TSC_STABLE 2 1226 + #define KVM_CLOCK_REALTIME (1 << 2) 1227 + #define KVM_CLOCK_HOST_TSC (1 << 3) 1244 1228 1245 1229 struct kvm_clock_data { 1246 1230 __u64 clock; 1247 1231 __u32 flags; 1248 - __u32 pad[9]; 1232 + __u32 pad0; 1233 + __u64 realtime; 1234 + __u64 host_tsc; 1235 + __u32 pad[4]; 1249 1236 }; 1250 1237 1251 1238 /* For KVM_CAP_SW_TLB */
+1296
tools/arch/arm64/include/asm/sysreg.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + /* 3 + * Macros for accessing system registers with older binutils. 4 + * 5 + * Copyright (C) 2014 ARM Ltd. 6 + * Author: Catalin Marinas <catalin.marinas@arm.com> 7 + */ 8 + 9 + #ifndef __ASM_SYSREG_H 10 + #define __ASM_SYSREG_H 11 + 12 + #include <linux/bits.h> 13 + #include <linux/stringify.h> 14 + 15 + /* 16 + * ARMv8 ARM reserves the following encoding for system registers: 17 + * (Ref: ARMv8 ARM, Section: "System instruction class encoding overview", 18 + * C5.2, version:ARM DDI 0487A.f) 19 + * [20-19] : Op0 20 + * [18-16] : Op1 21 + * [15-12] : CRn 22 + * [11-8] : CRm 23 + * [7-5] : Op2 24 + */ 25 + #define Op0_shift 19 26 + #define Op0_mask 0x3 27 + #define Op1_shift 16 28 + #define Op1_mask 0x7 29 + #define CRn_shift 12 30 + #define CRn_mask 0xf 31 + #define CRm_shift 8 32 + #define CRm_mask 0xf 33 + #define Op2_shift 5 34 + #define Op2_mask 0x7 35 + 36 + #define sys_reg(op0, op1, crn, crm, op2) \ 37 + (((op0) << Op0_shift) | ((op1) << Op1_shift) | \ 38 + ((crn) << CRn_shift) | ((crm) << CRm_shift) | \ 39 + ((op2) << Op2_shift)) 40 + 41 + #define sys_insn sys_reg 42 + 43 + #define sys_reg_Op0(id) (((id) >> Op0_shift) & Op0_mask) 44 + #define sys_reg_Op1(id) (((id) >> Op1_shift) & Op1_mask) 45 + #define sys_reg_CRn(id) (((id) >> CRn_shift) & CRn_mask) 46 + #define sys_reg_CRm(id) (((id) >> CRm_shift) & CRm_mask) 47 + #define sys_reg_Op2(id) (((id) >> Op2_shift) & Op2_mask) 48 + 49 + #ifndef CONFIG_BROKEN_GAS_INST 50 + 51 + #ifdef __ASSEMBLY__ 52 + // The space separator is omitted so that __emit_inst(x) can be parsed as 53 + // either an assembler directive or an assembler macro argument. 
54 + #define __emit_inst(x) .inst(x) 55 + #else 56 + #define __emit_inst(x) ".inst " __stringify((x)) "\n\t" 57 + #endif 58 + 59 + #else /* CONFIG_BROKEN_GAS_INST */ 60 + 61 + #ifndef CONFIG_CPU_BIG_ENDIAN 62 + #define __INSTR_BSWAP(x) (x) 63 + #else /* CONFIG_CPU_BIG_ENDIAN */ 64 + #define __INSTR_BSWAP(x) ((((x) << 24) & 0xff000000) | \ 65 + (((x) << 8) & 0x00ff0000) | \ 66 + (((x) >> 8) & 0x0000ff00) | \ 67 + (((x) >> 24) & 0x000000ff)) 68 + #endif /* CONFIG_CPU_BIG_ENDIAN */ 69 + 70 + #ifdef __ASSEMBLY__ 71 + #define __emit_inst(x) .long __INSTR_BSWAP(x) 72 + #else /* __ASSEMBLY__ */ 73 + #define __emit_inst(x) ".long " __stringify(__INSTR_BSWAP(x)) "\n\t" 74 + #endif /* __ASSEMBLY__ */ 75 + 76 + #endif /* CONFIG_BROKEN_GAS_INST */ 77 + 78 + /* 79 + * Instructions for modifying PSTATE fields. 80 + * As per Arm ARM for v8-A, Section "C.5.1.3 op0 == 0b00, architectural hints, 81 + * barriers and CLREX, and PSTATE access", ARM DDI 0487 C.a, system instructions 82 + * for accessing PSTATE fields have the following encoding: 83 + * Op0 = 0, CRn = 4 84 + * Op1, Op2 encodes the PSTATE field modified and defines the constraints. 85 + * CRm = Imm4 for the instruction. 
86 + * Rt = 0x1f 87 + */ 88 + #define pstate_field(op1, op2) ((op1) << Op1_shift | (op2) << Op2_shift) 89 + #define PSTATE_Imm_shift CRm_shift 90 + 91 + #define PSTATE_PAN pstate_field(0, 4) 92 + #define PSTATE_UAO pstate_field(0, 3) 93 + #define PSTATE_SSBS pstate_field(3, 1) 94 + #define PSTATE_TCO pstate_field(3, 4) 95 + 96 + #define SET_PSTATE_PAN(x) __emit_inst(0xd500401f | PSTATE_PAN | ((!!x) << PSTATE_Imm_shift)) 97 + #define SET_PSTATE_UAO(x) __emit_inst(0xd500401f | PSTATE_UAO | ((!!x) << PSTATE_Imm_shift)) 98 + #define SET_PSTATE_SSBS(x) __emit_inst(0xd500401f | PSTATE_SSBS | ((!!x) << PSTATE_Imm_shift)) 99 + #define SET_PSTATE_TCO(x) __emit_inst(0xd500401f | PSTATE_TCO | ((!!x) << PSTATE_Imm_shift)) 100 + 101 + #define set_pstate_pan(x) asm volatile(SET_PSTATE_PAN(x)) 102 + #define set_pstate_uao(x) asm volatile(SET_PSTATE_UAO(x)) 103 + #define set_pstate_ssbs(x) asm volatile(SET_PSTATE_SSBS(x)) 104 + 105 + #define __SYS_BARRIER_INSN(CRm, op2, Rt) \ 106 + __emit_inst(0xd5000000 | sys_insn(0, 3, 3, (CRm), (op2)) | ((Rt) & 0x1f)) 107 + 108 + #define SB_BARRIER_INSN __SYS_BARRIER_INSN(0, 7, 31) 109 + 110 + #define SYS_DC_ISW sys_insn(1, 0, 7, 6, 2) 111 + #define SYS_DC_CSW sys_insn(1, 0, 7, 10, 2) 112 + #define SYS_DC_CISW sys_insn(1, 0, 7, 14, 2) 113 + 114 + /* 115 + * System registers, organised loosely by encoding but grouped together 116 + * where the architected name contains an index. e.g. ID_MMFR<n>_EL1. 
117 + */ 118 + #define SYS_OSDTRRX_EL1 sys_reg(2, 0, 0, 0, 2) 119 + #define SYS_MDCCINT_EL1 sys_reg(2, 0, 0, 2, 0) 120 + #define SYS_MDSCR_EL1 sys_reg(2, 0, 0, 2, 2) 121 + #define SYS_OSDTRTX_EL1 sys_reg(2, 0, 0, 3, 2) 122 + #define SYS_OSECCR_EL1 sys_reg(2, 0, 0, 6, 2) 123 + #define SYS_DBGBVRn_EL1(n) sys_reg(2, 0, 0, n, 4) 124 + #define SYS_DBGBCRn_EL1(n) sys_reg(2, 0, 0, n, 5) 125 + #define SYS_DBGWVRn_EL1(n) sys_reg(2, 0, 0, n, 6) 126 + #define SYS_DBGWCRn_EL1(n) sys_reg(2, 0, 0, n, 7) 127 + #define SYS_MDRAR_EL1 sys_reg(2, 0, 1, 0, 0) 128 + #define SYS_OSLAR_EL1 sys_reg(2, 0, 1, 0, 4) 129 + #define SYS_OSLSR_EL1 sys_reg(2, 0, 1, 1, 4) 130 + #define SYS_OSDLR_EL1 sys_reg(2, 0, 1, 3, 4) 131 + #define SYS_DBGPRCR_EL1 sys_reg(2, 0, 1, 4, 4) 132 + #define SYS_DBGCLAIMSET_EL1 sys_reg(2, 0, 7, 8, 6) 133 + #define SYS_DBGCLAIMCLR_EL1 sys_reg(2, 0, 7, 9, 6) 134 + #define SYS_DBGAUTHSTATUS_EL1 sys_reg(2, 0, 7, 14, 6) 135 + #define SYS_MDCCSR_EL0 sys_reg(2, 3, 0, 1, 0) 136 + #define SYS_DBGDTR_EL0 sys_reg(2, 3, 0, 4, 0) 137 + #define SYS_DBGDTRRX_EL0 sys_reg(2, 3, 0, 5, 0) 138 + #define SYS_DBGDTRTX_EL0 sys_reg(2, 3, 0, 5, 0) 139 + #define SYS_DBGVCR32_EL2 sys_reg(2, 4, 0, 7, 0) 140 + 141 + #define SYS_MIDR_EL1 sys_reg(3, 0, 0, 0, 0) 142 + #define SYS_MPIDR_EL1 sys_reg(3, 0, 0, 0, 5) 143 + #define SYS_REVIDR_EL1 sys_reg(3, 0, 0, 0, 6) 144 + 145 + #define SYS_ID_PFR0_EL1 sys_reg(3, 0, 0, 1, 0) 146 + #define SYS_ID_PFR1_EL1 sys_reg(3, 0, 0, 1, 1) 147 + #define SYS_ID_PFR2_EL1 sys_reg(3, 0, 0, 3, 4) 148 + #define SYS_ID_DFR0_EL1 sys_reg(3, 0, 0, 1, 2) 149 + #define SYS_ID_DFR1_EL1 sys_reg(3, 0, 0, 3, 5) 150 + #define SYS_ID_AFR0_EL1 sys_reg(3, 0, 0, 1, 3) 151 + #define SYS_ID_MMFR0_EL1 sys_reg(3, 0, 0, 1, 4) 152 + #define SYS_ID_MMFR1_EL1 sys_reg(3, 0, 0, 1, 5) 153 + #define SYS_ID_MMFR2_EL1 sys_reg(3, 0, 0, 1, 6) 154 + #define SYS_ID_MMFR3_EL1 sys_reg(3, 0, 0, 1, 7) 155 + #define SYS_ID_MMFR4_EL1 sys_reg(3, 0, 0, 2, 6) 156 + #define SYS_ID_MMFR5_EL1 sys_reg(3, 0, 0, 3, 6) 
157 + 158 + #define SYS_ID_ISAR0_EL1 sys_reg(3, 0, 0, 2, 0) 159 + #define SYS_ID_ISAR1_EL1 sys_reg(3, 0, 0, 2, 1) 160 + #define SYS_ID_ISAR2_EL1 sys_reg(3, 0, 0, 2, 2) 161 + #define SYS_ID_ISAR3_EL1 sys_reg(3, 0, 0, 2, 3) 162 + #define SYS_ID_ISAR4_EL1 sys_reg(3, 0, 0, 2, 4) 163 + #define SYS_ID_ISAR5_EL1 sys_reg(3, 0, 0, 2, 5) 164 + #define SYS_ID_ISAR6_EL1 sys_reg(3, 0, 0, 2, 7) 165 + 166 + #define SYS_MVFR0_EL1 sys_reg(3, 0, 0, 3, 0) 167 + #define SYS_MVFR1_EL1 sys_reg(3, 0, 0, 3, 1) 168 + #define SYS_MVFR2_EL1 sys_reg(3, 0, 0, 3, 2) 169 + 170 + #define SYS_ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 4, 0) 171 + #define SYS_ID_AA64PFR1_EL1 sys_reg(3, 0, 0, 4, 1) 172 + #define SYS_ID_AA64ZFR0_EL1 sys_reg(3, 0, 0, 4, 4) 173 + 174 + #define SYS_ID_AA64DFR0_EL1 sys_reg(3, 0, 0, 5, 0) 175 + #define SYS_ID_AA64DFR1_EL1 sys_reg(3, 0, 0, 5, 1) 176 + 177 + #define SYS_ID_AA64AFR0_EL1 sys_reg(3, 0, 0, 5, 4) 178 + #define SYS_ID_AA64AFR1_EL1 sys_reg(3, 0, 0, 5, 5) 179 + 180 + #define SYS_ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0) 181 + #define SYS_ID_AA64ISAR1_EL1 sys_reg(3, 0, 0, 6, 1) 182 + 183 + #define SYS_ID_AA64MMFR0_EL1 sys_reg(3, 0, 0, 7, 0) 184 + #define SYS_ID_AA64MMFR1_EL1 sys_reg(3, 0, 0, 7, 1) 185 + #define SYS_ID_AA64MMFR2_EL1 sys_reg(3, 0, 0, 7, 2) 186 + 187 + #define SYS_SCTLR_EL1 sys_reg(3, 0, 1, 0, 0) 188 + #define SYS_ACTLR_EL1 sys_reg(3, 0, 1, 0, 1) 189 + #define SYS_CPACR_EL1 sys_reg(3, 0, 1, 0, 2) 190 + #define SYS_RGSR_EL1 sys_reg(3, 0, 1, 0, 5) 191 + #define SYS_GCR_EL1 sys_reg(3, 0, 1, 0, 6) 192 + 193 + #define SYS_ZCR_EL1 sys_reg(3, 0, 1, 2, 0) 194 + #define SYS_TRFCR_EL1 sys_reg(3, 0, 1, 2, 1) 195 + 196 + #define SYS_TTBR0_EL1 sys_reg(3, 0, 2, 0, 0) 197 + #define SYS_TTBR1_EL1 sys_reg(3, 0, 2, 0, 1) 198 + #define SYS_TCR_EL1 sys_reg(3, 0, 2, 0, 2) 199 + 200 + #define SYS_APIAKEYLO_EL1 sys_reg(3, 0, 2, 1, 0) 201 + #define SYS_APIAKEYHI_EL1 sys_reg(3, 0, 2, 1, 1) 202 + #define SYS_APIBKEYLO_EL1 sys_reg(3, 0, 2, 1, 2) 203 + #define SYS_APIBKEYHI_EL1 sys_reg(3, 0, 
2, 1, 3) 204 + 205 + #define SYS_APDAKEYLO_EL1 sys_reg(3, 0, 2, 2, 0) 206 + #define SYS_APDAKEYHI_EL1 sys_reg(3, 0, 2, 2, 1) 207 + #define SYS_APDBKEYLO_EL1 sys_reg(3, 0, 2, 2, 2) 208 + #define SYS_APDBKEYHI_EL1 sys_reg(3, 0, 2, 2, 3) 209 + 210 + #define SYS_APGAKEYLO_EL1 sys_reg(3, 0, 2, 3, 0) 211 + #define SYS_APGAKEYHI_EL1 sys_reg(3, 0, 2, 3, 1) 212 + 213 + #define SYS_SPSR_EL1 sys_reg(3, 0, 4, 0, 0) 214 + #define SYS_ELR_EL1 sys_reg(3, 0, 4, 0, 1) 215 + 216 + #define SYS_ICC_PMR_EL1 sys_reg(3, 0, 4, 6, 0) 217 + 218 + #define SYS_AFSR0_EL1 sys_reg(3, 0, 5, 1, 0) 219 + #define SYS_AFSR1_EL1 sys_reg(3, 0, 5, 1, 1) 220 + #define SYS_ESR_EL1 sys_reg(3, 0, 5, 2, 0) 221 + 222 + #define SYS_ERRIDR_EL1 sys_reg(3, 0, 5, 3, 0) 223 + #define SYS_ERRSELR_EL1 sys_reg(3, 0, 5, 3, 1) 224 + #define SYS_ERXFR_EL1 sys_reg(3, 0, 5, 4, 0) 225 + #define SYS_ERXCTLR_EL1 sys_reg(3, 0, 5, 4, 1) 226 + #define SYS_ERXSTATUS_EL1 sys_reg(3, 0, 5, 4, 2) 227 + #define SYS_ERXADDR_EL1 sys_reg(3, 0, 5, 4, 3) 228 + #define SYS_ERXMISC0_EL1 sys_reg(3, 0, 5, 5, 0) 229 + #define SYS_ERXMISC1_EL1 sys_reg(3, 0, 5, 5, 1) 230 + #define SYS_TFSR_EL1 sys_reg(3, 0, 5, 6, 0) 231 + #define SYS_TFSRE0_EL1 sys_reg(3, 0, 5, 6, 1) 232 + 233 + #define SYS_FAR_EL1 sys_reg(3, 0, 6, 0, 0) 234 + #define SYS_PAR_EL1 sys_reg(3, 0, 7, 4, 0) 235 + 236 + #define SYS_PAR_EL1_F BIT(0) 237 + #define SYS_PAR_EL1_FST GENMASK(6, 1) 238 + 239 + /*** Statistical Profiling Extension ***/ 240 + /* ID registers */ 241 + #define SYS_PMSIDR_EL1 sys_reg(3, 0, 9, 9, 7) 242 + #define SYS_PMSIDR_EL1_FE_SHIFT 0 243 + #define SYS_PMSIDR_EL1_FT_SHIFT 1 244 + #define SYS_PMSIDR_EL1_FL_SHIFT 2 245 + #define SYS_PMSIDR_EL1_ARCHINST_SHIFT 3 246 + #define SYS_PMSIDR_EL1_LDS_SHIFT 4 247 + #define SYS_PMSIDR_EL1_ERND_SHIFT 5 248 + #define SYS_PMSIDR_EL1_INTERVAL_SHIFT 8 249 + #define SYS_PMSIDR_EL1_INTERVAL_MASK 0xfUL 250 + #define SYS_PMSIDR_EL1_MAXSIZE_SHIFT 12 251 + #define SYS_PMSIDR_EL1_MAXSIZE_MASK 0xfUL 252 + #define 
SYS_PMSIDR_EL1_COUNTSIZE_SHIFT 16 253 + #define SYS_PMSIDR_EL1_COUNTSIZE_MASK 0xfUL 254 + 255 + #define SYS_PMBIDR_EL1 sys_reg(3, 0, 9, 10, 7) 256 + #define SYS_PMBIDR_EL1_ALIGN_SHIFT 0 257 + #define SYS_PMBIDR_EL1_ALIGN_MASK 0xfU 258 + #define SYS_PMBIDR_EL1_P_SHIFT 4 259 + #define SYS_PMBIDR_EL1_F_SHIFT 5 260 + 261 + /* Sampling controls */ 262 + #define SYS_PMSCR_EL1 sys_reg(3, 0, 9, 9, 0) 263 + #define SYS_PMSCR_EL1_E0SPE_SHIFT 0 264 + #define SYS_PMSCR_EL1_E1SPE_SHIFT 1 265 + #define SYS_PMSCR_EL1_CX_SHIFT 3 266 + #define SYS_PMSCR_EL1_PA_SHIFT 4 267 + #define SYS_PMSCR_EL1_TS_SHIFT 5 268 + #define SYS_PMSCR_EL1_PCT_SHIFT 6 269 + 270 + #define SYS_PMSCR_EL2 sys_reg(3, 4, 9, 9, 0) 271 + #define SYS_PMSCR_EL2_E0HSPE_SHIFT 0 272 + #define SYS_PMSCR_EL2_E2SPE_SHIFT 1 273 + #define SYS_PMSCR_EL2_CX_SHIFT 3 274 + #define SYS_PMSCR_EL2_PA_SHIFT 4 275 + #define SYS_PMSCR_EL2_TS_SHIFT 5 276 + #define SYS_PMSCR_EL2_PCT_SHIFT 6 277 + 278 + #define SYS_PMSICR_EL1 sys_reg(3, 0, 9, 9, 2) 279 + 280 + #define SYS_PMSIRR_EL1 sys_reg(3, 0, 9, 9, 3) 281 + #define SYS_PMSIRR_EL1_RND_SHIFT 0 282 + #define SYS_PMSIRR_EL1_INTERVAL_SHIFT 8 283 + #define SYS_PMSIRR_EL1_INTERVAL_MASK 0xffffffUL 284 + 285 + /* Filtering controls */ 286 + #define SYS_PMSNEVFR_EL1 sys_reg(3, 0, 9, 9, 1) 287 + 288 + #define SYS_PMSFCR_EL1 sys_reg(3, 0, 9, 9, 4) 289 + #define SYS_PMSFCR_EL1_FE_SHIFT 0 290 + #define SYS_PMSFCR_EL1_FT_SHIFT 1 291 + #define SYS_PMSFCR_EL1_FL_SHIFT 2 292 + #define SYS_PMSFCR_EL1_B_SHIFT 16 293 + #define SYS_PMSFCR_EL1_LD_SHIFT 17 294 + #define SYS_PMSFCR_EL1_ST_SHIFT 18 295 + 296 + #define SYS_PMSEVFR_EL1 sys_reg(3, 0, 9, 9, 5) 297 + #define SYS_PMSEVFR_EL1_RES0_8_2 \ 298 + (GENMASK_ULL(47, 32) | GENMASK_ULL(23, 16) | GENMASK_ULL(11, 8) |\ 299 + BIT_ULL(6) | BIT_ULL(4) | BIT_ULL(2) | BIT_ULL(0)) 300 + #define SYS_PMSEVFR_EL1_RES0_8_3 \ 301 + (SYS_PMSEVFR_EL1_RES0_8_2 & ~(BIT_ULL(18) | BIT_ULL(17) | BIT_ULL(11))) 302 + 303 + #define SYS_PMSLATFR_EL1 sys_reg(3, 0, 9, 9, 6) 304 + 
#define SYS_PMSLATFR_EL1_MINLAT_SHIFT	0

/* Buffer controls */
#define SYS_PMBLIMITR_EL1		sys_reg(3, 0, 9, 10, 0)
#define SYS_PMBLIMITR_EL1_E_SHIFT	0
#define SYS_PMBLIMITR_EL1_FM_SHIFT	1
#define SYS_PMBLIMITR_EL1_FM_MASK	0x3UL
#define SYS_PMBLIMITR_EL1_FM_STOP_IRQ	(0 << SYS_PMBLIMITR_EL1_FM_SHIFT)

#define SYS_PMBPTR_EL1			sys_reg(3, 0, 9, 10, 1)

/* Buffer error reporting */
#define SYS_PMBSR_EL1			sys_reg(3, 0, 9, 10, 3)
#define SYS_PMBSR_EL1_COLL_SHIFT	16
#define SYS_PMBSR_EL1_S_SHIFT		17
#define SYS_PMBSR_EL1_EA_SHIFT		18
#define SYS_PMBSR_EL1_DL_SHIFT		19
#define SYS_PMBSR_EL1_EC_SHIFT		26
#define SYS_PMBSR_EL1_EC_MASK		0x3fUL

#define SYS_PMBSR_EL1_EC_BUF		(0x0UL << SYS_PMBSR_EL1_EC_SHIFT)
#define SYS_PMBSR_EL1_EC_FAULT_S1	(0x24UL << SYS_PMBSR_EL1_EC_SHIFT)
#define SYS_PMBSR_EL1_EC_FAULT_S2	(0x25UL << SYS_PMBSR_EL1_EC_SHIFT)

#define SYS_PMBSR_EL1_FAULT_FSC_SHIFT	0
#define SYS_PMBSR_EL1_FAULT_FSC_MASK	0x3fUL

#define SYS_PMBSR_EL1_BUF_BSC_SHIFT	0
#define SYS_PMBSR_EL1_BUF_BSC_MASK	0x3fUL

#define SYS_PMBSR_EL1_BUF_BSC_FULL	(0x1UL << SYS_PMBSR_EL1_BUF_BSC_SHIFT)

/*** End of Statistical Profiling Extension ***/

/*
 * TRBE Registers
 */
#define SYS_TRBLIMITR_EL1		sys_reg(3, 0, 9, 11, 0)
#define SYS_TRBPTR_EL1			sys_reg(3, 0, 9, 11, 1)
#define SYS_TRBBASER_EL1		sys_reg(3, 0, 9, 11, 2)
#define SYS_TRBSR_EL1			sys_reg(3, 0, 9, 11, 3)
#define SYS_TRBMAR_EL1			sys_reg(3, 0, 9, 11, 4)
#define SYS_TRBTRG_EL1			sys_reg(3, 0, 9, 11, 6)
#define SYS_TRBIDR_EL1			sys_reg(3, 0, 9, 11, 7)

#define TRBLIMITR_LIMIT_MASK		GENMASK_ULL(51, 0)
#define TRBLIMITR_LIMIT_SHIFT		12
#define TRBLIMITR_NVM			BIT(5)
#define TRBLIMITR_TRIG_MODE_MASK	GENMASK(1, 0)
#define TRBLIMITR_TRIG_MODE_SHIFT	3
#define TRBLIMITR_FILL_MODE_MASK	GENMASK(1, 0)
#define TRBLIMITR_FILL_MODE_SHIFT	1
#define TRBLIMITR_ENABLE		BIT(0)
#define TRBPTR_PTR_MASK			GENMASK_ULL(63, 0)
#define TRBPTR_PTR_SHIFT		0
#define TRBBASER_BASE_MASK		GENMASK_ULL(51, 0)
#define TRBBASER_BASE_SHIFT		12
#define TRBSR_EC_MASK			GENMASK(5, 0)
#define TRBSR_EC_SHIFT			26
#define TRBSR_IRQ			BIT(22)
#define TRBSR_TRG			BIT(21)
#define TRBSR_WRAP			BIT(20)
#define TRBSR_ABORT			BIT(18)
#define TRBSR_STOP			BIT(17)
#define TRBSR_MSS_MASK			GENMASK(15, 0)
#define TRBSR_MSS_SHIFT			0
#define TRBSR_BSC_MASK			GENMASK(5, 0)
#define TRBSR_BSC_SHIFT			0
#define TRBSR_FSC_MASK			GENMASK(5, 0)
#define TRBSR_FSC_SHIFT			0
#define TRBMAR_SHARE_MASK		GENMASK(1, 0)
#define TRBMAR_SHARE_SHIFT		8
#define TRBMAR_OUTER_MASK		GENMASK(3, 0)
#define TRBMAR_OUTER_SHIFT		4
#define TRBMAR_INNER_MASK		GENMASK(3, 0)
#define TRBMAR_INNER_SHIFT		0
#define TRBTRG_TRG_MASK			GENMASK(31, 0)
#define TRBTRG_TRG_SHIFT		0
#define TRBIDR_FLAG			BIT(5)
#define TRBIDR_PROG			BIT(4)
#define TRBIDR_ALIGN_MASK		GENMASK(3, 0)
#define TRBIDR_ALIGN_SHIFT		0

#define SYS_PMINTENSET_EL1		sys_reg(3, 0, 9, 14, 1)
#define SYS_PMINTENCLR_EL1		sys_reg(3, 0, 9, 14, 2)

#define SYS_PMMIR_EL1			sys_reg(3, 0, 9, 14, 6)

#define SYS_MAIR_EL1			sys_reg(3, 0, 10, 2, 0)
#define SYS_AMAIR_EL1			sys_reg(3, 0, 10, 3, 0)

#define SYS_LORSA_EL1			sys_reg(3, 0, 10, 4, 0)
#define SYS_LOREA_EL1			sys_reg(3, 0, 10, 4, 1)
#define SYS_LORN_EL1			sys_reg(3, 0, 10, 4, 2)
#define SYS_LORC_EL1			sys_reg(3, 0, 10, 4, 3)
#define SYS_LORID_EL1			sys_reg(3, 0, 10, 4, 7)

#define SYS_VBAR_EL1			sys_reg(3, 0, 12, 0, 0)
#define SYS_DISR_EL1			sys_reg(3, 0, 12, 1, 1)

#define SYS_ICC_IAR0_EL1		sys_reg(3, 0, 12, 8, 0)
#define SYS_ICC_EOIR0_EL1		sys_reg(3, 0, 12, 8, 1)
#define SYS_ICC_HPPIR0_EL1		sys_reg(3, 0, 12, 8, 2)
#define SYS_ICC_BPR0_EL1		sys_reg(3, 0, 12, 8, 3)
#define SYS_ICC_AP0Rn_EL1(n)		sys_reg(3, 0, 12, 8, 4 | n)
#define SYS_ICC_AP0R0_EL1		SYS_ICC_AP0Rn_EL1(0)
#define SYS_ICC_AP0R1_EL1		SYS_ICC_AP0Rn_EL1(1)
#define SYS_ICC_AP0R2_EL1		SYS_ICC_AP0Rn_EL1(2)
#define SYS_ICC_AP0R3_EL1		SYS_ICC_AP0Rn_EL1(3)
#define SYS_ICC_AP1Rn_EL1(n)		sys_reg(3, 0, 12, 9, n)
#define SYS_ICC_AP1R0_EL1		SYS_ICC_AP1Rn_EL1(0)
#define SYS_ICC_AP1R1_EL1		SYS_ICC_AP1Rn_EL1(1)
#define SYS_ICC_AP1R2_EL1		SYS_ICC_AP1Rn_EL1(2)
#define SYS_ICC_AP1R3_EL1		SYS_ICC_AP1Rn_EL1(3)
#define SYS_ICC_DIR_EL1			sys_reg(3, 0, 12, 11, 1)
#define SYS_ICC_RPR_EL1			sys_reg(3, 0, 12, 11, 3)
#define SYS_ICC_SGI1R_EL1		sys_reg(3, 0, 12, 11, 5)
#define SYS_ICC_ASGI1R_EL1		sys_reg(3, 0, 12, 11, 6)
#define SYS_ICC_SGI0R_EL1		sys_reg(3, 0, 12, 11, 7)
#define SYS_ICC_IAR1_EL1		sys_reg(3, 0, 12, 12, 0)
#define SYS_ICC_EOIR1_EL1		sys_reg(3, 0, 12, 12, 1)
#define SYS_ICC_HPPIR1_EL1		sys_reg(3, 0, 12, 12, 2)
#define SYS_ICC_BPR1_EL1		sys_reg(3, 0, 12, 12, 3)
#define SYS_ICC_CTLR_EL1		sys_reg(3, 0, 12, 12, 4)
#define SYS_ICC_SRE_EL1			sys_reg(3, 0, 12, 12, 5)
#define SYS_ICC_IGRPEN0_EL1		sys_reg(3, 0, 12, 12, 6)
#define SYS_ICC_IGRPEN1_EL1		sys_reg(3, 0, 12, 12, 7)

#define SYS_CONTEXTIDR_EL1		sys_reg(3, 0, 13, 0, 1)
#define SYS_TPIDR_EL1			sys_reg(3, 0, 13, 0, 4)

#define SYS_SCXTNUM_EL1			sys_reg(3, 0, 13, 0, 7)

#define SYS_CNTKCTL_EL1			sys_reg(3, 0, 14, 1, 0)

#define SYS_CCSIDR_EL1			sys_reg(3, 1, 0, 0, 0)
#define SYS_CLIDR_EL1			sys_reg(3, 1, 0, 0, 1)
#define SYS_GMID_EL1			sys_reg(3, 1, 0, 0, 4)
#define SYS_AIDR_EL1			sys_reg(3, 1, 0, 0, 7)

#define SYS_CSSELR_EL1			sys_reg(3, 2, 0, 0, 0)

#define SYS_CTR_EL0			sys_reg(3, 3, 0, 0, 1)
#define SYS_DCZID_EL0			sys_reg(3, 3, 0, 0, 7)

#define SYS_RNDR_EL0			sys_reg(3, 3, 2, 4, 0)
#define SYS_RNDRRS_EL0			sys_reg(3, 3, 2, 4, 1)

#define SYS_PMCR_EL0			sys_reg(3, 3, 9, 12, 0)
#define SYS_PMCNTENSET_EL0		sys_reg(3, 3, 9, 12, 1)
#define SYS_PMCNTENCLR_EL0		sys_reg(3, 3, 9, 12, 2)
#define SYS_PMOVSCLR_EL0		sys_reg(3, 3, 9, 12, 3)
#define SYS_PMSWINC_EL0			sys_reg(3, 3, 9, 12, 4)
#define SYS_PMSELR_EL0			sys_reg(3, 3, 9, 12, 5)
#define SYS_PMCEID0_EL0			sys_reg(3, 3, 9, 12, 6)
#define SYS_PMCEID1_EL0			sys_reg(3, 3, 9, 12, 7)
#define SYS_PMCCNTR_EL0			sys_reg(3, 3, 9, 13, 0)
#define SYS_PMXEVTYPER_EL0		sys_reg(3, 3, 9, 13, 1)
#define SYS_PMXEVCNTR_EL0		sys_reg(3, 3, 9, 13, 2)
#define SYS_PMUSERENR_EL0		sys_reg(3, 3, 9, 14, 0)
#define SYS_PMOVSSET_EL0		sys_reg(3, 3, 9, 14, 3)

#define SYS_TPIDR_EL0			sys_reg(3, 3, 13, 0, 2)
#define SYS_TPIDRRO_EL0			sys_reg(3, 3, 13, 0, 3)

#define SYS_SCXTNUM_EL0			sys_reg(3, 3, 13, 0, 7)

/* Definitions for system register interface to AMU for ARMv8.4 onwards */
#define SYS_AM_EL0(crm, op2)		sys_reg(3, 3, 13, (crm), (op2))
#define SYS_AMCR_EL0			SYS_AM_EL0(2, 0)
#define SYS_AMCFGR_EL0			SYS_AM_EL0(2, 1)
#define SYS_AMCGCR_EL0			SYS_AM_EL0(2, 2)
#define SYS_AMUSERENR_EL0		SYS_AM_EL0(2, 3)
#define SYS_AMCNTENCLR0_EL0		SYS_AM_EL0(2, 4)
#define SYS_AMCNTENSET0_EL0		SYS_AM_EL0(2, 5)
#define SYS_AMCNTENCLR1_EL0		SYS_AM_EL0(3, 0)
#define SYS_AMCNTENSET1_EL0		SYS_AM_EL0(3, 1)

/*
 * Group 0 of activity monitors (architected):
 *                op0  op1  CRn   CRm       op2
 * Counter:       11   011  1101  010:n<3>  n<2:0>
 * Type:          11   011  1101  011:n<3>  n<2:0>
 * n: 0-15
 *
 * Group 1 of activity monitors (auxiliary):
 *                op0  op1  CRn   CRm       op2
 * Counter:       11   011  1101  110:n<3>  n<2:0>
 * Type:          11   011  1101  111:n<3>  n<2:0>
 * n: 0-15
 */

#define SYS_AMEVCNTR0_EL0(n)		SYS_AM_EL0(4 + ((n) >> 3), (n) & 7)
#define SYS_AMEVTYPER0_EL0(n)		SYS_AM_EL0(6 + ((n) >> 3), (n) & 7)
#define SYS_AMEVCNTR1_EL0(n)		SYS_AM_EL0(12 + ((n) >> 3), (n) & 7)
#define SYS_AMEVTYPER1_EL0(n)		SYS_AM_EL0(14 + ((n) >> 3), (n) & 7)

/* AMU v1: Fixed (architecturally defined) activity monitors */
#define SYS_AMEVCNTR0_CORE_EL0		SYS_AMEVCNTR0_EL0(0)
#define SYS_AMEVCNTR0_CONST_EL0		SYS_AMEVCNTR0_EL0(1)
#define SYS_AMEVCNTR0_INST_RET_EL0	SYS_AMEVCNTR0_EL0(2)
#define SYS_AMEVCNTR0_MEM_STALL		SYS_AMEVCNTR0_EL0(3)

#define SYS_CNTFRQ_EL0			sys_reg(3, 3, 14, 0, 0)

#define SYS_CNTP_TVAL_EL0		sys_reg(3, 3, 14, 2, 0)
#define SYS_CNTP_CTL_EL0		sys_reg(3, 3, 14, 2, 1)
#define SYS_CNTP_CVAL_EL0		sys_reg(3, 3, 14, 2, 2)

#define SYS_CNTV_CTL_EL0		sys_reg(3, 3, 14, 3, 1)
#define SYS_CNTV_CVAL_EL0		sys_reg(3, 3, 14, 3, 2)

#define SYS_AARCH32_CNTP_TVAL		sys_reg(0, 0, 14, 2, 0)
#define SYS_AARCH32_CNTP_CTL		sys_reg(0, 0, 14, 2, 1)
#define SYS_AARCH32_CNTP_CVAL		sys_reg(0, 2, 0, 14, 0)

#define __PMEV_op2(n)			((n) & 0x7)
#define __CNTR_CRm(n)			(0x8 | (((n) >> 3) & 0x3))
#define SYS_PMEVCNTRn_EL0(n)		sys_reg(3, 3, 14, __CNTR_CRm(n), __PMEV_op2(n))
#define __TYPER_CRm(n)			(0xc | (((n) >> 3) & 0x3))
#define SYS_PMEVTYPERn_EL0(n)		sys_reg(3, 3, 14, __TYPER_CRm(n), __PMEV_op2(n))

#define SYS_PMCCFILTR_EL0		sys_reg(3, 3, 14, 15, 7)

#define SYS_SCTLR_EL2			sys_reg(3, 4, 1, 0, 0)
#define SYS_HFGRTR_EL2			sys_reg(3, 4, 1, 1, 4)
#define SYS_HFGWTR_EL2			sys_reg(3, 4, 1, 1, 5)
#define SYS_HFGITR_EL2			sys_reg(3, 4, 1, 1, 6)
#define SYS_ZCR_EL2			sys_reg(3, 4, 1, 2, 0)
#define SYS_TRFCR_EL2			sys_reg(3, 4, 1, 2, 1)
#define SYS_DACR32_EL2			sys_reg(3, 4, 3, 0, 0)
#define SYS_HDFGRTR_EL2			sys_reg(3, 4, 3, 1, 4)
#define SYS_HDFGWTR_EL2			sys_reg(3, 4, 3, 1, 5)
#define SYS_HAFGRTR_EL2			sys_reg(3, 4, 3, 1, 6)
#define SYS_SPSR_EL2			sys_reg(3, 4, 4, 0, 0)
#define SYS_ELR_EL2			sys_reg(3, 4, 4, 0, 1)
#define SYS_IFSR32_EL2			sys_reg(3, 4, 5, 0, 1)
#define SYS_ESR_EL2			sys_reg(3, 4, 5, 2, 0)
#define SYS_VSESR_EL2			sys_reg(3, 4, 5, 2, 3)
#define SYS_FPEXC32_EL2			sys_reg(3, 4, 5, 3, 0)
#define SYS_TFSR_EL2			sys_reg(3, 4, 5, 6, 0)
#define SYS_FAR_EL2			sys_reg(3, 4, 6, 0, 0)

#define SYS_VDISR_EL2			sys_reg(3, 4, 12, 1, 1)
#define __SYS__AP0Rx_EL2(x)		sys_reg(3, 4, 12, 8, x)
#define SYS_ICH_AP0R0_EL2		__SYS__AP0Rx_EL2(0)
#define SYS_ICH_AP0R1_EL2		__SYS__AP0Rx_EL2(1)
#define SYS_ICH_AP0R2_EL2		__SYS__AP0Rx_EL2(2)
#define SYS_ICH_AP0R3_EL2		__SYS__AP0Rx_EL2(3)

#define __SYS__AP1Rx_EL2(x)		sys_reg(3, 4, 12, 9, x)
#define SYS_ICH_AP1R0_EL2		__SYS__AP1Rx_EL2(0)
#define SYS_ICH_AP1R1_EL2		__SYS__AP1Rx_EL2(1)
#define SYS_ICH_AP1R2_EL2		__SYS__AP1Rx_EL2(2)
#define SYS_ICH_AP1R3_EL2		__SYS__AP1Rx_EL2(3)

#define SYS_ICH_VSEIR_EL2		sys_reg(3, 4, 12, 9, 4)
#define SYS_ICC_SRE_EL2			sys_reg(3, 4, 12, 9, 5)
#define SYS_ICH_HCR_EL2			sys_reg(3, 4, 12, 11, 0)
#define SYS_ICH_VTR_EL2			sys_reg(3, 4, 12, 11, 1)
#define SYS_ICH_MISR_EL2		sys_reg(3, 4, 12, 11, 2)
#define SYS_ICH_EISR_EL2		sys_reg(3, 4, 12, 11, 3)
#define SYS_ICH_ELRSR_EL2		sys_reg(3, 4, 12, 11, 5)
#define SYS_ICH_VMCR_EL2		sys_reg(3, 4, 12, 11, 7)

#define __SYS__LR0_EL2(x)		sys_reg(3, 4, 12, 12, x)
#define SYS_ICH_LR0_EL2			__SYS__LR0_EL2(0)
#define SYS_ICH_LR1_EL2			__SYS__LR0_EL2(1)
#define SYS_ICH_LR2_EL2			__SYS__LR0_EL2(2)
#define SYS_ICH_LR3_EL2			__SYS__LR0_EL2(3)
#define SYS_ICH_LR4_EL2			__SYS__LR0_EL2(4)
#define SYS_ICH_LR5_EL2			__SYS__LR0_EL2(5)
#define SYS_ICH_LR6_EL2			__SYS__LR0_EL2(6)
#define SYS_ICH_LR7_EL2			__SYS__LR0_EL2(7)

#define __SYS__LR8_EL2(x)		sys_reg(3, 4, 12, 13, x)
#define SYS_ICH_LR8_EL2			__SYS__LR8_EL2(0)
#define SYS_ICH_LR9_EL2			__SYS__LR8_EL2(1)
#define SYS_ICH_LR10_EL2		__SYS__LR8_EL2(2)
#define SYS_ICH_LR11_EL2		__SYS__LR8_EL2(3)
#define SYS_ICH_LR12_EL2		__SYS__LR8_EL2(4)
#define SYS_ICH_LR13_EL2		__SYS__LR8_EL2(5)
#define SYS_ICH_LR14_EL2		__SYS__LR8_EL2(6)
#define SYS_ICH_LR15_EL2		__SYS__LR8_EL2(7)

/* VHE encodings for architectural EL0/1 system registers */
#define SYS_SCTLR_EL12			sys_reg(3, 5, 1, 0, 0)
#define SYS_CPACR_EL12			sys_reg(3, 5, 1, 0, 2)
#define SYS_ZCR_EL12			sys_reg(3, 5, 1, 2, 0)
#define SYS_TTBR0_EL12			sys_reg(3, 5, 2, 0, 0)
#define SYS_TTBR1_EL12			sys_reg(3, 5, 2, 0, 1)
#define SYS_TCR_EL12			sys_reg(3, 5, 2, 0, 2)
#define SYS_SPSR_EL12			sys_reg(3, 5, 4, 0, 0)
#define SYS_ELR_EL12			sys_reg(3, 5, 4, 0, 1)
#define SYS_AFSR0_EL12			sys_reg(3, 5, 5, 1, 0)
#define SYS_AFSR1_EL12			sys_reg(3, 5, 5, 1, 1)
#define SYS_ESR_EL12			sys_reg(3, 5, 5, 2, 0)
#define SYS_TFSR_EL12			sys_reg(3, 5, 5, 6, 0)
#define SYS_FAR_EL12			sys_reg(3, 5, 6, 0, 0)
#define SYS_MAIR_EL12			sys_reg(3, 5, 10, 2, 0)
#define SYS_AMAIR_EL12			sys_reg(3, 5, 10, 3, 0)
#define SYS_VBAR_EL12			sys_reg(3, 5, 12, 0, 0)
#define SYS_CONTEXTIDR_EL12		sys_reg(3, 5, 13, 0, 1)
#define SYS_CNTKCTL_EL12		sys_reg(3, 5, 14, 1, 0)
#define SYS_CNTP_TVAL_EL02		sys_reg(3, 5, 14, 2, 0)
#define SYS_CNTP_CTL_EL02		sys_reg(3, 5, 14, 2, 1)
#define SYS_CNTP_CVAL_EL02		sys_reg(3, 5, 14, 2, 2)
#define SYS_CNTV_TVAL_EL02		sys_reg(3, 5, 14, 3, 0)
#define SYS_CNTV_CTL_EL02		sys_reg(3, 5, 14, 3, 1)
#define SYS_CNTV_CVAL_EL02		sys_reg(3, 5, 14, 3, 2)

/* Common SCTLR_ELx flags. */
#define SCTLR_ELx_DSSBS		(BIT(44))
#define SCTLR_ELx_ATA		(BIT(43))

#define SCTLR_ELx_TCF_SHIFT	40
#define SCTLR_ELx_TCF_NONE	(UL(0x0) << SCTLR_ELx_TCF_SHIFT)
#define SCTLR_ELx_TCF_SYNC	(UL(0x1) << SCTLR_ELx_TCF_SHIFT)
#define SCTLR_ELx_TCF_ASYNC	(UL(0x2) << SCTLR_ELx_TCF_SHIFT)
#define SCTLR_ELx_TCF_MASK	(UL(0x3) << SCTLR_ELx_TCF_SHIFT)

#define SCTLR_ELx_ENIA_SHIFT	31

#define SCTLR_ELx_ITFSB		(BIT(37))
#define SCTLR_ELx_ENIA		(BIT(SCTLR_ELx_ENIA_SHIFT))
#define SCTLR_ELx_ENIB		(BIT(30))
#define SCTLR_ELx_ENDA		(BIT(27))
#define SCTLR_ELx_EE		(BIT(25))
#define SCTLR_ELx_IESB		(BIT(21))
#define SCTLR_ELx_WXN		(BIT(19))
#define SCTLR_ELx_ENDB		(BIT(13))
#define SCTLR_ELx_I		(BIT(12))
#define SCTLR_ELx_SA		(BIT(3))
#define SCTLR_ELx_C		(BIT(2))
#define SCTLR_ELx_A		(BIT(1))
#define SCTLR_ELx_M		(BIT(0))

/* SCTLR_EL2 specific flags. */
#define SCTLR_EL2_RES1	((BIT(4)) | (BIT(5)) | (BIT(11)) | (BIT(16)) | \
			 (BIT(18)) | (BIT(22)) | (BIT(23)) | (BIT(28)) | \
			 (BIT(29)))

#ifdef CONFIG_CPU_BIG_ENDIAN
#define ENDIAN_SET_EL2		SCTLR_ELx_EE
#else
#define ENDIAN_SET_EL2		0
#endif

#define INIT_SCTLR_EL2_MMU_ON						\
	(SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_ELx_I |	\
	 SCTLR_ELx_IESB | SCTLR_ELx_WXN | ENDIAN_SET_EL2 |		\
	 SCTLR_ELx_ITFSB | SCTLR_EL2_RES1)

#define INIT_SCTLR_EL2_MMU_OFF \
	(SCTLR_EL2_RES1 | ENDIAN_SET_EL2)

/* SCTLR_EL1 specific flags. */
#define SCTLR_EL1_EPAN		(BIT(57))
#define SCTLR_EL1_ATA0		(BIT(42))

#define SCTLR_EL1_TCF0_SHIFT	38
#define SCTLR_EL1_TCF0_NONE	(UL(0x0) << SCTLR_EL1_TCF0_SHIFT)
#define SCTLR_EL1_TCF0_SYNC	(UL(0x1) << SCTLR_EL1_TCF0_SHIFT)
#define SCTLR_EL1_TCF0_ASYNC	(UL(0x2) << SCTLR_EL1_TCF0_SHIFT)
#define SCTLR_EL1_TCF0_MASK	(UL(0x3) << SCTLR_EL1_TCF0_SHIFT)

#define SCTLR_EL1_BT1		(BIT(36))
#define SCTLR_EL1_BT0		(BIT(35))
#define SCTLR_EL1_UCI		(BIT(26))
#define SCTLR_EL1_E0E		(BIT(24))
#define SCTLR_EL1_SPAN		(BIT(23))
#define SCTLR_EL1_NTWE		(BIT(18))
#define SCTLR_EL1_NTWI		(BIT(16))
#define SCTLR_EL1_UCT		(BIT(15))
#define SCTLR_EL1_DZE		(BIT(14))
#define SCTLR_EL1_UMA		(BIT(9))
#define SCTLR_EL1_SED		(BIT(8))
#define SCTLR_EL1_ITD		(BIT(7))
#define SCTLR_EL1_CP15BEN	(BIT(5))
#define SCTLR_EL1_SA0		(BIT(4))

#define SCTLR_EL1_RES1	((BIT(11)) | (BIT(20)) | (BIT(22)) | (BIT(28)) | \
			 (BIT(29)))

#ifdef CONFIG_CPU_BIG_ENDIAN
#define ENDIAN_SET_EL1		(SCTLR_EL1_E0E | SCTLR_ELx_EE)
#else
#define ENDIAN_SET_EL1		0
#endif

#define INIT_SCTLR_EL1_MMU_OFF \
	(ENDIAN_SET_EL1 | SCTLR_EL1_RES1)

#define INIT_SCTLR_EL1_MMU_ON \
	(SCTLR_ELx_M | SCTLR_ELx_C | SCTLR_ELx_SA | SCTLR_EL1_SA0 | \
	 SCTLR_EL1_SED | SCTLR_ELx_I | SCTLR_EL1_DZE | SCTLR_EL1_UCT | \
	 SCTLR_EL1_NTWE | SCTLR_ELx_IESB | SCTLR_EL1_SPAN | SCTLR_ELx_ITFSB | \
	 SCTLR_ELx_ATA | SCTLR_EL1_ATA0 | ENDIAN_SET_EL1 | SCTLR_EL1_UCI | \
	 SCTLR_EL1_EPAN | SCTLR_EL1_RES1)

/* MAIR_ELx memory attributes (used by Linux) */
#define MAIR_ATTR_DEVICE_nGnRnE		UL(0x00)
#define MAIR_ATTR_DEVICE_nGnRE		UL(0x04)
#define MAIR_ATTR_NORMAL_NC		UL(0x44)
#define MAIR_ATTR_NORMAL_TAGGED		UL(0xf0)
#define MAIR_ATTR_NORMAL		UL(0xff)
#define MAIR_ATTR_MASK			UL(0xff)

/* Position the attr at the correct index */
#define MAIR_ATTRIDX(attr, idx)		((attr) << ((idx) * 8))

/* id_aa64isar0 */
#define ID_AA64ISAR0_RNDR_SHIFT		60
#define ID_AA64ISAR0_TLB_SHIFT		56
#define ID_AA64ISAR0_TS_SHIFT		52
#define ID_AA64ISAR0_FHM_SHIFT		48
#define ID_AA64ISAR0_DP_SHIFT		44
#define ID_AA64ISAR0_SM4_SHIFT		40
#define ID_AA64ISAR0_SM3_SHIFT		36
#define ID_AA64ISAR0_SHA3_SHIFT		32
#define ID_AA64ISAR0_RDM_SHIFT		28
#define ID_AA64ISAR0_ATOMICS_SHIFT	20
#define ID_AA64ISAR0_CRC32_SHIFT	16
#define ID_AA64ISAR0_SHA2_SHIFT		12
#define ID_AA64ISAR0_SHA1_SHIFT		8
#define ID_AA64ISAR0_AES_SHIFT		4

#define ID_AA64ISAR0_TLB_RANGE_NI	0x0
#define ID_AA64ISAR0_TLB_RANGE		0x2

/* id_aa64isar1 */
#define ID_AA64ISAR1_I8MM_SHIFT		52
#define ID_AA64ISAR1_DGH_SHIFT		48
#define ID_AA64ISAR1_BF16_SHIFT		44
#define ID_AA64ISAR1_SPECRES_SHIFT	40
#define ID_AA64ISAR1_SB_SHIFT		36
#define ID_AA64ISAR1_FRINTTS_SHIFT	32
#define ID_AA64ISAR1_GPI_SHIFT		28
#define ID_AA64ISAR1_GPA_SHIFT		24
#define ID_AA64ISAR1_LRCPC_SHIFT	20
#define ID_AA64ISAR1_FCMA_SHIFT		16
#define ID_AA64ISAR1_JSCVT_SHIFT	12
#define ID_AA64ISAR1_API_SHIFT		8
#define ID_AA64ISAR1_APA_SHIFT		4
#define ID_AA64ISAR1_DPB_SHIFT		0

#define ID_AA64ISAR1_APA_NI			0x0
#define ID_AA64ISAR1_APA_ARCHITECTED		0x1
#define ID_AA64ISAR1_APA_ARCH_EPAC		0x2
#define ID_AA64ISAR1_APA_ARCH_EPAC2		0x3
#define ID_AA64ISAR1_APA_ARCH_EPAC2_FPAC	0x4
#define ID_AA64ISAR1_APA_ARCH_EPAC2_FPAC_CMB	0x5
#define ID_AA64ISAR1_API_NI			0x0
#define ID_AA64ISAR1_API_IMP_DEF		0x1
#define ID_AA64ISAR1_API_IMP_DEF_EPAC		0x2
#define ID_AA64ISAR1_API_IMP_DEF_EPAC2		0x3
#define ID_AA64ISAR1_API_IMP_DEF_EPAC2_FPAC	0x4
#define ID_AA64ISAR1_API_IMP_DEF_EPAC2_FPAC_CMB	0x5
#define ID_AA64ISAR1_GPA_NI			0x0
#define ID_AA64ISAR1_GPA_ARCHITECTED		0x1
#define ID_AA64ISAR1_GPI_NI			0x0
#define ID_AA64ISAR1_GPI_IMP_DEF		0x1

/* id_aa64pfr0 */
#define ID_AA64PFR0_CSV3_SHIFT		60
#define ID_AA64PFR0_CSV2_SHIFT		56
#define ID_AA64PFR0_DIT_SHIFT		48
#define ID_AA64PFR0_AMU_SHIFT		44
#define ID_AA64PFR0_MPAM_SHIFT		40
#define ID_AA64PFR0_SEL2_SHIFT		36
#define ID_AA64PFR0_SVE_SHIFT		32
#define ID_AA64PFR0_RAS_SHIFT		28
#define ID_AA64PFR0_GIC_SHIFT		24
#define ID_AA64PFR0_ASIMD_SHIFT		20
#define ID_AA64PFR0_FP_SHIFT		16
#define ID_AA64PFR0_EL3_SHIFT		12
#define ID_AA64PFR0_EL2_SHIFT		8
#define ID_AA64PFR0_EL1_SHIFT		4
#define ID_AA64PFR0_EL0_SHIFT		0

#define ID_AA64PFR0_AMU			0x1
#define ID_AA64PFR0_SVE			0x1
#define ID_AA64PFR0_RAS_V1		0x1
#define ID_AA64PFR0_RAS_V1P1		0x2
#define ID_AA64PFR0_FP_NI		0xf
#define ID_AA64PFR0_FP_SUPPORTED	0x0
#define ID_AA64PFR0_ASIMD_NI		0xf
#define ID_AA64PFR0_ASIMD_SUPPORTED	0x0
#define ID_AA64PFR0_ELx_64BIT_ONLY	0x1
#define ID_AA64PFR0_ELx_32BIT_64BIT	0x2

/* id_aa64pfr1 */
#define ID_AA64PFR1_MPAMFRAC_SHIFT	16
#define ID_AA64PFR1_RASFRAC_SHIFT	12
#define ID_AA64PFR1_MTE_SHIFT		8
#define ID_AA64PFR1_SSBS_SHIFT		4
#define ID_AA64PFR1_BT_SHIFT		0

#define ID_AA64PFR1_SSBS_PSTATE_NI	0
#define ID_AA64PFR1_SSBS_PSTATE_ONLY	1
#define ID_AA64PFR1_SSBS_PSTATE_INSNS	2
#define ID_AA64PFR1_BT_BTI		0x1

#define ID_AA64PFR1_MTE_NI		0x0
#define ID_AA64PFR1_MTE_EL0		0x1
#define ID_AA64PFR1_MTE			0x2

/* id_aa64zfr0 */
#define ID_AA64ZFR0_F64MM_SHIFT		56
#define ID_AA64ZFR0_F32MM_SHIFT		52
#define ID_AA64ZFR0_I8MM_SHIFT		44
#define ID_AA64ZFR0_SM4_SHIFT		40
#define ID_AA64ZFR0_SHA3_SHIFT		32
#define ID_AA64ZFR0_BF16_SHIFT		20
#define ID_AA64ZFR0_BITPERM_SHIFT	16
#define ID_AA64ZFR0_AES_SHIFT		4
#define ID_AA64ZFR0_SVEVER_SHIFT	0

#define ID_AA64ZFR0_F64MM		0x1
#define ID_AA64ZFR0_F32MM		0x1
#define ID_AA64ZFR0_I8MM		0x1
#define ID_AA64ZFR0_BF16		0x1
#define ID_AA64ZFR0_SM4			0x1
#define ID_AA64ZFR0_SHA3		0x1
#define ID_AA64ZFR0_BITPERM		0x1
#define ID_AA64ZFR0_AES			0x1
#define ID_AA64ZFR0_AES_PMULL		0x2
#define ID_AA64ZFR0_SVEVER_SVE2		0x1

/* id_aa64mmfr0 */
#define ID_AA64MMFR0_ECV_SHIFT		60
#define ID_AA64MMFR0_FGT_SHIFT		56
#define ID_AA64MMFR0_EXS_SHIFT		44
#define ID_AA64MMFR0_TGRAN4_2_SHIFT	40
#define ID_AA64MMFR0_TGRAN64_2_SHIFT	36
#define ID_AA64MMFR0_TGRAN16_2_SHIFT	32
#define ID_AA64MMFR0_TGRAN4_SHIFT	28
#define ID_AA64MMFR0_TGRAN64_SHIFT	24
#define ID_AA64MMFR0_TGRAN16_SHIFT	20
#define ID_AA64MMFR0_BIGENDEL0_SHIFT	16
#define ID_AA64MMFR0_SNSMEM_SHIFT	12
#define ID_AA64MMFR0_BIGENDEL_SHIFT	8
#define ID_AA64MMFR0_ASID_SHIFT		4
#define ID_AA64MMFR0_PARANGE_SHIFT	0

#define ID_AA64MMFR0_ASID_8		0x0
#define ID_AA64MMFR0_ASID_16		0x2

#define ID_AA64MMFR0_TGRAN4_NI			0xf
#define ID_AA64MMFR0_TGRAN4_SUPPORTED_MIN	0x0
#define ID_AA64MMFR0_TGRAN4_SUPPORTED_MAX	0x7
#define ID_AA64MMFR0_TGRAN64_NI			0xf
#define ID_AA64MMFR0_TGRAN64_SUPPORTED_MIN	0x0
#define ID_AA64MMFR0_TGRAN64_SUPPORTED_MAX	0x7
#define ID_AA64MMFR0_TGRAN16_NI			0x0
#define ID_AA64MMFR0_TGRAN16_SUPPORTED_MIN	0x1
#define ID_AA64MMFR0_TGRAN16_SUPPORTED_MAX	0xf

#define ID_AA64MMFR0_PARANGE_32		0x0
#define ID_AA64MMFR0_PARANGE_36		0x1
#define ID_AA64MMFR0_PARANGE_40		0x2
#define ID_AA64MMFR0_PARANGE_42		0x3
#define ID_AA64MMFR0_PARANGE_44		0x4
#define ID_AA64MMFR0_PARANGE_48		0x5
#define ID_AA64MMFR0_PARANGE_52		0x6

#define ARM64_MIN_PARANGE_BITS		32

#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_DEFAULT	0x0
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_NONE	0x1
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MIN	0x2
#define ID_AA64MMFR0_TGRAN_2_SUPPORTED_MAX	0x7

#ifdef CONFIG_ARM64_PA_BITS_52
#define ID_AA64MMFR0_PARANGE_MAX	ID_AA64MMFR0_PARANGE_52
#else
#define ID_AA64MMFR0_PARANGE_MAX	ID_AA64MMFR0_PARANGE_48
#endif

/* id_aa64mmfr1 */
#define ID_AA64MMFR1_ETS_SHIFT		36
#define ID_AA64MMFR1_TWED_SHIFT		32
#define ID_AA64MMFR1_XNX_SHIFT		28
#define ID_AA64MMFR1_SPECSEI_SHIFT	24
#define ID_AA64MMFR1_PAN_SHIFT		20
#define ID_AA64MMFR1_LOR_SHIFT		16
#define ID_AA64MMFR1_HPD_SHIFT		12
#define ID_AA64MMFR1_VHE_SHIFT		8
#define ID_AA64MMFR1_VMIDBITS_SHIFT	4
#define ID_AA64MMFR1_HADBS_SHIFT	0

#define ID_AA64MMFR1_VMIDBITS_8		0
#define ID_AA64MMFR1_VMIDBITS_16	2

/* id_aa64mmfr2 */
#define ID_AA64MMFR2_E0PD_SHIFT		60
#define ID_AA64MMFR2_EVT_SHIFT		56
#define ID_AA64MMFR2_BBM_SHIFT		52
#define ID_AA64MMFR2_TTL_SHIFT		48
#define ID_AA64MMFR2_FWB_SHIFT		40
#define ID_AA64MMFR2_IDS_SHIFT		36
#define ID_AA64MMFR2_AT_SHIFT		32
#define ID_AA64MMFR2_ST_SHIFT		28
#define ID_AA64MMFR2_NV_SHIFT		24
#define ID_AA64MMFR2_CCIDX_SHIFT	20
#define ID_AA64MMFR2_LVA_SHIFT		16
#define ID_AA64MMFR2_IESB_SHIFT		12
#define ID_AA64MMFR2_LSM_SHIFT		8
#define ID_AA64MMFR2_UAO_SHIFT		4
#define ID_AA64MMFR2_CNP_SHIFT		0

/* id_aa64dfr0 */
#define ID_AA64DFR0_MTPMU_SHIFT		48
#define ID_AA64DFR0_TRBE_SHIFT		44
#define ID_AA64DFR0_TRACE_FILT_SHIFT	40
#define ID_AA64DFR0_DOUBLELOCK_SHIFT	36
#define ID_AA64DFR0_PMSVER_SHIFT	32
#define ID_AA64DFR0_CTX_CMPS_SHIFT	28
#define ID_AA64DFR0_WRPS_SHIFT		20
#define ID_AA64DFR0_BRPS_SHIFT		12
#define ID_AA64DFR0_PMUVER_SHIFT	8
#define ID_AA64DFR0_TRACEVER_SHIFT	4
#define ID_AA64DFR0_DEBUGVER_SHIFT	0

#define ID_AA64DFR0_PMUVER_8_0		0x1
#define ID_AA64DFR0_PMUVER_8_1		0x4
#define ID_AA64DFR0_PMUVER_8_4		0x5
#define ID_AA64DFR0_PMUVER_8_5		0x6
#define ID_AA64DFR0_PMUVER_IMP_DEF	0xf

#define ID_AA64DFR0_PMSVER_8_2		0x1
#define ID_AA64DFR0_PMSVER_8_3		0x2

#define ID_DFR0_PERFMON_SHIFT		24

#define ID_DFR0_PERFMON_8_0		0x3
#define ID_DFR0_PERFMON_8_1		0x4
#define ID_DFR0_PERFMON_8_4		0x5
#define ID_DFR0_PERFMON_8_5		0x6

#define ID_ISAR4_SWP_FRAC_SHIFT		28
#define ID_ISAR4_PSR_M_SHIFT		24
#define ID_ISAR4_SYNCH_PRIM_FRAC_SHIFT	20
#define ID_ISAR4_BARRIER_SHIFT		16
#define ID_ISAR4_SMC_SHIFT		12
#define ID_ISAR4_WRITEBACK_SHIFT	8
#define ID_ISAR4_WITHSHIFTS_SHIFT	4
#define ID_ISAR4_UNPRIV_SHIFT		0

#define ID_DFR1_MTPMU_SHIFT		0

#define ID_ISAR0_DIVIDE_SHIFT		24
#define ID_ISAR0_DEBUG_SHIFT		20
#define ID_ISAR0_COPROC_SHIFT		16
#define ID_ISAR0_CMPBRANCH_SHIFT	12
#define ID_ISAR0_BITFIELD_SHIFT		8
#define ID_ISAR0_BITCOUNT_SHIFT		4
#define ID_ISAR0_SWAP_SHIFT		0

#define ID_ISAR5_RDM_SHIFT		24
#define ID_ISAR5_CRC32_SHIFT		16
#define ID_ISAR5_SHA2_SHIFT		12
#define ID_ISAR5_SHA1_SHIFT		8
#define ID_ISAR5_AES_SHIFT		4
#define ID_ISAR5_SEVL_SHIFT		0

#define ID_ISAR6_I8MM_SHIFT		24
#define ID_ISAR6_BF16_SHIFT		20
#define ID_ISAR6_SPECRES_SHIFT		16
#define ID_ISAR6_SB_SHIFT		12
#define ID_ISAR6_FHM_SHIFT		8
#define ID_ISAR6_DP_SHIFT		4
#define ID_ISAR6_JSCVT_SHIFT		0

#define ID_MMFR0_INNERSHR_SHIFT		28
#define ID_MMFR0_FCSE_SHIFT		24
#define ID_MMFR0_AUXREG_SHIFT		20
#define ID_MMFR0_TCM_SHIFT		16
#define ID_MMFR0_SHARELVL_SHIFT		12
#define ID_MMFR0_OUTERSHR_SHIFT		8
#define ID_MMFR0_PMSA_SHIFT		4
#define ID_MMFR0_VMSA_SHIFT		0

#define ID_MMFR4_EVT_SHIFT		28
#define ID_MMFR4_CCIDX_SHIFT		24
#define ID_MMFR4_LSM_SHIFT		20
#define ID_MMFR4_HPDS_SHIFT		16
#define ID_MMFR4_CNP_SHIFT		12
#define ID_MMFR4_XNX_SHIFT		8
#define ID_MMFR4_AC2_SHIFT		4
#define ID_MMFR4_SPECSEI_SHIFT		0

#define ID_MMFR5_ETS_SHIFT		0

#define ID_PFR0_DIT_SHIFT		24
#define ID_PFR0_CSV2_SHIFT		16
#define ID_PFR0_STATE3_SHIFT		12
#define ID_PFR0_STATE2_SHIFT		8
#define ID_PFR0_STATE1_SHIFT		4
#define ID_PFR0_STATE0_SHIFT		0

#define ID_DFR0_PERFMON_SHIFT		24
#define ID_DFR0_MPROFDBG_SHIFT		20
#define ID_DFR0_MMAPTRC_SHIFT		16
#define ID_DFR0_COPTRC_SHIFT		12
#define ID_DFR0_MMAPDBG_SHIFT		8
#define ID_DFR0_COPSDBG_SHIFT		4
#define ID_DFR0_COPDBG_SHIFT		0

#define ID_PFR2_SSBS_SHIFT		4
#define ID_PFR2_CSV3_SHIFT		0

#define MVFR0_FPROUND_SHIFT		28
#define MVFR0_FPSHVEC_SHIFT		24
#define MVFR0_FPSQRT_SHIFT		20
#define MVFR0_FPDIVIDE_SHIFT		16
#define MVFR0_FPTRAP_SHIFT		12
#define MVFR0_FPDP_SHIFT		8
#define MVFR0_FPSP_SHIFT		4
#define MVFR0_SIMD_SHIFT		0

#define MVFR1_SIMDFMAC_SHIFT		28
#define MVFR1_FPHP_SHIFT		24
#define MVFR1_SIMDHP_SHIFT		20
#define MVFR1_SIMDSP_SHIFT		16
#define MVFR1_SIMDINT_SHIFT		12
#define MVFR1_SIMDLS_SHIFT		8
#define MVFR1_FPDNAN_SHIFT		4
#define MVFR1_FPFTZ_SHIFT		0

#define ID_PFR1_GIC_SHIFT		28
#define ID_PFR1_VIRT_FRAC_SHIFT		24
#define ID_PFR1_SEC_FRAC_SHIFT		20
#define ID_PFR1_GENTIMER_SHIFT		16
#define ID_PFR1_VIRTUALIZATION_SHIFT	12
#define ID_PFR1_MPROGMOD_SHIFT		8
#define ID_PFR1_SECURITY_SHIFT		4
#define ID_PFR1_PROGMOD_SHIFT		0

#if defined(CONFIG_ARM64_4K_PAGES)
#define ID_AA64MMFR0_TGRAN_SHIFT		ID_AA64MMFR0_TGRAN4_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_TGRAN4_SUPPORTED_MIN
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_TGRAN4_SUPPORTED_MAX
#define ID_AA64MMFR0_TGRAN_2_SHIFT		ID_AA64MMFR0_TGRAN4_2_SHIFT
#elif defined(CONFIG_ARM64_16K_PAGES)
#define ID_AA64MMFR0_TGRAN_SHIFT		ID_AA64MMFR0_TGRAN16_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_TGRAN16_SUPPORTED_MIN
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_TGRAN16_SUPPORTED_MAX
#define ID_AA64MMFR0_TGRAN_2_SHIFT		ID_AA64MMFR0_TGRAN16_2_SHIFT
#elif defined(CONFIG_ARM64_64K_PAGES)
#define ID_AA64MMFR0_TGRAN_SHIFT		ID_AA64MMFR0_TGRAN64_SHIFT
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MIN	ID_AA64MMFR0_TGRAN64_SUPPORTED_MIN
#define ID_AA64MMFR0_TGRAN_SUPPORTED_MAX	ID_AA64MMFR0_TGRAN64_SUPPORTED_MAX
#define ID_AA64MMFR0_TGRAN_2_SHIFT		ID_AA64MMFR0_TGRAN64_2_SHIFT
#endif

#define MVFR2_FPMISC_SHIFT		4
#define MVFR2_SIMDMISC_SHIFT		0

#define DCZID_DZP_SHIFT			4
#define DCZID_BS_SHIFT			0

/*
 * The ZCR_ELx_LEN_* definitions intentionally include bits [8:4] which
 * are reserved by the SVE architecture for future expansion of the LEN
 * field, with compatible semantics.
 */
#define ZCR_ELx_LEN_SHIFT	0
#define ZCR_ELx_LEN_SIZE	9
#define ZCR_ELx_LEN_MASK	0x1ff

#define CPACR_EL1_ZEN_EL1EN	(BIT(16)) /* enable EL1 access */
#define CPACR_EL1_ZEN_EL0EN	(BIT(17)) /* enable EL0 access, if EL1EN set */
#define CPACR_EL1_ZEN		(CPACR_EL1_ZEN_EL1EN | CPACR_EL1_ZEN_EL0EN)

/* TCR EL1 Bit Definitions */
#define SYS_TCR_EL1_TCMA1	(BIT(58))
#define SYS_TCR_EL1_TCMA0	(BIT(57))

/* GCR_EL1 Definitions */
#define SYS_GCR_EL1_RRND	(BIT(16))
#define SYS_GCR_EL1_EXCL_MASK	0xffffUL

/* RGSR_EL1 Definitions */
#define SYS_RGSR_EL1_TAG_MASK	0xfUL
#define SYS_RGSR_EL1_SEED_SHIFT	8
#define SYS_RGSR_EL1_SEED_MASK	0xffffUL

/* GMID_EL1 field definitions */
#define SYS_GMID_EL1_BS_SHIFT	0
#define SYS_GMID_EL1_BS_SIZE	4

/* TFSR{,E0}_EL1 bit definitions */
#define SYS_TFSR_EL1_TF0_SHIFT	0
#define SYS_TFSR_EL1_TF1_SHIFT	1
#define SYS_TFSR_EL1_TF0	(UL(1) << SYS_TFSR_EL1_TF0_SHIFT)
#define SYS_TFSR_EL1_TF1	(UL(1) << SYS_TFSR_EL1_TF1_SHIFT)

/* Safe value for MPIDR_EL1: Bit31:RES1, Bit30:U:0, Bit24:MT:0 */
#define SYS_MPIDR_SAFE_VAL	(BIT(31))

#define TRFCR_ELx_TS_SHIFT		5
#define TRFCR_ELx_TS_VIRTUAL		((0x1UL) << TRFCR_ELx_TS_SHIFT)
#define TRFCR_ELx_TS_GUEST_PHYSICAL	((0x2UL) << TRFCR_ELx_TS_SHIFT)
#define TRFCR_ELx_TS_PHYSICAL		((0x3UL) << TRFCR_ELx_TS_SHIFT)
#define TRFCR_EL2_CX			BIT(3)
#define TRFCR_ELx_ExTRE			BIT(1)
#define TRFCR_ELx_E0TRE			BIT(0)


/* GIC Hypervisor interface registers */
/* ICH_MISR_EL2 bit definitions */
#define ICH_MISR_EOI		(1 << 0)
#define ICH_MISR_U		(1 << 1)

/* ICH_LR*_EL2 bit definitions */
#define ICH_LR_VIRTUAL_ID_MASK	((1ULL << 32) - 1)

#define ICH_LR_EOI		(1ULL << 41)
#define ICH_LR_GROUP		(1ULL << 60)
#define ICH_LR_HW		(1ULL << 61)
#define ICH_LR_STATE		(3ULL << 62)
#define ICH_LR_PENDING_BIT	(1ULL << 62)
#define ICH_LR_ACTIVE_BIT	(1ULL << 63)
#define ICH_LR_PHYS_ID_SHIFT	32
#define ICH_LR_PHYS_ID_MASK	(0x3ffULL << ICH_LR_PHYS_ID_SHIFT)
#define ICH_LR_PRIORITY_SHIFT	48
#define ICH_LR_PRIORITY_MASK	(0xffULL << ICH_LR_PRIORITY_SHIFT)

/* ICH_HCR_EL2 bit definitions */
#define ICH_HCR_EN		(1 << 0)
#define ICH_HCR_UIE		(1 << 1)
#define ICH_HCR_NPIE		(1 << 3)
#define ICH_HCR_TC		(1 << 10)
#define ICH_HCR_TALL0		(1 << 11)
#define ICH_HCR_TALL1		(1 << 12)
#define ICH_HCR_EOIcount_SHIFT	27
#define ICH_HCR_EOIcount_MASK	(0x1f << ICH_HCR_EOIcount_SHIFT)

/* ICH_VMCR_EL2 bit definitions */
#define ICH_VMCR_ACK_CTL_SHIFT	2
#define ICH_VMCR_ACK_CTL_MASK	(1 << ICH_VMCR_ACK_CTL_SHIFT)
#define ICH_VMCR_FIQ_EN_SHIFT	3
#define ICH_VMCR_FIQ_EN_MASK	(1 << ICH_VMCR_FIQ_EN_SHIFT)
#define ICH_VMCR_CBPR_SHIFT	4
#define ICH_VMCR_CBPR_MASK	(1 << ICH_VMCR_CBPR_SHIFT)
#define ICH_VMCR_EOIM_SHIFT	9
#define ICH_VMCR_EOIM_MASK	(1 << ICH_VMCR_EOIM_SHIFT)
#define ICH_VMCR_BPR1_SHIFT	18
#define ICH_VMCR_BPR1_MASK	(7 << ICH_VMCR_BPR1_SHIFT)
#define ICH_VMCR_BPR0_SHIFT	21
#define ICH_VMCR_BPR0_MASK	(7 << ICH_VMCR_BPR0_SHIFT)
#define ICH_VMCR_PMR_SHIFT	24
#define ICH_VMCR_PMR_MASK	(0xffUL << ICH_VMCR_PMR_SHIFT)
#define ICH_VMCR_ENG0_SHIFT	0
#define ICH_VMCR_ENG0_MASK	(1 << ICH_VMCR_ENG0_SHIFT)
#define ICH_VMCR_ENG1_SHIFT	1
#define ICH_VMCR_ENG1_MASK	(1 << ICH_VMCR_ENG1_SHIFT)

/* ICH_VTR_EL2 bit definitions */
#define ICH_VTR_PRI_BITS_SHIFT	29
#define ICH_VTR_PRI_BITS_MASK	(7 << ICH_VTR_PRI_BITS_SHIFT)
#define ICH_VTR_ID_BITS_SHIFT	23
#define ICH_VTR_ID_BITS_MASK	(7 << ICH_VTR_ID_BITS_SHIFT)
#define ICH_VTR_SEIS_SHIFT	22
#define ICH_VTR_SEIS_MASK	(1 << ICH_VTR_SEIS_SHIFT)
#define ICH_VTR_A3V_SHIFT	21
#define ICH_VTR_A3V_MASK	(1 << ICH_VTR_A3V_SHIFT)

#define ARM64_FEATURE_FIELD_BITS	4

/* Create a mask for the feature bits of the specified feature. */
#define ARM64_FEATURE_MASK(x)	(GENMASK_ULL(x##_SHIFT + ARM64_FEATURE_FIELD_BITS - 1, x##_SHIFT))

#ifdef __ASSEMBLY__

	.irp	num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30
	.equ	.L__reg_num_x\num, \num
	.endr
	.equ	.L__reg_num_xzr, 31

	.macro	mrs_s, rt, sreg
	 __emit_inst(0xd5200000|(\sreg)|(.L__reg_num_\rt))
	.endm

	.macro	msr_s, sreg, rt
	__emit_inst(0xd5000000|(\sreg)|(.L__reg_num_\rt))
	.endm

#else

#include <linux/build_bug.h>
#include <linux/types.h>
#include <asm/alternative.h>

#define __DEFINE_MRS_MSR_S_REGNUM				\
"	.irp	num,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30\n" \
"	.equ	.L__reg_num_x\\num, \\num\n"			\
"	.endr\n"						\
"	.equ	.L__reg_num_xzr, 31\n"

#define DEFINE_MRS_S						\
	__DEFINE_MRS_MSR_S_REGNUM				\
"	.macro	mrs_s, rt, sreg\n"				\
	__emit_inst(0xd5200000|(\\sreg)|(.L__reg_num_\\rt))	\
"	.endm\n"

#define DEFINE_MSR_S						\
	__DEFINE_MRS_MSR_S_REGNUM				\
"	.macro	msr_s, sreg, rt\n"				\
	__emit_inst(0xd5000000|(\\sreg)|(.L__reg_num_\\rt))	\
"	.endm\n"

#define UNDEFINE_MRS_S						\
"	.purgem	mrs_s\n"

#define UNDEFINE_MSR_S						\
"	.purgem	msr_s\n"

#define __mrs_s(v, r)						\
	DEFINE_MRS_S						\
"	mrs_s " v ", " __stringify(r) "\n"			\
	UNDEFINE_MRS_S

#define __msr_s(r, v)						\
	DEFINE_MSR_S						\
"	msr_s " __stringify(r) ", " v "\n"			\
	UNDEFINE_MSR_S

/*
 * Unlike read_cpuid, calls to read_sysreg are never expected to be
 * optimized away or replaced with synthetic values.
 */
#define read_sysreg(r) ({					\
	u64 __val;						\
	asm volatile("mrs %0, " __stringify(r) : "=r" (__val));	\
	__val;							\
})

/*
 * The "Z" constraint normally means a zero immediate, but when combined with
 * the "%x0" template means XZR.
 */
#define write_sysreg(v, r) do {					\
	u64 __val = (u64)(v);					\
	asm volatile("msr " __stringify(r) ", %x0"		\
		     : : "rZ" (__val));				\
} while (0)

/*
 * For registers without architectural names, or simply unsupported by
 * GAS.
 */
#define read_sysreg_s(r) ({					\
	u64 __val;						\
	asm volatile(__mrs_s("%0", r) : "=r" (__val));		\
	__val;							\
})

#define write_sysreg_s(v, r) do {				\
	u64 __val = (u64)(v);					\
	asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));	\
} while (0)

/*
 * Modify bits in a sysreg. Bits in the clear mask are zeroed, then bits in the
 * set mask are set. Other bits are left as-is.
 */
#define sysreg_clear_set(sysreg, clear, set) do {		\
	u64 __scs_val = read_sysreg(sysreg);			\
	u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set);	\
	if (__scs_new != __scs_val)				\
		write_sysreg(__scs_new, sysreg);		\
} while (0)

#define sysreg_clear_set_s(sysreg, clear, set) do {		\
	u64 __scs_val = read_sysreg_s(sysreg);			\
	u64 __scs_new = (__scs_val & ~(u64)(clear)) | (set);	\
	if (__scs_new != __scs_val)				\
		write_sysreg_s(__scs_new, sysreg);		\
} while (0)

#define read_sysreg_par() ({					\
	u64 par;						\
	asm(ALTERNATIVE("nop", "dmb sy", ARM64_WORKAROUND_1508412)); \
	par = read_sysreg(par_el1);				\
	asm(ALTERNATIVE("nop", "dmb sy", ARM64_WORKAROUND_1508412)); \
	par;							\
})

#endif

#endif	/* __ASM_SYSREG_H */
+48
tools/arch/x86/include/asm/pvclock-abi.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _ASM_X86_PVCLOCK_ABI_H 3 + #define _ASM_X86_PVCLOCK_ABI_H 4 + #ifndef __ASSEMBLY__ 5 + 6 + /* 7 + * These structs MUST NOT be changed. 8 + * They are the ABI between hypervisor and guest OS. 9 + * Both Xen and KVM are using this. 10 + * 11 + * pvclock_vcpu_time_info holds the system time and the tsc timestamp 12 + * of the last update. So the guest can use the tsc delta to get a 13 + * more precise system time. There is one per virtual cpu. 14 + * 15 + * pvclock_wall_clock references the point in time when the system 16 + * time was zero (usually boot time), thus the guest calculates the 17 + * current wall clock by adding the system time. 18 + * 19 + * Protocol for the "version" fields is: hypervisor raises it (making 20 + * it uneven) before it starts updating the fields and raises it again 21 + * (making it even) when it is done. Thus the guest can make sure the 22 + * time values it got are consistent by checking the version before 23 + * and after reading them. 24 + */ 25 + 26 + struct pvclock_vcpu_time_info { 27 + u32 version; 28 + u32 pad0; 29 + u64 tsc_timestamp; 30 + u64 system_time; 31 + u32 tsc_to_system_mul; 32 + s8 tsc_shift; 33 + u8 flags; 34 + u8 pad[2]; 35 + } __attribute__((__packed__)); /* 32 bytes */ 36 + 37 + struct pvclock_wall_clock { 38 + u32 version; 39 + u32 sec; 40 + u32 nsec; 41 + } __attribute__((__packed__)); 42 + 43 + #define PVCLOCK_TSC_STABLE_BIT (1 << 0) 44 + #define PVCLOCK_GUEST_STOPPED (1 << 1) 45 + /* PVCLOCK_COUNTS_FROM_ZERO broke ABI and can't be used anymore. */ 46 + #define PVCLOCK_COUNTS_FROM_ZERO (1 << 2) 47 + #endif /* __ASSEMBLY__ */ 48 + #endif /* _ASM_X86_PVCLOCK_ABI_H */
+103
tools/arch/x86/include/asm/pvclock.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _ASM_X86_PVCLOCK_H 3 + #define _ASM_X86_PVCLOCK_H 4 + 5 + #include <asm/barrier.h> 6 + #include <asm/pvclock-abi.h> 7 + 8 + /* some helper functions for xen and kvm pv clock sources */ 9 + u64 pvclock_clocksource_read(struct pvclock_vcpu_time_info *src); 10 + u8 pvclock_read_flags(struct pvclock_vcpu_time_info *src); 11 + void pvclock_set_flags(u8 flags); 12 + unsigned long pvclock_tsc_khz(struct pvclock_vcpu_time_info *src); 13 + void pvclock_resume(void); 14 + 15 + void pvclock_touch_watchdogs(void); 16 + 17 + static __always_inline 18 + unsigned pvclock_read_begin(const struct pvclock_vcpu_time_info *src) 19 + { 20 + unsigned version = src->version & ~1; 21 + /* Make sure that the version is read before the data. */ 22 + rmb(); 23 + return version; 24 + } 25 + 26 + static __always_inline 27 + bool pvclock_read_retry(const struct pvclock_vcpu_time_info *src, 28 + unsigned version) 29 + { 30 + /* Make sure that the version is re-read after the data. */ 31 + rmb(); 32 + return version != src->version; 33 + } 34 + 35 + /* 36 + * Scale a 64-bit delta by scaling and multiplying by a 32-bit fraction, 37 + * yielding a 64-bit result. 
38 + */ 39 + static inline u64 pvclock_scale_delta(u64 delta, u32 mul_frac, int shift) 40 + { 41 + u64 product; 42 + #ifdef __i386__ 43 + u32 tmp1, tmp2; 44 + #else 45 + unsigned long tmp; 46 + #endif 47 + 48 + if (shift < 0) 49 + delta >>= -shift; 50 + else 51 + delta <<= shift; 52 + 53 + #ifdef __i386__ 54 + __asm__ ( 55 + "mul %5 ; " 56 + "mov %4,%%eax ; " 57 + "mov %%edx,%4 ; " 58 + "mul %5 ; " 59 + "xor %5,%5 ; " 60 + "add %4,%%eax ; " 61 + "adc %5,%%edx ; " 62 + : "=A" (product), "=r" (tmp1), "=r" (tmp2) 63 + : "a" ((u32)delta), "1" ((u32)(delta >> 32)), "2" (mul_frac) ); 64 + #elif defined(__x86_64__) 65 + __asm__ ( 66 + "mulq %[mul_frac] ; shrd $32, %[hi], %[lo]" 67 + : [lo]"=a"(product), 68 + [hi]"=d"(tmp) 69 + : "0"(delta), 70 + [mul_frac]"rm"((u64)mul_frac)); 71 + #else 72 + #error implement me! 73 + #endif 74 + 75 + return product; 76 + } 77 + 78 + static __always_inline 79 + u64 __pvclock_read_cycles(const struct pvclock_vcpu_time_info *src, u64 tsc) 80 + { 81 + u64 delta = tsc - src->tsc_timestamp; 82 + u64 offset = pvclock_scale_delta(delta, src->tsc_to_system_mul, 83 + src->tsc_shift); 84 + return src->system_time + offset; 85 + } 86 + 87 + struct pvclock_vsyscall_time_info { 88 + struct pvclock_vcpu_time_info pvti; 89 + } __attribute__((__aligned__(64))); 90 + 91 + #define PVTI_SIZE sizeof(struct pvclock_vsyscall_time_info) 92 + 93 + #ifdef CONFIG_PARAVIRT_CLOCK 94 + void pvclock_set_pvti_cpu0_va(struct pvclock_vsyscall_time_info *pvti); 95 + struct pvclock_vsyscall_time_info *pvclock_get_pvti_cpu0_va(void); 96 + #else 97 + static inline struct pvclock_vsyscall_time_info *pvclock_get_pvti_cpu0_va(void) 98 + { 99 + return NULL; 100 + } 101 + #endif 102 + 103 + #endif /* _ASM_X86_PVCLOCK_H */
+3
tools/testing/selftests/kvm/.gitignore
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 + /aarch64/arch_timer 2 3 /aarch64/debug-exceptions 3 4 /aarch64/get-reg-list 4 5 /aarch64/psci_cpu_on_test ··· 13 12 /x86_64/emulator_error_test 14 13 /x86_64/get_cpuid_test 15 14 /x86_64/get_msr_index_features 15 + /x86_64/kvm_clock_test 16 16 /x86_64/kvm_pv_test 17 17 /x86_64/hyperv_clock 18 18 /x86_64/hyperv_cpuid ··· 55 53 /set_memory_region_test 56 54 /steal_time 57 55 /kvm_binary_stats_test 56 + /system_counter_offset_test
+6 -1
tools/testing/selftests/kvm/Makefile
··· 35 35 36 36 LIBKVM = lib/assert.c lib/elf.c lib/io.c lib/kvm_util.c lib/rbtree.c lib/sparsebit.c lib/test_util.c lib/guest_modes.c lib/perf_test_util.c 37 37 LIBKVM_x86_64 = lib/x86_64/apic.c lib/x86_64/processor.c lib/x86_64/vmx.c lib/x86_64/svm.c lib/x86_64/ucall.c lib/x86_64/handlers.S 38 - LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S 38 + LIBKVM_aarch64 = lib/aarch64/processor.c lib/aarch64/ucall.c lib/aarch64/handlers.S lib/aarch64/spinlock.c lib/aarch64/gic.c lib/aarch64/gic_v3.c lib/aarch64/vgic.c 39 39 LIBKVM_s390x = lib/s390x/processor.c lib/s390x/ucall.c lib/s390x/diag318_test_handler.c 40 40 41 41 TEST_GEN_PROGS_x86_64 = x86_64/cr4_cpuid_sync_test ··· 46 46 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_clock 47 47 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_cpuid 48 48 TEST_GEN_PROGS_x86_64 += x86_64/hyperv_features 49 + TEST_GEN_PROGS_x86_64 += x86_64/kvm_clock_test 49 50 TEST_GEN_PROGS_x86_64 += x86_64/kvm_pv_test 50 51 TEST_GEN_PROGS_x86_64 += x86_64/mmio_warning_test 51 52 TEST_GEN_PROGS_x86_64 += x86_64/mmu_role_test ··· 86 85 TEST_GEN_PROGS_x86_64 += set_memory_region_test 87 86 TEST_GEN_PROGS_x86_64 += steal_time 88 87 TEST_GEN_PROGS_x86_64 += kvm_binary_stats_test 88 + TEST_GEN_PROGS_x86_64 += system_counter_offset_test 89 89 90 + TEST_GEN_PROGS_aarch64 += aarch64/arch_timer 90 91 TEST_GEN_PROGS_aarch64 += aarch64/debug-exceptions 91 92 TEST_GEN_PROGS_aarch64 += aarch64/get-reg-list 92 93 TEST_GEN_PROGS_aarch64 += aarch64/psci_cpu_on_test ··· 98 95 TEST_GEN_PROGS_aarch64 += dirty_log_perf_test 99 96 TEST_GEN_PROGS_aarch64 += kvm_create_max_vcpus 100 97 TEST_GEN_PROGS_aarch64 += kvm_page_table_test 98 + TEST_GEN_PROGS_aarch64 += memslot_modification_stress_test 99 + TEST_GEN_PROGS_aarch64 += memslot_perf_test 101 100 TEST_GEN_PROGS_aarch64 += rseq_test 102 101 TEST_GEN_PROGS_aarch64 += set_memory_region_test 103 102 TEST_GEN_PROGS_aarch64 += steal_time
+479
tools/testing/selftests/kvm/aarch64/arch_timer.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * arch_timer.c - Tests the aarch64 timer IRQ functionality 4 + * 5 + * The test validates both the virtual and physical timer IRQs using 6 + * CVAL and TVAL registers. This constitutes the four stages in the test. 7 + * The guest's main thread configures the timer interrupt for a stage 8 + * and waits for it to fire, with a timeout equal to the timer period. 9 + * It asserts that the timeout doesn't exceed the timer period. 10 + * 11 + * On the other hand, upon receipt of an interrupt, the guest's interrupt 12 + * handler validates the interrupt by checking if the architectural state 13 + * is in compliance with the specifications. 14 + * 15 + * The test provides command-line options to configure the timer's 16 + * period (-p), number of vCPUs (-n), and iterations per stage (-i). 17 + * To stress-test the timer stack even more, an option to migrate the 18 + * vCPUs across pCPUs (-m), at a particular rate, is also provided. 19 + * 20 + * Copyright (c) 2021, Google LLC. 
21 + */ 22 + 23 + #define _GNU_SOURCE 24 + 25 + #include <stdlib.h> 26 + #include <pthread.h> 27 + #include <linux/kvm.h> 28 + #include <linux/sizes.h> 29 + #include <linux/bitmap.h> 30 + #include <sys/sysinfo.h> 31 + 32 + #include "kvm_util.h" 33 + #include "processor.h" 34 + #include "delay.h" 35 + #include "arch_timer.h" 36 + #include "gic.h" 37 + #include "vgic.h" 38 + 39 + #define NR_VCPUS_DEF 4 40 + #define NR_TEST_ITERS_DEF 5 41 + #define TIMER_TEST_PERIOD_MS_DEF 10 42 + #define TIMER_TEST_ERR_MARGIN_US 100 43 + #define TIMER_TEST_MIGRATION_FREQ_MS 2 44 + 45 + struct test_args { 46 + int nr_vcpus; 47 + int nr_iter; 48 + int timer_period_ms; 49 + int migration_freq_ms; 50 + }; 51 + 52 + static struct test_args test_args = { 53 + .nr_vcpus = NR_VCPUS_DEF, 54 + .nr_iter = NR_TEST_ITERS_DEF, 55 + .timer_period_ms = TIMER_TEST_PERIOD_MS_DEF, 56 + .migration_freq_ms = TIMER_TEST_MIGRATION_FREQ_MS, 57 + }; 58 + 59 + #define msecs_to_usecs(msec) ((msec) * 1000LL) 60 + 61 + #define GICD_BASE_GPA 0x8000000ULL 62 + #define GICR_BASE_GPA 0x80A0000ULL 63 + 64 + enum guest_stage { 65 + GUEST_STAGE_VTIMER_CVAL = 1, 66 + GUEST_STAGE_VTIMER_TVAL, 67 + GUEST_STAGE_PTIMER_CVAL, 68 + GUEST_STAGE_PTIMER_TVAL, 69 + GUEST_STAGE_MAX, 70 + }; 71 + 72 + /* Shared variables between host and guest */ 73 + struct test_vcpu_shared_data { 74 + int nr_iter; 75 + enum guest_stage guest_stage; 76 + uint64_t xcnt; 77 + }; 78 + 79 + struct test_vcpu { 80 + uint32_t vcpuid; 81 + pthread_t pt_vcpu_run; 82 + struct kvm_vm *vm; 83 + }; 84 + 85 + static struct test_vcpu test_vcpu[KVM_MAX_VCPUS]; 86 + static struct test_vcpu_shared_data vcpu_shared_data[KVM_MAX_VCPUS]; 87 + 88 + static int vtimer_irq, ptimer_irq; 89 + 90 + static unsigned long *vcpu_done_map; 91 + static pthread_mutex_t vcpu_done_map_lock; 92 + 93 + static void 94 + guest_configure_timer_action(struct test_vcpu_shared_data *shared_data) 95 + { 96 + switch (shared_data->guest_stage) { 97 + case GUEST_STAGE_VTIMER_CVAL: 98 + 
timer_set_next_cval_ms(VIRTUAL, test_args.timer_period_ms); 99 + shared_data->xcnt = timer_get_cntct(VIRTUAL); 100 + timer_set_ctl(VIRTUAL, CTL_ENABLE); 101 + break; 102 + case GUEST_STAGE_VTIMER_TVAL: 103 + timer_set_next_tval_ms(VIRTUAL, test_args.timer_period_ms); 104 + shared_data->xcnt = timer_get_cntct(VIRTUAL); 105 + timer_set_ctl(VIRTUAL, CTL_ENABLE); 106 + break; 107 + case GUEST_STAGE_PTIMER_CVAL: 108 + timer_set_next_cval_ms(PHYSICAL, test_args.timer_period_ms); 109 + shared_data->xcnt = timer_get_cntct(PHYSICAL); 110 + timer_set_ctl(PHYSICAL, CTL_ENABLE); 111 + break; 112 + case GUEST_STAGE_PTIMER_TVAL: 113 + timer_set_next_tval_ms(PHYSICAL, test_args.timer_period_ms); 114 + shared_data->xcnt = timer_get_cntct(PHYSICAL); 115 + timer_set_ctl(PHYSICAL, CTL_ENABLE); 116 + break; 117 + default: 118 + GUEST_ASSERT(0); 119 + } 120 + } 121 + 122 + static void guest_validate_irq(unsigned int intid, 123 + struct test_vcpu_shared_data *shared_data) 124 + { 125 + enum guest_stage stage = shared_data->guest_stage; 126 + uint64_t xcnt = 0, xcnt_diff_us, cval = 0; 127 + unsigned long xctl = 0; 128 + unsigned int timer_irq = 0; 129 + 130 + if (stage == GUEST_STAGE_VTIMER_CVAL || 131 + stage == GUEST_STAGE_VTIMER_TVAL) { 132 + xctl = timer_get_ctl(VIRTUAL); 133 + timer_set_ctl(VIRTUAL, CTL_IMASK); 134 + xcnt = timer_get_cntct(VIRTUAL); 135 + cval = timer_get_cval(VIRTUAL); 136 + timer_irq = vtimer_irq; 137 + } else if (stage == GUEST_STAGE_PTIMER_CVAL || 138 + stage == GUEST_STAGE_PTIMER_TVAL) { 139 + xctl = timer_get_ctl(PHYSICAL); 140 + timer_set_ctl(PHYSICAL, CTL_IMASK); 141 + xcnt = timer_get_cntct(PHYSICAL); 142 + cval = timer_get_cval(PHYSICAL); 143 + timer_irq = ptimer_irq; 144 + } else { 145 + GUEST_ASSERT(0); 146 + } 147 + 148 + xcnt_diff_us = cycles_to_usec(xcnt - shared_data->xcnt); 149 + 150 + /* Make sure we are dealing with the correct timer IRQ */ 151 + GUEST_ASSERT_2(intid == timer_irq, intid, timer_irq); 152 + 153 + /* Basic 'timer condition met' check 
*/ 154 + GUEST_ASSERT_3(xcnt >= cval, xcnt, cval, xcnt_diff_us); 155 + GUEST_ASSERT_1(xctl & CTL_ISTATUS, xctl); 156 + } 157 + 158 + static void guest_irq_handler(struct ex_regs *regs) 159 + { 160 + unsigned int intid = gic_get_and_ack_irq(); 161 + uint32_t cpu = guest_get_vcpuid(); 162 + struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu]; 163 + 164 + guest_validate_irq(intid, shared_data); 165 + 166 + WRITE_ONCE(shared_data->nr_iter, shared_data->nr_iter + 1); 167 + 168 + gic_set_eoi(intid); 169 + } 170 + 171 + static void guest_run_stage(struct test_vcpu_shared_data *shared_data, 172 + enum guest_stage stage) 173 + { 174 + uint32_t irq_iter, config_iter; 175 + 176 + shared_data->guest_stage = stage; 177 + shared_data->nr_iter = 0; 178 + 179 + for (config_iter = 0; config_iter < test_args.nr_iter; config_iter++) { 180 + /* Setup the next interrupt */ 181 + guest_configure_timer_action(shared_data); 182 + 183 + /* Setup a timeout for the interrupt to arrive */ 184 + udelay(msecs_to_usecs(test_args.timer_period_ms) + 185 + TIMER_TEST_ERR_MARGIN_US); 186 + 187 + irq_iter = READ_ONCE(shared_data->nr_iter); 188 + GUEST_ASSERT_2(config_iter + 1 == irq_iter, 189 + config_iter + 1, irq_iter); 190 + } 191 + } 192 + 193 + static void guest_code(void) 194 + { 195 + uint32_t cpu = guest_get_vcpuid(); 196 + struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[cpu]; 197 + 198 + local_irq_disable(); 199 + 200 + gic_init(GIC_V3, test_args.nr_vcpus, 201 + (void *)GICD_BASE_GPA, (void *)GICR_BASE_GPA); 202 + 203 + timer_set_ctl(VIRTUAL, CTL_IMASK); 204 + timer_set_ctl(PHYSICAL, CTL_IMASK); 205 + 206 + gic_irq_enable(vtimer_irq); 207 + gic_irq_enable(ptimer_irq); 208 + local_irq_enable(); 209 + 210 + guest_run_stage(shared_data, GUEST_STAGE_VTIMER_CVAL); 211 + guest_run_stage(shared_data, GUEST_STAGE_VTIMER_TVAL); 212 + guest_run_stage(shared_data, GUEST_STAGE_PTIMER_CVAL); 213 + guest_run_stage(shared_data, GUEST_STAGE_PTIMER_TVAL); 214 + 215 + 
GUEST_DONE(); 216 + } 217 + 218 + static void *test_vcpu_run(void *arg) 219 + { 220 + struct ucall uc; 221 + struct test_vcpu *vcpu = arg; 222 + struct kvm_vm *vm = vcpu->vm; 223 + uint32_t vcpuid = vcpu->vcpuid; 224 + struct test_vcpu_shared_data *shared_data = &vcpu_shared_data[vcpuid]; 225 + 226 + vcpu_run(vm, vcpuid); 227 + 228 + /* Currently, any exit from guest is an indication of completion */ 229 + pthread_mutex_lock(&vcpu_done_map_lock); 230 + set_bit(vcpuid, vcpu_done_map); 231 + pthread_mutex_unlock(&vcpu_done_map_lock); 232 + 233 + switch (get_ucall(vm, vcpuid, &uc)) { 234 + case UCALL_SYNC: 235 + case UCALL_DONE: 236 + break; 237 + case UCALL_ABORT: 238 + sync_global_from_guest(vm, *shared_data); 239 + TEST_FAIL("%s at %s:%ld\n\tvalues: %lu, %lu; %lu, vcpu: %u; stage: %u; iter: %u", 240 + (const char *)uc.args[0], __FILE__, uc.args[1], 241 + uc.args[2], uc.args[3], uc.args[4], vcpuid, 242 + shared_data->guest_stage, shared_data->nr_iter); 243 + break; 244 + default: 245 + TEST_FAIL("Unexpected guest exit\n"); 246 + } 247 + 248 + return NULL; 249 + } 250 + 251 + static uint32_t test_get_pcpu(void) 252 + { 253 + uint32_t pcpu; 254 + unsigned int nproc_conf; 255 + cpu_set_t online_cpuset; 256 + 257 + nproc_conf = get_nprocs_conf(); 258 + sched_getaffinity(0, sizeof(cpu_set_t), &online_cpuset); 259 + 260 + /* Randomly find an available pCPU to place a vCPU on */ 261 + do { 262 + pcpu = rand() % nproc_conf; 263 + } while (!CPU_ISSET(pcpu, &online_cpuset)); 264 + 265 + return pcpu; 266 + } 267 + 268 + static int test_migrate_vcpu(struct test_vcpu *vcpu) 269 + { 270 + int ret; 271 + cpu_set_t cpuset; 272 + uint32_t new_pcpu = test_get_pcpu(); 273 + 274 + CPU_ZERO(&cpuset); 275 + CPU_SET(new_pcpu, &cpuset); 276 + 277 + pr_debug("Migrating vCPU: %u to pCPU: %u\n", vcpu->vcpuid, new_pcpu); 278 + 279 + ret = pthread_setaffinity_np(vcpu->pt_vcpu_run, 280 + sizeof(cpuset), &cpuset); 281 + 282 + /* Allow the error where the vCPU thread is already finished */ 283 + 
TEST_ASSERT(ret == 0 || ret == ESRCH, 284 + "Failed to migrate the vCPU:%u to pCPU: %u; ret: %d\n", 285 + vcpu->vcpuid, new_pcpu, ret); 286 + 287 + return ret; 288 + } 289 + 290 + static void *test_vcpu_migration(void *arg) 291 + { 292 + unsigned int i, n_done; 293 + bool vcpu_done; 294 + 295 + do { 296 + usleep(msecs_to_usecs(test_args.migration_freq_ms)); 297 + 298 + for (n_done = 0, i = 0; i < test_args.nr_vcpus; i++) { 299 + pthread_mutex_lock(&vcpu_done_map_lock); 300 + vcpu_done = test_bit(i, vcpu_done_map); 301 + pthread_mutex_unlock(&vcpu_done_map_lock); 302 + 303 + if (vcpu_done) { 304 + n_done++; 305 + continue; 306 + } 307 + 308 + test_migrate_vcpu(&test_vcpu[i]); 309 + } 310 + } while (test_args.nr_vcpus != n_done); 311 + 312 + return NULL; 313 + } 314 + 315 + static void test_run(struct kvm_vm *vm) 316 + { 317 + int i, ret; 318 + pthread_t pt_vcpu_migration; 319 + 320 + pthread_mutex_init(&vcpu_done_map_lock, NULL); 321 + vcpu_done_map = bitmap_zalloc(test_args.nr_vcpus); 322 + TEST_ASSERT(vcpu_done_map, "Failed to allocate vcpu done bitmap\n"); 323 + 324 + for (i = 0; i < test_args.nr_vcpus; i++) { 325 + ret = pthread_create(&test_vcpu[i].pt_vcpu_run, NULL, 326 + test_vcpu_run, &test_vcpu[i]); 327 + TEST_ASSERT(!ret, "Failed to create vCPU-%d pthread\n", i); 328 + } 329 + 330 + /* Spawn a thread to control the vCPU migrations */ 331 + if (test_args.migration_freq_ms) { 332 + srand(time(NULL)); 333 + 334 + ret = pthread_create(&pt_vcpu_migration, NULL, 335 + test_vcpu_migration, NULL); 336 + TEST_ASSERT(!ret, "Failed to create the migration pthread\n"); 337 + } 338 + 339 + 340 + for (i = 0; i < test_args.nr_vcpus; i++) 341 + pthread_join(test_vcpu[i].pt_vcpu_run, NULL); 342 + 343 + if (test_args.migration_freq_ms) 344 + pthread_join(pt_vcpu_migration, NULL); 345 + 346 + bitmap_free(vcpu_done_map); 347 + } 348 + 349 + static void test_init_timer_irq(struct kvm_vm *vm) 350 + { 351 + /* Timer initid should be same for all the vCPUs, so query only vCPU-0 
*/ 352 + int vcpu0_fd = vcpu_get_fd(vm, 0); 353 + 354 + kvm_device_access(vcpu0_fd, KVM_ARM_VCPU_TIMER_CTRL, 355 + KVM_ARM_VCPU_TIMER_IRQ_PTIMER, &ptimer_irq, false); 356 + kvm_device_access(vcpu0_fd, KVM_ARM_VCPU_TIMER_CTRL, 357 + KVM_ARM_VCPU_TIMER_IRQ_VTIMER, &vtimer_irq, false); 358 + 359 + sync_global_to_guest(vm, ptimer_irq); 360 + sync_global_to_guest(vm, vtimer_irq); 361 + 362 + pr_debug("ptimer_irq: %d; vtimer_irq: %d\n", ptimer_irq, vtimer_irq); 363 + } 364 + 365 + static struct kvm_vm *test_vm_create(void) 366 + { 367 + struct kvm_vm *vm; 368 + unsigned int i; 369 + int nr_vcpus = test_args.nr_vcpus; 370 + 371 + vm = vm_create_default_with_vcpus(nr_vcpus, 0, 0, guest_code, NULL); 372 + 373 + vm_init_descriptor_tables(vm); 374 + vm_install_exception_handler(vm, VECTOR_IRQ_CURRENT, guest_irq_handler); 375 + 376 + for (i = 0; i < nr_vcpus; i++) { 377 + vcpu_init_descriptor_tables(vm, i); 378 + 379 + test_vcpu[i].vcpuid = i; 380 + test_vcpu[i].vm = vm; 381 + } 382 + 383 + ucall_init(vm, NULL); 384 + test_init_timer_irq(vm); 385 + vgic_v3_setup(vm, nr_vcpus, GICD_BASE_GPA, GICR_BASE_GPA); 386 + 387 + /* Make all the test's cmdline args visible to the guest */ 388 + sync_global_to_guest(vm, test_args); 389 + 390 + return vm; 391 + } 392 + 393 + static void test_print_help(char *name) 394 + { 395 + pr_info("Usage: %s [-h] [-n nr_vcpus] [-i iterations] [-p timer_period_ms]\n", 396 + name); 397 + pr_info("\t-n: Number of vCPUs to configure (default: %u; max: %u)\n", 398 + NR_VCPUS_DEF, KVM_MAX_VCPUS); 399 + pr_info("\t-i: Number of iterations per stage (default: %u)\n", 400 + NR_TEST_ITERS_DEF); 401 + pr_info("\t-p: Periodicity (in ms) of the guest timer (default: %u)\n", 402 + TIMER_TEST_PERIOD_MS_DEF); 403 + pr_info("\t-m: Frequency (in ms) of vCPUs to migrate to different pCPU. 
0 to turn off (default: %u)\n", 404 + TIMER_TEST_MIGRATION_FREQ_MS); 405 + pr_info("\t-h: print this help screen\n"); 406 + } 407 + 408 + static bool parse_args(int argc, char *argv[]) 409 + { 410 + int opt; 411 + 412 + while ((opt = getopt(argc, argv, "hn:i:p:m:")) != -1) { 413 + switch (opt) { 414 + case 'n': 415 + test_args.nr_vcpus = atoi(optarg); 416 + if (test_args.nr_vcpus <= 0) { 417 + pr_info("Positive value needed for -n\n"); 418 + goto err; 419 + } else if (test_args.nr_vcpus > KVM_MAX_VCPUS) { 420 + pr_info("Max allowed vCPUs: %u\n", 421 + KVM_MAX_VCPUS); 422 + goto err; 423 + } 424 + break; 425 + case 'i': 426 + test_args.nr_iter = atoi(optarg); 427 + if (test_args.nr_iter <= 0) { 428 + pr_info("Positive value needed for -i\n"); 429 + goto err; 430 + } 431 + break; 432 + case 'p': 433 + test_args.timer_period_ms = atoi(optarg); 434 + if (test_args.timer_period_ms <= 0) { 435 + pr_info("Positive value needed for -p\n"); 436 + goto err; 437 + } 438 + break; 439 + case 'm': 440 + test_args.migration_freq_ms = atoi(optarg); 441 + if (test_args.migration_freq_ms < 0) { 442 + pr_info("0 or positive value needed for -m\n"); 443 + goto err; 444 + } 445 + break; 446 + case 'h': 447 + default: 448 + goto err; 449 + } 450 + } 451 + 452 + return true; 453 + 454 + err: 455 + test_print_help(argv[0]); 456 + return false; 457 + } 458 + 459 + int main(int argc, char *argv[]) 460 + { 461 + struct kvm_vm *vm; 462 + 463 + /* Tell stdout not to buffer its content */ 464 + setbuf(stdout, NULL); 465 + 466 + if (!parse_args(argc, argv)) 467 + exit(KSFT_SKIP); 468 + 469 + if (test_args.migration_freq_ms && get_nprocs() < 2) { 470 + print_skip("At least two physical CPUs needed for vCPU migration"); 471 + exit(KSFT_SKIP); 472 + } 473 + 474 + vm = test_vm_create(); 475 + test_run(vm); 476 + kvm_vm_free(vm); 477 + 478 + return 0; 479 + }
+15 -15
tools/testing/selftests/kvm/aarch64/debug-exceptions.c
··· 34 34 { 35 35 asm volatile("msr daifset, #8"); 36 36 37 - write_sysreg(osdlr_el1, 0); 38 - write_sysreg(oslar_el1, 0); 37 + write_sysreg(0, osdlr_el1); 38 + write_sysreg(0, oslar_el1); 39 39 isb(); 40 40 41 - write_sysreg(mdscr_el1, 0); 41 + write_sysreg(0, mdscr_el1); 42 42 /* This test only uses the first bp and wp slot. */ 43 - write_sysreg(dbgbvr0_el1, 0); 44 - write_sysreg(dbgbcr0_el1, 0); 45 - write_sysreg(dbgwcr0_el1, 0); 46 - write_sysreg(dbgwvr0_el1, 0); 43 + write_sysreg(0, dbgbvr0_el1); 44 + write_sysreg(0, dbgbcr0_el1); 45 + write_sysreg(0, dbgwcr0_el1); 46 + write_sysreg(0, dbgwvr0_el1); 47 47 isb(); 48 48 } 49 49 ··· 53 53 uint32_t mdscr; 54 54 55 55 wcr = DBGWCR_LEN8 | DBGWCR_RD | DBGWCR_WR | DBGWCR_EL1 | DBGWCR_E; 56 - write_sysreg(dbgwcr0_el1, wcr); 57 - write_sysreg(dbgwvr0_el1, addr); 56 + write_sysreg(wcr, dbgwcr0_el1); 57 + write_sysreg(addr, dbgwvr0_el1); 58 58 isb(); 59 59 60 60 asm volatile("msr daifclr, #8"); 61 61 62 62 mdscr = read_sysreg(mdscr_el1) | MDSCR_KDE | MDSCR_MDE; 63 - write_sysreg(mdscr_el1, mdscr); 63 + write_sysreg(mdscr, mdscr_el1); 64 64 isb(); 65 65 } 66 66 ··· 70 70 uint32_t mdscr; 71 71 72 72 bcr = DBGBCR_LEN8 | DBGBCR_EXEC | DBGBCR_EL1 | DBGBCR_E; 73 - write_sysreg(dbgbcr0_el1, bcr); 74 - write_sysreg(dbgbvr0_el1, addr); 73 + write_sysreg(bcr, dbgbcr0_el1); 74 + write_sysreg(addr, dbgbvr0_el1); 75 75 isb(); 76 76 77 77 asm volatile("msr daifclr, #8"); 78 78 79 79 mdscr = read_sysreg(mdscr_el1) | MDSCR_KDE | MDSCR_MDE; 80 - write_sysreg(mdscr_el1, mdscr); 80 + write_sysreg(mdscr, mdscr_el1); 81 81 isb(); 82 82 } 83 83 ··· 88 88 asm volatile("msr daifclr, #8"); 89 89 90 90 mdscr = read_sysreg(mdscr_el1) | MDSCR_KDE | MDSCR_SS; 91 - write_sysreg(mdscr_el1, mdscr); 91 + write_sysreg(mdscr, mdscr_el1); 92 92 isb(); 93 93 } 94 94 ··· 190 190 { 191 191 uint64_t id_aa64dfr0; 192 192 193 - get_reg(vm, VCPU_ID, ARM64_SYS_REG(ID_AA64DFR0_EL1), &id_aa64dfr0); 193 + get_reg(vm, VCPU_ID, KVM_ARM64_SYS_REG(SYS_ID_AA64DFR0_EL1), 
&id_aa64dfr0); 194 194 return id_aa64dfr0 & 0xf; 195 195 } 196 196
+1 -1
tools/testing/selftests/kvm/aarch64/psci_cpu_on_test.c
··· 91 91 init.features[0] |= (1 << KVM_ARM_VCPU_POWER_OFF); 92 92 aarch64_vcpu_add_default(vm, VCPU_ID_TARGET, &init, guest_main); 93 93 94 - get_reg(vm, VCPU_ID_TARGET, ARM64_SYS_REG(MPIDR_EL1), &target_mpidr); 94 + get_reg(vm, VCPU_ID_TARGET, KVM_ARM64_SYS_REG(SYS_MPIDR_EL1), &target_mpidr); 95 95 vcpu_args_set(vm, VCPU_ID_SOURCE, 1, target_mpidr & MPIDR_HWID_BITMASK); 96 96 vcpu_run(vm, VCPU_ID_SOURCE); 97 97
+270 -99
tools/testing/selftests/kvm/aarch64/vgic_init.c
··· 13 13 #include "test_util.h" 14 14 #include "kvm_util.h" 15 15 #include "processor.h" 16 + #include "vgic.h" 16 17 17 18 #define NR_VCPUS 4 18 19 19 - #define REDIST_REGION_ATTR_ADDR(count, base, flags, index) (((uint64_t)(count) << 52) | \ 20 - ((uint64_t)((base) >> 16) << 16) | ((uint64_t)(flags) << 12) | index) 21 20 #define REG_OFFSET(vcpu, offset) (((uint64_t)vcpu << 32) | offset) 22 21 23 22 #define GICR_TYPER 0x8 24 23 24 + #define VGIC_DEV_IS_V2(_d) ((_d) == KVM_DEV_TYPE_ARM_VGIC_V2) 25 + #define VGIC_DEV_IS_V3(_d) ((_d) == KVM_DEV_TYPE_ARM_VGIC_V3) 26 + 25 27 struct vm_gic { 26 28 struct kvm_vm *vm; 27 29 int gic_fd; 30 + uint32_t gic_dev_type; 28 31 }; 29 32 30 - static int max_ipa_bits; 33 + static uint64_t max_phys_size; 31 34 32 35 /* helper to access a redistributor register */ 33 - static int access_redist_reg(int gicv3_fd, int vcpu, int offset, 34 - uint32_t *val, bool write) 36 + static int access_v3_redist_reg(int gicv3_fd, int vcpu, int offset, 37 + uint32_t *val, bool write) 35 38 { 36 39 uint64_t attr = REG_OFFSET(vcpu, offset); 37 40 ··· 61 58 return 0; 62 59 } 63 60 64 - static struct vm_gic vm_gic_create(void) 61 + static struct vm_gic vm_gic_create_with_vcpus(uint32_t gic_dev_type, uint32_t nr_vcpus) 65 62 { 66 63 struct vm_gic v; 67 64 68 - v.vm = vm_create_default_with_vcpus(NR_VCPUS, 0, 0, guest_code, NULL); 69 - v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false); 65 + v.gic_dev_type = gic_dev_type; 66 + v.vm = vm_create_default_with_vcpus(nr_vcpus, 0, 0, guest_code, NULL); 67 + v.gic_fd = kvm_create_device(v.vm, gic_dev_type, false); 70 68 71 69 return v; 72 70 } ··· 78 74 kvm_vm_free(v->vm); 79 75 } 80 76 77 + struct vgic_region_attr { 78 + uint64_t attr; 79 + uint64_t size; 80 + uint64_t alignment; 81 + }; 82 + 83 + struct vgic_region_attr gic_v3_dist_region = { 84 + .attr = KVM_VGIC_V3_ADDR_TYPE_DIST, 85 + .size = 0x10000, 86 + .alignment = 0x10000, 87 + }; 88 + 89 + struct vgic_region_attr gic_v3_redist_region = 
{ 90 + .attr = KVM_VGIC_V3_ADDR_TYPE_REDIST, 91 + .size = NR_VCPUS * 0x20000, 92 + .alignment = 0x10000, 93 + }; 94 + 95 + struct vgic_region_attr gic_v2_dist_region = { 96 + .attr = KVM_VGIC_V2_ADDR_TYPE_DIST, 97 + .size = 0x1000, 98 + .alignment = 0x1000, 99 + }; 100 + 101 + struct vgic_region_attr gic_v2_cpu_region = { 102 + .attr = KVM_VGIC_V2_ADDR_TYPE_CPU, 103 + .size = 0x2000, 104 + .alignment = 0x1000, 105 + }; 106 + 81 107 /** 82 - * Helper routine that performs KVM device tests in general and 83 - * especially ARM_VGIC_V3 ones. Eventually the ARM_VGIC_V3 84 - * device gets created, a legacy RDIST region is set at @0x0 85 - * and a DIST region is set @0x60000 108 + * Helper routine that performs KVM device tests in general. Eventually the 109 + * ARM_VGIC (GICv2 or GICv3) device gets created with an overlapping 110 + * DIST/REDIST (or DIST/CPUIF for GICv2). Assumption is 4 vcpus are going to be 111 + * used hence the overlap. In the case of GICv3, A RDIST region is set at @0x0 112 + * and a DIST region is set @0x70000. The GICv2 case sets a CPUIF @0x0 and a 113 + * DIST region @0x1000. 86 114 */ 87 115 static void subtest_dist_rdist(struct vm_gic *v) 88 116 { 89 117 int ret; 90 118 uint64_t addr; 119 + struct vgic_region_attr rdist; /* CPU interface in GICv2*/ 120 + struct vgic_region_attr dist; 121 + 122 + rdist = VGIC_DEV_IS_V3(v->gic_dev_type) ? gic_v3_redist_region 123 + : gic_v2_cpu_region; 124 + dist = VGIC_DEV_IS_V3(v->gic_dev_type) ? 
gic_v3_dist_region 125 + : gic_v2_dist_region; 91 126 92 127 /* Check existing group/attributes */ 93 128 kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 94 - KVM_VGIC_V3_ADDR_TYPE_DIST); 129 + dist.attr); 95 130 96 131 kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 97 - KVM_VGIC_V3_ADDR_TYPE_REDIST); 132 + rdist.attr); 98 133 99 134 /* check non existing attribute */ 100 - ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 0); 135 + ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, -1); 101 136 TEST_ASSERT(ret && errno == ENXIO, "attribute not supported"); 102 137 103 138 /* misaligned DIST and REDIST address settings */ 104 - addr = 0x1000; 139 + addr = dist.alignment / 0x10; 105 140 ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 106 - KVM_VGIC_V3_ADDR_TYPE_DIST, &addr, true); 107 - TEST_ASSERT(ret && errno == EINVAL, "GICv3 dist base not 64kB aligned"); 141 + dist.attr, &addr, true); 142 + TEST_ASSERT(ret && errno == EINVAL, "GIC dist base not aligned"); 108 143 144 + addr = rdist.alignment / 0x10; 109 145 ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 110 - KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true); 111 - TEST_ASSERT(ret && errno == EINVAL, "GICv3 redist base not 64kB aligned"); 146 + rdist.attr, &addr, true); 147 + TEST_ASSERT(ret && errno == EINVAL, "GIC redist/cpu base not aligned"); 112 148 113 149 /* out of range address */ 114 - if (max_ipa_bits) { 115 - addr = 1ULL << max_ipa_bits; 116 - ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 117 - KVM_VGIC_V3_ADDR_TYPE_DIST, &addr, true); 118 - TEST_ASSERT(ret && errno == E2BIG, "dist address beyond IPA limit"); 150 + addr = max_phys_size; 151 + ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 152 + dist.attr, &addr, true); 153 + TEST_ASSERT(ret && errno == E2BIG, "dist address beyond IPA limit"); 119 154 120 - ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 121 - 
KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true); 122 - TEST_ASSERT(ret && errno == E2BIG, "redist address beyond IPA limit"); 123 - } 155 + ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 156 + rdist.attr, &addr, true); 157 + TEST_ASSERT(ret && errno == E2BIG, "redist address beyond IPA limit"); 158 + 159 + /* Space for half a rdist (a rdist is: 2 * rdist.alignment). */ 160 + addr = max_phys_size - dist.alignment; 161 + ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 162 + rdist.attr, &addr, true); 163 + TEST_ASSERT(ret && errno == E2BIG, 164 + "half of the redist is beyond IPA limit"); 124 165 125 166 /* set REDIST base address @0x0*/ 126 167 addr = 0x00000; 127 168 kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 128 - KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true); 169 + rdist.attr, &addr, true); 129 170 130 171 /* Attempt to create a second legacy redistributor region */ 131 172 addr = 0xE0000; 132 173 ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 133 - KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true); 134 - TEST_ASSERT(ret && errno == EEXIST, "GICv3 redist base set again"); 174 + rdist.attr, &addr, true); 175 + TEST_ASSERT(ret && errno == EEXIST, "GIC redist base set again"); 135 176 136 - /* Attempt to mix legacy and new redistributor regions */ 137 - addr = REDIST_REGION_ATTR_ADDR(NR_VCPUS, 0x100000, 0, 0); 138 - ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 139 - KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 140 - TEST_ASSERT(ret && errno == EINVAL, "attempt to mix GICv3 REDIST and REDIST_REGION"); 177 + ret = _kvm_device_check_attr(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 178 + KVM_VGIC_V3_ADDR_TYPE_REDIST); 179 + if (!ret) { 180 + /* Attempt to mix legacy and new redistributor regions */ 181 + addr = REDIST_REGION_ATTR_ADDR(NR_VCPUS, 0x100000, 0, 0); 182 + ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 183 + KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, 184 + &addr, true); 185 + 
TEST_ASSERT(ret && errno == EINVAL, 186 + "attempt to mix GICv3 REDIST and REDIST_REGION"); 187 + } 141 188 142 189 /* 143 190 * Set overlapping DIST / REDIST, cannot be detected here. Will be detected 144 191 * on first vcpu run instead. 145 192 */ 146 - addr = 3 * 2 * 0x10000; 147 - kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, KVM_VGIC_V3_ADDR_TYPE_DIST, 148 - &addr, true); 193 + addr = rdist.size - rdist.alignment; 194 + kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 195 + dist.attr, &addr, true); 149 196 } 150 197 151 198 /* Test the new REDIST region API */ 152 - static void subtest_redist_regions(struct vm_gic *v) 199 + static void subtest_v3_redist_regions(struct vm_gic *v) 153 200 { 154 201 uint64_t addr, expected_addr; 155 202 int ret; ··· 254 199 kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 255 200 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 256 201 257 - addr = REDIST_REGION_ATTR_ADDR(1, 1ULL << max_ipa_bits, 0, 2); 202 + addr = REDIST_REGION_ATTR_ADDR(1, max_phys_size, 0, 2); 258 203 ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 259 204 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 260 205 TEST_ASSERT(ret && errno == E2BIG, 261 206 "register redist region with base address beyond IPA range"); 207 + 208 + /* The last redist is above the pa range. 
*/ 209 + addr = REDIST_REGION_ATTR_ADDR(2, max_phys_size - 0x30000, 0, 2); 210 + ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 211 + KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 212 + TEST_ASSERT(ret && errno == E2BIG, 213 + "register redist region with top address beyond IPA range"); 262 214 263 215 addr = 0x260000; 264 216 ret = _kvm_device_access(v->gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, ··· 311 249 * VGIC KVM device is created and initialized before the secondary CPUs 312 250 * get created 313 251 */ 314 - static void test_vgic_then_vcpus(void) 252 + static void test_vgic_then_vcpus(uint32_t gic_dev_type) 315 253 { 316 254 struct vm_gic v; 317 255 int ret, i; 318 256 319 - v.vm = vm_create_default(0, 0, guest_code); 320 - v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false); 257 + v = vm_gic_create_with_vcpus(gic_dev_type, 1); 321 258 322 259 subtest_dist_rdist(&v); 323 260 ··· 331 270 } 332 271 333 272 /* All the VCPUs are created before the VGIC KVM device gets initialized */ 334 - static void test_vcpus_then_vgic(void) 273 + static void test_vcpus_then_vgic(uint32_t gic_dev_type) 335 274 { 336 275 struct vm_gic v; 337 276 int ret; 338 277 339 - v = vm_gic_create(); 278 + v = vm_gic_create_with_vcpus(gic_dev_type, NR_VCPUS); 340 279 341 280 subtest_dist_rdist(&v); 342 281 ··· 346 285 vm_gic_destroy(&v); 347 286 } 348 287 349 - static void test_new_redist_regions(void) 288 + static void test_v3_new_redist_regions(void) 350 289 { 351 290 void *dummy = NULL; 352 291 struct vm_gic v; 353 292 uint64_t addr; 354 293 int ret; 355 294 356 - v = vm_gic_create(); 357 - subtest_redist_regions(&v); 295 + v = vm_gic_create_with_vcpus(KVM_DEV_TYPE_ARM_VGIC_V3, NR_VCPUS); 296 + subtest_v3_redist_regions(&v); 358 297 kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL, 359 298 KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true); 360 299 ··· 364 303 365 304 /* step2 */ 366 305 367 - v = vm_gic_create(); 368 - subtest_redist_regions(&v); 306 + v = 
vm_gic_create_with_vcpus(KVM_DEV_TYPE_ARM_VGIC_V3, NR_VCPUS); 307 + subtest_v3_redist_regions(&v); 369 308 370 309 addr = REDIST_REGION_ATTR_ADDR(1, 0x280000, 0, 2); 371 310 kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, ··· 378 317 379 318 /* step 3 */ 380 319 381 - v = vm_gic_create(); 382 - subtest_redist_regions(&v); 320 + v = vm_gic_create_with_vcpus(KVM_DEV_TYPE_ARM_VGIC_V3, NR_VCPUS); 321 + subtest_v3_redist_regions(&v); 383 322 384 323 _kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 385 324 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, dummy, true); ··· 399 338 vm_gic_destroy(&v); 400 339 } 401 340 402 - static void test_typer_accesses(void) 341 + static void test_v3_typer_accesses(void) 403 342 { 404 343 struct vm_gic v; 405 344 uint64_t addr; ··· 412 351 413 352 vm_vcpu_add_default(v.vm, 3, guest_code); 414 353 415 - ret = access_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 354 + ret = access_v3_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 416 355 TEST_ASSERT(ret && errno == EINVAL, "attempting to read GICR_TYPER of non created vcpu"); 417 356 418 357 vm_vcpu_add_default(v.vm, 1, guest_code); 419 358 420 - ret = access_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 359 + ret = access_v3_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 421 360 TEST_ASSERT(ret && errno == EBUSY, "read GICR_TYPER before GIC initialized"); 422 361 423 362 vm_vcpu_add_default(v.vm, 2, guest_code); ··· 426 365 KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true); 427 366 428 367 for (i = 0; i < NR_VCPUS ; i++) { 429 - ret = access_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 368 + ret = access_v3_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 430 369 TEST_ASSERT(!ret && !val, "read GICR_TYPER before rdist region setting"); 431 370 } 432 371 ··· 435 374 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 436 375 437 376 /* The 2 first rdists should be put there (vcpu 0 and 3) */ 438 - ret = access_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 377 + ret = 
access_v3_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 439 378 TEST_ASSERT(!ret && !val, "read typer of rdist #0"); 440 379 441 - ret = access_redist_reg(v.gic_fd, 3, GICR_TYPER, &val, false); 380 + ret = access_v3_redist_reg(v.gic_fd, 3, GICR_TYPER, &val, false); 442 381 TEST_ASSERT(!ret && val == 0x310, "read typer of rdist #1"); 443 382 444 383 addr = REDIST_REGION_ATTR_ADDR(10, 0x100000, 0, 1); ··· 446 385 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 447 386 TEST_ASSERT(ret && errno == EINVAL, "collision with previous rdist region"); 448 387 449 - ret = access_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 388 + ret = access_v3_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 450 389 TEST_ASSERT(!ret && val == 0x100, 451 390 "no redist region attached to vcpu #1 yet, last cannot be returned"); 452 391 453 - ret = access_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 392 + ret = access_v3_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 454 393 TEST_ASSERT(!ret && val == 0x200, 455 394 "no redist region attached to vcpu #2, last cannot be returned"); 456 395 ··· 458 397 kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 459 398 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 460 399 461 - ret = access_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 400 + ret = access_v3_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 462 401 TEST_ASSERT(!ret && val == 0x100, "read typer of rdist #1"); 463 402 464 - ret = access_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 403 + ret = access_v3_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 465 404 TEST_ASSERT(!ret && val == 0x210, 466 405 "read typer of rdist #1, last properly returned"); 467 406 ··· 478 417 * rdist region #2 @0x200000 2 rdist capacity 479 418 * rdists: 1, 2 480 419 */ 481 - static void test_last_bit_redist_regions(void) 420 + static void test_v3_last_bit_redist_regions(void) 482 421 { 483 422 uint32_t vcpuids[] = { 0, 3, 5, 4, 1, 2 }; 484 423 struct vm_gic v; ··· 505 444 
kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 506 445 KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &addr, true); 507 446 508 - ret = access_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 447 + ret = access_v3_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 509 448 TEST_ASSERT(!ret && val == 0x000, "read typer of rdist #0"); 510 449 511 - ret = access_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 450 + ret = access_v3_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 512 451 TEST_ASSERT(!ret && val == 0x100, "read typer of rdist #1"); 513 452 514 - ret = access_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 453 + ret = access_v3_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 515 454 TEST_ASSERT(!ret && val == 0x200, "read typer of rdist #2"); 516 455 517 - ret = access_redist_reg(v.gic_fd, 3, GICR_TYPER, &val, false); 456 + ret = access_v3_redist_reg(v.gic_fd, 3, GICR_TYPER, &val, false); 518 457 TEST_ASSERT(!ret && val == 0x310, "read typer of rdist #3"); 519 458 520 - ret = access_redist_reg(v.gic_fd, 5, GICR_TYPER, &val, false); 459 + ret = access_v3_redist_reg(v.gic_fd, 5, GICR_TYPER, &val, false); 521 460 TEST_ASSERT(!ret && val == 0x500, "read typer of rdist #5"); 522 461 523 - ret = access_redist_reg(v.gic_fd, 4, GICR_TYPER, &val, false); 462 + ret = access_v3_redist_reg(v.gic_fd, 4, GICR_TYPER, &val, false); 524 463 TEST_ASSERT(!ret && val == 0x410, "read typer of rdist #4"); 525 464 526 465 vm_gic_destroy(&v); 527 466 } 528 467 529 468 /* Test last bit with legacy region */ 530 - static void test_last_bit_single_rdist(void) 469 + static void test_v3_last_bit_single_rdist(void) 531 470 { 532 471 uint32_t vcpuids[] = { 0, 3, 5, 4, 1, 2 }; 533 472 struct vm_gic v; ··· 546 485 kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 547 486 KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true); 548 487 549 - ret = access_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 488 + ret = access_v3_redist_reg(v.gic_fd, 0, GICR_TYPER, &val, false); 550 489 
TEST_ASSERT(!ret && val == 0x000, "read typer of rdist #0"); 551 490 552 - ret = access_redist_reg(v.gic_fd, 3, GICR_TYPER, &val, false); 491 + ret = access_v3_redist_reg(v.gic_fd, 3, GICR_TYPER, &val, false); 553 492 TEST_ASSERT(!ret && val == 0x300, "read typer of rdist #1"); 554 493 555 - ret = access_redist_reg(v.gic_fd, 5, GICR_TYPER, &val, false); 494 + ret = access_v3_redist_reg(v.gic_fd, 5, GICR_TYPER, &val, false); 556 495 TEST_ASSERT(!ret && val == 0x500, "read typer of rdist #2"); 557 496 558 - ret = access_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 497 + ret = access_v3_redist_reg(v.gic_fd, 1, GICR_TYPER, &val, false); 559 498 TEST_ASSERT(!ret && val == 0x100, "read typer of rdist #3"); 560 499 561 - ret = access_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 500 + ret = access_v3_redist_reg(v.gic_fd, 2, GICR_TYPER, &val, false); 562 501 TEST_ASSERT(!ret && val == 0x210, "read typer of rdist #3"); 563 502 564 503 vm_gic_destroy(&v); 565 504 } 566 505 567 - void test_kvm_device(void) 506 + /* Uses the legacy REDIST region API. */ 507 + static void test_v3_redist_ipa_range_check_at_vcpu_run(void) 508 + { 509 + struct vm_gic v; 510 + int ret, i; 511 + uint64_t addr; 512 + 513 + v = vm_gic_create_with_vcpus(KVM_DEV_TYPE_ARM_VGIC_V3, 1); 514 + 515 + /* Set space for 3 redists, we have 1 vcpu, so this succeeds. */ 516 + addr = max_phys_size - (3 * 2 * 0x10000); 517 + kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 518 + KVM_VGIC_V3_ADDR_TYPE_REDIST, &addr, true); 519 + 520 + addr = 0x00000; 521 + kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 522 + KVM_VGIC_V3_ADDR_TYPE_DIST, &addr, true); 523 + 524 + /* Add the rest of the VCPUs */ 525 + for (i = 1; i < NR_VCPUS; ++i) 526 + vm_vcpu_add_default(v.vm, i, guest_code); 527 + 528 + kvm_device_access(v.gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL, 529 + KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true); 530 + 531 + /* Attempt to run a vcpu without enough redist space. 
*/ 532 + ret = run_vcpu(v.vm, 2); 533 + TEST_ASSERT(ret && errno == EINVAL, 534 + "redist base+size above PA range detected on 1st vcpu run"); 535 + 536 + vm_gic_destroy(&v); 537 + } 538 + 539 + static void test_v3_its_region(void) 540 + { 541 + struct vm_gic v; 542 + uint64_t addr; 543 + int its_fd, ret; 544 + 545 + v = vm_gic_create_with_vcpus(KVM_DEV_TYPE_ARM_VGIC_V3, NR_VCPUS); 546 + its_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_ITS, false); 547 + 548 + addr = 0x401000; 549 + ret = _kvm_device_access(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 550 + KVM_VGIC_ITS_ADDR_TYPE, &addr, true); 551 + TEST_ASSERT(ret && errno == EINVAL, 552 + "ITS region with misaligned address"); 553 + 554 + addr = max_phys_size; 555 + ret = _kvm_device_access(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 556 + KVM_VGIC_ITS_ADDR_TYPE, &addr, true); 557 + TEST_ASSERT(ret && errno == E2BIG, 558 + "register ITS region with base address beyond IPA range"); 559 + 560 + addr = max_phys_size - 0x10000; 561 + ret = _kvm_device_access(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 562 + KVM_VGIC_ITS_ADDR_TYPE, &addr, true); 563 + TEST_ASSERT(ret && errno == E2BIG, 564 + "Half of ITS region is beyond IPA range"); 565 + 566 + /* This one succeeds setting the ITS base */ 567 + addr = 0x400000; 568 + kvm_device_access(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 569 + KVM_VGIC_ITS_ADDR_TYPE, &addr, true); 570 + 571 + addr = 0x300000; 572 + ret = _kvm_device_access(its_fd, KVM_DEV_ARM_VGIC_GRP_ADDR, 573 + KVM_VGIC_ITS_ADDR_TYPE, &addr, true); 574 + TEST_ASSERT(ret && errno == EEXIST, "ITS base set again"); 575 + 576 + close(its_fd); 577 + vm_gic_destroy(&v); 578 + } 579 + 580 + /* 581 + * Returns 0 if it's possible to create GIC device of a given type (V2 or V3). 
582 + */ 583 + int test_kvm_device(uint32_t gic_dev_type) 568 584 { 569 585 struct vm_gic v; 570 586 int ret, fd; 587 + uint32_t other; 571 588 572 589 v.vm = vm_create_default_with_vcpus(NR_VCPUS, 0, 0, guest_code, NULL); 573 590 ··· 653 514 ret = _kvm_create_device(v.vm, 0, true, &fd); 654 515 TEST_ASSERT(ret && errno == ENODEV, "unsupported device"); 655 516 656 - /* trial mode with VGIC_V3 device */ 657 - ret = _kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, true, &fd); 658 - if (ret) { 659 - print_skip("GICv3 not supported"); 660 - exit(KSFT_SKIP); 661 - } 662 - v.gic_fd = kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false); 517 + /* trial mode */ 518 + ret = _kvm_create_device(v.vm, gic_dev_type, true, &fd); 519 + if (ret) 520 + return ret; 521 + v.gic_fd = kvm_create_device(v.vm, gic_dev_type, false); 663 522 664 - ret = _kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, false, &fd); 665 - TEST_ASSERT(ret && errno == EEXIST, "create GICv3 device twice"); 523 + ret = _kvm_create_device(v.vm, gic_dev_type, false, &fd); 524 + TEST_ASSERT(ret && errno == EEXIST, "create GIC device twice"); 666 525 667 - kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V3, true); 526 + kvm_create_device(v.vm, gic_dev_type, true); 668 527 669 - if (!_kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V2, true, &fd)) { 670 - ret = _kvm_create_device(v.vm, KVM_DEV_TYPE_ARM_VGIC_V2, false, &fd); 671 - TEST_ASSERT(ret && errno == EINVAL, "create GICv2 while v3 exists"); 528 + /* try to create the other gic_dev_type */ 529 + other = VGIC_DEV_IS_V2(gic_dev_type) ? 
KVM_DEV_TYPE_ARM_VGIC_V3 530 + : KVM_DEV_TYPE_ARM_VGIC_V2; 531 + 532 + if (!_kvm_create_device(v.vm, other, true, &fd)) { 533 + ret = _kvm_create_device(v.vm, other, false, &fd); 534 + TEST_ASSERT(ret && errno == EINVAL, 535 + "create GIC device while other version exists"); 672 536 } 673 537 674 538 vm_gic_destroy(&v); 539 + 540 + return 0; 541 + } 542 + 543 + void run_tests(uint32_t gic_dev_type) 544 + { 545 + test_vcpus_then_vgic(gic_dev_type); 546 + test_vgic_then_vcpus(gic_dev_type); 547 + 548 + if (VGIC_DEV_IS_V3(gic_dev_type)) { 549 + test_v3_new_redist_regions(); 550 + test_v3_typer_accesses(); 551 + test_v3_last_bit_redist_regions(); 552 + test_v3_last_bit_single_rdist(); 553 + test_v3_redist_ipa_range_check_at_vcpu_run(); 554 + test_v3_its_region(); 555 + } 675 556 } 676 557 677 558 int main(int ac, char **av) 678 559 { 679 - max_ipa_bits = kvm_check_cap(KVM_CAP_ARM_VM_IPA_SIZE); 560 + int ret; 561 + int pa_bits; 680 562 681 - test_kvm_device(); 682 - test_vcpus_then_vgic(); 683 - test_vgic_then_vcpus(); 684 - test_new_redist_regions(); 685 - test_typer_accesses(); 686 - test_last_bit_redist_regions(); 687 - test_last_bit_single_rdist(); 563 + pa_bits = vm_guest_mode_params[VM_MODE_DEFAULT].pa_bits; 564 + max_phys_size = 1ULL << pa_bits; 688 565 566 + ret = test_kvm_device(KVM_DEV_TYPE_ARM_VGIC_V3); 567 + if (!ret) { 568 + pr_info("Running GIC_v3 tests.\n"); 569 + run_tests(KVM_DEV_TYPE_ARM_VGIC_V3); 570 + return 0; 571 + } 572 + 573 + ret = test_kvm_device(KVM_DEV_TYPE_ARM_VGIC_V2); 574 + if (!ret) { 575 + pr_info("Running GIC_v2 tests.\n"); 576 + run_tests(KVM_DEV_TYPE_ARM_VGIC_V2); 577 + return 0; 578 + } 579 + 580 + print_skip("No GICv2 nor GICv3 support"); 581 + exit(KSFT_SKIP); 689 582 return 0; 690 583 }
tools/testing/selftests/kvm/include/aarch64/arch_timer.h (+142)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * ARM Generic Timer specific interface 4 + */ 5 + 6 + #ifndef SELFTEST_KVM_ARCH_TIMER_H 7 + #define SELFTEST_KVM_ARCH_TIMER_H 8 + 9 + #include "processor.h" 10 + 11 + enum arch_timer { 12 + VIRTUAL, 13 + PHYSICAL, 14 + }; 15 + 16 + #define CTL_ENABLE (1 << 0) 17 + #define CTL_IMASK (1 << 1) 18 + #define CTL_ISTATUS (1 << 2) 19 + 20 + #define msec_to_cycles(msec) \ 21 + (timer_get_cntfrq() * (uint64_t)(msec) / 1000) 22 + 23 + #define usec_to_cycles(usec) \ 24 + (timer_get_cntfrq() * (uint64_t)(usec) / 1000000) 25 + 26 + #define cycles_to_usec(cycles) \ 27 + ((uint64_t)(cycles) * 1000000 / timer_get_cntfrq()) 28 + 29 + static inline uint32_t timer_get_cntfrq(void) 30 + { 31 + return read_sysreg(cntfrq_el0); 32 + } 33 + 34 + static inline uint64_t timer_get_cntct(enum arch_timer timer) 35 + { 36 + isb(); 37 + 38 + switch (timer) { 39 + case VIRTUAL: 40 + return read_sysreg(cntvct_el0); 41 + case PHYSICAL: 42 + return read_sysreg(cntpct_el0); 43 + default: 44 + GUEST_ASSERT_1(0, timer); 45 + } 46 + 47 + /* We should not reach here */ 48 + return 0; 49 + } 50 + 51 + static inline void timer_set_cval(enum arch_timer timer, uint64_t cval) 52 + { 53 + switch (timer) { 54 + case VIRTUAL: 55 + write_sysreg(cval, cntv_cval_el0); 56 + break; 57 + case PHYSICAL: 58 + write_sysreg(cval, cntp_cval_el0); 59 + break; 60 + default: 61 + GUEST_ASSERT_1(0, timer); 62 + } 63 + 64 + isb(); 65 + } 66 + 67 + static inline uint64_t timer_get_cval(enum arch_timer timer) 68 + { 69 + switch (timer) { 70 + case VIRTUAL: 71 + return read_sysreg(cntv_cval_el0); 72 + case PHYSICAL: 73 + return read_sysreg(cntp_cval_el0); 74 + default: 75 + GUEST_ASSERT_1(0, timer); 76 + } 77 + 78 + /* We should not reach here */ 79 + return 0; 80 + } 81 + 82 + static inline void timer_set_tval(enum arch_timer timer, uint32_t tval) 83 + { 84 + switch (timer) { 85 + case VIRTUAL: 86 + write_sysreg(tval, cntv_tval_el0); 87 + break; 88 + case PHYSICAL: 89 + 
write_sysreg(tval, cntp_tval_el0); 90 + break; 91 + default: 92 + GUEST_ASSERT_1(0, timer); 93 + } 94 + 95 + isb(); 96 + } 97 + 98 + static inline void timer_set_ctl(enum arch_timer timer, uint32_t ctl) 99 + { 100 + switch (timer) { 101 + case VIRTUAL: 102 + write_sysreg(ctl, cntv_ctl_el0); 103 + break; 104 + case PHYSICAL: 105 + write_sysreg(ctl, cntp_ctl_el0); 106 + break; 107 + default: 108 + GUEST_ASSERT_1(0, timer); 109 + } 110 + 111 + isb(); 112 + } 113 + 114 + static inline uint32_t timer_get_ctl(enum arch_timer timer) 115 + { 116 + switch (timer) { 117 + case VIRTUAL: 118 + return read_sysreg(cntv_ctl_el0); 119 + case PHYSICAL: 120 + return read_sysreg(cntp_ctl_el0); 121 + default: 122 + GUEST_ASSERT_1(0, timer); 123 + } 124 + 125 + /* We should not reach here */ 126 + return 0; 127 + } 128 + 129 + static inline void timer_set_next_cval_ms(enum arch_timer timer, uint32_t msec) 130 + { 131 + uint64_t now_ct = timer_get_cntct(timer); 132 + uint64_t next_ct = now_ct + msec_to_cycles(msec); 133 + 134 + timer_set_cval(timer, next_ct); 135 + } 136 + 137 + static inline void timer_set_next_tval_ms(enum arch_timer timer, uint32_t msec) 138 + { 139 + timer_set_tval(timer, msec_to_cycles(msec)); 140 + } 141 + 142 + #endif /* SELFTEST_KVM_ARCH_TIMER_H */
tools/testing/selftests/kvm/include/aarch64/delay.h (+25)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * ARM simple delay routines 4 + */ 5 + 6 + #ifndef SELFTEST_KVM_ARM_DELAY_H 7 + #define SELFTEST_KVM_ARM_DELAY_H 8 + 9 + #include "arch_timer.h" 10 + 11 + static inline void __delay(uint64_t cycles) 12 + { 13 + enum arch_timer timer = VIRTUAL; 14 + uint64_t start = timer_get_cntct(timer); 15 + 16 + while ((timer_get_cntct(timer) - start) < cycles) 17 + cpu_relax(); 18 + } 19 + 20 + static inline void udelay(unsigned long usec) 21 + { 22 + __delay(usec_to_cycles(usec)); 23 + } 24 + 25 + #endif /* SELFTEST_KVM_ARM_DELAY_H */
tools/testing/selftests/kvm/include/aarch64/gic.h (+21)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * ARM Generic Interrupt Controller (GIC) specific defines 4 + */ 5 + 6 + #ifndef SELFTEST_KVM_GIC_H 7 + #define SELFTEST_KVM_GIC_H 8 + 9 + enum gic_type { 10 + GIC_V3, 11 + GIC_TYPE_MAX, 12 + }; 13 + 14 + void gic_init(enum gic_type type, unsigned int nr_cpus, 15 + void *dist_base, void *redist_base); 16 + void gic_irq_enable(unsigned int intid); 17 + void gic_irq_disable(unsigned int intid); 18 + unsigned int gic_get_and_ack_irq(void); 19 + void gic_set_eoi(unsigned int intid); 20 + 21 + #endif /* SELFTEST_KVM_GIC_H */
tools/testing/selftests/kvm/include/aarch64/processor.h (+70 -20)
··· 9 9 10 10 #include "kvm_util.h" 11 11 #include <linux/stringify.h> 12 + #include <linux/types.h> 13 + #include <asm/sysreg.h> 12 14 13 15 14 16 #define ARM64_CORE_REG(x) (KVM_REG_ARM64 | KVM_REG_SIZE_U64 | \ 15 17 KVM_REG_ARM_CORE | KVM_REG_ARM_CORE_REG(x)) 16 18 17 - #define CPACR_EL1 3, 0, 1, 0, 2 18 - #define TCR_EL1 3, 0, 2, 0, 2 19 - #define MAIR_EL1 3, 0, 10, 2, 0 20 - #define MPIDR_EL1 3, 0, 0, 0, 5 21 - #define TTBR0_EL1 3, 0, 2, 0, 0 22 - #define SCTLR_EL1 3, 0, 1, 0, 0 23 - #define VBAR_EL1 3, 0, 12, 0, 0 24 - 25 - #define ID_AA64DFR0_EL1 3, 0, 0, 5, 0 19 + /* 20 + * KVM_ARM64_SYS_REG(sys_reg_id): Helper macro to convert 21 + * SYS_* register definitions in asm/sysreg.h to use in KVM 22 + * calls such as get_reg() and set_reg(). 23 + */ 24 + #define KVM_ARM64_SYS_REG(sys_reg_id) \ 25 + ARM64_SYS_REG(sys_reg_Op0(sys_reg_id), \ 26 + sys_reg_Op1(sys_reg_id), \ 27 + sys_reg_CRn(sys_reg_id), \ 28 + sys_reg_CRm(sys_reg_id), \ 29 + sys_reg_Op2(sys_reg_id)) 26 30 27 31 /* 28 32 * Default MAIR ··· 63 59 vcpu_ioctl(vm, vcpuid, KVM_SET_ONE_REG, &reg); 64 60 } 65 61 66 - void aarch64_vcpu_setup(struct kvm_vm *vm, int vcpuid, struct kvm_vcpu_init *init); 62 + void aarch64_vcpu_setup(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_vcpu_init *init); 67 63 void aarch64_vcpu_add_default(struct kvm_vm *vm, uint32_t vcpuid, 68 64 struct kvm_vcpu_init *init, void *guest_code); 69 65 ··· 122 118 void vm_install_sync_handler(struct kvm_vm *vm, 123 119 int vector, int ec, handler_fn handler); 124 120 125 - #define write_sysreg(reg, val) \ 126 - ({ \ 127 - u64 __val = (u64)(val); \ 128 - asm volatile("msr " __stringify(reg) ", %x0" : : "rZ" (__val)); \ 121 + static inline void cpu_relax(void) 122 + { 123 + asm volatile("yield" ::: "memory"); 124 + } 125 + 126 + #define isb() asm volatile("isb" : : : "memory") 127 + #define dsb(opt) asm volatile("dsb " #opt : : : "memory") 128 + #define dmb(opt) asm volatile("dmb " #opt : : : "memory") 129 + 130 + #define dma_wmb() dmb(oshst) 
131 + #define __iowmb() dma_wmb() 132 + 133 + #define dma_rmb() dmb(oshld) 134 + 135 + #define __iormb(v) \ 136 + ({ \ 137 + unsigned long tmp; \ 138 + \ 139 + dma_rmb(); \ 140 + \ 141 + /* \ 142 + * Courtesy of arch/arm64/include/asm/io.h: \ 143 + * Create a dummy control dependency from the IO read to any \ 144 + * later instructions. This ensures that a subsequent call \ 145 + * to udelay() will be ordered due to the ISB in __delay(). \ 146 + */ \ 147 + asm volatile("eor %0, %1, %1\n" \ 148 + "cbnz %0, ." \ 149 + : "=r" (tmp) : "r" ((unsigned long)(v)) \ 150 + : "memory"); \ 129 151 }) 130 152 131 - #define read_sysreg(reg) \ 132 - ({ u64 val; \ 133 - asm volatile("mrs %0, "__stringify(reg) : "=r"(val) : : "memory");\ 134 - val; \ 135 - }) 153 + static __always_inline void __raw_writel(u32 val, volatile void *addr) 154 + { 155 + asm volatile("str %w0, [%1]" : : "rZ" (val), "r" (addr)); 156 + } 136 157 137 - #define isb() asm volatile("isb" : : : "memory") 158 + static __always_inline u32 __raw_readl(const volatile void *addr) 159 + { 160 + u32 val; 161 + asm volatile("ldr %w0, [%1]" : "=r" (val) : "r" (addr)); 162 + return val; 163 + } 164 + 165 + #define writel_relaxed(v,c) ((void)__raw_writel((__force u32)cpu_to_le32(v),(c))) 166 + #define readl_relaxed(c) ({ u32 __r = le32_to_cpu((__force __le32)__raw_readl(c)); __r; }) 167 + 168 + #define writel(v,c) ({ __iowmb(); writel_relaxed((v),(c));}) 169 + #define readl(c) ({ u32 __v = readl_relaxed(c); __iormb(__v); __v; }) 170 + 171 + static inline void local_irq_enable(void) 172 + { 173 + asm volatile("msr daifclr, #3" : : : "memory"); 174 + } 175 + 176 + static inline void local_irq_disable(void) 177 + { 178 + asm volatile("msr daifset, #3" : : : "memory"); 179 + } 138 180 139 181 #endif /* SELFTEST_KVM_PROCESSOR_H */
tools/testing/selftests/kvm/include/aarch64/spinlock.h (+13)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + 3 + #ifndef SELFTEST_KVM_ARM64_SPINLOCK_H 4 + #define SELFTEST_KVM_ARM64_SPINLOCK_H 5 + 6 + struct spinlock { 7 + int v; 8 + }; 9 + 10 + extern void spin_lock(struct spinlock *lock); 11 + extern void spin_unlock(struct spinlock *lock); 12 + 13 + #endif /* SELFTEST_KVM_ARM64_SPINLOCK_H */
tools/testing/selftests/kvm/include/aarch64/vgic.h (+20)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * ARM Generic Interrupt Controller (GIC) host specific defines 4 + */ 5 + 6 + #ifndef SELFTEST_KVM_VGIC_H 7 + #define SELFTEST_KVM_VGIC_H 8 + 9 + #include <linux/kvm.h> 10 + 11 + #define REDIST_REGION_ATTR_ADDR(count, base, flags, index) \ 12 + (((uint64_t)(count) << 52) | \ 13 + ((uint64_t)((base) >> 16) << 16) | \ 14 + ((uint64_t)(flags) << 12) | \ 15 + index) 16 + 17 + int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus, 18 + uint64_t gicd_base_gpa, uint64_t gicr_base_gpa); 19 + 20 + #endif /* SELFTEST_KVM_VGIC_H */
tools/testing/selftests/kvm/include/kvm_util.h (+13)
··· 19 19 #define KVM_DEV_PATH "/dev/kvm" 20 20 #define KVM_MAX_VCPUS 512 21 21 22 + #define NSEC_PER_SEC 1000000000L 23 + 22 24 /* 23 25 * Callers of kvm_util only have an incomplete/opaque description of the 24 26 * structure kvm_util is using to maintain the state of a VM. ··· 240 238 int kvm_device_access(int dev_fd, uint32_t group, uint64_t attr, 241 239 void *val, bool write); 242 240 241 + int _vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group, 242 + uint64_t attr); 243 + int vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group, 244 + uint64_t attr); 245 + int _vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group, 246 + uint64_t attr, void *val, bool write); 247 + int vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group, 248 + uint64_t attr, void *val, bool write); 249 + 243 250 const char *exit_reason_str(unsigned int exit_reason); 244 251 245 252 void virt_pgd_alloc(struct kvm_vm *vm); ··· 410 399 411 400 int vm_get_stats_fd(struct kvm_vm *vm); 412 401 int vcpu_get_stats_fd(struct kvm_vm *vm, uint32_t vcpuid); 402 + 403 + uint32_t guest_get_vcpuid(void); 413 404 414 405 #endif /* SELFTEST_KVM_UTIL_H */
tools/testing/selftests/kvm/kvm_create_max_vcpus.c (+1 -1)
··· 53 53 kvm_max_vcpu_id = kvm_max_vcpus; 54 54 55 55 TEST_ASSERT(kvm_max_vcpu_id >= kvm_max_vcpus, 56 - "KVM_MAX_VCPU_ID (%d) must be at least as large as KVM_MAX_VCPUS (%d).", 56 + "KVM_MAX_VCPU_IDS (%d) must be at least as large as KVM_MAX_VCPUS (%d).", 57 57 kvm_max_vcpu_id, kvm_max_vcpus); 58 58 59 59 test_vcpu_creation(0, kvm_max_vcpus);
tools/testing/selftests/kvm/lib/aarch64/gic.c (+95)
··· 1 + // SPDX-License-Identifier: GPL-2.0 2 + /* 3 + * ARM Generic Interrupt Controller (GIC) support 4 + */ 5 + 6 + #include <errno.h> 7 + #include <linux/bits.h> 8 + #include <linux/sizes.h> 9 + 10 + #include "kvm_util.h" 11 + 12 + #include <gic.h> 13 + #include "gic_private.h" 14 + #include "processor.h" 15 + #include "spinlock.h" 16 + 17 + static const struct gic_common_ops *gic_common_ops; 18 + static struct spinlock gic_lock; 19 + 20 + static void gic_cpu_init(unsigned int cpu, void *redist_base) 21 + { 22 + gic_common_ops->gic_cpu_init(cpu, redist_base); 23 + } 24 + 25 + static void 26 + gic_dist_init(enum gic_type type, unsigned int nr_cpus, void *dist_base) 27 + { 28 + const struct gic_common_ops *gic_ops = NULL; 29 + 30 + spin_lock(&gic_lock); 31 + 32 + /* Distributor initialization is needed only once per VM */ 33 + if (gic_common_ops) { 34 + spin_unlock(&gic_lock); 35 + return; 36 + } 37 + 38 + if (type == GIC_V3) 39 + gic_ops = &gicv3_ops; 40 + 41 + GUEST_ASSERT(gic_ops); 42 + 43 + gic_ops->gic_init(nr_cpus, dist_base); 44 + gic_common_ops = gic_ops; 45 + 46 + /* Make sure that the initialized data is visible to all the vCPUs */ 47 + dsb(sy); 48 + 49 + spin_unlock(&gic_lock); 50 + } 51 + 52 + void gic_init(enum gic_type type, unsigned int nr_cpus, 53 + void *dist_base, void *redist_base) 54 + { 55 + uint32_t cpu = guest_get_vcpuid(); 56 + 57 + GUEST_ASSERT(type < GIC_TYPE_MAX); 58 + GUEST_ASSERT(dist_base); 59 + GUEST_ASSERT(redist_base); 60 + GUEST_ASSERT(nr_cpus); 61 + 62 + gic_dist_init(type, nr_cpus, dist_base); 63 + gic_cpu_init(cpu, redist_base); 64 + } 65 + 66 + void gic_irq_enable(unsigned int intid) 67 + { 68 + GUEST_ASSERT(gic_common_ops); 69 + gic_common_ops->gic_irq_enable(intid); 70 + } 71 + 72 + void gic_irq_disable(unsigned int intid) 73 + { 74 + GUEST_ASSERT(gic_common_ops); 75 + gic_common_ops->gic_irq_disable(intid); 76 + } 77 + 78 + unsigned int gic_get_and_ack_irq(void) 79 + { 80 + uint64_t irqstat; 81 + unsigned int intid; 82 + 
83 + GUEST_ASSERT(gic_common_ops); 84 + 85 + irqstat = gic_common_ops->gic_read_iar(); 86 + intid = irqstat & GENMASK(23, 0); 87 + 88 + return intid; 89 + } 90 + 91 + void gic_set_eoi(unsigned int intid) 92 + { 93 + GUEST_ASSERT(gic_common_ops); 94 + gic_common_ops->gic_write_eoir(intid); 95 + }
tools/testing/selftests/kvm/lib/aarch64/gic_private.h (+21)
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + /* 3 + * ARM Generic Interrupt Controller (GIC) private defines that's only 4 + * shared among the GIC library code. 5 + */ 6 + 7 + #ifndef SELFTEST_KVM_GIC_PRIVATE_H 8 + #define SELFTEST_KVM_GIC_PRIVATE_H 9 + 10 + struct gic_common_ops { 11 + void (*gic_init)(unsigned int nr_cpus, void *dist_base); 12 + void (*gic_cpu_init)(unsigned int cpu, void *redist_base); 13 + void (*gic_irq_enable)(unsigned int intid); 14 + void (*gic_irq_disable)(unsigned int intid); 15 + uint64_t (*gic_read_iar)(void); 16 + void (*gic_write_eoir)(uint32_t irq); 17 + }; 18 + 19 + extern const struct gic_common_ops gicv3_ops; 20 + 21 + #endif /* SELFTEST_KVM_GIC_PRIVATE_H */
+240
tools/testing/selftests/kvm/lib/aarch64/gic_v3.c
// SPDX-License-Identifier: GPL-2.0
/*
 * ARM Generic Interrupt Controller (GIC) v3 support
 */

#include <linux/sizes.h>

#include "kvm_util.h"
#include "processor.h"
#include "delay.h"

#include "gic_v3.h"
#include "gic_private.h"

struct gicv3_data {
	void *dist_base;
	void *redist_base[GICV3_MAX_CPUS];
	unsigned int nr_cpus;
	unsigned int nr_spis;
};

#define sgi_base_from_redist(redist_base) (redist_base + SZ_64K)

enum gicv3_intid_range {
	SGI_RANGE,
	PPI_RANGE,
	SPI_RANGE,
	INVALID_RANGE,
};

static struct gicv3_data gicv3_data;

static void gicv3_gicd_wait_for_rwp(void)
{
	unsigned int count = 100000; /* 1s */

	while (readl(gicv3_data.dist_base + GICD_CTLR) & GICD_CTLR_RWP) {
		GUEST_ASSERT(count--);
		udelay(10);
	}
}

static void gicv3_gicr_wait_for_rwp(void *redist_base)
{
	unsigned int count = 100000; /* 1s */

	while (readl(redist_base + GICR_CTLR) & GICR_CTLR_RWP) {
		GUEST_ASSERT(count--);
		udelay(10);
	}
}

static enum gicv3_intid_range get_intid_range(unsigned int intid)
{
	switch (intid) {
	case 0 ... 15:
		return SGI_RANGE;
	case 16 ... 31:
		return PPI_RANGE;
	case 32 ... 1019:
		return SPI_RANGE;
	}

	/* We should not be reaching here */
	GUEST_ASSERT(0);

	return INVALID_RANGE;
}

static uint64_t gicv3_read_iar(void)
{
	uint64_t irqstat = read_sysreg_s(SYS_ICC_IAR1_EL1);

	dsb(sy);
	return irqstat;
}

static void gicv3_write_eoir(uint32_t irq)
{
	write_sysreg_s(irq, SYS_ICC_EOIR1_EL1);
	isb();
}

static void
gicv3_config_irq(unsigned int intid, unsigned int offset)
{
	uint32_t cpu = guest_get_vcpuid();
	uint32_t mask = 1 << (intid % 32);
	enum gicv3_intid_range intid_range = get_intid_range(intid);
	void *reg;

	/* We care about 'cpu' only for SGIs or PPIs */
	if (intid_range == SGI_RANGE || intid_range == PPI_RANGE) {
		GUEST_ASSERT(cpu < gicv3_data.nr_cpus);

		reg = sgi_base_from_redist(gicv3_data.redist_base[cpu]) +
			offset;
		writel(mask, reg);
		gicv3_gicr_wait_for_rwp(gicv3_data.redist_base[cpu]);
	} else if (intid_range == SPI_RANGE) {
		reg = gicv3_data.dist_base + offset + (intid / 32) * 4;
		writel(mask, reg);
		gicv3_gicd_wait_for_rwp();
	} else {
		GUEST_ASSERT(0);
	}
}

static void gicv3_irq_enable(unsigned int intid)
{
	gicv3_config_irq(intid, GICD_ISENABLER);
}

static void gicv3_irq_disable(unsigned int intid)
{
	gicv3_config_irq(intid, GICD_ICENABLER);
}

static void gicv3_enable_redist(void *redist_base)
{
	uint32_t val = readl(redist_base + GICR_WAKER);
	unsigned int count = 100000; /* 1s */

	val &= ~GICR_WAKER_ProcessorSleep;
	writel(val, redist_base + GICR_WAKER);

	/* Wait until the processor is 'active' */
	while (readl(redist_base + GICR_WAKER) & GICR_WAKER_ChildrenAsleep) {
		GUEST_ASSERT(count--);
		udelay(10);
	}
}

static inline void *gicr_base_cpu(void *redist_base, uint32_t cpu)
{
	/* Align all the redistributors sequentially */
	return redist_base + cpu * SZ_64K * 2;
}

static void gicv3_cpu_init(unsigned int cpu, void *redist_base)
{
	void *sgi_base;
	unsigned int i;
	void *redist_base_cpu;

	GUEST_ASSERT(cpu < gicv3_data.nr_cpus);

	redist_base_cpu = gicr_base_cpu(redist_base, cpu);
	sgi_base = sgi_base_from_redist(redist_base_cpu);

	gicv3_enable_redist(redist_base_cpu);

	/*
	 * Mark all the SGI and PPI interrupts as non-secure Group-1.
	 * Also, deactivate and disable them.
	 */
	writel(~0, sgi_base + GICR_IGROUPR0);
	writel(~0, sgi_base + GICR_ICACTIVER0);
	writel(~0, sgi_base + GICR_ICENABLER0);

	/* Set a default priority for all the SGIs and PPIs */
	for (i = 0; i < 32; i += 4)
		writel(GICD_INT_DEF_PRI_X4,
			sgi_base + GICR_IPRIORITYR0 + i);

	gicv3_gicr_wait_for_rwp(redist_base_cpu);

	/* Enable the GIC system register (ICC_*) access */
	write_sysreg_s(read_sysreg_s(SYS_ICC_SRE_EL1) | ICC_SRE_EL1_SRE,
			SYS_ICC_SRE_EL1);

	/* Set a default priority threshold */
	write_sysreg_s(ICC_PMR_DEF_PRIO, SYS_ICC_PMR_EL1);

	/* Enable non-secure Group-1 interrupts */
	write_sysreg_s(ICC_IGRPEN1_EL1_ENABLE, SYS_ICC_GRPEN1_EL1);

	gicv3_data.redist_base[cpu] = redist_base_cpu;
}

static void gicv3_dist_init(void)
{
	void *dist_base = gicv3_data.dist_base;
	unsigned int i;

	/* Disable the distributor until we set things up */
	writel(0, dist_base + GICD_CTLR);
	gicv3_gicd_wait_for_rwp();

	/*
	 * Mark all the SPI interrupts as non-secure Group-1.
	 * Also, deactivate and disable them.
	 */
	for (i = 32; i < gicv3_data.nr_spis; i += 32) {
		writel(~0, dist_base + GICD_IGROUPR + i / 8);
		writel(~0, dist_base + GICD_ICACTIVER + i / 8);
		writel(~0, dist_base + GICD_ICENABLER + i / 8);
	}

	/* Set a default priority for all the SPIs */
	for (i = 32; i < gicv3_data.nr_spis; i += 4)
		writel(GICD_INT_DEF_PRI_X4,
			dist_base + GICD_IPRIORITYR + i);

	/* Wait for the settings to sync-in */
	gicv3_gicd_wait_for_rwp();

	/* Finally, enable the distributor globally with ARE */
	writel(GICD_CTLR_ARE_NS | GICD_CTLR_ENABLE_G1A |
		GICD_CTLR_ENABLE_G1, dist_base + GICD_CTLR);
	gicv3_gicd_wait_for_rwp();
}

static void gicv3_init(unsigned int nr_cpus, void *dist_base)
{
	GUEST_ASSERT(nr_cpus <= GICV3_MAX_CPUS);

	gicv3_data.nr_cpus = nr_cpus;
	gicv3_data.dist_base = dist_base;
	gicv3_data.nr_spis = GICD_TYPER_SPIS(
				readl(gicv3_data.dist_base + GICD_TYPER));
	if (gicv3_data.nr_spis > 1020)
		gicv3_data.nr_spis = 1020;

	/*
	 * Initialize only the distributor for now.
	 * The redistributor and CPU interfaces are initialized
	 * later for every PE.
	 */
	gicv3_dist_init();
}

const struct gic_common_ops gicv3_ops = {
	.gic_init = gicv3_init,
	.gic_cpu_init = gicv3_cpu_init,
	.gic_irq_enable = gicv3_irq_enable,
	.gic_irq_disable = gicv3_irq_disable,
	.gic_read_iar = gicv3_read_iar,
	.gic_write_eoir = gicv3_write_eoir,
};
+70
tools/testing/selftests/kvm/lib/aarch64/gic_v3.h
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * ARM Generic Interrupt Controller (GIC) v3 specific defines
 */

#ifndef SELFTEST_KVM_GICV3_H
#define SELFTEST_KVM_GICV3_H

#include <asm/sysreg.h>

/*
 * Distributor registers
 */
#define GICD_CTLR			0x0000
#define GICD_TYPER			0x0004
#define GICD_IGROUPR			0x0080
#define GICD_ISENABLER			0x0100
#define GICD_ICENABLER			0x0180
#define GICD_ICACTIVER			0x0380
#define GICD_IPRIORITYR			0x0400

/*
 * The assumption is that the guest runs in a non-secure mode.
 * The following bits of GICD_CTLR are defined accordingly.
 */
#define GICD_CTLR_RWP			(1U << 31)
#define GICD_CTLR_nASSGIreq		(1U << 8)
#define GICD_CTLR_ARE_NS		(1U << 4)
#define GICD_CTLR_ENABLE_G1A		(1U << 1)
#define GICD_CTLR_ENABLE_G1		(1U << 0)

#define GICD_TYPER_SPIS(typer)		((((typer) & 0x1f) + 1) * 32)
#define GICD_INT_DEF_PRI_X4		0xa0a0a0a0

/*
 * Redistributor registers
 */
#define GICR_CTLR			0x000
#define GICR_WAKER			0x014

#define GICR_CTLR_RWP			(1U << 3)

#define GICR_WAKER_ProcessorSleep	(1U << 1)
#define GICR_WAKER_ChildrenAsleep	(1U << 2)

/*
 * Redistributor registers, offsets from SGI base
 */
#define GICR_IGROUPR0			GICD_IGROUPR
#define GICR_ISENABLER0			GICD_ISENABLER
#define GICR_ICENABLER0			GICD_ICENABLER
#define GICR_ICACTIVER0			GICD_ICACTIVER
#define GICR_IPRIORITYR0		GICD_IPRIORITYR

/* CPU interface registers */
#define SYS_ICC_PMR_EL1			sys_reg(3, 0, 4, 6, 0)
#define SYS_ICC_IAR1_EL1		sys_reg(3, 0, 12, 12, 0)
#define SYS_ICC_EOIR1_EL1		sys_reg(3, 0, 12, 12, 1)
#define SYS_ICC_SRE_EL1			sys_reg(3, 0, 12, 12, 5)
#define SYS_ICC_GRPEN1_EL1		sys_reg(3, 0, 12, 12, 7)

#define ICC_PMR_DEF_PRIO		0xf0

#define ICC_SRE_EL1_SRE			(1U << 0)

#define ICC_IGRPEN1_EL1_ENABLE		(1U << 0)

#define GICV3_MAX_CPUS			512

#endif /* SELFTEST_KVM_GICV3_H */
+15 -9
tools/testing/selftests/kvm/lib/aarch64/processor.c
···
 	}
 }

-void aarch64_vcpu_setup(struct kvm_vm *vm, int vcpuid, struct kvm_vcpu_init *init)
+void aarch64_vcpu_setup(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_vcpu_init *init)
 {
 	struct kvm_vcpu_init default_init = { .target = -1, };
 	uint64_t sctlr_el1, tcr_el1;
···
 	 * Enable FP/ASIMD to avoid trapping when accessing Q0-Q15
 	 * registers, which the variable argument list macros do.
 	 */
-	set_reg(vm, vcpuid, ARM64_SYS_REG(CPACR_EL1), 3 << 20);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_CPACR_EL1), 3 << 20);

-	get_reg(vm, vcpuid, ARM64_SYS_REG(SCTLR_EL1), &sctlr_el1);
-	get_reg(vm, vcpuid, ARM64_SYS_REG(TCR_EL1), &tcr_el1);
+	get_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_SCTLR_EL1), &sctlr_el1);
+	get_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_TCR_EL1), &tcr_el1);

 	switch (vm->mode) {
 	case VM_MODE_P52V48_4K:
···
 	tcr_el1 |= (1 << 8) | (1 << 10) | (3 << 12);
 	tcr_el1 |= (64 - vm->va_bits) /* T0SZ */;

-	set_reg(vm, vcpuid, ARM64_SYS_REG(SCTLR_EL1), sctlr_el1);
-	set_reg(vm, vcpuid, ARM64_SYS_REG(TCR_EL1), tcr_el1);
-	set_reg(vm, vcpuid, ARM64_SYS_REG(MAIR_EL1), DEFAULT_MAIR_EL1);
-	set_reg(vm, vcpuid, ARM64_SYS_REG(TTBR0_EL1), vm->pgd);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_SCTLR_EL1), sctlr_el1);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_TCR_EL1), tcr_el1);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_MAIR_EL1), DEFAULT_MAIR_EL1);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_TTBR0_EL1), vm->pgd);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_TPIDR_EL1), vcpuid);
 }

 void vcpu_dump(FILE *stream, struct kvm_vm *vm, uint32_t vcpuid, uint8_t indent)
···
 {
 	extern char vectors;

-	set_reg(vm, vcpuid, ARM64_SYS_REG(VBAR_EL1), (uint64_t)&vectors);
+	set_reg(vm, vcpuid, KVM_ARM64_SYS_REG(SYS_VBAR_EL1), (uint64_t)&vectors);
 }

 void route_exception(struct ex_regs *regs, int vector)
···
 	assert(!VECTOR_IS_SYNC(vector));
 	assert(vector < VECTOR_NUM);
 	handlers->exception_handlers[vector][0] = handler;
+}
+
+uint32_t guest_get_vcpuid(void)
+{
+	return read_sysreg(tpidr_el1);
 }
+27
tools/testing/selftests/kvm/lib/aarch64/spinlock.c
// SPDX-License-Identifier: GPL-2.0
/*
 * ARM64 Spinlock support
 */
#include <stdint.h>

#include "spinlock.h"

void spin_lock(struct spinlock *lock)
{
	int val, res;

	asm volatile(
	"1:	ldaxr	%w0, [%2]\n"
	"	cbnz	%w0, 1b\n"
	"	mov	%w0, #1\n"
	"	stxr	%w1, %w0, [%2]\n"
	"	cbnz	%w1, 1b\n"
	: "=&r" (val), "=&r" (res)
	: "r" (&lock->v)
	: "memory");
}

void spin_unlock(struct spinlock *lock)
{
	asm volatile("stlr wzr, [%0]\n" : : "r" (&lock->v) : "memory");
}
+70
tools/testing/selftests/kvm/lib/aarch64/vgic.c
// SPDX-License-Identifier: GPL-2.0
/*
 * ARM Generic Interrupt Controller (GIC) v3 host support
 */

#include <linux/kvm.h>
#include <linux/sizes.h>
#include <asm/kvm.h>

#include "kvm_util.h"
#include "../kvm_util_internal.h"
#include "vgic.h"

/*
 * vGIC-v3 default host setup
 *
 * Input args:
 *	vm - KVM VM
 *	nr_vcpus - Number of vCPUs supported by this VM
 *	gicd_base_gpa - Guest Physical Address of the Distributor region
 *	gicr_base_gpa - Guest Physical Address of the Redistributor region
 *
 * Output args: None
 *
 * Return: GIC file-descriptor or negative error code upon failure
 *
 * The function creates a vGIC-v3 device and maps the distributor and
 * redistributor regions of the guest. Since it depends on the number of
 * vCPUs for the VM, it must be called after all the vCPUs have been created.
 */
int vgic_v3_setup(struct kvm_vm *vm, unsigned int nr_vcpus,
		uint64_t gicd_base_gpa, uint64_t gicr_base_gpa)
{
	int gic_fd;
	uint64_t redist_attr;
	struct list_head *iter;
	unsigned int nr_gic_pages, nr_vcpus_created = 0;

	TEST_ASSERT(nr_vcpus, "Number of vCPUs cannot be empty\n");

	/*
	 * Make sure that the caller is infact calling this
	 * function after all the vCPUs are added.
	 */
	list_for_each(iter, &vm->vcpus)
		nr_vcpus_created++;
	TEST_ASSERT(nr_vcpus == nr_vcpus_created,
			"Number of vCPUs requested (%u) doesn't match with the ones created for the VM (%u)\n",
			nr_vcpus, nr_vcpus_created);

	/* Distributor setup */
	gic_fd = kvm_create_device(vm, KVM_DEV_TYPE_ARM_VGIC_V3, false);
	kvm_device_access(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
			KVM_VGIC_V3_ADDR_TYPE_DIST, &gicd_base_gpa, true);
	nr_gic_pages = vm_calc_num_guest_pages(vm->mode, KVM_VGIC_V3_DIST_SIZE);
	virt_map(vm, gicd_base_gpa, gicd_base_gpa, nr_gic_pages);

	/* Redistributor setup */
	redist_attr = REDIST_REGION_ATTR_ADDR(nr_vcpus, gicr_base_gpa, 0, 0);
	kvm_device_access(gic_fd, KVM_DEV_ARM_VGIC_GRP_ADDR,
			KVM_VGIC_V3_ADDR_TYPE_REDIST_REGION, &redist_attr, true);
	nr_gic_pages = vm_calc_num_guest_pages(vm->mode,
					KVM_VGIC_V3_REDIST_SIZE * nr_vcpus);
	virt_map(vm, gicr_base_gpa, gicr_base_gpa, nr_gic_pages);

	kvm_device_access(gic_fd, KVM_DEV_ARM_VGIC_GRP_CTRL,
				KVM_DEV_ARM_VGIC_CTRL_INIT, NULL, true);

	return gic_fd;
}
+42 -4
tools/testing/selftests/kvm/lib/kvm_util.c
···
 void vcpu_sregs_set(struct kvm_vm *vm, uint32_t vcpuid, struct kvm_sregs *sregs)
 {
 	int ret = _vcpu_sregs_set(vm, vcpuid, sregs);
-	TEST_ASSERT(ret == 0, "KVM_RUN IOCTL failed, "
+	TEST_ASSERT(ret == 0, "KVM_SET_SREGS IOCTL failed, "
 		"rc: %i errno: %i", ret, errno);
 }
···
 {
 	int ret = _kvm_device_check_attr(dev_fd, group, attr);

-	TEST_ASSERT(ret >= 0, "KVM_HAS_DEVICE_ATTR failed, rc: %i errno: %i", ret, errno);
+	TEST_ASSERT(!ret, "KVM_HAS_DEVICE_ATTR failed, rc: %i errno: %i", ret, errno);
 	return ret;
 }
···
 	ret = _kvm_create_device(vm, type, test, &fd);

 	if (!test) {
-		TEST_ASSERT(ret >= 0,
+		TEST_ASSERT(!ret,
 			    "KVM_CREATE_DEVICE IOCTL failed, rc: %i errno: %i", ret, errno);
 		return fd;
 	}
···
 {
 	int ret = _kvm_device_access(dev_fd, group, attr, val, write);

-	TEST_ASSERT(ret >= 0, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+	TEST_ASSERT(!ret, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+	return ret;
+}
+
+int _vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+			  uint64_t attr)
+{
+	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+	TEST_ASSERT(vcpu, "nonexistent vcpu id: %d", vcpuid);
+
+	return _kvm_device_check_attr(vcpu->fd, group, attr);
+}
+
+int vcpu_has_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+			 uint64_t attr)
+{
+	int ret = _vcpu_has_device_attr(vm, vcpuid, group, attr);
+
+	TEST_ASSERT(!ret, "KVM_HAS_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
+	return ret;
+}
+
+int _vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+			     uint64_t attr, void *val, bool write)
+{
+	struct vcpu *vcpu = vcpu_find(vm, vcpuid);
+
+	TEST_ASSERT(vcpu, "nonexistent vcpu id: %d", vcpuid);
+
+	return _kvm_device_access(vcpu->fd, group, attr, val, write);
+}
+
+int vcpu_access_device_attr(struct kvm_vm *vm, uint32_t vcpuid, uint32_t group,
+			    uint64_t attr, void *val, bool write)
+{
+	int ret = _vcpu_access_device_attr(vm, vcpuid, group, attr, val, write);
+
+	TEST_ASSERT(!ret, "KVM_SET|GET_DEVICE_ATTR IOCTL failed, rc: %i errno: %i", ret, errno);
 	return ret;
 }
+1 -1
tools/testing/selftests/kvm/lib/sparsebit.c
···
 	 * of total bits set.
 	 */
 	if (s->num_set != total_bits_set) {
-		fprintf(stderr, "Number of bits set missmatch,\n"
+		fprintf(stderr, "Number of bits set mismatch,\n"
 			"  s->num_set: 0x%lx total_bits_set: 0x%lx",
 			s->num_set, total_bits_set);
+1 -3
tools/testing/selftests/kvm/lib/x86_64/processor.c
···
 	/* Create VCPU */
 	vm_vcpu_add(vm, vcpuid);
+	vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());
 	vcpu_setup(vm, vcpuid);

 	/* Setup guest general purpose registers */
···
 	/* Setup the MP state */
 	mp_state.mp_state = 0;
 	vcpu_set_mp_state(vm, vcpuid, &mp_state);
-
-	/* Setup supported CPUIDs */
-	vcpu_set_cpuid(vm, vcpuid, kvm_get_supported_cpuid());
 }

 /*
+13 -1
tools/testing/selftests/kvm/lib/x86_64/svm.c
···
 	seg->base = base;
 }

+/*
+ * Avoid using memset to clear the vmcb, since libc may not be
+ * available in L1 (and, even if it is, features that libc memset may
+ * want to use, like AVX, may not be enabled).
+ */
+static void clear_vmcb(struct vmcb *vmcb)
+{
+	int n = sizeof(*vmcb) / sizeof(u32);
+
+	asm volatile ("rep stosl" : "+c"(n), "+D"(vmcb) : "a"(0) : "memory");
+}
+
 void generic_svm_setup(struct svm_test_data *svm, void *guest_rip, void *guest_rsp)
 {
 	struct vmcb *vmcb = svm->vmcb;
···
 	wrmsr(MSR_EFER, efer | EFER_SVME);
 	wrmsr(MSR_VM_HSAVE_PA, svm->save_area_gpa);

-	memset(vmcb, 0, sizeof(*vmcb));
+	clear_vmcb(vmcb);
 	asm volatile ("vmsave %0\n\t" : : "a" (vmcb_gpa) : "memory");
 	vmcb_set_seg(&save->es, get_es(), 0, -1U, data_seg_attr);
 	vmcb_set_seg(&save->cs, get_cs(), 0, -1U, code_seg_attr);
+34 -22
tools/testing/selftests/kvm/memslot_perf_test.c
···
 		pr_info(__VA_ARGS__);	\
 } while (0)

+static void check_mmio_access(struct vm_data *vm, struct kvm_run *run)
+{
+	TEST_ASSERT(vm->mmio_ok, "Unexpected mmio exit");
+	TEST_ASSERT(run->mmio.is_write, "Unexpected mmio read");
+	TEST_ASSERT(run->mmio.len == 8,
+		    "Unexpected exit mmio size = %u", run->mmio.len);
+	TEST_ASSERT(run->mmio.phys_addr >= vm->mmio_gpa_min &&
+		    run->mmio.phys_addr <= vm->mmio_gpa_max,
+		    "Unexpected exit mmio address = 0x%llx",
+		    run->mmio.phys_addr);
+}
+
 static void *vcpu_worker(void *data)
 {
 	struct vm_data *vm = data;
 	struct kvm_run *run;
 	struct ucall uc;
-	uint64_t cmd;

 	run = vcpu_state(vm->vm, VCPU_ID);
 	while (1) {
 		vcpu_run(vm->vm, VCPU_ID);

-		if (run->exit_reason == KVM_EXIT_IO) {
-			cmd = get_ucall(vm->vm, VCPU_ID, &uc);
-			if (cmd != UCALL_SYNC)
-				break;
-
+		switch (get_ucall(vm->vm, VCPU_ID, &uc)) {
+		case UCALL_SYNC:
+			TEST_ASSERT(uc.args[1] == 0,
+				    "Unexpected sync ucall, got %lx",
+				    (ulong)uc.args[1]);
 			sem_post(&vcpu_ready);
 			continue;
-		}
-
-		if (run->exit_reason != KVM_EXIT_MMIO)
+		case UCALL_NONE:
+			if (run->exit_reason == KVM_EXIT_MMIO)
+				check_mmio_access(vm, run);
+			else
+				goto done;
 			break;
-
-		TEST_ASSERT(vm->mmio_ok, "Unexpected mmio exit");
-		TEST_ASSERT(run->mmio.is_write, "Unexpected mmio read");
-		TEST_ASSERT(run->mmio.len == 8,
-			    "Unexpected exit mmio size = %u", run->mmio.len);
-		TEST_ASSERT(run->mmio.phys_addr >= vm->mmio_gpa_min &&
-			    run->mmio.phys_addr <= vm->mmio_gpa_max,
-			    "Unexpected exit mmio address = 0x%llx",
-			    run->mmio.phys_addr);
+		case UCALL_ABORT:
+			TEST_FAIL("%s at %s:%ld, val = %lu",
+				  (const char *)uc.args[0],
+				  __FILE__, uc.args[1], uc.args[2]);
+			break;
+		case UCALL_DONE:
+			goto done;
+		default:
+			TEST_FAIL("Unknown ucall %lu", uc.cmd);
+		}
 	}

-	if (run->exit_reason == KVM_EXIT_IO && cmd == UCALL_ABORT)
-		TEST_FAIL("%s at %s:%ld, val = %lu", (const char *)uc.args[0],
-			  __FILE__, uc.args[1], uc.args[2]);
-
+done:
 	return NULL;
 }
···
 	TEST_ASSERT(data->hva_slots, "malloc() fail");

 	data->vm = vm_create_default(VCPU_ID, mempages, guest_code);
+	ucall_init(data->vm, NULL);

 	pr_info_v("Adding slots 1..%i, each slot with %"PRIu64" pages + %"PRIu64" extra pages last\n",
 		  max_mem_slots - 1, data->pages_per_slot, rempages);
+132
tools/testing/selftests/kvm/system_counter_offset_test.c
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (C) 2021, Google LLC.
 *
 * Tests for adjusting the system counter from userspace
 */
#include <asm/kvm_para.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"

#define VCPU_ID 0

#ifdef __x86_64__

struct test_case {
	uint64_t tsc_offset;
};

static struct test_case test_cases[] = {
	{ 0 },
	{ 180 * NSEC_PER_SEC },
	{ -180 * NSEC_PER_SEC },
};

static void check_preconditions(struct kvm_vm *vm)
{
	if (!_vcpu_has_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL, KVM_VCPU_TSC_OFFSET))
		return;

	print_skip("KVM_VCPU_TSC_OFFSET not supported; skipping test");
	exit(KSFT_SKIP);
}

static void setup_system_counter(struct kvm_vm *vm, struct test_case *test)
{
	vcpu_access_device_attr(vm, VCPU_ID, KVM_VCPU_TSC_CTRL,
				KVM_VCPU_TSC_OFFSET, &test->tsc_offset, true);
}

static uint64_t guest_read_system_counter(struct test_case *test)
{
	return rdtsc();
}

static uint64_t host_read_guest_system_counter(struct test_case *test)
{
	return rdtsc() + test->tsc_offset;
}

#else /* __x86_64__ */

#error test not implemented for this architecture!

#endif

#define GUEST_SYNC_CLOCK(__stage, __val)			\
		GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)

static void guest_main(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
		struct test_case *test = &test_cases[i];

		GUEST_SYNC_CLOCK(i, guest_read_system_counter(test));
	}
}

static void handle_sync(struct ucall *uc, uint64_t start, uint64_t end)
{
	uint64_t obs = uc->args[2];

	TEST_ASSERT(start <= obs && obs <= end,
		    "unexpected system counter value: %"PRIu64" expected range: [%"PRIu64", %"PRIu64"]",
		    obs, start, end);

	pr_info("system counter value: %"PRIu64" expected range [%"PRIu64", %"PRIu64"]\n",
		obs, start, end);
}

static void handle_abort(struct ucall *uc)
{
	TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
		  __FILE__, uc->args[1]);
}

static void enter_guest(struct kvm_vm *vm)
{
	uint64_t start, end;
	struct ucall uc;
	int i;

	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
		struct test_case *test = &test_cases[i];

		setup_system_counter(vm, test);
		start = host_read_guest_system_counter(test);
		vcpu_run(vm, VCPU_ID);
		end = host_read_guest_system_counter(test);

		switch (get_ucall(vm, VCPU_ID, &uc)) {
		case UCALL_SYNC:
			handle_sync(&uc, start, end);
			break;
		case UCALL_ABORT:
			handle_abort(&uc);
			return;
		default:
			TEST_ASSERT(0, "unhandled ucall %ld\n",
				    get_ucall(vm, VCPU_ID, &uc));
		}
	}
}

int main(void)
{
	struct kvm_vm *vm;

	vm = vm_create_default(VCPU_ID, 0, guest_main);
	check_preconditions(vm);
	ucall_init(vm, NULL);

	enter_guest(vm);
	kvm_vm_free(vm);
}
+1 -2
tools/testing/selftests/kvm/x86_64/cr4_cpuid_sync_test.c
···
 		}
 	}

-	kvm_vm_free(vm);
-
 done:
+	kvm_vm_free(vm);
 	return 0;
 }
+203
tools/testing/selftests/kvm/x86_64/kvm_clock_test.c
// SPDX-License-Identifier: GPL-2.0-only
/*
 * Copyright (C) 2021, Google LLC.
 *
 * Tests for adjusting the KVM clock from userspace
 */
#include <asm/kvm_para.h>
#include <asm/pvclock.h>
#include <asm/pvclock-abi.h>
#include <stdint.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>

#include "test_util.h"
#include "kvm_util.h"
#include "processor.h"

#define VCPU_ID 0

struct test_case {
	uint64_t kvmclock_base;
	int64_t realtime_offset;
};

static struct test_case test_cases[] = {
	{ .kvmclock_base = 0 },
	{ .kvmclock_base = 180 * NSEC_PER_SEC },
	{ .kvmclock_base = 0, .realtime_offset = -180 * NSEC_PER_SEC },
	{ .kvmclock_base = 0, .realtime_offset = 180 * NSEC_PER_SEC },
};

#define GUEST_SYNC_CLOCK(__stage, __val)			\
		GUEST_SYNC_ARGS(__stage, __val, 0, 0, 0)

static void guest_main(vm_paddr_t pvti_pa, struct pvclock_vcpu_time_info *pvti)
{
	int i;

	wrmsr(MSR_KVM_SYSTEM_TIME_NEW, pvti_pa | KVM_MSR_ENABLED);
	for (i = 0; i < ARRAY_SIZE(test_cases); i++)
		GUEST_SYNC_CLOCK(i, __pvclock_read_cycles(pvti, rdtsc()));
}

#define EXPECTED_FLAGS (KVM_CLOCK_REALTIME | KVM_CLOCK_HOST_TSC)

static inline void assert_flags(struct kvm_clock_data *data)
{
	TEST_ASSERT((data->flags & EXPECTED_FLAGS) == EXPECTED_FLAGS,
		    "unexpected clock data flags: %x (want set: %x)",
		    data->flags, EXPECTED_FLAGS);
}

static void handle_sync(struct ucall *uc, struct kvm_clock_data *start,
			struct kvm_clock_data *end)
{
	uint64_t obs, exp_lo, exp_hi;

	obs = uc->args[2];
	exp_lo = start->clock;
	exp_hi = end->clock;

	assert_flags(start);
	assert_flags(end);

	TEST_ASSERT(exp_lo <= obs && obs <= exp_hi,
		    "unexpected kvm-clock value: %"PRIu64" expected range: [%"PRIu64", %"PRIu64"]",
		    obs, exp_lo, exp_hi);

	pr_info("kvm-clock value: %"PRIu64" expected range [%"PRIu64", %"PRIu64"]\n",
		obs, exp_lo, exp_hi);
}

static void handle_abort(struct ucall *uc)
{
	TEST_FAIL("%s at %s:%ld", (const char *)uc->args[0],
		  __FILE__, uc->args[1]);
}

static void setup_clock(struct kvm_vm *vm, struct test_case *test_case)
{
	struct kvm_clock_data data;

	memset(&data, 0, sizeof(data));

	data.clock = test_case->kvmclock_base;
	if (test_case->realtime_offset) {
		struct timespec ts;
		int r;

		data.flags |= KVM_CLOCK_REALTIME;
		do {
			r = clock_gettime(CLOCK_REALTIME, &ts);
			if (!r)
				break;
		} while (errno == EINTR);

		TEST_ASSERT(!r, "clock_gettime() failed: %d\n", r);

		data.realtime = ts.tv_sec * NSEC_PER_SEC;
		data.realtime += ts.tv_nsec;
		data.realtime += test_case->realtime_offset;
	}

	vm_ioctl(vm, KVM_SET_CLOCK, &data);
}

static void enter_guest(struct kvm_vm *vm)
{
	struct kvm_clock_data start, end;
	struct kvm_run *run;
	struct ucall uc;
	int i, r;

	run = vcpu_state(vm, VCPU_ID);

	for (i = 0; i < ARRAY_SIZE(test_cases); i++) {
		setup_clock(vm, &test_cases[i]);

		vm_ioctl(vm, KVM_GET_CLOCK, &start);

		r = _vcpu_run(vm, VCPU_ID);
		vm_ioctl(vm, KVM_GET_CLOCK, &end);

		TEST_ASSERT(!r, "vcpu_run failed: %d\n", r);
		TEST_ASSERT(run->exit_reason == KVM_EXIT_IO,
			    "unexpected exit reason: %u (%s)",
			    run->exit_reason, exit_reason_str(run->exit_reason));

		switch (get_ucall(vm, VCPU_ID, &uc)) {
		case UCALL_SYNC:
			handle_sync(&uc, &start, &end);
			break;
		case UCALL_ABORT:
			handle_abort(&uc);
			return;
		default:
			TEST_ASSERT(0, "unhandled ucall: %ld\n", uc.cmd);
		}
	}
}

#define CLOCKSOURCE_PATH "/sys/devices/system/clocksource/clocksource0/current_clocksource"

static void check_clocksource(void)
{
	char *clk_name;
	struct stat st;
	FILE *fp;

	fp = fopen(CLOCKSOURCE_PATH, "r");
	if (!fp) {
		pr_info("failed to open clocksource file: %d; assuming TSC.\n",
			errno);
		return;
	}

	if (fstat(fileno(fp), &st)) {
		pr_info("failed to stat clocksource file: %d; assuming TSC.\n",
			errno);
		goto out;
	}

	clk_name = malloc(st.st_size);
	TEST_ASSERT(clk_name, "failed to allocate buffer to read file\n");

	if (!fgets(clk_name, st.st_size, fp)) {
		pr_info("failed to read clocksource file: %d; assuming TSC.\n",
			ferror(fp));
		goto out;
	}

	TEST_ASSERT(!strncmp(clk_name, "tsc\n", st.st_size),
		    "clocksource not supported: %s", clk_name);
out:
	fclose(fp);
}

int main(void)
{
	vm_vaddr_t pvti_gva;
	vm_paddr_t pvti_gpa;
	struct kvm_vm *vm;
	int flags;

	flags = kvm_check_cap(KVM_CAP_ADJUST_CLOCK);
	if (!(flags & KVM_CLOCK_REALTIME)) {
		print_skip("KVM_CLOCK_REALTIME not supported; flags: %x",
			   flags);
		exit(KSFT_SKIP);
	}

	check_clocksource();

	vm = vm_create_default(VCPU_ID, 0, guest_main);

	pvti_gva = vm_vaddr_alloc(vm, getpagesize(), 0x10000);
	pvti_gpa = addr_gva2gpa(vm, pvti_gva);
	vcpu_args_set(vm, VCPU_ID, 2, pvti_gpa, pvti_gva);

	enter_guest(vm);
	kvm_vm_free(vm);
}
+1 -1
tools/testing/selftests/kvm/x86_64/vmx_tsc_adjust_test.c
···
 	}
 }

-	kvm_vm_free(vm);
 done:
+	kvm_vm_free(vm);
 	return 0;
 }
+14 -1
virt/kvm/eventfd.c
···
 {
 	return 0;
 }
+
+bool __attribute__((weak)) kvm_arch_irqfd_route_changed(
+				struct kvm_kernel_irq_routing_entry *old,
+				struct kvm_kernel_irq_routing_entry *new)
+{
+	return true;
+}
 #endif

 static int
···
 	spin_lock_irq(&kvm->irqfds.lock);

 	list_for_each_entry(irqfd, &kvm->irqfds.items, list) {
+#ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
+		/* Under irqfds.lock, so can read irq_entry safely */
+		struct kvm_kernel_irq_routing_entry old = irqfd->irq_entry;
+#endif
+
 		irqfd_update(kvm, irqfd);

 #ifdef CONFIG_HAVE_KVM_IRQ_BYPASS
-		if (irqfd->producer) {
+		if (irqfd->producer &&
+		    kvm_arch_irqfd_route_changed(&old, &irqfd->irq_entry)) {
 			int ret = kvm_arch_update_irqfd_routing(
 					irqfd->kvm, irqfd->producer->irq,
 					irqfd->gsi, 1);
+77 -54
virt/kvm/kvm_main.c
···
 static unsigned long long kvm_createvm_count;
 static unsigned long long kvm_active_vms;
 
+static DEFINE_PER_CPU(cpumask_var_t, cpu_kick_mask);
+
 __weak void kvm_arch_mmu_notifier_invalidate_range(struct kvm *kvm,
 						   unsigned long start, unsigned long end)
 {
···
 {
 }
 
-static inline bool kvm_kick_many_cpus(cpumask_var_t tmp, bool wait)
+static inline bool kvm_kick_many_cpus(struct cpumask *cpus, bool wait)
 {
-	const struct cpumask *cpus;
-
-	if (likely(cpumask_available(tmp)))
-		cpus = tmp;
-	else
-		cpus = cpu_online_mask;
-
 	if (cpumask_empty(cpus))
 		return false;
···
 	return true;
 }
 
-bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
-				 struct kvm_vcpu *except,
-				 unsigned long *vcpu_bitmap, cpumask_var_t tmp)
+static void kvm_make_vcpu_request(struct kvm *kvm, struct kvm_vcpu *vcpu,
+				  unsigned int req, struct cpumask *tmp,
+				  int current_cpu)
 {
-	int i, cpu, me;
+	int cpu;
+
+	kvm_make_request(req, vcpu);
+
+	if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
+		return;
+
+	/*
+	 * Note, the vCPU could get migrated to a different pCPU at any point
+	 * after kvm_request_needs_ipi(), which could result in sending an IPI
+	 * to the previous pCPU.  But, that's OK because the purpose of the IPI
+	 * is to ensure the vCPU returns to OUTSIDE_GUEST_MODE, which is
+	 * satisfied if the vCPU migrates.  Entering READING_SHADOW_PAGE_TABLES
+	 * after this point is also OK, as the requirement is only that KVM wait
+	 * for vCPUs that were reading SPTEs _before_ any changes were
+	 * finalized.  See kvm_vcpu_kick() for more details on handling requests.
+	 */
+	if (kvm_request_needs_ipi(vcpu, req)) {
+		cpu = READ_ONCE(vcpu->cpu);
+		if (cpu != -1 && cpu != current_cpu)
+			__cpumask_set_cpu(cpu, tmp);
+	}
+}
+
+bool kvm_make_vcpus_request_mask(struct kvm *kvm, unsigned int req,
+				 unsigned long *vcpu_bitmap)
+{
 	struct kvm_vcpu *vcpu;
+	struct cpumask *cpus;
+	int i, me;
 	bool called;
 
 	me = get_cpu();
 
-	kvm_for_each_vcpu(i, vcpu, kvm) {
-		if ((vcpu_bitmap && !test_bit(i, vcpu_bitmap)) ||
-		    vcpu == except)
+	cpus = this_cpu_cpumask_var_ptr(cpu_kick_mask);
+	cpumask_clear(cpus);
+
+	for_each_set_bit(i, vcpu_bitmap, KVM_MAX_VCPUS) {
+		vcpu = kvm_get_vcpu(kvm, i);
+		if (!vcpu)
 			continue;
-
-		kvm_make_request(req, vcpu);
-
-		if (!(req & KVM_REQUEST_NO_WAKEUP) && kvm_vcpu_wake_up(vcpu))
-			continue;
-
-		/*
-		 * tmp can be "unavailable" if cpumasks are allocated off stack
-		 * as allocation of the mask is deliberately not fatal and is
-		 * handled by falling back to kicking all online CPUs.
-		 */
-		if (!cpumask_available(tmp))
-			continue;
-
-		/*
-		 * Note, the vCPU could get migrated to a different pCPU at any
-		 * point after kvm_request_needs_ipi(), which could result in
-		 * sending an IPI to the previous pCPU.  But, that's ok because
-		 * the purpose of the IPI is to ensure the vCPU returns to
-		 * OUTSIDE_GUEST_MODE, which is satisfied if the vCPU migrates.
-		 * Entering READING_SHADOW_PAGE_TABLES after this point is also
-		 * ok, as the requirement is only that KVM wait for vCPUs that
-		 * were reading SPTEs _before_ any changes were finalized.  See
-		 * kvm_vcpu_kick() for more details on handling requests.
-		 */
-		if (kvm_request_needs_ipi(vcpu, req)) {
-			cpu = READ_ONCE(vcpu->cpu);
-			if (cpu != -1 && cpu != me)
-				__cpumask_set_cpu(cpu, tmp);
-		}
+		kvm_make_vcpu_request(kvm, vcpu, req, cpus, me);
 	}
 
-	called = kvm_kick_many_cpus(tmp, !!(req & KVM_REQUEST_WAIT));
+	called = kvm_kick_many_cpus(cpus, !!(req & KVM_REQUEST_WAIT));
 	put_cpu();
 
 	return called;
···
 bool kvm_make_all_cpus_request_except(struct kvm *kvm, unsigned int req,
 				      struct kvm_vcpu *except)
 {
-	cpumask_var_t cpus;
+	struct kvm_vcpu *vcpu;
+	struct cpumask *cpus;
 	bool called;
+	int i, me;
 
-	zalloc_cpumask_var(&cpus, GFP_ATOMIC);
+	me = get_cpu();
 
-	called = kvm_make_vcpus_request_mask(kvm, req, except, NULL, cpus);
+	cpus = this_cpu_cpumask_var_ptr(cpu_kick_mask);
+	cpumask_clear(cpus);
 
-	free_cpumask_var(cpus);
+	kvm_for_each_vcpu(i, vcpu, kvm) {
+		if (vcpu == except)
+			continue;
+		kvm_make_vcpu_request(kvm, vcpu, req, cpus, me);
+	}
+
+	called = kvm_kick_many_cpus(cpus, !!(req & KVM_REQUEST_WAIT));
+	put_cpu();
+
 	return called;
 }
···
 static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
 {
 	struct kvm_vcpu *vcpu = file->private_data;
-	unsigned long pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;
+	unsigned long pages = vma_pages(vma);
 
 	if ((kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff) ||
 	     kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff + pages - 1)) &&
···
 	struct kvm_vcpu *vcpu;
 	struct page *page;
 
-	if (id >= KVM_MAX_VCPU_ID)
+	if (id >= KVM_MAX_VCPU_IDS)
 		return -EINVAL;
 
 	mutex_lock(&kvm->lock);
···
 		goto out_free_3;
 	}
 
+	for_each_possible_cpu(cpu) {
+		if (!alloc_cpumask_var_node(&per_cpu(cpu_kick_mask, cpu),
+					    GFP_KERNEL, cpu_to_node(cpu))) {
+			r = -ENOMEM;
+			goto out_free_4;
+		}
+	}
+
 	r = kvm_async_pf_init();
 	if (r)
-		goto out_free;
+		goto out_free_5;
 
 	kvm_chardev_ops.owner = module;
 	kvm_vm_fops.owner = module;
···
 out_unreg:
 	kvm_async_pf_deinit();
-out_free:
+out_free_5:
+	for_each_possible_cpu(cpu)
+		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
+out_free_4:
 	kmem_cache_destroy(kvm_vcpu_cache);
 out_free_3:
 	unregister_reboot_notifier(&kvm_reboot_notifier);
···
 void kvm_exit(void)
 {
+	int cpu;
+
 	debugfs_remove_recursive(kvm_debugfs_dir);
 	misc_deregister(&kvm_dev);
+	for_each_possible_cpu(cpu)
+		free_cpumask_var(per_cpu(cpu_kick_mask, cpu));
 	kmem_cache_destroy(kvm_vcpu_cache);
 	kvm_async_pf_deinit();
 	unregister_syscore_ops(&kvm_syscore_ops);