Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

- Support for various vector-accelerated crypto routines

- Hibernation is now enabled for portable kernel builds

- mmap_rnd_bits_max is larger on systems with larger VAs

- Support for fast GUP

- Support for membarrier-based instruction cache synchronization

- Support for the Andes hart-level interrupt controller and PMU

- Some cleanups around unaligned access speed probing and Kconfig
settings

- Support for ACPI LPI and CPPC

- Various cleanups related to barriers

- A handful of fixes

* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits)
riscv: Fix syscall wrapper for >word-size arguments
crypto: riscv - add vector crypto accelerated AES-CBC-CTS
crypto: riscv - parallelize AES-CBC decryption
riscv: Only flush the mm icache when setting an exec pte
riscv: Use kcalloc() instead of kzalloc()
riscv/barrier: Add missing space after ','
riscv/barrier: Consolidate fence definitions
riscv/barrier: Define RISCV_FULL_BARRIER
riscv/barrier: Define __{mb,rmb,wmb}
RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
cpufreq: Move CPPC configs to common Kconfig and add RISC-V
ACPI: RISC-V: Add CPPC driver
ACPI: Enable ACPI_PROCESSOR for RISC-V
ACPI: RISC-V: Add LPI driver
cpuidle: RISC-V: Move few functions to arch/riscv
riscv: Introduce set_compat_task() in asm/compat.h
riscv: Introduce is_compat_thread() into compat.h
riscv: add compile-time test into is_compat_task()
riscv: Replace direct thread flag check with is_compat_task()
riscv: Improve arch_get_mmap_end() macro
...

+5183 -747
+5 -11
Documentation/arch/riscv/vm-layout.rst
··· 144 144 smaller than sv48, the CPU maximum supported address space will be the default. 145 145 146 146 Software can "opt-in" to receiving VAs from another VA space by providing 147 - a hint address to mmap. A hint address passed to mmap will cause the largest 148 - address space that fits entirely into the hint to be used, unless there is no 149 - space left in the address space. If there is no space available in the requested 150 - address space, an address in the next smallest available address space will be 151 - returned. 152 - 153 - For example, in order to obtain 48-bit VA space, a hint address greater than 154 - :code:`1 << 47` must be provided. Note that this is 47 due to sv48 userspace 155 - ending at :code:`1 << 47` and the addresses beyond this are reserved for the 156 - kernel. Similarly, to obtain 57-bit VA space addresses, a hint address greater 157 - than or equal to :code:`1 << 56` must be provided. 147 + a hint address to mmap. When a hint address is passed to mmap, the returned 148 + address will never use more bits than the hint address. For example, if a hint 149 + address of `1 << 40` is passed to mmap, a valid returned address will never use 150 + bits 41 through 63. If no mappable addresses are available in that range, mmap 151 + will return `MAP_FAILED`.
+5 -1
Documentation/devicetree/bindings/riscv/cpus.yaml
··· 110 110 const: 1 111 111 112 112 compatible: 113 - const: riscv,cpu-intc 113 + oneOf: 114 + - items: 115 + - const: andestech,cpu-intc 116 + - const: riscv,cpu-intc 117 + - const: riscv,cpu-intc 114 118 115 119 interrupt-controller: true 116 120
+7
Documentation/devicetree/bindings/riscv/extensions.yaml
··· 477 477 latency, as ratified in commit 56ed795 ("Update 478 478 riscv-crypto-spec-vector.adoc") of riscv-crypto. 479 479 480 + - const: xandespmu 481 + description: 482 + The Andes Technology performance monitor extension for counter overflow 483 + and privilege mode filtering. For more details, see Counter Related 484 + Registers in the AX45MP datasheet. 485 + https://www.andestech.com/wp-content/uploads/AX45MP-1C-Rev.-5.0.0-Datasheet.pdf 486 + 480 487 additionalProperties: true 481 488 ...
+17 -1
Documentation/features/sched/membarrier-sync-core/arch-support.txt
··· 10 10 # Rely on implicit context synchronization as a result of exception return 11 11 # when returning from IPI handler, and when returning to user-space. 12 12 # 13 + # * riscv 14 + # 15 + # riscv uses xRET as return from interrupt and to return to user-space. 16 + # 17 + # Given that xRET is not core serializing, we rely on FENCE.I for providing 18 + # core serialization: 19 + # 20 + # - by calling sync_core_before_usermode() on return from interrupt (cf. 21 + # ipi_sync_core()), 22 + # 23 + # - via switch_mm() and sync_core_before_usermode() (respectively, for 24 + # uthread->uthread and kthread->uthread transitions) before returning 25 + # to user-space. 26 + # 27 + # The serialization in switch_mm() is activated by prepare_sync_core_cmd(). 28 + # 13 29 # * x86 14 30 # 15 31 # x86-32 uses IRET as return from interrupt, which takes care of the IPI. ··· 59 43 | openrisc: | TODO | 60 44 | parisc: | TODO | 61 45 | powerpc: | ok | 62 - | riscv: | TODO | 46 + | riscv: | ok | 63 47 | s390: | ok | 64 48 | sh: | TODO | 65 49 | sparc: | TODO |
+1
Documentation/scheduler/index.rst
··· 7 7 8 8 9 9 completion 10 + membarrier 10 11 sched-arch 11 12 sched-bwc 12 13 sched-deadline
+39
Documentation/scheduler/membarrier.rst
··· 1 + .. SPDX-License-Identifier: GPL-2.0 2 + 3 + ======================== 4 + membarrier() System Call 5 + ======================== 6 + 7 + MEMBARRIER_CMD_{PRIVATE,GLOBAL}_EXPEDITED - Architecture requirements 8 + ===================================================================== 9 + 10 + Memory barriers before updating rq->curr 11 + ---------------------------------------- 12 + 13 + The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED 14 + require each architecture to have a full memory barrier after coming from 15 + user-space, before updating rq->curr. This barrier is implied by the sequence 16 + rq_lock(); smp_mb__after_spinlock() in __schedule(). The barrier matches a full 17 + barrier in the proximity of the membarrier system call exit, cf. 18 + membarrier_{private,global}_expedited(). 19 + 20 + Memory barriers after updating rq->curr 21 + --------------------------------------- 22 + 23 + The commands MEMBARRIER_CMD_PRIVATE_EXPEDITED and MEMBARRIER_CMD_GLOBAL_EXPEDITED 24 + require each architecture to have a full memory barrier after updating rq->curr, 25 + before returning to user-space. The schemes providing this barrier on the various 26 + architectures are as follows. 27 + 28 + - alpha, arc, arm, hexagon, mips rely on the full barrier implied by 29 + spin_unlock() in finish_lock_switch(). 30 + 31 + - arm64 relies on the full barrier implied by switch_to(). 32 + 33 + - powerpc, riscv, s390, sparc, x86 rely on the full barrier implied by 34 + switch_mm(), if mm is not NULL; they rely on the full barrier implied 35 + by mmdrop(), otherwise. On powerpc and riscv, switch_mm() relies on 36 + membarrier_arch_switch_mm(). 37 + 38 + The barrier matches a full barrier in the proximity of the membarrier system call 39 + entry, cf. membarrier_{private,global}_expedited().
+3 -1
MAINTAINERS
··· 14134 14134 M: "Paul E. McKenney" <paulmck@kernel.org> 14135 14135 L: linux-kernel@vger.kernel.org 14136 14136 S: Supported 14137 - F: arch/powerpc/include/asm/membarrier.h 14137 + F: Documentation/scheduler/membarrier.rst 14138 + F: arch/*/include/asm/membarrier.h 14139 + F: arch/*/include/asm/sync_core.h 14138 14140 F: include/uapi/linux/membarrier.h 14139 14141 F: kernel/sched/membarrier.c 14140 14142
+1
arch/riscv/Kbuild
··· 2 2 3 3 obj-y += kernel/ mm/ net/ 4 4 obj-$(CONFIG_BUILTIN_DTB) += boot/dts/ 5 + obj-$(CONFIG_CRYPTO) += crypto/ 5 6 obj-y += errata/ 6 7 obj-$(CONFIG_KVM) += kvm/ 7 8
+64 -16
arch/riscv/Kconfig
··· 27 27 select ARCH_HAS_GCOV_PROFILE_ALL 28 28 select ARCH_HAS_GIGANTIC_PAGE 29 29 select ARCH_HAS_KCOV 30 + select ARCH_HAS_MEMBARRIER_CALLBACKS 31 + select ARCH_HAS_MEMBARRIER_SYNC_CORE 30 32 select ARCH_HAS_MMIOWB 31 33 select ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE 32 34 select ARCH_HAS_PMEM_API 35 + select ARCH_HAS_PREPARE_SYNC_CORE_CMD 33 36 select ARCH_HAS_PTE_SPECIAL 34 37 select ARCH_HAS_SET_DIRECT_MAP if MMU 35 38 select ARCH_HAS_SET_MEMORY if MMU 36 39 select ARCH_HAS_STRICT_KERNEL_RWX if MMU && !XIP_KERNEL 37 40 select ARCH_HAS_STRICT_MODULE_RWX if MMU && !XIP_KERNEL 41 + select ARCH_HAS_SYNC_CORE_BEFORE_USERMODE 38 42 select ARCH_HAS_SYSCALL_WRAPPER 39 43 select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST 40 44 select ARCH_HAS_UBSAN ··· 51 47 select ARCH_SUPPORTS_CFI_CLANG 52 48 select ARCH_SUPPORTS_DEBUG_PAGEALLOC if MMU 53 49 select ARCH_SUPPORTS_HUGETLBFS if MMU 50 + # LLD >= 14: https://github.com/llvm/llvm-project/issues/50505 51 + select ARCH_SUPPORTS_LTO_CLANG if LLD_VERSION >= 140000 52 + select ARCH_SUPPORTS_LTO_CLANG_THIN if LLD_VERSION >= 140000 54 53 select ARCH_SUPPORTS_PAGE_TABLE_CHECK if MMU 55 54 select ARCH_SUPPORTS_PER_VMA_LOCK if MMU 56 55 select ARCH_SUPPORTS_SHADOW_CALL_STACK if HAVE_SHADOW_CALL_STACK ··· 113 106 select HAVE_ARCH_KGDB_QXFER_PKT 114 107 select HAVE_ARCH_MMAP_RND_BITS if MMU 115 108 select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT 109 + select HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET 116 110 select HAVE_ARCH_SECCOMP_FILTER 117 111 select HAVE_ARCH_THREAD_STRUCT_WHITELIST 118 112 select HAVE_ARCH_TRACEHOOK ··· 132 124 select HAVE_FUNCTION_GRAPH_RETVAL if HAVE_FUNCTION_GRAPH_TRACER 133 125 select HAVE_FUNCTION_TRACER if !XIP_KERNEL && !PREEMPTION 134 126 select HAVE_EBPF_JIT if MMU 127 + select HAVE_FAST_GUP if MMU 135 128 select HAVE_FUNCTION_ARG_ACCESS_API 136 129 select HAVE_FUNCTION_ERROR_INJECTION 137 130 select HAVE_GCC_PLUGINS ··· 164 155 select IRQ_FORCED_THREADING 165 156 select KASAN_VMALLOC if KASAN 
166 157 select LOCK_MM_AND_FIND_VMA 158 + select MMU_GATHER_RCU_TABLE_FREE if SMP && MMU 167 159 select MODULES_USE_ELF_RELA if MODULES 168 160 select MODULE_SECTIONS if MODULES 169 161 select OF ··· 586 576 depends on LLD_VERSION >= 150000 || LD_VERSION >= 23900 587 577 depends on AS_HAS_OPTION_ARCH 588 578 579 + # This symbol indicates that the toolchain supports all v1.0 vector crypto 580 + # extensions, including Zvk*, Zvbb, and Zvbc. LLVM added all of these at once. 581 + # binutils added all except Zvkb, then added Zvkb. So we just check for Zvkb. 582 + config TOOLCHAIN_HAS_VECTOR_CRYPTO 583 + def_bool $(as-instr, .option arch$(comma) +v$(comma) +zvkb) 584 + depends on AS_HAS_OPTION_ARCH 585 + 589 586 config RISCV_ISA_ZBB 590 587 bool "Zbb extension support for bit manipulation instructions" 591 588 depends on TOOLCHAIN_HAS_ZBB ··· 703 686 affects irq stack size, which is equal to thread stack size. 704 687 705 688 config RISCV_MISALIGNED 706 - bool "Support misaligned load/store traps for kernel and userspace" 689 + bool 707 690 select SYSCTL_ARCH_UNALIGN_ALLOW 708 - default y 709 691 help 710 - Say Y here if you want the kernel to embed support for misaligned 711 - load/store for both kernel and userspace. When disable, misaligned 712 - accesses will generate SIGBUS in userspace and panic in kernel. 692 + Embed support for emulating misaligned loads and stores. 693 + 694 + choice 695 + prompt "Unaligned Accesses Support" 696 + default RISCV_PROBE_UNALIGNED_ACCESS 697 + help 698 + This determines the level of support for unaligned accesses. This 699 + information is used by the kernel to perform optimizations. It is also 700 + exposed to user space via the hwprobe syscall. The hardware will be 701 + probed at boot by default. 
702 + 703 + config RISCV_PROBE_UNALIGNED_ACCESS 704 + bool "Probe for hardware unaligned access support" 705 + select RISCV_MISALIGNED 706 + help 707 + During boot, the kernel will run a series of tests to determine the 708 + speed of unaligned accesses. This probing will dynamically determine 709 + the speed of unaligned accesses on the underlying system. If unaligned 710 + memory accesses trap into the kernel as they are not supported by the 711 + system, the kernel will emulate the unaligned accesses to preserve the 712 + UABI. 713 + 714 + config RISCV_EMULATED_UNALIGNED_ACCESS 715 + bool "Emulate unaligned access where system support is missing" 716 + select RISCV_MISALIGNED 717 + help 718 + If unaligned memory accesses trap into the kernel as they are not 719 + supported by the system, the kernel will emulate the unaligned 720 + accesses to preserve the UABI. When the underlying system does support 721 + unaligned accesses, the unaligned accesses are assumed to be slow. 722 + 723 + config RISCV_SLOW_UNALIGNED_ACCESS 724 + bool "Assume the system supports slow unaligned memory accesses" 725 + depends on NONPORTABLE 726 + help 727 + Assume that the system supports slow unaligned memory accesses. The 728 + kernel and userspace programs may not be able to run at all on systems 729 + that do not support unaligned memory accesses. 713 730 714 731 config RISCV_EFFICIENT_UNALIGNED_ACCESS 715 - bool "Assume the CPU supports fast unaligned memory accesses" 732 + bool "Assume the system supports fast unaligned memory accesses" 716 733 depends on NONPORTABLE 717 734 select DCACHE_WORD_ACCESS if MMU 718 735 select HAVE_EFFICIENT_UNALIGNED_ACCESS 719 736 help 720 - Say Y here if you want the kernel to assume that the CPU supports 721 - efficient unaligned memory accesses. When enabled, this option 722 - improves the performance of the kernel on such CPUs. 
However, the 723 - kernel will run much more slowly, or will not be able to run at all, 724 - on CPUs that do not support efficient unaligned memory accesses. 737 + Assume that the system supports fast unaligned memory accesses. When 738 + enabled, this option improves the performance of the kernel on such 739 + systems. However, the kernel and userspace programs will run much more 740 + slowly, or will not be able to run at all, on systems that do not 741 + support efficient unaligned memory accesses. 725 742 726 - If unsure what to do here, say N. 743 + endchoice 727 744 728 745 endmenu # "Platform type" 729 746 ··· 1062 1011 1063 1012 source "kernel/power/Kconfig" 1064 1013 1065 - # Hibernation is only possible on systems where the SBI implementation has 1066 - # marked its reserved memory as not accessible from, or does not run 1067 - # from the same memory as, Linux 1068 1014 config ARCH_HIBERNATION_POSSIBLE 1069 - def_bool NONPORTABLE 1015 + def_bool y 1070 1016 1071 1017 config ARCH_HIBERNATION_HEADER 1072 1018 def_bool HIBERNATION
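The new choice block above replaces the single `RISCV_MISALIGNED` bool with four mutually exclusive options. A sketch of the corresponding `.config` fragments (option names taken from the diff; pick exactly one):

```
# Portable default: probe unaligned-access speed at boot
CONFIG_RISCV_PROBE_UNALIGNED_ACCESS=y

# Or: emulate in the kernel where hardware support is missing
# CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS=y

# Or (NONPORTABLE): assume supported-but-slow unaligned accesses
# CONFIG_RISCV_SLOW_UNALIGNED_ACCESS=y

# Or (NONPORTABLE): assume fast unaligned accesses
# CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS=y
```

The probed or configured result is also exposed to userspace through the hwprobe syscall, as noted in the choice's help text.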
+5
arch/riscv/Makefile
··· 50 50 KBUILD_CFLAGS += -Wa,-mno-relax 51 51 KBUILD_AFLAGS += -Wa,-mno-relax 52 52 endif 53 + # LLVM has an issue with target-features and LTO: https://github.com/llvm/llvm-project/issues/59350 54 + # Ensure it is aware of linker relaxation with LTO, otherwise relocations may 55 + # be incorrect: https://github.com/llvm/llvm-project/issues/65090 56 + else ifeq ($(CONFIG_LTO_CLANG),y) 57 + KBUILD_LDFLAGS += -mllvm -mattr=+c -mllvm -mattr=+relax 53 58 endif 54 59 55 60 ifeq ($(CONFIG_SHADOW_CALL_STACK),y)
+2 -2
arch/riscv/boot/dts/renesas/r9a07g043f.dtsi
··· 27 27 riscv,isa-base = "rv64i"; 28 28 riscv,isa-extensions = "i", "m", "a", "f", "d", "c", 29 29 "zicntr", "zicsr", "zifencei", 30 - "zihpm"; 30 + "zihpm", "xandespmu"; 31 31 mmu-type = "riscv,sv39"; 32 32 i-cache-size = <0x8000>; 33 33 i-cache-line-size = <0x40>; ··· 39 39 40 40 cpu0_intc: interrupt-controller { 41 41 #interrupt-cells = <1>; 42 - compatible = "riscv,cpu-intc"; 42 + compatible = "andestech,cpu-intc", "riscv,cpu-intc"; 43 43 interrupt-controller; 44 44 }; 45 45 };
+3
arch/riscv/configs/defconfig
··· 44 44 CONFIG_CPU_FREQ_GOV_ONDEMAND=y 45 45 CONFIG_CPU_FREQ_GOV_CONSERVATIVE=m 46 46 CONFIG_CPUFREQ_DT=y 47 + CONFIG_ACPI_CPPC_CPUFREQ=m 47 48 CONFIG_VIRTUALIZATION=y 48 49 CONFIG_KVM=m 49 50 CONFIG_ACPI=y ··· 216 215 CONFIG_MMC_SDHCI=y 217 216 CONFIG_MMC_SDHCI_PLTFM=y 218 217 CONFIG_MMC_SDHCI_CADENCE=y 218 + CONFIG_MMC_SDHCI_OF_DWCMSHC=y 219 219 CONFIG_MMC_SPI=y 220 220 CONFIG_MMC_DW=y 221 221 CONFIG_MMC_DW_STARFIVE=y ··· 226 224 CONFIG_RTC_DRV_SUN6I=y 227 225 CONFIG_DMADEVICES=y 228 226 CONFIG_DMA_SUN6I=m 227 + CONFIG_DW_AXI_DMAC=y 229 228 CONFIG_RZ_DMAC=y 230 229 CONFIG_VIRTIO_PCI=y 231 230 CONFIG_VIRTIO_BALLOON=y
+93
arch/riscv/crypto/Kconfig
··· 1 + # SPDX-License-Identifier: GPL-2.0 2 + 3 + menu "Accelerated Cryptographic Algorithms for CPU (riscv)" 4 + 5 + config CRYPTO_AES_RISCV64 6 + tristate "Ciphers: AES, modes: ECB, CBC, CTS, CTR, XTS" 7 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 8 + select CRYPTO_ALGAPI 9 + select CRYPTO_LIB_AES 10 + select CRYPTO_SKCIPHER 11 + help 12 + Block cipher: AES cipher algorithms 13 + Length-preserving ciphers: AES with ECB, CBC, CTS, CTR, XTS 14 + 15 + Architecture: riscv64 using: 16 + - Zvkned vector crypto extension 17 + - Zvbb vector extension (XTS) 18 + - Zvkb vector crypto extension (CTR) 19 + - Zvkg vector crypto extension (XTS) 20 + 21 + config CRYPTO_CHACHA_RISCV64 22 + tristate "Ciphers: ChaCha" 23 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 24 + select CRYPTO_SKCIPHER 25 + select CRYPTO_LIB_CHACHA_GENERIC 26 + help 27 + Length-preserving ciphers: ChaCha20 stream cipher algorithm 28 + 29 + Architecture: riscv64 using: 30 + - Zvkb vector crypto extension 31 + 32 + config CRYPTO_GHASH_RISCV64 33 + tristate "Hash functions: GHASH" 34 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 35 + select CRYPTO_GCM 36 + help 37 + GCM GHASH function (NIST SP 800-38D) 38 + 39 + Architecture: riscv64 using: 40 + - Zvkg vector crypto extension 41 + 42 + config CRYPTO_SHA256_RISCV64 43 + tristate "Hash functions: SHA-224 and SHA-256" 44 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 45 + select CRYPTO_SHA256 46 + help 47 + SHA-224 and SHA-256 secure hash algorithm (FIPS 180) 48 + 49 + Architecture: riscv64 using: 50 + - Zvknha or Zvknhb vector crypto extensions 51 + - Zvkb vector crypto extension 52 + 53 + config CRYPTO_SHA512_RISCV64 54 + tristate "Hash functions: SHA-384 and SHA-512" 55 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 56 + select CRYPTO_SHA512 57 + help 58 + SHA-384 and SHA-512 secure hash algorithm (FIPS 180) 59 + 60 + Architecture: riscv64 using: 61 + - Zvknhb 
vector crypto extension 62 + - Zvkb vector crypto extension 63 + 64 + config CRYPTO_SM3_RISCV64 65 + tristate "Hash functions: SM3 (ShangMi 3)" 66 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 67 + select CRYPTO_HASH 68 + select CRYPTO_SM3 69 + help 70 + SM3 (ShangMi 3) secure hash function (OSCCA GM/T 0004-2012) 71 + 72 + Architecture: riscv64 using: 73 + - Zvksh vector crypto extension 74 + - Zvkb vector crypto extension 75 + 76 + config CRYPTO_SM4_RISCV64 77 + tristate "Ciphers: SM4 (ShangMi 4)" 78 + depends on 64BIT && RISCV_ISA_V && TOOLCHAIN_HAS_VECTOR_CRYPTO 79 + select CRYPTO_ALGAPI 80 + select CRYPTO_SM4 81 + help 82 + SM4 block cipher algorithm (OSCCA GB/T 32907-2016, 83 + ISO/IEC 18033-3:2010/Amd 1:2021) 84 + 85 + SM4 (GBT.32907-2016) is a cryptographic standard issued by the 86 + Organization of State Commercial Administration of China (OSCCA) 87 + as an authorized cryptographic algorithm for use within China. 88 + 89 + Architecture: riscv64 using: 90 + - Zvksed vector crypto extension 91 + - Zvkb vector crypto extension 92 + 93 + endmenu
+23
arch/riscv/crypto/Makefile
··· 1 + # SPDX-License-Identifier: GPL-2.0-only 2 + 3 + obj-$(CONFIG_CRYPTO_AES_RISCV64) += aes-riscv64.o 4 + aes-riscv64-y := aes-riscv64-glue.o aes-riscv64-zvkned.o \ 5 + aes-riscv64-zvkned-zvbb-zvkg.o aes-riscv64-zvkned-zvkb.o 6 + 7 + obj-$(CONFIG_CRYPTO_CHACHA_RISCV64) += chacha-riscv64.o 8 + chacha-riscv64-y := chacha-riscv64-glue.o chacha-riscv64-zvkb.o 9 + 10 + obj-$(CONFIG_CRYPTO_GHASH_RISCV64) += ghash-riscv64.o 11 + ghash-riscv64-y := ghash-riscv64-glue.o ghash-riscv64-zvkg.o 12 + 13 + obj-$(CONFIG_CRYPTO_SHA256_RISCV64) += sha256-riscv64.o 14 + sha256-riscv64-y := sha256-riscv64-glue.o sha256-riscv64-zvknha_or_zvknhb-zvkb.o 15 + 16 + obj-$(CONFIG_CRYPTO_SHA512_RISCV64) += sha512-riscv64.o 17 + sha512-riscv64-y := sha512-riscv64-glue.o sha512-riscv64-zvknhb-zvkb.o 18 + 19 + obj-$(CONFIG_CRYPTO_SM3_RISCV64) += sm3-riscv64.o 20 + sm3-riscv64-y := sm3-riscv64-glue.o sm3-riscv64-zvksh-zvkb.o 21 + 22 + obj-$(CONFIG_CRYPTO_SM4_RISCV64) += sm4-riscv64.o 23 + sm4-riscv64-y := sm4-riscv64-glue.o sm4-riscv64-zvksed-zvkb.o
+156
arch/riscv/crypto/aes-macros.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu> 15 + // Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com> 16 + // Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com> 17 + // Copyright 2024 Google LLC 18 + // All rights reserved. 19 + // 20 + // Redistribution and use in source and binary forms, with or without 21 + // modification, are permitted provided that the following conditions 22 + // are met: 23 + // 1. Redistributions of source code must retain the above copyright 24 + // notice, this list of conditions and the following disclaimer. 25 + // 2. Redistributions in binary form must reproduce the above copyright 26 + // notice, this list of conditions and the following disclaimer in the 27 + // documentation and/or other materials provided with the distribution. 28 + // 29 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 30 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 31 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 32 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 33 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 34 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 35 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 36 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 37 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 38 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 39 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 40 + 41 + // This file contains macros that are shared by the other aes-*.S files. The 42 + // generated code of these macros depends on the following RISC-V extensions: 43 + // - RV64I 44 + // - RISC-V Vector ('V') with VLEN >= 128 45 + // - RISC-V Vector AES block cipher extension ('Zvkned') 46 + 47 + // Loads the AES round keys from \keyp into vector registers and jumps to code 48 + // specific to the length of the key. Specifically: 49 + // - If AES-128, loads round keys into v1-v11 and jumps to \label128. 50 + // - If AES-192, loads round keys into v1-v13 and jumps to \label192. 51 + // - If AES-256, loads round keys into v1-v15 and continues onwards. 52 + // 53 + // Also sets vl=4 and vtype=e32,m1,ta,ma. Clobbers t0 and t1. 
54 + .macro aes_begin keyp, label128, label192 55 + lwu t0, 480(\keyp) // t0 = key length in bytes 56 + li t1, 24 // t1 = key length for AES-192 57 + vsetivli zero, 4, e32, m1, ta, ma 58 + vle32.v v1, (\keyp) 59 + addi \keyp, \keyp, 16 60 + vle32.v v2, (\keyp) 61 + addi \keyp, \keyp, 16 62 + vle32.v v3, (\keyp) 63 + addi \keyp, \keyp, 16 64 + vle32.v v4, (\keyp) 65 + addi \keyp, \keyp, 16 66 + vle32.v v5, (\keyp) 67 + addi \keyp, \keyp, 16 68 + vle32.v v6, (\keyp) 69 + addi \keyp, \keyp, 16 70 + vle32.v v7, (\keyp) 71 + addi \keyp, \keyp, 16 72 + vle32.v v8, (\keyp) 73 + addi \keyp, \keyp, 16 74 + vle32.v v9, (\keyp) 75 + addi \keyp, \keyp, 16 76 + vle32.v v10, (\keyp) 77 + addi \keyp, \keyp, 16 78 + vle32.v v11, (\keyp) 79 + blt t0, t1, \label128 // If AES-128, goto label128. 80 + addi \keyp, \keyp, 16 81 + vle32.v v12, (\keyp) 82 + addi \keyp, \keyp, 16 83 + vle32.v v13, (\keyp) 84 + beq t0, t1, \label192 // If AES-192, goto label192. 85 + // Else, it's AES-256. 86 + addi \keyp, \keyp, 16 87 + vle32.v v14, (\keyp) 88 + addi \keyp, \keyp, 16 89 + vle32.v v15, (\keyp) 90 + .endm 91 + 92 + // Encrypts \data using zvkned instructions, using the round keys loaded into 93 + // v1-v11 (for AES-128), v1-v13 (for AES-192), or v1-v15 (for AES-256). \keylen 94 + // is the AES key length in bits. vl and vtype must already be set 95 + // appropriately. Note that if vl > 4, multiple blocks are encrypted. 
96 + .macro aes_encrypt data, keylen 97 + vaesz.vs \data, v1 98 + vaesem.vs \data, v2 99 + vaesem.vs \data, v3 100 + vaesem.vs \data, v4 101 + vaesem.vs \data, v5 102 + vaesem.vs \data, v6 103 + vaesem.vs \data, v7 104 + vaesem.vs \data, v8 105 + vaesem.vs \data, v9 106 + vaesem.vs \data, v10 107 + .if \keylen == 128 108 + vaesef.vs \data, v11 109 + .elseif \keylen == 192 110 + vaesem.vs \data, v11 111 + vaesem.vs \data, v12 112 + vaesef.vs \data, v13 113 + .else 114 + vaesem.vs \data, v11 115 + vaesem.vs \data, v12 116 + vaesem.vs \data, v13 117 + vaesem.vs \data, v14 118 + vaesef.vs \data, v15 119 + .endif 120 + .endm 121 + 122 + // Same as aes_encrypt, but decrypts instead of encrypts. 123 + .macro aes_decrypt data, keylen 124 + .if \keylen == 128 125 + vaesz.vs \data, v11 126 + .elseif \keylen == 192 127 + vaesz.vs \data, v13 128 + vaesdm.vs \data, v12 129 + vaesdm.vs \data, v11 130 + .else 131 + vaesz.vs \data, v15 132 + vaesdm.vs \data, v14 133 + vaesdm.vs \data, v13 134 + vaesdm.vs \data, v12 135 + vaesdm.vs \data, v11 136 + .endif 137 + vaesdm.vs \data, v10 138 + vaesdm.vs \data, v9 139 + vaesdm.vs \data, v8 140 + vaesdm.vs \data, v7 141 + vaesdm.vs \data, v6 142 + vaesdm.vs \data, v5 143 + vaesdm.vs \data, v4 144 + vaesdm.vs \data, v3 145 + vaesdm.vs \data, v2 146 + vaesdf.vs \data, v1 147 + .endm 148 + 149 + // Expands to aes_encrypt or aes_decrypt according to \enc, which is 1 or 0. 150 + .macro aes_crypt data, enc, keylen 151 + .if \enc 152 + aes_encrypt \data, \keylen 153 + .else 154 + aes_decrypt \data, \keylen 155 + .endif 156 + .endm
+637
arch/riscv/crypto/aes-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * AES using the RISC-V vector crypto extensions. Includes the bare block 4 + * cipher and the ECB, CBC, CBC-CTS, CTR, and XTS modes. 5 + * 6 + * Copyright (C) 2023 VRULL GmbH 7 + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu> 8 + * 9 + * Copyright (C) 2023 SiFive, Inc. 10 + * Author: Jerry Shih <jerry.shih@sifive.com> 11 + * 12 + * Copyright 2024 Google LLC 13 + */ 14 + 15 + #include <asm/simd.h> 16 + #include <asm/vector.h> 17 + #include <crypto/aes.h> 18 + #include <crypto/internal/cipher.h> 19 + #include <crypto/internal/simd.h> 20 + #include <crypto/internal/skcipher.h> 21 + #include <crypto/scatterwalk.h> 22 + #include <crypto/xts.h> 23 + #include <linux/linkage.h> 24 + #include <linux/module.h> 25 + 26 + asmlinkage void aes_encrypt_zvkned(const struct crypto_aes_ctx *key, 27 + const u8 in[AES_BLOCK_SIZE], 28 + u8 out[AES_BLOCK_SIZE]); 29 + asmlinkage void aes_decrypt_zvkned(const struct crypto_aes_ctx *key, 30 + const u8 in[AES_BLOCK_SIZE], 31 + u8 out[AES_BLOCK_SIZE]); 32 + 33 + asmlinkage void aes_ecb_encrypt_zvkned(const struct crypto_aes_ctx *key, 34 + const u8 *in, u8 *out, size_t len); 35 + asmlinkage void aes_ecb_decrypt_zvkned(const struct crypto_aes_ctx *key, 36 + const u8 *in, u8 *out, size_t len); 37 + 38 + asmlinkage void aes_cbc_encrypt_zvkned(const struct crypto_aes_ctx *key, 39 + const u8 *in, u8 *out, size_t len, 40 + u8 iv[AES_BLOCK_SIZE]); 41 + asmlinkage void aes_cbc_decrypt_zvkned(const struct crypto_aes_ctx *key, 42 + const u8 *in, u8 *out, size_t len, 43 + u8 iv[AES_BLOCK_SIZE]); 44 + 45 + asmlinkage void aes_cbc_cts_crypt_zvkned(const struct crypto_aes_ctx *key, 46 + const u8 *in, u8 *out, size_t len, 47 + const u8 iv[AES_BLOCK_SIZE], bool enc); 48 + 49 + asmlinkage void aes_ctr32_crypt_zvkned_zvkb(const struct crypto_aes_ctx *key, 50 + const u8 *in, u8 *out, size_t len, 51 + u8 iv[AES_BLOCK_SIZE]); 52 + 53 + asmlinkage void aes_xts_encrypt_zvkned_zvbb_zvkg( 54 + const 
struct crypto_aes_ctx *key, 55 + const u8 *in, u8 *out, size_t len, 56 + u8 tweak[AES_BLOCK_SIZE]); 57 + 58 + asmlinkage void aes_xts_decrypt_zvkned_zvbb_zvkg( 59 + const struct crypto_aes_ctx *key, 60 + const u8 *in, u8 *out, size_t len, 61 + u8 tweak[AES_BLOCK_SIZE]); 62 + 63 + static int riscv64_aes_setkey(struct crypto_aes_ctx *ctx, 64 + const u8 *key, unsigned int keylen) 65 + { 66 + /* 67 + * For now we just use the generic key expansion, for these reasons: 68 + * 69 + * - zvkned's key expansion instructions don't support AES-192. 70 + * So, non-zvkned fallback code would be needed anyway. 71 + * 72 + * - Users of AES in Linux usually don't change keys frequently. 73 + * So, key expansion isn't performance-critical. 74 + * 75 + * - For single-block AES exposed as a "cipher" algorithm, it's 76 + * necessary to use struct crypto_aes_ctx and initialize its 'key_dec' 77 + * field with the round keys for the Equivalent Inverse Cipher. This 78 + * is because with "cipher", decryption can be requested from a 79 + * context where the vector unit isn't usable, necessitating a 80 + * fallback to aes_decrypt(). But, zvkned can only generate and use 81 + * the normal round keys. Of course, it's preferable to not have 82 + * special code just for "cipher", as e.g. XTS also uses a 83 + * single-block AES encryption. It's simplest to just use 84 + * struct crypto_aes_ctx and aes_expandkey() everywhere. 
 */
	return aes_expandkey(ctx, key, keylen);
}

static int riscv64_aes_setkey_cipher(struct crypto_tfm *tfm,
				     const u8 *key, unsigned int keylen)
{
	struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);

	return riscv64_aes_setkey(ctx, key, keylen);
}

static int riscv64_aes_setkey_skcipher(struct crypto_skcipher *tfm,
				       const u8 *key, unsigned int keylen)
{
	struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);

	return riscv64_aes_setkey(ctx, key, keylen);
}

/* Bare AES, without a mode of operation */

static void riscv64_aes_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
	const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);

	if (crypto_simd_usable()) {
		kernel_vector_begin();
		aes_encrypt_zvkned(ctx, src, dst);
		kernel_vector_end();
	} else {
		aes_encrypt(ctx, dst, src);
	}
}

static void riscv64_aes_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
	const struct crypto_aes_ctx *ctx = crypto_tfm_ctx(tfm);

	if (crypto_simd_usable()) {
		kernel_vector_begin();
		aes_decrypt_zvkned(ctx, src, dst);
		kernel_vector_end();
	} else {
		aes_decrypt(ctx, dst, src);
	}
}

/* AES-ECB */

static inline int riscv64_aes_ecb_crypt(struct skcipher_request *req, bool enc)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	unsigned int nbytes;
	int err;

	err = skcipher_walk_virt(&walk, req, false);
	while ((nbytes = walk.nbytes) != 0) {
		kernel_vector_begin();
		if (enc)
			aes_ecb_encrypt_zvkned(ctx, walk.src.virt.addr,
					       walk.dst.virt.addr,
					       nbytes & ~(AES_BLOCK_SIZE - 1));
		else
			aes_ecb_decrypt_zvkned(ctx, walk.src.virt.addr,
					       walk.dst.virt.addr,
					       nbytes & ~(AES_BLOCK_SIZE - 1));
		kernel_vector_end();
		err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
	}

	return err;
}

static int riscv64_aes_ecb_encrypt(struct skcipher_request *req)
{
	return riscv64_aes_ecb_crypt(req, true);
}

static int riscv64_aes_ecb_decrypt(struct skcipher_request *req)
{
	return riscv64_aes_ecb_crypt(req, false);
}

/* AES-CBC */

static int riscv64_aes_cbc_crypt(struct skcipher_request *req, bool enc)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	unsigned int nbytes;
	int err;

	err = skcipher_walk_virt(&walk, req, false);
	while ((nbytes = walk.nbytes) != 0) {
		kernel_vector_begin();
		if (enc)
			aes_cbc_encrypt_zvkned(ctx, walk.src.virt.addr,
					       walk.dst.virt.addr,
					       nbytes & ~(AES_BLOCK_SIZE - 1),
					       walk.iv);
		else
			aes_cbc_decrypt_zvkned(ctx, walk.src.virt.addr,
					       walk.dst.virt.addr,
					       nbytes & ~(AES_BLOCK_SIZE - 1),
					       walk.iv);
		kernel_vector_end();
		err = skcipher_walk_done(&walk, nbytes & (AES_BLOCK_SIZE - 1));
	}

	return err;
}

static int riscv64_aes_cbc_encrypt(struct skcipher_request *req)
{
	return riscv64_aes_cbc_crypt(req, true);
}

static int riscv64_aes_cbc_decrypt(struct skcipher_request *req)
{
	return riscv64_aes_cbc_crypt(req, false);
}

/* AES-CBC-CTS */

static int riscv64_aes_cbc_cts_crypt(struct skcipher_request *req, bool enc)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct scatterlist sg_src[2], sg_dst[2];
	struct skcipher_request subreq;
	struct scatterlist *src, *dst;
	struct skcipher_walk walk;
	unsigned int cbc_len;
	int err;

	if (req->cryptlen < AES_BLOCK_SIZE)
		return -EINVAL;

	err = skcipher_walk_virt(&walk, req, false);
	if (err)
		return err;
	/*
	 * If the full message is available in one step, decrypt it in one call
	 * to the CBC-CTS assembly function.  This reduces overhead, especially
	 * on short messages.  Otherwise, fall back to doing CBC up to the last
	 * two blocks, then invoke CTS just for the ciphertext stealing.
	 */
	if (unlikely(walk.nbytes != req->cryptlen)) {
		cbc_len = round_down(req->cryptlen - AES_BLOCK_SIZE - 1,
				     AES_BLOCK_SIZE);
		skcipher_walk_abort(&walk);
		skcipher_request_set_tfm(&subreq, tfm);
		skcipher_request_set_callback(&subreq,
					      skcipher_request_flags(req),
					      NULL, NULL);
		skcipher_request_set_crypt(&subreq, req->src, req->dst,
					   cbc_len, req->iv);
		err = riscv64_aes_cbc_crypt(&subreq, enc);
		if (err)
			return err;
		dst = src = scatterwalk_ffwd(sg_src, req->src, cbc_len);
		if (req->dst != req->src)
			dst = scatterwalk_ffwd(sg_dst, req->dst, cbc_len);
		skcipher_request_set_crypt(&subreq, src, dst,
					   req->cryptlen - cbc_len, req->iv);
		err = skcipher_walk_virt(&walk, &subreq, false);
		if (err)
			return err;
	}
	kernel_vector_begin();
	aes_cbc_cts_crypt_zvkned(ctx, walk.src.virt.addr, walk.dst.virt.addr,
				 walk.nbytes, req->iv, enc);
	kernel_vector_end();
	return skcipher_walk_done(&walk, 0);
}

static int riscv64_aes_cbc_cts_encrypt(struct skcipher_request *req)
{
	return riscv64_aes_cbc_cts_crypt(req, true);
}

static int riscv64_aes_cbc_cts_decrypt(struct skcipher_request *req)
{
	return riscv64_aes_cbc_cts_crypt(req, false);
}

/* AES-CTR */

static int riscv64_aes_ctr_crypt(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct crypto_aes_ctx *ctx = crypto_skcipher_ctx(tfm);
	unsigned int nbytes, p1_nbytes;
	struct skcipher_walk walk;
	u32 ctr32, nblocks;
	int err;

	/* Get the low 32-bit word of the 128-bit big endian counter. */
	ctr32 = get_unaligned_be32(req->iv + 12);

	err = skcipher_walk_virt(&walk, req, false);
	while ((nbytes = walk.nbytes) != 0) {
		if (nbytes < walk.total) {
			/* Not the end yet, so keep the length block-aligned. */
			nbytes = round_down(nbytes, AES_BLOCK_SIZE);
			nblocks = nbytes / AES_BLOCK_SIZE;
		} else {
			/* It's the end, so include any final partial block. */
			nblocks = DIV_ROUND_UP(nbytes, AES_BLOCK_SIZE);
		}
		ctr32 += nblocks;

		kernel_vector_begin();
		if (ctr32 >= nblocks) {
			/* The low 32-bit word of the counter won't overflow. */
			aes_ctr32_crypt_zvkned_zvkb(ctx, walk.src.virt.addr,
						    walk.dst.virt.addr, nbytes,
						    req->iv);
		} else {
			/*
			 * The low 32-bit word of the counter will overflow.
			 * The assembly doesn't handle this case, so split the
			 * operation into two at the point where the overflow
			 * will occur.  After the first part, add the carry bit.
			 */
			p1_nbytes = min_t(unsigned int, nbytes,
					  (nblocks - ctr32) * AES_BLOCK_SIZE);
			aes_ctr32_crypt_zvkned_zvkb(ctx, walk.src.virt.addr,
						    walk.dst.virt.addr,
						    p1_nbytes, req->iv);
			crypto_inc(req->iv, 12);

			if (ctr32) {
				aes_ctr32_crypt_zvkned_zvkb(
					ctx,
					walk.src.virt.addr + p1_nbytes,
					walk.dst.virt.addr + p1_nbytes,
					nbytes - p1_nbytes, req->iv);
			}
		}
		kernel_vector_end();

		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	return err;
}

/* AES-XTS */

struct riscv64_aes_xts_ctx {
	struct crypto_aes_ctx ctx1;
	struct crypto_aes_ctx ctx2;
};

static int riscv64_aes_xts_setkey(struct crypto_skcipher *tfm, const u8 *key,
				  unsigned int keylen)
{
	struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);

	return xts_verify_key(tfm, key, keylen) ?:
	       riscv64_aes_setkey(&ctx->ctx1, key, keylen / 2) ?:
	       riscv64_aes_setkey(&ctx->ctx2, key + keylen / 2, keylen / 2);
}

static int riscv64_aes_xts_crypt(struct skcipher_request *req, bool enc)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	const struct riscv64_aes_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
	int tail = req->cryptlen % AES_BLOCK_SIZE;
	struct scatterlist sg_src[2], sg_dst[2];
	struct skcipher_request subreq;
	struct scatterlist *src, *dst;
	struct skcipher_walk walk;
	int err;

	if (req->cryptlen < AES_BLOCK_SIZE)
		return -EINVAL;

	/* Encrypt the IV with the tweak key to get the first tweak. */
	kernel_vector_begin();
	aes_encrypt_zvkned(&ctx->ctx2, req->iv, req->iv);
	kernel_vector_end();

	err = skcipher_walk_virt(&walk, req, false);

	/*
	 * If the message length isn't divisible by the AES block size and the
	 * full message isn't available in one step of the scatterlist walk,
	 * then separate off the last full block and the partial block.  This
	 * ensures that they are processed in the same call to the assembly
	 * function, which is required for ciphertext stealing.
	 */
	if (unlikely(tail > 0 && walk.nbytes < walk.total)) {
		skcipher_walk_abort(&walk);

		skcipher_request_set_tfm(&subreq, tfm);
		skcipher_request_set_callback(&subreq,
					      skcipher_request_flags(req),
					      NULL, NULL);
		skcipher_request_set_crypt(&subreq, req->src, req->dst,
					   req->cryptlen - tail - AES_BLOCK_SIZE,
					   req->iv);
		req = &subreq;
		err = skcipher_walk_virt(&walk, req, false);
	} else {
		tail = 0;
	}

	while (walk.nbytes) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, AES_BLOCK_SIZE);

		kernel_vector_begin();
		if (enc)
			aes_xts_encrypt_zvkned_zvbb_zvkg(
				&ctx->ctx1, walk.src.virt.addr,
				walk.dst.virt.addr, nbytes, req->iv);
		else
			aes_xts_decrypt_zvkned_zvbb_zvkg(
				&ctx->ctx1, walk.src.virt.addr,
				walk.dst.virt.addr, nbytes, req->iv);
		kernel_vector_end();
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	if (err || likely(!tail))
		return err;

	/* Do ciphertext stealing with the last full block and partial block. */

	dst = src = scatterwalk_ffwd(sg_src, req->src, req->cryptlen);
	if (req->dst != req->src)
		dst = scatterwalk_ffwd(sg_dst, req->dst, req->cryptlen);

	skcipher_request_set_crypt(req, src, dst, AES_BLOCK_SIZE + tail,
				   req->iv);

	err = skcipher_walk_virt(&walk, req, false);
	if (err)
		return err;

	kernel_vector_begin();
	if (enc)
		aes_xts_encrypt_zvkned_zvbb_zvkg(
			&ctx->ctx1, walk.src.virt.addr,
			walk.dst.virt.addr, walk.nbytes, req->iv);
	else
		aes_xts_decrypt_zvkned_zvbb_zvkg(
			&ctx->ctx1, walk.src.virt.addr,
			walk.dst.virt.addr, walk.nbytes, req->iv);
	kernel_vector_end();

	return skcipher_walk_done(&walk, 0);
}

static int riscv64_aes_xts_encrypt(struct skcipher_request *req)
{
	return riscv64_aes_xts_crypt(req, true);
}

static int riscv64_aes_xts_decrypt(struct skcipher_request *req)
{
	return riscv64_aes_xts_crypt(req, false);
}

/* Algorithm definitions */

static struct crypto_alg riscv64_zvkned_aes_cipher_alg = {
	.cra_flags = CRYPTO_ALG_TYPE_CIPHER,
	.cra_blocksize = AES_BLOCK_SIZE,
	.cra_ctxsize = sizeof(struct crypto_aes_ctx),
	.cra_priority = 300,
	.cra_name = "aes",
	.cra_driver_name = "aes-riscv64-zvkned",
	.cra_cipher = {
		.cia_min_keysize = AES_MIN_KEY_SIZE,
		.cia_max_keysize = AES_MAX_KEY_SIZE,
		.cia_setkey = riscv64_aes_setkey_cipher,
		.cia_encrypt = riscv64_aes_encrypt,
		.cia_decrypt = riscv64_aes_decrypt,
	},
	.cra_module = THIS_MODULE,
};

static struct skcipher_alg riscv64_zvkned_aes_skcipher_algs[] = {
	{
		.setkey = riscv64_aes_setkey_skcipher,
		.encrypt = riscv64_aes_ecb_encrypt,
		.decrypt = riscv64_aes_ecb_decrypt,
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.walksize = 8 * AES_BLOCK_SIZE, /* matches LMUL=8 */
		.base = {
			.cra_blocksize = AES_BLOCK_SIZE,
			.cra_ctxsize = sizeof(struct crypto_aes_ctx),
			.cra_priority = 300,
			.cra_name = "ecb(aes)",
			.cra_driver_name = "ecb-aes-riscv64-zvkned",
			.cra_module = THIS_MODULE,
		},
	}, {
		.setkey = riscv64_aes_setkey_skcipher,
		.encrypt = riscv64_aes_cbc_encrypt,
		.decrypt = riscv64_aes_cbc_decrypt,
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.base = {
			.cra_blocksize = AES_BLOCK_SIZE,
			.cra_ctxsize = sizeof(struct crypto_aes_ctx),
			.cra_priority = 300,
			.cra_name = "cbc(aes)",
			.cra_driver_name = "cbc-aes-riscv64-zvkned",
			.cra_module = THIS_MODULE,
		},
	}, {
		.setkey = riscv64_aes_setkey_skcipher,
		.encrypt = riscv64_aes_cbc_cts_encrypt,
		.decrypt = riscv64_aes_cbc_cts_decrypt,
		.min_keysize = AES_MIN_KEY_SIZE,
		.max_keysize = AES_MAX_KEY_SIZE,
		.ivsize = AES_BLOCK_SIZE,
		.walksize = 4 * AES_BLOCK_SIZE, /* matches LMUL=4 */
		.base = {
			.cra_blocksize = AES_BLOCK_SIZE,
			.cra_ctxsize = sizeof(struct crypto_aes_ctx),
			.cra_priority = 300,
			.cra_name = "cts(cbc(aes))",
			.cra_driver_name = "cts-cbc-aes-riscv64-zvkned",
			.cra_module = THIS_MODULE,
		},
	}
};

static struct skcipher_alg riscv64_zvkned_zvkb_aes_skcipher_alg = {
	.setkey = riscv64_aes_setkey_skcipher,
	.encrypt = riscv64_aes_ctr_crypt,
	.decrypt = riscv64_aes_ctr_crypt,
	.min_keysize = AES_MIN_KEY_SIZE,
	.max_keysize = AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.chunksize = AES_BLOCK_SIZE,
	.walksize = 4 * AES_BLOCK_SIZE, /* matches LMUL=4 */
	.base = {
		.cra_blocksize = 1,
		.cra_ctxsize = sizeof(struct crypto_aes_ctx),
		.cra_priority = 300,
		.cra_name = "ctr(aes)",
		.cra_driver_name = "ctr-aes-riscv64-zvkned-zvkb",
		.cra_module = THIS_MODULE,
	},
};

static struct skcipher_alg riscv64_zvkned_zvbb_zvkg_aes_skcipher_alg = {
	.setkey = riscv64_aes_xts_setkey,
	.encrypt = riscv64_aes_xts_encrypt,
	.decrypt = riscv64_aes_xts_decrypt,
	.min_keysize = 2 * AES_MIN_KEY_SIZE,
	.max_keysize = 2 * AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.chunksize = AES_BLOCK_SIZE,
	.walksize = 4 * AES_BLOCK_SIZE, /* matches LMUL=4 */
	.base = {
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct riscv64_aes_xts_ctx),
		.cra_priority = 300,
		.cra_name = "xts(aes)",
		.cra_driver_name = "xts-aes-riscv64-zvkned-zvbb-zvkg",
		.cra_module = THIS_MODULE,
	},
};

static inline bool riscv64_aes_xts_supported(void)
{
	return riscv_isa_extension_available(NULL, ZVBB) &&
	       riscv_isa_extension_available(NULL, ZVKG) &&
	       riscv_vector_vlen() < 2048 /* Implementation limitation */;
}

static int __init riscv64_aes_mod_init(void)
{
	int err = -ENODEV;

	if (riscv_isa_extension_available(NULL, ZVKNED) &&
	    riscv_vector_vlen() >= 128) {
		err = crypto_register_alg(&riscv64_zvkned_aes_cipher_alg);
		if (err)
			return err;

		err = crypto_register_skciphers(
			riscv64_zvkned_aes_skcipher_algs,
			ARRAY_SIZE(riscv64_zvkned_aes_skcipher_algs));
		if (err)
			goto unregister_zvkned_cipher_alg;

		if (riscv_isa_extension_available(NULL, ZVKB)) {
			err = crypto_register_skcipher(
				&riscv64_zvkned_zvkb_aes_skcipher_alg);
			if (err)
				goto unregister_zvkned_skcipher_algs;
		}

		if (riscv64_aes_xts_supported()) {
			err = crypto_register_skcipher(
				&riscv64_zvkned_zvbb_zvkg_aes_skcipher_alg);
			if (err)
				goto unregister_zvkned_zvkb_skcipher_alg;
		}
	}

	return err;

unregister_zvkned_zvkb_skcipher_alg:
	if (riscv_isa_extension_available(NULL, ZVKB))
		crypto_unregister_skcipher(&riscv64_zvkned_zvkb_aes_skcipher_alg);
unregister_zvkned_skcipher_algs:
	crypto_unregister_skciphers(riscv64_zvkned_aes_skcipher_algs,
				    ARRAY_SIZE(riscv64_zvkned_aes_skcipher_algs));
unregister_zvkned_cipher_alg:
	crypto_unregister_alg(&riscv64_zvkned_aes_cipher_alg);
	return err;
}

static void __exit riscv64_aes_mod_exit(void)
{
	if (riscv64_aes_xts_supported())
		crypto_unregister_skcipher(&riscv64_zvkned_zvbb_zvkg_aes_skcipher_alg);
	if (riscv_isa_extension_available(NULL, ZVKB))
		crypto_unregister_skcipher(&riscv64_zvkned_zvkb_aes_skcipher_alg);
	crypto_unregister_skciphers(riscv64_zvkned_aes_skcipher_algs,
				    ARRAY_SIZE(riscv64_zvkned_aes_skcipher_algs));
	crypto_unregister_alg(&riscv64_zvkned_aes_cipher_alg);
}

module_init(riscv64_aes_mod_init);
module_exit(riscv64_aes_mod_exit);

MODULE_DESCRIPTION("AES-ECB/CBC/CTS/CTR/XTS (RISC-V accelerated)");
MODULE_AUTHOR("Jerry Shih <jerry.shih@sifive.com>");
MODULE_LICENSE("GPL");
MODULE_ALIAS_CRYPTO("aes");
MODULE_ALIAS_CRYPTO("ecb(aes)");
MODULE_ALIAS_CRYPTO("cbc(aes)");
MODULE_ALIAS_CRYPTO("cts(cbc(aes))");
MODULE_ALIAS_CRYPTO("ctr(aes)");
MODULE_ALIAS_CRYPTO("xts(aes)");
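The comment in riscv64_aes_ctr_crypt above explains why a CTR operation is split in two when the low 32-bit word of the big-endian counter wraps. The wrap check and split-point arithmetic can be modeled in isolation; this is an illustrative sketch, not kernel code, and the function name `split_at_ctr32_wrap` is invented for the example (the variables mirror `ctr32`, `nblocks`, and `p1_nbytes` in the C above):

```python
def split_at_ctr32_wrap(ctr32: int, nbytes: int, block_size: int = 16):
    """Given the current low 32-bit counter word and a byte count, return
    the byte lengths of the two sub-operations.  The second length is 0
    when the counter does not wrap within this operation."""
    nblocks = -(-nbytes // block_size)            # DIV_ROUND_UP(nbytes, 16)
    new_ctr32 = (ctr32 + nblocks) & 0xFFFFFFFF    # u32 addition wraps
    if new_ctr32 >= nblocks:
        # No wrap: the assembly can handle the whole length in one call.
        return nbytes, 0
    # Wrap: process up to the point where the counter hits 2**32; the
    # kernel then propagates the carry into the upper words (crypto_inc)
    # before the second call handles the remainder.
    p1_nbytes = min(nbytes, (nblocks - new_ctr32) * block_size)
    return p1_nbytes, nbytes - p1_nbytes
```

For example, starting at counter 0xfffffffe with 4 blocks of data, the first call covers 2 blocks (32 bytes) and the second covers the 2 blocks after the wrap.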
+312
arch/riscv/crypto/aes-riscv64-zvkned-zvbb-zvkg.S
···
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128 && VLEN < 2048
// - RISC-V Vector AES block cipher extension ('Zvkned')
// - RISC-V Vector Bit-manipulation extension ('Zvbb')
// - RISC-V Vector GCM/GMAC extension ('Zvkg')

#include <linux/linkage.h>

.text
.option arch, +zvkned, +zvbb, +zvkg

#include "aes-macros.S"

#define KEYP		a0
#define INP		a1
#define OUTP		a2
#define LEN		a3
#define TWEAKP		a4

#define LEN32		a5
#define TAIL_LEN	a6
#define VL		a7
#define VLMAX		t4

// v1-v15 contain the AES round keys, but they are used for temporaries before
// the AES round keys have been loaded.
#define TWEAKS		v16	// LMUL=4 (most of the time)
#define TWEAKS_BREV	v20	// LMUL=4 (most of the time)
#define MULTS_BREV	v24	// LMUL=4 (most of the time)
#define TMP0		v28
#define TMP1		v29
#define TMP2		v30
#define TMP3		v31

// xts_init initializes the following values:
//
//	TWEAKS: N 128-bit tweaks T*(x^i) for i in 0..(N - 1)
//	TWEAKS_BREV: same as TWEAKS, but bit-reversed
//	MULTS_BREV: N 128-bit values x^N, bit-reversed.  Only if N > 1.
//
// N is the maximum number of blocks that will be processed per loop iteration,
// computed using vsetvli.
//
// The field convention used by XTS is the same as that of GHASH, but with the
// bits reversed within each byte.  The zvkg extension provides the vgmul
// instruction which does multiplication in this field.  Therefore, for tweak
// computation we use vgmul to do multiplications in parallel, instead of
// serially multiplying by x using shifting+xoring.  Note that for this to work,
// the inputs and outputs to vgmul must be bit-reversed (we do it with vbrev8).
.macro	xts_init

	// Load the first tweak T.
	vsetivli	zero, 4, e32, m1, ta, ma
	vle32.v		TWEAKS, (TWEAKP)

	// If there's only one block (or no blocks at all), then skip the tweak
	// sequence computation because (at most) T itself is needed.
	li		t0, 16
	ble		LEN, t0, .Linit_single_block\@

	// Save a copy of T bit-reversed in v12.
	vbrev8.v	v12, TWEAKS

	//
	// Generate x^i for i in 0..(N - 1), i.e. 128-bit values 1 << i assuming
	// that N <= 128.  Though, this code actually requires N < 64 (or
	// equivalently VLEN < 2048) due to the use of 64-bit intermediate
	// values here and in the x^N computation later.
	//
	vsetvli		VL, LEN32, e32, m4, ta, ma
	srli		t0, VL, 2	// t0 = N (num blocks)
	// Generate two sequences, each with N 32-bit values:
	// v0=[1, 1, 1, ...] and v1=[0, 1, 2, ...].
	vsetvli		zero, t0, e32, m1, ta, ma
	vmv.v.i		v0, 1
	vid.v		v1
	// Use vzext to zero-extend the sequences to 64 bits.  Reinterpret them
	// as two sequences, each with 2*N 32-bit values:
	// v2=[1, 0, 1, 0, 1, 0, ...] and v4=[0, 0, 1, 0, 2, 0, ...].
	vsetvli		zero, t0, e64, m2, ta, ma
	vzext.vf2	v2, v0
	vzext.vf2	v4, v1
	slli		t1, t0, 1	// t1 = 2*N
	vsetvli		zero, t1, e32, m2, ta, ma
	// Use vwsll to compute [1<<0, 0<<0, 1<<1, 0<<0, 1<<2, 0<<0, ...],
	// widening to 64 bits per element.  When reinterpreted as N 128-bit
	// values, this is the needed sequence of 128-bit values 1 << i (x^i).
	vwsll.vv	v8, v2, v4

	// Copy the bit-reversed T to all N elements of TWEAKS_BREV, then
	// multiply by x^i.  This gives the sequence T*(x^i), bit-reversed.
	vsetvli		zero, LEN32, e32, m4, ta, ma
	vmv.v.i		TWEAKS_BREV, 0
	vaesz.vs	TWEAKS_BREV, v12
	vbrev8.v	v8, v8
	vgmul.vv	TWEAKS_BREV, v8

	// Save a copy of the sequence T*(x^i) with the bit reversal undone.
	vbrev8.v	TWEAKS, TWEAKS_BREV

	// Generate N copies of x^N, i.e. 128-bit values 1 << N, bit-reversed.
	li		t1, 1
	sll		t1, t1, t0	// t1 = 1 << N
	vsetivli	zero, 2, e64, m1, ta, ma
	vmv.v.i		v0, 0
	vsetivli	zero, 1, e64, m1, tu, ma
	vmv.v.x		v0, t1
	vbrev8.v	v0, v0
	vsetvli		zero, LEN32, e32, m4, ta, ma
	vmv.v.i		MULTS_BREV, 0
	vaesz.vs	MULTS_BREV, v0

	j		.Linit_done\@

.Linit_single_block\@:
	vbrev8.v	TWEAKS_BREV, TWEAKS
.Linit_done\@:
.endm

// Set the first 128 bits of MULTS_BREV to 0x40, i.e. 'x' bit-reversed.  This is
// the multiplier required to advance the tweak by one.
.macro	load_x
	li		t0, 0x40
	vsetivli	zero, 4, e32, m1, ta, ma
	vmv.v.i		MULTS_BREV, 0
	vsetivli	zero, 1, e8, m1, tu, ma
	vmv.v.x		MULTS_BREV, t0
.endm

.macro	__aes_xts_crypt	enc, keylen
	// With 16 < len <= 31, there's no main loop, just ciphertext stealing.
	beqz		LEN32, .Lcts_without_main_loop\@

	vsetvli		VLMAX, zero, e32, m4, ta, ma
1:
	vsetvli		VL, LEN32, e32, m4, ta, ma
2:
	// Encrypt or decrypt VL/4 blocks.
	vle32.v		TMP0, (INP)
	vxor.vv		TMP0, TMP0, TWEAKS
	aes_crypt	TMP0, \enc, \keylen
	vxor.vv		TMP0, TMP0, TWEAKS
	vse32.v		TMP0, (OUTP)

	// Update the pointers and the remaining length.
	slli		t0, VL, 2
	add		INP, INP, t0
	add		OUTP, OUTP, t0
	sub		LEN32, LEN32, VL

	// Check whether more blocks remain.
	beqz		LEN32, .Lmain_loop_done\@

	// Compute the next sequence of tweaks by multiplying the previous
	// sequence by x^N.  Store the result in both bit-reversed order and
	// regular order (i.e. with the bit reversal undone).
	vgmul.vv	TWEAKS_BREV, MULTS_BREV
	vbrev8.v	TWEAKS, TWEAKS_BREV

	// Since we compute the tweak multipliers x^N in advance, we require
	// that each iteration process the same length except possibly the last.
	// This conflicts slightly with the behavior allowed by RISC-V Vector
	// Extension, where CPUs can select a lower length for both of the last
	// two iterations.  E.g., vl might take the sequence of values
	// [16, 16, 16, 12, 12], whereas we need [16, 16, 16, 16, 8] so that we
	// can use x^4 again instead of computing x^3.  Therefore, we explicitly
	// keep the vl at VLMAX if there is at least VLMAX remaining.
	bge		LEN32, VLMAX, 2b
	j		1b

.Lmain_loop_done\@:
	load_x

	// Compute the next tweak.
	addi		t0, VL, -4
	vsetivli	zero, 4, e32, m4, ta, ma
	vslidedown.vx	TWEAKS_BREV, TWEAKS_BREV, t0	// Extract last tweak
	vsetivli	zero, 4, e32, m1, ta, ma
	vgmul.vv	TWEAKS_BREV, MULTS_BREV	// Advance to next tweak

	bnez		TAIL_LEN, .Lcts\@

	// Update *TWEAKP to contain the next tweak.
	vbrev8.v	TWEAKS, TWEAKS_BREV
	vse32.v		TWEAKS, (TWEAKP)
	ret

.Lcts_without_main_loop\@:
	load_x
.Lcts\@:
	// TWEAKS_BREV now contains the next tweak.  Compute the one after that.
	vsetivli	zero, 4, e32, m1, ta, ma
	vmv.v.v		TMP0, TWEAKS_BREV
	vgmul.vv	TMP0, MULTS_BREV
	// Undo the bit reversal of the next two tweaks and store them in TMP1
	// and TMP2, such that TMP1 is the first needed and TMP2 the second.
.if \enc
	vbrev8.v	TMP1, TWEAKS_BREV
	vbrev8.v	TMP2, TMP0
.else
	vbrev8.v	TMP1, TMP0
	vbrev8.v	TMP2, TWEAKS_BREV
.endif

	// Encrypt/decrypt the last full block.
	vle32.v		TMP0, (INP)
	vxor.vv		TMP0, TMP0, TMP1
	aes_crypt	TMP0, \enc, \keylen
	vxor.vv		TMP0, TMP0, TMP1

	// Swap the first TAIL_LEN bytes of the above result with the tail.
	// Note that to support in-place encryption/decryption, the load from
	// the input tail must happen before the store to the output tail.
	addi		t0, INP, 16
	addi		t1, OUTP, 16
	vmv.v.v		TMP3, TMP0
	vsetvli		zero, TAIL_LEN, e8, m1, tu, ma
	vle8.v		TMP0, (t0)
	vse8.v		TMP3, (t1)

	// Encrypt/decrypt again and store the last full block.
	vsetivli	zero, 4, e32, m1, ta, ma
	vxor.vv		TMP0, TMP0, TMP2
	aes_crypt	TMP0, \enc, \keylen
	vxor.vv		TMP0, TMP0, TMP2
	vse32.v		TMP0, (OUTP)

	ret
.endm

.macro	aes_xts_crypt	enc

	// Check whether the length is a multiple of the AES block size.
	andi		TAIL_LEN, LEN, 15
	beqz		TAIL_LEN, 1f

	// The length isn't a multiple of the AES block size, so ciphertext
	// stealing will be required.  Ciphertext stealing involves special
	// handling of the partial block and the last full block, so subtract
	// the length of both from the length to be processed in the main loop.
	sub		LEN, LEN, TAIL_LEN
	addi		LEN, LEN, -16
1:
	srli		LEN32, LEN, 2
	// LEN and LEN32 now contain the total length of the blocks that will be
	// processed in the main loop, in bytes and 32-bit words respectively.

	xts_init
	aes_begin	KEYP, 128f, 192f
	__aes_xts_crypt	\enc, 256
128:
	__aes_xts_crypt	\enc, 128
192:
	__aes_xts_crypt	\enc, 192
.endm

// void aes_xts_encrypt_zvkned_zvbb_zvkg(const struct crypto_aes_ctx *key,
//					 const u8 *in, u8 *out, size_t len,
//					 u8 tweak[16]);
//
// |key| is the data key.  |tweak| contains the next tweak; the encryption of
// the original IV with the tweak key was already done.  This function supports
// incremental computation, but |len| must always be >= 16 (AES_BLOCK_SIZE), and
// |len| must be a multiple of 16 except on the last call.  If |len| is a
// multiple of 16, then this function updates |tweak| to contain the next tweak.
SYM_FUNC_START(aes_xts_encrypt_zvkned_zvbb_zvkg)
	aes_xts_crypt	1
SYM_FUNC_END(aes_xts_encrypt_zvkned_zvbb_zvkg)

// Same prototype and calling convention as the encryption function
SYM_FUNC_START(aes_xts_decrypt_zvkned_zvbb_zvkg)
	aes_xts_crypt	0
SYM_FUNC_END(aes_xts_decrypt_zvkned_zvbb_zvkg)
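The xts_init comments above describe tweak advancement as multiplication by x in the GHASH-style GF(2^128) field, which the assembly performs in parallel with vgmul on bit-reversed operands. The underlying serial operation it replaces (the conventional shift-and-xor multiply-by-x from the XTS specification) can be sketched as a small illustrative model; `xts_mul_x` is an invented name for this example:

```python
def xts_mul_x(tweak: bytes) -> bytes:
    """Multiply a 16-byte XTS tweak by x in GF(2^128), using the
    little-endian byte convention and the reduction polynomial
    x^128 + x^7 + x^2 + x + 1 (feedback constant 0x87)."""
    t = int.from_bytes(tweak, "little")
    carry = t >> 127                  # bit that shifts out of the top
    t = (t << 1) & ((1 << 128) - 1)   # multiply by x (left shift)
    if carry:
        t ^= 0x87                     # reduce modulo the field polynomial
    return t.to_bytes(16, "little")
```

Applying this function N times to the first tweak yields the same T*(x^i) sequence that xts_init computes for a whole batch in one vgmul.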
+146
arch/riscv/crypto/aes-riscv64-zvkned-zvkb.S
···
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector AES block cipher extension ('Zvkned')
// - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb')

#include <linux/linkage.h>

.text
.option arch, +zvkned, +zvkb

#include "aes-macros.S"

#define KEYP		a0
#define INP		a1
#define OUTP		a2
#define LEN		a3
#define IVP		a4

#define LEN32		a5
#define VL_E32		a6
#define VL_BLOCKS	a7

.macro	aes_ctr32_crypt	keylen
	// LEN32 = number of blocks, rounded up, in 32-bit words.
	addi		t0, LEN, 15
	srli		t0, t0, 4
	slli		LEN32, t0, 2

	// Create a mask that selects the last 32-bit word of each 128-bit
	// block.  This is the word that contains the (big-endian) counter.
	li		t0, 0x88
	vsetvli		t1, zero, e8, m1, ta, ma
	vmv.v.x		v0, t0

	// Load the IV into v31.  The last 32-bit word contains the counter.
	vsetivli	zero, 4, e32, m1, ta, ma
	vle32.v		v31, (IVP)

	// Convert the big-endian counter into little-endian.
	vsetivli	zero, 4, e32, m1, ta, mu
	vrev8.v		v31, v31, v0.t

	// Splat the IV to v16 (with LMUL=4).  The number of copies is the
	// maximum number of blocks that will be processed per iteration.
	vsetvli		zero, LEN32, e32, m4, ta, ma
	vmv.v.i		v16, 0
	vaesz.vs	v16, v31

	// v20 = [x, x, x, 0, x, x, x, 1, ...]
	viota.m		v20, v0, v0.t
	// v16 = [IV0, IV1, IV2, counter+0, IV0, IV1, IV2, counter+1, ...]
	vsetvli		VL_E32, LEN32, e32, m4, ta, mu
	vadd.vv		v16, v16, v20, v0.t

	j		2f
1:
	// Set the number of blocks to process in this iteration.  vl=VL_E32 is
	// the length in 32-bit words, i.e. 4 times the number of blocks.
	vsetvli		VL_E32, LEN32, e32, m4, ta, mu

	// Increment the counters by the number of blocks processed in the
	// previous iteration.
	vadd.vx		v16, v16, VL_BLOCKS, v0.t
2:
	// Prepare the AES inputs into v24.
	vmv.v.v		v24, v16
	vrev8.v		v24, v24, v0.t	// Convert counters back to big-endian.

	// Encrypt the AES inputs to create the next portion of the keystream.
	aes_encrypt	v24, \keylen

	// XOR the data with the keystream.
	vsetvli		t0, LEN, e8, m4, ta, ma
	vle8.v		v20, (INP)
	vxor.vv		v20, v20, v24
	vse8.v		v20, (OUTP)

	// Advance the pointers and update the remaining length.
	add		INP, INP, t0
	add		OUTP, OUTP, t0
	sub		LEN, LEN, t0
	sub		LEN32, LEN32, VL_E32
	srli		VL_BLOCKS, VL_E32, 2

	// Repeat if more data remains.
	bnez		LEN, 1b

	// Update *IVP to contain the next counter.
	vsetivli	zero, 4, e32, m1, ta, mu
	vadd.vx		v16, v16, VL_BLOCKS, v0.t
	vrev8.v		v16, v16, v0.t	// Convert counters back to big-endian.
	vse32.v		v16, (IVP)

	ret
.endm

// void aes_ctr32_crypt_zvkned_zvkb(const struct crypto_aes_ctx *key,
//				    const u8 *in, u8 *out, size_t len,
//				    u8 iv[16]);
SYM_FUNC_START(aes_ctr32_crypt_zvkned_zvkb)
	aes_begin	KEYP, 128f, 192f
	aes_ctr32_crypt	256
128:
	aes_ctr32_crypt	128
192:
	aes_ctr32_crypt	192
SYM_FUNC_END(aes_ctr32_crypt_zvkned_zvkb)
+339
arch/riscv/crypto/aes-riscv64-zvkned.S
···
/* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */
//
// This file is dual-licensed, meaning that you can use it under your
// choice of either of the following two licenses:
//
// Copyright 2023 The OpenSSL Project Authors. All Rights Reserved.
//
// Licensed under the Apache License 2.0 (the "License"). You can obtain
// a copy in the file LICENSE in the source distribution or at
// https://www.openssl.org/source/license.html
//
// or
//
// Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu>
// Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com>
// Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com>
// Copyright 2024 Google LLC
// All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
// are met:
// 1. Redistributions of source code must retain the above copyright
//    notice, this list of conditions and the following disclaimer.
// 2. Redistributions in binary form must reproduce the above copyright
//    notice, this list of conditions and the following disclaimer in the
//    documentation and/or other materials provided with the distribution.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
// LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
// A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
// OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
// SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
// LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
// DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
// THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
// (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
// OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

// The generated code of this file depends on the following RISC-V extensions:
// - RV64I
// - RISC-V Vector ('V') with VLEN >= 128
// - RISC-V Vector AES block cipher extension ('Zvkned')

#include <linux/linkage.h>

.text
.option arch, +zvkned

#include "aes-macros.S"

#define KEYP	a0
#define INP	a1
#define OUTP	a2
#define LEN	a3
#define IVP	a4

.macro	__aes_crypt_zvkned	enc, keylen
	vle32.v		v16, (INP)
	aes_crypt	v16, \enc, \keylen
	vse32.v		v16, (OUTP)
	ret
.endm

.macro	aes_crypt_zvkned	enc
	aes_begin	KEYP, 128f, 192f
	__aes_crypt_zvkned	\enc, 256
128:
	__aes_crypt_zvkned	\enc, 128
192:
	__aes_crypt_zvkned	\enc, 192
.endm

// void aes_encrypt_zvkned(const struct crypto_aes_ctx *key,
//			   const u8 in[16], u8 out[16]);
SYM_FUNC_START(aes_encrypt_zvkned)
	aes_crypt_zvkned	1
SYM_FUNC_END(aes_encrypt_zvkned)

// Same prototype and calling convention as the encryption function
SYM_FUNC_START(aes_decrypt_zvkned)
	aes_crypt_zvkned	0
SYM_FUNC_END(aes_decrypt_zvkned)

.macro	__aes_ecb_crypt	enc, keylen
	srli		t0, LEN, 2
	// t0 is the remaining length in 32-bit words.  It's a multiple of 4.
1:
	vsetvli		t1, t0, e32, m8, ta, ma
	sub		t0, t0, t1	// Subtract number of words processed
	slli		t1, t1, 2	// Words to bytes
	vle32.v		v16, (INP)
	aes_crypt	v16, \enc, \keylen
	vse32.v		v16, (OUTP)
	add		INP, INP, t1
	add		OUTP, OUTP, t1
	bnez		t0, 1b

	ret
.endm

.macro	aes_ecb_crypt	enc
	aes_begin	KEYP, 128f, 192f
	__aes_ecb_crypt	\enc, 256
128:
	__aes_ecb_crypt	\enc, 128
192:
	__aes_ecb_crypt	\enc, 192
.endm

// void aes_ecb_encrypt_zvkned(const struct crypto_aes_ctx *key,
//			       const u8 *in, u8 *out, size_t len);
//
// |len| must be nonzero and a multiple of 16 (AES_BLOCK_SIZE).
SYM_FUNC_START(aes_ecb_encrypt_zvkned)
	aes_ecb_crypt	1
SYM_FUNC_END(aes_ecb_encrypt_zvkned)

// Same prototype and calling convention as the encryption function
SYM_FUNC_START(aes_ecb_decrypt_zvkned)
	aes_ecb_crypt	0
SYM_FUNC_END(aes_ecb_decrypt_zvkned)

.macro	aes_cbc_encrypt	keylen
	vle32.v		v16, (IVP)	// Load IV
1:
	vle32.v		v17, (INP)	// Load plaintext block
	vxor.vv		v16, v16, v17	// XOR with IV or prev ciphertext block
	aes_encrypt	v16, \keylen	// Encrypt
	vse32.v		v16, (OUTP)	// Store ciphertext block
	addi		INP, INP, 16
	addi		OUTP, OUTP, 16
	addi		LEN, LEN, -16
	bnez		LEN, 1b

	vse32.v		v16, (IVP)	// Store next IV
	ret
.endm

.macro	aes_cbc_decrypt	keylen
	srli		LEN, LEN, 2	// Convert LEN from bytes to words
	vle32.v		v16, (IVP)	// Load IV
1:
	vsetvli		t0, LEN, e32, m4, ta, ma
	vle32.v		v20, (INP)	// Load ciphertext blocks
	vslideup.vi	v16, v20, 4	// Setup prev ciphertext blocks
	addi		t1, t0, -4
	vslidedown.vx	v24, v20, t1	// Save last ciphertext block
	aes_decrypt	v20, \keylen	// Decrypt the blocks
	vxor.vv		v20, v20, v16	// XOR with prev ciphertext blocks
	vse32.v		v20, 
(OUTP) // Store plaintext blocks 153 + vmv.v.v v16, v24 // Next "IV" is last ciphertext block 154 + slli t1, t0, 2 // Words to bytes 155 + add INP, INP, t1 156 + add OUTP, OUTP, t1 157 + sub LEN, LEN, t0 158 + bnez LEN, 1b 159 + 160 + vsetivli zero, 4, e32, m1, ta, ma 161 + vse32.v v16, (IVP) // Store next IV 162 + ret 163 + .endm 164 + 165 + // void aes_cbc_encrypt_zvkned(const struct crypto_aes_ctx *key, 166 + // const u8 *in, u8 *out, size_t len, u8 iv[16]); 167 + // 168 + // |len| must be nonzero and a multiple of 16 (AES_BLOCK_SIZE). 169 + SYM_FUNC_START(aes_cbc_encrypt_zvkned) 170 + aes_begin KEYP, 128f, 192f 171 + aes_cbc_encrypt 256 172 + 128: 173 + aes_cbc_encrypt 128 174 + 192: 175 + aes_cbc_encrypt 192 176 + SYM_FUNC_END(aes_cbc_encrypt_zvkned) 177 + 178 + // Same prototype and calling convention as the encryption function 179 + SYM_FUNC_START(aes_cbc_decrypt_zvkned) 180 + aes_begin KEYP, 128f, 192f 181 + aes_cbc_decrypt 256 182 + 128: 183 + aes_cbc_decrypt 128 184 + 192: 185 + aes_cbc_decrypt 192 186 + SYM_FUNC_END(aes_cbc_decrypt_zvkned) 187 + 188 + .macro aes_cbc_cts_encrypt keylen 189 + 190 + // CBC-encrypt all blocks except the last. But don't store the 191 + // second-to-last block to the output buffer yet, since it will be 192 + // handled specially in the ciphertext stealing step. Exception: if the 193 + // message is single-block, still encrypt the last (and only) block. 194 + li t0, 16 195 + j 2f 196 + 1: 197 + vse32.v v16, (OUTP) // Store ciphertext block 198 + addi OUTP, OUTP, 16 199 + 2: 200 + vle32.v v17, (INP) // Load plaintext block 201 + vxor.vv v16, v16, v17 // XOR with IV or prev ciphertext block 202 + aes_encrypt v16, \keylen // Encrypt 203 + addi INP, INP, 16 204 + addi LEN, LEN, -16 205 + bgt LEN, t0, 1b // Repeat if more than one block remains 206 + 207 + // Special case: if the message is a single block, just do CBC. 
208 + beqz LEN, .Lcts_encrypt_done\@ 209 + 210 + // Encrypt the last two blocks using ciphertext stealing as follows: 211 + // C[n-1] = Encrypt(Encrypt(P[n-1] ^ C[n-2]) ^ P[n]) 212 + // C[n] = Encrypt(P[n-1] ^ C[n-2])[0..LEN] 213 + // 214 + // C[i] denotes the i'th ciphertext block, and likewise P[i] the i'th 215 + // plaintext block. Block n, the last block, may be partial; its length 216 + // is 1 <= LEN <= 16. If there are only 2 blocks, C[n-2] means the IV. 217 + // 218 + // v16 already contains Encrypt(P[n-1] ^ C[n-2]). 219 + // INP points to P[n]. OUTP points to where C[n-1] should go. 220 + // To support in-place encryption, load P[n] before storing C[n]. 221 + addi t0, OUTP, 16 // Get pointer to where C[n] should go 222 + vsetvli zero, LEN, e8, m1, tu, ma 223 + vle8.v v17, (INP) // Load P[n] 224 + vse8.v v16, (t0) // Store C[n] 225 + vxor.vv v16, v16, v17 // v16 = Encrypt(P[n-1] ^ C[n-2]) ^ P[n] 226 + vsetivli zero, 4, e32, m1, ta, ma 227 + aes_encrypt v16, \keylen 228 + .Lcts_encrypt_done\@: 229 + vse32.v v16, (OUTP) // Store C[n-1] (or C[n] in single-block case) 230 + ret 231 + .endm 232 + 233 + #define LEN32 t4 // Length of remaining full blocks in 32-bit words 234 + #define LEN_MOD16 t5 // Length of message in bytes mod 16 235 + 236 + .macro aes_cbc_cts_decrypt keylen 237 + andi LEN32, LEN, ~15 238 + srli LEN32, LEN32, 2 239 + andi LEN_MOD16, LEN, 15 240 + 241 + // Save C[n-2] in v28 so that it's available later during the ciphertext 242 + // stealing step. If there are fewer than three blocks, C[n-2] means 243 + // the IV, otherwise it means the third-to-last ciphertext block. 244 + vmv.v.v v28, v16 // IV 245 + add t0, LEN, -33 246 + bltz t0, .Lcts_decrypt_loop\@ 247 + andi t0, t0, ~15 248 + add t0, t0, INP 249 + vle32.v v28, (t0) 250 + 251 + // CBC-decrypt all full blocks. 
For the last full block, or the last 2 252 + // full blocks if the message is block-aligned, this doesn't write the 253 + // correct output blocks (unless the message is only a single block), 254 + // because it XORs the wrong values with the raw AES plaintexts. But we 255 + // fix this after this loop without redoing the AES decryptions. This 256 + // approach allows more of the AES decryptions to be parallelized. 257 + .Lcts_decrypt_loop\@: 258 + vsetvli t0, LEN32, e32, m4, ta, ma 259 + addi t1, t0, -4 260 + vle32.v v20, (INP) // Load next set of ciphertext blocks 261 + vmv.v.v v24, v16 // Get IV or last ciphertext block of prev set 262 + vslideup.vi v24, v20, 4 // Setup prev ciphertext blocks 263 + vslidedown.vx v16, v20, t1 // Save last ciphertext block of this set 264 + aes_decrypt v20, \keylen // Decrypt this set of blocks 265 + vxor.vv v24, v24, v20 // XOR prev ciphertext blocks with decrypted blocks 266 + vse32.v v24, (OUTP) // Store this set of plaintext blocks 267 + sub LEN32, LEN32, t0 268 + slli t0, t0, 2 // Words to bytes 269 + add INP, INP, t0 270 + add OUTP, OUTP, t0 271 + bnez LEN32, .Lcts_decrypt_loop\@ 272 + 273 + vsetivli zero, 4, e32, m4, ta, ma 274 + vslidedown.vx v20, v20, t1 // Extract raw plaintext of last full block 275 + addi t0, OUTP, -16 // Get pointer to last full plaintext block 276 + bnez LEN_MOD16, .Lcts_decrypt_non_block_aligned\@ 277 + 278 + // Special case: if the message is a single block, just do CBC. 279 + li t1, 16 280 + beq LEN, t1, .Lcts_decrypt_done\@ 281 + 282 + // Block-aligned message. Just fix up the last 2 blocks. We need: 283 + // 284 + // P[n-1] = Decrypt(C[n]) ^ C[n-2] 285 + // P[n] = Decrypt(C[n-1]) ^ C[n] 286 + // 287 + // We have C[n] in v16, Decrypt(C[n]) in v20, and C[n-2] in v28. 288 + // Together with Decrypt(C[n-1]) ^ C[n-2] from the output buffer, this 289 + // is everything needed to fix the output without re-decrypting blocks. 
290 + addi t1, OUTP, -32 // Get pointer to where P[n-1] should go 291 + vxor.vv v20, v20, v28 // Decrypt(C[n]) ^ C[n-2] == P[n-1] 292 + vle32.v v24, (t1) // Decrypt(C[n-1]) ^ C[n-2] 293 + vse32.v v20, (t1) // Store P[n-1] 294 + vxor.vv v20, v24, v16 // Decrypt(C[n-1]) ^ C[n-2] ^ C[n] == P[n] ^ C[n-2] 295 + j .Lcts_decrypt_finish\@ 296 + 297 + .Lcts_decrypt_non_block_aligned\@: 298 + // Decrypt the last two blocks using ciphertext stealing as follows: 299 + // 300 + // P[n-1] = Decrypt(C[n] || Decrypt(C[n-1])[LEN_MOD16..16]) ^ C[n-2] 301 + // P[n] = (Decrypt(C[n-1]) ^ C[n])[0..LEN_MOD16] 302 + // 303 + // We already have Decrypt(C[n-1]) in v20 and C[n-2] in v28. 304 + vmv.v.v v16, v20 // v16 = Decrypt(C[n-1]) 305 + vsetvli zero, LEN_MOD16, e8, m1, tu, ma 306 + vle8.v v20, (INP) // v20 = C[n] || Decrypt(C[n-1])[LEN_MOD16..16] 307 + vxor.vv v16, v16, v20 // v16 = Decrypt(C[n-1]) ^ C[n] 308 + vse8.v v16, (OUTP) // Store P[n] 309 + vsetivli zero, 4, e32, m1, ta, ma 310 + aes_decrypt v20, \keylen // v20 = Decrypt(C[n] || Decrypt(C[n-1])[LEN_MOD16..16]) 311 + .Lcts_decrypt_finish\@: 312 + vxor.vv v20, v20, v28 // XOR with C[n-2] 313 + vse32.v v20, (t0) // Store last full plaintext block 314 + .Lcts_decrypt_done\@: 315 + ret 316 + .endm 317 + 318 + .macro aes_cbc_cts_crypt keylen 319 + vle32.v v16, (IVP) // Load IV 320 + beqz a5, .Lcts_decrypt\@ 321 + aes_cbc_cts_encrypt \keylen 322 + .Lcts_decrypt\@: 323 + aes_cbc_cts_decrypt \keylen 324 + .endm 325 + 326 + // void aes_cbc_cts_crypt_zvkned(const struct crypto_aes_ctx *key, 327 + // const u8 *in, u8 *out, size_t len, 328 + // const u8 iv[16], bool enc); 329 + // 330 + // Encrypts or decrypts a message with the CS3 variant of AES-CBC-CTS. 331 + // This is the variant that unconditionally swaps the last two blocks. 
332 + SYM_FUNC_START(aes_cbc_cts_crypt_zvkned) 333 + aes_begin KEYP, 128f, 192f 334 + aes_cbc_cts_crypt 256 335 + 128: 336 + aes_cbc_cts_crypt 128 337 + 192: 338 + aes_cbc_cts_crypt 192 339 + SYM_FUNC_END(aes_cbc_cts_crypt_zvkned)
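The CS3 ciphertext-stealing construction implemented above can be summarized compactly: CBC-encrypt through P[n-1], XOR the zero-padded final (possibly partial) block into E = Encrypt(P[n-1] ^ C[n-2]), encrypt once more, and emit the last two blocks swapped with C[n] truncated to the final block's length. A hedged Python sketch of the encryption direction, again with a hash standing in for AES block encryption (illustrative only):

```python
import hashlib

BLOCK = 16

def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # Stand-in for one AES block encryption (illustrative only).
    return hashlib.sha256(key + block).digest()[:BLOCK]

def cbc_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # Plain CBC; len(data) must be a multiple of 16.
    out, prev = bytearray(), iv
    for off in range(0, len(data), BLOCK):
        x = bytes(a ^ b for a, b in zip(data[off:off + BLOCK], prev))
        prev = toy_encrypt(key, x)
        out += prev
    return bytes(out)

def cbc_cts_cs3_encrypt(key: bytes, iv: bytes, data: bytes) -> bytes:
    # len(data) must be >= 16; the last block may be partial.
    if len(data) == BLOCK:
        return cbc_encrypt(key, iv, data)       # single block: plain CBC
    tail = len(data) % BLOCK or BLOCK           # length of P[n]
    head, pn = data[:len(data) - tail], data[len(data) - tail:]
    c = cbc_encrypt(key, iv, head)              # CBC through P[n-1]
    e = c[-BLOCK:]                              # E = Encrypt(P[n-1] ^ C[n-2])
    cn = e[:tail]                               # C[n] = E truncated
    # Input to the final encryption is E XOR zero-padded P[n].
    last_in = bytes(a ^ b for a, b in zip(e[:tail], pn)) + e[tail:]
    cn_minus_1 = toy_encrypt(key, last_in)      # C[n-1]
    return c[:-BLOCK] + cn_minus_1 + cn         # CS3: last two blocks swapped
```

For a block-aligned message this reduces to ordinary CBC with the final two ciphertext blocks swapped, which is exactly the "unconditionally swaps the last two blocks" behavior the comment describes.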
+101
arch/riscv/crypto/chacha-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * ChaCha20 using the RISC-V vector crypto extensions 4 + * 5 + * Copyright (C) 2023 SiFive, Inc. 6 + * Author: Jerry Shih <jerry.shih@sifive.com> 7 + */ 8 + 9 + #include <asm/simd.h> 10 + #include <asm/vector.h> 11 + #include <crypto/internal/chacha.h> 12 + #include <crypto/internal/skcipher.h> 13 + #include <linux/linkage.h> 14 + #include <linux/module.h> 15 + 16 + asmlinkage void chacha20_zvkb(const u32 key[8], const u8 *in, u8 *out, 17 + size_t len, const u32 iv[4]); 18 + 19 + static int riscv64_chacha20_crypt(struct skcipher_request *req) 20 + { 21 + u32 iv[CHACHA_IV_SIZE / sizeof(u32)]; 22 + u8 block_buffer[CHACHA_BLOCK_SIZE]; 23 + struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req); 24 + const struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm); 25 + struct skcipher_walk walk; 26 + unsigned int nbytes; 27 + unsigned int tail_bytes; 28 + int err; 29 + 30 + iv[0] = get_unaligned_le32(req->iv); 31 + iv[1] = get_unaligned_le32(req->iv + 4); 32 + iv[2] = get_unaligned_le32(req->iv + 8); 33 + iv[3] = get_unaligned_le32(req->iv + 12); 34 + 35 + err = skcipher_walk_virt(&walk, req, false); 36 + while (walk.nbytes) { 37 + nbytes = walk.nbytes & ~(CHACHA_BLOCK_SIZE - 1); 38 + tail_bytes = walk.nbytes & (CHACHA_BLOCK_SIZE - 1); 39 + kernel_vector_begin(); 40 + if (nbytes) { 41 + chacha20_zvkb(ctx->key, walk.src.virt.addr, 42 + walk.dst.virt.addr, nbytes, iv); 43 + iv[0] += nbytes / CHACHA_BLOCK_SIZE; 44 + } 45 + if (walk.nbytes == walk.total && tail_bytes > 0) { 46 + memcpy(block_buffer, walk.src.virt.addr + nbytes, 47 + tail_bytes); 48 + chacha20_zvkb(ctx->key, block_buffer, block_buffer, 49 + CHACHA_BLOCK_SIZE, iv); 50 + memcpy(walk.dst.virt.addr + nbytes, block_buffer, 51 + tail_bytes); 52 + tail_bytes = 0; 53 + } 54 + kernel_vector_end(); 55 + 56 + err = skcipher_walk_done(&walk, tail_bytes); 57 + } 58 + 59 + return err; 60 + } 61 + 62 + static struct skcipher_alg riscv64_chacha_alg = { 63 + .setkey = 
chacha20_setkey, 64 + .encrypt = riscv64_chacha20_crypt, 65 + .decrypt = riscv64_chacha20_crypt, 66 + .min_keysize = CHACHA_KEY_SIZE, 67 + .max_keysize = CHACHA_KEY_SIZE, 68 + .ivsize = CHACHA_IV_SIZE, 69 + .chunksize = CHACHA_BLOCK_SIZE, 70 + .walksize = 4 * CHACHA_BLOCK_SIZE, 71 + .base = { 72 + .cra_blocksize = 1, 73 + .cra_ctxsize = sizeof(struct chacha_ctx), 74 + .cra_priority = 300, 75 + .cra_name = "chacha20", 76 + .cra_driver_name = "chacha20-riscv64-zvkb", 77 + .cra_module = THIS_MODULE, 78 + }, 79 + }; 80 + 81 + static int __init riscv64_chacha_mod_init(void) 82 + { 83 + if (riscv_isa_extension_available(NULL, ZVKB) && 84 + riscv_vector_vlen() >= 128) 85 + return crypto_register_skcipher(&riscv64_chacha_alg); 86 + 87 + return -ENODEV; 88 + } 89 + 90 + static void __exit riscv64_chacha_mod_exit(void) 91 + { 92 + crypto_unregister_skcipher(&riscv64_chacha_alg); 93 + } 94 + 95 + module_init(riscv64_chacha_mod_init); 96 + module_exit(riscv64_chacha_mod_exit); 97 + 98 + MODULE_DESCRIPTION("ChaCha20 (RISC-V accelerated)"); 99 + MODULE_AUTHOR("Jerry Shih <jerry.shih@sifive.com>"); 100 + MODULE_LICENSE("GPL"); 101 + MODULE_ALIAS_CRYPTO("chacha20");
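The glue code above only ever passes whole 64-byte blocks to `chacha20_zvkb`: the bulk of each walk is rounded down to a block multiple, and a trailing partial block is bounced through a stack buffer so the assembly never sees a ragged length. A small Python sketch of that tail handling, with a toy keystream function standing in for the real block routine (an assumption for illustration):

```python
CHACHA_BLOCK_SIZE = 64

def toy_chacha_blocks(counter: int, data: bytes) -> bytes:
    # Stand-in for chacha20_zvkb: XOR each 64-byte block with a byte
    # derived from its counter value (illustrative only, not ChaCha).
    out = bytearray(data)
    for off in range(0, len(data), CHACHA_BLOCK_SIZE):
        k = (counter + off // CHACHA_BLOCK_SIZE) & 0xff
        for i in range(off, off + CHACHA_BLOCK_SIZE):
            out[i] ^= k
    return bytes(out)

def xor_stream(keystream_fn, counter: int, src: bytes) -> bytes:
    # Mirrors riscv64_chacha20_crypt: process whole blocks directly,
    # then bounce a trailing partial block through a full-size buffer.
    nbytes = len(src) & ~(CHACHA_BLOCK_SIZE - 1)   # whole-block prefix
    tail = len(src) - nbytes
    out = b""
    if nbytes:
        out += keystream_fn(counter, src[:nbytes])
        counter += nbytes // CHACHA_BLOCK_SIZE     # iv[0] += nbytes / BLOCK
    if tail:
        block = bytearray(CHACHA_BLOCK_SIZE)       # bounce buffer
        block[:tail] = src[nbytes:]
        out += keystream_fn(counter, bytes(block))[:tail]
    return out
```

Since the keystream depends only on key and counter, running the same function over the ciphertext decrypts it, just as `.encrypt` and `.decrypt` point at the same handler above.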
+294
arch/riscv/crypto/chacha-riscv64-zvkb.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com> 15 + // Copyright 2024 Google LLC 16 + // All rights reserved. 17 + // 18 + // Redistribution and use in source and binary forms, with or without 19 + // modification, are permitted provided that the following conditions 20 + // are met: 21 + // 1. Redistributions of source code must retain the above copyright 22 + // notice, this list of conditions and the following disclaimer. 23 + // 2. Redistributions in binary form must reproduce the above copyright 24 + // notice, this list of conditions and the following disclaimer in the 25 + // documentation and/or other materials provided with the distribution. 26 + // 27 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 28 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 29 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 30 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 31 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 32 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 33 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 34 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 35 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 36 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 37 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 38 + 39 + // The generated code of this file depends on the following RISC-V extensions: 40 + // - RV64I 41 + // - RISC-V Vector ('V') with VLEN >= 128 42 + // - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') 43 + 44 + #include <linux/linkage.h> 45 + 46 + .text 47 + .option arch, +zvkb 48 + 49 + #define KEYP a0 50 + #define INP a1 51 + #define OUTP a2 52 + #define LEN a3 53 + #define IVP a4 54 + 55 + #define CONSTS0 a5 56 + #define CONSTS1 a6 57 + #define CONSTS2 a7 58 + #define CONSTS3 t0 59 + #define TMP t1 60 + #define VL t2 61 + #define STRIDE t3 62 + #define NROUNDS t4 63 + #define KEY0 s0 64 + #define KEY1 s1 65 + #define KEY2 s2 66 + #define KEY3 s3 67 + #define KEY4 s4 68 + #define KEY5 s5 69 + #define KEY6 s6 70 + #define KEY7 s7 71 + #define COUNTER s8 72 + #define NONCE0 s9 73 + #define NONCE1 s10 74 + #define NONCE2 s11 75 + 76 + .macro chacha_round a0, b0, c0, d0, a1, b1, c1, d1, \ 77 + a2, b2, c2, d2, a3, b3, c3, d3 78 + // a += b; d ^= a; d = rol(d, 16); 79 + vadd.vv \a0, \a0, \b0 80 + vadd.vv \a1, \a1, \b1 81 + vadd.vv \a2, \a2, \b2 82 + vadd.vv \a3, \a3, \b3 83 + vxor.vv \d0, \d0, \a0 84 + vxor.vv \d1, \d1, \a1 85 + vxor.vv \d2, \d2, \a2 86 + vxor.vv \d3, \d3, \a3 87 + vror.vi \d0, \d0, 32 - 16 88 + vror.vi \d1, \d1, 32 - 16 89 + vror.vi \d2, \d2, 32 - 16 90 + vror.vi \d3, \d3, 32 - 16 91 + 92 + // c += d; b ^= c; b = rol(b, 12); 93 + vadd.vv \c0, \c0, \d0 94 + vadd.vv \c1, 
\c1, \d1 95 + vadd.vv \c2, \c2, \d2 96 + vadd.vv \c3, \c3, \d3 97 + vxor.vv \b0, \b0, \c0 98 + vxor.vv \b1, \b1, \c1 99 + vxor.vv \b2, \b2, \c2 100 + vxor.vv \b3, \b3, \c3 101 + vror.vi \b0, \b0, 32 - 12 102 + vror.vi \b1, \b1, 32 - 12 103 + vror.vi \b2, \b2, 32 - 12 104 + vror.vi \b3, \b3, 32 - 12 105 + 106 + // a += b; d ^= a; d = rol(d, 8); 107 + vadd.vv \a0, \a0, \b0 108 + vadd.vv \a1, \a1, \b1 109 + vadd.vv \a2, \a2, \b2 110 + vadd.vv \a3, \a3, \b3 111 + vxor.vv \d0, \d0, \a0 112 + vxor.vv \d1, \d1, \a1 113 + vxor.vv \d2, \d2, \a2 114 + vxor.vv \d3, \d3, \a3 115 + vror.vi \d0, \d0, 32 - 8 116 + vror.vi \d1, \d1, 32 - 8 117 + vror.vi \d2, \d2, 32 - 8 118 + vror.vi \d3, \d3, 32 - 8 119 + 120 + // c += d; b ^= c; b = rol(b, 7); 121 + vadd.vv \c0, \c0, \d0 122 + vadd.vv \c1, \c1, \d1 123 + vadd.vv \c2, \c2, \d2 124 + vadd.vv \c3, \c3, \d3 125 + vxor.vv \b0, \b0, \c0 126 + vxor.vv \b1, \b1, \c1 127 + vxor.vv \b2, \b2, \c2 128 + vxor.vv \b3, \b3, \c3 129 + vror.vi \b0, \b0, 32 - 7 130 + vror.vi \b1, \b1, 32 - 7 131 + vror.vi \b2, \b2, 32 - 7 132 + vror.vi \b3, \b3, 32 - 7 133 + .endm 134 + 135 + // void chacha20_zvkb(const u32 key[8], const u8 *in, u8 *out, size_t len, 136 + // const u32 iv[4]); 137 + // 138 + // |len| must be nonzero and a multiple of 64 (CHACHA_BLOCK_SIZE). 139 + // The counter is treated as 32-bit, following the RFC7539 convention. 140 + SYM_FUNC_START(chacha20_zvkb) 141 + srli LEN, LEN, 6 // Bytes to blocks 142 + 143 + addi sp, sp, -96 144 + sd s0, 0(sp) 145 + sd s1, 8(sp) 146 + sd s2, 16(sp) 147 + sd s3, 24(sp) 148 + sd s4, 32(sp) 149 + sd s5, 40(sp) 150 + sd s6, 48(sp) 151 + sd s7, 56(sp) 152 + sd s8, 64(sp) 153 + sd s9, 72(sp) 154 + sd s10, 80(sp) 155 + sd s11, 88(sp) 156 + 157 + li STRIDE, 64 158 + 159 + // Set up the initial state matrix in scalar registers. 
160 + li CONSTS0, 0x61707865 // "expa" little endian 161 + li CONSTS1, 0x3320646e // "nd 3" little endian 162 + li CONSTS2, 0x79622d32 // "2-by" little endian 163 + li CONSTS3, 0x6b206574 // "te k" little endian 164 + lw KEY0, 0(KEYP) 165 + lw KEY1, 4(KEYP) 166 + lw KEY2, 8(KEYP) 167 + lw KEY3, 12(KEYP) 168 + lw KEY4, 16(KEYP) 169 + lw KEY5, 20(KEYP) 170 + lw KEY6, 24(KEYP) 171 + lw KEY7, 28(KEYP) 172 + lw COUNTER, 0(IVP) 173 + lw NONCE0, 4(IVP) 174 + lw NONCE1, 8(IVP) 175 + lw NONCE2, 12(IVP) 176 + 177 + .Lblock_loop: 178 + // Set vl to the number of blocks to process in this iteration. 179 + vsetvli VL, LEN, e32, m1, ta, ma 180 + 181 + // Set up the initial state matrix for the next VL blocks in v0-v15. 182 + // v{i} holds the i'th 32-bit word of the state matrix for all blocks. 183 + // Note that only the counter word, at index 12, differs across blocks. 184 + vmv.v.x v0, CONSTS0 185 + vmv.v.x v1, CONSTS1 186 + vmv.v.x v2, CONSTS2 187 + vmv.v.x v3, CONSTS3 188 + vmv.v.x v4, KEY0 189 + vmv.v.x v5, KEY1 190 + vmv.v.x v6, KEY2 191 + vmv.v.x v7, KEY3 192 + vmv.v.x v8, KEY4 193 + vmv.v.x v9, KEY5 194 + vmv.v.x v10, KEY6 195 + vmv.v.x v11, KEY7 196 + vid.v v12 197 + vadd.vx v12, v12, COUNTER 198 + vmv.v.x v13, NONCE0 199 + vmv.v.x v14, NONCE1 200 + vmv.v.x v15, NONCE2 201 + 202 + // Load the first half of the input data for each block into v16-v23. 203 + // v{16+i} holds the i'th 32-bit word for all blocks. 204 + vlsseg8e32.v v16, (INP), STRIDE 205 + 206 + li NROUNDS, 20 207 + .Lnext_doubleround: 208 + addi NROUNDS, NROUNDS, -2 209 + // column round 210 + chacha_round v0, v4, v8, v12, v1, v5, v9, v13, \ 211 + v2, v6, v10, v14, v3, v7, v11, v15 212 + // diagonal round 213 + chacha_round v0, v5, v10, v15, v1, v6, v11, v12, \ 214 + v2, v7, v8, v13, v3, v4, v9, v14 215 + bnez NROUNDS, .Lnext_doubleround 216 + 217 + // Load the second half of the input data for each block into v24-v31. 218 + // v{24+i} holds the {8+i}'th 32-bit word for all blocks. 
219 + addi TMP, INP, 32 220 + vlsseg8e32.v v24, (TMP), STRIDE 221 + 222 + // Finalize the first half of the keystream for each block. 223 + vadd.vx v0, v0, CONSTS0 224 + vadd.vx v1, v1, CONSTS1 225 + vadd.vx v2, v2, CONSTS2 226 + vadd.vx v3, v3, CONSTS3 227 + vadd.vx v4, v4, KEY0 228 + vadd.vx v5, v5, KEY1 229 + vadd.vx v6, v6, KEY2 230 + vadd.vx v7, v7, KEY3 231 + 232 + // Encrypt/decrypt the first half of the data for each block. 233 + vxor.vv v16, v16, v0 234 + vxor.vv v17, v17, v1 235 + vxor.vv v18, v18, v2 236 + vxor.vv v19, v19, v3 237 + vxor.vv v20, v20, v4 238 + vxor.vv v21, v21, v5 239 + vxor.vv v22, v22, v6 240 + vxor.vv v23, v23, v7 241 + 242 + // Store the first half of the output data for each block. 243 + vssseg8e32.v v16, (OUTP), STRIDE 244 + 245 + // Finalize the second half of the keystream for each block. 246 + vadd.vx v8, v8, KEY4 247 + vadd.vx v9, v9, KEY5 248 + vadd.vx v10, v10, KEY6 249 + vadd.vx v11, v11, KEY7 250 + vid.v v0 251 + vadd.vx v12, v12, COUNTER 252 + vadd.vx v13, v13, NONCE0 253 + vadd.vx v14, v14, NONCE1 254 + vadd.vx v15, v15, NONCE2 255 + vadd.vv v12, v12, v0 256 + 257 + // Encrypt/decrypt the second half of the data for each block. 258 + vxor.vv v24, v24, v8 259 + vxor.vv v25, v25, v9 260 + vxor.vv v26, v26, v10 261 + vxor.vv v27, v27, v11 262 + vxor.vv v28, v28, v12 263 + vxor.vv v29, v29, v13 264 + vxor.vv v30, v30, v14 265 + vxor.vv v31, v31, v15 266 + 267 + // Store the second half of the output data for each block. 268 + addi TMP, OUTP, 32 269 + vssseg8e32.v v24, (TMP), STRIDE 270 + 271 + // Update the counter, the remaining number of blocks, and the input and 272 + // output pointers according to the number of blocks processed (VL).
273 + add COUNTER, COUNTER, VL 274 + sub LEN, LEN, VL 275 + slli TMP, VL, 6 276 + add OUTP, OUTP, TMP 277 + add INP, INP, TMP 278 + bnez LEN, .Lblock_loop 279 + 280 + ld s0, 0(sp) 281 + ld s1, 8(sp) 282 + ld s2, 16(sp) 283 + ld s3, 24(sp) 284 + ld s4, 32(sp) 285 + ld s5, 40(sp) 286 + ld s6, 48(sp) 287 + ld s7, 56(sp) 288 + ld s8, 64(sp) 289 + ld s9, 72(sp) 290 + ld s10, 80(sp) 291 + ld s11, 88(sp) 292 + addi sp, sp, 96 293 + ret 294 + SYM_FUNC_END(chacha20_zvkb)
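Each `chacha_round` invocation above runs four quarter rounds in parallel across vector registers, using the ChaCha add/xor/rotate sequence with rotations of 16, 12, 8, and 7 bits; a column round followed by a diagonal round forms one double round, repeated ten times. The scalar logic being vectorized, as a Python sketch:

```python
MASK = 0xffffffff

def rol32(x: int, n: int) -> int:
    # 32-bit left rotation, the vror.vi equivalent.
    return ((x << n) | (x >> (32 - n))) & MASK

def quarter_round(a, b, c, d):
    # Same add/xor/rotate sequence as the chacha_round macro
    # (rotations by 16, 12, 8, 7).
    a = (a + b) & MASK; d = rol32(d ^ a, 16)
    c = (c + d) & MASK; b = rol32(b ^ c, 12)
    a = (a + b) & MASK; d = rol32(d ^ a, 8)
    c = (c + d) & MASK; b = rol32(b ^ c, 7)
    return a, b, c, d

def double_round(s):
    # One column round then one diagonal round over the 16-word
    # state, mirroring the two chacha_round invocations per loop.
    for a, b, c, d in ((0, 4, 8, 12), (1, 5, 9, 13),
                       (2, 6, 10, 14), (3, 7, 11, 15)):
        s[a], s[b], s[c], s[d] = quarter_round(s[a], s[b], s[c], s[d])
    for a, b, c, d in ((0, 5, 10, 15), (1, 6, 11, 12),
                       (2, 7, 8, 13), (3, 4, 9, 14)):
        s[a], s[b], s[c], s[d] = quarter_round(s[a], s[b], s[c], s[d])
    return s
```

The quarter round matches the worked example in RFC 8439 §2.1.1, which is a convenient sanity check for any reimplementation.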
+168
arch/riscv/crypto/ghash-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * GHASH using the RISC-V vector crypto extensions 4 + * 5 + * Copyright (C) 2023 VRULL GmbH 6 + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu> 7 + * 8 + * Copyright (C) 2023 SiFive, Inc. 9 + * Author: Jerry Shih <jerry.shih@sifive.com> 10 + */ 11 + 12 + #include <asm/simd.h> 13 + #include <asm/vector.h> 14 + #include <crypto/ghash.h> 15 + #include <crypto/internal/hash.h> 16 + #include <crypto/internal/simd.h> 17 + #include <linux/linkage.h> 18 + #include <linux/module.h> 19 + 20 + asmlinkage void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data, 21 + size_t len); 22 + 23 + struct riscv64_ghash_tfm_ctx { 24 + be128 key; 25 + }; 26 + 27 + struct riscv64_ghash_desc_ctx { 28 + be128 accumulator; 29 + u8 buffer[GHASH_BLOCK_SIZE]; 30 + u32 bytes; 31 + }; 32 + 33 + static int riscv64_ghash_setkey(struct crypto_shash *tfm, const u8 *key, 34 + unsigned int keylen) 35 + { 36 + struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(tfm); 37 + 38 + if (keylen != GHASH_BLOCK_SIZE) 39 + return -EINVAL; 40 + 41 + memcpy(&tctx->key, key, GHASH_BLOCK_SIZE); 42 + 43 + return 0; 44 + } 45 + 46 + static int riscv64_ghash_init(struct shash_desc *desc) 47 + { 48 + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); 49 + 50 + *dctx = (struct riscv64_ghash_desc_ctx){}; 51 + 52 + return 0; 53 + } 54 + 55 + static inline void 56 + riscv64_ghash_blocks(const struct riscv64_ghash_tfm_ctx *tctx, 57 + struct riscv64_ghash_desc_ctx *dctx, 58 + const u8 *src, size_t srclen) 59 + { 60 + /* The srclen is nonzero and a multiple of 16. 
*/ 61 + if (crypto_simd_usable()) { 62 + kernel_vector_begin(); 63 + ghash_zvkg(&dctx->accumulator, &tctx->key, src, srclen); 64 + kernel_vector_end(); 65 + } else { 66 + do { 67 + crypto_xor((u8 *)&dctx->accumulator, src, 68 + GHASH_BLOCK_SIZE); 69 + gf128mul_lle(&dctx->accumulator, &tctx->key); 70 + src += GHASH_BLOCK_SIZE; 71 + srclen -= GHASH_BLOCK_SIZE; 72 + } while (srclen); 73 + } 74 + } 75 + 76 + static int riscv64_ghash_update(struct shash_desc *desc, const u8 *src, 77 + unsigned int srclen) 78 + { 79 + const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); 80 + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); 81 + unsigned int len; 82 + 83 + if (dctx->bytes) { 84 + if (dctx->bytes + srclen < GHASH_BLOCK_SIZE) { 85 + memcpy(dctx->buffer + dctx->bytes, src, srclen); 86 + dctx->bytes += srclen; 87 + return 0; 88 + } 89 + memcpy(dctx->buffer + dctx->bytes, src, 90 + GHASH_BLOCK_SIZE - dctx->bytes); 91 + riscv64_ghash_blocks(tctx, dctx, dctx->buffer, 92 + GHASH_BLOCK_SIZE); 93 + src += GHASH_BLOCK_SIZE - dctx->bytes; 94 + srclen -= GHASH_BLOCK_SIZE - dctx->bytes; 95 + dctx->bytes = 0; 96 + } 97 + 98 + len = round_down(srclen, GHASH_BLOCK_SIZE); 99 + if (len) { 100 + riscv64_ghash_blocks(tctx, dctx, src, len); 101 + src += len; 102 + srclen -= len; 103 + } 104 + 105 + if (srclen) { 106 + memcpy(dctx->buffer, src, srclen); 107 + dctx->bytes = srclen; 108 + } 109 + 110 + return 0; 111 + } 112 + 113 + static int riscv64_ghash_final(struct shash_desc *desc, u8 *out) 114 + { 115 + const struct riscv64_ghash_tfm_ctx *tctx = crypto_shash_ctx(desc->tfm); 116 + struct riscv64_ghash_desc_ctx *dctx = shash_desc_ctx(desc); 117 + int i; 118 + 119 + if (dctx->bytes) { 120 + for (i = dctx->bytes; i < GHASH_BLOCK_SIZE; i++) 121 + dctx->buffer[i] = 0; 122 + 123 + riscv64_ghash_blocks(tctx, dctx, dctx->buffer, 124 + GHASH_BLOCK_SIZE); 125 + } 126 + 127 + memcpy(out, &dctx->accumulator, GHASH_DIGEST_SIZE); 128 + return 0; 129 + } 130 + 131 + static 
struct shash_alg riscv64_ghash_alg = { 132 + .init = riscv64_ghash_init, 133 + .update = riscv64_ghash_update, 134 + .final = riscv64_ghash_final, 135 + .setkey = riscv64_ghash_setkey, 136 + .descsize = sizeof(struct riscv64_ghash_desc_ctx), 137 + .digestsize = GHASH_DIGEST_SIZE, 138 + .base = { 139 + .cra_blocksize = GHASH_BLOCK_SIZE, 140 + .cra_ctxsize = sizeof(struct riscv64_ghash_tfm_ctx), 141 + .cra_priority = 300, 142 + .cra_name = "ghash", 143 + .cra_driver_name = "ghash-riscv64-zvkg", 144 + .cra_module = THIS_MODULE, 145 + }, 146 + }; 147 + 148 + static int __init riscv64_ghash_mod_init(void) 149 + { 150 + if (riscv_isa_extension_available(NULL, ZVKG) && 151 + riscv_vector_vlen() >= 128) 152 + return crypto_register_shash(&riscv64_ghash_alg); 153 + 154 + return -ENODEV; 155 + } 156 + 157 + static void __exit riscv64_ghash_mod_exit(void) 158 + { 159 + crypto_unregister_shash(&riscv64_ghash_alg); 160 + } 161 + 162 + module_init(riscv64_ghash_mod_init); 163 + module_exit(riscv64_ghash_mod_exit); 164 + 165 + MODULE_DESCRIPTION("GHASH (RISC-V accelerated)"); 166 + MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>"); 167 + MODULE_LICENSE("GPL"); 168 + MODULE_ALIAS_CRYPTO("ghash");
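`riscv64_ghash_update` above accumulates bytes in a 16-byte buffer until a full block exists, hands whole-block runs to `riscv64_ghash_blocks`, and stashes any remainder for the next call. The same buffering pattern as a small Python sketch (the class name and callback are hypothetical, for illustration):

```python
GHASH_BLOCK_SIZE = 16

class BlockBuffer:
    """Mirrors the partial-block buffering in riscv64_ghash_update:
    bytes accumulate until a full 16-byte block exists, and only
    whole-block runs reach the block-processing callback."""

    def __init__(self, process):
        self.process = process   # called with a multiple-of-16 chunk
        self.buffer = b""

    def update(self, src: bytes):
        if self.buffer:
            need = GHASH_BLOCK_SIZE - len(self.buffer)
            if len(src) < need:          # still short of a block: buffer
                self.buffer += src
                return
            self.process(self.buffer + src[:need])
            src = src[need:]
            self.buffer = b""
        full = len(src) - len(src) % GHASH_BLOCK_SIZE
        if full:                         # round_down(srclen, BLOCK)
            self.process(src[:full])
        self.buffer = src[full:]         # keep the tail for later
```

The invariant worth testing is that chunking is transparent: feeding data in arbitrary pieces must produce the same processed-block stream and the same leftover as feeding it all at once.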
+72
arch/riscv/crypto/ghash-riscv64-zvkg.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu> 15 + // Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com> 16 + // Copyright 2024 Google LLC 17 + // All rights reserved. 18 + // 19 + // Redistribution and use in source and binary forms, with or without 20 + // modification, are permitted provided that the following conditions 21 + // are met: 22 + // 1. Redistributions of source code must retain the above copyright 23 + // notice, this list of conditions and the following disclaimer. 24 + // 2. Redistributions in binary form must reproduce the above copyright 25 + // notice, this list of conditions and the following disclaimer in the 26 + // documentation and/or other materials provided with the distribution. 27 + // 28 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 29 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 30 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 31 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 32 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 33 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 34 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 35 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 36 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 37 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 + 40 + // The generated code of this file depends on the following RISC-V extensions: 41 + // - RV64I 42 + // - RISC-V Vector ('V') with VLEN >= 128 43 + // - RISC-V Vector GCM/GMAC extension ('Zvkg') 44 + 45 + #include <linux/linkage.h> 46 + 47 + .text 48 + .option arch, +zvkg 49 + 50 + #define ACCUMULATOR a0 51 + #define KEY a1 52 + #define DATA a2 53 + #define LEN a3 54 + 55 + // void ghash_zvkg(be128 *accumulator, const be128 *key, const u8 *data, 56 + // size_t len); 57 + // 58 + // |len| must be nonzero and a multiple of 16 (GHASH_BLOCK_SIZE). 59 + SYM_FUNC_START(ghash_zvkg) 60 + vsetivli zero, 4, e32, m1, ta, ma 61 + vle32.v v1, (ACCUMULATOR) 62 + vle32.v v2, (KEY) 63 + .Lnext_block: 64 + vle32.v v3, (DATA) 65 + vghsh.vv v1, v2, v3 66 + addi DATA, DATA, 16 67 + addi LEN, LEN, -16 68 + bnez LEN, .Lnext_block 69 + 70 + vse32.v v1, (ACCUMULATOR) 71 + ret 72 + SYM_FUNC_END(ghash_zvkg)
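The `ghash_zvkg` loop folds each block as accumulator = (accumulator ^ block) * H, with `vghsh.vv` performing that step in GF(2^128); the C fallback does the same via `crypto_xor` plus `gf128mul_lle`. A Python sketch of the underlying multiplication, following the bit-reflected GCM convention of NIST SP 800-38D (a reference reimplementation, not the kernel's code):

```python
def gf128_mul(x_bytes: bytes, y_bytes: bytes) -> bytes:
    # GF(2^128) multiplication per NIST SP 800-38D: bit 0 is the MSB
    # of byte 0, reduction polynomial x^128 + x^7 + x^2 + x + 1.
    R = 0xe1000000000000000000000000000000
    x = int.from_bytes(x_bytes, "big")
    v = int.from_bytes(y_bytes, "big")
    z = 0
    for i in range(128):
        if (x >> (127 - i)) & 1:     # bit i of x, MSB-first
            z ^= v
        v = (v >> 1) ^ R if v & 1 else v >> 1
    return z.to_bytes(16, "big")

def ghash(h: bytes, data: bytes) -> bytes:
    # acc = (acc XOR block) * H per 16-byte block, the fold
    # performed by the .Lnext_block loop above.
    acc = bytes(16)
    for off in range(0, len(data), 16):
        x = bytes(a ^ b for a, b in zip(acc, data[off:off + 16]))
        acc = gf128_mul(x, h)
    return acc
```

In this reflected representation the multiplicative identity is the block `80 00 ... 00`, so with H equal to that element GHASH degenerates to a plain XOR of the blocks, which is a handy correctness probe.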
+137
arch/riscv/crypto/sha256-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * SHA-256 and SHA-224 using the RISC-V vector crypto extensions 4 + * 5 + * Copyright (C) 2022 VRULL GmbH 6 + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu> 7 + * 8 + * Copyright (C) 2023 SiFive, Inc. 9 + * Author: Jerry Shih <jerry.shih@sifive.com> 10 + */ 11 + 12 + #include <asm/simd.h> 13 + #include <asm/vector.h> 14 + #include <crypto/internal/hash.h> 15 + #include <crypto/internal/simd.h> 16 + #include <crypto/sha256_base.h> 17 + #include <linux/linkage.h> 18 + #include <linux/module.h> 19 + 20 + /* 21 + * Note: the asm function only uses the 'state' field of struct sha256_state. 22 + * It is assumed to be the first field. 23 + */ 24 + asmlinkage void sha256_transform_zvknha_or_zvknhb_zvkb( 25 + struct sha256_state *state, const u8 *data, int num_blocks); 26 + 27 + static int riscv64_sha256_update(struct shash_desc *desc, const u8 *data, 28 + unsigned int len) 29 + { 30 + /* 31 + * Ensure struct sha256_state begins directly with the SHA-256 32 + * 256-bit internal state, as this is what the asm function expects. 
33 + */ 34 + BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0); 35 + 36 + if (crypto_simd_usable()) { 37 + kernel_vector_begin(); 38 + sha256_base_do_update(desc, data, len, 39 + sha256_transform_zvknha_or_zvknhb_zvkb); 40 + kernel_vector_end(); 41 + } else { 42 + crypto_sha256_update(desc, data, len); 43 + } 44 + return 0; 45 + } 46 + 47 + static int riscv64_sha256_finup(struct shash_desc *desc, const u8 *data, 48 + unsigned int len, u8 *out) 49 + { 50 + if (crypto_simd_usable()) { 51 + kernel_vector_begin(); 52 + if (len) 53 + sha256_base_do_update( 54 + desc, data, len, 55 + sha256_transform_zvknha_or_zvknhb_zvkb); 56 + sha256_base_do_finalize( 57 + desc, sha256_transform_zvknha_or_zvknhb_zvkb); 58 + kernel_vector_end(); 59 + 60 + return sha256_base_finish(desc, out); 61 + } 62 + 63 + return crypto_sha256_finup(desc, data, len, out); 64 + } 65 + 66 + static int riscv64_sha256_final(struct shash_desc *desc, u8 *out) 67 + { 68 + return riscv64_sha256_finup(desc, NULL, 0, out); 69 + } 70 + 71 + static int riscv64_sha256_digest(struct shash_desc *desc, const u8 *data, 72 + unsigned int len, u8 *out) 73 + { 74 + return sha256_base_init(desc) ?: 75 + riscv64_sha256_finup(desc, data, len, out); 76 + } 77 + 78 + static struct shash_alg riscv64_sha256_algs[] = { 79 + { 80 + .init = sha256_base_init, 81 + .update = riscv64_sha256_update, 82 + .final = riscv64_sha256_final, 83 + .finup = riscv64_sha256_finup, 84 + .digest = riscv64_sha256_digest, 85 + .descsize = sizeof(struct sha256_state), 86 + .digestsize = SHA256_DIGEST_SIZE, 87 + .base = { 88 + .cra_blocksize = SHA256_BLOCK_SIZE, 89 + .cra_priority = 300, 90 + .cra_name = "sha256", 91 + .cra_driver_name = "sha256-riscv64-zvknha_or_zvknhb-zvkb", 92 + .cra_module = THIS_MODULE, 93 + }, 94 + }, { 95 + .init = sha224_base_init, 96 + .update = riscv64_sha256_update, 97 + .final = riscv64_sha256_final, 98 + .finup = riscv64_sha256_finup, 99 + .descsize = sizeof(struct sha256_state), 100 + .digestsize = 
SHA224_DIGEST_SIZE, 101 + .base = { 102 + .cra_blocksize = SHA224_BLOCK_SIZE, 103 + .cra_priority = 300, 104 + .cra_name = "sha224", 105 + .cra_driver_name = "sha224-riscv64-zvknha_or_zvknhb-zvkb", 106 + .cra_module = THIS_MODULE, 107 + }, 108 + }, 109 + }; 110 + 111 + static int __init riscv64_sha256_mod_init(void) 112 + { 113 + /* Both zvknha and zvknhb provide the SHA-256 instructions. */ 114 + if ((riscv_isa_extension_available(NULL, ZVKNHA) || 115 + riscv_isa_extension_available(NULL, ZVKNHB)) && 116 + riscv_isa_extension_available(NULL, ZVKB) && 117 + riscv_vector_vlen() >= 128) 118 + return crypto_register_shashes(riscv64_sha256_algs, 119 + ARRAY_SIZE(riscv64_sha256_algs)); 120 + 121 + return -ENODEV; 122 + } 123 + 124 + static void __exit riscv64_sha256_mod_exit(void) 125 + { 126 + crypto_unregister_shashes(riscv64_sha256_algs, 127 + ARRAY_SIZE(riscv64_sha256_algs)); 128 + } 129 + 130 + module_init(riscv64_sha256_mod_init); 131 + module_exit(riscv64_sha256_mod_exit); 132 + 133 + MODULE_DESCRIPTION("SHA-256 (RISC-V accelerated)"); 134 + MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>"); 135 + MODULE_LICENSE("GPL"); 136 + MODULE_ALIAS_CRYPTO("sha256"); 137 + MODULE_ALIAS_CRYPTO("sha224");
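The `.digest` path above relies on the GNU C conditional operator with an omitted middle operand: `a ?: b` evaluates to `a` if `a` is nonzero and to `b` otherwise, with `a` evaluated only once. So `sha256_base_init(desc) ?: riscv64_sha256_finup(...)` propagates an init error and otherwise falls through to finup. A minimal standalone sketch of the idiom (hypothetical helper names, not kernel API):

```c
/* a ?: b is GNU C shorthand for a ? a : b, with a evaluated only once. */
static int init_step(int err)	{ return err; }	/* 0 on success */
static int finup_step(void)	{ return 7; }	/* distinguishable result */

static int digest_like(int init_err)
{
	/* Mirrors: return sha256_base_init(desc) ?: riscv64_sha256_finup(...); */
	return init_step(init_err) ?: finup_step();
}
```

With `init_err == 0` the call chain proceeds to `finup_step()`; with a nonzero error such as `-EINVAL` the error is returned directly.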
+225
arch/riscv/crypto/sha256-riscv64-zvknha_or_zvknhb-zvkb.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu> 15 + // Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com> 16 + // Copyright 2024 Google LLC 17 + // All rights reserved. 18 + // 19 + // Redistribution and use in source and binary forms, with or without 20 + // modification, are permitted provided that the following conditions 21 + // are met: 22 + // 1. Redistributions of source code must retain the above copyright 23 + // notice, this list of conditions and the following disclaimer. 24 + // 2. Redistributions in binary form must reproduce the above copyright 25 + // notice, this list of conditions and the following disclaimer in the 26 + // documentation and/or other materials provided with the distribution. 27 + // 28 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 29 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 30 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 31 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 32 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 33 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 34 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 35 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 36 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 37 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 + 40 + // The generated code of this file depends on the following RISC-V extensions: 41 + // - RV64I 42 + // - RISC-V Vector ('V') with VLEN >= 128 43 + // - RISC-V Vector SHA-2 Secure Hash extension ('Zvknha' or 'Zvknhb') 44 + // - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') 45 + 46 + #include <linux/cfi_types.h> 47 + 48 + .text 49 + .option arch, +zvknha, +zvkb 50 + 51 + #define STATEP a0 52 + #define DATA a1 53 + #define NUM_BLOCKS a2 54 + 55 + #define STATEP_C a3 56 + 57 + #define MASK v0 58 + #define INDICES v1 59 + #define W0 v2 60 + #define W1 v3 61 + #define W2 v4 62 + #define W3 v5 63 + #define VTMP v6 64 + #define FEBA v7 65 + #define HGDC v8 66 + #define K0 v10 67 + #define K1 v11 68 + #define K2 v12 69 + #define K3 v13 70 + #define K4 v14 71 + #define K5 v15 72 + #define K6 v16 73 + #define K7 v17 74 + #define K8 v18 75 + #define K9 v19 76 + #define K10 v20 77 + #define K11 v21 78 + #define K12 v22 79 + #define K13 v23 80 + #define K14 v24 81 + #define K15 v25 82 + #define PREV_FEBA v26 83 + #define PREV_HGDC v27 84 + 85 + // Do 4 rounds of SHA-256. w0 contains the current 4 message schedule words. 86 + // 87 + // If not all the message schedule words have been computed yet, then this also 88 + // computes 4 more message schedule words. w1-w3 contain the next 3 groups of 4 89 + // message schedule words; this macro computes the group after w3 and writes it 90 + // to w0. 
This means that the next (w0, w1, w2, w3) is the current (w1, w2, w3, 91 + // w0), so the caller must cycle through the registers accordingly. 92 + .macro sha256_4rounds last, k, w0, w1, w2, w3 93 + vadd.vv VTMP, \k, \w0 94 + vsha2cl.vv HGDC, FEBA, VTMP 95 + vsha2ch.vv FEBA, HGDC, VTMP 96 + .if !\last 97 + vmerge.vvm VTMP, \w2, \w1, MASK 98 + vsha2ms.vv \w0, VTMP, \w3 99 + .endif 100 + .endm 101 + 102 + .macro sha256_16rounds last, k0, k1, k2, k3 103 + sha256_4rounds \last, \k0, W0, W1, W2, W3 104 + sha256_4rounds \last, \k1, W1, W2, W3, W0 105 + sha256_4rounds \last, \k2, W2, W3, W0, W1 106 + sha256_4rounds \last, \k3, W3, W0, W1, W2 107 + .endm 108 + 109 + // void sha256_transform_zvknha_or_zvknhb_zvkb(u32 state[8], const u8 *data, 110 + // int num_blocks); 111 + SYM_TYPED_FUNC_START(sha256_transform_zvknha_or_zvknhb_zvkb) 112 + 113 + // Load the round constants into K0-K15. 114 + vsetivli zero, 4, e32, m1, ta, ma 115 + la t0, K256 116 + vle32.v K0, (t0) 117 + addi t0, t0, 16 118 + vle32.v K1, (t0) 119 + addi t0, t0, 16 120 + vle32.v K2, (t0) 121 + addi t0, t0, 16 122 + vle32.v K3, (t0) 123 + addi t0, t0, 16 124 + vle32.v K4, (t0) 125 + addi t0, t0, 16 126 + vle32.v K5, (t0) 127 + addi t0, t0, 16 128 + vle32.v K6, (t0) 129 + addi t0, t0, 16 130 + vle32.v K7, (t0) 131 + addi t0, t0, 16 132 + vle32.v K8, (t0) 133 + addi t0, t0, 16 134 + vle32.v K9, (t0) 135 + addi t0, t0, 16 136 + vle32.v K10, (t0) 137 + addi t0, t0, 16 138 + vle32.v K11, (t0) 139 + addi t0, t0, 16 140 + vle32.v K12, (t0) 141 + addi t0, t0, 16 142 + vle32.v K13, (t0) 143 + addi t0, t0, 16 144 + vle32.v K14, (t0) 145 + addi t0, t0, 16 146 + vle32.v K15, (t0) 147 + 148 + // Setup mask for the vmerge to replace the first word (idx==0) in 149 + // message scheduling. There are 4 words, so an 8-bit mask suffices. 150 + vsetivli zero, 1, e8, m1, ta, ma 151 + vmv.v.i MASK, 0x01 152 + 153 + // Load the state. The state is stored as {a,b,c,d,e,f,g,h}, but we 154 + // need {f,e,b,a},{h,g,d,c}. 
The dst vtype is e32m1 and the index vtype 155 + // is e8mf4. We use index-load with the i8 indices {20, 16, 4, 0}, 156 + // loaded using the 32-bit little endian value 0x00041014. 157 + li t0, 0x00041014 158 + vsetivli zero, 1, e32, m1, ta, ma 159 + vmv.v.x INDICES, t0 160 + addi STATEP_C, STATEP, 8 161 + vsetivli zero, 4, e32, m1, ta, ma 162 + vluxei8.v FEBA, (STATEP), INDICES 163 + vluxei8.v HGDC, (STATEP_C), INDICES 164 + 165 + .Lnext_block: 166 + addi NUM_BLOCKS, NUM_BLOCKS, -1 167 + 168 + // Save the previous state, as it's needed later. 169 + vmv.v.v PREV_FEBA, FEBA 170 + vmv.v.v PREV_HGDC, HGDC 171 + 172 + // Load the next 512-bit message block and endian-swap each 32-bit word. 173 + vle32.v W0, (DATA) 174 + vrev8.v W0, W0 175 + addi DATA, DATA, 16 176 + vle32.v W1, (DATA) 177 + vrev8.v W1, W1 178 + addi DATA, DATA, 16 179 + vle32.v W2, (DATA) 180 + vrev8.v W2, W2 181 + addi DATA, DATA, 16 182 + vle32.v W3, (DATA) 183 + vrev8.v W3, W3 184 + addi DATA, DATA, 16 185 + 186 + // Do the 64 rounds of SHA-256. 187 + sha256_16rounds 0, K0, K1, K2, K3 188 + sha256_16rounds 0, K4, K5, K6, K7 189 + sha256_16rounds 0, K8, K9, K10, K11 190 + sha256_16rounds 1, K12, K13, K14, K15 191 + 192 + // Add the previous state. 193 + vadd.vv FEBA, FEBA, PREV_FEBA 194 + vadd.vv HGDC, HGDC, PREV_HGDC 195 + 196 + // Repeat if more blocks remain. 197 + bnez NUM_BLOCKS, .Lnext_block 198 + 199 + // Store the new state and return. 
200 + vsuxei8.v FEBA, (STATEP), INDICES 201 + vsuxei8.v HGDC, (STATEP_C), INDICES 202 + ret 203 + SYM_FUNC_END(sha256_transform_zvknha_or_zvknhb_zvkb) 204 + 205 + .section ".rodata" 206 + .p2align 2 207 + .type K256, @object 208 + K256: 209 + .word 0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5 210 + .word 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5 211 + .word 0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3 212 + .word 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174 213 + .word 0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc 214 + .word 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da 215 + .word 0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7 216 + .word 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967 217 + .word 0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13 218 + .word 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85 219 + .word 0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3 220 + .word 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070 221 + .word 0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5 222 + .word 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3 223 + .word 0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208 224 + .word 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2 225 + .size K256, . - K256
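The state load above is the subtle part: `vluxei8.v` with the i8 byte offsets {20, 16, 4, 0} (packed little-endian as 0x00041014) gathers the words {f,e,b,a} from the `u32 state[8]` base, and the same offsets from base+8 gather {h,g,d,c}, which is the operand layout `vsha2ch`/`vsha2cl` expect. A C model of that gather (illustrative only; `gather32` is a hypothetical name):

```c
#include <stdint.h>

/* Model of vluxei8.v: gather four 32-bit words from byte offsets. */
static void gather32(uint32_t dst[4], const uint32_t *base_words,
		     const uint8_t byte_off[4])
{
	const uint8_t *base = (const uint8_t *)base_words;

	for (int i = 0; i < 4; i++)
		dst[i] = *(const uint32_t *)(base + byte_off[i]);
}
```

With `state[8]` holding {a,b,c,d,e,f,g,h}, offsets {20,16,4,0} from `state` select words 5,4,1,0, i.e. {f,e,b,a}; the same offsets from `state + 2` (the `addi STATEP_C, STATEP, 8`) select {h,g,d,c}. The matching `vsuxei8.v` scatter at the end writes the state back through the same permutation.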
+133
arch/riscv/crypto/sha512-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * SHA-512 and SHA-384 using the RISC-V vector crypto extensions 4 + * 5 + * Copyright (C) 2023 VRULL GmbH 6 + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu> 7 + * 8 + * Copyright (C) 2023 SiFive, Inc. 9 + * Author: Jerry Shih <jerry.shih@sifive.com> 10 + */ 11 + 12 + #include <asm/simd.h> 13 + #include <asm/vector.h> 14 + #include <crypto/internal/hash.h> 15 + #include <crypto/internal/simd.h> 16 + #include <crypto/sha512_base.h> 17 + #include <linux/linkage.h> 18 + #include <linux/module.h> 19 + 20 + /* 21 + * Note: the asm function only uses the 'state' field of struct sha512_state. 22 + * It is assumed to be the first field. 23 + */ 24 + asmlinkage void sha512_transform_zvknhb_zvkb( 25 + struct sha512_state *state, const u8 *data, int num_blocks); 26 + 27 + static int riscv64_sha512_update(struct shash_desc *desc, const u8 *data, 28 + unsigned int len) 29 + { 30 + /* 31 + * Ensure struct sha512_state begins directly with the SHA-512 32 + * 512-bit internal state, as this is what the asm function expects. 
33 + */ 34 + BUILD_BUG_ON(offsetof(struct sha512_state, state) != 0); 35 + 36 + if (crypto_simd_usable()) { 37 + kernel_vector_begin(); 38 + sha512_base_do_update(desc, data, len, 39 + sha512_transform_zvknhb_zvkb); 40 + kernel_vector_end(); 41 + } else { 42 + crypto_sha512_update(desc, data, len); 43 + } 44 + return 0; 45 + } 46 + 47 + static int riscv64_sha512_finup(struct shash_desc *desc, const u8 *data, 48 + unsigned int len, u8 *out) 49 + { 50 + if (crypto_simd_usable()) { 51 + kernel_vector_begin(); 52 + if (len) 53 + sha512_base_do_update(desc, data, len, 54 + sha512_transform_zvknhb_zvkb); 55 + sha512_base_do_finalize(desc, sha512_transform_zvknhb_zvkb); 56 + kernel_vector_end(); 57 + 58 + return sha512_base_finish(desc, out); 59 + } 60 + 61 + return crypto_sha512_finup(desc, data, len, out); 62 + } 63 + 64 + static int riscv64_sha512_final(struct shash_desc *desc, u8 *out) 65 + { 66 + return riscv64_sha512_finup(desc, NULL, 0, out); 67 + } 68 + 69 + static int riscv64_sha512_digest(struct shash_desc *desc, const u8 *data, 70 + unsigned int len, u8 *out) 71 + { 72 + return sha512_base_init(desc) ?: 73 + riscv64_sha512_finup(desc, data, len, out); 74 + } 75 + 76 + static struct shash_alg riscv64_sha512_algs[] = { 77 + { 78 + .init = sha512_base_init, 79 + .update = riscv64_sha512_update, 80 + .final = riscv64_sha512_final, 81 + .finup = riscv64_sha512_finup, 82 + .digest = riscv64_sha512_digest, 83 + .descsize = sizeof(struct sha512_state), 84 + .digestsize = SHA512_DIGEST_SIZE, 85 + .base = { 86 + .cra_blocksize = SHA512_BLOCK_SIZE, 87 + .cra_priority = 300, 88 + .cra_name = "sha512", 89 + .cra_driver_name = "sha512-riscv64-zvknhb-zvkb", 90 + .cra_module = THIS_MODULE, 91 + }, 92 + }, { 93 + .init = sha384_base_init, 94 + .update = riscv64_sha512_update, 95 + .final = riscv64_sha512_final, 96 + .finup = riscv64_sha512_finup, 97 + .descsize = sizeof(struct sha512_state), 98 + .digestsize = SHA384_DIGEST_SIZE, 99 + .base = { 100 + .cra_blocksize = 
SHA384_BLOCK_SIZE, 101 + .cra_priority = 300, 102 + .cra_name = "sha384", 103 + .cra_driver_name = "sha384-riscv64-zvknhb-zvkb", 104 + .cra_module = THIS_MODULE, 105 + }, 106 + }, 107 + }; 108 + 109 + static int __init riscv64_sha512_mod_init(void) 110 + { 111 + if (riscv_isa_extension_available(NULL, ZVKNHB) && 112 + riscv_isa_extension_available(NULL, ZVKB) && 113 + riscv_vector_vlen() >= 128) 114 + return crypto_register_shashes(riscv64_sha512_algs, 115 + ARRAY_SIZE(riscv64_sha512_algs)); 116 + 117 + return -ENODEV; 118 + } 119 + 120 + static void __exit riscv64_sha512_mod_exit(void) 121 + { 122 + crypto_unregister_shashes(riscv64_sha512_algs, 123 + ARRAY_SIZE(riscv64_sha512_algs)); 124 + } 125 + 126 + module_init(riscv64_sha512_mod_init); 127 + module_exit(riscv64_sha512_mod_exit); 128 + 129 + MODULE_DESCRIPTION("SHA-512 (RISC-V accelerated)"); 130 + MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>"); 131 + MODULE_LICENSE("GPL"); 132 + MODULE_ALIAS_CRYPTO("sha512"); 133 + MODULE_ALIAS_CRYPTO("sha384");
+203
arch/riscv/crypto/sha512-riscv64-zvknhb-zvkb.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu> 15 + // Copyright (c) 2023, Phoebe Chen <phoebe.chen@sifive.com> 16 + // Copyright 2024 Google LLC 17 + // All rights reserved. 18 + // 19 + // Redistribution and use in source and binary forms, with or without 20 + // modification, are permitted provided that the following conditions 21 + // are met: 22 + // 1. Redistributions of source code must retain the above copyright 23 + // notice, this list of conditions and the following disclaimer. 24 + // 2. Redistributions in binary form must reproduce the above copyright 25 + // notice, this list of conditions and the following disclaimer in the 26 + // documentation and/or other materials provided with the distribution. 27 + // 28 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 29 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 30 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 31 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 32 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 33 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 34 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 35 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 36 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 37 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 + 40 + // The generated code of this file depends on the following RISC-V extensions: 41 + // - RV64I 42 + // - RISC-V Vector ('V') with VLEN >= 128 43 + // - RISC-V Vector SHA-2 Secure Hash extension ('Zvknhb') 44 + // - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') 45 + 46 + #include <linux/cfi_types.h> 47 + 48 + .text 49 + .option arch, +zvknhb, +zvkb 50 + 51 + #define STATEP a0 52 + #define DATA a1 53 + #define NUM_BLOCKS a2 54 + 55 + #define STATEP_C a3 56 + #define K a4 57 + 58 + #define MASK v0 59 + #define INDICES v1 60 + #define W0 v10 // LMUL=2 61 + #define W1 v12 // LMUL=2 62 + #define W2 v14 // LMUL=2 63 + #define W3 v16 // LMUL=2 64 + #define VTMP v20 // LMUL=2 65 + #define FEBA v22 // LMUL=2 66 + #define HGDC v24 // LMUL=2 67 + #define PREV_FEBA v26 // LMUL=2 68 + #define PREV_HGDC v28 // LMUL=2 69 + 70 + // Do 4 rounds of SHA-512. w0 contains the current 4 message schedule words. 71 + // 72 + // If not all the message schedule words have been computed yet, then this also 73 + // computes 4 more message schedule words. w1-w3 contain the next 3 groups of 4 74 + // message schedule words; this macro computes the group after w3 and writes it 75 + // to w0. This means that the next (w0, w1, w2, w3) is the current (w1, w2, w3, 76 + // w0), so the caller must cycle through the registers accordingly. 
77 + .macro sha512_4rounds last, w0, w1, w2, w3 78 + vle64.v VTMP, (K) 79 + addi K, K, 32 80 + vadd.vv VTMP, VTMP, \w0 81 + vsha2cl.vv HGDC, FEBA, VTMP 82 + vsha2ch.vv FEBA, HGDC, VTMP 83 + .if !\last 84 + vmerge.vvm VTMP, \w2, \w1, MASK 85 + vsha2ms.vv \w0, VTMP, \w3 86 + .endif 87 + .endm 88 + 89 + .macro sha512_16rounds last 90 + sha512_4rounds \last, W0, W1, W2, W3 91 + sha512_4rounds \last, W1, W2, W3, W0 92 + sha512_4rounds \last, W2, W3, W0, W1 93 + sha512_4rounds \last, W3, W0, W1, W2 94 + .endm 95 + 96 + // void sha512_transform_zvknhb_zvkb(u64 state[8], const u8 *data, 97 + // int num_blocks); 98 + SYM_TYPED_FUNC_START(sha512_transform_zvknhb_zvkb) 99 + 100 + // Setup mask for the vmerge to replace the first word (idx==0) in 101 + // message scheduling. There are 4 words, so an 8-bit mask suffices. 102 + vsetivli zero, 1, e8, m1, ta, ma 103 + vmv.v.i MASK, 0x01 104 + 105 + // Load the state. The state is stored as {a,b,c,d,e,f,g,h}, but we 106 + // need {f,e,b,a},{h,g,d,c}. The dst vtype is e64m2 and the index vtype 107 + // is e8mf4. We use index-load with the i8 indices {40, 32, 8, 0}, 108 + // loaded using the 32-bit little endian value 0x00082028. 109 + li t0, 0x00082028 110 + vsetivli zero, 1, e32, m1, ta, ma 111 + vmv.v.x INDICES, t0 112 + addi STATEP_C, STATEP, 16 113 + vsetivli zero, 4, e64, m2, ta, ma 114 + vluxei8.v FEBA, (STATEP), INDICES 115 + vluxei8.v HGDC, (STATEP_C), INDICES 116 + 117 + .Lnext_block: 118 + la K, K512 119 + addi NUM_BLOCKS, NUM_BLOCKS, -1 120 + 121 + // Save the previous state, as it's needed later. 
122 + vmv.v.v PREV_FEBA, FEBA 123 + vmv.v.v PREV_HGDC, HGDC 124 + 125 + // Load the next 1024-bit message block and endian-swap each 64-bit word 126 + vle64.v W0, (DATA) 127 + vrev8.v W0, W0 128 + addi DATA, DATA, 32 129 + vle64.v W1, (DATA) 130 + vrev8.v W1, W1 131 + addi DATA, DATA, 32 132 + vle64.v W2, (DATA) 133 + vrev8.v W2, W2 134 + addi DATA, DATA, 32 135 + vle64.v W3, (DATA) 136 + vrev8.v W3, W3 137 + addi DATA, DATA, 32 138 + 139 + // Do the 80 rounds of SHA-512. 140 + sha512_16rounds 0 141 + sha512_16rounds 0 142 + sha512_16rounds 0 143 + sha512_16rounds 0 144 + sha512_16rounds 1 145 + 146 + // Add the previous state. 147 + vadd.vv FEBA, FEBA, PREV_FEBA 148 + vadd.vv HGDC, HGDC, PREV_HGDC 149 + 150 + // Repeat if more blocks remain. 151 + bnez NUM_BLOCKS, .Lnext_block 152 + 153 + // Store the new state and return. 154 + vsuxei8.v FEBA, (STATEP), INDICES 155 + vsuxei8.v HGDC, (STATEP_C), INDICES 156 + ret 157 + SYM_FUNC_END(sha512_transform_zvknhb_zvkb) 158 + 159 + .section ".rodata" 160 + .p2align 3 161 + .type K512, @object 162 + K512: 163 + .dword 0x428a2f98d728ae22, 0x7137449123ef65cd 164 + .dword 0xb5c0fbcfec4d3b2f, 0xe9b5dba58189dbbc 165 + .dword 0x3956c25bf348b538, 0x59f111f1b605d019 166 + .dword 0x923f82a4af194f9b, 0xab1c5ed5da6d8118 167 + .dword 0xd807aa98a3030242, 0x12835b0145706fbe 168 + .dword 0x243185be4ee4b28c, 0x550c7dc3d5ffb4e2 169 + .dword 0x72be5d74f27b896f, 0x80deb1fe3b1696b1 170 + .dword 0x9bdc06a725c71235, 0xc19bf174cf692694 171 + .dword 0xe49b69c19ef14ad2, 0xefbe4786384f25e3 172 + .dword 0x0fc19dc68b8cd5b5, 0x240ca1cc77ac9c65 173 + .dword 0x2de92c6f592b0275, 0x4a7484aa6ea6e483 174 + .dword 0x5cb0a9dcbd41fbd4, 0x76f988da831153b5 175 + .dword 0x983e5152ee66dfab, 0xa831c66d2db43210 176 + .dword 0xb00327c898fb213f, 0xbf597fc7beef0ee4 177 + .dword 0xc6e00bf33da88fc2, 0xd5a79147930aa725 178 + .dword 0x06ca6351e003826f, 0x142929670a0e6e70 179 + .dword 0x27b70a8546d22ffc, 0x2e1b21385c26c926 180 + .dword 0x4d2c6dfc5ac42aed, 0x53380d139d95b3df 
181 + .dword 0x650a73548baf63de, 0x766a0abb3c77b2a8 182 + .dword 0x81c2c92e47edaee6, 0x92722c851482353b 183 + .dword 0xa2bfe8a14cf10364, 0xa81a664bbc423001 184 + .dword 0xc24b8b70d0f89791, 0xc76c51a30654be30 185 + .dword 0xd192e819d6ef5218, 0xd69906245565a910 186 + .dword 0xf40e35855771202a, 0x106aa07032bbd1b8 187 + .dword 0x19a4c116b8d2d0c8, 0x1e376c085141ab53 188 + .dword 0x2748774cdf8eeb99, 0x34b0bcb5e19b48a8 189 + .dword 0x391c0cb3c5c95a63, 0x4ed8aa4ae3418acb 190 + .dword 0x5b9cca4f7763e373, 0x682e6ff3d6b2b8a3 191 + .dword 0x748f82ee5defb2fc, 0x78a5636f43172f60 192 + .dword 0x84c87814a1f0ab72, 0x8cc702081a6439ec 193 + .dword 0x90befffa23631e28, 0xa4506cebde82bde9 194 + .dword 0xbef9a3f7b2c67915, 0xc67178f2e372532b 195 + .dword 0xca273eceea26619c, 0xd186b8c721c0c207 196 + .dword 0xeada7dd6cde0eb1e, 0xf57d4f7fee6ed178 197 + .dword 0x06f067aa72176fba, 0x0a637dc5a2c898a6 198 + .dword 0x113f9804bef90dae, 0x1b710b35131c471b 199 + .dword 0x28db77f523047d84, 0x32caab7b40c72493 200 + .dword 0x3c9ebe0a15c9bebc, 0x431d67c49c100d4c 201 + .dword 0x4cc5d4becb3e42b6, 0x597f299cfc657e2a 202 + .dword 0x5fcb6fab3ad6faec, 0x6c44198c4a475817 203 + .size K512, . - K512
+112
arch/riscv/crypto/sm3-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-or-later 2 + /* 3 + * SM3 using the RISC-V vector crypto extensions 4 + * 5 + * Copyright (C) 2023 VRULL GmbH 6 + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu> 7 + * 8 + * Copyright (C) 2023 SiFive, Inc. 9 + * Author: Jerry Shih <jerry.shih@sifive.com> 10 + */ 11 + 12 + #include <asm/simd.h> 13 + #include <asm/vector.h> 14 + #include <crypto/internal/hash.h> 15 + #include <crypto/internal/simd.h> 16 + #include <crypto/sm3_base.h> 17 + #include <linux/linkage.h> 18 + #include <linux/module.h> 19 + 20 + /* 21 + * Note: the asm function only uses the 'state' field of struct sm3_state. 22 + * It is assumed to be the first field. 23 + */ 24 + asmlinkage void sm3_transform_zvksh_zvkb( 25 + struct sm3_state *state, const u8 *data, int num_blocks); 26 + 27 + static int riscv64_sm3_update(struct shash_desc *desc, const u8 *data, 28 + unsigned int len) 29 + { 30 + /* 31 + * Ensure struct sm3_state begins directly with the SM3 32 + * 256-bit internal state, as this is what the asm function expects. 
33 + */ 34 + BUILD_BUG_ON(offsetof(struct sm3_state, state) != 0); 35 + 36 + if (crypto_simd_usable()) { 37 + kernel_vector_begin(); 38 + sm3_base_do_update(desc, data, len, sm3_transform_zvksh_zvkb); 39 + kernel_vector_end(); 40 + } else { 41 + sm3_update(shash_desc_ctx(desc), data, len); 42 + } 43 + return 0; 44 + } 45 + 46 + static int riscv64_sm3_finup(struct shash_desc *desc, const u8 *data, 47 + unsigned int len, u8 *out) 48 + { 49 + struct sm3_state *ctx; 50 + 51 + if (crypto_simd_usable()) { 52 + kernel_vector_begin(); 53 + if (len) 54 + sm3_base_do_update(desc, data, len, 55 + sm3_transform_zvksh_zvkb); 56 + sm3_base_do_finalize(desc, sm3_transform_zvksh_zvkb); 57 + kernel_vector_end(); 58 + 59 + return sm3_base_finish(desc, out); 60 + } 61 + 62 + ctx = shash_desc_ctx(desc); 63 + if (len) 64 + sm3_update(ctx, data, len); 65 + sm3_final(ctx, out); 66 + 67 + return 0; 68 + } 69 + 70 + static int riscv64_sm3_final(struct shash_desc *desc, u8 *out) 71 + { 72 + return riscv64_sm3_finup(desc, NULL, 0, out); 73 + } 74 + 75 + static struct shash_alg riscv64_sm3_alg = { 76 + .init = sm3_base_init, 77 + .update = riscv64_sm3_update, 78 + .final = riscv64_sm3_final, 79 + .finup = riscv64_sm3_finup, 80 + .descsize = sizeof(struct sm3_state), 81 + .digestsize = SM3_DIGEST_SIZE, 82 + .base = { 83 + .cra_blocksize = SM3_BLOCK_SIZE, 84 + .cra_priority = 300, 85 + .cra_name = "sm3", 86 + .cra_driver_name = "sm3-riscv64-zvksh-zvkb", 87 + .cra_module = THIS_MODULE, 88 + }, 89 + }; 90 + 91 + static int __init riscv64_sm3_mod_init(void) 92 + { 93 + if (riscv_isa_extension_available(NULL, ZVKSH) && 94 + riscv_isa_extension_available(NULL, ZVKB) && 95 + riscv_vector_vlen() >= 128) 96 + return crypto_register_shash(&riscv64_sm3_alg); 97 + 98 + return -ENODEV; 99 + } 100 + 101 + static void __exit riscv64_sm3_mod_exit(void) 102 + { 103 + crypto_unregister_shash(&riscv64_sm3_alg); 104 + } 105 + 106 + module_init(riscv64_sm3_mod_init); 107 + module_exit(riscv64_sm3_mod_exit); 108 + 
109 + MODULE_DESCRIPTION("SM3 (RISC-V accelerated)"); 110 + MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>"); 111 + MODULE_LICENSE("GPL"); 112 + MODULE_ALIAS_CRYPTO("sm3");
+123
arch/riscv/crypto/sm3-riscv64-zvksh-zvkb.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu> 15 + // Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com> 16 + // Copyright 2024 Google LLC 17 + // All rights reserved. 18 + // 19 + // Redistribution and use in source and binary forms, with or without 20 + // modification, are permitted provided that the following conditions 21 + // are met: 22 + // 1. Redistributions of source code must retain the above copyright 23 + // notice, this list of conditions and the following disclaimer. 24 + // 2. Redistributions in binary form must reproduce the above copyright 25 + // notice, this list of conditions and the following disclaimer in the 26 + // documentation and/or other materials provided with the distribution. 27 + // 28 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 29 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 30 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 31 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 32 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 33 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 34 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 35 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 36 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 37 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 + 40 + // The generated code of this file depends on the following RISC-V extensions: 41 + // - RV64I 42 + // - RISC-V Vector ('V') with VLEN >= 128 43 + // - RISC-V Vector SM3 Secure Hash extension ('Zvksh') 44 + // - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') 45 + 46 + #include <linux/cfi_types.h> 47 + 48 + .text 49 + .option arch, +zvksh, +zvkb 50 + 51 + #define STATEP a0 52 + #define DATA a1 53 + #define NUM_BLOCKS a2 54 + 55 + #define STATE v0 // LMUL=2 56 + #define PREV_STATE v2 // LMUL=2 57 + #define W0 v4 // LMUL=2 58 + #define W1 v6 // LMUL=2 59 + #define VTMP v8 // LMUL=2 60 + 61 + .macro sm3_8rounds i, w0, w1 62 + // Do 4 rounds using W_{0+i}..W_{7+i}. 63 + vsm3c.vi STATE, \w0, \i + 0 64 + vslidedown.vi VTMP, \w0, 2 65 + vsm3c.vi STATE, VTMP, \i + 1 66 + 67 + // Compute W_{4+i}..W_{11+i}. 68 + vslidedown.vi VTMP, \w0, 4 69 + vslideup.vi VTMP, \w1, 4 70 + 71 + // Do 4 rounds using W_{4+i}..W_{11+i}. 72 + vsm3c.vi STATE, VTMP, \i + 2 73 + vslidedown.vi VTMP, VTMP, 2 74 + vsm3c.vi STATE, VTMP, \i + 3 75 + 76 + .if \i < 28 77 + // Compute W_{16+i}..W_{23+i}. 78 + vsm3me.vv \w0, \w1, \w0 79 + .endif 80 + // For the next 8 rounds, w0 and w1 are swapped. 81 + .endm 82 + 83 + // void sm3_transform_zvksh_zvkb(u32 state[8], const u8 *data, int num_blocks); 84 + SYM_TYPED_FUNC_START(sm3_transform_zvksh_zvkb) 85 + 86 + // Load the state and endian-swap each 32-bit word. 
87 + vsetivli zero, 8, e32, m2, ta, ma 88 + vle32.v STATE, (STATEP) 89 + vrev8.v STATE, STATE 90 + 91 + .Lnext_block: 92 + addi NUM_BLOCKS, NUM_BLOCKS, -1 93 + 94 + // Save the previous state, as it's needed later. 95 + vmv.v.v PREV_STATE, STATE 96 + 97 + // Load the next 512-bit message block into W0-W1. 98 + vle32.v W0, (DATA) 99 + addi DATA, DATA, 32 100 + vle32.v W1, (DATA) 101 + addi DATA, DATA, 32 102 + 103 + // Do the 64 rounds of SM3. 104 + sm3_8rounds 0, W0, W1 105 + sm3_8rounds 4, W1, W0 106 + sm3_8rounds 8, W0, W1 107 + sm3_8rounds 12, W1, W0 108 + sm3_8rounds 16, W0, W1 109 + sm3_8rounds 20, W1, W0 110 + sm3_8rounds 24, W0, W1 111 + sm3_8rounds 28, W1, W0 112 + 113 + // XOR in the previous state. 114 + vxor.vv STATE, STATE, PREV_STATE 115 + 116 + // Repeat if more blocks remain. 117 + bnez NUM_BLOCKS, .Lnext_block 118 + 119 + // Store the new state and return. 120 + vrev8.v STATE, STATE 121 + vse32.v STATE, (STATEP) 122 + ret 123 + SYM_FUNC_END(sm3_transform_zvksh_zvkb)
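In `sm3_8rounds` above, the `vslidedown.vi`/`vslideup.vi` pair splices the 8-word window W_{4+i}..W_{11+i} out of the two message-schedule groups w0 and w1. A small C model of the two slides (a sketch with zero-filled slid-in elements; real `vslideup` leaves the low elements of the destination unchanged, which the model also respects):

```c
#include <stdint.h>
#include <string.h>

#define VL 8	/* eight 32-bit elements at LMUL=2 with VLEN=128 */

/* Model of vslidedown.vi: dst[i] = src[i + n]; tail elements zeroed here. */
static void slidedown(uint32_t dst[VL], const uint32_t src[VL], int n)
{
	memset(dst, 0, VL * sizeof(*dst));
	for (int i = 0; i + n < VL; i++)
		dst[i] = src[i + n];
}

/* Model of vslideup.vi: dst[i + n] = src[i]; dst[0..n-1] keep old values. */
static void slideup(uint32_t dst[VL], const uint32_t src[VL], int n)
{
	for (int i = 0; i + n < VL; i++)
		dst[i + n] = src[i];
}
```

With w0 = W_{0+i}..W_{7+i} and w1 = W_{8+i}..W_{15+i}, `slidedown(VTMP, w0, 4)` followed by `slideup(VTMP, w1, 4)` leaves VTMP holding W_{4+i}..W_{11+i}, matching the "Compute W_{4+i}..W_{11+i}" step in the macro.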
+107
arch/riscv/crypto/sm4-riscv64-glue.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * SM4 using the RISC-V vector crypto extensions 4 + * 5 + * Copyright (C) 2023 VRULL GmbH 6 + * Author: Heiko Stuebner <heiko.stuebner@vrull.eu> 7 + * 8 + * Copyright (C) 2023 SiFive, Inc. 9 + * Author: Jerry Shih <jerry.shih@sifive.com> 10 + */ 11 + 12 + #include <asm/simd.h> 13 + #include <asm/vector.h> 14 + #include <crypto/internal/cipher.h> 15 + #include <crypto/internal/simd.h> 16 + #include <crypto/sm4.h> 17 + #include <linux/linkage.h> 18 + #include <linux/module.h> 19 + 20 + asmlinkage void sm4_expandkey_zvksed_zvkb(const u8 user_key[SM4_KEY_SIZE], 21 + u32 rkey_enc[SM4_RKEY_WORDS], 22 + u32 rkey_dec[SM4_RKEY_WORDS]); 23 + asmlinkage void sm4_crypt_zvksed_zvkb(const u32 rkey[SM4_RKEY_WORDS], 24 + const u8 in[SM4_BLOCK_SIZE], 25 + u8 out[SM4_BLOCK_SIZE]); 26 + 27 + static int riscv64_sm4_setkey(struct crypto_tfm *tfm, const u8 *key, 28 + unsigned int keylen) 29 + { 30 + struct sm4_ctx *ctx = crypto_tfm_ctx(tfm); 31 + 32 + if (crypto_simd_usable()) { 33 + if (keylen != SM4_KEY_SIZE) 34 + return -EINVAL; 35 + kernel_vector_begin(); 36 + sm4_expandkey_zvksed_zvkb(key, ctx->rkey_enc, ctx->rkey_dec); 37 + kernel_vector_end(); 38 + return 0; 39 + } 40 + return sm4_expandkey(ctx, key, keylen); 41 + } 42 + 43 + static void riscv64_sm4_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) 44 + { 45 + const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm); 46 + 47 + if (crypto_simd_usable()) { 48 + kernel_vector_begin(); 49 + sm4_crypt_zvksed_zvkb(ctx->rkey_enc, src, dst); 50 + kernel_vector_end(); 51 + } else { 52 + sm4_crypt_block(ctx->rkey_enc, dst, src); 53 + } 54 + } 55 + 56 + static void riscv64_sm4_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src) 57 + { 58 + const struct sm4_ctx *ctx = crypto_tfm_ctx(tfm); 59 + 60 + if (crypto_simd_usable()) { 61 + kernel_vector_begin(); 62 + sm4_crypt_zvksed_zvkb(ctx->rkey_dec, src, dst); 63 + kernel_vector_end(); 64 + } else { 65 + sm4_crypt_block(ctx->rkey_dec, 
dst, src); 66 + } 67 + } 68 + 69 + static struct crypto_alg riscv64_sm4_alg = { 70 + .cra_flags = CRYPTO_ALG_TYPE_CIPHER, 71 + .cra_blocksize = SM4_BLOCK_SIZE, 72 + .cra_ctxsize = sizeof(struct sm4_ctx), 73 + .cra_priority = 300, 74 + .cra_name = "sm4", 75 + .cra_driver_name = "sm4-riscv64-zvksed-zvkb", 76 + .cra_cipher = { 77 + .cia_min_keysize = SM4_KEY_SIZE, 78 + .cia_max_keysize = SM4_KEY_SIZE, 79 + .cia_setkey = riscv64_sm4_setkey, 80 + .cia_encrypt = riscv64_sm4_encrypt, 81 + .cia_decrypt = riscv64_sm4_decrypt, 82 + }, 83 + .cra_module = THIS_MODULE, 84 + }; 85 + 86 + static int __init riscv64_sm4_mod_init(void) 87 + { 88 + if (riscv_isa_extension_available(NULL, ZVKSED) && 89 + riscv_isa_extension_available(NULL, ZVKB) && 90 + riscv_vector_vlen() >= 128) 91 + return crypto_register_alg(&riscv64_sm4_alg); 92 + 93 + return -ENODEV; 94 + } 95 + 96 + static void __exit riscv64_sm4_mod_exit(void) 97 + { 98 + crypto_unregister_alg(&riscv64_sm4_alg); 99 + } 100 + 101 + module_init(riscv64_sm4_mod_init); 102 + module_exit(riscv64_sm4_mod_exit); 103 + 104 + MODULE_DESCRIPTION("SM4 (RISC-V accelerated)"); 105 + MODULE_AUTHOR("Heiko Stuebner <heiko.stuebner@vrull.eu>"); 106 + MODULE_LICENSE("GPL"); 107 + MODULE_ALIAS_CRYPTO("sm4");
+117
arch/riscv/crypto/sm4-riscv64-zvksed-zvkb.S
··· 1 + /* SPDX-License-Identifier: Apache-2.0 OR BSD-2-Clause */ 2 + // 3 + // This file is dual-licensed, meaning that you can use it under your 4 + // choice of either of the following two licenses: 5 + // 6 + // Copyright 2023 The OpenSSL Project Authors. All Rights Reserved. 7 + // 8 + // Licensed under the Apache License 2.0 (the "License"). You can obtain 9 + // a copy in the file LICENSE in the source distribution or at 10 + // https://www.openssl.org/source/license.html 11 + // 12 + // or 13 + // 14 + // Copyright (c) 2023, Christoph Müllner <christoph.muellner@vrull.eu> 15 + // Copyright (c) 2023, Jerry Shih <jerry.shih@sifive.com> 16 + // Copyright 2024 Google LLC 17 + // All rights reserved. 18 + // 19 + // Redistribution and use in source and binary forms, with or without 20 + // modification, are permitted provided that the following conditions 21 + // are met: 22 + // 1. Redistributions of source code must retain the above copyright 23 + // notice, this list of conditions and the following disclaimer. 24 + // 2. Redistributions in binary form must reproduce the above copyright 25 + // notice, this list of conditions and the following disclaimer in the 26 + // documentation and/or other materials provided with the distribution. 27 + // 28 + // THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS 29 + // "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT 30 + // LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR 31 + // A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT 32 + // OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 33 + // SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 34 + // LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, 35 + // DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY 36 + // THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT 37 + // (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 38 + // OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 39 + 40 + // The generated code of this file depends on the following RISC-V extensions: 41 + // - RV64I 42 + // - RISC-V Vector ('V') with VLEN >= 128 43 + // - RISC-V Vector SM4 Block Cipher extension ('Zvksed') 44 + // - RISC-V Vector Cryptography Bit-manipulation extension ('Zvkb') 45 + 46 + #include <linux/linkage.h> 47 + 48 + .text 49 + .option arch, +zvksed, +zvkb 50 + 51 + // void sm4_expandkey_zvksed_zvkb(const u8 user_key[16], u32 rkey_enc[32], 52 + // u32 rkey_dec[32]); 53 + SYM_FUNC_START(sm4_expandkey_zvksed_zvkb) 54 + vsetivli zero, 4, e32, m1, ta, ma 55 + 56 + // Load the user key. 57 + vle32.v v1, (a0) 58 + vrev8.v v1, v1 59 + 60 + // XOR the user key with the family key. 61 + la t0, FAMILY_KEY 62 + vle32.v v2, (t0) 63 + vxor.vv v1, v1, v2 64 + 65 + // Compute the round keys. Store them in forwards order in rkey_enc 66 + // and in reverse order in rkey_dec. 67 + addi a2, a2, 31*4 68 + li t0, -4 69 + .set i, 0 70 + .rept 8 71 + vsm4k.vi v1, v1, i 72 + vse32.v v1, (a1) // Store to rkey_enc. 73 + vsse32.v v1, (a2), t0 // Store to rkey_dec. 
74 + .if i < 7 75 + addi a1, a1, 16 76 + addi a2, a2, -16 77 + .endif 78 + .set i, i + 1 79 + .endr 80 + 81 + ret 82 + SYM_FUNC_END(sm4_expandkey_zvksed_zvkb) 83 + 84 + // void sm4_crypt_zvksed_zvkb(const u32 rkey[32], const u8 in[16], u8 out[16]); 85 + SYM_FUNC_START(sm4_crypt_zvksed_zvkb) 86 + vsetivli zero, 4, e32, m1, ta, ma 87 + 88 + // Load the input data. 89 + vle32.v v1, (a1) 90 + vrev8.v v1, v1 91 + 92 + // Do the 32 rounds of SM4, 4 at a time. 93 + .set i, 0 94 + .rept 8 95 + vle32.v v2, (a0) 96 + vsm4r.vs v1, v2 97 + .if i < 7 98 + addi a0, a0, 16 99 + .endif 100 + .set i, i + 1 101 + .endr 102 + 103 + // Store the output data (in reverse element order). 104 + vrev8.v v1, v1 105 + li t0, -4 106 + addi a2, a2, 12 107 + vsse32.v v1, (a2), t0 108 + 109 + ret 110 + SYM_FUNC_END(sm4_crypt_zvksed_zvkb) 111 + 112 + .section ".rodata" 113 + .p2align 2 114 + .type FAMILY_KEY, @object 115 + FAMILY_KEY: 116 + .word 0xA3B1BAC6, 0x56AA3350, 0x677D9197, 0xB27022DC 117 + .size FAMILY_KEY, . - FAMILY_KEY
+5 -5
arch/riscv/errata/andes/errata.c
··· 18 18 #include <asm/sbi.h> 19 19 #include <asm/vendorid_list.h> 20 20 21 - #define ANDESTECH_AX45MP_MARCHID 0x8000000000008a45UL 22 - #define ANDESTECH_AX45MP_MIMPID 0x500UL 23 - #define ANDESTECH_SBI_EXT_ANDES 0x0900031E 21 + #define ANDES_AX45MP_MARCHID 0x8000000000008a45UL 22 + #define ANDES_AX45MP_MIMPID 0x500UL 23 + #define ANDES_SBI_EXT_ANDES 0x0900031E 24 24 25 25 #define ANDES_SBI_EXT_IOCP_SW_WORKAROUND 1 26 26 ··· 32 32 * ANDES_SBI_EXT_IOCP_SW_WORKAROUND SBI EXT checks if the IOCP is missing and 33 33 * cache is controllable only then CMO will be applied to the platform. 34 34 */ 35 - ret = sbi_ecall(ANDESTECH_SBI_EXT_ANDES, ANDES_SBI_EXT_IOCP_SW_WORKAROUND, 35 + ret = sbi_ecall(ANDES_SBI_EXT_ANDES, ANDES_SBI_EXT_IOCP_SW_WORKAROUND, 36 36 0, 0, 0, 0, 0, 0); 37 37 38 38 return ret.error ? 0 : ret.value; ··· 50 50 51 51 done = true; 52 52 53 - if (arch_id != ANDESTECH_AX45MP_MARCHID || impid != ANDESTECH_AX45MP_MIMPID) 53 + if (arch_id != ANDES_AX45MP_MARCHID || impid != ANDES_AX45MP_MIMPID) 54 54 return; 55 55 56 56 if (!ax45mp_iocp_sw_workaround())
+10
arch/riscv/include/asm/asm.h
··· 183 183 REG_L x31, PT_T6(sp) 184 184 .endm 185 185 186 + /* Annotate a function as being unsuitable for kprobes. */ 187 + #ifdef CONFIG_KPROBES 188 + #define ASM_NOKPROBE(name) \ 189 + .pushsection "_kprobe_blacklist", "aw"; \ 190 + RISCV_PTR name; \ 191 + .popsection 192 + #else 193 + #define ASM_NOKPROBE(name) 194 + #endif 195 + 186 196 #endif /* __ASSEMBLY__ */ 187 197 188 198 #endif /* _ASM_RISCV_ASM_H */
+8 -9
arch/riscv/include/asm/atomic.h
··· 17 17 #endif 18 18 19 19 #include <asm/cmpxchg.h> 20 - #include <asm/barrier.h> 21 20 22 21 #define __atomic_acquire_fence() \ 23 22 __asm__ __volatile__(RISCV_ACQUIRE_BARRIER "" ::: "memory") ··· 206 207 " add %[rc], %[p], %[a]\n" 207 208 " sc.w.rl %[rc], %[rc], %[c]\n" 208 209 " bnez %[rc], 0b\n" 209 - " fence rw, rw\n" 210 + RISCV_FULL_BARRIER 210 211 "1:\n" 211 212 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 212 213 : [a]"r" (a), [u]"r" (u) ··· 227 228 " add %[rc], %[p], %[a]\n" 228 229 " sc.d.rl %[rc], %[rc], %[c]\n" 229 230 " bnez %[rc], 0b\n" 230 - " fence rw, rw\n" 231 + RISCV_FULL_BARRIER 231 232 "1:\n" 232 233 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 233 234 : [a]"r" (a), [u]"r" (u) ··· 247 248 " addi %[rc], %[p], 1\n" 248 249 " sc.w.rl %[rc], %[rc], %[c]\n" 249 250 " bnez %[rc], 0b\n" 250 - " fence rw, rw\n" 251 + RISCV_FULL_BARRIER 251 252 "1:\n" 252 253 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 253 254 : ··· 267 268 " addi %[rc], %[p], -1\n" 268 269 " sc.w.rl %[rc], %[rc], %[c]\n" 269 270 " bnez %[rc], 0b\n" 270 - " fence rw, rw\n" 271 + RISCV_FULL_BARRIER 271 272 "1:\n" 272 273 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 273 274 : ··· 287 288 " bltz %[rc], 1f\n" 288 289 " sc.w.rl %[rc], %[rc], %[c]\n" 289 290 " bnez %[rc], 0b\n" 290 - " fence rw, rw\n" 291 + RISCV_FULL_BARRIER 291 292 "1:\n" 292 293 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 293 294 : ··· 309 310 " addi %[rc], %[p], 1\n" 310 311 " sc.d.rl %[rc], %[rc], %[c]\n" 311 312 " bnez %[rc], 0b\n" 312 - " fence rw, rw\n" 313 + RISCV_FULL_BARRIER 313 314 "1:\n" 314 315 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 315 316 : ··· 330 331 " addi %[rc], %[p], -1\n" 331 332 " sc.d.rl %[rc], %[rc], %[c]\n" 332 333 " bnez %[rc], 0b\n" 333 - " fence rw, rw\n" 334 + RISCV_FULL_BARRIER 334 335 "1:\n" 335 336 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 336 337 : ··· 351 352 " bltz %[rc], 1f\n" 352 353 " sc.d.rl 
%[rc], %[rc], %[c]\n" 353 354 " bnez %[rc], 0b\n" 354 - " fence rw, rw\n" 355 + RISCV_FULL_BARRIER 355 356 "1:\n" 356 357 : [p]"=&r" (prev), [rc]"=&r" (rc), [c]"+A" (v->counter) 357 358 :
+10 -11
arch/riscv/include/asm/barrier.h
··· 11 11 #define _ASM_RISCV_BARRIER_H 12 12 13 13 #ifndef __ASSEMBLY__ 14 + #include <asm/fence.h> 14 15 15 16 #define nop() __asm__ __volatile__ ("nop") 16 17 #define __nops(n) ".rept " #n "\nnop\n.endr\n" 17 18 #define nops(n) __asm__ __volatile__ (__nops(n)) 18 19 19 - #define RISCV_FENCE(p, s) \ 20 - __asm__ __volatile__ ("fence " #p "," #s : : : "memory") 21 20 22 21 /* These barriers need to enforce ordering on both devices or memory. */ 23 - #define mb() RISCV_FENCE(iorw,iorw) 24 - #define rmb() RISCV_FENCE(ir,ir) 25 - #define wmb() RISCV_FENCE(ow,ow) 22 + #define __mb() RISCV_FENCE(iorw, iorw) 23 + #define __rmb() RISCV_FENCE(ir, ir) 24 + #define __wmb() RISCV_FENCE(ow, ow) 26 25 27 26 /* These barriers do not need to enforce ordering on devices, just memory. */ 28 - #define __smp_mb() RISCV_FENCE(rw,rw) 29 - #define __smp_rmb() RISCV_FENCE(r,r) 30 - #define __smp_wmb() RISCV_FENCE(w,w) 27 + #define __smp_mb() RISCV_FENCE(rw, rw) 28 + #define __smp_rmb() RISCV_FENCE(r, r) 29 + #define __smp_wmb() RISCV_FENCE(w, w) 31 30 32 31 #define __smp_store_release(p, v) \ 33 32 do { \ 34 33 compiletime_assert_atomic_type(*p); \ 35 - RISCV_FENCE(rw,w); \ 34 + RISCV_FENCE(rw, w); \ 36 35 WRITE_ONCE(*p, v); \ 37 36 } while (0) 38 37 ··· 39 40 ({ \ 40 41 typeof(*p) ___p1 = READ_ONCE(*p); \ 41 42 compiletime_assert_atomic_type(*p); \ 42 - RISCV_FENCE(r,rw); \ 43 + RISCV_FENCE(r, rw); \ 43 44 ___p1; \ 44 45 }) 45 46 ··· 68 69 * instances the scheduler pairs this with an mb(), so nothing is necessary on 69 70 * the new hart. 70 71 */ 71 - #define smp_mb__after_spinlock() RISCV_FENCE(iorw,iorw) 72 + #define smp_mb__after_spinlock() RISCV_FENCE(iorw, iorw) 72 73 73 74 #include <asm-generic/barrier.h> 74 75
+24 -114
arch/riscv/include/asm/bitops.h
··· 22 22 #include <asm-generic/bitops/fls.h> 23 23 24 24 #else 25 + #define __HAVE_ARCH___FFS 26 + #define __HAVE_ARCH___FLS 27 + #define __HAVE_ARCH_FFS 28 + #define __HAVE_ARCH_FLS 29 + 30 + #include <asm-generic/bitops/__ffs.h> 31 + #include <asm-generic/bitops/__fls.h> 32 + #include <asm-generic/bitops/ffs.h> 33 + #include <asm-generic/bitops/fls.h> 34 + 25 35 #include <asm/alternative-macros.h> 26 36 #include <asm/hwcap.h> 27 37 ··· 47 37 48 38 static __always_inline unsigned long variable__ffs(unsigned long word) 49 39 { 50 - int num; 51 - 52 40 asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, 53 41 RISCV_ISA_EXT_ZBB, 1) 54 42 : : : : legacy); ··· 60 52 return word; 61 53 62 54 legacy: 63 - num = 0; 64 - #if BITS_PER_LONG == 64 65 - if ((word & 0xffffffff) == 0) { 66 - num += 32; 67 - word >>= 32; 68 - } 69 - #endif 70 - if ((word & 0xffff) == 0) { 71 - num += 16; 72 - word >>= 16; 73 - } 74 - if ((word & 0xff) == 0) { 75 - num += 8; 76 - word >>= 8; 77 - } 78 - if ((word & 0xf) == 0) { 79 - num += 4; 80 - word >>= 4; 81 - } 82 - if ((word & 0x3) == 0) { 83 - num += 2; 84 - word >>= 2; 85 - } 86 - if ((word & 0x1) == 0) 87 - num += 1; 88 - return num; 55 + return generic___ffs(word); 89 56 } 90 57 91 58 /** ··· 76 93 77 94 static __always_inline unsigned long variable__fls(unsigned long word) 78 95 { 79 - int num; 80 - 81 96 asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, 82 97 RISCV_ISA_EXT_ZBB, 1) 83 98 : : : : legacy); ··· 89 108 return BITS_PER_LONG - 1 - word; 90 109 91 110 legacy: 92 - num = BITS_PER_LONG - 1; 93 - #if BITS_PER_LONG == 64 94 - if (!(word & (~0ul << 32))) { 95 - num -= 32; 96 - word <<= 32; 97 - } 98 - #endif 99 - if (!(word & (~0ul << (BITS_PER_LONG - 16)))) { 100 - num -= 16; 101 - word <<= 16; 102 - } 103 - if (!(word & (~0ul << (BITS_PER_LONG - 8)))) { 104 - num -= 8; 105 - word <<= 8; 106 - } 107 - if (!(word & (~0ul << (BITS_PER_LONG - 4)))) { 108 - num -= 4; 109 - word <<= 4; 110 - } 111 - if (!(word & (~0ul << (BITS_PER_LONG - 
2)))) { 112 - num -= 2; 113 - word <<= 2; 114 - } 115 - if (!(word & (~0ul << (BITS_PER_LONG - 1)))) 116 - num -= 1; 117 - return num; 111 + return generic___fls(word); 118 112 } 119 113 120 114 /** ··· 105 149 106 150 static __always_inline int variable_ffs(int x) 107 151 { 108 - int r; 109 - 110 - if (!x) 111 - return 0; 112 - 113 152 asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, 114 153 RISCV_ISA_EXT_ZBB, 1) 115 154 : : : : legacy); 155 + 156 + if (!x) 157 + return 0; 116 158 117 159 asm volatile (".option push\n" 118 160 ".option arch,+zbb\n" 119 161 CTZW "%0, %1\n" 120 162 ".option pop\n" 121 - : "=r" (r) : "r" (x) :); 163 + : "=r" (x) : "r" (x) :); 122 164 123 - return r + 1; 165 + return x + 1; 124 166 125 167 legacy: 126 - r = 1; 127 - if (!(x & 0xffff)) { 128 - x >>= 16; 129 - r += 16; 130 - } 131 - if (!(x & 0xff)) { 132 - x >>= 8; 133 - r += 8; 134 - } 135 - if (!(x & 0xf)) { 136 - x >>= 4; 137 - r += 4; 138 - } 139 - if (!(x & 3)) { 140 - x >>= 2; 141 - r += 2; 142 - } 143 - if (!(x & 1)) { 144 - x >>= 1; 145 - r += 1; 146 - } 147 - return r; 168 + return generic_ffs(x); 148 169 } 149 170 150 171 /** ··· 137 204 138 205 static __always_inline int variable_fls(unsigned int x) 139 206 { 140 - int r; 141 - 142 - if (!x) 143 - return 0; 144 - 145 207 asm goto(ALTERNATIVE("j %l[legacy]", "nop", 0, 146 208 RISCV_ISA_EXT_ZBB, 1) 147 209 : : : : legacy); 210 + 211 + if (!x) 212 + return 0; 148 213 149 214 asm volatile (".option push\n" 150 215 ".option arch,+zbb\n" 151 216 CLZW "%0, %1\n" 152 217 ".option pop\n" 153 - : "=r" (r) : "r" (x) :); 218 + : "=r" (x) : "r" (x) :); 154 219 155 - return 32 - r; 220 + return 32 - x; 156 221 157 222 legacy: 158 - r = 32; 159 - if (!(x & 0xffff0000u)) { 160 - x <<= 16; 161 - r -= 16; 162 - } 163 - if (!(x & 0xff000000u)) { 164 - x <<= 8; 165 - r -= 8; 166 - } 167 - if (!(x & 0xf0000000u)) { 168 - x <<= 4; 169 - r -= 4; 170 - } 171 - if (!(x & 0xc0000000u)) { 172 - x <<= 2; 173 - r -= 2; 174 - } 175 - if (!(x & 
0x80000000u)) { 176 - x <<= 1; 177 - r -= 1; 178 - } 179 - return r; 223 + return generic_fls(x); 180 224 } 181 225 182 226 /**
+2 -3
arch/riscv/include/asm/cmpxchg.h
··· 8 8 9 9 #include <linux/bug.h> 10 10 11 - #include <asm/barrier.h> 12 11 #include <asm/fence.h> 13 12 14 13 #define __xchg_relaxed(ptr, new, size) \ ··· 312 313 " bne %0, %z3, 1f\n" \ 313 314 " sc.w.rl %1, %z4, %2\n" \ 314 315 " bnez %1, 0b\n" \ 315 - " fence rw, rw\n" \ 316 + RISCV_FULL_BARRIER \ 316 317 "1:\n" \ 317 318 : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \ 318 319 : "rJ" ((long)__old), "rJ" (__new) \ ··· 324 325 " bne %0, %z3, 1f\n" \ 325 326 " sc.d.rl %1, %z4, %2\n" \ 326 327 " bnez %1, 0b\n" \ 327 - " fence rw, rw\n" \ 328 + RISCV_FULL_BARRIER \ 328 329 "1:\n" \ 329 330 : "=&r" (__ret), "=&r" (__rc), "+A" (*__ptr) \ 330 331 : "rJ" (__old), "rJ" (__new) \
+19
arch/riscv/include/asm/compat.h
··· 14 14 15 15 static inline int is_compat_task(void) 16 16 { 17 + if (!IS_ENABLED(CONFIG_COMPAT)) 18 + return 0; 19 + 17 20 return test_thread_flag(TIF_32BIT); 21 + } 22 + 23 + static inline int is_compat_thread(struct thread_info *thread) 24 + { 25 + if (!IS_ENABLED(CONFIG_COMPAT)) 26 + return 0; 27 + 28 + return test_ti_thread_flag(thread, TIF_32BIT); 29 + } 30 + 31 + static inline void set_compat_task(bool is_compat) 32 + { 33 + if (is_compat) 34 + set_thread_flag(TIF_32BIT); 35 + else 36 + clear_thread_flag(TIF_32BIT); 18 37 } 19 38 20 39 struct compat_user_regs_struct {
+19 -12
arch/riscv/include/asm/cpufeature.h
··· 1 1 /* SPDX-License-Identifier: GPL-2.0-only */ 2 2 /* 3 - * Copyright 2022-2023 Rivos, Inc 3 + * Copyright 2022-2024 Rivos, Inc 4 4 */ 5 5 6 6 #ifndef _ASM_CPUFEATURE_H ··· 28 28 29 29 DECLARE_PER_CPU(struct riscv_cpuinfo, riscv_cpuinfo); 30 30 31 - DECLARE_PER_CPU(long, misaligned_access_speed); 32 - 33 31 /* Per-cpu ISA extensions. */ 34 32 extern struct riscv_isainfo hart_isa[NR_CPUS]; 35 33 36 34 void riscv_user_isa_enable(void); 37 35 38 - #ifdef CONFIG_RISCV_MISALIGNED 39 - bool unaligned_ctl_available(void); 40 - bool check_unaligned_access_emulated(int cpu); 36 + #if defined(CONFIG_RISCV_MISALIGNED) 37 + bool check_unaligned_access_emulated_all_cpus(void); 41 38 void unaligned_emulation_finish(void); 39 + bool unaligned_ctl_available(void); 40 + DECLARE_PER_CPU(long, misaligned_access_speed); 42 41 #else 43 42 static inline bool unaligned_ctl_available(void) 44 43 { 45 44 return false; 46 45 } 46 + #endif 47 47 48 - static inline bool check_unaligned_access_emulated(int cpu) 48 + #if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) 49 + DECLARE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key); 50 + 51 + static __always_inline bool has_fast_unaligned_accesses(void) 49 52 { 50 - return false; 53 + return static_branch_likely(&fast_unaligned_access_speed_key); 51 54 } 52 - 53 - static inline void unaligned_emulation_finish(void) {} 55 + #else 56 + static __always_inline bool has_fast_unaligned_accesses(void) 57 + { 58 + if (IS_ENABLED(CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS)) 59 + return true; 60 + else 61 + return false; 62 + } 54 63 #endif 55 64 56 65 unsigned long riscv_get_elf_hwcap(void); ··· 143 134 144 135 return __riscv_isa_extension_available(hart_isa[cpu].isa, ext); 145 136 } 146 - 147 - DECLARE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key); 148 137 149 138 #endif
+2 -9
arch/riscv/include/asm/elf.h
··· 53 53 #define ELF_ET_DYN_BASE ((DEFAULT_MAP_WINDOW / 3) * 2) 54 54 55 55 #ifdef CONFIG_64BIT 56 - #ifdef CONFIG_COMPAT 57 - #define STACK_RND_MASK (test_thread_flag(TIF_32BIT) ? \ 56 + #define STACK_RND_MASK (is_compat_task() ? \ 58 57 0x7ff >> (PAGE_SHIFT - 12) : \ 59 58 0x3ffff >> (PAGE_SHIFT - 12)) 60 - #else 61 - #define STACK_RND_MASK (0x3ffff >> (PAGE_SHIFT - 12)) 62 - #endif 63 59 #endif 64 60 65 61 /* ··· 135 139 #ifdef CONFIG_COMPAT 136 140 137 141 #define SET_PERSONALITY(ex) \ 138 - do { if ((ex).e_ident[EI_CLASS] == ELFCLASS32) \ 139 - set_thread_flag(TIF_32BIT); \ 140 - else \ 141 - clear_thread_flag(TIF_32BIT); \ 142 + do { set_compat_task((ex).e_ident[EI_CLASS] == ELFCLASS32); \ 142 143 if (personality(current->personality) != PER_LINUX32) \ 143 144 set_personality(PER_LINUX | \ 144 145 (current->personality & (~PER_MASK))); \
+2 -11
arch/riscv/include/asm/errata_list.h
··· 12 12 #include <asm/vendorid_list.h> 13 13 14 14 #ifdef CONFIG_ERRATA_ANDES 15 - #define ERRATA_ANDESTECH_NO_IOCP 0 16 - #define ERRATA_ANDESTECH_NUMBER 1 15 + #define ERRATA_ANDES_NO_IOCP 0 16 + #define ERRATA_ANDES_NUMBER 1 17 17 #endif 18 18 19 19 #ifdef CONFIG_ERRATA_SIFIVE ··· 111 111 112 112 #define THEAD_C9XX_RV_IRQ_PMU 17 113 113 #define THEAD_C9XX_CSR_SCOUNTEROF 0x5c5 114 - 115 - #define ALT_SBI_PMU_OVERFLOW(__ovl) \ 116 - asm volatile(ALTERNATIVE( \ 117 - "csrr %0, " __stringify(CSR_SSCOUNTOVF), \ 118 - "csrr %0, " __stringify(THEAD_C9XX_CSR_SCOUNTEROF), \ 119 - THEAD_VENDOR_ID, ERRATA_THEAD_PMU, \ 120 - CONFIG_ERRATA_THEAD_PMU) \ 121 - : "=r" (__ovl) : \ 122 - : "memory") 123 114 124 115 #endif /* __ASSEMBLY__ */ 125 116
+8 -2
arch/riscv/include/asm/fence.h
··· 1 1 #ifndef _ASM_RISCV_FENCE_H 2 2 #define _ASM_RISCV_FENCE_H 3 3 4 + #define RISCV_FENCE_ASM(p, s) "\tfence " #p "," #s "\n" 5 + #define RISCV_FENCE(p, s) \ 6 + ({ __asm__ __volatile__ (RISCV_FENCE_ASM(p, s) : : : "memory"); }) 7 + 4 8 #ifdef CONFIG_SMP 5 - #define RISCV_ACQUIRE_BARRIER "\tfence r , rw\n" 6 - #define RISCV_RELEASE_BARRIER "\tfence rw, w\n" 9 + #define RISCV_ACQUIRE_BARRIER RISCV_FENCE_ASM(r, rw) 10 + #define RISCV_RELEASE_BARRIER RISCV_FENCE_ASM(rw, w) 11 + #define RISCV_FULL_BARRIER RISCV_FENCE_ASM(rw, rw) 7 12 #else 8 13 #define RISCV_ACQUIRE_BARRIER 9 14 #define RISCV_RELEASE_BARRIER 15 + #define RISCV_FULL_BARRIER 10 16 #endif 11 17 12 18 #endif /* _ASM_RISCV_FENCE_H */
+1
arch/riscv/include/asm/hwcap.h
··· 80 80 #define RISCV_ISA_EXT_ZFA 71 81 81 #define RISCV_ISA_EXT_ZTSO 72 82 82 #define RISCV_ISA_EXT_ZACAS 73 83 + #define RISCV_ISA_EXT_XANDESPMU 74 83 84 84 85 #define RISCV_ISA_EXT_XLINUXENVCFG 127 85 86
+4 -4
arch/riscv/include/asm/io.h
··· 47 47 * sufficient to ensure this works sanely on controllers that support I/O 48 48 * writes. 49 49 */ 50 - #define __io_pbr() __asm__ __volatile__ ("fence io,i" : : : "memory"); 51 - #define __io_par(v) __asm__ __volatile__ ("fence i,ior" : : : "memory"); 52 - #define __io_pbw() __asm__ __volatile__ ("fence iow,o" : : : "memory"); 53 - #define __io_paw() __asm__ __volatile__ ("fence o,io" : : : "memory"); 50 + #define __io_pbr() RISCV_FENCE(io, i) 51 + #define __io_par(v) RISCV_FENCE(i, ior) 52 + #define __io_pbw() RISCV_FENCE(iow, o) 53 + #define __io_paw() RISCV_FENCE(o, io) 54 54 55 55 /* 56 56 * Accesses from a single hart to a single I/O address must be ordered. This
+50
arch/riscv/include/asm/membarrier.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0-only */ 2 + #ifndef _ASM_RISCV_MEMBARRIER_H 3 + #define _ASM_RISCV_MEMBARRIER_H 4 + 5 + static inline void membarrier_arch_switch_mm(struct mm_struct *prev, 6 + struct mm_struct *next, 7 + struct task_struct *tsk) 8 + { 9 + /* 10 + * Only need the full barrier when switching between processes. 11 + * Barrier when switching from kernel to userspace is not 12 + * required here, given that it is implied by mmdrop(). Barrier 13 + * when switching from userspace to kernel is not needed after 14 + * store to rq->curr. 15 + */ 16 + if (IS_ENABLED(CONFIG_SMP) && 17 + likely(!(atomic_read(&next->membarrier_state) & 18 + (MEMBARRIER_STATE_PRIVATE_EXPEDITED | 19 + MEMBARRIER_STATE_GLOBAL_EXPEDITED)) || !prev)) 20 + return; 21 + 22 + /* 23 + * The membarrier system call requires a full memory barrier 24 + * after storing to rq->curr, before going back to user-space. 25 + * 26 + * This barrier is also needed for the SYNC_CORE command when 27 + * switching between processes; in particular, on a transition 28 + * from a thread belonging to another mm to a thread belonging 29 + * to the mm for which a membarrier SYNC_CORE is done on CPU0: 30 + * 31 + * - [CPU0] sets all bits in the mm icache_stale_mask (in 32 + * prepare_sync_core_cmd()); 33 + * 34 + * - [CPU1] stores to rq->curr (by the scheduler); 35 + * 36 + * - [CPU0] loads rq->curr within membarrier and observes 37 + * cpu_rq(1)->curr->mm != mm, so the IPI is skipped on 38 + * CPU1; this means membarrier relies on switch_mm() to 39 + * issue the sync-core; 40 + * 41 + * - [CPU1] switch_mm() loads icache_stale_mask; if the bit 42 + * is zero, switch_mm() may incorrectly skip the sync-core. 43 + * 44 + * Matches a full barrier in the proximity of the membarrier 45 + * system call entry. 46 + */ 47 + smp_mb(); 48 + } 49 + 50 + #endif /* _ASM_RISCV_MEMBARRIER_H */
+3 -2
arch/riscv/include/asm/mmio.h
··· 12 12 #define _ASM_RISCV_MMIO_H 13 13 14 14 #include <linux/types.h> 15 + #include <asm/fence.h> 15 16 #include <asm/mmiowb.h> 16 17 17 18 /* Generic IO read/write. These perform native-endian accesses. */ ··· 132 131 * doesn't define any ordering between the memory space and the I/O space. 133 132 */ 134 133 #define __io_br() do {} while (0) 135 - #define __io_ar(v) ({ __asm__ __volatile__ ("fence i,ir" : : : "memory"); }) 136 - #define __io_bw() ({ __asm__ __volatile__ ("fence w,o" : : : "memory"); }) 134 + #define __io_ar(v) RISCV_FENCE(i, ir) 135 + #define __io_bw() RISCV_FENCE(w, o) 137 136 #define __io_aw() mmiowb_set_pending() 138 137 139 138 #define readb(c) ({ u8 __v; __io_br(); __v = readb_cpu(c); __io_ar(__v); __v; })
+1 -1
arch/riscv/include/asm/mmiowb.h
··· 7 7 * "o,w" is sufficient to ensure that all writes to the device have completed 8 8 * before the write to the spinlock is allowed to commit. 9 9 */ 10 - #define mmiowb() __asm__ __volatile__ ("fence o,w" : : : "memory"); 10 + #define mmiowb() RISCV_FENCE(o, w) 11 11 12 12 #include <linux/smp.h> 13 13 #include <asm-generic/mmiowb.h>
+45 -22
arch/riscv/include/asm/pgalloc.h
··· 95 95 __pud_free(mm, pud); 96 96 } 97 97 98 - #define __pud_free_tlb(tlb, pud, addr) \ 99 - do { \ 100 - if (pgtable_l4_enabled) { \ 101 - pagetable_pud_dtor(virt_to_ptdesc(pud)); \ 102 - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pud)); \ 103 - } \ 104 - } while (0) 98 + static inline void __pud_free_tlb(struct mmu_gather *tlb, pud_t *pud, 99 + unsigned long addr) 100 + { 101 + if (pgtable_l4_enabled) { 102 + struct ptdesc *ptdesc = virt_to_ptdesc(pud); 103 + 104 + pagetable_pud_dtor(ptdesc); 105 + if (riscv_use_ipi_for_rfence()) 106 + tlb_remove_page_ptdesc(tlb, ptdesc); 107 + else 108 + tlb_remove_ptdesc(tlb, ptdesc); 109 + } 110 + } 105 111 106 112 #define p4d_alloc_one p4d_alloc_one 107 113 static inline p4d_t *p4d_alloc_one(struct mm_struct *mm, unsigned long addr) ··· 136 130 __p4d_free(mm, p4d); 137 131 } 138 132 139 - #define __p4d_free_tlb(tlb, p4d, addr) \ 140 - do { \ 141 - if (pgtable_l5_enabled) \ 142 - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(p4d)); \ 143 - } while (0) 133 + static inline void __p4d_free_tlb(struct mmu_gather *tlb, p4d_t *p4d, 134 + unsigned long addr) 135 + { 136 + if (pgtable_l5_enabled) { 137 + if (riscv_use_ipi_for_rfence()) 138 + tlb_remove_page_ptdesc(tlb, virt_to_ptdesc(p4d)); 139 + else 140 + tlb_remove_ptdesc(tlb, virt_to_ptdesc(p4d)); 141 + } 142 + } 144 143 #endif /* __PAGETABLE_PMD_FOLDED */ 145 144 146 145 static inline void sync_kernel_mappings(pgd_t *pgd) ··· 170 159 171 160 #ifndef __PAGETABLE_PMD_FOLDED 172 161 173 - #define __pmd_free_tlb(tlb, pmd, addr) \ 174 - do { \ 175 - pagetable_pmd_dtor(virt_to_ptdesc(pmd)); \ 176 - tlb_remove_page_ptdesc((tlb), virt_to_ptdesc(pmd)); \ 177 - } while (0) 162 + static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmd, 163 + unsigned long addr) 164 + { 165 + struct ptdesc *ptdesc = virt_to_ptdesc(pmd); 166 + 167 + pagetable_pmd_dtor(ptdesc); 168 + if (riscv_use_ipi_for_rfence()) 169 + tlb_remove_page_ptdesc(tlb, ptdesc); 170 + else 171 + 
tlb_remove_ptdesc(tlb, ptdesc); 172 + } 178 173 179 174 #endif /* __PAGETABLE_PMD_FOLDED */ 180 175 181 - #define __pte_free_tlb(tlb, pte, buf) \ 182 - do { \ 183 - pagetable_pte_dtor(page_ptdesc(pte)); \ 184 - tlb_remove_page_ptdesc((tlb), page_ptdesc(pte));\ 185 - } while (0) 176 + static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, 177 + unsigned long addr) 178 + { 179 + struct ptdesc *ptdesc = page_ptdesc(pte); 180 + 181 + pagetable_pte_dtor(ptdesc); 182 + if (riscv_use_ipi_for_rfence()) 183 + tlb_remove_page_ptdesc(tlb, ptdesc); 184 + else 185 + tlb_remove_ptdesc(tlb, ptdesc); 186 + } 186 187 #endif /* CONFIG_MMU */ 187 188 188 189 #endif /* _ASM_RISCV_PGALLOC_H */
+17 -15
arch/riscv/include/asm/pgtable.h
··· 127 127 #define VA_USER_SV48 (UL(1) << (VA_BITS_SV48 - 1)) 128 128 #define VA_USER_SV57 (UL(1) << (VA_BITS_SV57 - 1)) 129 129 130 - #ifdef CONFIG_COMPAT 131 130 #define MMAP_VA_BITS_64 ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS) 132 131 #define MMAP_MIN_VA_BITS_64 (VA_BITS_SV39) 133 132 #define MMAP_VA_BITS (is_compat_task() ? VA_BITS_SV32 : MMAP_VA_BITS_64) 134 133 #define MMAP_MIN_VA_BITS (is_compat_task() ? VA_BITS_SV32 : MMAP_MIN_VA_BITS_64) 135 - #else 136 - #define MMAP_VA_BITS ((VA_BITS >= VA_BITS_SV48) ? VA_BITS_SV48 : VA_BITS) 137 - #define MMAP_MIN_VA_BITS (VA_BITS_SV39) 138 - #endif /* CONFIG_COMPAT */ 139 - 140 134 #else 141 135 #include <asm/pgtable-32.h> 142 136 #endif /* CONFIG_64BIT */ ··· 433 439 return pte; 434 440 } 435 441 442 + #ifdef CONFIG_RISCV_ISA_SVNAPOT 436 443 #define pte_leaf_size(pte) (pte_napot(pte) ? \ 437 444 napot_cont_size(napot_cont_order(pte)) :\ 438 445 PAGE_SIZE) 446 + #endif 439 447 440 448 #ifdef CONFIG_NUMA_BALANCING 441 449 /* ··· 513 517 WRITE_ONCE(*ptep, pteval); 514 518 } 515 519 516 - void flush_icache_pte(pte_t pte); 520 + void flush_icache_pte(struct mm_struct *mm, pte_t pte); 517 521 518 - static inline void __set_pte_at(pte_t *ptep, pte_t pteval) 522 + static inline void __set_pte_at(struct mm_struct *mm, pte_t *ptep, pte_t pteval) 519 523 { 520 524 if (pte_present(pteval) && pte_exec(pteval)) 521 - flush_icache_pte(pteval); 525 + flush_icache_pte(mm, pteval); 522 526 523 527 set_pte(ptep, pteval); 524 528 } ··· 531 535 page_table_check_ptes_set(mm, ptep, pteval, nr); 532 536 533 537 for (;;) { 534 - __set_pte_at(ptep, pteval); 538 + __set_pte_at(mm, ptep, pteval); 535 539 if (--nr == 0) 536 540 break; 537 541 ptep++; ··· 543 547 static inline void pte_clear(struct mm_struct *mm, 544 548 unsigned long addr, pte_t *ptep) 545 549 { 546 - __set_pte_at(ptep, __pte(0)); 550 + __set_pte_at(mm, ptep, __pte(0)); 547 551 } 548 552 549 553 #define __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS /* defined in mm/pgtable.c */ 
··· 658 662 return pte_write(pmd_pte(pmd)); 659 663 } 660 664 665 + #define pud_write pud_write 666 + static inline int pud_write(pud_t pud) 667 + { 668 + return pte_write(pud_pte(pud)); 669 + } 670 + 661 671 #define pmd_dirty pmd_dirty 662 672 static inline int pmd_dirty(pmd_t pmd) 663 673 { ··· 715 713 pmd_t *pmdp, pmd_t pmd) 716 714 { 717 715 page_table_check_pmd_set(mm, pmdp, pmd); 718 - return __set_pte_at((pte_t *)pmdp, pmd_pte(pmd)); 716 + return __set_pte_at(mm, (pte_t *)pmdp, pmd_pte(pmd)); 719 717 } 720 718 721 719 static inline void set_pud_at(struct mm_struct *mm, unsigned long addr, 722 720 pud_t *pudp, pud_t pud) 723 721 { 724 722 page_table_check_pud_set(mm, pudp, pud); 725 - return __set_pte_at((pte_t *)pudp, pud_pte(pud)); 723 + return __set_pte_at(mm, (pte_t *)pudp, pud_pte(pud)); 726 724 } 727 725 728 726 #ifdef CONFIG_PAGE_TABLE_CHECK ··· 873 871 #define TASK_SIZE_MIN (PGDIR_SIZE_L3 * PTRS_PER_PGD / 2) 874 872 875 873 #ifdef CONFIG_COMPAT 876 - #define TASK_SIZE_32 (_AC(0x80000000, UL)) 877 - #define TASK_SIZE (test_thread_flag(TIF_32BIT) ? \ 874 + #define TASK_SIZE_32 (_AC(0x80000000, UL) - PAGE_SIZE) 875 + #define TASK_SIZE (is_compat_task() ? \ 878 876 TASK_SIZE_32 : TASK_SIZE_64) 879 877 #else 880 878 #define TASK_SIZE TASK_SIZE_64
+15 -16
arch/riscv/include/asm/processor.h
··· 14 14 15 15 #include <asm/ptrace.h> 16 16 17 - #ifdef CONFIG_64BIT 18 - #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) 19 - #define STACK_TOP_MAX TASK_SIZE 20 - 17 + /* 18 + * addr is a hint to the maximum userspace address that mmap should provide, so 19 + * this macro needs to return the largest address space available so that 20 + * mmap_end < addr, being mmap_end the top of that address space. 21 + * See Documentation/arch/riscv/vm-layout.rst for more details. 22 + */ 21 23 #define arch_get_mmap_end(addr, len, flags) \ 22 24 ({ \ 23 25 unsigned long mmap_end; \ 24 26 typeof(addr) _addr = (addr); \ 25 - if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && is_compat_task())) \ 27 + if ((_addr) == 0 || is_compat_task() || \ 28 + ((_addr + len) > BIT(VA_BITS - 1))) \ 26 29 mmap_end = STACK_TOP_MAX; \ 27 - else if ((_addr) >= VA_USER_SV57) \ 28 - mmap_end = STACK_TOP_MAX; \ 29 - else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \ 30 - mmap_end = VA_USER_SV48; \ 31 30 else \ 32 - mmap_end = VA_USER_SV39; \ 31 + mmap_end = (_addr + len); \ 33 32 mmap_end; \ 34 33 }) 35 34 ··· 38 39 typeof(addr) _addr = (addr); \ 39 40 typeof(base) _base = (base); \ 40 41 unsigned long rnd_gap = DEFAULT_MAP_WINDOW - (_base); \ 41 - if ((_addr) == 0 || (IS_ENABLED(CONFIG_COMPAT) && is_compat_task())) \ 42 + if ((_addr) == 0 || is_compat_task() || \ 43 + ((_addr + len) > BIT(VA_BITS - 1))) \ 42 44 mmap_base = (_base); \ 43 - else if (((_addr) >= VA_USER_SV57) && (VA_BITS >= VA_BITS_SV57)) \ 44 - mmap_base = VA_USER_SV57 - rnd_gap; \ 45 - else if ((((_addr) >= VA_USER_SV48)) && (VA_BITS >= VA_BITS_SV48)) \ 46 - mmap_base = VA_USER_SV48 - rnd_gap; \ 47 45 else \ 48 - mmap_base = VA_USER_SV39 - rnd_gap; \ 46 + mmap_base = (_addr + len) - rnd_gap; \ 49 47 mmap_base; \ 50 48 }) 51 49 50 + #ifdef CONFIG_64BIT 51 + #define DEFAULT_MAP_WINDOW (UL(1) << (MMAP_VA_BITS - 1)) 52 + #define STACK_TOP_MAX TASK_SIZE_64 52 53 #else 53 54 #define DEFAULT_MAP_WINDOW TASK_SIZE 54 
55 #define STACK_TOP_MAX TASK_SIZE
+2 -2
arch/riscv/include/asm/simd.h
··· 34 34 return false; 35 35 36 36 /* 37 - * Nesting is acheived in preempt_v by spreading the control for 37 + * Nesting is achieved in preempt_v by spreading the control for 38 38 * preemptible and non-preemptible kernel-mode Vector into two fields. 39 - * Always try to match with prempt_v if kernel V-context exists. Then, 39 + * Always try to match with preempt_v if kernel V-context exists. Then, 40 40 * fallback to check non preempt_v if nesting happens, or if the config 41 41 * is not set. 42 42 */
+3
arch/riscv/include/asm/suspend.h
··· 56 56 asmlinkage void hibernate_restore_image(unsigned long resume_satp, unsigned long satp_temp, 57 57 unsigned long cpu_resume); 58 58 asmlinkage int hibernate_core_restore_code(void); 59 + bool riscv_sbi_hsm_is_supported(void); 60 + bool riscv_sbi_suspend_state_is_valid(u32 state); 61 + int riscv_sbi_hart_suspend(u32 state); 59 62 #endif
+29
arch/riscv/include/asm/sync_core.h
··· 1 + /* SPDX-License-Identifier: GPL-2.0 */ 2 + #ifndef _ASM_RISCV_SYNC_CORE_H 3 + #define _ASM_RISCV_SYNC_CORE_H 4 + 5 + /* 6 + * RISC-V implements return to user-space through an xRET instruction, 7 + * which is not core serializing. 8 + */ 9 + static inline void sync_core_before_usermode(void) 10 + { 11 + asm volatile ("fence.i" ::: "memory"); 12 + } 13 + 14 + #ifdef CONFIG_SMP 15 + /* 16 + * Ensure the next switch_mm() on every CPU issues a core serializing 17 + * instruction for the given @mm. 18 + */ 19 + static inline void prepare_sync_core_cmd(struct mm_struct *mm) 20 + { 21 + cpumask_setall(&mm->context.icache_stale_mask); 22 + } 23 + #else 24 + static inline void prepare_sync_core_cmd(struct mm_struct *mm) 25 + { 26 + } 27 + #endif /* CONFIG_SMP */ 28 + 29 + #endif /* _ASM_RISCV_SYNC_CORE_H */
+39 -14
arch/riscv/include/asm/syscall_wrapper.h
··· 12 12 13 13 asmlinkage long __riscv_sys_ni_syscall(const struct pt_regs *); 14 14 15 - #define SC_RISCV_REGS_TO_ARGS(x, ...) \ 16 - __MAP(x,__SC_ARGS \ 17 - ,,regs->orig_a0,,regs->a1,,regs->a2 \ 15 + #ifdef CONFIG_64BIT 16 + 17 + #define __SYSCALL_SE_DEFINEx(x, prefix, name, ...) \ 18 + static long __se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ 19 + static long __se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)) 20 + 21 + #define SC_RISCV_REGS_TO_ARGS(x, ...) \ 22 + __MAP(x,__SC_ARGS \ 23 + ,,regs->orig_a0,,regs->a1,,regs->a2 \ 18 24 ,,regs->a3,,regs->a4,,regs->a5,,regs->a6) 25 + 26 + #else 27 + /* 28 + * Use type aliasing to ensure registers a0-a6 are correctly passed to the syscall 29 + * implementation when >word-size arguments are used. 30 + */ 31 + #define __SYSCALL_SE_DEFINEx(x, prefix, name, ...) \ 32 + __diag_push(); \ 33 + __diag_ignore(GCC, 8, "-Wattribute-alias", \ 34 + "Type aliasing is used to sanitize syscall arguments"); \ 35 + static long __se_##prefix##name(ulong, ulong, ulong, ulong, ulong, ulong, \ 36 + ulong) \ 37 + __attribute__((alias(__stringify(___se_##prefix##name)))); \ 38 + __diag_pop(); \ 39 + static long noinline ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ 40 + static long ___se_##prefix##name(__MAP(x,__SC_LONG,__VA_ARGS__)) 41 + 42 + #define SC_RISCV_REGS_TO_ARGS(x, ...) \ 43 + regs->orig_a0,regs->a1,regs->a2,regs->a3,regs->a4,regs->a5,regs->a6 44 + 45 + #endif /* CONFIG_64BIT */ 19 46 20 47 #ifdef CONFIG_COMPAT 21 48 22 49 #define COMPAT_SYSCALL_DEFINEx(x, name, ...) 
\ 23 50 asmlinkage long __riscv_compat_sys##name(const struct pt_regs *regs); \ 24 51 ALLOW_ERROR_INJECTION(__riscv_compat_sys##name, ERRNO); \ 25 - static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ 26 52 static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ 53 + __SYSCALL_SE_DEFINEx(x, compat_sys, name, __VA_ARGS__) \ 54 + { \ 55 + return __do_compat_sys##name(__MAP(x,__SC_DELOUSE,__VA_ARGS__)); \ 56 + } \ 27 57 asmlinkage long __riscv_compat_sys##name(const struct pt_regs *regs) \ 28 58 { \ 29 59 return __se_compat_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \ 30 - } \ 31 - static long __se_compat_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ 32 - { \ 33 - return __do_compat_sys##name(__MAP(x,__SC_DELOUSE,__VA_ARGS__)); \ 34 60 } \ 35 61 static inline long __do_compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) 36 62 ··· 77 51 #define __SYSCALL_DEFINEx(x, name, ...) \ 78 52 asmlinkage long __riscv_sys##name(const struct pt_regs *regs); \ 79 53 ALLOW_ERROR_INJECTION(__riscv_sys##name, ERRNO); \ 80 - static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ 81 54 static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)); \ 82 - asmlinkage long __riscv_sys##name(const struct pt_regs *regs) \ 83 - { \ 84 - return __se_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \ 85 - } \ 86 - static long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ 55 + __SYSCALL_SE_DEFINEx(x, sys, name, __VA_ARGS__) \ 87 56 { \ 88 57 long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__)); \ 89 58 __MAP(x,__SC_TEST,__VA_ARGS__); \ 90 59 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \ 91 60 return ret; \ 61 + } \ 62 + asmlinkage long __riscv_sys##name(const struct pt_regs *regs) \ 63 + { \ 64 + return __se_sys##name(SC_RISCV_REGS_TO_ARGS(x,__VA_ARGS__)); \ 92 65 } \ 93 66 static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) 94 67
+18
arch/riscv/include/asm/tlb.h
··· 10 10 11 11 static void tlb_flush(struct mmu_gather *tlb); 12 12 13 + #ifdef CONFIG_MMU 14 + #include <linux/swap.h> 15 + 16 + /* 17 + * While riscv platforms with riscv_ipi_for_rfence as true require an IPI to 18 + * perform TLB shootdown, some platforms with riscv_ipi_for_rfence as false use 19 + * SBI to perform TLB shootdown. To keep software pagetable walkers safe in this 20 + * case we switch to RCU based table free (MMU_GATHER_RCU_TABLE_FREE). See the 21 + * comment below 'ifdef CONFIG_MMU_GATHER_RCU_TABLE_FREE' in include/asm-generic/tlb.h 22 + * for more details. 23 + */ 24 + static inline void __tlb_remove_table(void *table) 25 + { 26 + free_page_and_swap_cache(table); 27 + } 28 + 29 + #endif /* CONFIG_MMU */ 30 + 13 31 #define tlb_flush tlb_flush 14 32 #include <asm-generic/tlb.h> 15 33
+11
arch/riscv/include/asm/vector.h
··· 284 284 285 285 #endif /* CONFIG_RISCV_ISA_V */ 286 286 287 + /* 288 + * Return the implementation's vlen value. 289 + * 290 + * riscv_v_vsize contains the value of "32 vector registers with vlenb length" 291 + * so rebuild the vlen value in bits from it. 292 + */ 293 + static inline int riscv_vector_vlen(void) 294 + { 295 + return riscv_v_vsize / 32 * 8; 296 + } 297 + 287 298 #endif /* ! __ASM_RISCV_VECTOR_H */
+1 -1
arch/riscv/include/asm/vendorid_list.h
··· 5 5 #ifndef ASM_VENDOR_LIST_H 6 6 #define ASM_VENDOR_LIST_H 7 7 8 - #define ANDESTECH_VENDOR_ID 0x31e 8 + #define ANDES_VENDOR_ID 0x31e 9 9 #define SIFIVE_VENDOR_ID 0x489 10 10 #define THEAD_VENDOR_ID 0x5b7 11 11
+3 -1
arch/riscv/kernel/Makefile
··· 39 39 obj-y += head.o 40 40 obj-y += soc.o 41 41 obj-$(CONFIG_RISCV_ALTERNATIVE) += alternative.o 42 - obj-y += copy-unaligned.o 43 42 obj-y += cpu.o 44 43 obj-y += cpufeature.o 45 44 obj-y += entry.o ··· 63 64 obj-$(CONFIG_MMU) += vdso.o vdso/ 64 65 65 66 obj-$(CONFIG_RISCV_MISALIGNED) += traps_misaligned.o 67 + obj-$(CONFIG_RISCV_MISALIGNED) += unaligned_access_speed.o 68 + obj-$(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) += copy-unaligned.o 69 + 66 70 obj-$(CONFIG_FPU) += fpu.o 67 71 obj-$(CONFIG_RISCV_ISA_V) += vector.o 68 72 obj-$(CONFIG_RISCV_ISA_V) += kernel_mode_vector.o
+1 -1
arch/riscv/kernel/alternative.c
··· 43 43 44 44 switch (cpu_mfr_info->vendor_id) { 45 45 #ifdef CONFIG_ERRATA_ANDES 46 - case ANDESTECH_VENDOR_ID: 46 + case ANDES_VENDOR_ID: 47 47 cpu_mfr_info->patch_func = andes_errata_patch_func; 48 48 break; 49 49 #endif
+1 -255
arch/riscv/kernel/cpufeature.c
··· 11 11 #include <linux/cpu.h> 12 12 #include <linux/cpuhotplug.h> 13 13 #include <linux/ctype.h> 14 - #include <linux/jump_label.h> 15 14 #include <linux/log2.h> 16 15 #include <linux/memory.h> 17 16 #include <linux/module.h> ··· 20 21 #include <asm/cacheflush.h> 21 22 #include <asm/cpufeature.h> 22 23 #include <asm/hwcap.h> 23 - #include <asm/hwprobe.h> 24 24 #include <asm/patch.h> 25 25 #include <asm/processor.h> 26 26 #include <asm/sbi.h> 27 27 #include <asm/vector.h> 28 28 29 - #include "copy-unaligned.h" 30 - 31 29 #define NUM_ALPHA_EXTS ('z' - 'a' + 1) 32 - 33 - #define MISALIGNED_ACCESS_JIFFIES_LG2 1 34 - #define MISALIGNED_BUFFER_SIZE 0x4000 35 - #define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE) 36 - #define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80) 37 30 38 31 unsigned long elf_hwcap __read_mostly; 39 32 ··· 34 43 35 44 /* Per-cpu ISA extensions. */ 36 45 struct riscv_isainfo hart_isa[NR_CPUS]; 37 - 38 - /* Performance information */ 39 - DEFINE_PER_CPU(long, misaligned_access_speed); 40 - 41 - static cpumask_t fast_misaligned_access; 42 46 43 47 /** 44 48 * riscv_isa_extension_base() - Get base extension word ··· 304 318 __RISCV_ISA_EXT_DATA(svinval, RISCV_ISA_EXT_SVINVAL), 305 319 __RISCV_ISA_EXT_DATA(svnapot, RISCV_ISA_EXT_SVNAPOT), 306 320 __RISCV_ISA_EXT_DATA(svpbmt, RISCV_ISA_EXT_SVPBMT), 321 + __RISCV_ISA_EXT_DATA(xandespmu, RISCV_ISA_EXT_XANDESPMU), 307 322 }; 308 323 309 324 const size_t riscv_isa_ext_count = ARRAY_SIZE(riscv_isa_ext); ··· 717 730 718 731 return hwcap; 719 732 } 720 - 721 - static int check_unaligned_access(void *param) 722 - { 723 - int cpu = smp_processor_id(); 724 - u64 start_cycles, end_cycles; 725 - u64 word_cycles; 726 - u64 byte_cycles; 727 - int ratio; 728 - unsigned long start_jiffies, now; 729 - struct page *page = param; 730 - void *dst; 731 - void *src; 732 - long speed = RISCV_HWPROBE_MISALIGNED_SLOW; 733 - 734 - if (check_unaligned_access_emulated(cpu)) 735 - return 0; 736 - 737 - 
/* Make an unaligned destination buffer. */ 738 - dst = (void *)((unsigned long)page_address(page) | 0x1); 739 - /* Unalign src as well, but differently (off by 1 + 2 = 3). */ 740 - src = dst + (MISALIGNED_BUFFER_SIZE / 2); 741 - src += 2; 742 - word_cycles = -1ULL; 743 - /* Do a warmup. */ 744 - __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE); 745 - preempt_disable(); 746 - start_jiffies = jiffies; 747 - while ((now = jiffies) == start_jiffies) 748 - cpu_relax(); 749 - 750 - /* 751 - * For a fixed amount of time, repeatedly try the function, and take 752 - * the best time in cycles as the measurement. 753 - */ 754 - while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) { 755 - start_cycles = get_cycles64(); 756 - /* Ensure the CSR read can't reorder WRT to the copy. */ 757 - mb(); 758 - __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE); 759 - /* Ensure the copy ends before the end time is snapped. */ 760 - mb(); 761 - end_cycles = get_cycles64(); 762 - if ((end_cycles - start_cycles) < word_cycles) 763 - word_cycles = end_cycles - start_cycles; 764 - } 765 - 766 - byte_cycles = -1ULL; 767 - __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE); 768 - start_jiffies = jiffies; 769 - while ((now = jiffies) == start_jiffies) 770 - cpu_relax(); 771 - 772 - while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) { 773 - start_cycles = get_cycles64(); 774 - mb(); 775 - __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE); 776 - mb(); 777 - end_cycles = get_cycles64(); 778 - if ((end_cycles - start_cycles) < byte_cycles) 779 - byte_cycles = end_cycles - start_cycles; 780 - } 781 - 782 - preempt_enable(); 783 - 784 - /* Don't divide by zero. 
*/ 785 - if (!word_cycles || !byte_cycles) { 786 - pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n", 787 - cpu); 788 - 789 - return 0; 790 - } 791 - 792 - if (word_cycles < byte_cycles) 793 - speed = RISCV_HWPROBE_MISALIGNED_FAST; 794 - 795 - ratio = div_u64((byte_cycles * 100), word_cycles); 796 - pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n", 797 - cpu, 798 - ratio / 100, 799 - ratio % 100, 800 - (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow"); 801 - 802 - per_cpu(misaligned_access_speed, cpu) = speed; 803 - 804 - /* 805 - * Set the value of fast_misaligned_access of a CPU. These operations 806 - * are atomic to avoid race conditions. 807 - */ 808 - if (speed == RISCV_HWPROBE_MISALIGNED_FAST) 809 - cpumask_set_cpu(cpu, &fast_misaligned_access); 810 - else 811 - cpumask_clear_cpu(cpu, &fast_misaligned_access); 812 - 813 - return 0; 814 - } 815 - 816 - static void check_unaligned_access_nonboot_cpu(void *param) 817 - { 818 - unsigned int cpu = smp_processor_id(); 819 - struct page **pages = param; 820 - 821 - if (smp_processor_id() != 0) 822 - check_unaligned_access(pages[cpu]); 823 - } 824 - 825 - DEFINE_STATIC_KEY_FALSE(fast_misaligned_access_speed_key); 826 - 827 - static void modify_unaligned_access_branches(cpumask_t *mask, int weight) 828 - { 829 - if (cpumask_weight(mask) == weight) 830 - static_branch_enable_cpuslocked(&fast_misaligned_access_speed_key); 831 - else 832 - static_branch_disable_cpuslocked(&fast_misaligned_access_speed_key); 833 - } 834 - 835 - static void set_unaligned_access_static_branches_except_cpu(int cpu) 836 - { 837 - /* 838 - * Same as set_unaligned_access_static_branches, except excludes the 839 - * given CPU from the result. When a CPU is hotplugged into an offline 840 - * state, this function is called before the CPU is set to offline in 841 - * the cpumask, and thus the CPU needs to be explicitly excluded. 
842 - */ 843 - 844 - cpumask_t fast_except_me; 845 - 846 - cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask); 847 - cpumask_clear_cpu(cpu, &fast_except_me); 848 - 849 - modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1); 850 - } 851 - 852 - static void set_unaligned_access_static_branches(void) 853 - { 854 - /* 855 - * This will be called after check_unaligned_access_all_cpus so the 856 - * result of unaligned access speed for all CPUs will be available. 857 - * 858 - * To avoid the number of online cpus changing between reading 859 - * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be 860 - * held before calling this function. 861 - */ 862 - 863 - cpumask_t fast_and_online; 864 - 865 - cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask); 866 - 867 - modify_unaligned_access_branches(&fast_and_online, num_online_cpus()); 868 - } 869 - 870 - static int lock_and_set_unaligned_access_static_branch(void) 871 - { 872 - cpus_read_lock(); 873 - set_unaligned_access_static_branches(); 874 - cpus_read_unlock(); 875 - 876 - return 0; 877 - } 878 - 879 - arch_initcall_sync(lock_and_set_unaligned_access_static_branch); 880 - 881 - static int riscv_online_cpu(unsigned int cpu) 882 - { 883 - static struct page *buf; 884 - 885 - /* We are already set since the last check */ 886 - if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN) 887 - goto exit; 888 - 889 - buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); 890 - if (!buf) { 891 - pr_warn("Allocation failure, not measuring misaligned performance\n"); 892 - return -ENOMEM; 893 - } 894 - 895 - check_unaligned_access(buf); 896 - __free_pages(buf, MISALIGNED_BUFFER_ORDER); 897 - 898 - exit: 899 - set_unaligned_access_static_branches(); 900 - 901 - return 0; 902 - } 903 - 904 - static int riscv_offline_cpu(unsigned int cpu) 905 - { 906 - set_unaligned_access_static_branches_except_cpu(cpu); 907 - 908 - return 0; 909 - } 
910 - 911 - /* Measure unaligned access on all CPUs present at boot in parallel. */ 912 - static int check_unaligned_access_all_cpus(void) 913 - { 914 - unsigned int cpu; 915 - unsigned int cpu_count = num_possible_cpus(); 916 - struct page **bufs = kzalloc(cpu_count * sizeof(struct page *), 917 - GFP_KERNEL); 918 - 919 - if (!bufs) { 920 - pr_warn("Allocation failure, not measuring misaligned performance\n"); 921 - return 0; 922 - } 923 - 924 - /* 925 - * Allocate separate buffers for each CPU so there's no fighting over 926 - * cache lines. 927 - */ 928 - for_each_cpu(cpu, cpu_online_mask) { 929 - bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); 930 - if (!bufs[cpu]) { 931 - pr_warn("Allocation failure, not measuring misaligned performance\n"); 932 - goto out; 933 - } 934 - } 935 - 936 - /* Check everybody except 0, who stays behind to tend jiffies. */ 937 - on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1); 938 - 939 - /* Check core 0. */ 940 - smp_call_on_cpu(0, check_unaligned_access, bufs[0], true); 941 - 942 - /* 943 - * Setup hotplug callbacks for any new CPUs that come online or go 944 - * offline. 945 - */ 946 - cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online", 947 - riscv_online_cpu, riscv_offline_cpu); 948 - 949 - out: 950 - unaligned_emulation_finish(); 951 - for_each_cpu(cpu, cpu_online_mask) { 952 - if (bufs[cpu]) 953 - __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER); 954 - } 955 - 956 - kfree(bufs); 957 - return 0; 958 - } 959 - 960 - arch_initcall(check_unaligned_access_all_cpus); 961 733 962 734 void riscv_user_isa_enable(void) 963 735 {
+3
arch/riscv/kernel/entry.S
··· 111 111 1: 112 112 tail do_trap_unknown 113 113 SYM_CODE_END(handle_exception) 114 + ASM_NOKPROBE(handle_exception) 114 115 115 116 /* 116 117 * The ret_from_exception must be called with interrupt disabled. Here is the ··· 185 184 sret 186 185 #endif 187 186 SYM_CODE_END(ret_from_exception) 187 + ASM_NOKPROBE(ret_from_exception) 188 188 189 189 #ifdef CONFIG_VMAP_STACK 190 190 SYM_CODE_START_LOCAL(handle_kernel_stack_overflow) ··· 221 219 move a0, sp 222 220 tail handle_bad_stack 223 221 SYM_CODE_END(handle_kernel_stack_overflow) 222 + ASM_NOKPROBE(handle_kernel_stack_overflow) 224 223 #endif 225 224 226 225 SYM_CODE_START(ret_from_fork)
+3
arch/riscv/kernel/pi/Makefile
··· 9 9 -fno-asynchronous-unwind-tables -fno-unwind-tables \ 10 10 $(call cc-option,-fno-addrsig) 11 11 12 + # Disable LTO 13 + KBUILD_CFLAGS := $(filter-out $(CC_FLAGS_LTO), $(KBUILD_CFLAGS)) 14 + 12 15 KBUILD_CFLAGS += -mcmodel=medany 13 16 14 17 CFLAGS_cmdline_early.o += -D__NO_FORTIFY
+3 -3
arch/riscv/kernel/ptrace.c
··· 377 377 378 378 return ret; 379 379 } 380 + #else 381 + static const struct user_regset_view compat_riscv_user_native_view = {}; 380 382 #endif /* CONFIG_COMPAT */ 381 383 382 384 const struct user_regset_view *task_user_regset_view(struct task_struct *task) 383 385 { 384 - #ifdef CONFIG_COMPAT 385 - if (test_tsk_thread_flag(task, TIF_32BIT)) 386 + if (is_compat_thread(&task->thread_info)) 386 387 return &compat_riscv_user_native_view; 387 388 else 388 - #endif 389 389 return &riscv_user_native_view; 390 390 }
-1
arch/riscv/kernel/smpboot.c
··· 28 28 29 29 #include <asm/cpufeature.h> 30 30 #include <asm/cpu_ops.h> 31 - #include <asm/cpufeature.h> 32 31 #include <asm/irq.h> 33 32 #include <asm/mmu_context.h> 34 33 #include <asm/numa.h>
+49
arch/riscv/kernel/suspend.c
··· 132 132 } 133 133 134 134 arch_initcall(sbi_system_suspend_init); 135 + 136 + static int sbi_suspend_finisher(unsigned long suspend_type, 137 + unsigned long resume_addr, 138 + unsigned long opaque) 139 + { 140 + struct sbiret ret; 141 + 142 + ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND, 143 + suspend_type, resume_addr, opaque, 0, 0, 0); 144 + 145 + return (ret.error) ? sbi_err_map_linux_errno(ret.error) : 0; 146 + } 147 + 148 + int riscv_sbi_hart_suspend(u32 state) 149 + { 150 + if (state & SBI_HSM_SUSP_NON_RET_BIT) 151 + return cpu_suspend(state, sbi_suspend_finisher); 152 + else 153 + return sbi_suspend_finisher(state, 0, 0); 154 + } 155 + 156 + bool riscv_sbi_suspend_state_is_valid(u32 state) 157 + { 158 + if (state > SBI_HSM_SUSPEND_RET_DEFAULT && 159 + state < SBI_HSM_SUSPEND_RET_PLATFORM) 160 + return false; 161 + 162 + if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && 163 + state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) 164 + return false; 165 + 166 + return true; 167 + } 168 + 169 + bool riscv_sbi_hsm_is_supported(void) 170 + { 171 + /* 172 + * The SBI HSM suspend function is only available when: 173 + * 1) SBI version is 0.3 or higher 174 + * 2) SBI HSM extension is available 175 + */ 176 + if (sbi_spec_version < sbi_mk_version(0, 3) || 177 + !sbi_probe_extension(SBI_EXT_HSM)) { 178 + pr_info("HSM suspend not available\n"); 179 + return false; 180 + } 181 + 182 + return true; 183 + } 135 184 #endif /* CONFIG_RISCV_SBI */
+13
arch/riscv/kernel/sys_hwprobe.c
··· 147 147 return (pair.value & ext); 148 148 } 149 149 150 + #if defined(CONFIG_RISCV_PROBE_UNALIGNED_ACCESS) 150 151 static u64 hwprobe_misaligned(const struct cpumask *cpus) 151 152 { 152 153 int cpu; ··· 170 169 171 170 return perf; 172 171 } 172 + #else 173 + static u64 hwprobe_misaligned(const struct cpumask *cpus) 174 + { 175 + if (IS_ENABLED(CONFIG_RISCV_EFFICIENT_UNALIGNED_ACCESS)) 176 + return RISCV_HWPROBE_MISALIGNED_FAST; 177 + 178 + if (IS_ENABLED(CONFIG_RISCV_EMULATED_UNALIGNED_ACCESS) && unaligned_ctl_available()) 179 + return RISCV_HWPROBE_MISALIGNED_EMULATED; 180 + 181 + return RISCV_HWPROBE_MISALIGNED_SLOW; 182 + } 183 + #endif 173 184 174 185 static void hwprobe_one_pair(struct riscv_hwprobe *pair, 175 186 const struct cpumask *cpus)
+16 -1
arch/riscv/kernel/traps.c
··· 6 6 #include <linux/cpu.h> 7 7 #include <linux/kernel.h> 8 8 #include <linux/init.h> 9 + #include <linux/randomize_kstack.h> 9 10 #include <linux/sched.h> 10 11 #include <linux/sched/debug.h> 11 12 #include <linux/sched/signal.h> ··· 311 310 } 312 311 } 313 312 314 - asmlinkage __visible __trap_section void do_trap_ecall_u(struct pt_regs *regs) 313 + asmlinkage __visible __trap_section __no_stack_protector 314 + void do_trap_ecall_u(struct pt_regs *regs) 315 315 { 316 316 if (user_mode(regs)) { 317 317 long syscall = regs->a7; ··· 324 322 325 323 syscall = syscall_enter_from_user_mode(regs, syscall); 326 324 325 + add_random_kstack_offset(); 326 + 327 327 if (syscall >= 0 && syscall < NR_syscalls) 328 328 syscall_handler(regs, syscall); 329 329 else if (syscall != -1) 330 330 regs->a0 = -ENOSYS; 331 + /* 332 + * Ultimately, this value will get limited by KSTACK_OFFSET_MAX(), 333 + * so the maximum stack offset is 1k bytes (10 bits). 334 + * 335 + * The actual entropy will be further reduced by the compiler when 336 + * applying stack alignment constraints: 16-byte (i.e. 4-bit) aligned 337 + * for RV32I or RV64I. 338 + * 339 + * The resulting 6 bits of entropy is seen in SP[9:4]. 340 + */ 341 + choose_random_kstack_offset(get_random_u16()); 331 342 332 343 syscall_exit_to_user_mode(regs); 333 344 } else {
+9 -8
arch/riscv/kernel/traps_misaligned.c
··· 413 413 414 414 perf_sw_event(PERF_COUNT_SW_ALIGNMENT_FAULTS, 1, regs, addr); 415 415 416 + #ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS 416 417 *this_cpu_ptr(&misaligned_access_speed) = RISCV_HWPROBE_MISALIGNED_EMULATED; 418 + #endif 417 419 418 420 if (!unaligned_enabled) 419 421 return -1; ··· 598 596 return 0; 599 597 } 600 598 601 - bool check_unaligned_access_emulated(int cpu) 599 + static bool check_unaligned_access_emulated(int cpu) 602 600 { 603 601 long *mas_ptr = per_cpu_ptr(&misaligned_access_speed, cpu); 604 602 unsigned long tmp_var, tmp_val; ··· 625 623 return misaligned_emu_detected; 626 624 } 627 625 628 - void unaligned_emulation_finish(void) 626 + bool check_unaligned_access_emulated_all_cpus(void) 629 627 { 630 628 int cpu; 631 629 ··· 634 632 * accesses emulated since tasks requesting such control can run on any 635 633 * CPU. 636 634 */ 637 - for_each_present_cpu(cpu) { 638 - if (per_cpu(misaligned_access_speed, cpu) != 639 - RISCV_HWPROBE_MISALIGNED_EMULATED) { 640 - return; 641 - } 642 - } 635 + for_each_online_cpu(cpu) 636 + if (!check_unaligned_access_emulated(cpu)) 637 + return false; 638 + 643 639 unaligned_ctl = true; 640 + return true; 644 641 } 645 642 646 643 bool unaligned_ctl_available(void)
+281
arch/riscv/kernel/unaligned_access_speed.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright 2024 Rivos Inc. 4 + */ 5 + 6 + #include <linux/cpu.h> 7 + #include <linux/cpumask.h> 8 + #include <linux/jump_label.h> 9 + #include <linux/mm.h> 10 + #include <linux/smp.h> 11 + #include <linux/types.h> 12 + #include <asm/cpufeature.h> 13 + #include <asm/hwprobe.h> 14 + 15 + #include "copy-unaligned.h" 16 + 17 + #define MISALIGNED_ACCESS_JIFFIES_LG2 1 18 + #define MISALIGNED_BUFFER_SIZE 0x4000 19 + #define MISALIGNED_BUFFER_ORDER get_order(MISALIGNED_BUFFER_SIZE) 20 + #define MISALIGNED_COPY_SIZE ((MISALIGNED_BUFFER_SIZE / 2) - 0x80) 21 + 22 + DEFINE_PER_CPU(long, misaligned_access_speed); 23 + 24 + #ifdef CONFIG_RISCV_PROBE_UNALIGNED_ACCESS 25 + static cpumask_t fast_misaligned_access; 26 + static int check_unaligned_access(void *param) 27 + { 28 + int cpu = smp_processor_id(); 29 + u64 start_cycles, end_cycles; 30 + u64 word_cycles; 31 + u64 byte_cycles; 32 + int ratio; 33 + unsigned long start_jiffies, now; 34 + struct page *page = param; 35 + void *dst; 36 + void *src; 37 + long speed = RISCV_HWPROBE_MISALIGNED_SLOW; 38 + 39 + if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN) 40 + return 0; 41 + 42 + /* Make an unaligned destination buffer. */ 43 + dst = (void *)((unsigned long)page_address(page) | 0x1); 44 + /* Unalign src as well, but differently (off by 1 + 2 = 3). */ 45 + src = dst + (MISALIGNED_BUFFER_SIZE / 2); 46 + src += 2; 47 + word_cycles = -1ULL; 48 + /* Do a warmup. */ 49 + __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE); 50 + preempt_disable(); 51 + start_jiffies = jiffies; 52 + while ((now = jiffies) == start_jiffies) 53 + cpu_relax(); 54 + 55 + /* 56 + * For a fixed amount of time, repeatedly try the function, and take 57 + * the best time in cycles as the measurement. 
58 + */ 59 + while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) { 60 + start_cycles = get_cycles64(); 61 + /* Ensure the CSR read can't reorder WRT to the copy. */ 62 + mb(); 63 + __riscv_copy_words_unaligned(dst, src, MISALIGNED_COPY_SIZE); 64 + /* Ensure the copy ends before the end time is snapped. */ 65 + mb(); 66 + end_cycles = get_cycles64(); 67 + if ((end_cycles - start_cycles) < word_cycles) 68 + word_cycles = end_cycles - start_cycles; 69 + } 70 + 71 + byte_cycles = -1ULL; 72 + __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE); 73 + start_jiffies = jiffies; 74 + while ((now = jiffies) == start_jiffies) 75 + cpu_relax(); 76 + 77 + while (time_before(jiffies, now + (1 << MISALIGNED_ACCESS_JIFFIES_LG2))) { 78 + start_cycles = get_cycles64(); 79 + mb(); 80 + __riscv_copy_bytes_unaligned(dst, src, MISALIGNED_COPY_SIZE); 81 + mb(); 82 + end_cycles = get_cycles64(); 83 + if ((end_cycles - start_cycles) < byte_cycles) 84 + byte_cycles = end_cycles - start_cycles; 85 + } 86 + 87 + preempt_enable(); 88 + 89 + /* Don't divide by zero. */ 90 + if (!word_cycles || !byte_cycles) { 91 + pr_warn("cpu%d: rdtime lacks granularity needed to measure unaligned access speed\n", 92 + cpu); 93 + 94 + return 0; 95 + } 96 + 97 + if (word_cycles < byte_cycles) 98 + speed = RISCV_HWPROBE_MISALIGNED_FAST; 99 + 100 + ratio = div_u64((byte_cycles * 100), word_cycles); 101 + pr_info("cpu%d: Ratio of byte access time to unaligned word access is %d.%02d, unaligned accesses are %s\n", 102 + cpu, 103 + ratio / 100, 104 + ratio % 100, 105 + (speed == RISCV_HWPROBE_MISALIGNED_FAST) ? "fast" : "slow"); 106 + 107 + per_cpu(misaligned_access_speed, cpu) = speed; 108 + 109 + /* 110 + * Set the value of fast_misaligned_access of a CPU. These operations 111 + * are atomic to avoid race conditions. 
112 + */ 113 + if (speed == RISCV_HWPROBE_MISALIGNED_FAST) 114 + cpumask_set_cpu(cpu, &fast_misaligned_access); 115 + else 116 + cpumask_clear_cpu(cpu, &fast_misaligned_access); 117 + 118 + return 0; 119 + } 120 + 121 + static void check_unaligned_access_nonboot_cpu(void *param) 122 + { 123 + unsigned int cpu = smp_processor_id(); 124 + struct page **pages = param; 125 + 126 + if (smp_processor_id() != 0) 127 + check_unaligned_access(pages[cpu]); 128 + } 129 + 130 + DEFINE_STATIC_KEY_FALSE(fast_unaligned_access_speed_key); 131 + 132 + static void modify_unaligned_access_branches(cpumask_t *mask, int weight) 133 + { 134 + if (cpumask_weight(mask) == weight) 135 + static_branch_enable_cpuslocked(&fast_unaligned_access_speed_key); 136 + else 137 + static_branch_disable_cpuslocked(&fast_unaligned_access_speed_key); 138 + } 139 + 140 + static void set_unaligned_access_static_branches_except_cpu(int cpu) 141 + { 142 + /* 143 + * Same as set_unaligned_access_static_branches, except excludes the 144 + * given CPU from the result. When a CPU is hotplugged into an offline 145 + * state, this function is called before the CPU is set to offline in 146 + * the cpumask, and thus the CPU needs to be explicitly excluded. 147 + */ 148 + 149 + cpumask_t fast_except_me; 150 + 151 + cpumask_and(&fast_except_me, &fast_misaligned_access, cpu_online_mask); 152 + cpumask_clear_cpu(cpu, &fast_except_me); 153 + 154 + modify_unaligned_access_branches(&fast_except_me, num_online_cpus() - 1); 155 + } 156 + 157 + static void set_unaligned_access_static_branches(void) 158 + { 159 + /* 160 + * This will be called after check_unaligned_access_all_cpus so the 161 + * result of unaligned access speed for all CPUs will be available. 162 + * 163 + * To avoid the number of online cpus changing between reading 164 + * cpu_online_mask and calling num_online_cpus, cpus_read_lock must be 165 + * held before calling this function. 
166 + */ 167 + 168 + cpumask_t fast_and_online; 169 + 170 + cpumask_and(&fast_and_online, &fast_misaligned_access, cpu_online_mask); 171 + 172 + modify_unaligned_access_branches(&fast_and_online, num_online_cpus()); 173 + } 174 + 175 + static int lock_and_set_unaligned_access_static_branch(void) 176 + { 177 + cpus_read_lock(); 178 + set_unaligned_access_static_branches(); 179 + cpus_read_unlock(); 180 + 181 + return 0; 182 + } 183 + 184 + arch_initcall_sync(lock_and_set_unaligned_access_static_branch); 185 + 186 + static int riscv_online_cpu(unsigned int cpu) 187 + { 188 + static struct page *buf; 189 + 190 + /* We are already set since the last check */ 191 + if (per_cpu(misaligned_access_speed, cpu) != RISCV_HWPROBE_MISALIGNED_UNKNOWN) 192 + goto exit; 193 + 194 + buf = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); 195 + if (!buf) { 196 + pr_warn("Allocation failure, not measuring misaligned performance\n"); 197 + return -ENOMEM; 198 + } 199 + 200 + check_unaligned_access(buf); 201 + __free_pages(buf, MISALIGNED_BUFFER_ORDER); 202 + 203 + exit: 204 + set_unaligned_access_static_branches(); 205 + 206 + return 0; 207 + } 208 + 209 + static int riscv_offline_cpu(unsigned int cpu) 210 + { 211 + set_unaligned_access_static_branches_except_cpu(cpu); 212 + 213 + return 0; 214 + } 215 + 216 + /* Measure unaligned access speed on all CPUs present at boot in parallel. */ 217 + static int check_unaligned_access_speed_all_cpus(void) 218 + { 219 + unsigned int cpu; 220 + unsigned int cpu_count = num_possible_cpus(); 221 + struct page **bufs = kcalloc(cpu_count, sizeof(*bufs), GFP_KERNEL); 222 + 223 + if (!bufs) { 224 + pr_warn("Allocation failure, not measuring misaligned performance\n"); 225 + return 0; 226 + } 227 + 228 + /* 229 + * Allocate separate buffers for each CPU so there's no fighting over 230 + * cache lines. 
231 + */ 232 + for_each_cpu(cpu, cpu_online_mask) { 233 + bufs[cpu] = alloc_pages(GFP_KERNEL, MISALIGNED_BUFFER_ORDER); 234 + if (!bufs[cpu]) { 235 + pr_warn("Allocation failure, not measuring misaligned performance\n"); 236 + goto out; 237 + } 238 + } 239 + 240 + /* Check everybody except 0, who stays behind to tend jiffies. */ 241 + on_each_cpu(check_unaligned_access_nonboot_cpu, bufs, 1); 242 + 243 + /* Check core 0. */ 244 + smp_call_on_cpu(0, check_unaligned_access, bufs[0], true); 245 + 246 + /* 247 + * Setup hotplug callbacks for any new CPUs that come online or go 248 + * offline. 249 + */ 250 + cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "riscv:online", 251 + riscv_online_cpu, riscv_offline_cpu); 252 + 253 + out: 254 + for_each_cpu(cpu, cpu_online_mask) { 255 + if (bufs[cpu]) 256 + __free_pages(bufs[cpu], MISALIGNED_BUFFER_ORDER); 257 + } 258 + 259 + kfree(bufs); 260 + return 0; 261 + } 262 + 263 + static int check_unaligned_access_all_cpus(void) 264 + { 265 + bool all_cpus_emulated = check_unaligned_access_emulated_all_cpus(); 266 + 267 + if (!all_cpus_emulated) 268 + return check_unaligned_access_speed_all_cpus(); 269 + 270 + return 0; 271 + } 272 + #else /* CONFIG_RISCV_PROBE_UNALIGNED_ACCESS */ 273 + static int check_unaligned_access_all_cpus(void) 274 + { 275 + check_unaligned_access_emulated_all_cpus(); 276 + 277 + return 0; 278 + } 279 + #endif 280 + 281 + arch_initcall(check_unaligned_access_all_cpus);
+2 -5
arch/riscv/lib/csum.c
··· 3 3 * Checksum library 4 4 * 5 5 * Influenced by arch/arm64/lib/csum.c 6 - * Copyright (C) 2023 Rivos Inc. 6 + * Copyright (C) 2023-2024 Rivos Inc. 7 7 */ 8 8 #include <linux/bitops.h> 9 9 #include <linux/compiler.h> ··· 318 318 * branches. The largest chunk of overlap was delegated into the 319 319 * do_csum_common function. 320 320 */ 321 - if (static_branch_likely(&fast_misaligned_access_speed_key)) 322 - return do_csum_no_alignment(buff, len); 323 - 324 - if (((unsigned long)buff & OFFSET_MASK) == 0) 321 + if (has_fast_unaligned_accesses() || (((unsigned long)buff & OFFSET_MASK) == 0)) 325 322 return do_csum_no_alignment(buff, len); 326 323 327 324 return do_csum_with_alignment(buff, len);
-1
arch/riscv/lib/uaccess_vector.S
··· 1 1 /* SPDX-License-Identifier: GPL-2.0-only */ 2 2 3 3 #include <linux/linkage.h> 4 - #include <asm-generic/export.h> 5 4 #include <asm/asm.h> 6 5 #include <asm/asm-extable.h> 7 6 #include <asm/csr.h>
+2 -2
arch/riscv/mm/cacheflush.c
··· 82 82 #endif /* CONFIG_SMP */ 83 83 84 84 #ifdef CONFIG_MMU 85 - void flush_icache_pte(pte_t pte) 85 + void flush_icache_pte(struct mm_struct *mm, pte_t pte) 86 86 { 87 87 struct folio *folio = page_folio(pte_page(pte)); 88 88 89 89 if (!test_bit(PG_dcache_clean, &folio->flags)) { 90 - flush_icache_all(); 90 + flush_icache_mm(mm, false); 91 91 set_bit(PG_dcache_clean, &folio->flags); 92 92 } 93 93 }
+2
arch/riscv/mm/context.c
··· 323 323 if (unlikely(prev == next)) 324 324 return; 325 325 326 + membarrier_arch_switch_mm(prev, next, task); 327 + 326 328 /* 327 329 * Mark the current MM context as inactive, and the next as 328 330 * active. This is at least used by the icache flushing
+6
arch/riscv/mm/init.c
··· 764 764 } 765 765 early_param("no5lvl", print_no5lvl); 766 766 767 + static void __init set_mmap_rnd_bits_max(void) 768 + { 769 + mmap_rnd_bits_max = MMAP_VA_BITS - PAGE_SHIFT - 3; 770 + } 771 + 767 772 /* 768 773 * There is a simple way to determine if 4-level is supported by the 769 774 * underlying hardware: establish 1:1 mapping in 4-level page table mode ··· 1083 1078 1084 1079 #if defined(CONFIG_64BIT) && !defined(CONFIG_XIP_KERNEL) 1085 1080 set_satp_mode(dtb_pa); 1081 + set_mmap_rnd_bits_max(); 1086 1082 #endif 1087 1083 1088 1084 /*
+1 -1
arch/riscv/mm/pgtable.c
··· 10 10 pte_t entry, int dirty) 11 11 { 12 12 if (!pte_same(ptep_get(ptep), entry)) 13 - __set_pte_at(ptep, entry); 13 + __set_pte_at(vma->vm_mm, ptep, entry); 14 14 /* 15 15 * update_mmu_cache will unconditionally execute, handling both 16 16 * the case that the PTE changed and the spurious fault case.
+3
crypto/Kconfig
··· 1497 1497 if PPC 1498 1498 source "arch/powerpc/crypto/Kconfig" 1499 1499 endif 1500 + if RISCV 1501 + source "arch/riscv/crypto/Kconfig" 1502 + endif 1500 1503 if S390 1501 1504 source "arch/s390/crypto/Kconfig" 1502 1505 endif
+1 -1
drivers/acpi/Kconfig
··· 286 286 287 287 config ACPI_PROCESSOR 288 288 tristate "Processor" 289 - depends on X86 || ARM64 || LOONGARCH 289 + depends on X86 || ARM64 || LOONGARCH || RISCV 290 290 select ACPI_PROCESSOR_IDLE 291 291 select ACPI_CPU_FREQ_PSS if X86 || LOONGARCH 292 292 select THERMAL
+3 -1
drivers/acpi/riscv/Makefile
··· 1 1 # SPDX-License-Identifier: GPL-2.0-only 2 - obj-y += rhct.o 2 + obj-y += rhct.o 3 + obj-$(CONFIG_ACPI_PROCESSOR_IDLE) += cpuidle.o 4 + obj-$(CONFIG_ACPI_CPPC_LIB) += cppc.o
+157
drivers/acpi/riscv/cppc.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Implement CPPC FFH helper routines for RISC-V. 4 + * 5 + * Copyright (C) 2024 Ventana Micro Systems Inc. 6 + */ 7 + 8 + #include <acpi/cppc_acpi.h> 9 + #include <asm/csr.h> 10 + #include <asm/sbi.h> 11 + 12 + #define SBI_EXT_CPPC 0x43505043 13 + 14 + /* CPPC interfaces defined in SBI spec */ 15 + #define SBI_CPPC_PROBE 0x0 16 + #define SBI_CPPC_READ 0x1 17 + #define SBI_CPPC_READ_HI 0x2 18 + #define SBI_CPPC_WRITE 0x3 19 + 20 + /* RISC-V FFH definitions from RISC-V FFH spec */ 21 + #define FFH_CPPC_TYPE(r) (((r) & GENMASK_ULL(63, 60)) >> 60) 22 + #define FFH_CPPC_SBI_REG(r) ((r) & GENMASK(31, 0)) 23 + #define FFH_CPPC_CSR_NUM(r) ((r) & GENMASK(11, 0)) 24 + 25 + #define FFH_CPPC_SBI 0x1 26 + #define FFH_CPPC_CSR 0x2 27 + 28 + struct sbi_cppc_data { 29 + u64 val; 30 + u32 reg; 31 + struct sbiret ret; 32 + }; 33 + 34 + static bool cppc_ext_present; 35 + 36 + static int __init sbi_cppc_init(void) 37 + { 38 + if (sbi_spec_version >= sbi_mk_version(2, 0) && 39 + sbi_probe_extension(SBI_EXT_CPPC) > 0) { 40 + pr_info("SBI CPPC extension detected\n"); 41 + cppc_ext_present = true; 42 + } else { 43 + pr_info("SBI CPPC extension NOT detected!!\n"); 44 + cppc_ext_present = false; 45 + } 46 + 47 + return 0; 48 + } 49 + device_initcall(sbi_cppc_init); 50 + 51 + static void sbi_cppc_read(void *read_data) 52 + { 53 + struct sbi_cppc_data *data = (struct sbi_cppc_data *)read_data; 54 + 55 + data->ret = sbi_ecall(SBI_EXT_CPPC, SBI_CPPC_READ, 56 + data->reg, 0, 0, 0, 0, 0); 57 + } 58 + 59 + static void sbi_cppc_write(void *write_data) 60 + { 61 + struct sbi_cppc_data *data = (struct sbi_cppc_data *)write_data; 62 + 63 + data->ret = sbi_ecall(SBI_EXT_CPPC, SBI_CPPC_WRITE, 64 + data->reg, data->val, 0, 0, 0, 0); 65 + } 66 + 67 + static void cppc_ffh_csr_read(void *read_data) 68 + { 69 + struct sbi_cppc_data *data = (struct sbi_cppc_data *)read_data; 70 + 71 + switch (data->reg) { 72 + /* Support only TIME CSR for now */ 73 + 
case CSR_TIME: 74 + data->ret.value = csr_read(CSR_TIME); 75 + data->ret.error = 0; 76 + break; 77 + default: 78 + data->ret.error = -EINVAL; 79 + break; 80 + } 81 + } 82 + 83 + static void cppc_ffh_csr_write(void *write_data) 84 + { 85 + struct sbi_cppc_data *data = (struct sbi_cppc_data *)write_data; 86 + 87 + data->ret.error = -EINVAL; 88 + } 89 + 90 + /* 91 + * Refer to drivers/acpi/cppc_acpi.c for the description of the functions 92 + * below. 93 + */ 94 + bool cpc_ffh_supported(void) 95 + { 96 + return true; 97 + } 98 + 99 + int cpc_read_ffh(int cpu, struct cpc_reg *reg, u64 *val) 100 + { 101 + struct sbi_cppc_data data; 102 + 103 + if (WARN_ON_ONCE(irqs_disabled())) 104 + return -EPERM; 105 + 106 + if (FFH_CPPC_TYPE(reg->address) == FFH_CPPC_SBI) { 107 + if (!cppc_ext_present) 108 + return -EINVAL; 109 + 110 + data.reg = FFH_CPPC_SBI_REG(reg->address); 111 + 112 + smp_call_function_single(cpu, sbi_cppc_read, &data, 1); 113 + 114 + *val = data.ret.value; 115 + 116 + return (data.ret.error) ? sbi_err_map_linux_errno(data.ret.error) : 0; 117 + } else if (FFH_CPPC_TYPE(reg->address) == FFH_CPPC_CSR) { 118 + data.reg = FFH_CPPC_CSR_NUM(reg->address); 119 + 120 + smp_call_function_single(cpu, cppc_ffh_csr_read, &data, 1); 121 + 122 + *val = data.ret.value; 123 + 124 + return (data.ret.error) ? sbi_err_map_linux_errno(data.ret.error) : 0; 125 + } 126 + 127 + return -EINVAL; 128 + } 129 + 130 + int cpc_write_ffh(int cpu, struct cpc_reg *reg, u64 val) 131 + { 132 + struct sbi_cppc_data data; 133 + 134 + if (WARN_ON_ONCE(irqs_disabled())) 135 + return -EPERM; 136 + 137 + if (FFH_CPPC_TYPE(reg->address) == FFH_CPPC_SBI) { 138 + if (!cppc_ext_present) 139 + return -EINVAL; 140 + 141 + data.reg = FFH_CPPC_SBI_REG(reg->address); 142 + data.val = val; 143 + 144 + smp_call_function_single(cpu, sbi_cppc_write, &data, 1); 145 + 146 + return (data.ret.error) ? 
sbi_err_map_linux_errno(data.ret.error) : 0; 147 + } else if (FFH_CPPC_TYPE(reg->address) == FFH_CPPC_CSR) { 148 + data.reg = FFH_CPPC_CSR_NUM(reg->address); 149 + data.val = val; 150 + 151 + smp_call_function_single(cpu, cppc_ffh_csr_write, &data, 1); 152 + 153 + return (data.ret.error) ? sbi_err_map_linux_errno(data.ret.error) : 0; 154 + } 155 + 156 + return -EINVAL; 157 + }
+81
drivers/acpi/riscv/cpuidle.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* 3 + * Copyright (C) 2024, Ventana Micro Systems Inc 4 + * Author: Sunil V L <sunilvl@ventanamicro.com> 5 + * 6 + */ 7 + 8 + #include <linux/acpi.h> 9 + #include <acpi/processor.h> 10 + #include <linux/cpu_pm.h> 11 + #include <linux/cpuidle.h> 12 + #include <linux/suspend.h> 13 + #include <asm/cpuidle.h> 14 + #include <asm/sbi.h> 15 + #include <asm/suspend.h> 16 + 17 + #define RISCV_FFH_LPI_TYPE_MASK GENMASK_ULL(63, 60) 18 + #define RISCV_FFH_LPI_RSVD_MASK GENMASK_ULL(59, 32) 19 + 20 + #define RISCV_FFH_LPI_TYPE_SBI BIT_ULL(60) 21 + 22 + static int acpi_cpu_init_idle(unsigned int cpu) 23 + { 24 + int i; 25 + struct acpi_lpi_state *lpi; 26 + struct acpi_processor *pr = per_cpu(processors, cpu); 27 + 28 + if (unlikely(!pr || !pr->flags.has_lpi)) 29 + return -EINVAL; 30 + 31 + if (!riscv_sbi_hsm_is_supported()) 32 + return -ENODEV; 33 + 34 + if (pr->power.count <= 1) 35 + return -ENODEV; 36 + 37 + for (i = 1; i < pr->power.count; i++) { 38 + u32 state; 39 + 40 + lpi = &pr->power.lpi_states[i]; 41 + 42 + /* 43 + * Validate Entry Method as per FFH spec. 
44 + * bits[63:60] should be 0x1 45 + * bits[59:32] should be 0x0 46 + * bits[31:0] represent a SBI power_state 47 + */ 48 + if (((lpi->address & RISCV_FFH_LPI_TYPE_MASK) != RISCV_FFH_LPI_TYPE_SBI) || 49 + (lpi->address & RISCV_FFH_LPI_RSVD_MASK)) { 50 + pr_warn("Invalid LPI entry method %#llx\n", lpi->address); 51 + return -EINVAL; 52 + } 53 + 54 + state = lpi->address; 55 + if (!riscv_sbi_suspend_state_is_valid(state)) { 56 + pr_warn("Invalid SBI power state %#x\n", state); 57 + return -EINVAL; 58 + } 59 + } 60 + 61 + return 0; 62 + } 63 + 64 + int acpi_processor_ffh_lpi_probe(unsigned int cpu) 65 + { 66 + return acpi_cpu_init_idle(cpu); 67 + } 68 + 69 + int acpi_processor_ffh_lpi_enter(struct acpi_lpi_state *lpi) 70 + { 71 + u32 state = lpi->address; 72 + 73 + if (state & SBI_HSM_SUSP_NON_RET_BIT) 74 + return CPU_PM_CPU_IDLE_ENTER_PARAM(riscv_sbi_hart_suspend, 75 + lpi->index, 76 + state); 77 + else 78 + return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(riscv_sbi_hart_suspend, 79 + lpi->index, 80 + state); 81 + }
+1 -1
drivers/clocksource/timer-clint.c
··· 131 131 struct clock_event_device *ce = per_cpu_ptr(&clint_clock_event, cpu); 132 132 133 133 ce->cpumask = cpumask_of(cpu); 134 - clockevents_config_and_register(ce, clint_timer_freq, 100, 0x7fffffff); 134 + clockevents_config_and_register(ce, clint_timer_freq, 100, ULONG_MAX); 135 135 136 136 enable_percpu_irq(clint_timer_irq, 137 137 irq_get_trigger_type(clint_timer_irq));
+1 -1
drivers/clocksource/timer-riscv.c
··· 114 114 ce->features |= CLOCK_EVT_FEAT_C3STOP; 115 115 if (static_branch_likely(&riscv_sstc_available)) 116 116 ce->rating = 450; 117 - clockevents_config_and_register(ce, riscv_timebase, 100, 0x7fffffff); 117 + clockevents_config_and_register(ce, riscv_timebase, 100, ULONG_MAX); 118 118 119 119 enable_percpu_irq(riscv_clock_event_irq, 120 120 irq_get_trigger_type(riscv_clock_event_irq));
+29
drivers/cpufreq/Kconfig
··· 302 302 which are capable of changing the CPU's frequency dynamically. 303 303 304 304 endif 305 + 306 + config ACPI_CPPC_CPUFREQ 307 + tristate "CPUFreq driver based on the ACPI CPPC spec" 308 + depends on ACPI_PROCESSOR 309 + depends on ARM || ARM64 || RISCV 310 + select ACPI_CPPC_LIB 311 + help 312 + This adds a CPUFreq driver which uses CPPC methods 313 + as described in the ACPIv5.1 spec. CPPC stands for 314 + Collaborative Processor Performance Controls. It 315 + is based on an abstract continuous scale of CPU 316 + performance values which allows the remote power 317 + processor to flexibly optimize for power and 318 + performance. CPPC relies on power management firmware 319 + support for its operation. 320 + 321 + If in doubt, say N. 322 + 323 + config ACPI_CPPC_CPUFREQ_FIE 324 + bool "Frequency Invariance support for CPPC cpufreq driver" 325 + depends on ACPI_CPPC_CPUFREQ && GENERIC_ARCH_TOPOLOGY 326 + depends on ARM || ARM64 || RISCV 327 + default y 328 + help 329 + This extends frequency invariance support in the CPPC cpufreq driver, 330 + by using CPPC delivered and reference performance counters. 331 + 332 + If in doubt, say N. 333 + 305 334 endmenu
-26
drivers/cpufreq/Kconfig.arm
··· 3 3 # ARM CPU Frequency scaling drivers 4 4 # 5 5 6 - config ACPI_CPPC_CPUFREQ 7 - tristate "CPUFreq driver based on the ACPI CPPC spec" 8 - depends on ACPI_PROCESSOR 9 - select ACPI_CPPC_LIB 10 - help 11 - This adds a CPUFreq driver which uses CPPC methods 12 - as described in the ACPIv5.1 spec. CPPC stands for 13 - Collaborative Processor Performance Controls. It 14 - is based on an abstract continuous scale of CPU 15 - performance values which allows the remote power 16 - processor to flexibly optimize for power and 17 - performance. CPPC relies on power management firmware 18 - support for its operation. 19 - 20 - If in doubt, say N. 21 - 22 - config ACPI_CPPC_CPUFREQ_FIE 23 - bool "Frequency Invariance support for CPPC cpufreq driver" 24 - depends on ACPI_CPPC_CPUFREQ && GENERIC_ARCH_TOPOLOGY 25 - default y 26 - help 27 - This extends frequency invariance support in the CPPC cpufreq driver, 28 - by using CPPC delivered and reference performance counters. 29 - 30 - If in doubt, say N. 31 - 32 6 config ARM_ALLWINNER_SUN50I_CPUFREQ_NVMEM 33 7 tristate "Allwinner nvmem based SUN50I CPUFreq driver" 34 8 depends on ARCH_SUNXI
+5 -44
drivers/cpuidle/cpuidle-riscv-sbi.c
··· 73 73 return data->available; 74 74 } 75 75 76 - static int sbi_suspend_finisher(unsigned long suspend_type, 77 - unsigned long resume_addr, 78 - unsigned long opaque) 79 - { 80 - struct sbiret ret; 81 - 82 - ret = sbi_ecall(SBI_EXT_HSM, SBI_EXT_HSM_HART_SUSPEND, 83 - suspend_type, resume_addr, opaque, 0, 0, 0); 84 - 85 - return (ret.error) ? sbi_err_map_linux_errno(ret.error) : 0; 86 - } 87 - 88 - static int sbi_suspend(u32 state) 89 - { 90 - if (state & SBI_HSM_SUSP_NON_RET_BIT) 91 - return cpu_suspend(state, sbi_suspend_finisher); 92 - else 93 - return sbi_suspend_finisher(state, 0, 0); 94 - } 95 - 96 76 static __cpuidle int sbi_cpuidle_enter_state(struct cpuidle_device *dev, 97 77 struct cpuidle_driver *drv, int idx) 98 78 { ··· 80 100 u32 state = states[idx]; 81 101 82 102 if (state & SBI_HSM_SUSP_NON_RET_BIT) 83 - return CPU_PM_CPU_IDLE_ENTER_PARAM(sbi_suspend, idx, state); 103 + return CPU_PM_CPU_IDLE_ENTER_PARAM(riscv_sbi_hart_suspend, idx, state); 84 104 else 85 - return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(sbi_suspend, 105 + return CPU_PM_CPU_IDLE_ENTER_RETENTION_PARAM(riscv_sbi_hart_suspend, 86 106 idx, state); 87 107 } 88 108 ··· 113 133 else 114 134 state = states[idx]; 115 135 116 - ret = sbi_suspend(state) ? -1 : idx; 136 + ret = riscv_sbi_hart_suspend(state) ? 
-1 : idx; 117 137 118 138 ct_cpuidle_exit(); 119 139 ··· 186 206 { }, 187 207 }; 188 208 189 - static bool sbi_suspend_state_is_valid(u32 state) 190 - { 191 - if (state > SBI_HSM_SUSPEND_RET_DEFAULT && 192 - state < SBI_HSM_SUSPEND_RET_PLATFORM) 193 - return false; 194 - if (state > SBI_HSM_SUSPEND_NON_RET_DEFAULT && 195 - state < SBI_HSM_SUSPEND_NON_RET_PLATFORM) 196 - return false; 197 - return true; 198 - } 199 - 200 209 static int sbi_dt_parse_state_node(struct device_node *np, u32 *state) 201 210 { 202 211 int err = of_property_read_u32(np, "riscv,sbi-suspend-param", state); ··· 195 226 return err; 196 227 } 197 228 198 - if (!sbi_suspend_state_is_valid(*state)) { 229 + if (!riscv_sbi_suspend_state_is_valid(*state)) { 199 230 pr_warn("Invalid SBI suspend state %#x\n", *state); 200 231 return -EINVAL; 201 232 } ··· 576 607 int ret; 577 608 struct platform_device *pdev; 578 609 579 - /* 580 - * The SBI HSM suspend function is only available when: 581 - * 1) SBI version is 0.3 or higher 582 - * 2) SBI HSM extension is available 583 - */ 584 - if ((sbi_spec_version < sbi_mk_version(0, 3)) || 585 - !sbi_probe_extension(SBI_EXT_HSM)) { 586 - pr_info("HSM suspend not available\n"); 610 + if (!riscv_sbi_hsm_is_supported()) 587 611 return 0; 588 - } 589 612 590 613 ret = platform_driver_register(&sbi_cpuidle_driver); 591 614 if (ret)
+14
drivers/perf/Kconfig
··· 96 96 an L3 memory system. The L3 cache events are added into perf event 97 97 subsystem, allowing monitoring of various L3 cache perf events. 98 98 99 + config ANDES_CUSTOM_PMU 100 + bool "Andes custom PMU support" 101 + depends on ARCH_RENESAS && RISCV_ALTERNATIVE && RISCV_PMU_SBI 102 + default y 103 + help 104 + The Andes cores implement the PMU overflow extension very 105 + similar to the standard Sscofpmf and Smcntrpmf extension. 106 + 107 + This will patch the overflow and pending CSRs and handle the 108 + non-standard behaviour via the regular SBI PMU driver and 109 + interface. 110 + 111 + If you don't know what to do here, say "Y". 112 + 99 113 config ARM_PMU_ACPI 100 114 depends on ARM_PMU && ACPI 101 115 def_bool y
+32 -5
drivers/perf/riscv_pmu_sbi.c
··· 19 19 #include <linux/of.h> 20 20 #include <linux/cpu_pm.h> 21 21 #include <linux/sched/clock.h> 22 + #include <linux/soc/andes/irq.h> 22 23 23 24 #include <asm/errata_list.h> 24 25 #include <asm/sbi.h> 25 26 #include <asm/cpufeature.h> 27 + 28 + #define ALT_SBI_PMU_OVERFLOW(__ovl) \ 29 + asm volatile(ALTERNATIVE_2( \ 30 + "csrr %0, " __stringify(CSR_SSCOUNTOVF), \ 31 + "csrr %0, " __stringify(THEAD_C9XX_CSR_SCOUNTEROF), \ 32 + THEAD_VENDOR_ID, ERRATA_THEAD_PMU, \ 33 + CONFIG_ERRATA_THEAD_PMU, \ 34 + "csrr %0, " __stringify(ANDES_CSR_SCOUNTEROF), \ 35 + 0, RISCV_ISA_EXT_XANDESPMU, \ 36 + CONFIG_ANDES_CUSTOM_PMU) \ 37 + : "=r" (__ovl) : \ 38 + : "memory") 39 + 40 + #define ALT_SBI_PMU_OVF_CLEAR_PENDING(__irq_mask) \ 41 + asm volatile(ALTERNATIVE( \ 42 + "csrc " __stringify(CSR_IP) ", %0\n\t", \ 43 + "csrc " __stringify(ANDES_CSR_SLIP) ", %0\n\t", \ 44 + 0, RISCV_ISA_EXT_XANDESPMU, \ 45 + CONFIG_ANDES_CUSTOM_PMU) \ 46 + : : "r"(__irq_mask) \ 47 + : "memory") 26 48 27 49 #define SYSCTL_NO_USER_ACCESS 0 28 50 #define SYSCTL_USER_ACCESS 1 ··· 83 61 static union sbi_pmu_ctr_info *pmu_ctr_list; 84 62 static bool riscv_pmu_use_irq; 85 63 static unsigned int riscv_pmu_irq_num; 64 + static unsigned int riscv_pmu_irq_mask; 86 65 static unsigned int riscv_pmu_irq; 87 66 88 67 /* Cache the available counters in a bitmask */ ··· 717 694 718 695 event = cpu_hw_evt->events[fidx]; 719 696 if (!event) { 720 - csr_clear(CSR_SIP, BIT(riscv_pmu_irq_num)); 697 + ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask); 721 698 return IRQ_NONE; 722 699 } 723 700 ··· 731 708 * Overflow interrupt pending bit should only be cleared after stopping 732 709 * all the counters to avoid any race condition. 
733 710 */ 734 - csr_clear(CSR_SIP, BIT(riscv_pmu_irq_num)); 711 + ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask); 735 712 736 713 /* No overflow bit is set */ 737 714 if (!overflow) ··· 803 780 804 781 if (riscv_pmu_use_irq) { 805 782 cpu_hw_evt->irq = riscv_pmu_irq; 806 - csr_clear(CSR_IP, BIT(riscv_pmu_irq_num)); 807 - csr_set(CSR_IE, BIT(riscv_pmu_irq_num)); 783 + ALT_SBI_PMU_OVF_CLEAR_PENDING(riscv_pmu_irq_mask); 808 784 enable_percpu_irq(riscv_pmu_irq, IRQ_TYPE_NONE); 809 785 } 810 786 ··· 814 792 { 815 793 if (riscv_pmu_use_irq) { 816 794 disable_percpu_irq(riscv_pmu_irq); 817 - csr_clear(CSR_IE, BIT(riscv_pmu_irq_num)); 818 795 } 819 796 820 797 /* Disable all counters access for user mode now */ ··· 837 816 riscv_cached_mimpid(0) == 0) { 838 817 riscv_pmu_irq_num = THEAD_C9XX_RV_IRQ_PMU; 839 818 riscv_pmu_use_irq = true; 819 + } else if (riscv_isa_extension_available(NULL, XANDESPMU) && 820 + IS_ENABLED(CONFIG_ANDES_CUSTOM_PMU)) { 821 + riscv_pmu_irq_num = ANDES_SLI_CAUSE_BASE + ANDES_RV_IRQ_PMOVI; 822 + riscv_pmu_use_irq = true; 840 823 } 824 + 825 + riscv_pmu_irq_mask = BIT(riscv_pmu_irq_num % BITS_PER_LONG); 841 826 842 827 if (!riscv_pmu_use_irq) 843 828 return -EOPNOTSUPP;
+6 -2
include/asm-generic/bitops/__ffs.h
··· 5 5 #include <asm/types.h> 6 6 7 7 /** 8 - * __ffs - find first bit in word. 8 + * generic___ffs - find first bit in word. 9 9 * @word: The word to search 10 10 * 11 11 * Undefined if no bit exists, so code should check against 0 first. 12 12 */ 13 - static __always_inline unsigned long __ffs(unsigned long word) 13 + static __always_inline unsigned long generic___ffs(unsigned long word) 14 14 { 15 15 int num = 0; 16 16 ··· 40 40 num += 1; 41 41 return num; 42 42 } 43 + 44 + #ifndef __HAVE_ARCH___FFS 45 + #define __ffs(word) generic___ffs(word) 46 + #endif 43 47 44 48 #endif /* _ASM_GENERIC_BITOPS___FFS_H_ */
+6 -2
include/asm-generic/bitops/__fls.h
··· 5 5 #include <asm/types.h> 6 6 7 7 /** 8 - * __fls - find last (most-significant) set bit in a long word 8 + * generic___fls - find last (most-significant) set bit in a long word 9 9 * @word: the word to search 10 10 * 11 11 * Undefined if no set bit exists, so code should check against 0 first. 12 12 */ 13 - static __always_inline unsigned long __fls(unsigned long word) 13 + static __always_inline unsigned long generic___fls(unsigned long word) 14 14 { 15 15 int num = BITS_PER_LONG - 1; 16 16 ··· 40 40 num -= 1; 41 41 return num; 42 42 } 43 + 44 + #ifndef __HAVE_ARCH___FLS 45 + #define __fls(word) generic___fls(word) 46 + #endif 43 47 44 48 #endif /* _ASM_GENERIC_BITOPS___FLS_H_ */
+6 -2
include/asm-generic/bitops/ffs.h
··· 3 3 #define _ASM_GENERIC_BITOPS_FFS_H_ 4 4 5 5 /** 6 - * ffs - find first bit set 6 + * generic_ffs - find first bit set 7 7 * @x: the word to search 8 8 * 9 9 * This is defined the same way as 10 10 * the libc and compiler builtin ffs routines, therefore 11 11 * differs in spirit from ffz (man ffs). 12 12 */ 13 - static inline int ffs(int x) 13 + static inline int generic_ffs(int x) 14 14 { 15 15 int r = 1; 16 16 ··· 38 38 } 39 39 return r; 40 40 } 41 + 42 + #ifndef __HAVE_ARCH_FFS 43 + #define ffs(x) generic_ffs(x) 44 + #endif 41 45 42 46 #endif /* _ASM_GENERIC_BITOPS_FFS_H_ */
+6 -2
include/asm-generic/bitops/fls.h
··· 3 3 #define _ASM_GENERIC_BITOPS_FLS_H_ 4 4 5 5 /** 6 - * fls - find last (most-significant) bit set 6 + * generic_fls - find last (most-significant) bit set 7 7 * @x: the word to search 8 8 * 9 9 * This is defined the same way as ffs. 10 10 * Note fls(0) = 0, fls(1) = 1, fls(0x80000000) = 32. 11 11 */ 12 12 13 - static __always_inline int fls(unsigned int x) 13 + static __always_inline int generic_fls(unsigned int x) 14 14 { 15 15 int r = 32; 16 16 ··· 38 38 } 39 39 return r; 40 40 } 41 + 42 + #ifndef __HAVE_ARCH_FLS 43 + #define fls(x) generic_fls(x) 44 + #endif 41 45 42 46 #endif /* _ASM_GENERIC_BITOPS_FLS_H_ */
+1 -1
include/linux/mm.h
··· 87 87 88 88 #ifdef CONFIG_HAVE_ARCH_MMAP_RND_BITS 89 89 extern const int mmap_rnd_bits_min; 90 - extern const int mmap_rnd_bits_max; 90 + extern int mmap_rnd_bits_max __ro_after_init; 91 91 extern int mmap_rnd_bits __read_mostly; 92 92 #endif 93 93 #ifdef CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS
+15 -1
include/linux/sync_core.h
··· 17 17 } 18 18 #endif 19 19 20 - #endif /* _LINUX_SYNC_CORE_H */ 20 + #ifdef CONFIG_ARCH_HAS_PREPARE_SYNC_CORE_CMD 21 + #include <asm/sync_core.h> 22 + #else 23 + /* 24 + * This is a dummy prepare_sync_core_cmd() implementation that can be used on 25 + * all architectures which provide unconditional core serializing instructions 26 + * in switch_mm(). 27 + * If your architecture doesn't provide such core serializing instructions in 28 + * switch_mm(), you may need to write your own functions. 29 + */ 30 + static inline void prepare_sync_core_cmd(struct mm_struct *mm) 31 + { 32 + } 33 + #endif 21 34 35 + #endif /* _LINUX_SYNC_CORE_H */
+3
init/Kconfig
··· 1986 1986 config ARCH_HAS_NON_OVERLAPPING_ADDRESS_SPACE 1987 1987 bool 1988 1988 1989 + config ARCH_HAS_PREPARE_SYNC_CORE_CMD 1990 + bool 1991 + 1989 1992 config ARCH_HAS_SYNC_CORE_BEFORE_USERMODE 1990 1993 bool 1991 1994
+13 -3
kernel/sched/core.c
··· 6647 6647 * if (signal_pending_state()) if (p->state & @state) 6648 6648 * 6649 6649 * Also, the membarrier system call requires a full memory barrier 6650 - * after coming from user-space, before storing to rq->curr. 6650 + * after coming from user-space, before storing to rq->curr; this 6651 + * barrier matches a full barrier in the proximity of the membarrier 6652 + * system call exit. 6651 6653 */ 6652 6654 rq_lock(rq, &rf); 6653 6655 smp_mb__after_spinlock(); ··· 6720 6718 * 6721 6719 * Here are the schemes providing that barrier on the 6722 6720 * various architectures: 6723 - * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC. 6724 - * switch_mm() rely on membarrier_arch_switch_mm() on PowerPC. 6721 + * - mm ? switch_mm() : mmdrop() for x86, s390, sparc, PowerPC, 6722 + * RISC-V. switch_mm() relies on membarrier_arch_switch_mm() 6723 + * on PowerPC and on RISC-V. 6725 6724 * - finish_lock_switch() for weakly-ordered 6726 6725 * architectures where spin_unlock is a full barrier, 6727 6726 * - switch_to() for arm64 (weakly-ordered, spin_unlock 6728 6727 * is a RELEASE barrier), 6728 + * 6729 + * The barrier matches a full barrier in the proximity of 6730 + * the membarrier system call entry. 6731 + * 6732 + * On RISC-V, this barrier pairing is also needed for the 6733 + * SYNC_CORE command when switching between processes, cf. 6734 + * the inline comments in membarrier_arch_switch_mm(). 6729 6735 */ 6730 6736 ++*switch_count; 6731 6737
+9 -4
kernel/sched/membarrier.c
··· 254 254 return 0; 255 255 256 256 /* 257 - * Matches memory barriers around rq->curr modification in 257 + * Matches memory barriers after rq->curr modification in 258 258 * scheduler. 259 259 */ 260 260 smp_mb(); /* system call entry is not a mb. */ ··· 304 304 305 305 /* 306 306 * Memory barrier on the caller thread _after_ we finished 307 - * waiting for the last IPI. Matches memory barriers around 307 + * waiting for the last IPI. Matches memory barriers before 308 308 * rq->curr modification in scheduler. 309 309 */ 310 310 smp_mb(); /* exit from system call is not a mb */ ··· 324 324 MEMBARRIER_STATE_PRIVATE_EXPEDITED_SYNC_CORE_READY)) 325 325 return -EPERM; 326 326 ipi_func = ipi_sync_core; 327 + prepare_sync_core_cmd(mm); 327 328 } else if (flags == MEMBARRIER_FLAG_RSEQ) { 328 329 if (!IS_ENABLED(CONFIG_RSEQ)) 329 330 return -EINVAL; ··· 344 343 return 0; 345 344 346 345 /* 347 - * Matches memory barriers around rq->curr modification in 346 + * Matches memory barriers after rq->curr modification in 348 347 * scheduler. 348 + * 349 + * On RISC-V, this barrier pairing is also needed for the 350 + * SYNC_CORE command when switching between processes, cf. 351 + * the inline comments in membarrier_arch_switch_mm(). 349 352 */ 350 353 smp_mb(); /* system call entry is not a mb. */ 351 354 ··· 425 420 426 421 /* 427 422 * Memory barrier on the caller thread _after_ we finished 428 - * waiting for the last IPI. Matches memory barriers around 423 + * waiting for the last IPI. Matches memory barriers before 429 424 * rq->curr modification in scheduler. 430 425 */ 431 426 smp_mb(); /* exit from system call is not a mb */
+1 -1
mm/mmap.c
··· 64 64 65 65 #ifdef CONFIG_HAVE_ARCH_MMAP_RND_BITS 66 66 const int mmap_rnd_bits_min = CONFIG_ARCH_MMAP_RND_BITS_MIN; 67 - const int mmap_rnd_bits_max = CONFIG_ARCH_MMAP_RND_BITS_MAX; 67 + int mmap_rnd_bits_max __ro_after_init = CONFIG_ARCH_MMAP_RND_BITS_MAX; 68 68 int mmap_rnd_bits __read_mostly = CONFIG_ARCH_MMAP_RND_BITS; 69 69 #endif 70 70 #ifdef CONFIG_HAVE_ARCH_MMAP_RND_COMPAT_BITS
+68
tools/perf/pmu-events/arch/riscv/andes/ax45/firmware.json
··· 1 + [ 2 + { 3 + "ArchStdEvent": "FW_MISALIGNED_LOAD" 4 + }, 5 + { 6 + "ArchStdEvent": "FW_MISALIGNED_STORE" 7 + }, 8 + { 9 + "ArchStdEvent": "FW_ACCESS_LOAD" 10 + }, 11 + { 12 + "ArchStdEvent": "FW_ACCESS_STORE" 13 + }, 14 + { 15 + "ArchStdEvent": "FW_ILLEGAL_INSN" 16 + }, 17 + { 18 + "ArchStdEvent": "FW_SET_TIMER" 19 + }, 20 + { 21 + "ArchStdEvent": "FW_IPI_SENT" 22 + }, 23 + { 24 + "ArchStdEvent": "FW_IPI_RECEIVED" 25 + }, 26 + { 27 + "ArchStdEvent": "FW_FENCE_I_SENT" 28 + }, 29 + { 30 + "ArchStdEvent": "FW_FENCE_I_RECEIVED" 31 + }, 32 + { 33 + "ArchStdEvent": "FW_SFENCE_VMA_SENT" 34 + }, 35 + { 36 + "ArchStdEvent": "FW_SFENCE_VMA_RECEIVED" 37 + }, 38 + { 39 + "ArchStdEvent": "FW_SFENCE_VMA_ASID_SENT" 40 + }, 41 + { 42 + "ArchStdEvent": "FW_SFENCE_VMA_ASID_RECEIVED" 43 + }, 44 + { 45 + "ArchStdEvent": "FW_HFENCE_GVMA_SENT" 46 + }, 47 + { 48 + "ArchStdEvent": "FW_HFENCE_GVMA_RECEIVED" 49 + }, 50 + { 51 + "ArchStdEvent": "FW_HFENCE_GVMA_VMID_SENT" 52 + }, 53 + { 54 + "ArchStdEvent": "FW_HFENCE_GVMA_VMID_RECEIVED" 55 + }, 56 + { 57 + "ArchStdEvent": "FW_HFENCE_VVMA_SENT" 58 + }, 59 + { 60 + "ArchStdEvent": "FW_HFENCE_VVMA_RECEIVED" 61 + }, 62 + { 63 + "ArchStdEvent": "FW_HFENCE_VVMA_ASID_SENT" 64 + }, 65 + { 66 + "ArchStdEvent": "FW_HFENCE_VVMA_ASID_RECEIVED" 67 + } 68 + ]
+127
tools/perf/pmu-events/arch/riscv/andes/ax45/instructions.json
··· 1 + [ 2 + { 3 + "EventCode": "0x10", 4 + "EventName": "cycle_count", 5 + "BriefDescription": "Cycle count" 6 + }, 7 + { 8 + "EventCode": "0x20", 9 + "EventName": "inst_count", 10 + "BriefDescription": "Retired instruction count" 11 + }, 12 + { 13 + "EventCode": "0x30", 14 + "EventName": "int_load_inst", 15 + "BriefDescription": "Integer load instruction count" 16 + }, 17 + { 18 + "EventCode": "0x40", 19 + "EventName": "int_store_inst", 20 + "BriefDescription": "Integer store instruction count" 21 + }, 22 + { 23 + "EventCode": "0x50", 24 + "EventName": "atomic_inst", 25 + "BriefDescription": "Atomic instruction count" 26 + }, 27 + { 28 + "EventCode": "0x60", 29 + "EventName": "sys_inst", 30 + "BriefDescription": "System instruction count" 31 + }, 32 + { 33 + "EventCode": "0x70", 34 + "EventName": "int_compute_inst", 35 + "BriefDescription": "Integer computational instruction count" 36 + }, 37 + { 38 + "EventCode": "0x80", 39 + "EventName": "condition_br", 40 + "BriefDescription": "Conditional branch instruction count" 41 + }, 42 + { 43 + "EventCode": "0x90", 44 + "EventName": "taken_condition_br", 45 + "BriefDescription": "Taken conditional branch instruction count" 46 + }, 47 + { 48 + "EventCode": "0xA0", 49 + "EventName": "jal_inst", 50 + "BriefDescription": "JAL instruction count" 51 + }, 52 + { 53 + "EventCode": "0xB0", 54 + "EventName": "jalr_inst", 55 + "BriefDescription": "JALR instruction count" 56 + }, 57 + { 58 + "EventCode": "0xC0", 59 + "EventName": "ret_inst", 60 + "BriefDescription": "Return instruction count" 61 + }, 62 + { 63 + "EventCode": "0xD0", 64 + "EventName": "control_trans_inst", 65 + "BriefDescription": "Control transfer instruction count" 66 + }, 67 + { 68 + "EventCode": "0xE0", 69 + "EventName": "ex9_inst", 70 + "BriefDescription": "EXEC.IT instruction count" 71 + }, 72 + { 73 + "EventCode": "0xF0", 74 + "EventName": "int_mul_inst", 75 + "BriefDescription": "Integer multiplication instruction count" 76 + }, 77 + { 78 + "EventCode": 
"0x100", 79 + "EventName": "int_div_rem_inst", 80 + "BriefDescription": "Integer division/remainder instruction count" 81 + }, 82 + { 83 + "EventCode": "0x110", 84 + "EventName": "float_load_inst", 85 + "BriefDescription": "Floating-point load instruction count" 86 + }, 87 + { 88 + "EventCode": "0x120", 89 + "EventName": "float_store_inst", 90 + "BriefDescription": "Floating-point store instruction count" 91 + }, 92 + { 93 + "EventCode": "0x130", 94 + "EventName": "float_add_sub_inst", 95 + "BriefDescription": "Floating-point addition/subtraction instruction count" 96 + }, 97 + { 98 + "EventCode": "0x140", 99 + "EventName": "float_mul_inst", 100 + "BriefDescription": "Floating-point multiplication instruction count" 101 + }, 102 + { 103 + "EventCode": "0x150", 104 + "EventName": "float_fused_muladd_inst", 105 + "BriefDescription": "Floating-point fused multiply-add instruction count" 106 + }, 107 + { 108 + "EventCode": "0x160", 109 + "EventName": "float_div_sqrt_inst", 110 + "BriefDescription": "Floating-point division or square-root instruction count" 111 + }, 112 + { 113 + "EventCode": "0x170", 114 + "EventName": "other_float_inst", 115 + "BriefDescription": "Other floating-point instruction count" 116 + }, 117 + { 118 + "EventCode": "0x180", 119 + "EventName": "int_mul_add_sub_inst", 120 + "BriefDescription": "Integer multiplication and add/sub instruction count" 121 + }, 122 + { 123 + "EventCode": "0x190", 124 + "EventName": "retired_ops", 125 + "BriefDescription": "Retired operation count" 126 + } 127 + ]
+57
tools/perf/pmu-events/arch/riscv/andes/ax45/memory.json
···
[
    { "EventCode": "0x01", "EventName": "ilm_access",          "BriefDescription": "ILM access" },
    { "EventCode": "0x11", "EventName": "dlm_access",          "BriefDescription": "DLM access" },
    { "EventCode": "0x21", "EventName": "icache_access",       "BriefDescription": "ICACHE access" },
    { "EventCode": "0x31", "EventName": "icache_miss",         "BriefDescription": "ICACHE miss" },
    { "EventCode": "0x41", "EventName": "dcache_access",       "BriefDescription": "DCACHE access" },
    { "EventCode": "0x51", "EventName": "dcache_miss",         "BriefDescription": "DCACHE miss" },
    { "EventCode": "0x61", "EventName": "dcache_load_access",  "BriefDescription": "DCACHE load access" },
    { "EventCode": "0x71", "EventName": "dcache_load_miss",    "BriefDescription": "DCACHE load miss" },
    { "EventCode": "0x81", "EventName": "dcache_store_access", "BriefDescription": "DCACHE store access" },
    { "EventCode": "0x91", "EventName": "dcache_store_miss",   "BriefDescription": "DCACHE store miss" },
    { "EventCode": "0xA1", "EventName": "dcache_wb",           "BriefDescription": "DCACHE writeback" }
]
+77
tools/perf/pmu-events/arch/riscv/andes/ax45/microarch.json
···
[
    { "EventCode": "0xB1",  "EventName": "cycle_wait_icache_fill",       "BriefDescription": "Cycles waiting for ICACHE fill data" },
    { "EventCode": "0xC1",  "EventName": "cycle_wait_dcache_fill",       "BriefDescription": "Cycles waiting for DCACHE fill data" },
    { "EventCode": "0xD1",  "EventName": "uncached_ifetch_from_bus",     "BriefDescription": "Uncached ifetch data access from bus" },
    { "EventCode": "0xE1",  "EventName": "uncached_load_from_bus",       "BriefDescription": "Uncached load data access from bus" },
    { "EventCode": "0xF1",  "EventName": "cycle_wait_uncached_ifetch",   "BriefDescription": "Cycles waiting for uncached ifetch data from bus" },
    { "EventCode": "0x101", "EventName": "cycle_wait_uncached_load",     "BriefDescription": "Cycles waiting for uncached load data from bus" },
    { "EventCode": "0x111", "EventName": "main_itlb_access",             "BriefDescription": "Main ITLB access" },
    { "EventCode": "0x121", "EventName": "main_itlb_miss",               "BriefDescription": "Main ITLB miss" },
    { "EventCode": "0x131", "EventName": "main_dtlb_access",             "BriefDescription": "Main DTLB access" },
    { "EventCode": "0x141", "EventName": "main_dtlb_miss",               "BriefDescription": "Main DTLB miss" },
    { "EventCode": "0x151", "EventName": "cycle_wait_itlb_fill",         "BriefDescription": "Cycles waiting for Main ITLB fill data" },
    { "EventCode": "0x161", "EventName": "pipe_stall_cycle_dtlb_miss",   "BriefDescription": "Pipeline stall cycles caused by Main DTLB miss" },
    { "EventCode": "0x02",  "EventName": "mispredict_condition_br",      "BriefDescription": "Misprediction of conditional branches" },
    { "EventCode": "0x12",  "EventName": "mispredict_take_condition_br", "BriefDescription": "Misprediction of taken conditional branches" },
    { "EventCode": "0x22",  "EventName": "mispredict_target_ret_inst",   "BriefDescription": "Misprediction of targets of Return instructions" }
]
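The three andes/ax45 event files above all share the same three-field schema. A minimal sketch of consuming entries in that schema, with two entries from memory.json inlined so the snippet is self-contained:

```python
import json

# Two entries from the new andes/ax45 memory.json, inlined for
# illustration; every entry in instructions.json, memory.json and
# microarch.json uses these same three fields.
events = json.loads("""
[
    { "EventCode": "0x21", "EventName": "icache_access",
      "BriefDescription": "ICACHE access" },
    { "EventCode": "0x31", "EventName": "icache_miss",
      "BriefDescription": "ICACHE miss" }
]
""")

# Map symbolic event names to their raw hardware event codes.
by_name = {e["EventName"]: int(e["EventCode"], 16) for e in events}
```

After this, `by_name["icache_miss"]` is the raw code `0x31` that the PMU driver programs for that event.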
+1
tools/perf/pmu-events/arch/riscv/mapfile.csv
···
 0x489-0x8000000000000007-0x[[:xdigit:]]+,v1,sifive/u74,core
 0x5b7-0x0-0x0,v1,thead/c900-legacy,core
 0x67e-0x80000000db0000[89]0-0x[[:xdigit:]]+,v1,starfive/dubhe-80,core
+0x31e-0x8000000000008a45-0x[[:xdigit:]]+,v1,andes/ax45,core
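Each mapfile.csv row matches a cpuid string of the form mvendorid-marchid-mimpid against a POSIX regex, selecting the event directory for that core. A sketch of the match for the new andes/ax45 row, with `[[:xdigit:]]` rewritten as `[0-9a-fA-F]` for Python's `re` syntax (the mimpid value shown is hypothetical):

```python
import re

# The new mapfile.csv row's cpuid pattern: mvendorid 0x31e (Andes),
# marchid 0x8000000000008a45 (AX45), any mimpid.
AX45_PATTERN = r"0x31e-0x8000000000008a45-0x[0-9a-fA-F]+"

# Hypothetical mimpid, for illustration only.
cpuid = "0x31e-0x8000000000008a45-0x10"
matched = re.fullmatch(AX45_PATTERN, cpuid) is not None
```

A cpuid from a different vendor, such as the thead/c900-legacy row's `0x5b7-0x0-0x0`, would not match this pattern.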
+1 -22
tools/testing/selftests/riscv/mm/mmap_bottomup.c
···
 TEST(infinite_rlimit)
 {
-	// Only works on 64 bit
-#if __riscv_xlen == 64
-	struct addresses mmap_addresses;
-
 	EXPECT_EQ(BOTTOM_UP, memory_layout());
 
-	do_mmaps(&mmap_addresses);
-
-	EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
-
-	EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
-	EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
-	EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
-	EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
-	EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
-	EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
-	EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
-#endif
+	TEST_MMAPS;
 }
 
 TEST_HARNESS_MAIN
+1 -22
tools/testing/selftests/riscv/mm/mmap_default.c
···
 TEST(default_rlimit)
 {
-	// Only works on 64 bit
-#if __riscv_xlen == 64
-	struct addresses mmap_addresses;
-
 	EXPECT_EQ(TOP_DOWN, memory_layout());
 
-	do_mmaps(&mmap_addresses);
-
-	EXPECT_NE(MAP_FAILED, mmap_addresses.no_hint);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_37_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_38_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_46_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_47_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_55_addr);
-	EXPECT_NE(MAP_FAILED, mmap_addresses.on_56_addr);
-
-	EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.no_hint);
-	EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_37_addr);
-	EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_38_addr);
-	EXPECT_GT(1UL << 38, (unsigned long)mmap_addresses.on_46_addr);
-	EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_47_addr);
-	EXPECT_GT(1UL << 47, (unsigned long)mmap_addresses.on_55_addr);
-	EXPECT_GT(1UL << 56, (unsigned long)mmap_addresses.on_56_addr);
-#endif
+	TEST_MMAPS;
 }
 
 TEST_HARNESS_MAIN
+65 -42
tools/testing/selftests/riscv/mm/mmap_test.h
···
 #include <sys/mman.h>
 #include <sys/resource.h>
 #include <stddef.h>
+#include <strings.h>
+#include "../../kselftest_harness.h"
 
 #define TOP_DOWN 0
 #define BOTTOM_UP 1
 
-struct addresses {
-	int *no_hint;
-	int *on_37_addr;
-	int *on_38_addr;
-	int *on_46_addr;
-	int *on_47_addr;
-	int *on_55_addr;
-	int *on_56_addr;
-};
+#if __riscv_xlen == 64
+uint64_t random_addresses[] = {
+	0x19764f0d73b3a9f0, 0x016049584cecef59, 0x3580bdd3562f4acd,
+	0x1164219f20b17da0, 0x07d97fcb40ff2373, 0x76ec528921272ee7,
+	0x4dd48c38a3de3f70, 0x2e11415055f6997d, 0x14b43334ac476c02,
+	0x375a60795aff19f6, 0x47f3051725b8ee1a, 0x4e697cf240494a9f,
+	0x456b59b5c2f9e9d1, 0x101724379d63cb96, 0x7fe9ad31619528c1,
+	0x2f417247c495c2ea, 0x329a5a5b82943a5e, 0x06d7a9d6adcd3827,
+	0x327b0b9ee37f62d5, 0x17c7b1851dfd9b76, 0x006ebb6456ec2cd9,
+	0x00836cd14146a134, 0x00e5c4dcde7126db, 0x004c29feadf75753,
+	0x00d8b20149ed930c, 0x00d71574c269387a, 0x0006ebe4a82acb7a,
+	0x0016135df51f471b, 0x00758bdb55455160, 0x00d0bdd949b13b32,
+	0x00ecea01e7c5f54b, 0x00e37b071b9948b1, 0x0011fdd00ff57ab3,
+	0x00e407294b52f5ea, 0x00567748c200ed20, 0x000d073084651046,
+	0x00ac896f4365463c, 0x00eb0d49a0b26216, 0x0066a2564a982a31,
+	0x002e0d20237784ae, 0x0000554ff8a77a76, 0x00006ce07a54c012,
+	0x000009570516d799, 0x00000954ca15b84d, 0x0000684f0d453379,
+	0x00002ae5816302b5, 0x0000042403fb54bf, 0x00004bad7392bf30,
+	0x00003e73bfa4b5e3, 0x00005442c29978e0, 0x00002803f11286b6,
+	0x000073875d745fc6, 0x00007cede9cb8240, 0x000027df84cc6a4f,
+	0x00006d7e0e74242a, 0x00004afd0b836e02, 0x000047d0e837cd82,
+	0x00003b42405efeda, 0x00001531bafa4c95, 0x00007172cae34ac4,
+};
+#else
+uint32_t random_addresses[] = {
+	0x8dc302e0, 0x929ab1e0, 0xb47683ba, 0xea519c73, 0xa19f1c90, 0xc49ba213,
+	0x8f57c625, 0xadfe5137, 0x874d4d95, 0xaa20f09d, 0xcf21ebfc, 0xda7737f1,
+	0xcedf392a, 0x83026c14, 0xccedca52, 0xc6ccf826, 0xe0cd9415, 0x997472ca,
+	0xa21a44c1, 0xe82196f5, 0xa23fd66b, 0xc28d5590, 0xd009cdce, 0xcf0be646,
+	0x8fc8c7ff, 0xe2a85984, 0xa3d3236b, 0x89a0619d, 0xc03db924, 0xb5d4cc1b,
+	0xb96ee04c, 0xd191da48, 0xb432a000, 0xaa2bebbc, 0xa2fcb289, 0xb0cca89b,
+	0xb0c18d6a, 0x88f58deb, 0xa4d42d1c, 0xe4d74e86, 0x99902b09, 0x8f786d31,
+	0xbec5e381, 0x9a727e65, 0xa9a65040, 0xa880d789, 0x8f1b335e, 0xfc821c1e,
+	0x97e34be4, 0xbbef84ed, 0xf447d197, 0xfd7ceee2, 0xe632348d, 0xee4590f4,
+	0x958992a5, 0xd57e05d6, 0xfd240970, 0xc5b0dcff, 0xd96da2c2, 0xa7ae041d,
+};
+#endif
 
 // Only works on 64 bit
 #if __riscv_xlen == 64
-static inline void do_mmaps(struct addresses *mmap_addresses)
+#define PROT (PROT_READ | PROT_WRITE)
+#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS)
+
+/* mmap must return a value that doesn't use more bits than the hint address. */
+static inline unsigned long get_max_value(unsigned long input)
 {
-	/*
-	 * Place all of the hint addresses on the boundaries of mmap
-	 * sv39, sv48, sv57
-	 * User addresses end at 1<<38, 1<<47, 1<<56 respectively
-	 */
-	void *on_37_bits = (void *)(1UL << 37);
-	void *on_38_bits = (void *)(1UL << 38);
-	void *on_46_bits = (void *)(1UL << 46);
-	void *on_47_bits = (void *)(1UL << 47);
-	void *on_55_bits = (void *)(1UL << 55);
-	void *on_56_bits = (void *)(1UL << 56);
+	unsigned long max_bit = (1UL << (((sizeof(unsigned long) * 8) - 1 -
+					  __builtin_clzl(input))));
 
-	int prot = PROT_READ | PROT_WRITE;
-	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
-
-	mmap_addresses->no_hint =
-		mmap(NULL, 5 * sizeof(int), prot, flags, 0, 0);
-	mmap_addresses->on_37_addr =
-		mmap(on_37_bits, 5 * sizeof(int), prot, flags, 0, 0);
-	mmap_addresses->on_38_addr =
-		mmap(on_38_bits, 5 * sizeof(int), prot, flags, 0, 0);
-	mmap_addresses->on_46_addr =
-		mmap(on_46_bits, 5 * sizeof(int), prot, flags, 0, 0);
-	mmap_addresses->on_47_addr =
-		mmap(on_47_bits, 5 * sizeof(int), prot, flags, 0, 0);
-	mmap_addresses->on_55_addr =
-		mmap(on_55_bits, 5 * sizeof(int), prot, flags, 0, 0);
-	mmap_addresses->on_56_addr =
-		mmap(on_56_bits, 5 * sizeof(int), prot, flags, 0, 0);
+	return max_bit + (max_bit - 1);
 }
+
+#define TEST_MMAPS							\
+	({								\
+		void *mmap_addr;					\
+		for (int i = 0; i < ARRAY_SIZE(random_addresses); i++) { \
+			mmap_addr = mmap((void *)random_addresses[i],	\
+					 5 * sizeof(int), PROT, FLAGS, 0, 0); \
+			EXPECT_NE(MAP_FAILED, mmap_addr);		\
+			EXPECT_GE((void *)get_max_value(random_addresses[i]), \
+				  mmap_addr);				\
+			mmap_addr = mmap((void *)random_addresses[i],	\
+					 5 * sizeof(int), PROT, FLAGS, 0, 0); \
+			EXPECT_NE(MAP_FAILED, mmap_addr);		\
+			EXPECT_GE((void *)get_max_value(random_addresses[i]), \
+				  mmap_addr);				\
+		}							\
+	})
 #endif /* __riscv_xlen == 64 */
 
 static inline int memory_layout(void)
 {
-	int prot = PROT_READ | PROT_WRITE;
-	int flags = MAP_PRIVATE | MAP_ANONYMOUS;
-
-	void *value1 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
-	void *value2 = mmap(NULL, sizeof(int), prot, flags, 0, 0);
+	void *value1 = mmap(NULL, sizeof(int), PROT, FLAGS, 0, 0);
+	void *value2 = mmap(NULL, sizeof(int), PROT, FLAGS, 0, 0);
 
 	return value2 > value1;
 }
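The selftest's `get_max_value()` helper caps the acceptable mmap return value at the hint's highest set bit with all lower bits filled in. The same computation as a Python sketch, mirroring the C logic:

```python
def get_max_value(hint: int) -> int:
    """Largest address that uses no more bits than the hint.

    Mirrors the C helper in mmap_test.h: take the highest set bit
    of the hint, then set every bit below it.
    """
    max_bit = 1 << (hint.bit_length() - 1)
    return max_bit + (max_bit - 1)
```

For a hint of `1 << 40` this yields `(1 << 41) - 1`: a returned mapping may use bits 0 through 40 but never bit 41 or above, matching the new hint semantics described in vm-layout.rst.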