
arm64: Rewrite __arch_clear_user()

Now that we're always using STTR variants rather than abstracting two
different addressing modes, the user_ldst macro here is frankly more
obfuscating than helpful. Rewrite __arch_clear_user() with regular
USER() annotations so that it's clearer what's going on, and take the
opportunity to minimise the branchiness in the most common paths, while
also allowing the exception fixup to return an accurate result.

Apparently some folks examine large reads from /dev/zero closely enough
to notice the loop being hot, so align it per the other critical loops
(presumably around a typical instruction fetch granularity).

Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/1cbd78b12c076a8ad4656a345811cfb9425df0b3.1622128527.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

Authored by Robin Murphy, committed by Will Deacon
344323e0 9e51cafd

+27 -20
arch/arm64/lib/clear_user.S
 /* SPDX-License-Identifier: GPL-2.0-only */
 /*
- * Based on arch/arm/lib/clear_user.S
- *
- * Copyright (C) 2012 ARM Ltd.
+ * Copyright (C) 2021 Arm Ltd.
  */
-#include <linux/linkage.h>
 
-#include <asm/asm-uaccess.h>
+#include <linux/linkage.h>
 #include <asm/assembler.h>
 
 	.text
···
  *
  * Alignment fixed up by hardware.
  */
+
+	.p2align 4
+	// Alignment is for the loop, but since the prologue (including BTI)
+	// is also 16 bytes we can keep any padding outside the function
 SYM_FUNC_START(__arch_clear_user)
-	mov	x2, x1			// save the size for fixup return
+	add	x2, x0, x1
 	subs	x1, x1, #8
 	b.mi	2f
 1:
-user_ldst 9f, sttr, xzr, x0, 8
+USER(9f, sttr	xzr, [x0])
+	add	x0, x0, #8
 	subs	x1, x1, #8
-	b.pl	1b
-2:	adds	x1, x1, #4
-	b.mi	3f
-user_ldst 9f, sttr, wzr, x0, 4
-	sub	x1, x1, #4
-3:	adds	x1, x1, #2
-	b.mi	4f
-user_ldst 9f, sttrh, wzr, x0, 2
-	sub	x1, x1, #2
-4:	adds	x1, x1, #1
-	b.mi	5f
-user_ldst 9f, sttrb, wzr, x0, 0
+	b.hi	1b
+USER(9f, sttr	xzr, [x2, #-8])
+	mov	x0, #0
+	ret
+
+2:	tbz	x1, #2, 3f
+USER(9f, sttr	wzr, [x0])
+USER(8f, sttr	wzr, [x2, #-4])
+	mov	x0, #0
+	ret
+
+3:	tbz	x1, #1, 4f
+USER(9f, sttrh	wzr, [x0])
+4:	tbz	x1, #0, 5f
+USER(7f, sttrb	wzr, [x2, #-1])
 5:	mov	x0, #0
 	ret
 SYM_FUNC_END(__arch_clear_user)
···
 
 	.section .fixup,"ax"
 	.align	2
-9:	mov	x0, x2			// return the original size
+7:	sub	x0, x2, #5	// Adjust for faulting on the final byte...
+8:	add	x0, x0, #4	// ...or the second word of the 4-7 byte case
+9:	sub	x0, x2, x0
 	ret
 	.previous
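
For reference, a small worked example of the new fixup arithmetic, again as a hedged sketch rather than kernel code (helper names are invented): x2 always holds the end of the buffer, so label 9 returns end minus the faulting store address, label 8 first credits the 4 bytes already written by the first word of the 4-7 byte case, and label 7 reduces to exactly one uncleared byte.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* x2 = addr + sz (end of buffer); return value = bytes NOT cleared. */

/* Label 9: the store at address x0 faulted; everything below x0 is done. */
static uint64_t fixup9(uint64_t x2, uint64_t x0) { return x2 - x0; }

/* Label 8: the second word of the 4-7 byte case faulted at [x2, #-4]; the
 * first word at x0 (the unadvanced start) already cleared 4 bytes. */
static uint64_t fixup8(uint64_t x2, uint64_t x0) { return fixup9(x2, x0 + 4); }

/* Label 7: the final byte store at [x2, #-1] faulted; exactly one byte is
 * left. The asm reaches the same value via x0 = x2 - 5, then +4, then x2 - x0. */
static uint64_t fixup7(uint64_t x2) { return fixup9(x2, (x2 - 5) + 4); }

int main(void)
{
	uint64_t addr = 0x1000, sz = 6, end = addr + sz;

	/* 4-7 byte path, second word faults: 6 - 4 = 2 bytes not cleared. */
	printf("%" PRIu64 "\n", fixup8(end, addr));
	/* Final-byte store faults: always 1 byte not cleared. */
	printf("%" PRIu64 "\n", fixup7(end));
	return 0;
}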