Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[ARM] cache align destination pointer when copying memory for some processors

The implementation for memory copy functions on ARM had a (disabled)
provision for aligning the source pointer before loading registers with
data. Turns out that aligning the _destination_ pointer is much more
useful, as the read side is already sufficiently helped with the use of
preload.

So this changes the definition of the CALGN() macro to target the
destination pointer instead, and turns it on for Feroceon processors
where the gain is very noticeable.
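The effect of destination alignment can be sketched in C: copy a short unaligned head first so that the bulk of the stores begins on a cache line boundary, which is what the CALGN() code arranges in assembly. This is an illustrative sketch, not the kernel code; the helper name and the hard-coded 32-byte line size are assumptions for the example.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not in the kernel): align the *destination* to a
 * 32-byte cache line before the bulk copy, analogous to what CALGN()
 * does in the ARM assembly.  The read side is assumed to be handled by
 * prefetch, so only the write side is aligned here. */
static void *copy_dst_aligned(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;
	/* Bytes needed to reach the next 32-byte boundary (0 if aligned),
	 * mirroring "ands ip, r0, #31; rsb r3, ip, #32" in the patch. */
	size_t head = (32 - ((uintptr_t)d & 31)) & 31;

	if (head > n)
		head = n;
	memcpy(d, s, head);                    /* unaligned head */
	memcpy(d + head, s + head, n - head);  /* bulk copy, dst line-aligned */
	return dst;
}
```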

Signed-off-by: Nicolas Pitre <nico@marvell.com>
Signed-off-by: Lennert Buytenhek <buytenh@marvell.com>

Authored by Nicolas Pitre, committed by Lennert Buytenhek
2239aff6 4c4925c1

3 files changed, 19 insertions(+), 20 deletions(-)
arch/arm/lib/copy_template.S | +2 -10
@@ -13,14 +13,6 @@
  */

 /*
- * This can be used to enable code to cacheline align the source pointer.
- * Experiments on tested architectures (StrongARM and XScale) didn't show
- * this a worthwhile thing to do. That might be different in the future.
- */
-//#define CALGN(code...) code
-#define CALGN(code...)
-
-/*
  * Theory of operation
  * -------------------
  *
@@ -74,7 +82,7 @@
 		stmfd	sp!, {r5 - r8}
 		blt	5f

-	CALGN(	ands	ip, r1, #31		)
+	CALGN(	ands	ip, r0, #31		)
 	CALGN(	rsb	r3, ip, #32		)
 	CALGN(	sbcnes	r4, r3, r2		)  @ C is always set here
 	CALGN(	bcs	2f			)
@@ -160,7 +168,7 @@
 		subs	r2, r2, #28
 		blt	14f

-	CALGN(	ands	ip, r1, #31		)
+	CALGN(	ands	ip, r0, #31		)
 	CALGN(	rsb	ip, ip, #32		)
 	CALGN(	sbcnes	r4, ip, r2		)  @ C is always set here
 	CALGN(	subcc	r2, r2, ip		)
arch/arm/lib/memmove.S | +2 -10
@@ -13,14 +13,6 @@
 #include <linux/linkage.h>
 #include <asm/assembler.h>

-/*
- * This can be used to enable code to cacheline align the source pointer.
- * Experiments on tested architectures (StrongARM and XScale) didn't show
- * this a worthwhile thing to do. That might be different in the future.
- */
-//#define CALGN(code...) code
-#define CALGN(code...)
-
 	.text

 /*
@@ -47,7 +55,7 @@
 		stmfd	sp!, {r5 - r8}
 		blt	5f

-	CALGN(	ands	ip, r1, #31		)
+	CALGN(	ands	ip, r0, #31		)
 	CALGN(	sbcnes	r4, ip, r2		)  @ C is always set here
 	CALGN(	bcs	2f			)
 	CALGN(	adr	r4, 6f			)
@@ -131,7 +139,7 @@
 		subs	r2, r2, #28
 		blt	14f

-	CALGN(	ands	ip, r1, #31		)
+	CALGN(	ands	ip, r0, #31		)
 	CALGN(	sbcnes	r4, ip, r2		)  @ C is always set here
 	CALGN(	subcc	r2, r2, ip		)
 	CALGN(	bcc	15f			)
include/asm-arm/assembler.h | +15 -0
@@ -56,6 +56,21 @@
 #endif

 /*
+ * This can be used to enable code to cacheline align the destination
+ * pointer when bulk writing to memory.  Experiments on StrongARM and
+ * XScale didn't show this a worthwhile thing to do when the cache is not
+ * set to write-allocate (this would need further testing on XScale when WA
+ * is used).
+ *
+ * On Feroceon there is much to gain however, regardless of cache mode.
+ */
+#ifdef CONFIG_CPU_FEROCEON
+#define CALGN(code...) code
+#else
+#define CALGN(code...)
+#endif
+
+/*
  * Enable and disable interrupts
  */
 #if __LINUX_ARM_ARCH__ >= 6
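The CALGN() mechanism itself is just a named-variadic macro (a GNU extension, which the kernel relies on) that either passes its body through or swallows it, so the alignment code vanishes entirely on processors that gain nothing from it. A minimal C sketch of the same pattern, with a demo function added for illustration:

```c
/* Sketch of the CALGN() pattern from the patch: when CONFIG_CPU_FEROCEON
 * is defined, the macro emits its body; otherwise the body is discarded
 * at preprocessing time and costs nothing.  The "code..." named-variadic
 * form is a GNU extension, as used in the kernel header. */
#ifdef CONFIG_CPU_FEROCEON
#define CALGN(code...) code
#else
#define CALGN(code...)
#endif

/* Hypothetical demo (not in the patch): reports whether the wrapped
 * statement was compiled in. */
static int calgn_demo(void)
{
	int aligned_path_taken = 0;
	CALGN(aligned_path_taken = 1;)	/* compiled in only for Feroceon */
	return aligned_path_taken;
}
```

Putting the definition in include/asm-arm/assembler.h, keyed on the processor config symbol, replaces the old per-file comment-toggle with a build-time switch.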