Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc32: memcpy: only use dcbz once cache is enabled

memcpy() uses the dcbz instruction to speed up copying by not wasting time
loading cache lines with data that will be overwritten.
Some platforms, like mpc52xx, do not have the cache active at startup and
therefore cannot use memcpy(). Although no part of the code
explicitly uses memcpy(), GCC makes calls to it.
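
As an illustration of such implicit calls (not part of this patch), a plain structure assignment is the kind of code for which GCC may emit a call to memcpy(), depending on the target, the structure size and the optimization level. The names struct boot_params and copy_params() are hypothetical and do not come from the kernel.

```c
/* Hypothetical structure, large enough that a compiler may prefer a
 * library call over inline loads and stores when copying it. */
struct boot_params {
	unsigned long words[64];
};

/* There is no explicit memcpy() here, yet the compiler is allowed to
 * lower this assignment to a memcpy() call. */
void copy_params(struct boot_params *dst, const struct boot_params *src)
{
	*dst = *src;
}
```

Whether a call is actually emitted can be checked by inspecting the generated assembly for the target in question.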

This patch modifies memcpy() such that at startup, memcpy()
unconditionally jumps to generic_memcpy() which doesn't use
the dcbz instruction.

Once the initial MMU is set up, in machine_init() we patch memcpy()
by replacing this unconditional jump with a NOP.
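
The branch-then-NOP scheme described above can be modelled in user space as a rough sketch. This is not kernel code: every model_* name is hypothetical, the "first instruction" is just a variable, and the fast path is a plain byte copy standing in for the dcbz-based body. Only the two instruction encodings (0x48000000 for an unconditional branch with primary opcode 18, 0x60000000 for nop) correspond to real PowerPC values.

```c
#include <stddef.h>
#include <stdint.h>

#define MODEL_INST_NOP    0x60000000u	/* PowerPC nop (ori 0,0,0) */
#define MODEL_INST_BRANCH 0x48000000u	/* primary opcode 18: b */

/* Stand-in for the first instruction word of memcpy(). */
uint32_t model_first_insn = MODEL_INST_BRANCH;

/* Counters so that a caller can observe which path ran. */
unsigned int model_generic_calls, model_fast_calls;

/* Byte-by-byte copy: the role played by generic_memcpy(). */
void *model_generic_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	model_generic_calls++;
	while (n--)
		*d++ = *s++;
	return dst;
}

/* Stand-in for the dcbz-based body of memcpy(). */
void *model_fast_memcpy(void *dst, const void *src, size_t n)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	model_fast_calls++;
	while (n--)
		*d++ = *s++;
	return dst;
}

/* Entry point: "executes" the first instruction word. */
void *model_memcpy(void *dst, const void *src, size_t n)
{
	if ((model_first_insn >> 26) == 18)	/* still a branch */
		return model_generic_memcpy(dst, src, n);
	return model_fast_memcpy(dst, src, n);	/* nop: fall through */
}

/* The role played by patch_instruction(..., PPC_INST_NOP). */
void model_enable_fast_path(void)
{
	model_first_insn = MODEL_INST_NOP;
}
```

In this model, calls made before model_enable_fast_path() take the generic path and calls made afterwards take the fast one, with no per-call flag left in the patched code itself, which is the property the real patch obtains by rewriting a single instruction.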

Reported-by: Michal Sojka <sojkam1@fel.cvut.cz>
Tested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

Authored by LEROY Christophe and committed by Michael Ellerman
1cd03890 36b35d5d

+8
+3
arch/powerpc/kernel/setup_32.c
 #include <asm/udbg.h>
 #include <asm/mmu_context.h>
 #include <asm/epapr_hcalls.h>
+#include <asm/code-patching.h>
 
 #define DBG(fmt...)
 
···
 
 	/* Enable early debugging if any specified (see udbg.h) */
 	udbg_early_init();
+
+	patch_instruction((unsigned int *)&memcpy, PPC_INST_NOP);
 
 	/* Do some early initialization based on the flat device tree */
 	early_init_devtree(__va(dt_ptr));
+5
arch/powerpc/lib/copy_32.S
  * the destination area is cacheable.
  * We only use this version if the source and dest don't overlap.
  * -- paulus.
+ *
+ * During early init, cache might not be active yet, so dcbz cannot be used.
+ * We therefore jump to generic_memcpy which doesn't use dcbz. This jump is
+ * replaced by a nop once cache is active. This is done in machine_init()
  */
 _GLOBAL(memmove)
 	cmplw	0,r3,r4
···
 	/* fall through */
 
 _GLOBAL(memcpy)
+	b	generic_memcpy
 	add	r7,r3,r5		/* test if the src & dst overlap */
 	add	r8,r4,r5
 	cmplw	0,r4,r7