Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ARM: 7983/1: atomics: implement a better __atomic_add_unless for v6+

Looking at perf profiles of multi-threaded hackbench runs, a significant
performance hit appears to manifest from the cmpxchg loop used to
implement the 32-bit atomic_add_unless function. This can be mitigated
by writing a direct implementation of __atomic_add_unless which doesn't
require iteration outside of the atomic operation.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

Authored by Will Deacon, committed by Russell King
db38ee87 d98b90ea

+31 -4
arch/arm/include/asm/atomic.h
···
 	return oldval;
 }
 
+static inline int __atomic_add_unless(atomic_t *v, int a, int u)
+{
+	int oldval, newval;
+	unsigned long tmp;
+
+	smp_mb();
+	prefetchw(&v->counter);
+
+	__asm__ __volatile__ ("@ atomic_add_unless\n"
+"1:	ldrex	%0, [%4]\n"
+"	teq	%0, %5\n"
+"	beq	2f\n"
+"	add	%1, %0, %6\n"
+"	strex	%2, %1, [%4]\n"
+"	teq	%2, #0\n"
+"	bne	1b\n"
+"2:"
+	: "=&r" (oldval), "=&r" (newval), "=&r" (tmp), "+Qo" (v->counter)
+	: "r" (&v->counter), "r" (u), "r" (a)
+	: "cc");
+
+	if (oldval != u)
+		smp_mb();
+
+	return oldval;
+}
+
 #else /* ARM_ARCH_6 */
 
 #ifdef CONFIG_SMP
···
 	return ret;
 }
 
-#endif /* __LINUX_ARM_ARCH__ */
-
-#define atomic_xchg(v, new)	(xchg(&((v)->counter), new))
-
 static inline int __atomic_add_unless(atomic_t *v, int a, int u)
 {
 	int c, old;
···
 	c = old;
 	return c;
 }
+
+#endif /* __LINUX_ARM_ARCH__ */
+
+#define atomic_xchg(v, new)	(xchg(&((v)->counter), new))
 
 #define atomic_inc(v)	atomic_add(1, v)
 #define atomic_dec(v)	atomic_sub(1, v)