Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/atomic: Fix smp_mb__{before,after}_atomic()

Recent probing at the Linux Kernel Memory Model uncovered a
'surprise'. Strongly ordered architectures where the atomic RmW
primitive implies full memory ordering and
smp_mb__{before,after}_atomic() are a simple barrier() (such as x86)
fail for:

*x = 1;
atomic_inc(u);
smp_mb__after_atomic();
r0 = *y;

Because, while the atomic_inc() implies memory order, it
(surprisingly) does not provide a compiler barrier. This then allows
the compiler to re-order like so:

atomic_inc(u);
*x = 1;
smp_mb__after_atomic();
r0 = *y;

Which the CPU is then allowed to re-order (under TSO rules) like:

atomic_inc(u);
r0 = *y;
*x = 1;

And this very much was not intended. Therefore strengthen the atomic
RmW ops to include a compiler barrier.

NOTE: atomic_{or,and,xor} and the bitops already had the compiler
barrier.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Peter Zijlstra, committed by Ingo Molnar
69d927bb dd471efe

13 insertions(+), 10 deletions(-)

Documentation/atomic_t.txt (+3)
@@ -196,6 +196,9 @@
 ordering on their SMP atomic primitives. For example our TSO architectures
 provide full ordered atomics and these barriers are no-ops.
 
+NOTE: when the atomic RmW ops are fully ordered, they should also imply a
+compiler barrier.
+
 Thus:
 
 	atomic_fetch_add();
arch/x86/include/asm/atomic.h (+4 -4)
@@ -54,5 +54,5 @@
 {
 	asm volatile(LOCK_PREFIX "addl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
@@ -68,5 +68,5 @@
 {
 	asm volatile(LOCK_PREFIX "subl %1,%0"
 		     : "+m" (v->counter)
-		     : "ir" (i));
+		     : "ir" (i) : "memory");
 }
@@ -95,7 +95,7 @@
 static __always_inline void arch_atomic_inc(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "incl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_inc arch_atomic_inc
 
@@ -108,7 +108,7 @@
 static __always_inline void arch_atomic_dec(atomic_t *v)
 {
 	asm volatile(LOCK_PREFIX "decl %0"
-		     : "+m" (v->counter));
+		     : "+m" (v->counter) :: "memory");
 }
 #define arch_atomic_dec arch_atomic_dec
 
arch/x86/include/asm/atomic64_64.h (+4 -4)
@@ -45,5 +45,5 @@
 {
 	asm volatile(LOCK_PREFIX "addq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
@@ -59,5 +59,5 @@
 {
 	asm volatile(LOCK_PREFIX "subq %1,%0"
 		     : "=m" (v->counter)
-		     : "er" (i), "m" (v->counter));
+		     : "er" (i), "m" (v->counter) : "memory");
 }
@@ -87,7 +87,7 @@
 {
 	asm volatile(LOCK_PREFIX "incq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_inc arch_atomic64_inc
 
@@ -101,7 +101,7 @@
 {
 	asm volatile(LOCK_PREFIX "decq %0"
 		     : "=m" (v->counter)
-		     : "m" (v->counter));
+		     : "m" (v->counter) : "memory");
 }
 #define arch_atomic64_dec arch_atomic64_dec
 
arch/x86/include/asm/barrier.h (+2 -2)
@@ -80,8 +80,8 @@
 })
 
 /* Atomic operations are already serializing on x86 */
-#define __smp_mb__before_atomic()	barrier()
-#define __smp_mb__after_atomic()	barrier()
+#define __smp_mb__before_atomic()	do { } while (0)
+#define __smp_mb__after_atomic()	do { } while (0)
 
 #include <asm-generic/barrier.h>
 