=======================================================
Semantics and Behavior of Atomic and Bitmask Operations
=======================================================

:Author: David S. Miller

This document is intended to serve as a guide to Linux port
maintainers on how to implement atomic counter, bitops, and spinlock
interfaces properly.

Atomic Type And Operations
==========================

The atomic_t type should be defined as a signed integer and
the atomic_long_t type as a signed long integer. Also, they should
be made opaque such that any kind of cast to a normal C integer type
will fail. Something like the following should suffice::

	typedef struct { int counter; } atomic_t;
	typedef struct { long counter; } atomic_long_t;

Historically, counter has been declared volatile. This is now discouraged.
See :ref:`Documentation/process/volatile-considered-harmful.rst
<volatile_considered_harmful>` for the complete rationale.

local_t is very similar to atomic_t. If the counter is per CPU and only
updated by one CPU, local_t is probably more appropriate. Please see
:ref:`Documentation/core-api/local_ops.rst <local_ops>` for the semantics of
local_t.

The first operations to implement for atomic_t's are the initializers and
plain reads.
::

	#define ATOMIC_INIT(i)		{ (i) }
	#define atomic_set(v, i)	((v)->counter = (i))

The first macro is used in definitions, such as::

	static atomic_t my_counter = ATOMIC_INIT(1);

The initializer is atomic in that the return values of the atomic operations
are guaranteed to be correct reflecting the initialized value if the
initializer is used before runtime.  If the initializer is used at runtime, a
proper implicit or explicit read memory barrier is needed before reading the
value with atomic_read from another thread.

As with all of the ``atomic_`` interfaces, replace the leading ``atomic_``
with ``atomic_long_`` to operate on atomic_long_t.

The second interface can be used at runtime, as in::

	struct foo { atomic_t counter; };
	...

	struct foo *k;

	k = kmalloc(sizeof(*k), GFP_KERNEL);
	if (!k)
		return -ENOMEM;
	atomic_set(&k->counter, 0);

The setting is atomic in that the return values of the atomic operations by
all threads are guaranteed to be correct reflecting either the value that has
been set with this operation or some other value if it has been set with some
other operation.  A proper implicit or explicit memory barrier is needed
before the value set with the operation is guaranteed to be readable with
atomic_read from another thread.

Next, we have::

	#define atomic_read(v)	((v)->counter)

which simply reads the counter value currently visible to the calling thread.
The read is atomic in that the return value is guaranteed to be one of the
values initialized or modified with the interface operations if a proper
implicit or explicit memory barrier is used after possible runtime
initialization by any other thread and the value is modified only with the
interface operations.  atomic_read does not guarantee that the runtime
initialization by any other thread is visible yet, so the user of the
interface must take care of that with a proper implicit or explicit memory
barrier.

.. warning::
	``atomic_read()`` and ``atomic_set()`` DO NOT IMPLY BARRIERS!

	Some architectures may choose to use the volatile keyword, barriers, or
	inline assembly to guarantee some degree of immediacy for atomic_read()
	and atomic_set().  This is not uniformly guaranteed, and may change in
	the future, so all users of atomic_t should treat atomic_read() and
	atomic_set() as simple C statements that may be reordered or optimized
	away entirely by the compiler or processor, and explicitly invoke the
	appropriate compiler and/or memory barrier for each use case.  Failure
	to do so will result in code that may suddenly break when used with
	different architectures or compiler optimizations, or even with changes
	in unrelated code that alter how the compiler optimizes the section
	accessing atomic_t variables.

Properly aligned pointers, longs, ints, and chars (and unsigned
equivalents) may be atomically loaded from and stored to in the same
way.  The READ_ONCE() and WRITE_ONCE() macros should be used to prevent
the compiler from using optimizations that might otherwise optimize
accesses out of existence on the one hand, or that might create
unsolicited accesses on the other.

For example consider the following code::

	while (a > 0)
		do_something();

If the compiler can prove that do_something() does not store to the
variable a, then the compiler is within its rights transforming this to
the following::

	tmp = a;
	if (tmp > 0)
		for (;;)
			do_something();

If you don't want the compiler to do this (and you probably don't), then
you
should use something like the following::

	while (READ_ONCE(a) > 0)
		do_something();

Alternatively, you could place a barrier() call in the loop.

For another example, consider the following code::

	tmp_a = a;
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

If the compiler can prove that do_something_with() does not store to the
variable a, then the compiler is within its rights to manufacture an
additional load as follows::

	tmp_a = a;
	do_something_with(tmp_a);
	tmp_a = a;
	do_something_else_with(tmp_a);

This could fatally confuse your code if it expected the same value
to be passed to do_something_with() and do_something_else_with().

The compiler would be likely to manufacture this additional load if
do_something_with() was an inline function that made very heavy use
of registers: reloading from variable a could save a flush to the
stack and later reload.  To prevent the compiler from attacking your
code in this manner, write the following::

	tmp_a = READ_ONCE(a);
	do_something_with(tmp_a);
	do_something_else_with(tmp_a);

For a final example, consider the following code, assuming that the
variable a is set at boot time before the second CPU is brought online
and never changed later, so that memory barriers are not needed::

	if (a)
		b = 9;
	else
		b = 42;

The compiler is within its rights to manufacture an additional store
by transforming the above code into the following::

	b = 42;
	if (a)
		b = 9;

This could come as a fatal surprise to other code running concurrently
that expected b to never have the value 42 if a was zero.
To prevent
the compiler from doing this, write something like::

	if (a)
		WRITE_ONCE(b, 9);
	else
		WRITE_ONCE(b, 42);

Don't even -think- about doing this without proper use of memory barriers,
locks, or atomic operations if variable a can change at runtime!

.. warning::

	``READ_ONCE()`` OR ``WRITE_ONCE()`` DO NOT IMPLY A BARRIER!

Now, we move onto the atomic operation interfaces typically implemented with
the help of assembly code. ::

	void atomic_add(int i, atomic_t *v);
	void atomic_sub(int i, atomic_t *v);
	void atomic_inc(atomic_t *v);
	void atomic_dec(atomic_t *v);

These four routines add and subtract integral values to/from the given
atomic_t value.  The first two routines pass explicit integers by
which to make the adjustment, whereas the latter two use an implicit
adjustment value of "1".

One very important aspect of these routines is that they DO NOT
require any explicit memory barriers.  They need only perform the
atomic_t counter update in an SMP safe manner.

Next, we have::

	int atomic_inc_return(atomic_t *v);
	int atomic_dec_return(atomic_t *v);

These routines add 1 and subtract 1, respectively, from the given
atomic_t and return the new counter value after the operation is
performed.

Unlike the above routines, it is required that these primitives
include explicit memory barriers that are performed before and after
the operation.  It must be done such that all memory operations before
and after the atomic operation calls are strongly ordered with respect
to the atomic operation itself.

For example, it should behave as if a smp_mb() call existed both
before and after the atomic operation.

If the atomic instructions used in an implementation provide explicit
memory barrier semantics which satisfy the above requirements, that is
fine as well.

Let's move on::

	int atomic_add_return(int i, atomic_t *v);
	int atomic_sub_return(int i, atomic_t *v);

These behave just like atomic_{inc,dec}_return() except that an
explicit counter adjustment is given instead of the implicit "1".
This means that like atomic_{inc,dec}_return(), the memory barrier
semantics are required.

Next::

	int atomic_inc_and_test(atomic_t *v);
	int atomic_dec_and_test(atomic_t *v);

These two routines increment and decrement by 1, respectively, the
given atomic counter.  They return a boolean indicating whether the
resulting counter value was zero or not.

Again, these primitives provide explicit memory barrier semantics around
the atomic operation::

	int atomic_sub_and_test(int i, atomic_t *v);

This is identical to atomic_dec_and_test() except that an explicit
decrement is given instead of the implicit "1".
This primitive must
provide explicit memory barrier semantics around the operation::

	int atomic_add_negative(int i, atomic_t *v);

The given increment is added to the given atomic counter value.  A
boolean is returned which indicates whether the resulting counter
value is negative.  This primitive must provide explicit memory
barrier semantics around the operation.

Then::

	int atomic_xchg(atomic_t *v, int new);

This performs an atomic exchange operation on the atomic variable v, setting
the given new value.  It returns the old value that the atomic variable v had
just before the operation.

atomic_xchg must provide explicit memory barriers around the operation. ::

	int atomic_cmpxchg(atomic_t *v, int old, int new);

This performs an atomic compare exchange operation on the atomic value v,
with the given old and new values. Like all atomic_xxx operations,
atomic_cmpxchg will only satisfy its atomicity semantics as long as all
other accesses of \*v are performed through atomic_xxx operations.

atomic_cmpxchg must provide explicit memory barriers around the operation,
although if the comparison fails then no memory ordering guarantees are
required.

The semantics for atomic_cmpxchg are the same as those defined for 'cas'
below.

Finally::

	int atomic_add_unless(atomic_t *v, int a, int u);

If the atomic value v is not equal to u, this function adds a to v, and
returns non-zero. If v is equal to u then it returns zero. This does some
additional work to provide explicit memory barrier semantics around the
operation unless v was equal to u.

atomic_inc_not_zero, equivalent to atomic_add_unless(v, 1, 0)

If a caller requires memory barrier semantics around an atomic_t
operation which does not return a value, a set of interfaces are
defined which accomplish this::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

For example, smp_mb__before_atomic() can be used like so::

	obj->dead = 1;
	smp_mb__before_atomic();
	atomic_dec(&obj->ref_count);

It makes sure that all memory operations preceding the atomic_dec()
call are strongly ordered with respect to the atomic counter
operation.  In the above example, it guarantees that the assignment of
"1" to obj->dead will be globally visible to other cpus before the
atomic counter decrement.

Without the explicit smp_mb__before_atomic() call, the
implementation could legally allow the atomic counter update to become
visible to other cpus before the "obj->dead = 1;" assignment.

A missing memory barrier in the cases where they are required by the
atomic_t implementation above can have disastrous results.  Here is
an example, which follows a pattern occurring frequently in the Linux
kernel.  It is the use of atomic counters to implement reference
counting, and it works such that once the counter falls to zero it can
be guaranteed that no other entity can be accessing the object::

	static void obj_list_add(struct obj *obj, struct list_head *head)
	{
		obj->active = 1;
		list_add(&obj->list, head);
	}

	static void obj_list_del(struct obj *obj)
	{
		list_del(&obj->list);
		obj->active = 0;
	}

	static void obj_destroy(struct obj *obj)
	{
		BUG_ON(obj->active);
		kfree(obj);
	}

	struct obj *obj_list_peek(struct list_head *head)
	{
		if (!list_empty(head)) {
			struct obj *obj;

			obj = list_entry(head->next, struct obj, list);
			atomic_inc(&obj->refcnt);
			return obj;
		}
		return NULL;
	}

	void obj_poke(void)
	{
		struct obj *obj;

		spin_lock(&global_list_lock);
		obj = obj_list_peek(&global_list);
		spin_unlock(&global_list_lock);

		if (obj) {
			obj->ops->poke(obj);
			if (atomic_dec_and_test(&obj->refcnt))
				obj_destroy(obj);
		}
	}
	void obj_timeout(struct obj *obj)
	{
		spin_lock(&global_list_lock);
		obj_list_del(obj);
		spin_unlock(&global_list_lock);

		if (atomic_dec_and_test(&obj->refcnt))
			obj_destroy(obj);
	}

.. note::

	This is a simplification of the ARP queue management in the generic
	neighbour discovery code of the networking stack. Olaf Kirch found a
	bug with respect to memory barriers in kfree_skb() that exposed the
	atomic_t memory barrier requirements quite clearly.

Given the above scheme, it must be the case that the obj->active
update done by the obj list deletion be visible to other processors
before the atomic counter decrement is performed.

Otherwise, the counter could fall to zero, yet obj->active would still
be set, thus triggering the assertion in obj_destroy().  The error
sequence looks like this::

	cpu 0				cpu 1
	obj_poke()			obj_timeout()
	obj = obj_list_peek();
	... gains ref to obj, refcnt=2
					obj_list_del(obj);
					obj->active = 0 ...
					... visibility delayed ...
					atomic_dec_and_test()
					... refcnt drops to 1 ...
	atomic_dec_and_test()
	... refcount drops to 0 ...
	obj_destroy()
	BUG() triggers since obj->active
	still seen as one
					obj->active update visibility occurs

With the memory barrier semantics required of the atomic_t operations
which return values, the above sequence of memory visibility can never
happen.  Specifically, in the above case the atomic_dec_and_test()
counter decrement would not become globally visible until the
obj->active update does.

As a historical note, 32-bit Sparc used to only allow usage of
24-bits of its atomic_t type.  This was because it used 8 bits
as a spinlock for SMP safety.  Sparc32 lacked a "compare and swap"
type instruction.  However, 32-bit Sparc has since been moved over
to a "hash table of spinlocks" scheme, which allows the full 32-bit
counter to be realized.  Essentially, an array of spinlocks is
indexed into based upon the address of the atomic_t being operated
on, and that lock protects the atomic operation.  Parisc uses the
same scheme.

Another note is that the atomic_t operations returning values are
extremely slow on an old 386.


Atomic Bitmask
==============

We will now cover the atomic bitmask operations.
You will find that
their SMP and memory barrier semantics are similar in shape and scope
to the atomic_t ops above.

Native atomic bit operations are defined to operate on objects aligned
to the size of an "unsigned long" C data type, and are at least of that
size.  The endianness of the bits within each "unsigned long" is the
native endianness of the cpu. ::

	void set_bit(unsigned long nr, volatile unsigned long *addr);
	void clear_bit(unsigned long nr, volatile unsigned long *addr);
	void change_bit(unsigned long nr, volatile unsigned long *addr);

These routines set, clear, and change, respectively, the bit number
indicated by "nr" on the bit mask pointed to by "addr".

They must execute atomically, yet there are no implicit memory barrier
semantics required of these interfaces. ::

	int test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

Like the above, except that these routines return a boolean which
indicates whether the changed bit was set _BEFORE_ the atomic bit
operation.

WARNING! It is incredibly important that the value be a boolean,
ie. "0" or "1".  Do not try to be fancy and save a few instructions by
declaring the above to return "long" and just returning something like
"old_val & mask" because that will not work.

These routines, like the atomic_t counter operations returning values,
must provide explicit memory barrier semantics around their execution.
All memory operations before the atomic bit operation call must be
made visible globally before the atomic bit operation is made visible.
Likewise, the atomic bit operation must be visible globally before any
subsequent memory operation is made visible.
For example::

	obj->dead = 1;
	if (test_and_set_bit(0, &obj->flags))
		/* ... */;
	obj->killed = 1;

The implementation of test_and_set_bit() must guarantee that
"obj->dead = 1;" is visible to cpus before the atomic memory operation
done by test_and_set_bit() becomes visible.  Likewise, the atomic
memory operation done by test_and_set_bit() must become visible before
"obj->killed = 1;" is visible.

Finally there is the basic operation::

	int test_bit(unsigned long nr, __const__ volatile unsigned long *addr);

Which returns a boolean indicating if bit "nr" is set in the bitmask
pointed to by "addr".

If explicit memory barriers are required around {set,clear}_bit() (which do
not return a value, and thus do not need to provide memory barrier
semantics), two interfaces are provided::

	void smp_mb__before_atomic(void);
	void smp_mb__after_atomic(void);

They are used as follows, and are akin to their atomic_t operation
brothers::

	/* All memory operations before this call will
	 * be globally visible before the clear_bit().
	 */
	smp_mb__before_atomic();
	clear_bit( ... );

	/* The clear_bit() will be visible before all
	 * subsequent memory operations.
	 */
	smp_mb__after_atomic();

There are two special bitops with lock barrier semantics (acquire/release,
same as spinlocks). These operate in the same way as their non-_lock/unlock
postfixed variants, except that they are to provide acquire/release semantics,
respectively. This means they can be used for bit_spin_trylock and
bit_spin_unlock type operations without specifying any more barriers. ::

	int test_and_set_bit_lock(unsigned long nr, unsigned long *addr);
	void clear_bit_unlock(unsigned long nr, unsigned long *addr);
	void __clear_bit_unlock(unsigned long nr, unsigned long *addr);

The __clear_bit_unlock version is non-atomic, however it still implements
unlock barrier semantics. This can be useful if the lock itself is protecting
the other bits in the word.

Finally, there are non-atomic versions of the bitmask operations
provided.  They are used in contexts where some other higher-level SMP
locking scheme is being used to protect the bitmask, and thus less
expensive non-atomic operations may be used in the implementation.
They have names similar to the above bitmask operation interfaces,
except that two underscores are prefixed to the interface name.
::

	void __set_bit(unsigned long nr, volatile unsigned long *addr);
	void __clear_bit(unsigned long nr, volatile unsigned long *addr);
	void __change_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_set_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_clear_bit(unsigned long nr, volatile unsigned long *addr);
	int __test_and_change_bit(unsigned long nr, volatile unsigned long *addr);

These non-atomic variants also do not require any special memory
barrier semantics.

The routines xchg() and cmpxchg() must provide the same exact
memory-barrier semantics as the atomic and bit operations returning
values.

.. note::

	If someone wants to use xchg(), cmpxchg() and their variants,
	linux/atomic.h should be included rather than asm/cmpxchg.h, unless the
	code is in arch/* and can take care of itself.

Spinlocks and rwlocks have memory barrier expectations as well.
The rule to follow is simple:

1) When acquiring a lock, the implementation must make it
   globally visible before any subsequent memory operation.

2) When releasing a lock, the implementation must make it such that
   all previous memory operations are globally visible before the
   lock release.

Which finally brings us to _atomic_dec_and_lock().  There is an
architecture-neutral version implemented in lib/dec_and_lock.c,
but most platforms will wish to optimize this in assembler.
::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock);

Atomically decrement the given counter, and if it will drop to zero
atomically acquire the given spinlock and perform the decrement
of the counter to zero.  If it does not drop to zero, do nothing
with the spinlock.

It is actually slightly more complex than that: per the lock
acquisition rule above, the spinlock acquisition must also be made
globally visible before any subsequent memory operation.

We can demonstrate this operation more clearly if we define
an abstract atomic operation::

	long cas(long *mem, long old, long new);

"cas" stands for "compare and swap".  It operates as follows:

1) If the current value at "mem" is equal to "old", then "new" is
   stored at "mem".

2) The comparison and the (possible) store are performed as a single
   atomic unit; no other access to "mem" can intervene between them.

3) Regardless, the current value at "mem" is returned.

As an example usage, here is what an atomic counter update
might look like::

	void example_atomic_inc(long *counter)
	{
		long old, new, ret;

		while (1) {
			old = *counter;
			new = old + 1;

			ret = cas(counter, old, new);
			if (ret == old)
				break;
		}
	}

Let's use cas() in order to build a pseudo-C atomic_dec_and_lock()::

	int _atomic_dec_and_lock(atomic_t *atomic, spinlock_t *lock)
	{
		long old, new, ret;
		int went_to_zero;

		went_to_zero = 0;
		while (1) {
			old = atomic_read(atomic);
			new = old - 1;
			if (new == 0) {
				went_to_zero = 1;
				spin_lock(lock);
			}
			ret = cas(atomic, old, new);
			if (ret == old)
				break;
			if (went_to_zero) {
				spin_unlock(lock);
				went_to_zero = 0;
			}
		}

		return went_to_zero;
	}

Now, as far as memory barriers go, as long as spin_lock()
strictly orders all subsequent memory operations (including
the cas()) with respect to itself, things will be fine.

Said another way, _atomic_dec_and_lock() must guarantee that
a counter dropping to zero is never made visible before the
spinlock being acquired.

.. note::

	Note that this also means that for the case where the counter is not
	dropping to zero, there are no memory ordering requirements.