
refcount: provide ops for cases when object's memory can be reused

For speculative lookups where a successful inc_not_zero() pins the object,
but where we still need to double check that the object acquired is indeed
the one we set out to acquire (identity check), this validation needs to
happen *after* the increment. Similarly, when a new object is initialized
and its memory might have been previously occupied by another object, all
stores initializing the object must happen *before* the refcount
initialization.

Notably, SLAB_TYPESAFE_BY_RCU is one example where this ordering is
required for reference counting.

Add refcount_{add|inc}_not_zero_acquire() to guarantee the proper ordering
between acquiring a reference count on an object and performing the
identity check for that object.

Add refcount_set_release() to guarantee proper ordering between stores
initializing object attributes and the store initializing the refcount.
refcount_set_release() should be done after all other object attributes
are initialized. Once refcount_set_release() is called, the object should
be considered visible to other tasks even if it was not yet added into an
object collection normally used to discover it. This is because other
tasks might have discovered the object previously occupying the same
memory, and after memory reuse they can succeed in taking a refcount on
the new object and start using it.

Object reuse example to consider:

consumer:
obj = lookup(collection, key);
if (!refcount_inc_not_zero_acquire(&obj->ref))
return;
if (READ_ONCE(obj->key) != key) { /* identity check */
put_ref(obj);
return;
}
use(obj->value);

producer:
remove(collection, obj->key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
obj = malloc(); /* obj is reused */
obj->key = new_key;
obj->value = new_value;
refcount_set_release(&obj->ref, 1);
add(collection, new_key, obj);

refcount_{add|inc}_not_zero_acquire() is required to prevent the following
reordering when refcount_inc_not_zero() is used instead:

consumer:
obj = lookup(collection, key);
if (READ_ONCE(obj->key) != key) { /* reordered identity check */
put_ref(obj);
return;
}
producer:
remove(collection, obj->key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
obj = malloc(); /* obj is reused */
obj->key = new_key;
obj->value = new_value;
refcount_set_release(&obj->ref, 1);
add(collection, new_key, obj);

if (!refcount_inc_not_zero(&obj->ref))
return;
use(obj->value); /* USING WRONG OBJECT */

refcount_set_release() is required to prevent the following reordering
when refcount_set() is used instead:

consumer:
obj = lookup(collection, key);

producer:
remove(collection, obj->key);
if (!refcount_dec_and_test(&obj->ref))
return;
obj->key = KEY_INVALID;
free(obj);
obj = malloc(); /* obj is reused */
obj->key = new_key; /* new_key == old_key */
refcount_set(&obj->ref, 1);

if (!refcount_inc_not_zero_acquire(&obj->ref))
return;
if (READ_ONCE(obj->key) != key) { /* pass since new_key == old_key */
put_ref(obj);
return;
}
use(obj->value); /* USING STALE obj->value */

obj->value = new_value; /* reordered store */
add(collection, key, obj);

[surenb@google.com: fix title underlines in refcount-vs-atomic.rst]
Link: https://lkml.kernel.org/r/20250217161645.3137927-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250213224655.1680278-11-surenb@google.com
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz> [slab]
Tested-by: Shivank Garg <shivankg@amd.com>
Link: https://lkml.kernel.org/r/5e19ec93-8307-47c2-bb13-3ddf7150624e@amd.com
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Klara Modin <klarasmodin@gmail.com>
Cc: Liam R. Howlett <Liam.Howlett@Oracle.com>
Cc: Lokesh Gidra <lokeshgidra@google.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Mateusz Guzik <mjguzik@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Sourav Panda <souravpanda@google.com>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Suren Baghdasaryan; committed by Andrew Morton
7f8ceea0 45ad9f52
+156 -6
+10
Documentation/RCU/whatisRCU.rst

@@ -971,6 +971,16 @@
 initialized after each and every call to kmem_cache_alloc(), which renders
 reference-free spinlock acquisition completely unsafe.  Therefore, when
 using ``SLAB_TYPESAFE_BY_RCU``, make proper use of a reference counter.
+If using refcount_t, the specialized refcount_{add|inc}_not_zero_acquire()
+and refcount_set_release() APIs should be used to ensure correct operation
+ordering when verifying object identity and when initializing newly
+allocated objects. Acquire fence in refcount_{add|inc}_not_zero_acquire()
+ensures that identity checks happen *after* reference count is taken.
+refcount_set_release() should be called after a newly allocated object is
+fully initialized and release fence ensures that new values are visible
+*before* refcount can be successfully taken by other users. Once
+refcount_set_release() is called, the object should be considered visible
+by other tasks.
 (Those willing to initialize their locks in a kmem_cache constructor
 may also use locking, including cache-friendly sequence locking.)
+31 -6
Documentation/core-api/refcount-vs-atomic.rst

@@ -86,7 +86,19 @@
 * none (both fully unordered)


-case 2) - increment-based ops that return no value
+case 2) - non-"Read/Modify/Write" (RMW) ops with release ordering
+-----------------------------------------------------------------
+
+Function changes:
+
+ * atomic_set_release() --> refcount_set_release()
+
+Memory ordering guarantee changes:
+
+ * none (both provide RELEASE ordering)
+
+
+case 3) - increment-based ops that return no value
 --------------------------------------------------

 Function changes:
@@ -98,7 +110,7 @@

 * none (both fully unordered)

-case 3) - decrement-based RMW ops that return no value
+case 4) - decrement-based RMW ops that return no value
 ------------------------------------------------------

 Function changes:
@@ -110,7 +122,7 @@
 * fully unordered --> RELEASE ordering


-case 4) - increment-based RMW ops that return a value
+case 5) - increment-based RMW ops that return a value
 -----------------------------------------------------

 Function changes:
@@ -126,7 +138,20 @@
 result of obtaining pointer to the object!


-case 5) - generic dec/sub decrement-based RMW ops that return a value
+case 6) - increment-based RMW ops with acquire ordering that return a value
+---------------------------------------------------------------------------
+
+Function changes:
+
+ * atomic_inc_not_zero() --> refcount_inc_not_zero_acquire()
+ * no atomic counterpart --> refcount_add_not_zero_acquire()
+
+Memory ordering guarantees changes:
+
+ * fully ordered --> ACQUIRE ordering on success
+
+
+case 7) - generic dec/sub decrement-based RMW ops that return a value
 ---------------------------------------------------------------------

 Function changes:
@@ -139,7 +164,7 @@
 * fully ordered --> RELEASE ordering + ACQUIRE ordering on success


-case 6) other decrement-based RMW ops that return a value
+case 8) other decrement-based RMW ops that return a value
 ---------------------------------------------------------

 Function changes:
@@ -154,7 +179,7 @@
 .. note:: atomic_add_unless() only provides full order on success.


-case 7) - lock-based RMW
+case 9) - lock-based RMW
 ------------------------

 Function changes:
+106
include/linux/refcount.h

@@ -87,6 +87,15 @@
 * The decrements dec_and_test() and sub_and_test() also provide acquire
 * ordering on success.
 *
+ * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() provide
+ * acquire and release ordering for cases when the memory occupied by the
+ * object might be reused to store another object. This is important for the
+ * cases where secondary validation is required to detect such reuse, e.g.
+ * SLAB_TYPESAFE_BY_RCU. The secondary validation checks have to happen after
+ * the refcount is taken, hence acquire order is necessary. Similarly, when the
+ * object is initialized, all stores to its attributes should be visible before
+ * the refcount is set, otherwise a stale attribute value might be used by
+ * another task which succeeds in taking a refcount to the new object.
 */

 #ifndef _LINUX_REFCOUNT_H
@@ -123,6 +132,31 @@
 static inline void refcount_set(refcount_t *r, int n)
 {
 	atomic_set(&r->refs, n);
+}
+
+/**
+ * refcount_set_release - set a refcount's value with release ordering
+ * @r: the refcount
+ * @n: value to which the refcount will be set
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides release memory ordering which will order previous memory operations
+ * against this store. This ensures all updates to this object are visible
+ * once the refcount is set and stale values from the object previously
+ * occupying this memory are overwritten with new ones.
+ *
+ * This function should be called only after new object is fully initialized.
+ * After this call the object should be considered visible to other tasks even
+ * if it was not yet added into an object collection normally used to discover
+ * it. This is because other tasks might have discovered the object previously
+ * occupying the same memory and after memory reuse they can succeed in taking
+ * refcount to the new object and start using it.
+ */
+static inline void refcount_set_release(refcount_t *r, int n)
+{
+	atomic_set_release(&r->refs, n);
 }

 /**
@@ -176,6 +210,52 @@
 static inline __must_check bool refcount_add_not_zero(int i, refcount_t *r)
 {
 	return __refcount_add_not_zero(i, r, NULL);
+}
+
+static inline __must_check __signed_wrap
+bool __refcount_add_not_zero_acquire(int i, refcount_t *r, int *oldp)
+{
+	int old = refcount_read(r);
+
+	do {
+		if (!old)
+			break;
+	} while (!atomic_try_cmpxchg_acquire(&r->refs, &old, old + i));
+
+	if (oldp)
+		*oldp = old;
+
+	if (unlikely(old < 0 || old + i < 0))
+		refcount_warn_saturate(r, REFCOUNT_ADD_NOT_ZERO_OVF);
+
+	return old;
+}
+
+/**
+ * refcount_add_not_zero_acquire - add a value to a refcount with acquire ordering unless it is 0
+ *
+ * @i: the value to add to the refcount
+ * @r: the refcount
+ *
+ * Will saturate at REFCOUNT_SATURATED and WARN.
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides acquire memory ordering on success, it is assumed the caller has
+ * guaranteed the object memory to be stable (RCU, etc.). It does provide a
+ * control dependency and thereby orders future stores. See the comment on top.
+ *
+ * Use of this function is not recommended for the normal reference counting
+ * use case in which references are taken and released one at a time. In these
+ * cases, refcount_inc_not_zero_acquire() should instead be used to increment a
+ * reference count.
+ *
+ * Return: false if the passed refcount is 0, true otherwise
+ */
+static inline __must_check bool refcount_add_not_zero_acquire(int i, refcount_t *r)
+{
+	return __refcount_add_not_zero_acquire(i, r, NULL);
 }

 static inline __signed_wrap
@@ -234,6 +314,32 @@
 static inline __must_check bool refcount_inc_not_zero(refcount_t *r)
 {
 	return __refcount_inc_not_zero(r, NULL);
+}
+
+static inline __must_check bool __refcount_inc_not_zero_acquire(refcount_t *r, int *oldp)
+{
+	return __refcount_add_not_zero_acquire(1, r, oldp);
+}
+
+/**
+ * refcount_inc_not_zero_acquire - increment a refcount with acquire ordering unless it is 0
+ * @r: the refcount to increment
+ *
+ * Similar to refcount_inc_not_zero(), but provides acquire memory ordering on
+ * success.
+ *
+ * This function should be used when memory occupied by the object might be
+ * reused to store another object -- consider SLAB_TYPESAFE_BY_RCU.
+ *
+ * Provides acquire memory ordering on success, it is assumed the caller has
+ * guaranteed the object memory to be stable (RCU, etc.). It does provide a
+ * control dependency and thereby orders future stores. See the comment on top.
+ *
+ * Return: true if the increment was successful, false otherwise
+ */
+static inline __must_check bool refcount_inc_not_zero_acquire(refcount_t *r)
+{
+	return __refcount_inc_not_zero_acquire(r, NULL);
 }

 static inline void __refcount_inc(refcount_t *r, int *oldp)
+9
include/linux/slab.h

@@ -136,6 +136,15 @@
 * rcu_read_lock before reading the address, then rcu_read_unlock after
 * taking the spinlock within the structure expected at that address.
 *
+ * Note that object identity check has to be done *after* acquiring a
+ * reference, therefore user has to ensure proper ordering for loads.
+ * Similarly, when initializing objects allocated with SLAB_TYPESAFE_BY_RCU,
+ * the newly allocated object has to be fully initialized *before* its
+ * refcount gets initialized and proper ordering for stores is required.
+ * refcount_{add|inc}_not_zero_acquire() and refcount_set_release() are
+ * designed with the proper fences required for reference counting objects
+ * allocated with SLAB_TYPESAFE_BY_RCU.
+ *
 * Note that it is not possible to acquire a lock within a structure
 * allocated with SLAB_TYPESAFE_BY_RCU without first acquiring a reference
 * as described above. The reason is that SLAB_TYPESAFE_BY_RCU pages