Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rcuref: Plug slowpath race in rcuref_put()

Kernel test robot reported an "imbalanced put" in the rcuref_put() slow
path, which turned out to be a false positive. Consider the following race:

ref = 0 (via rcuref_init(ref, 1))

T1                                      T2
rcuref_put(ref)
 -> atomic_add_negative_release(-1, &ref->refcnt) # ref -> 0xffffffff
 -> rcuref_put_slowpath(ref)
                                        rcuref_get(ref)
                                        -> atomic_add_negative_relaxed(1, &ref->refcnt)
                                        -> return true; # ref -> 0

                                        rcuref_put(ref)
                                        -> atomic_add_negative_release(-1, &ref->refcnt) # ref -> 0xffffffff
                                        -> rcuref_put_slowpath()

 -> cnt = atomic_read(&ref->refcnt); # cnt -> 0xffffffff / RCUREF_NOREF
 -> atomic_try_cmpxchg_release(&ref->refcnt, &cnt, RCUREF_DEAD) # ref -> 0xe0000000 / RCUREF_DEAD
 -> return true
                                        -> cnt = atomic_read(&ref->refcnt); # cnt -> 0xe0000000 / RCUREF_DEAD
                                        -> if (cnt > RCUREF_RELEASED) # 0xe0000000 > 0xc0000000
                                           -> WARN_ONCE(cnt >= RCUREF_RELEASED, "rcuref - imbalanced put()")

The problem is the additional read in the slow path: after the fastpath
has decremented the counter to RCUREF_NOREF, a concurrent final put() can
mark the counter RCUREF_DEAD before the slow path re-reads it. The re-read
then observes RCUREF_DEAD and triggers the bogus warning.

Prevent this by handing the return value of the decrement to the slow path
instead of re-reading the counter there. Now every "final" put observes
RCUREF_NOREF in the slow path and attempts the final cmpxchg() to
RCUREF_DEAD.

[ bigeasy: Add changelog ]

Fixes: ee1ee6db07795 ("atomics: Provide rcuref - scalable reference counting")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202412311453.9d7636a2-lkp@intel.com
Debugged-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: stable@vger.kernel.org

---
 include/linux/rcuref.h | 9 ++++++---
 lib/rcuref.c           | 5 ++---
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/include/linux/rcuref.h b/include/linux/rcuref.h
--- a/include/linux/rcuref.h
+++ b/include/linux/rcuref.h
@@ -71,27 +71,30 @@
 	return rcuref_get_slowpath(ref);
 }
 
-extern __must_check bool rcuref_put_slowpath(rcuref_t *ref);
+extern __must_check bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt);
 
 /*
  * Internal helper. Do not invoke directly.
  */
 static __always_inline __must_check bool __rcuref_put(rcuref_t *ref)
 {
+	int cnt;
+
 	RCU_LOCKDEP_WARN(!rcu_read_lock_held() && preemptible(),
 			 "suspicious rcuref_put_rcusafe() usage");
 	/*
 	 * Unconditionally decrease the reference count. The saturation and
 	 * dead zones provide enough tolerance for this.
 	 */
-	if (likely(!atomic_add_negative_release(-1, &ref->refcnt)))
+	cnt = atomic_sub_return_release(1, &ref->refcnt);
+	if (likely(cnt >= 0))
 		return false;
 
 	/*
 	 * Handle the last reference drop and cases inside the saturation
 	 * and dead zones.
 	 */
-	return rcuref_put_slowpath(ref);
+	return rcuref_put_slowpath(ref, cnt);
 }
 
 /**
diff --git a/lib/rcuref.c b/lib/rcuref.c
--- a/lib/rcuref.c
+++ b/lib/rcuref.c
@@ -220,6 +220,7 @@
 /**
  * rcuref_put_slowpath - Slowpath of __rcuref_put()
  * @ref: Pointer to the reference count
+ * @cnt: The resulting value of the fastpath decrement
  *
  * Invoked when the reference count is outside of the valid zone.
  *
@@ -233,10 +234,8 @@
  * with a concurrent get()/put() pair. Caller is not allowed to
  * deconstruct the protected object.
  */
-bool rcuref_put_slowpath(rcuref_t *ref)
+bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt)
 {
-	unsigned int cnt = atomic_read(&ref->refcnt);
-
 	/* Did this drop the last reference? */
 	if (likely(cnt == RCUREF_NOREF)) {
 		/*
··· 220 220 /** 221 221 * rcuref_put_slowpath - Slowpath of __rcuref_put() 222 222 * @ref: Pointer to the reference count 223 + * @cnt: The resulting value of the fastpath decrement 223 224 * 224 225 * Invoked when the reference count is outside of the valid zone. 225 226 * ··· 234 233 * with a concurrent get()/put() pair. Caller is not allowed to 235 234 * deconstruct the protected object. 236 235 */ 237 - bool rcuref_put_slowpath(rcuref_t *ref) 236 + bool rcuref_put_slowpath(rcuref_t *ref, unsigned int cnt) 238 237 { 239 - unsigned int cnt = atomic_read(&ref->refcnt); 240 - 241 238 /* Did this drop the last reference? */ 242 239 if (likely(cnt == RCUREF_NOREF)) { 243 240 /*