Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

libceph: avoid KEEPALIVE_PENDING races in ceph_con_keepalive()

con_fault() can transition the connection into STANDBY right after
ceph_con_keepalive() clears STANDBY in clear_standby():

    libceph user thread               ceph-msgr worker

    ceph_con_keepalive()
      mutex_lock(&con->mutex)
      clear_standby(con)
      mutex_unlock(&con->mutex)
                                      mutex_lock(&con->mutex)
                                      con_fault()
                                        ...
                                        if KEEPALIVE_PENDING isn't set
                                          set state to STANDBY
                                        ...
                                      mutex_unlock(&con->mutex)
      set KEEPALIVE_PENDING
      set WRITE_PENDING

This triggers warnings in clear_standby() the next time either
ceph_con_send() or ceph_con_keepalive() gets to clearing STANDBY.

I don't see a reason to condition the queue_con() call on the previous
value of KEEPALIVE_PENDING, so move the setting of KEEPALIVE_PENDING
into the critical section -- unlike WRITE_PENDING, KEEPALIVE_PENDING
could have been a non-atomic flag.

Reported-by: syzbot+acdeb633f6211ccdf886@syzkaller.appspotmail.com
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Tested-by: Myungho Jung <mhjungk@gmail.com>

+3 -2
net/ceph/messenger.c
@@ -3206,9 +3206,10 @@
 	dout("con_keepalive %p\n", con);
 	mutex_lock(&con->mutex);
 	clear_standby(con);
+	con_flag_set(con, CON_FLAG_KEEPALIVE_PENDING);
 	mutex_unlock(&con->mutex);
-	if (con_flag_test_and_set(con, CON_FLAG_KEEPALIVE_PENDING) == 0 &&
-	    con_flag_test_and_set(con, CON_FLAG_WRITE_PENDING) == 0)
+
+	if (con_flag_test_and_set(con, CON_FLAG_WRITE_PENDING) == 0)
 		queue_con(con);
 }
 EXPORT_SYMBOL(ceph_con_keepalive);
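
The interleaving described in the commit message can be replayed deterministically in userspace. The sketch below is a simplified, hypothetical model and not kernel code: every `model_*` name is invented for illustration, the mutex is represented only by the order of the calls, and the warning condition (leaving STANDBY while a pending flag is already set) is an assumption based on the description above. It compares the old ordering, where KEEPALIVE_PENDING is set after the critical section, with the new one, where it is set inside.

```c
#include <stdbool.h>

/* Hypothetical model; names echo the commit but are not the kernel's. */
enum model_state { MODEL_OPEN, MODEL_STANDBY };

struct model_con {
	enum model_state state;
	bool keepalive_pending;
	bool write_pending;
	int warnings;	/* stand-in for warnings raised in clear_standby() */
};

/* Assumed check: warn when leaving STANDBY with a pending flag set. */
static void model_clear_standby(struct model_con *con)
{
	if (con->state == MODEL_STANDBY) {
		con->state = MODEL_OPEN;
		if (con->write_pending || con->keepalive_pending)
			con->warnings++;
	}
}

/* Fault path from the diagram: STANDBY only if no keepalive is pending. */
static void model_con_fault(struct model_con *con)
{
	if (!con->keepalive_pending)
		con->state = MODEL_STANDBY;
}

/* Old ordering: the fault fits between the unlock and the flag updates. */
static int model_old_ordering(void)
{
	struct model_con con = { MODEL_OPEN, false, false, 0 };

	model_clear_standby(&con);	/* inside the critical section */
	model_con_fault(&con);		/* worker runs in the gap */
	con.keepalive_pending = true;	/* set too late */
	con.write_pending = true;
	model_clear_standby(&con);	/* next keepalive or send */
	return con.warnings;
}

/* New ordering: the flag is set before the worker can take the mutex. */
static int model_new_ordering(void)
{
	struct model_con con = { MODEL_OPEN, false, false, 0 };

	model_clear_standby(&con);	/* inside the critical section */
	con.keepalive_pending = true;	/* still inside the critical section */
	model_con_fault(&con);		/* worker sees the flag, no STANDBY */
	con.write_pending = true;
	model_clear_standby(&con);	/* next keepalive or send */
	return con.warnings;
}
```

In this model, `model_old_ordering()` ends with one warning counted, because the connection reaches STANDBY while both pending flags are set. `model_new_ordering()` ends with none, because the fault path already sees KEEPALIVE_PENDING and does not go to STANDBY.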