Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: tls: avoid hanging tasks on the tx_lock

syzbot sent a hung task report, and Eric explains that an adversarial
receiver may keep RWIN at 0 for a long time, so we are not guaranteed
to make forward progress. A thread which took tx_lock and went to sleep
may not release tx_lock for hours. Use interruptible sleep where
possible and reschedule the work if it can't take the lock.

Testing: existing selftest passes

Reported-by: syzbot+9c0268252b8ef967c62e@syzkaller.appspotmail.com
Fixes: 79ffe6087e91 ("net/tls: add a TX lock")
Link: https://lore.kernel.org/all/000000000000e412e905f5b46201@google.com/
Cc: stable@vger.kernel.org # wait 4 weeks
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20230301002857.2101894-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+19 -7
net/tls/tls_sw.c
···
 			       MSG_CMSG_COMPAT))
 		return -EOPNOTSUPP;

-	mutex_lock(&tls_ctx->tx_lock);
+	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+	if (ret)
+		return ret;
 	lock_sock(sk);

 	if (unlikely(msg->msg_controllen)) {
···
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;

-	mutex_lock(&tls_ctx->tx_lock);
+	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
+	if (ret)
+		return ret;
 	lock_sock(sk);
 	ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
 	release_sock(sk);
···

 	if (!test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
 		return;
-	mutex_lock(&tls_ctx->tx_lock);
-	lock_sock(sk);
-	tls_tx_records(sk, -1);
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
+
+	if (mutex_trylock(&tls_ctx->tx_lock)) {
+		lock_sock(sk);
+		tls_tx_records(sk, -1);
+		release_sock(sk);
+		mutex_unlock(&tls_ctx->tx_lock);
+	} else if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
+		/* Someone is holding the tx_lock, they will likely run Tx
+		 * and cancel the work on their way out of the lock section.
+		 * Schedule a long delay just in case.
+		 */
+		schedule_delayed_work(&ctx->tx_work.work, msecs_to_jiffies(10));
+	}
 }

 static bool tls_is_tx_ready(struct tls_sw_context_tx *ctx)
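
For readers who want to experiment with the "try the lock, otherwise re-arm and retry later" pattern from the last hunk outside the kernel, here is a minimal userspace sketch. It is an analogy only, not the kernel code: struct tx_ctx, tx_work(), demo() and the work_result values are hypothetical names, a pthread mutex stands in for tx_lock, an atomic flag stands in for BIT_TX_SCHEDULED, and the delayed requeue is modelled by a counter instead of schedule_delayed_work().

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-ins for the kernel objects in the patch. */
struct tx_ctx {
	pthread_mutex_t tx_lock;	/* plays the role of tls_ctx->tx_lock */
	atomic_bool scheduled;		/* plays the role of BIT_TX_SCHEDULED */
	int flushed;			/* number of completed flushes */
	int requeued;			/* number of deferred retries */
};

enum work_result { WORK_IDLE, WORK_DONE, WORK_REQUEUED, WORK_SKIPPED };

/* Worker body: never blocks on tx_lock, defers instead. */
static enum work_result tx_work(struct tx_ctx *ctx)
{
	/* Like test_and_clear_bit(): run only if work was requested. */
	if (!atomic_exchange(&ctx->scheduled, false))
		return WORK_IDLE;

	if (pthread_mutex_trylock(&ctx->tx_lock) == 0) {
		ctx->flushed++;		/* stands in for tls_tx_records() */
		pthread_mutex_unlock(&ctx->tx_lock);
		return WORK_DONE;
	}

	/* Lock is busy. Like test_and_set_bit(): only the caller that
	 * flips the flag from clear to set arranges the retry.
	 */
	if (!atomic_exchange(&ctx->scheduled, true)) {
		ctx->requeued++;	/* stands in for the delayed requeue */
		return WORK_REQUEUED;
	}
	return WORK_SKIPPED;
}

/* Exercises the uncontended and the contended path; returns 0. */
static int demo(void)
{
	struct tx_ctx ctx = { 0 };
	enum work_result r;

	pthread_mutex_init(&ctx.tx_lock, NULL);

	/* Uncontended: the worker takes the lock and flushes. */
	atomic_store(&ctx.scheduled, true);
	r = tx_work(&ctx);
	assert(r == WORK_DONE && ctx.flushed == 1);

	/* Contended: the lock is held, so the worker re-arms and defers. */
	pthread_mutex_lock(&ctx.tx_lock);
	atomic_store(&ctx.scheduled, true);
	r = tx_work(&ctx);
	pthread_mutex_unlock(&ctx.tx_lock);
	assert(r == WORK_REQUEUED && ctx.requeued == 1);
	assert(atomic_load(&ctx.scheduled));

	pthread_mutex_destroy(&ctx.tx_lock);
	return 0;
}
```

The point of the sketch is that the worker never sleeps on the lock: if the holder is stuck for a long time, the worker returns immediately and leaves the flag set so a later run can pick the work up.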