Revert "smb: client: fix TCP timers deadlock after rmmod"

This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801.

Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") was intended to fix a null-ptr-deref in LOCKDEP, which is
mentioned as CVE-2024-54680, but it did not actually fix anything;
the issue can still be reproduced on top of it. [0]

Also, it reverted the change by commit ef7134c7fc48 ("smb: client:
Fix use-after-free of network namespace.") and introduced a real
issue by reviving the kernel TCP socket.

When a reconnect happens for a CIFS connection, the socket state
transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync()
in tcp_close() stops all timers for the socket.

If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1
forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans.

Usually, FIN can be retransmitted by the peer, but if the peer aborts
the connection, the issue becomes real.

I warned about this privately by pointing out the exact report [1],
but the bogus fix was finally merged.

So, in that case we must keep the timers running so that the
connection is eventually killed on our side, meaning we must not use
a kernel socket for TCP whose sk->sk_net_refcnt is 0.

The kernel socket does not have a reference to its netns to make it
possible to tear down netns without cleaning up every resource in it.

For example, tunnel devices use a UDP socket internally, but we can
destroy netns without removing such devices and let it complete
during exit. Otherwise, netns would be leaked when the last application
died.

However, this is problematic for TCP sockets because TCP has timers to
close the connection gracefully even after the socket is close()d. The
lifetime of the socket and its netns is different from the lifetime of
the underlying connection.

If the socket user does not maintain the netns lifetime, the timer could
fire after the socket is close()d and its netns is freed up, resulting
in use-after-free.

In fact, we have seen many similar issues and converted such sockets
to hold a reference to netns.

That's why I converted the CIFS client socket to hold a reference to
netns (sk->sk_net_refcnt == 1), which e9f2517a3e18 somehow describes as
out of scope for CIFS and technically wrong, but which **is in scope and
the right fix**.
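
The lifetime argument above can be illustrated with a small userspace
model. This is only a sketch, not kernel code: every toy_* name is
hypothetical, and the "timer" is just a flag. It assumes the socket user
(for example, a mount) holds one netns reference and drops it right after
close(), while a TCP timer is still pending.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct net: a refcount and a "freed" flag. */
struct toy_net {
	int refcnt;
	bool freed;
};

/* Hypothetical stand-in for struct sock. */
struct toy_sock {
	struct toy_net *net;
	bool net_refcnt;	/* models sk->sk_net_refcnt */
	bool timer_pending;	/* models a TCP timer armed after close() */
};

static void toy_get_net(struct toy_net *net)
{
	net->refcnt++;
}

static void toy_put_net(struct toy_net *net)
{
	if (--net->refcnt == 0)
		net->freed = true;	/* the real kernel would free the netns here */
}

static void toy_sock_init(struct toy_sock *sk, struct toy_net *net, bool hold_ref)
{
	sk->net = net;
	sk->net_refcnt = hold_ref;
	sk->timer_pending = false;
	if (hold_ref)
		toy_get_net(net);	/* socket pins its netns */
}

/* close(): the connection lingers (e.g. FIN_WAIT_1) with a timer armed. */
static void toy_sock_close(struct toy_sock *sk)
{
	sk->timer_pending = true;
}

/*
 * Timer expiry. Returns true if the netns was still alive when the timer
 * ran, false if the timer would have touched a freed netns.
 */
static bool toy_timer_fire(struct toy_sock *sk)
{
	bool safe = !sk->net->freed;

	sk->timer_pending = false;
	if (sk->net_refcnt && safe)
		toy_put_net(sk->net);	/* connection is finally destroyed */
	return safe;
}

/* Runs the whole sequence for a socket with or without a netns reference. */
static bool toy_scenario(bool hold_ref)
{
	struct toy_net net = { .refcnt = 1, .freed = false };	/* user's reference */
	struct toy_sock sk;

	toy_sock_init(&sk, &net, hold_ref);
	toy_sock_close(&sk);
	toy_put_net(&net);		/* user drops its reference after close() */
	return toy_timer_fire(&sk);
}
```

In this model, toy_scenario(true) corresponds to sk->sk_net_refcnt == 1 and
the timer finds a live netns, while toy_scenario(false) corresponds to a
plain kernel socket and the timer runs after the netns is gone.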

Regarding the LOCKDEP issue, we can prevent the module unload by
bumping the module refcount when switching the LOCKDEP key in
sock_lock_init_class_and_name(). [2]

For now, let's revert the bogus fix.

Note that we can now use sk_net_refcnt_upgrade() for the socket
conversion, but I'll do so later separately to keep backporting easy.

Link: https://lore.kernel.org/all/20250402020807.28583-1-kuniyu@amazon.com/ #[0]
Link: https://lore.kernel.org/netdev/c08bd5378da647a2a4c16698125d180a@huawei.com/ #[1]
Link: https://lore.kernel.org/lkml/20250402005841.19846-1-kuniyu@amazon.com/ #[2]
Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

authored by Kuniyuki Iwashima and committed by Steve French 95d2b9f6 c707193a

+10 -26
fs/smb/client/connect.c
···
 	msleep(125);
 	if (cifs_rdma_enabled(server))
 		smbd_destroy(server);
-
 	if (server->ssocket) {
 		sock_release(server->ssocket);
 		server->ssocket = NULL;
-
-		/* Release netns reference for the socket. */
-		put_net(cifs_net_ns(server));
 	}
 
 	if (!list_empty(&server->pending_mid_q)) {
···
 		 */
 	}
 
-	/* Release netns reference for this server. */
 	put_net(cifs_net_ns(server));
 	kfree(server->leaf_fullpath);
 	kfree(server->hostname);
···
 
 	tcp_ses->ops = ctx->ops;
 	tcp_ses->vals = ctx->vals;
-
-	/* Grab netns reference for this server. */
 	cifs_set_net_ns(tcp_ses, get_net(current->nsproxy->net_ns));
 
 	tcp_ses->sign = ctx->sign;
···
 out_err_crypto_release:
 	cifs_crypto_secmech_release(tcp_ses);
 
-	/* Release netns reference for this server. */
 	put_net(cifs_net_ns(tcp_ses));
 
 out_err:
···
 			cifs_put_tcp_session(tcp_ses->primary_server, false);
 		kfree(tcp_ses->hostname);
 		kfree(tcp_ses->leaf_fullpath);
-		if (tcp_ses->ssocket) {
+		if (tcp_ses->ssocket)
 			sock_release(tcp_ses->ssocket);
-			put_net(cifs_net_ns(tcp_ses));
-		}
 		kfree(tcp_ses);
 	}
 	return ERR_PTR(rc);
···
 		socket = server->ssocket;
 	} else {
 		struct net *net = cifs_net_ns(server);
+		struct sock *sk;
 
-		rc = sock_create_kern(net, sfamily, SOCK_STREAM, IPPROTO_TCP, &server->ssocket);
+		rc = __sock_create(net, sfamily, SOCK_STREAM,
+				   IPPROTO_TCP, &server->ssocket, 1);
 		if (rc < 0) {
 			cifs_server_dbg(VFS, "Error %d creating socket\n", rc);
 			return rc;
 		}
 
-		/*
-		 * Grab netns reference for the socket.
-		 *
-		 * It'll be released here, on error, or in clean_demultiplex_info() upon server
-		 * teardown.
-		 */
-		get_net(net);
+		sk = server->ssocket->sk;
+		__netns_tracker_free(net, &sk->ns_tracker, false);
+		sk->sk_net_refcnt = 1;
+		get_net_track(net, &sk->ns_tracker, GFP_KERNEL);
+		sock_inuse_add(net, 1);
 
 		/* BB other socket options to set KEEPALIVE, NODELAY? */
 		cifs_dbg(FYI, "Socket created\n");
···
 	}
 
 	rc = bind_socket(server);
-	if (rc < 0) {
-		put_net(cifs_net_ns(server));
+	if (rc < 0)
 		return rc;
-	}
 
 	/*
 	 * Eventually check for other socket options to change from
···
 	if (rc < 0) {
 		cifs_dbg(FYI, "Error %d connecting to server\n", rc);
 		trace_smb3_connect_err(server->hostname, server->conn_id, &server->dstaddr, rc);
-		put_net(cifs_net_ns(server));
 		sock_release(socket);
 		server->ssocket = NULL;
 		return rc;
···
 	    server->rfc1001_sessinit == 1 ||
 	    (server->rfc1001_sessinit == -1 && sport == htons(RFC1001_PORT)))
 		rc = ip_rfc1001_connect(server);
-
-	if (rc < 0)
-		put_net(cifs_net_ns(server));
 
 	return rc;
 }