Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Treewide: Stop corrupting socket's task_frag

Since moving to memalloc_nofs_save/restore, SUNRPC has stopped setting the
GFP_NOIO flag on sk_allocation, which the networking system uses to decide
when it is safe to use current->task_frag. The result is unexpected
corruption in task_frag when SUNRPC is involved in memory reclaim.
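
To make the mechanism concrete, here is a rough userspace model of the flag-based check described above. It is a sketch, not kernel code: the M_* bit values and the name model_flags_allow_task_frag() are invented for illustration, and the mask is an assumption modeled on the kernel's gfpflags_normal_context() helper.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this userspace model only; the kernel's
 * real __GFP_* values are different. */
#define M_DIRECT_RECLAIM 0x1u
#define M_IO             0x2u
#define M_FS             0x4u
#define M_MEMALLOC       0x8u

/* Simplified stand-ins for the composite GFP masks. */
#define M_GFP_KERNEL (M_DIRECT_RECLAIM | M_IO | M_FS)
#define M_GFP_NOFS   (M_DIRECT_RECLAIM | M_IO)
#define M_GFP_NOIO   (M_DIRECT_RECLAIM)
#define M_GFP_ATOMIC 0x0u

/*
 * Model of the heuristic: the per-task fragment is treated as safe only
 * when the socket's allocation mask looks like plain process context,
 * i.e. direct reclaim and FS are allowed and MEMALLOC is not set.
 */
static bool model_flags_allow_task_frag(unsigned int sk_allocation)
{
	unsigned int mask = M_DIRECT_RECLAIM | M_MEMALLOC | M_FS;

	return (sk_allocation & mask) == (M_DIRECT_RECLAIM | M_FS);
}
```

In this model a socket that sets GFP_NOIO or GFP_NOFS fails the check and stays away from task_frag. A socket left at the GFP_KERNEL default passes it, even if the calling task has scoped itself with memalloc_nofs_save(), because that scope lives in the task and is not reflected in sk_allocation.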

The corruption can be seen in crashes, but the root cause is often
difficult to ascertain as a crashing machine's stack trace will have no
evidence of being near NFS or SUNRPC code. I believe this problem to
be much more pervasive than reports to the community may indicate.

Fix this by having kernel users of sockets that may corrupt task_frag due
to reclaim set sk_use_task_frag = false. Preemptively correcting this
situation for users that still set sk_allocation allows them to convert to
memalloc_nofs_save/restore without the same unexpected corruptions that are
sure to follow, unlikely to show up in testing, and difficult to bisect.
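
The opt-out can be pictured with a small userspace model. This is only a sketch under stated assumptions: the model_* types and functions are hypothetical, the flag is assumed to default to true for ordinary sockets, and the selection logic is a simplification of what the networking code does when it picks a page fragment.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's page fragment. */
struct model_page_frag {
	int unused;
};

/* Hypothetical stand-in for the current task. */
struct model_task {
	struct model_page_frag task_frag;
};

/* Hypothetical stand-in for struct sock, reduced to what matters here. */
struct model_sock {
	bool sk_use_task_frag;          /* assumed to default to true */
	struct model_page_frag sk_frag; /* per-socket fragment */
};

/* Pick the fragment from the explicit flag, not from sk_allocation. */
static struct model_page_frag *
model_sk_page_frag(struct model_sock *sk, struct model_task *cur)
{
	if (sk->sk_use_task_frag)
		return &cur->task_frag;
	return &sk->sk_frag;
}

/* What each kernel socket user touched by this patch does at setup. */
static void model_kernel_socket_setup(struct model_sock *sk)
{
	sk->sk_use_task_frag = false;
}
```

Once a socket has been through model_kernel_socket_setup(), the model always returns the socket's own fragment, regardless of what sk_allocation holds or which memalloc scope the task is in.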

CC: Philipp Reisner <philipp.reisner@linbit.com>
CC: Lars Ellenberg <lars.ellenberg@linbit.com>
CC: "Christoph Böhmwalder" <christoph.boehmwalder@linbit.com>
CC: Jens Axboe <axboe@kernel.dk>
CC: Josef Bacik <josef@toxicpanda.com>
CC: Keith Busch <kbusch@kernel.org>
CC: Christoph Hellwig <hch@lst.de>
CC: Sagi Grimberg <sagi@grimberg.me>
CC: Lee Duncan <lduncan@suse.com>
CC: Chris Leech <cleech@redhat.com>
CC: Mike Christie <michael.christie@oracle.com>
CC: "James E.J. Bottomley" <jejb@linux.ibm.com>
CC: "Martin K. Petersen" <martin.petersen@oracle.com>
CC: Valentina Manea <valentina.manea.m@gmail.com>
CC: Shuah Khan <shuah@kernel.org>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
CC: David Howells <dhowells@redhat.com>
CC: Marc Dionne <marc.dionne@auristor.com>
CC: Steve French <sfrench@samba.org>
CC: Christine Caulfield <ccaulfie@redhat.com>
CC: David Teigland <teigland@redhat.com>
CC: Mark Fasheh <mark@fasheh.com>
CC: Joel Becker <jlbec@evilplan.org>
CC: Joseph Qi <joseph.qi@linux.alibaba.com>
CC: Eric Van Hensbergen <ericvh@gmail.com>
CC: Latchesar Ionkov <lucho@ionkov.net>
CC: Dominique Martinet <asmadeus@codewreck.org>
CC: Ilya Dryomov <idryomov@gmail.com>
CC: Xiubo Li <xiubli@redhat.com>
CC: Chuck Lever <chuck.lever@oracle.com>
CC: Jeff Layton <jlayton@kernel.org>
CC: Trond Myklebust <trond.myklebust@hammerspace.com>
CC: Anna Schumaker <anna@kernel.org>
CC: Steffen Klassert <steffen.klassert@secunet.com>
CC: Herbert Xu <herbert@gondor.apana.org.au>

Suggested-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Reviewed-by: Guillaume Nault <gnault@redhat.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Benjamin Coddington and committed by Jakub Kicinski
98123866 fb87bd47

+17
+3
drivers/block/drbd/drbd_receiver.c
···
  	sock.socket->sk->sk_allocation = GFP_NOIO;
  	msock.socket->sk->sk_allocation = GFP_NOIO;

+ 	sock.socket->sk->sk_use_task_frag = false;
+ 	msock.socket->sk->sk_use_task_frag = false;
+
  	sock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE_BULK;
  	msock.socket->sk->sk_priority = TC_PRIO_INTERACTIVE;
+1
drivers/block/nbd.c
···
  	noreclaim_flag = memalloc_noreclaim_save();
  	do {
  		sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
+ 		sock->sk->sk_use_task_frag = false;
  		msg.msg_name = NULL;
  		msg.msg_namelen = 0;
  		msg.msg_control = NULL;
+1
drivers/nvme/host/tcp.c
···
  	queue->sock->sk->sk_rcvtimeo = 10 * HZ;

  	queue->sock->sk->sk_allocation = GFP_ATOMIC;
+ 	queue->sock->sk->sk_use_task_frag = false;
  	nvme_tcp_set_queue_io_cpu(queue);
  	queue->request = NULL;
  	queue->data_remaining = 0;
+1
drivers/scsi/iscsi_tcp.c
···
  	sk->sk_reuse = SK_CAN_REUSE;
  	sk->sk_sndtimeo = 15 * HZ; /* FIXME: make it configurable */
  	sk->sk_allocation = GFP_ATOMIC;
+ 	sk->sk_use_task_frag = false;
  	sk_set_memalloc(sk);
  	sock_no_linger(sk);
+1
drivers/usb/usbip/usbip_common.c
···
  	do {
  		sock->sk->sk_allocation = GFP_NOIO;
+ 		sock->sk->sk_use_task_frag = false;

  		result = sock_recvmsg(sock, &msg, MSG_WAITALL);
  		if (result <= 0)
+1
fs/cifs/connect.c
···
  	cifs_dbg(FYI, "Socket created\n");
  	server->ssocket = socket;
  	socket->sk->sk_allocation = GFP_NOFS;
+ 	socket->sk->sk_use_task_frag = false;
  	if (sfamily == AF_INET6)
  		cifs_reclassify_socket6(socket);
  	else
+2
fs/dlm/lowcomms.c
···
  	if (dlm_config.ci_protocol == DLM_PROTO_SCTP)
  		sk->sk_state_change = lowcomms_state_change;
  	sk->sk_allocation = GFP_NOFS;
+ 	sk->sk_use_task_frag = false;
  	sk->sk_error_report = lowcomms_error_report;
  	release_sock(sk);
  }
···
  	listen_con.sock = sock;

  	sock->sk->sk_allocation = GFP_NOFS;
+ 	sock->sk->sk_use_task_frag = false;
  	sock->sk->sk_data_ready = lowcomms_listen_data_ready;
  	release_sock(sock->sk);
+1
fs/ocfs2/cluster/tcp.c
···
  	sc->sc_sock = sock; /* freed by sc_kref_release */

  	sock->sk->sk_allocation = GFP_ATOMIC;
+ 	sock->sk->sk_use_task_frag = false;

  	myaddr.sin_family = AF_INET;
  	myaddr.sin_addr.s_addr = mynode->nd_ipv4_address;
+1
net/9p/trans_fd.c
···
  	}

  	csocket->sk->sk_allocation = GFP_NOIO;
+ 	csocket->sk->sk_use_task_frag = false;
  	file = sock_alloc_file(csocket, 0, NULL);
  	if (IS_ERR(file)) {
  		pr_err("%s (%d): failed to map fd\n",
+1
net/ceph/messenger.c
···
  	if (ret)
  		return ret;
  	sock->sk->sk_allocation = GFP_NOFS;
+ 	sock->sk->sk_use_task_frag = false;

  #ifdef CONFIG_LOCKDEP
  	lockdep_set_class(&sock->sk->sk_lock, &socket_class);
+3
net/sunrpc/xprtsock.c
···
  	sk->sk_write_space = xs_udp_write_space;
  	sk->sk_state_change = xs_local_state_change;
  	sk->sk_error_report = xs_error_report;
+ 	sk->sk_use_task_frag = false;

  	xprt_clear_connected(xprt);
···
  	sk->sk_user_data = xprt;
  	sk->sk_data_ready = xs_data_ready;
  	sk->sk_write_space = xs_udp_write_space;
+ 	sk->sk_use_task_frag = false;

  	xprt_set_connected(xprt);
···
  	sk->sk_state_change = xs_tcp_state_change;
  	sk->sk_write_space = xs_tcp_write_space;
  	sk->sk_error_report = xs_error_report;
+ 	sk->sk_use_task_frag = false;

  	/* socket options */
  	sock_reset_flag(sk, SOCK_LINGER);
+1
net/xfrm/espintcp.c
···
  	/* avoid using task_frag */
  	sk->sk_allocation = GFP_ATOMIC;
+ 	sk->sk_use_task_frag = false;

  	return 0;