Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

tcp: remove empty skb from write queue in error cases

Vladimir Rutsky reported stuck TCP sessions after memory pressure
events. Edge Trigger epoll() user would never receive an EPOLLOUT
notification allowing them to retry a sendmsg().

Jason tested the case of sk_stream_alloc_skb() returning NULL,
but there are other paths that could lead both sendmsg() and sendpage()
to return -1 (EAGAIN), with an empty skb queued on the write queue.

This patch makes sure we remove this empty skb so that
Jason code can detect that the queue is empty, and
call sk->sk_write_space(sk) accordingly.

Fixes: ce5ec440994b ("tcp: ensure epoll edge trigger wakeup when write queue is empty")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Cc: Soheil Hassas Yeganeh <soheil@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by

Eric Dumazet and committed by
David S. Miller
fdfc5c85 7d0a0658

+21 -11
+21 -11
net/ipv4/tcp.c
··· 935 935 return mss_now; 936 936 } 937 937 938 + /* In some cases, both sendpage() and sendmsg() could have added 939 + * an skb to the write queue, but failed adding payload on it. 940 + * We need to remove it to consume less memory, but more 941 + * importantly be able to generate EPOLLOUT for Edge Trigger epoll() 942 + * users. 943 + */ 944 + static void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) 945 + { 946 + if (skb && !skb->len) { 947 + tcp_unlink_write_queue(skb, sk); 948 + if (tcp_write_queue_empty(sk)) 949 + tcp_chrono_stop(sk, TCP_CHRONO_BUSY); 950 + sk_wmem_free_skb(sk, skb); 951 + } 952 + } 953 + 938 954 ssize_t do_tcp_sendpages(struct sock *sk, struct page *page, int offset, 939 955 size_t size, int flags) 940 956 { ··· 1080 1064 return copied; 1081 1065 1082 1066 do_error: 1067 + tcp_remove_empty_skb(sk, tcp_write_queue_tail(sk)); 1083 1068 if (copied) 1084 1069 goto out; 1085 1070 out_err: ··· 1405 1388 sock_zerocopy_put(uarg); 1406 1389 return copied + copied_syn; 1407 1390 1408 - do_fault: 1409 - if (!skb->len) { 1410 - tcp_unlink_write_queue(skb, sk); 1411 - /* It is the one place in all of TCP, except connection 1412 - * reset, where we can be unlinking the send_head. 1413 - */ 1414 - if (tcp_write_queue_empty(sk)) 1415 - tcp_chrono_stop(sk, TCP_CHRONO_BUSY); 1416 - sk_wmem_free_skb(sk, skb); 1417 - } 1418 - 1419 1391 do_error: 1392 + skb = tcp_write_queue_tail(sk); 1393 + do_fault: 1394 + tcp_remove_empty_skb(sk, skb); 1395 + 1420 1396 if (copied + copied_syn) 1421 1397 goto out; 1422 1398 out_err: