Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: make sure EPOLLOUT wont be missed

As Jason Baron explained in commit 790ba4566c1a ("tcp: set SOCK_NOSPACE
under memory pressure"), it is crucial we properly set SOCK_NOSPACE
when needed.

However, Jason's patch had a bug, because the 'nonblocking' status
as far as sk_stream_wait_memory() is concerned is governed
by the MSG_DONTWAIT flag passed at sendmsg() time:

    long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);

So it is very possible that tcp sendmsg() calls sk_stream_wait_memory(),
and that sk_stream_wait_memory() returns -EAGAIN with SOCK_NOSPACE
cleared, if sk->sk_sndtimeo has been set to a small (but not zero)
value.

This patch removes the 'noblock' variable since we must always
set SOCK_NOSPACE if -EAGAIN is returned.

It also renames the do_nonblock label since we might reach this
code path even if we were in blocking mode.
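
The scenario above can be sketched as a toy userspace model. This is not kernel code: struct model_sock and every model_* function are hypothetical names made up for illustration, and the wait is assumed to expire without any memory being freed. Only the timeout rule (MSG_DONTWAIT forces a zero timeout, otherwise sk->sk_sndtimeo is used) and the SOCK_NOSPACE gate are taken from the description in this commit.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model; all names here are hypothetical, not kernel symbols. */
struct model_sock {
	long sndtimeo;	/* stands in for sk->sk_sndtimeo */
	bool nospace;	/* stands in for the SOCK_NOSPACE bit */
};

/* Same rule as sock_sndtimeo(): only MSG_DONTWAIT forces a zero timeout. */
static long model_sndtimeo(const struct model_sock *sk, bool dontwait)
{
	return dontwait ? 0 : sk->sndtimeo;
}

/* Pre-patch logic: 'noblock' is sampled once from the initial timeout. */
static int model_wait_old(struct model_sock *sk, long *timeo_p)
{
	bool noblock = (*timeo_p == 0);

	for (;;) {
		if (*timeo_p == 0) {
			if (noblock)
				sk->nospace = true;
			return -EAGAIN;
		}
		*timeo_p = 0;	/* assume the wait expires, no memory freed */
	}
}

/* Post-patch logic: the bit is set on every -EAGAIN return. */
static int model_wait_new(struct model_sock *sk, long *timeo_p)
{
	for (;;) {
		if (*timeo_p == 0) {
			sk->nospace = true;
			return -EAGAIN;
		}
		*timeo_p = 0;	/* assume the wait expires, no memory freed */
	}
}

/* Models the gate in tcp_check_space(): writers are woken only if the bit is set. */
static bool model_ack_wakes_writer(const struct model_sock *sk)
{
	return sk->nospace;
}

/* True if an ACK arriving after -EAGAIN would lead to a writer wakeup. */
static bool model_epollout_after_eagain(bool patched, long sndtimeo, bool dontwait)
{
	struct model_sock sk = { .sndtimeo = sndtimeo, .nospace = false };
	long timeo = model_sndtimeo(&sk, dontwait);
	int err = patched ? model_wait_new(&sk, &timeo)
			  : model_wait_old(&sk, &timeo);

	assert(err == -EAGAIN);
	return model_ack_wakes_writer(&sk);
}
```

In this model, a send without MSG_DONTWAIT on a socket with a small nonzero send timeout gets -EAGAIN with the bit left clear under the old logic, so model_epollout_after_eagain(false, 5, false) is false. With the patched logic the same call returns true, and the MSG_DONTWAIT case returns true under both.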

Fixes: 790ba4566c1a ("tcp: set SOCK_NOSPACE under memory pressure")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jason Baron <jbaron@akamai.com>
Reported-by: Vladimir Rutsky <rutsky@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Jason Baron <jbaron@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Eric Dumazet and committed by David S. Miller
ef8d8ccd 06821504

+9 -7
net/core/stream.c
···
 	int err = 0;
 	long vm_wait = 0;
 	long current_timeo = *timeo_p;
-	bool noblock = (*timeo_p ? false : true);
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
 
 	if (sk_stream_memory_free(sk))
···
 
 		if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))
 			goto do_error;
-		if (!*timeo_p) {
-			if (noblock)
-				set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-			goto do_nonblock;
-		}
+		if (!*timeo_p)
+			goto do_eagain;
 		if (signal_pending(current))
 			goto do_interrupted;
 		sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
···
 do_error:
 	err = -EPIPE;
 	goto out;
-do_nonblock:
+do_eagain:
+	/* Make sure that whenever EAGAIN is returned, EPOLLOUT event can
+	 * be generated later.
+	 * When TCP receives ACK packets that make room, tcp_check_space()
+	 * only calls tcp_new_space() if SOCK_NOSPACE is set.
+	 */
+	set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
 	err = -EAGAIN;
 	goto out;
 do_interrupted: