tcp: TCP connection times out if ICMP frag needed is delayed

We are seeing an issue with TCP in handling an ICMP frag needed
message that is received after net.ipv4.tcp_retries1 retransmits.
The default value of tcp_retries1 is 3. So if the path MTU changes
and the ICMP frag needed message is lost for the first 3
retransmits, or is delayed until 3 retransmits are done, TCP
doesn't update the MSS correctly and continues to retransmit the
original message until it times out after tcp_retries2 retransmits.

I am seeing this issue even with the latest 2.6.25.4 kernel.

In tcp_retransmit_timer(), when the retransmits counter exceeds
the tcp_retries1 value, the dst cache entry of the socket is reset.
At this time, if we receive an ICMP frag needed message, the
dst entry gets updated with the new MTU, but the TCP socket's
dst_cache entry remains NULL.

So the next time we try to retransmit after the ICMP frag needed
message is received, tcp_retransmit_skb() gets called. Here the
cur_mss value is calculated at the start of the routine with
a NULL sk_dst_cache. Instead, we should call tcp_current_mss()
after rebuild_header(), which caches the dst entry with the
updated MTU. Also, rebuild_header() should be called before
tcp_fragment() so that the skb is fragmented if the MSS goes down.
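
The ordering problem can be illustrated with a small userspace
model. This is only a sketch, not kernel code: every toy_* name and
TOY_HDR_LEN is a hypothetical stand-in, and it assumes the MSS
lookup falls back to the last cached MSS when the socket has no
cached route.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not kernel structures. */
struct toy_route {
	unsigned int mtu;		/* path MTU, already lowered by the ICMP */
};

struct toy_sock {
	struct toy_route *dst_cache;	/* NULL after the cache was reset */
	unsigned int mss_cache;		/* last MSS computed for the socket */
};

/* Assumed overhead: 20-byte IPv4 header + 20-byte TCP header, no options. */
#define TOY_HDR_LEN 40u

/* Re-attach the route if the socket has none; returns 0 on success. */
static int toy_rebuild_header(struct toy_sock *sk, struct toy_route *rt)
{
	if (!sk->dst_cache)
		sk->dst_cache = rt;
	return sk->dst_cache ? 0 : -1;
}

/* Refresh the MSS from the cached route; with no route, reuse the old MSS. */
static unsigned int toy_current_mss(struct toy_sock *sk)
{
	if (sk->dst_cache)
		sk->mss_cache = sk->dst_cache->mtu - TOY_HDR_LEN;
	return sk->mss_cache;
}

/* Old order: MSS is sampled first, while dst_cache is still NULL. */
static unsigned int toy_mss_old_order(struct toy_sock *sk, struct toy_route *rt)
{
	unsigned int cur_mss = toy_current_mss(sk);

	if (toy_rebuild_header(sk, rt))
		return 0;
	return cur_mss;
}

/* New order: the route is re-attached first, then the MSS is sampled. */
static unsigned int toy_mss_new_order(struct toy_sock *sk, struct toy_route *rt)
{
	if (toy_rebuild_header(sk, rt))
		return 0;
	return toy_current_mss(sk);
}
```

In this model, a socket with a NULL dst_cache and a cached MSS of
1460 keeps using 1460 in the old order even though the route MTU
has dropped to 1400, while the new order yields 1360.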

Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by Sridhar Samudrala and committed by David S. Miller 7d227cd2 c8942f1f

+6 -4
net/ipv4/tcp_output.c
···
 {
 	struct tcp_sock *tp = tcp_sk(sk);
 	struct inet_connection_sock *icsk = inet_csk(sk);
-	unsigned int cur_mss = tcp_current_mss(sk, 0);
+	unsigned int cur_mss;
 	int err;
 
 	/* Inconslusive MTU probe */
···
 		if (tcp_trim_head(sk, skb, tp->snd_una - TCP_SKB_CB(skb)->seq))
 			return -ENOMEM;
 	}
+
+	if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
+		return -EHOSTUNREACH; /* Routing failure or similar. */
+
+	cur_mss = tcp_current_mss(sk, 0);
 
 	/* If receiver has shrunk his window, and skb is out of
 	 * new window, do not retransmit it. The exception is the
···
 	     tcp_skb_pcount(tcp_write_queue_next(sk, skb)) == 1) &&
 	    (sysctl_tcp_retrans_collapse != 0))
 		tcp_retrans_try_collapse(sk, skb, cur_mss);
-
-	if (inet_csk(sk)->icsk_af_ops->rebuild_header(sk))
-		return -EHOSTUNREACH; /* Routing failure or similar. */
 
 	/* Some Solaris stacks overoptimize and ignore the FIN on a
 	 * retransmit when old data is attached. So strip it off