[TCP]: tcp_simple_retransmit can cause S+L

This fixes Bugzilla #10384

tcp_simple_retransmit does its L (lost_out) increments without any
checking whatsoever against S+L overflowing packets_out when Reno is
in use.
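
For reference, S and L are tp->sacked_out and tp->lost_out; their sum
is what the stack calls tcp_left_out(), and the invariant at stake is
the one tcp_verify_left_out() asserts. Abridged from include/net/tcp.h
of this era (comments mine; not part of this patch):

	static inline unsigned int tcp_left_out(const struct tcp_sock *tp)
	{
		return tp->sacked_out + tp->lost_out;	/* S + L */
	}

	/* S + L may never exceed the number of segments out on the wire */
	#define tcp_verify_left_out(tp)	WARN_ON(tcp_left_out(tp) > tp->packets_out)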

The simplest scenario I can currently think of is rather complex in
practice (there might be some more straightforward cases though): if
the MSS is reduced during MTU probing, tcp_simple_retransmit may end
up marking everything lost, and if some duplicate ACKs had arrived
prior to that, sacked_out will be non-zero as well, leading to
S+L > packets_out. tcp_clean_rtx_queue on the next cumulative ACK, or
tcp_fastretrans_alert on the next duplicate ACK, will eventually fix
the S counter.
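
To see where the unchecked increment happens, here is a heavily
abridged sketch of the tcp_simple_retransmit() loop in
net/ipv4/tcp_output.c (elisions and comments mine, not verbatim):

	void tcp_simple_retransmit(struct sock *sk)
	{
		struct tcp_sock *tp = tcp_sk(sk);
		struct sk_buff *skb;
		unsigned int mss = tcp_current_mss(sk, 0);
		int lost = 0;

		tcp_for_write_queue(skb, sk) {
			if (skb == tcp_send_head(sk))
				break;
			/* After an MSS reduction, every full-sized,
			 * un-SACKed skb trips this test... */
			if (skb->len > mss &&
			    !(TCP_SKB_CB(skb)->sacked & TCPCB_SACKED_ACKED)) {
				...
				if (!(TCP_SKB_CB(skb)->sacked & TCPCB_LOST)) {
					TCP_SKB_CB(skb)->sacked |= TCPCB_LOST;
					/* ...and L grows here with no check
					 * that S + L <= packets_out holds */
					tp->lost_out += tcp_skb_pcount(skb);
					lost = 1;
				}
			}
		}
		...
	}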

A more straightforward (but questionable) solution would be to just
call tcp_reset_reno_sack() in tcp_simple_retransmit, but it would
negatively impact the probe's retransmissions: zeroing S inflates
tcp_packets_in_flight() (packets_out - (S+L) + retrans_out, visible
in the include/net/tcp.h hunk below), so the retransmissions would
not occur if some duplicate ACKs had arrived.
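
For illustration only, that rejected variant would have amounted to
something like this inside tcp_simple_retransmit() (hypothetical hunk,
never applied):

	/* hypothetical, rejected: zeroing S restores S + L <= packets_out,
	 * but starves the probe's retransmissions of cwnd space */
	if (tcp_is_reno(tp))
		tcp_reset_reno_sack(tp);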

So I had to add resetting of Reno sacked_out to the CA_Loss state,
performed when the first cumulative ACK arrives (see the
tcp_fastretrans_alert() hunk below). This stale sacked_out might
actually be the explanation for the reports of left_out overflows in
kernels prior to 2.6.23 and the S+L overflow reports with 2.6.24.
However, this alone won't be enough to fix kernels before 2.6.24,
because the fix builds on top of commit 1b6d427bb7e ([TCP]: Reduce
sacked_out with reno when purging write_queue) to keep sacked_out
from overflowing.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Ilpo Järvinen, committed by David S. Miller (commit 882bebaa, parent c137f3dd)

 include/net/tcp.h     |  2 ++
 net/ipv4/tcp_input.c  | 24 ++++++++++++++++++------
 net/ipv4/tcp_output.c |  3 +++
 3 files changed, 23 insertions(+), 6 deletions(-)

--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -752,6 +752,8 @@
 	return tp->packets_out - tcp_left_out(tp) + tp->retrans_out;
 }
 
+extern int tcp_limit_reno_sacked(struct tcp_sock *tp);
+
 /* If cwnd > ssthresh, we may raise ssthresh to be half-way to cwnd.
  * The exception is rate halving phase, when cwnd is decreasing towards
  * ssthresh.
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1625,13 +1625,11 @@
 	return flag;
 }
 
-/* If we receive more dupacks than we expected counting segments
- * in assumption of absent reordering, interpret this as reordering.
- * The only another reason could be bug in receiver TCP.
+/* Limits sacked_out so that sum with lost_out isn't ever larger than
+ * packets_out. Returns zero if sacked_out adjustement wasn't necessary.
  */
-static void tcp_check_reno_reordering(struct sock *sk, const int addend)
+int tcp_limit_reno_sacked(struct tcp_sock *tp)
 {
-	struct tcp_sock *tp = tcp_sk(sk);
 	u32 holes;
 
 	holes = max(tp->lost_out, 1U);
@@ -1639,8 +1637,20 @@
 
 	if ((tp->sacked_out + holes) > tp->packets_out) {
 		tp->sacked_out = tp->packets_out - holes;
-		tcp_update_reordering(sk, tp->packets_out + addend, 0);
+		return 1;
 	}
+	return 0;
+}
+
+/* If we receive more dupacks than we expected counting segments
+ * in assumption of absent reordering, interpret this as reordering.
+ * The only another reason could be bug in receiver TCP.
+ */
+static void tcp_check_reno_reordering(struct sock *sk, const int addend)
+{
+	struct tcp_sock *tp = tcp_sk(sk);
+	if (tcp_limit_reno_sacked(tp))
+		tcp_update_reordering(sk, tp->packets_out + addend, 0);
 }
 
 /* Emulate SACKs for SACKless connection: account for a new dupack. */
@@ -2610,6 +2620,8 @@
 	case TCP_CA_Loss:
 		if (flag & FLAG_DATA_ACKED)
 			icsk->icsk_retransmits = 0;
+		if (tcp_is_reno(tp) && flag & FLAG_SND_UNA_ADVANCED)
+			tcp_reset_reno_sack(tp);
 		if (!tcp_try_undo_loss(sk)) {
 			tcp_moderate_cwnd(tp);
 			tcp_xmit_retransmit_queue(sk);
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1808,6 +1808,9 @@
 	if (!lost)
 		return;
 
+	if (tcp_is_reno(tp))
+		tcp_limit_reno_sacked(tp);
+
 	tcp_verify_left_out(tp);
 
 	/* Don't muck with the congestion window here.