Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

sctp: be more restrictive in transport selection on bundled sacks

It was noticed recently that when we send data on a transport, it's possible
that we might bundle a sack that arrived on a different transport. While this
isn't a major problem, it does go against the SHOULD requirement in section 6.4
of RFC 2960:

An endpoint SHOULD transmit reply chunks (e.g., SACK, HEARTBEAT ACK,
etc.) to the same destination transport address from which it
received the DATA or control chunk to which it is replying. This
rule should also be followed if the endpoint is bundling DATA chunks
together with the reply chunk.

This patch seeks to correct that. It restricts the bundling of sack operations
to only those transports which have moved the ctsn of the association forward
since the last sack. By doing this we guarantee that we only bundle outbound
sacks on a transport that has received a chunk since the last sack. This brings
us into stricter compliance with the RFC.

Vlad had initially suggested that we strictly allow sack bundling only on the
transport that last moved the ctsn forward. While this makes sense, I was
concerned that doing so prevented us from bundling in the case where we had
received chunks that moved the ctsn on multiple transports. In those cases, the
RFC allows us to select any of the transports having received chunks to bundle
the sack on. So I've modified the approach to allow for that, by adding a state
variable to each transport that tracks whether it has moved the ctsn since the
last sack. This, I think, keeps our behavior (and performance) close enough to
our current profile that we can do this without a sysctl knob to
enable/disable it.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: Vlad Yasevich <vyasevich@gmail.com>
CC: David S. Miller <davem@davemloft.net>
CC: linux-sctp@vger.kernel.org
Reported-by: Michele Baldessari <michele@redhat.com>
Reported-by: Sorin Serban <sserban@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by Neil Horman, committed by David S. Miller
4244854d 0e90b49c

+39 -5
include/net/sctp/structs.h  (+4)
···
 	/* Is this structure kfree()able? */
 	malloced:1;

+	/* Has this transport moved the ctsn since we last sacked */
+	__u32 sack_generation;
+
 	struct flowi fl;

 	/* This is the peer's IP address and port. */
···
 	 */
 	__u8 sack_needed;	/* Do we need to sack the peer? */
 	__u32 sack_cnt;
+	__u32 sack_generation;

 	/* These are capabilities which our peer advertised. */
 	__u8 ecn_capable:1,	/* Can peer do ECN? */
include/net/sctp/tsnmap.h  (+2 -1)
···
 int sctp_tsnmap_check(const struct sctp_tsnmap *, __u32 tsn);

 /* Mark this TSN as seen. */
-int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn);
+int sctp_tsnmap_mark(struct sctp_tsnmap *, __u32 tsn,
+		     struct sctp_transport *trans);

 /* Mark this TSN and all lower as seen. */
 void sctp_tsnmap_skip(struct sctp_tsnmap *map, __u32 tsn);
net/sctp/associola.c  (+1)
···
 	 */
 	asoc->peer.sack_needed = 1;
 	asoc->peer.sack_cnt = 0;
+	asoc->peer.sack_generation = 1;

 	/* Assume that the peer will tell us if he recognizes ASCONF
 	 * as part of INIT exchange.
net/sctp/output.c  (+5)
···
 	/* If the SACK timer is running, we have a pending SACK */
 	if (timer_pending(timer)) {
 		struct sctp_chunk *sack;
+
+		if (pkt->transport->sack_generation !=
+		    pkt->transport->asoc->peer.sack_generation)
+			return retval;
+
 		asoc->a_rwnd = asoc->rwnd;
 		sack = sctp_make_sack(asoc);
 		if (sack) {
net/sctp/sm_make_chunk.c  (+16)
···
 	int len;
 	__u32 ctsn;
 	__u16 num_gabs, num_dup_tsns;
+	struct sctp_association *aptr = (struct sctp_association *)asoc;
 	struct sctp_tsnmap *map = (struct sctp_tsnmap *)&asoc->peer.tsn_map;
 	struct sctp_gap_ack_block gabs[SCTP_MAX_GABS];
+	struct sctp_transport *trans;

 	memset(gabs, 0, sizeof(gabs));
 	ctsn = sctp_tsnmap_get_ctsn(map);
···
 	sctp_addto_chunk(retval, sizeof(__u32) * num_dup_tsns,
 			 sctp_tsnmap_get_dups(map));

+	/* Once we have a sack generated, check to see what our sack
+	 * generation is, if its 0, reset the transports to 0, and reset
+	 * the association generation to 1
+	 *
+	 * The idea is that zero is never used as a valid generation for the
+	 * association so no transport will match after a wrap event like this,
+	 * Until the next sack
+	 */
+	if (++aptr->peer.sack_generation == 0) {
+		list_for_each_entry(trans, &asoc->peer.transport_addr_list,
+				    transports)
+			trans->sack_generation = 0;
+		aptr->peer.sack_generation = 1;
+	}
 nodata:
 	return retval;
 }
net/sctp/sm_sideeffect.c  (+1 -1)
···
 	case SCTP_CMD_REPORT_TSN:
 		/* Record the arrival of a TSN. */
 		error = sctp_tsnmap_mark(&asoc->peer.tsn_map,
-					 cmd->obj.u32);
+					 cmd->obj.u32, NULL);
 		break;

 	case SCTP_CMD_REPORT_FWDTSN:
net/sctp/transport.c  (+2)
···
 	peer->af_specific = sctp_get_af_specific(addr->sa.sa_family);
 	memset(&peer->saddr, 0, sizeof(union sctp_addr));

+	peer->sack_generation = 0;
+
 	/* From 6.3.1 RTO Calculation:
 	 *
 	 * C1) Until an RTT measurement has been made for a packet sent to the
net/sctp/tsnmap.c  (+5 -1)
···

 /* Mark this TSN as seen. */
-int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn)
+int sctp_tsnmap_mark(struct sctp_tsnmap *map, __u32 tsn,
+		     struct sctp_transport *trans)
 {
 	u16 gap;
···
 	 */
 	map->max_tsn_seen++;
 	map->cumulative_tsn_ack_point++;
+	if (trans)
+		trans->sack_generation =
+			trans->asoc->peer.sack_generation;
 	map->base_tsn++;
 } else {
 	/* Either we already have a gap, or about to record a gap, so
net/sctp/ulpevent.c  (+2 -1)
···
 	 * can mark it as received so the tsn_map is updated correctly.
 	 */
 	if (sctp_tsnmap_mark(&asoc->peer.tsn_map,
-			     ntohl(chunk->subh.data_hdr->tsn)))
+			     ntohl(chunk->subh.data_hdr->tsn),
+			     chunk->transport))
 		goto fail_mark;

 	/* First calculate the padding, so we don't inadvertently
net/sctp/ulpqueue.c  (+1 -1)
···
 	if (chunk && (freed >= needed)) {
 		__u32 tsn;
 		tsn = ntohl(chunk->subh.data_hdr->tsn);
-		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn);
+		sctp_tsnmap_mark(&asoc->peer.tsn_map, tsn, chunk->transport);
 		sctp_ulpq_tail_data(ulpq, chunk, gfp);

 		sctp_ulpq_partial_delivery(ulpq, chunk, gfp);