Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'sctp-implement-rfc6951-udp-encapsulation-of-sctp'

Xin Long says:

====================
sctp: Implement RFC6951: UDP Encapsulation of SCTP

Description From the RFC:

The Main Reasons:

o To allow SCTP traffic to pass through legacy NATs, which do not
provide native SCTP support as specified in [BEHAVE] and
[NATSUPP].

o To allow SCTP to be implemented on hosts that do not provide
direct access to the IP layer. In particular, applications can
use their own SCTP implementation if the operating system does not
provide one.

Implementation Notes:

UDP-encapsulated SCTP is normally communicated between SCTP stacks
using the IANA-assigned UDP port number 9899 (sctp-tunneling) on both
ends. There are circumstances where other ports may be used on
either end, and it might be required to use ports other than the
registered port.

Each SCTP stack uses a single local UDP encapsulation port number as
the destination port for all its incoming SCTP packets; this greatly
simplifies implementation design.

An SCTP implementation supporting UDP encapsulation MUST maintain a
remote UDP encapsulation port number per destination address for each
SCTP association. Again, because the remote stack may be using ports
other than the well-known port, each port may be different from each
stack. However, because of remapping of ports by NATs, the remote
ports associated with different remote IP addresses may not be
identical, even if they are associated with the same stack.

Because the well-known port might not be used, implementations need
to allow other port numbers to be specified as a local or remote UDP
encapsulation port number through APIs.
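
Note (illustrative, not part of the patchset): the port rules above amount to placing an 8-byte UDP header in front of the SCTP packet. The userspace sketch below shows that framing under stated assumptions: `encap_sctp`, `put_be16` and `SCTP_TUNNELING_PORT` are hypothetical names, ports are passed in host byte order, and the UDP checksum is left at zero for brevity, which a real stack would not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SCTP_TUNNELING_PORT 9899u	/* IANA-assigned port named in the text */
#define UDP_HDR_LEN 8u

/* Store a 16-bit value in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical helper: prefix an SCTP packet with a UDP header.
 * Returns the encapsulated length, or 0 if the output buffer is too
 * small or the result does not fit the 16-bit UDP length field.
 */
static size_t encap_sctp(uint8_t *out, size_t out_len,
			 uint16_t local_port, uint16_t remote_port,
			 const uint8_t *sctp, size_t sctp_len)
{
	size_t total = UDP_HDR_LEN + sctp_len;

	assert(out != NULL && sctp != NULL);
	if (total > out_len || total > 0xffff)
		return 0;

	put_be16(out + 0, local_port);		/* UDP source port */
	put_be16(out + 2, remote_port);		/* UDP destination port */
	put_be16(out + 4, (uint16_t)total);	/* UDP length: header + payload */
	put_be16(out + 6, 0);			/* checksum omitted in this sketch */
	memcpy(out + UDP_HDR_LEN, sctp, sctp_len);
	return total;
}
```

With both ends on the registered port, a 12-byte SCTP common header becomes a 20-byte UDP datagram whose first four bytes carry 9899 twice.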

Patches:

This patchset uses the udp4/6 tunnel APIs to implement UDP
Encapsulation of SCTP with little change to the SCTP protocol stack
and with all current SCTP features kept in the Linux kernel.

1 - 4: Fix some UDP issues that may be triggered by SCTP over UDP.
5 - 7: Process incoming UDP encapsulated packets and ICMP packets.
8 -10: Update the remote encap port by sysctl, sockopt and packets.
11-14: Process outgoing packets with UDP encapsulation and their GSO.
15-16: Add the part from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
17: Enable this feature.

Tests:

- lksctp-tools/src/func_tests with UDP Encapsulation enabled/disabled:

Both make v4test and v6test passed.

- sctp-tests with UDP Encapsulation enabled/disabled:

repeatability/procdumps/sctpdiag/gsomtuchange/extoverflow/
sctphashtable passed. Others failed as expected due to those
"iptables -p sctp" rules.

- netperf on lo/netns/virtio_net, with gso enabled/disabled and
with ip_checksum enabled/disabled, with UDP Encapsulation
enabled/disabled:

No clear performance drop.

v1->v2:
- Fix some incorrect code in the patches 5,6,8,10,11,13,14,17, suggested
by Marcelo.
- Append two patches 15-16 to add the Additional Considerations for UDP
Encapsulation of SCTP from draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
v2->v3:
- remove the cleanup code in patch 2, suggested by Willem.
- remove the patch 3 and fix the checksum in the new patch 3 after
talking with Paolo, Marcelo and Guillaume.
- add 'select NET_UDP_TUNNEL' in patch 4 to solve a compiling error.
- fix __be16 type cast warning in patch 8.
- fix the wrong endian orders when setting values in 14,16.
v3->v4:
- add entries in ip-sysctl.rst in patch 7,16, as Marcelo suggested.
- do not create udp socks when udp_port is set to 0 in patch 16, as
Marcelo noticed.
v4->v5:
- improve the description for udp_port and encap_port entries in patch
7, 16.
- use 0 as the default udp_port.
====================

Link: https://lore.kernel.org/r/cover.1603955040.git.lucien.xin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+531 -50
+31
Documentation/networking/ip-sysctl.rst
···
 
 	Default: 1
 
+udp_port - INTEGER
+	The listening port for the local UDP tunneling sock. Normally it's
+	using the IANA-assigned UDP port number 9899 (sctp-tunneling).
+
+	This UDP sock is used for processing the incoming UDP-encapsulated
+	SCTP packets (from RFC6951), and shared by all applications in the
+	same net namespace. This UDP sock will be closed when the value is
+	set to 0.
+
+	The value will also be used to set the src port of the UDP header
+	for the outgoing UDP-encapsulated SCTP packets. For the dest port,
+	please refer to 'encap_port' below.
+
+	Default: 0
+
+encap_port - INTEGER
+	The default remote UDP encapsulation port.
+
+	This value is used to set the dest port of the UDP header for the
+	outgoing UDP-encapsulated SCTP packets by default. Users can also
+	change the value for each sock/asoc/transport by using setsockopt.
+	For further information, please refer to RFC6951.
+
+	Note that when connecting to a remote server, the client should set
+	this to the port that the UDP tunneling sock on the peer server is
+	listening to and the local UDP tunneling sock on the client also
+	must be started. On the server, it would get the encap_port from
+	the incoming packet's source port.
+
+	Default: 0
+
 
 ``/proc/sys/net/core/*``
 ========================
+20
include/linux/sctp.h
···
 	 * 11 Restart of an association with new addresses
 	 * 12 User Initiated Abort
 	 * 13 Protocol Violation
+	 * 14 Restart of an Association with New Encapsulation Port
 	 */
 
 	SCTP_ERROR_RESTART = cpu_to_be16(0x0b),
 	SCTP_ERROR_USER_ABORT = cpu_to_be16(0x0c),
 	SCTP_ERROR_PROTO_VIOLATION = cpu_to_be16(0x0d),
+	SCTP_ERROR_NEW_ENCAP_PORT = cpu_to_be16(0x0e),
 
 	/* ADDIP Section 3.3 New Error Causes
 	 *
···
 	SCTP_DSCP_VAL_MASK = 0xfc,
 	SCTP_FLOWLABEL_SET_MASK = 0x100000,
 	SCTP_FLOWLABEL_VAL_MASK = 0xfffff
+};
+
+/* UDP Encapsulation
+ * draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.html#section-4-4
+ *
+ * The error cause indicating an "Restart of an Association with
+ * New Encapsulation Port"
+ *
+ *   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+ *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  |        Cause Code = 14        |       Cause Length = 8        |
+ *  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *  |   Current Encapsulation Port  |     New Encapsulation Port    |
+ *  +-------------------------------+-------------------------------+
+ */
+struct sctp_new_encap_port_hdr {
+	__be16 cur_port;
+	__be16 new_port;
 };
 
 #endif /* __LINUX_SCTP_H__ */
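
Note (illustrative, not part of the patch): the error-cause layout in the comment above can be checked with a small userspace serializer. This is a sketch under stated assumptions: `build_new_encap_port_cause` and `put_be16` are hypothetical names, and ports are passed in host byte order and written big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CAUSE_NEW_ENCAP_PORT 14u	/* cause code from the diagram */
#define CAUSE_LEN 8u			/* 4-byte cause header + two 16-bit ports */

/* Store a 16-bit value in network byte order. */
static void put_be16(uint8_t *p, uint16_t v)
{
	p[0] = (uint8_t)(v >> 8);
	p[1] = (uint8_t)(v & 0xff);
}

/*
 * Hypothetical helper: serialize the "Restart of an Association with
 * New Encapsulation Port" cause. Returns the number of bytes written,
 * or 0 if the buffer is too small.
 */
static size_t build_new_encap_port_cause(uint8_t *buf, size_t len,
					 uint16_t cur_port, uint16_t new_port)
{
	assert(buf != NULL);
	if (len < CAUSE_LEN)
		return 0;

	put_be16(buf + 0, CAUSE_NEW_ENCAP_PORT);	/* Cause Code */
	put_be16(buf + 2, CAUSE_LEN);			/* Cause Length */
	put_be16(buf + 4, cur_port);			/* Current Encapsulation Port */
	put_be16(buf + 6, new_port);			/* New Encapsulation Port */
	return CAUSE_LEN;
}
```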
+8
include/net/netns/sctp.h
···
 	 */
 	struct sock *ctl_sock;
 
+	/* UDP tunneling listening sock. */
+	struct sock *udp4_sock;
+	struct sock *udp6_sock;
+	/* UDP tunneling listening port. */
+	int udp_port;
+	/* UDP tunneling remote encap port. */
+	int encap_port;
+
 	/* This is the global local address list.
 	 * We actively maintain this complete list of addresses on
 	 * the system by catching address add/delete events.
+2
include/net/sctp/constants.h
···
  * functions simpler to write.
  */
 
+#define SCTP_DEFAULT_UDP_PORT 9899	/* default UDP tunneling port */
+
 /* These are the values for pf exposure, UNUSED is to keep compatible with old
  * applications by default.
  */
+7 -2
include/net/sctp/sctp.h
···
 struct sctp_pf *sctp_get_pf_specific(sa_family_t family);
 int sctp_register_pf(struct sctp_pf *, sa_family_t);
 void sctp_addr_wq_mgmt(struct net *, struct sctp_sockaddr_entry *, int);
+int sctp_udp_sock_start(struct net *net);
+void sctp_udp_sock_stop(struct net *net);
 
 /*
  * sctp/socket.c
···
 {
 	__u32 overhead = sizeof(struct sctphdr) + extra;
 
-	if (sp)
+	if (sp) {
 		overhead += sp->pf->af->net_header_len;
-	else
+		if (sp->udp_port)
+			overhead += sizeof(struct udphdr);
+	} else {
 		overhead += sizeof(struct ipv6hdr);
+	}
 
 	if (WARN_ON_ONCE(mtu && mtu <= overhead))
 		mtu = overhead;
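
Note (illustrative, not part of the patch): the overhead change above costs 8 bytes of payload per packet when a local UDP port is set. The sketch below reproduces the arithmetic in userspace. The header sizes are assumptions (IPv4 header without options, standard UDP and SCTP common headers), and `sctp_overhead` and `sctp_payload_room` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum {
	IPV4_HDR_LEN = 20,	/* assumed: no IP options */
	IPV6_HDR_LEN = 40,
	UDP_HDR_LEN  = 8,
	SCTP_HDR_LEN = 12,	/* SCTP common header */
};

/* Per-packet header overhead, following the branch added in the hunk. */
static uint32_t sctp_overhead(uint32_t net_header_len, int udp_encap,
			      uint32_t extra)
{
	uint32_t overhead = SCTP_HDR_LEN + extra + net_header_len;

	if (udp_encap)
		overhead += UDP_HDR_LEN;
	return overhead;
}

/* Bytes left for chunks once the headers are accounted for. */
static uint32_t sctp_payload_room(uint32_t mtu, uint32_t overhead)
{
	assert(mtu > overhead);
	return mtu - overhead;
}
```

On a 1500-byte IPv4 path this gives 1468 bytes of room without encapsulation and 1460 with it.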
+4
include/net/sctp/sm.h
···
 struct sctp_chunk *sctp_make_violation_max_retrans(
 					const struct sctp_association *asoc,
 					const struct sctp_chunk *chunk);
+struct sctp_chunk *sctp_make_new_encap_port(
+					const struct sctp_association *asoc,
+					const struct sctp_chunk *chunk);
 struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
 				       const struct sctp_transport *transport);
 struct sctp_chunk *sctp_make_heartbeat_ack(const struct sctp_association *asoc,
···
 	if (ntohl(chunk->sctp_hdr->vtag) == asoc->c.my_vtag)
 		return 1;
 
+	chunk->transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
 	return 0;
 }
 
+8 -6
include/net/sctp/structs.h
···
 	 */
 	__u32 hbinterval;
 
+	__be16 udp_port;
+	__be16 encap_port;
+
 	/* This is the max_retrans value for new associations. */
 	__u16 pathmaxrxt;
 
···
 	 */
 	unsigned long last_time_ecne_reduced;
 
+	__be16 encap_port;
+
 	/* This is the max_retrans value for the transport and will
 	 * be initialized from the assocs value. This can be changed
 	 * using the SCTP_SET_PEER_ADDR_PARAMS socket option.
···
  * sctp_input_cb is currently used on rx and sock rx queue
  */
 struct sctp_input_cb {
-	union {
-		struct inet_skb_parm h4;
-#if IS_ENABLED(CONFIG_IPV6)
-		struct inet6_skb_parm h6;
-#endif
-	} header;
 	struct sctp_chunk *chunk;
 	struct sctp_af *af;
+	__be16 encap_port;
 };
 #define SCTP_INPUT_CB(__skb) ((struct sctp_input_cb *)&((__skb)->cb[0]))
 
···
 	 * will be inherited by all new transports.
 	 */
 	unsigned long hbinterval;
+
+	__be16 encap_port;
 
 	/* This is the max_retrans value for new transports in the
 	 * association.
+7
include/uapi/linux/sctp.h
···
 #define SCTP_ECN_SUPPORTED 130
 #define SCTP_EXPOSE_POTENTIALLY_FAILED_STATE 131
 #define SCTP_EXPOSE_PF_STATE SCTP_EXPOSE_POTENTIALLY_FAILED_STATE
+#define SCTP_REMOTE_UDP_ENCAPS_PORT 132
 
 /* PR-SCTP policies */
 #define SCTP_PR_SCTP_NONE 0x0000
···
 	sctp_assoc_t se_assoc_id;
 	uint16_t se_type;
 	uint8_t se_on;
+};
+
+struct sctp_udpencaps {
+	sctp_assoc_t sue_assoc_id;
+	struct sockaddr_storage sue_address;
+	uint16_t sue_port;
 };
 
 /* SCTP Stream schedulers */
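
Note (illustrative, not part of the patch): from userspace, the new option would be driven roughly as sketched below. Several things here are assumptions: `struct udpencaps_req` is a local mirror of `struct sctp_udpencaps` for systems whose installed headers predate it, `fill_udpencaps` and `set_remote_encap_port` are hypothetical helpers, and the port is stored with `htons()` because the kernel handler in this diff casts `sue_port` to `__be16` without converting it.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SCTP_REMOTE_UDP_ENCAPS_PORT
#define SCTP_REMOTE_UDP_ENCAPS_PORT 132	/* value added by the hunk above */
#endif

/* Local mirror of struct sctp_udpencaps (assumed to match its layout). */
struct udpencaps_req {
	int32_t sue_assoc_id;
	struct sockaddr_storage sue_address;
	uint16_t sue_port;
};

/*
 * Fill the request. A NULL peer_ip leaves the address zeroed, so the
 * request is not tied to one peer address. Returns 0, or -1 if peer_ip
 * is not a valid IPv4 literal.
 */
static int fill_udpencaps(struct udpencaps_req *req, int32_t assoc_id,
			  const char *peer_ip, uint16_t port)
{
	struct sockaddr_in *sin;

	assert(req != NULL);
	memset(req, 0, sizeof(*req));
	req->sue_assoc_id = assoc_id;
	req->sue_port = htons(port);	/* network byte order, see lead-in */
	if (!peer_ip)
		return 0;

	sin = (struct sockaddr_in *)&req->sue_address;
	sin->sin_family = AF_INET;
	return inet_pton(AF_INET, peer_ip, &sin->sin_addr) == 1 ? 0 : -1;
}

/* Apply the remote encapsulation port to an SCTP socket. */
static int set_remote_encap_port(int fd, int32_t assoc_id,
				 const char *peer_ip, uint16_t port)
{
	struct udpencaps_req req;

	if (fill_udpencaps(&req, assoc_id, peer_ip, port))
		return -1;
	return setsockopt(fd, IPPROTO_SCTP, SCTP_REMOTE_UDP_ENCAPS_PORT,
			  &req, sizeof(req));
}
```

A client would typically call `set_remote_encap_port(fd, 0, NULL, 9899)` before `connect()`, after the local `udp_port` sysctl has been set.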
+1 -1
net/ipv4/udp.c
···
 	sk = __udp4_lib_lookup(net, iph->daddr, uh->dest,
 			       iph->saddr, uh->source, skb->dev->ifindex,
 			       inet_sdif(skb), udptable, NULL);
-	if (!sk) {
+	if (!sk || udp_sk(sk)->encap_type) {
 		/* No socket for error: try tunnels before discarding */
 		sk = ERR_PTR(-ENOENT);
 		if (static_branch_unlikely(&udp_encap_needed_key)) {
+3
net/ipv4/udp_offload.c
···
 	__skb_pull(skb, tnl_hlen);
 	skb_reset_mac_header(skb);
 	skb_set_network_header(skb, skb_inner_network_offset(skb));
+	skb_set_transport_header(skb, skb_inner_transport_offset(skb));
 	skb->mac_len = skb_inner_network_offset(skb);
 	skb->protocol = new_protocol;
 
···
 				      (NETIF_F_HW_CSUM | NETIF_F_IP_CSUM))));
 
 	features &= skb->dev->hw_enc_features;
+	/* CRC checksum can't be handled by HW when it's a UDP tunneling packet. */
+	features &= ~NETIF_F_SCTP_CRC;
 
 	/* The only checksum offload we care about from here on out is the
 	 * outer one so strip the existing checksum feature flags and
+1 -1
net/ipv6/udp.c
···
 
 	sk = __udp6_lib_lookup(net, daddr, uh->dest, saddr, uh->source,
 			       inet6_iif(skb), inet6_sdif(skb), udptable, NULL);
-	if (!sk) {
+	if (!sk || udp_sk(sk)->encap_type) {
 		/* No socket for error: try tunnels before discarding */
 		sk = ERR_PTR(-ENOENT);
 		if (static_branch_unlikely(&udpv6_encap_needed_key)) {
+4 -4
net/ipv6/udp_offload.c
···
 	int tnl_hlen;
 	int err;
 
-	mss = skb_shinfo(skb)->gso_size;
-	if (unlikely(skb->len <= mss))
-		goto out;
-
 	if (skb->encapsulation && skb_shinfo(skb)->gso_type &
 	    (SKB_GSO_UDP_TUNNEL|SKB_GSO_UDP_TUNNEL_CSUM))
 		segs = skb_udp_tunnel_segment(skb, features, true);
···
 
 		if (skb_shinfo(skb)->gso_type & SKB_GSO_UDP_L4)
 			return __udp_gso_segment(skb, features);
+
+		mss = skb_shinfo(skb)->gso_size;
+		if (unlikely(skb->len <= mss))
+			goto out;
 
 		/* Do software UFO. Complete and fill in the UDP checksum as HW cannot
 		 * do checksum of UDP packets sent as multiple IP fragments.
+1
net/sctp/Kconfig
···
 	select CRYPTO_HMAC
 	select CRYPTO_SHA1
 	select LIBCRC32C
+	select NET_UDP_TUNNEL
 	help
 	  Stream Control Transmission Protocol
 
+4
net/sctp/associola.c
···
 	 */
 	asoc->hbinterval = msecs_to_jiffies(sp->hbinterval);
 
+	asoc->encap_port = sp->encap_port;
+
 	/* Initialize path max retrans value. */
 	asoc->pathmaxrxt = sp->pathmaxrxt;
 
···
 	 * association configured value.
 	 */
 	peer->hbinterval = asoc->hbinterval;
+
+	peer->encap_port = asoc->encap_port;
 
 	/* Set the path max_retrans. */
 	peer->pathmaxrxt = asoc->pathmaxrxt;
+33 -11
net/sctp/ipv6.c
···
 #include <net/inet_common.h>
 #include <net/inet_ecn.h>
 #include <net/sctp/sctp.h>
+#include <net/udp_tunnel.h>
 
 #include <linux/uaccess.h>
 
···
 	return ret;
 }
 
-static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *transport)
+static int sctp_v6_xmit(struct sk_buff *skb, struct sctp_transport *t)
 {
+	struct dst_entry *dst = dst_clone(t->dst);
+	struct flowi6 *fl6 = &t->fl.u.ip6;
 	struct sock *sk = skb->sk;
 	struct ipv6_pinfo *np = inet6_sk(sk);
-	struct flowi6 *fl6 = &transport->fl.u.ip6;
 	__u8 tclass = np->tclass;
-	int res;
+	__be32 label;
 
 	pr_debug("%s: skb:%p, len:%d, src:%pI6 dst:%pI6\n", __func__, skb,
 		 skb->len, &fl6->saddr, &fl6->daddr);
 
-	if (transport->dscp & SCTP_DSCP_SET_MASK)
-		tclass = transport->dscp & SCTP_DSCP_VAL_MASK;
+	if (t->dscp & SCTP_DSCP_SET_MASK)
+		tclass = t->dscp & SCTP_DSCP_VAL_MASK;
 
 	if (INET_ECN_is_capable(tclass))
 		IP6_ECN_flow_xmit(sk, fl6->flowlabel);
 
-	if (!(transport->param_flags & SPP_PMTUD_ENABLE))
+	if (!(t->param_flags & SPP_PMTUD_ENABLE))
 		skb->ignore_df = 1;
 
 	SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
 
-	rcu_read_lock();
-	res = ip6_xmit(sk, skb, fl6, sk->sk_mark, rcu_dereference(np->opt),
-		       tclass, sk->sk_priority);
-	rcu_read_unlock();
-	return res;
+	if (!t->encap_port || !sctp_sk(sk)->udp_port) {
+		int res;
+
+		skb_dst_set(skb, dst);
+		rcu_read_lock();
+		res = ip6_xmit(sk, skb, fl6, sk->sk_mark,
+			       rcu_dereference(np->opt),
+			       tclass, sk->sk_priority);
+		rcu_read_unlock();
+		return res;
+	}
+
+	if (skb_is_gso(skb))
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+
+	skb->encapsulation = 1;
+	skb_reset_inner_mac_header(skb);
+	skb_reset_inner_transport_header(skb);
+	skb_set_inner_ipproto(skb, IPPROTO_SCTP);
+	label = ip6_make_flowlabel(sock_net(sk), skb, fl6->flowlabel, true, fl6);
+
+	return udp_tunnel6_xmit_skb(dst, sk, skb, NULL, &fl6->saddr,
+				    &fl6->daddr, tclass, ip6_dst_hoplimit(dst),
+				    label, sctp_sk(sk)->udp_port, t->encap_port, false);
 }
 
 /* Returns the dst cache entry for the given source and destination ip
···
 
 static int sctp6_rcv(struct sk_buff *skb)
 {
+	memset(skb->cb, 0, sizeof(skb->cb));
 	return sctp_rcv(skb) ? -1 : 0;
 }
 
+5 -1
net/sctp/offload.c
···
 {
 	skb->ip_summed = CHECKSUM_NONE;
 	skb->csum_not_inet = 0;
-	gso_reset_checksum(skb, ~0);
+	/* csum and csum_start in GSO CB may be needed to do the UDP
+	 * checksum when it's a UDP tunneling packet.
+	 */
+	SKB_GSO_CB(skb)->csum = (__force __wsum)~0;
+	SKB_GSO_CB(skb)->csum_start = skb_headroom(skb) + skb->len;
 	return sctp_compute_cksum(skb, skb_transport_offset(skb));
 }
 
+10 -12
net/sctp/output.c
···
 					sizeof(struct inet6_skb_parm)));
 		skb_shinfo(head)->gso_segs = pkt_count;
 		skb_shinfo(head)->gso_size = GSO_BY_FRAGS;
-		rcu_read_lock();
-		if (skb_dst(head) != tp->dst) {
-			dst_hold(tp->dst);
-			sk_setup_caps(sk, tp->dst);
-		}
-		rcu_read_unlock();
 		goto chksum;
 	}
 
 	if (sctp_checksum_disable)
 		return 1;
 
-	if (!(skb_dst(head)->dev->features & NETIF_F_SCTP_CRC) ||
-	    dst_xfrm(skb_dst(head)) || packet->ipfragok) {
+	if (!(tp->dst->dev->features & NETIF_F_SCTP_CRC) ||
+	    dst_xfrm(tp->dst) || packet->ipfragok || tp->encap_port) {
 		struct sctphdr *sh =
 			(struct sctphdr *)skb_transport_header(head);
 
···
 	struct sctp_association *asoc = tp->asoc;
 	struct sctp_chunk *chunk, *tmp;
 	int pkt_count, gso = 0;
-	struct dst_entry *dst;
 	struct sk_buff *head;
 	struct sctphdr *sh;
 	struct sock *sk;
···
 	sh->checksum = 0;
 
 	/* drop packet if no dst */
-	dst = dst_clone(tp->dst);
-	if (!dst) {
+	if (!tp->dst) {
 		IP_INC_STATS(sock_net(sk), IPSTATS_MIB_OUTNOROUTES);
 		kfree_skb(head);
 		goto out;
 	}
-	skb_dst_set(head, dst);
+
+	rcu_read_lock();
+	if (__sk_dst_get(sk) != tp->dst) {
+		dst_hold(tp->dst);
+		sk_setup_caps(sk, tp->dst);
+	}
+	rcu_read_unlock();
 
 	/* pack up chunks */
 	pkt_count = sctp_packet_pack(packet, head, gso, gfp);
+131 -12
net/sctp/protocol.c
···
 #include <net/addrconf.h>
 #include <net/inet_common.h>
 #include <net/inet_ecn.h>
+#include <net/udp_tunnel.h>
 
 #define MAX_SCTP_PORT_HASH_ENTRIES (64 * 1024)
 
···
 	return 0;
 }
 
+static int sctp_udp_rcv(struct sock *sk, struct sk_buff *skb)
+{
+	memset(skb->cb, 0, sizeof(skb->cb));
+	SCTP_INPUT_CB(skb)->encap_port = udp_hdr(skb)->source;
+
+	skb_set_transport_header(skb, sizeof(struct udphdr));
+	sctp_rcv(skb);
+	return 0;
+}
+
+static int sctp_udp_err_lookup(struct sock *sk, struct sk_buff *skb)
+{
+	struct sctp_association *asoc;
+	struct sctp_transport *t;
+	int family;
+
+	skb->transport_header += sizeof(struct udphdr);
+	family = (ip_hdr(skb)->version == 4) ? AF_INET : AF_INET6;
+	sk = sctp_err_lookup(dev_net(skb->dev), family, skb, sctp_hdr(skb),
+			     &asoc, &t);
+	if (!sk)
+		return -ENOENT;
+
+	sctp_err_finish(sk, t);
+	return 0;
+}
+
+int sctp_udp_sock_start(struct net *net)
+{
+	struct udp_tunnel_sock_cfg tuncfg = {NULL};
+	struct udp_port_cfg udp_conf = {0};
+	struct socket *sock;
+	int err;
+
+	udp_conf.family = AF_INET;
+	udp_conf.local_ip.s_addr = htonl(INADDR_ANY);
+	udp_conf.local_udp_port = htons(net->sctp.udp_port);
+	err = udp_sock_create(net, &udp_conf, &sock);
+	if (err) {
+		pr_err("Failed to create the SCTP UDP tunneling v4 sock\n");
+		return err;
+	}
+
+	tuncfg.encap_type = 1;
+	tuncfg.encap_rcv = sctp_udp_rcv;
+	tuncfg.encap_err_lookup = sctp_udp_err_lookup;
+	setup_udp_tunnel_sock(net, sock, &tuncfg);
+	net->sctp.udp4_sock = sock->sk;
+
+#if IS_ENABLED(CONFIG_IPV6)
+	memset(&udp_conf, 0, sizeof(udp_conf));
+
+	udp_conf.family = AF_INET6;
+	udp_conf.local_ip6 = in6addr_any;
+	udp_conf.local_udp_port = htons(net->sctp.udp_port);
+	udp_conf.use_udp6_rx_checksums = true;
+	udp_conf.ipv6_v6only = true;
+	err = udp_sock_create(net, &udp_conf, &sock);
+	if (err) {
+		pr_err("Failed to create the SCTP UDP tunneling v6 sock\n");
+		udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
+		net->sctp.udp4_sock = NULL;
+		return err;
+	}
+
+	tuncfg.encap_type = 1;
+	tuncfg.encap_rcv = sctp_udp_rcv;
+	tuncfg.encap_err_lookup = sctp_udp_err_lookup;
+	setup_udp_tunnel_sock(net, sock, &tuncfg);
+	net->sctp.udp6_sock = sock->sk;
+#endif
+
+	return 0;
+}
+
+void sctp_udp_sock_stop(struct net *net)
+{
+	if (net->sctp.udp4_sock) {
+		udp_tunnel_sock_release(net->sctp.udp4_sock->sk_socket);
+		net->sctp.udp4_sock = NULL;
+	}
+	if (net->sctp.udp6_sock) {
+		udp_tunnel_sock_release(net->sctp.udp6_sock->sk_socket);
+		net->sctp.udp6_sock = NULL;
+	}
+}
+
 /* Register address family specific functions. */
 int sctp_register_af(struct sctp_af *af)
 {
···
 }
 
 /* Wrapper routine that calls the ip transmit routine. */
-static inline int sctp_v4_xmit(struct sk_buff *skb,
-			       struct sctp_transport *transport)
+static inline int sctp_v4_xmit(struct sk_buff *skb, struct sctp_transport *t)
 {
-	struct inet_sock *inet = inet_sk(skb->sk);
+	struct dst_entry *dst = dst_clone(t->dst);
+	struct flowi4 *fl4 = &t->fl.u.ip4;
+	struct sock *sk = skb->sk;
+	struct inet_sock *inet = inet_sk(sk);
 	__u8 dscp = inet->tos;
+	__be16 df = 0;
 
 	pr_debug("%s: skb:%p, len:%d, src:%pI4, dst:%pI4\n", __func__, skb,
-		 skb->len, &transport->fl.u.ip4.saddr,
-		 &transport->fl.u.ip4.daddr);
+		 skb->len, &fl4->saddr, &fl4->daddr);
 
-	if (transport->dscp & SCTP_DSCP_SET_MASK)
-		dscp = transport->dscp & SCTP_DSCP_VAL_MASK;
+	if (t->dscp & SCTP_DSCP_SET_MASK)
+		dscp = t->dscp & SCTP_DSCP_VAL_MASK;
 
-	inet->pmtudisc = transport->param_flags & SPP_PMTUD_ENABLE ?
-			 IP_PMTUDISC_DO : IP_PMTUDISC_DONT;
+	inet->pmtudisc = t->param_flags & SPP_PMTUD_ENABLE ? IP_PMTUDISC_DO
+							   : IP_PMTUDISC_DONT;
+	SCTP_INC_STATS(sock_net(sk), SCTP_MIB_OUTSCTPPACKS);
 
-	SCTP_INC_STATS(sock_net(&inet->sk), SCTP_MIB_OUTSCTPPACKS);
+	if (!t->encap_port || !sctp_sk(sk)->udp_port) {
+		skb_dst_set(skb, dst);
+		return __ip_queue_xmit(sk, skb, &t->fl, dscp);
+	}
 
-	return __ip_queue_xmit(&inet->sk, skb, &transport->fl, dscp);
+	if (skb_is_gso(skb))
+		skb_shinfo(skb)->gso_type |= SKB_GSO_UDP_TUNNEL_CSUM;
+
+	if (ip_dont_fragment(sk, dst) && !skb->ignore_df)
+		df = htons(IP_DF);
+
+	skb->encapsulation = 1;
+	skb_reset_inner_mac_header(skb);
+	skb_reset_inner_transport_header(skb);
+	skb_set_inner_ipproto(skb, IPPROTO_SCTP);
+	udp_tunnel_xmit_skb((struct rtable *)dst, sk, skb, fl4->saddr,
+			    fl4->daddr, dscp, ip4_dst_hoplimit(dst), df,
+			    sctp_sk(sk)->udp_port, t->encap_port, false, false);
+	return 0;
 }
 
 static struct sctp_af sctp_af_inet;
···
 	.flags = SCTP_PROTOSW_FLAG
 };
 
+static int sctp4_rcv(struct sk_buff *skb)
+{
+	memset(skb->cb, 0, sizeof(skb->cb));
+	return sctp_rcv(skb);
+}
+
 /* Register with IP layer. */
 static const struct net_protocol sctp_protocol = {
-	.handler = sctp_rcv,
+	.handler = sctp4_rcv,
 	.err_handler = sctp_v4_err,
 	.no_policy = 1,
 	.netns_ok = 1,
···
 
 	/* Enable ECN by default. */
 	net->sctp.ecn_enable = 1;
+
+	/* Set UDP tunneling listening port to 0 by default */
+	net->sctp.udp_port = 0;
+
+	/* Set remote encap port to 0 by default */
+	net->sctp.encap_port = 0;
 
 	/* Set SCOPE policy to enabled */
 	net->sctp.scope_policy = SCTP_SCOPE_POLICY_ENABLE;
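
Note (illustrative, not part of the patch): both transmit paths in this diff choose between plain IP and the UDP tunnel with the same test, and net/sctp/output.c adds the remote encap port to the conditions that force a software CRC32c. The userspace sketch below restates those two decisions; `struct tx_ports`, `choose_xmit_path` and `needs_software_crc` are hypothetical names, and the boolean parameters are simplified stand-ins for the kernel state they summarize.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum xmit_path { XMIT_PLAIN_IP, XMIT_UDP_TUNNEL };

struct tx_ports {
	uint16_t local_udp_port;	/* stands in for the socket's udp_port */
	uint16_t remote_encap_port;	/* stands in for the transport's encap_port */
};

/* Encapsulate only when both ports are non-zero, as in the xmit hunks. */
static enum xmit_path choose_xmit_path(const struct tx_ports *p)
{
	assert(p != NULL);
	if (!p->remote_encap_port || !p->local_udp_port)
		return XMIT_PLAIN_IP;
	return XMIT_UDP_TUNNEL;
}

/*
 * The SCTP checksum is computed in software unless the device offloads
 * it and none of the other conditions applies; a non-zero remote encap
 * port is one of those conditions.
 */
static bool needs_software_crc(bool dev_has_sctp_crc, bool uses_xfrm,
			       bool ipfragok, uint16_t remote_encap_port)
{
	return !dev_has_sctp_crc || uses_xfrm || ipfragok ||
	       remote_encap_port != 0;
}
```

Setting only one of the two ports therefore leaves traffic on plain IP, which matches the default of 0 for both sysctls.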
+21
net/sctp/sm_make_chunk.c
···
 	return retval;
 }
 
+struct sctp_chunk *sctp_make_new_encap_port(const struct sctp_association *asoc,
+					    const struct sctp_chunk *chunk)
+{
+	struct sctp_new_encap_port_hdr nep;
+	struct sctp_chunk *retval;
+
+	retval = sctp_make_abort(asoc, chunk,
+				 sizeof(struct sctp_errhdr) + sizeof(nep));
+	if (!retval)
+		goto nodata;
+
+	sctp_init_cause(retval, SCTP_ERROR_NEW_ENCAP_PORT, sizeof(nep));
+	nep.cur_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
+	nep.new_port = chunk->transport->encap_port;
+	sctp_addto_chunk(retval, sizeof(nep), &nep);
+
+nodata:
+	return retval;
+}
+
 /* Make a HEARTBEAT chunk. */
 struct sctp_chunk *sctp_make_heartbeat(const struct sctp_association *asoc,
 				       const struct sctp_transport *transport)
···
 	 * added as the primary transport. The source address seems to
 	 * be a better choice than any of the embedded addresses.
 	 */
+	asoc->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
 	if (!sctp_assoc_add_peer(asoc, peer_addr, gfp, SCTP_ACTIVE))
 		goto nomem;
 
+52
net/sctp/sm_statefuns.c
···
 					const union sctp_subtype type,
 					void *arg,
 					struct sctp_cmd_seq *commands);
+static enum sctp_disposition sctp_sf_new_encap_port(
+					struct net *net,
+					const struct sctp_endpoint *ep,
+					const struct sctp_association *asoc,
+					const union sctp_subtype type,
+					void *arg,
+					struct sctp_cmd_seq *commands);
 static struct sctp_sackhdr *sctp_sm_pull_sack(struct sctp_chunk *chunk);
 
 static enum sctp_disposition sctp_stop_t1_and_abort(
···
 	if (!sctp_chunk_length_valid(chunk, sizeof(struct sctp_init_chunk)))
 		return sctp_sf_violation_chunklen(net, ep, asoc, type, arg,
 						  commands);
+
+	if (SCTP_INPUT_CB(chunk->skb)->encap_port != chunk->transport->encap_port)
+		return sctp_sf_new_encap_port(net, ep, asoc, type, arg, commands);
+
 	/* Grab the INIT header. */
 	chunk->subh.init_hdr = (struct sctp_inithdr *)chunk->skb->data;
 
···
 	packet->vtag = ntohl(chunk->sctp_hdr->vtag);
 
 	/* Set the skb to the belonging sock for accounting. */
+	abort->skb->sk = ep->base.sk;
+
+	sctp_packet_append_chunk(packet, abort);
+
+	sctp_add_cmd_sf(commands, SCTP_CMD_SEND_PKT, SCTP_PACKET(packet));
+
+	SCTP_INC_STATS(net, SCTP_MIB_OUTCTRLCHUNKS);
+
+	sctp_sf_pdiscard(net, ep, asoc, type, arg, commands);
+	return SCTP_DISPOSITION_CONSUME;
+}
+
+/* Handling of SCTP Packets Containing an INIT Chunk Matching an
+ * Existing Associations when the UDP encap port is incorrect.
+ *
+ * From Section 4 at draft-tuexen-tsvwg-sctp-udp-encaps-cons-03.
+ */
+static enum sctp_disposition sctp_sf_new_encap_port(
+					struct net *net,
+					const struct sctp_endpoint *ep,
+					const struct sctp_association *asoc,
+					const union sctp_subtype type,
+					void *arg,
+					struct sctp_cmd_seq *commands)
+{
+	struct sctp_packet *packet = NULL;
+	struct sctp_chunk *chunk = arg;
+	struct sctp_chunk *abort;
+
+	packet = sctp_ootb_pkt_new(net, asoc, chunk);
+	if (!packet)
+		return SCTP_DISPOSITION_NOMEM;
+
+	abort = sctp_make_new_encap_port(asoc, chunk);
+	if (!abort) {
+		sctp_ootb_pkt_free(packet);
+		return SCTP_DISPOSITION_NOMEM;
+	}
+
 	abort->skb->sk = ep->base.sk;
 
 	sctp_packet_append_chunk(packet, abort);
···
 	transport = sctp_transport_new(net, sctp_source(chunk), GFP_ATOMIC);
 	if (!transport)
 		goto nomem;
+
+	transport->encap_port = SCTP_INPUT_CB(chunk->skb)->encap_port;
 
 	/* Cache a route for the transport with the chunk's destination as
 	 * the source address.
+116
net/sctp/socket.c
···
 	return retval;
 }
 
+static int sctp_setsockopt_encap_port(struct sock *sk,
+				      struct sctp_udpencaps *encap,
+				      unsigned int optlen)
+{
+	struct sctp_association *asoc;
+	struct sctp_transport *t;
+	__be16 encap_port;
+
+	if (optlen != sizeof(*encap))
+		return -EINVAL;
+
+	/* If an address other than INADDR_ANY is specified, and
+	 * no transport is found, then the request is invalid.
+	 */
+	encap_port = (__force __be16)encap->sue_port;
+	if (!sctp_is_any(sk, (union sctp_addr *)&encap->sue_address)) {
+		t = sctp_addr_id2transport(sk, &encap->sue_address,
+					   encap->sue_assoc_id);
+		if (!t)
+			return -EINVAL;
+
+		t->encap_port = encap_port;
+		return 0;
+	}
+
+	/* Get association, if assoc_id != SCTP_FUTURE_ASSOC and the
+	 * socket is a one to many style socket, and an association
+	 * was not found, then the id was invalid.
+	 */
+	asoc = sctp_id2assoc(sk, encap->sue_assoc_id);
+	if (!asoc && encap->sue_assoc_id != SCTP_FUTURE_ASSOC &&
+	    sctp_style(sk, UDP))
+		return -EINVAL;
+
+	/* If changes are for association, also apply encap_port to
+	 * each transport.
+	 */
+	if (asoc) {
+		list_for_each_entry(t, &asoc->peer.transport_addr_list,
+				    transports)
+			t->encap_port = encap_port;
+
+		return 0;
+	}
+
+	sctp_sk(sk)->encap_port = encap_port;
+	return 0;
+}
+
 /* API 6.2 setsockopt(), getsockopt()
  *
  * Applications use setsockopt() and getsockopt() to set or retrieve
···
 		break;
 	case SCTP_EXPOSE_POTENTIALLY_FAILED_STATE:
 		retval = sctp_setsockopt_pf_expose(sk, kopt, optlen);
+		break;
+	case SCTP_REMOTE_UDP_ENCAPS_PORT:
+		retval = sctp_setsockopt_encap_port(sk, kopt, optlen);
 		break;
 	default:
 		retval = -ENOPROTOOPT;
···
 	 * be modified via SCTP_PEER_ADDR_PARAMS
 	 */
 	sp->hbinterval = net->sctp.hb_interval;
+	sp->udp_port = htons(net->sctp.udp_port);
+	sp->encap_port = htons(net->sctp.encap_port);
 	sp->pathmaxrxt = net->sctp.max_retrans_path;
 	sp->pf_retrans = net->sctp.pf_retrans;
 	sp->ps_retrans = net->sctp.ps_retrans;
···
 	return retval;
 }
 
+static int sctp_getsockopt_encap_port(struct sock *sk, int len,
+				      char __user *optval, int __user *optlen)
+{
+	struct sctp_association *asoc;
+	struct sctp_udpencaps encap;
+	struct sctp_transport *t;
+	__be16 encap_port;
+
+	if (len < sizeof(encap))
+		return -EINVAL;
+
+	len = sizeof(encap);
+	if (copy_from_user(&encap, optval, len))
+		return -EFAULT;
+
+	/* If an address other than INADDR_ANY is specified, and
+	 * no transport is found, then the request is invalid.
+	 */
+	if (!sctp_is_any(sk, (union sctp_addr *)&encap.sue_address)) {
+		t = sctp_addr_id2transport(sk, &encap.sue_address,
+					   encap.sue_assoc_id);
+		if (!t) {
+			pr_debug("%s: failed no transport\n", __func__);
+			return -EINVAL;
+		}
+
+		encap_port = t->encap_port;
+		goto out;
+	}
+
+	/* Get association, if assoc_id != SCTP_FUTURE_ASSOC and the
+	 * socket is a one to many style socket, and an association
+	 * was not found, then the id was invalid.
+	 */
+	asoc = sctp_id2assoc(sk, encap.sue_assoc_id);
+	if (!asoc && encap.sue_assoc_id != SCTP_FUTURE_ASSOC &&
+	    sctp_style(sk, UDP)) {
+		pr_debug("%s: failed no association\n", __func__);
+		return -EINVAL;
+	}
+
+	if (asoc) {
+		encap_port = asoc->encap_port;
+		goto out;
+	}
+
+	encap_port = sctp_sk(sk)->encap_port;
+
+out:
+	encap.sue_port = (__force uint16_t)encap_port;
+	if (copy_to_user(optval, &encap, len))
+		return -EFAULT;
+
+	if (put_user(len, optlen))
+		return -EFAULT;
+
+	return 0;
+}
+
 static int sctp_getsockopt(struct sock *sk, int level, int optname,
 			   char __user *optval, int __user *optlen)
 {
···
 		break;
 	case SCTP_EXPOSE_POTENTIALLY_FAILED_STATE:
 		retval = sctp_getsockopt_pf_expose(sk, len, optval, optlen);
+		break;
+	case SCTP_REMOTE_UDP_ENCAPS_PORT:
+		retval = sctp_getsockopt_encap_port(sk, len, optval, optlen);
 		break;
 	default:
 		retval = -ENOPROTOOPT;
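
Note (illustrative, not part of the patch): the setsockopt handler above picks one of three scopes, and the associola.c hunks make new associations and transports inherit the value from the level above. The toy userspace model below mirrors that branch order only; `model_sock`, `model_asoc`, `model_asoc_init` and `model_set_encap_port` are hypothetical names, and the fixed-size array stands in for the kernel's transport list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSPORTS 4

struct model_sock {
	uint16_t encap_port;			/* default for future associations */
};

struct model_asoc {
	uint16_t encap_port;
	uint16_t transport_port[MAX_TRANSPORTS];
	size_t n_transports;
};

/* A new association inherits from the socket, its transports from it. */
static void model_asoc_init(struct model_asoc *a, const struct model_sock *s,
			    size_t n_transports)
{
	size_t i;

	assert(n_transports <= MAX_TRANSPORTS);
	a->encap_port = s->encap_port;
	a->n_transports = n_transports;
	for (i = 0; i < n_transports; i++)
		a->transport_port[i] = a->encap_port;
}

/*
 * Same branch order as the handler in the hunk:
 *   specific transport given -> only that transport changes;
 *   association given        -> every transport of it changes;
 *   neither                  -> the socket default changes.
 */
static void model_set_encap_port(struct model_sock *s, struct model_asoc *a,
				 uint16_t *transport, uint16_t port)
{
	size_t i;

	if (transport) {
		*transport = port;
		return;
	}
	if (a) {
		for (i = 0; i < a->n_transports; i++)
			a->transport_port[i] = port;
		return;
	}
	s->encap_port = port;
}
```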
+62
net/sctp/sysctl.c
···
 static int rto_beta_max = 1000;
 static int pf_expose_max = SCTP_PF_EXPOSE_MAX;
 static int ps_retrans_max = SCTP_PS_RETRANS_MAX;
+static int udp_port_max = 65535;
 
 static unsigned long max_autoclose_min = 0;
 static unsigned long max_autoclose_max =
···
 				void *buffer, size_t *lenp, loff_t *ppos);
 static int proc_sctp_do_rto_max(struct ctl_table *ctl, int write, void *buffer,
 				size_t *lenp, loff_t *ppos);
+static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write, void *buffer,
+				 size_t *lenp, loff_t *ppos);
 static int proc_sctp_do_alpha_beta(struct ctl_table *ctl, int write,
 				   void *buffer, size_t *lenp, loff_t *ppos);
 static int proc_sctp_do_auth(struct ctl_table *ctl, int write,
···
 		.proc_handler = proc_dointvec,
 	},
 	{
+		.procname = "udp_port",
+		.data = &init_net.sctp.udp_port,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_sctp_do_udp_port,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = &udp_port_max,
+	},
+	{
+		.procname = "encap_port",
+		.data = &init_net.sctp.encap_port,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = &udp_port_max,
+	},
+	{
 		.procname = "addr_scope_policy",
 		.data = &init_net.sctp.scope_policy,
 		.maxlen = sizeof(int),
···
 		/* Update the value in the control socket */
 		lock_sock(sk);
 		sctp_sk(sk)->ep->auth_enable = new_value;
+		release_sock(sk);
+	}
+
+	return ret;
+}
+
+static int proc_sctp_do_udp_port(struct ctl_table *ctl, int write,
+				 void *buffer, size_t *lenp, loff_t *ppos)
+{
+	struct net *net = current->nsproxy->net_ns;
+	unsigned int min = *(unsigned int *)ctl->extra1;
+	unsigned int max = *(unsigned int *)ctl->extra2;
+	struct ctl_table tbl;
+	int ret, new_value;
+
+	memset(&tbl, 0, sizeof(struct ctl_table));
+	tbl.maxlen = sizeof(unsigned int);
+
+	if (write)
+		tbl.data = &new_value;
+	else
+		tbl.data = &net->sctp.udp_port;
+
+	ret = proc_dointvec(&tbl, write, buffer, lenp, ppos);
+	if (write && ret == 0) {
+		struct sock *sk = net->sctp.ctl_sock;
+
+		if (new_value > max || new_value < min)
+			return -EINVAL;
+
+		net->sctp.udp_port = new_value;
+		sctp_udp_sock_stop(net);
+		if (new_value) {
+			ret = sctp_udp_sock_start(net);
+			if (ret)
+				net->sctp.udp_port = 0;
+		}
+
+		/* Update the value in the control socket */
+		lock_sock(sk);
+		sctp_sk(sk)->udp_port = htons(net->sctp.udp_port);
 		release_sock(sk);
 	}
 