Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net/sched: act_skbmod: Add SKBMOD_F_ECN option support

Currently, when doing rate limiting using the tc-police(8) action, the
easiest way is to simply drop the packets which exceed or conform to the
configured bandwidth limit. Add a new option to tc-skbmod(8), so that
users may use the ECN [1] extension to explicitly inform the receiver
about the congestion instead of dropping packets "on the floor".

The 2 least significant bits of the Type of Service field (IPv4) and the
Traffic Class field (IPv6) are used to represent different ECN states [2]:

0b00: "Non ECN-Capable Transport", Non-ECT
0b10: "ECN Capable Transport", ECT(0)
0b01: "ECN Capable Transport", ECT(1)
0b11: "Congestion Experienced", CE
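To illustrate the codepoint table, the following user-space C sketch extracts the ECN field from an IPv4 TOS byte (the same arithmetic applies to an IPv6 Traffic Class byte) and reports which codepoints are eligible for CE marking. The enum and function names are hypothetical; the kernel's own constants (INET_ECN_NOT_ECT, INET_ECN_ECT_1, INET_ECN_ECT_0, INET_ECN_CE) live in include/net/inet_ecn.h.

```c
#include <stdint.h>

/* ECN codepoints as listed in the table (RFC 3168, section 5).
 * Hypothetical names, not the kernel's. */
enum ecn_codepoint {
	ECN_NOT_ECT = 0x0, /* 0b00 */
	ECN_ECT_1   = 0x1, /* 0b01 */
	ECN_ECT_0   = 0x2, /* 0b10 */
	ECN_CE      = 0x3, /* 0b11 */
};

/* The ECN field is the two least significant bits of the byte;
 * the upper six bits (DSCP) are ignored. */
static enum ecn_codepoint ecn_from_tos(uint8_t tos)
{
	return (enum ecn_codepoint)(tos & 0x3);
}

/* Only ECT(0) and ECT(1) packets may be re-marked as CE: Non-ECT
 * packets do not understand ECN, and CE packets are already marked. */
static int ecn_is_ect(uint8_t tos)
{
	enum ecn_codepoint cp = ecn_from_tos(tos);

	return cp == ECN_ECT_0 || cp == ECN_ECT_1;
}
```

For example, a TOS byte of 0xba (DSCP EF with ECT(0)) yields ECN_ECT_0 and is eligible for marking, while 0xb8 (DSCP EF, Non-ECT) is not.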

As an example:

$ tc filter add dev eth0 parent 1: protocol ip prio 10 \
matchall action skbmod ecn

Doing the above marks all ECT(0) and ECT(1) packets as CE. It does NOT
affect Non-ECT or non-IP packets. In the tc-police scenario mentioned
above, users may pipe a tc-police action and a tc-skbmod "ecn" action
together to achieve ECN-based rate limiting.
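A minimal user-space sketch of what CE marking means for an IPv4 header, assuming a raw 20-byte header in a byte buffer: ECT(0) and ECT(1) are rewritten to CE, Non-ECT and already-CE packets are left alone, and the header checksum is patched incrementally with RFC 1624, eqn. 3 (HC' = ~(~HC + ~m + m')). The helper names are hypothetical and this is not the kernel's INET_ECN_set_ce(), although IP_ECN_set_ce() in include/net/inet_ecn.h likewise updates the checksum incrementally rather than recomputing it.

```c
#include <stddef.h>
#include <stdint.h>

#define ECN_MASK 0x3
#define ECN_CE   0x3

/* One's-complement sum over the header, folded and inverted (RFC 1071).
 * With the checksum field zeroed it returns the checksum to store;
 * over a header with a valid checksum it returns 0. */
static uint16_t ipv4_hdr_csum(const uint8_t *hdr, size_t len)
{
	uint32_t sum = 0;

	for (size_t i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Hypothetical helper: mark an ECT(0)/ECT(1) IPv4 packet as CE.
 * Returns 1 if the packet is CE afterwards, 0 if it is Non-ECT. */
static int ipv4_set_ce(uint8_t *hdr)
{
	uint16_t old_word, new_word, hc;
	uint32_t sum;

	switch (hdr[1] & ECN_MASK) {
	case 0x0:	/* Non-ECT: must not be marked */
		return 0;
	case ECN_CE:	/* already marked */
		return 1;
	}

	/* The TOS byte is the low half of the first 16-bit header word. */
	old_word = (uint16_t)((hdr[0] << 8) | hdr[1]);
	hdr[1] |= ECN_CE;
	new_word = (uint16_t)((hdr[0] << 8) | hdr[1]);

	/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') */
	hc = (uint16_t)((hdr[10] << 8) | hdr[11]);
	sum = (uint32_t)(uint16_t)~hc + (uint16_t)~old_word + new_word;
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	hc = (uint16_t)~sum;
	hdr[10] = (uint8_t)(hc >> 8);
	hdr[11] = (uint8_t)hc;
	return 1;
}
```

Only one 16-bit word of the header changes, so the incremental update avoids re-summing the whole header; ipv4_hdr_csum() is used here only to fill in and verify checksums.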

For TCP connections, upon receiving a CE packet, the receiver will set the
ECE flag in its ACKs, asking the sender to reduce its congestion window.
However, ECN also works with other L4 protocols, e.g. DCCP and SCTP [2],
and this implementation neither touches nor inspects L4 headers.
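Because the ECN field lives in the IP header, CE marking needs no knowledge of the L4 protocol. For IPv6 it is even simpler, since the IPv6 header carries no checksum: the 8-bit Traffic Class follows the 4-bit version, so the ECN bits end up as bits 5..4 of byte 1. The sketch below assumes a raw IPv6 header in a byte buffer; the function names are hypothetical, and it does not model any skb checksum bookkeeping the kernel may need to do.

```c
#include <stdint.h>

#define ECN_MASK 0x3
#define ECN_CE   0x3

/* IPv6 bytes 0-1: | version (4) | traffic class (8) | flow label (top 4 of 20) |
 * The ECN field is the low 2 bits of the traffic class, i.e. bits 5..4 of byte 1. */
static uint8_t ipv6_get_ecn(const uint8_t *hdr)
{
	return (uint8_t)((hdr[1] >> 4) & ECN_MASK);
}

/* Hypothetical helper: mark ECT(0)/ECT(1) as CE and leave Non-ECT untouched.
 * DSCP and flow label bits are preserved; there is no header checksum to fix. */
static int ipv6_set_ce(uint8_t *hdr)
{
	if (ipv6_get_ecn(hdr) == 0x0)
		return 0;
	hdr[1] |= (uint8_t)(ECN_CE << 4);
	return 1;
}
```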

The updated tc-skbmod SYNOPSIS looks like the following:

tc ... action skbmod { set SETTABLE | swap SWAPPABLE | ecn } ...

Only one of "set", "swap" or "ecn" may be used in a single tc-skbmod
action. Using more than one of them at a time is undefined behavior;
pipe multiple tc-skbmod actions together instead.
"set" and "swap" only affect Ethernet packets, while "ecn" only affects
IPv{4,6} packets.
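These rules can be modeled in a few lines of user-space C. This hypothetical sketch mirrors two pieces of the diff in this commit: the flag normalization in tcf_skbmod_init(), where "swap" replaces any "set" flags and "ecn" replaces both (which is how the undefined combination happens to resolve in this patch), and the early-exit check in tcf_skbmod_act(). The `set_flags` parameter stands in for the SKBMOD_F_{DMAC,SMAC,ETYPE} bits collected earlier in tcf_skbmod_init(), which the diff does not show; that part is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flag values from include/uapi/linux/tc_act/tc_skbmod.h. */
#define SKBMOD_F_DMAC    0x1
#define SKBMOD_F_SMAC    0x2
#define SKBMOD_F_ETYPE   0x4
#define SKBMOD_F_SWAPMAC 0x8
#define SKBMOD_F_ECN     0x10

#define ETH_P_IP   0x0800
#define ETH_P_IPV6 0x86DD

/* Hypothetical model of the normalization: each later assignment
 * overwrites the earlier value, so at most one mode survives. */
static uint64_t skbmod_normalize(uint64_t set_flags, uint64_t parm_flags)
{
	uint64_t lflags = set_flags;

	if (parm_flags & SKBMOD_F_SWAPMAC)
		lflags = SKBMOD_F_SWAPMAC;
	if (parm_flags & SKBMOD_F_ECN)
		lflags = SKBMOD_F_ECN;
	return lflags;
}

/* Hypothetical model of the early-exit check: returns true if the packet
 * gets past it. The packet may still be dropped if it cannot be made
 * writable, or left unchanged if it is Non-ECT. */
static bool skbmod_applies(uint64_t flags, bool dev_is_ether, uint16_t proto)
{
	if (flags == SKBMOD_F_ECN)
		return proto == ETH_P_IP || proto == ETH_P_IPV6;
	return dev_is_ether;
}
```

Note that in this model "ecn" does not depend on the device type, only on the packet's protocol, while "set" and "swap" only check for an Ethernet device.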

It is also worth mentioning that, in theory, the same effect could be
achieved by piping a "police" action and a "bpf" action using the
bpf_skb_ecn_set_ce() helper, but that requires the user to write an eBPF
program, which is impractical.

Depends on patch "net/sched: act_skbmod: Skip non-Ethernet packets".

[1] https://datatracker.ietf.org/doc/html/rfc3168
[2] https://en.wikipedia.org/wiki/Explicit_Congestion_Notification

Reviewed-by: Cong Wang <cong.wang@bytedance.com>
Signed-off-by: Peilin Ye <peilin.ye@bytedance.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Peilin Ye and committed by David S. Miller.
56af5e74 d80f6d66

+33 -12

include/uapi/linux/tc_act/tc_skbmod.h (+1)

···
 #define SKBMOD_F_SMAC 0x2
 #define SKBMOD_F_ETYPE 0x4
 #define SKBMOD_F_SWAPMAC 0x8
+#define SKBMOD_F_ECN 0x10
 
 struct tc_skbmod {
 	tc_gen;
net/sched/act_skbmod.c (+32 -12)

···
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
+#include <net/inet_ecn.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
···
 static unsigned int skbmod_net_id;
 static struct tc_action_ops act_skbmod_ops;
 
-#define MAX_EDIT_LEN ETH_HLEN
 static int tcf_skbmod_act(struct sk_buff *skb, const struct tc_action *a,
 			  struct tcf_result *res)
 {
 	struct tcf_skbmod *d = to_skbmod(a);
-	int action;
+	int action, max_edit_len, err;
 	struct tcf_skbmod_params *p;
 	u64 flags;
-	int err;
 
 	tcf_lastuse_update(&d->tcf_tm);
 	bstats_cpu_update(this_cpu_ptr(d->common.cpu_bstats), skb);
···
 	if (unlikely(action == TC_ACT_SHOT))
 		goto drop;
 
-	if (!skb->dev || skb->dev->type != ARPHRD_ETHER)
-		return action;
+	max_edit_len = skb_mac_header_len(skb);
+	p = rcu_dereference_bh(d->skbmod_p);
+	flags = p->flags;
 
-	/* XXX: if you are going to edit more fields beyond ethernet header
-	 * (example when you add IP header replacement or vlan swap)
-	 * then MAX_EDIT_LEN needs to change appropriately
-	*/
-	err = skb_ensure_writable(skb, MAX_EDIT_LEN);
+	/* tcf_skbmod_init() guarantees "flags" to be one of the following:
+	 *	1. a combination of SKBMOD_F_{DMAC,SMAC,ETYPE}
+	 *	2. SKBMOD_F_SWAPMAC
+	 *	3. SKBMOD_F_ECN
+	 * SKBMOD_F_ECN only works with IP packets; all other flags only work with Ethernet
+	 * packets.
+	 */
+	if (flags == SKBMOD_F_ECN) {
+		switch (skb_protocol(skb, true)) {
+		case cpu_to_be16(ETH_P_IP):
+		case cpu_to_be16(ETH_P_IPV6):
+			max_edit_len += skb_network_header_len(skb);
+			break;
+		default:
+			goto out;
+		}
+	} else if (!skb->dev || skb->dev->type != ARPHRD_ETHER) {
+		goto out;
+	}
+
+	err = skb_ensure_writable(skb, max_edit_len);
 	if (unlikely(err)) /* best policy is to drop on the floor */
 		goto drop;
 
-	p = rcu_dereference_bh(d->skbmod_p);
-	flags = p->flags;
 	if (flags & SKBMOD_F_DMAC)
 		ether_addr_copy(eth_hdr(skb)->h_dest, p->eth_dst);
 	if (flags & SKBMOD_F_SMAC)
···
 		ether_addr_copy(eth_hdr(skb)->h_source, (u8 *)tmpaddr);
 	}
 
+	if (flags & SKBMOD_F_ECN)
+		INET_ECN_set_ce(skb);
+
+out:
 	return action;
 
 drop:
···
 	index = parm->index;
 	if (parm->flags & SKBMOD_F_SWAPMAC)
 		lflags = SKBMOD_F_SWAPMAC;
+	if (parm->flags & SKBMOD_F_ECN)
+		lflags = SKBMOD_F_ECN;
 
 	err = tcf_idr_check_alloc(tn, &index, a, bind);
 	if (err < 0)