Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ipv6: Flow label state ranges

This patch divides the IPv6 flow label space into two ranges:
0-7ffff is reserved for the flow label manager, and 80000-fffff will be
used for creating auto flow labels (per RFC6438). This only affects how
labels are set on transmit; it does not affect receive. This range split
can be disabled by sysctl.

Background:

IPv6 flow labels have been an unmitigated disappointment thus far
in the lifetime of IPv6. Support in HW devices to use them for ECMP
is lacking, and OSes don't turn them on by default. If we had these
we could get much better hashing in IPv6 networks without resorting
to DPI, possibly eliminating some of the motivations to define new
encaps in UDP just for getting ECMP.

Unfortunately, the initial specifications of IPv6 did not clarify
how they are to be used. There has always been a vague concept that
these can be used for ECMP, flow hashing, etc. and we do now have a
good standard for how to do this in RFC6438. The problem is that flow labels
can be either stateful or stateless (as in RFC6438), and we are
presented with the possibility that a stateless label may collide
with a stateful one. Attempts to split the flow label space were
rejected in IETF. When we added support in Linux for RFC6438, we
could not turn on flow labels by default due to this conflict.

This patch splits the flow label space and should give us
a path to enabling auto flow labels by default for all IPv6 packets.
This is an API change, so we need to consider compatibility with
existing deployments. The stateful range is chosen to be the lower
values in the hope that most users have chosen small numbers.

Once we resolve the stateless/stateful issue, we can proceed to
look at enabling RFC6438 flow labels by default (starting with
scaled testing).

Signed-off-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Tom Herbert and committed by David S. Miller
82a584b7 7035870d

+29 -2
+8
Documentation/networking/ip-sysctl.txt
···
 	FALSE: disabled
 	Default: false

+flowlabel_state_ranges - BOOLEAN
+	Split the flow label number space into two ranges. 0-0x7FFFF is
+	reserved for the IPv6 flow manager facility, 0x80000-0xFFFFF
+	is reserved for stateless flow labels as described in RFC6437.
+	TRUE: enabled
+	FALSE: disabled
+	Default: true
+
 anycast_src_echo_reply - BOOLEAN
 	Controls the use of anycast addresses as source addresses for ICMPv6
 	echo reply
+7 -2
include/net/ipv6.h
···
 	struct net		*fl_net;
 };

-#define IPV6_FLOWINFO_MASK	cpu_to_be32(0x0FFFFFFF)
-#define IPV6_FLOWLABEL_MASK	cpu_to_be32(0x000FFFFF)
+#define IPV6_FLOWINFO_MASK		cpu_to_be32(0x0FFFFFFF)
+#define IPV6_FLOWLABEL_MASK		cpu_to_be32(0x000FFFFF)
+#define IPV6_FLOWLABEL_STATELESS_FLAG	cpu_to_be32(0x00080000)
+
 #define IPV6_TCLASS_MASK (IPV6_FLOWINFO_MASK & ~IPV6_FLOWLABEL_MASK)
 #define IPV6_TCLASS_SHIFT	20

···
 		hash ^= hash >> 12;

 		flowlabel = (__force __be32)hash & IPV6_FLOWLABEL_MASK;
+
+		if (net->ipv6.sysctl.flowlabel_state_ranges)
+			flowlabel |= IPV6_FLOWLABEL_STATELESS_FLAG;
 	}

 	return flowlabel;
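The transmit-side change above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: it works on host-byte-order integers, whereas the kernel operates on `__be32` values built with `cpu_to_be32()`, and the names `FLOWLABEL_MASK`, `FLOWLABEL_STATELESS_FLAG` and `make_auto_flowlabel()` are illustrative, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-byte-order stand-ins for the kernel masks (assumption: the
 * kernel stores these in network byte order via cpu_to_be32()). */
#define FLOWLABEL_MASK 0x000FFFFFu
#define FLOWLABEL_STATELESS_FLAG 0x00080000u

/* Hypothetical model of the auto flow label path: fold the hash,
 * keep the low 20 bits, and force the top label bit when the range
 * split is enabled so the result lands in 0x80000-0xFFFFF. */
static uint32_t make_auto_flowlabel(uint32_t hash, bool state_ranges)
{
    uint32_t label;

    hash ^= hash >> 12;
    label = hash & FLOWLABEL_MASK;

    if (state_ranges)
        label |= FLOWLABEL_STATELESS_FLAG;

    return label;
}
```

With the split enabled, the top bit of the label is always set, so in this model only the low 19 bits still vary with the hash.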
+1
include/net/netns/ipv6.h
···
 	int fwmark_reflect;
 	int idgen_retries;
 	int idgen_delay;
+	int flowlabel_state_ranges;
 };

 struct netns_ipv6 {
+1
net/ipv6/af_inet6.c
···
 	net->ipv6.sysctl.auto_flowlabels = 0;
 	net->ipv6.sysctl.idgen_retries = 3;
 	net->ipv6.sysctl.idgen_delay = 1 * HZ;
+	net->ipv6.sysctl.flowlabel_state_ranges = 1;
 	atomic_set(&net->ipv6.fib6_sernum, 1);

 	err = ipv6_init_mibs(net);
+4
net/ipv6/ip6_flowlabel.c
···
 		if (freq.flr_label & ~IPV6_FLOWLABEL_MASK)
 			return -EINVAL;

+		if (net->ipv6.sysctl.flowlabel_state_ranges &&
+		    (freq.flr_label & IPV6_FLOWLABEL_STATELESS_FLAG))
+			return -ERANGE;
+
 		fl = fl_create(net, sk, &freq, optval, optlen, &err);
 		if (!fl)
 			return err;
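The two checks in this hunk decide which labels the flow label manager will accept. The sketch below restates that decision as a standalone function; it assumes host-byte-order labels (the kernel compares `__be32` values), and `check_flowlabel_request()` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Host-byte-order stand-ins for the kernel masks (assumption: the
 * kernel stores these in network byte order via cpu_to_be32()). */
#define FLOWLABEL_MASK 0x000FFFFFu
#define FLOWLABEL_STATELESS_FLAG 0x00080000u

/* Hypothetical model of the request validation: returns 0 when the
 * label may be handed to the flow label manager, or a negative errno
 * mirroring the values returned in the hunk above. */
static int check_flowlabel_request(uint32_t label, bool state_ranges)
{
    /* Bits outside the 20-bit label field are never valid. */
    if (label & ~FLOWLABEL_MASK)
        return -EINVAL;

    /* With the split enabled, 0x80000-0xFFFFF belongs to auto labels. */
    if (state_ranges && (label & FLOWLABEL_STATELESS_FLAG))
        return -ERANGE;

    return 0;
}
```

Under this model, an application that previously reserved a label such as 0x80000 would start seeing `-ERANGE` while the split is enabled, which is the compatibility concern the commit message raises.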
+8
net/ipv6/sysctl_net_ipv6.c
···
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_jiffies,
 	},
+	{
+		.procname	= "flowlabel_state_ranges",
+		.data		= &init_net.ipv6.sysctl.flowlabel_state_ranges,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
 	{ }
 };

···
 	ipv6_table[4].data = &net->ipv6.sysctl.fwmark_reflect;
 	ipv6_table[5].data = &net->ipv6.sysctl.idgen_retries;
 	ipv6_table[6].data = &net->ipv6.sysctl.idgen_delay;
+	ipv6_table[7].data = &net->ipv6.sysctl.flowlabel_state_ranges;

 	ipv6_route_table = ipv6_route_sysctl_init(net);
 	if (!ipv6_route_table)
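Because the entry uses `proc_dointvec` with mode 0644, the knob should be readable as a plain integer from procfs. A minimal userspace sketch, assuming the table is exposed under `/proc/sys/net/ipv6/` (the path macro and the helper `read_int_sysctl()` are illustrative, not part of the patch):

```c
#include <assert.h>
#include <stdio.h>

/* Assumed procfs location of the new knob (not stated in the patch). */
#define FLOWLABEL_STATE_RANGES_PATH "/proc/sys/net/ipv6/flowlabel_state_ranges"

/* Hypothetical helper: read a single integer sysctl value.
 * Returns 0 on success and stores the value in *value; returns -1 if
 * the file cannot be opened or does not start with an integer, in
 * which case *value is left untouched. */
static int read_int_sysctl(const char *path, int *value)
{
    FILE *f;
    int parsed;
    int ok;

    assert(path != NULL && value != NULL);

    f = fopen(path, "r");
    if (!f)
        return -1;

    ok = (fscanf(f, "%d", &parsed) == 1);
    fclose(f);

    if (!ok)
        return -1;

    *value = parsed;
    return 0;
}
```

A caller would pass `FLOWLABEL_STATE_RANGES_PATH` and treat a failure as "kernel without this patch" rather than as "split disabled".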