Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'allow-configuration-of-multipath-hash-seed'

Petr Machata says:

====================
Allow configuration of multipath hash seed

Let me just quote the commit message of patch #2 here to inform the
motivation and some of the implementation:

When calculating hashes for the purpose of multipath forwarding,
both IPv4 and IPv6 code currently fall back on
flow_hash_from_keys(), which uses a randomly generated seed. That is a
fine default, but some deployments need tighter control over the seed
used.

In this patchset, make the seed configurable by adding a new sysctl
key, net.ipv4.fib_multipath_hash_seed. This seed
is used specifically for multipath forwarding and not for the other
concerns that flow_hash_from_keys() is used for, such as queue
selection. Expose the knob as sysctl because other such settings,
such as headers to hash, are also handled that way.

Despite being placed in the net.ipv4 namespace, the multipath seed
sysctl is used for both IPv4 and IPv6, similarly to e.g. a number of
TCP variables. Like those, the multipath hash seed is a per-netns
variable.

The seed used by flow_hash_from_keys() is a 128-bit quantity.
However, seeds used in practice tend to be much more modest: 32 bits
seem typical (Cisco, Cumulus), and some systems go even lower.
For that reason, and to decouple the user interface from
implementation details, go with a 32-bit quantity, which is then
quadruplicated to form the siphash key.
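
The quadruplication described above can be sketched in plain userspace C.
This is only an illustration that mirrors the arithmetic of
fib_multipath_hash_construct_key() from the patch; struct demo_siphash_key
and demo_construct_key() are hypothetical stand-ins for the kernel's
siphash_key_t and helper, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for siphash_key_t: two 64-bit words. */
struct demo_siphash_key {
	uint64_t key[2];
};

/* Replicate a 32-bit seed four times to fill the 128-bit key. */
static void demo_construct_key(struct demo_siphash_key *key, uint32_t mp_seed)
{
	uint64_t mp_seed_64 = mp_seed;

	assert(key != NULL);

	/* Low and high halves of the first word both carry the seed... */
	key->key[0] = (mp_seed_64 << 32) | mp_seed_64;
	/* ...and the second word is a copy of the first. */
	key->key[1] = key->key[0];
}
```

For a seed of 0xdeadbeef, both 64-bit words come out as
0xdeadbeefdeadbeef, so the user-visible value stays 32 bits wide while
the hash still receives a full-width key.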

One example of use of this interface is avoiding hash polarization,
where two ECMP routers, one behind the other, happen to make consistent
hashing decisions, and as a result, part of the ECMP space of the latter
router is never used. Another is a load balancer where several machines
forward traffic to one of a number of leaves, and the forwarding
decisions need to be made consistently. (This is a case of a desired
hash polarization, mentioned e.g. in chapter 6.3 of [0].)
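
Hash polarization can be illustrated with a small simulation. The sketch
below is a toy model under stated assumptions: demo_mix() is the
MurmurHash3 finalizer used as a stand-in hash (not the kernel's siphash),
flows are plain integers, and router B is assumed to sit behind next hop
0 of router A. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 32-bit mixer (MurmurHash3 finalizer); NOT the kernel's siphash. */
static uint32_t demo_mix(uint32_t x)
{
	x ^= x >> 16;
	x *= 0x85ebca6bu;
	x ^= x >> 13;
	x *= 0xc2b2ae35u;
	x ^= x >> 16;
	return x;
}

/* Pick one of nh_count next hops for a flow under a given seed. */
static unsigned int demo_pick_nh(uint32_t flow, uint32_t seed,
				 unsigned int nh_count)
{
	assert(nh_count > 0);
	return demo_mix(flow ^ seed) % nh_count;
}

/*
 * Push flows 0..n_flows-1 through router A, keep only those that A
 * forwards to its next hop 0 (where router B sits), and count how many
 * of B's next hops end up carrying at least one flow.
 */
static unsigned int demo_used_nh_at_b(uint32_t seed_a, uint32_t seed_b,
				      uint32_t n_flows, unsigned int nh_count)
{
	uint32_t seen = 0;	/* bit i set when B's next hop i got a flow */
	unsigned int used = 0;
	uint32_t flow;

	assert(nh_count > 0 && nh_count <= 32);

	for (flow = 0; flow < n_flows; flow++) {
		if (demo_pick_nh(flow, seed_a, nh_count) != 0)
			continue;	/* A sent this flow elsewhere */
		seen |= UINT32_C(1) << demo_pick_nh(flow, seed_b, nh_count);
	}

	for (; seen; seen >>= 1)
		used += seen & 1;

	return used;
}
```

When both routers use the same seed and the same number of next hops,
every flow that reaches B hashes to B's next hop 0 as well, so
demo_used_nh_at_b(s, s, n, k) cannot exceed 1. With two different seeds
one would expect the flows to spread over B's next hops again, which is
the effect a configurable seed is meant to make available.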

There has already been a proposal to include a hash seed control
interface in the past[1].

- Patches #1-#2 contain the substance of the work
- Patch #3 is an mlxsw offload
- Patches #4 and #5 are a selftest

[0] https://www.usenix.org/system/files/conference/nsdi18/nsdi18-araujo.pdf
[1] https://lore.kernel.org/netdev/YIlVpYMCn%2F8WfE1P@rnd/
====================

Link: https://lore.kernel.org/r/20240607151357.421181-1-petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+484 -14
+14
Documentation/networking/ip-sysctl.rst
···

 	Default: 0x0007 (source IP, destination IP and IP protocol)

+fib_multipath_hash_seed - UNSIGNED INTEGER
+	The seed value used when calculating hash for multipath routes. Applies
+	to both IPv4 and IPv6 datapath. Only present for kernels built with
+	CONFIG_IP_ROUTE_MULTIPATH enabled.
+
+	When set to 0, the seed value used for multipath routing defaults to an
+	internal random-generated one.
+
+	The actual hashing algorithm is not specified -- there is no guarantee
+	that a next hop distribution effected by a given seed will keep stable
+	across kernel versions.
+
+	Default: 0 (random)
+
 fib_sync_mem - UNSIGNED INTEGER
 	Amount of dirty memory from fib entries that can be backlogged before
 	synchronize_rcu is forced.
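
As a rough illustration of how the documented knob could be read from
userspace, here is a minimal C sketch. It assumes the usual procfs
convention that a sysctl key maps to a file under /proc/sys with dots
replaced by slashes; demo_sysctl_path() and demo_read_seed() are
hypothetical helper names, not part of any library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Turn a sysctl key such as "net.ipv4.foo" into "/proc/sys/net/ipv4/foo".
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int demo_sysctl_path(const char *key, char *buf, size_t len)
{
	const char *prefix = "/proc/sys/";
	char *p;
	int n;

	assert(key != NULL && buf != NULL);

	n = snprintf(buf, len, "%s%s", prefix, key);
	if (n < 0 || (size_t)n >= len)
		return -1;

	/* Only the key part is rewritten; the prefix contains no dots. */
	for (p = buf + strlen(prefix); *p; p++)
		if (*p == '.')
			*p = '/';

	return 0;
}

/*
 * Read the configured seed. Returns 0 on success, -1 if the file is
 * missing (for example, a kernel without the sysctl) or unparsable.
 */
static int demo_read_seed(unsigned int *seed)
{
	char path[128];
	FILE *f;
	int ok;

	if (demo_sysctl_path("net.ipv4.fib_multipath_hash_seed",
			     path, sizeof(path)))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%u", seed) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

A value of 0 read back this way means the kernel is using its internal
random seed, as the documentation text above states.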
+5 -1
drivers/net/ethernet/mellanox/mlxsw/spectrum_router.c
···
 {
 	bool old_inc_parsing_depth, new_inc_parsing_depth;
 	struct mlxsw_sp_mp_hash_config config = {};
+	struct net *net = mlxsw_sp_net(mlxsw_sp);
 	char recr2_pl[MLXSW_REG_RECR2_LEN];
 	unsigned long bit;
 	u32 seed;
 	int err;

-	seed = jhash(mlxsw_sp->base_mac, sizeof(mlxsw_sp->base_mac), 0);
+	seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).user_seed;
+	if (!seed)
+		seed = jhash(mlxsw_sp->base_mac, sizeof(mlxsw_sp->base_mac), 0);
+
 	mlxsw_reg_recr2_pack(recr2_pl, seed);
 	mlxsw_sp_mp4_hash_init(mlxsw_sp, &config);
 	mlxsw_sp_mp6_hash_init(mlxsw_sp, &config);
+2
include/net/flow_dissector.h
···
 }

 u32 flow_hash_from_keys(struct flow_keys *keys);
+u32 flow_hash_from_keys_seed(struct flow_keys *keys,
+			     const siphash_key_t *keyval);
 void skb_flow_get_icmp_tci(const struct sk_buff *skb,
 			   struct flow_dissector_key_icmp *key_icmp,
 			   const void *data, int thoff, int hlen);
+28
include/net/ip_fib.h
···
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
 int fib_multipath_hash(const struct net *net, const struct flowi4 *fl4,
 		       const struct sk_buff *skb, struct flow_keys *flkeys);
+
+static void
+fib_multipath_hash_construct_key(siphash_key_t *key, u32 mp_seed)
+{
+	u64 mp_seed_64 = mp_seed;
+
+	key->key[0] = (mp_seed_64 << 32) | mp_seed_64;
+	key->key[1] = key->key[0];
+}
+
+static inline u32 fib_multipath_hash_from_keys(const struct net *net,
+					       struct flow_keys *keys)
+{
+	siphash_aligned_key_t hash_key;
+	u32 mp_seed;
+
+	mp_seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).mp_seed;
+	fib_multipath_hash_construct_key(&hash_key, mp_seed);
+
+	return flow_hash_from_keys_seed(keys, &hash_key);
+}
+#else
+static inline u32 fib_multipath_hash_from_keys(const struct net *net,
+					       struct flow_keys *keys)
+{
+	return flow_hash_from_keys(keys);
+}
 #endif
+
 int fib_check_nh(struct net *net, struct fib_nh *nh, u32 table, u8 scope,
 		 struct netlink_ext_ack *extack);
 void fib_select_multipath(struct fib_result *res, int hash);
+8
include/net/netns/ipv4.h
···

 struct tcp_fastopen_context;

+#ifdef CONFIG_IP_ROUTE_MULTIPATH
+struct sysctl_fib_multipath_hash_seed {
+	u32 user_seed;
+	u32 mp_seed;
+};
+#endif
+
 struct netns_ipv4 {
 	/* Cacheline organization can be found documented in
 	 * Documentation/networking/net_cachelines/netns_ipv4_sysctl.rst.
···
 #endif
 #endif
 #ifdef CONFIG_IP_ROUTE_MULTIPATH
+	struct sysctl_fib_multipath_hash_seed sysctl_fib_multipath_hash_seed;
 	u32 sysctl_fib_multipath_hash_fields;
 	u8 sysctl_fib_multipath_use_neigh;
 	u8 sysctl_fib_multipath_hash_policy;
+7
net/core/flow_dissector.c
···
 }
 EXPORT_SYMBOL(flow_hash_from_keys);

+u32 flow_hash_from_keys_seed(struct flow_keys *keys,
+			     const siphash_key_t *keyval)
+{
+	return __flow_hash_from_keys(keys, keyval);
+}
+EXPORT_SYMBOL(flow_hash_from_keys_seed);
+
 static inline u32 ___skb_get_hash(const struct sk_buff *skb,
 				  struct flow_keys *keys,
 				  const siphash_key_t *keyval)
+6 -6
net/ipv4/route.c
···
 		hash_keys.ports.dst = keys.ports.dst;

 	*p_has_inner = !!(keys.control.flags & FLOW_DIS_ENCAPSULATION);
-	return flow_hash_from_keys(&hash_keys);
+	return fib_multipath_hash_from_keys(net, &hash_keys);
 }

 static u32 fib_multipath_custom_hash_inner(const struct net *net,
···
 	if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_PORT)
 		hash_keys.ports.dst = keys.ports.dst;

-	return flow_hash_from_keys(&hash_keys);
+	return fib_multipath_hash_from_keys(net, &hash_keys);
 }

 static u32 fib_multipath_custom_hash_skb(const struct net *net,
···
 	if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_PORT)
 		hash_keys.ports.dst = fl4->fl4_dport;

-	return flow_hash_from_keys(&hash_keys);
+	return fib_multipath_hash_from_keys(net, &hash_keys);
 }

 /* if skb is set it will be used and fl4 can be NULL */
···
 			hash_keys.addrs.v4addrs.src = fl4->saddr;
 			hash_keys.addrs.v4addrs.dst = fl4->daddr;
 		}
-		mhash = flow_hash_from_keys(&hash_keys);
+		mhash = fib_multipath_hash_from_keys(net, &hash_keys);
 		break;
 	case 1:
 		/* skb is currently provided only when forwarding */
···
 			hash_keys.ports.dst = fl4->fl4_dport;
 			hash_keys.basic.ip_proto = fl4->flowi4_proto;
 		}
-		mhash = flow_hash_from_keys(&hash_keys);
+		mhash = fib_multipath_hash_from_keys(net, &hash_keys);
 		break;
 	case 2:
 		memset(&hash_keys, 0, sizeof(hash_keys));
···
 			hash_keys.addrs.v4addrs.src = fl4->saddr;
 			hash_keys.addrs.v4addrs.dst = fl4->daddr;
 		}
-		mhash = flow_hash_from_keys(&hash_keys);
+		mhash = fib_multipath_hash_from_keys(net, &hash_keys);
 		break;
 	case 3:
 		if (skb)
+66
net/ipv4/sysctl_net_ipv4.c
···

 	return ret;
 }
+
+static u32 proc_fib_multipath_hash_rand_seed __ro_after_init;
+
+static void proc_fib_multipath_hash_init_rand_seed(void)
+{
+	get_random_bytes(&proc_fib_multipath_hash_rand_seed,
+			 sizeof(proc_fib_multipath_hash_rand_seed));
+}
+
+static void proc_fib_multipath_hash_set_seed(struct net *net, u32 user_seed)
+{
+	struct sysctl_fib_multipath_hash_seed new = {
+		.user_seed = user_seed,
+		.mp_seed = (user_seed ? user_seed :
+			    proc_fib_multipath_hash_rand_seed),
+	};
+
+	WRITE_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed, new);
+}
+
+static int proc_fib_multipath_hash_seed(struct ctl_table *table, int write,
+					void *buffer, size_t *lenp,
+					loff_t *ppos)
+{
+	struct sysctl_fib_multipath_hash_seed *mphs;
+	struct net *net = table->data;
+	struct ctl_table tmp;
+	u32 user_seed;
+	int ret;
+
+	mphs = &net->ipv4.sysctl_fib_multipath_hash_seed;
+	user_seed = mphs->user_seed;
+
+	tmp = *table;
+	tmp.data = &user_seed;
+
+	ret = proc_douintvec_minmax(&tmp, write, buffer, lenp, ppos);
+
+	if (write && ret == 0) {
+		proc_fib_multipath_hash_set_seed(net, user_seed);
+		call_netevent_notifiers(NETEVENT_IPV4_MPATH_HASH_UPDATE, net);
+	}
+
+	return ret;
+}
+#else
+
+static void proc_fib_multipath_hash_init_rand_seed(void)
+{
+}
+
+static void proc_fib_multipath_hash_set_seed(struct net *net, u32 user_seed)
+{
+}
+
 #endif

 static struct ctl_table ipv4_table[] = {
···
 		.extra1		= SYSCTL_ONE,
 		.extra2		= &fib_multipath_hash_fields_all_mask,
 	},
+	{
+		.procname	= "fib_multipath_hash_seed",
+		.data		= &init_net,
+		.maxlen		= sizeof(u32),
+		.mode		= 0644,
+		.proc_handler	= proc_fib_multipath_hash_seed,
+	},
 #endif
 	{
 		.procname	= "ip_unprivileged_port_start",
···
 	if (!net->ipv4.sysctl_local_reserved_ports)
 		goto err_ports;

+	proc_fib_multipath_hash_set_seed(net, 0);
+
 	return 0;

 err_ports:
···
 	hdr = register_net_sysctl(&init_net, "net/ipv4", ipv4_table);
 	if (!hdr)
 		return -ENOMEM;
+
+	proc_fib_multipath_hash_init_rand_seed();

 	if (register_pernet_subsys(&ipv4_sysctl_ops)) {
 		unregister_net_sysctl_table(hdr);
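
The split between the user-visible seed and the seed actually fed to the
hash can be modelled in a few lines of userspace C. This is only a sketch
of the selection logic in proc_fib_multipath_hash_set_seed(): struct
demo_hash_seed and demo_set_seed() are hypothetical names, and the random
fallback is passed in as a parameter instead of being generated at boot.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct sysctl_fib_multipath_hash_seed. */
struct demo_hash_seed {
	uint32_t user_seed;	/* what the sysctl reports back */
	uint32_t mp_seed;	/* what the multipath hash actually uses */
};

/*
 * Build the seed pair: a non-zero user seed is used as-is, while 0
 * selects the fallback (random in the kernel, a parameter here).
 */
static struct demo_hash_seed demo_set_seed(uint32_t user_seed,
					   uint32_t rand_seed)
{
	struct demo_hash_seed s = {
		.user_seed = user_seed,
		.mp_seed = user_seed ? user_seed : rand_seed,
	};

	/* Two 32-bit fields: the pair occupies exactly 8 bytes. */
	assert(sizeof(s) == 2 * sizeof(uint32_t));

	return s;
}
```

Keeping user_seed separate means that reading the sysctl after writing 0
still yields 0 rather than exposing the random fallback. The mlxsw hunk
reads user_seed in the same spirit and falls back to its own
jhash-of-MAC seed when it is 0.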
+6 -6
net/ipv6/route.c
···
 		hash_keys.ports.dst = keys.ports.dst;

 	*p_has_inner = !!(keys.control.flags & FLOW_DIS_ENCAPSULATION);
-	return flow_hash_from_keys(&hash_keys);
+	return fib_multipath_hash_from_keys(net, &hash_keys);
 }

 static u32 rt6_multipath_custom_hash_inner(const struct net *net,
···
 	if (hash_fields & FIB_MULTIPATH_HASH_FIELD_INNER_DST_PORT)
 		hash_keys.ports.dst = keys.ports.dst;

-	return flow_hash_from_keys(&hash_keys);
+	return fib_multipath_hash_from_keys(net, &hash_keys);
 }

 static u32 rt6_multipath_custom_hash_skb(const struct net *net,
···
 	if (hash_fields & FIB_MULTIPATH_HASH_FIELD_DST_PORT)
 		hash_keys.ports.dst = fl6->fl6_dport;

-	return flow_hash_from_keys(&hash_keys);
+	return fib_multipath_hash_from_keys(net, &hash_keys);
 }

 /* if skb is set it will be used and fl6 can be NULL */
···
 			hash_keys.tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6);
 			hash_keys.basic.ip_proto = fl6->flowi6_proto;
 		}
-		mhash = flow_hash_from_keys(&hash_keys);
+		mhash = fib_multipath_hash_from_keys(net, &hash_keys);
 		break;
 	case 1:
 		if (skb) {
···
 			hash_keys.ports.dst = fl6->fl6_dport;
 			hash_keys.basic.ip_proto = fl6->flowi6_proto;
 		}
-		mhash = flow_hash_from_keys(&hash_keys);
+		mhash = fib_multipath_hash_from_keys(net, &hash_keys);
 		break;
 	case 2:
 		memset(&hash_keys, 0, sizeof(hash_keys));
···
 			hash_keys.tags.flow_label = (__force u32)flowi6_get_flowlabel(fl6);
 			hash_keys.basic.ip_proto = fl6->flowi6_proto;
 		}
-		mhash = flow_hash_from_keys(&hash_keys);
+		mhash = fib_multipath_hash_from_keys(net, &hash_keys);
 		break;
 	case 3:
 		if (skb)
+1
tools/testing/selftests/net/forwarding/Makefile
···
 	router_broadcast.sh \
 	router_mpath_nh_res.sh \
 	router_mpath_nh.sh \
+	router_mpath_seed.sh \
 	router_multicast.sh \
 	router_multipath.sh \
 	router_nh.sh \
+8 -1
tools/testing/selftests/net/forwarding/lib.sh
···
 }

 declare -A SYSCTL_ORIG
+sysctl_save()
+{
+	local key=$1; shift
+
+	SYSCTL_ORIG[$key]=$(sysctl -n $key)
+}
+
 sysctl_set()
 {
 	local key=$1; shift
 	local value=$1; shift

-	SYSCTL_ORIG[$key]=$(sysctl -n $key)
+	sysctl_save "$key"
 	sysctl -qw $key="$value"
 }

+333
tools/testing/selftests/net/forwarding/router_mpath_seed.sh
···
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+
+# +-------------------------+  +-------------------------+
+# |  H1                     |  |                      H2 |
+# |               $h1 +     |  | + $h2                   |
+# |    192.0.2.1/28   |     |  | | 192.0.2.34/28         |
+# |  2001:db8:1::1/64 |     |  | | 2001:db8:3::2/64      |
+# +-------------------|-----+  +-|-----------------------+
+#                     |          |
+# +-------------------|-----+  +-|-----------------------+
+# |  R1               |     |  | |                    R2 |
+# |             $rp11 +     |  | + $rp21                 |
+# |    192.0.2.2/28         |  |   192.0.2.33/28         |
+# |  2001:db8:1::2/64       |  |   2001:db8:3::1/64      |
+# |                         |  |                         |
+# |             $rp12 +     |  | + $rp22                 |
+# |   192.0.2.17/28   |     |  | | 192.0.2.18..27/28     |
+# | 2001:db8:2::17/64 |     |  | | 2001:db8:2::18..27/64 |
+# +-------------------|-----+  +-|-----------------------+
+#                     |          |
+#                     `----------'
+
+ALL_TESTS="
+	ping_ipv4
+	ping_ipv6
+	test_mpath_seed_stability_ipv4
+	test_mpath_seed_stability_ipv6
+	test_mpath_seed_get
+	test_mpath_seed_ipv4
+	test_mpath_seed_ipv6
+"
+NUM_NETIFS=6
+source lib.sh
+
+h1_create()
+{
+	simple_if_init $h1 192.0.2.1/28 2001:db8:1::1/64
+	ip -4 route add 192.0.2.32/28 vrf v$h1 nexthop via 192.0.2.2
+	ip -6 route add 2001:db8:3::/64 vrf v$h1 nexthop via 2001:db8:1::2
+}
+
+h1_destroy()
+{
+	ip -6 route del 2001:db8:3::/64 vrf v$h1 nexthop via 2001:db8:1::2
+	ip -4 route del 192.0.2.32/28 vrf v$h1 nexthop via 192.0.2.2
+	simple_if_fini $h1 192.0.2.1/28 2001:db8:1::1/64
+}
+
+h2_create()
+{
+	simple_if_init $h2 192.0.2.34/28 2001:db8:3::2/64
+	ip -4 route add 192.0.2.0/28 vrf v$h2 nexthop via 192.0.2.33
+	ip -6 route add 2001:db8:1::/64 vrf v$h2 nexthop via 2001:db8:3::1
+}
+
+h2_destroy()
+{
+	ip -6 route del 2001:db8:1::/64 vrf v$h2 nexthop via 2001:db8:3::1
+	ip -4 route del 192.0.2.0/28 vrf v$h2 nexthop via 192.0.2.33
+	simple_if_fini $h2 192.0.2.34/28 2001:db8:3::2/64
+}
+
+router1_create()
+{
+	simple_if_init $rp11 192.0.2.2/28 2001:db8:1::2/64
+	__simple_if_init $rp12 v$rp11 192.0.2.17/28 2001:db8:2::17/64
+}
+
+router1_destroy()
+{
+	__simple_if_fini $rp12 192.0.2.17/28 2001:db8:2::17/64
+	simple_if_fini $rp11 192.0.2.2/28 2001:db8:1::2/64
+}
+
+router2_create()
+{
+	simple_if_init $rp21 192.0.2.33/28 2001:db8:3::1/64
+	__simple_if_init $rp22 v$rp21 192.0.2.18/28 2001:db8:2::18/64
+	ip -4 route add 192.0.2.0/28 vrf v$rp21 nexthop via 192.0.2.17
+	ip -6 route add 2001:db8:1::/64 vrf v$rp21 nexthop via 2001:db8:2::17
+}
+
+router2_destroy()
+{
+	ip -6 route del 2001:db8:1::/64 vrf v$rp21 nexthop via 2001:db8:2::17
+	ip -4 route del 192.0.2.0/28 vrf v$rp21 nexthop via 192.0.2.17
+	__simple_if_fini $rp22 192.0.2.18/28 2001:db8:2::18/64
+	simple_if_fini $rp21 192.0.2.33/28 2001:db8:3::1/64
+}
+
+nexthops_create()
+{
+	local i
+	for i in $(seq 10); do
+		ip nexthop add id $((1000 + i)) via 192.0.2.18 dev $rp12
+		ip nexthop add id $((2000 + i)) via 2001:db8:2::18 dev $rp12
+	done
+
+	ip nexthop add id 1000 group $(seq -s / 1001 1010) hw_stats on
+	ip nexthop add id 2000 group $(seq -s / 2001 2010) hw_stats on
+	ip -4 route add 192.0.2.32/28 vrf v$rp11 nhid 1000
+	ip -6 route add 2001:db8:3::/64 vrf v$rp11 nhid 2000
+}
+
+nexthops_destroy()
+{
+	local i
+
+	ip -6 route del 2001:db8:3::/64 vrf v$rp11 nhid 2000
+	ip -4 route del 192.0.2.32/28 vrf v$rp11 nhid 1000
+	ip nexthop del id 2000
+	ip nexthop del id 1000
+
+	for i in $(seq 10 -1 1); do
+		ip nexthop del id $((2000 + i))
+		ip nexthop del id $((1000 + i))
+	done
+}
+
+setup_prepare()
+{
+	h1=${NETIFS[p1]}
+	rp11=${NETIFS[p2]}
+
+	rp12=${NETIFS[p3]}
+	rp22=${NETIFS[p4]}
+
+	rp21=${NETIFS[p5]}
+	h2=${NETIFS[p6]}
+
+	sysctl_save net.ipv4.fib_multipath_hash_seed
+
+	vrf_prepare
+
+	h1_create
+	h2_create
+	router1_create
+	router2_create
+
+	forwarding_enable
+}
+
+cleanup()
+{
+	pre_cleanup
+
+	forwarding_restore
+
+	nexthops_destroy
+	router2_destroy
+	router1_destroy
+	h2_destroy
+	h1_destroy
+
+	vrf_cleanup
+
+	sysctl_restore net.ipv4.fib_multipath_hash_seed
+}
+
+ping_ipv4()
+{
+	ping_test $h1 192.0.2.34
+}
+
+ping_ipv6()
+{
+	ping6_test $h1 2001:db8:3::2
+}
+
+test_mpath_seed_get()
+{
+	RET=0
+
+	local i
+	for ((i = 0; i < 100; i++)); do
+		local seed_w=$((999331 * i))
+		sysctl -qw net.ipv4.fib_multipath_hash_seed=$seed_w
+		local seed_r=$(sysctl -n net.ipv4.fib_multipath_hash_seed)
+		((seed_r == seed_w))
+		check_err $? "mpath seed written as $seed_w, but read as $seed_r"
+	done
+
+	log_test "mpath seed set/get"
+}
+
+nh_stats_snapshot()
+{
+	local group_id=$1; shift
+
+	ip -j -s -s nexthop show id $group_id |
+	    jq -c '[.[].group_stats | sort_by(.id) | .[].packets]'
+}
+
+get_active_nh()
+{
+	local s0=$1; shift
+	local s1=$1; shift
+
+	jq -n --argjson s0 "$s0" --argjson s1 "$s1" -f /dev/stdin <<-"EOF"
+		[range($s0 | length)] |
+		map($s1[.] - $s0[.]) |
+		map(if . > 8 then 1 else 0 end) |
+		index(1)
+	EOF
+}
+
+probe_nh()
+{
+	local group_id=$1; shift
+	local -a mz=("$@")
+
+	local s0=$(nh_stats_snapshot $group_id)
+	"${mz[@]}"
+	local s1=$(nh_stats_snapshot $group_id)
+
+	get_active_nh "$s0" "$s1"
+}
+
+probe_seed()
+{
+	local group_id=$1; shift
+	local seed=$1; shift
+	local -a mz=("$@")
+
+	sysctl -qw net.ipv4.fib_multipath_hash_seed=$seed
+	probe_nh "$group_id" "${mz[@]}"
+}
+
+test_mpath_seed()
+{
+	local group_id=$1; shift
+	local what=$1; shift
+	local -a mz=("$@")
+	local ii
+
+	RET=0
+
+	local -a tally=(0 0 0 0 0 0 0 0 0 0)
+	for ((ii = 0; ii < 100; ii++)); do
+		local act=$(probe_seed $group_id $((999331 * ii)) "${mz[@]}")
+		((tally[act]++))
+	done
+
+	local tally_str="${tally[@]}"
+	for ((ii = 0; ii < ${#tally[@]}; ii++)); do
+		((tally[ii] > 0))
+		check_err $? "NH #$ii not hit, tally='$tally_str'"
+	done
+
+	log_test "mpath seed $what"
+	sysctl -qw net.ipv4.fib_multipath_hash_seed=0
+}
+
+test_mpath_seed_ipv4()
+{
+	test_mpath_seed 1000 IPv4 \
+		$MZ $h1 -A 192.0.2.1 -B 192.0.2.34 -q \
+			-p 64 -d 0 -c 10 -t udp
+}
+
+test_mpath_seed_ipv6()
+{
+	test_mpath_seed 2000 IPv6 \
+		$MZ -6 $h1 -A 2001:db8:1::1 -B 2001:db8:3::2 -q \
+			-p 64 -d 0 -c 10 -t udp
+}
+
+check_mpath_seed_stability()
+{
+	local seed=$1; shift
+	local act_0=$1; shift
+	local act_1=$1; shift
+
+	((act_0 == act_1))
+	check_err $? "seed $seed: active NH moved from $act_0 to $act_1 after seed change"
+}
+
+test_mpath_seed_stability()
+{
+	local group_id=$1; shift
+	local what=$1; shift
+	local -a mz=("$@")
+
+	RET=0
+
+	local seed_0=0
+	local seed_1=3221338814
+	local seed_2=3735928559
+
+	# Initial active NH before touching the seed at all.
+	local act_ini=$(probe_nh $group_id "${mz[@]}")
+
+	local act_0_0=$(probe_seed $group_id $seed_0 "${mz[@]}")
+	local act_1_0=$(probe_seed $group_id $seed_1 "${mz[@]}")
+	local act_2_0=$(probe_seed $group_id $seed_2 "${mz[@]}")
+
+	local act_0_1=$(probe_seed $group_id $seed_0 "${mz[@]}")
+	local act_1_1=$(probe_seed $group_id $seed_1 "${mz[@]}")
+	local act_2_1=$(probe_seed $group_id $seed_2 "${mz[@]}")
+
+	check_mpath_seed_stability initial $act_ini $act_0_0
+	check_mpath_seed_stability $seed_0 $act_0_0 $act_0_1
+	check_mpath_seed_stability $seed_1 $act_1_0 $act_1_1
+	check_mpath_seed_stability $seed_2 $act_2_0 $act_2_1
+
+	log_test "mpath seed stability $what"
+	sysctl -qw net.ipv4.fib_multipath_hash_seed=0
+}
+
+test_mpath_seed_stability_ipv4()
+{
+	test_mpath_seed_stability 1000 IPv4 \
+		$MZ $h1 -A 192.0.2.1 -B 192.0.2.34 -q \
+			-p 64 -d 0 -c 10 -t udp
+}
+
+test_mpath_seed_stability_ipv6()
+{
+	test_mpath_seed_stability 2000 IPv6 \
+		$MZ -6 $h1 -A 2001:db8:1::1 -B 2001:db8:3::2 -q \
+			-p 64 -d 0 -c 10 -t udp
+}
+
+trap cleanup EXIT
+
+setup_prepare
+setup_wait
+nexthops_create
+
+tests_run
+
+exit $EXIT_STATUS