Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: ipv6: Allow shorthand delete of all nexthops in multipath route

IPv4 allows multipath routes to be deleted using just the prefix and
length. For example:
$ ip ro ls vrf red
unreachable default metric 8192
1.1.1.0/24
nexthop via 10.100.1.254 dev eth1 weight 1
nexthop via 10.11.200.2 dev eth11.200 weight 1
10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

$ ip ro del 1.1.1.0/24 vrf red

$ ip ro ls vrf red
unreachable default metric 8192
10.11.200.0/24 dev eth11.200 proto kernel scope link src 10.11.200.3
10.100.1.0/24 dev eth1 proto kernel scope link src 10.100.1.3

The same notation does not work with IPv6 because of how multipath routes
are implemented for IPv6. For IPv6 only the first nexthop of a multipath
route is deleted if the request contains only a prefix and length. This
leads to unnecessary complexity in userspace dealing with IPv6 multipath
routes.

This patch allows all nexthops to be deleted without specifying each one
in the delete request. Internally, this is done by walking the sibling
list of the route matching the specifications given (prefix, length,
metric, protocol, etc).

$ ip -6 ro ls vrf red
2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium
2001:db8:200::/120 via 2001:db8:1::2 dev eth1 metric 1024 pref medium
2001:db8:200::/120 via 2001:db8:2::2 dev eth2 metric 1024 pref medium
...

$ ip -6 ro del vrf red 2001:db8:200::/120

$ ip -6 ro ls vrf red
2001:db8:1::/120 dev eth1 proto kernel metric 256 pref medium
2001:db8:2::/120 dev eth2 proto kernel metric 256 pref medium
...
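
The sibling walk described above can be modelled in userspace. This is only a minimal sketch: `struct route`, `link_siblings` and `delete_route` are hypothetical names, a hand-rolled circular ring stands in for the kernel's sibling list, and no locking or FIB bookkeeping is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a route entry (not the kernel's rt6_info). */
struct route {
	struct route *next_sibling;	/* circular ring of nexthops sharing one prefix */
	bool deleted;
};

/* Link n entries into one circular sibling ring, like one multipath route. */
static void link_siblings(struct route *r, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		r[i].next_sibling = &r[(i + 1) % n];
		r[i].deleted = false;
	}
}

/*
 * Delete the matched entry. When delete_all_nh is set, every sibling is
 * deleted first and the matched entry last. Returns how many entries
 * were deleted.
 */
static size_t delete_route(struct route *rt, bool delete_all_nh)
{
	size_t count = 0;

	if (delete_all_nh) {
		struct route *sibling = rt->next_sibling;

		while (sibling != rt) {
			/* Fetch the successor before modifying the current entry. */
			struct route *next = sibling->next_sibling;

			sibling->deleted = true;
			count++;
			sibling = next;
		}
	}

	rt->deleted = true;
	return count + 1;
}
```

With a three-entry ring, `delete_route(&r[0], false)` marks only `r[0]`, which corresponds to the old IPv6 behaviour, while `delete_route(&r[0], true)` marks all three entries.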

Because IPv6 allows individual nexthops to be deleted without deleting
the entire route, the multipath code path (ip6_route_multipath_del) and
the non-multipath code path (ip6_route_del) have to be distinguished so
that all nexthops are deleted only in the latter case. This is done by
narrowing the existing fc_type in fib6_config to a u16 and adding a new
u16 bitfield whose first bit is fc_delete_all_nh.
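
The field split can be illustrated with a reduced userspace struct. This is a sketch under assumptions: `cfg_before`, `cfg_after`, `unused` and `layout_preserved` are hypothetical names, only two fields of fib6_config are reproduced, and the size comparison relies on how GCC and Clang lay out `uint16_t` bit-fields on common targets (bit-fields of that type are implementation-defined in ISO C).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reduced "before" layout: fc_type occupies a full 32 bits. */
struct cfg_before {
	uint32_t fc_protocol;
	uint32_t fc_type;	/* only 8 bits are used */
};

/* Reduced "after" layout: the same 32 bits now hold the type and a flag word. */
struct cfg_after {
	uint32_t fc_protocol;
	uint16_t fc_type;	/* only 8 bits are used */
	uint16_t fc_delete_all_nh : 1,
		 unused : 15;
};

/* True when adding the flag did not grow the structure. */
static bool layout_preserved(void)
{
	return sizeof(struct cfg_after) == sizeof(struct cfg_before);
}
```

Narrowing a field that only ever needs 8 bits frees room for the flag without enlarging the structure, and the remaining 15 bits stay available for future flags.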

Suggested-by: Dinesh Dutt <ddutt@cumulusnetworks.com>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by David Ahern and committed by David S. Miller
0ae81335 4d6308aa

+39 -3
+3 -1
include/net/ip6_fib.h
@@ -37,7 +37,9 @@
 	int		fc_ifindex;
 	u32		fc_flags;
 	u32		fc_protocol;
-	u32		fc_type;	/* only 8 bits are used */
+	u16		fc_type;	/* only 8 bits are used */
+	u16		fc_delete_all_nh : 1,
+			__unused : 15;
 
 	struct in6_addr	fc_dst;
 	struct in6_addr	fc_src;
+36 -2
net/ipv6/route.c
@@ -2143,6 +2143,34 @@
 	return __ip6_del_rt(rt, &info);
 }
 
+static int __ip6_del_rt_siblings(struct rt6_info *rt, struct fib6_config *cfg)
+{
+	struct nl_info *info = &cfg->fc_nlinfo;
+	struct fib6_table *table;
+	int err;
+
+	table = rt->rt6i_table;
+	write_lock_bh(&table->tb6_lock);
+
+	if (rt->rt6i_nsiblings && cfg->fc_delete_all_nh) {
+		struct rt6_info *sibling, *next_sibling;
+
+		list_for_each_entry_safe(sibling, next_sibling,
+					 &rt->rt6i_siblings,
+					 rt6i_siblings) {
+			err = fib6_del(sibling, info);
+			if (err)
+				goto out;
+		}
+	}
+
+	err = fib6_del(rt, info);
+out:
+	write_unlock_bh(&table->tb6_lock);
+	ip6_rt_put(rt);
+	return err;
+}
+
 static int ip6_route_del(struct fib6_config *cfg)
 {
 	struct fib6_table *table;
@@ -2179,7 +2207,11 @@
 			dst_hold(&rt->dst);
 			read_unlock_bh(&table->tb6_lock);
 
-			return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+			/* if gateway was specified only delete the one hop */
+			if (cfg->fc_flags & RTF_GATEWAY)
+				return __ip6_del_rt(rt, &cfg->fc_nlinfo);
+
+			return __ip6_del_rt_siblings(rt, cfg);
 		}
 	}
 	read_unlock_bh(&table->tb6_lock);
@@ -3142,8 +3174,10 @@
 
 	if (cfg.fc_mp)
 		return ip6_route_multipath_del(&cfg);
-	else
+	else {
+		cfg.fc_delete_all_nh = 1;
 		return ip6_route_del(&cfg);
+	}
 }
 
 static int inet6_rtm_newroute(struct sk_buff *skb, struct nlmsghdr *nlh)
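
The decision logic that these hunks add can be summarized as a small classifier. This is a sketch for reading the diff, not kernel code: `del_request`, `del_action` and `classify_delete` are hypothetical names, and the three booleans stand in for `cfg.fc_mp`, `fc_flags & RTF_GATEWAY` and `rt6i_nsiblings` respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of a delete request (hypothetical labels). */
enum del_action {
	DEL_LISTED_NEXTHOPS,	/* multipath request: only the nexthops it lists */
	DEL_SINGLE_NEXTHOP,	/* exactly one matching entry */
	DEL_ALL_NEXTHOPS	/* the matched entry plus all of its siblings */
};

/* Inputs that the patched code inspects, reduced to booleans. */
struct del_request {
	bool has_multipath_attr;	/* stands in for cfg.fc_mp being set */
	bool has_gateway;		/* stands in for fc_flags & RTF_GATEWAY */
	bool route_has_siblings;	/* stands in for rt6i_nsiblings != 0 */
};

static enum del_action classify_delete(const struct del_request *req)
{
	/* Multipath requests never set fc_delete_all_nh. */
	if (req->has_multipath_attr)
		return DEL_LISTED_NEXTHOPS;

	/* An explicit gateway selects a single hop, even on a multipath route. */
	if (req->has_gateway)
		return DEL_SINGLE_NEXTHOP;

	/* Prefix-only delete: take the siblings too if there are any. */
	return req->route_has_siblings ? DEL_ALL_NEXTHOPS : DEL_SINGLE_NEXTHOP;
}
```

A prefix-only request such as `ip -6 ro del vrf red 2001:db8:200::/120` corresponds to the last branch, and a request that names a gateway with `via` corresponds to the second.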