Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bridge: switchdev: Add forward mark support for stacked devices

switchdev_port_fwd_mark_set() is used to set the 'offload_fwd_mark' of
port netdevs so that packets being flooded by the device won't be
flooded twice.

It works by assigning a unique identifier (the ifindex of the first
bridge port) to bridge ports sharing the same parent ID. This prevents
packets from being flooded twice by the same switch, but will flood
packets through bridge ports belonging to a different switch.
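
The marking scheme described above can be sketched as a small user-space
model. This is an illustration only, not kernel code: struct model_netdev,
model_fwd_mark_join() and model_xmit_should_drop() are hypothetical names,
and the parent ID is assumed to be a plain string.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space stand-in for a port netdev (not the kernel struct). */
struct model_netdev {
	int ifindex;
	const char *parent_id;		/* switch ID; equal strings = same switch */
	unsigned int offload_fwd_mark;
};

/*
 * Mark a port that joins a bridge: reuse the mark of an existing member
 * with the same parent ID, otherwise fall back to the port's own ifindex.
 */
void model_fwd_mark_join(struct model_netdev *const *members,
			 size_t n_members, struct model_netdev *dev)
{
	size_t i;

	assert(dev && dev->parent_id);
	dev->offload_fwd_mark = (unsigned int)dev->ifindex;
	for (i = 0; i < n_members; i++) {
		if (members[i] != dev &&
		    strcmp(members[i]->parent_id, dev->parent_id) == 0) {
			dev->offload_fwd_mark = members[i]->offload_fwd_mark;
			break;
		}
	}
}

/* Transmit-time check of the old scheme: returns 1 if the packet is dropped. */
int model_xmit_should_drop(unsigned int skb_mark,
			   const struct model_netdev *egress)
{
	return skb_mark && skb_mark == egress->offload_fwd_mark;
}
```

In this model, two ports of one switch joining the same bridge end up with
the ifindex of the first one as their mark, so a packet carrying that mark
is dropped on the sibling port but still transmitted on a port of another
switch.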

This method is problematic when stacked devices, such as VLANs, are taken
into account. In such cases, a physical port netdev can have upper
devices that are members of two different bridges, thus requiring two
different 'offload_fwd_mark's to be configured on the port netdev, which
is impossible.

The main problem is that packet and netdev marking is performed at the
physical netdev level, whereas flooding occurs between bridge ports,
which are not necessarily port netdevs.

Instead, packet and netdev marking should really be done in the bridge
driver with the switch driver only telling it which packets it already
forwarded. The bridge driver will mark such packets using the mark
assigned to the ingress bridge port and will prevent the packet from
being forwarded through any bridge port sharing the same mark (i.e.
having the same parent ID).
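
A minimal user-space sketch of the proposed method follows. It only models
the behaviour described above; the model_* types and functions are
hypothetical, the parent ID is assumed to be a string, and a NULL parent ID
stands for a device that does not report one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_PORTS 8

struct model_bridge;

/* Hypothetical stand-in for a bridge port. */
struct model_brport {
	struct model_bridge *br;
	const char *parent_id;	/* NULL: device reports no parent ID */
	int offload_fwd_mark;	/* 0: port is not offloaded */
};

struct model_bridge {
	struct model_brport *ports[MODEL_MAX_PORTS];
	size_t n_ports;
	int offload_fwd_mark;	/* last mark handed out by this bridge */
};

struct model_skb {
	bool offload_fwd_mark;	/* set by the switch driver on receive */
	int cb_mark;		/* filled in by the bridge on ingress */
};

/* Assign a mark, then add the port to the bridge; returns -1 if full. */
int model_port_add(struct model_bridge *br, struct model_brport *p)
{
	size_t i;

	assert(br && p);
	if (br->n_ports == MODEL_MAX_PORTS)
		return -1;
	p->br = br;
	p->offload_fwd_mark = 0;
	if (p->parent_id) {
		/* Share the mark of a port on the same switch, if any. */
		for (i = 0; i < br->n_ports; i++) {
			struct model_brport *q = br->ports[i];

			if (q->parent_id &&
			    strcmp(q->parent_id, p->parent_id) == 0) {
				p->offload_fwd_mark = q->offload_fwd_mark;
				break;
			}
		}
		if (!p->offload_fwd_mark)
			p->offload_fwd_mark = ++br->offload_fwd_mark;
	}
	br->ports[br->n_ports++] = p;
	return 0;
}

/* Ingress: tag an already-forwarded packet with the ingress port's mark. */
void model_frame_mark(const struct model_brport *p, struct model_skb *skb)
{
	if (skb->offload_fwd_mark && p->offload_fwd_mark)
		skb->cb_mark = p->offload_fwd_mark;
}

/* Egress: refuse ports that share the mark of the ingress port. */
bool model_allowed_egress(const struct model_brport *p,
			  const struct model_skb *skb)
{
	return !skb->offload_fwd_mark || skb->cb_mark != p->offload_fwd_mark;
}
```

Because the mark lives on the bridge port rather than on the physical
netdev, two upper devices of the same physical port can sit in different
bridges and each receive their own mark in this model.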

Remove the current switchdev 'offload_fwd_mark' implementation and
instead implement the proposed method. In addition, make rocker - the
sole user of the mark - use the proposed method.

Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Ido Schimmel and committed by David S. Miller
6bc506b4 5c326ab4

+117 -132
+4 -9
Documentation/networking/switchdev.txt
···
 bridge should not reflood the packet to the same ports the device flooded,
 otherwise there will be duplicate packets on the wire.

-To avoid duplicate packets, the device/driver should mark a packet as already
-forwarded using skb->offload_fwd_mark. The same mark is set on the device
-ports in the domain using dev->offload_fwd_mark. If the skb->offload_fwd_mark
-is non-zero and matches the forwarding egress port's dev->skb_mark, the kernel
-will drop the skb right before transmit on the egress port, with the
-understanding that the device already forwarded the packet on same egress port.
-The driver can use switchdev_port_fwd_mark_set() to set a globally unique mark
-for port's dev->offload_fwd_mark, based on the port's parent ID (switch ID) and
-a group ifindex.
+To avoid duplicate packets, the switch driver should mark a packet as already
+forwarded by setting the skb->offload_fwd_mark bit. The bridge driver will mark
+the skb using the ingress bridge port's mark and prevent it from being forwarded
+through any bridge port with the same mark.

 It is possible for the switch device to not handle flooding and push the
 packets up to the bridge driver for flooding. This is not ideal as the number
+1 -1
drivers/net/ethernet/rocker/rocker_main.c
···
 	skb->protocol = eth_type_trans(skb, rocker_port->dev);

 	if (rx_flags & ROCKER_RX_FLAGS_FWD_OFFLOAD)
-		skb->offload_fwd_mark = rocker_port->dev->offload_fwd_mark;
+		skb->offload_fwd_mark = 1;

 	rocker_port->dev->stats.rx_packets++;
 	rocker_port->dev->stats.rx_bytes += skb->len;
-4
drivers/net/ethernet/rocker/rocker_ofdpa.c
···
 	struct ofdpa_port *ofdpa_port = rocker_port->wpriv;
 	int err;

-	switchdev_port_fwd_mark_set(ofdpa_port->dev, NULL, false);
 	rocker_port_set_learning(rocker_port,
 				 !!(ofdpa_port->brport_flags & BR_LEARNING));

···
 		ofdpa_port_internal_vlan_id_get(ofdpa_port, bridge->ifindex);

 	ofdpa_port->bridge_dev = bridge;
-	switchdev_port_fwd_mark_set(ofdpa_port->dev, bridge, true);

 	return ofdpa_port_vlan_add(ofdpa_port, NULL, OFDPA_UNTAGGED_VID, 0);
 }
···
 		ofdpa_port_internal_vlan_id_get(ofdpa_port,
 						ofdpa_port->dev->ifindex);

-	switchdev_port_fwd_mark_set(ofdpa_port->dev, ofdpa_port->bridge_dev,
-				    false);
 	ofdpa_port->bridge_dev = NULL;

 	err = ofdpa_port_vlan_add(ofdpa_port, NULL, OFDPA_UNTAGGED_VID, 0);
-5
include/linux/netdevice.h
···
  *
  * @xps_maps: XXX: need comments on this one
  *
- * @offload_fwd_mark: Offload device fwding mark
- *
  * @watchdog_timeo: Represents the timeout that is used by
  * the watchdog (see dev_watchdog())
  * @watchdog_timer: List of timers
···
 #endif
 #ifdef CONFIG_NET_CLS_ACT
 	struct tcf_proto __rcu *egress_cl_list;
-#endif
-#ifdef CONFIG_NET_SWITCHDEV
-	u32 offload_fwd_mark;
 #endif

 	/* These may be needed for future network-power-down code. */
+5 -8
include/linux/skbuff.h
···
  * @no_fcs: Request NIC to treat last 4 bytes as Ethernet FCS
  * @napi_id: id of the NAPI struct this skb came from
  * @secmark: security marking
- * @offload_fwd_mark: fwding offload mark
  * @mark: Generic packet mark
  * @vlan_proto: vlan encapsulation protocol
  * @vlan_tci: vlan tag control information
···
 	__u8 ipvs_property:1;
 	__u8 inner_protocol_type:1;
 	__u8 remcsum_offload:1;
-	/* 3 or 5 bit hole */
+#ifdef CONFIG_NET_SWITCHDEV
+	__u8 offload_fwd_mark:1;
+#endif
+	/* 2, 4 or 5 bit hole */

 #ifdef CONFIG_NET_SCHED
 	__u16 tc_index; /* traffic control index */
···
 		unsigned int sender_cpu;
 	};
 #endif
-	union {
 #ifdef CONFIG_NETWORK_SECMARK
-		__u32 secmark;
+	__u32 secmark;
 #endif
-#ifdef CONFIG_NET_SWITCHDEV
-		__u32 offload_fwd_mark;
-#endif
-	};

 	union {
 		__u32 mark;
-6
include/net/switchdev.h
···
 	return idx;
 }

-static inline void switchdev_port_fwd_mark_set(struct net_device *dev,
-					       struct net_device *group_dev,
-					       bool joining)
-{
-}
-
 static inline bool switchdev_port_same_parent_id(struct net_device *a,
 						 struct net_device *b)
 {
+2
net/bridge/Makefile
···

 bridge-$(CONFIG_BRIDGE_VLAN_FILTERING) += br_vlan.o

+bridge-$(CONFIG_NET_SWITCHDEV) += br_switchdev.o
+
 obj-$(CONFIG_NETFILTER) += netfilter/
+2 -1
net/bridge/br_forward.c
···

 	vg = nbp_vlan_group_rcu(p);
 	return ((p->flags & BR_HAIRPIN_MODE) || skb->dev != p->dev) &&
-		br_allowed_egress(vg, skb) && p->state == BR_STATE_FORWARDING;
+		br_allowed_egress(vg, skb) && p->state == BR_STATE_FORWARDING &&
+		nbp_switchdev_allowed_egress(p, skb);
 }

 int br_dev_queue_push_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
+7 -3
net/bridge/br_if.c
···
 	if (err)
 		goto err5;

+	err = nbp_switchdev_mark_set(p);
+	if (err)
+		goto err6;
+
 	dev_disable_lro(dev);

 	list_add_rcu(&p->list, &br->port_list);
···
 	err = nbp_vlan_init(p);
 	if (err) {
 		netdev_err(dev, "failed to initialize vlan filtering on this port\n");
-		goto err6;
+		goto err7;
 	}

 	spin_lock_bh(&br->lock);
···

 	return 0;

-err6:
+err7:
 	list_del_rcu(&p->list);
 	br_fdb_delete_by_port(br, p, 0, 1);
 	nbp_update_port_count(br);
+err6:
 	netdev_upper_dev_unlink(dev, br->dev);
-
 err5:
 	dev->priv_flags &= ~IFF_BRIDGE_PORT;
 	netdev_rx_handler_unregister(dev);
+2
net/bridge/br_input.c
···
 	if (!br_allowed_ingress(p->br, nbp_vlan_group_rcu(p), skb, &vid))
 		goto out;

+	nbp_switchdev_frame_mark(p, skb);
+
 	/* insert into forwarding database after filtering to avoid spoofing */
 	br = p->br;
 	if (p->flags & BR_LEARNING)
+37
net/bridge/br_private.h
···
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 	struct net_bridge_vlan_group __rcu *vlgrp;
 #endif
+#ifdef CONFIG_NET_SWITCHDEV
+	int offload_fwd_mark;
+#endif
 };

 #define br_auto_port(p) ((p)->flags & BR_AUTO_MASK)
···
 	struct timer_list gc_timer;
 	struct kobject *ifobj;
 	u32 auto_cnt;
+
+#ifdef CONFIG_NET_SWITCHDEV
+	int offload_fwd_mark;
+#endif
+
 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 	struct net_bridge_vlan_group __rcu *vlgrp;
 	u8 vlan_enabled;
···

 #ifdef CONFIG_BRIDGE_VLAN_FILTERING
 	bool vlan_filtered;
+#endif
+
+#ifdef CONFIG_NET_SWITCHDEV
+	int offload_fwd_mark;
 #endif
 };

···
 static inline int br_sysfs_addbr(struct net_device *dev) { return 0; }
 static inline void br_sysfs_delbr(struct net_device *dev) { return; }
 #endif /* CONFIG_SYSFS */
+
+/* br_switchdev.c */
+#ifdef CONFIG_NET_SWITCHDEV
+int nbp_switchdev_mark_set(struct net_bridge_port *p);
+void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
+			      struct sk_buff *skb);
+bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
+				  const struct sk_buff *skb);
+#else
+static inline int nbp_switchdev_mark_set(struct net_bridge_port *p)
+{
+	return 0;
+}
+
+static inline void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
+					    struct sk_buff *skb)
+{
+}
+
+static inline bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
+						const struct sk_buff *skb)
+{
+	return true;
+}
+#endif /* CONFIG_NET_SWITCHDEV */

 #endif
+57
net/bridge/br_switchdev.c
···
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/netdevice.h>
+#include <linux/rtnetlink.h>
+#include <linux/skbuff.h>
+#include <net/switchdev.h>
+
+#include "br_private.h"
+
+static int br_switchdev_mark_get(struct net_bridge *br, struct net_device *dev)
+{
+	struct net_bridge_port *p;
+
+	/* dev is yet to be added to the port list. */
+	list_for_each_entry(p, &br->port_list, list) {
+		if (switchdev_port_same_parent_id(dev, p->dev))
+			return p->offload_fwd_mark;
+	}
+
+	return ++br->offload_fwd_mark;
+}
+
+int nbp_switchdev_mark_set(struct net_bridge_port *p)
+{
+	struct switchdev_attr attr = {
+		.orig_dev = p->dev,
+		.id = SWITCHDEV_ATTR_ID_PORT_PARENT_ID,
+	};
+	int err;
+
+	ASSERT_RTNL();
+
+	err = switchdev_port_attr_get(p->dev, &attr);
+	if (err) {
+		if (err == -EOPNOTSUPP)
+			return 0;
+		return err;
+	}
+
+	p->offload_fwd_mark = br_switchdev_mark_get(p->br, p->dev);
+
+	return 0;
+}
+
+void nbp_switchdev_frame_mark(const struct net_bridge_port *p,
+			      struct sk_buff *skb)
+{
+	if (skb->offload_fwd_mark && !WARN_ON_ONCE(!p->offload_fwd_mark))
+		BR_INPUT_SKB_CB(skb)->offload_fwd_mark = p->offload_fwd_mark;
+}
+
+bool nbp_switchdev_allowed_egress(const struct net_bridge_port *p,
+				  const struct sk_buff *skb)
+{
+	return !skb->offload_fwd_mark ||
+	       BR_INPUT_SKB_CB(skb)->offload_fwd_mark != p->offload_fwd_mark;
+}
-10
net/core/dev.c
···
 	else
 		skb_dst_force(skb);

-#ifdef CONFIG_NET_SWITCHDEV
-	/* Don't forward if offload device already forwarded */
-	if (skb->offload_fwd_mark &&
-	    skb->offload_fwd_mark == dev->offload_fwd_mark) {
-		consume_skb(skb);
-		rc = NET_XMIT_SUCCESS;
-		goto out;
-	}
-#endif
-
 	txq = netdev_pick_tx(dev, skb, accel_priv);
 	q = rcu_dereference_bh(txq->qdisc);

-85
net/switchdev/switchdev.c
···
 	return netdev_phys_item_id_same(&a_attr.u.ppid, &b_attr.u.ppid);
 }
 EXPORT_SYMBOL_GPL(switchdev_port_same_parent_id);
-
-static u32 switchdev_port_fwd_mark_get(struct net_device *dev,
-				       struct net_device *group_dev)
-{
-	struct net_device *lower_dev;
-	struct list_head *iter;
-
-	netdev_for_each_lower_dev(group_dev, lower_dev, iter) {
-		if (lower_dev == dev)
-			continue;
-		if (switchdev_port_same_parent_id(dev, lower_dev))
-			return lower_dev->offload_fwd_mark;
-		return switchdev_port_fwd_mark_get(dev, lower_dev);
-	}
-
-	return dev->ifindex;
-}
-
-static void switchdev_port_fwd_mark_reset(struct net_device *group_dev,
-					  u32 old_mark, u32 *reset_mark)
-{
-	struct net_device *lower_dev;
-	struct list_head *iter;
-
-	netdev_for_each_lower_dev(group_dev, lower_dev, iter) {
-		if (lower_dev->offload_fwd_mark == old_mark) {
-			if (!*reset_mark)
-				*reset_mark = lower_dev->ifindex;
-			lower_dev->offload_fwd_mark = *reset_mark;
-		}
-		switchdev_port_fwd_mark_reset(lower_dev, old_mark, reset_mark);
-	}
-}
-
-/**
- * switchdev_port_fwd_mark_set - Set port offload forwarding mark
- *
- * @dev: port device
- * @group_dev: containing device
- * @joining: true if dev is joining group; false if leaving group
- *
- * An ungrouped port's offload mark is just its ifindex. A grouped
- * port's (member of a bridge, for example) offload mark is the ifindex
- * of one of the ports in the group with the same parent (switch) ID.
- * Ports on the same device in the same group will have the same mark.
- *
- * Example:
- *
- *   br0 ifindex=9
- *     sw1p1 ifindex=2 mark=2
- *     sw1p2 ifindex=3 mark=2
- *     sw2p1 ifindex=4 mark=5
- *     sw2p2 ifindex=5 mark=5
- *
- * If sw2p2 leaves the bridge, we'll have:
- *
- *   br0 ifindex=9
- *     sw1p1 ifindex=2 mark=2
- *     sw1p2 ifindex=3 mark=2
- *     sw2p1 ifindex=4 mark=4
- *   sw2p2 ifindex=5 mark=5
- */
-void switchdev_port_fwd_mark_set(struct net_device *dev,
-				 struct net_device *group_dev,
-				 bool joining)
-{
-	u32 mark = dev->ifindex;
-	u32 reset_mark = 0;
-
-	if (group_dev) {
-		ASSERT_RTNL();
-		if (joining)
-			mark = switchdev_port_fwd_mark_get(dev, group_dev);
-		else if (dev->offload_fwd_mark == mark)
-			/* Ohoh, this port was the mark reference port,
-			 * but it's leaving the group, so reset the
-			 * mark for the remaining ports in the group.
-			 */
-			switchdev_port_fwd_mark_reset(group_dev, mark,
-						      &reset_mark);
-	}
-
-	dev->offload_fwd_mark = mark;
-}
-EXPORT_SYMBOL_GPL(switchdev_port_fwd_mark_set);