Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bonding: allow user-controlled output slave selection

v2: changed bonding module version, modified to apply on top of changes
from previous patch in series, and updated documentation to elaborate on
multiqueue awareness that now exists in bonding driver.

This patch gives the user the ability to control the output slave for
round-robin and active-backup bonding. Similar functionality was
discussed in the past, but Jay Vosburgh indicated he would rather see a
feature like this added to existing modes rather than creating a
completely new mode. Jay's thoughts as well as Neil's input surrounding
some of the issues with the first implementation pushed us toward a
design that relied on the queue_mapping rather than skb marks.
Round-robin and active-backup modes were chosen as the first users of
this slave selection as they seemed like the most logical choices when
considering a multi-switch environment.

Round-robin mode works without any modification, but active-backup does
require inclusion of the first patch in this series and setting
the 'all_slaves_active' flag. This will allow reception of unicast traffic on
any of the backup interfaces.

This was tested with IPv4-based filters as well as VLAN-based filters
with good results.

More information as well as a configuration example is available in the
patch to Documentation/networking/bonding.txt.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Andy Gospodarek and committed by David S. Miller
bb1d9123 ebd8e497

+277 -6
+81 -1
Documentation/networking/bonding.txt
···
 3.3 Configuring Bonding Manually with Ifenslave
 3.3.1 Configuring Multiple Bonds Manually
 3.4 Configuring Bonding Manually via Sysfs
+3.5 Overriding Configuration for Special Cases

 4. Querying Bonding Configuration
 4.1 Bonding Configuration
···
 echo +eth2 > /sys/class/net/bond1/bonding/slaves
 echo +eth3 > /sys/class/net/bond1/bonding/slaves

+3.5 Overriding Configuration for Special Cases
+----------------------------------------------
+When using the bonding driver, the physical port which transmits a frame is
+typically selected by the bonding driver, and is not relevant to the user or
+system administrator. The output port is simply selected using the policies of
+the selected bonding mode. On occasion however, it is helpful to direct certain
+classes of traffic to certain physical interfaces on output to implement
+slightly more complex policies. For example, to reach a web server over a
+bonded interface in which eth0 connects to a private network, while eth1
+connects via a public network, it may be desirous to bias the bond to send said
+traffic over eth0 first, using eth1 only as a fall back, while all other traffic
+can safely be sent over either interface. Such configurations may be achieved
+using the traffic control utilities inherent in linux.

-4. Querying Bonding Configuration
+By default the bonding driver is multiqueue aware and 16 queues are created
+when the driver initializes (see Documentation/networking/multiqueue.txt
+for details). If more or less queues are desired the module parameter
+tx_queues can be used to change this value. There is no sysfs parameter
+available as the allocation is done at module init time.
+
+The output of the file /proc/net/bonding/bondX has changed so the output Queue
+ID is now printed for each slave:
+
+	Bonding Mode: fault-tolerance (active-backup)
+	Primary Slave: None
+	Currently Active Slave: eth0
+	MII Status: up
+	MII Polling Interval (ms): 0
+	Up Delay (ms): 0
+	Down Delay (ms): 0
+
+	Slave Interface: eth0
+	MII Status: up
+	Link Failure Count: 0
+	Permanent HW addr: 00:1a:a0:12:8f:cb
+	Slave queue ID: 0
+
+	Slave Interface: eth1
+	MII Status: up
+	Link Failure Count: 0
+	Permanent HW addr: 00:1a:a0:12:8f:cc
+	Slave queue ID: 2
+
+The queue_id for a slave can be set using the command:
+
+	# echo "eth1:2" > /sys/class/net/bond0/bonding/queue_id
+
+Any interface that needs a queue_id set should set it with multiple calls
+like the one above until proper priorities are set for all interfaces. On
+distributions that allow configuration via initscripts, multiple 'queue_id'
+arguments can be added to BONDING_OPTS to set all needed slave queues.
+
+These queue id's can be used in conjunction with the tc utility to configure
+a multiqueue qdisc and filters to bias certain traffic to transmit on certain
+slave devices. For instance, say we wanted, in the above configuration to
+force all traffic bound to 192.168.1.100 to use eth1 in the bond as its output
+device. The following commands would accomplish this:
+
+	# tc qdisc add dev bond0 handle 1 root multiq
+
+	# tc filter add dev bond0 protocol ip parent 1: prio 1 u32 match ip dst \
+		192.168.1.100 action skbedit queue_mapping 2
+
+These commands tell the kernel to attach a multiqueue queue discipline to the
+bond0 interface and filter traffic enqueued to it, such that packets with a dst
+ip of 192.168.1.100 have their output queue mapping value overwritten to 2.
+This value is then passed into the driver, causing the normal output path
+selection policy to be overridden, selecting instead qid 2, which maps to eth1.
+
+Note that qid values begin at 1. Qid 0 is reserved to initiate to the driver
+that normal output policy selection should take place. One benefit to simply
+leaving the qid for a slave to 0 is the multiqueue awareness in the bonding
+driver that is now present. This awareness allows tc filters to be placed on
+slave devices as well as bond devices and the bonding driver will simply act as
+a pass-through for selecting output queues on the slave device rather than
+output port selection.
+
+This feature first appeared in bonding driver version 3.7.0 and support for
+output slave selection was limited to round-robin and active-backup modes.
+
+4 Querying Bonding Configuration
 =================================

 4.1 Bonding Configuration
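The new "Slave queue ID" line in /proc/net/bonding/bondX can be read back by a monitoring tool. The following is a minimal user-space sketch, not part of the patch, that extracts each slave's name and queue ID from text in the format shown in the documentation above. The names `slave_qid`, `QID_NAME_LEN`, and `parse_slave_queue_ids` are hypothetical, and the sketch assumes every "Slave queue ID" line follows its own "Slave Interface" line.

```c
#include <stdio.h>
#include <string.h>

#define QID_NAME_LEN 16 /* assumption: same size as the kernel's IFNAMSIZ */

/* Hypothetical record for one slave parsed from the proc text. */
struct slave_qid {
    char name[QID_NAME_LEN];
    int queue_id;
};

/*
 * Scan bonding proc text line by line. Each "Slave Interface:" line starts
 * a new record, and a following "Slave queue ID:" line fills in its queue ID.
 * Returns the number of slaves stored in out (at most max).
 */
static int parse_slave_queue_ids(const char *text, struct slave_qid *out, int max)
{
    int count = 0;

    while (*text) {
        char line[128];
        char name[QID_NAME_LEN];
        int qid;
        size_t len = strcspn(text, "\n");
        size_t copy = len < sizeof(line) - 1 ? len : sizeof(line) - 1;

        memcpy(line, text, copy);
        line[copy] = '\0';

        if (sscanf(line, " Slave Interface: %15s", name) == 1) {
            if (count == max)
                break;
            strcpy(out[count].name, name);
            out[count].queue_id = 0; /* default until a queue ID line is seen */
            count++;
        } else if (count > 0 &&
                   sscanf(line, " Slave queue ID: %d", &qid) == 1) {
            out[count - 1].queue_id = qid;
        }

        text += len;
        if (*text == '\n')
            text++;
    }
    return count;
}
```

A caller would read /proc/net/bonding/bond0 into a buffer and pass it to `parse_slave_queue_ids`. For the sample output in the documentation, eth0 would be reported with queue ID 0 and eth1 with queue ID 2.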
+72 -3
drivers/net/bonding/bond_main.c
···
 #define BOND_LINK_ARP_INTERV	0

 static int max_bonds	= BOND_DEFAULT_MAX_BONDS;
+static int tx_queues	= BOND_DEFAULT_TX_QUEUES;
 static int num_grat_arp = 1;
 static int num_unsol_na = 1;
 static int miimon	= BOND_LINK_MON_INTERV;
···

 module_param(max_bonds, int, 0);
 MODULE_PARM_DESC(max_bonds, "Max number of bonded devices");
+module_param(tx_queues, int, 0);
+MODULE_PARM_DESC(tx_queues, "Max number of transmit queues (default = 16)");
 module_param(num_grat_arp, int, 0644);
 MODULE_PARM_DESC(num_grat_arp, "Number of gratuitous ARP packets to send on failover event");
 module_param(num_unsol_na, int, 0644);
···
 		res = -ENOMEM;
 		goto err_undo_flags;
 	}
+
+	/*
+	 * Set the new_slave's queue_id to be zero.  Queue ID mapping
+	 * is set via sysfs or module option if desired.
+	 */
+	new_slave->queue_id = 0;

 	/* Save slave's original mtu and then set it to match the bond */
 	new_slave->original_mtu = slave_dev->mtu;
···
 		else
 			seq_puts(seq, "Aggregator ID: N/A\n");
 	}
+	seq_printf(seq, "Slave queue ID: %d\n", slave->queue_id);
 }

 static int bond_info_seq_show(struct seq_file *seq, void *v)
···
 	}
 }

+/*
+ * Lookup the slave that corresponds to a qid
+ */
+static inline int bond_slave_override(struct bonding *bond,
+				      struct sk_buff *skb)
+{
+	int i, res = 1;
+	struct slave *slave = NULL;
+	struct slave *check_slave;
+
+	read_lock(&bond->lock);
+
+	if (!BOND_IS_OK(bond) || !skb->queue_mapping)
+		goto out;
+
+	/* Find out if any slaves have the same mapping as this skb. */
+	bond_for_each_slave(bond, check_slave, i) {
+		if (check_slave->queue_id == skb->queue_mapping) {
+			slave = check_slave;
+			break;
+		}
+	}
+
+	/* If the slave isn't UP, use default transmit policy. */
+	if (slave && slave->queue_id && IS_UP(slave->dev) &&
+	    (slave->link == BOND_LINK_UP)) {
+		res = bond_dev_queue_xmit(bond, skb, slave->dev);
+	}
+
+out:
+	read_unlock(&bond->lock);
+	return res;
+}
+
+static u16 bond_select_queue(struct net_device *dev, struct sk_buff *skb)
+{
+	/*
+	 * This helper function exists to help dev_pick_tx get the correct
+	 * destination queue.  Using a helper function skips the a call to
+	 * skb_tx_hash and will put the skbs in the queue we expect on their
+	 * way down to the bonding driver.
+	 */
+	return skb->queue_mapping;
+}
+
 static netdev_tx_t bond_start_xmit(struct sk_buff *skb, struct net_device *dev)
 {
-	const struct bonding *bond = netdev_priv(dev);
+	struct bonding *bond = netdev_priv(dev);
+
+	if (TX_QUEUE_OVERRIDE(bond->params.mode)) {
+		if (!bond_slave_override(bond, skb))
+			return NETDEV_TX_OK;
+	}

 	switch (bond->params.mode) {
 	case BOND_MODE_ROUNDROBIN:
···
 	.ndo_open		= bond_open,
 	.ndo_stop		= bond_close,
 	.ndo_start_xmit		= bond_start_xmit,
+	.ndo_select_queue	= bond_select_queue,
 	.ndo_get_stats		= bond_get_stats,
 	.ndo_do_ioctl		= bond_do_ioctl,
 	.ndo_set_multicast_list	= bond_set_multicast_list,
···
 		}
 	}

+	if (tx_queues < 1 || tx_queues > 255) {
+		pr_warning("Warning: tx_queues (%d) should be between "
+			   "1 and 255, resetting to %d\n",
+			   tx_queues, BOND_DEFAULT_TX_QUEUES);
+		tx_queues = BOND_DEFAULT_TX_QUEUES;
+	}
+
 	if ((all_slaves_active != 0) && (all_slaves_active != 1)) {
 		pr_warning("Warning: all_slaves_active module parameter (%d), "
 			   "not of valid value (0/1), so it was set to "
···
 	params->primary[0] = 0;
 	params->primary_reselect = primary_reselect_value;
 	params->fail_over_mac = fail_over_mac_value;
+	params->tx_queues = tx_queues;
 	params->all_slaves_active = all_slaves_active;

 	if (primary) {
···

 	rtnl_lock();

-	bond_dev = alloc_netdev(sizeof(struct bonding), name ? name : "",
-				bond_setup);
+	bond_dev = alloc_netdev_mq(sizeof(struct bonding), name ? name : "",
+				bond_setup, tx_queues);
 	if (!bond_dev) {
 		pr_err("%s: eek! can't alloc netdev!\n", name);
 		rtnl_unlock();
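The decision made by bond_slave_override() and the tx_queues range check can be modelled outside the kernel. The sketch below is a simplified user-space illustration, not the driver code: it omits locking, the BOND_IS_OK() check, and the actual transmit call, and it folds the IS_UP() and BOND_LINK_UP tests into a single `link_up` flag. The names `model_slave`, `pick_override_slave`, and `clamp_tx_queues` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DEFAULT_TX_QUEUES 16 /* mirrors BOND_DEFAULT_TX_QUEUES */

/* Hypothetical, simplified stand-in for the driver's struct slave. */
struct model_slave {
    const char *name;
    uint16_t queue_id; /* 0 means no override is assigned */
    bool link_up;      /* stands in for IS_UP() && link == BOND_LINK_UP */
};

/*
 * Return the index of the slave that should transmit a packet carrying
 * queue_mapping, or -1 when the bonding mode's normal policy should apply.
 * As in the patch, only the first slave with a matching queue_id is
 * considered, and a matching slave that is down falls back to the default.
 */
static int pick_override_slave(const struct model_slave *slaves, size_t n,
                               uint16_t queue_mapping)
{
    if (queue_mapping == 0)
        return -1; /* qid 0 is reserved for "use the normal policy" */

    for (size_t i = 0; i < n; i++) {
        if (slaves[i].queue_id == queue_mapping)
            return slaves[i].link_up ? (int)i : -1;
    }
    return -1;
}

/* Mirror the module-init check: out-of-range values reset to the default. */
static int clamp_tx_queues(int requested)
{
    if (requested < 1 || requested > 255)
        return MODEL_DEFAULT_TX_QUEUES;
    return requested;
}
```

With slaves eth0 (queue_id 0) and eth1 (queue_id 2), a packet whose queue mapping was set to 2 by the tc filter selects eth1 while its link is up. Any other mapping, or a down link on eth1, leaves the choice to the round-robin or active-backup policy.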
+116
drivers/net/bonding/bond_sysfs.c
···
 static DEVICE_ATTR(ad_partner_mac, S_IRUGO, bonding_show_ad_partner_mac, NULL);

 /*
+ * Show the queue_ids of the slaves in the current bond.
+ */
+static ssize_t bonding_show_queue_id(struct device *d,
+				     struct device_attribute *attr,
+				     char *buf)
+{
+	struct slave *slave;
+	int i, res = 0;
+	struct bonding *bond = to_bond(d);
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	read_lock(&bond->lock);
+	bond_for_each_slave(bond, slave, i) {
+		if (res > (PAGE_SIZE - 6)) {
+			/* not enough space for another interface name */
+			if ((PAGE_SIZE - res) > 10)
+				res = PAGE_SIZE - 10;
+			res += sprintf(buf + res, "++more++ ");
+			break;
+		}
+		res += sprintf(buf + res, "%s:%d ",
+			       slave->dev->name, slave->queue_id);
+	}
+	read_unlock(&bond->lock);
+	if (res)
+		buf[res-1] = '\n'; /* eat the leftover space */
+	rtnl_unlock();
+	return res;
+}
+
+/*
+ * Set the queue_ids of the slaves in the current bond.  The bond
+ * interface must be enslaved for this to work.
+ */
+static ssize_t bonding_store_queue_id(struct device *d,
+				      struct device_attribute *attr,
+				      const char *buffer, size_t count)
+{
+	struct slave *slave, *update_slave;
+	struct bonding *bond = to_bond(d);
+	u16 qid;
+	int i, ret = count;
+	char *delim;
+	struct net_device *sdev = NULL;
+
+	if (!rtnl_trylock())
+		return restart_syscall();
+
+	/* delim will point to queue id if successful */
+	delim = strchr(buffer, ':');
+	if (!delim)
+		goto err_no_cmd;
+
+	/*
+	 * Terminate string that points to device name and bump it
+	 * up one, so we can read the queue id there.
+	 */
+	*delim = '\0';
+	if (sscanf(++delim, "%hd\n", &qid) != 1)
+		goto err_no_cmd;
+
+	/* Check buffer length, valid ifname and queue id */
+	if (strlen(buffer) > IFNAMSIZ ||
+	    !dev_valid_name(buffer) ||
+	    qid > bond->params.tx_queues)
+		goto err_no_cmd;
+
+	/* Get the pointer to that interface if it exists */
+	sdev = __dev_get_by_name(dev_net(bond->dev), buffer);
+	if (!sdev)
+		goto err_no_cmd;
+
+	read_lock(&bond->lock);
+
+	/* Search for thes slave and check for duplicate qids */
+	update_slave = NULL;
+	bond_for_each_slave(bond, slave, i) {
+		if (sdev == slave->dev)
+			/*
+			 * We don't need to check the matching
+			 * slave for dups, since we're overwriting it
+			 */
+			update_slave = slave;
+		else if (qid && qid == slave->queue_id) {
+			goto err_no_cmd_unlock;
+		}
+	}
+
+	if (!update_slave)
+		goto err_no_cmd_unlock;
+
+	/* Actually set the qids for the slave */
+	update_slave->queue_id = qid;
+
+	read_unlock(&bond->lock);
+out:
+	rtnl_unlock();
+	return ret;
+
+err_no_cmd_unlock:
+	read_unlock(&bond->lock);
+err_no_cmd:
+	pr_info("invalid input for queue_id set for %s.\n",
+		bond->dev->name);
+	ret = -EPERM;
+	goto out;
+}
+
+static DEVICE_ATTR(queue_id, S_IRUGO | S_IWUSR, bonding_show_queue_id,
+		   bonding_store_queue_id);
+
+
+/*
  * Show and set the all_slaves_active flag.
  */
 static ssize_t bonding_show_slaves_active(struct device *d,
···
 	&dev_attr_ad_actor_key.attr,
 	&dev_attr_ad_partner_key.attr,
 	&dev_attr_ad_partner_mac.attr,
+	&dev_attr_queue_id.attr,
 	&dev_attr_all_slaves_active.attr,
 	NULL,
 };
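The validation rules in bonding_store_queue_id() can be exercised in user space. The sketch below is an illustration under simplifying assumptions, not the driver code: it parses an "ifname:qid" string into a private copy instead of modifying the input buffer, reads the ID with "%u", and looks slaves up by name in a plain array. The names `sysfs_slave`, `MODEL_IFNAMSIZ`, and `set_queue_id` are hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16 /* assumption: same size as the kernel's IFNAMSIZ */

/* Hypothetical, simplified stand-in for a bond's slave list entry. */
struct sysfs_slave {
    char name[MODEL_IFNAMSIZ];
    unsigned queue_id;
};

/*
 * Apply an "ifname:qid" command to the slave array.
 * Returns 0 on success and -1 on invalid input, mirroring the patch's rules:
 * the command needs a ':' delimiter and a numeric ID, the ID must not exceed
 * max_qid, the interface must be a slave, and a non-zero ID must not already
 * belong to a different slave.
 */
static int set_queue_id(struct sysfs_slave *slaves, size_t n,
                        const char *cmd, unsigned max_qid)
{
    char ifname[MODEL_IFNAMSIZ];
    const char *delim = strchr(cmd, ':');
    struct sysfs_slave *update = NULL;
    size_t name_len;
    unsigned qid;

    if (!delim)
        return -1;

    name_len = (size_t)(delim - cmd);
    if (name_len == 0 || name_len >= MODEL_IFNAMSIZ)
        return -1;
    memcpy(ifname, cmd, name_len);
    ifname[name_len] = '\0';

    if (sscanf(delim + 1, "%u", &qid) != 1 || qid > max_qid)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (strcmp(slaves[i].name, ifname) == 0)
            update = &slaves[i]; /* overwriting, so no duplicate check */
        else if (qid && qid == slaves[i].queue_id)
            return -1;           /* another slave already owns this ID */
    }

    if (!update)
        return -1;

    update->queue_id = qid;
    return 0;
}
```

This corresponds to writing "eth1:2" to /sys/class/net/bond0/bonding/queue_id as described in the documentation. Setting an ID back to 0 is always accepted for an existing slave, because 0 is exempt from the duplicate check.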
+7 -2
drivers/net/bonding/bonding.h
···
 #include "bond_3ad.h"
 #include "bond_alb.h"

-#define DRV_VERSION	"3.6.0"
-#define DRV_RELDATE	"September 26, 2009"
+#define DRV_VERSION	"3.7.0"
+#define DRV_RELDATE	"June 2, 2010"
 #define DRV_NAME	"bonding"
 #define DRV_DESCRIPTION	"Ethernet Channel Bonding Driver"

···
 		 ((mode) == BOND_MODE_TLB) ||	\
 		 ((mode) == BOND_MODE_ALB))

+#define TX_QUEUE_OVERRIDE(mode)				\
+			(((mode) == BOND_MODE_ACTIVEBACKUP) ||	\
+			 ((mode) == BOND_MODE_ROUNDROBIN))
 /*
  * Less bad way to call ioctl from within the kernel; this needs to be
  * done some other way to get the call out of interrupt context.
···
 	char primary[IFNAMSIZ];
 	int primary_reselect;
 	__be32 arp_targets[BOND_MAX_ARP_TARGETS];
+	int tx_queues;
 	int all_slaves_active;
 };

···
 	u8     perm_hwaddr[ETH_ALEN];
 	u16    speed;
 	u8     duplex;
+	u16    queue_id;
 	struct ad_slave_info ad_info; /* HUGE - better to dynamically alloc */
 	struct tlb_slave_info tlb_info;
 };
+1
include/linux/if_bonding.h
···

 #define BOND_DEFAULT_MAX_BONDS  1   /* Default maximum number of devices to support */

+#define BOND_DEFAULT_TX_QUEUES 16   /* Default number of tx queues per device */
 /* hashing types */
 #define BOND_XMIT_POLICY_LAYER2		0 /* layer 2 (MAC only), default */
 #define BOND_XMIT_POLICY_LAYER34	1 /* layer 3+4 (IP ^ (TCP || UDP)) */