Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

failover: allow name change on IFF_UP slave interfaces

When a netdev appears through hot plug and then gets enslaved by a
failover master that is already up and running, the slave is opened
right away after being enslaved. Today there is a race in which
userspace (udev) may fail to rename the slave if the kernel
(net_failover) opens the slave before the userspace rename happens.
Unlike bond or team, the primary slave of failover can't be renamed by
userspace ahead of time, since the kernel-initiated auto-enslavement is
unable to, or rather, is never meant to be synchronized with the rename
request from userspace.

Failover slave interfaces are not designed to be operated directly by
userspace apps: IP configuration, filter rules for network traffic,
and so on should all be done on the master interface. In general,
userspace apps only care about the name of the master interface, while
slave names are less important as long as admin users can see reliable
names that may carry other information describing the netdev. For
example, they can infer that "ens3nsby" is a standby slave of "ens3",
while for a name like "eth0" they can't tell which master it belongs
to.

Historically the name of an IFF_UP interface can't be changed, because
there might be admin scripts or management software already relying on
that behavior and assuming that the slave name can't be changed once
UP. But failover is special: with the in-kernel auto-enslavement
mechanism, the userspace expectation for device enumeration and
bring-up order is already broken. Previously, initramfs and various
userspace config tools were modified to bypass failover slaves because
of auto-enslavement and duplicate MAC addresses. Similarly, if users
care about seeing reliable slave names, the new type of failover slave
needs to be handled specifically in userspace anyway.

It's less risky to lift the rename restriction on a failover slave
that is already UP. Although this change may break userspace
components (most likely configuration scripts or management software)
that assume a slave name can't be changed while UP, those form a
relatively limited and controllable set among all userspace components,
and they can be fixed specifically to listen for rename events on
failover slaves. Userspace components interacting with slaves are
expected to be changed to operate on the failover master interface
instead, as the failover slave is dynamic in nature and may come and go
at any point. The goal is to make the role of failover slaves less
relevant, so that userspace components only deal with the failover
master in the long run.

Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
Signed-off-by: Si-Wei Liu <si-wei.liu@oracle.com>
Reviewed-by: Liran Alon <liran.alon@oracle.com>
Acked-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Si-Wei Liu and committed by David S. Miller
8065a779 43c2adb9

+21 -4
+3
include/linux/netdevice.h
···
  * @IFF_FAILOVER: device is a failover master device
  * @IFF_FAILOVER_SLAVE: device is lower dev of a failover master device
  * @IFF_L3MDEV_RX_HANDLER: only invoke the rx handler of L3 master device
+ * @IFF_LIVE_RENAME_OK: rename is allowed while device is up and running
  */
 enum netdev_priv_flags {
 	IFF_802_1Q_VLAN			= 1<<0,
···
 	IFF_FAILOVER			= 1<<27,
 	IFF_FAILOVER_SLAVE		= 1<<28,
 	IFF_L3MDEV_RX_HANDLER		= 1<<29,
+	IFF_LIVE_RENAME_OK		= 1<<30,
 };
 
 #define IFF_802_1Q_VLAN			IFF_802_1Q_VLAN
···
 #define IFF_FAILOVER			IFF_FAILOVER
 #define IFF_FAILOVER_SLAVE		IFF_FAILOVER_SLAVE
 #define IFF_L3MDEV_RX_HANDLER		IFF_L3MDEV_RX_HANDLER
+#define IFF_LIVE_RENAME_OK		IFF_LIVE_RENAME_OK
 
 /**
  *	struct net_device - The DEVICE structure.
+15 -1
net/core/dev.c
···
 	BUG_ON(!dev_net(dev));
 
 	net = dev_net(dev);
-	if (dev->flags & IFF_UP)
+
+	/* Some auto-enslaved devices e.g. failover slaves are
+	 * special, as userspace might rename the device after
+	 * the interface had been brought up and running since
+	 * the point kernel initiated auto-enslavement. Allow
+	 * live name change even when these slave devices are
+	 * up and running.
+	 *
+	 * Typically, users of these auto-enslaving devices
+	 * don't actually care about slave name change, as
+	 * they are supposed to operate on master interface
+	 * directly.
+	 */
+	if (dev->flags & IFF_UP &&
+	    likely(!(dev->priv_flags & IFF_LIVE_RENAME_OK)))
 		return -EBUSY;
 
 	write_seqcount_begin(&devnet_rename_seq);
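
The effect of the new guard can be checked outside the kernel with a small userspace model. This is only a sketch: `struct model_netdev` and `model_rename_allowed()` are hypothetical names invented for illustration, and only the two flag values are taken from the kernel headers. It reproduces the decision the hunk above makes, not the rest of dev_change_name().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Flag values copied from the kernel headers for illustration. */
#define IFF_UP (1u << 0)              /* dev->flags: interface is up */
#define IFF_LIVE_RENAME_OK (1u << 30) /* dev->priv_flags: bit added by this patch */

/* Hypothetical stand-in for the two struct net_device fields the guard reads. */
struct model_netdev {
	unsigned int flags;
	unsigned int priv_flags;
};

/*
 * Models the guard at the top of dev_change_name(): returns 0 when a
 * rename may proceed, -EBUSY when the device is up and has not been
 * marked as safe to rename while running.
 */
static int model_rename_allowed(const struct model_netdev *dev)
{
	assert(dev != NULL);

	if ((dev->flags & IFF_UP) && !(dev->priv_flags & IFF_LIVE_RENAME_OK))
		return -EBUSY;
	return 0;
}
```

In this model a device that is down can always be renamed, a device that is up is refused with -EBUSY as before, and only a device that is up and carries IFF_LIVE_RENAME_OK takes the new path.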
+3 -3
net/core/failover.c
···
 		goto err_upper_link;
 	}
 
-	slave_dev->priv_flags |= IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags |= (IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK);
 
 	if (fops && fops->slave_register &&
 	    !fops->slave_register(slave_dev, failover_dev))
 		return NOTIFY_OK;
 
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK);
 err_upper_link:
 	netdev_rx_handler_unregister(slave_dev);
 done:
···
 
 	netdev_rx_handler_unregister(slave_dev);
 	netdev_upper_dev_unlink(slave_dev, failover_dev);
-	slave_dev->priv_flags &= ~IFF_FAILOVER_SLAVE;
+	slave_dev->priv_flags &= ~(IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK);
 
 	if (fops && fops->slave_unregister &&
 	    !fops->slave_unregister(slave_dev, failover_dev))
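
The failover.c hunks always set and clear the two bits together, so a device is rename-safe exactly while it is a failover slave. A minimal userspace sketch of that pairing follows. The helper names and the `FAILOVER_SLAVE_FLAGS` mask are hypothetical and do not exist in the kernel; only the two bit positions come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Bit positions taken from enum netdev_priv_flags. */
#define IFF_FAILOVER_SLAVE (1u << 28)
#define IFF_LIVE_RENAME_OK (1u << 30)

/* Hypothetical combined mask; the patch spells the OR out at each site. */
#define FAILOVER_SLAVE_FLAGS (IFF_FAILOVER_SLAVE | IFF_LIVE_RENAME_OK)

/* Models the register path: both bits are set in one step. */
static void model_mark_slave(unsigned int *priv_flags)
{
	assert(priv_flags != NULL);
	*priv_flags |= FAILOVER_SLAVE_FLAGS;
}

/*
 * Models the unregister and error paths: both bits are cleared in one
 * step, and every other bit in priv_flags is left untouched.
 */
static void model_unmark_slave(unsigned int *priv_flags)
{
	assert(priv_flags != NULL);
	*priv_flags &= ~FAILOVER_SLAVE_FLAGS;
}
```

Clearing with the same mask that was used for setting is what keeps a released slave from retaining the live-rename permission after it leaves the failover master.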