Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Merge branch 'net-fix-issues-around-register_netdevice-failures'

Jakub Kicinski says:

====================
net: fix issues around register_netdevice() failures

This series attempts to clean up the life cycle of struct
net_device. Dave has added dev->needs_free_netdev in the
past to fix double frees, we can lean on that mechanism
a little more to fix remaining issues with register_netdevice().

This is the next chapter of the saga which already includes:
commit 0e0eee2465df ("net: correct error path in rtnl_newlink()")
commit e51fb152318e ("rtnetlink: fix a memory leak when ->newlink fails")
commit cf124db566e6 ("net: Fix inconsistent teardown and release of private netdev state.")
commit 93ee31f14f6f ("[NET]: Fix free_netdev on register_netdev failure.")
commit 814152a89ed5 ("net: fix memleak in register_netdevice()")
commit 10cc514f451a ("net: Fix null de-reference of device refcount")

The immediate problem which gets fixed here is that calling
free_netdev() right after unregister_netdevice() is illegal
because we need to release rtnl_lock first, to let the
unregistration finish. Note that unregister_netdevice() is
just a wrapper of unregister_netdevice_queue(), it only
does half of the job.

Where this limitation becomes most problematic is in failure
modes of register_netdevice(). There is a notifier call right
at the end of it, which lets other subsystems veto the entire
thing. At which point we should really go through a full
unregister_netdevice(), but we can't because callers may
go straight to free_netdev() after the failure, and that's
no bueno (see the previous paragraph).

This set makes free_netdev() more lenient, when device
is still being unregistered free_netdev() will simply set
dev->needs_free_netdev and let the unregister process do
the freeing.

With the free_netdev() problem out of the way failures in
register_netdevice() can make use of net_todo, again.
Users are still expected to call free_netdev() right after
failure but that will only set dev->needs_free_netdev.

To prevent the pathological case of:

dev->needs_free_netdev = true;
if (register_netdevice(dev)) {
rtnl_unlock();
free_netdev(dev);
}

make register_netdevice()'s failure clear dev->needs_free_netdev.

Problems described above are only present with register_netdevice() /
unregister_netdevice(). We have two parallel APIs for registration
of devices:
- those called outside rtnl_lock (register_netdev(), and
unregister_netdev());
- and those to be used under rtnl_lock - register_netdevice()
and unregister_netdevice().
The former is trivial and has no problems. The alternative
approach to fix the latter would be to also separate the
freeing functions - i.e. add free_netdevice(). This has been
implemented (incl. converting all relevant calls in the tree)
but it feels a little unnecessary to put the burden of choosing
the right free_netdev{,ice}() call on the programmer when we
can "just do the right thing" by default.
====================

Link: https://lore.kernel.org/r/20210106184007.1821480-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+187 -36
+165 -6
Documentation/networking/netdevices.rst
··· 10 10 The following is a random collection of documentation regarding 11 11 network devices. 12 12 13 - struct net_device allocation rules 14 - ================================== 13 + struct net_device lifetime rules 14 + ================================ 15 15 Network device structures need to persist even after module is unloaded and 16 16 must be allocated with alloc_netdev_mqs() and friends. 17 17 If device has registered successfully, it will be freed on last use 18 - by free_netdev(). This is required to handle the pathologic case cleanly 19 - (example: rmmod mydriver </sys/class/net/myeth/mtu ) 18 + by free_netdev(). This is required to handle the pathological case cleanly 19 + (example: ``rmmod mydriver </sys/class/net/myeth/mtu``) 20 20 21 - alloc_netdev_mqs()/alloc_netdev() reserve extra space for driver 21 + alloc_netdev_mqs() / alloc_netdev() reserve extra space for driver 22 22 private data which gets freed when the network device is freed. If 23 23 separately allocated data is attached to the network device 24 - (netdev_priv(dev)) then it is up to the module exit handler to free that. 24 + (netdev_priv()) then it is up to the module exit handler to free that. 25 + 26 + There are two groups of APIs for registering struct net_device. 27 + First group can be used in normal contexts where ``rtnl_lock`` is not already 28 + held: register_netdev(), unregister_netdev(). 29 + Second group can be used when ``rtnl_lock`` is already held: 30 + register_netdevice(), unregister_netdevice(), free_netdevice(). 31 + 32 + Simple drivers 33 + -------------- 34 + 35 + Most drivers (especially device drivers) handle lifetime of struct net_device 36 + in context where ``rtnl_lock`` is not held (e.g. driver probe and remove paths). 37 + 38 + In that case the struct net_device registration is done using 39 + the register_netdev(), and unregister_netdev() functions: 40 + 41 + .. code-block:: c 42 + 43 + int probe() 44 + { 45 + struct my_device_priv *priv; 46 + int err; 47 + 48 + dev = alloc_netdev_mqs(...); 49 + if (!dev) 50 + return -ENOMEM; 51 + priv = netdev_priv(dev); 52 + 53 + /* ... do all device setup before calling register_netdev() ... 54 + */ 55 + 56 + err = register_netdev(dev); 57 + if (err) 58 + goto err_undo; 59 + 60 + /* net_device is visible to the user! */ 61 + 62 + err_undo: 63 + /* ... undo the device setup ... */ 64 + free_netdev(dev); 65 + return err; 66 + } 67 + 68 + void remove() 69 + { 70 + unregister_netdev(dev); 71 + free_netdev(dev); 72 + } 73 + 74 + Note that after calling register_netdev() the device is visible in the system. 75 + Users can open it and start sending / receiving traffic immediately, 76 + or run any other callback, so all initialization must be done prior to 77 + registration. 78 + 79 + unregister_netdev() closes the device and waits for all users to be done 80 + with it. The memory of struct net_device itself may still be referenced 81 + by sysfs but all operations on that device will fail. 82 + 83 + free_netdev() can be called after unregister_netdev() returns on when 84 + register_netdev() failed. 85 + 86 + Device management under RTNL 87 + ---------------------------- 88 + 89 + Registering struct net_device while in context which already holds 90 + the ``rtnl_lock`` requires extra care. In those scenarios most drivers 91 + will want to make use of struct net_device's ``needs_free_netdev`` 92 + and ``priv_destructor`` members for freeing of state. 93 + 94 + Example flow of netdev handling under ``rtnl_lock``: 95 + 96 + .. code-block:: c 97 + 98 + static void my_setup(struct net_device *dev) 99 + { 100 + dev->needs_free_netdev = true; 101 + } 102 + 103 + static void my_destructor(struct net_device *dev) 104 + { 105 + some_obj_destroy(priv->obj); 106 + some_uninit(priv); 107 + } 108 + 109 + int create_link() 110 + { 111 + struct my_device_priv *priv; 112 + int err; 113 + 114 + ASSERT_RTNL(); 115 + 116 + dev = alloc_netdev(sizeof(*priv), "net%d", NET_NAME_UNKNOWN, my_setup); 117 + if (!dev) 118 + return -ENOMEM; 119 + priv = netdev_priv(dev); 120 + 121 + /* Implicit constructor */ 122 + err = some_init(priv); 123 + if (err) 124 + goto err_free_dev; 125 + 126 + priv->obj = some_obj_create(); 127 + if (!priv->obj) { 128 + err = -ENOMEM; 129 + goto err_some_uninit; 130 + } 131 + /* End of constructor, set the destructor: */ 132 + dev->priv_destructor = my_destructor; 133 + 134 + err = register_netdevice(dev); 135 + if (err) 136 + /* register_netdevice() calls destructor on failure */ 137 + goto err_free_dev; 138 + 139 + /* If anything fails now unregister_netdevice() (or unregister_netdev()) 140 + * will take care of calling my_destructor and free_netdev(). 141 + */ 142 + 143 + return 0; 144 + 145 + err_some_uninit: 146 + some_uninit(priv); 147 + err_free_dev: 148 + free_netdev(dev); 149 + return err; 150 + } 151 + 152 + If struct net_device.priv_destructor is set it will be called by the core 153 + some time after unregister_netdevice(), it will also be called if 154 + register_netdevice() fails. The callback may be invoked with or without 155 + ``rtnl_lock`` held. 156 + 157 + There is no explicit constructor callback, driver "constructs" the private 158 + netdev state after allocating it and before registration. 159 + 160 + Setting struct net_device.needs_free_netdev makes core call free_netdevice() 161 + automatically after unregister_netdevice() when all references to the device 162 + are gone. It only takes effect after a successful call to register_netdevice() 163 + so if register_netdevice() fails driver is responsible for calling 164 + free_netdev(). 165 + 166 + free_netdev() is safe to call on error paths right after unregister_netdevice() 167 + or when register_netdevice() fails. Parts of netdev (de)registration process 168 + happen after ``rtnl_lock`` is released, therefore in those cases free_netdev() 169 + will defer some of the processing until ``rtnl_lock`` is released. 170 + 171 + Devices spawned from struct rtnl_link_ops should never free the 172 + struct net_device directly. 173 + 174 + .ndo_init and .ndo_uninit 175 + ~~~~~~~~~~~~~~~~~~~~~~~~~ 176 + 177 + ``.ndo_init`` and ``.ndo_uninit`` callbacks are called during net_device 178 + registration and de-registration, under ``rtnl_lock``. Drivers can use 179 + those e.g. when parts of their init process need to run under ``rtnl_lock``. 180 + 181 + ``.ndo_init`` runs before device is visible in the system, ``.ndo_uninit`` 182 + runs during de-registering after device is closed but other subsystems 183 + may still have outstanding references to the netdevice. 25 184 26 185 MTU 27 186 ===
+1 -3
net/8021q/vlan.c
··· 284 284 return 0; 285 285 286 286 out_free_newdev: 287 - if (new_dev->reg_state == NETREG_UNINITIALIZED || 288 - new_dev->reg_state == NETREG_UNREGISTERED) 289 - free_netdev(new_dev); 287 + free_netdev(new_dev); 290 288 return err; 291 289 } 292 290
+15 -10
net/core/dev.c
··· 10077 10077 ret = call_netdevice_notifiers(NETDEV_REGISTER, dev); 10078 10078 ret = notifier_to_errno(ret); 10079 10079 if (ret) { 10080 + /* Expect explicit free_netdev() on failure */ 10081 + dev->needs_free_netdev = false; 10080 10082 rollback_registered(dev); 10081 - rcu_barrier(); 10082 - 10083 - dev->reg_state = NETREG_UNREGISTERED; 10084 - /* We should put the kobject that hold in 10085 - * netdev_unregister_kobject(), otherwise 10086 - * the net device cannot be freed when 10087 - * driver calls free_netdev(), because the 10088 - * kobject is being hold. 10089 - */ 10090 - kobject_put(&dev->dev.kobj); 10083 + net_set_todo(dev); 10084 + goto out; 10091 10085 } 10092 10086 /* 10093 10087 * Prevent userspace races by waiting until the network ··· 10625 10631 struct napi_struct *p, *n; 10626 10632 10627 10633 might_sleep(); 10634 + 10635 + /* When called immediately after register_netdevice() failed the unwind 10636 + * handling may still be dismantling the device. Handle that case by 10637 + * deferring the free. 10638 + */ 10639 + if (dev->reg_state == NETREG_UNREGISTERING) { 10640 + ASSERT_RTNL(); 10641 + dev->needs_free_netdev = true; 10642 + return; 10643 + } 10644 + 10628 10645 netif_free_tx_queues(dev); 10629 10646 netif_free_rx_queues(dev); 10630 10647
+6 -17
net/core/rtnetlink.c
··· 3439 3439 3440 3440 dev->ifindex = ifm->ifi_index; 3441 3441 3442 - if (ops->newlink) { 3442 + if (ops->newlink) 3443 3443 err = ops->newlink(link_net ? : net, dev, tb, data, extack); 3444 - /* Drivers should call free_netdev() in ->destructor 3445 - * and unregister it on failure after registration 3446 - * so that device could be finally freed in rtnl_unlock. 3447 - */ 3448 - if (err < 0) { 3449 - /* If device is not registered at all, free it now */ 3450 - if (dev->reg_state == NETREG_UNINITIALIZED || 3451 - dev->reg_state == NETREG_UNREGISTERED) 3452 - free_netdev(dev); 3453 - goto out; 3454 - } 3455 - } else { 3444 + else 3456 3445 err = register_netdevice(dev); 3457 - if (err < 0) { 3458 - free_netdev(dev); 3459 - goto out; 3460 - } 3446 + if (err < 0) { 3447 + free_netdev(dev); 3448 + goto out; 3461 3449 } 3450 + 3462 3451 err = rtnl_configure_link(dev, ifm); 3463 3452 if (err < 0) 3464 3453 goto out_unregister;