Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

bonding: fix oops during rmmod

"rmmod bonding" causes an oops ever since commit cc317ea3d927 ("bonding:
remove redundant NULL check in debugfs function"). Here are the relevant
functions being called:

bonding_exit()
bond_destroy_debugfs()
debugfs_remove_recursive(bonding_debug_root);
bonding_debug_root = NULL; <--------- SET TO NULL HERE
bond_netlink_fini()
rtnl_link_unregister()
__rtnl_link_unregister()
unregister_netdevice_many_notify()
bond_uninit()
bond_debug_unregister()
(commit removed check for bonding_debug_root == NULL)
debugfs_remove()
simple_recursive_removal()
down_write() -> OOPS

However, reverting the bad commit does not solve the problem completely
because the original code contains a race that could cause the same
oops, although it was much less likely to be triggered unintentionally:

CPU1
rmmod bonding
bonding_exit()
bond_destroy_debugfs()
debugfs_remove_recursive(bonding_debug_root);

CPU2
echo -bond0 > /sys/class/net/bonding_masters
bond_uninit()
bond_debug_unregister()
if (!bonding_debug_root)

CPU1
bonding_debug_root = NULL;

So do NOT revert the bad commit (since the removed checks were racy
anyway), and instead change the order of actions taken during module
removal. The same oops can also happen if there is an error during
module init, so apply the same fix there.

Fixes: cc317ea3d927 ("bonding: remove redundant NULL check in debugfs function")
Cc: stable@vger.kernel.org
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Link: https://lore.kernel.org/r/641f914f-3216-4eeb-87dd-91b78aa97773@cybernetics.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

authored by

Tony Battersby and committed by
Jakub Kicinski
a45835a0 bb487272

+7 -6
+7 -6
drivers/net/bonding/bond_main.c
··· 6477 6477 if (res) 6478 6478 goto out; 6479 6479 6480 + bond_create_debugfs(); 6481 + 6480 6482 res = register_pernet_subsys(&bond_net_ops); 6481 6483 if (res) 6482 - goto out; 6484 + goto err_net_ops; 6483 6485 6484 6486 res = bond_netlink_init(); 6485 6487 if (res) 6486 6488 goto err_link; 6487 - 6488 - bond_create_debugfs(); 6489 6489 6490 6490 for (i = 0; i < max_bonds; i++) { 6491 6491 res = bond_create(&init_net, NULL); ··· 6501 6501 out: 6502 6502 return res; 6503 6503 err: 6504 - bond_destroy_debugfs(); 6505 6504 bond_netlink_fini(); 6506 6505 err_link: 6507 6506 unregister_pernet_subsys(&bond_net_ops); 6507 + err_net_ops: 6508 + bond_destroy_debugfs(); 6508 6509 goto out; 6509 6510 6510 6511 } ··· 6514 6513 { 6515 6514 unregister_netdevice_notifier(&bond_netdev_notifier); 6516 6515 6517 - bond_destroy_debugfs(); 6518 - 6519 6516 bond_netlink_fini(); 6520 6517 unregister_pernet_subsys(&bond_net_ops); 6518 + 6519 + bond_destroy_debugfs(); 6521 6520 6522 6521 #ifdef CONFIG_NET_POLL_CONTROLLER 6523 6522 /* Make sure we don't have an imbalance on our netpoll blocking */