Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

of: dynamic: Synchronize of_changeset_destroy() with the devlink removals

In the following sequence:
1) of_platform_depopulate()
2) of_overlay_remove()

During step 1, devices are destroyed and devlinks are removed.
During step 2, OF nodes are destroyed, but
__of_changeset_entry_destroy() can raise warnings about a missing
of_node_put():
ERROR: memory leak, expected refcount 1 instead of 2 ...

The devlink removal performed at step 1 releases the device (and the
attached of_node) from a job queued in a workqueue, so the release
happens asynchronously with respect to the function calls.
When the warning appears, of_node_put() is still called, but too late,
from the workqueue job.

In order to be sure that any ongoing devlink removals are done before
the of_node destruction, synchronize the of_changeset_destroy() with the
devlink removals.
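
The ordering problem can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every `toy_*` name is hypothetical, the workqueue is reduced to an array of deferred "put" jobs, and a node is reduced to its reference count. `toy_wait_removal()` plays the role that device_link_wait_removal() plays in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device node: only the refcount is modelled. */
struct toy_node {
	int refcount;
};

/* Hypothetical stand-in for the devlink removal workqueue. */
#define TOY_MAX_JOBS 8
static struct toy_node *toy_pending[TOY_MAX_JOBS];
static size_t toy_npending;

static void toy_node_get(struct toy_node *n) { n->refcount++; }
static void toy_node_put(struct toy_node *n) { n->refcount--; }

/* Step 1 analogue: the device goes away, but its node reference is
 * dropped later by a queued job rather than immediately. */
static int toy_depopulate(struct toy_node *n)
{
	if (toy_npending == TOY_MAX_JOBS)
		return -1;
	toy_pending[toy_npending++] = n;
	return 0;
}

/* Analogue of device_link_wait_removal(): run all queued jobs now. */
static void toy_wait_removal(void)
{
	for (size_t i = 0; i < toy_npending; i++)
		toy_node_put(toy_pending[i]);
	toy_npending = 0;
}

/* Step 2 analogue: report the refcount seen when the node is destroyed.
 * A value of 1 (the changeset's own reference) is what is expected. */
static int toy_changeset_destroy(struct toy_node *n, int synchronize)
{
	if (synchronize)
		toy_wait_removal();
	return n->refcount;
}

/* Runs step 1 then step 2 and returns the refcount seen at destroy time. */
int toy_run_sequence(int synchronize)
{
	struct toy_node node = { .refcount = 1 };	/* held by the changeset */
	int seen;

	toy_node_get(&node);		/* reference held by the device */
	toy_depopulate(&node);		/* put is queued, not executed */
	seen = toy_changeset_destroy(&node, synchronize);
	toy_wait_removal();		/* the queued job runs eventually anyway */
	return seen;
}
```

In this model, `toy_run_sequence(0)` sees a refcount of 2 at destroy time (the deferred put has not run yet, matching the "expected refcount 1 instead of 2" warning), while `toy_run_sequence(1)` sees 1 because the pending jobs are flushed first.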

Fixes: 80dd33cf72d1 ("drivers: base: Fix device link removal")
Cc: stable@vger.kernel.org
Signed-off-by: Herve Codina <herve.codina@bootlin.com>
Reviewed-by: Saravana Kannan <saravanak@google.com>
Tested-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Reviewed-by: Nuno Sa <nuno.sa@analog.com>
Link: https://lore.kernel.org/r/20240325152140.198219-3-herve.codina@bootlin.com
Signed-off-by: Rob Herring <robh@kernel.org>

Authored by Herve Codina and committed by Rob Herring
8917e738 0462c56c

drivers/of/dynamic.c (+12)
···
 
 #define pr_fmt(fmt) "OF: " fmt
 
+#include <linux/device.h>
 #include <linux/of.h>
 #include <linux/spinlock.h>
 #include <linux/slab.h>
···
 void of_changeset_destroy(struct of_changeset *ocs)
 {
 	struct of_changeset_entry *ce, *cen;
+
+	/*
+	 * When a device is deleted, the device links to/from it are also queued
+	 * for deletion. Until these device links are freed, the devices
+	 * themselves aren't freed. If the device being deleted is due to an
+	 * overlay change, this device might be holding a reference to a device
+	 * node that will be freed. So, wait until all already pending device
+	 * links are deleted before freeing a device node. This ensures we don't
+	 * free any device node that has a non-zero reference count.
+	 */
+	device_link_wait_removal();
 
 	list_for_each_entry_safe_reverse(ce, cen, &ocs->entries, node)
 		__of_changeset_entry_destroy(ce);