Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

nvdimm: Fix firmware activation deadlock scenarios

Lockdep reports the following deadlock scenarios for CXL root device
power-management operations (device_prepare()) and for
device_shutdown() operations on 'nd_region' devices:

Chain exists of:
&nvdimm_region_key --> &nvdimm_bus->reconfig_mutex --> system_transition_mutex

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(system_transition_mutex);
                               lock(&nvdimm_bus->reconfig_mutex);
                               lock(system_transition_mutex);
  lock(&nvdimm_region_key);

Chain exists of:
&cxl_nvdimm_bridge_key --> acpi_scan_lock --> &cxl_root_key

Possible unsafe locking scenario:

       CPU0                    CPU1
       ----                    ----
  lock(&cxl_root_key);
                               lock(acpi_scan_lock);
                               lock(&cxl_root_key);
  lock(&cxl_nvdimm_bridge_key);

These stem from holding nvdimm_bus_lock() over hibernate_quiet_exec()
which walks the entire system device topology taking device_lock() along
the way. The nvdimm_bus_lock() is protecting against unregistration,
multiple simultaneous ops callers, and preventing activate_show() from
racing activate_store(). For the first two, the lock is redundant:
unregistration already flushes all ops users, and sysfs already prevents
multiple threads from being active in an ops handler at the same time.
For the last, userspace should already be waiting for its last
activate_store() to complete, and does not need activate_show() to flush
the write side, so this lock usage can be deleted in these attributes.
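
As a rough illustration of how a "Chain exists of" report like the ones
above arises, the following user-space sketch records which lock was taken
while another was held and refuses an ordering that would close a cycle.
It is a simplified model, not lockdep's actual implementation: the enum
names only borrow from the first report, and record_order() and
reachable() are hypothetical helpers invented for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock classes, named after those in the first report. */
enum lock_class {
	REGION_KEY,
	RECONFIG_MUTEX,
	SYSTEM_TRANSITION_MUTEX,
	NR_LOCKS
};

/* edge[a][b] is true when b has been acquired while a was held. */
static bool edge[NR_LOCKS][NR_LOCKS];

/* Depth-first search: is 'to' reachable from 'from' along recorded edges? */
static bool reachable(int from, int to, bool *seen)
{
	if (from == to)
		return true;
	seen[from] = true;
	for (int next = 0; next < NR_LOCKS; next++)
		if (edge[from][next] && !seen[next] &&
		    reachable(next, to, seen))
			return true;
	return false;
}

/*
 * Record that 'taken' was acquired while 'held' was held. Returns false,
 * without recording the edge, if 'held' is already reachable from 'taken',
 * because the new ordering would then close a cycle (a possible deadlock).
 */
static bool record_order(int held, int taken)
{
	bool seen[NR_LOCKS] = { false };

	assert(held >= 0 && held < NR_LOCKS);
	assert(taken >= 0 && taken < NR_LOCKS);

	if (reachable(taken, held, seen))
		return false;
	edge[held][taken] = true;
	return true;
}
```

Recording REGION_KEY before RECONFIG_MUTEX and RECONFIG_MUTEX before
SYSTEM_TRANSITION_MUTEX succeeds. A later attempt to record
SYSTEM_TRANSITION_MUTEX before REGION_KEY is rejected, which mirrors the
CPU0/CPU1 scenario in the first report. Dropping the middle lock, as this
patch does for the firmware activation attributes, removes the edges that
make the cycle possible.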

Fixes: 48001ea50d17 ("PM, libnvdimm: Add runtime firmware activation support")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Link: https://lore.kernel.org/r/165074883800.4116052.10737040861825806582.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

 drivers/nvdimm/core.c | 9 ---------
 1 file changed, 9 deletions(-)
drivers/nvdimm/core.c
···
 	if (!nd_desc->fw_ops)
 		return -EOPNOTSUPP;

-	nvdimm_bus_lock(dev);
 	cap = nd_desc->fw_ops->capability(nd_desc);
-	nvdimm_bus_unlock(dev);

 	switch (cap) {
 	case NVDIMM_FWA_CAP_QUIESCE:
···
 	if (!nd_desc->fw_ops)
 		return -EOPNOTSUPP;

-	nvdimm_bus_lock(dev);
 	cap = nd_desc->fw_ops->capability(nd_desc);
 	state = nd_desc->fw_ops->activate_state(nd_desc);
-	nvdimm_bus_unlock(dev);

 	if (cap < NVDIMM_FWA_CAP_QUIESCE)
 		return -EOPNOTSUPP;
···
 	else
 		return -EINVAL;

-	nvdimm_bus_lock(dev);
 	state = nd_desc->fw_ops->activate_state(nd_desc);

 	switch (state) {
···
 	default:
 		rc = -ENXIO;
 	}
-	nvdimm_bus_unlock(dev);

 	if (rc == 0)
 		rc = len;
···
 	if (!nd_desc->fw_ops)
 		return 0;

-	nvdimm_bus_lock(dev);
 	cap = nd_desc->fw_ops->capability(nd_desc);
-	nvdimm_bus_unlock(dev);
-
 	if (cap < NVDIMM_FWA_CAP_QUIESCE)
 		return 0;
