ata: libata-scsi: Avoid deadlock on rescan after device resume

When an ATA port is resumed from sleep, the port is reset and a power
management request is issued to libata EH to reset the port and rescan
the device(s) attached to the port. Device rescanning is done by
scheduling the ata_scsi_dev_rescan() work, which executes
scsi_rescan_device().

However, scsi_rescan_device() takes the generic device lock, which is
also taken by dpm_resume() when the SCSI device itself is resumed. If
a device rescan starts before the SCSI device resume has completed,
the RCU locking used to refresh the cached VPD pages of the device,
combined with the generic device locking from scsi_rescan_device() and
from dpm_resume(), can cause a deadlock.

Avoid this situation by changing struct ata_port scsi_rescan_task to be
a delayed work instead of a simple work_struct. ata_scsi_dev_rescan() is
modified to check whether the SCSI device associated with the ATA device
that must be rescanned is still suspended. If it is,
ata_scsi_dev_rescan() returns early and reschedules itself for
execution after an arbitrary delay of 5 ms.
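
The retry behavior described above can be modeled in userspace as a
rough sketch. Everything named model_* below is hypothetical and exists
only for illustration; none of it is a libata or SCSI type, and the
simulated clock merely stands in for the workqueue timer. The sketch
assumes the PM core clears the suspended flag at some fixed time and
shows that the rescan body runs exactly once, on the first 5 ms tick at
or after that time.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the SCSI device state; not a kernel type. */
struct model_dev {
	bool is_suspended;	/* models sdev->sdev_gendev.power.is_suspended */
	int resume_at_ms;	/* simulated time at which resume completes */
	int rescans;		/* how many times the rescan body actually ran */
};

#define MODEL_RETRY_MS 5	/* same arbitrary delay as in the patch */

/*
 * Models one execution of the rescan work. Returns true if the work
 * must be rescheduled because the device is still suspended.
 */
static bool model_rescan_once(struct model_dev *dev)
{
	if (dev->is_suspended)
		return true;	/* give up now, retry later */
	dev->rescans++;		/* stands in for scsi_rescan_device() */
	return false;
}

/*
 * Models the delayed work being re-queued until the device is resumed.
 * Returns the simulated time (in ms) at which the rescan finally ran.
 */
static int model_run_rescan_work(struct model_dev *dev)
{
	int now_ms = 0;

	assert(dev->resume_at_ms >= 0);
	for (;;) {
		/* The simulated PM core clears the flag once resume is done. */
		if (now_ms >= dev->resume_at_ms)
			dev->is_suspended = false;
		if (!model_rescan_once(dev))
			return now_ms;
		/* Models schedule_delayed_work(..., msecs_to_jiffies(5)). */
		now_ms += MODEL_RETRY_MS;
	}
}
```

With a device whose simulated resume completes at 12 ms, the work is
retried at 5 ms and 10 ms and the rescan runs at 15 ms. The early return
never takes the device lock, which is the point of the change.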

Reported-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Reported-by: Joe Breuer <linux-kernel@jmbreuer.net>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217530
Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Tested-by: Joe Breuer <linux-kernel@jmbreuer.net>

Changed files
+25 -4
+2 -1
drivers/ata/libata-core.c
···
 
 	mutex_init(&ap->scsi_scan_mutex);
 	INIT_DELAYED_WORK(&ap->hotplug_task, ata_scsi_hotplug);
-	INIT_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
+	INIT_DELAYED_WORK(&ap->scsi_rescan_task, ata_scsi_dev_rescan);
 	INIT_LIST_HEAD(&ap->eh_done_q);
 	init_waitqueue_head(&ap->eh_wait_q);
 	init_completion(&ap->park_req_pending);
···
 	WARN_ON(!(ap->pflags & ATA_PFLAG_UNLOADED));
 
 	cancel_delayed_work_sync(&ap->hotplug_task);
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
 
 skip_eh:
 	/* clean up zpodd on port removal */
+1 -1
drivers/ata/libata-eh.c
···
 			ehc->i.flags |= ATA_EHI_SETMODE;
 
 			/* schedule the scsi_rescan_device() here */
-			schedule_work(&(ap->scsi_rescan_task));
+			schedule_delayed_work(&ap->scsi_rescan_task, 0);
 		} else if (dev->class == ATA_DEV_UNKNOWN &&
 			   ehc->tries[dev->devno] &&
 			   ata_class_enabled(ehc->classes[dev->devno])) {
+21 -1
drivers/ata/libata-scsi.c
···
 void ata_scsi_dev_rescan(struct work_struct *work)
 {
 	struct ata_port *ap =
-		container_of(work, struct ata_port, scsi_rescan_task);
+		container_of(work, struct ata_port, scsi_rescan_task.work);
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
+	bool delay_rescan = false;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
···
 			if (scsi_device_get(sdev))
 				continue;
 
+			/*
+			 * If the rescan work was scheduled because of a resume
+			 * event, the port is already fully resumed, but the
+			 * SCSI device may not yet be fully resumed. In such
+			 * case, executing scsi_rescan_device() may cause a
+			 * deadlock with the PM code on device_lock(). Prevent
+			 * this by giving up and retrying rescan after a short
+			 * delay.
+			 */
+			delay_rescan = sdev->sdev_gendev.power.is_suspended;
+			if (delay_rescan) {
+				scsi_device_put(sdev);
+				break;
+			}
+
 			spin_unlock_irqrestore(ap->lock, flags);
 			scsi_rescan_device(&(sdev->sdev_gendev));
 			scsi_device_put(sdev);
···
 
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
+
+	if (delay_rescan)
+		schedule_delayed_work(&ap->scsi_rescan_task,
+				      msecs_to_jiffies(5));
 }
+1 -1
include/linux/libata.h
···
 
 	struct mutex		scsi_scan_mutex;
 	struct delayed_work	hotplug_task;
-	struct work_struct	scsi_rescan_task;
+	struct delayed_work	scsi_rescan_task;
 
 	unsigned int		hsm_task_state;
 