Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

scsi: block: pm: Always set request queue runtime active in blk_post_runtime_resume()

John Garry reported a deadlock that occurs when trying to access a
runtime-suspended SATA device. For obscure reasons, the rescan procedure
causes the link to be hard-reset, which disconnects the device.

The rescan tries to carry out a runtime resume when accessing the device.
scsi_rescan_device() holds the SCSI device lock and won't release it until
it can put commands onto the device's block queue. This can't happen until
the queue is successfully runtime-resumed or the device is unregistered.
But the runtime resume fails because the device is disconnected, and
__scsi_remove_device() can't do the unregistration because it can't get the
device lock.

The best way to resolve this deadlock appears to be to allow the block
queue to start running again even after an unsuccessful runtime resume.
The idea is that the driver or the SCSI error handler will need to be able
to use the queue to resolve the runtime resume failure.

This patch removes the err argument from blk_post_runtime_resume() and makes
the routine always act as though the resume was successful. This fixes the
deadlock.
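
The behavioural change can be illustrated with a small userspace model. This is only a sketch and not kernel code: the model_* names and the reduced status enum are hypothetical stand-ins, the old routine's q->dev check is omitted, and the real blk_set_runtime_active() does more than update a status field.

```c
#include <assert.h>
#include <errno.h>

/* Userspace model only; these types mimic, but are not, the kernel's. */
enum model_rpm_status {
	MODEL_RPM_ACTIVE,
	MODEL_RPM_RESUMING,
	MODEL_RPM_SUSPENDED,
};

struct model_queue {
	enum model_rpm_status rpm_status;
};

/* Old behaviour: a failed resume leaves the queue marked suspended. */
void model_post_resume_old(struct model_queue *q, int err)
{
	if (!err)
		q->rpm_status = MODEL_RPM_ACTIVE;
	else
		q->rpm_status = MODEL_RPM_SUSPENDED;
}

/* New behaviour: the queue is marked active whatever the resume result. */
void model_post_resume_new(struct model_queue *q)
{
	q->rpm_status = MODEL_RPM_ACTIVE;
}

/*
 * Caller loosely modelled on the scsi_pm.c hunk: the resume error is still
 * returned to the caller, but it no longer decides the queue's state.
 */
int model_runtime_resume(struct model_queue *q, int resume_err)
{
	q->rpm_status = MODEL_RPM_RESUMING; /* stands in for blk_pre_runtime_resume() */
	model_post_resume_new(q);
	return resume_err;
}
```

In this model, calling model_runtime_resume(&q, -EIO) returns -EIO yet leaves q.rpm_status at MODEL_RPM_ACTIVE, whereas model_post_resume_old(&q, -EIO) would leave it at MODEL_RPM_SUSPENDED. A suspended queue after a failed resume is the state in which the rescan could wait indefinitely.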

Link: https://lore.kernel.org/r/1639999298-244569-4-git-send-email-chenxiang66@hisilicon.com
Fixes: e27829dc92e5 ("scsi: serialize ->rescan against ->remove")
Reported-and-tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

Authored by Alan Stern and committed by Martin K. Petersen
6e1fcab0 6cc73908

+9 -17
+7 -15
block/blk-pm.c
···
 /**
  * blk_post_runtime_resume - Post runtime resume processing
  * @q: the queue of the device
- * @err: return value of the device's runtime_resume function
  *
  * Description:
- *    Update the queue's runtime status according to the return value of the
- *    device's runtime_resume function. If the resume was successful, call
- *    blk_set_runtime_active() to do the real work of restarting the queue.
+ *    For historical reasons, this routine merely calls blk_set_runtime_active()
+ *    to do the real work of restarting the queue. It does this regardless of
+ *    whether the device's runtime-resume succeeded; even if it failed the
+ *    driver or error handler will need to communicate with the device.
  *
  * This function should be called near the end of the device's
  * runtime_resume callback.
  */
-void blk_post_runtime_resume(struct request_queue *q, int err)
+void blk_post_runtime_resume(struct request_queue *q)
 {
-	if (!q->dev)
-		return;
-	if (!err) {
-		blk_set_runtime_active(q);
-	} else {
-		spin_lock_irq(&q->queue_lock);
-		q->rpm_status = RPM_SUSPENDED;
-		spin_unlock_irq(&q->queue_lock);
-	}
+	blk_set_runtime_active(q);
 }
 EXPORT_SYMBOL(blk_post_runtime_resume);
 
···
  * runtime PM status and re-enable peeking requests from the queue. It
  * should be called before first request is added to the queue.
  *
- * This function is also called by blk_post_runtime_resume() for successful
+ * This function is also called by blk_post_runtime_resume() for
  * runtime resumes. It does everything necessary to restart the queue.
  */
 void blk_set_runtime_active(struct request_queue *q)
+1 -1
drivers/scsi/scsi_pm.c
···
 	blk_pre_runtime_resume(sdev->request_queue);
 	if (pm && pm->runtime_resume)
 		err = pm->runtime_resume(dev);
-	blk_post_runtime_resume(sdev->request_queue, err);
+	blk_post_runtime_resume(sdev->request_queue);
 
 	return err;
 }
+1 -1
include/linux/blk-pm.h
···
 extern int blk_pre_runtime_suspend(struct request_queue *q);
 extern void blk_post_runtime_suspend(struct request_queue *q, int err);
 extern void blk_pre_runtime_resume(struct request_queue *q);
-extern void blk_post_runtime_resume(struct request_queue *q, int err);
+extern void blk_post_runtime_resume(struct request_queue *q);
 extern void blk_set_runtime_active(struct request_queue *q);
 #else
 static inline void blk_pm_runtime_init(struct request_queue *q,