Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ata: libata-eh: correctly handle deferred qc timeouts

A deferred qc may time out while waiting for the device queue to drain
before it can be submitted. In such a case, since the qc is not active,
ata_scsi_cmd_error_handler() ends up calling scsi_eh_finish_cmd(),
which frees the qc. But as the port deferred_qc field still references
this finished/freed qc, the deferred qc work may eventually attempt to
call ata_qc_issue() against this invalid qc, leading to errors such as
reported by UBSAN (syzbot run):

UBSAN: shift-out-of-bounds in drivers/ata/libata-core.c:5166:24
shift exponent 4210818301 is too large for 64-bit type 'long long unsigned int'
...
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x100/0x190 lib/dump_stack.c:120
ubsan_epilogue+0xa/0x30 lib/ubsan.c:233
__ubsan_handle_shift_out_of_bounds+0x279/0x2a0 lib/ubsan.c:494
ata_qc_issue.cold+0x38/0x9f drivers/ata/libata-core.c:5166
ata_scsi_deferred_qc_work+0x154/0x1f0 drivers/ata/libata-scsi.c:1679
process_one_work+0x9d7/0x1920 kernel/workqueue.c:3275
process_scheduled_works kernel/workqueue.c:3358 [inline]
worker_thread+0x5da/0xe40 kernel/workqueue.c:3439
kthread+0x370/0x450 kernel/kthread.c:467
ret_from_fork+0x754/0xd80 arch/x86/kernel/process.c:158
ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245
</TASK>

Fix this by checking if the qc of a timed-out SCSI command is a deferred
one and, in such a case, clearing the port deferred_qc field and finishing
the SCSI command with DID_TIME_OUT.

Reported-by: syzbot+1f77b8ca15336fff21ff@syzkaller.appspotmail.com
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>

+19 -3
drivers/ata/libata-eh.c
@@ -640,12 +640,28 @@
 		set_host_byte(scmd, DID_OK);
 
 		ata_qc_for_each_raw(ap, qc, i) {
-			if (qc->flags & ATA_QCFLAG_ACTIVE &&
-			    qc->scsicmd == scmd)
+			if (qc->scsicmd != scmd)
+				continue;
+			if ((qc->flags & ATA_QCFLAG_ACTIVE) ||
+			    qc == ap->deferred_qc)
 				break;
 		}
 
-		if (i < ATA_MAX_QUEUE) {
+		if (qc == ap->deferred_qc) {
+			/*
+			 * This is a deferred command that timed out while
+			 * waiting for the command queue to drain. Since the qc
+			 * is not active yet (deferred_qc is still set, so the
+			 * deferred qc work has not issued the command yet),
+			 * simply signal the timeout by finishing the SCSI
+			 * command and clear the deferred qc to prevent the
+			 * deferred qc work from issuing this qc.
+			 */
+			WARN_ON_ONCE(qc->flags & ATA_QCFLAG_ACTIVE);
+			ap->deferred_qc = NULL;
+			set_host_byte(scmd, DID_TIME_OUT);
+			scsi_eh_finish_cmd(scmd, &ap->eh_done_q);
+		} else if (i < ATA_MAX_QUEUE) {
 			/* the scmd has an associated qc */
 			if (!(qc->flags & ATA_QCFLAG_EH)) {
 				/* which hasn't failed yet, timeout */
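
The control flow of the change can be tried outside the kernel with a small user-space model. The sketch below is not libata code. Every type, constant, and function name in it (struct port, struct qc, handle_timed_out_cmd(), deferred_work_has_qc(), and so on) is a hypothetical stand-in chosen for illustration. It only mirrors the two ideas of the patch: match the qc on the SCSI command first and then accept it if it is either active or the port's deferred qc, and clear the deferred pointer before finishing the command so that later deferred work has nothing stale to issue. Unlike the patch, the model returns early when no qc matched, which keeps the deferred check trivially safe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_QUEUE      4     /* hypothetical stand-in for ATA_MAX_QUEUE */
#define QCFLAG_ACTIVE  0x1u  /* hypothetical stand-in for ATA_QCFLAG_ACTIVE */

enum host_byte { HOST_OK, HOST_TIME_OUT };
enum eh_result { EH_NO_QC, EH_HAS_QC, EH_DEFERRED_FINISHED };

struct scsi_cmd {
	enum host_byte host;   /* models the SCSI host byte */
	bool finished;         /* models the command having been finished */
};

struct qc {
	unsigned int flags;
	struct scsi_cmd *scsicmd;
};

struct port {
	struct qc qcmd[MAX_QUEUE];
	struct qc *deferred_qc;  /* qc waiting for the queue to drain */
};

/* Model of the patched lookup plus the new deferred-qc branch. */
static enum eh_result handle_timed_out_cmd(struct port *ap,
					   struct scsi_cmd *scmd)
{
	struct qc *qc = NULL;
	int i;

	for (i = 0; i < MAX_QUEUE; i++) {
		qc = &ap->qcmd[i];
		if (qc->scsicmd != scmd)
			continue;
		if ((qc->flags & QCFLAG_ACTIVE) || qc == ap->deferred_qc)
			break;
	}

	if (i == MAX_QUEUE)
		return EH_NO_QC;  /* no qc is associated with this command */

	if (qc == ap->deferred_qc) {
		/* Clear the pointer first so nothing can issue this qc later. */
		ap->deferred_qc = NULL;
		scmd->host = HOST_TIME_OUT;
		scmd->finished = true;
		return EH_DEFERRED_FINISHED;
	}

	return EH_HAS_QC;  /* active qc: regular timeout handling applies */
}

/* Model of the deferred work's precondition: is there a qc to issue? */
static bool deferred_work_has_qc(const struct port *ap)
{
	return ap->deferred_qc != NULL;
}

/* Usage: a deferred, not yet active qc whose command times out. */
static void demo(void)
{
	struct port ap = { 0 };
	struct scsi_cmd cmd = { HOST_OK, false };

	ap.qcmd[2].scsicmd = &cmd;
	ap.deferred_qc = &ap.qcmd[2];

	assert(handle_timed_out_cmd(&ap, &cmd) == EH_DEFERRED_FINISHED);
	assert(cmd.host == HOST_TIME_OUT && cmd.finished);
	assert(!deferred_work_has_qc(&ap));  /* nothing stale left to issue */
}
```

In this model, a lookup that accepted only active qcs would return EH_NO_QC for the scenario in demo() and leave deferred_qc pointing at a qc whose command is about to be finished, which is the stale-pointer situation the commit message describes.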