Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

dm kcopyd: always complete failed jobs

This patch fixes a problem in dm-kcopyd that may leave jobs in the
complete queue indefinitely in the event of a backing storage failure.

This behavior has been observed while running a 100% write file fio
workload against an XFS volume created on top of a dm-zoned target
device. If the underlying storage of dm-zoned goes offline under I/O,
kcopyd sometimes never issues the end copy callback and the dm-zoned
reclaim work hangs indefinitely waiting for that completion.

This behavior was traced to the error handling code in the
process_jobs() function, which places the failed job on the
complete_jobs queue but doesn't wake up the job handler. In case of a
backing device failure, all outstanding jobs may end up on the
complete_jobs queue via this code path and then stay there forever
because there are no more successful I/O jobs to wake up the job handler.

This patch adds a wake() call to always wake up the kcopyd job wait
queue for all I/O jobs that fail before dm_io() gets called for that job.
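
The stall described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not the kernel
code: every model_* name is hypothetical, the worker is reduced to a
single "pending" flag, and the model is single-threaded. It shows that
when every job fails and the error path does not wake the worker, no
completion callback is ever delivered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_JOBS 8

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_job {
	bool callback_done;        /* set once the completion callback has run */
	struct model_job *next;
};

struct model_client {
	struct model_job *complete_jobs; /* jobs waiting for their callback */
	bool work_pending;               /* models a scheduled worker run */
	bool wake_on_error;              /* true models the patched error path */
};

/* Models wake(): schedule one run of the worker. */
static void model_wake(struct model_client *kc)
{
	kc->work_pending = true;
}

/* Models the error path: queue the failed job, optionally wake the worker. */
static void model_fail_job(struct model_client *kc, struct model_job *job)
{
	job->next = kc->complete_jobs;
	kc->complete_jobs = job;
	if (kc->wake_on_error)
		model_wake(kc);
}

/* Models the worker: it drains complete_jobs only if it was woken. */
static int model_run_worker(struct model_client *kc)
{
	int completed = 0;

	if (!kc->work_pending)
		return 0;
	kc->work_pending = false;

	while (kc->complete_jobs) {
		struct model_job *job = kc->complete_jobs;

		kc->complete_jobs = job->next;
		job->callback_done = true;
		completed++;
	}
	return completed;
}

/* Fail n jobs with no successful I/O in flight; return callbacks delivered. */
static int model_fail_all(bool wake_on_error, int n)
{
	struct model_job jobs[MODEL_MAX_JOBS] = { { false, NULL } };
	struct model_client kc = { NULL, false, wake_on_error };
	int i;

	assert(n >= 0 && n <= MODEL_MAX_JOBS);
	for (i = 0; i < n; i++)
		model_fail_job(&kc, &jobs[i]);
	return model_run_worker(&kc);
}
```

In this model, model_fail_all(false, 3) delivers no callbacks because
the worker is never scheduled, while model_fail_all(true, 3) delivers
all three.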

The patch also sets the write error status in all sub jobs that fail
because their master job has failed.
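
The error propagation can be sketched in isolation as follows. This is
a simplified illustration, assuming a hypothetical seq_job structure
and seq_job_start() helper that are not part of the kernel; the
propagate argument exists only to contrast the old and new behavior.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced job structure; not the kernel's kcopyd job. */
struct seq_job {
	unsigned long write_err;     /* non-zero once a write has failed */
	struct seq_job *master_job;  /* the job this sub job belongs to */
};

/*
 * Models the early exit taken when the master job has already failed.
 * With propagate set, the sub job inherits the master's error status
 * before returning -EIO, so the sub job itself is marked as failed.
 */
static int seq_job_start(struct seq_job *job, bool propagate)
{
	assert(job != NULL && job->master_job != NULL);

	if (job->master_job->write_err) {
		if (propagate)
			job->write_err = job->master_job->write_err;
		return -EIO;
	}
	return 0; /* the master is healthy, so the I/O would be issued */
}
```

Without propagation the sub job returns -EIO but its own write_err
stays zero; with propagation it carries the same status as its master.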

Fixes: b73c67c2cbb00 ("dm kcopyd: add sequential write feature")
Cc: stable@vger.kernel.org
Signed-off-by: Dmitry Fomichev <dmitry.fomichev@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>

Authored by Dmitry Fomichev and committed by Mike Snitzer
d1fef414 cf3591ef

+4 -1
drivers/md/dm-kcopyd.c
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
···
 	 * no point in continuing.
 	 */
 	if (test_bit(DM_KCOPYD_WRITE_SEQ, &job->flags) &&
-	    job->master_job->write_err)
+	    job->master_job->write_err) {
+		job->write_err = job->master_job->write_err;
 		return -EIO;
+	}
 
 	io_job_start(job->kc->throttle);
 
···
 			else
 				job->read_err = 1;
 			push(&kc->complete_jobs, job);
+			wake(kc);
 			break;
 		}
 