blk-mq: fix corruption with direct issue

If we attempt a direct issue to a SCSI device and it returns BUSY, then
we queue the request up normally. However, the SCSI layer may have
already set up SG tables, etc., for this particular command. If we later
merge with this request, those tables no longer describe it. Once we
issue the IO, we only read/write the original part of the request, not
the part that was merged into it afterwards.
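To make the sequence concrete, here is a minimal userspace model in C.
It is not kernel code: `struct model_rq`, `model_driver_prep()`,
`model_merge()` and `model_issue()` are hypothetical names, and the
driver's SG table is reduced to a single `mapped_sectors` count that is
set up once and then reused on later issue attempts, as described above.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model, not kernel code. The driver's SG table
 * is reduced to mapped_sectors: the size it covered when it was built.
 */
struct model_rq {
	unsigned int nr_sectors;     /* current size of the request */
	unsigned int mapped_sectors; /* size covered by the driver's mapping */
	bool prepped;                /* driver state already set up? */
};

/* Driver builds per-command state once and reuses it afterwards. */
static void model_driver_prep(struct model_rq *rq)
{
	if (!rq->prepped) {
		rq->mapped_sectors = rq->nr_sectors;
		rq->prepped = true;
	}
}

/* Merge more sectors into the request with no NOMERGE check (the bug). */
static void model_merge(struct model_rq *rq, unsigned int extra)
{
	rq->nr_sectors += extra;
}

/* Issue: the driver only transfers what its existing mapping covers. */
static unsigned int model_issue(struct model_rq *rq)
{
	model_driver_prep(rq); /* no-op if the first attempt already prepped */
	assert(rq->mapped_sectors <= rq->nr_sectors);
	return rq->mapped_sectors;
}
```

Prepping an 8-sector request (the failed direct issue), merging 8 more
sectors into it, and then issuing it transfers only the original 8
sectors; the merged half is never read or written.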

This causes data corruption, most often noticed as the file system
complaining that just-read data is invalid:

[ 235.934465] EXT4-fs error (device sda1): ext4_iget:4831: inode #7142: comm dpkg-query: bad extra_isize 24937 (inode size 256)

because most of it is garbage...

This doesn't happen on the normal issue path, as we simply defer the
request to the hardware queue dispatch list if issue fails. Once it's on
the dispatch list, we never merge with it.

Fix this in the direct issue path by flagging the request REQ_NOMERGE
when direct dispatch fails, so its size can't change before it is issued.

See also:
https://bugzilla.kernel.org/show_bug.cgi?id=201685

Tested-by: Guenter Roeck <linux@roeck-us.net>
Fixes: 6ce3dd6eec1 ("blk-mq: issue directly if hw queue isn't busy in case of 'none'")
Cc: stable@vger.kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

+25 -1
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
···
 		break;
 	case BLK_STS_RESOURCE:
 	case BLK_STS_DEV_RESOURCE:
+		/*
+		 * If direct dispatch fails, we cannot allow any merging on
+		 * this IO. Drivers (like SCSI) may have set up permanent state
+		 * for this request, like SG tables and mappings, and if we
+		 * merge to it later on then we'll still only do IO to the
+		 * original part.
+		 */
+		rq->cmd_flags |= REQ_NOMERGE;
+
 		blk_mq_update_dispatch_busy(hctx, true);
 		__blk_mq_requeue_request(rq);
 		break;
···
 	}
 
 	return ret;
+}
+
+/*
+ * Don't allow direct dispatch of anything but regular reads/writes,
+ * as some of the other commands can potentially share request space
+ * with data we need for the IO scheduler. If we attempt a direct dispatch
+ * on those and fail, we can't safely add it to the scheduler afterwards
+ * without potentially overwriting data that the driver has already written.
+ */
+static bool blk_rq_can_direct_dispatch(struct request *rq)
+{
+	return req_op(rq) == REQ_OP_READ || req_op(rq) == REQ_OP_WRITE;
 }
 
 static blk_status_t __blk_mq_try_issue_directly(struct blk_mq_hw_ctx *hctx,
···
 		goto insert;
 	}
 
-	if (q->elevator && !bypass_insert)
+	if (!blk_rq_can_direct_dispatch(rq) || (q->elevator && !bypass_insert))
 		goto insert;
 
 	if (!blk_mq_get_dispatch_budget(hctx))
···
 		blk_status_t ret;
 		struct request *rq = list_first_entry(list, struct request,
 				queuelist);
+
+		if (!blk_rq_can_direct_dispatch(rq))
+			break;
 
 		list_del_init(&rq->queuelist);
 		ret = blk_mq_request_issue_directly(rq);