Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

nvme: fix use after free when disconnecting a reconnecting ctrl

A crash happens when trying to disconnect a reconnecting ctrl:

1) The network was cut off just after the connection was established.
The scan work hung there, waiting for some I/Os to complete. Those I/Os
kept being retried because we return BLK_STS_RESOURCE to the block layer
while reconnecting.
2) After a while, I tried to disconnect this connection. This
procedure also hung because it tried to obtain ctrl->scan_lock.
Note that by this point the controller state had already been switched
to NVME_CTRL_DELETING.
3) In nvme_check_ready(), we always return true when ctrl->state is
NVME_CTRL_DELETING, so the retried I/Os were issued to the bottom
device, which had already been freed.

To fix this, when ctrl->state is NVME_CTRL_DELETING, issue the command to
the bottom device only when the queue state is live. Otherwise, return a
host path error to the block layer.
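
The behaviour change can be sketched as a small standalone model. This
is not kernel code: the enum values and the model_check_ready() and
model_dispatch() helpers are hypothetical names invented for
illustration, the slow path (__nvme_check_ready() in the kernel) is
assumed to always answer "not ready", and the fabrics, failfast,
noretry and multipath conditions are left out. The "fixed" flag selects
the behaviour before or after this patch.

```c
#include <stdbool.h>

/* Simplified stand-ins for the controller states named in this commit. */
enum ctrl_state {
	CTRL_LIVE,
	CTRL_CONNECTING,
	CTRL_DELETING,
	CTRL_DELETING_NOIO,
	CTRL_DEAD,
};

/* What happens to a request in this model. */
enum disposition {
	ISSUE_TO_DEVICE,	/* handed to the transport queue */
	REQUEUE,		/* models returning BLK_STS_RESOURCE */
	FAIL_HOST_PATH_ERROR,	/* completed with a host path error */
};

/*
 * Model of the readiness check. Assumption: every state other than
 * LIVE and DELETING is treated as "not ready" (the kernel's slow path
 * is more nuanced).
 */
bool model_check_ready(enum ctrl_state state, bool queue_live, bool fixed)
{
	if (state == CTRL_LIVE)
		return true;
	if (state == CTRL_DELETING)
		return fixed ? queue_live : true;	/* the one-line change */
	return false;
}

/*
 * Model of the submission path: issue the request if the controller
 * is ready, otherwise either requeue it or fail it.
 */
enum disposition model_dispatch(enum ctrl_state state, bool queue_live,
				bool fixed)
{
	if (model_check_ready(state, queue_live, fixed))
		return ISSUE_TO_DEVICE;

	/* After the fix, DELETING joins the states that never requeue. */
	if (state != CTRL_DELETING_NOIO &&
	    !(fixed && state == CTRL_DELETING) &&
	    state != CTRL_DEAD)
		return REQUEUE;

	return FAIL_HOST_PATH_ERROR;
}
```

In this model, a request arriving at a deleting controller whose queue
is not live is issued to the device before the fix and completed with a
host path error after it, while a reconnecting controller keeps
requeueing in both cases.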

Signed-off-by: Ruozhu Li <liruozhu@huawei.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>

Authored by Ruozhu Li and committed by Christoph Hellwig
8b77fa6f c7c15ae3

+2 -1
+1
drivers/nvme/host/core.c
···
 		struct request *rq)
 {
 	if (ctrl->state != NVME_CTRL_DELETING_NOIO &&
+	    ctrl->state != NVME_CTRL_DELETING &&
 	    ctrl->state != NVME_CTRL_DEAD &&
 	    !test_bit(NVME_CTRL_FAILFAST_EXPIRED, &ctrl->flags) &&
 	    !blk_noretry_request(rq) && !(rq->cmd_flags & REQ_NVME_MPATH))
+1 -1
drivers/nvme/host/nvme.h
···
 		return true;
 	if (ctrl->ops->flags & NVME_F_FABRICS &&
 	    ctrl->state == NVME_CTRL_DELETING)
-		return true;
+		return queue_live;
 	return __nvme_check_ready(ctrl, rq, queue_live);
 }
 int nvme_submit_sync_cmd(struct request_queue *q, struct nvme_command *cmd,