Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[PATCH] dm-multipath: fix stall on noflush suspend/resume

Allow noflush suspend/resume of a device-mapper device only when the
device size is unchanged.

Otherwise, dm-multipath devices can stall when resumed if noflush was used
when suspending them, all paths have failed and queue_if_no_path is set.

Explanation:
1. Something is doing fsync() on the block dev,
   holding inode->i_sem
2. The fsync write is blocked by all-paths-down and queue_if_no_path
3. Someone requests to suspend the dm device with noflush.
   Pending writes are left in queue.
4. In the middle of dm_resume(), __bind() tries to get
   inode->i_sem to do __set_size() and waits forever.

'noflush suspend' is a new device-mapper feature introduced early in
the 2.6.20 cycle, so I hope this fix can be included before 2.6.20 is
released.

Example of reproducer:
1. Create a multipath device with dmsetup
2. Fail all paths during mkfs
3. Do dmsetup suspend --noflush and load a new map with healthy paths
4. Do dmsetup resume

Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Acked-by: Alasdair G Kergon <agk@redhat.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Jun'ichi Nomura and committed by Linus Torvalds
bfa152fa e540eb45

+19 -8
drivers/md/dm.c
···
 	if (size != get_capacity(md->disk))
 		memset(&md->geometry, 0, sizeof(md->geometry));
 
-	__set_size(md, size);
+	if (md->suspended_bdev)
+		__set_size(md, size);
 	if (size == 0)
 		return 0;
 
···
 	if (!dm_suspended(md))
 		goto out;
 
+	/* without bdev, the device size cannot be changed */
+	if (!md->suspended_bdev)
+		if (get_capacity(md->disk) != dm_table_get_size(table))
+			goto out;
+
 	__unbind(md);
 	r = __bind(md, table);
 
···
 	/* This does not get reverted if there's an error later. */
 	dm_table_presuspend_targets(map);
 
-	md->suspended_bdev = bdget_disk(md->disk, 0);
-	if (!md->suspended_bdev) {
-		DMWARN("bdget failed in dm_suspend");
-		r = -ENOMEM;
-		goto flush_and_out;
+	/* bdget() can stall if the pending I/Os are not flushed */
+	if (!noflush) {
+		md->suspended_bdev = bdget_disk(md->disk, 0);
+		if (!md->suspended_bdev) {
+			DMWARN("bdget failed in dm_suspend");
+			r = -ENOMEM;
+			goto flush_and_out;
+		}
 	}
 
 	/*
···
 
 	unlock_fs(md);
 
-	bdput(md->suspended_bdev);
-	md->suspended_bdev = NULL;
+	if (md->suspended_bdev) {
+		bdput(md->suspended_bdev);
+		md->suspended_bdev = NULL;
+	}
 
 	clear_bit(DMF_SUSPENDED, &md->flags);
 
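
The combined effect of the four hunks can be sketched as a small userspace model. This is not kernel code and is only an illustration of the rule in the commit message: a noflush suspend does not take the block device reference, so a table swap is accepted only if the size stays the same, and resume drops the reference only if it was taken. The struct model_md type, the model_* function names, and the -1 error return are all hypothetical stand-ins introduced for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct mapped_device (assumption). */
struct model_md {
	bool suspended;               /* models the DMF_SUSPENDED flag */
	bool has_suspended_bdev;      /* models md->suspended_bdev != NULL */
	unsigned long long capacity;  /* models get_capacity(md->disk) */
};

/* Models dm_suspend(): the bdev reference is taken only when flushing. */
int model_suspend(struct model_md *md, bool noflush)
{
	if (md->suspended)
		return -1;
	/* bdget() can stall if the pending I/Os are not flushed */
	md->has_suspended_bdev = !noflush;
	md->suspended = true;
	return 0;
}

/* Models dm_swap_table() followed by __bind(). */
int model_swap_table(struct model_md *md, unsigned long long new_size)
{
	if (!md->suspended)
		return -1;
	/* without bdev, the device size cannot be changed */
	if (!md->has_suspended_bdev && md->capacity != new_size)
		return -1;
	/* __bind(): the size is only set while the bdev is held */
	if (md->has_suspended_bdev)
		md->capacity = new_size;
	return 0;
}

/* Models dm_resume(): release the bdev reference only if one is held. */
int model_resume(struct model_md *md)
{
	if (!md->suspended)
		return -1;
	if (md->has_suspended_bdev)
		md->has_suspended_bdev = false;
	md->suspended = false;
	return 0;
}
```

In this model, a caller that suspends with noflush and then tries to swap in a table of a different size gets an error and the capacity is left untouched, while the same swap after a flushing suspend succeeds.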