Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

genhd: Fix BUG in blkdev_open()

When two blkdev_open() calls for a partition race with device removal
and recreation, we can hit BUG_ON(!bd_may_claim(bdev, whole, holder)) in
blkdev_open(). The race can happen as follows:

CPU0                            CPU1                            CPU2
                                                                del_gendisk()
                                                                  bdev_unhash_inode(part1);

blkdev_open(part1, O_EXCL)      blkdev_open(part1, O_EXCL)
  bdev = bd_acquire()             bdev = bd_acquire()
  blkdev_get(bdev)
    bd_start_claiming(bdev)
      - finds old inode 'whole'
      bd_prepare_to_claim() -> 0
                                                                  bdev_unhash_inode(whole);
                                                                <device removed>
                                                                <new device under same
                                                                 number created>
                                  blkdev_get(bdev);
                                    bd_start_claiming(bdev)
                                      - finds new inode 'whole'
                                      bd_prepare_to_claim()
                                        - this also succeeds as we have
                                          different 'whole' here...
                                        - bad things happen now as we
                                          have two exclusive openers of
                                          the same bdev
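
To illustrate why both claims succeed in the trace above, here is a deliberately simplified userspace model. The names toy_whole and toy_prepare_to_claim() are hypothetical and exist only for this sketch; the real bd_prepare_to_claim() checks more state. The sketch only assumes that the exclusive claim is recorded on the 'whole' object, so two different 'whole' objects cannot see each other's claim.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the kernel's data structures. */
struct toy_whole {
	const void *claimer;	/* who currently holds the exclusive claim */
};

/* Succeeds unless a different holder has already claimed this 'whole'. */
static int toy_prepare_to_claim(struct toy_whole *whole, const void *holder)
{
	if (whole->claimer && whole->claimer != holder)
		return -EBUSY;
	whole->claimer = holder;
	return 0;
}
```

In this model, two holders claiming against the same toy_whole conflict, while one claiming against the old object and one against the new object both get 0, which mirrors the situation on CPU0 and CPU1.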

The problem is that block device opens can observe various intermediate
states while a gendisk is being shut down and then recreated.

We fix the problem by introducing a new lookup_sem in gendisk that
synchronizes gendisk deletion with get_gendisk(), and furthermore by
making sure that get_gendisk() does not return a gendisk that is being (or
has been) deleted. This ensures that once we manage to look up a newly
created bdev inode, the following get_gendisk() will either fail (and we
fail the open) or return the gendisk for the new device, in which case
the following bdget_disk() will return the new bdev inode (i.e.,
blkdev_open() follows the same path as if it ran entirely after the new
device was created).

Reported-and-analyzed-by: Hou Tao <houtao1@huawei.com>
Tested-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Authored by Jan Kara and committed by Jens Axboe
56c0908c 89736653

+21 -1
+20 -1
block/genhd.c
···
 	blk_integrity_del(disk);
 	disk_del_events(disk);
 
+	/*
+	 * Block lookups of the disk until all bdevs are unhashed and the
+	 * disk is marked as dead (GENHD_FL_UP cleared).
+	 */
+	down_write(&disk->lookup_sem);
 	/* invalidate stuff */
 	disk_part_iter_init(&piter, disk,
 			     DISK_PITER_INCL_EMPTY | DISK_PITER_REVERSE);
···
 	bdev_unhash_inode(disk_devt(disk));
 	set_capacity(disk, 0);
 	disk->flags &= ~GENHD_FL_UP;
+	up_write(&disk->lookup_sem);
 
 	if (!(disk->flags & GENHD_FL_HIDDEN))
 		sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
···
 		spin_unlock_bh(&ext_devt_lock);
 	}
 
-	if (disk && unlikely(disk->flags & GENHD_FL_HIDDEN)) {
+	if (!disk)
+		return NULL;
+
+	/*
+	 * Synchronize with del_gendisk() to not return disk that is being
+	 * destroyed.
+	 */
+	down_read(&disk->lookup_sem);
+	if (unlikely((disk->flags & GENHD_FL_HIDDEN) ||
+		     !(disk->flags & GENHD_FL_UP))) {
+		up_read(&disk->lookup_sem);
 		put_disk_and_module(disk);
 		disk = NULL;
+	} else {
+		up_read(&disk->lookup_sem);
 	}
 	return disk;
 }
···
 		kfree(disk);
 		return NULL;
 	}
+	init_rwsem(&disk->lookup_sem);
 	disk->node_id = node_id;
 	if (disk_expand_part_tbl(disk, 0)) {
 		free_part_stats(&disk->part0);
+1
include/linux/genhd.h
···
 	void *private_data;
 
 	int flags;
+	struct rw_semaphore lookup_sem;
 	struct kobject *slave_dir;
 
 	struct timer_rand_state *random;