Remove deadlock potential in md_open

A recent commit:
commit 449aad3e25358812c43afc60918c5ad3819488e7

introduced the possibility of an A-B/B-A deadlock between
bd_mutex and reconfig_mutex.

__blkdev_get holds bd_mutex while calling md_open, which takes
reconfig_mutex; do_md_run is always called with reconfig_mutex held,
and it now takes bd_mutex in the call to revalidate_disk.

This potential deadlock was not caught by lockdep because of the
use of mutex_lock_interruptible_nested, which was introduced by
commit d63a5a74dee87883fda6b7d170244acaac5b05e8
to avoid a warning about an impossible deadlock.

It is quite possible to split reconfig_mutex into two locks.
One protects the array data structures while it is being
reconfigured, the other ensures that an array is never even partially
open while it is being deactivated.
In particular, the second lock prevents an open from completing
between the time when do_md_stop checks if there are any active opens,
and the time when the array is either set read-only, or when ->pers is
set to NULL. So we can be certain that no IO is in flight as the
array is being destroyed.

So create a new lock, open_mutex, just to ensure exclusion between
'open' and 'stop'.

This avoids the deadlock and also avoids the lockdep warning mentioned
in commit d63a5a74d.

Reported-by: "Mike Snitzer" <snitzer@gmail.com>
Reported-by: "H. Peter Anvin" <hpa@zytor.com>
Signed-off-by: NeilBrown <neilb@suse.de>

NeilBrown c8c00a69 7b2aa037

 2 files changed, 20 insertions(+), 8 deletions(-)

drivers/md/md.c (+10 -8)
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -359,6 +359,7 @@
 	else
 		new->md_minor = MINOR(unit) >> MdpMinorShift;
 
+	mutex_init(&new->open_mutex);
 	mutex_init(&new->reconfig_mutex);
 	INIT_LIST_HEAD(&new->disks);
 	INIT_LIST_HEAD(&new->all_mddevs);
@@ -4305,12 +4304,11 @@
 	struct gendisk *disk = mddev->gendisk;
 	mdk_rdev_t *rdev;
 
+	mutex_lock(&mddev->open_mutex);
 	if (atomic_read(&mddev->openers) > is_open) {
 		printk("md: %s still in use.\n",mdname(mddev));
-		return -EBUSY;
-	}
-
-	if (mddev->pers) {
+		err = -EBUSY;
+	} else if (mddev->pers) {
 
 		if (mddev->sync_thread) {
 			set_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
@@ -4367,7 +4367,10 @@
 			set_disk_ro(disk, 1);
 		clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery);
 	}
-
+out:
+	mutex_unlock(&mddev->open_mutex);
+	if (err)
+		return err;
 	/*
 	 * Free resources if final stop
 	 */
@@ -4436,7 +4433,6 @@
 	blk_integrity_unregister(disk);
 	md_new_event(mddev);
 	sysfs_notify_dirent(mddev->sysfs_state);
-out:
 	return err;
 }
 
@@ -5520,12 +5518,12 @@
 	}
 	BUG_ON(mddev != bdev->bd_disk->private_data);
 
-	if ((err = mutex_lock_interruptible_nested(&mddev->reconfig_mutex, 1)))
+	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
 		goto out;
 
 	err = 0;
 	atomic_inc(&mddev->openers);
-	mddev_unlock(mddev);
+	mutex_unlock(&mddev->open_mutex);
 
 	check_disk_change(bdev);
  out:

drivers/md/md.h (+10)
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -223,6 +223,16 @@
 							 * so we don't loop trying */
 
 	int				in_sync;	/* know to not need resync */
+	/* 'open_mutex' avoids races between 'md_open' and 'do_md_stop', so
+	 * that we are never stopping an array while it is open.
+	 * 'reconfig_mutex' protects all other reconfiguration.
+	 * These locks are separate due to conflicting interactions
+	 * with bdev->bd_mutex.
+	 * Lock ordering is:
+	 *  reconfig_mutex -> bd_mutex : e.g. do_md_run -> revalidate_disk
+	 *  bd_mutex -> open_mutex:  e.g. __blkdev_get -> md_open
+	 */
+	struct mutex			open_mutex;
 	struct mutex			reconfig_mutex;
 	atomic_t			active;		/* general refcount */
 	atomic_t			openers;	/* number of active opens */