Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

block: add disk sequence number

Associating uevents with block devices in userspace is difficult and racy:
the uevent netlink socket is lossy, and on slow or overloaded systems it
has very high latency.
Block devices do not have exclusive owners in userspace; any process can
set one up (e.g. loop devices). Moreover, device names can be reused
(e.g. loop0 can be reused again and again). A userspace process that sets
up a block device and watches for its events therefore cannot reliably tell
whether an event relates to the device it just set up or to an earlier
instance with the same name.

Being able to set a UUID on a loop device would solve the race conditions,
but it would not allow orderings to be derived from uevents: if you see a
uevent with a UUID that does not match the device you are waiting for,
you cannot tell whether the right uevent has not arrived yet or whether
it was already sent and you missed it. So you cannot tell whether you
should wait for it or not.

Associating a unique, monotonically increasing sequence number with the
lifetime of each block device, retrievable with an ioctl immediately
after the device is set up, solves the race conditions with uevents. It
also lets userspace processes know whether they should wait for the
uevent they need or whether it was dropped and they should move on.

Additionally, increment the disk sequence number when the media changes,
i.e. on a DISK_EVENT_MEDIA_CHANGE event.
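
The decision a userspace watcher can make with such a number might be
sketched as follows. This is only an illustration: it assumes that each
uevent exposes the disk sequence number to userspace and that events have
already been filtered by device name, neither of which is provided by this
patch alone. The names uevent_verdict and classify_uevent are hypothetical
and do not exist in the kernel or in any library.

```c
#include <stdint.h>

/* Hypothetical verdicts for one received uevent (illustration only). */
enum uevent_verdict {
	UEVENT_MATCH,	/* event belongs to the instance we set up */
	UEVENT_STALE,	/* event from an earlier instance: keep waiting */
	UEVENT_MISSED,	/* a later instance was seen: stop waiting */
};

/*
 * wanted: sequence number obtained right after setting the device up.
 * seen:   sequence number carried by a uevent for the same device name
 *         (assumed to be available; not part of this patch).
 */
enum uevent_verdict classify_uevent(uint64_t wanted, uint64_t seen)
{
	if (seen == wanted)
		return UEVENT_MATCH;
	if (seen < wanted)
		return UEVENT_STALE;
	/* Numbers only grow, so our own event was most likely dropped. */
	return UEVENT_MISSED;
}
```

With a UUID alone only the first case can be detected; the ordering is
what separates "keep waiting" from "move on".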

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Matteo Croce <mcroce@microsoft.com>
Tested-by: Luca Boccassi <bluca@debian.org>
Link: https://lore.kernel.org/r/20210712230530.29323-2-mcroce@linux.microsoft.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

Authored by Matteo Croce and committed by Jens Axboe
cf179948 2164877c

3 files changed, 29 insertions(+)

block/disk-events.c (+3)
···
 
 	spin_unlock_irq(&ev->lock);
 
+	if (events & DISK_EVENT_MEDIA_CHANGE)
+		inc_diskseq(disk);
+
 	/*
 	 * Tell userland about new events. Only the events listed in
 	 * @disk->events are reported, and only if DISK_EVENT_FLAG_UEVENT

block/genhd.c (+24)
···
 
 static struct kobject *block_depr;
 
+/*
+ * Unique, monotonically increasing sequential number associated with block
+ * devices instances (i.e. incremented each time a device is attached).
+ * Associating uevents with block devices in userspace is difficult and racy:
+ * the uevent netlink socket is lossy, and on slow and overloaded systems has
+ * a very high latency.
+ * Block devices do not have exclusive owners in userspace, any process can set
+ * one up (e.g. loop devices). Moreover, device names can be reused (e.g. loop0
+ * can be reused again and again).
+ * A userspace process setting up a block device and watching for its events
+ * cannot thus reliably tell whether an event relates to the device it just set
+ * up or another earlier instance with the same name.
+ * This sequential number allows userspace processes to solve this problem, and
+ * uniquely associate an uevent to the lifetime to a device.
+ */
+static atomic64_t diskseq;
+
 /* for extended dynamic devt allocation, currently only one major is used */
 #define NR_EXT_DEVT (1 << MINORBITS)
 static DEFINE_IDA(ext_devt_ida);
···
 	disk_to_dev(disk)->class = &block_class;
 	disk_to_dev(disk)->type = &disk_type;
 	device_initialize(disk_to_dev(disk));
+	inc_diskseq(disk);
+
 	return disk;
 
 out_destroy_part_tbl:
···
 	return bdev->bd_read_only || get_disk_ro(bdev->bd_disk);
 }
 EXPORT_SYMBOL(bdev_read_only);
+
+void inc_diskseq(struct gendisk *disk)
+{
+	disk->diskseq = atomic64_inc_return(&diskseq);
+}
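
The effect of the global counter can be modelled in userspace with C11
atomics. This is a rough sketch, not kernel code: struct model_disk,
model_diskseq and model_inc_diskseq are hypothetical stand-ins for
struct gendisk, diskseq and inc_diskseq(), and atomic_fetch_add() plus one
is used to imitate the "increment and return the new value" behaviour of
atomic64_inc_return().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the diskseq field of struct gendisk. */
struct model_disk {
	uint64_t diskseq;
};

/* One counter shared by all disks, zero-initialised like the kernel's. */
static _Atomic uint64_t model_diskseq;

/* Give the disk the next unused sequence number. */
void model_inc_diskseq(struct model_disk *disk)
{
	/* atomic_fetch_add() returns the old value, so add 1 for the new one. */
	disk->diskseq = atomic_fetch_add(&model_diskseq, 1) + 1;
}
```

In this model, allocating two disks in turn gives them the numbers 1 and 2,
and a later media change on the first disk moves it to 3, so no two device
lifetimes ever share a number.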

include/linux/genhd.h (+2)
···
 	int node_id;
 	struct badblocks *bb;
 	struct lockdep_map lockdep_map;
+	u64 diskseq;
 };
 
 /*
···
 #endif /* CONFIG_SYSFS */
 
 dev_t part_devt(struct gendisk *disk, u8 partno);
+void inc_diskseq(struct gendisk *disk);
 dev_t blk_lookup_devt(const char *name, int partno);
 void blk_request_module(dev_t devt);
 #ifdef CONFIG_BLOCK