Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- Update DM crypt to allocate compound pages if possible

- Fix DM crypt target's crypt_ctr_cipher_new() return value on an
invalid AEAD cipher

- Fix DM flakey testing target's write bio corruption feature to
corrupt the data of a cloned bio instead of the original

- Add random_read_corrupt and random_write_corrupt features to DM
flakey target

- Fix ABBA deadlock in DM thin metadata by resetting associated bufio
client rather than destroying and recreating it

- A couple other small DM thinp cleanups

- Update DM core to support disabling block core IO stats accounting
and optimize away code that isn't needed if stats are disabled

- Other small DM core cleanups

- Improve DM integrity target to not require so much memory on 32-bit
systems. Also, only allocate the recalculate buffer as needed (and
progressively reduce its size on allocation failure)

- Update DM integrity to use %*ph for printing a hexdump of a small
buffer. Also update the DM integrity documentation

- Various DM core ioctl interface hardening. Now more careful about
alignment of structures and processing of input passed to the kernel
from userspace.

Also disallow the creation of DM devices named "control", "." or ".."

- Eliminate GFP_NOIO workarounds for __vmalloc and kvmalloc in DM
core's ioctl and bufio code

* tag 'for-6.5/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm: (28 commits)
dm: get rid of GFP_NOIO workarounds for __vmalloc and kvmalloc
dm integrity: scale down the recalculate buffer if memory allocation fails
dm integrity: only allocate recalculate buffer when needed
dm integrity: reduce vmalloc space footprint on 32-bit architectures
dm ioctl: Refuse to create device named "." or ".."
dm ioctl: Refuse to create device named "control"
dm ioctl: Avoid double-fetch of version
dm ioctl: structs and parameter strings must not overlap
dm ioctl: Avoid pointer arithmetic overflow
dm ioctl: Check dm_target_spec is sufficiently aligned
Documentation: dm-integrity: Document an example of how the tunables relate.
Documentation: dm-integrity: Document default values.
Documentation: dm-integrity: Document the meaning of "buffer".
Documentation: dm-integrity: Fix minor grammatical error.
dm integrity: Use %*ph for printing hexdump of a small buffer
dm thin: disable discards for thin-pool if no_discard_passdown
dm: remove stale/redundant dm_internal_{suspend,resume} prototypes in dm.h
dm: skip dm-stats work in alloc_io() unless needed
dm: avoid needless dm_io access if all IO accounting is disabled
dm: support turning off block-core's io stats accounting
...

+479 -237
+10
Documentation/admin-guide/device-mapper/dm-flakey.rst
··· 67 67 Perform the replacement only if bio->bi_opf has all the 68 68 selected flags set. 69 69 70 + random_read_corrupt <probability> 71 + During <down interval>, replace random byte in a read bio 72 + with a random value. probability is an integer between 73 + 0 and 1000000000 meaning 0% to 100% probability of corruption. 74 + 75 + random_write_corrupt <probability> 76 + During <down interval>, replace random byte in a write bio 77 + with a random value. probability is an integer between 78 + 0 and 1000000000 meaning 0% to 100% probability of corruption. 79 + 70 80 Examples: 71 81 72 82 Replaces the 32nd byte of READ bios with the value 1::
+27 -16
Documentation/admin-guide/device-mapper/dm-integrity.rst
··· 25 25 mode, the dm-integrity target can be used to detect silent data 26 26 corruption on the disk or in the I/O path. 27 27 28 - There's an alternate mode of operation where dm-integrity uses bitmap 28 + There's an alternate mode of operation where dm-integrity uses a bitmap 29 29 instead of a journal. If a bit in the bitmap is 1, the corresponding 30 30 region's data and integrity tags are not synchronized - if the machine 31 31 crashes, the unsynchronized regions will be recalculated. The bitmap mode ··· 37 37 the device. But it will only format the device if the superblock contains 38 38 zeroes. If the superblock is neither valid nor zeroed, the dm-integrity 39 39 target can't be loaded. 40 + 41 + Accesses to the on-disk metadata area containing checksums (aka tags) are 42 + buffered using dm-bufio. When an access to any given metadata area 43 + occurs, each unique metadata area gets its own buffer(s). The buffer size 44 + is capped at the size of the metadata area, but may be smaller, thereby 45 + requiring multiple buffers to represent the full metadata area. A smaller 46 + buffer size will produce a smaller resulting read/write operation to the 47 + metadata area for small reads/writes. The metadata is still read even in 48 + a full write to the data covered by a single buffer. 40 49 41 50 To use the target for the first time: 42 51 ··· 102 93 device. If the device is already formatted, the value from the 103 94 superblock is used. 104 95 105 - interleave_sectors:number 96 + interleave_sectors:number (default 32768) 106 97 The number of interleaved sectors. This values is rounded down to 107 98 a power of two. If the device is already formatted, the value from 108 99 the superblock is used. ··· 111 102 Don't interleave the data and metadata on the device. Use a 112 103 separate device for metadata. 113 104 114 - buffer_sectors:number 115 - The number of sectors in one buffer. The value is rounded down to 116 - a power of two. 
105 + buffer_sectors:number (default 128) 106 + The number of sectors in one metadata buffer. The value is rounded 107 + down to a power of two. 117 108 118 - The tag area is accessed using buffers, the buffer size is 119 - configurable. The large buffer size means that the I/O size will 120 - be larger, but there could be less I/Os issued. 121 - 122 - journal_watermark:number 109 + journal_watermark:number (default 50) 123 110 The journal watermark in percents. When the size of the journal 124 111 exceeds this watermark, the thread that flushes the journal will 125 112 be started. 126 113 127 - commit_time:number 114 + commit_time:number (default 10000) 128 115 Commit time in milliseconds. When this time passes, the journal is 129 116 written. The journal is also written immediately if the FLUSH 130 117 request is received. ··· 168 163 the journal. Thus, modified sector number would be detected at 169 164 this stage. 170 165 171 - block_size:number 172 - The size of a data block in bytes. The larger the block size the 166 + block_size:number (default 512) 167 + The size of a data block in bytes. The larger the block size the 173 168 less overhead there is for per-block integrity metadata. 174 - Supported values are 512, 1024, 2048 and 4096 bytes. If not 175 - specified the default block size is 512 bytes. 169 + Supported values are 512, 1024, 2048 and 4096 bytes. 176 170 177 171 sectors_per_bit:number 178 172 In the bitmap mode, this parameter specifies the number of ··· 213 209 should not be changed when reloading the target because the layout of disk 214 210 data depend on them and the reloaded target would be non-functional. 215 211 212 + For example, on a device using the default interleave_sectors of 32768, a 213 + block_size of 512, and an internal_hash of crc32c with a tag size of 4 214 + bytes, it will take 128 KiB of tags to track a full data area, requiring 215 + 256 sectors of metadata per data area. 
With the default buffer_sectors of 216 + 128, that means there will be 2 buffers per metadata area, or 2 buffers 217 + per 16 MiB of data. 216 218 217 219 Status line: 218 220 ··· 296 286 Each run contains: 297 287 298 288 * tag area - it contains integrity tags. There is one tag for each 299 - sector in the data area 289 + sector in the data area. The size of this area is always 4KiB or 290 + greater. 300 291 * data area - it contains data sectors. The number of data sectors 301 292 in one run must be a power of two. log2 of this value is stored 302 293 in the superblock.
+7 -17
drivers/md/dm-bufio.c
··· 1157 1157 1158 1158 *data_mode = DATA_MODE_VMALLOC; 1159 1159 1160 - /* 1161 - * __vmalloc allocates the data pages and auxiliary structures with 1162 - * gfp_flags that were specified, but pagetables are always allocated 1163 - * with GFP_KERNEL, no matter what was specified as gfp_mask. 1164 - * 1165 - * Consequently, we must set per-process flag PF_MEMALLOC_NOIO so that 1166 - * all allocations done by this process (including pagetables) are done 1167 - * as if GFP_NOIO was specified. 1168 - */ 1169 - if (gfp_mask & __GFP_NORETRY) { 1170 - unsigned int noio_flag = memalloc_noio_save(); 1171 - void *ptr = __vmalloc(c->block_size, gfp_mask); 1172 - 1173 - memalloc_noio_restore(noio_flag); 1174 - return ptr; 1175 - } 1176 - 1177 1160 return __vmalloc(c->block_size, gfp_mask); 1178 1161 } 1179 1162 ··· 2574 2591 kfree(c); 2575 2592 } 2576 2593 EXPORT_SYMBOL_GPL(dm_bufio_client_destroy); 2594 + 2595 + void dm_bufio_client_reset(struct dm_bufio_client *c) 2596 + { 2597 + drop_buffers(c); 2598 + flush_work(&c->shrink_work); 2599 + } 2600 + EXPORT_SYMBOL_GPL(dm_bufio_client_reset); 2577 2601 2578 2602 void dm_bufio_set_sector_offset(struct dm_bufio_client *c, sector_t start) 2579 2603 {
+2 -1
drivers/md/dm-core.h
··· 306 306 */ 307 307 enum { 308 308 DM_IO_ACCOUNTED, 309 - DM_IO_WAS_SPLIT 309 + DM_IO_WAS_SPLIT, 310 + DM_IO_BLK_STAT 310 311 }; 311 312 312 313 static inline bool dm_io_flagged(struct dm_io *io, unsigned int bit)
+36 -15
drivers/md/dm-crypt.c
··· 1661 1661 * In order to not degrade performance with excessive locking, we try 1662 1662 * non-blocking allocations without a mutex first but on failure we fallback 1663 1663 * to blocking allocations with a mutex. 1664 + * 1665 + * In order to reduce allocation overhead, we try to allocate compound pages in 1666 + * the first pass. If they are not available, we fall back to the mempool. 1664 1667 */ 1665 1668 static struct bio *crypt_alloc_buffer(struct dm_crypt_io *io, unsigned int size) 1666 1669 { ··· 1671 1668 struct bio *clone; 1672 1669 unsigned int nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; 1673 1670 gfp_t gfp_mask = GFP_NOWAIT | __GFP_HIGHMEM; 1674 - unsigned int i, len, remaining_size; 1675 - struct page *page; 1671 + unsigned int remaining_size; 1672 + unsigned int order = MAX_ORDER - 1; 1676 1673 1677 1674 retry: 1678 1675 if (unlikely(gfp_mask & __GFP_DIRECT_RECLAIM)) ··· 1685 1682 1686 1683 remaining_size = size; 1687 1684 1688 - for (i = 0; i < nr_iovecs; i++) { 1689 - page = mempool_alloc(&cc->page_pool, gfp_mask); 1690 - if (!page) { 1685 + while (remaining_size) { 1686 + struct page *pages; 1687 + unsigned size_to_add; 1688 + unsigned remaining_order = __fls((remaining_size + PAGE_SIZE - 1) >> PAGE_SHIFT); 1689 + order = min(order, remaining_order); 1690 + 1691 + while (order > 0) { 1692 + pages = alloc_pages(gfp_mask 1693 + | __GFP_NOMEMALLOC | __GFP_NORETRY | __GFP_NOWARN | __GFP_COMP, 1694 + order); 1695 + if (likely(pages != NULL)) 1696 + goto have_pages; 1697 + order--; 1698 + } 1699 + 1700 + pages = mempool_alloc(&cc->page_pool, gfp_mask); 1701 + if (!pages) { 1691 1702 crypt_free_buffer_pages(cc, clone); 1692 1703 bio_put(clone); 1693 1704 gfp_mask |= __GFP_DIRECT_RECLAIM; 1705 + order = 0; 1694 1706 goto retry; 1695 1707 } 1696 1708 1697 - len = (remaining_size > PAGE_SIZE) ? 
PAGE_SIZE : remaining_size; 1698 - 1699 - __bio_add_page(clone, page, len, 0); 1700 - remaining_size -= len; 1709 + have_pages: 1710 + size_to_add = min((unsigned)PAGE_SIZE << order, remaining_size); 1711 + __bio_add_page(clone, pages, size_to_add, 0); 1712 + remaining_size -= size_to_add; 1701 1713 } 1702 1714 1703 1715 /* Allocate space for integrity tags */ ··· 1730 1712 1731 1713 static void crypt_free_buffer_pages(struct crypt_config *cc, struct bio *clone) 1732 1714 { 1733 - struct bio_vec *bv; 1734 - struct bvec_iter_all iter_all; 1715 + struct folio_iter fi; 1735 1716 1736 - bio_for_each_segment_all(bv, clone, iter_all) { 1737 - BUG_ON(!bv->bv_page); 1738 - mempool_free(bv->bv_page, &cc->page_pool); 1717 + if (clone->bi_vcnt > 0) { /* bio_for_each_folio_all crashes with an empty bio */ 1718 + bio_for_each_folio_all(fi, clone) { 1719 + if (folio_test_large(fi.folio)) 1720 + folio_put(fi.folio); 1721 + else 1722 + mempool_free(&fi.folio->page, &cc->page_pool); 1723 + } 1739 1724 } 1740 1725 } 1741 1726 ··· 2908 2887 ret = crypt_ctr_auth_cipher(cc, cipher_api); 2909 2888 if (ret < 0) { 2910 2889 ti->error = "Invalid AEAD cipher spec"; 2911 - return -ENOMEM; 2890 + return ret; 2912 2891 } 2913 2892 } 2914 2893
+188 -22
drivers/md/dm-flakey.c
··· 16 16 17 17 #define DM_MSG_PREFIX "flakey" 18 18 19 + #define PROBABILITY_BASE 1000000000 20 + 19 21 #define all_corrupt_bio_flags_match(bio, fc) \ 20 22 (((bio)->bi_opf & (fc)->corrupt_bio_flags) == (fc)->corrupt_bio_flags) 21 23 ··· 36 34 unsigned int corrupt_bio_rw; 37 35 unsigned int corrupt_bio_value; 38 36 blk_opf_t corrupt_bio_flags; 37 + unsigned int random_read_corrupt; 38 + unsigned int random_write_corrupt; 39 39 }; 40 40 41 41 enum feature_flag_bits { ··· 58 54 const char *arg_name; 59 55 60 56 static const struct dm_arg _args[] = { 61 - {0, 7, "Invalid number of feature args"}, 57 + {0, 11, "Invalid number of feature args"}, 62 58 {1, UINT_MAX, "Invalid corrupt bio byte"}, 63 59 {0, 255, "Invalid corrupt value to write into bio byte (0-255)"}, 64 60 {0, UINT_MAX, "Invalid corrupt bio flags mask"}, 61 + {0, PROBABILITY_BASE, "Invalid random corrupt argument"}, 65 62 }; 66 63 67 64 /* No feature arguments supplied. */ ··· 175 170 continue; 176 171 } 177 172 173 + if (!strcasecmp(arg_name, "random_read_corrupt")) { 174 + if (!argc) { 175 + ti->error = "Feature random_read_corrupt requires a parameter"; 176 + return -EINVAL; 177 + } 178 + r = dm_read_arg(_args + 4, as, &fc->random_read_corrupt, &ti->error); 179 + if (r) 180 + return r; 181 + argc--; 182 + 183 + continue; 184 + } 185 + 186 + if (!strcasecmp(arg_name, "random_write_corrupt")) { 187 + if (!argc) { 188 + ti->error = "Feature random_write_corrupt requires a parameter"; 189 + return -EINVAL; 190 + } 191 + r = dm_read_arg(_args + 4, as, &fc->random_write_corrupt, &ti->error); 192 + if (r) 193 + return r; 194 + argc--; 195 + 196 + continue; 197 + } 198 + 178 199 ti->error = "Unrecognised flakey feature requested"; 179 200 return -EINVAL; 180 201 } ··· 215 184 } 216 185 217 186 if (!fc->corrupt_bio_byte && !test_bit(ERROR_READS, &fc->flags) && 218 - !test_bit(DROP_WRITES, &fc->flags) && !test_bit(ERROR_WRITES, &fc->flags)) { 187 + !test_bit(DROP_WRITES, &fc->flags) && !test_bit(ERROR_WRITES, 
&fc->flags) && 188 + !fc->random_read_corrupt && !fc->random_write_corrupt) { 219 189 set_bit(ERROR_WRITES, &fc->flags); 220 190 set_bit(ERROR_READS, &fc->flags); 221 191 } ··· 338 306 bio->bi_iter.bi_sector = flakey_map_sector(ti, bio->bi_iter.bi_sector); 339 307 } 340 308 341 - static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc) 309 + static void corrupt_bio_common(struct bio *bio, unsigned int corrupt_bio_byte, 310 + unsigned char corrupt_bio_value) 342 311 { 343 - unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1; 344 - 345 312 struct bvec_iter iter; 346 313 struct bio_vec bvec; 347 - 348 - if (!bio_has_data(bio)) 349 - return; 350 314 351 315 /* 352 316 * Overwrite the Nth byte of the bio's data, on whichever page ··· 350 322 */ 351 323 bio_for_each_segment(bvec, bio, iter) { 352 324 if (bio_iter_len(bio, iter) > corrupt_bio_byte) { 353 - char *segment; 354 - struct page *page = bio_iter_page(bio, iter); 355 - if (unlikely(page == ZERO_PAGE(0))) 356 - break; 357 - segment = bvec_kmap_local(&bvec); 358 - segment[corrupt_bio_byte] = fc->corrupt_bio_value; 325 + unsigned char *segment = bvec_kmap_local(&bvec); 326 + segment[corrupt_bio_byte] = corrupt_bio_value; 359 327 kunmap_local(segment); 360 328 DMDEBUG("Corrupting data bio=%p by writing %u to byte %u " 361 329 "(rw=%c bi_opf=%u bi_sector=%llu size=%u)\n", 362 - bio, fc->corrupt_bio_value, fc->corrupt_bio_byte, 330 + bio, corrupt_bio_value, corrupt_bio_byte, 363 331 (bio_data_dir(bio) == WRITE) ? 
'w' : 'r', bio->bi_opf, 364 - (unsigned long long)bio->bi_iter.bi_sector, bio->bi_iter.bi_size); 332 + (unsigned long long)bio->bi_iter.bi_sector, 333 + bio->bi_iter.bi_size); 365 334 break; 366 335 } 367 336 corrupt_bio_byte -= bio_iter_len(bio, iter); 368 337 } 338 + } 339 + 340 + static void corrupt_bio_data(struct bio *bio, struct flakey_c *fc) 341 + { 342 + unsigned int corrupt_bio_byte = fc->corrupt_bio_byte - 1; 343 + 344 + if (!bio_has_data(bio)) 345 + return; 346 + 347 + corrupt_bio_common(bio, corrupt_bio_byte, fc->corrupt_bio_value); 348 + } 349 + 350 + static void corrupt_bio_random(struct bio *bio) 351 + { 352 + unsigned int corrupt_byte; 353 + unsigned char corrupt_value; 354 + 355 + if (!bio_has_data(bio)) 356 + return; 357 + 358 + corrupt_byte = get_random_u32() % bio->bi_iter.bi_size; 359 + corrupt_value = get_random_u8(); 360 + 361 + corrupt_bio_common(bio, corrupt_byte, corrupt_value); 362 + } 363 + 364 + static void clone_free(struct bio *clone) 365 + { 366 + struct folio_iter fi; 367 + 368 + if (clone->bi_vcnt > 0) { /* bio_for_each_folio_all crashes with an empty bio */ 369 + bio_for_each_folio_all(fi, clone) 370 + folio_put(fi.folio); 371 + } 372 + 373 + bio_uninit(clone); 374 + kfree(clone); 375 + } 376 + 377 + static void clone_endio(struct bio *clone) 378 + { 379 + struct bio *bio = clone->bi_private; 380 + bio->bi_status = clone->bi_status; 381 + clone_free(clone); 382 + bio_endio(bio); 383 + } 384 + 385 + static struct bio *clone_bio(struct dm_target *ti, struct flakey_c *fc, struct bio *bio) 386 + { 387 + struct bio *clone; 388 + unsigned size, remaining_size, nr_iovecs, order; 389 + struct bvec_iter iter = bio->bi_iter; 390 + 391 + if (unlikely(bio->bi_iter.bi_size > UIO_MAXIOV << PAGE_SHIFT)) 392 + dm_accept_partial_bio(bio, UIO_MAXIOV << PAGE_SHIFT >> SECTOR_SHIFT); 393 + 394 + size = bio->bi_iter.bi_size; 395 + nr_iovecs = (size + PAGE_SIZE - 1) >> PAGE_SHIFT; 396 + 397 + clone = bio_kmalloc(nr_iovecs, GFP_NOIO | __GFP_NORETRY | 
__GFP_NOWARN); 398 + if (!clone) 399 + return NULL; 400 + 401 + bio_init(clone, fc->dev->bdev, bio->bi_inline_vecs, nr_iovecs, bio->bi_opf); 402 + 403 + clone->bi_iter.bi_sector = flakey_map_sector(ti, bio->bi_iter.bi_sector); 404 + clone->bi_private = bio; 405 + clone->bi_end_io = clone_endio; 406 + 407 + remaining_size = size; 408 + 409 + order = MAX_ORDER - 1; 410 + while (remaining_size) { 411 + struct page *pages; 412 + unsigned size_to_add, to_copy; 413 + unsigned char *virt; 414 + unsigned remaining_order = __fls((remaining_size + PAGE_SIZE - 1) >> PAGE_SHIFT); 415 + order = min(order, remaining_order); 416 + 417 + retry_alloc_pages: 418 + pages = alloc_pages(GFP_NOIO | __GFP_NORETRY | __GFP_NOWARN | __GFP_COMP, order); 419 + if (unlikely(!pages)) { 420 + if (order) { 421 + order--; 422 + goto retry_alloc_pages; 423 + } 424 + clone_free(clone); 425 + return NULL; 426 + } 427 + size_to_add = min((unsigned)PAGE_SIZE << order, remaining_size); 428 + 429 + virt = page_to_virt(pages); 430 + to_copy = size_to_add; 431 + do { 432 + struct bio_vec bvec = bvec_iter_bvec(bio->bi_io_vec, iter); 433 + unsigned this_step = min(bvec.bv_len, to_copy); 434 + void *map = bvec_kmap_local(&bvec); 435 + memcpy(virt, map, this_step); 436 + kunmap_local(map); 437 + 438 + bvec_iter_advance(bio->bi_io_vec, &iter, this_step); 439 + to_copy -= this_step; 440 + virt += this_step; 441 + } while (to_copy); 442 + 443 + __bio_add_page(clone, pages, size_to_add, 0); 444 + remaining_size -= size_to_add; 445 + } 446 + 447 + return clone; 369 448 } 370 449 371 450 static int flakey_map(struct dm_target *ti, struct bio *bio) ··· 489 354 /* Are we alive ? */ 490 355 elapsed = (jiffies - fc->start_time) / HZ; 491 356 if (elapsed % (fc->up_interval + fc->down_interval) >= fc->up_interval) { 357 + bool corrupt_fixed, corrupt_random; 492 358 /* 493 359 * Flag this bio as submitted while down. 494 360 */ ··· 519 383 /* 520 384 * Corrupt matching writes. 
521 385 */ 522 - if (fc->corrupt_bio_byte) { 523 - if (fc->corrupt_bio_rw == WRITE) { 524 - if (all_corrupt_bio_flags_match(bio, fc)) 525 - corrupt_bio_data(bio, fc); 386 + corrupt_fixed = false; 387 + corrupt_random = false; 388 + if (fc->corrupt_bio_byte && fc->corrupt_bio_rw == WRITE) { 389 + if (all_corrupt_bio_flags_match(bio, fc)) 390 + corrupt_fixed = true; 391 + } 392 + if (fc->random_write_corrupt) { 393 + u64 rnd = get_random_u64(); 394 + u32 rem = do_div(rnd, PROBABILITY_BASE); 395 + if (rem < fc->random_write_corrupt) 396 + corrupt_random = true; 397 + } 398 + if (corrupt_fixed || corrupt_random) { 399 + struct bio *clone = clone_bio(ti, fc, bio); 400 + if (clone) { 401 + if (corrupt_fixed) 402 + corrupt_bio_data(clone, fc); 403 + if (corrupt_random) 404 + corrupt_bio_random(clone); 405 + submit_bio(clone); 406 + return DM_MAPIO_SUBMITTED; 526 407 } 527 - goto map_bio; 528 408 } 529 409 } 530 410 ··· 568 416 */ 569 417 corrupt_bio_data(bio, fc); 570 418 } 419 + } 420 + if (fc->random_read_corrupt) { 421 + u64 rnd = get_random_u64(); 422 + u32 rem = do_div(rnd, PROBABILITY_BASE); 423 + if (rem < fc->random_read_corrupt) 424 + corrupt_bio_random(bio); 571 425 } 572 426 if (test_bit(ERROR_READS, &fc->flags)) { 573 427 /* ··· 607 449 error_reads = test_bit(ERROR_READS, &fc->flags); 608 450 drop_writes = test_bit(DROP_WRITES, &fc->flags); 609 451 error_writes = test_bit(ERROR_WRITES, &fc->flags); 610 - DMEMIT(" %u", error_reads + drop_writes + error_writes + (fc->corrupt_bio_byte > 0) * 5); 452 + DMEMIT(" %u", error_reads + drop_writes + error_writes + 453 + (fc->corrupt_bio_byte > 0) * 5 + 454 + (fc->random_read_corrupt > 0) * 2 + 455 + (fc->random_write_corrupt > 0) * 2); 611 456 612 457 if (error_reads) 613 458 DMEMIT(" error_reads"); ··· 624 463 fc->corrupt_bio_byte, 625 464 (fc->corrupt_bio_rw == WRITE) ? 
'w' : 'r', 626 465 fc->corrupt_bio_value, fc->corrupt_bio_flags); 466 + 467 + if (fc->random_read_corrupt > 0) 468 + DMEMIT(" random_read_corrupt %u", fc->random_read_corrupt); 469 + if (fc->random_write_corrupt > 0) 470 + DMEMIT(" random_write_corrupt %u", fc->random_write_corrupt); 627 471 628 472 break; 629 473
+38 -47
drivers/md/dm-integrity.c
··· 34 34 #define DEFAULT_BUFFER_SECTORS 128 35 35 #define DEFAULT_JOURNAL_WATERMARK 50 36 36 #define DEFAULT_SYNC_MSEC 10000 37 - #define DEFAULT_MAX_JOURNAL_SECTORS 131072 37 + #define DEFAULT_MAX_JOURNAL_SECTORS (IS_ENABLED(CONFIG_64BIT) ? 131072 : 8192) 38 38 #define MIN_LOG2_INTERLEAVE_SECTORS 3 39 39 #define MAX_LOG2_INTERLEAVE_SECTORS 31 40 40 #define METADATA_WORKQUEUE_MAX_ACTIVE 16 41 - #define RECALC_SECTORS 32768 41 + #define RECALC_SECTORS (IS_ENABLED(CONFIG_64BIT) ? 32768 : 2048) 42 42 #define RECALC_WRITE_SUPER 16 43 43 #define BITMAP_BLOCK_SIZE 4096 /* don't change it */ 44 44 #define BITMAP_FLUSH_INTERVAL (10 * HZ) ··· 251 251 252 252 struct workqueue_struct *recalc_wq; 253 253 struct work_struct recalc_work; 254 - u8 *recalc_buffer; 255 - u8 *recalc_tags; 256 254 257 255 struct bio_list flush_bio_list; 258 256 ··· 340 342 #define JOURNAL_IO_MEMPOOL 32 341 343 342 344 #ifdef DEBUG_PRINT 343 - #define DEBUG_print(x, ...) printk(KERN_DEBUG x, ##__VA_ARGS__) 344 - static void __DEBUG_bytes(__u8 *bytes, size_t len, const char *msg, ...) 345 - { 346 - va_list args; 347 - 348 - va_start(args, msg); 349 - vprintk(msg, args); 350 - va_end(args); 351 - if (len) 352 - pr_cont(":"); 353 - while (len) { 354 - pr_cont(" %02x", *bytes); 355 - bytes++; 356 - len--; 357 - } 358 - pr_cont("\n"); 359 - } 360 - #define DEBUG_bytes(bytes, len, msg, ...) __DEBUG_bytes(bytes, len, KERN_DEBUG msg, ##__VA_ARGS__) 345 + #define DEBUG_print(x, ...) printk(KERN_DEBUG x, ##__VA_ARGS__) 346 + #define DEBUG_bytes(bytes, len, msg, ...) printk(KERN_DEBUG msg "%s%*ph\n", ##__VA_ARGS__, \ 347 + len ? ": " : "", len, bytes) 361 348 #else 362 349 #define DEBUG_print(x, ...) do { } while (0) 363 350 #define DEBUG_bytes(bytes, len, msg, ...) 
do { } while (0) ··· 2644 2661 static void integrity_recalc(struct work_struct *w) 2645 2662 { 2646 2663 struct dm_integrity_c *ic = container_of(w, struct dm_integrity_c, recalc_work); 2664 + size_t recalc_tags_size; 2665 + u8 *recalc_buffer = NULL; 2666 + u8 *recalc_tags = NULL; 2647 2667 struct dm_integrity_range range; 2648 2668 struct dm_io_request io_req; 2649 2669 struct dm_io_region io_loc; ··· 2658 2672 unsigned int i; 2659 2673 int r; 2660 2674 unsigned int super_counter = 0; 2675 + unsigned recalc_sectors = RECALC_SECTORS; 2676 + 2677 + retry: 2678 + recalc_buffer = __vmalloc(recalc_sectors << SECTOR_SHIFT, GFP_NOIO); 2679 + if (!recalc_buffer) { 2680 + oom: 2681 + recalc_sectors >>= 1; 2682 + if (recalc_sectors >= 1U << ic->sb->log2_sectors_per_block) 2683 + goto retry; 2684 + DMCRIT("out of memory for recalculate buffer - recalculation disabled"); 2685 + goto free_ret; 2686 + } 2687 + recalc_tags_size = (recalc_sectors >> ic->sb->log2_sectors_per_block) * ic->tag_size; 2688 + if (crypto_shash_digestsize(ic->internal_hash) > ic->tag_size) 2689 + recalc_tags_size += crypto_shash_digestsize(ic->internal_hash) - ic->tag_size; 2690 + recalc_tags = kvmalloc(recalc_tags_size, GFP_NOIO); 2691 + if (!recalc_tags) { 2692 + vfree(recalc_buffer); 2693 + goto oom; 2694 + } 2661 2695 2662 2696 DEBUG_print("start recalculation... 
(position %llx)\n", le64_to_cpu(ic->sb->recalc_sector)); 2663 2697 ··· 2699 2693 } 2700 2694 2701 2695 get_area_and_offset(ic, range.logical_sector, &area, &offset); 2702 - range.n_sectors = min((sector_t)RECALC_SECTORS, ic->provided_data_sectors - range.logical_sector); 2696 + range.n_sectors = min((sector_t)recalc_sectors, ic->provided_data_sectors - range.logical_sector); 2703 2697 if (!ic->meta_dev) 2704 2698 range.n_sectors = min(range.n_sectors, ((sector_t)1U << ic->sb->log2_interleave_sectors) - (unsigned int)offset); 2705 2699 ··· 2741 2735 2742 2736 io_req.bi_opf = REQ_OP_READ; 2743 2737 io_req.mem.type = DM_IO_VMA; 2744 - io_req.mem.ptr.addr = ic->recalc_buffer; 2738 + io_req.mem.ptr.addr = recalc_buffer; 2745 2739 io_req.notify.fn = NULL; 2746 2740 io_req.client = ic->io; 2747 2741 io_loc.bdev = ic->dev->bdev; ··· 2754 2748 goto err; 2755 2749 } 2756 2750 2757 - t = ic->recalc_tags; 2751 + t = recalc_tags; 2758 2752 for (i = 0; i < n_sectors; i += ic->sectors_per_block) { 2759 - integrity_sector_checksum(ic, logical_sector + i, ic->recalc_buffer + (i << SECTOR_SHIFT), t); 2753 + integrity_sector_checksum(ic, logical_sector + i, recalc_buffer + (i << SECTOR_SHIFT), t); 2760 2754 t += ic->tag_size; 2761 2755 } 2762 2756 2763 2757 metadata_block = get_metadata_sector_and_offset(ic, area, offset, &metadata_offset); 2764 2758 2765 - r = dm_integrity_rw_tag(ic, ic->recalc_tags, &metadata_block, &metadata_offset, t - ic->recalc_tags, TAG_WRITE); 2759 + r = dm_integrity_rw_tag(ic, recalc_tags, &metadata_block, &metadata_offset, t - recalc_tags, TAG_WRITE); 2766 2760 if (unlikely(r)) { 2767 2761 dm_integrity_io_error(ic, "writing tags", r); 2768 2762 goto err; ··· 2790 2784 2791 2785 err: 2792 2786 remove_range(ic, &range); 2793 - return; 2787 + goto free_ret; 2794 2788 2795 2789 unlock_ret: 2796 2790 spin_unlock_irq(&ic->endio_wait.lock); 2797 2791 2798 2792 recalc_write_super(ic); 2793 + 2794 + free_ret: 2795 + vfree(recalc_buffer); 2796 + kvfree(recalc_tags); 
2799 2797 } 2800 2798 2801 2799 static void bitmap_block_work(struct work_struct *w) ··· 4464 4454 } 4465 4455 4466 4456 if (ic->internal_hash) { 4467 - size_t recalc_tags_size; 4468 - 4469 4457 ic->recalc_wq = alloc_workqueue("dm-integrity-recalc", WQ_MEM_RECLAIM, 1); 4470 4458 if (!ic->recalc_wq) { 4471 4459 ti->error = "Cannot allocate workqueue"; ··· 4471 4463 goto bad; 4472 4464 } 4473 4465 INIT_WORK(&ic->recalc_work, integrity_recalc); 4474 - ic->recalc_buffer = vmalloc(RECALC_SECTORS << SECTOR_SHIFT); 4475 - if (!ic->recalc_buffer) { 4476 - ti->error = "Cannot allocate buffer for recalculating"; 4477 - r = -ENOMEM; 4478 - goto bad; 4479 - } 4480 - recalc_tags_size = (RECALC_SECTORS >> ic->sb->log2_sectors_per_block) * ic->tag_size; 4481 - if (crypto_shash_digestsize(ic->internal_hash) > ic->tag_size) 4482 - recalc_tags_size += crypto_shash_digestsize(ic->internal_hash) - ic->tag_size; 4483 - ic->recalc_tags = kvmalloc(recalc_tags_size, GFP_KERNEL); 4484 - if (!ic->recalc_tags) { 4485 - ti->error = "Cannot allocate tags for recalculating"; 4486 - r = -ENOMEM; 4487 - goto bad; 4488 - } 4489 4466 } else { 4490 4467 if (ic->sb->flags & cpu_to_le32(SB_FLAG_RECALCULATING)) { 4491 4468 ti->error = "Recalculate can only be specified with internal_hash"; ··· 4614 4621 destroy_workqueue(ic->writer_wq); 4615 4622 if (ic->recalc_wq) 4616 4623 destroy_workqueue(ic->recalc_wq); 4617 - vfree(ic->recalc_buffer); 4618 - kvfree(ic->recalc_tags); 4619 4624 kvfree(ic->bbs); 4620 4625 if (ic->bufio) 4621 4626 dm_bufio_client_destroy(ic->bufio);
+75 -23
drivers/md/dm-ioctl.c
··· 767 767 static int check_name(const char *name) 768 768 { 769 769 if (strchr(name, '/')) { 770 - DMERR("invalid device name"); 770 + DMERR("device name cannot contain '/'"); 771 + return -EINVAL; 772 + } 773 + 774 + if (strcmp(name, DM_CONTROL_NODE) == 0 || 775 + strcmp(name, ".") == 0 || 776 + strcmp(name, "..") == 0) { 777 + DMERR("device name cannot be \"%s\", \".\", or \"..\"", DM_CONTROL_NODE); 771 778 return -EINVAL; 772 779 } 773 780 ··· 1395 1388 return mode; 1396 1389 } 1397 1390 1398 - static int next_target(struct dm_target_spec *last, uint32_t next, void *end, 1391 + static int next_target(struct dm_target_spec *last, uint32_t next, const char *end, 1399 1392 struct dm_target_spec **spec, char **target_params) 1400 1393 { 1394 + static_assert(__alignof__(struct dm_target_spec) <= 8, 1395 + "struct dm_target_spec must not require more than 8-byte alignment"); 1396 + 1397 + /* 1398 + * Number of bytes remaining, starting with last. This is always 1399 + * sizeof(struct dm_target_spec) or more, as otherwise *last was 1400 + * out of bounds already. 1401 + */ 1402 + size_t remaining = end - (char *)last; 1403 + 1404 + /* 1405 + * There must be room for both the next target spec and the 1406 + * NUL-terminator of the target itself. 
+	 */
+	if (remaining - sizeof(struct dm_target_spec) <= next) {
+		DMERR("Target spec extends beyond end of parameters");
+		return -EINVAL;
+	}
+
+	if (next % __alignof__(struct dm_target_spec)) {
+		DMERR("Next dm_target_spec (offset %u) is not %zu-byte aligned",
+		      next, __alignof__(struct dm_target_spec));
+		return -EINVAL;
+	}
+
 	*spec = (struct dm_target_spec *) ((unsigned char *) last + next);
 	*target_params = (char *) (*spec + 1);
 
-	if (*spec < (last + 1))
-		return -EINVAL;
-
-	return invalid_str(*target_params, end);
+	return 0;
 }
 
 static int populate_table(struct dm_table *table,
···
 	unsigned int i = 0;
 	struct dm_target_spec *spec = (struct dm_target_spec *) param;
 	uint32_t next = param->data_start;
-	void *end = (void *) param + param_size;
+	const char *const end = (const char *) param + param_size;
 	char *target_params;
+	size_t min_size = sizeof(struct dm_ioctl);
 
 	if (!param->target_count) {
 		DMERR("%s: no targets specified", __func__);
···
 	}
 
 	for (i = 0; i < param->target_count; i++) {
+		const char *nul_terminator;
+
+		if (next < min_size) {
+			DMERR("%s: next target spec (offset %u) overlaps %s",
+			      __func__, next, i ? "previous target" : "'struct dm_ioctl'");
+			return -EINVAL;
+		}
 
 		r = next_target(spec, next, end, &spec, &target_params);
 		if (r) {
 			DMERR("unable to find target");
 			return r;
 		}
+
+		nul_terminator = memchr(target_params, 0, (size_t)(end - target_params));
+		if (nul_terminator == NULL) {
+			DMERR("%s: target parameters not NUL-terminated", __func__);
+			return -EINVAL;
+		}
+
+		/* Add 1 for NUL terminator */
+		min_size = (size_t)(nul_terminator - (const char *)spec) + 1;
 
 		r = dm_table_add_target(table, spec->target_type,
 					(sector_t) spec->sector_start,
···
  * As well as checking the version compatibility this always
  * copies the kernel interface version out.
  */
-static int check_version(unsigned int cmd, struct dm_ioctl __user *user)
+static int check_version(unsigned int cmd, struct dm_ioctl __user *user,
+			 struct dm_ioctl *kernel_params)
 {
-	uint32_t version[3];
 	int r = 0;
 
-	if (copy_from_user(version, user->version, sizeof(version)))
+	/* Make certain version is first member of dm_ioctl struct */
+	BUILD_BUG_ON(offsetof(struct dm_ioctl, version) != 0);
+
+	if (copy_from_user(kernel_params->version, user->version, sizeof(kernel_params->version)))
 		return -EFAULT;
 
-	if ((version[0] != DM_VERSION_MAJOR) ||
-	    (version[1] > DM_VERSION_MINOR)) {
+	if ((kernel_params->version[0] != DM_VERSION_MAJOR) ||
+	    (kernel_params->version[1] > DM_VERSION_MINOR)) {
 		DMERR("ioctl interface mismatch: kernel(%u.%u.%u), user(%u.%u.%u), cmd(%d)",
 		      DM_VERSION_MAJOR, DM_VERSION_MINOR,
 		      DM_VERSION_PATCHLEVEL,
-		      version[0], version[1], version[2], cmd);
+		      kernel_params->version[0],
+		      kernel_params->version[1],
+		      kernel_params->version[2],
+		      cmd);
 		r = -EINVAL;
 	}
 
 	/*
 	 * Fill in the kernel version.
 	 */
-	version[0] = DM_VERSION_MAJOR;
-	version[1] = DM_VERSION_MINOR;
-	version[2] = DM_VERSION_PATCHLEVEL;
-	if (copy_to_user(user->version, version, sizeof(version)))
+	kernel_params->version[0] = DM_VERSION_MAJOR;
+	kernel_params->version[1] = DM_VERSION_MINOR;
+	kernel_params->version[2] = DM_VERSION_PATCHLEVEL;
+	if (copy_to_user(user->version, kernel_params->version, sizeof(kernel_params->version)))
 		return -EFAULT;
 
 	return r;
···
 	struct dm_ioctl *dmi;
 	int secure_data;
 	const size_t minimum_data_size = offsetof(struct dm_ioctl, data);
-	unsigned int noio_flag;
 
-	if (copy_from_user(param_kernel, user, minimum_data_size))
+	/* check_version() already copied version from userspace, avoid TOCTOU */
+	if (copy_from_user((char *)param_kernel + sizeof(param_kernel->version),
+			   (char __user *)user + sizeof(param_kernel->version),
+			   minimum_data_size - sizeof(param_kernel->version)))
 		return -EFAULT;
 
 	if (param_kernel->data_size < minimum_data_size) {
···
 	 * Use kmalloc() rather than vmalloc() when we can.
 	 */
 	dmi = NULL;
-	noio_flag = memalloc_noio_save();
-	dmi = kvmalloc(param_kernel->data_size, GFP_KERNEL | __GFP_HIGH);
-	memalloc_noio_restore(noio_flag);
+	dmi = kvmalloc(param_kernel->data_size, GFP_NOIO | __GFP_HIGH);
 
 	if (!dmi) {
 		if (secure_data && clear_user(user, param_kernel->data_size))
···
 	 * Check the interface version passed in. This also
 	 * writes out the kernel's interface version.
 	 */
-	r = check_version(cmd, user);
+	r = check_version(cmd, user, &param_kernel);
 	if (r)
 		return r;
 
+26 -34
drivers/md/dm-thin-metadata.c
···
 	r = dm_tm_create_with_sm(pmd->bm, THIN_SUPERBLOCK_LOCATION,
 				 &pmd->tm, &pmd->metadata_sm);
 	if (r < 0) {
+		pmd->tm = NULL;
+		pmd->metadata_sm = NULL;
 		DMERR("tm_create_with_sm failed");
 		return r;
 	}
···
 	if (IS_ERR(pmd->data_sm)) {
 		DMERR("sm_disk_create failed");
 		r = PTR_ERR(pmd->data_sm);
+		pmd->data_sm = NULL;
 		goto bad_cleanup_tm;
 	}
 
···
 
 bad_cleanup_nb_tm:
 	dm_tm_destroy(pmd->nb_tm);
+	pmd->nb_tm = NULL;
 bad_cleanup_data_sm:
 	dm_sm_destroy(pmd->data_sm);
+	pmd->data_sm = NULL;
 bad_cleanup_tm:
 	dm_tm_destroy(pmd->tm);
+	pmd->tm = NULL;
 	dm_sm_destroy(pmd->metadata_sm);
+	pmd->metadata_sm = NULL;
 
 	return r;
 }
···
 			       sizeof(disk_super->metadata_space_map_root),
 			       &pmd->tm, &pmd->metadata_sm);
 	if (r < 0) {
+		pmd->tm = NULL;
+		pmd->metadata_sm = NULL;
 		DMERR("tm_open_with_sm failed");
 		goto bad_unlock_sblock;
 	}
···
 	if (IS_ERR(pmd->data_sm)) {
 		DMERR("sm_disk_open failed");
 		r = PTR_ERR(pmd->data_sm);
+		pmd->data_sm = NULL;
 		goto bad_cleanup_tm;
 	}
 
···
 
 bad_cleanup_data_sm:
 	dm_sm_destroy(pmd->data_sm);
+	pmd->data_sm = NULL;
 bad_cleanup_tm:
 	dm_tm_destroy(pmd->tm);
+	pmd->tm = NULL;
 	dm_sm_destroy(pmd->metadata_sm);
+	pmd->metadata_sm = NULL;
 bad_unlock_sblock:
 	dm_bm_unlock(sblock);
 
···
 					      bool destroy_bm)
 {
 	dm_sm_destroy(pmd->data_sm);
+	pmd->data_sm = NULL;
 	dm_sm_destroy(pmd->metadata_sm);
+	pmd->metadata_sm = NULL;
 	dm_tm_destroy(pmd->nb_tm);
+	pmd->nb_tm = NULL;
 	dm_tm_destroy(pmd->tm);
+	pmd->tm = NULL;
 	if (destroy_bm)
 		dm_block_manager_destroy(pmd->bm);
 }
···
 			       __func__, r);
 	}
 
 	pmd_write_unlock(pmd);
-	if (!pmd->fail_io)
-		__destroy_persistent_data_objects(pmd, true);
+	__destroy_persistent_data_objects(pmd, true);
 
 	kfree(pmd);
 	return 0;
···
 int dm_pool_abort_metadata(struct dm_pool_metadata *pmd)
 {
 	int r = -EINVAL;
-	struct dm_block_manager *old_bm = NULL, *new_bm = NULL;
 
 	/* fail_io is double-checked with pmd->root_lock held below */
 	if (unlikely(pmd->fail_io))
 		return r;
 
-	/*
-	 * Replacement block manager (new_bm) is created and old_bm destroyed outside of
-	 * pmd root_lock to avoid ABBA deadlock that would result (due to life-cycle of
-	 * shrinker associated with the block manager's bufio client vs pmd root_lock).
-	 * - must take shrinker_rwsem without holding pmd->root_lock
-	 */
-	new_bm = dm_block_manager_create(pmd->bdev, THIN_METADATA_BLOCK_SIZE << SECTOR_SHIFT,
-					 THIN_MAX_CONCURRENT_LOCKS);
-
 	pmd_write_lock(pmd);
 	if (pmd->fail_io) {
 		pmd_write_unlock(pmd);
-		goto out;
+		return r;
 	}
-
 	__set_abort_with_changes_flags(pmd);
-	__destroy_persistent_data_objects(pmd, false);
-	old_bm = pmd->bm;
-	if (IS_ERR(new_bm)) {
-		DMERR("could not create block manager during abort");
-		pmd->bm = NULL;
-		r = PTR_ERR(new_bm);
-		goto out_unlock;
-	}
 
-	pmd->bm = new_bm;
+	/* destroy data_sm/metadata_sm/nb_tm/tm */
+	__destroy_persistent_data_objects(pmd, false);
+
+	/* reset bm */
+	dm_block_manager_reset(pmd->bm);
+
+	/* rebuild data_sm/metadata_sm/nb_tm/tm */
 	r = __open_or_format_metadata(pmd, false);
-	if (r) {
-		pmd->bm = NULL;
-		goto out_unlock;
-	}
-	new_bm = NULL;
-out_unlock:
 	if (r)
 		pmd->fail_io = true;
 	pmd_write_unlock(pmd);
-	dm_block_manager_destroy(old_bm);
-out:
-	if (new_bm && !IS_ERR(new_bm))
-		dm_block_manager_destroy(new_bm);
-
 	return r;
 }
 
+17 -24
drivers/md/dm-thin.c
···
 
 /*----------------------------------------------------------------*/
 
-static bool passdown_enabled(struct pool_c *pt)
-{
-	return pt->adjusted_pf.discard_passdown;
-}
-
 static void set_discard_callbacks(struct pool *pool)
 {
 	struct pool_c *pt = pool->ti->private;
 
-	if (passdown_enabled(pt)) {
+	if (pt->adjusted_pf.discard_passdown) {
 		pool->process_discard_cell = process_discard_cell_passdown;
 		pool->process_prepared_discard = process_prepared_discard_passdown_pt1;
 		pool->process_prepared_discard_pt2 = process_prepared_discard_passdown_pt2;
···
  * If discard_passdown was enabled verify that the data device
  * supports discards.  Disable discard_passdown if not.
  */
-static void disable_passdown_if_not_supported(struct pool_c *pt)
+static void disable_discard_passdown_if_not_supported(struct pool_c *pt)
 {
 	struct pool *pool = pt->pool;
 	struct block_device *data_bdev = pt->data_dev->bdev;
···
 
 static int pool_map(struct dm_target *ti, struct bio *bio)
 {
-	int r;
 	struct pool_c *pt = ti->private;
 	struct pool *pool = pt->pool;
 
···
 	 */
 	spin_lock_irq(&pool->lock);
 	bio_set_dev(bio, pt->data_dev->bdev);
-	r = DM_MAPIO_REMAPPED;
 	spin_unlock_irq(&pool->lock);
 
-	return r;
+	return DM_MAPIO_REMAPPED;
 }
 
 static int maybe_resize_data_dev(struct dm_target *ti, bool *need_commit)
···
 	 * They get transferred to the live pool in bind_control_target()
 	 * called from pool_preresume().
 	 */
-	if (!pt->adjusted_pf.discard_enabled) {
+
+	if (pt->adjusted_pf.discard_enabled) {
+		disable_discard_passdown_if_not_supported(pt);
+		if (!pt->adjusted_pf.discard_passdown)
+			limits->max_discard_sectors = 0;
+		/*
+		 * The pool uses the same discard limits as the underlying data
+		 * device. DM core has already set this up.
+		 */
+	} else {
 		/*
 		 * Must explicitly disallow stacking discard limits otherwise the
 		 * block layer will stack them if pool's data device has support.
 		 */
 		limits->discard_granularity = 0;
-		return;
 	}
-
-	disable_passdown_if_not_supported(pt);
-
-	/*
-	 * The pool uses the same discard limits as the underlying data
-	 * device. DM core has already set this up.
-	 */
 }
 
 static struct target_type pool_target = {
···
 	struct thin_c *tc = ti->private;
 	struct pool *pool = tc->pool;
 
-	if (!pool->pf.discard_enabled)
-		return;
-
-	limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
-	limits->max_discard_sectors = pool->sectors_per_block * BIO_PRISON_MAX_RANGE;
+	if (pool->pf.discard_enabled) {
+		limits->discard_granularity = pool->sectors_per_block << SECTOR_SHIFT;
+		limits->max_discard_sectors = pool->sectors_per_block * BIO_PRISON_MAX_RANGE;
+	}
 }
 
 static struct target_type thin_target = {
+7 -8
drivers/md/dm-zone.c
···
 #include <linux/mm.h>
 #include <linux/sched/mm.h>
 #include <linux/slab.h>
+#include <linux/bitmap.h>
 
 #include "dm-core.h"
 
···
 void dm_cleanup_zoned_dev(struct mapped_device *md)
 {
 	if (md->disk) {
-		kfree(md->disk->conv_zones_bitmap);
+		bitmap_free(md->disk->conv_zones_bitmap);
 		md->disk->conv_zones_bitmap = NULL;
-		kfree(md->disk->seq_zones_wlock);
+		bitmap_free(md->disk->seq_zones_wlock);
 		md->disk->seq_zones_wlock = NULL;
 	}
 
···
 	switch (zone->type) {
 	case BLK_ZONE_TYPE_CONVENTIONAL:
 		if (!disk->conv_zones_bitmap) {
-			disk->conv_zones_bitmap =
-				kcalloc(BITS_TO_LONGS(disk->nr_zones),
-					sizeof(unsigned long), GFP_NOIO);
+			disk->conv_zones_bitmap = bitmap_zalloc(disk->nr_zones,
+								GFP_NOIO);
 			if (!disk->conv_zones_bitmap)
 				return -ENOMEM;
 		}
···
 	case BLK_ZONE_TYPE_SEQWRITE_REQ:
 	case BLK_ZONE_TYPE_SEQWRITE_PREF:
 		if (!disk->seq_zones_wlock) {
-			disk->seq_zones_wlock =
-				kcalloc(BITS_TO_LONGS(disk->nr_zones),
-					sizeof(unsigned long), GFP_NOIO);
+			disk->seq_zones_wlock = bitmap_zalloc(disk->nr_zones,
+							      GFP_NOIO);
 			if (!disk->seq_zones_wlock)
 				return -ENOMEM;
 		}
+32 -26
drivers/md/dm.c
···
 }
 EXPORT_SYMBOL_GPL(dm_start_time_ns_from_clone);
 
-static bool bio_is_flush_with_data(struct bio *bio)
+static inline bool bio_is_flush_with_data(struct bio *bio)
 {
 	return ((bio->bi_opf & REQ_PREFLUSH) && bio->bi_iter.bi_size);
 }
 
-static void dm_io_acct(struct dm_io *io, bool end)
+static inline unsigned int dm_io_sectors(struct dm_io *io, struct bio *bio)
 {
-	struct dm_stats_aux *stats_aux = &io->stats_aux;
-	unsigned long start_time = io->start_time;
-	struct mapped_device *md = io->md;
-	struct bio *bio = io->orig_bio;
-	unsigned int sectors;
-
 	/*
 	 * If REQ_PREFLUSH set, don't account payload, it will be
 	 * submitted (and accounted) after this flush completes.
 	 */
 	if (bio_is_flush_with_data(bio))
-		sectors = 0;
-	else if (likely(!(dm_io_flagged(io, DM_IO_WAS_SPLIT))))
-		sectors = bio_sectors(bio);
-	else
-		sectors = io->sectors;
+		return 0;
+	if (unlikely(dm_io_flagged(io, DM_IO_WAS_SPLIT)))
+		return io->sectors;
+	return bio_sectors(bio);
+}
 
-	if (!end)
-		bdev_start_io_acct(bio->bi_bdev, bio_op(bio), start_time);
-	else
-		bdev_end_io_acct(bio->bi_bdev, bio_op(bio), sectors,
-				 start_time);
+static void dm_io_acct(struct dm_io *io, bool end)
+{
+	struct bio *bio = io->orig_bio;
+
+	if (dm_io_flagged(io, DM_IO_BLK_STAT)) {
+		if (!end)
+			bdev_start_io_acct(bio->bi_bdev, bio_op(bio),
+					   io->start_time);
+		else
+			bdev_end_io_acct(bio->bi_bdev, bio_op(bio),
+					 dm_io_sectors(io, bio),
+					 io->start_time);
+	}
 
 	if (static_branch_unlikely(&stats_enabled) &&
-	    unlikely(dm_stats_used(&md->stats))) {
+	    unlikely(dm_stats_used(&io->md->stats))) {
 		sector_t sector;
 
-		if (likely(!dm_io_flagged(io, DM_IO_WAS_SPLIT)))
-			sector = bio->bi_iter.bi_sector;
-		else
+		if (unlikely(dm_io_flagged(io, DM_IO_WAS_SPLIT)))
 			sector = bio_end_sector(bio) - io->sector_offset;
+		else
+			sector = bio->bi_iter.bi_sector;
 
-		dm_stats_account_io(&md->stats, bio_data_dir(bio),
-				    sector, sectors,
-				    end, start_time, stats_aux);
+		dm_stats_account_io(&io->md->stats, bio_data_dir(bio),
+				    sector, dm_io_sectors(io, bio),
+				    end, io->start_time, &io->stats_aux);
 	}
 }
···
 	spin_lock_init(&io->lock);
 	io->start_time = jiffies;
 	io->flags = 0;
+	if (blk_queue_io_stat(md->queue))
+		dm_io_set_flag(io, DM_IO_BLK_STAT);
 
-	if (static_branch_unlikely(&stats_enabled))
+	if (static_branch_unlikely(&stats_enabled) &&
+	    unlikely(dm_stats_used(&md->stats)))
 		dm_stats_record_start(&md->stats, &io->stats_aux);
 
 	return io;
···
 		break;
 	case DM_TYPE_BIO_BASED:
 	case DM_TYPE_DAX_BIO_BASED:
+		blk_queue_flag_set(QUEUE_FLAG_IO_STAT, md->queue);
 		break;
 	case DM_TYPE_NONE:
 		WARN_ON_ONCE(true);
-3
drivers/md/dm.h
···
 int dm_kobject_uevent(struct mapped_device *md, enum kobject_action action,
 		      unsigned int cookie, bool need_resize_uevent);
 
-void dm_internal_suspend(struct mapped_device *md);
-void dm_internal_resume(struct mapped_device *md);
-
 int dm_io_init(void);
 void dm_io_exit(void);
 
+6
drivers/md/persistent-data/dm-block-manager.c
···
 }
 EXPORT_SYMBOL_GPL(dm_block_manager_destroy);
 
+void dm_block_manager_reset(struct dm_block_manager *bm)
+{
+	dm_bufio_client_reset(bm->bufio);
+}
+EXPORT_SYMBOL_GPL(dm_block_manager_reset);
+
 unsigned int dm_bm_block_size(struct dm_block_manager *bm)
 {
 	return dm_bufio_get_block_size(bm->bufio);
+1
drivers/md/persistent-data/dm-block-manager.h
···
 				struct block_device *bdev, unsigned int block_size,
 				unsigned int max_held_per_thread);
 void dm_block_manager_destroy(struct dm_block_manager *bm);
+void dm_block_manager_reset(struct dm_block_manager *bm);
 
 unsigned int dm_bm_block_size(struct dm_block_manager *bm);
 dm_block_t dm_bm_nr_blocks(struct dm_block_manager *bm);
+2 -1
drivers/md/persistent-data/dm-space-map.h
···
 
 static inline void dm_sm_destroy(struct dm_space_map *sm)
 {
-	sm->destroy(sm);
+	if (sm)
+		sm->destroy(sm);
 }
 
 static inline int dm_sm_extend(struct dm_space_map *sm, dm_block_t extra_blocks)
+3
drivers/md/persistent-data/dm-transaction-manager.c
···
 
 void dm_tm_destroy(struct dm_transaction_manager *tm)
 {
+	if (!tm)
+		return;
+
 	if (!tm->is_clone)
 		wipe_shadow_table(tm);
 
+2
include/linux/dm-bufio.h
···
  */
 void dm_bufio_client_destroy(struct dm_bufio_client *c);
 
+void dm_bufio_client_reset(struct dm_bufio_client *c);
+
 /*
  * Set the sector range.
  * When this function is called, there must be no I/O in progress on the bufio