
Merge tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

Pull device mapper updates from Mike Snitzer:

- DM multipath locking fixes around m->flags tests and improvements to
bio-based code so that it follows patterns established by
request-based code.

- Request-based DM core improvement to eliminate unnecessary call to
blk_mq_queue_stopped().

- Add "panic_on_corruption" error handling mode to DM verity target.

- DM bufio fix to perform buffer cleanup from a workqueue rather than
wait for IO in reclaim context from the shrinker.

- DM crypt improvement to optionally avoid async processing via
workqueues for reads and/or writes -- via "no_read_workqueue" and
"no_write_workqueue" features. This more direct IO processing
improves latency and throughput with faster storage. Avoiding
workqueue IO submission for writes (DM_CRYPT_NO_WRITE_WORKQUEUE) is a
requirement for adding zoned block device support to DM crypt.

- Add zoned block device support to DM crypt. Makes use of
DM_CRYPT_NO_WRITE_WORKQUEUE and a new optional feature
(DM_CRYPT_WRITE_INLINE) that allows write completion to wait for
encryption to complete. This allows write ordering to be preserved,
which is needed for zoned block devices.

- Fix DM ebs target's check for REQ_OP_FLUSH.

- Fix DM core's report zones support to not report more zones than were
requested.

- A few small compiler warning fixes.

- DM dust improvements to return output directly to the user rather
than requiring them to scrape the system log for output.

* tag 'for-5.9/dm-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm: don't call report zones for more than the user requested
dm ebs: Fix incorrect checking for REQ_OP_FLUSH
dm init: Set file local variable static
dm ioctl: Fix compilation warning
dm raid: Remove empty if statement
dm verity: Fix compilation warning
dm crypt: Enable zoned block device support
dm crypt: add flags to optionally bypass kcryptd workqueues
dm bufio: do buffer cleanup from a workqueue
dm rq: don't call blk_mq_queue_stopped() in dm_stop_queue()
dm dust: add interface to list all badblocks
dm dust: report some message results directly back to user
dm verity: add "panic_on_corruption" error handling mode
dm mpath: use double checked locking in fast path
dm mpath: rename current_pgpath to pgpath in multipath_prepare_ioctl
dm mpath: rework __map_bio()
dm mpath: factor out multipath_queue_bio
dm mpath: push locking down to must_push_back_rq()
dm mpath: take m->lock spinlock when testing QUEUE_IF_NO_PATH
dm mpath: changes from initial m->flags locking audit

Diffstat (total): +355 -118
Documentation/admin-guide/device-mapper/dm-dust.rst (+25 -7)

···
 $ sudo dmsetup create dust1 --table '0 33552384 dust /dev/vdb1 0 4096'

 Check the status of the read behavior ("bypass" indicates that all I/O
-will be passed through to the underlying device)::
+will be passed through to the underlying device; "verbose" indicates that
+bad block additions, removals, and remaps will be verbosely logged)::

   $ sudo dmsetup status dust1
-  0 33552384 dust 252:17 bypass
+  0 33552384 dust 252:17 bypass verbose

 $ sudo dd if=/dev/mapper/dust1 of=/dev/null bs=512 count=128 iflag=direct
 128+0 records in
···
 A message will print with the number of bad blocks currently
 configured on the device::

-  kernel: device-mapper: dust: countbadblocks: 895 badblock(s) found
+  countbadblocks: 895 badblock(s) found

 Querying for specific bad blocks
 --------------------------------
···
 The following message will print if the block is in the list::

-  device-mapper: dust: queryblock: block 72 found in badblocklist
+  dust_query_block: block 72 found in badblocklist

 The following message will print if the block is not in the list::

-  device-mapper: dust: queryblock: block 72 not found in badblocklist
+  dust_query_block: block 72 not found in badblocklist

 The "queryblock" message command will work in both the "enabled"
 and "disabled" modes, allowing the verification of whether a block
···
 After clearing the bad block list, the following message will appear::

-  kernel: device-mapper: dust: clearbadblocks: badblocks cleared
+  dust_clear_badblocks: badblocks cleared

 If there were no bad blocks to clear, the following message will
 appear::

-  kernel: device-mapper: dust: clearbadblocks: no badblocks found
+  dust_clear_badblocks: no badblocks found
+
+Listing the bad block list
+--------------------------
+
+To list all bad blocks in the bad block list (using an example device
+with blocks 1 and 2 in the bad block list), run the following message
+command::
+
+  $ sudo dmsetup message dust1 0 listbadblocks
+  1
+  2
+
+If there are no bad blocks in the bad block list, the command will
+execute with no output::
+
+  $ sudo dmsetup message dust1 0 listbadblocks

 Message commands list
 ---------------------
···
 countbadblocks
 clearbadblocks
+listbadblocks
 disable
 enable
 quiet
Documentation/admin-guide/device-mapper/verity.rst (+4)

···
 not compatible with ignore_corruption and requires user space support to
 avoid restart loops.

+panic_on_corruption
+    Panic the device when a corrupted block is discovered. This option is
+    not compatible with ignore_corruption and restart_on_corruption.
+
 ignore_zero_blocks
     Do not verify blocks that are expected to contain zeroes and always return
     zeroes instead. This may be useful if the partition contains unused blocks
drivers/md/dm-bufio.c (+41 -19)

···
 	int async_write_error;

 	struct list_head client_list;
+
 	struct shrinker shrinker;
+	struct work_struct shrink_work;
+	atomic_long_t need_shrink;
 };

 /*
···
 	return retain_bytes;
 }

-static unsigned long __scan(struct dm_bufio_client *c, unsigned long nr_to_scan,
-			    gfp_t gfp_mask)
+static void __scan(struct dm_bufio_client *c)
 {
 	int l;
 	struct dm_buffer *b, *tmp;
···
 	for (l = 0; l < LIST_SIZE; l++) {
 		list_for_each_entry_safe_reverse(b, tmp, &c->lru[l], lru_list) {
-			if (__try_evict_buffer(b, gfp_mask))
+			if (count - freed <= retain_target)
+				atomic_long_set(&c->need_shrink, 0);
+			if (!atomic_long_read(&c->need_shrink))
+				return;
+			if (__try_evict_buffer(b, GFP_KERNEL)) {
+				atomic_long_dec(&c->need_shrink);
 				freed++;
-			if (!--nr_to_scan || ((count - freed) <= retain_target))
-				return freed;
+			}
 			cond_resched();
 		}
 	}
-	return freed;
 }

-static unsigned long
-dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
+static void shrink_work(struct work_struct *w)
+{
+	struct dm_bufio_client *c = container_of(w, struct dm_bufio_client, shrink_work);
+
+	dm_bufio_lock(c);
+	__scan(c);
+	dm_bufio_unlock(c);
+}
+
+static unsigned long dm_bufio_shrink_scan(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct dm_bufio_client *c;
-	unsigned long freed;

 	c = container_of(shrink, struct dm_bufio_client, shrinker);
-	if (sc->gfp_mask & __GFP_FS)
-		dm_bufio_lock(c);
-	else if (!dm_bufio_trylock(c))
-		return SHRINK_STOP;
+	atomic_long_add(sc->nr_to_scan, &c->need_shrink);
+	queue_work(dm_bufio_wq, &c->shrink_work);

-	freed  = __scan(c, sc->nr_to_scan, sc->gfp_mask);
-	dm_bufio_unlock(c);
-	return freed;
+	return sc->nr_to_scan;
 }

-static unsigned long
-dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
+static unsigned long dm_bufio_shrink_count(struct shrinker *shrink, struct shrink_control *sc)
 {
 	struct dm_bufio_client *c = container_of(shrink, struct dm_bufio_client, shrinker);
 	unsigned long count = READ_ONCE(c->n_buffers[LIST_CLEAN]) +
 			      READ_ONCE(c->n_buffers[LIST_DIRTY]);
 	unsigned long retain_target = get_retain_buffers(c);
+	unsigned long queued_for_cleanup = atomic_long_read(&c->need_shrink);

-	return (count < retain_target) ? 0 : (count - retain_target);
+	if (unlikely(count < retain_target))
+		count = 0;
+	else
+		count -= retain_target;
+
+	if (unlikely(count < queued_for_cleanup))
+		count = 0;
+	else
+		count -= queued_for_cleanup;
+
+	return count;
 }

 /*
···
 		__free_buffer_wake(b);
 	}

+	INIT_WORK(&c->shrink_work, shrink_work);
+	atomic_long_set(&c->need_shrink, 0);
+
 	c->shrinker.count_objects = dm_bufio_shrink_count;
 	c->shrinker.scan_objects = dm_bufio_shrink_scan;
 	c->shrinker.seeks = 1;
···
 	drop_buffers(c);

 	unregister_shrinker(&c->shrinker);
+	flush_work(&c->shrink_work);

 	mutex_lock(&dm_bufio_clients_lock);
drivers/md/dm-crypt.c (+116 -13)

···
 	u8 *integrity_metadata;
 	bool integrity_metadata_from_pool;
 	struct work_struct work;
+	struct tasklet_struct tasklet;

 	struct convert_context ctx;
···
  * and encrypts / decrypts at the same time.
  */
 enum flags { DM_CRYPT_SUSPENDED, DM_CRYPT_KEY_VALID,
-	     DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD };
+	     DM_CRYPT_SAME_CPU, DM_CRYPT_NO_OFFLOAD,
+	     DM_CRYPT_NO_READ_WORKQUEUE, DM_CRYPT_NO_WRITE_WORKQUEUE,
+	     DM_CRYPT_WRITE_INLINE };

 enum cipher_flags {
 	CRYPT_MODE_INTEGRITY_AEAD,	/* Use authenticated mode for cihper */
···
  * Encrypt / decrypt data from one bio to another one (can be the same one)
  */
 static blk_status_t crypt_convert(struct crypt_config *cc,
-			 struct convert_context *ctx)
+			 struct convert_context *ctx, bool atomic)
 {
 	unsigned int tag_offset = 0;
 	unsigned int sector_step = cc->sector_size >> SECTOR_SHIFT;
···
 			atomic_dec(&ctx->cc_pending);
 			ctx->cc_sector += sector_step;
 			tag_offset++;
-			cond_resched();
+			if (!atomic)
+				cond_resched();
 			continue;
 		/*
 		 * There was a data integrity error.
···
 	clone->bi_iter.bi_sector = cc->start + io->sector;

-	if (likely(!async) && test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags)) {
+	if ((likely(!async) && test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags)) ||
+	    test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags)) {
 		submit_bio_noacct(clone);
 		return;
 	}
···
 	spin_unlock_irqrestore(&cc->write_thread_lock, flags);
 }

+static bool kcryptd_crypt_write_inline(struct crypt_config *cc,
+				       struct convert_context *ctx)
+
+{
+	if (!test_bit(DM_CRYPT_WRITE_INLINE, &cc->flags))
+		return false;
+
+	/*
+	 * Note: zone append writes (REQ_OP_ZONE_APPEND) do not have ordering
+	 * constraints so they do not need to be issued inline by
+	 * kcryptd_crypt_write_convert().
+	 */
+	switch (bio_op(ctx->bio_in)) {
+	case REQ_OP_WRITE:
+	case REQ_OP_WRITE_SAME:
+	case REQ_OP_WRITE_ZEROES:
+		return true;
+	default:
+		return false;
+	}
+}
+
 static void kcryptd_crypt_write_convert(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
+	struct convert_context *ctx = &io->ctx;
 	struct bio *clone;
 	int crypt_finished;
 	sector_t sector = io->sector;
···
 	 * Prevent io from disappearing until this function completes.
 	 */
 	crypt_inc_pending(io);
-	crypt_convert_init(cc, &io->ctx, NULL, io->base_bio, sector);
+	crypt_convert_init(cc, ctx, NULL, io->base_bio, sector);

 	clone = crypt_alloc_buffer(io, io->base_bio->bi_iter.bi_size);
 	if (unlikely(!clone)) {
···
 	sector += bio_sectors(clone);

 	crypt_inc_pending(io);
-	r = crypt_convert(cc, &io->ctx);
+	r = crypt_convert(cc, ctx,
+			  test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags));
 	if (r)
 		io->error = r;
-	crypt_finished = atomic_dec_and_test(&io->ctx.cc_pending);
+	crypt_finished = atomic_dec_and_test(&ctx->cc_pending);
+	if (!crypt_finished && kcryptd_crypt_write_inline(cc, ctx)) {
+		/* Wait for completion signaled by kcryptd_async_done() */
+		wait_for_completion(&ctx->restart);
+		crypt_finished = 1;
+	}

 	/* Encryption was already finished, submit io now */
 	if (crypt_finished) {
···
 	crypt_convert_init(cc, &io->ctx, io->base_bio, io->base_bio,
 			   io->sector);

-	r = crypt_convert(cc, &io->ctx);
+	r = crypt_convert(cc, &io->ctx,
+			  test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags));
 	if (r)
 		io->error = r;
···
 	if (!atomic_dec_and_test(&ctx->cc_pending))
 		return;

-	if (bio_data_dir(io->base_bio) == READ)
+	/*
+	 * The request is fully completed: for inline writes, let
+	 * kcryptd_crypt_write_convert() do the IO submission.
+	 */
+	if (bio_data_dir(io->base_bio) == READ) {
 		kcryptd_crypt_read_done(io);
-	else
-		kcryptd_crypt_write_io_submit(io, 1);
+		return;
+	}
+
+	if (kcryptd_crypt_write_inline(cc, ctx)) {
+		complete(&ctx->restart);
+		return;
+	}
+
+	kcryptd_crypt_write_io_submit(io, 1);
 }

 static void kcryptd_crypt(struct work_struct *work)
···
 		kcryptd_crypt_write_convert(io);
 }

+static void kcryptd_crypt_tasklet(unsigned long work)
+{
+	kcryptd_crypt((struct work_struct *)work);
+}
+
 static void kcryptd_queue_crypt(struct dm_crypt_io *io)
 {
 	struct crypt_config *cc = io->cc;
+
+	if ((bio_data_dir(io->base_bio) == READ && test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags)) ||
+	    (bio_data_dir(io->base_bio) == WRITE && test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags))) {
+		if (in_irq()) {
+			/* Crypto API's skcipher_walk_first() refuses to work in hard IRQ context */
+			tasklet_init(&io->tasklet, kcryptd_crypt_tasklet, (unsigned long)&io->work);
+			tasklet_schedule(&io->tasklet);
+			return;
+		}
+
+		kcryptd_crypt(&io->work);
+		return;
+	}

 	INIT_WORK(&io->work, kcryptd_crypt);
 	queue_work(cc->crypt_queue, &io->work);
···
 	struct crypt_config *cc = ti->private;
 	struct dm_arg_set as;
 	static const struct dm_arg _args[] = {
-		{0, 6, "Invalid number of feature args"},
+		{0, 8, "Invalid number of feature args"},
 	};
 	unsigned int opt_params, val;
 	const char *opt_string, *sval;
···
 		else if (!strcasecmp(opt_string, "submit_from_crypt_cpus"))
 			set_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
+		else if (!strcasecmp(opt_string, "no_read_workqueue"))
+			set_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags);
+		else if (!strcasecmp(opt_string, "no_write_workqueue"))
+			set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
 		else if (sscanf(opt_string, "integrity:%u:", &val) == 1) {
 			if (val == 0 || val > MAX_TAG_SIZE) {
 				ti->error = "Invalid integrity arguments";
···
 	return 0;
 }
+
+#ifdef CONFIG_BLK_DEV_ZONED
+
+static int crypt_report_zones(struct dm_target *ti,
+		struct dm_report_zones_args *args, unsigned int nr_zones)
+{
+	struct crypt_config *cc = ti->private;
+	sector_t sector = cc->start + dm_target_offset(ti, args->next_sector);
+
+	args->start = cc->start;
+	return blkdev_report_zones(cc->dev->bdev, sector, nr_zones,
+				   dm_report_zones_cb, args);
+}
+
+#endif

 /*
  * Construct an encryption mapping:
···
 		goto bad;
 	}
 	cc->start = tmpll;
+
+	/*
+	 * For zoned block devices, we need to preserve the issuer write
+	 * ordering. To do so, disable write workqueues and force inline
+	 * encryption completion.
+	 */
+	if (bdev_is_zoned(cc->dev->bdev)) {
+		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
+		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
+	}

 	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
 		ret = crypt_integrity_ctr(cc, ti);
···
 	num_feature_args += !!ti->num_discard_bios;
 	num_feature_args += test_bit(DM_CRYPT_SAME_CPU, &cc->flags);
 	num_feature_args += test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags);
+	num_feature_args += test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags);
+	num_feature_args += test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
 	num_feature_args += cc->sector_size != (1 << SECTOR_SHIFT);
 	num_feature_args += test_bit(CRYPT_IV_LARGE_SECTORS, &cc->cipher_flags);
 	if (cc->on_disk_tag_size)
···
 			DMEMIT(" same_cpu_crypt");
 		if (test_bit(DM_CRYPT_NO_OFFLOAD, &cc->flags))
 			DMEMIT(" submit_from_crypt_cpus");
+		if (test_bit(DM_CRYPT_NO_READ_WORKQUEUE, &cc->flags))
+			DMEMIT(" no_read_workqueue");
+		if (test_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags))
+			DMEMIT(" no_write_workqueue");
 		if (cc->on_disk_tag_size)
 			DMEMIT(" integrity:%u:%s", cc->on_disk_tag_size, cc->cipher_auth);
 		if (cc->sector_size != (1 << SECTOR_SHIFT))
···
 static struct target_type crypt_target = {
 	.name   = "crypt",
-	.version = {1, 21, 0},
+	.version = {1, 22, 0},
 	.module = THIS_MODULE,
 	.ctr    = crypt_ctr,
 	.dtr    = crypt_dtr,
+#ifdef CONFIG_BLK_DEV_ZONED
+	.features = DM_TARGET_ZONED_HM,
+	.report_zones = crypt_report_zones,
+#endif
 	.map    = crypt_map,
 	.status = crypt_status,
 	.postsuspend = crypt_postsuspend,
drivers/md/dm-dust.c (+45 -13)

···
 	return 0;
 }

-static int dust_query_block(struct dust_device *dd, unsigned long long block)
+static int dust_query_block(struct dust_device *dd, unsigned long long block, char *result,
+			    unsigned int maxlen, unsigned int *sz_ptr)
 {
 	struct badblock *bblock;
 	unsigned long flags;
+	unsigned int sz = *sz_ptr;

 	spin_lock_irqsave(&dd->dust_lock, flags);
 	bblock = dust_rb_search(&dd->badblocklist, block);
 	if (bblock != NULL)
-		DMINFO("%s: block %llu found in badblocklist", __func__, block);
+		DMEMIT("%s: block %llu found in badblocklist", __func__, block);
 	else
-		DMINFO("%s: block %llu not found in badblocklist", __func__, block);
+		DMEMIT("%s: block %llu not found in badblocklist", __func__, block);
 	spin_unlock_irqrestore(&dd->dust_lock, flags);

-	return 0;
+	return 1;
 }

 static int __dust_map_read(struct dust_device *dd, sector_t thisblock)
···
 	return true;
 }

-static int dust_clear_badblocks(struct dust_device *dd)
+static int dust_clear_badblocks(struct dust_device *dd, char *result, unsigned int maxlen,
+				unsigned int *sz_ptr)
 {
 	unsigned long flags;
 	struct rb_root badblocklist;
 	unsigned long long badblock_count;
+	unsigned int sz = *sz_ptr;

 	spin_lock_irqsave(&dd->dust_lock, flags);
 	badblocklist = dd->badblocklist;
···
 	spin_unlock_irqrestore(&dd->dust_lock, flags);

 	if (!__dust_clear_badblocks(&badblocklist, badblock_count))
-		DMINFO("%s: no badblocks found", __func__);
+		DMEMIT("%s: no badblocks found", __func__);
 	else
-		DMINFO("%s: badblocks cleared", __func__);
+		DMEMIT("%s: badblocks cleared", __func__);

-	return 0;
+	return 1;
+}
+
+static int dust_list_badblocks(struct dust_device *dd, char *result, unsigned int maxlen,
+				unsigned int *sz_ptr)
+{
+	unsigned long flags;
+	struct rb_root badblocklist;
+	struct rb_node *node;
+	struct badblock *bblk;
+	unsigned int sz = *sz_ptr;
+	unsigned long long num = 0;
+
+	spin_lock_irqsave(&dd->dust_lock, flags);
+	badblocklist = dd->badblocklist;
+	for (node = rb_first(&badblocklist); node; node = rb_next(node)) {
+		bblk = rb_entry(node, struct badblock, node);
+		DMEMIT("%llu\n", bblk->bb);
+		num++;
+	}
+
+	spin_unlock_irqrestore(&dd->dust_lock, flags);
+	if (!num)
+		DMEMIT("No blocks in badblocklist");
+
+	return 1;
 }

 /*
···
 }

 static int dust_message(struct dm_target *ti, unsigned int argc, char **argv,
-			char *result_buf, unsigned int maxlen)
+			char *result, unsigned int maxlen)
 {
 	struct dust_device *dd = ti->private;
 	sector_t size = i_size_read(dd->dev->bdev->bd_inode) >> SECTOR_SHIFT;
···
 	unsigned char wr_fail_cnt;
 	unsigned int tmp_ui;
 	unsigned long flags;
+	unsigned int sz = 0;
 	char dummy;

 	if (argc == 1) {
···
 		r = 0;
 	} else if (!strcasecmp(argv[0], "countbadblocks")) {
 		spin_lock_irqsave(&dd->dust_lock, flags);
-		DMINFO("countbadblocks: %llu badblock(s) found",
+		DMEMIT("countbadblocks: %llu badblock(s) found",
 		       dd->badblock_count);
 		spin_unlock_irqrestore(&dd->dust_lock, flags);
-		r = 0;
+		r = 1;
 	} else if (!strcasecmp(argv[0], "clearbadblocks")) {
-		r = dust_clear_badblocks(dd);
+		r = dust_clear_badblocks(dd, result, maxlen, &sz);
 	} else if (!strcasecmp(argv[0], "quiet")) {
 		if (!dd->quiet_mode)
 			dd->quiet_mode = true;
 		else
 			dd->quiet_mode = false;
 		r = 0;
+	} else if (!strcasecmp(argv[0], "listbadblocks")) {
+		r = dust_list_badblocks(dd, result, maxlen, &sz);
 	} else {
 		invalid_msg = true;
 	}
···
 	else if (!strcasecmp(argv[0], "removebadblock"))
 		r = dust_remove_block(dd, block);
 	else if (!strcasecmp(argv[0], "queryblock"))
-		r = dust_query_block(dd, block);
+		r = dust_query_block(dd, block, result, maxlen, &sz);
 	else
 		invalid_msg = true;
drivers/md/dm-ebs-target.c (+1 -1)

···
 	bio_set_dev(bio, ec->dev->bdev);
 	bio->bi_iter.bi_sector = ec->start + dm_target_offset(ti, bio->bi_iter.bi_sector);

-	if (unlikely(bio->bi_opf & REQ_OP_FLUSH))
+	if (unlikely(bio_op(bio) == REQ_OP_FLUSH))
 		return DM_MAPIO_REMAPPED;
 	/*
 	 * Only queue for bufio processing in case of partial or overlapping buffers
drivers/md/dm-init.c (+1 -1)

···
 	struct list_head list;
 };

-const char * const dm_allowed_targets[] __initconst = {
+static const char * const dm_allowed_targets[] __initconst = {
 	"crypt",
 	"delay",
 	"linear",
drivers/md/dm-ioctl.c (+1 -1)

···
 	spec->sector_start = ti->begin;
 	spec->length = ti->len;
 	strncpy(spec->target_type, ti->type->name,
-		sizeof(spec->target_type));
+		sizeof(spec->target_type) - 1);

 	outptr += sizeof(struct dm_target_spec);
 	remaining = len - (outptr - outbuf);
drivers/md/dm-mpath.c (+98 -48)

···
 #define MPATHF_PG_INIT_REQUIRED 5	/* pg_init needs calling? */
 #define MPATHF_PG_INIT_DELAY_RETRY 6	/* Delay pg_init retry? */

+static bool mpath_double_check_test_bit(int MPATHF_bit, struct multipath *m)
+{
+	bool r = test_bit(MPATHF_bit, &m->flags);
+
+	if (r) {
+		unsigned long flags;
+		spin_lock_irqsave(&m->lock, flags);
+		r = test_bit(MPATHF_bit, &m->flags);
+		spin_unlock_irqrestore(&m->lock, flags);
+	}
+
+	return r;
+}
+
 /*-----------------------------------------------
  * Allocation routines
  *-----------------------------------------------*/
···
 static void __switch_pg(struct multipath *m, struct priority_group *pg)
 {
+	lockdep_assert_held(&m->lock);
+
 	m->current_pg = pg;

 	/* Must we initialise the PG first, and queue I/O till it's ready? */
···
 	unsigned bypassed = 1;

 	if (!atomic_read(&m->nr_valid_paths)) {
+		spin_lock_irqsave(&m->lock, flags);
 		clear_bit(MPATHF_QUEUE_IO, &m->flags);
+		spin_unlock_irqrestore(&m->lock, flags);
 		goto failed;
 	}
···
 				continue;
 			pgpath = choose_path_in_pg(m, pg, nr_bytes);
 			if (!IS_ERR_OR_NULL(pgpath)) {
-				if (!bypassed)
+				if (!bypassed) {
+					spin_lock_irqsave(&m->lock, flags);
 					set_bit(MPATHF_PG_INIT_DELAY_RETRY, &m->flags);
+					spin_unlock_irqrestore(&m->lock, flags);
+				}
 				return pgpath;
 			}
 		}
···
 static bool must_push_back_rq(struct multipath *m)
 {
-	return test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) || __must_push_back(m);
+	unsigned long flags;
+	bool ret;
+
+	spin_lock_irqsave(&m->lock, flags);
+	ret = (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags) || __must_push_back(m));
+	spin_unlock_irqrestore(&m->lock, flags);
+
+	return ret;
 }

 /*
···
 	/* Do we need to select a new pgpath? */
 	pgpath = READ_ONCE(m->current_pgpath);
-	if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
+	if (!pgpath || !mpath_double_check_test_bit(MPATHF_QUEUE_IO, m))
 		pgpath = choose_pgpath(m, nr_bytes);

 	if (!pgpath) {
···
 			return DM_MAPIO_DELAY_REQUEUE;
 		dm_report_EIO(m);	/* Failed */
 		return DM_MAPIO_KILL;
-	} else if (test_bit(MPATHF_QUEUE_IO, &m->flags) ||
-		   test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags)) {
+	} else if (mpath_double_check_test_bit(MPATHF_QUEUE_IO, m) ||
+		   mpath_double_check_test_bit(MPATHF_PG_INIT_REQUIRED, m)) {
 		pg_init_all_paths(m);
 		return DM_MAPIO_DELAY_REQUEUE;
 	}
···
  * Map cloned bios (bio-based multipath)
  */

+static void __multipath_queue_bio(struct multipath *m, struct bio *bio)
+{
+	/* Queue for the daemon to resubmit */
+	bio_list_add(&m->queued_bios, bio);
+	if (!test_bit(MPATHF_QUEUE_IO, &m->flags))
+		queue_work(kmultipathd, &m->process_queued_bios);
+}
+
+static void multipath_queue_bio(struct multipath *m, struct bio *bio)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&m->lock, flags);
+	__multipath_queue_bio(m, bio);
+	spin_unlock_irqrestore(&m->lock, flags);
+}
+
 static struct pgpath *__map_bio(struct multipath *m, struct bio *bio)
 {
 	struct pgpath *pgpath;
 	unsigned long flags;
-	bool queue_io;

 	/* Do we need to select a new pgpath? */
 	pgpath = READ_ONCE(m->current_pgpath);
-	if (!pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
+	if (!pgpath || !mpath_double_check_test_bit(MPATHF_QUEUE_IO, m))
 		pgpath = choose_pgpath(m, bio->bi_iter.bi_size);

-	/* MPATHF_QUEUE_IO might have been cleared by choose_pgpath. */
-	queue_io = test_bit(MPATHF_QUEUE_IO, &m->flags);
-
-	if ((pgpath && queue_io) ||
-	    (!pgpath && test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags))) {
-		/* Queue for the daemon to resubmit */
+	if (!pgpath) {
 		spin_lock_irqsave(&m->lock, flags);
-		bio_list_add(&m->queued_bios, bio);
+		if (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
+			__multipath_queue_bio(m, bio);
+			pgpath = ERR_PTR(-EAGAIN);
+		}
 		spin_unlock_irqrestore(&m->lock, flags);

-		/* PG_INIT_REQUIRED cannot be set without QUEUE_IO */
-		if (queue_io || test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags))
-			pg_init_all_paths(m);
-		else if (!queue_io)
-			queue_work(kmultipathd, &m->process_queued_bios);
-
+	} else if (mpath_double_check_test_bit(MPATHF_QUEUE_IO, m) ||
+		   mpath_double_check_test_bit(MPATHF_PG_INIT_REQUIRED, m)) {
+		multipath_queue_bio(m, bio);
+		pg_init_all_paths(m);
 		return ERR_PTR(-EAGAIN);
 	}
···
 	struct request_queue *q = bdev_get_queue(bdev);
 	int r;

-	if (test_bit(MPATHF_RETAIN_ATTACHED_HW_HANDLER, &m->flags)) {
+	if (mpath_double_check_test_bit(MPATHF_RETAIN_ATTACHED_HW_HANDLER, m)) {
 retain:
 		if (*attached_handler_name) {
 			/*
···
 	if (pgpath)
 		fail_path(pgpath);

-	if (atomic_read(&m->nr_valid_paths) == 0 &&
+	if (!atomic_read(&m->nr_valid_paths) &&
 	    !must_push_back_rq(m)) {
 		if (error == BLK_STS_IOERR)
 			dm_report_EIO(m);
···
 	if (pgpath)
 		fail_path(pgpath);

-	if (atomic_read(&m->nr_valid_paths) == 0 &&
-	    !test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
-		if (__must_push_back(m)) {
-			r = DM_ENDIO_REQUEUE;
-		} else {
-			dm_report_EIO(m);
-			*error = BLK_STS_IOERR;
+	if (!atomic_read(&m->nr_valid_paths)) {
+		spin_lock_irqsave(&m->lock, flags);
+		if (!test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
+			if (__must_push_back(m)) {
+				r = DM_ENDIO_REQUEUE;
+			} else {
+				dm_report_EIO(m);
+				*error = BLK_STS_IOERR;
+			}
+			spin_unlock_irqrestore(&m->lock, flags);
+			goto done;
 		}
-		goto done;
+		spin_unlock_irqrestore(&m->lock, flags);
 	}

-	spin_lock_irqsave(&m->lock, flags);
-	bio_list_add(&m->queued_bios, clone);
-	spin_unlock_irqrestore(&m->lock, flags);
-	if (!test_bit(MPATHF_QUEUE_IO, &m->flags))
-		queue_work(kmultipathd, &m->process_queued_bios);
-
+	multipath_queue_bio(m, clone);
 	r = DM_ENDIO_INCOMPLETE;
 done:
 	if (pgpath) {
···
 			 struct block_device **bdev)
 {
 	struct multipath *m = ti->private;
-	struct pgpath *current_pgpath;
+	struct pgpath *pgpath;
+	unsigned long flags;
 	int r;

-	current_pgpath = READ_ONCE(m->current_pgpath);
-	if (!current_pgpath || !test_bit(MPATHF_QUEUE_IO, &m->flags))
-		current_pgpath = choose_pgpath(m, 0);
+	pgpath = READ_ONCE(m->current_pgpath);
+	if (!pgpath || !mpath_double_check_test_bit(MPATHF_QUEUE_IO, m))
+		pgpath = choose_pgpath(m, 0);

-	if (current_pgpath) {
-		if (!test_bit(MPATHF_QUEUE_IO, &m->flags)) {
-			*bdev = current_pgpath->path.dev->bdev;
+	if (pgpath) {
+		if (!mpath_double_check_test_bit(MPATHF_QUEUE_IO, m)) {
+			*bdev = pgpath->path.dev->bdev;
 			r = 0;
 		} else {
 			/* pg_init has not started or completed */
···
 		}
 	} else {
 		/* No path is available */
+		r = -EIO;
+		spin_lock_irqsave(&m->lock, flags);
 		if (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags))
 			r = -ENOTCONN;
-		else
-			r = -EIO;
+		spin_unlock_irqrestore(&m->lock, flags);
 	}

 	if (r == -ENOTCONN) {
···
 			/* Path status changed, redo selection */
 			(void) choose_pgpath(m, 0);
 		}
+		spin_lock_irqsave(&m->lock, flags);
 		if (test_bit(MPATHF_PG_INIT_REQUIRED, &m->flags))
-			pg_init_all_paths(m);
+			(void) __pg_init_all_paths(m);
+		spin_unlock_irqrestore(&m->lock, flags);
 		dm_table_run_md_queue_async(m->ti->table);
 		process_queued_io_list(m);
 	}
···
 		return true;

 	/* no paths available, for blk-mq: rely on IO mapping to delay requeue */
-	if (!atomic_read(&m->nr_valid_paths) && test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags))
-		return (m->queue_mode != DM_TYPE_REQUEST_BASED);
+	if (!atomic_read(&m->nr_valid_paths)) {
+		unsigned long flags;
+		spin_lock_irqsave(&m->lock, flags);
+		if (test_bit(MPATHF_QUEUE_IF_NO_PATH, &m->flags)) {
+			spin_unlock_irqrestore(&m->lock, flags);
+			return (m->queue_mode != DM_TYPE_REQUEST_BASED);
+		}
+		spin_unlock_irqrestore(&m->lock, flags);
+	}

 	/* Guess which priority_group will be used at next mapping time */
 	pg = READ_ONCE(m->current_pg);
drivers/md/dm-raid.c (-2)
···
 
 	if (new_devs == rs->raid_disks || !rebuilds) {
 		/* Replace a broken device */
-		if (new_devs == 1 && !rs->delta_disks)
-			;
 		if (new_devs == rs->raid_disks) {
 			DMINFO("Superblocks created for new raid set");
 			set_bit(MD_ARRAY_FIRST_USE, &mddev->flags);
drivers/md/dm-rq.c (-3)
···
 
 void dm_stop_queue(struct request_queue *q)
 {
-	if (blk_mq_queue_stopped(q))
-		return;
-
 	blk_mq_quiesce_queue(q);
 }
 
drivers/md/dm-verity-target.c (+12 -1)
···
 
 #define DM_VERITY_OPT_LOGGING		"ignore_corruption"
 #define DM_VERITY_OPT_RESTART		"restart_on_corruption"
+#define DM_VERITY_OPT_PANIC		"panic_on_corruption"
 #define DM_VERITY_OPT_IGN_ZEROES	"ignore_zero_blocks"
 #define DM_VERITY_OPT_AT_MOST_ONCE	"check_at_most_once"
···
 
 	if (v->mode == DM_VERITY_MODE_RESTART)
 		kernel_restart("dm-verity device corrupted");
+
+	if (v->mode == DM_VERITY_MODE_PANIC)
+		panic("dm-verity device corrupted");
 
 	return 1;
 }
···
 		case DM_VERITY_MODE_RESTART:
 			DMEMIT(DM_VERITY_OPT_RESTART);
 			break;
+		case DM_VERITY_MODE_PANIC:
+			DMEMIT(DM_VERITY_OPT_PANIC);
+			break;
 		default:
 			BUG();
 		}
···
 
 		} else if (!strcasecmp(arg_name, DM_VERITY_OPT_RESTART)) {
 			v->mode = DM_VERITY_MODE_RESTART;
+			continue;
+
+		} else if (!strcasecmp(arg_name, DM_VERITY_OPT_PANIC)) {
+			v->mode = DM_VERITY_MODE_PANIC;
 			continue;
 
 		} else if (!strcasecmp(arg_name, DM_VERITY_OPT_IGN_ZEROES)) {
···
 
 static struct target_type verity_target = {
 	.name		= "verity",
-	.version	= {1, 6, 0},
+	.version	= {1, 7, 0},
 	.module		= THIS_MODULE,
 	.ctr		= verity_ctr,
 	.dtr		= verity_dtr,
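The new "panic_on_corruption" string joins "ignore_corruption" and "restart_on_corruption" in the verity target's optional-argument list, selecting DM_VERITY_MODE_PANIC at table-parse time. A sketch of a dmsetup table line that would select it — the device paths, sizes, root digest, and salt below are placeholders for illustration, not values from this merge:

```
# verity table: <start> <len> verity <version> <data_dev> <hash_dev>
#   <data_blk_sz> <hash_blk_sz> <#blocks> <hash_start> <algo> <digest> <salt>
#   [<#opt_params> <opt_params>]
# 2 GiB data device, 4 KiB blocks; "1 panic_on_corruption" is the
# optional-argument count followed by the new error-handling mode
0 4194304 verity 1 /dev/sdX1 /dev/sdX2 4096 4096 524288 1 sha256 <root_digest> <salt> 1 panic_on_corruption
```

With this mode loaded, a hash mismatch on any data block panics the machine instead of returning EIO.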
drivers/md/dm-verity-verify-sig.h (+7 -7)
···
 
 #define DM_VERITY_ROOT_HASH_VERIFICATION_OPTS 0
 
-int verity_verify_root_hash(const void *data, size_t data_len,
-			    const void *sig_data, size_t sig_len)
+static inline int verity_verify_root_hash(const void *data, size_t data_len,
+					  const void *sig_data, size_t sig_len)
 {
 	return 0;
 }
 
-bool verity_verify_is_sig_opt_arg(const char *arg_name)
+static inline bool verity_verify_is_sig_opt_arg(const char *arg_name)
 {
 	return false;
 }
 
-int verity_verify_sig_parse_opt_args(struct dm_arg_set *as, struct dm_verity *v,
-				     struct dm_verity_sig_opts *sig_opts,
-				     unsigned int *argc, const char *arg_name)
+static inline int verity_verify_sig_parse_opt_args(struct dm_arg_set *as,
+			struct dm_verity *v, struct dm_verity_sig_opts *sig_opts,
+			unsigned int *argc, const char *arg_name)
 {
 	return -EINVAL;
 }
 
-void verity_verify_sig_opts_cleanup(struct dm_verity_sig_opts *sig_opts)
+static inline void verity_verify_sig_opts_cleanup(struct dm_verity_sig_opts *sig_opts)
 {
 }
 
drivers/md/dm-verity.h (+2 -1)
···
 enum verity_mode {
 	DM_VERITY_MODE_EIO,
 	DM_VERITY_MODE_LOGGING,
-	DM_VERITY_MODE_RESTART
+	DM_VERITY_MODE_RESTART,
+	DM_VERITY_MODE_PANIC
 };
 
 enum verity_block_type {
drivers/md/dm.c (+2 -1)
···
 	}
 
 		args.tgt = tgt;
-		ret = tgt->type->report_zones(tgt, &args, nr_zones);
+		ret = tgt->type->report_zones(tgt, &args,
+					      nr_zones - args.zone_idx);
 		if (ret < 0)
 			goto out;
 	} while (args.zone_idx < nr_zones &&