Merge tag 'for-6.8/block-2024-01-18' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Keith:
    - tcp, fc, and rdma target fixes (Maurizio, Daniel, Hannes,
      Christoph)
    - discard fixes and improvements (Christoph)
    - timeout debug improvements (Keith, Max)
    - various cleanups (Daniel, Max, Giuxen)
    - trace event string fixes (Arnd)
    - shadow doorbell setup on reset fix (William)
    - a write zeroes quirk for SK Hynix (Jim)

- MD pull request via Song:
    - Sparse warning since v6.0 (Bart)
    - /proc/mdstat regression since v6.7 (Yu Kuai)

- Use symbolic error value (Christian)

- IO Priority documentation update (Christian)

- Fix for accessing queue limits without having entered the queue
(Christoph, me)

- Fix for loop dio support (Christoph)

- Move null_blk off deprecated ida interface (Christophe)

- Ensure nbd initializes full msghdr (Eric)

- Fix for a regression with the folio conversion, which is now easier
to hit because of an unrelated change (Matthew)

- Remove redundant check in virtio-blk (Li)

- Fix for a potential hang in sbitmap (Ming)

- Fix for partial zone appending (Damien)

- Misc changes and fixes (Bart, me, Kemeng, Dmitry)

* tag 'for-6.8/block-2024-01-18' of git://git.kernel.dk/linux: (45 commits)
Documentation: block: ioprio: Update schedulers
loop: fix the the direct I/O support check when used on top of block devices
blk-mq: Remove the hctx 'run' debugfs attribute
nbd: always initialize struct msghdr completely
block: Fix iterating over an empty bio with bio_for_each_folio_all
block: bio-integrity: fix kcalloc() arguments order
virtio_blk: remove duplicate check if queue is broken in virtblk_done
sbitmap: remove stale comment in sbq_calc_wake_batch
block: Correct a documentation comment in blk-cgroup.c
null_blk: Remove usage of the deprecated ida_simple_xx() API
block: ensure we hold a queue reference when using queue limits
blk-mq: rename blk_mq_can_use_cached_rq
block: print symbolic error name instead of error code
blk-mq: fix IO hang from sbitmap wakeup race
nvmet-rdma: avoid circular locking dependency on install_queue()
nvmet-tcp: avoid circular locking dependency on install_queue()
nvme-pci: set doorbell config before unquiescing
block: fix partial zone append completion handling in req_bio_endio()
block/iocost: silence warning on 'last_period' potentially being unused
md/raid1: Use blk_opf_t for read and write operations
...

Linus Torvalds 2 years ago 9d1694dc e9a5a78d

+286 -223

34 changed files

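Documentation/block/ioprio.rst
block/bio-integrity.c
block/blk-cgroup.c
block/blk-iocost.c
block/blk-mq-debugfs.c
block/blk-mq-sched.c
block/blk-mq.c
block/ioprio.c
block/partitions/core.c
drivers/block/loop.c
drivers/block/nbd.c
drivers/block/null_blk/main.c
drivers/block/virtio_blk.c
drivers/md/md.c
drivers/md/raid1.c
drivers/nvme/common/keyring.c
drivers/nvme/host/core.c
drivers/nvme/host/nvme.h
drivers/nvme/host/pci.c
drivers/nvme/host/pr.c
drivers/nvme/host/rdma.c
drivers/nvme/host/sysfs.c
drivers/nvme/host/tcp.c
drivers/nvme/target/fc.c
drivers/nvme/target/fcloop.c
drivers/nvme/target/rdma.c
drivers/nvme/target/tcp.c
drivers/nvme/target/trace.c
drivers/nvme/target/trace.h
include/linux/bio.h
include/linux/blk-mq.h
include/linux/ioprio.h
include/linux/nvme.h
lib/sbitmap.c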

+6 -7

Documentation/block/ioprio.rst

···
 Intro
 -----
 
-With the introduction of cfq v3 (aka cfq-ts or time sliced cfq), basic io
-priorities are supported for reads on files. This enables users to io nice
-processes or process groups, similar to what has been possible with cpu
-scheduling for ages. This document mainly details the current possibilities
-with cfq; other io schedulers do not support io priorities thus far.
+The io priority feature enables users to io nice processes or process groups,
+similar to what has been possible with cpu scheduling for ages. Support for io
+priorities is io scheduler dependent and currently supported by bfq and
+mq-deadline.
 
 Scheduling classes
 ------------------
 
-CFQ implements three generic scheduling classes that determine how io is
-served for a process.
+Three generic scheduling classes are implemented for io priorities that
+determine how io is served for a process.
 
 IOPRIO_CLASS_RT: This is the realtime io class. This scheduling class is given
 higher priority than any other in the system, processes from this class are

+1 -1

block/bio-integrity.c

···
 	if (nr_vecs > BIO_MAX_VECS)
 		return -E2BIG;
 	if (nr_vecs > UIO_FASTIOV) {
-		bvec = kcalloc(sizeof(*bvec), nr_vecs, GFP_KERNEL);
+		bvec = kcalloc(nr_vecs, sizeof(*bvec), GFP_KERNEL);
 		if (!bvec)
 			return -ENOMEM;
 		pages = NULL;
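The kcalloc() change above only swaps the two arguments into the conventional order (element count first, element size second). As a rough userspace analogue, assuming the standard calloc() in place of kcalloc() and a hypothetical `struct vec` standing in for the kernel's bio_vec, the convention looks like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's struct bio_vec (illustration only). */
struct vec {
	void *page;
	unsigned int len;
	unsigned int offset;
};

/*
 * Allocate a zeroed array of n entries.
 * calloc(), like kcalloc(), takes the element count first and the
 * element size second. Both orders multiply to the same byte count,
 * but the conventional order keeps compilers and static checkers quiet.
 */
static struct vec *alloc_vecs(size_t n)
{
	return calloc(n, sizeof(struct vec));
}
```

A caller would use `struct vec *v = alloc_vecs(nr_vecs);`, check for NULL, and release the array with free(v). The transposed form behaves the same at run time, which is presumably why the original code went unnoticed for a while.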

+1 -1

block/blk-cgroup.c

···
  * @disk: gendisk the new blkg is associated with
  * @gfp_mask: allocation mask to use
  *
- * Allocate a new blkg assocating @blkcg and @q.
+ * Allocate a new blkg associating @blkcg and @disk.
  */
 static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct gendisk *disk,
 				   gfp_t gfp_mask)

+1 -1

block/blk-iocost.c

···
 static bool iocg_activate(struct ioc_gq *iocg, struct ioc_now *now)
 {
 	struct ioc *ioc = iocg->ioc;
-	u64 last_period, cur_period;
+	u64 __maybe_unused last_period, cur_period;
 	u64 vtime, vtarget;
 	int i;
 

-18

block/blk-mq-debugfs.c

···
 	return res;
 }
 
-static int hctx_run_show(void *data, struct seq_file *m)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-
-	seq_printf(m, "%lu\n", hctx->run);
-	return 0;
-}
-
-static ssize_t hctx_run_write(void *data, const char __user *buf, size_t count,
-			      loff_t *ppos)
-{
-	struct blk_mq_hw_ctx *hctx = data;
-
-	hctx->run = 0;
-	return count;
-}
-
 static int hctx_active_show(void *data, struct seq_file *m)
 {
 	struct blk_mq_hw_ctx *hctx = data;
···
 	{"tags_bitmap", 0400, hctx_tags_bitmap_show},
 	{"sched_tags", 0400, hctx_sched_tags_show},
 	{"sched_tags_bitmap", 0400, hctx_sched_tags_bitmap_show},
-	{"run", 0600, hctx_run_show, hctx_run_write},
 	{"active", 0400, hctx_active_show},
 	{"dispatch_busy", 0400, hctx_dispatch_busy_show},
 	{"type", 0400, hctx_type_show},

-2

block/blk-mq-sched.c

···
 	if (unlikely(blk_mq_hctx_stopped(hctx) || blk_queue_quiesced(q)))
 		return;
 
-	hctx->run++;
-
 	/*
 	 * A return of -EAGAIN is an indication that hctx->dispatch is not
 	 * empty and we must run again in order to avoid starving flushes.

+39 -11

block/blk-mq.c

···
 		/*
 		 * Partial zone append completions cannot be supported as the
 		 * BIO fragments may end up not being written sequentially.
+		 * For such case, force the completed nbytes to be equal to
+		 * the BIO size so that bio_advance() sets the BIO remaining
+		 * size to 0 and we end up calling bio_endio() before returning.
 		 */
-		if (bio->bi_iter.bi_size != nbytes)
+		if (bio->bi_iter.bi_size != nbytes) {
 			bio->bi_status = BLK_STS_IOERR;
-		else
+			nbytes = bio->bi_iter.bi_size;
+		} else {
 			bio->bi_iter.bi_sector = rq->__sector;
+		}
 	}
 
 	bio_advance(bio, nbytes);
···
 	__add_wait_queue(wq, wait);
 
 	/*
+	 * Add one explicit barrier since blk_mq_get_driver_tag() may
+	 * not imply barrier in case of failure.
+	 *
+	 * Order adding us to wait queue and allocating driver tag.
+	 *
+	 * The pair is the one implied in sbitmap_queue_wake_up() which
+	 * orders clearing sbitmap tag bits and waitqueue_active() in
+	 * __sbitmap_queue_wake_up(), since waitqueue_active() is lockless
+	 *
+	 * Otherwise, re-order of adding wait queue and getting driver tag
+	 * may cause __sbitmap_queue_wake_up() to wake up nothing because
+	 * the waitqueue_active() may not observe us in wait queue.
+	 */
+	smp_mb();
+
+	/*
 	 * It's possible that a tag was freed in the window between the
 	 * allocation failure and adding the hardware queue to the wait
 	 * queue.
···
 	return NULL;
 }
 
-/* return true if this @rq can be used for @bio */
-static bool blk_mq_can_use_cached_rq(struct request *rq, struct blk_plug *plug,
+/*
+ * Check if we can use the passed on request for submitting the passed in bio,
+ * and remove it from the request list if it can be used.
+ */
+static bool blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
 		struct bio *bio)
 {
 	enum hctx_type type = blk_mq_get_hctx_type(bio->bi_opf);
···
 	blk_status_t ret;
 
 	bio = blk_queue_bounce(bio, q);
-	if (bio_may_exceed_limits(bio, &q->limits)) {
-		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
-		if (!bio)
-			return;
-	}
-
 	bio_set_ioprio(bio);
 
 	if (plug) {
···
 			rq = NULL;
 	}
 	if (rq) {
+		if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
+			bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
+			if (!bio)
+				return;
+		}
 		if (!bio_integrity_prep(bio))
 			return;
 		if (blk_mq_attempt_bio_merge(q, bio, nr_segs))
 			return;
-		if (blk_mq_can_use_cached_rq(rq, plug, bio))
+		if (blk_mq_use_cached_rq(rq, plug, bio))
 			goto done;
 		percpu_ref_get(&q->q_usage_counter);
 	} else {
 		if (unlikely(bio_queue_enter(bio)))
 			return;
+		if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
+			bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
+			if (!bio)
+				goto fail;
+		}
 		if (!bio_integrity_prep(bio))
 			goto fail;
 	}
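The first blk-mq.c hunk above can be read as: when a zone append completes only partially, mark the BIO as failed and pretend the whole BIO completed, so that advancing it leaves nothing outstanding and the end-of-I/O path runs. A minimal userspace sketch of that idea follows; `struct fake_bio` and `zone_append_done()` are hypothetical names, not kernel APIs, and the real code operates on struct bio via bio_advance().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a BIO (illustration only). */
struct fake_bio {
	unsigned int remaining;	/* bytes still outstanding */
	bool io_error;		/* models bi_status == BLK_STS_IOERR */
};

/*
 * Complete nbytes of a zone append BIO and return the number of bytes
 * actually consumed. A partial completion is turned into an error that
 * covers the whole BIO, so remaining always drops to zero.
 */
static unsigned int zone_append_done(struct fake_bio *bio, unsigned int nbytes)
{
	if (bio->remaining != nbytes) {
		bio->io_error = true;
		nbytes = bio->remaining;	/* force full completion */
	}
	bio->remaining -= nbytes;		/* models bio_advance() */
	return nbytes;
}
```

Without the forced `nbytes` adjustment, a partial completion would leave `remaining` non-zero and the failed BIO would never be ended, which matches the behaviour the added comment in the diff describes.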

-26

block/ioprio.c

···
 	return ret;
 }
 
-/*
- * If the task has set an I/O priority, use that. Otherwise, return
- * the default I/O priority.
- *
- * Expected to be called for current task or with task_lock() held to keep
- * io_context stable.
- */
-int __get_task_ioprio(struct task_struct *p)
-{
-	struct io_context *ioc = p->io_context;
-	int prio;
-
-	if (p != current)
-		lockdep_assert_held(&p->alloc_lock);
-	if (ioc)
-		prio = ioc->ioprio;
-	else
-		prio = IOPRIO_DEFAULT;
-
-	if (IOPRIO_PRIO_CLASS(prio) == IOPRIO_CLASS_NONE)
-		prio = IOPRIO_PRIO_VALUE(task_nice_ioclass(p),
-					 task_nice_ioprio(p));
-	return prio;
-}
-EXPORT_SYMBOL_GPL(__get_task_ioprio);
-
 static int get_task_ioprio(struct task_struct *p)
 {
 	int ret;

+2 -2

block/partitions/core.c

···
 	part = add_partition(disk, p, from, size, state->parts[p].flags,
 			     &state->parts[p].info);
 	if (IS_ERR(part) && PTR_ERR(part) != -ENXIO) {
-		printk(KERN_ERR " %s: p%d could not be added: %ld\n",
-		       disk->disk_name, p, -PTR_ERR(part));
+		printk(KERN_ERR " %s: p%d could not be added: %pe\n",
+		       disk->disk_name, p, part);
 		return true;
 	}
 
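The partitions/core.c change replaces a numeric error code with the kernel's `%pe` format, which prints an error pointer symbolically. `%pe` exists only in the kernel's printk, so the following is a userspace approximation under stated assumptions: `err_symbol()` and `format_err()` are hypothetical helpers, and the table covers only a handful of errno values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup of a symbolic name for a few errno values. */
static const char *err_symbol(long err)
{
	switch (err < 0 ? -err : err) {
	case ENOMEM: return "ENOMEM";
	case ENXIO:  return "ENXIO";
	case EBUSY:  return "EBUSY";
	case EINVAL: return "EINVAL";
	case ENOSPC: return "ENOSPC";
	default:     return NULL;
	}
}

/*
 * Format a negative errno value the way a symbolic printer would:
 * "-ENOMEM" when the name is known, the plain number otherwise.
 */
static void format_err(long err, char *buf, size_t len)
{
	const char *sym = err_symbol(err);

	if (sym)
		snprintf(buf, len, "-%s", sym);
	else
		snprintf(buf, len, "%ld", err);
}
```

With this sketch, a log line such as "p3 could not be added: 12" would instead read "p3 could not be added: -ENOMEM", which is the readability gain the commit is after.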

+25 -27

drivers/block/loop.c

···
 	return get_size(lo->lo_offset, lo->lo_sizelimit, file);
 }
 
+/*
+ * We support direct I/O only if lo_offset is aligned with the logical I/O size
+ * of backing device, and the logical block size of loop is bigger than that of
+ * the backing device.
+ */
+static bool lo_bdev_can_use_dio(struct loop_device *lo,
+		struct block_device *backing_bdev)
+{
+	unsigned short sb_bsize = bdev_logical_block_size(backing_bdev);
+
+	if (queue_logical_block_size(lo->lo_queue) < sb_bsize)
+		return false;
+	if (lo->lo_offset & (sb_bsize - 1))
+		return false;
+	return true;
+}
+
 static void __loop_update_dio(struct loop_device *lo, bool dio)
 {
 	struct file *file = lo->lo_backing_file;
-	struct address_space *mapping = file->f_mapping;
-	struct inode *inode = mapping->host;
-	unsigned short sb_bsize = 0;
-	unsigned dio_align = 0;
+	struct inode *inode = file->f_mapping->host;
+	struct block_device *backing_bdev = NULL;
 	bool use_dio;
 
-	if (inode->i_sb->s_bdev) {
-		sb_bsize = bdev_logical_block_size(inode->i_sb->s_bdev);
-		dio_align = sb_bsize - 1;
-	}
+	if (S_ISBLK(inode->i_mode))
+		backing_bdev = I_BDEV(inode);
+	else if (inode->i_sb->s_bdev)
+		backing_bdev = inode->i_sb->s_bdev;
 
-	/*
-	 * We support direct I/O only if lo_offset is aligned with the
-	 * logical I/O size of backing device, and the logical block
-	 * size of loop is bigger than the backing device's.
-	 *
-	 * TODO: the above condition may be loosed in the future, and
-	 * direct I/O may be switched runtime at that time because most
-	 * of requests in sane applications should be PAGE_SIZE aligned
-	 */
-	if (dio) {
-		if (queue_logical_block_size(lo->lo_queue) >= sb_bsize &&
-		    !(lo->lo_offset & dio_align) &&
-		    (file->f_mode & FMODE_CAN_ODIRECT))
-			use_dio = true;
-		else
-			use_dio = false;
-	} else {
-		use_dio = false;
-	}
+	use_dio = dio && (file->f_mode & FMODE_CAN_ODIRECT) &&
+		(!backing_bdev || lo_bdev_can_use_dio(lo, backing_bdev));
 
 	if (lo->use_dio == use_dio)
 		return;
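The new lo_bdev_can_use_dio() helper boils down to two checks: the loop device's logical block size must not be smaller than the backing device's, and the loop offset must be aligned to the backing block size. A standalone sketch of the same arithmetic, assuming power-of-two block sizes and using a hypothetical `can_use_dio()` that takes plain integers instead of kernel structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Decide whether direct I/O is usable, mirroring the two checks in the
 * diff. backing_bsize is assumed to be a non-zero power of two, so
 * (backing_bsize - 1) works as an alignment mask.
 */
static bool can_use_dio(unsigned int loop_bsize, unsigned int backing_bsize,
			uint64_t offset)
{
	assert(backing_bsize && (backing_bsize & (backing_bsize - 1)) == 0);

	if (loop_bsize < backing_bsize)
		return false;		/* loop blocks smaller than backing blocks */
	if (offset & (backing_bsize - 1))
		return false;		/* offset not aligned to backing block size */
	return true;
}
```

For example, a 4096-byte loop device on a 512-byte backing device at offset 1024 passes both checks, while a 512-byte loop device on a 4096-byte backing device does not.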

+1 -5

drivers/block/nbd.c

···
 			 struct iov_iter *iter, int msg_flags, int *sent)
 {
 	int result;
-	struct msghdr msg;
+	struct msghdr msg = {} ;
 	unsigned int noreclaim_flag;
 
 	if (unlikely(!sock)) {
···
 	do {
 		sock->sk->sk_allocation = GFP_NOIO | __GFP_MEMALLOC;
 		sock->sk->sk_use_task_frag = false;
-		msg.msg_name = NULL;
-		msg.msg_namelen = 0;
-		msg.msg_control = NULL;
-		msg.msg_controllen = 0;
 		msg.msg_flags = msg_flags | MSG_NOSIGNAL;
 
 		if (send)
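The nbd.c change zero-initializes the whole struct msghdr at declaration instead of clearing selected fields by hand, so fields added to the structure later cannot be left uninitialized. A userspace sketch of the same pattern, assuming the POSIX struct msghdr from <sys/socket.h> (the kernel's structure has additional members) and a hypothetical `make_msg()` helper:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Build a message header with every member zeroed, then set only the
 * fields the caller cares about. "{0}" is the portable C spelling; the
 * kernel diff uses the GNU/C23 "{}" form.
 */
static struct msghdr make_msg(int flags)
{
	struct msghdr msg = {0};

	msg.msg_flags = flags | MSG_NOSIGNAL;
	return msg;
}
```

The design point is that the initializer covers members the author never mentions, whereas the removed per-field assignments covered only the four that were listed.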

+2 -2

drivers/block/null_blk/main.c

···
 
 	dev = nullb->dev;
 
-	ida_simple_remove(&nullb_indexes, nullb->index);
+	ida_free(&nullb_indexes, nullb->index);
 
 	list_del_init(&nullb->list);
 
···
 	blk_queue_flag_set(QUEUE_FLAG_NONROT, nullb->q);
 
 	mutex_lock(&lock);
-	rv = ida_simple_get(&nullb_indexes, 0, 0, GFP_KERNEL);
+	rv = ida_alloc(&nullb_indexes, GFP_KERNEL);
 	if (rv < 0) {
 		mutex_unlock(&lock);
 		goto out_cleanup_zone;

-2

drivers/block/virtio_blk.c

···
 				blk_mq_complete_request(req);
 			req_done = true;
 		}
-		if (unlikely(virtqueue_is_broken(vq)))
-			break;
 	} while (!virtqueue_enable_cb(vq));
 
 	/* In case queue is stopped waiting for more buffers. */

+27 -13

drivers/md/md.c

···
 	seq_printf(seq, "\n");
 }
 
+static void status_personalities(struct seq_file *seq)
+{
+	struct md_personality *pers;
+
+	seq_puts(seq, "Personalities : ");
+	spin_lock(&pers_lock);
+	list_for_each_entry(pers, &pers_list, list)
+		seq_printf(seq, "[%s] ", pers->name);
+
+	spin_unlock(&pers_lock);
+	seq_puts(seq, "\n");
+}
+
 static int status_resync(struct seq_file *seq, struct mddev *mddev)
 {
 	sector_t max_sectors, resync, res;
···
 static void *md_seq_start(struct seq_file *seq, loff_t *pos)
 	__acquires(&all_mddevs_lock)
 {
-	struct md_personality *pers;
-
-	seq_puts(seq, "Personalities : ");
-	spin_lock(&pers_lock);
-	list_for_each_entry(pers, &pers_list, list)
-		seq_printf(seq, "[%s] ", pers->name);
-
-	spin_unlock(&pers_lock);
-	seq_puts(seq, "\n");
 	seq->poll_event = atomic_read(&md_event_count);
-
 	spin_lock(&all_mddevs_lock);
 
-	return seq_list_start(&all_mddevs, *pos);
+	return seq_list_start_head(&all_mddevs, *pos);
 }
 
 static void *md_seq_next(struct seq_file *seq, void *v, loff_t *pos)
···
 static void md_seq_stop(struct seq_file *seq, void *v)
 	__releases(&all_mddevs_lock)
 {
-	status_unused(seq);
 	spin_unlock(&all_mddevs_lock);
 }
 
 static int md_seq_show(struct seq_file *seq, void *v)
 {
-	struct mddev *mddev = list_entry(v, struct mddev, all_mddevs);
+	struct mddev *mddev;
 	sector_t sectors;
 	struct md_rdev *rdev;
 
+	if (v == &all_mddevs) {
+		status_personalities(seq);
+		if (list_empty(&all_mddevs))
+			status_unused(seq);
+		return 0;
+	}
+
+	mddev = list_entry(v, struct mddev, all_mddevs);
 	if (!mddev_get(mddev))
 		return 0;
 
···
 	}
 	spin_unlock(&mddev->lock);
 	spin_lock(&all_mddevs_lock);
+
+	if (mddev == list_last_entry(&all_mddevs, struct mddev, all_mddevs))
+		status_unused(seq);
+
 	if (atomic_dec_and_test(&mddev->active))
 		__mddev_put(mddev);
 

+6 -6

drivers/md/raid1.c

···
 }
 
 static int r1_sync_page_io(struct md_rdev *rdev, sector_t sector,
-			   int sectors, struct page *page, int rw)
+			   int sectors, struct page *page, blk_opf_t rw)
 {
 	if (sync_page_io(rdev, sector, sectors << 9, page, rw, false))
 		/* success */
 		return 1;
-	if (rw == WRITE) {
+	if (rw == REQ_OP_WRITE) {
 		set_bit(WriteErrorSeen, &rdev->flags);
 		if (!test_and_set_bit(WantReplacement,
 				      &rdev->flags))
···
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
 					    pages[idx],
-					    WRITE) == 0) {
+					    REQ_OP_WRITE) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
 			}
···
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
 					    pages[idx],
-					    READ) != 0)
+					    REQ_OP_READ) != 0)
 				atomic_add(s, &rdev->corrected_errors);
 		}
 		sectors -= s;
···
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				r1_sync_page_io(rdev, sect, s,
-						conf->tmppage, WRITE);
+						conf->tmppage, REQ_OP_WRITE);
 				rdev_dec_pending(rdev, mddev);
 			}
 		}
···
 			    !test_bit(Faulty, &rdev->flags)) {
 				atomic_inc(&rdev->nr_pending);
 				if (r1_sync_page_io(rdev, sect, s,
-						    conf->tmppage, READ)) {
+						    conf->tmppage, REQ_OP_READ)) {
 					atomic_add(s, &rdev->corrected_errors);
 					pr_info("md/raid1:%s: read error corrected (%d sectors at %llu on %pg)\n",
 						mdname(mddev), s,

+1 -1

drivers/nvme/common/keyring.c

···
  * should be preferred to 'generated' PSKs,
  * and SHA-384 should be preferred to SHA-256.
  */
-struct nvme_tls_psk_priority_list {
+static struct nvme_tls_psk_priority_list {
 	bool generated;
 	enum nvme_tcp_tls_cipher cipher;
 } nvme_tls_psk_prio[] = {

+20 -21

drivers/nvme/host/core.c

···
 		struct nvme_ns_head *head)
 {
 	struct request_queue *queue = disk->queue;
-	u32 size = queue_logical_block_size(queue);
+	u32 max_discard_sectors;
 
-	if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(head, UINT_MAX))
-		ctrl->max_discard_sectors =
-			nvme_lba_to_sect(head, ctrl->dmrsl);
-
-	if (ctrl->max_discard_sectors == 0) {
+	if (ctrl->dmrsl && ctrl->dmrsl <= nvme_sect_to_lba(head, UINT_MAX)) {
+		max_discard_sectors = nvme_lba_to_sect(head, ctrl->dmrsl);
+	} else if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
+		max_discard_sectors = UINT_MAX;
+	} else {
 		blk_queue_max_discard_sectors(queue, 0);
 		return;
 	}
···
 	BUILD_BUG_ON(PAGE_SIZE / sizeof(struct nvme_dsm_range) <
 			NVME_DSM_MAX_RANGES);
 
-	queue->limits.discard_granularity = size;
-
-	/* If discard is already enabled, don't reset queue limits */
+	/*
+	 * If discard is already enabled, don't reset queue limits.
+	 *
+	 * This works around the fact that the block layer can't cope well with
+	 * updating the hardware limits when overridden through sysfs. This is
+	 * harmless because discard limits in NVMe are purely advisory.
+	 */
 	if (queue->limits.max_discard_sectors)
 		return;
 
-	blk_queue_max_discard_sectors(queue, ctrl->max_discard_sectors);
-	blk_queue_max_discard_segments(queue, ctrl->max_discard_segments);
+	blk_queue_max_discard_sectors(queue, max_discard_sectors);
+	if (ctrl->dmrl)
+		blk_queue_max_discard_segments(queue, ctrl->dmrl);
+	else
+		blk_queue_max_discard_segments(queue, NVME_DSM_MAX_RANGES);
+	queue->limits.discard_granularity = queue_logical_block_size(queue);
 
 	if (ctrl->quirks & NVME_QUIRK_DEALLOCATE_ZEROES)
 		blk_queue_max_write_zeroes_sectors(queue, UINT_MAX);
···
 	struct nvme_id_ctrl_nvm *id;
 	int ret;
 
-	if (ctrl->oncs & NVME_CTRL_ONCS_DSM) {
-		ctrl->max_discard_sectors = UINT_MAX;
-		ctrl->max_discard_segments = NVME_DSM_MAX_RANGES;
-	} else {
-		ctrl->max_discard_sectors = 0;
-		ctrl->max_discard_segments = 0;
-	}
-
 	/*
 	 * Even though NVMe spec explicitly states that MDTS is not applicable
 	 * to the write-zeroes, we are cautious and limit the size to the
···
 	if (ret)
 		goto free_data;
 
-	if (id->dmrl)
-		ctrl->max_discard_segments = id->dmrl;
+	ctrl->dmrl = id->dmrl;
 	ctrl->dmrsl = le32_to_cpu(id->dmrsl);
 	if (id->wzsl)
 		ctrl->max_zeroes_sectors = nvme_mps_to_sectors(ctrl, id->wzsl);
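The reworked discard configuration in core.c picks the limit in three steps: use the controller's DMRSL value when it is set and fits in 32 bits of 512-byte sectors, otherwise fall back to "unlimited" when the controller supports DSM, otherwise disable discard. A small sketch of that selection, where `pick_max_discard_sectors()` is a hypothetical function and the LBA-to-sector conversion is modelled as a plain shift (assuming lba_shift is at least 9):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9	/* 512-byte sectors, as in the block layer */

/*
 * Return the discard limit in 512-byte sectors, or 0 to disable discard.
 *   dmrsl     - controller limit in logical blocks (0 means not reported)
 *   lba_shift - log2 of the logical block size, assumed >= SECTOR_SHIFT
 *   has_dsm   - whether the controller advertises Dataset Management
 */
static uint32_t pick_max_discard_sectors(uint32_t dmrsl, unsigned int lba_shift,
					 bool has_dsm)
{
	unsigned int shift = lba_shift - SECTOR_SHIFT;

	assert(lba_shift >= SECTOR_SHIFT);

	/* Use DMRSL only if converting it to sectors cannot overflow. */
	if (dmrsl && dmrsl <= (UINT32_MAX >> shift))
		return dmrsl << shift;
	if (has_dsm)
		return UINT32_MAX;
	return 0;
}
```

For a 4096-byte logical block (lba_shift of 12) and a DMRSL of 8 blocks this yields 64 sectors. Computing the value locally, rather than caching it in the controller structure as the removed code did, is what lets the diff drop the `max_discard_sectors` and `max_discard_segments` fields.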

+13 -3

drivers/nvme/host/nvme.h

···
 	u32 max_hw_sectors;
 	u32 max_segments;
 	u32 max_integrity_segments;
-	u32 max_discard_sectors;
-	u32 max_discard_segments;
 	u32 max_zeroes_sectors;
 #ifdef CONFIG_BLK_DEV_ZONED
 	u32 max_zone_append;
 #endif
 	u16 crdt[3];
 	u16 oncs;
+	u8 dmrl;
 	u32 dmrsl;
 	u16 oacs;
 	u16 sqsize;
···
 extern struct device_attribute dev_attr_ana_state;
 extern struct device_attribute subsys_attr_iopolicy;
 
+static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
+{
+	return disk->fops == &nvme_ns_head_ops;
+}
 #else
 #define multipath false
 static inline bool nvme_ctrl_use_ana(struct nvme_ctrl *ctrl)
···
 static inline void nvme_mpath_end_request(struct request *rq)
 {
 }
+static inline bool nvme_disk_is_ns_head(struct gendisk *disk)
+{
+	return false;
+}
 #endif /* CONFIG_NVME_MULTIPATH */
 
 int nvme_revalidate_zones(struct nvme_ns *ns);
···
 
 static inline struct nvme_ns *nvme_get_ns_from_dev(struct device *dev)
 {
-	return dev_to_disk(dev)->private_data;
+	struct gendisk *disk = dev_to_disk(dev);
+
+	WARN_ON(nvme_disk_is_ns_head(disk));
+	return disk->private_data;
 }
 
 #ifdef CONFIG_NVME_HWMON

+16 -11

drivers/nvme/host/pci.c

···
 	struct request *abort_req;
 	struct nvme_command cmd = { };
 	u32 csts = readl(dev->bar + NVME_REG_CSTS);
+	u8 opcode;
 
 	/* If PCI error recovery process is happening, we cannot reset or
 	 * the recovery mechanism will surely fail.
···
 
 	if (blk_mq_rq_state(req) != MQ_RQ_IN_FLIGHT) {
 		dev_warn(dev->ctrl.device,
-			 "I/O %d QID %d timeout, completion polled\n",
-			 req->tag, nvmeq->qid);
+			 "I/O tag %d (%04x) QID %d timeout, completion polled\n",
+			 req->tag, nvme_cid(req), nvmeq->qid);
 		return BLK_EH_DONE;
 	}
 
···
 		fallthrough;
 	case NVME_CTRL_DELETING:
 		dev_warn_ratelimited(dev->ctrl.device,
-			 "I/O %d QID %d timeout, disable controller\n",
-			 req->tag, nvmeq->qid);
+			 "I/O tag %d (%04x) QID %d timeout, disable controller\n",
+			 req->tag, nvme_cid(req), nvmeq->qid);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		nvme_dev_disable(dev, true);
 		return BLK_EH_DONE;
···
 	 * command was already aborted once before and still hasn't been
 	 * returned to the driver, or if this is the admin queue.
 	 */
+	opcode = nvme_req(req)->cmd->common.opcode;
 	if (!nvmeq->qid || iod->aborted) {
 		dev_warn(dev->ctrl.device,
-			 "I/O %d QID %d timeout, reset controller\n",
-			 req->tag, nvmeq->qid);
+			 "I/O tag %d (%04x) opcode %#x (%s) QID %d timeout, reset controller\n",
+			 req->tag, nvme_cid(req), opcode,
+			 nvme_opcode_str(nvmeq->qid, opcode, 0), nvmeq->qid);
 		nvme_req(req)->flags |= NVME_REQ_CANCELLED;
 		goto disable;
 	}
···
 	cmd.abort.sqid = cpu_to_le16(nvmeq->qid);
 
 	dev_warn(nvmeq->dev->ctrl.device,
-		"I/O %d (%s) QID %d timeout, aborting\n",
-		 req->tag,
-		 nvme_get_opcode_str(nvme_req(req)->cmd->common.opcode),
-		 nvmeq->qid);
+		 "I/O tag %d (%04x) opcode %#x (%s) QID %d timeout, aborting req_op:%s(%u) size:%u\n",
+		 req->tag, nvme_cid(req), opcode, nvme_get_opcode_str(opcode),
+		 nvmeq->qid, blk_op_str(req_op(req)), req_op(req),
+		 blk_rq_bytes(req));
 
 	abort_req = blk_mq_alloc_request(dev->ctrl.admin_q, nvme_req_op(&cmd),
 					 BLK_MQ_REQ_NOWAIT);
···
 	 * controller around but remove all namespaces.
 	 */
 	if (dev->online_queues > 1) {
+		nvme_dbbuf_set(dev);
 		nvme_unquiesce_io_queues(&dev->ctrl);
 		nvme_wait_freeze(&dev->ctrl);
 		nvme_pci_update_nr_queues(dev);
-		nvme_dbbuf_set(dev);
 		nvme_unfreeze(&dev->ctrl);
 	} else {
 		dev_warn(dev->ctrl.device, "IO queues lost\n");
···
 		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
 	{ PCI_DEVICE(0x1c5c, 0x174a), /* SK Hynix P31 SSD */
 		.driver_data = NVME_QUIRK_BOGUS_NID, },
+	{ PCI_DEVICE(0x1c5c, 0x1D59), /* SK Hynix BC901 */
+		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
 	{ PCI_DEVICE(0x15b7, 0x2001), /* Sandisk Skyhawk */
 		.driver_data = NVME_QUIRK_DISABLE_WRITE_ZEROES, },
 	{ PCI_DEVICE(0x1d97, 0x2263), /* SPCC */

+1 -1

drivers/nvme/host/pr.c

···
 		struct nvme_command *c, void *data, unsigned int data_len)
 {
 	if (IS_ENABLED(CONFIG_NVME_MULTIPATH) &&
-	    bdev->bd_disk->fops == &nvme_ns_head_ops)
+	    nvme_disk_is_ns_head(bdev->bd_disk))
 		return nvme_send_ns_head_pr_command(bdev, c, data, data_len);
 
 	return nvme_send_ns_pr_command(bdev->bd_disk->private_data, c, data,

+7 -2

drivers/nvme/host/rdma.c

···
 	struct nvme_rdma_request *req = blk_mq_rq_to_pdu(rq);
 	struct nvme_rdma_queue *queue = req->queue;
 	struct nvme_rdma_ctrl *ctrl = queue->ctrl;
+	u8 opcode = req->req.cmd->common.opcode;
+	u8 fctype = req->req.cmd->fabrics.fctype;
+	int qid = nvme_rdma_queue_idx(queue);
 
-	dev_warn(ctrl->ctrl.device, "I/O %d QID %d timeout\n",
-		 rq->tag, nvme_rdma_queue_idx(queue));
+	dev_warn(ctrl->ctrl.device,
+		 "I/O tag %d (%04x) opcode %#x (%s) QID %d timeout\n",
+		 rq->tag, nvme_cid(rq), opcode,
+		 nvme_opcode_str(qid, opcode, fctype), qid);
 
 	if (nvme_ctrl_state(&ctrl->ctrl) != NVME_CTRL_LIVE) {
 		/*

+4 -4

drivers/nvme/host/sysfs.c

···
 {
 	struct gendisk *disk = dev_to_disk(dev);
 
-	if (disk->fops == &nvme_bdev_ops)
-		return nvme_get_ns_from_dev(dev)->head;
-	else
+	if (nvme_disk_is_ns_head(disk))
 		return disk->private_data;
+	return nvme_get_ns_from_dev(dev)->head;
 }
 
 static ssize_t wwid_show(struct device *dev, struct device_attribute *attr,
···
 	}
 #ifdef CONFIG_NVME_MULTIPATH
 	if (a == &dev_attr_ana_grpid.attr || a == &dev_attr_ana_state.attr) {
-		if (dev_to_disk(dev)->fops != &nvme_bdev_ops) /* per-path attr */
+		/* per-path attr */
+		if (nvme_disk_is_ns_head(dev_to_disk(dev)))
 			return 0;
 		if (!nvme_ctrl_use_ana(nvme_get_ns_from_dev(dev)->ctrl))
 			return 0;

+5 -6

drivers/nvme/host/tcp.c

···
 						      ctrl->opts->subsysnqn);
 		if (!pskid) {
 			dev_err(ctrl->device, "no valid PSK found\n");
-			ret = -ENOKEY;
-			goto out_free_queue;
+			return -ENOKEY;
 		}
 	}
 
 	ret = nvme_tcp_alloc_queue(ctrl, 0, pskid);
 	if (ret)
-		goto out_free_queue;
+		return ret;
 
 	ret = nvme_tcp_alloc_async_req(to_tcp_ctrl(ctrl));
 	if (ret)
···
 	int qid = nvme_tcp_queue_id(req->queue);
 
 	dev_warn(ctrl->device,
-		"queue %d: timeout cid %#x type %d opcode %#x (%s)\n",
-		nvme_tcp_queue_id(req->queue), nvme_cid(rq), pdu->hdr.type,
-		opc, nvme_opcode_str(qid, opc, fctype));
+		 "I/O tag %d (%04x) type %d opcode %#x (%s) QID %d timeout\n",
+		 rq->tag, nvme_cid(rq), pdu->hdr.type, opc,
+		 nvme_opcode_str(qid, opc, fctype), qid);
 
 	if (nvme_ctrl_state(ctrl) != NVME_CTRL_LIVE) {
 		/*

+1 -1

drivers/nvme/target/fc.c

···
 	list_for_each_entry(host, &tgtport->host_list, host_list) {
 		if (host->hosthandle == hosthandle && !host->invalid) {
 			if (nvmet_fc_hostport_get(host))
-				return (host);
+				return host;
 		}
 	}
 

+2 -5

drivers/nvme/target/fcloop.c

···
 {
 	struct fcloop_nport *nport =
 		container_of(ref, struct fcloop_nport, ref);
-	unsigned long flags;
-
-	spin_lock_irqsave(&fcloop_lock, flags);
-	list_del(&nport->nport_list);
-	spin_unlock_irqrestore(&fcloop_lock, flags);
 
 	kfree(nport);
 }
···
 	if (rport && nport->tport)
 		nport->tport->remoteport = NULL;
 	nport->rport = NULL;
+
+	list_del(&nport->nport_list);
 
 	return rport;
 }

+16 -3

drivers/nvme/target/rdma.c

···
 #define NVMET_RDMA_MAX_MDTS 8
 #define NVMET_RDMA_MAX_METADATA_MDTS 5
 
+#define NVMET_RDMA_BACKLOG 128
+
 struct nvmet_rdma_srq;
 
 struct nvmet_rdma_cmd {
···
 	}
 
 	if (queue->host_qid == 0) {
-		/* Let inflight controller teardown complete */
-		flush_workqueue(nvmet_wq);
+		struct nvmet_rdma_queue *q;
+		int pending = 0;
+
+		/* Check for pending controller teardown */
+		mutex_lock(&nvmet_rdma_queue_mutex);
+		list_for_each_entry(q, &nvmet_rdma_queue_list, queue_list) {
+			if (q->nvme_sq.ctrl == queue->nvme_sq.ctrl &&
+			    q->state == NVMET_RDMA_Q_DISCONNECTING)
+				pending++;
+		}
+		mutex_unlock(&nvmet_rdma_queue_mutex);
+		if (pending > NVMET_RDMA_BACKLOG)
+			return NVME_SC_CONNECT_CTRL_BUSY;
 	}
 
 	ret = nvmet_rdma_cm_accept(cm_id, queue, &event->param.conn);
···
 		goto out_destroy_id;
 	}
 
-	ret = rdma_listen(cm_id, 128);
+	ret = rdma_listen(cm_id, NVMET_RDMA_BACKLOG);
 	if (ret) {
 		pr_err("listening to %pISpcs failed (%d)\n", addr, ret);
 		goto out_destroy_id;

+37 -11

drivers/nvme/target/tcp.c

···
 #include "nvmet.h"
 
 #define NVMET_TCP_DEF_INLINE_DATA_SIZE (4 * PAGE_SIZE)
+#define NVMET_TCP_MAXH2CDATA 0x400000 /* 16M arbitrary limit */
+#define NVMET_TCP_BACKLOG 128
 
 static int param_store_val(const char *str, int *val, int min, int max)
 {
···
 	icresp->hdr.pdo = 0;
 	icresp->hdr.plen = cpu_to_le32(icresp->hdr.hlen);
 	icresp->pfv = cpu_to_le16(NVME_TCP_PFV_1_0);
-	icresp->maxdata = cpu_to_le32(0x400000); /* 16M arbitrary limit */
+	icresp->maxdata = cpu_to_le32(NVMET_TCP_MAXH2CDATA);
 	icresp->cpda = 0;
 	if (queue->hdr_digest)
 		icresp->digest |= NVME_TCP_HDR_DIGEST_ENABLE;
···
 {
 	struct nvme_tcp_data_pdu *data = &queue->pdu.data;
 	struct nvmet_tcp_cmd *cmd;
+	unsigned int exp_data_len;
 
 	if (likely(queue->nr_cmds)) {
 		if (unlikely(data->ttag >= queue->nr_cmds)) {
 			pr_err("queue %d: received out of bound ttag %u, nr_cmds %u\n",
 				queue->idx, data->ttag, queue->nr_cmds);
-			nvmet_tcp_fatal_error(queue);
-			return -EPROTO;
+			goto err_proto;
 		}
 		cmd = &queue->cmds[data->ttag];
 	} else {
···
 		pr_err("ttag %u unexpected data offset %u (expected %u)\n",
 			data->ttag, le32_to_cpu(data->data_offset),
 			cmd->rbytes_done);
-		/* FIXME: use path and transport errors */
-		nvmet_req_complete(&cmd->req,
-			NVME_SC_INVALID_FIELD | NVME_SC_DNR);
-		return -EPROTO;
+		goto err_proto;
 	}
 
+	exp_data_len = le32_to_cpu(data->hdr.plen) -
+			nvmet_tcp_hdgst_len(queue) -
+			nvmet_tcp_ddgst_len(queue) -
+			sizeof(*data);
+
 	cmd->pdu_len = le32_to_cpu(data->data_length);
+	if (unlikely(cmd->pdu_len != exp_data_len ||
+		     cmd->pdu_len == 0 ||
+		     cmd->pdu_len > NVMET_TCP_MAXH2CDATA)) {
+		pr_err("H2CData PDU len %u is invalid\n", cmd->pdu_len);
+		goto err_proto;
+	}
 	cmd->pdu_recv = 0;
 	nvmet_tcp_build_pdu_iovec(cmd);
 	queue->cmd = cmd;
 	queue->rcv_state = NVMET_TCP_RECV_DATA;
 
 	return 0;
+
+err_proto:
+	/* FIXME: use proper transport errors */
+	nvmet_tcp_fatal_error(queue);
+	return -EPROTO;
 }
 
 static int nvmet_tcp_done_recv_pdu(struct nvmet_tcp_queue *queue)
···
 			(int)sizeof(struct nvme_tcp_icreq_pdu));
 	if (hdr->type == nvme_tcp_icreq &&
 	    hdr->hlen == sizeof(struct nvme_tcp_icreq_pdu) &&
-	    hdr->plen == (__le32)sizeof(struct nvme_tcp_icreq_pdu)) {
+	    hdr->plen == cpu_to_le32(sizeof(struct nvme_tcp_icreq_pdu))) {
 		pr_debug("queue %d: icreq detected\n",
 			queue->idx);
 		return len;
···
 		goto err_sock;
 	}
 
-	ret = kernel_listen(port->sock, 128);
+	ret = kernel_listen(port->sock, NVMET_TCP_BACKLOG);
 	if (ret) {
 		pr_err("failed to listen %d on port sock\n", ret);
 		goto err_sock;
···
 		container_of(sq, struct nvmet_tcp_queue, nvme_sq);
 
 	if (sq->qid == 0) {
-		/* Let inflight controller teardown complete */
-		flush_workqueue(nvmet_wq);
+		struct nvmet_tcp_queue *q;
+		int pending = 0;
+
+		/* Check for pending controller teardown */
+		mutex_lock(&nvmet_tcp_queue_mutex);
+		list_for_each_entry(q, &nvmet_tcp_queue_list, queue_list) {
+			if (q->nvme_sq.ctrl == sq->ctrl &&
+			    q->state == NVMET_TCP_Q_DISCONNECTING)
+				pending++;
+		}
+		mutex_unlock(&nvmet_tcp_queue_mutex);
+		if (pending > NVMET_TCP_BACKLOG)
+			return NVME_SC_CONNECT_CTRL_BUSY;
 	}
 
 	queue->nr_cmds = sq->size * 2;

+3 -3

drivers/nvme/target/trace.c

···
 	return ret;
 }
 
-const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl)
+const char *nvmet_trace_ctrl_id(struct trace_seq *p, u16 ctrl_id)
 {
 	const char *ret = trace_seq_buffer_ptr(p);
 
···
 	 * If we can know the extra data of the connect command in this stage,
 	 * we can update this print statement later.
 	 */
-	if (ctrl)
-		trace_seq_printf(p, "%d", ctrl->cntlid);
+	if (ctrl_id)
+		trace_seq_printf(p, "%d", ctrl_id);
 	else
 		trace_seq_printf(p, "_");
 	trace_seq_putc(p, 0);

+19 -14

drivers/nvme/target/trace.h

···
 	 nvmet_trace_parse_nvm_cmd(p, opcode, cdw10) :		\
 	 nvmet_trace_parse_admin_cmd(p, opcode, cdw10)))
 
-const char *nvmet_trace_ctrl_name(struct trace_seq *p, struct nvmet_ctrl *ctrl);
-#define __print_ctrl_name(ctrl)				\
-	nvmet_trace_ctrl_name(p, ctrl)
+const char *nvmet_trace_ctrl_id(struct trace_seq *p, u16 ctrl_id);
+#define __print_ctrl_id(ctrl_id)			\
+	nvmet_trace_ctrl_id(p, ctrl_id)
 
 const char *nvmet_trace_disk_name(struct trace_seq *p, char *name);
 #define __print_disk_name(name)				\
 	nvmet_trace_disk_name(p, name)
 
 #ifndef TRACE_HEADER_MULTI_READ
-static inline struct nvmet_ctrl *nvmet_req_to_ctrl(struct nvmet_req *req)
+static inline u16 nvmet_req_to_ctrl_id(struct nvmet_req *req)
 {
-	return req->sq->ctrl;
+	/*
+	 * The queue and controller pointers are not valid until an association
+	 * has been established.
+	 */
+	if (!req->sq || !req->sq->ctrl)
+		return 0;
+	return req->sq->ctrl->cntlid;
 }
 
 static inline void __assign_req_name(char *name, struct nvmet_req *req)
···
 		return;
 	}
 
-	strncpy(name, req->ns->device_path,
-		min_t(size_t, DISK_NAME_LEN, strlen(req->ns->device_path)));
+	strscpy_pad(name, req->ns->device_path, DISK_NAME_LEN);
 }
 #endif
 
···
 	TP_ARGS(req, cmd),
 	TP_STRUCT__entry(
 		__field(struct nvme_command *, cmd)
-		__field(struct nvmet_ctrl *, ctrl)
+		__field(u16, ctrl_id)
 		__array(char, disk, DISK_NAME_LEN)
 		__field(int, qid)
 		__field(u16, cid)
···
 	),
 	TP_fast_assign(
 		__entry->cmd = cmd;
-		__entry->ctrl = nvmet_req_to_ctrl(req);
+		__entry->ctrl_id = nvmet_req_to_ctrl_id(req);
 		__assign_req_name(__entry->disk, req);
 		__entry->qid = req->sq->qid;
 		__entry->cid = cmd->common.command_id;
···
 		__entry->flags = cmd->common.flags;
 		__entry->nsid = le32_to_cpu(cmd->common.nsid);
 		__entry->metadata = le64_to_cpu(cmd->common.metadata);
-		memcpy(__entry->cdw10, &cmd->common.cdw10,
+		memcpy(__entry->cdw10, &cmd->common.cdws,
 			sizeof(__entry->cdw10));
 	),
 	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, nsid=%u, flags=%#x, "
 		  "meta=%#llx, cmd=(%s, %s)",
-		__print_ctrl_name(__entry->ctrl),
+		__print_ctrl_id(__entry->ctrl_id),
 		__print_disk_name(__entry->disk),
 		__entry->qid, __entry->cid, __entry->nsid,
 		__entry->flags, __entry->metadata,
···
 	TP_PROTO(struct nvmet_req *req),
 	TP_ARGS(req),
 	TP_STRUCT__entry(
-		__field(struct nvmet_ctrl *, ctrl)
+		__field(u16, ctrl_id)
 		__array(char, disk, DISK_NAME_LEN)
 		__field(int, qid)
 		__field(int, cid)
···
 		__field(u16, status)
 	),
 	TP_fast_assign(
-		__entry->ctrl = nvmet_req_to_ctrl(req);
+		__entry->ctrl_id = nvmet_req_to_ctrl_id(req);
 		__entry->qid = req->cq->qid;
 		__entry->cid = req->cqe->command_id;
 		__entry->result = le64_to_cpu(req->cqe->result.u64);
···
 		__assign_req_name(__entry->disk, req);
 	),
 	TP_printk("nvmet%s: %sqid=%d, cmdid=%u, res=%#llx, status=%#x",
-		__print_ctrl_name(__entry->ctrl),
+		__print_ctrl_id(__entry->ctrl_id),
 		__print_disk_name(__entry->disk),
 		__entry->qid, __entry->cid, __entry->result, __entry->status)
 

+6 -3

include/linux/bio.h

···
 {
 	struct bio_vec *bvec = bio_first_bvec_all(bio) + i;
 
+	if (unlikely(i >= bio->bi_vcnt)) {
+		fi->folio = NULL;
+		return;
+	}
+
 	fi->folio = page_folio(bvec->bv_page);
 	fi->offset = bvec->bv_offset +
 			PAGE_SIZE * (bvec->bv_page - &fi->folio->page);
···
 		fi->offset = 0;
 		fi->length = min(folio_size(fi->folio), fi->_seg_count);
 		fi->_next = folio_next(fi->folio);
-	} else if (fi->_i + 1 < bio->bi_vcnt) {
-		bio_first_folio(fi, bio, fi->_i + 1);
 	} else {
-		fi->folio = NULL;
+		bio_first_folio(fi, bio, fi->_i + 1);
 	}
 }
 

-3

include/linux/blk-mq.h

···
 	 */
 	struct blk_mq_tags	*sched_tags;
 
-	/** @run: Number of dispatched requests. */
-	unsigned long		run;
-
 	/** @numa_node: NUMA node the storage adapter has been connected to. */
 	unsigned int		numa_node;
 	/** @queue_num: Index of this hardware queue. */

+24 -1

include/linux/ioprio.h

···
 }
 
 #ifdef CONFIG_BLOCK
-int __get_task_ioprio(struct task_struct *p);
+/*
+ * If the task has set an I/O priority, use that. Otherwise, return
+ * the default I/O priority.
+ *
+ * Expected to be called for current task or with task_lock() held to keep
+ * io_context stable.
+ */
+static inline int __get_task_ioprio(struct task_struct *p)
+{
+	struct io_context *ioc = p->io_context;
+	int prio;
+
+	if (!ioc)
+		return IOPRIO_DEFAULT;
+
+	if (p != current)
+		lockdep_assert_held(&p->alloc_lock);
+
+	prio = ioc->ioprio;
+	if (IOPRIO_PRIO_CLASS(prio) == IOPRIO_CLASS_NONE)
+		prio = IOPRIO_PRIO_VALUE(task_nice_ioclass(p),
+					 task_nice_ioprio(p));
+	return prio;
+}
 #else
 static inline int __get_task_ioprio(struct task_struct *p)
 {

-1

include/linux/nvme.h

···
 #define NVMF_TRSVCID_SIZE	32
 #define NVMF_TRADDR_SIZE	256
 #define NVMF_TSAS_SIZE		256
-#define NVMF_AUTH_HASH_LEN	64
 
 #define NVME_DISC_SUBSYS_NAME	"nqn.2014-08.org.nvmexpress.discovery"
 

-5

lib/sbitmap.c

··· 388 388 unsigned int shallow_depth; 389 389 390 390 /* 391 - * For each batch, we wake up one queue. We need to make sure that our 392 - * batch size is small enough that the full depth of the bitmap, 393 - * potentially limited by a shallow depth, is enough to wake up all of 394 - * the queues. 395 - * 396 391 * Each full word of the bitmap has bits_per_word bits, and there might 397 392 * be a partial word. There are depth / bits_per_word full words and 398 393 * depth % bits_per_word bits left over. In bitwise arithmetic: