block: blk-merge: fast-clone bio when splitting rw bios

biovecs have been immutable since v3.13, so there is no need to
allocate biovecs for the newly cloned bios. This saves one extra
biovec allocation and copy, and that allocation is often not
fixed-length and therefore a bit more expensive.
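
The idea of a fast clone can be modelled in userspace. The following is a minimal sketch, not kernel code: the model_* types and helpers are hypothetical stand-ins for struct bio_vec, struct bvec_iter and struct bio. It only illustrates that, with an immutable vec table, a split needs a struct copy and two iterator updates but no table allocation.

```c
#include <assert.h>

/* Hypothetical userspace stand-ins; these are NOT the kernel structures. */
struct model_vec  { unsigned len; };              /* bytes in one segment */
struct model_iter { unsigned size, idx, done; };  /* bytes left, current vec,
                                                     bytes consumed in it */
struct model_bio  { const struct model_vec *vecs; struct model_iter iter; };

/* Advance an iterator by `bytes`; the vec table itself is never written. */
static void model_advance(const struct model_vec *vecs,
                          struct model_iter *it, unsigned bytes)
{
    assert(bytes <= it->size);
    it->size -= bytes;
    bytes += it->done;
    while (bytes && bytes >= vecs[it->idx].len) {
        bytes -= vecs[it->idx].len;
        it->idx++;
    }
    it->done = bytes;
}

/*
 * Fast clone plus split: the returned front part shares bio->vecs, so no
 * vec table is allocated or copied. Only the two iterators differ.
 */
static struct model_bio model_split(struct model_bio *bio, unsigned bytes)
{
    struct model_bio front = *bio;           /* copies the pointer, not the table */

    assert(bytes > 0 && bytes < bio->iter.size);
    front.iter.size = bytes;                 /* front covers the first `bytes` */
    model_advance(bio->vecs, &bio->iter, bytes); /* original keeps the rest */
    return front;
}
```

In this model, splitting a 12288-byte bio made of three 4096-byte vecs at 6144 bytes leaves both parts pointing at the same table, with the remainder starting 2048 bytes into the second vec.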

For example, if 'max_sectors_kb' of null_blk's queue is set to
16 (32 sectors) via sysfs just to force more splits, this patch
increases throughput by about 70% in the sequential read test over
null_blk (direct I/O, bs: 1M).
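
The arithmetic behind that test setup can be checked with a small sketch. It assumes 512-byte sectors (matching the `>> 9` shifts in blk-merge.c), that each 1M I/O reaches the queue as a single bio, and that only the sector limit triggers splits; the helper names are hypothetical.

```c
#include <assert.h>

/* Convert KiB to 512-byte sectors. */
static unsigned kb_to_sectors(unsigned kb)
{
    return (kb * 1024u) >> 9;
}

/* Number of pieces one I/O of io_kb is cut into under max_sectors_kb. */
static unsigned pieces_per_io(unsigned io_kb, unsigned max_sectors_kb)
{
    unsigned io, max;

    assert(max_sectors_kb > 0);
    io = kb_to_sectors(io_kb);
    max = kb_to_sectors(max_sectors_kb);
    return (io + max - 1) / max;    /* round up */
}
```

Under those assumptions, kb_to_sectors(16) is 32 and pieces_per_io(1024, 16) is 64, so every 1M bio goes through the split path 63 times, which is why the cost of each clone shows up so clearly in this test.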

Cc: Christoph Hellwig <hch@infradead.org>
Cc: Kent Overstreet <kent.overstreet@gmail.com>
Cc: Ming Lin <ming.l@ssi.samsung.com>
Cc: Dongsu Park <dpark@posteo.net>
Signed-off-by: Ming Lei <ming.lei@canonical.com>

This fixes a performance regression introduced by commit 54efd50bfd,
and allows us to take full advantage of the fact that we have immutable
bio_vecs. Hand applied, as it rejected violently with commit
5014c311baa2.

Signed-off-by: Jens Axboe <axboe@fb.com>

authored by Ming Lei and committed by Jens Axboe 52cc6eea 6fe810bd

+4 -15
block/blk-merge.c
···
 					 struct bio *bio,
 					 struct bio_set *bs)
 {
-	struct bio *split;
 	struct bio_vec bv, bvprv, *bvprvp = NULL;
 	struct bvec_iter iter;
 	unsigned seg_size = 0, nsegs = 0, sectors = 0;
 
 	bio_for_each_segment(bv, bio, iter) {
-		sectors += bv.bv_len >> 9;
-
-		if (sectors > queue_max_sectors(q))
+		if (sectors + (bv.bv_len >> 9) > queue_max_sectors(q))
 			goto split;
 
 		/*
···
 			seg_size += bv.bv_len;
 			bvprv = bv;
 			bvprvp = &bv;
+			sectors += bv.bv_len >> 9;
 			continue;
 		}
 new_segment:
···
 		bvprv = bv;
 		bvprvp = &bv;
 		seg_size = bv.bv_len;
+		sectors += bv.bv_len >> 9;
 	}
 
 	return NULL;
 split:
-	split = bio_clone_bioset(bio, GFP_NOIO, bs);
-
-	split->bi_iter.bi_size -= iter.bi_size;
-	bio->bi_iter = iter;
-
-	if (bio_integrity(bio)) {
-		bio_integrity_advance(bio, split->bi_iter.bi_size);
-		bio_integrity_trim(split, 0, bio_sectors(split));
-	}
-
-	return split;
+	return bio_split(bio, sectors, GFP_NOIO, bs);
 }
 
 void blk_queue_split(struct request_queue *q, struct bio **bio,
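
The accounting change in the loop can be illustrated with a userspace sketch. This is a simplified model, not the kernel function: it keeps only the sector limit check and leaves out the segment merging and segment count logic, and find_split_point and its parameters are hypothetical names. The point is that `sectors` is incremented only after a segment has been accepted, so at the split label it holds exactly the size of the front part that is handed to bio_split().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Model of the patched loop: walk the segment lengths (in bytes) and stop
 * before the first segment that would exceed max_sectors.
 * Returns true and stores the front size in *front if a split is needed,
 * false if everything fits (the kernel function returns NULL in that case).
 */
static bool find_split_point(const unsigned *seg_bytes, size_t nr,
                             unsigned max_sectors, unsigned *front)
{
    unsigned sectors = 0;
    size_t i;

    assert(front != NULL);
    for (i = 0; i < nr; i++) {
        unsigned len = seg_bytes[i] >> 9;    /* bytes to 512-byte sectors */

        /* Test before counting, so `sectors` excludes the rejected segment. */
        if (sectors + len > max_sectors) {
            *front = sectors;                /* value passed on as the split size */
            return true;
        }
        sectors += len;                      /* count only accepted segments */
    }
    return false;
}
```

With eight 4096-byte segments and a limit of 32 sectors, the model accepts four segments and reports a front part of 32 sectors; with only four such segments it reports that no split is needed.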