
mm: Define struct folio_queue and ITER_FOLIOQ to handle a sequence of folios

Define a data structure, struct folio_queue, to represent a sequence of
folios and a kernel-internal I/O iterator type, ITER_FOLIOQ, to allow a
list of folio_queue structures to be used to provide a buffer to
iov_iter-taking functions, such as sendmsg and recvmsg.

The folio_queue structure looks like:

	struct folio_queue {
		struct folio_batch	vec;
		u8			orders[PAGEVEC_SIZE];
		struct folio_queue	*next;
		struct folio_queue	*prev;
		unsigned long		marks;
		unsigned long		marks2;
	};

It does not use a list_head so that next and/or prev can be set to NULL at
the ends of the list, allowing iov_iter-handling routines to determine that
they *are* the ends without needing to store a head pointer in the iov_iter
struct.
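The end test this buys can be sketched in a minimal userspace model (illustrative only; `fq_seg` and `fq_is_tail` are invented names, not kernel API):

```c
#include <stddef.h>

/* Minimal model of the segment linkage: NULL next/prev mark the ends,
 * so a walker needs no separate head structure to know where to stop. */
struct fq_seg {
	struct fq_seg *next;
	struct fq_seg *prev;
};

static int fq_is_tail(const struct fq_seg *seg)
{
	return seg->next == NULL;	/* no head pointer needed */
}

static void fq_link(struct fq_seg *a, struct fq_seg *b)
{
	a->next = b;
	b->prev = a;
}
```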

A folio_batch struct is used to hold the folio pointers, which allows the
batch to be passed to batch-handling functions.  Two mark bits are
available per slot.  The intention is to use at least one of them to mark
folios that need putting, but that might not ultimately be necessary.
Accessor functions are provided to access the slots and do the bit
masking, and an additional accessor indicates the size of the array.
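The per-slot marks amount to two small bitmaps indexed by slot number.  A userspace sketch of the idea (names invented; the kernel code uses set_bit()/test_bit()/clear_bit() on the marks words instead):

```c
/* Model of the two per-slot mark bitmaps: one bit per slot in each of
 * two unsigned long words, so slot counts must not exceed the word width
 * (the real header enforces PAGEVEC_SIZE <= BITS_PER_LONG at compile time). */
struct fq_marks {
	unsigned long marks;	/* 1-bit mark per slot */
	unsigned long marks2;	/* second 1-bit mark per slot */
};

static int fq_is_marked(const struct fq_marks *q, unsigned int slot)
{
	return (q->marks >> slot) & 1;
}

static void fq_mark(struct fq_marks *q, unsigned int slot)
{
	q->marks |= 1UL << slot;
}

static void fq_unmark(struct fq_marks *q, unsigned int slot)
{
	q->marks &= ~(1UL << slot);
}
```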

The order of each folio is also stored in the structure to avoid the need
for iov_iter_advance() and iov_iter_revert() to have to query each folio to
find its size.
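Since a folio's size is just PAGE_SIZE shifted by its order, caching the order lets the iterator compute sizes from the queue segment alone.  A sketch of the lookup (`MODEL_PAGE_SIZE` and `fq_slot_size` are illustrative stand-ins, assuming 4KiB pages):

```c
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096UL	/* assumed page size for illustration */

/* Size of the folio in a slot, derived purely from the stored order,
 * without dereferencing the folio itself. */
static size_t fq_slot_size(const unsigned char *orders, unsigned int slot)
{
	return MODEL_PAGE_SIZE << orders[slot];
}
```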

With careful barriering, this can be used as an extending buffer, with new
folios inserted and new folio_queue structs added, without the need for a
lock.  Further, provided we always keep at least one struct in the buffer,
we can also remove consumed folios and consumed structs from the head end
as we go, without the need for locks.
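The "careful barriering" boils down to publish-then-read ordering: fully initialise a new segment before making its pointer visible.  A single-producer userspace sketch using C11 release/acquire in place of the kernel's smp_store_release()/smp_load_acquire() (names invented):

```c
#include <stdatomic.h>
#include <stddef.h>

struct seg {
	int data;
	struct seg *_Atomic next;
};

/* Producer: initialise the new segment first, then publish it with a
 * release store so a concurrent consumer never observes a half-built
 * segment through an acquire load of ->next. */
static void seg_append(struct seg *tail, struct seg *new_seg)
{
	new_seg->next = NULL;				/* initialise first... */
	atomic_store_explicit(&tail->next, new_seg,
			      memory_order_release);	/* ...then publish */
}

static struct seg *seg_next(struct seg *s)
{
	return atomic_load_explicit(&s->next, memory_order_acquire);
}
```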

[Questions/thoughts]

(1) To manage this, I need a head pointer, a tail pointer and a tail slot
    number (assuming insertion happens at the tail end and the next
    pointers point from head to tail).  Should I put these into a struct
    of their own, say "folio_queue_head" or "rolling_buffer"?

I will end up with two of these in netfs_io_request eventually, one
keeping track of the pagecache I'm dealing with for buffered I/O and
the other to hold a bounce buffer when we need one.
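One possible shape for the head-tracking struct floated in question (1) — purely hypothetical, the names and layout are invented, not anything the patch defines:

```c
#include <stddef.h>

struct fq_seg2 {
	struct fq_seg2 *next;
};

/* Hypothetical "folio_queue_head"/"rolling_buffer": tracks both ends of
 * the queue plus the next free slot in the tail segment. */
struct fq_head {
	struct fq_seg2 *head;	/* oldest segment; consumed from here */
	struct fq_seg2 *tail;	/* newest segment; appended to here */
	unsigned int tail_slot;	/* next free slot in the tail segment */
};

static void fq_head_init(struct fq_head *h, struct fq_seg2 *first)
{
	first->next = NULL;
	h->head = h->tail = first;
	h->tail_slot = 0;
}
```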

(2) Should I make the slots {folio,off,len} or bio_vec?

(3) This is intended to replace ITER_XARRAY eventually. Using an xarray
in I/O iteration requires the taking of the RCU read lock, doing
copying under the RCU read lock, walking the xarray (which may change
under us), handling retries and dealing with special values.

The advantage of ITER_XARRAY is that when we're dealing with the
pagecache directly, we don't need any allocation - but if we're doing
encrypted comms, there's a good chance we'd be using a bounce buffer
anyway.

This will require afs, erofs, cifs, orangefs and fscache to be
converted to not use this.  afs still uses it for directories and
symlinks; some of erofs's usages should be easy to change, but there's
one that won't be so easy; ceph's use via fscache can be fixed by
porting ceph to netfslib; cifs is using the xarray as a bounce buffer -
that can be moved to use sheaves instead; and orangefs has a similar
problem to erofs - maybe orangefs could use netfslib?

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Matthew Wilcox <willy@infradead.org>
cc: Jeff Layton <jlayton@kernel.org>
cc: Steve French <sfrench@samba.org>
cc: Ilya Dryomov <idryomov@gmail.com>
cc: Gao Xiang <xiang@kernel.org>
cc: Mike Marshall <hubcap@omnibond.com>
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
cc: linux-mm@kvack.org
cc: linux-afs@lists.infradead.org
cc: linux-cifs@vger.kernel.org
cc: ceph-devel@vger.kernel.org
cc: linux-erofs@lists.ozlabs.org
cc: devel@lists.orangefs.org
Link: https://lore.kernel.org/r/20240814203850.2240469-13-dhowells@redhat.com/ # v2
Signed-off-by: Christian Brauner <brauner@kernel.org>

+771 -4
+138
include/linux/folio_queue.h
/* SPDX-License-Identifier: GPL-2.0-or-later */
/* Queue of folios definitions
 *
 * Copyright (C) 2024 Red Hat, Inc. All Rights Reserved.
 * Written by David Howells (dhowells@redhat.com)
 */

#ifndef _LINUX_FOLIO_QUEUE_H
#define _LINUX_FOLIO_QUEUE_H

#include <linux/pagevec.h>

/*
 * Segment in a queue of running buffers.  Each segment can hold a number of
 * folios and a portion of the queue can be referenced with the ITER_FOLIOQ
 * iterator.  The possibility exists of inserting non-folio elements into the
 * queue (such as gaps).
 *
 * Explicit prev and next pointers are used instead of a list_head to make it
 * easier to add segments to the tail and remove them from the head without
 * the need for a lock.
 */
struct folio_queue {
	struct folio_batch	vec;		/* Folios in the queue segment */
	u8			orders[PAGEVEC_SIZE]; /* Order of each folio */
	struct folio_queue	*next;		/* Next queue segment or NULL */
	struct folio_queue	*prev;		/* Previous queue segment or NULL */
	unsigned long		marks;		/* 1-bit mark per folio */
	unsigned long		marks2;		/* Second 1-bit mark per folio */
#if PAGEVEC_SIZE > BITS_PER_LONG
#error marks is not big enough
#endif
};

static inline void folioq_init(struct folio_queue *folioq)
{
	folio_batch_init(&folioq->vec);
	folioq->next = NULL;
	folioq->prev = NULL;
	folioq->marks = 0;
	folioq->marks2 = 0;
}

static inline unsigned int folioq_nr_slots(const struct folio_queue *folioq)
{
	return PAGEVEC_SIZE;
}

static inline unsigned int folioq_count(struct folio_queue *folioq)
{
	return folio_batch_count(&folioq->vec);
}

static inline bool folioq_full(struct folio_queue *folioq)
{
	//return !folio_batch_space(&folioq->vec);
	return folioq_count(folioq) >= folioq_nr_slots(folioq);
}

static inline bool folioq_is_marked(const struct folio_queue *folioq, unsigned int slot)
{
	return test_bit(slot, &folioq->marks);
}

static inline void folioq_mark(struct folio_queue *folioq, unsigned int slot)
{
	set_bit(slot, &folioq->marks);
}

static inline void folioq_unmark(struct folio_queue *folioq, unsigned int slot)
{
	clear_bit(slot, &folioq->marks);
}

static inline bool folioq_is_marked2(const struct folio_queue *folioq, unsigned int slot)
{
	return test_bit(slot, &folioq->marks2);
}

static inline void folioq_mark2(struct folio_queue *folioq, unsigned int slot)
{
	set_bit(slot, &folioq->marks2);
}

static inline void folioq_unmark2(struct folio_queue *folioq, unsigned int slot)
{
	clear_bit(slot, &folioq->marks2);
}

static inline unsigned int __folio_order(struct folio *folio)
{
	if (!folio_test_large(folio))
		return 0;
	return folio->_flags_1 & 0xff;
}

static inline unsigned int folioq_append(struct folio_queue *folioq, struct folio *folio)
{
	unsigned int slot = folioq->vec.nr++;

	folioq->vec.folios[slot] = folio;
	folioq->orders[slot] = __folio_order(folio);
	return slot;
}

static inline unsigned int folioq_append_mark(struct folio_queue *folioq, struct folio *folio)
{
	unsigned int slot = folioq->vec.nr++;

	folioq->vec.folios[slot] = folio;
	folioq->orders[slot] = __folio_order(folio);
	folioq_mark(folioq, slot);
	return slot;
}

static inline struct folio *folioq_folio(const struct folio_queue *folioq, unsigned int slot)
{
	return folioq->vec.folios[slot];
}

static inline unsigned int folioq_folio_order(const struct folio_queue *folioq, unsigned int slot)
{
	return folioq->orders[slot];
}

static inline size_t folioq_folio_size(const struct folio_queue *folioq, unsigned int slot)
{
	return PAGE_SIZE << folioq_folio_order(folioq, slot);
}

static inline void folioq_clear(struct folio_queue *folioq, unsigned int slot)
{
	folioq->vec.folios[slot] = NULL;
	folioq_unmark(folioq, slot);
	folioq_unmark2(folioq, slot);
}

#endif /* _LINUX_FOLIO_QUEUE_H */
+57
include/linux/iov_iter.h
@@
 #include <linux/uio.h>
 #include <linux/bvec.h>
+#include <linux/folio_queue.h>

 typedef size_t (*iov_step_f)(void *iter_base, size_t progress, size_t len,
			      void *priv, void *priv2);
@@
 }

 /*
+ * Handle ITER_FOLIOQ.
+ */
+static __always_inline
+size_t iterate_folioq(struct iov_iter *iter, size_t len, void *priv, void *priv2,
+		      iov_step_f step)
+{
+	const struct folio_queue *folioq = iter->folioq;
+	unsigned int slot = iter->folioq_slot;
+	size_t progress = 0, skip = iter->iov_offset;
+
+	if (slot == folioq_nr_slots(folioq)) {
+		/* The iterator may have been extended. */
+		folioq = folioq->next;
+		slot = 0;
+	}
+
+	do {
+		struct folio *folio = folioq_folio(folioq, slot);
+		size_t part, remain, consumed;
+		size_t fsize;
+		void *base;
+
+		if (!folio)
+			break;
+
+		fsize = folioq_folio_size(folioq, slot);
+		base = kmap_local_folio(folio, skip);
+		part = umin(len, PAGE_SIZE - skip % PAGE_SIZE);
+		remain = step(base, progress, part, priv, priv2);
+		kunmap_local(base);
+		consumed = part - remain;
+		len -= consumed;
+		progress += consumed;
+		skip += consumed;
+		if (skip >= fsize) {
+			skip = 0;
+			slot++;
+			if (slot == folioq_nr_slots(folioq) && folioq->next) {
+				folioq = folioq->next;
+				slot = 0;
+			}
+		}
+		if (remain)
+			break;
+	} while (len);
+
+	iter->folioq_slot = slot;
+	iter->folioq = folioq;
+	iter->iov_offset = skip;
+	iter->count -= progress;
+	return progress;
+}
+
+/*
  * Handle ITER_XARRAY.
  */
 static __always_inline
@@
	if (iov_iter_is_bvec(iter))
		return iterate_bvec(iter, len, priv, priv2, step);
	if (iov_iter_is_kvec(iter))
		return iterate_kvec(iter, len, priv, priv2, step);
+	if (iov_iter_is_folioq(iter))
+		return iterate_folioq(iter, len, priv, priv2, step);
	if (iov_iter_is_xarray(iter))
		return iterate_xarray(iter, len, priv, priv2, step);
	return iterate_discard(iter, len, priv, priv2, step);
+12
include/linux/uio.h
@@
 #include <uapi/linux/uio.h>

 struct page;
+struct folio_queue;

 typedef unsigned int __bitwise iov_iter_extraction_t;
@@
	ITER_IOVEC,
	ITER_BVEC,
	ITER_KVEC,
+	ITER_FOLIOQ,
	ITER_XARRAY,
	ITER_DISCARD,
 };
@@
		const struct iovec *__iov;
		const struct kvec *kvec;
		const struct bio_vec *bvec;
+		const struct folio_queue *folioq;
		struct xarray *xarray;
		void __user *ubuf;
	};
@@
	union {
		unsigned long nr_segs;
+		u8 folioq_slot;
		loff_t xarray_start;
	};
 };
@@
 static inline bool iov_iter_is_discard(const struct iov_iter *i)
 {
	return iov_iter_type(i) == ITER_DISCARD;
+}
+
+static inline bool iov_iter_is_folioq(const struct iov_iter *i)
+{
+	return iov_iter_type(i) == ITER_FOLIOQ;
 }

 static inline bool iov_iter_is_xarray(const struct iov_iter *i)
@@
 void iov_iter_bvec(struct iov_iter *i, unsigned int direction, const struct bio_vec *bvec,
		    unsigned long nr_segs, size_t count);
 void iov_iter_discard(struct iov_iter *i, unsigned int direction, size_t count);
+void iov_iter_folio_queue(struct iov_iter *i, unsigned int direction,
+			  const struct folio_queue *folioq,
+			  unsigned int first_slot, unsigned int offset, size_t count);
 void iov_iter_xarray(struct iov_iter *i, unsigned int direction, struct xarray *xarray,
		      loff_t start, size_t count);
 ssize_t iov_iter_get_pages2(struct iov_iter *i, struct page **pages,
+238 -2
lib/iov_iter.c
@@
 	i->__iov = iov;
 }

+static void iov_iter_folioq_advance(struct iov_iter *i, size_t size)
+{
+	const struct folio_queue *folioq = i->folioq;
+	unsigned int slot = i->folioq_slot;
+
+	if (!i->count)
+		return;
+	i->count -= size;
+
+	if (slot >= folioq_nr_slots(folioq)) {
+		folioq = folioq->next;
+		slot = 0;
+	}
+
+	size += i->iov_offset; /* From beginning of current segment. */
+	do {
+		size_t fsize = folioq_folio_size(folioq, slot);
+
+		if (likely(size < fsize))
+			break;
+		size -= fsize;
+		slot++;
+		if (slot >= folioq_nr_slots(folioq) && folioq->next) {
+			folioq = folioq->next;
+			slot = 0;
+		}
+	} while (size);
+
+	i->iov_offset = size;
+	i->folioq_slot = slot;
+	i->folioq = folioq;
+}
+
 void iov_iter_advance(struct iov_iter *i, size_t size)
 {
 	if (unlikely(i->count < size))
@@
 		iov_iter_iovec_advance(i, size);
 	} else if (iov_iter_is_bvec(i)) {
 		iov_iter_bvec_advance(i, size);
+	} else if (iov_iter_is_folioq(i)) {
+		iov_iter_folioq_advance(i, size);
 	} else if (iov_iter_is_discard(i)) {
 		i->count -= size;
 	}
 }
 EXPORT_SYMBOL(iov_iter_advance);
+
+static void iov_iter_folioq_revert(struct iov_iter *i, size_t unroll)
+{
+	const struct folio_queue *folioq = i->folioq;
+	unsigned int slot = i->folioq_slot;
+
+	for (;;) {
+		size_t fsize;
+
+		if (slot == 0) {
+			folioq = folioq->prev;
+			slot = folioq_nr_slots(folioq);
+		}
+		slot--;
+
+		fsize = folioq_folio_size(folioq, slot);
+		if (unroll <= fsize) {
+			i->iov_offset = fsize - unroll;
+			break;
+		}
+		unroll -= fsize;
+	}
+
+	i->folioq_slot = slot;
+	i->folioq = folioq;
+}

 void iov_iter_revert(struct iov_iter *i, size_t unroll)
 {
@@
 			}
 			unroll -= n;
 		}
+	} else if (iov_iter_is_folioq(i)) {
+		i->iov_offset = 0;
+		iov_iter_folioq_revert(i, unroll);
 	} else { /* same logics for iovec and kvec */
 		const struct iovec *iov = iter_iov(i);
 		while (1) {
@@
 	if (iov_iter_is_bvec(i))
 		return min(i->count, i->bvec->bv_len - i->iov_offset);
 	}
+	if (unlikely(iov_iter_is_folioq(i)))
+		return !i->count ? 0 :
+			umin(folioq_folio_size(i->folioq, i->folioq_slot), i->count);
 	return i->count;
 }
 EXPORT_SYMBOL(iov_iter_single_seg_count);
@@
 	};
 }
 EXPORT_SYMBOL(iov_iter_bvec);
+
+/**
+ * iov_iter_folio_queue - Initialise an I/O iterator to use the folios in a folio queue
+ * @i: The iterator to initialise.
+ * @direction: The direction of the transfer.
+ * @folioq: The starting point in the folio queue.
+ * @first_slot: The first slot in the folio queue to use
+ * @offset: The offset into the folio in the first slot to start at
+ * @count: The size of the I/O buffer in bytes.
+ *
+ * Set up an I/O iterator to either draw data out of the pages attached to an
+ * inode or to inject data into those pages.  The pages *must* be prevented
+ * from evaporation, either by taking a ref on them or locking them by the
+ * caller.
+ */
+void iov_iter_folio_queue(struct iov_iter *i, unsigned int direction,
+			  const struct folio_queue *folioq, unsigned int first_slot,
+			  unsigned int offset, size_t count)
+{
+	BUG_ON(direction & ~1);
+	*i = (struct iov_iter) {
+		.iter_type = ITER_FOLIOQ,
+		.data_source = direction,
+		.folioq = folioq,
+		.folioq_slot = first_slot,
+		.count = count,
+		.iov_offset = offset,
+	};
+}
+EXPORT_SYMBOL(iov_iter_folio_queue);

 /**
  * iov_iter_xarray - Initialise an I/O iterator to use the pages in an xarray
@@
 	if (iov_iter_is_bvec(i))
 		return iov_iter_aligned_bvec(i, addr_mask, len_mask);

+	/* With both xarray and folioq types, we're dealing with whole folios. */
 	if (iov_iter_is_xarray(i)) {
 		if (i->count & len_mask)
 			return false;
 		if ((i->xarray_start + i->iov_offset) & addr_mask)
+			return false;
+	}
+	if (iov_iter_is_folioq(i)) {
+		if (i->count & len_mask)
+			return false;
+		if (i->iov_offset & addr_mask)
 			return false;
 	}
@@
 	if (iov_iter_is_bvec(i))
 		return iov_iter_alignment_bvec(i);

+	/* With both xarray and folioq types, we're dealing with whole folios. */
+	if (iov_iter_is_folioq(i))
+		return i->iov_offset | i->count;
 	if (iov_iter_is_xarray(i))
 		return (i->xarray_start + i->iov_offset) | i->count;
@@
 		return 0;
 	}
 	return count;
+}
+
+static ssize_t iter_folioq_get_pages(struct iov_iter *iter,
+				     struct page ***ppages, size_t maxsize,
+				     unsigned maxpages, size_t *_start_offset)
+{
+	const struct folio_queue *folioq = iter->folioq;
+	struct page **pages;
+	unsigned int slot = iter->folioq_slot;
+	size_t extracted = 0, count = iter->count, iov_offset = iter->iov_offset;
+
+	if (slot >= folioq_nr_slots(folioq)) {
+		folioq = folioq->next;
+		slot = 0;
+		if (WARN_ON(iov_offset != 0))
+			return -EIO;
+	}
+
+	maxpages = want_pages_array(ppages, maxsize, iov_offset & ~PAGE_MASK, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	*_start_offset = iov_offset & ~PAGE_MASK;
+	pages = *ppages;
+
+	for (;;) {
+		struct folio *folio = folioq_folio(folioq, slot);
+		size_t offset = iov_offset, fsize = folioq_folio_size(folioq, slot);
+		size_t part = PAGE_SIZE - offset % PAGE_SIZE;
+
+		part = umin(part, umin(maxsize - extracted, fsize - offset));
+		count -= part;
+		iov_offset += part;
+		extracted += part;
+
+		*pages = folio_page(folio, offset / PAGE_SIZE);
+		get_page(*pages);
+		pages++;
+		maxpages--;
+		if (maxpages == 0 || extracted >= maxsize)
+			break;
+
+		if (offset >= fsize) {
+			iov_offset = 0;
+			slot++;
+			if (slot == folioq_nr_slots(folioq) && folioq->next) {
+				folioq = folioq->next;
+				slot = 0;
+			}
+		}
+	}
+
+	iter->count = count;
+	iter->iov_offset = iov_offset;
+	iter->folioq = folioq;
+	iter->folioq_slot = slot;
+	return extracted;
 }

 static ssize_t iter_xarray_populate_pages(struct page **pages, struct xarray *xa,
@@
 	}
 	return maxsize;
 }
+	if (iov_iter_is_folioq(i))
+		return iter_folioq_get_pages(i, pages, maxsize, maxpages, start);
 	if (iov_iter_is_xarray(i))
 		return iter_xarray_get_pages(i, pages, maxsize, maxpages, start);
 	return -EFAULT;
@@
 		return iov_npages(i, maxpages);
 	if (iov_iter_is_bvec(i))
 		return bvec_npages(i, maxpages);
+	if (iov_iter_is_folioq(i)) {
+		unsigned offset = i->iov_offset % PAGE_SIZE;
+		int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE);
+		return min(npages, maxpages);
+	}
 	if (iov_iter_is_xarray(i)) {
 		unsigned offset = (i->xarray_start + i->iov_offset) % PAGE_SIZE;
 		int npages = DIV_ROUND_UP(offset + i->count, PAGE_SIZE);
@@
 }

 /*
+ * Extract a list of contiguous pages from an ITER_FOLIOQ iterator.  This does
+ * not get references on the pages, nor does it get a pin on them.
+ */
+static ssize_t iov_iter_extract_folioq_pages(struct iov_iter *i,
+					     struct page ***pages, size_t maxsize,
+					     unsigned int maxpages,
+					     iov_iter_extraction_t extraction_flags,
+					     size_t *offset0)
+{
+	const struct folio_queue *folioq = i->folioq;
+	struct page **p;
+	unsigned int nr = 0;
+	size_t extracted = 0, offset, slot = i->folioq_slot;
+
+	if (slot >= folioq_nr_slots(folioq)) {
+		folioq = folioq->next;
+		slot = 0;
+		if (WARN_ON(i->iov_offset != 0))
+			return -EIO;
+	}
+
+	offset = i->iov_offset & ~PAGE_MASK;
+	*offset0 = offset;
+
+	maxpages = want_pages_array(pages, maxsize, offset, maxpages);
+	if (!maxpages)
+		return -ENOMEM;
+	p = *pages;
+
+	for (;;) {
+		struct folio *folio = folioq_folio(folioq, slot);
+		size_t offset = i->iov_offset, fsize = folioq_folio_size(folioq, slot);
+		size_t part = PAGE_SIZE - offset % PAGE_SIZE;
+
+		if (offset < fsize) {
+			part = umin(part, umin(maxsize - extracted, fsize - offset));
+			i->count -= part;
+			i->iov_offset += part;
+			extracted += part;
+
+			p[nr++] = folio_page(folio, offset / PAGE_SIZE);
+		}
+
+		if (nr >= maxpages || extracted >= maxsize)
+			break;
+
+		if (i->iov_offset >= fsize) {
+			i->iov_offset = 0;
+			slot++;
+			if (slot == folioq_nr_slots(folioq) && folioq->next) {
+				folioq = folioq->next;
+				slot = 0;
+			}
+		}
+	}
+
+	i->folioq = folioq;
+	i->folioq_slot = slot;
+	return extracted;
+}
+
+/*
  * Extract a list of contiguous pages from an ITER_XARRAY iterator.  This does not
  * get references on the pages, nor does it get a pin on them.
  */
@@
  * added to the pages, but refs will not be taken.
  * iov_iter_extract_will_pin() will return true.
  *
- * (*) If the iterator is ITER_KVEC, ITER_BVEC or ITER_XARRAY, the pages are
- *     merely listed; no extra refs or pins are obtained.
+ * (*) If the iterator is ITER_KVEC, ITER_BVEC, ITER_FOLIOQ or ITER_XARRAY, the
+ *     pages are merely listed; no extra refs or pins are obtained.
  *     iov_iter_extract_will_pin() will return 0.
  *
  * Note also:
@@
 		return iov_iter_extract_bvec_pages(i, pages, maxsize,
 						   maxpages, extraction_flags,
 						   offset0);
+	if (iov_iter_is_folioq(i))
+		return iov_iter_extract_folioq_pages(i, pages, maxsize,
+						     maxpages, extraction_flags,
+						     offset0);
 	if (iov_iter_is_xarray(i))
 		return iov_iter_extract_xarray_pages(i, pages, maxsize,
 						     maxpages, extraction_flags,
+259
lib/kunit_iov_iter.c
@@
 #include <linux/mm.h>
 #include <linux/uio.h>
 #include <linux/bvec.h>
+#include <linux/folio_queue.h>
 #include <kunit/test.h>

 MODULE_DESCRIPTION("iov_iter testing");
@@
 		release_pages(pages, got);
 		KUNIT_ASSERT_EQ(test, got, npages);
 	}
+
+	for (int i = 0; i < npages; i++)
+		pages[i]->index = i;

 	buffer = vmap(pages, npages, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);
 	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, buffer);
@@
 		for (j = pr->from; j < pr->to; j++) {
 			buffer[i++] = pattern(patt + j);
+			if (i >= bufsize)
+				goto stop;
+		}
+	}
+stop:
+
+	/* Compare the images */
+	for (i = 0; i < bufsize; i++) {
+		KUNIT_EXPECT_EQ_MSG(test, scratch[i], buffer[i], "at i=%x", i);
+		if (scratch[i] != buffer[i])
+			return;
+	}
+
+	KUNIT_SUCCEED(test);
+}
+
+static void iov_kunit_destroy_folioq(void *data)
+{
+	struct folio_queue *folioq, *next;
+
+	for (folioq = data; folioq; folioq = next) {
+		next = folioq->next;
+		for (int i = 0; i < folioq_nr_slots(folioq); i++)
+			if (folioq_folio(folioq, i))
+				folio_put(folioq_folio(folioq, i));
+		kfree(folioq);
+	}
+}
+
+static void __init iov_kunit_load_folioq(struct kunit *test,
+					 struct iov_iter *iter, int dir,
+					 struct folio_queue *folioq,
+					 struct page **pages, size_t npages)
+{
+	struct folio_queue *p = folioq;
+	size_t size = 0;
+	int i;
+
+	for (i = 0; i < npages; i++) {
+		if (folioq_full(p)) {
+			p->next = kzalloc(sizeof(struct folio_queue), GFP_KERNEL);
+			KUNIT_ASSERT_NOT_ERR_OR_NULL(test, p->next);
+			folioq_init(p->next);
+			p->next->prev = p;
+			p = p->next;
+		}
+		folioq_append(p, page_folio(pages[i]));
+		size += PAGE_SIZE;
+	}
+	iov_iter_folio_queue(iter, dir, folioq, 0, 0, size);
+}
+
+static struct folio_queue *iov_kunit_create_folioq(struct kunit *test)
+{
+	struct folio_queue *folioq;
+
+	folioq = kzalloc(sizeof(struct folio_queue), GFP_KERNEL);
+	KUNIT_ASSERT_NOT_ERR_OR_NULL(test, folioq);
+	kunit_add_action_or_reset(test, iov_kunit_destroy_folioq, folioq);
+	folioq_init(folioq);
+	return folioq;
+}
+
+/*
+ * Test copying to a ITER_FOLIOQ-type iterator.
+ */
+static void __init iov_kunit_copy_to_folioq(struct kunit *test)
+{
+	const struct kvec_test_range *pr;
+	struct iov_iter iter;
+	struct folio_queue *folioq;
+	struct page **spages, **bpages;
+	u8 *scratch, *buffer;
+	size_t bufsize, npages, size, copied;
+	int i, patt;
+
+	bufsize = 0x100000;
+	npages = bufsize / PAGE_SIZE;
+
+	folioq = iov_kunit_create_folioq(test);
+
+	scratch = iov_kunit_create_buffer(test, &spages, npages);
+	for (i = 0; i < bufsize; i++)
+		scratch[i] = pattern(i);
+
+	buffer = iov_kunit_create_buffer(test, &bpages, npages);
+	memset(buffer, 0, bufsize);
+
+	iov_kunit_load_folioq(test, &iter, READ, folioq, bpages, npages);
+
+	i = 0;
+	for (pr = kvec_test_ranges; pr->from >= 0; pr++) {
+		size = pr->to - pr->from;
+		KUNIT_ASSERT_LE(test, pr->to, bufsize);
+
+		iov_iter_folio_queue(&iter, READ, folioq, 0, 0, pr->to);
+		iov_iter_advance(&iter, pr->from);
+		copied = copy_to_iter(scratch + i, size, &iter);
+
+		KUNIT_EXPECT_EQ(test, copied, size);
+		KUNIT_EXPECT_EQ(test, iter.count, 0);
+		KUNIT_EXPECT_EQ(test, iter.iov_offset, pr->to % PAGE_SIZE);
+		i += size;
+		if (test->status == KUNIT_FAILURE)
+			goto stop;
+	}
+
+	/* Build the expected image in the scratch buffer. */
+	patt = 0;
+	memset(scratch, 0, bufsize);
+	for (pr = kvec_test_ranges; pr->from >= 0; pr++)
+		for (i = pr->from; i < pr->to; i++)
+			scratch[i] = pattern(patt++);
+
+	/* Compare the images */
+	for (i = 0; i < bufsize; i++) {
+		KUNIT_EXPECT_EQ_MSG(test, buffer[i], scratch[i], "at i=%x", i);
+		if (buffer[i] != scratch[i])
+			return;
+	}
+
+stop:
+	KUNIT_SUCCEED(test);
+}
+
+/*
+ * Test copying from a ITER_FOLIOQ-type iterator.
+ */
+static void __init iov_kunit_copy_from_folioq(struct kunit *test)
+{
+	const struct kvec_test_range *pr;
+	struct iov_iter iter;
+	struct folio_queue *folioq;
+	struct page **spages, **bpages;
+	u8 *scratch, *buffer;
+	size_t bufsize, npages, size, copied;
+	int i, j;
+
+	bufsize = 0x100000;
+	npages = bufsize / PAGE_SIZE;
+
+	folioq = iov_kunit_create_folioq(test);
+
+	buffer = iov_kunit_create_buffer(test, &bpages, npages);
+	for (i = 0; i < bufsize; i++)
+		buffer[i] = pattern(i);
+
+	scratch = iov_kunit_create_buffer(test, &spages, npages);
+	memset(scratch, 0, bufsize);
+
+	iov_kunit_load_folioq(test, &iter, READ, folioq, bpages, npages);
+
+	i = 0;
+	for (pr = kvec_test_ranges; pr->from >= 0; pr++) {
+		size = pr->to - pr->from;
+		KUNIT_ASSERT_LE(test, pr->to, bufsize);
+
+		iov_iter_folio_queue(&iter, WRITE, folioq, 0, 0, pr->to);
+		iov_iter_advance(&iter, pr->from);
+		copied = copy_from_iter(scratch + i, size, &iter);
+
+		KUNIT_EXPECT_EQ(test, copied, size);
+		KUNIT_EXPECT_EQ(test, iter.count, 0);
+		KUNIT_EXPECT_EQ(test, iter.iov_offset, pr->to % PAGE_SIZE);
+		i += size;
+	}
+
+	/* Build the expected image in the main buffer. */
+	i = 0;
+	memset(buffer, 0, bufsize);
+	for (pr = kvec_test_ranges; pr->from >= 0; pr++) {
+		for (j = pr->from; j < pr->to; j++) {
+			buffer[i++] = pattern(j);
 			if (i >= bufsize)
 				goto stop;
 		}
@@
 }

 /*
+ * Test the extraction of ITER_FOLIOQ-type iterators.
+ */
+static void __init iov_kunit_extract_pages_folioq(struct kunit *test)
+{
+	const struct kvec_test_range *pr;
+	struct folio_queue *folioq;
+	struct iov_iter iter;
+	struct page **bpages, *pagelist[8], **pages = pagelist;
+	ssize_t len;
+	size_t bufsize, size = 0, npages;
+	int i, from;
+
+	bufsize = 0x100000;
+	npages = bufsize / PAGE_SIZE;
+
+	folioq = iov_kunit_create_folioq(test);
+
+	iov_kunit_create_buffer(test, &bpages, npages);
+	iov_kunit_load_folioq(test, &iter, READ, folioq, bpages, npages);
+
+	for (pr = kvec_test_ranges; pr->from >= 0; pr++) {
+		from = pr->from;
+		size = pr->to - from;
+		KUNIT_ASSERT_LE(test, pr->to, bufsize);
+
+		iov_iter_folio_queue(&iter, WRITE, folioq, 0, 0, pr->to);
+		iov_iter_advance(&iter, from);
+
+		do {
+			size_t offset0 = LONG_MAX;
+
+			for (i = 0; i < ARRAY_SIZE(pagelist); i++)
+				pagelist[i] = (void *)(unsigned long)0xaa55aa55aa55aa55ULL;
+
+			len = iov_iter_extract_pages(&iter, &pages, 100 * 1024,
+						     ARRAY_SIZE(pagelist), 0, &offset0);
+			KUNIT_EXPECT_GE(test, len, 0);
+			if (len < 0)
+				break;
+			KUNIT_EXPECT_LE(test, len, size);
+			KUNIT_EXPECT_EQ(test, iter.count, size - len);
+			if (len == 0)
+				break;
+			size -= len;
+			KUNIT_EXPECT_GE(test, (ssize_t)offset0, 0);
+			KUNIT_EXPECT_LT(test, offset0, PAGE_SIZE);
+
+			for (i = 0; i < ARRAY_SIZE(pagelist); i++) {
+				struct page *p;
+				ssize_t part = min_t(ssize_t, len, PAGE_SIZE - offset0);
+				int ix;
+
+				KUNIT_ASSERT_GE(test, part, 0);
+				ix = from / PAGE_SIZE;
+				KUNIT_ASSERT_LT(test, ix, npages);
+				p = bpages[ix];
+				KUNIT_EXPECT_PTR_EQ(test, pagelist[i], p);
+				KUNIT_EXPECT_EQ(test, offset0, from % PAGE_SIZE);
+				from += part;
+				len -= part;
+				KUNIT_ASSERT_GE(test, len, 0);
+				if (len == 0)
+					break;
+				offset0 = 0;
+			}
+
+			if (test->status == KUNIT_FAILURE)
+				goto stop;
+		} while (iov_iter_count(&iter) > 0);
+
+		KUNIT_EXPECT_EQ(test, size, 0);
+		KUNIT_EXPECT_EQ(test, iter.count, 0);
+	}
+
+stop:
+	KUNIT_SUCCEED(test);
+}
+
+/*
  * Test the extraction of ITER_XARRAY-type iterators.
  */
 static void __init iov_kunit_extract_pages_xarray(struct kunit *test)
@@
 	KUNIT_CASE(iov_kunit_copy_from_kvec),
 	KUNIT_CASE(iov_kunit_copy_to_bvec),
 	KUNIT_CASE(iov_kunit_copy_from_bvec),
+	KUNIT_CASE(iov_kunit_copy_to_folioq),
+	KUNIT_CASE(iov_kunit_copy_from_folioq),
 	KUNIT_CASE(iov_kunit_copy_to_xarray),
 	KUNIT_CASE(iov_kunit_copy_from_xarray),
 	KUNIT_CASE(iov_kunit_extract_pages_kvec),
 	KUNIT_CASE(iov_kunit_extract_pages_bvec),
+	KUNIT_CASE(iov_kunit_extract_pages_folioq),
 	KUNIT_CASE(iov_kunit_extract_pages_xarray),
 	{}
 };
+67 -2
lib/scatterlist.c
··· 11 11 #include <linux/kmemleak.h> 12 12 #include <linux/bvec.h> 13 13 #include <linux/uio.h> 14 + #include <linux/folio_queue.h> 14 15 15 16 /** 16 17 * sg_next - return the next scatterlist entry in a list ··· 1263 1262 } 1264 1263 1265 1264 /* 1265 + * Extract up to sg_max folios from an FOLIOQ-type iterator and add them to 1266 + * the scatterlist. The pages are not pinned. 1267 + */ 1268 + static ssize_t extract_folioq_to_sg(struct iov_iter *iter, 1269 + ssize_t maxsize, 1270 + struct sg_table *sgtable, 1271 + unsigned int sg_max, 1272 + iov_iter_extraction_t extraction_flags) 1273 + { 1274 + const struct folio_queue *folioq = iter->folioq; 1275 + struct scatterlist *sg = sgtable->sgl + sgtable->nents; 1276 + unsigned int slot = iter->folioq_slot; 1277 + ssize_t ret = 0; 1278 + size_t offset = iter->iov_offset; 1279 + 1280 + BUG_ON(!folioq); 1281 + 1282 + if (slot >= folioq_nr_slots(folioq)) { 1283 + folioq = folioq->next; 1284 + if (WARN_ON_ONCE(!folioq)) 1285 + return 0; 1286 + slot = 0; 1287 + } 1288 + 1289 + do { 1290 + struct folio *folio = folioq_folio(folioq, slot); 1291 + size_t fsize = folioq_folio_size(folioq, slot); 1292 + 1293 + if (offset < fsize) { 1294 + size_t part = umin(maxsize - ret, fsize - offset); 1295 + 1296 + sg_set_page(sg, folio_page(folio, 0), part, offset); 1297 + sgtable->nents++; 1298 + sg++; 1299 + sg_max--; 1300 + offset += part; 1301 + ret += part; 1302 + } 1303 + 1304 + if (offset >= fsize) { 1305 + offset = 0; 1306 + slot++; 1307 + if (slot >= folioq_nr_slots(folioq)) { 1308 + if (!folioq->next) { 1309 + WARN_ON_ONCE(ret < iter->count); 1310 + break; 1311 + } 1312 + folioq = folioq->next; 1313 + slot = 0; 1314 + } 1315 + } 1316 + } while (sg_max > 0 && ret < maxsize); 1317 + 1318 + iter->folioq = folioq; 1319 + iter->folioq_slot = slot; 1320 + iter->iov_offset = offset; 1321 + iter->count -= ret; 1322 + return ret; 1323 + } 1324 + 1325 + /* 1266 1326 * Extract up to sg_max folios from an XARRAY-type iterator and add them 
to 1267 1327 * the scatterlist. The pages are not pinned. 1268 1328 */ ··· 1385 1323 * addition of @sg_max elements. 1386 1324 * 1387 1325 * The pages referred to by UBUF- and IOVEC-type iterators are extracted and 1388 - * pinned; BVEC-, KVEC- and XARRAY-type are extracted but aren't pinned; PIPE- 1389 - * and DISCARD-type are not supported. 1326 + * pinned; BVEC-, KVEC-, FOLIOQ- and XARRAY-type are extracted but aren't 1327 + * pinned; DISCARD-type is not supported. 1390 1328 * 1391 1329 * No end mark is placed on the scatterlist; that's left to the caller. 1392 1330 * ··· 1418 1356 case ITER_KVEC: 1419 1357 return extract_kvec_to_sg(iter, maxsize, sgtable, sg_max, 1420 1358 extraction_flags); 1359 + case ITER_FOLIOQ: 1360 + return extract_folioq_to_sg(iter, maxsize, sgtable, sg_max, 1361 + extraction_flags); 1421 1362 case ITER_XARRAY: 1422 1363 return extract_xarray_to_sg(iter, maxsize, sgtable, sg_max, 1423 1364 extraction_flags);