Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

cifs: Change the I/O paths to use an iterator rather than a page list

Currently, the cifs I/O paths hand lists of pages from the VM interface
routines at the top all the way through the intervening layers to the
socket interface at the bottom.

This is a problem, however, for interfacing with netfslib which passes an
iterator through to the ->issue_read() method (and will pass an iterator
through to the ->issue_write() method in future). Netfslib takes over
bounce buffering for direct I/O, async I/O and encrypted content, so cifs
doesn't need to do that. Netfslib also converts IOVEC-type iterators into
BVEC-type iterators if necessary.

Further, cifs needs foliating - and folios may come in a variety of sizes,
so a page list pointing to an array of heterogeneous pages may cause
problems in places such as where crypto is done.

Change the cifs I/O paths to hand iov_iter iterators all the way through
instead.

Notes:

(1) Some old routines are #if'd out, to be removed in a follow-up patch;
    this keeps the diff uncluttered and easier to follow.  I've removed
    functions that don't overlap with anything added.

(2) struct smb_rqst loses rq_pages, rq_offset, rq_npages, rq_pagesz and
rq_tailsz which describe the pages forming the buffer; instead there's
an rq_iter describing the source buffer and an rq_buffer which is used
to hold the buffer for encryption.

(3) struct cifs_readdata and cifs_writedata are modified in a similar way
    to struct smb_rqst.  The ->read_into_pages() and ->copy_into_pages()
    methods are then replaced by passing the iterator directly to the
    socket.

The iterators are stored in these structs so that they are persistent
and don't get deallocated when the function returns (unlike if they
were stack variables).

(4) Buffered writeback is overhauled, borrowing the code from the afs
filesystem to gather up contiguous runs of folios. The XARRAY-type
iterator is then used to refer directly to the pagecache and can be
passed to the socket to transmit data directly from there.

This includes:

cifs_extend_writeback()
cifs_write_back_from_locked_folio()
cifs_writepages_region()
cifs_writepages()

(5) Pages are converted to folios.

(6) Direct I/O uses netfs_extract_user_iter() to create a BVEC-type
    iterator from an IOVEC/UBUF-type source iterator.

(7) smb2_get_aead_req() uses netfs_extract_iter_to_sg() to extract page
fragments from the iterator into the scatterlists that the crypto
layer prefers.

(8) smb2_init_transform_rq() attaches pages to smb_rqst::rq_buffer, an
    xarray, to use as a bounce buffer for encryption.  An XARRAY-type
    iterator can then be used to pass the bounce buffer to lower layers.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Steve French <sfrench@samba.org>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Paulo Alcantara <pc@cjr.nz>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org

Link: https://lore.kernel.org/r/164311907995.2806745.400147335497304099.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/164928620163.457102.11602306234438271112.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165211420279.3154751.15923591172438186144.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165348880385.2106726.3220789453472800240.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/165364827111.3334034.934805882842932881.stgit@warthog.procyon.org.uk/ # v3
Link: https://lore.kernel.org/r/166126396180.708021.271013668175370826.stgit@warthog.procyon.org.uk/ # v1
Link: https://lore.kernel.org/r/166697259595.61150.5982032408321852414.stgit@warthog.procyon.org.uk/ # rfc
Link: https://lore.kernel.org/r/166732031756.3186319.12528413619888902872.stgit@warthog.procyon.org.uk/ # rfc
Signed-off-by: Steve French <stfrench@microsoft.com>

Authored by David Howells and committed by Steve French
d08089f6 16541195

+1134 -1092
+1
fs/cifs/Kconfig
···
 	select DNS_RESOLVER
 	select ASN1
 	select OID_REGISTRY
+	select NETFS_SUPPORT
 	help
 	  This is the client VFS module for the SMB3 family of network file
 	  protocols (including the most recent, most secure dialect SMB3.1.1).
+6 -22
fs/cifs/cifsencrypt.c
···
 }
 
 int __cifs_calc_signature(struct smb_rqst *rqst,
-			struct TCP_Server_Info *server, char *signature,
-			struct shash_desc *shash)
+			  struct TCP_Server_Info *server, char *signature,
+			  struct shash_desc *shash)
 {
 	int i;
-	int rc;
+	ssize_t rc;
 	struct kvec *iov = rqst->rq_iov;
 	int n_vec = rqst->rq_nvec;
 
···
 		}
 	}
 
-	/* now hash over the rq_pages array */
-	for (i = 0; i < rqst->rq_npages; i++) {
-		void *kaddr;
-		unsigned int len, offset;
-
-		rqst_page_get_length(rqst, i, &len, &offset);
-
-		kaddr = (char *) kmap(rqst->rq_pages[i]) + offset;
-
-		rc = crypto_shash_update(shash, kaddr, len);
-		if (rc) {
-			cifs_dbg(VFS, "%s: Could not update with payload\n",
-				 __func__);
-			kunmap(rqst->rq_pages[i]);
-			return rc;
-		}
-
-		kunmap(rqst->rq_pages[i]);
-	}
+	rc = cifs_shash_iter(&rqst->rq_iter, iov_iter_count(&rqst->rq_iter), shash);
+	if (rc < 0)
+		return rc;
 
 	rc = crypto_shash_final(shash, signature);
 	if (rc)
+33 -33
fs/cifs/cifsglob.h
···
 struct smb_rqst {
 	struct kvec	*rq_iov;	/* array of kvecs */
 	unsigned int	rq_nvec;	/* number of kvecs in array */
-	struct page	**rq_pages;	/* pointer to array of page ptrs */
-	unsigned int	rq_offset;	/* the offset to the 1st page */
-	unsigned int	rq_npages;	/* number pages in array */
-	unsigned int	rq_pagesz;	/* page size to use */
-	unsigned int	rq_tailsz;	/* length of last page */
+	size_t		rq_iter_size;	/* Amount of data in ->rq_iter */
+	struct iov_iter	rq_iter;	/* Data iterator */
+	struct xarray	rq_buffer;	/* Page buffer for encryption */
 };
 
 struct mid_q_entry;
···
 	struct cifsFileInfo	*cfile;
 	struct bio_vec		*bv;
 	loff_t			pos;
-	unsigned int		npages;
+	unsigned int		nr_pinned_pages;
 	ssize_t			rc;
 	unsigned int		len;
 	unsigned int		total_len;
+	unsigned int		bv_need_unpin;	/* If ->bv[] needs unpinning */
 	bool			should_dirty;
 	/*
 	 * Indicates if this aio_ctx is for direct_io,
···
 	struct address_space		*mapping;
 	struct cifs_aio_ctx		*ctx;
 	__u64				offset;
+	ssize_t				got_bytes;
 	unsigned int			bytes;
-	unsigned int			got_bytes;
 	pid_t				pid;
 	int				result;
 	struct work_struct		work;
-	int (*read_into_pages)(struct TCP_Server_Info *server,
-				struct cifs_readdata *rdata,
-				unsigned int len);
-	int (*copy_into_pages)(struct TCP_Server_Info *server,
-				struct cifs_readdata *rdata,
-				struct iov_iter *iter);
+	struct iov_iter			iter;
 	struct kvec			iov[2];
 	struct TCP_Server_Info		*server;
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
 #endif
-	unsigned int			pagesz;
-	unsigned int			page_offset;
-	unsigned int			tailsz;
 	struct cifs_credits		credits;
-	unsigned int			nr_pages;
-	struct page			**pages;
 };
 
 /* asynchronous write support */
···
 	struct work_struct		work;
 	struct cifsFileInfo		*cfile;
 	struct cifs_aio_ctx		*ctx;
+	struct iov_iter			iter;
+	struct bio_vec			*bv;
 	__u64				offset;
 	pid_t				pid;
 	unsigned int			bytes;
···
 #ifdef CONFIG_CIFS_SMB_DIRECT
 	struct smbd_mr			*mr;
 #endif
-	unsigned int			pagesz;
-	unsigned int			page_offset;
-	unsigned int			tailsz;
 	struct cifs_credits		credits;
-	unsigned int			nr_pages;
-	struct page			**pages;
 };
 
 /*
···
 	dst->FileNameLength = src->FileNameLength;
 }
 
-static inline unsigned int cifs_get_num_sgs(const struct smb_rqst *rqst,
-					    int num_rqst,
-					    const u8 *sig)
+static inline int cifs_get_num_sgs(const struct smb_rqst *rqst,
+				   int num_rqst,
+				   const u8 *sig)
 {
 	unsigned int len, skip;
 	unsigned int nents = 0;
···
 	 * rqst[1+].rq_iov[0+] data to be encrypted/decrypted
 	 */
 	for (i = 0; i < num_rqst; i++) {
+		/* We really don't want a mixture of pinned and unpinned pages
+		 * in the sglist.  It's hard to keep track of which is what.
+		 * Instead, we convert to a BVEC-type iterator higher up.
+		 */
+		if (WARN_ON_ONCE(user_backed_iter(&rqst[i].rq_iter)))
+			return -EIO;
+
+		/* We also don't want to have any extra refs or pins to clean
+		 * up in the sglist.
+		 */
+		if (WARN_ON_ONCE(iov_iter_extract_will_pin(&rqst[i].rq_iter)))
+			return -EIO;
+
 		for (j = 0; j < rqst[i].rq_nvec; j++) {
 			struct kvec *iov = &rqst[i].rq_iov[j];
 
···
 			}
 			skip = 0;
 		}
-		nents += rqst[i].rq_npages;
+		nents += iov_iter_npages(&rqst[i].rq_iter, INT_MAX);
 	}
 	nents += DIV_ROUND_UP(offset_in_page(sig) + SMB2_SIGNATURE_SIZE, PAGE_SIZE);
 	return nents;
···
 /* We can not use the normal sg_set_buf() as we will sometimes pass a
  * stack object as buf.
  */
-static inline struct scatterlist *cifs_sg_set_buf(struct scatterlist *sg,
-						  const void *buf,
-						  unsigned int buflen)
+static inline void cifs_sg_set_buf(struct sg_table *sgtable,
+				   const void *buf,
+				   unsigned int buflen)
 {
 	unsigned long addr = (unsigned long)buf;
 	unsigned int off = offset_in_page(addr);
···
 		do {
 			unsigned int len = min_t(unsigned int, buflen, PAGE_SIZE - off);
 
-			sg_set_page(sg++, vmalloc_to_page((void *)addr), len, off);
+			sg_set_page(&sgtable->sgl[sgtable->nents++],
+				    vmalloc_to_page((void *)addr), len, off);
 
 			off = 0;
 			addr += PAGE_SIZE;
 			buflen -= len;
 		} while (buflen);
 	} else {
-		sg_set_page(sg++, virt_to_page(addr), buflen, off);
+		sg_set_page(&sgtable->sgl[sgtable->nents++],
+			    virt_to_page(addr), buflen, off);
 	}
-	return sg;
 }
 
 #endif	/* _CIFS_GLOB_H */
+1 -7
fs/cifs/cifsproto.h
···
 int cifs_async_writev(struct cifs_writedata *wdata,
 		      void (*release)(struct kref *kref));
 void cifs_writev_complete(struct work_struct *work);
-struct cifs_writedata *cifs_writedata_alloc(unsigned int nr_pages,
-					    work_func_t complete);
-struct cifs_writedata *cifs_writedata_direct_alloc(struct page **pages,
-						   work_func_t complete);
+struct cifs_writedata *cifs_writedata_alloc(work_func_t complete);
 void cifs_writedata_release(struct kref *refcount);
 int cifs_query_mf_symlink(unsigned int xid, struct cifs_tcon *tcon,
 			  struct cifs_sb_info *cifs_sb,
···
 			  enum securityEnum);
 struct cifs_aio_ctx *cifs_aio_ctx_alloc(void);
 void cifs_aio_ctx_release(struct kref *refcount);
-int setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw);
 
 int cifs_alloc_hash(const char *name, struct shash_desc **sdesc);
 void cifs_free_hash(struct shash_desc **sdesc);
 
-void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page,
-			  unsigned int *len, unsigned int *offset);
 struct cifs_chan *
 cifs_ses_find_chan(struct cifs_ses *ses, struct TCP_Server_Info *server);
 int cifs_try_adding_channels(struct cifs_sb_info *cifs_sb, struct cifs_ses *ses);
+5 -10
fs/cifs/cifssmb.c
···
 #include <linux/task_io_accounting_ops.h>
 #include <linux/uaccess.h>
 #include "cifspdu.h"
+#include "cifsfs.h"
 #include "cifsglob.h"
 #include "cifsacl.h"
 #include "cifsproto.h"
···
 	struct TCP_Server_Info *server = tcon->ses->server;
 	struct smb_rqst rqst = { .rq_iov = rdata->iov,
 				 .rq_nvec = 2,
-				 .rq_pages = rdata->pages,
-				 .rq_offset = rdata->page_offset,
-				 .rq_npages = rdata->nr_pages,
-				 .rq_pagesz = rdata->pagesz,
-				 .rq_tailsz = rdata->tailsz };
+				 .rq_iter_size = iov_iter_count(&rdata->iter),
+				 .rq_iter = rdata->iter };
 	struct cifs_credits credits = { .value = 1, .instance = 0 };
 
 	cifs_dbg(FYI, "%s: mid=%llu state=%d result=%d bytes=%u\n",
···
 
 	rqst.rq_iov = iov;
 	rqst.rq_nvec = 2;
-	rqst.rq_pages = wdata->pages;
-	rqst.rq_offset = wdata->page_offset;
-	rqst.rq_npages = wdata->nr_pages;
-	rqst.rq_pagesz = wdata->pagesz;
-	rqst.rq_tailsz = wdata->tailsz;
+	rqst.rq_iter = wdata->iter;
+	rqst.rq_iter_size = iov_iter_count(&wdata->iter);
 
 	cifs_dbg(FYI, "async write at %llu %u bytes\n",
 		 wdata->offset, wdata->bytes);
+749 -448
fs/cifs/file.c
···
 #include "cached_dir.h"
 
 /*
+ * Remove the dirty flags from a span of pages.
+ */
+static void cifs_undirty_folios(struct inode *inode, loff_t start, unsigned int len)
+{
+	struct address_space *mapping = inode->i_mapping;
+	struct folio *folio;
+	pgoff_t end;
+
+	XA_STATE(xas, &mapping->i_pages, start / PAGE_SIZE);
+
+	rcu_read_lock();
+
+	end = (start + len - 1) / PAGE_SIZE;
+	xas_for_each_marked(&xas, folio, end, PAGECACHE_TAG_DIRTY) {
+		xas_pause(&xas);
+		rcu_read_unlock();
+		folio_lock(folio);
+		folio_clear_dirty_for_io(folio);
+		folio_unlock(folio);
+		rcu_read_lock();
+	}
+
+	rcu_read_unlock();
+}
+
+/*
  * Completion of write to server.
  */
 void cifs_pages_written_back(struct inode *inode, loff_t start, unsigned int len)
···
 	if (wdata->cfile)
 		cifsFileInfo_put(wdata->cfile);
 
-	kvfree(wdata->pages);
 	kfree(wdata);
 }
···
 static void
 cifs_writev_requeue(struct cifs_writedata *wdata)
 {
-	int i, rc = 0;
+	int rc = 0;
 	struct inode *inode = d_inode(wdata->cfile->dentry);
 	struct TCP_Server_Info *server;
-	unsigned int rest_len;
+	unsigned int rest_len = wdata->bytes;
+	loff_t fpos = wdata->offset;
 
 	server = tlink_tcon(wdata->cfile->tlink)->ses->server;
-	i = 0;
-	rest_len = wdata->bytes;
 	do {
 		struct cifs_writedata *wdata2;
-		unsigned int j, nr_pages, wsize, tailsz, cur_len;
+		unsigned int wsize, cur_len;
 
 		wsize = server->ops->wp_retry_size(inode);
 		if (wsize < rest_len) {
-			nr_pages = wsize / PAGE_SIZE;
-			if (!nr_pages) {
+			if (wsize < PAGE_SIZE) {
 				rc = -EOPNOTSUPP;
 				break;
 			}
-			cur_len = nr_pages * PAGE_SIZE;
-			tailsz = PAGE_SIZE;
+			cur_len = min(round_down(wsize, PAGE_SIZE), rest_len);
 		} else {
-			nr_pages = DIV_ROUND_UP(rest_len, PAGE_SIZE);
 			cur_len = rest_len;
-			tailsz = rest_len - (nr_pages - 1) * PAGE_SIZE;
 		}
 
-		wdata2 = cifs_writedata_alloc(nr_pages, cifs_writev_complete);
+		wdata2 = cifs_writedata_alloc(cifs_writev_complete);
 		if (!wdata2) {
 			rc = -ENOMEM;
 			break;
 		}
 
-		for (j = 0; j < nr_pages; j++) {
-			wdata2->pages[j] = wdata->pages[i + j];
-			lock_page(wdata2->pages[j]);
-			clear_page_dirty_for_io(wdata2->pages[j]);
-		}
-
 		wdata2->sync_mode = wdata->sync_mode;
-		wdata2->nr_pages = nr_pages;
-		wdata2->offset = page_offset(wdata2->pages[0]);
-		wdata2->pagesz = PAGE_SIZE;
-		wdata2->tailsz = tailsz;
-		wdata2->bytes = cur_len;
+		wdata2->offset	= fpos;
+		wdata2->bytes	= cur_len;
+		wdata2->iter	= wdata->iter;
+
+		iov_iter_advance(&wdata2->iter, fpos - wdata->offset);
+		iov_iter_truncate(&wdata2->iter, wdata2->bytes);
+
+		if (iov_iter_is_xarray(&wdata2->iter))
+			/* Check for pages having been redirtied and clean
+			 * them.  We can do this by walking the xarray.  If
+			 * it's not an xarray, then it's a DIO and we shouldn't
+			 * be mucking around with the page bits.
+			 */
+			cifs_undirty_folios(inode, fpos, cur_len);
 
 		rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY,
 					    &wdata2->cfile);
···
 					       cifs_writedata_release);
 		}
 
-		for (j = 0; j < nr_pages; j++) {
-			unlock_page(wdata2->pages[j]);
-			if (rc != 0 && !is_retryable_error(rc)) {
-				SetPageError(wdata2->pages[j]);
-				end_page_writeback(wdata2->pages[j]);
-				put_page(wdata2->pages[j]);
-			}
-		}
-
 		kref_put(&wdata2->refcount, cifs_writedata_release);
 		if (rc) {
 			if (is_retryable_error(rc))
 				continue;
-			i += nr_pages;
+			fpos += cur_len;
+			rest_len -= cur_len;
 			break;
 		}
 
+		fpos += cur_len;
 		rest_len -= cur_len;
-		i += nr_pages;
-	} while (i < wdata->nr_pages);
+	} while (rest_len > 0);
 
-	/* cleanup remaining pages from the original wdata */
-	for (; i < wdata->nr_pages; i++) {
-		SetPageError(wdata->pages[i]);
-		end_page_writeback(wdata->pages[i]);
-		put_page(wdata->pages[i]);
-	}
+	/* Clean up remaining pages from the original wdata */
+	if (iov_iter_is_xarray(&wdata->iter))
+		cifs_pages_write_failed(inode, fpos, rest_len);
 
 	if (rc != 0 && !is_retryable_error(rc))
 		mapping_set_error(inode->i_mapping, rc);
···
 	struct cifs_writedata *wdata = container_of(work,
 						struct cifs_writedata, work);
 	struct inode *inode = d_inode(wdata->cfile->dentry);
-	int i = 0;
 
 	if (wdata->result == 0) {
 		spin_lock(&inode->i_lock);
···
 	} else if (wdata->sync_mode == WB_SYNC_ALL && wdata->result == -EAGAIN)
 		return cifs_writev_requeue(wdata);
 
-	for (i = 0; i < wdata->nr_pages; i++) {
-		struct page *page = wdata->pages[i];
+	if (wdata->result == -EAGAIN)
+		cifs_pages_write_redirty(inode, wdata->offset, wdata->bytes);
+	else if (wdata->result < 0)
+		cifs_pages_write_failed(inode, wdata->offset, wdata->bytes);
+	else
+		cifs_pages_written_back(inode, wdata->offset, wdata->bytes);
 
-		if (wdata->result == -EAGAIN)
-			__set_page_dirty_nobuffers(page);
-		else if (wdata->result < 0)
-			SetPageError(page);
-		end_page_writeback(page);
-		cifs_readpage_to_fscache(inode, page);
-		put_page(page);
-	}
 	if (wdata->result != -EAGAIN)
 		mapping_set_error(inode->i_mapping, wdata->result);
 	kref_put(&wdata->refcount, cifs_writedata_release);
 }
 
-struct cifs_writedata *
-cifs_writedata_alloc(unsigned int nr_pages, work_func_t complete)
-{
-	struct cifs_writedata *writedata = NULL;
-	struct page **pages =
-		kcalloc(nr_pages, sizeof(struct page *), GFP_NOFS);
-	if (pages) {
-		writedata = cifs_writedata_direct_alloc(pages, complete);
-		if (!writedata)
-			kvfree(pages);
-	}
-
-	return writedata;
-}
-
-struct cifs_writedata *
-cifs_writedata_direct_alloc(struct page **pages, work_func_t complete)
+struct cifs_writedata *cifs_writedata_alloc(work_func_t complete)
 {
 	struct cifs_writedata *wdata;
 
 	wdata = kzalloc(sizeof(*wdata), GFP_NOFS);
 	if (wdata != NULL) {
-		wdata->pages = pages;
 		kref_init(&wdata->refcount);
 		INIT_LIST_HEAD(&wdata->list);
 		init_completion(&wdata->done);
···
 	}
 	return wdata;
 }
-
 
 static int cifs_partialpagewrite(struct page *page, unsigned from, unsigned to)
 {
···
 	return rc;
 }
 
+#if 0 // TODO: Remove for iov_iter support
 static struct cifs_writedata *
 wdata_alloc_and_fillpages(pgoff_t tofind, struct address_space *mapping,
 			  pgoff_t end, pgoff_t *index,
···
 	set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
 	return rc;
 }
+#endif
+
+/*
+ * Extend the region to be written back to include subsequent contiguously
+ * dirty pages if possible, but don't sleep while doing so.
+ */
+static void cifs_extend_writeback(struct address_space *mapping,
+				  long *_count,
+				  loff_t start,
+				  int max_pages,
+				  size_t max_len,
+				  unsigned int *_len)
+{
+	struct folio_batch batch;
+	struct folio *folio;
+	unsigned int psize, nr_pages;
+	size_t len = *_len;
+	pgoff_t index = (start + len) / PAGE_SIZE;
+	bool stop = true;
+	unsigned int i;
+	XA_STATE(xas, &mapping->i_pages, index);
+
+	folio_batch_init(&batch);
+
+	do {
+		/* Firstly, we gather up a batch of contiguous dirty pages
+		 * under the RCU read lock - but we can't clear the dirty flags
+		 * there if any of those pages are mapped.
+		 */
+		rcu_read_lock();
+
+		xas_for_each(&xas, folio, ULONG_MAX) {
+			stop = true;
+			if (xas_retry(&xas, folio))
+				continue;
+			if (xa_is_value(folio))
+				break;
+			if (folio_index(folio) != index)
+				break;
+			if (!folio_try_get_rcu(folio)) {
+				xas_reset(&xas);
+				continue;
+			}
+			nr_pages = folio_nr_pages(folio);
+			if (nr_pages > max_pages)
+				break;
+
+			/* Has the page moved or been split? */
+			if (unlikely(folio != xas_reload(&xas))) {
+				folio_put(folio);
+				break;
+			}
+
+			if (!folio_trylock(folio)) {
+				folio_put(folio);
+				break;
+			}
+			if (!folio_test_dirty(folio) || folio_test_writeback(folio)) {
+				folio_unlock(folio);
+				folio_put(folio);
+				break;
+			}
+
+			max_pages -= nr_pages;
+			psize = folio_size(folio);
+			len += psize;
+			stop = false;
+			if (max_pages <= 0 || len >= max_len || *_count <= 0)
+				stop = true;
+
+			index += nr_pages;
+			if (!folio_batch_add(&batch, folio))
+				break;
+			if (stop)
+				break;
+		}
+
+		if (!stop)
+			xas_pause(&xas);
+		rcu_read_unlock();
+
+		/* Now, if we obtained any pages, we can shift them to being
+		 * writable and mark them for caching.
+		 */
+		if (!folio_batch_count(&batch))
+			break;
+
+		for (i = 0; i < folio_batch_count(&batch); i++) {
+			folio = batch.folios[i];
+			/* The folio should be locked, dirty and not undergoing
+			 * writeback from the loop above.
+			 */
+			if (!folio_clear_dirty_for_io(folio))
+				WARN_ON(1);
+			if (folio_start_writeback(folio))
+				WARN_ON(1);
+
+			*_count -= folio_nr_pages(folio);
+			folio_unlock(folio);
+		}
+
+		folio_batch_release(&batch);
+		cond_resched();
+	} while (!stop);
+
+	*_len = len;
+}
+
+/*
+ * Write back the locked page and any subsequent non-locked dirty pages.
+ */
+static ssize_t cifs_write_back_from_locked_folio(struct address_space *mapping,
+						 struct writeback_control *wbc,
+						 struct folio *folio,
+						 loff_t start, loff_t end)
+{
+	struct inode *inode = mapping->host;
+	struct TCP_Server_Info *server;
+	struct cifs_writedata *wdata;
+	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
+	struct cifs_credits credits_on_stack;
+	struct cifs_credits *credits = &credits_on_stack;
+	struct cifsFileInfo *cfile = NULL;
+	unsigned int xid, wsize, len;
+	loff_t i_size = i_size_read(inode);
+	size_t max_len;
+	long count = wbc->nr_to_write;
+	int rc;
+
+	/* The folio should be locked, dirty and not undergoing writeback. */
+	if (folio_start_writeback(folio))
+		WARN_ON(1);
+
+	count -= folio_nr_pages(folio);
+	len = folio_size(folio);
+
+	xid = get_xid();
+	server = cifs_pick_channel(cifs_sb_master_tcon(cifs_sb)->ses);
+
+	rc = cifs_get_writable_file(CIFS_I(inode), FIND_WR_ANY, &cfile);
+	if (rc) {
+		cifs_dbg(VFS, "No writable handle in writepages rc=%d\n", rc);
+		goto err_xid;
+	}
+
+	rc = server->ops->wait_mtu_credits(server, cifs_sb->ctx->wsize,
+					   &wsize, credits);
+	if (rc != 0)
+		goto err_close;
+
+	wdata = cifs_writedata_alloc(cifs_writev_complete);
+	if (!wdata) {
+		rc = -ENOMEM;
+		goto err_uncredit;
+	}
+
+	wdata->sync_mode = wbc->sync_mode;
+	wdata->offset = folio_pos(folio);
+	wdata->pid = cfile->pid;
+	wdata->credits = credits_on_stack;
+	wdata->cfile = cfile;
+	wdata->server = server;
+	cfile = NULL;
+
+	/* Find all consecutive lockable dirty pages, stopping when we find a
+	 * page that is not immediately lockable, is not dirty or is missing,
+	 * or we reach the end of the range.
+	 */
+	if (start < i_size) {
+		/* Trim the write to the EOF; the extra data is ignored.  Also
+		 * put an upper limit on the size of a single storedata op.
+		 */
+		max_len = wsize;
+		max_len = min_t(unsigned long long, max_len, end - start + 1);
+		max_len = min_t(unsigned long long, max_len, i_size - start);
+
+		if (len < max_len) {
+			int max_pages = INT_MAX;
+
+#ifdef CONFIG_CIFS_SMB_DIRECT
+			if (server->smbd_conn)
+				max_pages = server->smbd_conn->max_frmr_depth;
+#endif
+			max_pages -= folio_nr_pages(folio);
+
+			if (max_pages > 0)
+				cifs_extend_writeback(mapping, &count, start,
+						      max_pages, max_len, &len);
+		}
+		len = min_t(loff_t, len, max_len);
+	}
+
+	wdata->bytes = len;
+
+	/* We now have a contiguous set of dirty pages, each with writeback
+	 * set; the first page is still locked at this point, but all the rest
+	 * have been unlocked.
+	 */
+	folio_unlock(folio);
+
+	if (start < i_size) {
+		iov_iter_xarray(&wdata->iter, ITER_SOURCE, &mapping->i_pages,
+				start, len);
+
+		rc = adjust_credits(wdata->server, &wdata->credits, wdata->bytes);
+		if (rc)
+			goto err_wdata;
+
+		if (wdata->cfile->invalidHandle)
+			rc = -EAGAIN;
+		else
+			rc = wdata->server->ops->async_writev(wdata,
+							      cifs_writedata_release);
+		if (rc >= 0) {
+			kref_put(&wdata->refcount, cifs_writedata_release);
+			goto err_close;
+		}
+	} else {
+		/* The dirty region was entirely beyond the EOF. */
+		cifs_pages_written_back(inode, start, len);
+		rc = 0;
+	}
+
+err_wdata:
+	kref_put(&wdata->refcount, cifs_writedata_release);
+err_uncredit:
+	add_credits_and_wake_if(server, credits, 0);
+err_close:
+	if (cfile)
+		cifsFileInfo_put(cfile);
+err_xid:
+	free_xid(xid);
+	if (rc == 0) {
+		wbc->nr_to_write = count;
+	} else if (is_retryable_error(rc)) {
+		cifs_pages_write_redirty(inode, start, len);
+	} else {
+		cifs_pages_write_failed(inode, start, len);
+		mapping_set_error(mapping, rc);
+	}
+	/* Indication to update ctime and mtime as close is deferred */
+	set_bit(CIFS_INO_MODIFIED_ATTR, &CIFS_I(inode)->flags);
+	return rc;
+}
+
+/*
+ * write a region of pages back to the server
+ */
+static int cifs_writepages_region(struct address_space *mapping,
+				  struct writeback_control *wbc,
+				  loff_t start, loff_t end, loff_t *_next)
+{
+	struct folio *folio;
+	struct page *head_page;
+	ssize_t ret;
+	int n, skips = 0;
+
+	do {
+		pgoff_t index = start / PAGE_SIZE;
+
+		n = find_get_pages_range_tag(mapping, &index, end / PAGE_SIZE,
+					     PAGECACHE_TAG_DIRTY, 1, &head_page);
+		if (!n)
+			break;
+
+		folio = page_folio(head_page);
+		start = folio_pos(folio); /* May regress with THPs */
+
+		/* At this point we hold neither the i_pages lock nor the
+		 * page lock: the page may be truncated or invalidated
+		 * (changing page->mapping to NULL), or even swizzled
+		 * back from swapper_space to tmpfs file mapping
+		 */
+		if (wbc->sync_mode != WB_SYNC_NONE) {
+			ret = folio_lock_killable(folio);
+			if (ret < 0) {
+				folio_put(folio);
+				return ret;
+			}
+		} else {
+			if (!folio_trylock(folio)) {
+				folio_put(folio);
+				return 0;
+			}
+		}
+
+		if (folio_mapping(folio) != mapping ||
+		    !folio_test_dirty(folio)) {
+			start += folio_size(folio);
+			folio_unlock(folio);
+			folio_put(folio);
+			continue;
+		}
+
+		if (folio_test_writeback(folio) ||
+		    folio_test_fscache(folio)) {
+			folio_unlock(folio);
+			if (wbc->sync_mode != WB_SYNC_NONE) {
+				folio_wait_writeback(folio);
+#ifdef CONFIG_CIFS_FSCACHE
+				folio_wait_fscache(folio);
+#endif
+			} else {
+				start += folio_size(folio);
+			}
+			folio_put(folio);
+			if (wbc->sync_mode == WB_SYNC_NONE) {
+				if (skips >= 5 || need_resched())
+					break;
+				skips++;
+			}
+			continue;
+		}
+
+		if (!folio_clear_dirty_for_io(folio))
+			/* We hold the page lock - it should've been dirty. */
+			WARN_ON(1);
+
+		ret = cifs_write_back_from_locked_folio(mapping, wbc, folio, start, end);
+		folio_put(folio);
+		if (ret < 0)
+			return ret;
+
+		start += ret;
+		cond_resched();
+	} while (wbc->nr_to_write > 0);
+
+	*_next = start;
+	return 0;
+}
+
+/*
+ * Write some of the pending data back to the server
+ */
+static int cifs_writepages(struct address_space *mapping,
+			   struct writeback_control *wbc)
+{
+	loff_t start, next;
+	int ret;
+
+	/* We have to be careful as we can end up racing with setattr()
+	 * truncating the pagecache since the caller doesn't take a lock here
+	 * to prevent it.
+	 */
+
+	if (wbc->range_cyclic) {
+		start = mapping->writeback_index * PAGE_SIZE;
+		ret = cifs_writepages_region(mapping, wbc, start, LLONG_MAX, &next);
+		if (ret == 0) {
+			mapping->writeback_index = next / PAGE_SIZE;
+			if (start > 0 && wbc->nr_to_write > 0) {
+				ret = cifs_writepages_region(mapping, wbc, 0,
+							     start, &next);
+				if (ret == 0)
+					mapping->writeback_index =
+						next / PAGE_SIZE;
+			}
+		}
+	} else if (wbc->range_start == 0 && wbc->range_end == LLONG_MAX) {
+		ret = cifs_writepages_region(mapping, wbc, 0, LLONG_MAX, &next);
+		if (wbc->nr_to_write > 0 && ret == 0)
+			mapping->writeback_index = next / PAGE_SIZE;
+	} else {
+		ret = cifs_writepages_region(mapping, wbc,
+					     wbc->range_start, wbc->range_end, &next);
+	}
+
+	return ret;
+}
 
 static int
 cifs_writepage_locked(struct page *page, struct writeback_control *wbc)
···
 	struct inode *inode = mapping->host;
 	struct cifsFileInfo *cfile = file->private_data;
 	struct cifs_sb_info *cifs_sb = CIFS_SB(cfile->dentry->d_sb);
+	struct folio *folio = page_folio(page);
 	__u32 pid;
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
···
 	cifs_dbg(FYI, "write_end for page %p from pos %lld with %d bytes\n",
 		 page, pos, copied);
 
-	if (PageChecked(page)) {
+	if (folio_test_checked(folio)) {
 		if (copied == len)
-			SetPageUptodate(page);
-		ClearPageChecked(page);
-	} else if (!PageUptodate(page) && copied == PAGE_SIZE)
-		SetPageUptodate(page);
+			folio_mark_uptodate(folio);
+		folio_clear_checked(folio);
+	} else if (!folio_test_uptodate(folio) && copied == PAGE_SIZE)
+		folio_mark_uptodate(folio);
 
-	if (!PageUptodate(page)) {
+	if (!folio_test_uptodate(folio)) {
 		char *page_data;
 		unsigned offset = pos & (PAGE_SIZE - 1);
 		unsigned int xid;
···
 	return rc;
 }
 
+#if 0 // TODO: Remove for iov_iter support
 static int
 cifs_write_allocate_pages(struct page **pages, unsigned long num_pages)
 {
···
 
 	return num_pages;
 }
+#endif
 
 static void
 cifs_uncached_writedata_release(struct kref *refcount)
 {
-	int i;
 	struct cifs_writedata *wdata = container_of(refcount,
 					struct cifs_writedata, refcount);
 
 	kref_put(&wdata->ctx->refcount, cifs_aio_ctx_release);
-	for (i = 0; i < wdata->nr_pages; i++)
-		put_page(wdata->pages[i]);
 	cifs_writedata_release(refcount);
 }
···
 	kref_put(&wdata->refcount, cifs_uncached_writedata_release);
 }
 
+#if 0 // TODO: Remove for iov_iter support
 static int
 wdata_fill_from_iovec(struct cifs_writedata *wdata, struct iov_iter *from,
 		      size_t *len, unsigned long *num_pages)
···
 	*num_pages = i + 1;
 	return 0;
 }
+#endif
 
 static int
 cifs_resend_wdata(struct cifs_writedata *wdata, struct list_head *wdata_list,
···
 	return rc;
 }
 
+/*
+ * Select span of a bvec iterator we're going to use.  Limit it by both maximum
+ * size and maximum number of segments.
+ */
+static size_t cifs_limit_bvec_subset(const struct iov_iter *iter, size_t max_size,
+				     size_t max_segs, unsigned int *_nsegs)
+{
+	const struct bio_vec *bvecs = iter->bvec;
+	unsigned int nbv = iter->nr_segs, ix = 0, nsegs = 0;
+	size_t len, span = 0, n = iter->count;
+	size_t skip = iter->iov_offset;
+
+	if (WARN_ON(!iov_iter_is_bvec(iter)) || n == 0)
+		return 0;
+
+	while (n && ix < nbv && skip) {
+		len = bvecs[ix].bv_len;
+		if (skip < len)
+			break;
+		skip -= len;
+		n -= len;
+		ix++;
+	}
+
+	while (n && ix < nbv) {
+		len = min3(n, bvecs[ix].bv_len - skip, max_size);
+		span += len;
+		nsegs++;
+		ix++;
+		if (span >= max_size || nsegs >= max_segs)
+			break;
+		skip = 0;
+		n -= len;
+	}
+
+	*_nsegs = nsegs;
+	return span;
+}
+
 static int
-cifs_write_from_iter(loff_t offset, size_t len, struct iov_iter *from,
+cifs_write_from_iter(loff_t fpos, size_t len, struct iov_iter *from,
 		     struct cifsFileInfo *open_file,
 		     struct cifs_sb_info *cifs_sb, struct list_head *wdata_list,
 		     struct cifs_aio_ctx *ctx)
 {
 	int rc = 0;
-	size_t cur_len;
-	unsigned long nr_pages, num_pages, i;
+	size_t cur_len, max_len;
 	struct cifs_writedata *wdata;
-	struct iov_iter saved_from = *from;
-	loff_t saved_offset = offset;
 	pid_t pid;
 	struct TCP_Server_Info *server;
-	struct page **pagevec;
-	size_t start;
-	unsigned int xid;
+	unsigned int xid, max_segs = INT_MAX;
 
 	if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD)
 		pid = open_file->pid;
···
 	server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses);
 	xid = get_xid();
 
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (server->smbd_conn)
+		max_segs =
server->smbd_conn->max_frmr_depth; 3369 + #endif 3370 + 3761 3371 do { 3762 - unsigned int wsize; 3763 3372 struct cifs_credits credits_on_stack; 3764 3373 struct cifs_credits *credits = &credits_on_stack; 3374 + unsigned int wsize, nsegs = 0; 3375 + 3376 + if (signal_pending(current)) { 3377 + rc = -EINTR; 3378 + break; 3379 + } 3765 3380 3766 3381 if (open_file->invalidHandle) { 3767 3382 rc = cifs_reopen_file(open_file, false); ··· 3786 3381 if (rc) 3787 3382 break; 3788 3383 3789 - cur_len = min_t(const size_t, len, wsize); 3384 + max_len = min_t(const size_t, len, wsize); 3385 + if (!max_len) { 3386 + rc = -EAGAIN; 3387 + add_credits_and_wake_if(server, credits, 0); 3388 + break; 3389 + } 3790 3390 3791 - if (ctx->direct_io) { 3792 - ssize_t result; 3391 + cur_len = cifs_limit_bvec_subset(from, max_len, max_segs, &nsegs); 3392 + cifs_dbg(FYI, "write_from_iter len=%zx/%zx nsegs=%u/%lu/%u\n", 3393 + cur_len, max_len, nsegs, from->nr_segs, max_segs); 3394 + if (cur_len == 0) { 3395 + rc = -EIO; 3396 + add_credits_and_wake_if(server, credits, 0); 3397 + break; 3398 + } 3793 3399 3794 - result = iov_iter_get_pages_alloc2( 3795 - from, &pagevec, cur_len, &start); 3796 - if (result < 0) { 3797 - cifs_dbg(VFS, 3798 - "direct_writev couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", 3799 - result, iov_iter_type(from), 3800 - from->iov_offset, from->count); 3801 - dump_stack(); 3802 - 3803 - rc = result; 3804 - add_credits_and_wake_if(server, credits, 0); 3805 - break; 3806 - } 3807 - cur_len = (size_t)result; 3808 - 3809 - nr_pages = 3810 - (cur_len + start + PAGE_SIZE - 1) / PAGE_SIZE; 3811 - 3812 - wdata = cifs_writedata_direct_alloc(pagevec, 3813 - cifs_uncached_writev_complete); 3814 - if (!wdata) { 3815 - rc = -ENOMEM; 3816 - for (i = 0; i < nr_pages; i++) 3817 - put_page(pagevec[i]); 3818 - kvfree(pagevec); 3819 - add_credits_and_wake_if(server, credits, 0); 3820 - break; 3821 - } 3822 - 3823 - 3824 - wdata->page_offset = start; 3825 - 
wdata->tailsz = 3826 - nr_pages > 1 ? 3827 - cur_len - (PAGE_SIZE - start) - 3828 - (nr_pages - 2) * PAGE_SIZE : 3829 - cur_len; 3830 - } else { 3831 - nr_pages = get_numpages(wsize, len, &cur_len); 3832 - wdata = cifs_writedata_alloc(nr_pages, 3833 - cifs_uncached_writev_complete); 3834 - if (!wdata) { 3835 - rc = -ENOMEM; 3836 - add_credits_and_wake_if(server, credits, 0); 3837 - break; 3838 - } 3839 - 3840 - rc = cifs_write_allocate_pages(wdata->pages, nr_pages); 3841 - if (rc) { 3842 - kvfree(wdata->pages); 3843 - kfree(wdata); 3844 - add_credits_and_wake_if(server, credits, 0); 3845 - break; 3846 - } 3847 - 3848 - num_pages = nr_pages; 3849 - rc = wdata_fill_from_iovec( 3850 - wdata, from, &cur_len, &num_pages); 3851 - if (rc) { 3852 - for (i = 0; i < nr_pages; i++) 3853 - put_page(wdata->pages[i]); 3854 - kvfree(wdata->pages); 3855 - kfree(wdata); 3856 - add_credits_and_wake_if(server, credits, 0); 3857 - break; 3858 - } 3859 - 3860 - /* 3861 - * Bring nr_pages down to the number of pages we 3862 - * actually used, and free any pages that we didn't use. 
3863 - */ 3864 - for ( ; nr_pages > num_pages; nr_pages--) 3865 - put_page(wdata->pages[nr_pages - 1]); 3866 - 3867 - wdata->tailsz = cur_len - ((nr_pages - 1) * PAGE_SIZE); 3400 + wdata = cifs_writedata_alloc(cifs_uncached_writev_complete); 3401 + if (!wdata) { 3402 + rc = -ENOMEM; 3403 + add_credits_and_wake_if(server, credits, 0); 3404 + break; 3868 3405 } 3869 3406 3870 3407 wdata->sync_mode = WB_SYNC_ALL; 3871 - wdata->nr_pages = nr_pages; 3872 - wdata->offset = (__u64)offset; 3873 - wdata->cfile = cifsFileInfo_get(open_file); 3874 - wdata->server = server; 3875 - wdata->pid = pid; 3876 - wdata->bytes = cur_len; 3877 - wdata->pagesz = PAGE_SIZE; 3878 - wdata->credits = credits_on_stack; 3879 - wdata->ctx = ctx; 3408 + wdata->offset = (__u64)fpos; 3409 + wdata->cfile = cifsFileInfo_get(open_file); 3410 + wdata->server = server; 3411 + wdata->pid = pid; 3412 + wdata->bytes = cur_len; 3413 + wdata->credits = credits_on_stack; 3414 + wdata->iter = *from; 3415 + wdata->ctx = ctx; 3880 3416 kref_get(&ctx->refcount); 3417 + 3418 + iov_iter_truncate(&wdata->iter, cur_len); 3881 3419 3882 3420 rc = adjust_credits(server, &wdata->credits, wdata->bytes); 3883 3421 ··· 3836 3488 add_credits_and_wake_if(server, &wdata->credits, 0); 3837 3489 kref_put(&wdata->refcount, 3838 3490 cifs_uncached_writedata_release); 3839 - if (rc == -EAGAIN) { 3840 - *from = saved_from; 3841 - iov_iter_advance(from, offset - saved_offset); 3491 + if (rc == -EAGAIN) 3842 3492 continue; 3843 - } 3844 3493 break; 3845 3494 } 3846 3495 3847 3496 list_add_tail(&wdata->list, wdata_list); 3848 - offset += cur_len; 3497 + iov_iter_advance(from, cur_len); 3498 + fpos += cur_len; 3849 3499 len -= cur_len; 3850 3500 } while (len > 0); 3851 3501 ··· 3942 3596 struct cifs_tcon *tcon; 3943 3597 struct cifs_sb_info *cifs_sb; 3944 3598 struct cifs_aio_ctx *ctx; 3945 - struct iov_iter saved_from = *from; 3946 - size_t len = iov_iter_count(from); 3947 3599 int rc; 3948 3600 3949 3601 /* ··· 3975 3631 ctx->iocb = 
iocb; 3976 3632 3977 3633 ctx->pos = iocb->ki_pos; 3634 + ctx->direct_io = direct; 3635 + ctx->nr_pinned_pages = 0; 3978 3636 3979 - if (direct) { 3980 - ctx->direct_io = true; 3981 - ctx->iter = *from; 3982 - ctx->len = len; 3983 - } else { 3984 - rc = setup_aio_ctx_iter(ctx, from, ITER_SOURCE); 3985 - if (rc) { 3637 + if (user_backed_iter(from)) { 3638 + /* 3639 + * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as 3640 + * they contain references to the calling process's virtual 3641 + * memory layout which won't be available in an async worker 3642 + * thread. This also takes a pin on every folio involved. 3643 + */ 3644 + rc = netfs_extract_user_iter(from, iov_iter_count(from), 3645 + &ctx->iter, 0); 3646 + if (rc < 0) { 3986 3647 kref_put(&ctx->refcount, cifs_aio_ctx_release); 3987 3648 return rc; 3988 3649 } 3650 + 3651 + ctx->nr_pinned_pages = rc; 3652 + ctx->bv = (void *)ctx->iter.bvec; 3653 + ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter); 3654 + } else if ((iov_iter_is_bvec(from) || iov_iter_is_kvec(from)) && 3655 + !is_sync_kiocb(iocb)) { 3656 + /* 3657 + * If the op is asynchronous, we need to copy the list attached 3658 + * to a BVEC/KVEC-type iterator, but we assume that the storage 3659 + * will be pinned by the caller; in any case, we may or may not 3660 + * be able to pin the pages, so we don't try. 3661 + */ 3662 + ctx->bv = (void *)dup_iter(&ctx->iter, from, GFP_KERNEL); 3663 + if (!ctx->bv) { 3664 + kref_put(&ctx->refcount, cifs_aio_ctx_release); 3665 + return -ENOMEM; 3666 + } 3667 + } else { 3668 + /* 3669 + * Otherwise, we just pass the iterator down as-is and rely on 3670 + * the caller to make sure the pages referred to by the 3671 + * iterator don't evaporate. 
3672 + */ 3673 + ctx->iter = *from; 3989 3674 } 3675 + 3676 + ctx->len = iov_iter_count(&ctx->iter); 3990 3677 3991 3678 /* grab a lock here due to read response handlers can access ctx */ 3992 3679 mutex_lock(&ctx->aio_mutex); 3993 3680 3994 - rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &saved_from, 3681 + rc = cifs_write_from_iter(iocb->ki_pos, ctx->len, &ctx->iter, 3995 3682 cfile, cifs_sb, &ctx->list, ctx); 3996 3683 3997 3684 /* ··· 4165 3790 return written; 4166 3791 } 4167 3792 4168 - static struct cifs_readdata * 4169 - cifs_readdata_direct_alloc(struct page **pages, work_func_t complete) 3793 + static struct cifs_readdata *cifs_readdata_alloc(work_func_t complete) 4170 3794 { 4171 3795 struct cifs_readdata *rdata; 4172 3796 4173 3797 rdata = kzalloc(sizeof(*rdata), GFP_KERNEL); 4174 - if (rdata != NULL) { 4175 - rdata->pages = pages; 3798 + if (rdata) { 4176 3799 kref_init(&rdata->refcount); 4177 3800 INIT_LIST_HEAD(&rdata->list); 4178 3801 init_completion(&rdata->done); ··· 4180 3807 return rdata; 4181 3808 } 4182 3809 4183 - static struct cifs_readdata * 4184 - cifs_readdata_alloc(unsigned int nr_pages, work_func_t complete) 4185 - { 4186 - struct page **pages = 4187 - kcalloc(nr_pages, sizeof(struct page *), GFP_KERNEL); 4188 - struct cifs_readdata *ret = NULL; 4189 - 4190 - if (pages) { 4191 - ret = cifs_readdata_direct_alloc(pages, complete); 4192 - if (!ret) 4193 - kfree(pages); 4194 - } 4195 - 4196 - return ret; 4197 - } 4198 - 4199 3810 void 4200 3811 cifs_readdata_release(struct kref *refcount) 4201 3812 { 4202 3813 struct cifs_readdata *rdata = container_of(refcount, 4203 3814 struct cifs_readdata, refcount); 3815 + 3816 + if (rdata->ctx) 3817 + kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); 4204 3818 #ifdef CONFIG_CIFS_SMB_DIRECT 4205 3819 if (rdata->mr) { 4206 3820 smbd_deregister_mr(rdata->mr); ··· 4197 3837 if (rdata->cfile) 4198 3838 cifsFileInfo_put(rdata->cfile); 4199 3839 4200 - kvfree(rdata->pages); 4201 3840 kfree(rdata); 
4202 - } 4203 - 4204 - static int 4205 - cifs_read_allocate_pages(struct cifs_readdata *rdata, unsigned int nr_pages) 4206 - { 4207 - int rc = 0; 4208 - struct page *page; 4209 - unsigned int i; 4210 - 4211 - for (i = 0; i < nr_pages; i++) { 4212 - page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM); 4213 - if (!page) { 4214 - rc = -ENOMEM; 4215 - break; 4216 - } 4217 - rdata->pages[i] = page; 4218 - } 4219 - 4220 - if (rc) { 4221 - unsigned int nr_page_failed = i; 4222 - 4223 - for (i = 0; i < nr_page_failed; i++) { 4224 - put_page(rdata->pages[i]); 4225 - rdata->pages[i] = NULL; 4226 - } 4227 - } 4228 - return rc; 4229 - } 4230 - 4231 - static void 4232 - cifs_uncached_readdata_release(struct kref *refcount) 4233 - { 4234 - struct cifs_readdata *rdata = container_of(refcount, 4235 - struct cifs_readdata, refcount); 4236 - unsigned int i; 4237 - 4238 - kref_put(&rdata->ctx->refcount, cifs_aio_ctx_release); 4239 - for (i = 0; i < rdata->nr_pages; i++) { 4240 - put_page(rdata->pages[i]); 4241 - } 4242 - cifs_readdata_release(refcount); 4243 - } 4244 - 4245 - /** 4246 - * cifs_readdata_to_iov - copy data from pages in response to an iovec 4247 - * @rdata: the readdata response with list of pages holding data 4248 - * @iter: destination for our data 4249 - * 4250 - * This function copies data from a list of pages in a readdata response into 4251 - * an array of iovecs. It will first calculate where the data should go 4252 - * based on the info in the readdata and then copy the data into that spot. 
4253 - */ 4254 - static int 4255 - cifs_readdata_to_iov(struct cifs_readdata *rdata, struct iov_iter *iter) 4256 - { 4257 - size_t remaining = rdata->got_bytes; 4258 - unsigned int i; 4259 - 4260 - for (i = 0; i < rdata->nr_pages; i++) { 4261 - struct page *page = rdata->pages[i]; 4262 - size_t copy = min_t(size_t, remaining, PAGE_SIZE); 4263 - size_t written; 4264 - 4265 - if (unlikely(iov_iter_is_pipe(iter))) { 4266 - void *addr = kmap_atomic(page); 4267 - 4268 - written = copy_to_iter(addr, copy, iter); 4269 - kunmap_atomic(addr); 4270 - } else 4271 - written = copy_page_to_iter(page, 0, copy, iter); 4272 - remaining -= written; 4273 - if (written < copy && iov_iter_count(iter) > 0) 4274 - break; 4275 - } 4276 - return remaining ? -EFAULT : 0; 4277 3841 } 4278 3842 4279 3843 static void collect_uncached_read_data(struct cifs_aio_ctx *ctx); ··· 4211 3927 complete(&rdata->done); 4212 3928 collect_uncached_read_data(rdata->ctx); 4213 3929 /* the below call can possibly free the last ref to aio ctx */ 4214 - kref_put(&rdata->refcount, cifs_uncached_readdata_release); 3930 + kref_put(&rdata->refcount, cifs_readdata_release); 4215 3931 } 3932 + 3933 + #if 0 // TODO: Remove for iov_iter support 4216 3934 4217 3935 static int 4218 3936 uncached_fill_pages(struct TCP_Server_Info *server, ··· 4289 4003 { 4290 4004 return uncached_fill_pages(server, rdata, iter, iter->count); 4291 4005 } 4006 + #endif 4292 4007 4293 4008 static int cifs_resend_rdata(struct cifs_readdata *rdata, 4294 4009 struct list_head *rdata_list, ··· 4359 4072 } while (rc == -EAGAIN); 4360 4073 4361 4074 fail: 4362 - kref_put(&rdata->refcount, cifs_uncached_readdata_release); 4075 + kref_put(&rdata->refcount, cifs_readdata_release); 4363 4076 return rc; 4364 4077 } 4365 4078 4366 4079 static int 4367 - cifs_send_async_read(loff_t offset, size_t len, struct cifsFileInfo *open_file, 4080 + cifs_send_async_read(loff_t fpos, size_t len, struct cifsFileInfo *open_file, 4368 4081 struct cifs_sb_info 
*cifs_sb, struct list_head *rdata_list, 4369 4082 struct cifs_aio_ctx *ctx) 4370 4083 { 4371 4084 struct cifs_readdata *rdata; 4372 - unsigned int npages, rsize; 4085 + unsigned int rsize, nsegs, max_segs = INT_MAX; 4373 4086 struct cifs_credits credits_on_stack; 4374 4087 struct cifs_credits *credits = &credits_on_stack; 4375 - size_t cur_len; 4088 + size_t cur_len, max_len; 4376 4089 int rc; 4377 4090 pid_t pid; 4378 4091 struct TCP_Server_Info *server; 4379 - struct page **pagevec; 4380 - size_t start; 4381 - struct iov_iter direct_iov = ctx->iter; 4382 4092 4383 4093 server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); 4094 + 4095 + #ifdef CONFIG_CIFS_SMB_DIRECT 4096 + if (server->smbd_conn) 4097 + max_segs = server->smbd_conn->max_frmr_depth; 4098 + #endif 4384 4099 4385 4100 if (cifs_sb->mnt_cifs_flags & CIFS_MOUNT_RWPIDFORWARD) 4386 4101 pid = open_file->pid; 4387 4102 else 4388 4103 pid = current->tgid; 4389 - 4390 - if (ctx->direct_io) 4391 - iov_iter_advance(&direct_iov, offset - ctx->pos); 4392 4104 4393 4105 do { 4394 4106 if (open_file->invalidHandle) { ··· 4408 4122 if (rc) 4409 4123 break; 4410 4124 4411 - cur_len = min_t(const size_t, len, rsize); 4125 + max_len = min_t(size_t, len, rsize); 4412 4126 4413 - if (ctx->direct_io) { 4414 - ssize_t result; 4415 - 4416 - result = iov_iter_get_pages_alloc2( 4417 - &direct_iov, &pagevec, 4418 - cur_len, &start); 4419 - if (result < 0) { 4420 - cifs_dbg(VFS, 4421 - "Couldn't get user pages (rc=%zd) iter type %d iov_offset %zd count %zd\n", 4422 - result, iov_iter_type(&direct_iov), 4423 - direct_iov.iov_offset, 4424 - direct_iov.count); 4425 - dump_stack(); 4426 - 4427 - rc = result; 4428 - add_credits_and_wake_if(server, credits, 0); 4429 - break; 4430 - } 4431 - cur_len = (size_t)result; 4432 - 4433 - rdata = cifs_readdata_direct_alloc( 4434 - pagevec, cifs_uncached_readv_complete); 4435 - if (!rdata) { 4436 - add_credits_and_wake_if(server, credits, 0); 4437 - rc = -ENOMEM; 4438 - break; 4439 - 
} 4440 - 4441 - npages = (cur_len + start + PAGE_SIZE-1) / PAGE_SIZE; 4442 - rdata->page_offset = start; 4443 - rdata->tailsz = npages > 1 ? 4444 - cur_len-(PAGE_SIZE-start)-(npages-2)*PAGE_SIZE : 4445 - cur_len; 4446 - 4447 - } else { 4448 - 4449 - npages = DIV_ROUND_UP(cur_len, PAGE_SIZE); 4450 - /* allocate a readdata struct */ 4451 - rdata = cifs_readdata_alloc(npages, 4452 - cifs_uncached_readv_complete); 4453 - if (!rdata) { 4454 - add_credits_and_wake_if(server, credits, 0); 4455 - rc = -ENOMEM; 4456 - break; 4457 - } 4458 - 4459 - rc = cifs_read_allocate_pages(rdata, npages); 4460 - if (rc) { 4461 - kvfree(rdata->pages); 4462 - kfree(rdata); 4463 - add_credits_and_wake_if(server, credits, 0); 4464 - break; 4465 - } 4466 - 4467 - rdata->tailsz = PAGE_SIZE; 4127 + cur_len = cifs_limit_bvec_subset(&ctx->iter, max_len, 4128 + max_segs, &nsegs); 4129 + cifs_dbg(FYI, "read-to-iter len=%zx/%zx nsegs=%u/%lu/%u\n", 4130 + cur_len, max_len, nsegs, ctx->iter.nr_segs, max_segs); 4131 + if (cur_len == 0) { 4132 + rc = -EIO; 4133 + add_credits_and_wake_if(server, credits, 0); 4134 + break; 4468 4135 } 4469 4136 4470 - rdata->server = server; 4471 - rdata->cfile = cifsFileInfo_get(open_file); 4472 - rdata->nr_pages = npages; 4473 - rdata->offset = offset; 4474 - rdata->bytes = cur_len; 4475 - rdata->pid = pid; 4476 - rdata->pagesz = PAGE_SIZE; 4477 - rdata->read_into_pages = cifs_uncached_read_into_pages; 4478 - rdata->copy_into_pages = cifs_uncached_copy_into_pages; 4479 - rdata->credits = credits_on_stack; 4480 - rdata->ctx = ctx; 4137 + rdata = cifs_readdata_alloc(cifs_uncached_readv_complete); 4138 + if (!rdata) { 4139 + add_credits_and_wake_if(server, credits, 0); 4140 + rc = -ENOMEM; 4141 + break; 4142 + } 4143 + 4144 + rdata->server = server; 4145 + rdata->cfile = cifsFileInfo_get(open_file); 4146 + rdata->offset = fpos; 4147 + rdata->bytes = cur_len; 4148 + rdata->pid = pid; 4149 + rdata->credits = credits_on_stack; 4150 + rdata->ctx = ctx; 4481 4151 
kref_get(&ctx->refcount); 4152 + 4153 + rdata->iter = ctx->iter; 4154 + iov_iter_truncate(&rdata->iter, cur_len); 4482 4155 4483 4156 rc = adjust_credits(server, &rdata->credits, rdata->bytes); 4484 4157 ··· 4450 4205 4451 4206 if (rc) { 4452 4207 add_credits_and_wake_if(server, &rdata->credits, 0); 4453 - kref_put(&rdata->refcount, 4454 - cifs_uncached_readdata_release); 4455 - if (rc == -EAGAIN) { 4456 - iov_iter_revert(&direct_iov, cur_len); 4208 + kref_put(&rdata->refcount, cifs_readdata_release); 4209 + if (rc == -EAGAIN) 4457 4210 continue; 4458 - } 4459 4211 break; 4460 4212 } 4461 4213 4462 4214 list_add_tail(&rdata->list, rdata_list); 4463 - offset += cur_len; 4215 + iov_iter_advance(&ctx->iter, cur_len); 4216 + fpos += cur_len; 4464 4217 len -= cur_len; 4465 4218 } while (len > 0); 4466 4219 ··· 4500 4257 list_del_init(&rdata->list); 4501 4258 INIT_LIST_HEAD(&tmp_list); 4502 4259 4503 - /* 4504 - * Got a part of data and then reconnect has 4505 - * happened -- fill the buffer and continue 4506 - * reading. 
4507 - */ 4508 - if (got_bytes && got_bytes < rdata->bytes) { 4509 - rc = 0; 4510 - if (!ctx->direct_io) 4511 - rc = cifs_readdata_to_iov(rdata, to); 4512 - if (rc) { 4513 - kref_put(&rdata->refcount, 4514 - cifs_uncached_readdata_release); 4515 - continue; 4516 - } 4517 - } 4518 - 4519 4260 if (ctx->direct_io) { 4520 4261 /* 4521 4262 * Re-use rdata as this is a ··· 4516 4289 &tmp_list, ctx); 4517 4290 4518 4291 kref_put(&rdata->refcount, 4519 - cifs_uncached_readdata_release); 4292 + cifs_readdata_release); 4520 4293 } 4521 4294 4522 4295 list_splice(&tmp_list, &ctx->list); ··· 4524 4297 goto again; 4525 4298 } else if (rdata->result) 4526 4299 rc = rdata->result; 4527 - else if (!ctx->direct_io) 4528 - rc = cifs_readdata_to_iov(rdata, to); 4529 4300 4530 4301 /* if there was a short read -- discard anything left */ 4531 4302 if (rdata->got_bytes && rdata->got_bytes < rdata->bytes) ··· 4532 4307 ctx->total_len += rdata->got_bytes; 4533 4308 } 4534 4309 list_del_init(&rdata->list); 4535 - kref_put(&rdata->refcount, cifs_uncached_readdata_release); 4310 + kref_put(&rdata->refcount, cifs_readdata_release); 4536 4311 } 4537 4312 4538 4313 if (!ctx->direct_io) ··· 4592 4367 if (!ctx) 4593 4368 return -ENOMEM; 4594 4369 4595 - ctx->cfile = cifsFileInfo_get(cfile); 4370 + ctx->pos = offset; 4371 + ctx->direct_io = direct; 4372 + ctx->len = len; 4373 + ctx->cfile = cifsFileInfo_get(cfile); 4374 + ctx->nr_pinned_pages = 0; 4596 4375 4597 4376 if (!is_sync_kiocb(iocb)) 4598 4377 ctx->iocb = iocb; 4599 4378 4600 - if (user_backed_iter(to)) 4601 - ctx->should_dirty = true; 4602 - 4603 - if (direct) { 4604 - ctx->pos = offset; 4605 - ctx->direct_io = true; 4606 - ctx->iter = *to; 4607 - ctx->len = len; 4608 - } else { 4609 - rc = setup_aio_ctx_iter(ctx, to, ITER_DEST); 4610 - if (rc) { 4379 + if (user_backed_iter(to)) { 4380 + /* 4381 + * Extract IOVEC/UBUF-type iterators to a BVEC-type iterator as 4382 + * they contain references to the calling process's virtual 4383 + * 
memory layout which won't be available in an async worker 4384 + * thread. This also takes a pin on every folio involved. 4385 + */ 4386 + rc = netfs_extract_user_iter(to, iov_iter_count(to), 4387 + &ctx->iter, 0); 4388 + if (rc < 0) { 4611 4389 kref_put(&ctx->refcount, cifs_aio_ctx_release); 4612 4390 return rc; 4613 4391 } 4614 - len = ctx->len; 4392 + 4393 + ctx->nr_pinned_pages = rc; 4394 + ctx->bv = (void *)ctx->iter.bvec; 4395 + ctx->bv_need_unpin = iov_iter_extract_will_pin(&ctx->iter); 4396 + ctx->should_dirty = true; 4397 + } else if ((iov_iter_is_bvec(to) || iov_iter_is_kvec(to)) && 4398 + !is_sync_kiocb(iocb)) { 4399 + /* 4400 + * If the op is asynchronous, we need to copy the list attached 4401 + * to a BVEC/KVEC-type iterator, but we assume that the storage 4402 + * will be retained by the caller; in any case, we may or may 4403 + * not be able to pin the pages, so we don't try. 4404 + */ 4405 + ctx->bv = (void *)dup_iter(&ctx->iter, to, GFP_KERNEL); 4406 + if (!ctx->bv) { 4407 + kref_put(&ctx->refcount, cifs_aio_ctx_release); 4408 + return -ENOMEM; 4409 + } 4410 + } else { 4411 + /* 4412 + * Otherwise, we just pass the iterator down as-is and rely on 4413 + * the caller to make sure the pages referred to by the 4414 + * iterator don't evaporate. 4415 + */ 4416 + ctx->iter = *to; 4615 4417 } 4616 4418 4617 4419 if (direct) { ··· 4900 4648 return rc; 4901 4649 } 4902 4650 4651 + #if 0 // TODO: Remove for iov_iter support 4652 + 4903 4653 static void 4904 4654 cifs_readv_complete(struct work_struct *work) 4905 4655 { ··· 5032 4778 { 5033 4779 return readpages_fill_pages(server, rdata, iter, iter->count); 5034 4780 } 4781 + #endif 4782 + 4783 + /* 4784 + * Unlock a bunch of folios in the pagecache. 
4785 + */ 4786 + static void cifs_unlock_folios(struct address_space *mapping, pgoff_t first, pgoff_t last) 4787 + { 4788 + struct folio *folio; 4789 + XA_STATE(xas, &mapping->i_pages, first); 4790 + 4791 + rcu_read_lock(); 4792 + xas_for_each(&xas, folio, last) { 4793 + folio_unlock(folio); 4794 + } 4795 + rcu_read_unlock(); 4796 + } 4797 + 4798 + static void cifs_readahead_complete(struct work_struct *work) 4799 + { 4800 + struct cifs_readdata *rdata = container_of(work, 4801 + struct cifs_readdata, work); 4802 + struct folio *folio; 4803 + pgoff_t last; 4804 + bool good = rdata->result == 0 || (rdata->result == -EAGAIN && rdata->got_bytes); 4805 + 4806 + XA_STATE(xas, &rdata->mapping->i_pages, rdata->offset / PAGE_SIZE); 4807 + 4808 + if (good) 4809 + cifs_readahead_to_fscache(rdata->mapping->host, 4810 + rdata->offset, rdata->bytes); 4811 + 4812 + if (iov_iter_count(&rdata->iter) > 0) 4813 + iov_iter_zero(iov_iter_count(&rdata->iter), &rdata->iter); 4814 + 4815 + last = (rdata->offset + rdata->bytes - 1) / PAGE_SIZE; 4816 + 4817 + rcu_read_lock(); 4818 + xas_for_each(&xas, folio, last) { 4819 + if (good) { 4820 + flush_dcache_folio(folio); 4821 + folio_mark_uptodate(folio); 4822 + } 4823 + folio_unlock(folio); 4824 + } 4825 + rcu_read_unlock(); 4826 + 4827 + kref_put(&rdata->refcount, cifs_readdata_release); 4828 + } 5035 4829 5036 4830 static void cifs_readahead(struct readahead_control *ractl) 5037 4831 { 5038 - int rc; 5039 4832 struct cifsFileInfo *open_file = ractl->file->private_data; 5040 4833 struct cifs_sb_info *cifs_sb = CIFS_FILE_SB(ractl->file); 5041 4834 struct TCP_Server_Info *server; 5042 - pid_t pid; 5043 - unsigned int xid, nr_pages, last_batch_size = 0, cache_nr_pages = 0; 5044 - pgoff_t next_cached = ULONG_MAX; 4835 + unsigned int xid, nr_pages, cache_nr_pages = 0; 4836 + unsigned int ra_pages; 4837 + pgoff_t next_cached = ULONG_MAX, ra_index; 5045 4838 bool caching = fscache_cookie_enabled(cifs_inode_cookie(ractl->mapping->host)) && 5046 
4839 cifs_inode_cookie(ractl->mapping->host)->cache_priv; 5047 4840 bool check_cache = caching; 4841 + pid_t pid; 4842 + int rc = 0; 4843 + 4844 + /* Note that readahead_count() lags behind our dequeuing of pages from 4845 + * the ractl, wo we have to keep track for ourselves. 4846 + */ 4847 + ra_pages = readahead_count(ractl); 4848 + ra_index = readahead_index(ractl); 5048 4849 5049 4850 xid = get_xid(); 5050 4851 ··· 5108 4799 else 5109 4800 pid = current->tgid; 5110 4801 5111 - rc = 0; 5112 4802 server = cifs_pick_channel(tlink_tcon(open_file->tlink)->ses); 5113 4803 5114 4804 cifs_dbg(FYI, "%s: file=%p mapping=%p num_pages=%u\n", 5115 - __func__, ractl->file, ractl->mapping, readahead_count(ractl)); 4805 + __func__, ractl->file, ractl->mapping, ra_pages); 5116 4806 5117 4807 /* 5118 4808 * Chop the readahead request up into rsize-sized read requests. 5119 4809 */ 5120 - while ((nr_pages = readahead_count(ractl) - last_batch_size)) { 5121 - unsigned int i, got, rsize; 5122 - struct page *page; 4810 + while ((nr_pages = ra_pages)) { 4811 + unsigned int i, rsize; 5123 4812 struct cifs_readdata *rdata; 5124 4813 struct cifs_credits credits_on_stack; 5125 4814 struct cifs_credits *credits = &credits_on_stack; 5126 - pgoff_t index = readahead_index(ractl) + last_batch_size; 4815 + struct folio *folio; 4816 + pgoff_t fsize; 5127 4817 5128 4818 /* 5129 4819 * Find out if we have anything cached in the range of ··· 5131 4823 if (caching) { 5132 4824 if (check_cache) { 5133 4825 rc = cifs_fscache_query_occupancy( 5134 - ractl->mapping->host, index, nr_pages, 4826 + ractl->mapping->host, ra_index, nr_pages, 5135 4827 &next_cached, &cache_nr_pages); 5136 4828 if (rc < 0) 5137 4829 caching = false; 5138 4830 check_cache = false; 5139 4831 } 5140 4832 5141 - if (index == next_cached) { 4833 + if (ra_index == next_cached) { 5142 4834 /* 5143 4835 * TODO: Send a whole batch of pages to be read 5144 4836 * by the cache. 
5145 4837 */ 5146 - struct folio *folio = readahead_folio(ractl); 5147 - 5148 - last_batch_size = folio_nr_pages(folio); 4838 + folio = readahead_folio(ractl); 4839 + fsize = folio_nr_pages(folio); 4840 + ra_pages -= fsize; 4841 + ra_index += fsize; 5149 4842 if (cifs_readpage_from_fscache(ractl->mapping->host, 5150 4843 &folio->page) < 0) { 5151 4844 /* ··· 5157 4848 caching = false; 5158 4849 } 5159 4850 folio_unlock(folio); 5160 - next_cached++; 5161 - cache_nr_pages--; 4851 + next_cached += fsize; 4852 + cache_nr_pages -= fsize; 5162 4853 if (cache_nr_pages == 0) 5163 4854 check_cache = true; 5164 4855 continue; ··· 5183 4874 &rsize, credits); 5184 4875 if (rc) 5185 4876 break; 5186 - nr_pages = min_t(size_t, rsize / PAGE_SIZE, readahead_count(ractl)); 5187 - nr_pages = min_t(size_t, nr_pages, next_cached - index); 4877 + nr_pages = min_t(size_t, rsize / PAGE_SIZE, ra_pages); 4878 + if (next_cached != ULONG_MAX) 4879 + nr_pages = min_t(size_t, nr_pages, next_cached - ra_index); 5188 4880 5189 4881 /* 5190 4882 * Give up immediately if rsize is too small to read an entire ··· 5198 4888 break; 5199 4889 } 5200 4890 5201 - rdata = cifs_readdata_alloc(nr_pages, cifs_readv_complete); 4891 + rdata = cifs_readdata_alloc(cifs_readahead_complete); 5202 4892 if (!rdata) { 5203 4893 /* best to give up if we're out of mem */ 5204 4894 add_credits_and_wake_if(server, credits, 0); 5205 4895 break; 5206 4896 } 5207 4897 5208 - got = __readahead_batch(ractl, rdata->pages, nr_pages); 5209 - if (got != nr_pages) { 5210 - pr_warn("__readahead_batch() returned %u/%u\n", 5211 - got, nr_pages); 5212 - nr_pages = got; 5213 - } 5214 - 5215 - rdata->nr_pages = nr_pages; 5216 - rdata->bytes = readahead_batch_length(ractl); 4898 + rdata->offset = ra_index * PAGE_SIZE; 4899 + rdata->bytes = nr_pages * PAGE_SIZE; 5217 4900 rdata->cfile = cifsFileInfo_get(open_file); 5218 4901 rdata->server = server; 5219 4902 rdata->mapping = ractl->mapping; 5220 - rdata->offset = readahead_pos(ractl); 
5221 4903 rdata->pid = pid; 5222 - rdata->pagesz = PAGE_SIZE; 5223 - rdata->tailsz = PAGE_SIZE; 5224 - rdata->read_into_pages = cifs_readpages_read_into_pages; 5225 - rdata->copy_into_pages = cifs_readpages_copy_into_pages; 5226 4904 rdata->credits = credits_on_stack; 4905 + 4906 + for (i = 0; i < nr_pages; i++) { 4907 + if (!readahead_folio(ractl)) 4908 + WARN_ON(1); 4909 + } 4910 + ra_pages -= nr_pages; 4911 + ra_index += nr_pages; 4912 + 4913 + iov_iter_xarray(&rdata->iter, ITER_DEST, &rdata->mapping->i_pages, 4914 + rdata->offset, rdata->bytes); 5227 4915 5228 4916 rc = adjust_credits(server, &rdata->credits, rdata->bytes); 5229 4917 if (!rc) { ··· 5233 4925 5234 4926 if (rc) { 5235 4927 add_credits_and_wake_if(server, &rdata->credits, 0); 5236 - for (i = 0; i < rdata->nr_pages; i++) { 5237 - page = rdata->pages[i]; 5238 - unlock_page(page); 5239 - put_page(page); 5240 - } 4928 + cifs_unlock_folios(rdata->mapping, 4929 + rdata->offset / PAGE_SIZE, 4930 + (rdata->offset + rdata->bytes - 1) / PAGE_SIZE); 5241 4931 /* Fallback to the readpage in error/reconnect cases */ 5242 4932 kref_put(&rdata->refcount, cifs_readdata_release); 5243 4933 break; 5244 4934 } 5245 4935 5246 4936 kref_put(&rdata->refcount, cifs_readdata_release); 5247 - last_batch_size = nr_pages; 5248 4937 } 5249 4938 5250 4939 free_xid(xid); ··· 5283 4978 5284 4979 flush_dcache_page(page); 5285 4980 SetPageUptodate(page); 5286 - 5287 - /* send this page to the cache */ 5288 - cifs_readpage_to_fscache(file_inode(file), page); 5289 - 5290 4981 rc = 0; 5291 4982 5292 4983 io_error:
fs/cifs/fscache.c (+8 -14)
···
 /*
  * Fallback page writing interface.
  */
-static int fscache_fallback_write_page(struct inode *inode, struct page *page,
-                                      bool no_space_allocated_yet)
+static int fscache_fallback_write_pages(struct inode *inode, loff_t start, size_t len,
+                                       bool no_space_allocated_yet)
 {
        struct netfs_cache_resources cres;
        struct fscache_cookie *cookie = cifs_inode_cookie(inode);
        struct iov_iter iter;
-       struct bio_vec bvec[1];
-       loff_t start = page_offset(page);
-       size_t len = PAGE_SIZE;
        int ret;

        memset(&cres, 0, sizeof(cres));
-       bvec[0].bv_page = page;
-       bvec[0].bv_offset = 0;
-       bvec[0].bv_len = PAGE_SIZE;
-       iov_iter_bvec(&iter, ITER_SOURCE, bvec, ARRAY_SIZE(bvec), PAGE_SIZE);
+       iov_iter_xarray(&iter, ITER_SOURCE, &inode->i_mapping->i_pages, start, len);

        ret = fscache_begin_write_operation(&cres, cookie);
        if (ret < 0)
···
        ret = cres.ops->prepare_write(&cres, &start, &len, i_size_read(inode),
                                      no_space_allocated_yet);
        if (ret == 0)
-               ret = fscache_write(&cres, page_offset(page), &iter, NULL, NULL);
+               ret = fscache_write(&cres, start, &iter, NULL, NULL);
        fscache_end_operation(&cres);
        return ret;
 }
···
        return 0;
 }

-void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
+void __cifs_readahead_to_fscache(struct inode *inode, loff_t pos, size_t len)
 {
-       cifs_dbg(FYI, "%s: (fsc: %p, p: %p, i: %p)\n",
-                __func__, cifs_inode_cookie(inode), page, inode);
+       cifs_dbg(FYI, "%s: (fsc: %p, p: %llx, l: %zx, i: %p)\n",
+                __func__, cifs_inode_cookie(inode), pos, len, inode);

-       fscache_fallback_write_page(inode, page, true);
+       fscache_fallback_write_pages(inode, pos, len, true);
 }

 /*
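The fallback write path now describes the whole span once instead of building a single-page bvec per page. A rough userspace model of the operation-count difference (PAGE_SZ and both helpers are illustrative assumptions, not kernel interfaces):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SZ 4096u

/* Old interface: one fscache write call per page of the span. */
static size_t old_write_ops(size_t len)
{
	return (len + PAGE_SZ - 1) / PAGE_SZ;
}

/* New interface: a single write whose xarray-backed iterator walks the
 * pagecache over [start, start + len). */
static size_t new_write_ops(size_t len)
{
	return len ? 1 : 0;
}
```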
fs/cifs/fscache.h (+5 -5)
···
 }

 extern int __cifs_readpage_from_fscache(struct inode *pinode, struct page *ppage);
-extern void __cifs_readpage_to_fscache(struct inode *pinode, struct page *ppage);
+extern void __cifs_readahead_to_fscache(struct inode *pinode, loff_t pos, size_t len);


 static inline int cifs_readpage_from_fscache(struct inode *inode,
···
        return -ENOBUFS;
 }

-static inline void cifs_readpage_to_fscache(struct inode *inode,
-                                           struct page *page)
+static inline void cifs_readahead_to_fscache(struct inode *inode,
+                                            loff_t pos, size_t len)
 {
        if (cifs_inode_cookie(inode))
-               __cifs_readpage_to_fscache(inode, page);
+               __cifs_readahead_to_fscache(inode, pos, len);
 }

 #else /* CONFIG_CIFS_FSCACHE */
···
 }

 static inline
-void cifs_readpage_to_fscache(struct inode *inode, struct page *page) {}
+void cifs_readahead_to_fscache(struct inode *inode, loff_t pos, size_t len) {}

 #endif /* CONFIG_CIFS_FSCACHE */
fs/cifs/misc.c (+13 -115)
···

        /*
         * ctx->bv is only set if setup_aio_ctx_iter() was call successfuly
-        * which means that iov_iter_get_pages() was a success and thus that
-        * we have taken reference on pages.
+        * which means that iov_iter_extract_pages() was a success and thus
+        * that we may have references or pins on pages that we need to
+        * release.
         */
        if (ctx->bv) {
-               unsigned i;
+               if (ctx->should_dirty || ctx->bv_need_unpin) {
+                       unsigned int i;

-               for (i = 0; i < ctx->npages; i++) {
-                       if (ctx->should_dirty)
-                               set_page_dirty(ctx->bv[i].bv_page);
-                       put_page(ctx->bv[i].bv_page);
+                       for (i = 0; i < ctx->nr_pinned_pages; i++) {
+                               struct page *page = ctx->bv[i].bv_page;
+
+                               if (ctx->should_dirty)
+                                       set_page_dirty(page);
+                               if (ctx->bv_need_unpin)
+                                       unpin_user_page(page);
+                       }
                }
                kvfree(ctx->bv);
        }

        kfree(ctx);
-}
-
-#define CIFS_AIO_KMALLOC_LIMIT (1024 * 1024)
-
-int
-setup_aio_ctx_iter(struct cifs_aio_ctx *ctx, struct iov_iter *iter, int rw)
-{
-       ssize_t rc;
-       unsigned int cur_npages;
-       unsigned int npages = 0;
-       unsigned int i;
-       size_t len;
-       size_t count = iov_iter_count(iter);
-       unsigned int saved_len;
-       size_t start;
-       unsigned int max_pages = iov_iter_npages(iter, INT_MAX);
-       struct page **pages = NULL;
-       struct bio_vec *bv = NULL;
-
-       if (iov_iter_is_kvec(iter)) {
-               memcpy(&ctx->iter, iter, sizeof(*iter));
-               ctx->len = count;
-               iov_iter_advance(iter, count);
-               return 0;
-       }
-
-       if (array_size(max_pages, sizeof(*bv)) <= CIFS_AIO_KMALLOC_LIMIT)
-               bv = kmalloc_array(max_pages, sizeof(*bv), GFP_KERNEL);
-
-       if (!bv) {
-               bv = vmalloc(array_size(max_pages, sizeof(*bv)));
-               if (!bv)
-                       return -ENOMEM;
-       }
-
-       if (array_size(max_pages, sizeof(*pages)) <= CIFS_AIO_KMALLOC_LIMIT)
-               pages = kmalloc_array(max_pages, sizeof(*pages), GFP_KERNEL);
-
-       if (!pages) {
-               pages = vmalloc(array_size(max_pages, sizeof(*pages)));
-               if (!pages) {
-                       kvfree(bv);
-                       return -ENOMEM;
-               }
-       }
-
-       saved_len = count;
-
-       while (count && npages < max_pages) {
-               rc = iov_iter_get_pages2(iter, pages, count, max_pages, &start);
-               if (rc < 0) {
-                       cifs_dbg(VFS, "Couldn't get user pages (rc=%zd)\n", rc);
-                       break;
-               }
-
-               if (rc > count) {
-                       cifs_dbg(VFS, "get pages rc=%zd more than %zu\n", rc,
-                                count);
-                       break;
-               }
-
-               count -= rc;
-               rc += start;
-               cur_npages = DIV_ROUND_UP(rc, PAGE_SIZE);
-
-               if (npages + cur_npages > max_pages) {
-                       cifs_dbg(VFS, "out of vec array capacity (%u vs %u)\n",
-                                npages + cur_npages, max_pages);
-                       break;
-               }
-
-               for (i = 0; i < cur_npages; i++) {
-                       len = rc > PAGE_SIZE ? PAGE_SIZE : rc;
-                       bv[npages + i].bv_page = pages[i];
-                       bv[npages + i].bv_offset = start;
-                       bv[npages + i].bv_len = len - start;
-                       rc -= len;
-                       start = 0;
-               }
-
-               npages += cur_npages;
-       }
-
-       kvfree(pages);
-       ctx->bv = bv;
-       ctx->len = saved_len - count;
-       ctx->npages = npages;
-       iov_iter_bvec(&ctx->iter, rw, ctx->bv, npages, ctx->len);
-       return 0;
 }

 /**
···

        kfree_sensitive(*sdesc);
        *sdesc = NULL;
-}
-
-/**
- * rqst_page_get_length - obtain the length and offset for a page in smb_rqst
- * @rqst: The request descriptor
- * @page: The index of the page to query
- * @len: Where to store the length for this page:
- * @offset: Where to store the offset for this page
- */
-void rqst_page_get_length(const struct smb_rqst *rqst, unsigned int page,
-                         unsigned int *len, unsigned int *offset)
-{
-       *len = rqst->rq_pagesz;
-       *offset = (page == 0) ? rqst->rq_offset : 0;
-
-       if (rqst->rq_npages == 1 || page == rqst->rq_npages-1)
-               *len = rqst->rq_tailsz;
-       else if (page == 0)
-               *len = rqst->rq_pagesz - rqst->rq_offset;
 }

 void extract_unc_hostname(const char *unc, const char **h, size_t *len)
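The deleted rqst_page_get_length() is the clearest statement of what the old page-list description encoded: a first-page offset, a uniform page size, and a distinct tail size. A userspace model of that per-page geometry, which the single rq_iter now subsumes (the struct and names are illustrative, not kernel code):

```c
#include <assert.h>

/* Model of the buffer description the removed smb_rqst fields carried:
 * npages pages, an offset into the first page, a uniform page size, and
 * the number of valid bytes in the last page. */
struct old_buf_desc {
	unsigned int npages, offset, pagesz, tailsz;
};

/* Reproduce the removed helper's per-page (len, offset) computation. */
static void page_len_off(const struct old_buf_desc *d, unsigned int page,
			 unsigned int *len, unsigned int *off)
{
	*len = d->pagesz;
	*off = (page == 0) ? d->offset : 0;

	if (d->npages == 1 || page == d->npages - 1)
		*len = d->tailsz;		/* last page: tail only */
	else if (page == 0)
		*len = d->pagesz - d->offset;	/* first page: after offset */
}
```

For a three-page buffer starting 100 bytes in with a 50-byte tail, the pages contribute 3996, 4096 and 50 bytes respectively; an iov_iter describes the same span with a single (offset, count) pair.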
fs/cifs/smb2ops.c (+180 -196)
···

 static void *smb2_aead_req_alloc(struct crypto_aead *tfm, const struct smb_rqst *rqst,
                                 int num_rqst, const u8 *sig, u8 **iv,
-                                struct aead_request **req, struct scatterlist **sgl,
-                                unsigned int *num_sgs)
+                                struct aead_request **req, struct sg_table *sgt,
+                                unsigned int *num_sgs, size_t *sensitive_size)
 {
        unsigned int req_size = sizeof(**req) + crypto_aead_reqsize(tfm);
        unsigned int iv_size = crypto_aead_ivsize(tfm);
···
        u8 *p;

        *num_sgs = cifs_get_num_sgs(rqst, num_rqst, sig);
+       if (IS_ERR_VALUE((long)(int)*num_sgs))
+               return ERR_PTR(*num_sgs);

        len = iv_size;
        len += crypto_aead_alignmask(tfm) & ~(crypto_tfm_ctx_alignment() - 1);
        len = ALIGN(len, crypto_tfm_ctx_alignment());
        len += req_size;
        len = ALIGN(len, __alignof__(struct scatterlist));
-       len += *num_sgs * sizeof(**sgl);
+       len += array_size(*num_sgs, sizeof(struct scatterlist));
+       *sensitive_size = len;

-       p = kmalloc(len, GFP_ATOMIC);
+       p = kvzalloc(len, GFP_NOFS);
        if (!p)
-               return NULL;
+               return ERR_PTR(-ENOMEM);

        *iv = (u8 *)PTR_ALIGN(p, crypto_aead_alignmask(tfm) + 1);
        *req = (struct aead_request *)PTR_ALIGN(*iv + iv_size,
                                                crypto_tfm_ctx_alignment());
-       *sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size,
-                                              __alignof__(struct scatterlist));
+       sgt->sgl = (struct scatterlist *)PTR_ALIGN((u8 *)*req + req_size,
+                                                  __alignof__(struct scatterlist));
        return p;
 }

-static void *smb2_get_aead_req(struct crypto_aead *tfm, const struct smb_rqst *rqst,
+static void *smb2_get_aead_req(struct crypto_aead *tfm, struct smb_rqst *rqst,
                               int num_rqst, const u8 *sig, u8 **iv,
-                              struct aead_request **req, struct scatterlist **sgl)
+                              struct aead_request **req, struct scatterlist **sgl,
+                              size_t *sensitive_size)
 {
-       unsigned int off, len, skip;
-       struct scatterlist *sg;
-       unsigned int num_sgs;
-       unsigned long addr;
-       int i, j;
+       struct sg_table sgtable = {};
+       unsigned int skip, num_sgs, i, j;
+       ssize_t rc;
        void *p;

-       p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, sgl, &num_sgs);
-       if (!p)
-               return NULL;
+       p = smb2_aead_req_alloc(tfm, rqst, num_rqst, sig, iv, req, &sgtable,
+                               &num_sgs, sensitive_size);
+       if (IS_ERR(p))
+               return ERR_CAST(p);

-       sg_init_table(*sgl, num_sgs);
-       sg = *sgl;
+       sg_init_marker(sgtable.sgl, num_sgs);

        /*
         * The first rqst has a transform header where the
···
         */
        skip = 20;

-       /* Assumes the first rqst has a transform header as the first iov.
-        * I.e.
-        * rqst[0].rq_iov[0] is transform header
-        * rqst[0].rq_iov[1+] data to be encrypted/decrypted
-        * rqst[1+].rq_iov[0+] data to be encrypted/decrypted
-        */
        for (i = 0; i < num_rqst; i++) {
-               for (j = 0; j < rqst[i].rq_nvec; j++) {
-                       struct kvec *iov = &rqst[i].rq_iov[j];
+               struct iov_iter *iter = &rqst[i].rq_iter;
+               size_t count = iov_iter_count(iter);

-                       addr = (unsigned long)iov->iov_base + skip;
-                       len = iov->iov_len - skip;
-                       sg = cifs_sg_set_buf(sg, (void *)addr, len);
+               for (j = 0; j < rqst[i].rq_nvec; j++) {
+                       cifs_sg_set_buf(&sgtable,
+                                       rqst[i].rq_iov[j].iov_base + skip,
+                                       rqst[i].rq_iov[j].iov_len - skip);

                        /* See the above comment on the 'skip' assignment */
                        skip = 0;
                }
-               for (j = 0; j < rqst[i].rq_npages; j++) {
-                       rqst_page_get_length(&rqst[i], j, &len, &off);
-                       sg_set_page(sg++, rqst[i].rq_pages[j], len, off);
-               }
-       }
-       cifs_sg_set_buf(sg, sig, SMB2_SIGNATURE_SIZE);
+               sgtable.orig_nents = sgtable.nents;

+               rc = netfs_extract_iter_to_sg(iter, count, &sgtable,
+                                             num_sgs - sgtable.nents, 0);
+               iov_iter_revert(iter, rc);
+               sgtable.orig_nents = sgtable.nents;
+       }
+
+       cifs_sg_set_buf(&sgtable, sig, SMB2_SIGNATURE_SIZE);
+       sg_mark_end(&sgtable.sgl[sgtable.nents - 1]);
+       *sgl = sgtable.sgl;
        return p;
 }
···
        struct crypto_aead *tfm;
        unsigned int crypt_len = le32_to_cpu(tr_hdr->OriginalMessageSize);
        void *creq;
+       size_t sensitive_size;

        rc = smb2_get_enc_key(server, le64_to_cpu(tr_hdr->SessionId), enc, key);
        if (rc) {
···
                return rc;
        }

-       creq = smb2_get_aead_req(tfm, rqst, num_rqst, sign, &iv, &req, &sg);
-       if (unlikely(!creq))
-               return -ENOMEM;
+       creq = smb2_get_aead_req(tfm, rqst, num_rqst, sign, &iv, &req, &sg,
+                                &sensitive_size);
+       if (IS_ERR(creq))
+               return PTR_ERR(creq);

        if (!enc) {
                memcpy(sign, &tr_hdr->Signature, SMB2_SIGNATURE_SIZE);
···
        if (!rc && enc)
                memcpy(&tr_hdr->Signature, sign, SMB2_SIGNATURE_SIZE);

-       kfree_sensitive(creq);
+       kvfree_sensitive(creq, sensitive_size);
        return rc;
+}
+
+/*
+ * Clear a read buffer, discarding the folios which have XA_MARK_0 set.
+ */
+static void cifs_clear_xarray_buffer(struct xarray *buffer)
+{
+       struct folio *folio;
+
+       XA_STATE(xas, buffer, 0);
+
+       rcu_read_lock();
+       xas_for_each_marked(&xas, folio, ULONG_MAX, XA_MARK_0) {
+               folio_put(folio);
+       }
+       rcu_read_unlock();
+       xa_destroy(buffer);
 }

 void
 smb3_free_compound_rqst(int num_rqst, struct smb_rqst *rqst)
 {
-       int i, j;
+       int i;

-       for (i = 0; i < num_rqst; i++) {
-               if (rqst[i].rq_pages) {
-                       for (j = rqst[i].rq_npages - 1; j >= 0; j--)
-                               put_page(rqst[i].rq_pages[j]);
-                       kfree(rqst[i].rq_pages);
-               }
-       }
+       for (i = 0; i < num_rqst; i++)
+               if (!xa_empty(&rqst[i].rq_buffer))
+                       cifs_clear_xarray_buffer(&rqst[i].rq_buffer);
 }

 /*
···
 smb3_init_transform_rq(struct TCP_Server_Info *server, int num_rqst,
                       struct smb_rqst *new_rq, struct smb_rqst *old_rq)
 {
-       struct page **pages;
        struct smb2_transform_hdr *tr_hdr = new_rq[0].rq_iov[0].iov_base;
-       unsigned int npages;
+       struct page *page;
        unsigned int orig_len = 0;
        int i, j;
        int rc = -ENOMEM;
···
        for (i = 1; i < num_rqst; i++) {
                struct smb_rqst *old = &old_rq[i - 1];
                struct smb_rqst *new = &new_rq[i];
+               struct xarray *buffer = &new->rq_buffer;
+               size_t size = iov_iter_count(&old->rq_iter), seg, copied = 0;

                orig_len += smb_rqst_len(server, old);
                new->rq_iov = old->rq_iov;
                new->rq_nvec = old->rq_nvec;

-               npages = old->rq_npages;
-               if (!npages)
-                       continue;
+               xa_init(buffer);

-               pages = kmalloc_array(npages, sizeof(struct page *),
-                                     GFP_KERNEL);
-               if (!pages)
-                       goto err_free;
+               if (size > 0) {
+                       unsigned int npages = DIV_ROUND_UP(size, PAGE_SIZE);

-               new->rq_pages = pages;
-               new->rq_npages = npages;
-               new->rq_offset = old->rq_offset;
-               new->rq_pagesz = old->rq_pagesz;
-               new->rq_tailsz = old->rq_tailsz;
+                       for (j = 0; j < npages; j++) {
+                               void *o;

-               for (j = 0; j < npages; j++) {
-                       pages[j] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
-                       if (!pages[j])
-                               goto err_free;
-               }
+                               rc = -ENOMEM;
+                               page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
+                               if (!page)
+                                       goto err_free;
+                               page->index = j;
+                               o = xa_store(buffer, j, page, GFP_KERNEL);
+                               if (xa_is_err(o)) {
+                                       rc = xa_err(o);
+                                       put_page(page);
+                                       goto err_free;
+                               }

-               /* copy pages form the old */
-               for (j = 0; j < npages; j++) {
-                       unsigned int offset, len;
+                               xa_set_mark(buffer, j, XA_MARK_0);

-                       rqst_page_get_length(new, j, &len, &offset);
-
-                       memcpy_page(new->rq_pages[j], offset,
-                                   old->rq_pages[j], offset, len);
+                               seg = min_t(size_t, size - copied, PAGE_SIZE);
+                               if (copy_page_from_iter(page, 0, seg, &old->rq_iter) != seg) {
+                                       rc = -EFAULT;
+                                       goto err_free;
+                               }
+                               copied += seg;
+                       }
+                       iov_iter_xarray(&new->rq_iter, ITER_SOURCE,
+                                       buffer, 0, size);
+                       new->rq_iter_size = size;
                }
        }
···

 static int
 decrypt_raw_data(struct TCP_Server_Info *server, char *buf,
-                unsigned int buf_data_size, struct page **pages,
-                unsigned int npages, unsigned int page_data_size,
+                unsigned int buf_data_size, struct iov_iter *iter,
                 bool is_offloaded)
 {
        struct kvec iov[2];
        struct smb_rqst rqst = {NULL};
+       size_t iter_size = 0;
        int rc;

        iov[0].iov_base = buf;
···
        rqst.rq_iov = iov;
        rqst.rq_nvec = 2;
-       rqst.rq_pages = pages;
-       rqst.rq_npages = npages;
-       rqst.rq_pagesz = PAGE_SIZE;
-       rqst.rq_tailsz = (page_data_size % PAGE_SIZE) ? : PAGE_SIZE;
+       if (iter) {
+               rqst.rq_iter = *iter;
+               rqst.rq_iter_size = iov_iter_count(iter);
+               iter_size = iov_iter_count(iter);
+       }

        rc = crypt_message(server, 1, &rqst, 0);
        cifs_dbg(FYI, "Decrypt message returned %d\n", rc);
···
        memmove(buf, iov[1].iov_base, buf_data_size);

        if (!is_offloaded)
-               server->total_read = buf_data_size + page_data_size;
+               server->total_read = buf_data_size + iter_size;

        return rc;
 }

 static int
-read_data_into_pages(struct TCP_Server_Info *server, struct page **pages,
-                    unsigned int npages, unsigned int len)
+cifs_copy_pages_to_iter(struct xarray *pages, unsigned int data_size,
+                       unsigned int skip, struct iov_iter *iter)
 {
-       int i;
-       int length;
+       struct page *page;
+       unsigned long index;

-       for (i = 0; i < npages; i++) {
-               struct page *page = pages[i];
-               size_t n;
+       xa_for_each(pages, index, page) {
+               size_t n, len = min_t(unsigned int, PAGE_SIZE - skip, data_size);

-               n = len;
-               if (len >= PAGE_SIZE) {
-                       /* enough data to fill the page */
-                       n = PAGE_SIZE;
-                       len -= n;
-               } else {
-                       zero_user(page, len, PAGE_SIZE - len);
-                       len = 0;
+               n = copy_page_to_iter(page, skip, len, iter);
+               if (n != len) {
+                       cifs_dbg(VFS, "%s: something went wrong\n", __func__);
+                       return -EIO;
                }
-               length = cifs_read_page_from_socket(server, page, 0, n);
-               if (length < 0)
-                       return length;
-               server->total_read += length;
+               data_size -= n;
+               skip = 0;
        }

-       return 0;
-}
-
-static int
-init_read_bvec(struct page **pages, unsigned int npages, unsigned int data_size,
-              unsigned int cur_off, struct bio_vec **page_vec)
-{
-       struct bio_vec *bvec;
-       int i;
-
-       bvec = kcalloc(npages, sizeof(struct bio_vec), GFP_KERNEL);
-       if (!bvec)
-               return -ENOMEM;
-
-       for (i = 0; i < npages; i++) {
-               bvec[i].bv_page = pages[i];
-               bvec[i].bv_offset = (i == 0) ? cur_off : 0;
-               bvec[i].bv_len = min_t(unsigned int, PAGE_SIZE, data_size);
-               data_size -= bvec[i].bv_len;
-       }
-
-       if (data_size != 0) {
-               cifs_dbg(VFS, "%s: something went wrong\n", __func__);
-               kfree(bvec);
-               return -EIO;
-       }
-
-       *page_vec = bvec;
        return 0;
 }

 static int
 handle_read_data(struct TCP_Server_Info *server, struct mid_q_entry *mid,
-                char *buf, unsigned int buf_len, struct page **pages,
-                unsigned int npages, unsigned int page_data_size,
-                bool is_offloaded)
+                char *buf, unsigned int buf_len, struct xarray *pages,
+                unsigned int pages_len, bool is_offloaded)
 {
        unsigned int data_offset;
        unsigned int data_len;
···
        unsigned int pad_len;
        struct cifs_readdata *rdata = mid->callback_data;
        struct smb2_hdr *shdr = (struct smb2_hdr *)buf;
-       struct bio_vec *bvec = NULL;
-       struct iov_iter iter;
-       struct kvec iov;
        int length;
        bool use_rdma_mr = false;
···
                        return 0;
                }

-               if (data_len > page_data_size - pad_len) {
+               if (data_len > pages_len - pad_len) {
                        /* data_len is corrupt -- discard frame */
                        rdata->result = -EIO;
                        if (is_offloaded)
···
                        return 0;
                }

-               rdata->result = init_read_bvec(pages, npages, page_data_size,
-                                              cur_off, &bvec);
+               /* Copy the data to the output I/O iterator. */
+               rdata->result = cifs_copy_pages_to_iter(pages, pages_len,
+                                                       cur_off, &rdata->iter);
                if (rdata->result != 0) {
                        if (is_offloaded)
                                mid->mid_state = MID_RESPONSE_MALFORMED;
···
                        dequeue_mid(mid, rdata->result);
                        return 0;
                }
+               rdata->got_bytes = pages_len;

-               iov_iter_bvec(&iter, ITER_SOURCE, bvec, npages, data_len);
        } else if (buf_len >= data_offset + data_len) {
                /* read response payload is in buf */
-               WARN_ONCE(npages > 0, "read data can be either in buf or in pages");
-               iov.iov_base = buf + data_offset;
-               iov.iov_len = data_len;
-               iov_iter_kvec(&iter, ITER_SOURCE, &iov, 1, data_len);
+               WARN_ONCE(pages && !xa_empty(pages),
+                         "read data can be either in buf or in pages");
+               length = copy_to_iter(buf + data_offset, data_len, &rdata->iter);
+               if (length < 0)
+                       return length;
+               rdata->got_bytes = data_len;
        } else {
                /* read response payload cannot be in both buf and pages */
                WARN_ONCE(1, "buf can not contain only a part of read data");
···
                return 0;
        }

-       length = rdata->copy_into_pages(server, rdata, &iter);
-
-       kfree(bvec);
-
-       if (length < 0)
-               return length;
-
        if (is_offloaded)
                mid->mid_state = MID_RESPONSE_RECEIVED;
        else
                dequeue_mid(mid, false);
-       return length;
+       return 0;
 }

 struct smb2_decrypt_work {
        struct work_struct decrypt;
        struct TCP_Server_Info *server;
-       struct page **ppages;
+       struct xarray buffer;
        char *buf;
-       unsigned int npages;
        unsigned int len;
 };
···
 {
        struct smb2_decrypt_work *dw = container_of(work,
                                struct smb2_decrypt_work, decrypt);
-       int i, rc;
+       int rc;
        struct mid_q_entry *mid;
+       struct iov_iter iter;

+       iov_iter_xarray(&iter, ITER_DEST, &dw->buffer, 0, dw->len);
        rc = decrypt_raw_data(dw->server, dw->buf, dw->server->vals->read_rsp_size,
-                             dw->ppages, dw->npages, dw->len, true);
+                             &iter, true);
        if (rc) {
                cifs_dbg(VFS, "error decrypting rc=%d\n", rc);
                goto free_pages;
···
        mid->decrypted = true;
        rc = handle_read_data(dw->server, mid, dw->buf,
                              dw->server->vals->read_rsp_size,
-                             dw->ppages, dw->npages, dw->len,
+                             &dw->buffer, dw->len,
                              true);
        if (rc >= 0) {
 #ifdef CONFIG_CIFS_STATS2
···
        }

 free_pages:
-       for (i = dw->npages-1; i >= 0; i--)
-               put_page(dw->ppages[i]);
-
-       kfree(dw->ppages);
+       cifs_clear_xarray_buffer(&dw->buffer);
        cifs_small_buf_release(dw->buf);
        kfree(dw);
 }
···
 receive_encrypted_read(struct TCP_Server_Info *server, struct mid_q_entry **mid,
                       int *num_mids)
 {
+       struct page *page;
        char *buf = server->smallbuf;
        struct smb2_transform_hdr *tr_hdr = (struct smb2_transform_hdr *)buf;
-       unsigned int npages;
-       struct page **pages;
-       unsigned int len;
+       struct iov_iter iter;
+       unsigned int len, npages;
        unsigned int buflen = server->pdu_size;
        int rc;
        int i = 0;
        struct smb2_decrypt_work *dw;
+
+       dw = kzalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
+       if (!dw)
+               return -ENOMEM;
+       xa_init(&dw->buffer);
+       INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
+       dw->server = server;

        *num_mids = 1;
        len = min_t(unsigned int, buflen, server->vals->read_rsp_size +
···

        rc = cifs_read_from_socket(server, buf + HEADER_SIZE(server) - 1, len);
        if (rc < 0)
-               return rc;
+               goto free_dw;
        server->total_read += rc;

        len = le32_to_cpu(tr_hdr->OriginalMessageSize) -
                server->vals->read_rsp_size;
+       dw->len = len;
        npages = DIV_ROUND_UP(len, PAGE_SIZE);

-       pages = kmalloc_array(npages, sizeof(struct page *), GFP_KERNEL);
-       if (!pages) {
-               rc = -ENOMEM;
-               goto discard_data;
-       }
-
+       rc = -ENOMEM;
        for (; i < npages; i++) {
-               pages[i] = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
-               if (!pages[i]) {
-                       rc = -ENOMEM;
+               void *old;
+
+               page = alloc_page(GFP_KERNEL|__GFP_HIGHMEM);
+               if (!page)
+                       goto discard_data;
+               page->index = i;
+               old = xa_store(&dw->buffer, i, page, GFP_KERNEL);
+               if (xa_is_err(old)) {
+                       rc = xa_err(old);
+                       put_page(page);
                        goto discard_data;
                }
+               xa_set_mark(&dw->buffer, i, XA_MARK_0);
        }

-       /* read read data into pages */
-       rc = read_data_into_pages(server, pages, npages, len);
-       if (rc)
-               goto free_pages;
+       iov_iter_xarray(&iter, ITER_DEST, &dw->buffer, 0, npages * PAGE_SIZE);
+
+       /* Read the data into the buffer and clear excess bufferage. */
+       rc = cifs_read_iter_from_socket(server, &iter, dw->len);
+       if (rc < 0)
+               goto discard_data;
+
+       server->total_read += rc;
+       if (rc < npages * PAGE_SIZE)
+               iov_iter_zero(npages * PAGE_SIZE - rc, &iter);
+       iov_iter_revert(&iter, npages * PAGE_SIZE);
+       iov_iter_truncate(&iter, dw->len);

        rc = cifs_discard_remaining_data(server);
        if (rc)
···

        if ((server->min_offload) && (server->in_flight > 1) &&
            (server->pdu_size >= server->min_offload)) {
-               dw = kmalloc(sizeof(struct smb2_decrypt_work), GFP_KERNEL);
-               if (dw == NULL)
-                       goto non_offloaded_decrypt;
-
                dw->buf = server->smallbuf;
                server->smallbuf = (char *)cifs_small_buf_get();

-               INIT_WORK(&dw->decrypt, smb2_decrypt_offload);
-
-               dw->npages = npages;
-               dw->server = server;
-               dw->ppages = pages;
-               dw->len = len;
                queue_work(decrypt_wq, &dw->decrypt);
                *num_mids = 0; /* worker thread takes care of finding mid */
                return -1;
        }

-non_offloaded_decrypt:
        rc = decrypt_raw_data(server, buf, server->vals->read_rsp_size,
-                             pages, npages, len, false);
+                             &iter, false);
        if (rc)
                goto free_pages;

        *mid = smb2_find_mid(server, buf);
-       if (*mid == NULL)
+       if (*mid == NULL) {
                cifs_dbg(FYI, "mid not found\n");
-       else {
+       } else {
                cifs_dbg(FYI, "mid found\n");
                (*mid)->decrypted = true;
                rc = handle_read_data(server, *mid, buf,
                                      server->vals->read_rsp_size,
-                                     pages, npages, len, false);
+                                     &dw->buffer, dw->len, false);
                if (rc >= 0) {
                        if (server->ops->is_network_name_deleted) {
                                server->ops->is_network_name_deleted(buf,
···
 }

 free_pages:
-       for (i = i - 1; i >= 0; i--)
-               put_page(pages[i]);
-       kfree(pages);
+       cifs_clear_xarray_buffer(&dw->buffer);
+free_dw:
+       kfree(dw);
        return rc;
 discard_data:
        cifs_discard_remaining_data(server);
···
        server->total_read += length;

        buf_size = pdu_length - sizeof(struct smb2_transform_hdr);
-       length = decrypt_raw_data(server, buf, buf_size, NULL, 0, 0, false);
+       length = decrypt_raw_data(server, buf, buf_size, NULL, false);
        if (length)
                return length;
···
        char *buf = server->large_buf ? server->bigbuf : server->smallbuf;

        return handle_read_data(server, mid, buf, server->pdu_size,
-                               NULL, 0, 0, false);
+                               NULL, 0, false);
 }

 static int
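cifs_copy_pages_to_iter() copies out of a paged buffer while honouring an offset into the first page only. A userspace sketch of the same loop, with flat arrays standing in for the xarray and the destination iterator (the tiny 8-byte "page" is an assumption to keep the example checkable; not kernel code):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 8u	/* tiny stand-in for PAGE_SIZE */

/* Copy data_size bytes out of a paged buffer into 'out', skipping 'skip'
 * bytes of the first page; every later page is consumed from offset 0. */
static int copy_pages(const unsigned char pages[][PAGE_SZ], size_t npages,
		      size_t data_size, size_t skip, unsigned char *out)
{
	size_t copied = 0, i;

	for (i = 0; i < npages && data_size; i++) {
		size_t len = PAGE_SZ - skip;

		if (len > data_size)
			len = data_size;
		memcpy(out + copied, pages[i] + skip, len);
		copied += len;
		data_size -= len;
		skip = 0;	/* only the first page has a non-zero offset */
	}
	return data_size ? -1 : 0;	/* -EIO analogue: ran out of pages */
}
```

With two pages "01234567" and "89abcdef", a copy of 10 bytes at skip 3 yields "3456789abc": five bytes from the tail of the first page, five from the head of the second.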
fs/cifs/smb2pdu.c (+17 -36)
···
        struct smbd_buffer_descriptor_v1 *v1;
        bool need_invalidate = server->dialect == SMB30_PROT_ID;

-       rdata->mr = smbd_register_mr(
-               server->smbd_conn, rdata->pages,
-               rdata->nr_pages, rdata->page_offset,
-               rdata->tailsz, true, need_invalidate);
+       rdata->mr = smbd_register_mr(server->smbd_conn, &rdata->iter,
+                                    true, need_invalidate);
        if (!rdata->mr)
                return -EAGAIN;
···
                (struct smb2_hdr *)rdata->iov[0].iov_base;
        struct cifs_credits credits = { .value = 0, .instance = 0 };
        struct smb_rqst rqst = { .rq_iov = &rdata->iov[1],
-                                .rq_nvec = 1, };
-
-       if (rdata->got_bytes) {
-               rqst.rq_pages = rdata->pages;
-               rqst.rq_offset = rdata->page_offset;
-               rqst.rq_npages = rdata->nr_pages;
-               rqst.rq_pagesz = rdata->pagesz;
-               rqst.rq_tailsz = rdata->tailsz;
-       }
+                                .rq_nvec = 1,
+                                .rq_iter = rdata->iter,
+                                .rq_iter_size = iov_iter_count(&rdata->iter), };

        WARN_ONCE(rdata->server != mid->server,
                  "rdata server %p != mid server %p",
···
        if (server->sign && !mid->decrypted) {
                int rc;

+               iov_iter_revert(&rqst.rq_iter, rdata->got_bytes);
+               iov_iter_truncate(&rqst.rq_iter, rdata->got_bytes);
                rc = smb2_verify_signature(&rqst, server);
                if (rc)
                        cifs_tcon_dbg(VFS, "SMB signature verification returned error = %d\n",
···
        req->VolatileFileId = io_parms->volatile_fid;
        req->WriteChannelInfoOffset = 0;
        req->WriteChannelInfoLength = 0;
-       req->Channel = 0;
+       req->Channel = SMB2_CHANNEL_NONE;
        req->Offset = cpu_to_le64(io_parms->offset);
        req->DataOffset = cpu_to_le16(
                                offsetof(struct smb2_write_req, Buffer));
···
         */
        if (smb3_use_rdma_offload(io_parms)) {
                struct smbd_buffer_descriptor_v1 *v1;
+               size_t data_size = iov_iter_count(&wdata->iter);
                bool need_invalidate = server->dialect == SMB30_PROT_ID;

-               wdata->mr = smbd_register_mr(
-                       server->smbd_conn, wdata->pages,
-                       wdata->nr_pages, wdata->page_offset,
-                       wdata->tailsz, false, need_invalidate);
+               wdata->mr = smbd_register_mr(server->smbd_conn, &wdata->iter,
+                                            false, need_invalidate);
                if (!wdata->mr) {
                        rc = -EAGAIN;
                        goto async_writev_out;
                }
                req->Length = 0;
                req->DataOffset = 0;
-               if (wdata->nr_pages > 1)
-                       req->RemainingBytes =
-                               cpu_to_le32(
-                                       (wdata->nr_pages - 1) * wdata->pagesz -
-                                       wdata->page_offset + wdata->tailsz
-                               );
-               else
-                       req->RemainingBytes = cpu_to_le32(wdata->tailsz);
+               req->RemainingBytes = cpu_to_le32(data_size);
                req->Channel = SMB2_CHANNEL_RDMA_V1_INVALIDATE;
                if (need_invalidate)
                        req->Channel = SMB2_CHANNEL_RDMA_V1;
···
        rqst.rq_iov = iov;
        rqst.rq_nvec = 1;
-       rqst.rq_pages = wdata->pages;
-       rqst.rq_offset = wdata->page_offset;
-       rqst.rq_npages = wdata->nr_pages;
-       rqst.rq_pagesz = wdata->pagesz;
-       rqst.rq_tailsz = wdata->tailsz;
+       rqst.rq_iter = wdata->iter;
+       rqst.rq_iter_size = iov_iter_count(&rqst.rq_iter);
 #ifdef CONFIG_CIFS_SMB_DIRECT
-       if (wdata->mr) {
+       if (wdata->mr)
                iov[0].iov_len += sizeof(struct smbd_buffer_descriptor_v1);
-               rqst.rq_npages = 0;
-       }
 #endif
-       cifs_dbg(FYI, "async write at %llu %u bytes\n",
-                io_parms->offset, io_parms->length);
+       cifs_dbg(FYI, "async write at %llu %u bytes iter=%zx\n",
+                io_parms->offset, io_parms->length, iov_iter_count(&rqst.rq_iter));

 #ifdef CONFIG_CIFS_SMB_DIRECT
        /* For RDMA read, I/O size is in RemainingBytes not in Length */
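The RemainingBytes simplification relies on iov_iter_count() equalling what the old page-list arithmetic computed. A quick userspace check of that identity (both helpers here are illustrative reconstructions, not kernel code):

```c
#include <assert.h>

/* The removed computation: rebuild the buffer length from the page-list
 * description (nr_pages pages, offset into the first, tailsz in the last). */
static unsigned int old_remaining(unsigned int nr_pages, unsigned int pagesz,
				  unsigned int page_offset, unsigned int tailsz)
{
	if (nr_pages > 1)
		return (nr_pages - 1) * pagesz - page_offset + tailsz;
	return tailsz;
}

/* What iov_iter_count() reports for the same buffer, summed page by page:
 * first page after the offset, whole middle pages, then the tail. */
static unsigned int iter_count(unsigned int nr_pages, unsigned int pagesz,
			       unsigned int page_offset, unsigned int tailsz)
{
	if (nr_pages == 1)
		return tailsz;	/* single page: tailsz is the whole length */
	return (pagesz - page_offset) + (nr_pages - 2) * pagesz + tailsz;
}
```

Expanding the second form gives (nr_pages - 1) * pagesz - page_offset + tailsz, term for term the old expression, which is why a single cpu_to_le32(data_size) can replace the branch.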
+98 -166
fs/cifs/smbdirect.c
···
 		struct smbd_response *response);
 
 static int smbd_post_send_empty(struct smbd_connection *info);
-static int smbd_post_send_data(
-	struct smbd_connection *info,
-	struct kvec *iov, int n_vec, int remaining_data_length);
-static int smbd_post_send_page(struct smbd_connection *info,
-	struct page *page, unsigned long offset,
-	size_t size, int remaining_data_length);
 
 static void destroy_mr_list(struct smbd_connection *info);
 static int allocate_mr_list(struct smbd_connection *info);
···
 }
 
 /*
- * Send a page
- * page: the page to send
- * offset: offset in the page to send
- * size: length in the page to send
- * remaining_data_length: remaining data to send in this payload
- */
-static int smbd_post_send_page(struct smbd_connection *info, struct page *page,
-		unsigned long offset, size_t size, int remaining_data_length)
-{
-	struct scatterlist sgl;
-
-	sg_init_table(&sgl, 1);
-	sg_set_page(&sgl, page, size, offset);
-
-	return smbd_post_send_sgl(info, &sgl, size, remaining_data_length);
-}
-
-/*
  * Send an empty message
  * Empty message is used to extend credits to peer to for keep live
  * while there is no upper layer payload to send at the time
···
 {
 	info->count_send_empty++;
 	return smbd_post_send_sgl(info, NULL, 0, 0);
-}
-
-/*
- * Send a data buffer
- * iov: the iov array describing the data buffers
- * n_vec: number of iov array
- * remaining_data_length: remaining data to send following this packet
- * in segmented SMBD packet
- */
-static int smbd_post_send_data(
-	struct smbd_connection *info, struct kvec *iov, int n_vec,
-	int remaining_data_length)
-{
-	int i;
-	u32 data_length = 0;
-	struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1];
-
-	if (n_vec > SMBDIRECT_MAX_SEND_SGE - 1) {
-		cifs_dbg(VFS, "Can't fit data to SGL, n_vec=%d\n", n_vec);
-		return -EINVAL;
-	}
-
-	sg_init_table(sgl, n_vec);
-	for (i = 0; i < n_vec; i++) {
-		data_length += iov[i].iov_len;
-		sg_set_buf(&sgl[i], iov[i].iov_base, iov[i].iov_len);
-	}
-
-	return smbd_post_send_sgl(info, sgl, data_length, remaining_data_length);
 }
 
 /*
···
 }
 
 /*
+ * Send the contents of an iterator
+ * @iter: The iterator to send
+ * @_remaining_data_length: remaining data to send in this payload
+ */
+static int smbd_post_send_iter(struct smbd_connection *info,
+			       struct iov_iter *iter,
+			       int *_remaining_data_length)
+{
+	struct scatterlist sgl[SMBDIRECT_MAX_SEND_SGE - 1];
+	unsigned int max_payload = info->max_send_size - sizeof(struct smbd_data_transfer);
+	ssize_t rc;
+
+	/* We're not expecting a user-backed iter */
+	WARN_ON(iov_iter_extract_will_pin(iter));
+
+	do {
+		struct sg_table sgtable = { .sgl = sgl };
+		size_t maxlen = min_t(size_t, *_remaining_data_length, max_payload);
+
+		sg_init_table(sgtable.sgl, ARRAY_SIZE(sgl));
+		rc = netfs_extract_iter_to_sg(iter, maxlen,
+					      &sgtable, ARRAY_SIZE(sgl), 0);
+		if (rc < 0)
+			break;
+		if (WARN_ON_ONCE(sgtable.nents == 0))
+			return -EIO;
+
+		sg_mark_end(&sgl[sgtable.nents - 1]);
+		*_remaining_data_length -= rc;
+		rc = smbd_post_send_sgl(info, sgl, rc, *_remaining_data_length);
+	} while (rc == 0 && iov_iter_count(iter) > 0);
+
+	return rc;
+}
+
+/*
  * Send data to transport
  * Each rqst is transported as a SMBDirect payload
  * rqst: the data to write
···
 		int num_rqst, struct smb_rqst *rqst_array)
 {
 	struct smbd_connection *info = server->smbd_conn;
-	struct kvec vecs[SMBDIRECT_MAX_SEND_SGE - 1];
-	int nvecs;
-	int size;
-	unsigned int buflen, remaining_data_length;
-	unsigned int offset, remaining_vec_data_length;
-	int start, i, j;
-	int max_iov_size =
-		info->max_send_size - sizeof(struct smbd_data_transfer);
-	struct kvec *iov;
-	int rc;
 	struct smb_rqst *rqst;
-	int rqst_idx;
+	struct iov_iter iter;
+	unsigned int remaining_data_length, klen;
+	int rc, i, rqst_idx;
 
 	if (info->transport_status != SMBD_CONNECTED)
 		return -EAGAIN;
···
 	rqst_idx = 0;
 	do {
 		rqst = &rqst_array[rqst_idx];
-		iov = rqst->rq_iov;
 
 		cifs_dbg(FYI, "Sending smb (RDMA): idx=%d smb_len=%lu\n",
-			rqst_idx, smb_rqst_len(server, rqst));
-		remaining_vec_data_length = 0;
-		for (i = 0; i < rqst->rq_nvec; i++) {
-			remaining_vec_data_length += iov[i].iov_len;
-			dump_smb(iov[i].iov_base, iov[i].iov_len);
+			 rqst_idx, smb_rqst_len(server, rqst));
+		for (i = 0; i < rqst->rq_nvec; i++)
+			dump_smb(rqst->rq_iov[i].iov_base, rqst->rq_iov[i].iov_len);
+
+		log_write(INFO, "RDMA-WR[%u] nvec=%d len=%u iter=%zu rqlen=%lu\n",
+			  rqst_idx, rqst->rq_nvec, remaining_data_length,
+			  iov_iter_count(&rqst->rq_iter), smb_rqst_len(server, rqst));
+
+		/* Send the metadata pages. */
+		klen = 0;
+		for (i = 0; i < rqst->rq_nvec; i++)
+			klen += rqst->rq_iov[i].iov_len;
+		iov_iter_kvec(&iter, ITER_SOURCE, rqst->rq_iov, rqst->rq_nvec, klen);
+
+		rc = smbd_post_send_iter(info, &iter, &remaining_data_length);
+		if (rc < 0)
+			break;
+
+		if (iov_iter_count(&rqst->rq_iter) > 0) {
+			/* And then the data pages if there are any */
+			rc = smbd_post_send_iter(info, &rqst->rq_iter,
+						 &remaining_data_length);
+			if (rc < 0)
+				break;
 		}
 
-		log_write(INFO, "rqst_idx=%d nvec=%d rqst->rq_npages=%d rq_pagesz=%d rq_tailsz=%d buflen=%lu\n",
-			  rqst_idx, rqst->rq_nvec,
-			  rqst->rq_npages, rqst->rq_pagesz,
-			  rqst->rq_tailsz, smb_rqst_len(server, rqst));
-
-		start = 0;
-		offset = 0;
-		do {
-			buflen = 0;
-			i = start;
-			j = 0;
-			while (i < rqst->rq_nvec &&
-				j < SMBDIRECT_MAX_SEND_SGE - 1 &&
-				buflen < max_iov_size) {
-
-				vecs[j].iov_base = iov[i].iov_base + offset;
-				if (buflen + iov[i].iov_len > max_iov_size) {
-					vecs[j].iov_len =
-						max_iov_size - iov[i].iov_len;
-					buflen = max_iov_size;
-					offset = vecs[j].iov_len;
-				} else {
-					vecs[j].iov_len =
-						iov[i].iov_len - offset;
-					buflen += vecs[j].iov_len;
-					offset = 0;
-					++i;
-				}
-				++j;
-			}
-
-			remaining_vec_data_length -= buflen;
-			remaining_data_length -= buflen;
-			log_write(INFO, "sending %s iov[%d] from start=%d nvecs=%d remaining_data_length=%d\n",
-				  remaining_vec_data_length > 0 ?
-					"partial" : "complete",
-				  rqst->rq_nvec, start, j,
-				  remaining_data_length);
-
-			start = i;
-			rc = smbd_post_send_data(info, vecs, j, remaining_data_length);
-			if (rc)
-				goto done;
-		} while (remaining_vec_data_length > 0);
-
-		/* now sending pages if there are any */
-		for (i = 0; i < rqst->rq_npages; i++) {
-			rqst_page_get_length(rqst, i, &buflen, &offset);
-			nvecs = (buflen + max_iov_size - 1) / max_iov_size;
-			log_write(INFO, "sending pages buflen=%d nvecs=%d\n",
-				buflen, nvecs);
-			for (j = 0; j < nvecs; j++) {
-				size = min_t(unsigned int, max_iov_size, remaining_data_length);
-				remaining_data_length -= size;
-				log_write(INFO, "sending pages i=%d offset=%d size=%d remaining_data_length=%d\n",
-					i, j * max_iov_size + offset, size,
-					remaining_data_length);
-				rc = smbd_post_send_page(
-					info, rqst->rq_pages[i],
-					j*max_iov_size + offset,
-					size, remaining_data_length);
-				if (rc)
-					goto done;
-			}
-		}
 	} while (++rqst_idx < num_rqst);
 
-done:
 	/*
 	 * As an optimization, we don't wait for individual I/O to finish
 	 * before sending the next one.
···
 }
 
 /*
+ * Transcribe the pages from an iterator into an MR scatterlist.
+ * @iter: The iterator to transcribe
+ * @_remaining_data_length: remaining data to send in this payload
+ */
+static int smbd_iter_to_mr(struct smbd_connection *info,
+			   struct iov_iter *iter,
+			   struct scatterlist *sgl,
+			   unsigned int num_pages)
+{
+	struct sg_table sgtable = { .sgl = sgl };
+	int ret;
+
+	sg_init_table(sgl, num_pages);
+
+	ret = netfs_extract_iter_to_sg(iter, iov_iter_count(iter),
+				       &sgtable, num_pages, 0);
+	WARN_ON(ret < 0);
+	return ret;
+}
+
+/*
  * Register memory for RDMA read/write
- * pages[]: the list of pages to register memory with
- * num_pages: the number of pages to register
- * tailsz: if non-zero, the bytes to register in the last page
+ * iter: the buffer to register memory with
  * writing: true if this is a RDMA write (SMB read), false for RDMA read
  * need_invalidate: true if this MR needs to be locally invalidated after I/O
  * return value: the MR registered, NULL if failed.
  */
-struct smbd_mr *smbd_register_mr(
-	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int offset, int tailsz, bool writing, bool need_invalidate)
+struct smbd_mr *smbd_register_mr(struct smbd_connection *info,
+				 struct iov_iter *iter,
+				 bool writing, bool need_invalidate)
 {
 	struct smbd_mr *smbdirect_mr;
-	int rc, i;
+	int rc, num_pages;
 	enum dma_data_direction dir;
 	struct ib_reg_wr *reg_wr;
 
+	num_pages = iov_iter_npages(iter, info->max_frmr_depth + 1);
 	if (num_pages > info->max_frmr_depth) {
 		log_rdma_mr(ERR, "num_pages=%d max_frmr_depth=%d\n",
 			num_pages, info->max_frmr_depth);
+		WARN_ON_ONCE(1);
 		return NULL;
 	}
 
···
 		log_rdma_mr(ERR, "get_mr returning NULL\n");
 		return NULL;
 	}
-	smbdirect_mr->need_invalidate = need_invalidate;
-	smbdirect_mr->sgl_count = num_pages;
-	sg_init_table(smbdirect_mr->sgl, num_pages);
-
-	log_rdma_mr(INFO, "num_pages=0x%x offset=0x%x tailsz=0x%x\n",
-		num_pages, offset, tailsz);
-
-	if (num_pages == 1) {
-		sg_set_page(&smbdirect_mr->sgl[0], pages[0], tailsz, offset);
-		goto skip_multiple_pages;
-	}
-
-	/* We have at least two pages to register */
-	sg_set_page(
-		&smbdirect_mr->sgl[0], pages[0], PAGE_SIZE - offset, offset);
-	i = 1;
-	while (i < num_pages - 1) {
-		sg_set_page(&smbdirect_mr->sgl[i], pages[i], PAGE_SIZE, 0);
-		i++;
-	}
-	sg_set_page(&smbdirect_mr->sgl[i], pages[i],
-		tailsz ? tailsz : PAGE_SIZE, 0);
-
-skip_multiple_pages:
 	dir = writing ? DMA_FROM_DEVICE : DMA_TO_DEVICE;
 	smbdirect_mr->dir = dir;
+	smbdirect_mr->need_invalidate = need_invalidate;
+	smbdirect_mr->sgl_count = num_pages;
+
+	log_rdma_mr(INFO, "num_pages=0x%x count=0x%zx\n",
+		    num_pages, iov_iter_count(iter));
+	smbd_iter_to_mr(info, iter, smbdirect_mr->sgl, num_pages);
+
 	rc = ib_dma_map_sg(info->id->device, smbdirect_mr->sgl, num_pages, dir);
 	if (!rc) {
 		log_rdma_mr(ERR, "ib_dma_map_sg num_pages=%x dir=%x rc=%x\n",
+2 -2
fs/cifs/smbdirect.h
···
 
 /* Interfaces to register and deregister MR for RDMA read/write */
 struct smbd_mr *smbd_register_mr(
-	struct smbd_connection *info, struct page *pages[], int num_pages,
-	int offset, int tailsz, bool writing, bool need_invalidate);
+	struct smbd_connection *info, struct iov_iter *iter,
+	bool writing, bool need_invalidate);
 int smbd_deregister_mr(struct smbd_mr *mr);
 
 #else
+16 -38
fs/cifs/transport.c
···
 	for (i = 0; i < nvec; i++)
 		buflen += iov[i].iov_len;
 
-	/*
-	 * Add in the page array if there is one. The caller needs to make
-	 * sure rq_offset and rq_tailsz are set correctly. If a buffer of
-	 * multiple pages ends at page boundary, rq_tailsz needs to be set to
-	 * PAGE_SIZE.
-	 */
-	if (rqst->rq_npages) {
-		if (rqst->rq_npages == 1)
-			buflen += rqst->rq_tailsz;
-		else {
-			/*
-			 * If there is more than one page, calculate the
-			 * buffer length based on rq_offset and rq_tailsz
-			 */
-			buflen += rqst->rq_pagesz * (rqst->rq_npages - 1) -
-					rqst->rq_offset;
-			buflen += rqst->rq_tailsz;
-		}
-	}
-
+	buflen += iov_iter_count(&rqst->rq_iter);
 	return buflen;
 }
···
 
 		total_len += sent;
 
-		/* now walk the page array and send each page in it */
-		for (i = 0; i < rqst[j].rq_npages; i++) {
-			struct bio_vec bvec;
-
-			bvec.bv_page = rqst[j].rq_pages[i];
-			rqst_page_get_length(&rqst[j], i, &bvec.bv_len,
-					     &bvec.bv_offset);
-
-			iov_iter_bvec(&smb_msg.msg_iter, ITER_SOURCE,
-				      &bvec, 1, bvec.bv_len);
+		if (iov_iter_count(&rqst[j].rq_iter) > 0) {
+			smb_msg.msg_iter = rqst[j].rq_iter;
 			rc = smb_send_kvec(server, &smb_msg, &sent);
 			if (rc < 0)
 				break;
-
 			total_len += sent;
 		}
-	}
+
+	}
 
 unmask:
 	sigprocmask(SIG_SETMASK, &oldmask, NULL);
···
 cifs_discard_remaining_data(struct TCP_Server_Info *server)
 {
 	unsigned int rfclen = server->pdu_size;
-	int remaining = rfclen + HEADER_PREAMBLE_SIZE(server) -
+	size_t remaining = rfclen + HEADER_PREAMBLE_SIZE(server) -
 		server->total_read;
 
 	while (remaining > 0) {
-		int length;
+		ssize_t length;
 
 		length = cifs_discard_from_socket(server,
 				min_t(size_t, remaining,
···
 		return cifs_readv_discard(server, mid);
 	}
 
-	length = rdata->read_into_pages(server, rdata, data_len);
-	if (length < 0)
-		return length;
-
+#ifdef CONFIG_CIFS_SMB_DIRECT
+	if (rdata->mr)
+		length = data_len; /* An RDMA read is already done. */
+	else
+#endif
+		length = cifs_read_iter_from_socket(server, &rdata->iter,
+						    data_len);
+	if (length > 0)
+		rdata->got_bytes += length;
 	server->total_read += length;
 
 	cifs_dbg(FYI, "total_read=%u buflen=%u remaining=%u\n",