Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

pNFS: Fix extent encoding in block/scsi layout

The ext_tree_encode_commit() function may be called multiple times for
the same file, layout, and last written byte if the provided buffer is
not large enough to encode all extents in it.

The first problem is that the last written byte field (bl_lwb) must be
zeroed only on a successful call. Otherwise its actual value is lost,
and on the next encoding attempt bl_lwb - 1 wraps around from 0 to
U64_MAX.

The second problem is that we can't count and encode in one pass.
Encoding changes each extent's tag from EXTENT_WRITTEN to
EXTENT_COMMITTING, so if we return -ENOSPC after some extents have
already been encoded into the small buffer, those extents no longer
match the EXTENT_WRITTEN filter and are not re-encoded into the new,
larger buffer on the next try. As a result, the client never commits
them to the server.

Co-developed-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Konstantin Evtushenko <koevtushenko@yandex.com>
Signed-off-by: Sergey Bashirov <sergeybashirov@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20250630183537.196479-3-sergeybashirov@gmail.com
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>

Authored by Sergey Bashirov and committed by Trond Myklebust.
d84c4754 9768797c

+74 -6
fs/nfs/blocklayout/extent_tree.c
```diff
···
 	return xdr_encode_hyper(p, be->be_length << SECTOR_SHIFT);
 }
 
-static int ext_tree_encode_commit(struct pnfs_block_layout *bl, __be32 *p,
+/**
+ * ext_tree_try_encode_commit - try to encode all extents into the buffer
+ * @bl: pointer to the layout
+ * @p: pointer to the output buffer
+ * @buffer_size: size of the output buffer
+ * @count: output pointer to the number of encoded extents
+ * @lastbyte: output pointer to the last written byte
+ *
+ * Return values:
+ *   %0: Success, all required extents encoded, outputs are valid
+ *   %-ENOSPC: Buffer too small, nothing encoded, outputs are invalid
+ */
+static int
+ext_tree_try_encode_commit(struct pnfs_block_layout *bl, __be32 *p,
 		size_t buffer_size, size_t *count, __u64 *lastbyte)
 {
 	struct pnfs_block_extent *be;
+
+	spin_lock(&bl->bl_ext_lock);
+	for (be = ext_tree_first(&bl->bl_ext_rw); be; be = ext_tree_next(be)) {
+		if (be->be_state != PNFS_BLOCK_INVALID_DATA ||
+		    be->be_tag != EXTENT_WRITTEN)
+			continue;
+
+		(*count)++;
+		if (ext_tree_layoutupdate_size(bl, *count) > buffer_size) {
+			spin_unlock(&bl->bl_ext_lock);
+			return -ENOSPC;
+		}
+	}
+	for (be = ext_tree_first(&bl->bl_ext_rw); be; be = ext_tree_next(be)) {
+		if (be->be_state != PNFS_BLOCK_INVALID_DATA ||
+		    be->be_tag != EXTENT_WRITTEN)
+			continue;
+
+		if (bl->bl_scsi_layout)
+			p = encode_scsi_range(be, p);
+		else
+			p = encode_block_extent(be, p);
+		be->be_tag = EXTENT_COMMITTING;
+	}
+	*lastbyte = (bl->bl_lwb != 0) ? bl->bl_lwb - 1 : U64_MAX;
+	bl->bl_lwb = 0;
+	spin_unlock(&bl->bl_ext_lock);
+
+	return 0;
+}
+
+/**
+ * ext_tree_encode_commit - encode as much as possible extents into the buffer
+ * @bl: pointer to the layout
+ * @p: pointer to the output buffer
+ * @buffer_size: size of the output buffer
+ * @count: output pointer to the number of encoded extents
+ * @lastbyte: output pointer to the last written byte
+ *
+ * Return values:
+ *   %0: Success, all required extents encoded, outputs are valid
+ *   %-ENOSPC: Buffer too small, some extents are encoded, outputs are valid
+ */
+static int
+ext_tree_encode_commit(struct pnfs_block_layout *bl, __be32 *p,
+		size_t buffer_size, size_t *count, __u64 *lastbyte)
+{
+	struct pnfs_block_extent *be, *be_prev;
 	int ret = 0;
 
 	spin_lock(&bl->bl_ext_lock);
···
 
 		(*count)++;
 		if (ext_tree_layoutupdate_size(bl, *count) > buffer_size) {
-			/* keep counting.. */
+			(*count)--;
 			ret = -ENOSPC;
-			continue;
+			break;
 		}
 
 		if (bl->bl_scsi_layout)
···
 		else
 			p = encode_block_extent(be, p);
 		be->be_tag = EXTENT_COMMITTING;
+		be_prev = be;
 	}
-	*lastbyte = bl->bl_lwb - 1;
-	bl->bl_lwb = 0;
+	if (!ret) {
+		*lastbyte = (bl->bl_lwb != 0) ? bl->bl_lwb - 1 : U64_MAX;
+		bl->bl_lwb = 0;
+	} else {
+		*lastbyte = be_prev->be_f_offset + be_prev->be_length;
+		*lastbyte <<= SECTOR_SHIFT;
+		*lastbyte -= 1;
+	}
 	spin_unlock(&bl->bl_ext_lock);
 
 	return ret;
···
 	start_p = page_address(arg->layoutupdate_page);
 	arg->layoutupdate_pages = &arg->layoutupdate_page;
 
-	ret = ext_tree_encode_commit(bl, start_p + 1, buffer_size,
+	ret = ext_tree_try_encode_commit(bl, start_p + 1, buffer_size,
 			&count, &arg->lastbytewritten);
 	if (unlikely(ret)) {
 		ext_tree_free_commitdata(arg, buffer_size);
```