Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag

Btrfs requires all of its bios to be fs block aligned. Normally this is
not a problem, but with the incoming support for block sizes larger than
the page size (bs > ps), the requirement is no longer met for direct IOs.

This is because iomap_dio_bio_iter() calls bio_iov_iter_get_pages()
requiring only that the alignment be bdev_logical_block_size().

In the real world that value is either 512 or 4K. On systems with 4K
pages, this means bio_iov_iter_get_pages() can split the bio at any page
boundary, breaking btrfs' requirement in bs > ps cases.
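
The effect can be modelled in userspace with a little arithmetic. The sketch below is not kernel code: demo_trim_bio_len() and demo_is_fsblock_multiple() are hypothetical helpers, and the 20 KiB candidate length and 16 KiB fs block size are assumed example values. It only illustrates that rounding a bio length down to a 512-byte boundary can leave a length that is not a whole number of fs blocks, while rounding down to the fs block size cannot.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round a candidate bio length down to a
 * power-of-two alignment, as a simplified model of trimming a bio
 * to an alignment mask.
 */
static size_t demo_trim_bio_len(size_t candidate_len, size_t alignment)
{
	/* The mask trick below is only valid for powers of two. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
	return candidate_len & ~(alignment - 1);
}

/* Hypothetical helper: does the length cover whole fs blocks only? */
static int demo_is_fsblock_multiple(size_t len, size_t fs_block_size)
{
	return len % fs_block_size == 0;
}
```

With a 16 KiB fs block, a 20 KiB candidate (five 4K pages) trimmed to 512-byte alignment stays at 20 KiB, which is not a multiple of the fs block size; trimmed to 16 KiB alignment it becomes exactly one fs block.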

To address this problem, introduce a new public iomap dio flag,
IOMAP_DIO_FSBLOCK_ALIGNED.

When __iomap_dio_rw() is called with the new flag, iomap_dio::flags
inherits it. iomap_dio_bio_iter() then uses the fs block size instead of
the logical block size as the alignment, both for validating pos/length
and when calling bio_iov_iter_get_pages(), so the fs block size
requirement is respected.
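
The selection and validation logic described above can be sketched as a standalone userspace model. This is an illustration under assumptions, not the kernel implementation: DEMO_DIO_FSBLOCK_ALIGNED, demo_pick_alignment() and demo_check_range() are hypothetical names, and the block sizes are passed in as plain integers rather than read from an inode or block device.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the new flag; the bit value mirrors the patch. */
#define DEMO_DIO_FSBLOCK_ALIGNED	(1u << 3)

/* Pick the alignment the same way the patched iterator is described to. */
static unsigned int demo_pick_alignment(unsigned int dio_flags,
					unsigned int fs_block_size,
					unsigned int logical_block_size)
{
	if (dio_flags & DEMO_DIO_FSBLOCK_ALIGNED)
		return fs_block_size;
	return logical_block_size;
}

/* Reject a range whose position or length is not aligned. */
static int demo_check_range(uint64_t pos, uint64_t length,
			    unsigned int alignment)
{
	/* The mask test only works for power-of-two alignments. */
	assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

	if ((pos | length) & (alignment - 1))
		return -EINVAL;
	return 0;
}
```

For example, a 4 KiB request at offset 4 KiB passes with a 512-byte alignment but is rejected once the alignment is a 16 KiB fs block.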

The initial user of this flag will be btrfs, which needs to calculate the
checksum for direct read and thus requires the biovec to be fs block
aligned for the incoming bs > ps support.
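
To see why per-block checksumming needs fs-block-aligned bios, consider the toy model below. It is only a sketch: the byte-sum "checksum" is an assumption that stands in for a real checksum algorithm, and demo_block_sum() and demo_checksum_bio() are hypothetical helpers, not btrfs functions. A buffer that does not cover a whole number of fs blocks cannot be checksummed block by block, so the helper refuses it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy checksum (byte sum); stands in for a real per-block checksum. */
static uint32_t demo_block_sum(const uint8_t *block, size_t fs_block_size)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < fs_block_size; i++)
		sum += block[i];
	return sum;
}

/*
 * Checksum every fs block covered by one bio-sized buffer.
 * Returns the number of blocks processed, or -1 if the buffer does not
 * cover a whole number of fs blocks or does not fit in sums[].
 */
static int demo_checksum_bio(const uint8_t *buf, size_t len,
			     size_t fs_block_size,
			     uint32_t *sums, size_t max_sums)
{
	size_t nr_blocks;

	if (fs_block_size == 0 || len % fs_block_size != 0)
		return -1;

	nr_blocks = len / fs_block_size;
	if (nr_blocks > max_sums)
		return -1;

	for (size_t i = 0; i < nr_blocks; i++)
		sums[i] = demo_block_sum(buf + i * fs_block_size, fs_block_size);
	return (int)nr_blocks;
}
```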

Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: Pankaj Raghav <p.raghav@samsung.com>
[hch: also align pos/len, incorporate the trace flags from Darrick]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20251031131045.1613229-2-hch@lst.de
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Christian Brauner <brauner@kernel.org>

Authored by Qu Wenruo and committed by Christian Brauner
001397f5 560507cb

+27 -5
+15 -2
fs/iomap/direct-io.c
···
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
+	unsigned int alignment;
 
-	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1))
+	/*
+	 * File systems that write out of place and always allocate new blocks
+	 * need each bio to be block aligned as that's the unit of allocation.
+	 */
+	if (dio->flags & IOMAP_DIO_FSBLOCK_ALIGNED)
+		alignment = fs_block_size;
+	else
+		alignment = bdev_logical_block_size(iomap->bdev);
+
+	if ((pos | length) & (alignment - 1))
 		return -EINVAL;
 
 	if (dio->flags & IOMAP_DIO_WRITE) {
···
 		bio->bi_end_io = iomap_dio_bio_end_io;
 
 		ret = bio_iov_iter_get_pages(bio, dio->submit.iter,
-				bdev_logical_block_size(iomap->bdev) - 1);
+				alignment - 1);
 		if (unlikely(ret)) {
 			/*
 			 * We have to stop part way through an IO. We must fall
···
 
 	if (iocb->ki_flags & IOCB_NOWAIT)
 		iomi.flags |= IOMAP_NOWAIT;
+
+	if (dio_flags & IOMAP_DIO_FSBLOCK_ALIGNED)
+		dio->flags |= IOMAP_DIO_FSBLOCK_ALIGNED;
 
 	if (iov_iter_rw(iter) == READ) {
 		/* reads can always complete inline */
+4 -3
fs/iomap/trace.h
···
 
 
 #define IOMAP_DIO_STRINGS \
-	{IOMAP_DIO_FORCE_WAIT,	"DIO_FORCE_WAIT" }, \
-	{IOMAP_DIO_OVERWRITE_ONLY,	"DIO_OVERWRITE_ONLY" }, \
-	{IOMAP_DIO_PARTIAL,	"DIO_PARTIAL" }
+	{IOMAP_DIO_FORCE_WAIT,		"DIO_FORCE_WAIT" }, \
+	{IOMAP_DIO_OVERWRITE_ONLY,	"DIO_OVERWRITE_ONLY" }, \
+	{IOMAP_DIO_PARTIAL,		"DIO_PARTIAL" }, \
+	{IOMAP_DIO_FSBLOCK_ALIGNED,	"DIO_FSBLOCK_ALIGNED" }
 
 DECLARE_EVENT_CLASS(iomap_class,
 	TP_PROTO(struct inode *inode, struct iomap *iomap),
+8
include/linux/iomap.h
···
  */
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
+/*
+ * Ensure each bio is aligned to fs block size.
+ *
+ * For filesystems which need to calculate/verify the checksum of each fs
+ * block. Otherwise they may not be able to handle unaligned bios.
+ */
+#define IOMAP_DIO_FSBLOCK_ALIGNED	(1 << 3)
+
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
 		unsigned int dio_flags, void *private, size_t done_before);