Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

btrfs: make btrfs_check_nocow_lock() check more than one extent

Currently btrfs_check_nocow_lock() stops at the first extent it finds, and
that extent may be smaller than the target range we want to NOCOW into.
But we can have multiple consecutive extents which we can NOCOW into, so
by stopping at the first one we find we just make the caller do more work
by splitting the write into multiple writes, or, in the case of mmap
writes with large folios, we fail with -ENOSPC when the folio's range is
covered by more than one extent (the fallback to NOCOW for mmap writes
when there's no available data space to reserve/allocate was recently
added by the patch "btrfs: fix -ENOSPC mmap write failure on NOCOW
files/extents").

Improve on this by checking for multiple consecutive extents.

Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Authored by Filipe Manana and committed by David Sterba (240fafaa 68e0fcc3)

+30 -9
fs/btrfs/file.c
@@ -984,8 +984,8 @@
 	struct btrfs_root *root = inode->root;
 	struct extent_state *cached_state = NULL;
 	u64 lockstart, lockend;
-	u64 num_bytes;
-	int ret;
+	u64 cur_offset;
+	int ret = 0;
 
 	if (!(inode->flags & (BTRFS_INODE_NODATACOW | BTRFS_INODE_PREALLOC)))
 		return 0;
@@ -996,7 +996,6 @@
 	lockstart = round_down(pos, fs_info->sectorsize);
 	lockend = round_up(pos + *write_bytes,
			   fs_info->sectorsize) - 1;
-	num_bytes = lockend - lockstart + 1;
 
 	if (nowait) {
 		if (!btrfs_try_lock_ordered_range(inode, lockstart, lockend,
@@ -1007,13 +1008,35 @@
 		btrfs_lock_and_flush_ordered_range(inode, lockstart, lockend,
						    &cached_state);
 	}
-	ret = can_nocow_extent(inode, lockstart, &num_bytes, NULL, nowait);
-	if (ret <= 0)
-		btrfs_drew_write_unlock(&root->snapshot_lock);
-	else
-		*write_bytes = min_t(size_t, *write_bytes,
-				     num_bytes - pos + lockstart);
+
+	cur_offset = lockstart;
+	while (cur_offset < lockend) {
+		u64 num_bytes = lockend - cur_offset + 1;
+
+		ret = can_nocow_extent(inode, cur_offset, &num_bytes, NULL, nowait);
+		if (ret <= 0) {
+			/*
+			 * If cur_offset == lockstart it means we haven't found
+			 * any extent against which we can NOCOW, so unlock the
+			 * snapshot lock.
+			 */
+			if (cur_offset == lockstart)
+				btrfs_drew_write_unlock(&root->snapshot_lock);
+			break;
+		}
+		cur_offset += num_bytes;
+	}
+
 	btrfs_unlock_extent(&inode->io_tree, lockstart, lockend, &cached_state);
+
+	/*
+	 * cur_offset > lockstart means there's at least a partial range we can
+	 * NOCOW, and that range can cover one or more extents.
+	 */
+	if (cur_offset > lockstart) {
+		*write_bytes = min_t(size_t, *write_bytes, cur_offset - pos);
+		return 1;
+	}
 
 	return ret;
 }