Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

btrfs: prevent inline data extents read from touching blocks beyond its range

Currently, reading an inline data extent zeroes out the remaining
range in the page.

This is not yet causing problems, even for block size < page size
(subpage) cases, because:

1) An inline data extent always starts at file offset 0
   This means that at page read time we always read the inline extent
   first, before any other blocks in the page. The later blocks are
   then properly read out and re-fill the zeroed-out ranges.

2) Currently btrfs reads out the whole page if a buffered write is
   not page aligned
   So a page is either fully uptodate at buffered write time (the
   write covers the whole page), or the whole page is read out first.
   Either way, there is nothing to lose for such an inline extent read.

But it's still not ideal:

- We're zeroing out the page twice
  Once by read_inline_extent()/uncompress_inline(), and once by
  btrfs_do_readpage() for ranges beyond i_size.

- We're touching blocks that don't belong to the inline extent
  With the incoming patches, we can have a partially uptodate folio,
  in which some dirty blocks exist while the folio as a whole is not
  uptodate:

The page size is 16K and the block size is 4K:

0         4K        8K        12K       16K
|         |         |/////////|         |

Range [8K, 12K) is dirtied by a buffered write; the remaining
blocks are not uptodate.

If range [0, 4K) contains an inline data extent and we try to read
the whole page, the current behavior will overwrite range [8K, 12K)
with zeros and cause data loss.

So, to make the behavior more consistent and in preparation for future
changes, limit inline data extent reads to only zero out the range
inside the first block, not the whole page.

Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Authored by Qu Wenruo, committed by David Sterba
1a5b5668 a66b39f6

+8 -6
fs/btrfs/inode.c
···
 {
 	int ret;
 	struct extent_buffer *leaf = path->nodes[0];
+	const u32 blocksize = leaf->fs_info->sectorsize;
 	char *tmp;
 	size_t max_size;
 	unsigned long inline_size;
···
 
 	read_extent_buffer(leaf, tmp, ptr, inline_size);
 
-	max_size = min_t(unsigned long, PAGE_SIZE, max_size);
+	max_size = min_t(unsigned long, blocksize, max_size);
 	ret = btrfs_decompress(compress_type, tmp, folio, 0, inline_size,
 			       max_size);
 
···
 	 * cover that region here.
 	 */
 
-	if (max_size < PAGE_SIZE)
-		folio_zero_range(folio, max_size, PAGE_SIZE - max_size);
+	if (max_size < blocksize)
+		folio_zero_range(folio, max_size, blocksize - max_size);
 	kfree(tmp);
 	return ret;
 }
 
 static int read_inline_extent(struct btrfs_path *path, struct folio *folio)
 {
+	const u32 blocksize = path->nodes[0]->fs_info->sectorsize;
 	struct btrfs_file_extent_item *fi;
 	void *kaddr;
 	size_t copy_size;
···
 	if (btrfs_file_extent_compression(path->nodes[0], fi) != BTRFS_COMPRESS_NONE)
 		return uncompress_inline(path, folio, fi);
 
-	copy_size = min_t(u64, PAGE_SIZE,
+	copy_size = min_t(u64, blocksize,
 			  btrfs_file_extent_ram_bytes(path->nodes[0], fi));
 	kaddr = kmap_local_folio(folio, 0);
 	read_extent_buffer(path->nodes[0], kaddr,
 			   btrfs_file_extent_inline_start(fi), copy_size);
 	kunmap_local(kaddr);
-	if (copy_size < PAGE_SIZE)
-		folio_zero_range(folio, copy_size, PAGE_SIZE - copy_size);
+	if (copy_size < blocksize)
+		folio_zero_range(folio, copy_size, blocksize - copy_size);
 	return 0;
 }