Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ext4: only update i_reserved_data_blocks on successful block allocation

In our fault injection test, we create an ext4 file, migrate it to a
non-extent based file, then punch a hole, and finally trigger a WARN_ON
in ext4_da_update_reserve_space():

EXT4-fs warning (device sda): ext4_da_update_reserve_space:369:
ino 14, used 11 with only 10 reserved data blocks

When writing back a non-extent based file with delalloc enabled, the
number of blocks mapped by ext4_ind_map_blocks() is subtracted from the
number of reserved blocks, and the extent status tree is updated. We
update the extent status tree by first removing the old extent_status and
then inserting the new extent_status. If the block range we remove lies
in the middle of an extent, then we need to allocate another
extent_status with ext4_es_alloc_extent().

   use old    to remove   to add new
|----------|------------|------------|
          old extent_status

The problem is that the allocation of a new extent_status failed due to
fault injection, and __es_shrink() could not free any memory, so -ENOMEM
was returned. do_writepages() then retries after receiving -ENOMEM, we map
the same extent again, and the number of blocks in that extent is again
subtracted from the number of reserved blocks. Since the blocks in the
same extent are subtracted twice, we end up triggering the WARN_ON in
ext4_da_update_reserve_space() because used > ei->i_reserved_data_blocks.
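
The double subtraction described above can be reproduced in a small
user-space model. This is only a sketch under stated assumptions: the
struct and every model_* name are hypothetical stand-ins, not kernel code,
and the extent status tree failure is reduced to a boolean argument.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_inode {
	unsigned int reserved;	/* stands in for ei->i_reserved_data_blocks */
	bool allocated;		/* true once the range has on-disk blocks */
	int warnings;		/* how often "used > reserved" was hit */
};

/* Stands in for ext4_da_update_reserve_space(). */
static void model_update_reserve(struct model_inode *inode, unsigned int used)
{
	if (used > inode->reserved) {
		inode->warnings++;		/* the WARN_ON case */
		used = inode->reserved;
	}
	inode->reserved -= used;
}

/* Stands in for ext4_ind_map_blocks(): allocates only on the first call. */
static int model_ind_map_blocks(struct model_inode *inode, unsigned int len)
{
	if (!inode->allocated)
		inode->allocated = true;
	return (int)len;	/* blocks mapped, whether new or pre-existing */
}

/* Pre-fix caller: updates the reservation whenever retval > 0. */
static int model_map_blocks_old(struct model_inode *inode, unsigned int len,
				bool es_alloc_fails)
{
	int retval = model_ind_map_blocks(inode, len);

	if (retval > 0)
		model_update_reserve(inode, (unsigned int)retval);
	/* Extent status tree update; fails under fault injection. */
	return es_alloc_fails ? -ENOMEM : retval;
}

/* Writeback of 8 delalloc blocks with one injected failure, then a retry. */
int old_accounting_warnings(void)
{
	struct model_inode inode = { .reserved = 8 };

	if (model_map_blocks_old(&inode, 8, true) == -ENOMEM)
		model_map_blocks_old(&inode, 8, false);	/* retry */
	return inode.warnings;
}
```

In this model the first attempt consumes all 8 reserved blocks before
failing, so the retry finds nothing left to consume and the warning
counter is incremented.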

For a non-extent based file, we update the number of reserved blocks after
ext4_ind_map_blocks() returns. The problem is that a call to
ext4_ind_map_blocks() to create a block does not always allocate one, yet
we always reduce the number of reserved blocks. So move the logic for
updating reserved blocks into ext4_ind_map_blocks() to ensure that the
number of reserved blocks is updated only after we have actually allocated
some new blocks.
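
The effect of moving the update can be sketched in the same kind of
user-space model. Again, the fixed_* names are hypothetical and not kernel
code; the EXT4_GET_BLOCKS_DELALLOC_RESERVE flag check is left out for
brevity, and the early return plays the role of the path that reaches
got_it without allocating.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct fixed_inode {
	unsigned int reserved;	/* stands in for ei->i_reserved_data_blocks */
	bool allocated;		/* true once the range has on-disk blocks */
	int warnings;		/* how often "used > reserved" was hit */
};

/* Stands in for ext4_da_update_reserve_space(). */
static void fixed_update_reserve(struct fixed_inode *inode, unsigned int used)
{
	if (used > inode->reserved) {
		inode->warnings++;
		used = inode->reserved;
	}
	inode->reserved -= used;
}

/* Post-fix mapping: the reservation changes only when blocks are allocated. */
static int fixed_ind_map_blocks(struct fixed_inode *inode, unsigned int len)
{
	if (inode->allocated)
		return (int)len;	/* already mapped, nothing new allocated */
	inode->allocated = true;
	fixed_update_reserve(inode, len);
	return (int)len;
}

/* Post-fix caller: no reservation update of its own. */
static int fixed_map_blocks(struct fixed_inode *inode, unsigned int len,
			    bool es_alloc_fails)
{
	int retval = fixed_ind_map_blocks(inode, len);

	/* Extent status tree update; fails under fault injection. */
	return es_alloc_fails ? -ENOMEM : retval;
}

/* Same scenario as the bug report: one injected failure, then a retry. */
int fixed_accounting_warnings(void)
{
	struct fixed_inode inode = { .reserved = 8 };

	if (fixed_map_blocks(&inode, 8, true) == -ENOMEM)
		fixed_map_blocks(&inode, 8, false);	/* retry */
	return inode.warnings;
}
```

Here the retry sees the blocks as already mapped and leaves the counter
alone, so the 8 reserved blocks are consumed exactly once.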

Fixes: 5f634d064c70 ("ext4: Fix quota accounting error with fallocate")
Cc: stable@kernel.org
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230424033846.4732-2-libaokun1@huawei.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

Authored by Baokun Li and committed by Theodore Ts'o
de25d6e9 f52f3d2b

+8 -10
fs/ext4/indirect.c (+8)
@@ -651,6 +651,14 @@
 
 	ext4_update_inode_fsync_trans(handle, inode, 1);
 	count = ar.len;
+
+	/*
+	 * Update reserved blocks/metadata blocks after successful block
+	 * allocation which had been deferred till now.
+	 */
+	if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE)
+		ext4_da_update_reserve_space(inode, count, 1);
+
 got_it:
 	map->m_flags |= EXT4_MAP_MAPPED;
 	map->m_pblk = le32_to_cpu(chain[depth-1].key);
fs/ext4/inode.c (-10)
@@ -632,16 +632,6 @@
 			 */
 			ext4_clear_inode_state(inode, EXT4_STATE_EXT_MIGRATE);
 		}
-
-		/*
-		 * Update reserved blocks/metadata blocks after successful
-		 * block allocation which had been deferred till now. We don't
-		 * support fallocate for non extent files. So we can update
-		 * reserve space here.
-		 */
-		if ((retval > 0) &&
-			(flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE))
-			ext4_da_update_reserve_space(inode, retval, 1);
 	}
 
 	if (retval > 0) {