ext4: journal credits reservation fixes for DIO, fallocate

DIO and fallocate credit calculation differs from writepage, as they
start a new journal handle for each call to ext4_get_blocks_wrap().
This patch uses the helper function in the DIO and fallocate cases,
passing a flag indicating that the modified data are contiguous and can
therefore be accounted with fewer indirect/index blocks.
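
To illustrate why contiguity lowers the estimate, here is a simplified model for an indirect-mapped (non-extent) file. It is only a sketch: indirect_blocks_touched() is a hypothetical helper, not the kernel's ext4_meta_trans_blocks(), and the "+ 2" slack for the contiguous case is borrowed from the DIO_CREDITS comment that this patch removes.

```c
#include <assert.h>

/*
 * Hypothetical helper for illustration only (not a kernel function).
 *
 * Estimates how many single-indirect blocks may be dirtied when
 * mapping nrblocks data blocks in a file whose indirect blocks hold
 * addr_per_block pointers each.
 */
int indirect_blocks_touched(int nrblocks, int addr_per_block, int contiguous)
{
	assert(nrblocks >= 0 && addr_per_block > 0);

	if (contiguous) {
		/*
		 * Neighbouring data blocks share indirect blocks. The
		 * "+ 2" slack is taken from the removed DIO_CREDITS
		 * comment, which uses (B/A + 2) for this level.
		 */
		return nrblocks / addr_per_block + 2;
	}

	/*
	 * Assumed worst case for scattered blocks: every data block
	 * sits under a different indirect block.
	 */
	return nrblocks;
}
```

Under this model, indirect_blocks_touched(4096, 256, 1) yields 18 while the scattered case yields 4096, which is the gap the contiguity flag is meant to exploit.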

This patch also fixes the journal credit reservation for direct I/O
(DIO). Previously the estimated credits for DIO were calculated only
for non-extent files, which was not enough if the file is extent-based.

Also fixed: fallocate double-counted the credits for modifying the
superblock.

Signed-off-by: Mingming Cao <cmm@us.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>

authored by Mingming Cao and committed by Theodore Ts'o f3bd1f3f ee12b630

+30 -27
+1
fs/ext4/ext4.h
···
 extern void ext4_set_aops(struct inode *inode);
 extern int ext4_writepage_trans_blocks(struct inode *);
 extern int ext4_meta_trans_blocks(struct inode *, int nrblocks, int idxblocks);
+extern int ext4_chunk_trans_blocks(struct inode *, int nrblocks);
 extern int ext4_block_truncate_page(handle_t *handle,
 		struct address_space *mapping, loff_t from);
 extern int ext4_page_mkwrite(struct vm_area_struct *vma, struct page *page);
+5 -6
fs/ext4/extents.c
···
 {
 	if (path) {
 		int depth = ext_depth(inode);
-		int ret;
+		int ret = 0;
 
 		/* probably there is space in leaf? */
 		if (le16_to_cpu(path[depth].p_hdr->eh_entries)
···
 		}
 	}
 
-	return ext4_meta_trans_blocks(inode, num, 1);
+	return ext4_chunk_trans_blocks(inode, num);
 }
 
 /*
···
 	/*
 	 * probably first extent we're gonna free will be last in block
 	 */
-	err = ext4_writepage_trans_blocks(inode) + 3;
+	err = ext4_writepage_trans_blocks(inode);
 	handle = ext4_journal_start(inode, err);
 	if (IS_ERR(handle))
 		return;
···
 	max_blocks = (EXT4_BLOCK_ALIGN(len + offset, blkbits) >> blkbits)
 							- block;
 	/*
-	 * credits to insert 1 extent into extent tree + buffers to be able to
-	 * modify 1 super block, 1 block bitmap and 1 group descriptor.
+	 * credits to insert 1 extent into extent tree
 	 */
-	credits = EXT4_DATA_TRANS_BLOCKS(inode->i_sb) + 3;
+	credits = ext4_chunk_trans_blocks(inode, max_blocks);
 	mutex_lock(&inode->i_mutex);
 retry:
 	while (ret >= 0 && ret < max_blocks) {
+24 -21
fs/ext4/inode.c
···
 	spin_unlock(&EXT4_I(inode)->i_block_reservation_lock);
 }
 
-/* Maximum number of blocks we map for direct IO at once. */
-#define DIO_MAX_BLOCKS 4096
-/*
- * Number of credits we need for writing DIO_MAX_BLOCKS:
- * We need sb + group descriptor + bitmap + inode -> 4
- * For B blocks with A block pointers per block we need:
- * 1 (triple ind.) + (B/A/A + 2) (doubly ind.) + (B/A + 2) (indirect).
- * If we plug in 4096 for B and 256 for A (for 1KB block size), we get 25.
- */
-#define DIO_CREDITS 25
-
-
 /*
  * The ext4_get_blocks_wrap() function try to look up the requested blocks,
  * and returns if the blocks are already mapped.
···
 	return retval;
 }
 
+/* Maximum number of blocks we map for direct IO at once. */
+#define DIO_MAX_BLOCKS 4096
+
 static int ext4_get_block(struct inode *inode, sector_t iblock,
 			struct buffer_head *bh_result, int create)
 {
 	handle_t *handle = ext4_journal_current_handle();
 	int ret = 0, started = 0;
 	unsigned max_blocks = bh_result->b_size >> inode->i_blkbits;
+	int dio_credits;
 
 	if (create && !handle) {
 		/* Direct IO write... */
 		if (max_blocks > DIO_MAX_BLOCKS)
 			max_blocks = DIO_MAX_BLOCKS;
-		handle = ext4_journal_start(inode, DIO_CREDITS +
-			      2 * EXT4_QUOTA_TRANS_BLOCKS(inode->i_sb));
+		dio_credits = ext4_chunk_trans_blocks(inode, max_blocks);
+		handle = ext4_journal_start(inode, dio_credits);
 		if (IS_ERR(handle)) {
 			ret = PTR_ERR(handle);
 			goto out;
···
  * for DIO, writepages, and truncate
  */
 #define EXT4_MAX_WRITEBACK_PAGES      DIO_MAX_BLOCKS
-#define EXT4_MAX_WRITEBACK_CREDITS    DIO_CREDITS
+#define EXT4_MAX_WRITEBACK_CREDITS    25
 
 static int ext4_da_writepages(struct address_space *mapping,
 				struct writeback_control *wbc)
···
 
 /*
  * Calulate the total number of credits to reserve to fit
- * the modification of a single pages into a single transaction
+ * the modification of a single pages into a single transaction,
+ * which may include multiple chunks of block allocations.
  *
  * This could be called via ext4_write_begin() or later
  * ext4_da_writepages() in delalyed allocation case.
···
  * In both case it's possible that we could allocating multiple
  * chunks of blocks. We need to consider the worse case, when
  * one new block per extent.
- *
- * For Direct IO and fallocate, the journal credits reservation
- * is based on one single extent allocation, so they could use
- * EXT4_DATA_TRANS_BLOCKS to get the needed credit to log a single
- * chunk of allocation needs.
  */
 int ext4_writepage_trans_blocks(struct inode *inode)
 {
···
 		ret += bpp;
 	return ret;
 }
+
+/*
+ * Calculate the journal credits for a chunk of data modification.
+ *
+ * This is called from DIO, fallocate or whoever calling
+ * ext4_get_blocks_wrap() to map/allocate a chunk of contigous disk blocks.
+ *
+ * journal buffers for data blocks are not included here, as DIO
+ * and fallocate do no need to journal data buffers.
+ */
+int ext4_chunk_trans_blocks(struct inode *inode, int nrblocks)
+{
+	return ext4_meta_trans_blocks(inode, nrblocks, 1);
+}
+
 /*
  * The caller must have previously called ext4_reserve_inode_write().
  * Give this, we know that the caller already has write access to iloc->bh.
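
For reference, the arithmetic behind the removed DIO_CREDITS constant can be reproduced with a small standalone program fragment. This is a sketch for checking the numbers in the deleted comment; old_dio_credits() and show_old_dio_credits() are hypothetical names, not kernel functions, and the formula covers indirect-mapped files only, which is the gap the commit message describes for extent-based files.

```c
#include <assert.h>
#include <stdio.h>

/* Fixed cost from the removed comment: sb + group descriptor + bitmap + inode. */
#define FIXED_METADATA_BLOCKS 4

/*
 * Hypothetical helper (illustration only): evaluates the formula from
 * the removed DIO_CREDITS comment.
 *   b = number of data blocks mapped at once
 *   a = block pointers per indirect block
 */
int old_dio_credits(int b, int a)
{
	int triple_ind = 1;             /* 1 (triple ind.) */
	int double_ind = b / a / a + 2; /* (B/A/A + 2) (doubly ind.) */
	int single_ind = b / a + 2;     /* (B/A + 2) (indirect) */

	return FIXED_METADATA_BLOCKS + triple_ind + double_ind + single_ind;
}

/* Prints the estimate for B = 4096, A = 256 and checks it against 25. */
void show_old_dio_credits(void)
{
	int credits = old_dio_credits(4096, 256);

	printf("old DIO estimate: %d credits\n", credits);
	assert(credits == 25);
}
```

With B = 4096 and A = 256 the terms are 4 + 1 + 2 + 18, matching the hard-coded 25 that EXT4_MAX_WRITEBACK_CREDITS keeps after this patch.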