
Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

Pull ext4 updates from Ted Ts'o:
"Improvements to ext4's block allocator performance for very large file
systems, especially when the file system or its files are highly
fragmented. There is a new mount option, prefetch_block_bitmaps, which
will pull in the block bitmaps and set up the in-memory buddy bitmaps
when the file system is initially mounted.

Beyond that, a lot of bug fixes and cleanups. In particular, a number
of changes to make ext4 more robust in the face of write errors or
file system corruptions"
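
The new mount option can also be enabled persistently; as a sketch (the device and mount point below are placeholders, not part of this pull), an /etc/fstab entry might look like:

```
/dev/sdX1   /mnt/data   ext4   defaults,prefetch_block_bitmaps   0   2
```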

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (46 commits)
ext4: limit the length of per-inode prealloc list
ext4: reorganize if statement of ext4_mb_release_context()
ext4: add mb_debug logging when there are lost chunks
ext4: Fix comment typo "the the".
jbd2: clean up checksum verification in do_one_pass()
ext4: change to use fallthrough macro
ext4: remove unused parameter of ext4_generic_delete_entry function
mballoc: replace seq_printf with seq_puts
ext4: optimize the implementation of ext4_mb_good_group()
ext4: delete invalid comments near ext4_mb_check_limits()
ext4: fix typos in ext4_mb_regular_allocator() comment
ext4: fix checking of directory entry validity for inline directories
fs: prevent BUG_ON in submit_bh_wbc()
ext4: correctly restore system zone info when remount fails
ext4: handle add_system_zone() failure in ext4_setup_system_zone()
ext4: fold ext4_data_block_valid_rcu() into the caller
ext4: check journal inode extents more carefully
ext4: don't allow overlapping system zones
ext4: handle error of ext4_setup_system_zone() on remount
ext4: delete the invalid BUGON in ext4_mb_load_buddy_gfp()
...

+878 -421
+13 -10
Documentation/admin-guide/ext4.rst
··· 489 489 multiple of this tuning parameter if the stripe size is not set in the 490 490 ext4 superblock 491 491 492 + mb_max_inode_prealloc 493 + The maximum length of per-inode ext4_prealloc_space list. 494 + 492 495 mb_max_to_scan 493 496 The maximum number of extents the multiblock allocator will search to 494 497 find the best extent. ··· 532 529 Ioctls 533 530 ====== 534 531 535 - There is some Ext4 specific functionality which can be accessed by applications 536 - through the system call interfaces. The list of all Ext4 specific ioctls are 537 - shown in the table below. 532 + Ext4 implements various ioctls which can be used by applications to access 533 + ext4-specific functionality. An incomplete list of these ioctls is shown in the 534 + table below. This list includes truly ext4-specific ioctls (``EXT4_IOC_*``) as 535 + well as ioctls that may have been ext4-specific originally but are now supported 536 + by some other filesystem(s) too (``FS_IOC_*``). 538 537 539 - Table of Ext4 specific ioctls 538 + Table of Ext4 ioctls 540 539 541 - EXT4_IOC_GETFLAGS 540 + FS_IOC_GETFLAGS 542 541 Get additional attributes associated with inode. The ioctl argument is 543 - an integer bitfield, with bit values described in ext4.h. This ioctl is 544 - an alias for FS_IOC_GETFLAGS. 542 + an integer bitfield, with bit values described in ext4.h. 545 543 546 - EXT4_IOC_SETFLAGS 544 + FS_IOC_SETFLAGS 547 545 Set additional attributes associated with inode. The ioctl argument is 548 - an integer bitfield, with bit values described in ext4.h. This ioctl is 549 - an alias for FS_IOC_SETFLAGS. 546 + an integer bitfield, with bit values described in ext4.h. 550 547 551 548 EXT4_IOC_GETVERSION, EXT4_IOC_GETVERSION_OLD 552 549 Get the inode i_generation number stored for each inode. The
+1 -1
Documentation/filesystems/ext4/about.rst
··· 39 39 Other References 40 40 ---------------- 41 41 42 - Also see http://www.nongnu.org/ext2-doc/ for quite a collection of 42 + Also see https://www.nongnu.org/ext2-doc/ for quite a collection of 43 43 information about ext2/3. Here's another old reference: 44 44 http://wiki.osdev.org/Ext2
+9
fs/buffer.c
··· 3157 3157 WARN_ON(atomic_read(&bh->b_count) < 1); 3158 3158 lock_buffer(bh); 3159 3159 if (test_clear_buffer_dirty(bh)) { 3160 + /* 3161 + * The bh should be mapped, but it might not be if the 3162 + * device was hot-removed. Not much we can do but fail the I/O. 3163 + */ 3164 + if (!buffer_mapped(bh)) { 3165 + unlock_buffer(bh); 3166 + return -EIO; 3167 + } 3168 + 3160 3169 get_bh(bh); 3161 3170 bh->b_end_io = end_buffer_write_sync; 3162 3171 ret = submit_bh(REQ_OP_WRITE, op_flags, bh);
+1 -1
fs/ext4/Kconfig
··· 110 110 This builds the ext4 KUnit tests. 111 111 112 112 KUnit tests run during boot and output the results to the debug log 113 - in TAP format (http://testanything.org/). Only useful for kernel devs 113 + in TAP format (https://testanything.org/). Only useful for kernel devs 114 114 running KUnit test harness and are not for inclusion into a production 115 115 build. 116 116
+12 -4
fs/ext4/balloc.c
··· 413 413 * Return buffer_head on success or an ERR_PTR in case of failure. 414 414 */ 415 415 struct buffer_head * 416 - ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group) 416 + ext4_read_block_bitmap_nowait(struct super_block *sb, ext4_group_t block_group, 417 + bool ignore_locked) 417 418 { 418 419 struct ext4_group_desc *desc; 419 420 struct ext4_sb_info *sbi = EXT4_SB(sb); ··· 440 439 "block_group = %u, block_bitmap = %llu", 441 440 block_group, bitmap_blk); 442 441 return ERR_PTR(-ENOMEM); 442 + } 443 + 444 + if (ignore_locked && buffer_locked(bh)) { 445 + /* buffer under IO already, return if called for prefetching */ 446 + put_bh(bh); 447 + return NULL; 443 448 } 444 449 445 450 if (bitmap_uptodate(bh)) ··· 494 487 * submit the buffer_head for reading 495 488 */ 496 489 set_buffer_new(bh); 497 - trace_ext4_read_block_bitmap_load(sb, block_group); 490 + trace_ext4_read_block_bitmap_load(sb, block_group, ignore_locked); 498 491 bh->b_end_io = ext4_end_bitmap_read; 499 492 get_bh(bh); 500 - submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO, bh); 493 + submit_bh(REQ_OP_READ, REQ_META | REQ_PRIO | 494 + (ignore_locked ? REQ_RAHEAD : 0), bh); 501 495 return bh; 502 496 verify: 503 497 err = ext4_validate_block_bitmap(sb, desc, block_group, bh); ··· 542 534 struct buffer_head *bh; 543 535 int err; 544 536 545 - bh = ext4_read_block_bitmap_nowait(sb, block_group); 537 + bh = ext4_read_block_bitmap_nowait(sb, block_group, false); 546 538 if (IS_ERR(bh)) 547 539 return bh; 548 540 err = ext4_wait_block_bitmap(sb, block_group, bh);
+68 -89
fs/ext4/block_validity.c
··· 24 24 struct rb_node node; 25 25 ext4_fsblk_t start_blk; 26 26 unsigned int count; 27 + u32 ino; 27 28 }; 28 29 29 30 static struct kmem_cache *ext4_system_zone_cachep; ··· 46 45 static inline int can_merge(struct ext4_system_zone *entry1, 47 46 struct ext4_system_zone *entry2) 48 47 { 49 - if ((entry1->start_blk + entry1->count) == entry2->start_blk) 48 + if ((entry1->start_blk + entry1->count) == entry2->start_blk && 49 + entry1->ino == entry2->ino) 50 50 return 1; 51 51 return 0; 52 52 } ··· 68 66 */ 69 67 static int add_system_zone(struct ext4_system_blocks *system_blks, 70 68 ext4_fsblk_t start_blk, 71 - unsigned int count) 69 + unsigned int count, u32 ino) 72 70 { 73 - struct ext4_system_zone *new_entry = NULL, *entry; 71 + struct ext4_system_zone *new_entry, *entry; 74 72 struct rb_node **n = &system_blks->root.rb_node, *node; 75 73 struct rb_node *parent = NULL, *new_node = NULL; 76 74 ··· 81 79 n = &(*n)->rb_left; 82 80 else if (start_blk >= (entry->start_blk + entry->count)) 83 81 n = &(*n)->rb_right; 84 - else { 85 - if (start_blk + count > (entry->start_blk + 86 - entry->count)) 87 - entry->count = (start_blk + count - 88 - entry->start_blk); 89 - new_node = *n; 90 - new_entry = rb_entry(new_node, struct ext4_system_zone, 91 - node); 92 - break; 93 - } 82 + else /* Unexpected overlap of system zones. 
*/ 83 + return -EFSCORRUPTED; 94 84 } 95 85 96 - if (!new_entry) { 97 - new_entry = kmem_cache_alloc(ext4_system_zone_cachep, 98 - GFP_KERNEL); 99 - if (!new_entry) 100 - return -ENOMEM; 101 - new_entry->start_blk = start_blk; 102 - new_entry->count = count; 103 - new_node = &new_entry->node; 86 + new_entry = kmem_cache_alloc(ext4_system_zone_cachep, 87 + GFP_KERNEL); 88 + if (!new_entry) 89 + return -ENOMEM; 90 + new_entry->start_blk = start_blk; 91 + new_entry->count = count; 92 + new_entry->ino = ino; 93 + new_node = &new_entry->node; 104 94 105 - rb_link_node(new_node, parent, n); 106 - rb_insert_color(new_node, &system_blks->root); 107 - } 95 + rb_link_node(new_node, parent, n); 96 + rb_insert_color(new_node, &system_blks->root); 108 97 109 98 /* Can we merge to the left? */ 110 99 node = rb_prev(new_node); ··· 144 151 printk(KERN_CONT "\n"); 145 152 } 146 153 147 - /* 148 - * Returns 1 if the passed-in block region (start_blk, 149 - * start_blk+count) is valid; 0 if some part of the block region 150 - * overlaps with filesystem metadata blocks. 
151 - */ 152 - static int ext4_data_block_valid_rcu(struct ext4_sb_info *sbi, 153 - struct ext4_system_blocks *system_blks, 154 - ext4_fsblk_t start_blk, 155 - unsigned int count) 156 - { 157 - struct ext4_system_zone *entry; 158 - struct rb_node *n; 159 - 160 - if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) || 161 - (start_blk + count < start_blk) || 162 - (start_blk + count > ext4_blocks_count(sbi->s_es))) 163 - return 0; 164 - 165 - if (system_blks == NULL) 166 - return 1; 167 - 168 - n = system_blks->root.rb_node; 169 - while (n) { 170 - entry = rb_entry(n, struct ext4_system_zone, node); 171 - if (start_blk + count - 1 < entry->start_blk) 172 - n = n->rb_left; 173 - else if (start_blk >= (entry->start_blk + entry->count)) 174 - n = n->rb_right; 175 - else 176 - return 0; 177 - } 178 - return 1; 179 - } 180 - 181 154 static int ext4_protect_reserved_inode(struct super_block *sb, 182 155 struct ext4_system_blocks *system_blks, 183 156 u32 ino) ··· 173 214 if (n == 0) { 174 215 i++; 175 216 } else { 176 - if (!ext4_data_block_valid_rcu(sbi, system_blks, 177 - map.m_pblk, n)) { 178 - err = -EFSCORRUPTED; 179 - __ext4_error(sb, __func__, __LINE__, -err, 180 - map.m_pblk, "blocks %llu-%llu " 181 - "from inode %u overlap system zone", 182 - map.m_pblk, 183 - map.m_pblk + map.m_len - 1, ino); 217 + err = add_system_zone(system_blks, map.m_pblk, n, ino); 218 + if (err < 0) { 219 + if (err == -EFSCORRUPTED) { 220 + __ext4_error(sb, __func__, __LINE__, 221 + -err, map.m_pblk, 222 + "blocks %llu-%llu from inode %u overlap system zone", 223 + map.m_pblk, 224 + map.m_pblk + map.m_len - 1, 225 + ino); 226 + } 184 227 break; 185 228 } 186 - err = add_system_zone(system_blks, map.m_pblk, n); 187 - if (err < 0) 188 - break; 189 229 i += n; 190 230 } 191 231 } ··· 220 262 int flex_size = ext4_flex_bg_size(sbi); 221 263 int ret; 222 264 223 - if (!test_opt(sb, BLOCK_VALIDITY)) { 224 - if (sbi->system_blks) 225 - ext4_release_system_zone(sb); 226 - return 0; 227 - } 
228 - if (sbi->system_blks) 229 - return 0; 230 - 231 265 system_blks = kzalloc(sizeof(*system_blks), GFP_KERNEL); 232 266 if (!system_blks) 233 267 return -ENOMEM; ··· 227 277 for (i=0; i < ngroups; i++) { 228 278 cond_resched(); 229 279 if (ext4_bg_has_super(sb, i) && 230 - ((i < 5) || ((i % flex_size) == 0))) 231 - add_system_zone(system_blks, 280 + ((i < 5) || ((i % flex_size) == 0))) { 281 + ret = add_system_zone(system_blks, 232 282 ext4_group_first_block_no(sb, i), 233 - ext4_bg_num_gdb(sb, i) + 1); 283 + ext4_bg_num_gdb(sb, i) + 1, 0); 284 + if (ret) 285 + goto err; 286 + } 234 287 gdp = ext4_get_group_desc(sb, i, NULL); 235 288 ret = add_system_zone(system_blks, 236 - ext4_block_bitmap(sb, gdp), 1); 289 + ext4_block_bitmap(sb, gdp), 1, 0); 237 290 if (ret) 238 291 goto err; 239 292 ret = add_system_zone(system_blks, 240 - ext4_inode_bitmap(sb, gdp), 1); 293 + ext4_inode_bitmap(sb, gdp), 1, 0); 241 294 if (ret) 242 295 goto err; 243 296 ret = add_system_zone(system_blks, 244 297 ext4_inode_table(sb, gdp), 245 - sbi->s_itb_per_group); 298 + sbi->s_itb_per_group, 0); 246 299 if (ret) 247 300 goto err; 248 301 } ··· 294 341 call_rcu(&system_blks->rcu, ext4_destroy_system_zone); 295 342 } 296 343 297 - int ext4_data_block_valid(struct ext4_sb_info *sbi, ext4_fsblk_t start_blk, 344 + /* 345 + * Returns 1 if the passed-in block region (start_blk, 346 + * start_blk+count) is valid; 0 if some part of the block region 347 + * overlaps with some other filesystem metadata blocks. 
348 + */ 349 + int ext4_inode_block_valid(struct inode *inode, ext4_fsblk_t start_blk, 298 350 unsigned int count) 299 351 { 352 + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); 300 353 struct ext4_system_blocks *system_blks; 301 - int ret; 354 + struct ext4_system_zone *entry; 355 + struct rb_node *n; 356 + int ret = 1; 357 + 358 + if ((start_blk <= le32_to_cpu(sbi->s_es->s_first_data_block)) || 359 + (start_blk + count < start_blk) || 360 + (start_blk + count > ext4_blocks_count(sbi->s_es))) 361 + return 0; 302 362 303 363 /* 304 364 * Lock the system zone to prevent it being released concurrently ··· 320 354 */ 321 355 rcu_read_lock(); 322 356 system_blks = rcu_dereference(sbi->system_blks); 323 - ret = ext4_data_block_valid_rcu(sbi, system_blks, start_blk, 324 - count); 357 + if (system_blks == NULL) 358 + goto out_rcu; 359 + 360 + n = system_blks->root.rb_node; 361 + while (n) { 362 + entry = rb_entry(n, struct ext4_system_zone, node); 363 + if (start_blk + count - 1 < entry->start_blk) 364 + n = n->rb_left; 365 + else if (start_blk >= (entry->start_blk + entry->count)) 366 + n = n->rb_right; 367 + else { 368 + ret = (entry->ino == inode->i_ino); 369 + break; 370 + } 371 + } 372 + out_rcu: 325 373 rcu_read_unlock(); 326 374 return ret; 327 375 } ··· 354 374 while (bref < p+max) { 355 375 blk = le32_to_cpu(*bref++); 356 376 if (blk && 357 - unlikely(!ext4_data_block_valid(EXT4_SB(inode->i_sb), 358 - blk, 1))) { 377 + unlikely(!ext4_inode_block_valid(inode, blk, 1))) { 359 378 ext4_error_inode(inode, function, line, blk, 360 379 "invalid block"); 361 380 return -EFSCORRUPTED;
+62 -27
fs/ext4/ext4.h
··· 434 434 #define EXT4_CASEFOLD_FL 0x40000000 /* Casefolded directory */ 435 435 #define EXT4_RESERVED_FL 0x80000000 /* reserved for ext4 lib */ 436 436 437 - #define EXT4_FL_USER_VISIBLE 0x725BDFFF /* User visible flags */ 438 - #define EXT4_FL_USER_MODIFIABLE 0x624BC0FF /* User modifiable flags */ 437 + /* User modifiable flags */ 438 + #define EXT4_FL_USER_MODIFIABLE (EXT4_SECRM_FL | \ 439 + EXT4_UNRM_FL | \ 440 + EXT4_COMPR_FL | \ 441 + EXT4_SYNC_FL | \ 442 + EXT4_IMMUTABLE_FL | \ 443 + EXT4_APPEND_FL | \ 444 + EXT4_NODUMP_FL | \ 445 + EXT4_NOATIME_FL | \ 446 + EXT4_JOURNAL_DATA_FL | \ 447 + EXT4_NOTAIL_FL | \ 448 + EXT4_DIRSYNC_FL | \ 449 + EXT4_TOPDIR_FL | \ 450 + EXT4_EXTENTS_FL | \ 451 + 0x00400000 /* EXT4_EOFBLOCKS_FL */ | \ 452 + EXT4_DAX_FL | \ 453 + EXT4_PROJINHERIT_FL | \ 454 + EXT4_CASEFOLD_FL) 439 455 440 - /* Flags we can manipulate with through EXT4_IOC_FSSETXATTR */ 456 + /* User visible flags */ 457 + #define EXT4_FL_USER_VISIBLE (EXT4_FL_USER_MODIFIABLE | \ 458 + EXT4_DIRTY_FL | \ 459 + EXT4_COMPRBLK_FL | \ 460 + EXT4_NOCOMPR_FL | \ 461 + EXT4_ENCRYPT_FL | \ 462 + EXT4_INDEX_FL | \ 463 + EXT4_VERITY_FL | \ 464 + EXT4_INLINE_DATA_FL) 465 + 466 + /* Flags we can manipulate with through FS_IOC_FSSETXATTR */ 441 467 #define EXT4_FL_XFLAG_VISIBLE (EXT4_SYNC_FL | \ 442 468 EXT4_IMMUTABLE_FL | \ 443 469 EXT4_APPEND_FL | \ ··· 695 669 /* 696 670 * ioctl commands 697 671 */ 698 - #define EXT4_IOC_GETFLAGS FS_IOC_GETFLAGS 699 - #define EXT4_IOC_SETFLAGS FS_IOC_SETFLAGS 700 672 #define EXT4_IOC_GETVERSION _IOR('f', 3, long) 701 673 #define EXT4_IOC_SETVERSION _IOW('f', 4, long) 702 674 #define EXT4_IOC_GETVERSION_OLD FS_IOC_GETVERSION ··· 711 687 #define EXT4_IOC_RESIZE_FS _IOW('f', 16, __u64) 712 688 #define EXT4_IOC_SWAP_BOOT _IO('f', 17) 713 689 #define EXT4_IOC_PRECACHE_EXTENTS _IO('f', 18) 714 - #define EXT4_IOC_SET_ENCRYPTION_POLICY FS_IOC_SET_ENCRYPTION_POLICY 715 - #define EXT4_IOC_GET_ENCRYPTION_PWSALT FS_IOC_GET_ENCRYPTION_PWSALT 716 - #define 
EXT4_IOC_GET_ENCRYPTION_POLICY FS_IOC_GET_ENCRYPTION_POLICY 717 690 /* ioctl codes 19--39 are reserved for fscrypt */ 718 691 #define EXT4_IOC_CLEAR_ES_CACHE _IO('f', 40) 719 692 #define EXT4_IOC_GETSTATE _IOW('f', 41, __u32) 720 693 #define EXT4_IOC_GET_ES_CACHE _IOWR('f', 42, struct fiemap) 721 - 722 - #define EXT4_IOC_FSGETXATTR FS_IOC_FSGETXATTR 723 - #define EXT4_IOC_FSSETXATTR FS_IOC_FSSETXATTR 724 694 725 695 #define EXT4_IOC_SHUTDOWN _IOR ('X', 125, __u32) 726 696 ··· 740 722 /* 741 723 * ioctl commands in 32 bit emulation 742 724 */ 743 - #define EXT4_IOC32_GETFLAGS FS_IOC32_GETFLAGS 744 - #define EXT4_IOC32_SETFLAGS FS_IOC32_SETFLAGS 745 725 #define EXT4_IOC32_GETVERSION _IOR('f', 3, int) 746 726 #define EXT4_IOC32_SETVERSION _IOW('f', 4, int) 747 727 #define EXT4_IOC32_GETRSVSZ _IOR('f', 5, int) ··· 1070 1054 struct timespec64 i_crtime; 1071 1055 1072 1056 /* mballoc */ 1057 + atomic_t i_prealloc_active; 1073 1058 struct list_head i_prealloc_list; 1074 1059 spinlock_t i_prealloc_lock; 1075 1060 ··· 1189 1172 #define EXT4_MOUNT_JOURNAL_CHECKSUM 0x800000 /* Journal checksums */ 1190 1173 #define EXT4_MOUNT_JOURNAL_ASYNC_COMMIT 0x1000000 /* Journal Async Commit */ 1191 1174 #define EXT4_MOUNT_WARN_ON_ERROR 0x2000000 /* Trigger WARN_ON on error */ 1175 + #define EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS 0x4000000 1192 1176 #define EXT4_MOUNT_DELALLOC 0x8000000 /* Delalloc support */ 1193 1177 #define EXT4_MOUNT_DATA_ERR_ABORT 0x10000000 /* Abort on file data write */ 1194 1178 #define EXT4_MOUNT_BLOCK_VALIDITY 0x20000000 /* Block validity checking */ ··· 1519 1501 unsigned int s_mb_stats; 1520 1502 unsigned int s_mb_order2_reqs; 1521 1503 unsigned int s_mb_group_prealloc; 1504 + unsigned int s_mb_max_inode_prealloc; 1522 1505 unsigned int s_max_dir_size_kb; 1523 1506 /* where last allocation was done - for stream allocation */ 1524 1507 unsigned long s_mb_last_group; 1525 1508 unsigned long s_mb_last_start; 1509 + unsigned int s_mb_prefetch; 1510 + unsigned int 
s_mb_prefetch_limit; 1526 1511 1527 1512 /* stats for buddy allocator */ 1528 1513 atomic_t s_bal_reqs; /* number of reqs with len > 1 */ ··· 1593 1572 struct ratelimit_state s_err_ratelimit_state; 1594 1573 struct ratelimit_state s_warning_ratelimit_state; 1595 1574 struct ratelimit_state s_msg_ratelimit_state; 1575 + atomic_t s_warning_count; 1576 + atomic_t s_msg_count; 1596 1577 1597 1578 /* Encryption context for '-o test_dummy_encryption' */ 1598 1579 struct fscrypt_dummy_context s_dummy_enc_ctx; ··· 1608 1585 #ifdef CONFIG_EXT4_DEBUG 1609 1586 unsigned long s_simulate_fail; 1610 1587 #endif 1588 + /* Record the errseq of the backing block device */ 1589 + errseq_t s_bdev_wb_err; 1590 + spinlock_t s_bdev_wb_lock; 1611 1591 }; 1612 1592 1613 1593 static inline struct ext4_sb_info *EXT4_SB(struct super_block *sb) ··· 2339 2313 struct mutex li_list_mtx; 2340 2314 }; 2341 2315 2316 + enum ext4_li_mode { 2317 + EXT4_LI_MODE_PREFETCH_BBITMAP, 2318 + EXT4_LI_MODE_ITABLE, 2319 + }; 2320 + 2342 2321 struct ext4_li_request { 2343 2322 struct super_block *lr_super; 2344 - struct ext4_sb_info *lr_sbi; 2323 + enum ext4_li_mode lr_mode; 2324 + ext4_group_t lr_first_not_zeroed; 2345 2325 ext4_group_t lr_next_group; 2346 2326 struct list_head lr_request; 2347 2327 unsigned long lr_next_sched; ··· 2478 2446 extern int ext4_should_retry_alloc(struct super_block *sb, int *retries); 2479 2447 2480 2448 extern struct buffer_head *ext4_read_block_bitmap_nowait(struct super_block *sb, 2481 - ext4_group_t block_group); 2449 + ext4_group_t block_group, 2450 + bool ignore_locked); 2482 2451 extern int ext4_wait_block_bitmap(struct super_block *sb, 2483 2452 ext4_group_t block_group, 2484 2453 struct buffer_head *bh); ··· 2684 2651 extern ext4_fsblk_t ext4_mb_new_blocks(handle_t *, 2685 2652 struct ext4_allocation_request *, int *); 2686 2653 extern int ext4_mb_reserve_blocks(struct super_block *, int); 2687 - extern void ext4_discard_preallocations(struct inode *); 2654 + extern void 
ext4_discard_preallocations(struct inode *, unsigned int); 2688 2655 extern int __init ext4_init_mballoc(void); 2689 2656 extern void ext4_exit_mballoc(void); 2657 + extern ext4_group_t ext4_mb_prefetch(struct super_block *sb, 2658 + ext4_group_t group, 2659 + unsigned int nr, int *cnt); 2660 + extern void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group, 2661 + unsigned int nr); 2662 + 2690 2663 extern void ext4_free_blocks(handle_t *handle, struct inode *inode, 2691 2664 struct buffer_head *bh, ext4_fsblk_t block, 2692 2665 unsigned long count, int flags); ··· 2804 2765 struct ext4_filename *fname, 2805 2766 unsigned int offset, 2806 2767 struct ext4_dir_entry_2 **res_dir); 2807 - extern int ext4_generic_delete_entry(handle_t *handle, 2808 - struct inode *dir, 2768 + extern int ext4_generic_delete_entry(struct inode *dir, 2809 2769 struct ext4_dir_entry_2 *de_del, 2810 2770 struct buffer_head *bh, 2811 2771 void *entry_buf, ··· 2962 2924 2963 2925 #endif 2964 2926 2965 - extern int ext4_update_compat_feature(handle_t *handle, struct super_block *sb, 2966 - __u32 compat); 2967 - extern int ext4_update_rocompat_feature(handle_t *handle, 2968 - struct super_block *sb, __u32 rocompat); 2969 - extern int ext4_update_incompat_feature(handle_t *handle, 2970 - struct super_block *sb, __u32 incompat); 2971 2927 extern ext4_fsblk_t ext4_block_bitmap(struct super_block *sb, 2972 2928 struct ext4_group_desc *bg); 2973 2929 extern ext4_fsblk_t ext4_inode_bitmap(struct super_block *sb, ··· 3177 3145 (1 << EXT4_GROUP_INFO_BBITMAP_CORRUPT_BIT) 3178 3146 #define EXT4_GROUP_INFO_IBITMAP_CORRUPT \ 3179 3147 (1 << EXT4_GROUP_INFO_IBITMAP_CORRUPT_BIT) 3148 + #define EXT4_GROUP_INFO_BBITMAP_READ_BIT 4 3180 3149 3181 3150 #define EXT4_MB_GRP_NEED_INIT(grp) \ 3182 3151 (test_bit(EXT4_GROUP_INFO_NEED_INIT_BIT, &((grp)->bb_state))) ··· 3192 3159 (set_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state))) 3193 3160 #define EXT4_MB_GRP_CLEAR_TRIMMED(grp) \ 3194 3161 
(clear_bit(EXT4_GROUP_INFO_WAS_TRIMMED_BIT, &((grp)->bb_state))) 3162 + #define EXT4_MB_GRP_TEST_AND_SET_READ(grp) \ 3163 + (test_and_set_bit(EXT4_GROUP_INFO_BBITMAP_READ_BIT, &((grp)->bb_state))) 3195 3164 3196 3165 #define EXT4_MAX_CONTENTION 8 3197 3166 #define EXT4_CONTENTION_THRESHOLD 2 ··· 3398 3363 extern int ext4_setup_system_zone(struct super_block *sb); 3399 3364 extern int __init ext4_init_system_zone(void); 3400 3365 extern void ext4_exit_system_zone(void); 3401 - extern int ext4_data_block_valid(struct ext4_sb_info *sbi, 3402 - ext4_fsblk_t start_blk, 3403 - unsigned int count); 3366 + extern int ext4_inode_block_valid(struct inode *inode, 3367 + ext4_fsblk_t start_blk, 3368 + unsigned int count); 3404 3369 extern int ext4_check_blockref(const char *, unsigned int, 3405 3370 struct inode *, __le32 *, unsigned int); 3406 3371
+25
fs/ext4/ext4_jbd2.c
··· 195 195 jbd2_journal_abort_handle(handle); 196 196 } 197 197 198 + static void ext4_check_bdev_write_error(struct super_block *sb) 199 + { 200 + struct address_space *mapping = sb->s_bdev->bd_inode->i_mapping; 201 + struct ext4_sb_info *sbi = EXT4_SB(sb); 202 + int err; 203 + 204 + /* 205 + * If the block device has write error flag, it may have failed to 206 + * async write out metadata buffers in the background. In this case, 207 + * we could read old data from disk and write it out again, which 208 + * may lead to on-disk filesystem inconsistency. 209 + */ 210 + if (errseq_check(&mapping->wb_err, READ_ONCE(sbi->s_bdev_wb_err))) { 211 + spin_lock(&sbi->s_bdev_wb_lock); 212 + err = errseq_check_and_advance(&mapping->wb_err, &sbi->s_bdev_wb_err); 213 + spin_unlock(&sbi->s_bdev_wb_lock); 214 + if (err) 215 + ext4_error_err(sb, -err, 216 + "Error while async write back metadata"); 217 + } 218 + } 219 + 198 220 int __ext4_journal_get_write_access(const char *where, unsigned int line, 199 221 handle_t *handle, struct buffer_head *bh) 200 222 { 201 223 int err = 0; 202 224 203 225 might_sleep(); 226 + 227 + if (bh->b_bdev->bd_super) 228 + ext4_check_bdev_write_error(bh->b_bdev->bd_super); 204 229 205 230 if (ext4_handle_valid(handle)) { 206 231 err = jbd2_journal_get_write_access(handle, bh);
+18 -24
fs/ext4/extents.c
··· 100 100 * i_mutex. So we can safely drop the i_data_sem here. 101 101 */ 102 102 BUG_ON(EXT4_JOURNAL(inode) == NULL); 103 - ext4_discard_preallocations(inode); 103 + ext4_discard_preallocations(inode, 0); 104 104 up_write(&EXT4_I(inode)->i_data_sem); 105 105 *dropped = 1; 106 106 return 0; ··· 340 340 */ 341 341 if (lblock + len <= lblock) 342 342 return 0; 343 - return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, len); 343 + return ext4_inode_block_valid(inode, block, len); 344 344 } 345 345 346 346 static int ext4_valid_extent_idx(struct inode *inode, ··· 348 348 { 349 349 ext4_fsblk_t block = ext4_idx_pblock(ext_idx); 350 350 351 - return ext4_data_block_valid(EXT4_SB(inode->i_sb), block, 1); 351 + return ext4_inode_block_valid(inode, block, 1); 352 352 } 353 353 354 354 static int ext4_valid_extent_entries(struct inode *inode, ··· 507 507 } 508 508 if (buffer_verified(bh) && !(flags & EXT4_EX_FORCE_CACHE)) 509 509 return bh; 510 - if (!ext4_has_feature_journal(inode->i_sb) || 511 - (inode->i_ino != 512 - le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) { 513 - err = __ext4_ext_check(function, line, inode, 514 - ext_block_hdr(bh), depth, pblk); 515 - if (err) 516 - goto errout; 517 - } 510 + err = __ext4_ext_check(function, line, inode, 511 + ext_block_hdr(bh), depth, pblk); 512 + if (err) 513 + goto errout; 518 514 set_buffer_verified(bh); 519 515 /* 520 516 * If this is a leaf block, cache all of its entries ··· 689 693 return; 690 694 depth = path->p_depth; 691 695 for (i = 0; i <= depth; i++, path++) { 692 - if (path->p_bh) { 693 - brelse(path->p_bh); 694 - path->p_bh = NULL; 695 - } 696 + brelse(path->p_bh); 697 + path->p_bh = NULL; 696 698 } 697 699 } 698 700 ··· 1909 1915 1910 1916 /* 1911 1917 * ext4_ext_insert_extent: 1912 - * tries to merge requsted extent into the existing extent or 1918 + * tries to merge requested extent into the existing extent or 1913 1919 * inserts requested extent as new one into the tree, 1914 1920 * creating 
new leaf in the no-space case. 1915 1921 */ ··· 3119 3125 * 3120 3126 * 3121 3127 * Splits extent [a, b] into two extents [a, @split) and [@split, b], states 3122 - * of which are deterimined by split_flag. 3128 + * of which are determined by split_flag. 3123 3129 * 3124 3130 * There are two cases: 3125 3131 * a> the extent are splitted into two extent. ··· 3644 3650 eof_block = map->m_lblk + map->m_len; 3645 3651 /* 3646 3652 * It is safe to convert extent to initialized via explicit 3647 - * zeroout only if extent is fully insde i_size or new_size. 3653 + * zeroout only if extent is fully inside i_size or new_size. 3648 3654 */ 3649 3655 depth = ext_depth(inode); 3650 3656 ex = path[depth].p_ext; ··· 4266 4272 * not a good idea to call discard here directly, 4267 4273 * but otherwise we'd need to call it every free(). 4268 4274 */ 4269 - ext4_discard_preallocations(inode); 4275 + ext4_discard_preallocations(inode, 0); 4270 4276 if (flags & EXT4_GET_BLOCKS_DELALLOC_RESERVE) 4271 4277 fb_flags = EXT4_FREE_BLOCKS_NO_QUOT_UPDATE; 4272 4278 ext4_free_blocks(handle, inode, NULL, newblock, ··· 4489 4495 } 4490 4496 4491 4497 /* 4492 - * Round up offset. This is not fallocate, we neet to zero out 4498 + * Round up offset. This is not fallocate, we need to zero out 4493 4499 * blocks, so convert interior block aligned part of the range to 4494 4500 * unwritten and possibly manually zero out unaligned parts of the 4495 4501 * range. 
··· 5293 5299 } 5294 5300 5295 5301 down_write(&EXT4_I(inode)->i_data_sem); 5296 - ext4_discard_preallocations(inode); 5302 + ext4_discard_preallocations(inode, 0); 5297 5303 5298 5304 ret = ext4_es_remove_extent(inode, punch_start, 5299 5305 EXT_MAX_BLOCKS - punch_start); ··· 5307 5313 up_write(&EXT4_I(inode)->i_data_sem); 5308 5314 goto out_stop; 5309 5315 } 5310 - ext4_discard_preallocations(inode); 5316 + ext4_discard_preallocations(inode, 0); 5311 5317 5312 5318 ret = ext4_ext_shift_extents(inode, handle, punch_stop, 5313 5319 punch_stop - punch_start, SHIFT_LEFT); ··· 5439 5445 goto out_stop; 5440 5446 5441 5447 down_write(&EXT4_I(inode)->i_data_sem); 5442 - ext4_discard_preallocations(inode); 5448 + ext4_discard_preallocations(inode, 0); 5443 5449 5444 5450 path = ext4_find_extent(inode, offset_lblk, NULL, 0); 5445 5451 if (IS_ERR(path)) { ··· 5573 5579 } 5574 5580 ex1 = path1[path1->p_depth].p_ext; 5575 5581 ex2 = path2[path2->p_depth].p_ext; 5576 - /* Do we have somthing to swap ? */ 5582 + /* Do we have something to swap ? */ 5577 5583 if (unlikely(!ex2 || !ex1)) 5578 5584 goto finish; 5579 5585
+7 -4
fs/ext4/file.c
··· 145 145 /* if we are the last writer on the inode, drop the block reservation */ 146 146 if ((filp->f_mode & FMODE_WRITE) && 147 147 (atomic_read(&inode->i_writecount) == 1) && 148 - !EXT4_I(inode)->i_reserved_data_blocks) 149 - { 148 + !EXT4_I(inode)->i_reserved_data_blocks) { 150 149 down_write(&EXT4_I(inode)->i_data_sem); 151 - ext4_discard_preallocations(inode); 150 + ext4_discard_preallocations(inode, 0); 152 151 up_write(&EXT4_I(inode)->i_data_sem); 153 152 } 154 153 if (is_dx(inode) && filp->private_data) ··· 427 428 */ 428 429 if (*ilock_shared && (!IS_NOSEC(inode) || *extend || 429 430 !ext4_overwrite_io(inode, offset, count))) { 431 + if (iocb->ki_flags & IOCB_NOWAIT) { 432 + ret = -EAGAIN; 433 + goto out; 434 + } 430 435 inode_unlock_shared(inode); 431 436 *ilock_shared = false; 432 437 inode_lock(inode); ··· 815 812 return err; 816 813 } 817 814 818 - static int ext4_file_open(struct inode * inode, struct file * filp) 815 + static int ext4_file_open(struct inode *inode, struct file *filp) 819 816 { 820 817 int ret; 821 818
+2 -2
fs/ext4/hash.c
··· 233 233 break; 234 234 case DX_HASH_HALF_MD4_UNSIGNED: 235 235 str2hashbuf = str2hashbuf_unsigned; 236 - /* fall through */ 236 + fallthrough; 237 237 case DX_HASH_HALF_MD4: 238 238 p = name; 239 239 while (len > 0) { ··· 247 247 break; 248 248 case DX_HASH_TEA_UNSIGNED: 249 249 str2hashbuf = str2hashbuf_unsigned; 250 - /* fall through */ 250 + fallthrough; 251 251 case DX_HASH_TEA: 252 252 p = name; 253 253 while (len > 0) {
+9 -11
fs/ext4/indirect.c
··· 696 696 * i_mutex. So we can safely drop the i_data_sem here. 697 697 */ 698 698 BUG_ON(EXT4_JOURNAL(inode) == NULL); 699 - ext4_discard_preallocations(inode); 699 + ext4_discard_preallocations(inode, 0); 700 700 up_write(&EXT4_I(inode)->i_data_sem); 701 701 *dropped = 1; 702 702 return 0; ··· 858 858 else if (ext4_should_journal_data(inode)) 859 859 flags |= EXT4_FREE_BLOCKS_FORGET; 860 860 861 - if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), block_to_free, 862 - count)) { 861 + if (!ext4_inode_block_valid(inode, block_to_free, count)) { 863 862 EXT4_ERROR_INODE(inode, "attempt to clear invalid " 864 863 "blocks %llu len %lu", 865 864 (unsigned long long) block_to_free, count); ··· 1003 1004 if (!nr) 1004 1005 continue; /* A hole */ 1005 1006 1006 - if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), 1007 - nr, 1)) { 1007 + if (!ext4_inode_block_valid(inode, nr, 1)) { 1008 1008 EXT4_ERROR_INODE(inode, 1009 1009 "invalid indirect mapped " 1010 1010 "block %lu (level %d)", ··· 1180 1182 ext4_free_branches(handle, inode, NULL, &nr, &nr+1, 1); 1181 1183 i_data[EXT4_IND_BLOCK] = 0; 1182 1184 } 1183 - /* fall through */ 1185 + fallthrough; 1184 1186 case EXT4_IND_BLOCK: 1185 1187 nr = i_data[EXT4_DIND_BLOCK]; 1186 1188 if (nr) { 1187 1189 ext4_free_branches(handle, inode, NULL, &nr, &nr+1, 2); 1188 1190 i_data[EXT4_DIND_BLOCK] = 0; 1189 1191 } 1190 - /* fall through */ 1192 + fallthrough; 1191 1193 case EXT4_DIND_BLOCK: 1192 1194 nr = i_data[EXT4_TIND_BLOCK]; 1193 1195 if (nr) { 1194 1196 ext4_free_branches(handle, inode, NULL, &nr, &nr+1, 3); 1195 1197 i_data[EXT4_TIND_BLOCK] = 0; 1196 1198 } 1197 - /* fall through */ 1199 + fallthrough; 1198 1200 case EXT4_TIND_BLOCK: 1199 1201 ; 1200 1202 } ··· 1434 1436 ext4_free_branches(handle, inode, NULL, &nr, &nr+1, 1); 1435 1437 i_data[EXT4_IND_BLOCK] = 0; 1436 1438 } 1437 - /* fall through */ 1439 + fallthrough; 1438 1440 case EXT4_IND_BLOCK: 1439 1441 if (++n >= n2) 1440 1442 break; ··· 1443 1445 
ext4_free_branches(handle, inode, NULL, &nr, &nr+1, 2); 1444 1446 i_data[EXT4_DIND_BLOCK] = 0; 1445 1447 } 1446 - /* fall through */ 1448 + fallthrough; 1447 1449 case EXT4_DIND_BLOCK: 1448 1450 if (++n >= n2) 1449 1451 break; ··· 1452 1454 ext4_free_branches(handle, inode, NULL, &nr, &nr+1, 3); 1453 1455 i_data[EXT4_TIND_BLOCK] = 0; 1454 1456 } 1455 - /* fall through */ 1457 + fallthrough; 1456 1458 case EXT4_TIND_BLOCK: 1457 1459 ; 1458 1460 }
+2 -2
fs/ext4/inline.c
··· 276 276 len = 0; 277 277 } 278 278 279 - /* Insert the the xttr entry. */ 279 + /* Insert the xttr entry. */ 280 280 i.value = value; 281 281 i.value_len = len; 282 282 ··· 1706 1706 if (err) 1707 1707 goto out; 1708 1708 1709 - err = ext4_generic_delete_entry(handle, dir, de_del, bh, 1709 + err = ext4_generic_delete_entry(dir, de_del, bh, 1710 1710 inline_start, inline_size, 0); 1711 1711 if (err) 1712 1712 goto out;
+15 -15
fs/ext4/inode.c
··· 383 383 */ 384 384 if ((ei->i_reserved_data_blocks == 0) && 385 385 !inode_is_open_for_write(inode)) 386 - ext4_discard_preallocations(inode); 386 + ext4_discard_preallocations(inode, 0); 387 387 } 388 388 389 389 static int __check_block_validity(struct inode *inode, const char *func, ··· 394 394 (inode->i_ino == 395 395 le32_to_cpu(EXT4_SB(inode->i_sb)->s_es->s_journal_inum))) 396 396 return 0; 397 - if (!ext4_data_block_valid(EXT4_SB(inode->i_sb), map->m_pblk, 398 - map->m_len)) { 397 + if (!ext4_inode_block_valid(inode, map->m_pblk, map->m_len)) { 399 398 ext4_error_inode(inode, func, line, map->m_pblk, 400 399 "lblock %lu mapped to illegal pblock %llu " 401 400 "(length %d)", (unsigned long) map->m_lblk, ··· 3287 3288 if (PageChecked(page)) 3288 3289 return 0; 3289 3290 if (journal) 3290 - return jbd2_journal_try_to_free_buffers(journal, page, wait); 3291 + return jbd2_journal_try_to_free_buffers(journal, page); 3291 3292 else 3292 3293 return try_to_free_buffers(page); 3293 3294 } ··· 4055 4056 if (stop_block > first_block) { 4056 4057 4057 4058 down_write(&EXT4_I(inode)->i_data_sem); 4058 - ext4_discard_preallocations(inode); 4059 + ext4_discard_preallocations(inode, 0); 4059 4060 4060 4061 ret = ext4_es_remove_extent(inode, first_block, 4061 4062 stop_block - first_block); ··· 4162 4163 trace_ext4_truncate_enter(inode); 4163 4164 4164 4165 if (!ext4_can_truncate(inode)) 4165 - return 0; 4166 + goto out_trace; 4166 4167 4167 4168 if (inode->i_size == 0 && !test_opt(inode->i_sb, NO_AUTO_DA_ALLOC)) 4168 4169 ext4_set_inode_state(inode, EXT4_STATE_DA_ALLOC_CLOSE); ··· 4171 4172 int has_inline = 1; 4172 4173 4173 4174 err = ext4_inline_data_truncate(inode, &has_inline); 4174 - if (err) 4175 - return err; 4176 - if (has_inline) 4177 - return 0; 4175 + if (err || has_inline) 4176 + goto out_trace; 4178 4177 } 4179 4178 4180 4179 /* If we zero-out tail of the page, we have to create jinode for jbd2 */ 4181 4180 if (inode->i_size & (inode->i_sb->s_blocksize - 
1)) { 4182 4181 if (ext4_inode_attach_jinode(inode) < 0) 4183 - return 0; 4182 + goto out_trace; 4184 4183 } 4185 4184 4186 4185 if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) ··· 4187 4190 credits = ext4_blocks_for_truncate(inode); 4188 4191 4189 4192 handle = ext4_journal_start(inode, EXT4_HT_TRUNCATE, credits); 4190 - if (IS_ERR(handle)) 4191 - return PTR_ERR(handle); 4193 + if (IS_ERR(handle)) { 4194 + err = PTR_ERR(handle); 4195 + goto out_trace; 4196 + } 4192 4197 4193 4198 if (inode->i_size & (inode->i_sb->s_blocksize - 1)) 4194 4199 ext4_block_truncate_page(handle, mapping, inode->i_size); ··· 4210 4211 4211 4212 down_write(&EXT4_I(inode)->i_data_sem); 4212 4213 4213 - ext4_discard_preallocations(inode); 4214 + ext4_discard_preallocations(inode, 0); 4214 4215 4215 4216 if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) 4216 4217 err = ext4_ext_truncate(handle, inode); ··· 4241 4242 err = err2; 4242 4243 ext4_journal_stop(handle); 4243 4244 4245 + out_trace: 4244 4246 trace_ext4_truncate_exit(inode); 4245 4247 return err; 4246 4248 } ··· 4760 4760 4761 4761 ret = 0; 4762 4762 if (ei->i_file_acl && 4763 - !ext4_data_block_valid(EXT4_SB(sb), ei->i_file_acl, 1)) { 4763 + !ext4_inode_block_valid(inode, ei->i_file_acl, 1)) { 4764 4764 ext4_error_inode(inode, function, line, 0, 4765 4765 "iget: bad extended attribute block %llu", 4766 4766 ei->i_file_acl);
+17 -17
fs/ext4/ioctl.c
··· 202 202 reset_inode_seed(inode); 203 203 reset_inode_seed(inode_bl); 204 204 205 - ext4_discard_preallocations(inode); 205 + ext4_discard_preallocations(inode, 0); 206 206 207 207 err = ext4_mark_inode_dirty(handle, inode); 208 208 if (err < 0) { ··· 819 819 switch (cmd) { 820 820 case FS_IOC_GETFSMAP: 821 821 return ext4_ioc_getfsmap(sb, (void __user *)arg); 822 - case EXT4_IOC_GETFLAGS: 822 + case FS_IOC_GETFLAGS: 823 823 flags = ei->i_flags & EXT4_FL_USER_VISIBLE; 824 824 if (S_ISREG(inode->i_mode)) 825 825 flags &= ~EXT4_PROJINHERIT_FL; 826 826 return put_user(flags, (int __user *) arg); 827 - case EXT4_IOC_SETFLAGS: { 827 + case FS_IOC_SETFLAGS: { 828 828 int err; 829 829 830 830 if (!inode_owner_or_capable(inode)) ··· 1129 1129 case EXT4_IOC_PRECACHE_EXTENTS: 1130 1130 return ext4_ext_precache(inode); 1131 1131 1132 - case EXT4_IOC_SET_ENCRYPTION_POLICY: 1132 + case FS_IOC_SET_ENCRYPTION_POLICY: 1133 1133 if (!ext4_has_feature_encrypt(sb)) 1134 1134 return -EOPNOTSUPP; 1135 1135 return fscrypt_ioctl_set_policy(filp, (const void __user *)arg); 1136 1136 1137 - case EXT4_IOC_GET_ENCRYPTION_PWSALT: { 1137 + case FS_IOC_GET_ENCRYPTION_PWSALT: { 1138 1138 #ifdef CONFIG_FS_ENCRYPTION 1139 1139 int err, err2; 1140 1140 struct ext4_sb_info *sbi = EXT4_SB(sb); ··· 1174 1174 return -EOPNOTSUPP; 1175 1175 #endif 1176 1176 } 1177 - case EXT4_IOC_GET_ENCRYPTION_POLICY: 1177 + case FS_IOC_GET_ENCRYPTION_POLICY: 1178 1178 if (!ext4_has_feature_encrypt(sb)) 1179 1179 return -EOPNOTSUPP; 1180 1180 return fscrypt_ioctl_get_policy(filp, (void __user *)arg); ··· 1236 1236 case EXT4_IOC_GET_ES_CACHE: 1237 1237 return ext4_ioctl_get_es_cache(filp, arg); 1238 1238 1239 - case EXT4_IOC_FSGETXATTR: 1239 + case FS_IOC_FSGETXATTR: 1240 1240 { 1241 1241 struct fsxattr fa; 1242 1242 ··· 1247 1247 return -EFAULT; 1248 1248 return 0; 1249 1249 } 1250 - case EXT4_IOC_FSSETXATTR: 1250 + case FS_IOC_FSSETXATTR: 1251 1251 { 1252 1252 struct fsxattr fa, old_fa; 1253 1253 int err; ··· 1313 
1313 { 1314 1314 /* These are just misnamed, they actually get/put from/to user an int */ 1315 1315 switch (cmd) { 1316 - case EXT4_IOC32_GETFLAGS: 1317 - cmd = EXT4_IOC_GETFLAGS; 1316 + case FS_IOC32_GETFLAGS: 1317 + cmd = FS_IOC_GETFLAGS; 1318 1318 break; 1319 - case EXT4_IOC32_SETFLAGS: 1320 - cmd = EXT4_IOC_SETFLAGS; 1319 + case FS_IOC32_SETFLAGS: 1320 + cmd = FS_IOC_SETFLAGS; 1321 1321 break; 1322 1322 case EXT4_IOC32_GETVERSION: 1323 1323 cmd = EXT4_IOC_GETVERSION; ··· 1361 1361 case EXT4_IOC_RESIZE_FS: 1362 1362 case FITRIM: 1363 1363 case EXT4_IOC_PRECACHE_EXTENTS: 1364 - case EXT4_IOC_SET_ENCRYPTION_POLICY: 1365 - case EXT4_IOC_GET_ENCRYPTION_PWSALT: 1366 - case EXT4_IOC_GET_ENCRYPTION_POLICY: 1364 + case FS_IOC_SET_ENCRYPTION_POLICY: 1365 + case FS_IOC_GET_ENCRYPTION_PWSALT: 1366 + case FS_IOC_GET_ENCRYPTION_POLICY: 1367 1367 case FS_IOC_GET_ENCRYPTION_POLICY_EX: 1368 1368 case FS_IOC_ADD_ENCRYPTION_KEY: 1369 1369 case FS_IOC_REMOVE_ENCRYPTION_KEY: ··· 1377 1377 case EXT4_IOC_CLEAR_ES_CACHE: 1378 1378 case EXT4_IOC_GETSTATE: 1379 1379 case EXT4_IOC_GET_ES_CACHE: 1380 - case EXT4_IOC_FSGETXATTR: 1381 - case EXT4_IOC_FSSETXATTR: 1380 + case FS_IOC_FSGETXATTR: 1381 + case FS_IOC_FSSETXATTR: 1382 1382 break; 1383 1383 default: 1384 1384 return -ENOIOCTLCMD;
+244 -45
fs/ext4/mballoc.c
··· 922 922 bh[i] = NULL; 923 923 continue; 924 924 } 925 - bh[i] = ext4_read_block_bitmap_nowait(sb, group); 925 + bh[i] = ext4_read_block_bitmap_nowait(sb, group, false); 926 926 if (IS_ERR(bh[i])) { 927 927 err = PTR_ERR(bh[i]); 928 928 bh[i] = NULL; ··· 1278 1278 /* Pages marked accessed already */ 1279 1279 e4b->bd_buddy_page = page; 1280 1280 e4b->bd_buddy = page_address(page) + (poff * sb->s_blocksize); 1281 - 1282 - BUG_ON(e4b->bd_bitmap_page == NULL); 1283 - BUG_ON(e4b->bd_buddy_page == NULL); 1284 1281 1285 1282 return 0; 1286 1283 ··· 1740 1743 1741 1744 } 1742 1745 1743 - /* 1744 - * regular allocator, for general purposes allocation 1745 - */ 1746 - 1747 1746 static void ext4_mb_check_limits(struct ext4_allocation_context *ac, 1748 1747 struct ext4_buddy *e4b, 1749 1748 int finish_group) ··· 2112 2119 2113 2120 BUG_ON(cr < 0 || cr >= 4); 2114 2121 2115 - free = grp->bb_free; 2116 - if (free == 0) 2117 - return false; 2118 - if (cr <= 2 && free < ac->ac_g_ex.fe_len) 2122 + if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp))) 2119 2123 return false; 2120 2124 2121 - if (unlikely(EXT4_MB_GRP_BBITMAP_CORRUPT(grp))) 2125 + free = grp->bb_free; 2126 + if (free == 0) 2122 2127 return false; 2123 2128 2124 2129 fragments = grp->bb_fragments; ··· 2133 2142 ((group % flex_size) == 0)) 2134 2143 return false; 2135 2144 2136 - if ((ac->ac_2order > ac->ac_sb->s_blocksize_bits+1) || 2137 - (free / fragments) >= ac->ac_g_ex.fe_len) 2145 + if (free < ac->ac_g_ex.fe_len) 2146 + return false; 2147 + 2148 + if (ac->ac_2order > ac->ac_sb->s_blocksize_bits+1) 2138 2149 return true; 2139 2150 2140 2151 if (grp->bb_largest_free_order < ac->ac_2order) ··· 2170 2177 { 2171 2178 struct ext4_group_info *grp = ext4_get_group_info(ac->ac_sb, group); 2172 2179 struct super_block *sb = ac->ac_sb; 2180 + struct ext4_sb_info *sbi = EXT4_SB(sb); 2173 2181 bool should_lock = ac->ac_flags & EXT4_MB_STRICT_CHECK; 2174 2182 ext4_grpblk_t free; 2175 2183 int ret = 0; ··· 2189 2195 2190 2196 /* 
We only do this if the grp has never been initialized */ 2191 2197 if (unlikely(EXT4_MB_GRP_NEED_INIT(grp))) { 2192 - ret = ext4_mb_init_group(ac->ac_sb, group, GFP_NOFS); 2198 + struct ext4_group_desc *gdp = 2199 + ext4_get_group_desc(sb, group, NULL); 2200 + int ret; 2201 + 2202 + /* cr=0/1 is a very optimistic search to find large 2203 + * good chunks almost for free. If buddy data is not 2204 + * ready, then this optimization makes no sense. But 2205 + * we never skip the first block group in a flex_bg, 2206 + * since this gets used for metadata block allocation, 2207 + * and we want to make sure we locate metadata blocks 2208 + * in the first block group in the flex_bg if possible. 2209 + */ 2210 + if (cr < 2 && 2211 + (!sbi->s_log_groups_per_flex || 2212 + ((group & ((1 << sbi->s_log_groups_per_flex) - 1)) != 0)) && 2213 + !(ext4_has_group_desc_csum(sb) && 2214 + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) 2215 + return 0; 2216 + ret = ext4_mb_init_group(sb, group, GFP_NOFS); 2193 2217 if (ret) 2194 2218 return ret; 2195 2219 } ··· 2221 2209 return ret; 2222 2210 } 2223 2211 2212 + /* 2213 + * Start prefetching @nr block bitmaps starting at @group. 2214 + * Return the next group which needs to be prefetched. 2215 + */ 2216 + ext4_group_t ext4_mb_prefetch(struct super_block *sb, ext4_group_t group, 2217 + unsigned int nr, int *cnt) 2218 + { 2219 + ext4_group_t ngroups = ext4_get_groups_count(sb); 2220 + struct buffer_head *bh; 2221 + struct blk_plug plug; 2222 + 2223 + blk_start_plug(&plug); 2224 + while (nr-- > 0) { 2225 + struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, 2226 + NULL); 2227 + struct ext4_group_info *grp = ext4_get_group_info(sb, group); 2228 + 2229 + /* 2230 + * Prefetch block groups with free blocks; but don't 2231 + * bother if it is marked uninitialized on disk, since 2232 + * it won't require I/O to read. Also only try to 2233 + * prefetch once, so we avoid getblk() call, which can 2234 + * be expensive. 
2235 + */ 2236 + if (!EXT4_MB_GRP_TEST_AND_SET_READ(grp) && 2237 + EXT4_MB_GRP_NEED_INIT(grp) && 2238 + ext4_free_group_clusters(sb, gdp) > 0 && 2239 + !(ext4_has_group_desc_csum(sb) && 2240 + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) { 2241 + bh = ext4_read_block_bitmap_nowait(sb, group, true); 2242 + if (bh && !IS_ERR(bh)) { 2243 + if (!buffer_uptodate(bh) && cnt) 2244 + (*cnt)++; 2245 + brelse(bh); 2246 + } 2247 + } 2248 + if (++group >= ngroups) 2249 + group = 0; 2250 + } 2251 + blk_finish_plug(&plug); 2252 + return group; 2253 + } 2254 + 2255 + /* 2256 + * Prefetching reads the block bitmap into the buffer cache; but we 2257 + * need to make sure that the buddy bitmap in the page cache has been 2258 + * initialized. Note that ext4_mb_init_group() will block if the I/O 2259 + * is not yet completed, or indeed if it was not initiated by 2260 + * ext4_mb_prefetch did not start the I/O. 2261 + * 2262 + * TODO: We should actually kick off the buddy bitmap setup in a work 2263 + * queue when the buffer I/O is completed, so that we don't block 2264 + * waiting for the block allocation bitmap read to finish when 2265 + * ext4_mb_prefetch_fini is called from ext4_mb_regular_allocator(). 
2266 + */ 2267 + void ext4_mb_prefetch_fini(struct super_block *sb, ext4_group_t group, 2268 + unsigned int nr) 2269 + { 2270 + while (nr-- > 0) { 2271 + struct ext4_group_desc *gdp = ext4_get_group_desc(sb, group, 2272 + NULL); 2273 + struct ext4_group_info *grp = ext4_get_group_info(sb, group); 2274 + 2275 + if (!group) 2276 + group = ext4_get_groups_count(sb); 2277 + group--; 2278 + grp = ext4_get_group_info(sb, group); 2279 + 2280 + if (EXT4_MB_GRP_NEED_INIT(grp) && 2281 + ext4_free_group_clusters(sb, gdp) > 0 && 2282 + !(ext4_has_group_desc_csum(sb) && 2283 + (gdp->bg_flags & cpu_to_le16(EXT4_BG_BLOCK_UNINIT)))) { 2284 + if (ext4_mb_init_group(sb, group, GFP_NOFS)) 2285 + break; 2286 + } 2287 + } 2288 + } 2289 + 2224 2290 static noinline_for_stack int 2225 2291 ext4_mb_regular_allocator(struct ext4_allocation_context *ac) 2226 2292 { 2227 - ext4_group_t ngroups, group, i; 2293 + ext4_group_t prefetch_grp = 0, ngroups, group, i; 2228 2294 int cr = -1; 2229 2295 int err = 0, first_err = 0; 2296 + unsigned int nr = 0, prefetch_ios = 0; 2230 2297 struct ext4_sb_info *sbi; 2231 2298 struct super_block *sb; 2232 2299 struct ext4_buddy e4b; 2300 + int lost; 2233 2301 2234 2302 sb = ac->ac_sb; 2235 2303 sbi = EXT4_SB(sb); ··· 2329 2237 goto out; 2330 2238 2331 2239 /* 2332 - * ac->ac2_order is set only if the fe_len is a power of 2 2333 - * if ac2_order is set we also set criteria to 0 so that we 2240 + * ac->ac_2order is set only if the fe_len is a power of 2 2241 + * if ac->ac_2order is set we also set criteria to 0 so that we 2334 2242 * try exact allocation using buddy. 
2335 2243 */ 2336 2244 i = fls(ac->ac_g_ex.fe_len); ··· 2374 2282 * from the goal value specified 2375 2283 */ 2376 2284 group = ac->ac_g_ex.fe_group; 2285 + prefetch_grp = group; 2377 2286 2378 2287 for (i = 0; i < ngroups; group++, i++) { 2379 2288 int ret = 0; ··· 2385 2292 */ 2386 2293 if (group >= ngroups) 2387 2294 group = 0; 2295 + 2296 + /* 2297 + * Batch reads of the block allocation bitmaps 2298 + * to get multiple READs in flight; limit 2299 + * prefetching at cr=0/1, otherwise mballoc can 2300 + * spend a lot of time loading imperfect groups 2301 + */ 2302 + if ((prefetch_grp == group) && 2303 + (cr > 1 || 2304 + prefetch_ios < sbi->s_mb_prefetch_limit)) { 2305 + unsigned int curr_ios = prefetch_ios; 2306 + 2307 + nr = sbi->s_mb_prefetch; 2308 + if (ext4_has_feature_flex_bg(sb)) { 2309 + nr = (group / sbi->s_mb_prefetch) * 2310 + sbi->s_mb_prefetch; 2311 + nr = nr + sbi->s_mb_prefetch - group; 2312 + } 2313 + prefetch_grp = ext4_mb_prefetch(sb, group, 2314 + nr, &prefetch_ios); 2315 + if (prefetch_ios == curr_ios) 2316 + nr = 0; 2317 + } 2388 2318 2389 2319 /* This now checks without needing the buddy page */ 2390 2320 ret = ext4_mb_good_group_nolock(ac, group, cr); ··· 2457 2341 * We've been searching too long. Let's try to allocate 2458 2342 * the best chunk we've found so far 2459 2343 */ 2460 - 2461 2344 ext4_mb_try_best_found(ac, &e4b); 2462 2345 if (ac->ac_status != AC_STATUS_FOUND) { 2463 2346 /* 2464 2347 * Someone more lucky has already allocated it. 
2465 2348 * The only thing we can do is just take first 2466 2349 * found block(s) 2467 - printk(KERN_DEBUG "EXT4-fs: someone won our chunk\n"); 2468 2350 */ 2351 + lost = atomic_inc_return(&sbi->s_mb_lost_chunks); 2352 + mb_debug(sb, "lost chunk, group: %u, start: %d, len: %d, lost: %d\n", 2353 + ac->ac_b_ex.fe_group, ac->ac_b_ex.fe_start, 2354 + ac->ac_b_ex.fe_len, lost); 2355 + 2469 2356 ac->ac_b_ex.fe_group = 0; 2470 2357 ac->ac_b_ex.fe_start = 0; 2471 2358 ac->ac_b_ex.fe_len = 0; 2472 2359 ac->ac_status = AC_STATUS_CONTINUE; 2473 2360 ac->ac_flags |= EXT4_MB_HINT_FIRST; 2474 2361 cr = 3; 2475 - atomic_inc(&sbi->s_mb_lost_chunks); 2476 2362 goto repeat; 2477 2363 } 2478 2364 } ··· 2485 2367 mb_debug(sb, "Best len %d, origin len %d, ac_status %u, ac_flags 0x%x, cr %d ret %d\n", 2486 2368 ac->ac_b_ex.fe_len, ac->ac_o_ex.fe_len, ac->ac_status, 2487 2369 ac->ac_flags, cr, err); 2370 + 2371 + if (nr) 2372 + ext4_mb_prefetch_fini(sb, prefetch_grp, nr); 2373 + 2488 2374 return err; 2489 2375 } 2490 2376 ··· 2561 2439 for (i = 0; i <= 13; i++) 2562 2440 seq_printf(seq, " %-5u", i <= blocksize_bits + 1 ? 2563 2441 sg.info.bb_counters[i] : 0); 2564 - seq_printf(seq, " ]\n"); 2442 + seq_puts(seq, " ]\n"); 2565 2443 2566 2444 return 0; 2567 2445 } ··· 2735 2613 goto err_freebuddy; 2736 2614 } 2737 2615 2616 + if (ext4_has_feature_flex_bg(sb)) { 2617 + /* a single flex group is supposed to be read by a single IO */ 2618 + sbi->s_mb_prefetch = 1 << sbi->s_es->s_log_groups_per_flex; 2619 + sbi->s_mb_prefetch *= 8; /* 8 prefetch IOs in flight at most */ 2620 + } else { 2621 + sbi->s_mb_prefetch = 32; 2622 + } 2623 + if (sbi->s_mb_prefetch > ext4_get_groups_count(sb)) 2624 + sbi->s_mb_prefetch = ext4_get_groups_count(sb); 2625 + /* now many real IOs to prefetch within a single allocation at cr=0 2626 + * given cr=0 is an CPU-related optimization we shouldn't try to 2627 + * load too many groups, at some point we should start to use what 2628 + * we've got in memory. 
2629 + * with an average random access time 5ms, it'd take a second to get 2630 + * 200 groups (* N with flex_bg), so let's make this limit 4 2631 + */ 2632 + sbi->s_mb_prefetch_limit = sbi->s_mb_prefetch * 4; 2633 + if (sbi->s_mb_prefetch_limit > ext4_get_groups_count(sb)) 2634 + sbi->s_mb_prefetch_limit = ext4_get_groups_count(sb); 2635 + 2738 2636 return 0; 2739 2637 2740 2638 err_freebuddy: ··· 2878 2736 sbi->s_mb_stats = MB_DEFAULT_STATS; 2879 2737 sbi->s_mb_stream_request = MB_DEFAULT_STREAM_THRESHOLD; 2880 2738 sbi->s_mb_order2_reqs = MB_DEFAULT_ORDER2_REQS; 2739 + sbi->s_mb_max_inode_prealloc = MB_DEFAULT_MAX_INODE_PREALLOC; 2881 2740 /* 2882 2741 * The default group preallocation is 512, which for 4k block 2883 2742 * sizes translates to 2 megabytes. However for bigalloc file ··· 3233 3090 block = ext4_grp_offs_to_block(sb, &ac->ac_b_ex); 3234 3091 3235 3092 len = EXT4_C2B(sbi, ac->ac_b_ex.fe_len); 3236 - if (!ext4_data_block_valid(sbi, block, len)) { 3093 + if (!ext4_inode_block_valid(ac->ac_inode, block, len)) { 3237 3094 ext4_error(sb, "Allocating blocks %llu-%llu which overlap " 3238 3095 "fs metadata", block, block+len); 3239 3096 /* File system mounted not to panic on error ··· 3817 3674 mb_debug(sb, "preallocated %d for group %u\n", preallocated, group); 3818 3675 } 3819 3676 3677 + static void ext4_mb_mark_pa_deleted(struct super_block *sb, 3678 + struct ext4_prealloc_space *pa) 3679 + { 3680 + struct ext4_inode_info *ei; 3681 + 3682 + if (pa->pa_deleted) { 3683 + ext4_warning(sb, "deleted pa, type:%d, pblk:%llu, lblk:%u, len:%d\n", 3684 + pa->pa_type, pa->pa_pstart, pa->pa_lstart, 3685 + pa->pa_len); 3686 + return; 3687 + } 3688 + 3689 + pa->pa_deleted = 1; 3690 + 3691 + if (pa->pa_type == MB_INODE_PA) { 3692 + ei = EXT4_I(pa->pa_inode); 3693 + atomic_dec(&ei->i_prealloc_active); 3694 + } 3695 + } 3696 + 3820 3697 static void ext4_mb_pa_callback(struct rcu_head *head) 3821 3698 { 3822 3699 struct ext4_prealloc_space *pa; ··· 3869 3706 return; 3870 
3707 } 3871 3708 3872 - pa->pa_deleted = 1; 3709 + ext4_mb_mark_pa_deleted(sb, pa); 3873 3710 spin_unlock(&pa->pa_lock); 3874 3711 3875 3712 grp_blk = pa->pa_pstart; ··· 3993 3830 spin_lock(pa->pa_obj_lock); 3994 3831 list_add_rcu(&pa->pa_inode_list, &ei->i_prealloc_list); 3995 3832 spin_unlock(pa->pa_obj_lock); 3833 + atomic_inc(&ei->i_prealloc_active); 3996 3834 } 3997 3835 3998 3836 /* ··· 4204 4040 } 4205 4041 4206 4042 /* seems this one can be freed ... */ 4207 - pa->pa_deleted = 1; 4043 + ext4_mb_mark_pa_deleted(sb, pa); 4208 4044 4209 4045 /* we can trust pa_free ... */ 4210 4046 free += pa->pa_free; ··· 4267 4103 * 4268 4104 * FIXME!! Make sure it is valid at all the call sites 4269 4105 */ 4270 - void ext4_discard_preallocations(struct inode *inode) 4106 + void ext4_discard_preallocations(struct inode *inode, unsigned int needed) 4271 4107 { 4272 4108 struct ext4_inode_info *ei = EXT4_I(inode); 4273 4109 struct super_block *sb = inode->i_sb; ··· 4285 4121 4286 4122 mb_debug(sb, "discard preallocation for inode %lu\n", 4287 4123 inode->i_ino); 4288 - trace_ext4_discard_preallocations(inode); 4124 + trace_ext4_discard_preallocations(inode, 4125 + atomic_read(&ei->i_prealloc_active), needed); 4289 4126 4290 4127 INIT_LIST_HEAD(&list); 4128 + 4129 + if (needed == 0) 4130 + needed = UINT_MAX; 4291 4131 4292 4132 repeat: 4293 4133 /* first, collect all pa's in the inode */ 4294 4134 spin_lock(&ei->i_prealloc_lock); 4295 - while (!list_empty(&ei->i_prealloc_list)) { 4296 - pa = list_entry(ei->i_prealloc_list.next, 4135 + while (!list_empty(&ei->i_prealloc_list) && needed) { 4136 + pa = list_entry(ei->i_prealloc_list.prev, 4297 4137 struct ext4_prealloc_space, pa_inode_list); 4298 4138 BUG_ON(pa->pa_obj_lock != &ei->i_prealloc_lock); 4299 4139 spin_lock(&pa->pa_lock); ··· 4314 4146 4315 4147 } 4316 4148 if (pa->pa_deleted == 0) { 4317 - pa->pa_deleted = 1; 4149 + ext4_mb_mark_pa_deleted(sb, pa); 4318 4150 spin_unlock(&pa->pa_lock); 4319 4151 
list_del_rcu(&pa->pa_inode_list); 4320 4152 list_add(&pa->u.pa_tmp_list, &list); 4153 + needed--; 4321 4154 continue; 4322 4155 } 4323 4156 ··· 4568 4399 ac->ac_g_ex = ac->ac_o_ex; 4569 4400 ac->ac_flags = ar->flags; 4570 4401 4571 - /* we have to define context: we'll we work with a file or 4402 + /* we have to define context: we'll work with a file or 4572 4403 * locality group. this is a policy, actually */ 4573 4404 ext4_mb_group_or_file(ac); 4574 4405 ··· 4619 4450 BUG_ON(pa->pa_type != MB_GROUP_PA); 4620 4451 4621 4452 /* seems this one can be freed ... */ 4622 - pa->pa_deleted = 1; 4453 + ext4_mb_mark_pa_deleted(sb, pa); 4623 4454 spin_unlock(&pa->pa_lock); 4624 4455 4625 4456 list_del_rcu(&pa->pa_inode_list); ··· 4718 4549 } 4719 4550 4720 4551 /* 4552 + * if per-inode prealloc list is too long, trim some PA 4553 + */ 4554 + static void ext4_mb_trim_inode_pa(struct inode *inode) 4555 + { 4556 + struct ext4_inode_info *ei = EXT4_I(inode); 4557 + struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb); 4558 + int count, delta; 4559 + 4560 + count = atomic_read(&ei->i_prealloc_active); 4561 + delta = (sbi->s_mb_max_inode_prealloc >> 2) + 1; 4562 + if (count > sbi->s_mb_max_inode_prealloc + delta) { 4563 + count -= sbi->s_mb_max_inode_prealloc; 4564 + ext4_discard_preallocations(inode, count); 4565 + } 4566 + } 4567 + 4568 + /* 4721 4569 * release all resource we used in allocation 4722 4570 */ 4723 4571 static int ext4_mb_release_context(struct ext4_allocation_context *ac) 4724 4572 { 4573 + struct inode *inode = ac->ac_inode; 4574 + struct ext4_inode_info *ei = EXT4_I(inode); 4725 4575 struct ext4_sb_info *sbi = EXT4_SB(ac->ac_sb); 4726 4576 struct ext4_prealloc_space *pa = ac->ac_pa; 4727 4577 if (pa) { ··· 4752 4564 pa->pa_free -= ac->ac_b_ex.fe_len; 4753 4565 pa->pa_len -= ac->ac_b_ex.fe_len; 4754 4566 spin_unlock(&pa->pa_lock); 4567 + 4568 + /* 4569 + * We want to add the pa to the right bucket. 
4570 + * Remove it from the list and while adding 4571 + * make sure the list to which we are adding 4572 + * doesn't grow big. 4573 + */ 4574 + if (likely(pa->pa_free)) { 4575 + spin_lock(pa->pa_obj_lock); 4576 + list_del_rcu(&pa->pa_inode_list); 4577 + spin_unlock(pa->pa_obj_lock); 4578 + ext4_mb_add_n_trim(ac); 4579 + } 4755 4580 } 4756 - } 4757 - if (pa) { 4758 - /* 4759 - * We want to add the pa to the right bucket. 4760 - * Remove it from the list and while adding 4761 - * make sure the list to which we are adding 4762 - * doesn't grow big. 4763 - */ 4764 - if ((pa->pa_type == MB_GROUP_PA) && likely(pa->pa_free)) { 4581 + 4582 + if (pa->pa_type == MB_INODE_PA) { 4583 + /* 4584 + * treat per-inode prealloc list as a lru list, then try 4585 + * to trim the least recently used PA. 4586 + */ 4765 4587 spin_lock(pa->pa_obj_lock); 4766 - list_del_rcu(&pa->pa_inode_list); 4588 + list_move(&pa->pa_inode_list, &ei->i_prealloc_list); 4767 4589 spin_unlock(pa->pa_obj_lock); 4768 - ext4_mb_add_n_trim(ac); 4769 4590 } 4591 + 4770 4592 ext4_mb_put_pa(ac, ac->ac_sb, pa); 4771 4593 } 4772 4594 if (ac->ac_bitmap_page) ··· 4786 4588 if (ac->ac_flags & EXT4_MB_HINT_GROUP_ALLOC) 4787 4589 mutex_unlock(&ac->ac_lg->lg_mutex); 4788 4590 ext4_mb_collect_stats(ac); 4591 + ext4_mb_trim_inode_pa(inode); 4789 4592 return 0; 4790 4593 } 4791 4594 ··· 5114 4915 5115 4916 sbi = EXT4_SB(sb); 5116 4917 if (!(flags & EXT4_FREE_BLOCKS_VALIDATED) && 5117 - !ext4_data_block_valid(sbi, block, count)) { 4918 + !ext4_inode_block_valid(inode, block, count)) { 5118 4919 ext4_error(sb, "Freeing blocks not in datazone - " 5119 4920 "block = %llu, count = %lu", block, count); 5120 4921 goto error_return;
+4
fs/ext4/mballoc.h
··· 73 73 */ 74 74 #define MB_DEFAULT_GROUP_PREALLOC 512 75 75 76 + /* 77 + * maximum length of inode prealloc list 78 + */ 79 + #define MB_DEFAULT_MAX_INODE_PREALLOC 512 76 80 77 81 struct ext4_free_data { 78 82 /* this links the free block information from sb_info */
+2 -2
fs/ext4/move_extent.c
··· 686 686 687 687 out: 688 688 if (*moved_len) { 689 - ext4_discard_preallocations(orig_inode); 690 - ext4_discard_preallocations(donor_inode); 689 + ext4_discard_preallocations(orig_inode, 0); 690 + ext4_discard_preallocations(donor_inode, 0); 691 691 } 692 692 693 693 ext4_ext_drop_refs(path);
+39 -27
fs/ext4/namei.c
··· 1396 1396 ext4_match(dir, fname, de)) { 1397 1397 /* found a match - just to be sure, do 1398 1398 * a full check */ 1399 - if (ext4_check_dir_entry(dir, NULL, de, bh, bh->b_data, 1400 - bh->b_size, offset)) 1399 + if (ext4_check_dir_entry(dir, NULL, de, bh, search_buf, 1400 + buf_size, offset)) 1401 1401 return -1; 1402 1402 *res_dir = de; 1403 1403 return 1; ··· 1858 1858 blocksize, hinfo, map); 1859 1859 map -= count; 1860 1860 dx_sort_map(map, count); 1861 - /* Split the existing block in the middle, size-wise */ 1861 + /* Ensure that neither split block is over half full */ 1862 1862 size = 0; 1863 1863 move = 0; 1864 1864 for (i = count-1; i >= 0; i--) { ··· 1868 1868 size += map[i].size; 1869 1869 move++; 1870 1870 } 1871 - /* map index at which we will split */ 1872 - split = count - move; 1871 + /* 1872 + * map index at which we will split 1873 + * 1874 + * If the sum of active entries didn't exceed half the block size, just 1875 + * split it in half by count; each resulting block will have at least 1876 + * half the space free. 
1877 + */ 1878 + if (i > 0) 1879 + split = count - move; 1880 + else 1881 + split = count/2; 1882 + 1873 1883 hash2 = map[split].hash; 1874 1884 continued = hash2 == map[split - 1].hash; 1875 1885 dxtrace(printk(KERN_INFO "Split block %lu at %x, %i/%i\n", ··· 2465 2455 * ext4_generic_delete_entry deletes a directory entry by merging it 2466 2456 * with the previous entry 2467 2457 */ 2468 - int ext4_generic_delete_entry(handle_t *handle, 2469 - struct inode *dir, 2458 + int ext4_generic_delete_entry(struct inode *dir, 2470 2459 struct ext4_dir_entry_2 *de_del, 2471 2460 struct buffer_head *bh, 2472 2461 void *entry_buf, ··· 2481 2472 de = (struct ext4_dir_entry_2 *)entry_buf; 2482 2473 while (i < buf_size - csum_size) { 2483 2474 if (ext4_check_dir_entry(dir, NULL, de, bh, 2484 - bh->b_data, bh->b_size, i)) 2475 + entry_buf, buf_size, i)) 2485 2476 return -EFSCORRUPTED; 2486 2477 if (de == de_del) { 2487 2478 if (pde) ··· 2526 2517 if (unlikely(err)) 2527 2518 goto out; 2528 2519 2529 - err = ext4_generic_delete_entry(handle, dir, de_del, 2530 - bh, bh->b_data, 2520 + err = ext4_generic_delete_entry(dir, de_del, bh, bh->b_data, 2531 2521 dir->i_sb->s_blocksize, csum_size); 2532 2522 if (err) 2533 2523 goto out; ··· 3201 3193 * in separate transaction */ 3202 3194 retval = dquot_initialize(dir); 3203 3195 if (retval) 3204 - return retval; 3196 + goto out_trace; 3205 3197 retval = dquot_initialize(d_inode(dentry)); 3206 3198 if (retval) 3207 - return retval; 3199 + goto out_trace; 3208 3200 3209 - retval = -ENOENT; 3210 3201 bh = ext4_find_entry(dir, &dentry->d_name, &de, NULL); 3211 - if (IS_ERR(bh)) 3212 - return PTR_ERR(bh); 3213 - if (!bh) 3214 - goto end_unlink; 3202 + if (IS_ERR(bh)) { 3203 + retval = PTR_ERR(bh); 3204 + goto out_trace; 3205 + } 3206 + if (!bh) { 3207 + retval = -ENOENT; 3208 + goto out_trace; 3209 + } 3215 3210 3216 3211 inode = d_inode(dentry); 3217 3212 3218 - retval = -EFSCORRUPTED; 3219 - if (le32_to_cpu(de->inode) != inode->i_ino) 3220 - 
goto end_unlink; 3213 + if (le32_to_cpu(de->inode) != inode->i_ino) { 3214 + retval = -EFSCORRUPTED; 3215 + goto out_bh; 3216 + } 3221 3217 3222 3218 handle = ext4_journal_start(dir, EXT4_HT_DIR, 3223 3219 EXT4_DATA_TRANS_BLOCKS(dir->i_sb)); 3224 3220 if (IS_ERR(handle)) { 3225 3221 retval = PTR_ERR(handle); 3226 - handle = NULL; 3227 - goto end_unlink; 3222 + goto out_bh; 3228 3223 } 3229 3224 3230 3225 if (IS_DIRSYNC(dir)) ··· 3235 3224 3236 3225 retval = ext4_delete_entry(handle, dir, de, bh); 3237 3226 if (retval) 3238 - goto end_unlink; 3227 + goto out_handle; 3239 3228 dir->i_ctime = dir->i_mtime = current_time(dir); 3240 3229 ext4_update_dx_flag(dir); 3241 3230 retval = ext4_mark_inode_dirty(handle, dir); 3242 3231 if (retval) 3243 - goto end_unlink; 3232 + goto out_handle; 3244 3233 if (inode->i_nlink == 0) 3245 3234 ext4_warning_inode(inode, "Deleting file '%.*s' with no links", 3246 3235 dentry->d_name.len, dentry->d_name.name); ··· 3262 3251 d_invalidate(dentry); 3263 3252 #endif 3264 3253 3265 - end_unlink: 3254 + out_handle: 3255 + ext4_journal_stop(handle); 3256 + out_bh: 3266 3257 brelse(bh); 3267 - if (handle) 3268 - ext4_journal_stop(handle); 3258 + out_trace: 3269 3259 trace_ext4_unlink_exit(dentry, retval); 3270 3260 return retval; 3271 3261 }
+2 -2
fs/ext4/readpage.c
··· 140 140 return; 141 141 } 142 142 ctx->cur_step++; 143 - /* fall-through */ 143 + fallthrough; 144 144 case STEP_VERITY: 145 145 if (ctx->enabled_steps & (1 << STEP_VERITY)) { 146 146 INIT_WORK(&ctx->work, verity_work); ··· 148 148 return; 149 149 } 150 150 ctx->cur_step++; 151 - /* fall-through */ 151 + fallthrough; 152 152 default: 153 153 __read_end_io(ctx->bio); 154 154 }
+189 -79
fs/ext4/super.c
··· 66 66 unsigned long journal_devnum); 67 67 static int ext4_show_options(struct seq_file *seq, struct dentry *root); 68 68 static int ext4_commit_super(struct super_block *sb, int sync); 69 - static void ext4_mark_recovery_complete(struct super_block *sb, 69 + static int ext4_mark_recovery_complete(struct super_block *sb, 70 70 struct ext4_super_block *es); 71 - static void ext4_clear_journal_err(struct super_block *sb, 72 - struct ext4_super_block *es); 71 + static int ext4_clear_journal_err(struct super_block *sb, 72 + struct ext4_super_block *es); 73 73 static int ext4_sync_fs(struct super_block *sb, int wait); 74 74 static int ext4_remount(struct super_block *sb, int *flags, char *data); 75 75 static int ext4_statfs(struct dentry *dentry, struct kstatfs *buf); ··· 744 744 struct va_format vaf; 745 745 va_list args; 746 746 747 + atomic_inc(&EXT4_SB(sb)->s_msg_count); 747 748 if (!___ratelimit(&(EXT4_SB(sb)->s_msg_ratelimit_state), "EXT4-fs")) 748 749 return; 749 750 ··· 755 754 va_end(args); 756 755 } 757 756 758 - #define ext4_warning_ratelimit(sb) \ 759 - ___ratelimit(&(EXT4_SB(sb)->s_warning_ratelimit_state), \ 760 - "EXT4-fs warning") 757 + static int ext4_warning_ratelimit(struct super_block *sb) 758 + { 759 + atomic_inc(&EXT4_SB(sb)->s_warning_count); 760 + return ___ratelimit(&(EXT4_SB(sb)->s_warning_ratelimit_state), 761 + "EXT4-fs warning"); 762 + } 761 763 762 764 void __ext4_warning(struct super_block *sb, const char *function, 763 765 unsigned int line, const char *fmt, ...) 
··· 1127 1123 inode_set_iversion(&ei->vfs_inode, 1); 1128 1124 spin_lock_init(&ei->i_raw_lock); 1129 1125 INIT_LIST_HEAD(&ei->i_prealloc_list); 1126 + atomic_set(&ei->i_prealloc_active, 0); 1130 1127 spin_lock_init(&ei->i_prealloc_lock); 1131 1128 ext4_es_init_tree(&ei->i_es_tree); 1132 1129 rwlock_init(&ei->i_es_lock); ··· 1221 1216 { 1222 1217 invalidate_inode_buffers(inode); 1223 1218 clear_inode(inode); 1224 - ext4_discard_preallocations(inode); 1219 + ext4_discard_preallocations(inode, 0); 1225 1220 ext4_es_remove_extent(inode, 0, EXT_MAX_BLOCKS); 1226 1221 dquot_drop(inode); 1227 1222 if (EXT4_I(inode)->jinode) { ··· 1293 1288 if (!page_has_buffers(page)) 1294 1289 return 0; 1295 1290 if (journal) 1296 - return jbd2_journal_try_to_free_buffers(journal, page, 1297 - wait & ~__GFP_DIRECT_RECLAIM); 1291 + return jbd2_journal_try_to_free_buffers(journal, page); 1292 + 1298 1293 return try_to_free_buffers(page); 1299 1294 } 1300 1295 ··· 1527 1522 Opt_dioread_nolock, Opt_dioread_lock, 1528 1523 Opt_discard, Opt_nodiscard, Opt_init_itable, Opt_noinit_itable, 1529 1524 Opt_max_dir_size_kb, Opt_nojournal_checksum, Opt_nombcache, 1525 + Opt_prefetch_block_bitmaps, 1530 1526 }; 1531 1527 1532 1528 static const match_table_t tokens = { ··· 1620 1614 {Opt_inlinecrypt, "inlinecrypt"}, 1621 1615 {Opt_nombcache, "nombcache"}, 1622 1616 {Opt_nombcache, "no_mbcache"}, /* for backward compatibility */ 1617 + {Opt_prefetch_block_bitmaps, "prefetch_block_bitmaps"}, 1623 1618 {Opt_removed, "check=none"}, /* mount option from ext2/3 */ 1624 1619 {Opt_removed, "nocheck"}, /* mount option from ext2/3 */ 1625 1620 {Opt_removed, "reservation"}, /* mount option from ext2/3 */ ··· 1838 1831 {Opt_max_dir_size_kb, 0, MOPT_GTE0}, 1839 1832 {Opt_test_dummy_encryption, 0, MOPT_STRING}, 1840 1833 {Opt_nombcache, EXT4_MOUNT_NO_MBCACHE, MOPT_SET}, 1834 + {Opt_prefetch_block_bitmaps, EXT4_MOUNT_PREFETCH_BLOCK_BITMAPS, 1835 + MOPT_SET}, 1841 1836 {Opt_err, 0, 0} 1842 1837 }; 1843 1838 ··· 3222 
3213 static int ext4_run_li_request(struct ext4_li_request *elr) 3223 3214 { 3224 3215 struct ext4_group_desc *gdp = NULL; 3225 - ext4_group_t group, ngroups; 3226 - struct super_block *sb; 3216 + struct super_block *sb = elr->lr_super; 3217 + ext4_group_t ngroups = EXT4_SB(sb)->s_groups_count; 3218 + ext4_group_t group = elr->lr_next_group; 3227 3219 unsigned long timeout = 0; 3220 + unsigned int prefetch_ios = 0; 3228 3221 int ret = 0; 3229 3222 3230 - sb = elr->lr_super; 3231 - ngroups = EXT4_SB(sb)->s_groups_count; 3223 + if (elr->lr_mode == EXT4_LI_MODE_PREFETCH_BBITMAP) { 3224 + elr->lr_next_group = ext4_mb_prefetch(sb, group, 3225 + EXT4_SB(sb)->s_mb_prefetch, &prefetch_ios); 3226 + if (prefetch_ios) 3227 + ext4_mb_prefetch_fini(sb, elr->lr_next_group, 3228 + prefetch_ios); 3229 + trace_ext4_prefetch_bitmaps(sb, group, elr->lr_next_group, 3230 + prefetch_ios); 3231 + if (group >= elr->lr_next_group) { 3232 + ret = 1; 3233 + if (elr->lr_first_not_zeroed != ngroups && 3234 + !sb_rdonly(sb) && test_opt(sb, INIT_INODE_TABLE)) { 3235 + elr->lr_next_group = elr->lr_first_not_zeroed; 3236 + elr->lr_mode = EXT4_LI_MODE_ITABLE; 3237 + ret = 0; 3238 + } 3239 + } 3240 + return ret; 3241 + } 3232 3242 3233 - for (group = elr->lr_next_group; group < ngroups; group++) { 3243 + for (; group < ngroups; group++) { 3234 3244 gdp = ext4_get_group_desc(sb, group, NULL); 3235 3245 if (!gdp) { 3236 3246 ret = 1; ··· 3267 3239 timeout = jiffies; 3268 3240 ret = ext4_init_inode_table(sb, group, 3269 3241 elr->lr_timeout ? 
0 : 1); 3242 + trace_ext4_lazy_itable_init(sb, group); 3270 3243 if (elr->lr_timeout == 0) { 3271 3244 timeout = (jiffies - timeout) * 3272 - elr->lr_sbi->s_li_wait_mult; 3245 + EXT4_SB(elr->lr_super)->s_li_wait_mult; 3273 3246 elr->lr_timeout = timeout; 3274 3247 } 3275 3248 elr->lr_next_sched = jiffies + elr->lr_timeout; ··· 3285 3256 */ 3286 3257 static void ext4_remove_li_request(struct ext4_li_request *elr) 3287 3258 { 3288 - struct ext4_sb_info *sbi; 3289 - 3290 3259 if (!elr) 3291 3260 return; 3292 3261 3293 - sbi = elr->lr_sbi; 3294 - 3295 3262 list_del(&elr->lr_request); 3296 - sbi->s_li_request = NULL; 3263 + EXT4_SB(elr->lr_super)->s_li_request = NULL; 3297 3264 kfree(elr); 3298 3265 } 3299 3266 ··· 3498 3473 static struct ext4_li_request *ext4_li_request_new(struct super_block *sb, 3499 3474 ext4_group_t start) 3500 3475 { 3501 - struct ext4_sb_info *sbi = EXT4_SB(sb); 3502 3476 struct ext4_li_request *elr; 3503 3477 3504 3478 elr = kzalloc(sizeof(*elr), GFP_KERNEL); ··· 3505 3481 return NULL; 3506 3482 3507 3483 elr->lr_super = sb; 3508 - elr->lr_sbi = sbi; 3509 - elr->lr_next_group = start; 3484 + elr->lr_first_not_zeroed = start; 3485 + if (test_opt(sb, PREFETCH_BLOCK_BITMAPS)) 3486 + elr->lr_mode = EXT4_LI_MODE_PREFETCH_BBITMAP; 3487 + else { 3488 + elr->lr_mode = EXT4_LI_MODE_ITABLE; 3489 + elr->lr_next_group = start; 3490 + } 3510 3491 3511 3492 /* 3512 3493 * Randomize first schedule time of the request to ··· 3541 3512 goto out; 3542 3513 } 3543 3514 3544 - if (first_not_zeroed == ngroups || sb_rdonly(sb) || 3545 - !test_opt(sb, INIT_INODE_TABLE)) 3515 + if (!test_opt(sb, PREFETCH_BLOCK_BITMAPS) && 3516 + (first_not_zeroed == ngroups || sb_rdonly(sb) || 3517 + !test_opt(sb, INIT_INODE_TABLE))) 3546 3518 goto out; 3547 3519 3548 3520 elr = ext4_li_request_new(sb, first_not_zeroed); ··· 4740 4710 4741 4711 ext4_set_resv_clusters(sb); 4742 4712 4743 - err = ext4_setup_system_zone(sb); 4744 - if (err) { 4745 - ext4_msg(sb, KERN_ERR, "failed to 
initialize system " 4746 - "zone (%d)", err); 4747 - goto failed_mount4a; 4713 + if (test_opt(sb, BLOCK_VALIDITY)) { 4714 + err = ext4_setup_system_zone(sb); 4715 + if (err) { 4716 + ext4_msg(sb, KERN_ERR, "failed to initialize system " 4717 + "zone (%d)", err); 4718 + goto failed_mount4a; 4719 + } 4748 4720 } 4749 4721 4750 4722 ext4_ext_init(sb); ··· 4809 4777 } 4810 4778 #endif /* CONFIG_QUOTA */ 4811 4779 4780 + /* 4781 + * Save the original bdev mapping's wb_err value which could be 4782 + * used to detect the metadata async write error. 4783 + */ 4784 + spin_lock_init(&sbi->s_bdev_wb_lock); 4785 + if (!sb_rdonly(sb)) 4786 + errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, 4787 + &sbi->s_bdev_wb_err); 4788 + sb->s_bdev->bd_super = sb; 4812 4789 EXT4_SB(sb)->s_mount_state |= EXT4_ORPHAN_FS; 4813 4790 ext4_orphan_cleanup(sb, es); 4814 4791 EXT4_SB(sb)->s_mount_state &= ~EXT4_ORPHAN_FS; 4815 4792 if (needs_recovery) { 4816 4793 ext4_msg(sb, KERN_INFO, "recovery complete"); 4817 - ext4_mark_recovery_complete(sb, es); 4794 + err = ext4_mark_recovery_complete(sb, es); 4795 + if (err) 4796 + goto failed_mount8; 4818 4797 } 4819 4798 if (EXT4_SB(sb)->s_journal) { 4820 4799 if (test_opt(sb, DATA_FLAGS) == EXT4_MOUNT_JOURNAL_DATA) ··· 4859 4816 ratelimit_state_init(&sbi->s_err_ratelimit_state, 5 * HZ, 10); 4860 4817 ratelimit_state_init(&sbi->s_warning_ratelimit_state, 5 * HZ, 10); 4861 4818 ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10); 4819 + atomic_set(&sbi->s_warning_count, 0); 4820 + atomic_set(&sbi->s_msg_count, 0); 4862 4821 4863 4822 kfree(orig_data); 4864 4823 return 0; ··· 4870 4825 ext4_msg(sb, KERN_ERR, "VFS: Can't find ext4 filesystem"); 4871 4826 goto failed_mount; 4872 4827 4873 - #ifdef CONFIG_QUOTA 4874 4828 failed_mount8: 4875 4829 ext4_unregister_sysfs(sb); 4876 - #endif 4877 4830 failed_mount7: 4878 4831 ext4_unregister_li_request(sb); 4879 4832 failed_mount6: ··· 5011 4968 struct inode *journal_inode; 5012 4969 
journal_t *journal; 5013 4970 5014 - BUG_ON(!ext4_has_feature_journal(sb)); 4971 + if (WARN_ON_ONCE(!ext4_has_feature_journal(sb))) 4972 + return NULL; 5015 4973 5016 4974 journal_inode = ext4_get_journal_inode(sb, journal_inum); 5017 4975 if (!journal_inode) ··· 5042 4998 struct ext4_super_block *es; 5043 4999 struct block_device *bdev; 5044 5000 5045 - BUG_ON(!ext4_has_feature_journal(sb)); 5001 + if (WARN_ON_ONCE(!ext4_has_feature_journal(sb))) 5002 + return NULL; 5046 5003 5047 5004 bdev = ext4_blkdev_get(j_dev, sb); 5048 5005 if (bdev == NULL) ··· 5134 5089 dev_t journal_dev; 5135 5090 int err = 0; 5136 5091 int really_read_only; 5092 + int journal_dev_ro; 5137 5093 5138 - BUG_ON(!ext4_has_feature_journal(sb)); 5094 + if (WARN_ON_ONCE(!ext4_has_feature_journal(sb))) 5095 + return -EFSCORRUPTED; 5139 5096 5140 5097 if (journal_devnum && 5141 5098 journal_devnum != le32_to_cpu(es->s_journal_dev)) { ··· 5147 5100 } else 5148 5101 journal_dev = new_decode_dev(le32_to_cpu(es->s_journal_dev)); 5149 5102 5150 - really_read_only = bdev_read_only(sb->s_bdev); 5103 + if (journal_inum && journal_dev) { 5104 + ext4_msg(sb, KERN_ERR, 5105 + "filesystem has both journal inode and journal device!"); 5106 + return -EINVAL; 5107 + } 5108 + 5109 + if (journal_inum) { 5110 + journal = ext4_get_journal(sb, journal_inum); 5111 + if (!journal) 5112 + return -EINVAL; 5113 + } else { 5114 + journal = ext4_get_dev_journal(sb, journal_dev); 5115 + if (!journal) 5116 + return -EINVAL; 5117 + } 5118 + 5119 + journal_dev_ro = bdev_read_only(journal->j_dev); 5120 + really_read_only = bdev_read_only(sb->s_bdev) | journal_dev_ro; 5121 + 5122 + if (journal_dev_ro && !sb_rdonly(sb)) { 5123 + ext4_msg(sb, KERN_ERR, 5124 + "journal device read-only, try mounting with '-o ro'"); 5125 + err = -EROFS; 5126 + goto err_out; 5127 + } 5151 5128 5152 5129 /* 5153 5130 * Are we loading a blank journal or performing recovery after a ··· 5186 5115 ext4_msg(sb, KERN_ERR, "write access " 5187 5116 
"unavailable, cannot proceed " 5188 5117 "(try mounting with noload)"); 5189 - return -EROFS; 5118 + err = -EROFS; 5119 + goto err_out; 5190 5120 } 5191 5121 ext4_msg(sb, KERN_INFO, "write access will " 5192 5122 "be enabled during recovery"); 5193 5123 } 5194 - } 5195 - 5196 - if (journal_inum && journal_dev) { 5197 - ext4_msg(sb, KERN_ERR, "filesystem has both journal " 5198 - "and inode journals!"); 5199 - return -EINVAL; 5200 - } 5201 - 5202 - if (journal_inum) { 5203 - if (!(journal = ext4_get_journal(sb, journal_inum))) 5204 - return -EINVAL; 5205 - } else { 5206 - if (!(journal = ext4_get_dev_journal(sb, journal_dev))) 5207 - return -EINVAL; 5208 5124 } 5209 5125 5210 5126 if (!(journal->j_flags & JBD2_BARRIER)) ··· 5213 5155 5214 5156 if (err) { 5215 5157 ext4_msg(sb, KERN_ERR, "error loading journal"); 5216 - jbd2_journal_destroy(journal); 5217 - return err; 5158 + goto err_out; 5218 5159 } 5219 5160 5220 5161 EXT4_SB(sb)->s_journal = journal; 5221 - ext4_clear_journal_err(sb, es); 5162 + err = ext4_clear_journal_err(sb, es); 5163 + if (err) { 5164 + EXT4_SB(sb)->s_journal = NULL; 5165 + jbd2_journal_destroy(journal); 5166 + return err; 5167 + } 5222 5168 5223 5169 if (!really_read_only && journal_devnum && 5224 5170 journal_devnum != le32_to_cpu(es->s_journal_dev)) { ··· 5233 5171 } 5234 5172 5235 5173 return 0; 5174 + 5175 + err_out: 5176 + jbd2_journal_destroy(journal); 5177 + return err; 5236 5178 } 5237 5179 5238 5180 static int ext4_commit_super(struct super_block *sb, int sync) ··· 5246 5180 int error = 0; 5247 5181 5248 5182 if (!sbh || block_device_ejected(sb)) 5249 - return error; 5250 - 5251 - /* 5252 - * The superblock bh should be mapped, but it might not be if the 5253 - * device was hot-removed. Not much we can do but fail the I/O. 5254 - */ 5255 - if (!buffer_mapped(sbh)) 5256 5183 return error; 5257 5184 5258 5185 /* ··· 5315 5256 * remounting) the filesystem readonly, then we will end up with a 5316 5257 * consistent fs on disk. 
Record that fact. 5317 5258 */ 5318 - static void ext4_mark_recovery_complete(struct super_block *sb, 5319 - struct ext4_super_block *es) 5259 + static int ext4_mark_recovery_complete(struct super_block *sb, 5260 + struct ext4_super_block *es) 5320 5261 { 5262 + int err; 5321 5263 journal_t *journal = EXT4_SB(sb)->s_journal; 5322 5264 5323 5265 if (!ext4_has_feature_journal(sb)) { 5324 - BUG_ON(journal != NULL); 5325 - return; 5266 + if (journal != NULL) { 5267 + ext4_error(sb, "Journal got removed while the fs was " 5268 + "mounted!"); 5269 + return -EFSCORRUPTED; 5270 + } 5271 + return 0; 5326 5272 } 5327 5273 jbd2_journal_lock_updates(journal); 5328 - if (jbd2_journal_flush(journal) < 0) 5274 + err = jbd2_journal_flush(journal); 5275 + if (err < 0) 5329 5276 goto out; 5330 5277 5331 5278 if (ext4_has_feature_journal_needs_recovery(sb) && sb_rdonly(sb)) { 5332 5279 ext4_clear_feature_journal_needs_recovery(sb); 5333 5280 ext4_commit_super(sb, 1); 5334 5281 } 5335 - 5336 5282 out: 5337 5283 jbd2_journal_unlock_updates(journal); 5284 + return err; 5338 5285 } 5339 5286 5340 5287 /* ··· 5348 5283 * has recorded an error from a previous lifetime, move that error to the 5349 5284 * main filesystem now. 
5350 5285 */ 5351 - static void ext4_clear_journal_err(struct super_block *sb, 5286 + static int ext4_clear_journal_err(struct super_block *sb, 5352 5287 struct ext4_super_block *es) 5353 5288 { 5354 5289 journal_t *journal; 5355 5290 int j_errno; 5356 5291 const char *errstr; 5357 5292 5358 - BUG_ON(!ext4_has_feature_journal(sb)); 5293 + if (!ext4_has_feature_journal(sb)) { 5294 + ext4_error(sb, "Journal got removed while the fs was mounted!"); 5295 + return -EFSCORRUPTED; 5296 + } 5359 5297 5360 5298 journal = EXT4_SB(sb)->s_journal; 5361 5299 ··· 5383 5315 jbd2_journal_clear_err(journal); 5384 5316 jbd2_journal_update_sb_errno(journal); 5385 5317 } 5318 + return 0; 5386 5319 } 5387 5320 5388 5321 /* ··· 5526 5457 { 5527 5458 struct ext4_super_block *es; 5528 5459 struct ext4_sb_info *sbi = EXT4_SB(sb); 5529 - unsigned long old_sb_flags; 5460 + unsigned long old_sb_flags, vfs_flags; 5530 5461 struct ext4_mount_options old_opts; 5531 5462 int enable_quota = 0; 5532 5463 ext4_group_t g; ··· 5568 5499 #endif 5569 5500 if (sbi->s_journal && sbi->s_journal->j_task->io_context) 5570 5501 journal_ioprio = sbi->s_journal->j_task->io_context->ioprio; 5502 + 5503 + /* 5504 + * Some options can be enabled by ext4 and/or by VFS mount flag 5505 + * either way we need to make sure it matches in both *flags and 5506 + * s_flags. 
Copy those selected flags from *flags to s_flags 5507 + */ 5508 + vfs_flags = SB_LAZYTIME | SB_I_VERSION; 5509 + sb->s_flags = (sb->s_flags & ~vfs_flags) | (*flags & vfs_flags); 5571 5510 5572 5511 if (!parse_options(data, sb, NULL, &journal_ioprio, 1)) { 5573 5512 err = -EINVAL; ··· 5630 5553 set_task_ioprio(sbi->s_journal->j_task, journal_ioprio); 5631 5554 } 5632 5555 5633 - if (*flags & SB_LAZYTIME) 5634 - sb->s_flags |= SB_LAZYTIME; 5635 - 5636 5556 if ((bool)(*flags & SB_RDONLY) != sb_rdonly(sb)) { 5637 5557 if (sbi->s_mount_flags & EXT4_MF_FS_ABORTED) { 5638 5558 err = -EROFS; ··· 5659 5585 (sbi->s_mount_state & EXT4_VALID_FS)) 5660 5586 es->s_state = cpu_to_le16(sbi->s_mount_state); 5661 5587 5662 - if (sbi->s_journal) 5588 + if (sbi->s_journal) { 5589 + /* 5590 + * We let remount-ro finish even if marking fs 5591 + * as clean failed... 5592 + */ 5663 5593 ext4_mark_recovery_complete(sb, es); 5594 + } 5664 5595 if (sbi->s_mmp_tsk) 5665 5596 kthread_stop(sbi->s_mmp_tsk); 5666 5597 } else { ··· 5708 5629 } 5709 5630 5710 5631 /* 5632 + * Update the original bdev mapping's wb_err value 5633 + * which could be used to detect the metadata async 5634 + * write error. 5635 + */ 5636 + errseq_check_and_advance(&sb->s_bdev->bd_inode->i_mapping->wb_err, 5637 + &sbi->s_bdev_wb_err); 5638 + 5639 + /* 5711 5640 * Mounting a RDONLY partition read-write, so reread 5712 5641 * and store the current valid flag. (It may have 5713 5642 * been changed by e2fsck since we originally mounted 5714 5643 * the partition.) 
5715 5644 */ 5716 - if (sbi->s_journal) 5717 - ext4_clear_journal_err(sb, es); 5645 + if (sbi->s_journal) { 5646 + err = ext4_clear_journal_err(sb, es); 5647 + if (err) 5648 + goto restore_opts; 5649 + } 5718 5650 sbi->s_mount_state = le16_to_cpu(es->s_state); 5719 5651 5720 5652 err = ext4_setup_super(sb, es, 0); ··· 5755 5665 ext4_register_li_request(sb, first_not_zeroed); 5756 5666 } 5757 5667 5758 - ext4_setup_system_zone(sb); 5668 + /* 5669 + * Handle creation of system zone data early because it can fail. 5670 + * Releasing of existing data is done when we are sure remount will 5671 + * succeed. 5672 + */ 5673 + if (test_opt(sb, BLOCK_VALIDITY) && !sbi->system_blks) { 5674 + err = ext4_setup_system_zone(sb); 5675 + if (err) 5676 + goto restore_opts; 5677 + } 5678 + 5759 5679 if (sbi->s_journal == NULL && !(old_sb_flags & SB_RDONLY)) { 5760 5680 err = ext4_commit_super(sb, 1); 5761 5681 if (err) ··· 5786 5686 } 5787 5687 } 5788 5688 #endif 5689 + if (!test_opt(sb, BLOCK_VALIDITY) && sbi->system_blks) 5690 + ext4_release_system_zone(sb); 5789 5691 5790 - *flags = (*flags & ~SB_LAZYTIME) | (sb->s_flags & SB_LAZYTIME); 5692 + /* 5693 + * Some options can be enabled by ext4 and/or by VFS mount flag 5694 + * either way we need to make sure it matches in both *flags and 5695 + * s_flags. Copy those selected flags from s_flags to *flags 5696 + */ 5697 + *flags = (*flags & ~vfs_flags) | (sb->s_flags & vfs_flags); 5698 + 5791 5699 ext4_msg(sb, KERN_INFO, "re-mounted. Opts: %s", orig_data); 5792 5700 kfree(orig_data); 5793 5701 return 0; ··· 5809 5701 sbi->s_commit_interval = old_opts.s_commit_interval; 5810 5702 sbi->s_min_batch_time = old_opts.s_min_batch_time; 5811 5703 sbi->s_max_batch_time = old_opts.s_max_batch_time; 5704 + if (!test_opt(sb, BLOCK_VALIDITY) && sbi->system_blks) 5705 + ext4_release_system_zone(sb); 5812 5706 #ifdef CONFIG_QUOTA 5813 5707 sbi->s_jquota_fmt = old_opts.s_jquota_fmt; 5814 5708 for (i = 0; i < EXT4_MAXQUOTAS; i++) {
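One small fix buried in the remount path above: instead of special-casing `SB_LAZYTIME`, the code now masks a set of VFS-owned flags and copies them between `*flags` and `sb->s_flags` in a single expression, in both directions. A sketch of that bit-merge, using made-up flag values rather than the kernel's real `SB_*` constants:

```c
#include <assert.h>

#define SB_LAZYTIME  (1u << 0)  /* illustrative bit values only, */
#define SB_I_VERSION (1u << 1)  /* not the kernel's SB_* layout  */
#define SB_RDONLY    (1u << 2)

/* Take the VFS-owned bits from new_flags, keep everything else
 * from the existing s_flags -- the pattern ext4_remount() uses. */
static unsigned int merge_vfs_flags(unsigned int s_flags,
                                    unsigned int new_flags)
{
    unsigned int vfs_flags = SB_LAZYTIME | SB_I_VERSION;

    return (s_flags & ~vfs_flags) | (new_flags & vfs_flags);
}
```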
+13
fs/ext4/sysfs.c
··· 189 189 #define EXT4_RW_ATTR_SBI_UL(_name,_elname) \ 190 190 EXT4_ATTR_OFFSET(_name, 0644, pointer_ul, ext4_sb_info, _elname) 191 191 192 + #define EXT4_RO_ATTR_SBI_ATOMIC(_name,_elname) \ 193 + EXT4_ATTR_OFFSET(_name, 0444, pointer_atomic, ext4_sb_info, _elname) 194 + 192 195 #define EXT4_ATTR_PTR(_name,_mode,_id,_ptr) \ 193 196 static struct ext4_attr ext4_attr_##_name = { \ 194 197 .attr = {.name = __stringify(_name), .mode = _mode }, \ ··· 218 215 EXT4_RW_ATTR_SBI_UI(mb_order2_req, s_mb_order2_reqs); 219 216 EXT4_RW_ATTR_SBI_UI(mb_stream_req, s_mb_stream_request); 220 217 EXT4_RW_ATTR_SBI_UI(mb_group_prealloc, s_mb_group_prealloc); 218 + EXT4_RW_ATTR_SBI_UI(mb_max_inode_prealloc, s_mb_max_inode_prealloc); 221 219 EXT4_RW_ATTR_SBI_UI(extent_max_zeroout_kb, s_extent_max_zeroout_kb); 222 220 EXT4_ATTR(trigger_fs_error, 0200, trigger_test_error); 223 221 EXT4_RW_ATTR_SBI_UI(err_ratelimit_interval_ms, s_err_ratelimit_state.interval); ··· 230 226 #ifdef CONFIG_EXT4_DEBUG 231 227 EXT4_RW_ATTR_SBI_UL(simulate_fail, s_simulate_fail); 232 228 #endif 229 + EXT4_RO_ATTR_SBI_ATOMIC(warning_count, s_warning_count); 230 + EXT4_RO_ATTR_SBI_ATOMIC(msg_count, s_msg_count); 233 231 EXT4_RO_ATTR_ES_UI(errors_count, s_error_count); 234 232 EXT4_RO_ATTR_ES_U8(first_error_errcode, s_first_error_errcode); 235 233 EXT4_RO_ATTR_ES_U8(last_error_errcode, s_last_error_errcode); ··· 246 240 EXT4_ATTR(first_error_time, 0444, first_error_time); 247 241 EXT4_ATTR(last_error_time, 0444, last_error_time); 248 242 EXT4_ATTR(journal_task, 0444, journal_task); 243 + EXT4_RW_ATTR_SBI_UI(mb_prefetch, s_mb_prefetch); 244 + EXT4_RW_ATTR_SBI_UI(mb_prefetch_limit, s_mb_prefetch_limit); 249 245 250 246 static unsigned int old_bump_val = 128; 251 247 EXT4_ATTR_PTR(max_writeback_mb_bump, 0444, pointer_ui, &old_bump_val); ··· 265 257 ATTR_LIST(mb_order2_req), 266 258 ATTR_LIST(mb_stream_req), 267 259 ATTR_LIST(mb_group_prealloc), 260 + ATTR_LIST(mb_max_inode_prealloc), 268 261 
ATTR_LIST(max_writeback_mb_bump), 269 262 ATTR_LIST(extent_max_zeroout_kb), 270 263 ATTR_LIST(trigger_fs_error), ··· 276 267 ATTR_LIST(msg_ratelimit_interval_ms), 277 268 ATTR_LIST(msg_ratelimit_burst), 278 269 ATTR_LIST(errors_count), 270 + ATTR_LIST(warning_count), 271 + ATTR_LIST(msg_count), 279 272 ATTR_LIST(first_error_ino), 280 273 ATTR_LIST(last_error_ino), 281 274 ATTR_LIST(first_error_block), ··· 294 283 #ifdef CONFIG_EXT4_DEBUG 295 284 ATTR_LIST(simulate_fail), 296 285 #endif 286 + ATTR_LIST(mb_prefetch), 287 + ATTR_LIST(mb_prefetch_limit), 297 288 NULL, 298 289 }; 299 290 ATTRIBUTE_GROUPS(ext4);
+1 -2
fs/ext4/xattr.c
··· 1356 1356 1357 1357 block = 0; 1358 1358 while (wsize < bufsize) { 1359 - if (bh != NULL) 1360 - brelse(bh); 1359 + brelse(bh); 1361 1360 csize = (bufsize - wsize) > blocksize ? blocksize : 1362 1361 bufsize - wsize; 1363 1362 bh = ext4_getblk(handle, ea_inode, block, 0);
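The `xattr.c` cleanup relies on `brelse()` being a no-op for a NULL buffer head, just as `free(NULL)` is defined to do nothing, so the explicit `if (bh != NULL)` test was redundant. A toy NULL-tolerant release helper (hypothetical, not the kernel's `brelse()`):

```c
#include <stdlib.h>

static int release_count;

/* NULL-tolerant release, mirroring brelse(): callers never need
 * their own NULL check before calling it. */
static void release_buf(char *buf)
{
    if (!buf)
        return;
    free(buf);
    release_count++;
}
```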
+9 -7
fs/jbd2/journal.c
··· 1285 1285 * superblock as being NULL to prevent the journal destroy from writing 1286 1286 * back a bogus superblock. 1287 1287 */ 1288 - static void journal_fail_superblock (journal_t *journal) 1288 + static void journal_fail_superblock(journal_t *journal) 1289 1289 { 1290 1290 struct buffer_head *bh = journal->j_sb_buffer; 1291 1291 brelse(bh); ··· 1367 1367 int ret; 1368 1368 1369 1369 /* Buffer got discarded which means block device got invalidated */ 1370 - if (!buffer_mapped(bh)) 1370 + if (!buffer_mapped(bh)) { 1371 + unlock_buffer(bh); 1371 1372 return -EIO; 1373 + } 1372 1374 1373 1375 trace_jbd2_write_superblock(journal, write_flags); 1374 1376 if (!(journal->j_flags & JBD2_BARRIER)) ··· 1817 1815 1818 1816 1819 1817 /** 1820 - *int jbd2_journal_check_used_features () - Check if features specified are used. 1818 + *int jbd2_journal_check_used_features() - Check if features specified are used. 1821 1819 * @journal: Journal to check. 1822 1820 * @compat: bitmask of compatible features 1823 1821 * @ro: bitmask of features that force read-only mount ··· 1827 1825 * features. Return true (non-zero) if it does. 1828 1826 **/ 1829 1827 1830 - int jbd2_journal_check_used_features (journal_t *journal, unsigned long compat, 1828 + int jbd2_journal_check_used_features(journal_t *journal, unsigned long compat, 1831 1829 unsigned long ro, unsigned long incompat) 1832 1830 { 1833 1831 journal_superblock_t *sb; ··· 1862 1860 * all of a given set of features on this journal. Return true 1863 1861 * (non-zero) if it can. 
*/ 1864 1862 1865 - int jbd2_journal_check_available_features (journal_t *journal, unsigned long compat, 1863 + int jbd2_journal_check_available_features(journal_t *journal, unsigned long compat, 1866 1864 unsigned long ro, unsigned long incompat) 1867 1865 { 1868 1866 if (!compat && !ro && !incompat) ··· 1884 1882 } 1885 1883 1886 1884 /** 1887 - * int jbd2_journal_set_features () - Mark a given journal feature in the superblock 1885 + * int jbd2_journal_set_features() - Mark a given journal feature in the superblock 1888 1886 * @journal: Journal to act on. 1889 1887 * @compat: bitmask of compatible features 1890 1888 * @ro: bitmask of features that force read-only mount ··· 1895 1893 * 1896 1894 */ 1897 1895 1898 - int jbd2_journal_set_features (journal_t *journal, unsigned long compat, 1896 + int jbd2_journal_set_features(journal_t *journal, unsigned long compat, 1899 1897 unsigned long ro, unsigned long incompat) 1900 1898 { 1901 1899 #define INCOMPAT_FEATURE_ON(f) \
+11 -33
fs/jbd2/recovery.c
··· 690 690 * number. */ 691 691 if (pass == PASS_SCAN && 692 692 jbd2_has_feature_checksum(journal)) { 693 - int chksum_err, chksum_seen; 694 693 struct commit_header *cbh = 695 694 (struct commit_header *)bh->b_data; 696 695 unsigned found_chksum = 697 696 be32_to_cpu(cbh->h_chksum[0]); 698 - 699 - chksum_err = chksum_seen = 0; 700 697 701 698 if (info->end_transaction) { 702 699 journal->j_failed_commit = ··· 702 705 break; 703 706 } 704 707 705 - if (crc32_sum == found_chksum && 706 - cbh->h_chksum_type == JBD2_CRC32_CHKSUM && 707 - cbh->h_chksum_size == 708 - JBD2_CRC32_CHKSUM_SIZE) 709 - chksum_seen = 1; 710 - else if (!(cbh->h_chksum_type == 0 && 711 - cbh->h_chksum_size == 0 && 712 - found_chksum == 0 && 713 - !chksum_seen)) 714 - /* 715 - * If fs is mounted using an old kernel and then 716 - * kernel with journal_chksum is used then we 717 - * get a situation where the journal flag has 718 - * checksum flag set but checksums are not 719 - * present i.e chksum = 0, in the individual 720 - * commit blocks. 721 - * Hence to avoid checksum failures, in this 722 - * situation, this extra check is added. 723 - */ 724 - chksum_err = 1; 708 + /* Neither checksum match nor unused? */ 709 + if (!((crc32_sum == found_chksum && 710 + cbh->h_chksum_type == 711 + JBD2_CRC32_CHKSUM && 712 + cbh->h_chksum_size == 713 + JBD2_CRC32_CHKSUM_SIZE) || 714 + (cbh->h_chksum_type == 0 && 715 + cbh->h_chksum_size == 0 && 716 + found_chksum == 0))) 717 + goto chksum_error; 725 718 726 - if (chksum_err) { 727 - info->end_transaction = next_commit_ID; 728 - 729 - if (!jbd2_has_feature_async_commit(journal)) { 730 - journal->j_failed_commit = 731 - next_commit_ID; 732 - brelse(bh); 733 - break; 734 - } 735 - } 736 719 crc32_sum = ~0; 737 720 } 738 721 if (pass == PASS_SCAN && 739 722 !jbd2_commit_block_csum_verify(journal, 740 723 bh->b_data)) { 724 + chksum_error: 741 725 info->end_transaction = next_commit_ID; 742 726 743 727 if (!jbd2_has_feature_async_commit(journal)) {
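The `do_one_pass()` change above collapses the old `chksum_seen`/`chksum_err` bookkeeping into one predicate: accept the commit block if its checksum matches, or if the checksum fields are all zero (as left by pre-checksum kernels). A pure-function sketch of that predicate (the constants mirror the JBD2 values, but the function itself is illustrative):

```c
#include <stdint.h>

#define JBD2_CRC32_CHKSUM      1
#define JBD2_CRC32_CHKSUM_SIZE 4

/* A commit block passes if the recorded checksum matches what we
 * computed, or if checksums were never written (all-zero fields). */
static int commit_chksum_ok(uint32_t computed, uint32_t found,
                            int type, int size)
{
    return (computed == found &&
            type == JBD2_CRC32_CHKSUM &&
            size == JBD2_CRC32_CHKSUM_SIZE) ||
           (type == 0 && size == 0 && found == 0);
}
```

Negating this single expression is exactly the "Neither checksum match nor unused?" test in the new code.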
+27 -6
fs/jbd2/transaction.c
··· 2026 2026 */ 2027 2027 static void __jbd2_journal_unfile_buffer(struct journal_head *jh) 2028 2028 { 2029 + J_ASSERT_JH(jh, jh->b_transaction != NULL); 2030 + J_ASSERT_JH(jh, jh->b_next_transaction == NULL); 2031 + 2029 2032 __jbd2_journal_temp_unlink_buffer(jh); 2030 2033 jh->b_transaction = NULL; 2031 2034 } ··· 2081 2078 * int jbd2_journal_try_to_free_buffers() - try to free page buffers. 2082 2079 * @journal: journal for operation 2083 2080 * @page: to try and free 2084 - * @gfp_mask: we use the mask to detect how hard should we try to release 2085 - * buffers. If __GFP_DIRECT_RECLAIM and __GFP_FS is set, we wait for commit 2086 - * code to release the buffers. 2087 - * 2088 2081 * 2089 2082 * For all the buffers on this page, 2090 2083 * if they are fully written out ordered data, move them onto BUF_CLEAN ··· 2111 2112 * 2112 2113 * Return 0 on failure, 1 on success 2113 2114 */ 2114 - int jbd2_journal_try_to_free_buffers(journal_t *journal, 2115 - struct page *page, gfp_t gfp_mask) 2115 + int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page) 2116 2116 { 2117 2117 struct buffer_head *head; 2118 2118 struct buffer_head *bh; 2119 + bool has_write_io_error = false; 2119 2120 int ret = 0; 2120 2121 2121 2122 J_ASSERT(PageLocked(page)); ··· 2140 2141 jbd2_journal_put_journal_head(jh); 2141 2142 if (buffer_jbd(bh)) 2142 2143 goto busy; 2144 + 2145 + /* 2146 + * If we free a metadata buffer which has been failed to 2147 + * write out, the jbd2 checkpoint procedure will not detect 2148 + * this failure and may lead to filesystem inconsistency 2149 + * after cleanup journal tail. 
2150 + */ 2151 + if (buffer_write_io_error(bh)) { 2152 + pr_err("JBD2: Error while async write back metadata bh %llu.", 2153 + (unsigned long long)bh->b_blocknr); 2154 + has_write_io_error = true; 2155 + } 2143 2156 } while ((bh = bh->b_this_page) != head); 2144 2157 2145 2158 ret = try_to_free_buffers(page); 2146 2159 2147 2160 busy: 2161 + if (has_write_io_error) 2162 + jbd2_journal_abort(journal, -EIO); 2163 + 2148 2164 return ret; 2149 2165 } 2150 2166 ··· 2586 2572 2587 2573 was_dirty = test_clear_buffer_jbddirty(bh); 2588 2574 __jbd2_journal_temp_unlink_buffer(jh); 2575 + 2576 + /* 2577 + * b_transaction must be set, otherwise the new b_transaction won't 2578 + * be holding jh reference 2579 + */ 2580 + J_ASSERT_JH(jh, jh->b_transaction != NULL); 2581 + 2589 2582 /* 2590 2583 * We set b_transaction here because b_next_transaction will inherit 2591 2584 * our jh reference and thus __jbd2_journal_file_buffer() must not
+1 -1
include/linux/jbd2.h
··· 1381 1381 extern int jbd2_journal_forget (handle_t *, struct buffer_head *); 1382 1382 extern int jbd2_journal_invalidatepage(journal_t *, 1383 1383 struct page *, unsigned int, unsigned int); 1384 - extern int jbd2_journal_try_to_free_buffers(journal_t *, struct page *, gfp_t); 1384 + extern int jbd2_journal_try_to_free_buffers(journal_t *journal, struct page *page); 1385 1385 extern int jbd2_journal_stop(handle_t *); 1386 1386 extern int jbd2_journal_flush (journal_t *); 1387 1387 extern void jbd2_journal_lock_updates (journal_t *);
+75 -10
include/trace/events/ext4.h
··· 746 746 ); 747 747 748 748 TRACE_EVENT(ext4_discard_preallocations, 749 - TP_PROTO(struct inode *inode), 749 + TP_PROTO(struct inode *inode, unsigned int len, unsigned int needed), 750 750 751 - TP_ARGS(inode), 751 + TP_ARGS(inode, len, needed), 752 752 753 753 TP_STRUCT__entry( 754 - __field( dev_t, dev ) 755 - __field( ino_t, ino ) 754 + __field( dev_t, dev ) 755 + __field( ino_t, ino ) 756 + __field( unsigned int, len ) 757 + __field( unsigned int, needed ) 756 758 757 759 ), 758 760 759 761 TP_fast_assign( 760 762 __entry->dev = inode->i_sb->s_dev; 761 763 __entry->ino = inode->i_ino; 764 + __entry->len = len; 765 + __entry->needed = needed; 762 766 ), 763 767 764 - TP_printk("dev %d,%d ino %lu", 768 + TP_printk("dev %d,%d ino %lu len: %u needed %u", 765 769 MAJOR(__entry->dev), MINOR(__entry->dev), 766 - (unsigned long) __entry->ino) 770 + (unsigned long) __entry->ino, __entry->len, 771 + __entry->needed) 767 772 ); 768 773 769 774 TRACE_EVENT(ext4_mb_discard_preallocations, ··· 1317 1312 TP_ARGS(sb, group) 1318 1313 ); 1319 1314 1320 - DEFINE_EVENT(ext4__bitmap_load, ext4_read_block_bitmap_load, 1315 + DEFINE_EVENT(ext4__bitmap_load, ext4_load_inode_bitmap, 1321 1316 1322 1317 TP_PROTO(struct super_block *sb, unsigned long group), 1323 1318 1324 1319 TP_ARGS(sb, group) 1325 1320 ); 1326 1321 1327 - DEFINE_EVENT(ext4__bitmap_load, ext4_load_inode_bitmap, 1322 + TRACE_EVENT(ext4_read_block_bitmap_load, 1323 + TP_PROTO(struct super_block *sb, unsigned long group, bool prefetch), 1328 1324 1329 - TP_PROTO(struct super_block *sb, unsigned long group), 1325 + TP_ARGS(sb, group, prefetch), 1330 1326 1331 - TP_ARGS(sb, group) 1327 + TP_STRUCT__entry( 1328 + __field( dev_t, dev ) 1329 + __field( __u32, group ) 1330 + __field( bool, prefetch ) 1331 + 1332 + ), 1333 + 1334 + TP_fast_assign( 1335 + __entry->dev = sb->s_dev; 1336 + __entry->group = group; 1337 + __entry->prefetch = prefetch; 1338 + ), 1339 + 1340 + TP_printk("dev %d,%d group %u prefetch %d", 1341 + 
MAJOR(__entry->dev), MINOR(__entry->dev), 1342 + __entry->group, __entry->prefetch) 1332 1343 ); 1333 1344 1334 1345 TRACE_EVENT(ext4_direct_IO_enter, ··· 2745 2724 TP_printk("dev %d,%d function %s line %u", 2746 2725 MAJOR(__entry->dev), MINOR(__entry->dev), 2747 2726 __entry->function, __entry->line) 2727 + ); 2728 + 2729 + TRACE_EVENT(ext4_prefetch_bitmaps, 2730 + TP_PROTO(struct super_block *sb, ext4_group_t group, 2731 + ext4_group_t next, unsigned int prefetch_ios), 2732 + 2733 + TP_ARGS(sb, group, next, prefetch_ios), 2734 + 2735 + TP_STRUCT__entry( 2736 + __field( dev_t, dev ) 2737 + __field( __u32, group ) 2738 + __field( __u32, next ) 2739 + __field( __u32, ios ) 2740 + ), 2741 + 2742 + TP_fast_assign( 2743 + __entry->dev = sb->s_dev; 2744 + __entry->group = group; 2745 + __entry->next = next; 2746 + __entry->ios = prefetch_ios; 2747 + ), 2748 + 2749 + TP_printk("dev %d,%d group %u next %u ios %u", 2750 + MAJOR(__entry->dev), MINOR(__entry->dev), 2751 + __entry->group, __entry->next, __entry->ios) 2752 + ); 2753 + 2754 + TRACE_EVENT(ext4_lazy_itable_init, 2755 + TP_PROTO(struct super_block *sb, ext4_group_t group), 2756 + 2757 + TP_ARGS(sb, group), 2758 + 2759 + TP_STRUCT__entry( 2760 + __field( dev_t, dev ) 2761 + __field( __u32, group ) 2762 + ), 2763 + 2764 + TP_fast_assign( 2765 + __entry->dev = sb->s_dev; 2766 + __entry->group = group; 2767 + ), 2768 + 2769 + TP_printk("dev %d,%d group %u", 2770 + MAJOR(__entry->dev), MINOR(__entry->dev), __entry->group) 2748 2771 ); 2749 2772 2750 2773 #endif /* _TRACE_EXT4_H */