Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

btrfs: introduce delayed_refs_rsv

Traditionally we've had voodoo in btrfs to account for the space that
delayed refs may take up by having a global_block_rsv. This works most
of the time, except when it doesn't. We've had issues reported and seen
in production where sometimes the global reserve is exhausted during
transaction commit before we can run all of our delayed refs, resulting
in an aborted transaction. Because of this voodoo we have equally
dubious flushing semantics around throttling delayed refs which we often
get wrong.

So instead give them their own block_rsv. This way we can always know
exactly how much outstanding space we need for delayed refs. This
allows us to make sure we are constantly filling that reservation up
with space, and allows us to put more precise pressure on the enospc
system. Instead of doing math to see if it's a good time to throttle,
the normal enospc code will be invoked if we have a lot of delayed refs
pending, and they will be run via the normal flushing mechanism.

For now the delayed_refs_rsv will hold the reservations for the delayed
refs, the block group updates, and deleting csums. We could have a
separate rsv for the block group updates, but the csum deletion stuff is
still handled via the delayed_refs so that will stay there.

Historical background:

The global reserve has grown to cover everything we don't reserve space
explicitly for, and we've grown a lot of weird ad-hoc heuristics to know
if we're running short on space and when it's time to force a commit. A
failure rate of 20-40 file systems when we run hundreds of thousands of
them isn't super high, but cleaning up this code will make things less
ugly and more predictable.

Thus the delayed refs rsv. We always know how many delayed refs we have
outstanding, and although running them generates more we can use the
global reserve for that spill over, which fits better into its desired
use than a full blown reservation. This first approach is to simply
take the number of items we're reserving space for and multiply that by 2 in
order to save enough space for the delayed refs that could be generated.
This is a naive approach and will probably evolve, but for now it works.

Signed-off-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com> # high-level review
[ added background notes from the cover letter ]
Signed-off-by: David Sterba <dsterba@suse.com>

Authored by Josef Bacik, committed by David Sterba
ba2c4d4e 158ffa36

281 additions, 25 deletions
fs/btrfs/ctree.h: +10

···
 	BTRFS_BLOCK_RSV_TRANS,
 	BTRFS_BLOCK_RSV_CHUNK,
 	BTRFS_BLOCK_RSV_DELOPS,
+	BTRFS_BLOCK_RSV_DELREFS,
 	BTRFS_BLOCK_RSV_EMPTY,
 	BTRFS_BLOCK_RSV_TEMP,
 };
···
 	struct btrfs_block_rsv chunk_block_rsv;
 	/* block reservation for delayed operations */
 	struct btrfs_block_rsv delayed_block_rsv;
+	/* block reservation for delayed refs */
+	struct btrfs_block_rsv delayed_refs_rsv;
 
 	struct btrfs_block_rsv empty_block_rsv;
···
 void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
 			     struct btrfs_block_rsv *block_rsv,
 			     u64 num_bytes);
+void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr);
+void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans);
+int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
+				  enum btrfs_reserve_flush_enum flush);
+void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
+				       struct btrfs_block_rsv *src,
+				       u64 num_bytes);
 int btrfs_inc_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_dec_block_group_ro(struct btrfs_block_group_cache *cache);
 void btrfs_put_block_group_cache(struct btrfs_fs_info *info);
fs/btrfs/delayed-ref.c: +38 -6

···
 /*
  * helper function to update the accounting in the head ref
  * existing and update must have the same bytenr
  */
-static noinline void
-update_existing_head_ref(struct btrfs_delayed_ref_root *delayed_refs,
+static noinline void update_existing_head_ref(struct btrfs_trans_handle *trans,
 			 struct btrfs_delayed_ref_head *existing,
 			 struct btrfs_delayed_ref_head *update,
 			 int *old_ref_mod_ret)
 {
+	struct btrfs_delayed_ref_root *delayed_refs =
+		&trans->transaction->delayed_refs;
+	struct btrfs_fs_info *fs_info = trans->fs_info;
 	int old_ref_mod;
 
 	BUG_ON(existing->is_data != update->is_data);
···
 	 * versa we need to make sure to adjust pending_csums accordingly.
 	 */
 	if (existing->is_data) {
-		if (existing->total_ref_mod >= 0 && old_ref_mod < 0)
+		u64 csum_leaves =
+			btrfs_csum_bytes_to_leaves(fs_info,
+						   existing->num_bytes);
+
+		if (existing->total_ref_mod >= 0 && old_ref_mod < 0) {
 			delayed_refs->pending_csums -= existing->num_bytes;
-		if (existing->total_ref_mod < 0 && old_ref_mod >= 0)
+			btrfs_delayed_refs_rsv_release(fs_info, csum_leaves);
+		}
+		if (existing->total_ref_mod < 0 && old_ref_mod >= 0) {
 			delayed_refs->pending_csums += existing->num_bytes;
+			trans->delayed_ref_updates += csum_leaves;
+		}
 	}
 	spin_unlock(&existing->lock);
 }
···
 		    && head_ref->qgroup_reserved
 		    && existing->qgroup_ref_root
 		    && existing->qgroup_reserved);
-		update_existing_head_ref(delayed_refs, existing, head_ref,
+		update_existing_head_ref(trans, existing, head_ref,
 					 old_ref_mod);
 		/*
 		 * we've updated the existing ref, free the newly
···
 	} else {
 		if (old_ref_mod)
 			*old_ref_mod = 0;
-		if (head_ref->is_data && head_ref->ref_mod < 0)
+		if (head_ref->is_data && head_ref->ref_mod < 0) {
 			delayed_refs->pending_csums += head_ref->num_bytes;
+			trans->delayed_ref_updates +=
+				btrfs_csum_bytes_to_leaves(trans->fs_info,
+							   head_ref->num_bytes);
+		}
 		delayed_refs->num_heads++;
 		delayed_refs->num_heads_ready++;
 		atomic_inc(&delayed_refs->num_entries);
···
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
 
+	/*
+	 * Need to update the delayed_refs_rsv with any changes we may have
+	 * made.
+	 */
+	btrfs_update_delayed_refs_rsv(trans);
+
 	trace_add_delayed_tree_ref(fs_info, &ref->node, ref,
 				   action == BTRFS_ADD_DELAYED_EXTENT ?
 				   BTRFS_ADD_DELAYED_REF : action);
···
 	ret = insert_delayed_ref(trans, delayed_refs, head_ref, &ref->node);
 	spin_unlock(&delayed_refs->lock);
 
+	/*
+	 * Need to update the delayed_refs_rsv with any changes we may have
+	 * made.
+	 */
+	btrfs_update_delayed_refs_rsv(trans);
+
 	trace_add_delayed_data_ref(trans->fs_info, &ref->node, ref,
 				   action == BTRFS_ADD_DELAYED_EXTENT ?
 				   BTRFS_ADD_DELAYED_REF : action);
···
 			     NULL, NULL, NULL);
 
 	spin_unlock(&delayed_refs->lock);
+
+	/*
+	 * Need to update the delayed_refs_rsv with any changes we may have
+	 * made.
+	 */
+	btrfs_update_delayed_refs_rsv(trans);
 	return 0;
 }
fs/btrfs/disk-io.c: +4

···
 	btrfs_init_block_rsv(&fs_info->empty_block_rsv, BTRFS_BLOCK_RSV_EMPTY);
 	btrfs_init_block_rsv(&fs_info->delayed_block_rsv,
 			     BTRFS_BLOCK_RSV_DELOPS);
+	btrfs_init_block_rsv(&fs_info->delayed_refs_rsv,
+			     BTRFS_BLOCK_RSV_DELREFS);
+
 	atomic_set(&fs_info->async_delalloc_pages, 0);
 	atomic_set(&fs_info->defrag_running, 0);
 	atomic_set(&fs_info->qgroup_op_seq, 0);
···
 
 		spin_unlock(&cur_trans->dirty_bgs_lock);
 		btrfs_put_block_group(cache);
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		spin_lock(&cur_trans->dirty_bgs_lock);
 	}
 	spin_unlock(&cur_trans->dirty_bgs_lock);
fs/btrfs/extent-tree.c: +195 -16

···
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_root *delayed_refs =
 		&trans->transaction->delayed_refs;
+	int nr_items = 1;	/* Dropping this ref head update. */
 
 	if (head->total_ref_mod < 0) {
 		struct btrfs_space_info *space_info;
···
 				   -head->num_bytes,
 				   BTRFS_TOTAL_BYTES_PINNED_BATCH);
 
+		/*
+		 * We had csum deletions accounted for in our delayed refs rsv,
+		 * we need to drop the csum leaves for this update from our
+		 * delayed_refs_rsv.
+		 */
 		if (head->is_data) {
 			spin_lock(&delayed_refs->lock);
 			delayed_refs->pending_csums -= head->num_bytes;
 			spin_unlock(&delayed_refs->lock);
+			nr_items += btrfs_csum_bytes_to_leaves(fs_info,
+				head->num_bytes);
 		}
 	}
 
 	/* Also free its reserved qgroup space */
 	btrfs_qgroup_free_delayed_ref(fs_info, head->qgroup_ref_root,
 				      head->qgroup_reserved);
+	btrfs_delayed_refs_rsv_release(fs_info, nr_items);
 }
 
 static int cleanup_ref_head(struct btrfs_trans_handle *trans,
···
 	 */
 	mutex_lock(&trans->transaction->cache_write_mutex);
 	while (!list_empty(&dirty)) {
+		bool drop_reserve = true;
+
 		cache = list_first_entry(&dirty,
 					 struct btrfs_block_group_cache,
 					 dirty_list);
···
 				list_add_tail(&cache->dirty_list,
 					      &cur_trans->dirty_bgs);
 				btrfs_get_block_group(cache);
+				drop_reserve = false;
 			}
 			spin_unlock(&cur_trans->dirty_bgs_lock);
 		} else if (ret) {
···
 		/* if its not on the io list, we need to put the block group */
 		if (should_put)
 			btrfs_put_block_group(cache);
+		if (drop_reserve)
+			btrfs_delayed_refs_rsv_release(fs_info, 1);
 
 		if (ret)
 			break;
···
 		/* if its not on the io list, we need to put the block group */
 		if (should_put)
 			btrfs_put_block_group(cache);
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		spin_lock(&cur_trans->dirty_bgs_lock);
 	}
 	spin_unlock(&cur_trans->dirty_bgs_lock);
···
 	return 0;
 }
 
+/**
+ * btrfs_migrate_to_delayed_refs_rsv - transfer bytes to our delayed refs rsv.
+ * @fs_info - the fs info for our fs.
+ * @src - the source block rsv to transfer from.
+ * @num_bytes - the number of bytes to transfer.
+ *
+ * This transfers up to the num_bytes amount from the src rsv to the
+ * delayed_refs_rsv.  Any extra bytes are returned to the space info.
+ */
+void btrfs_migrate_to_delayed_refs_rsv(struct btrfs_fs_info *fs_info,
+				       struct btrfs_block_rsv *src,
+				       u64 num_bytes)
+{
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
+	u64 to_free = 0;
+
+	spin_lock(&src->lock);
+	src->reserved -= num_bytes;
+	src->size -= num_bytes;
+	spin_unlock(&src->lock);
+
+	spin_lock(&delayed_refs_rsv->lock);
+	if (delayed_refs_rsv->size > delayed_refs_rsv->reserved) {
+		u64 delta = delayed_refs_rsv->size -
+			delayed_refs_rsv->reserved;
+		if (num_bytes > delta) {
+			to_free = num_bytes - delta;
+			num_bytes = delta;
+		}
+	} else {
+		to_free = num_bytes;
+		num_bytes = 0;
+	}
+
+	if (num_bytes)
+		delayed_refs_rsv->reserved += num_bytes;
+	if (delayed_refs_rsv->reserved >= delayed_refs_rsv->size)
+		delayed_refs_rsv->full = 1;
+	spin_unlock(&delayed_refs_rsv->lock);
+
+	if (num_bytes)
+		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+					      0, num_bytes, 1);
+	if (to_free)
+		space_info_add_old_bytes(fs_info, delayed_refs_rsv->space_info,
+					 to_free);
+}
+
+/**
+ * btrfs_delayed_refs_rsv_refill - refill based on our delayed refs usage.
+ * @fs_info - the fs_info for our fs.
+ * @flush - control how we can flush for this reservation.
+ *
+ * This will refill the delayed block_rsv up to 1 items size worth of space and
+ * will return -ENOSPC if we can't make the reservation.
+ */
+int btrfs_delayed_refs_rsv_refill(struct btrfs_fs_info *fs_info,
+				  enum btrfs_reserve_flush_enum flush)
+{
+	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
+	u64 limit = btrfs_calc_trans_metadata_size(fs_info, 1);
+	u64 num_bytes = 0;
+	int ret = -ENOSPC;
+
+	spin_lock(&block_rsv->lock);
+	if (block_rsv->reserved < block_rsv->size) {
+		num_bytes = block_rsv->size - block_rsv->reserved;
+		num_bytes = min(num_bytes, limit);
+	}
+	spin_unlock(&block_rsv->lock);
+
+	if (!num_bytes)
+		return 0;
+
+	ret = reserve_metadata_bytes(fs_info->extent_root, block_rsv,
+				     num_bytes, flush);
+	if (ret)
+		return ret;
+	block_rsv_add_bytes(block_rsv, num_bytes, 0);
+	trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+				      0, num_bytes, 1);
+	return 0;
+}
+
 /*
  * This is for space we already have accounted in space_info->bytes_may_use, so
  * basically when we're returning space from block_rsv's.
···
 	return ret;
 }
 
+static u64 __btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
+				     struct btrfs_block_rsv *block_rsv,
+				     u64 num_bytes, u64 *qgroup_to_release)
+{
+	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
+	struct btrfs_block_rsv *target = delayed_rsv;
+
+	if (target->full || target == block_rsv)
+		target = global_rsv;
+
+	if (block_rsv->space_info != target->space_info)
+		target = NULL;
+
+	return block_rsv_release_bytes(fs_info, block_rsv, target, num_bytes,
+				       qgroup_to_release);
+}
+
+void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
+			     struct btrfs_block_rsv *block_rsv,
+			     u64 num_bytes)
+{
+	__btrfs_block_rsv_release(fs_info, block_rsv, num_bytes, NULL);
+}
+
 /**
  * btrfs_inode_rsv_release - release any excessive reservation.
  * @inode - the inode we need to release from.
···
 static void btrfs_inode_rsv_release(struct btrfs_inode *inode, bool qgroup_free)
 {
 	struct btrfs_fs_info *fs_info = inode->root->fs_info;
-	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
 	struct btrfs_block_rsv *block_rsv = &inode->block_rsv;
 	u64 released = 0;
 	u64 qgroup_to_release = 0;
···
 	 * are releasing 0 bytes, and then we'll just get the reservation over
 	 * the size free'd.
 	 */
-	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv, 0,
-					   &qgroup_to_release);
+	released = __btrfs_block_rsv_release(fs_info, block_rsv, 0,
+					     &qgroup_to_release);
 	if (released > 0)
 		trace_btrfs_space_reservation(fs_info, "delalloc",
 					      btrfs_ino(inode), released, 0);
···
 						qgroup_to_release);
 }
 
-void btrfs_block_rsv_release(struct btrfs_fs_info *fs_info,
-			     struct btrfs_block_rsv *block_rsv,
-			     u64 num_bytes)
+/**
+ * btrfs_delayed_refs_rsv_release - release a ref head's reservation.
+ * @fs_info - the fs_info for our fs.
+ * @nr - the number of items to drop.
+ *
+ * This drops the delayed ref head's count from the delayed refs rsv and frees
+ * any excess reservation we had.
+ */
+void btrfs_delayed_refs_rsv_release(struct btrfs_fs_info *fs_info, int nr)
 {
+	struct btrfs_block_rsv *block_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_block_rsv *global_rsv = &fs_info->global_block_rsv;
+	u64 num_bytes = btrfs_calc_trans_metadata_size(fs_info, nr);
+	u64 released = 0;
 
-	if (global_rsv == block_rsv ||
-	    block_rsv->space_info != global_rsv->space_info)
-		global_rsv = NULL;
-	block_rsv_release_bytes(fs_info, block_rsv, global_rsv, num_bytes, NULL);
+	released = block_rsv_release_bytes(fs_info, block_rsv, global_rsv,
+					   num_bytes, NULL);
+	if (released)
+		trace_btrfs_space_reservation(fs_info, "delayed_refs_rsv",
+					      0, released, 0);
 }
 
 static void update_global_block_rsv(struct btrfs_fs_info *fs_info)
···
 	fs_info->trans_block_rsv.space_info = space_info;
 	fs_info->empty_block_rsv.space_info = space_info;
 	fs_info->delayed_block_rsv.space_info = space_info;
+	fs_info->delayed_refs_rsv.space_info = space_info;
 
-	fs_info->extent_root->block_rsv = &fs_info->global_block_rsv;
-	fs_info->csum_root->block_rsv = &fs_info->global_block_rsv;
+	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;
+	fs_info->csum_root->block_rsv = &fs_info->delayed_refs_rsv;
 	fs_info->dev_root->block_rsv = &fs_info->global_block_rsv;
 	fs_info->tree_root->block_rsv = &fs_info->global_block_rsv;
 	if (fs_info->quota_root)
···
 	WARN_ON(fs_info->chunk_block_rsv.reserved > 0);
 	WARN_ON(fs_info->delayed_block_rsv.size > 0);
 	WARN_ON(fs_info->delayed_block_rsv.reserved > 0);
+	WARN_ON(fs_info->delayed_refs_rsv.reserved > 0);
+	WARN_ON(fs_info->delayed_refs_rsv.size > 0);
 }
 
+/*
+ * btrfs_update_delayed_refs_rsv - adjust the size of the delayed refs rsv
+ * @trans - the trans that may have generated delayed refs
+ *
+ * This is to be called anytime we may have adjusted trans->delayed_ref_updates,
+ * it'll calculate the additional size and add it to the delayed_refs_rsv.
+ */
+void btrfs_update_delayed_refs_rsv(struct btrfs_trans_handle *trans)
+{
+	struct btrfs_fs_info *fs_info = trans->fs_info;
+	struct btrfs_block_rsv *delayed_rsv = &fs_info->delayed_refs_rsv;
+	u64 num_bytes;
+
+	if (!trans->delayed_ref_updates)
+		return;
+
+	num_bytes = btrfs_calc_trans_metadata_size(fs_info,
+						   trans->delayed_ref_updates);
+	spin_lock(&delayed_rsv->lock);
+	delayed_rsv->size += num_bytes;
+	delayed_rsv->full = 0;
+	spin_unlock(&delayed_rsv->lock);
+	trans->delayed_ref_updates = 0;
+}
 
 /*
  * To be called after all the new block groups attached to the transaction
···
 	u64 old_val;
 	u64 byte_in_group;
 	int factor;
+	int ret = 0;
 
 	/* block accounting for super block */
 	spin_lock(&info->delalloc_root_lock);
···
 
 	while (total) {
 		cache = btrfs_lookup_block_group(info, bytenr);
-		if (!cache)
-			return -ENOENT;
+		if (!cache) {
+			ret = -ENOENT;
+			break;
+		}
 		factor = btrfs_bg_type_to_factor(cache->flags);
 
 		/*
···
 			list_add_tail(&cache->dirty_list,
 				      &trans->transaction->dirty_bgs);
 			trans->transaction->num_dirty_bgs++;
+			trans->delayed_ref_updates++;
 			btrfs_get_block_group(cache);
 		}
 		spin_unlock(&trans->transaction->dirty_bgs_lock);
···
 		total -= num_bytes;
 		bytenr += num_bytes;
 	}
-	return 0;
+
+	/* Modified block groups are accounted for in the delayed_refs_rsv. */
+	btrfs_update_delayed_refs_rsv(trans);
+	return ret;
 }
 
 static u64 first_logical_byte(struct btrfs_fs_info *fs_info, u64 search_start)
···
 		goto again;
 	}
 
-	if (btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
+	/*
+	 * The global reserve still exists to save us from ourselves, so don't
+	 * warn_on if we are short on our delayed refs reserve.
+	 */
+	if (block_rsv->type != BTRFS_BLOCK_RSV_DELREFS &&
+	    btrfs_test_opt(fs_info, ENOSPC_DEBUG)) {
 		static DEFINE_RATELIMIT_STATE(_rs,
 					      DEFAULT_RATELIMIT_INTERVAL * 10,
 					      /*DEFAULT_RATELIMIT_BURST*/ 1);
···
 		add_block_group_free_space(trans, block_group);
 		/* already aborted the transaction if it failed. */
 next:
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 		list_del_init(&block_group->bg_list);
 	}
 	btrfs_trans_release_chunk_metadata(trans);
···
 	link_block_group(cache);
 
 	list_add_tail(&cache->bg_list, &trans->new_bgs);
+	trans->delayed_ref_updates++;
+	btrfs_update_delayed_refs_rsv(trans);
 
 	set_avail_alloc_bits(fs_info, type);
 	return 0;
···
 	int factor;
 	struct btrfs_caching_control *caching_ctl = NULL;
 	bool remove_em;
+	bool remove_rsv = false;
 
 	block_group = btrfs_lookup_block_group(fs_info, group_start);
 	BUG_ON(!block_group);
···
 
 	if (!list_empty(&block_group->dirty_list)) {
 		list_del_init(&block_group->dirty_list);
+		remove_rsv = true;
 		btrfs_put_block_group(block_group);
 	}
 	spin_unlock(&trans->transaction->dirty_bgs_lock);
···
 
 	ret = btrfs_del_item(trans, root, path);
 out:
+	if (remove_rsv)
+		btrfs_delayed_refs_rsv_release(fs_info, 1);
 	btrfs_free_path(path);
 	return ret;
 }
fs/btrfs/transaction.c: +34 -3

···
 			  bool enforce_qgroups)
 {
 	struct btrfs_fs_info *fs_info = root->fs_info;
-
+	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_trans_handle *h;
 	struct btrfs_transaction *cur_trans;
 	u64 num_bytes = 0;
···
 	 * the appropriate flushing if need be.
 	 */
 	if (num_items && root != fs_info->chunk_root) {
+		struct btrfs_block_rsv *rsv = &fs_info->trans_block_rsv;
+		u64 delayed_refs_bytes = 0;
+
 		qgroup_reserved = num_items * fs_info->nodesize;
 		ret = btrfs_qgroup_reserve_meta_pertrans(root, qgroup_reserved,
 							 enforce_qgroups);
 		if (ret)
 			return ERR_PTR(ret);
 
+		/*
+		 * We want to reserve all the bytes we may need all at once, so
+		 * we only do 1 enospc flushing cycle per transaction start.  We
+		 * accomplish this by simply assuming we'll do 2 x num_items
+		 * worth of delayed refs updates in this trans handle, and
+		 * refill that amount for whatever is missing in the reserve.
+		 */
 		num_bytes = btrfs_calc_trans_metadata_size(fs_info, num_items);
+		if (delayed_refs_rsv->full == 0) {
+			delayed_refs_bytes = num_bytes;
+			num_bytes <<= 1;
+		}
+
 		/*
 		 * Do the reservation for the relocation root creation
 		 */
···
 			reloc_reserved = true;
 		}
 
-		ret = btrfs_block_rsv_add(root, &fs_info->trans_block_rsv,
-					  num_bytes, flush);
+		ret = btrfs_block_rsv_add(root, rsv, num_bytes, flush);
+		if (ret)
+			goto reserve_fail;
+		if (delayed_refs_bytes) {
+			btrfs_migrate_to_delayed_refs_rsv(fs_info, rsv,
+							  delayed_refs_bytes);
+			num_bytes -= delayed_refs_bytes;
+		}
+	} else if (num_items == 0 && flush == BTRFS_RESERVE_FLUSH_ALL &&
+		   !delayed_refs_rsv->full) {
+		/*
+		 * Some people call with btrfs_start_transaction(root, 0)
+		 * because they can be throttled, but have some other mechanism
+		 * for reserving space.  We still want these guys to refill the
+		 * delayed block_rsv so just add 1 items worth of reservation
+		 * here.
+		 */
+		ret = btrfs_delayed_refs_rsv_refill(fs_info, flush);
 		if (ret)
 			goto reserve_fail;
 	}