Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

btrfs: zoned: count fresh BG region as zone unusable

The naming of space_info->active_total_bytes is misleading. It counts
not only active block groups but also full ones that were previously
active and are now inactive. That confusion results in a bug: the full
BGs are not counted into active_total_bytes at mount time.

For background, there are three kinds of block groups in terms of
activation.

1. Block groups never activated
2. Block groups currently active
3. Block groups previously active and currently inactive (because they
are fully written or their zone was finished)

What we really want to exclude from "total_bytes" is the total size of
BGs #1. They seem empty and allocatable, but since they are not
activated, we cannot rely on them for space reservation.

And, since BGs #1 never get activated, they should have no "used",
"reserved" and "pinned" bytes.

OTOH, BGs #3 can be counted in the "total": since they are already full,
we cannot allocate from them anyway. For them, "total_bytes == used +
reserved + pinned + zone_unusable" should hold.
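
The accounting identity above can be written down as a small check. This
is only an illustrative sketch: struct bg_counters and both helpers are
hypothetical names made up for this example, not kernel structures, and
the byte values in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one block group's counters. */
struct bg_counters {
	uint64_t length;        /* what this BG contributes to "total_bytes" */
	uint64_t used;
	uint64_t reserved;
	uint64_t pinned;
	uint64_t zone_unusable;
};

/* True when "total == used + reserved + pinned + zone_unusable" holds. */
static bool bg_is_fully_accounted(const struct bg_counters *bg)
{
	return bg->length ==
	       bg->used + bg->reserved + bg->pinned + bg->zone_unusable;
}

/* True for a BG of kind #1: nothing was ever allocated from it. */
static bool bg_is_untouched(const struct bg_counters *bg)
{
	return bg->used == 0 && bg->reserved == 0 && bg->pinned == 0;
}
```

For instance, a full BG with length 256, used 200, pinned 16 and
zone_unusable 40 satisfies bg_is_fully_accounted(). A never-activated BG
with zone_unusable == 0 does not, which is the gap that counting its
whole region as zone_unusable closes.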

Tracking #2 and #3 as "active_total_bytes" (current implementation) is
confusing. And tracking #1 and subtracting it properly from "total_bytes"
every time space reservation is needed is cumbersome.

Instead, we can count the whole region of a newly allocated block group as
zone_unusable. Then, once that block group is activated, release
[0 .. zone_capacity] from the zone_unusable counters. With this, we can
eliminate the confusing ->active_total_bytes and the code will be common
between the regular and zoned modes. Also, no additional counter is needed
with this approach.
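
A minimal userspace sketch of this scheme, under simplifying assumptions:
struct space_info_model, struct zoned_bg and the model_* helpers are
hypothetical names for illustration only, there is no locking, and the
restriction to metadata/system block groups under active zone tracking is
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the per-space_info counters involved. */
struct space_info_model {
	uint64_t total_bytes;
	uint64_t bytes_zone_unusable;
};

/* Hypothetical model of one zoned block group. */
struct zoned_bg {
	uint64_t length;
	uint64_t zone_capacity;
	uint64_t zone_unusable;
	bool active;
};

/* A fresh BG counts in total_bytes, but all of it is zone_unusable. */
static void model_add_fresh_bg(struct space_info_model *si, struct zoned_bg *bg,
			       uint64_t length, uint64_t zone_capacity)
{
	bg->length = length;
	bg->zone_capacity = zone_capacity;
	bg->zone_unusable = length;
	bg->active = false;
	si->total_bytes += length;
	si->bytes_zone_unusable += length;
}

/* On activation, release [0 .. zone_capacity] from zone_unusable. */
static void model_activate_bg(struct space_info_model *si, struct zoned_bg *bg)
{
	if (bg->active)
		return;
	bg->active = true;
	if (bg->zone_unusable == bg->length) {
		bg->zone_unusable = bg->length - bg->zone_capacity;
		si->bytes_zone_unusable -= bg->zone_capacity;
	}
}

/* Space that reservation may rely on, ignoring used/reserved/pinned. */
static uint64_t model_allocatable(const struct space_info_model *si)
{
	return si->total_bytes - si->bytes_zone_unusable;
}
```

With a BG of length 256 and zone_capacity 200, model_allocatable() stays
at 0 until model_activate_bg() runs. After activation it becomes 200, and
only the 56 bytes beyond the zone capacity remain zone_unusable.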

Fixes: 6a921de58992 ("btrfs: zoned: introduce space_info->active_total_bytes")
CC: stable@vger.kernel.org # 6.1+
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

Authored by Naohiro Aota and committed by David Sterba
fa2068d7 df384da5

+26 -6
fs/btrfs/free-space-cache.c (+7 -1)
···
 	bg_reclaim_threshold = READ_ONCE(sinfo->bg_reclaim_threshold);
 
 	spin_lock(&ctl->tree_lock);
+	/* Count initial region as zone_unusable until it gets activated. */
 	if (!used)
 		to_free = size;
+	else if (initial &&
+		 test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &block_group->fs_info->flags) &&
+		 (block_group->flags & (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_SYSTEM)))
+		to_free = 0;
 	else if (initial)
 		to_free = block_group->zone_capacity;
 	else if (offset >= block_group->alloc_offset)
···
 	reclaimable_unusable = block_group->zone_unusable -
 			       (block_group->length - block_group->zone_capacity);
 	/* All the region is now unusable. Mark it as unused and reclaim */
-	if (block_group->zone_unusable == block_group->length) {
+	if (block_group->zone_unusable == block_group->length &&
+	    block_group->alloc_offset) {
 		btrfs_mark_bg_unused(block_group);
 	} else if (bg_reclaim_threshold &&
 		   reclaimable_unusable >=
fs/btrfs/zoned.c (+19 -5)
···
 		return;
 
 	WARN_ON(cache->bytes_super != 0);
-	unusable = (cache->alloc_offset - cache->used) +
-		   (cache->length - cache->zone_capacity);
-	free = cache->zone_capacity - cache->alloc_offset;
+
+	/* Check for block groups never get activated */
+	if (test_bit(BTRFS_FS_ACTIVE_ZONE_TRACKING, &cache->fs_info->flags) &&
+	    cache->flags & (BTRFS_BLOCK_GROUP_METADATA | BTRFS_BLOCK_GROUP_SYSTEM) &&
+	    !test_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &cache->runtime_flags) &&
+	    cache->alloc_offset == 0) {
+		unusable = cache->length;
+		free = 0;
+	} else {
+		unusable = (cache->alloc_offset - cache->used) +
+			   (cache->length - cache->zone_capacity);
+		free = cache->zone_capacity - cache->alloc_offset;
+	}
 
 	/* We only need ->free_space in ALLOC_SEQ block groups */
 	cache->cached = BTRFS_CACHE_FINISHED;
···
 
 	/* Successfully activated all the zones */
 	set_bit(BLOCK_GROUP_FLAG_ZONE_IS_ACTIVE, &block_group->runtime_flags);
-	space_info->active_total_bytes += block_group->length;
+	WARN_ON(block_group->alloc_offset != 0);
+	if (block_group->zone_unusable == block_group->length) {
+		block_group->zone_unusable = block_group->length - block_group->zone_capacity;
+		space_info->bytes_zone_unusable -= block_group->zone_capacity;
+	}
 	spin_unlock(&block_group->lock);
 	btrfs_try_granting_tickets(fs_info, space_info);
 	spin_unlock(&space_info->lock);
···
 		u64 avail;
 
 		spin_lock(&block_group->lock);
-		if (block_group->reserved ||
+		if (block_group->reserved || block_group->alloc_offset == 0 ||
 		    (block_group->flags & BTRFS_BLOCK_GROUP_SYSTEM)) {
 			spin_unlock(&block_group->lock);
 			continue;