Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Btrfs: prevent RAID level downgrades when space is low

The extent allocator has code that allows us to fill
allocations from any available block group, even if it doesn't
match the raid level we've requested.

This was put in because adding a new drive to a filesystem
made with the default mkfs options actually upgrades the metadata from
single spindle dup to full RAID1.

But, the code also allows us to allocate from a raid0 chunk when we
really want a raid1 or raid10 chunk. This can cause big trouble because
mkfs creates a small (4MB) raid0 chunk for data and metadata which then
goes unused for raid1/raid10 installs.

The allocator will happily wander in and allocate from that chunk when
things get tight, which is not correct.

The fix here is to make sure that we provide duplication when the
caller has asked for it. It does allow the dups to be any raid level,
which preserves the dup->raid1 upgrade abilities.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

+19 -1
fs/btrfs/extent-tree.c
···
 		btrfs_get_block_group(block_group);
 		search_start = block_group->key.objectid;
 
+		/*
+		 * this can happen if we end up cycling through all the
+		 * raid types, but we want to make sure we only allocate
+		 * for the proper type.
+		 */
+		if (!block_group_bits(block_group, data)) {
+			u64 extra = BTRFS_BLOCK_GROUP_DUP |
+				BTRFS_BLOCK_GROUP_RAID1 |
+				BTRFS_BLOCK_GROUP_RAID10;
+
+			/*
+			 * if they asked for extra copies and this block group
+			 * doesn't provide them, bail. This does allow us to
+			 * fill raid0 from raid1.
+			 */
+			if ((data & extra) && !(block_group->flags & extra))
+				goto loop;
+		}
+
 have_block_group:
 		if (unlikely(block_group->cached == BTRFS_CACHE_NO)) {
 			u64 free_percent;
···
 			break;
 		if (ret != 0)
 			goto error;
-
 		leaf = path->nodes[0];
 		btrfs_item_key_to_cpu(leaf, &found_key, path->slots[0]);
 		cache = kzalloc(sizeof(*cache), GFP_NOFS);
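
The check added in the first hunk can be tried outside the kernel. The following is a minimal userspace sketch, not the kernel code itself: the BG_* bit values are assumed to mirror the kernel's BTRFS_BLOCK_GROUP_* flags and are illustrative only, bg_has_all_bits() stands in for block_group_bits(), and must_skip_block_group() is a hypothetical helper that returns true where the patch would take the "goto loop" path.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumption: these bit positions mirror the kernel's
 * BTRFS_BLOCK_GROUP_* flags. They are here for illustration only.
 */
#define BG_DATA		(1ULL << 0)
#define BG_METADATA	(1ULL << 2)
#define BG_RAID0	(1ULL << 3)
#define BG_RAID1	(1ULL << 4)
#define BG_DUP		(1ULL << 5)
#define BG_RAID10	(1ULL << 6)

/* Stand-in for block_group_bits(): does the group carry every requested bit? */
static bool bg_has_all_bits(uint64_t bg_flags, uint64_t requested)
{
	return (bg_flags & requested) == requested;
}

/*
 * Hypothetical helper modelling the new check. Returns true when the
 * allocator should skip this block group.
 */
static bool must_skip_block_group(uint64_t bg_flags, uint64_t requested)
{
	const uint64_t extra = BG_DUP | BG_RAID1 | BG_RAID10;

	/* An exact match on the requested bits is always acceptable. */
	if (bg_has_all_bits(bg_flags, requested))
		return false;

	/*
	 * The caller asked for extra copies and this group provides none:
	 * skip it. Any duplicating profile satisfies the request, so a dup
	 * request may still land in a raid1 group.
	 */
	return (requested & extra) && !(bg_flags & extra);
}
```

Under these assumptions, a METADATA|RAID1 request is refused a METADATA|RAID0 group, a METADATA|DUP request may still use a METADATA|RAID1 group (the dup->raid1 upgrade), and a DATA|RAID0 request may still be filled from a DATA|RAID1 group.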