Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Revert "ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits"

This reverts commit 32c0869370194ae5ac9f9f501953ef693040f6a1.

The reverted commit was intended to remove a dead check however it was observed
that this check was actually being used to exit early instead of looping
sbi->s_mb_max_to_scan times when we are able to find a free extent bigger than
the goal extent. Due to this, a my performance tests (fsmark, parallel file
writes in a highly fragmented FS) were seeing a 2x-3x regression.

Example, the default value of the following variables is:

sbi->s_mb_max_to_scan = 200
sbi->s_mb_min_to_scan = 10

In ext4_mb_check_limits() if we find an extent smaller than goal, then we return
early and try again. This loop will go on until we have processed
sbi->s_mb_max_to_scan(=200) number of free extents at which point we exit and
just use whatever we have even if it is smaller than goal extent.

Now, the regression comes when we find an extent bigger than goal. Earlier, in
this case we would loop only sbi->s_mb_min_to_scan(=10) times and then just use
the bigger extent. However with commit 32c08693 that check was removed and hence
we would loop sbi->s_mb_max_to_scan(=200) times even though we have a big enough
free extent to satisfy the request. The only time we would exit early would be
when the free extent is *exactly* the size of our goal, which is pretty uncommon
occurrence and so we would almost always end up looping 200 times.

Hence, revert the commit by adding the check back to fix the regression. Also
add a comment to outline this policy.

Fixes: 32c086937019 ("ext4: remove ac->ac_found > sbi->s_mb_min_to_scan dead check in ext4_mb_check_limits")
Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Kemeng Shi <shikemeng@huaweicloud.com>
Link: https://lore.kernel.org/r/ddcae9658e46880dfec2fb0aa61d01fb3353d202.1685449706.git.ojaswin@linux.ibm.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

authored by

Ojaswin Mujoo and committed by
Theodore Ts'o
3582e745 eb1f822c

+15 -1
+15 -1
fs/ext4/mballoc.c
··· 2062 2062 if (bex->fe_len < gex->fe_len) 2063 2063 return; 2064 2064 2065 - if (finish_group) 2065 + if (finish_group || ac->ac_found > sbi->s_mb_min_to_scan) 2066 2066 ext4_mb_use_best_found(ac, e4b); 2067 2067 } 2068 2068 ··· 2073 2073 * previous found extent and if new one is better, then it's stored 2074 2074 * in the context. Later, the best found extent will be used, if 2075 2075 * mballoc can't find good enough extent. 2076 + * 2077 + * The algorithm used is roughly as follows: 2078 + * 2079 + * * If free extent found is exactly as big as goal, then 2080 + * stop the scan and use it immediately 2081 + * 2082 + * * If free extent found is smaller than goal, then keep retrying 2083 + * upto a max of sbi->s_mb_max_to_scan times (default 200). After 2084 + * that stop scanning and use whatever we have. 2085 + * 2086 + * * If free extent found is bigger than goal, then keep retrying 2087 + * upto a max of sbi->s_mb_min_to_scan times (default 10) before 2088 + * stopping the scan and using the extent. 2089 + * 2076 2090 * 2077 2091 * FIXME: real allocation policy is to be designed yet! 2078 2092 */