Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xfs: cache minimum realtime summary level

The realtime summary is a two-dimensional array on disk, effectively:

u32 rsum[log2(number of realtime extents) + 1][number of blocks in the bitmap]

rsum[log][bbno] is the number of extents of size 2**log which start in
bitmap block bbno.

xfs_rtallocate_extent_near() uses xfs_rtany_summary() to check whether
rsum[log][bbno] != 0 for any log level. However, the summary array is
stored in row-major order (i.e., like an array in C), so these entries
are not adjacent, but rather spread across the entire summary
file. In the worst case (a full bitmap block), xfs_rtany_summary() has
to check every level.
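
To make the layout concrete, here is a minimal user-space sketch of row-major indexing for the summary. The helpers `rsum_index()` and `rsum_level_stride_bytes()` are hypothetical names introduced for illustration, not kernel functions, and the sketch assumes one `uint32_t` counter per entry as in the array declaration above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: flat index of rsum[log][bbno] when the summary is
 * stored row-major, with one row per level and rbmblocks entries per row.
 */
static size_t rsum_index(unsigned int log, size_t bbno, size_t rbmblocks)
{
	assert(bbno < rbmblocks);
	return (size_t)log * rbmblocks + bbno;
}

/*
 * Hypothetical helper: byte distance between rsum[log][bbno] and
 * rsum[log + 1][bbno], i.e. the length of one full row.
 */
static size_t rsum_level_stride_bytes(size_t rbmblocks)
{
	return rbmblocks * sizeof(uint32_t);
}
```

For example, with 1024 bitmap blocks the entries for one bbno at consecutive levels are 4096 bytes apart, so, assuming 4096-byte summary blocks, every level checked for that bbno lives in a different block.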

This means that on a moderately-used realtime device, an allocation will
waste a lot of time finding, reading, and releasing buffers for the
realtime summary. In particular, one of our storage services (which runs
on servers with 8 very slow CPUs and fifteen 8 TB XFS realtime filesystems)
spends almost 5% of its CPU cycles in xfs_rtbuf_get() and
xfs_trans_brelse() called from xfs_rtany_summary().

One solution would be to also store the summary with the dimensions
swapped. However, this would require a disk format change to a very old
component of XFS.

Instead, we can cache, for each bitmap block, the minimum level (log2 of
the extent size) which contains any extents. We do so lazily; rather than
guaranteeing that the cache contains the precise minimum, it always
contains a loose lower bound which we tighten when we read or update a
summary block. This only uses a few kilobytes of memory
and is already serialized via the realtime bitmap and summary inode
locks, so the cost is minimal. With this change, the same workload only
spends 0.2% of its CPU cycles in the realtime allocator.
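
The lazy lower bound can be illustrated with a small user-space model. This is a sketch under stated assumptions, not the kernel code: `struct rsum_model`, `model_modify()`, `model_any()`, and the sizes `LEVELS` and `BBLOCKS` are hypothetical, and an in-memory array stands in for the on-disk summary. The sketch is also deliberately conservative: it tightens the bound only when a scan starts at the cached level, so the invariant holds for any `low` the caller passes.

```c
#include <assert.h>
#include <stdint.h>

#define LEVELS  8	/* hypothetical number of summary levels */
#define BBLOCKS 4	/* hypothetical number of bitmap blocks */

struct rsum_model {
	uint32_t rsum[LEVELS][BBLOCKS];	/* stand-in for the on-disk summary */
	uint8_t  cache[BBLOCKS];	/* cache[bbno] <= min level with rsum != 0 */
	unsigned int reads;		/* summary entries examined by scans */
};

/* Apply a delta to one summary entry and keep the lower bound valid. */
static void model_modify(struct rsum_model *m, int log, int bbno, int delta)
{
	uint32_t *sp = &m->rsum[log][bbno];

	*sp += (uint32_t)delta;
	if (*sp == 0 && log == m->cache[bbno])
		m->cache[bbno]++;		/* the bound's level just emptied */
	if (*sp != 0 && log < m->cache[bbno])
		m->cache[bbno] = (uint8_t)log;	/* a lower level is now populated */
}

/* Return 1 if any level in [low, high] is nonzero for bbno, else 0. */
static int model_any(struct rsum_model *m, int low, int high, int bbno)
{
	/* Only a scan that begins at the bound proves anything about it. */
	int covered = low <= m->cache[bbno];
	int log, found = 0;

	if (covered)
		low = m->cache[bbno];		/* skip levels known to be empty */
	for (log = low; log <= high; log++) {
		m->reads++;
		if (m->rsum[log][bbno] != 0) {
			found = 1;
			break;
		}
	}
	/* Levels below log were all zero, so the bound can move up to log. */
	if (covered && log > m->cache[bbno])
		m->cache[bbno] = (uint8_t)log;
	return found;
}
```

Starting from a zeroed model with a single extent recorded at level 5, the first full scan examines levels 0 through 5 and raises the bound to 5; a repeated scan then examines only one entry. The zero-initialized cache is always a valid starting point, since 0 is a lower bound on any level.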

Signed-off-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

authored by Omar Sandoval and committed by Darrick J. Wong
355e3532 2c2d9d3a

+34 -4
+6
fs/xfs/libxfs/xfs_rtbitmap.c
···
 		uint first = (uint)((char *)sp - (char *)bp->b_addr);
 
 		*sp += delta;
+		if (mp->m_rsum_cache) {
+			if (*sp == 0 && log == mp->m_rsum_cache[bbno])
+				mp->m_rsum_cache[bbno]++;
+			if (*sp != 0 && log < mp->m_rsum_cache[bbno])
+				mp->m_rsum_cache[bbno] = log;
+		}
 		xfs_trans_log_buf(tp, bp, first, first + sizeof(*sp) - 1);
 	}
 	if (sum)
+7
fs/xfs/xfs_mount.h
···
 	int			m_logbsize;	/* size of each log buffer */
 	uint			m_rsumlevels;	/* rt summary levels */
 	uint			m_rsumsize;	/* size of rt summary, bytes */
+	/*
+	 * Optional cache of rt summary level per bitmap block with the
+	 * invariant that m_rsum_cache[bbno] <= the minimum i for which
+	 * rsum[i][bbno] != 0. Reads and writes are serialized by the rsumip
+	 * inode lock.
+	 */
+	uint8_t			*m_rsum_cache;
 	struct xfs_inode	*m_rbmip;	/* pointer to bitmap inode */
 	struct xfs_inode	*m_rsumip;	/* pointer to summary inode */
 	struct xfs_inode	*m_rootip;	/* pointer to root directory */
+21 -4
fs/xfs/xfs_rtalloc.c
···
 	int		log;		/* loop counter, log2 of ext. size */
 	xfs_suminfo_t	sum;		/* summary data */
 
+	/* There are no extents at levels < m_rsum_cache[bbno]. */
+	if (mp->m_rsum_cache && low < mp->m_rsum_cache[bbno])
+		low = mp->m_rsum_cache[bbno];
+
 	/*
-	 * Loop over logs of extent sizes. Order is irrelevant.
+	 * Loop over logs of extent sizes.
 	 */
 	for (log = low; log <= high; log++) {
 		/*
···
 		 */
 		if (sum) {
 			*stat = 1;
-			return 0;
+			goto out;
 		}
 	}
 	/*
 	 * Found nothing, return failure.
 	 */
 	*stat = 0;
+out:
+	/* There were no extents at levels < log. */
+	if (mp->m_rsum_cache && log > mp->m_rsum_cache[bbno])
+		mp->m_rsum_cache[bbno] = log;
 	return 0;
 }
 
···
 }
 
 /*
- * Get the bitmap and summary inodes into the mount structure
- * at mount time.
+ * Get the bitmap and summary inodes and the summary cache into the mount
+ * structure at mount time.
  */
 int				/* error */
 xfs_rtmount_inodes(
···
 		return error;
 	}
 	ASSERT(mp->m_rsumip != NULL);
+	/*
+	 * The rsum cache is initialized to all zeroes, which is trivially a
+	 * lower bound on the minimum level with any free extents. We can
+	 * continue without the cache if it couldn't be allocated.
+	 */
+	mp->m_rsum_cache = kmem_zalloc_large(sbp->sb_rbmblocks, KM_SLEEP);
+	if (!mp->m_rsum_cache)
+		xfs_warn(mp, "could not allocate realtime summary cache");
 	return 0;
 }
 
···
 xfs_rtunmount_inodes(
 	struct xfs_mount	*mp)
 {
+	kmem_free(mp->m_rsum_cache);
 	if (mp->m_rbmip)
 		xfs_irele(mp->m_rbmip);
 	if (mp->m_rsumip)