
xfs: run an eofblocks scan on ENOSPC/EDQUOT

From: Brian Foster <bfoster@redhat.com>

Speculative preallocation and the associated throttling metrics
assume we're working with large files on large filesystems. Users have
reported inefficiencies in these mechanisms when we happen to be dealing
with large files on smaller filesystems. This can occur because while
prealloc throttling is aggressive under low free space conditions, it is
not active until we reach 5% free space or less.

For example, a 40GB filesystem has enough space for several files large
enough to have multi-GB preallocations at any given time. If those files
are slow growing, they might reserve preallocation for long periods of
time as well as avoid the background scanner due to frequent
modification. If a new file is written under these conditions, said file
has no access to this already reserved space and premature ENOSPC is
imminent.

To handle this scenario, modify the buffered write ENOSPC handling and
retry sequence to invoke an eofblocks scan. In the smaller filesystem
scenario, the eofblocks scan resets the usage of preallocation such that
when the 5% free space threshold is met, throttling effectively takes
over to provide fair and efficient preallocation until legitimate
ENOSPC.
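The retry sequence described above can be sketched in userspace as a minimal model (the `model_fs` struct and all helpers below are hypothetical stand-ins for kernel state, not real XFS API): a failed buffered write triggers one eofblocks scan to reclaim speculative preallocations, then retries exactly once.

```c
#include <stdbool.h>

/*
 * Userspace sketch of the retry flow this patch adds to
 * xfs_file_buffered_aio_write(): on ENOSPC, free lingering post-EOF
 * preallocations (the eofblocks scan) and retry the write once.
 * MODEL_ENOSPC and model_fs are illustrative, not kernel definitions.
 */
#define MODEL_ENOSPC (-28)

struct model_fs {
	long free_blocks;	/* blocks currently available for allocation */
	long prealloc_blocks;	/* blocks tied up in post-EOF preallocation */
};

/* Model of the eofblocks scan: trimmed preallocations return to the free pool. */
static void model_scan_eofblocks(struct model_fs *fs)
{
	fs->free_blocks += fs->prealloc_blocks;
	fs->prealloc_blocks = 0;
}

static int model_do_write(struct model_fs *fs, long nblocks)
{
	if (fs->free_blocks < nblocks)
		return MODEL_ENOSPC;
	fs->free_blocks -= nblocks;
	return 0;
}

/* Mirrors the write_retry loop: one retry after the scan, then give up. */
static int model_buffered_write(struct model_fs *fs, long nblocks)
{
	bool retried = false;
	int ret;

write_retry:
	ret = model_do_write(fs, nblocks);
	if (ret == MODEL_ENOSPC && !retried) {
		retried = true;
		model_scan_eofblocks(fs);
		goto write_retry;
	}
	return ret;
}
```

In this model a write that fails only because other files hold idle preallocation succeeds on the retry, which is exactly the premature-ENOSPC case the patch targets.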

The eofblocks scan is selective based on the nature of the failure. For
example, an EDQUOT failure in a particular quota will use a filtered
scan for that quota. Because we don't know which quota might have caused
an allocation failure at any given time, we include each applicable
quota determined to be under low free space conditions in the scan.
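The "low free space" test that gates inclusion of a quota in the scan can be sketched as follows; the plain integer parameters are stand-ins for the dquot fields used by the patch (`q_core.d_blk_hardlimit`, `q_res_bcount`, and the precomputed 1% threshold in `q_low_space[XFS_QLOWSP_1_PCNT]`):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch of the check the patch adds as xfs_dquot_lowsp():
 * a quota is under low free space when the room left beneath its block
 * hard limit drops below the precomputed 1% threshold.
 */
static bool model_dquot_lowsp(int64_t blk_hardlimit, int64_t res_bcount,
			      int64_t low_space_1pct)
{
	int64_t freesp = blk_hardlimit - res_bcount;	/* blocks still usable */

	return freesp < low_space_1pct;
}
```

Only quotas for which this returns true set a UID/GID filter flag in the union scan, so a write that fails against one of several applicable quotas still frees preallocations under every quota that is genuinely near its limit.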

Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>

Authored by Brian Foster and committed by Dave Chinner (dc06f398, f4526397)

4 files changed, 87 insertions(+), 4 deletions(-)

fs/xfs/xfs_dquot.h (+15)
···
 	}
 }
 
+/*
+ * Check whether a dquot is under low free space conditions. We assume the quota
+ * is enabled and enforced.
+ */
+static inline bool xfs_dquot_lowsp(struct xfs_dquot *dqp)
+{
+	int64_t freesp;
+
+	freesp = be64_to_cpu(dqp->q_core.d_blk_hardlimit) - dqp->q_res_bcount;
+	if (freesp < dqp->q_low_space[XFS_QLOWSP_1_PCNT])
+		return true;
+
+	return false;
+}
+
 #define XFS_DQ_IS_LOCKED(dqp)	(mutex_is_locked(&((dqp)->q_qlock)))
 #define XFS_DQ_IS_DIRTY(dqp)	((dqp)->dq_flags & XFS_DQ_DIRTY)
 #define XFS_QM_ISUDQ(dqp)	((dqp)->dq_flags & XFS_DQ_USER)
fs/xfs/xfs_file.c (+19 -4)
···
 #include "xfs_trace.h"
 #include "xfs_log.h"
 #include "xfs_dinode.h"
+#include "xfs_icache.h"
 
 #include <linux/aio.h>
 #include <linux/dcache.h>
···
 	ret = generic_perform_write(file, from, pos);
 	if (likely(ret >= 0))
 		iocb->ki_pos = pos + ret;
+
 	/*
-	 * If we just got an ENOSPC, try to write back all dirty inodes to
-	 * convert delalloc space to free up some of the excess reserved
-	 * metadata space.
+	 * If we hit a space limit, try to free up some lingering preallocated
+	 * space before returning an error. In the case of ENOSPC, first try to
+	 * write back all dirty inodes to free up some of the excess reserved
+	 * metadata space. This reduces the chances that the eofblocks scan
+	 * waits on dirty mappings. Since xfs_flush_inodes() is serialized, this
+	 * also behaves as a filter to prevent too many eofblocks scans from
+	 * running at the same time.
 	 */
-	if (ret == -ENOSPC && !enospc) {
+	if (ret == -EDQUOT && !enospc) {
+		enospc = xfs_inode_free_quota_eofblocks(ip);
+		if (enospc)
+			goto write_retry;
+	} else if (ret == -ENOSPC && !enospc) {
+		struct xfs_eofblocks eofb = {0};
+
 		enospc = 1;
 		xfs_flush_inodes(ip->i_mount);
+		eofb.eof_scan_owner = ip->i_ino; /* for locking */
+		eofb.eof_flags = XFS_EOF_FLAGS_SYNC;
+		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
 		goto write_retry;
 	}
 
fs/xfs/xfs_icache.c (+52)
···
 #include "xfs_trace.h"
 #include "xfs_icache.h"
 #include "xfs_bmap_util.h"
+#include "xfs_quota.h"
+#include "xfs_dquot_item.h"
+#include "xfs_dquot.h"
 
 #include <linux/kthread.h>
 #include <linux/freezer.h>
···
 
 	return xfs_inode_ag_iterator_tag(mp, xfs_inode_free_eofblocks, flags,
 					 eofb, XFS_ICI_EOFBLOCKS_TAG);
+}
+
+/*
+ * Run eofblocks scans on the quotas applicable to the inode. For inodes with
+ * multiple quotas, we don't know exactly which quota caused an allocation
+ * failure. We make a best effort by including each quota under low free space
+ * conditions (less than 1% free space) in the scan.
+ */
+int
+xfs_inode_free_quota_eofblocks(
+	struct xfs_inode *ip)
+{
+	int scan = 0;
+	struct xfs_eofblocks eofb = {0};
+	struct xfs_dquot *dq;
+
+	ASSERT(xfs_isilocked(ip, XFS_IOLOCK_EXCL));
+
+	/*
+	 * Set the scan owner to avoid a potential livelock. Otherwise, the scan
+	 * can repeatedly trylock on the inode we're currently processing. We
+	 * run a sync scan to increase effectiveness and use the union filter to
+	 * cover all applicable quotas in a single scan.
+	 */
+	eofb.eof_scan_owner = ip->i_ino;
+	eofb.eof_flags = XFS_EOF_FLAGS_UNION|XFS_EOF_FLAGS_SYNC;
+
+	if (XFS_IS_UQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQ_USER);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_uid = VFS_I(ip)->i_uid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_UID;
+			scan = 1;
+		}
+	}
+
+	if (XFS_IS_GQUOTA_ENFORCED(ip->i_mount)) {
+		dq = xfs_inode_dquot(ip, XFS_DQ_GROUP);
+		if (dq && xfs_dquot_lowsp(dq)) {
+			eofb.eof_gid = VFS_I(ip)->i_gid;
+			eofb.eof_flags |= XFS_EOF_FLAGS_GID;
+			scan = 1;
+		}
+	}
+
+	if (scan)
+		xfs_icache_free_eofblocks(ip->i_mount, &eofb);
+
+	return scan;
 }
 
 void
fs/xfs/xfs_icache.h (+1)
···
 void xfs_inode_set_eofblocks_tag(struct xfs_inode *ip);
 void xfs_inode_clear_eofblocks_tag(struct xfs_inode *ip);
 int xfs_icache_free_eofblocks(struct xfs_mount *, struct xfs_eofblocks *);
+int xfs_inode_free_quota_eofblocks(struct xfs_inode *ip);
 void xfs_eofblocks_worker(struct work_struct *);
 
 int xfs_inode_ag_iterator(struct xfs_mount *mp,