Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

nfs: avoid i_lock contention in nfs_clear_invalid_mapping

Multi-threaded buffered reads to the same file exposed significant
inode spinlock contention in nfs_clear_invalid_mapping().

Eliminate this spinlock contention by checking flags without locking,
instead using smp_rmb and smp_load_acquire accordingly, but then take
spinlock and double-check these inode flags.

Also refactor nfs_set_cache_invalid() slightly to use
smp_store_release() to pair with nfs_clear_invalid_mapping()'s
smp_load_acquire().

While this fix is beneficial for all multi-threaded buffered reads
issued by an NFS client, this issue was identified in the context of
surprisingly low LOCALIO performance with 4K multi-threaded buffered
read IO. This fix dramatically speeds up LOCALIO performance:

before: read: IOPS=1583k, BW=6182MiB/s (6482MB/s)(121GiB/20002msec)
after: read: IOPS=3046k, BW=11.6GiB/s (12.5GB/s)(232GiB/20001msec)

Fixes: 17dfeb911339 ("NFS: Fix races in nfs_revalidate_mapping")
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>

authored by

Mike Snitzer and committed by
Anna Schumaker
867da60d bc294086

+15 -5
+15 -5
fs/nfs/inode.c
··· 205 205 nfs_fscache_invalidate(inode, 0); 206 206 flags &= ~NFS_INO_REVAL_FORCED; 207 207 208 - nfsi->cache_validity |= flags; 208 + flags |= nfsi->cache_validity; 209 + if (inode->i_mapping->nrpages == 0) 210 + flags &= ~NFS_INO_INVALID_DATA; 209 211 210 - if (inode->i_mapping->nrpages == 0) { 211 - nfsi->cache_validity &= ~NFS_INO_INVALID_DATA; 212 - nfs_ooo_clear(nfsi); 213 - } else if (nfsi->cache_validity & NFS_INO_INVALID_DATA) { 212 + /* pairs with nfs_clear_invalid_mapping()'s smp_load_acquire() */ 213 + smp_store_release(&nfsi->cache_validity, flags); 214 + 215 + if (inode->i_mapping->nrpages == 0 || 216 + nfsi->cache_validity & NFS_INO_INVALID_DATA) { 214 217 nfs_ooo_clear(nfsi); 215 218 } 216 219 trace_nfs_set_cache_invalid(inode, 0); ··· 1424 1421 TASK_KILLABLE|TASK_FREEZABLE_UNSAFE); 1425 1422 if (ret) 1426 1423 goto out; 1424 + smp_rmb(); /* pairs with smp_wmb() below */ 1425 + if (test_bit(NFS_INO_INVALIDATING, bitlock)) 1426 + continue; 1427 + /* pairs with nfs_set_cache_invalid()'s smp_store_release() */ 1428 + if (!(smp_load_acquire(&nfsi->cache_validity) & NFS_INO_INVALID_DATA)) 1429 + goto out; 1430 + /* Slow-path that double-checks with spinlock held */ 1427 1431 spin_lock(&inode->i_lock); 1428 1432 if (test_bit(NFS_INO_INVALIDATING, bitlock)) { 1429 1433 spin_unlock(&inode->i_lock);