Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fs.h: Optimize file struct to prevent false sharing

In the syscall test of UnixBench, a performance regression occurred due
to false sharing.

The lock and atomic members, including file::f_lock, file::f_count and
file::f_pos_lock, are highly contended and frequently updated in
high-concurrency test scenarios. perf c2c identified one affected
read access, file::f_op.

To prevent false sharing, the layout of the file struct is changed as
follows:
(A) f_lock, f_count and f_pos_lock are put together to share the same
cache line.
(B) The read-mostly members, including f_path, f_inode and f_op, are put
into a separate cache line.
(C) f_mode is put together with f_count, since they are frequently used
at the same time.
Because the file struct carries the '__randomize_layout' attribute, the
updated layout only takes effect when CONFIG_RANDSTRUCT_NONE is 'y'.

The optimization has been validated with the syscall test of UnixBench;
the performance gain is 30~50%. Furthermore, to confirm the effectiveness
of the optimization on other code paths, the results of fsdisk, fsbuffer
and fstime are also shown.

Here are the detailed test results of UnixBench.

Command: numactl -C 3-18 ./Run -c 16 syscall fsbuffer fstime fsdisk

Without Patch
------------------------------------------------------------------------
File Copy 1024 bufsize 2000 maxblocks           875052.1 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks             235484.0 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks          2815153.5 KBps (30.0 s, 2 samples)
System Call Overhead                           5772268.3 lps  (10.0 s, 7 samples)

System Benchmarks Partial Index              BASELINE     RESULT   INDEX
File Copy 1024 bufsize 2000 maxblocks          3960.0   875052.1  2209.7
File Copy 256 bufsize 500 maxblocks            1655.0   235484.0  1422.9
File Copy 4096 bufsize 8000 maxblocks          5800.0  2815153.5  4853.7
System Call Overhead                          15000.0  5772268.3  3848.2
                                                                ========
System Benchmarks Index Score (Partial Only)                      2768.3

With Patch
------------------------------------------------------------------------
File Copy 1024 bufsize 2000 maxblocks          1009977.2 KBps (30.0 s, 2 samples)
File Copy 256 bufsize 500 maxblocks             264765.9 KBps (30.0 s, 2 samples)
File Copy 4096 bufsize 8000 maxblocks          3052236.0 KBps (30.0 s, 2 samples)
System Call Overhead                           8237404.4 lps  (10.0 s, 7 samples)

System Benchmarks Partial Index              BASELINE     RESULT   INDEX
File Copy 1024 bufsize 2000 maxblocks          3960.0  1009977.2  2550.4
File Copy 256 bufsize 500 maxblocks            1655.0   264765.9  1599.8
File Copy 4096 bufsize 8000 maxblocks          5800.0  3052236.0  5262.5
System Call Overhead                          15000.0  8237404.4  5491.6
                                                                ========
System Benchmarks Index Score (Partial Only)                      3295.3

Signed-off-by: chenzhiyin <zhiyin.chen@intel.com>
Message-Id: <20230601092400.27162-1-zhiyin.chen@intel.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>

Authored by chenzhiyin and committed by Christian Brauner
a7bc2e8d d0e13540

+11 -5
include/linux/fs.h
···
 		index < ra->start + ra->size);
 }
 
+/*
+ * f_{lock,count,pos_lock} members can be highly contended and share
+ * the same cacheline. f_{lock,mode} are very frequently used together
+ * and so share the same cacheline as well. The read-mostly
+ * f_{path,inode,op} are kept on a separate cacheline.
+ */
 struct file {
 	union {
 		struct llist_node	f_llist;
 		struct rcu_head		f_rcuhead;
 		unsigned int		f_iocb_flags;
 	};
-	struct path		f_path;
-	struct inode		*f_inode;	/* cached value */
-	const struct file_operations	*f_op;
 
 	/*
 	 * Protects f_ep, f_flags.
 	 * Must not be taken from IRQ context.
 	 */
 	spinlock_t		f_lock;
-	atomic_long_t		f_count;
-	unsigned int		f_flags;
 	fmode_t			f_mode;
+	atomic_long_t		f_count;
 	struct mutex		f_pos_lock;
 	loff_t			f_pos;
+	unsigned int		f_flags;
 	struct fown_struct	f_owner;
 	const struct cred	*f_cred;
 	struct file_ra_state	f_ra;
+	struct path		f_path;
+	struct inode		*f_inode;	/* cached value */
+	const struct file_operations	*f_op;
 
 	u64			f_version;
 #ifdef CONFIG_SECURITY