Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Btrfs: avoid syncing log in the fast fsync path when not necessary

Commit 3a8b36f37806 ("Btrfs: fix data loss in the fast fsync path") introduced
a performance regression that causes an unnecessary sync of the log
trees (fs/subvol and root log trees) when 2 consecutive fsyncs are done
against a file, without any writes or metadata updates to the inode in
between them, and a transaction is committed before the second fsync is
called.

Huang Ying reported this to lkml (https://lkml.org/lkml/2015/3/18/99)
after a sysbench test measured a 62% decrease in file I/O
requests per second for that test's workload.

The test is:

echo performance > /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu2/cpufreq/scaling_governor
echo performance > /sys/devices/system/cpu/cpu3/cpufreq/scaling_governor
mkfs -t btrfs /dev/sda2
mount -t btrfs /dev/sda2 /fs/sda2
cd /fs/sda2
for ((i = 0; i < 1024; i++)); do fallocate -l 67108864 testfile.$i; done
sysbench --test=fileio --max-requests=0 --num-threads=4 --max-time=600 \
--file-test-mode=rndwr --file-total-size=68719476736 --file-io-mode=sync \
--file-num=1024 run

A test on a kvm guest running a debug kernel gave me the following results:

Without 3a8b36f378060d: 16.01 reqs/sec
With 3a8b36f378060d: 3.39 reqs/sec
With 3a8b36f378060d and this patch: 16.04 reqs/sec

Reported-by: Huang Ying <ying.huang@intel.com>
Tested-by: Huang, Ying <ying.huang@intel.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>

Authored by Filipe Manana and committed by Chris Mason
b659ef02 1ab818b1

+23 -3

+6 -3
fs/btrfs/file.c
···
 	struct btrfs_log_ctx ctx;
 	int ret = 0;
 	bool full_sync = 0;
+	const u64 len = end - start + 1;
 
 	trace_btrfs_sync_file(file, datasync);
 
···
 		 * all extents are persisted and the respective file extent
 		 * items are in the fs/subvol btree.
 		 */
-		ret = btrfs_wait_ordered_range(inode, start, end - start + 1);
+		ret = btrfs_wait_ordered_range(inode, start, len);
 	} else {
 		/*
 		 * Start any new ordered operations before starting to log the
···
 	 */
 	smp_mb();
 	if (btrfs_inode_in_log(inode, root->fs_info->generation) ||
-	    (full_sync && BTRFS_I(inode)->last_trans <=
-	     root->fs_info->last_trans_committed)) {
+	    (BTRFS_I(inode)->last_trans <=
+	     root->fs_info->last_trans_committed &&
+	     (full_sync ||
+	      !btrfs_have_ordered_extents_in_range(inode, start, len)))) {
 		/*
 		 * We'v had everything committed since the last time we were
 		 * modified so clear this flag in case it was set for whatever

+14
fs/btrfs/ordered-data.c
···
 	return entry;
 }
 
+bool btrfs_have_ordered_extents_in_range(struct inode *inode,
+					 u64 file_offset,
+					 u64 len)
+{
+	struct btrfs_ordered_extent *oe;
+
+	oe = btrfs_lookup_ordered_range(inode, file_offset, len);
+	if (oe) {
+		btrfs_put_ordered_extent(oe);
+		return true;
+	}
+	return false;
+}
+
 /*
  * lookup and return any extent before 'file_offset'. NULL is returned
  * if none is found

+3
fs/btrfs/ordered-data.h
···
 struct btrfs_ordered_extent *btrfs_lookup_ordered_range(struct inode *inode,
 							u64 file_offset,
 							u64 len);
+bool btrfs_have_ordered_extents_in_range(struct inode *inode,
+					 u64 file_offset,
+					 u64 len);
 int btrfs_ordered_update_i_size(struct inode *inode, u64 offset,
 				struct btrfs_ordered_extent *ordered);
 int btrfs_find_ordered_sum(struct inode *inode, u64 offset, u64 disk_bytenr,