Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfs: Add sysctl vfs_cache_pressure_denom for bulk file operations

On our HDFS servers with 12 HDDs per server, an HDFS datanode[0] startup
involves scanning all files and caching their metadata (including dentries
and inodes) in memory. Each HDD contains approximately 2 million files,
resulting in a total of ~20 million cached dentries after initialization.

To minimize dentry reclamation, we set vfs_cache_pressure to 1. Despite
this configuration, memory pressure conditions can still trigger
reclamation of up to 50% of cached dentries, reducing the cache from 20
million to approximately 10 million entries. During the subsequent cache
rebuild period, any HDFS datanode restart operation incurs substantial
latency penalties until full cache recovery completes.

To maintain service stability, we need to preserve more dentries during
memory reclamation. The current minimum reclaim ratio (1/100 of total
dentries) remains too aggressive for our workload. This patch introduces
vfs_cache_pressure_denom for more granular cache pressure control. The
configuration [vfs_cache_pressure=1, vfs_cache_pressure_denom=10000]
effectively maintains the full 20 million dentry cache under memory
pressure, preventing datanode restart performance degradation.
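
The scaling arithmetic behind these settings can be sketched in userspace C.
This is a minimal sketch, not the kernel code: mult_frac_ul() is an assumed
stand-in for the kernel's mult_frac() macro, and pressure_ratio() is a
hypothetical helper with the same shape as vfs_pressure_ratio() that takes
the two sysctl values as arguments. It only shows how the fraction scales an
object count; how the shrinker then uses that count (and the 50% figure
above) is outside what this sketch models.

```c
#include <assert.h>

/*
 * Userspace stand-in (assumption) for the kernel's mult_frac():
 * computes val * numer / denom by splitting val into quotient and
 * remainder, which reduces the risk of overflow in the product.
 */
static unsigned long mult_frac_ul(unsigned long val, unsigned long numer,
				  unsigned long denom)
{
	unsigned long q, r;

	assert(denom > 0);	/* guard against division by zero */
	q = val / denom;
	r = val % denom;
	return q * numer + r * numer / denom;
}

/*
 * Hypothetical helper shaped like vfs_pressure_ratio(), with the
 * sysctl values passed in instead of read from globals.
 */
static unsigned long pressure_ratio(unsigned long val, int pressure, int denom)
{
	return mult_frac_ul(val, (unsigned long)pressure, (unsigned long)denom);
}
```

With the numbers from this message, pressure_ratio(20000000, 1, 100) yields
200000 objects, while pressure_ratio(20000000, 1, 10000) yields 2000, so the
larger denominator shrinks the scaled count by a further factor of 100.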

Link: https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html#NameNode+and+DataNodes [0]

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/20250511083624.9305-1-laoar.shao@gmail.com
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>

authored by Yafang Shao and committed by Christian Brauner
e7b9cea7 8d911700

+31 -12
+21 -11
Documentation/admin-guide/sysctl/vm.rst
···
 - unprivileged_userfaultfd
 - user_reserve_kbytes
 - vfs_cache_pressure
+- vfs_cache_pressure_denom
 - watermark_boost_factor
 - watermark_scale_factor
 - zone_reclaim_mode
···
 This percentage value controls the tendency of the kernel to reclaim
 the memory which is used for caching of directory and inode objects.
 
-At the default value of vfs_cache_pressure=100 the kernel will attempt to
-reclaim dentries and inodes at a "fair" rate with respect to pagecache and
-swapcache reclaim.  Decreasing vfs_cache_pressure causes the kernel to prefer
-to retain dentry and inode caches. When vfs_cache_pressure=0, the kernel will
-never reclaim dentries and inodes due to memory pressure and this can easily
-lead to out-of-memory conditions. Increasing vfs_cache_pressure beyond 100
-causes the kernel to prefer to reclaim dentries and inodes.
+At the default value of vfs_cache_pressure=vfs_cache_pressure_denom the kernel
+will attempt to reclaim dentries and inodes at a "fair" rate with respect to
+pagecache and swapcache reclaim.  Decreasing vfs_cache_pressure causes the
+kernel to prefer to retain dentry and inode caches. When vfs_cache_pressure=0,
+the kernel will never reclaim dentries and inodes due to memory pressure and
+this can easily lead to out-of-memory conditions. Increasing vfs_cache_pressure
+beyond vfs_cache_pressure_denom causes the kernel to prefer to reclaim dentries
+and inodes.
 
-Increasing vfs_cache_pressure significantly beyond 100 may have negative
-performance impact. Reclaim code needs to take various locks to find freeable
-directory and inode objects. With vfs_cache_pressure=1000, it will look for
-ten times more freeable objects than there are.
+Increasing vfs_cache_pressure significantly beyond vfs_cache_pressure_denom may
+have negative performance impact. Reclaim code needs to take various locks to
+find freeable directory and inode objects. When vfs_cache_pressure equals
+(10 * vfs_cache_pressure_denom), it will look for ten times more freeable
+objects than there are.
 
+Note: This setting should always be used together with vfs_cache_pressure_denom.
+
+vfs_cache_pressure_denom
+========================
+
+Defaults to 100 (minimum allowed value). Requires corresponding
+vfs_cache_pressure setting to take effect.
 
 watermark_boost_factor
 ======================
+10 -1
fs/dcache.c
···
  * arbitrary, since it's serialized on rename_lock
  */
 static int sysctl_vfs_cache_pressure __read_mostly = 100;
+static int sysctl_vfs_cache_pressure_denom __read_mostly = 100;
 
 unsigned long vfs_pressure_ratio(unsigned long val)
 {
-	return mult_frac(val, sysctl_vfs_cache_pressure, 100);
+	return mult_frac(val, sysctl_vfs_cache_pressure, sysctl_vfs_cache_pressure_denom);
 }
 EXPORT_SYMBOL_GPL(vfs_pressure_ratio);
 
···
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
+	},
+	{
+		.procname	= "vfs_cache_pressure_denom",
+		.data		= &sysctl_vfs_cache_pressure_denom,
+		.maxlen		= sizeof(sysctl_vfs_cache_pressure_denom),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ONE_HUNDRED,
 	},
 };
 
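
The new table entry sets .extra1 = SYSCTL_ONE_HUNDRED with
proc_dointvec_minmax and no .extra2, so the denominator has a lower bound of
100 and no explicit upper bound. The userspace model below is a rough sketch
of that behavior, assuming an out-of-range write is rejected with -EINVAL
and leaves the stored value unchanged. DENOM_MIN and write_denom() are
hypothetical names for illustration and are not part of the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Assumption: mirrors .extra1 = SYSCTL_ONE_HUNDRED in the sysctl table. */
#define DENOM_MIN 100

/*
 * Hypothetical model of a minimum-bounded integer sysctl write.
 * Returns 0 on success, or -EINVAL if the requested value is below
 * the minimum, in which case *denom is left untouched (the value is
 * rejected, not clamped).
 */
static int write_denom(int *denom, int requested)
{
	assert(denom != NULL);
	if (requested < DENOM_MIN)
		return -EINVAL;
	*denom = requested;
	return 0;
}
```

Under this model, write_denom(&d, 10000) succeeds, while write_denom(&d, 99)
fails and keeps the previous value. One consequence is that the denominator
can never be zero, so the division in vfs_pressure_ratio() stays well-defined.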