Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/page-writeback.c: do not count anon pages as dirtyable memory

The VM is currently heavily tuned to avoid swapping. Whether that is
good or bad is a separate discussion, but as long as the VM won't swap
to make room for dirty cache, we cannot consider anonymous pages when
calculating the amount of dirtyable memory, the baseline to which
dirty_background_ratio and dirty_ratio are applied.

A simple workload that occupies a significant size (40+%, depending on
memory layout, storage speeds etc.) of memory with anon/tmpfs pages and
uses the remainder for a streaming writer demonstrates this problem. In
that case, the actual cache pages are a small fraction of what is
considered dirtyable overall, which results in a relatively large
portion of the cache pages being dirtied. As kswapd starts rotating
these, random tasks enter direct reclaim and stall on IO.

Only consider free pages and file pages dirtyable.

Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: Tejun Heo <tj@kernel.org>
Tested-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Michal Hocko <mhocko@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Johannes Weiner, committed by Linus Torvalds
a1c3bfb2 a804552b

+5 -27
-2
include/linux/vmstat.h
@@ -142,8 +142,6 @@
 	return x;
 }
 
-extern unsigned long global_reclaimable_pages(void);
-
 #ifdef CONFIG_NUMA
 /*
  * Determine the per node value of a stat item. This function
-1
mm/internal.h
@@ -83,7 +83,6 @@
  */
 extern int isolate_lru_page(struct page *page);
 extern void putback_lru_page(struct page *page);
-extern unsigned long zone_reclaimable_pages(struct zone *zone);
 extern bool zone_reclaimable(struct zone *zone);
 
 /*
+4 -2
mm/page-writeback.c
@@ -205,7 +205,8 @@
 	nr_pages = zone_page_state(zone, NR_FREE_PAGES);
 	nr_pages -= min(nr_pages, zone->dirty_balance_reserve);
 
-	nr_pages += zone_reclaimable_pages(zone);
+	nr_pages += zone_page_state(zone, NR_INACTIVE_FILE);
+	nr_pages += zone_page_state(zone, NR_ACTIVE_FILE);
 
 	return nr_pages;
 }
@@ -259,7 +258,8 @@
 	x = global_page_state(NR_FREE_PAGES);
 	x -= min(x, dirty_balance_reserve);
 
-	x += global_reclaimable_pages();
+	x += global_page_state(NR_INACTIVE_FILE);
+	x += global_page_state(NR_ACTIVE_FILE);
 
 	if (!vm_highmem_is_dirtyable)
 		x -= highmem_dirtyable_memory(x);
+1 -22
mm/vmscan.c
@@ -147,7 +147,7 @@
 }
 #endif
 
-unsigned long zone_reclaimable_pages(struct zone *zone)
+static unsigned long zone_reclaimable_pages(struct zone *zone)
 {
 	int nr;
 
@@ -3313,27 +3313,6 @@
 
 	trace_mm_vmscan_wakeup_kswapd(pgdat->node_id, zone_idx(zone), order);
 	wake_up_interruptible(&pgdat->kswapd_wait);
-}
-
-/*
- * The reclaimable count would be mostly accurate.
- * The less reclaimable pages may be
- * - mlocked pages, which will be moved to unevictable list when encountered
- * - mapped pages, which may require several travels to be reclaimed
- * - dirty pages, which is not "instantly" reclaimable
- */
-unsigned long global_reclaimable_pages(void)
-{
-	int nr;
-
-	nr = global_page_state(NR_ACTIVE_FILE) +
-	     global_page_state(NR_INACTIVE_FILE);
-
-	if (get_nr_swap_pages() > 0)
-		nr += global_page_state(NR_ACTIVE_ANON) +
-		      global_page_state(NR_INACTIVE_ANON);
-
-	return nr;
-}
 }
 
 #ifdef CONFIG_HIBERNATION