Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Unevictable LRU Infrastructure

When the system contains lots of mlocked or otherwise unevictable pages,
the pageout code (kswapd) can spend lots of time scanning over these
pages. Worse still, the presence of lots of unevictable pages can confuse
kswapd into thinking that more aggressive pageout modes are required,
resulting in all kinds of bad behaviour.

This patch adds infrastructure to manage pages excluded from reclaim--i.e.,
hidden from vmscan. It is based on a patch by Larry Woodman of Red Hat,
reworked to maintain "unevictable" pages on a separate per-zone LRU list,
to "hide" them from vmscan.

Kosaki Motohiro added support for the unevictable LRU list in the memory
controller.

Pages on the unevictable list have both PG_unevictable and PG_lru set.
Thus, PG_unevictable is analogous to and mutually exclusive with
PG_active--it specifies which LRU list the page is on.
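
The flag-to-list mapping described above can be sketched in userspace as follows. This is only a model, assuming a hypothetical `struct model_page` with boolean fields standing in for PG_lru, PG_active, PG_unevictable and the result of page_is_file_cache(); the enum values are illustrative, and only the selection logic follows page_lru() as changed in include/linux/mm_inline.h by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: names mirror the patch, values are illustrative. */
enum lru_list {
	LRU_INACTIVE_ANON,
	LRU_ACTIVE_ANON,
	LRU_INACTIVE_FILE,
	LRU_ACTIVE_FILE,
	LRU_UNEVICTABLE,
	NR_LRU_LISTS
};

/* Hypothetical stand-in for struct page; not a kernel type. */
struct model_page {
	bool lru;		/* models PG_lru */
	bool active;		/* models PG_active */
	bool unevictable;	/* models PG_unevictable */
	bool file_backed;	/* models page_is_file_cache() != 0 */
};

/* Select the LRU list index the way page_lru() does after this patch. */
static enum lru_list model_page_lru(const struct model_page *page)
{
	int lru;

	/* PG_active and PG_unevictable are mutually exclusive. */
	assert(!(page->active && page->unevictable));

	if (page->unevictable)
		return LRU_UNEVICTABLE;

	lru = page->file_backed ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
	if (page->active)
		lru += 1;	/* the active list follows its inactive list */

	return (enum lru_list)lru;
}
```

An unevictable page always maps to LRU_UNEVICTABLE regardless of whether it is file backed, which is why the patch tests PG_unevictable before looking at PG_active.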

The unevictable infrastructure is enabled by a new mm Kconfig option
[CONFIG_]UNEVICTABLE_LRU.

A new function 'page_evictable(page, vma)' in vmscan.c tests whether or
not a page may be evictable. Subsequent patches will add the various
!evictable tests. We'll want to keep these tests light-weight for use in
shrink_active_list() and, possibly, the fault path.

To avoid races between tasks putting pages [back] onto an LRU list and
tasks that might be moving the page from non-evictable to evictable state,
the new function 'putback_lru_page()' -- inverse to 'isolate_lru_page()'
-- tests the "evictability" of a page after placing it on the LRU, before
dropping the reference. If the page was placed on the unevictable list
but has since become evictable, putback_lru_page() isolates it again and
redoes the 'putback', moving the page to the appropriate evictable list.
This way, we avoid "stranding" evictable pages on the unevictable list.
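
The place-then-recheck loop in putback_lru_page() (see mm/vmscan.c in this patch) can be modelled in userspace roughly as below. Everything here is a hypothetical stand-in: `struct pb_page`, `model_race_window()` and the `flips_pending` counter exist only to simulate another task making the page evictable right after it was placed; locking, reference counting and the pagevec cache are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

enum pb_list { ON_NONE, ON_EVICTABLE, ON_UNEVICTABLE };

/* Hypothetical model of a page; not a kernel type. */
struct pb_page {
	enum pb_list list;	/* which list the page currently sits on */
	bool evictable;		/* what page_evictable() would report */
	int flips_pending;	/* test hook: simulated racing munlock()s */
};

/* Models page_evictable(page, NULL). */
static bool model_page_evictable(const struct pb_page *page)
{
	return page->evictable;
}

/* Test hook: another task makes the page evictable after placement. */
static void model_race_window(struct pb_page *page)
{
	if (page->flips_pending > 0) {
		page->flips_pending--;
		page->evictable = true;
	}
}

/* Returns the number of placement passes that were needed. */
static int model_putback(struct pb_page *page)
{
	int passes = 0;

redo:
	passes++;
	page->list = model_page_evictable(page) ? ON_EVICTABLE : ON_UNEVICTABLE;

	model_race_window(page);

	/*
	 * Recheck after placing: an evictable page left on the
	 * unevictable list would never be reclaimed, so take it
	 * off again (models isolate_lru_page()) and redo.
	 */
	if (page->list == ON_UNEVICTABLE && model_page_evictable(page)) {
		page->list = ON_NONE;
		goto redo;
	}
	return passes;
}
```

Only the unevictable placement is rechecked; an unevictable page that ends up on an active or inactive list is tolerated, because the scan paths in this patch test page_evictable() and move such pages themselves.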

[akpm@linux-foundation.org: fix fallout from out-of-order merge]
[riel@redhat.com: fix UNEVICTABLE_LRU and !PROC_PAGE_MONITOR build]
[nishimura@mxp.nes.nec.co.jp: remove redundant mapping check]
[kosaki.motohiro@jp.fujitsu.com: unevictable-lru-infrastructure: putback_lru_page()/unevictable page handling rework]
[kosaki.motohiro@jp.fujitsu.com: kill unnecessary lock_page() in vmscan.c]
[kosaki.motohiro@jp.fujitsu.com: revert migration change of unevictable lru infrastructure]
[kosaki.motohiro@jp.fujitsu.com: revert to unevictable-lru-infrastructure-kconfig-fix.patch]
[kosaki.motohiro@jp.fujitsu.com: restore patch failure of vmstat-unevictable-and-mlocked-pages-vm-events.patch]
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Debugged-by: Benjamin Kidwell <benjkidwell@yahoo.com>
Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Lee Schermerhorn and committed by Linus Torvalds
894bc310 8a7a8544

+345 -73
+1 -1
include/linux/memcontrol.h
@@ -34,9 +34,9 @@
 					gfp_t gfp_mask);
 extern int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 					gfp_t gfp_mask);
+extern void mem_cgroup_move_lists(struct page *page, enum lru_list lru);
 extern void mem_cgroup_uncharge_page(struct page *page);
 extern void mem_cgroup_uncharge_cache_page(struct page *page);
-extern void mem_cgroup_move_lists(struct page *page, bool active);
 extern int mem_cgroup_shrink_usage(struct mm_struct *mm, gfp_t gfp_mask);

 extern unsigned long mem_cgroup_isolate_pages(unsigned long nr_to_scan,
+16 -7
include/linux/mm_inline.h
@@ -91,11 +91,16 @@
 	enum lru_list l = LRU_BASE;

 	list_del(&page->lru);
-	if (PageActive(page)) {
-		__ClearPageActive(page);
-		l += LRU_ACTIVE;
+	if (PageUnevictable(page)) {
+		__ClearPageUnevictable(page);
+		l = LRU_UNEVICTABLE;
+	} else {
+		if (PageActive(page)) {
+			__ClearPageActive(page);
+			l += LRU_ACTIVE;
+		}
+		l += page_is_file_cache(page);
 	}
-	l += page_is_file_cache(page);
 	__dec_zone_state(zone, NR_LRU_BASE + l);
 }

@@ -110,9 +115,13 @@
 {
 	enum lru_list lru = LRU_BASE;

-	if (PageActive(page))
-		lru += LRU_ACTIVE;
-	lru += page_is_file_cache(page);
+	if (PageUnevictable(page))
+		lru = LRU_UNEVICTABLE;
+	else {
+		if (PageActive(page))
+			lru += LRU_ACTIVE;
+		lru += page_is_file_cache(page);
+	}

 	return lru;
 }
+23 -1
include/linux/mmzone.h
@@ -86,6 +86,11 @@
 	NR_ACTIVE_ANON,		/* " " " " " */
 	NR_INACTIVE_FILE,	/* " " " " " */
 	NR_ACTIVE_FILE,		/* " " " " " */
+#ifdef CONFIG_UNEVICTABLE_LRU
+	NR_UNEVICTABLE,		/* " " " " " */
+#else
+	NR_UNEVICTABLE = NR_ACTIVE_FILE, /* avoid compiler errors in dead code */
+#endif
 	NR_ANON_PAGES,	/* Mapped anonymous pages */
 	NR_FILE_MAPPED,	/* pagecache pages mapped into pagetables.
 			   only modified from process context */
@@ -128,9 +133,17 @@
 	LRU_ACTIVE_ANON = LRU_BASE + LRU_ACTIVE,
 	LRU_INACTIVE_FILE = LRU_BASE + LRU_FILE,
 	LRU_ACTIVE_FILE = LRU_BASE + LRU_FILE + LRU_ACTIVE,
-	NR_LRU_LISTS };
+#ifdef CONFIG_UNEVICTABLE_LRU
+	LRU_UNEVICTABLE,
+#else
+	LRU_UNEVICTABLE = LRU_ACTIVE_FILE, /* avoid compiler errors in dead code */
+#endif
+	NR_LRU_LISTS
+};

 #define for_each_lru(l) for (l = 0; l < NR_LRU_LISTS; l++)
+
+#define for_each_evictable_lru(l) for (l = 0; l <= LRU_ACTIVE_FILE; l++)

 static inline int is_file_lru(enum lru_list l)
 {
@@ -140,6 +153,15 @@
 static inline int is_active_lru(enum lru_list l)
 {
 	return (l == LRU_ACTIVE_ANON || l == LRU_ACTIVE_FILE);
+}
+
+static inline int is_unevictable_lru(enum lru_list l)
+{
+#ifdef CONFIG_UNEVICTABLE_LRU
+	return (l == LRU_UNEVICTABLE);
+#else
+	return 0;
+#endif
 }

 struct per_cpu_pages {
+21 -1
include/linux/page-flags.h
@@ -94,6 +94,9 @@
 	PG_reclaim,		/* To be reclaimed asap */
 	PG_buddy,		/* Page is free, on buddy lists */
 	PG_swapbacked,		/* Page is backed by RAM/swap */
+#ifdef CONFIG_UNEVICTABLE_LRU
+	PG_unevictable,		/* Page is "unevictable" */
+#endif
 #ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
 	PG_uncached,		/* Page has been mapped as uncached */
 #endif
@@ -182,6 +185,7 @@
 PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty)
 PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru)
 PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active)
+	TESTCLEARFLAG(Active, active)
 __PAGEFLAG(Slab, slab)
 PAGEFLAG(Checked, checked)		/* Used by some filesystems */
 PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned)	/* Xen */
@@ -223,6 +227,15 @@
 PAGEFLAG(SwapCache, swapcache)
 #else
 PAGEFLAG_FALSE(SwapCache)
+#endif
+
+#ifdef CONFIG_UNEVICTABLE_LRU
+PAGEFLAG(Unevictable, unevictable) __CLEARPAGEFLAG(Unevictable, unevictable)
+	TESTCLEARFLAG(Unevictable, unevictable)
+#else
+PAGEFLAG_FALSE(Unevictable) TESTCLEARFLAG_FALSE(Unevictable)
+	SETPAGEFLAG_NOOP(Unevictable) CLEARPAGEFLAG_NOOP(Unevictable)
+	__CLEARPAGEFLAG_NOOP(Unevictable)
 #endif

 #ifdef CONFIG_IA64_UNCACHED_ALLOCATOR
@@ -340,9 +353,16 @@

 #endif /* !PAGEFLAGS_EXTENDED */

+#ifdef CONFIG_UNEVICTABLE_LRU
+#define __PG_UNEVICTABLE	(1 << PG_unevictable)
+#else
+#define __PG_UNEVICTABLE	0
+#endif
+
 #define PAGE_FLAGS	(1 << PG_lru | 1 << PG_private | 1 << PG_locked | \
 			 1 << PG_buddy | 1 << PG_writeback | \
-			 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active)
+			 1 << PG_slab | 1 << PG_swapcache | 1 << PG_active | \
+			 __PG_UNEVICTABLE)

 /*
  * Flags checked in bad_page(). Pages on the free list should not have
-1
include/linux/pagevec.h
@@ -101,7 +101,6 @@
 		____pagevec_lru_add(pvec, LRU_ACTIVE_FILE);
 }

-
 static inline void pagevec_lru_add_file(struct pagevec *pvec)
 {
 	if (pagevec_count(pvec))
+12
include/linux/swap.h
@@ -180,6 +180,8 @@
 extern void rotate_reclaimable_page(struct page *page);
 extern void swap_setup(void);

+extern void add_page_to_unevictable_list(struct page *page);
+
 /**
  * lru_cache_add: add a page to the page lists
  * @page: the page to add
@@ -225,6 +227,16 @@
 static inline int zone_reclaim(struct zone *z, gfp_t mask, unsigned int order)
 {
 	return 0;
+}
+#endif
+
+#ifdef CONFIG_UNEVICTABLE_LRU
+extern int page_evictable(struct page *page, struct vm_area_struct *vma);
+#else
+static inline int page_evictable(struct page *page,
+						struct vm_area_struct *vma)
+{
+	return 1;
 }
 #endif

+11
mm/Kconfig
@@ -209,5 +209,16 @@
 	def_bool y
 	depends on !ARCH_NO_VIRT_TO_BUS

+config UNEVICTABLE_LRU
+	bool "Add LRU list to track non-evictable pages"
+	default y
+	depends on MMU
+	help
+	  Keeps unevictable pages off of the active and inactive pageout
+	  lists, so kswapd will not waste CPU time or have its balancing
+	  algorithms thrown off by scanning these pages. Selecting this
+	  will use one page flag and increase the code size a little,
+	  say Y unless you know what you are doing.
+
 config MMU_NOTIFIER
 	bool
+26
mm/internal.h
@@ -39,8 +39,15 @@
 	atomic_dec(&page->_count);
 }

+/*
+ * in mm/vmscan.c:
+ */
 extern int isolate_lru_page(struct page *page);
+extern void putback_lru_page(struct page *page);

+/*
+ * in mm/page_alloc.c
+ */
 extern void __free_pages_bootmem(struct page *page, unsigned int order);

 /*
@@ -53,6 +60,25 @@
 	VM_BUG_ON(!PageBuddy(page));
 	return page_private(page);
 }
+
+#ifdef CONFIG_UNEVICTABLE_LRU
+/*
+ * unevictable_migrate_page() called only from migrate_page_copy() to
+ * migrate unevictable flag to new page.
+ * Note that the old page has been isolated from the LRU lists at this
+ * point so we don't need to worry about LRU statistics.
+ */
+static inline void unevictable_migrate_page(struct page *new, struct page *old)
+{
+	if (TestClearPageUnevictable(old))
+		SetPageUnevictable(new);
+}
+#else
+static inline void unevictable_migrate_page(struct page *new, struct page *old)
+{
+}
+#endif
+

 /*
  * FLATMEM and DISCONTIGMEM configurations use alloc_bootmem_node,
+45 -28
mm/memcontrol.c
@@ -160,9 +160,10 @@
 	struct mem_cgroup *mem_cgroup;
 	int flags;
 };
-#define PAGE_CGROUP_FLAG_CACHE	(0x1)	/* charged as cache */
-#define PAGE_CGROUP_FLAG_ACTIVE (0x2)	/* page is active in this cgroup */
-#define PAGE_CGROUP_FLAG_FILE	(0x4)	/* page is file system backed */
+#define PAGE_CGROUP_FLAG_CACHE	   (0x1)	/* charged as cache */
+#define PAGE_CGROUP_FLAG_ACTIVE    (0x2)	/* page is active in this cgroup */
+#define PAGE_CGROUP_FLAG_FILE	   (0x4)	/* page is file system backed */
+#define PAGE_CGROUP_FLAG_UNEVICTABLE (0x8)	/* page is unevictableable */

 static int page_cgroup_nid(struct page_cgroup *pc)
 {
@@ -292,10 +293,14 @@
 {
 	int lru = LRU_BASE;

-	if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
-		lru += LRU_ACTIVE;
-	if (pc->flags & PAGE_CGROUP_FLAG_FILE)
-		lru += LRU_FILE;
+	if (pc->flags & PAGE_CGROUP_FLAG_UNEVICTABLE)
+		lru = LRU_UNEVICTABLE;
+	else {
+		if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
+			lru += LRU_ACTIVE;
+		if (pc->flags & PAGE_CGROUP_FLAG_FILE)
+			lru += LRU_FILE;
+	}

 	MEM_CGROUP_ZSTAT(mz, lru) -= 1;

@@ -308,10 +313,14 @@
 {
 	int lru = LRU_BASE;

-	if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
-		lru += LRU_ACTIVE;
-	if (pc->flags & PAGE_CGROUP_FLAG_FILE)
-		lru += LRU_FILE;
+	if (pc->flags & PAGE_CGROUP_FLAG_UNEVICTABLE)
+		lru = LRU_UNEVICTABLE;
+	else {
+		if (pc->flags & PAGE_CGROUP_FLAG_ACTIVE)
+			lru += LRU_ACTIVE;
+		if (pc->flags & PAGE_CGROUP_FLAG_FILE)
+			lru += LRU_FILE;
+	}

 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
 	list_add(&pc->lru, &mz->lists[lru]);
@@ -319,21 +328,31 @@
 	mem_cgroup_charge_statistics(pc->mem_cgroup, pc->flags, true);
 }

-static void __mem_cgroup_move_lists(struct page_cgroup *pc, bool active)
+static void __mem_cgroup_move_lists(struct page_cgroup *pc, enum lru_list lru)
 {
 	struct mem_cgroup_per_zone *mz = page_cgroup_zoneinfo(pc);
-	int from = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
-	int file = pc->flags & PAGE_CGROUP_FLAG_FILE;
-	int lru = LRU_FILE * !!file + !!from;
+	int active = pc->flags & PAGE_CGROUP_FLAG_ACTIVE;
+	int file = pc->flags & PAGE_CGROUP_FLAG_FILE;
+	int unevictable = pc->flags & PAGE_CGROUP_FLAG_UNEVICTABLE;
+	enum lru_list from = unevictable ? LRU_UNEVICTABLE :
+				(LRU_FILE * !!file + !!active);

-	MEM_CGROUP_ZSTAT(mz, lru) -= 1;
+	if (lru == from)
+		return;

-	if (active)
-		pc->flags |= PAGE_CGROUP_FLAG_ACTIVE;
-	else
+	MEM_CGROUP_ZSTAT(mz, from) -= 1;
+
+	if (is_unevictable_lru(lru)) {
 		pc->flags &= ~PAGE_CGROUP_FLAG_ACTIVE;
+		pc->flags |= PAGE_CGROUP_FLAG_UNEVICTABLE;
+	} else {
+		if (is_active_lru(lru))
+			pc->flags |= PAGE_CGROUP_FLAG_ACTIVE;
+		else
+			pc->flags &= ~PAGE_CGROUP_FLAG_ACTIVE;
+		pc->flags &= ~PAGE_CGROUP_FLAG_UNEVICTABLE;
+	}

-	lru = LRU_FILE * !!file + !!active;
 	MEM_CGROUP_ZSTAT(mz, lru) += 1;
 	list_move(&pc->lru, &mz->lists[lru]);
 }
@@ -351,7 +370,7 @@
 /*
  * This routine assumes that the appropriate zone's lru lock is already held
  */
-void mem_cgroup_move_lists(struct page *page, bool active)
+void mem_cgroup_move_lists(struct page *page, enum lru_list lru)
 {
 	struct page_cgroup *pc;
 	struct mem_cgroup_per_zone *mz;
@@ -374,7 +393,7 @@
 	if (pc) {
 		mz = page_cgroup_zoneinfo(pc);
 		spin_lock_irqsave(&mz->lru_lock, flags);
-		__mem_cgroup_move_lists(pc, active);
+		__mem_cgroup_move_lists(pc, lru);
 		spin_unlock_irqrestore(&mz->lru_lock, flags);
 	}
 	unlock_page_cgroup(page);
@@ -472,12 +491,10 @@
 		/*
 		 * TODO: play better with lumpy reclaim, grabbing anything.
 		 */
-		if (PageActive(page) && !active) {
-			__mem_cgroup_move_lists(pc, true);
-			continue;
-		}
-		if (!PageActive(page) && active) {
-			__mem_cgroup_move_lists(pc, false);
+		if (PageUnevictable(page) ||
+		    (PageActive(page) && !active) ||
+		    (!PageActive(page) && active)) {
+			__mem_cgroup_move_lists(pc, page_lru(page));
 			continue;
 		}

493 474 */ 494 - if (PageActive(page) && !active) { 495 - __mem_cgroup_move_lists(pc, true); 496 - continue; 497 - } 498 - if (!PageActive(page) && active) { 499 - __mem_cgroup_move_lists(pc, false); 475 + if (PageUnevictable(page) || 476 + (PageActive(page) && !active) || 477 + (!PageActive(page) && active)) { 478 + __mem_cgroup_move_lists(pc, page_lru(page)); 500 479 continue; 501 480 } 502 481
+1 -1
mm/mempolicy.c
@@ -2202,7 +2202,7 @@
 	if (PageSwapCache(page))
 		md->swapcache++;

-	if (PageActive(page))
+	if (PageActive(page) || PageUnevictable(page))
 		md->active++;

 	if (PageWriteback(page))
+17 -14
mm/migrate.c
@@ -53,14 +53,9 @@
 	return 0;
 }

-static inline void move_to_lru(struct page *page)
-{
-	lru_cache_add_lru(page, page_lru(page));
-	put_page(page);
-}
-
 /*
- * Add isolated pages on the list back to the LRU.
+ * Add isolated pages on the list back to the LRU under page lock
+ * to avoid leaking evictable pages back onto unevictable list.
  *
  * returns the number of pages put back.
  */
@@ -72,7 +67,7 @@

 	list_for_each_entry_safe(page, page2, l, lru) {
 		list_del(&page->lru);
-		move_to_lru(page);
+		putback_lru_page(page);
 		count++;
 	}
 	return count;
@@ -354,8 +349,11 @@
 		SetPageReferenced(newpage);
 	if (PageUptodate(page))
 		SetPageUptodate(newpage);
-	if (PageActive(page))
+	if (TestClearPageActive(page)) {
+		VM_BUG_ON(PageUnevictable(page));
 		SetPageActive(newpage);
+	} else
+		unevictable_migrate_page(newpage, page);
 	if (PageChecked(page))
 		SetPageChecked(newpage);
 	if (PageMappedToDisk(page))
@@ -376,7 +374,6 @@
 #ifdef CONFIG_SWAP
 	ClearPageSwapCache(page);
 #endif
-	ClearPageActive(page);
 	ClearPagePrivate(page);
 	set_page_private(page, 0);
 	page->mapping = NULL;
@@ -555,6 +552,10 @@
  *
  * The new page will have replaced the old page if this function
  * is successful.
+ *
+ * Return value:
+ *   < 0 - error code
+ *  == 0 - success
  */
 static int move_to_new_page(struct page *newpage, struct page *page)
 {
@@ -617,9 +618,10 @@
 	if (!newpage)
 		return -ENOMEM;

-	if (page_count(page) == 1)
+	if (page_count(page) == 1) {
 		/* page was freed from under us. So we are done. */
 		goto move_newpage;
+	}

 	charge = mem_cgroup_prepare_migration(page, newpage);
 	if (charge == -ENOMEM) {
@@ -693,7 +695,6 @@
 		rcu_read_unlock();

 unlock:
-
 	unlock_page(page);

 	if (rc != -EAGAIN) {
@@ -704,17 +705,19 @@
 		 * restored.
 		 */
 		list_del(&page->lru);
-		move_to_lru(page);
+		putback_lru_page(page);
 	}

 move_newpage:
 	if (!charge)
 		mem_cgroup_end_migration(newpage);
+
 	/*
 	 * Move the new page to the LRU. If migration was not successful
 	 * then this will free the page.
 	 */
-	move_to_lru(newpage);
+	putback_lru_page(newpage);
+
 	if (result) {
 		if (rc)
 			*result = rc;
+36 -6
mm/swap.c
@@ -115,7 +115,7 @@
 			zone = pagezone;
 			spin_lock(&zone->lru_lock);
 		}
-		if (PageLRU(page) && !PageActive(page)) {
+		if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 			int lru = page_is_file_cache(page);
 			list_move_tail(&page->lru, &zone->lru[lru].list);
 			pgmoved++;
@@ -136,7 +136,7 @@
 void rotate_reclaimable_page(struct page *page)
 {
 	if (!PageLocked(page) && !PageDirty(page) && !PageActive(page) &&
-	    PageLRU(page)) {
+	    !PageUnevictable(page) && PageLRU(page)) {
 		struct pagevec *pvec;
 		unsigned long flags;

@@ -157,7 +157,7 @@
 	struct zone *zone = page_zone(page);

 	spin_lock_irq(&zone->lru_lock);
-	if (PageLRU(page) && !PageActive(page)) {
+	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
 		int file = page_is_file_cache(page);
 		int lru = LRU_BASE + file;
 		del_page_from_lru_list(zone, page, lru);
@@ -166,7 +166,7 @@
 		lru += LRU_ACTIVE;
 		add_page_to_lru_list(zone, page, lru);
 		__count_vm_event(PGACTIVATE);
-		mem_cgroup_move_lists(page, true);
+		mem_cgroup_move_lists(page, lru);

 		zone->recent_rotated[!!file]++;
 		zone->recent_scanned[!!file]++;
@@ -183,7 +183,8 @@
  */
 void mark_page_accessed(struct page *page)
 {
-	if (!PageActive(page) && PageReferenced(page) && PageLRU(page)) {
+	if (!PageActive(page) && !PageUnevictable(page) &&
+			PageReferenced(page) && PageLRU(page)) {
 		activate_page(page);
 		ClearPageReferenced(page);
 	} else if (!PageReferenced(page)) {
@@ -211,11 +212,36 @@
 void lru_cache_add_lru(struct page *page, enum lru_list lru)
 {
 	if (PageActive(page)) {
+		VM_BUG_ON(PageUnevictable(page));
 		ClearPageActive(page);
+	} else if (PageUnevictable(page)) {
+		VM_BUG_ON(PageActive(page));
+		ClearPageUnevictable(page);
 	}

-	VM_BUG_ON(PageLRU(page) || PageActive(page));
+	VM_BUG_ON(PageLRU(page) || PageActive(page) || PageUnevictable(page));
 	__lru_cache_add(page, lru);
+}
+
+/**
+ * add_page_to_unevictable_list - add a page to the unevictable list
+ * @page:  the page to be added to the unevictable list
+ *
+ * Add page directly to its zone's unevictable list.  To avoid races with
+ * tasks that might be making the page evictable, through eg. munlock,
+ * munmap or exit, while it's not on the lru, we want to add the page
+ * while it's locked or otherwise "invisible" to other tasks.  This is
+ * difficult to do when using the pagevec cache, so bypass that.
+ */
+void add_page_to_unevictable_list(struct page *page)
+{
+	struct zone *zone = page_zone(page);
+
+	spin_lock_irq(&zone->lru_lock);
+	SetPageUnevictable(page);
+	SetPageLRU(page);
+	add_page_to_lru_list(zone, page, LRU_UNEVICTABLE);
+	spin_unlock_irq(&zone->lru_lock);
 }

 /*
@@ -316,6 +342,7 @@

 		if (PageLRU(page)) {
 			struct zone *pagezone = page_zone(page);
+
 			if (pagezone != zone) {
 				if (zone)
 					spin_unlock_irqrestore(&zone->lru_lock,
@@ -392,6 +419,7 @@
 {
 	int i;
 	struct zone *zone = NULL;
+	VM_BUG_ON(is_unevictable_lru(lru));

 	for (i = 0; i < pagevec_count(pvec); i++) {
 		struct page *page = pvec->pages[i];
@@ -403,6 +431,8 @@
 			zone = pagezone;
 			spin_lock_irq(&zone->lru_lock);
 		}
+		VM_BUG_ON(PageActive(page));
+		VM_BUG_ON(PageUnevictable(page));
 		VM_BUG_ON(PageLRU(page));
 		SetPageLRU(page);
 		if (is_active_lru(lru))
+136 -13
mm/vmscan.c
@@ -470,6 +470,79 @@
 	return 0;
 }

+/**
+ * putback_lru_page - put previously isolated page onto appropriate LRU list
+ * @page: page to be put back to appropriate lru list
+ *
+ * Add previously isolated @page to appropriate LRU list.
+ * Page may still be unevictable for other reasons.
+ *
+ * lru_lock must not be held, interrupts must be enabled.
+ */
+#ifdef CONFIG_UNEVICTABLE_LRU
+void putback_lru_page(struct page *page)
+{
+	int lru;
+	int active = !!TestClearPageActive(page);
+
+	VM_BUG_ON(PageLRU(page));
+
+redo:
+	ClearPageUnevictable(page);
+
+	if (page_evictable(page, NULL)) {
+		/*
+		 * For evictable pages, we can use the cache.
+		 * In event of a race, worst case is we end up with an
+		 * unevictable page on [in]active list.
+		 * We know how to handle that.
+		 */
+		lru = active + page_is_file_cache(page);
+		lru_cache_add_lru(page, lru);
+	} else {
+		/*
+		 * Put unevictable pages directly on zone's unevictable
+		 * list.
+		 */
+		lru = LRU_UNEVICTABLE;
+		add_page_to_unevictable_list(page);
+	}
+	mem_cgroup_move_lists(page, lru);
+
+	/*
+	 * page's status can change while we move it among lru. If an evictable
+	 * page is on unevictable list, it never be freed. To avoid that,
+	 * check after we added it to the list, again.
+	 */
+	if (lru == LRU_UNEVICTABLE && page_evictable(page, NULL)) {
+		if (!isolate_lru_page(page)) {
+			put_page(page);
+			goto redo;
+		}
+		/* This means someone else dropped this page from LRU
+		 * So, it will be freed or putback to LRU again. There is
+		 * nothing to do here.
+		 */
+	}
+
+	put_page(page);		/* drop ref from isolate */
+}
+
+#else /* CONFIG_UNEVICTABLE_LRU */
+
+void putback_lru_page(struct page *page)
+{
+	int lru;
+	VM_BUG_ON(PageLRU(page));
+
+	lru = !!TestClearPageActive(page) + page_is_file_cache(page);
+	lru_cache_add_lru(page, lru);
+	mem_cgroup_move_lists(page, lru);
+	put_page(page);
+}
+#endif /* CONFIG_UNEVICTABLE_LRU */
+
+
 /*
  * shrink_page_list() returns the number of reclaimed pages
  */
@@ -502,6 +575,12 @@
 		VM_BUG_ON(PageActive(page));

 		sc->nr_scanned++;
+
+		if (unlikely(!page_evictable(page, NULL))) {
+			unlock_page(page);
+			putback_lru_page(page);
+			continue;
+		}

 		if (!sc->may_swap && page_mapped(page))
 			goto keep_locked;
@@ -602,7 +681,7 @@
 		 * possible for a page to have PageDirty set, but it is actually
 		 * clean (all its buffers are clean).  This happens if the
 		 * buffers were written out directly, with submit_bh(). ext3
-		 * will do this, as well as the blockdev mapping. 
+		 * will do this, as well as the blockdev mapping.
 		 * try_to_release_page() will discover that cleanness and will
 		 * drop the buffers and mark the page clean - it can be freed.
 		 *
@@ -650,6 +729,7 @@
 		/* Not a candidate for swapping, so reclaim swap space. */
 		if (PageSwapCache(page) && vm_swap_full())
 			remove_exclusive_swap_page_ref(page);
+		VM_BUG_ON(PageActive(page));
 		SetPageActive(page);
 		pgactivate++;
 keep_locked:
@@ -697,6 +777,14 @@
 		return ret;

 	if (mode != ISOLATE_BOTH && (!page_is_file_cache(page) != !file))
+		return ret;
+
+	/*
+	 * When this function is being called for lumpy reclaim, we
+	 * initially look into all LRU pages, active, inactive and
+	 * unevictable; only give shrink_page_list evictable pages.
+	 */
+	if (PageUnevictable(page))
 		return ret;

 	ret = -EBUSY;
@@ -810,7 +898,7 @@
 				/* else it is being freed elsewhere */
 				list_move(&cursor_page->lru, src);
 			default:
-				break;
+				break;	/* ! on LRU or wrong list */
 			}
 		}
 	}
@@ -870,8 +958,9 @@
  * Returns -EBUSY if the page was not on an LRU list.
  *
  * The returned page will have PageLRU() cleared.  If it was found on
- * the active list, it will have PageActive set.  That flag may need
- * to be cleared by the caller before letting the page go.
+ * the active list, it will have PageActive set.  If it was found on
+ * the unevictable list, it will have the PageUnevictable bit set. That flag
+ * may need to be cleared by the caller before letting the page go.
  *
  * The vmstat statistic corresponding to the list on which the page was
  * found will be decremented.
@@ -892,11 +981,10 @@

 		spin_lock_irq(&zone->lru_lock);
 		if (PageLRU(page) && get_page_unless_zero(page)) {
-			int lru = LRU_BASE;
+			int lru = page_lru(page);
 			ret = 0;
 			ClearPageLRU(page);

-			lru += page_is_file_cache(page) + !!PageActive(page);
 			del_page_from_lru_list(zone, page, lru);
 		}
 		spin_unlock_irq(&zone->lru_lock);
@@ -1008,11 +1096,20 @@
 		 * Put back any unfreeable pages.
 		 */
 		while (!list_empty(&page_list)) {
+			int lru;
 			page = lru_to_page(&page_list);
 			VM_BUG_ON(PageLRU(page));
-			SetPageLRU(page);
 			list_del(&page->lru);
-			add_page_to_lru_list(zone, page, page_lru(page));
+			if (unlikely(!page_evictable(page, NULL))) {
+				spin_unlock_irq(&zone->lru_lock);
+				putback_lru_page(page);
+				spin_lock_irq(&zone->lru_lock);
+				continue;
+			}
+			SetPageLRU(page);
+			lru = page_lru(page);
+			add_page_to_lru_list(zone, page, lru);
+			mem_cgroup_move_lists(page, lru);
 			if (PageActive(page) && scan_global_lru(sc)) {
 				int file = !!page_is_file_cache(page);
 				zone->recent_rotated[file]++;
@@ -1107,6 +1204,11 @@
 		page = lru_to_page(&l_hold);
 		list_del(&page->lru);

+		if (unlikely(!page_evictable(page, NULL))) {
+			putback_lru_page(page);
+			continue;
+		}
+
 		/* page_referenced clears PageReferenced */
 		if (page_mapping_inuse(page) &&
 		    page_referenced(page, 0, sc->mem_cgroup))
@@ -1140,7 +1242,7 @@
 		ClearPageActive(page);

 		list_move(&page->lru, &zone->lru[lru].list);
-		mem_cgroup_move_lists(page, false);
+		mem_cgroup_move_lists(page, lru);
 		pgmoved++;
 		if (!pagevec_add(&pvec, page)) {
 			__mod_zone_page_state(zone, NR_LRU_BASE + lru, pgmoved);
@@ -1286,7 +1388,7 @@

 	get_scan_ratio(zone, sc, percent);

-	for_each_lru(l) {
+	for_each_evictable_lru(l) {
 		if (scan_global_lru(sc)) {
 			int file = is_file_lru(l);
 			int scan;
@@ -1318,7 +1420,7 @@

 	while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
 					nr[LRU_INACTIVE_FILE]) {
-		for_each_lru(l) {
+		for_each_evictable_lru(l) {
 			if (nr[l]) {
 				nr_to_scan = min(nr[l],
 					(unsigned long)sc->swap_cluster_max);
@@ -1875,8 +1977,8 @@
 		if (zone_is_all_unreclaimable(zone) && prio != DEF_PRIORITY)
 			continue;

-		for_each_lru(l) {
-			/* For pass = 0 we don't shrink the active list */
+		for_each_evictable_lru(l) {
+			/* For pass = 0, we don't shrink the active list */
 			if (pass == 0 &&
 				(l == LRU_ACTIVE || l == LRU_ACTIVE_FILE))
 				continue;
@@ -2211,5 +2313,26 @@
 	zone_clear_flag(zone, ZONE_RECLAIM_LOCKED);

 	return ret;
+}
+#endif
+
+#ifdef CONFIG_UNEVICTABLE_LRU
+/*
+ * page_evictable - test whether a page is evictable
+ * @page: the page to test
+ * @vma: the VMA in which the page is or will be mapped, may be NULL
+ *
+ * Test whether page is evictable--i.e., should be placed on active/inactive
+ * lists vs unevictable list.
+ *
+ * Reasons page might not be evictable:
+ * TODO - later patches
+ */
+int page_evictable(struct page *page, struct vm_area_struct *vma)
+{
+
+	/* TODO:  test page [!]evictable conditions */
+
+	return 1;
 }
 #endif