Linux kernel mirror: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: memcontrol: rewrite uncharge API

The memcg uncharging code that is involved towards the end of a page's
lifetime - truncation, reclaim, swapout, migration - is impressively
complicated and fragile.

Because anonymous and file pages were always charged before they had their
page->mapping established, uncharges had to happen when the page type
could still be known from the context; as in unmap for anonymous, page
cache removal for file and shmem pages, and swap cache truncation for swap
pages. However, these operations happen well before the page is actually
freed, and so a lot of synchronization is necessary:

- Charging, uncharging, page migration, and charge migration all need
to take a per-page bit spinlock as they could race with uncharging.

- Swap cache truncation happens during both swap-in and swap-out, and
possibly repeatedly before the page is actually freed. This means
that the memcg swapout code is called from many contexts that make
no sense and it has to figure out the direction from page state to
make sure memory and memory+swap are always correctly charged.

- On page migration, the old page might be unmapped but then reused,
so memcg code has to prevent untimely uncharging in that case.
Because this code - which should be a simple charge transfer - is so
special-cased, it is not reusable for replace_page_cache().

But now that charged pages always have a page->mapping, introduce
mem_cgroup_uncharge(), which is called after the final put_page(), when we
know for sure that nobody is looking at the page anymore.

For page migration, introduce mem_cgroup_migrate(), which is called after
the migration is successful and the new page is fully rmapped. Because
the old page is no longer uncharged after migration, prevent double
charges by decoupling the page's memcg association (PCG_USED and
pc->mem_cgroup) from the page holding an actual charge. The new bits
PCG_MEM and PCG_MEMSW represent the respective charges and are transferred
to the new page during migration.

mem_cgroup_migrate() is suitable for replace_page_cache() as well,
which gets rid of mem_cgroup_replace_page_cache(). However, care
needs to be taken because both the source and the target page can
already be charged and on the LRU when fuse is splicing: grab the page
lock on the charge moving side to prevent changing pc->mem_cgroup of a
page under migration. Also, the lruvecs of both pages change as we
uncharge the old and charge the new during migration, and putback may
race with us, so grab the lru lock and isolate the pages iff on LRU to
prevent races and ensure the pages are on the right lruvec afterward.

Swap accounting is massively simplified: because the page is no longer
uncharged as early as swap cache deletion, a new mem_cgroup_swapout() can
transfer the page's memory+swap charge (PCG_MEMSW) to the swap entry
before the final put_page() in page reclaim.

Finally, page_cgroup changes are now protected by whatever protection the
page itself offers: anonymous pages are charged under the page table lock,
whereas page cache insertions, swapin, and migration hold the page lock.
Uncharging happens under full exclusion with no outstanding references.
Charging and uncharging also ensure that the page is off-LRU, which
serializes against charge migration. Remove the very costly page_cgroup
lock and set pc->flags non-atomically.

[mhocko@suse.cz: mem_cgroup_charge_statistics needs preempt_disable]
[vdavydov@parallels.com: fix flags definition]
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Tested-by: Jet Chen <jet.chen@intel.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Tested-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: Vladimir Davydov <vdavydov@parallels.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by Johannes Weiner, committed by Linus Torvalds
0a31bc97 00501b53

+389 -768
Documentation/cgroups/memcg_test.txt  +5 -121
···
 2. Uncharge
	a page/swp_entry may be uncharged (usage -= PAGE_SIZE) by

-	mem_cgroup_uncharge_page()
-	  Called when an anonymous page is fully unmapped. I.e., mapcount goes
-	  to 0. If the page is SwapCache, uncharge is delayed until
-	  mem_cgroup_uncharge_swapcache().
-
-	mem_cgroup_uncharge_cache_page()
-	  Called when a page-cache is deleted from radix-tree. If the page is
-	  SwapCache, uncharge is delayed until mem_cgroup_uncharge_swapcache().
-
-	mem_cgroup_uncharge_swapcache()
-	  Called when SwapCache is removed from radix-tree. The charge itself
-	  is moved to swap_cgroup. (If mem+swap controller is disabled, no
-	  charge to swap occurs.)
+	mem_cgroup_uncharge()
+	  Called when a page's refcount goes down to 0.

 	mem_cgroup_uncharge_swap()
 	  Called when swp_entry's refcnt goes down to 0. A charge against swap
 	  disappears.
-
-	mem_cgroup_end_migration(old, new)
-	  At success of migration old is uncharged (if necessary), a charge
-	  to new page is committed. At failure, charge to old page is committed.

 3. charge-commit-cancel
 	Memcg pages are charged in two steps:
···
 	Anonymous page is newly allocated at
 	  - page fault into MAP_ANONYMOUS mapping.
 	  - Copy-On-Write.
-	It is charged right after it's allocated before doing any page table
-	related operations. Of course, it's uncharged when another page is used
-	for the fault address.
-
-	At freeing anonymous page (by exit() or munmap()), zap_pte() is called
-	and pages for ptes are freed one by one.(see mm/memory.c). Uncharges
-	are done at page_remove_rmap() when page_mapcount() goes down to 0.
-
-	Another page freeing is by page-reclaim (vmscan.c) and anonymous
-	pages are swapped out. In this case, the page is marked as
-	PageSwapCache(). uncharge() routine doesn't uncharge the page marked
-	as SwapCache(). It's delayed until __delete_from_swap_cache().

 4.1 Swap-in.
 	At swap-in, the page is taken from swap-cache. There are 2 cases.
···
 	(a) If the SwapCache is newly allocated and read, it has no charges.
 	(b) If the SwapCache has been mapped by processes, it has been
 	    charged already.
-
-	This swap-in is one of the most complicated work. In do_swap_page(),
-	following events occur when pte is unchanged.
-
-	(1) the page (SwapCache) is looked up.
-	(2) lock_page()
-	(3) try_charge_swapin()
-	(4) reuse_swap_page() (may call delete_swap_cache())
-	(5) commit_charge_swapin()
-	(6) swap_free().
-
-	Considering following situation for example.
-
-	(A) The page has not been charged before (2) and reuse_swap_page()
-	    doesn't call delete_from_swap_cache().
-	(B) The page has not been charged before (2) and reuse_swap_page()
-	    calls delete_from_swap_cache().
-	(C) The page has been charged before (2) and reuse_swap_page() doesn't
-	    call delete_from_swap_cache().
-	(D) The page has been charged before (2) and reuse_swap_page() calls
-	    delete_from_swap_cache().
-
-	    memory.usage/memsw.usage changes to this page/swp_entry will be
-	 Case          (A)      (B)       (C)     (D)
-         Event
-      Before (2)     0/ 1     0/ 1      1/ 1    1/ 1
-          ===========================================
-          (3)       +1/+1    +1/+1     +1/+1   +1/+1
-          (4)          -       0/ 0      -     -1/ 0
-          (5)        0/-1     0/ 0     -1/-1    0/ 0
-          (6)          -       0/-1      -      0/-1
-          ===========================================
-      Result         1/ 1     1/ 1      1/ 1    1/ 1
-
-	In any cases, charges to this page should be 1/ 1.

 4.2 Swap-out.
 	At swap-out, typical state transition is below.
···
 	    swp_entry's refcnt -= 1.


-	At (b), the page is marked as SwapCache and not uncharged.
-	At (d), the page is removed from SwapCache and a charge in page_cgroup
-	is moved to swap_cgroup.
-
 	Finally, at task exit,
 	(e) zap_pte() is called and swp_entry's refcnt -=1 -> 0.
-	Here, a charge in swap_cgroup disappears.

 5. Page Cache
 	Page Cache is charged at
 	- add_to_page_cache_locked().
-
-	uncharged at
-	- __remove_from_page_cache().

 	The logic is very clear. (About migration, see below)
 	Note: __remove_from_page_cache() is called by remove_from_page_cache()
 	and __remove_mapping().

 6. Shmem(tmpfs) Page Cache
-	Memcg's charge/uncharge have special handlers of shmem. The best way
-	to understand shmem's page state transition is to read mm/shmem.c.
+	The best way to understand shmem's page state transition is to read
+	mm/shmem.c.
 	But brief explanation of the behavior of memcg around shmem will be
 	helpful to understand the logic.
···
 	It's charged when...
 	- A new page is added to shmem's radix-tree.
 	- A swp page is read. (move a charge from swap_cgroup to page_cgroup)
-	It's uncharged when
-	- A page is removed from radix-tree and not SwapCache.
-	- When SwapCache is removed, a charge is moved to swap_cgroup.
-	- When swp_entry's refcnt goes down to 0, a charge in swap_cgroup
-	  disappears.

 7. Page Migration
-	One of the most complicated functions is page-migration-handler.
-	Memcg has 2 routines. Assume that we are migrating a page's contents
-	from OLDPAGE to NEWPAGE.

-	Usual migration logic is..
-	(a) remove the page from LRU.
-	(b) allocate NEWPAGE (migration target)
-	(c) lock by lock_page().
-	(d) unmap all mappings.
-	(e-1) If necessary, replace entry in radix-tree.
-	(e-2) move contents of a page.
-	(f) map all mappings again.
-	(g) pushback the page to LRU.
-	(-) OLDPAGE will be freed.
-
-	Before (g), memcg should complete all necessary charge/uncharge to
-	NEWPAGE/OLDPAGE.
-
-	The point is....
-	- If OLDPAGE is anonymous, all charges will be dropped at (d) because
-          try_to_unmap() drops all mapcount and the page will not be
-          SwapCache.
-
-	- If OLDPAGE is SwapCache, charges will be kept at (g) because
-	  __delete_from_swap_cache() isn't called at (e-1)
-
-	- If OLDPAGE is page-cache, charges will be kept at (g) because
-	  remove_from_swap_cache() isn't called at (e-1)
-
-	memcg provides following hooks.
-
-	- mem_cgroup_prepare_migration(OLDPAGE)
-	  Called after (b) to account a charge (usage += PAGE_SIZE) against
-	  memcg which OLDPAGE belongs to.
-
-	- mem_cgroup_end_migration(OLDPAGE, NEWPAGE)
-	  Called after (f) before (g).
-	  If OLDPAGE is used, commit OLDPAGE again. If OLDPAGE is already
-	  charged, a charge by prepare_migration() is automatically canceled.
-	  If NEWPAGE is used, commit NEWPAGE and uncharge OLDPAGE.
-
-	But zap_pte() (by exit or munmap) can be called while migration,
-	we have to check if OLDPAGE/NEWPAGE is a valid page after commit().
+	mem_cgroup_migrate()

 8. LRU
 	Each memcg has its own private LRU. Now, its handling is under global
include/linux/memcontrol.h  +16 -35
···
 		      bool lrucare);
 void mem_cgroup_cancel_charge(struct page *page, struct mem_cgroup *memcg);

+void mem_cgroup_uncharge(struct page *page);
+
+/* Batched uncharging */
+void mem_cgroup_uncharge_start(void);
+void mem_cgroup_uncharge_end(void);
+
+void mem_cgroup_migrate(struct page *oldpage, struct page *newpage,
+			bool lrucare);
+
 struct lruvec *mem_cgroup_zone_lruvec(struct zone *, struct mem_cgroup *);
 struct lruvec *mem_cgroup_page_lruvec(struct page *, struct zone *);
-
-/* For coalescing uncharge for reducing memcg' overhead*/
-extern void mem_cgroup_uncharge_start(void);
-extern void mem_cgroup_uncharge_end(void);
-
-extern void mem_cgroup_uncharge_page(struct page *page);
-extern void mem_cgroup_uncharge_cache_page(struct page *page);

 bool __mem_cgroup_same_or_subtree(const struct mem_cgroup *root_memcg,
 				  struct mem_cgroup *memcg);
···

 extern struct cgroup_subsys_state *mem_cgroup_css(struct mem_cgroup *memcg);

-extern void
-mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
-			     struct mem_cgroup **memcgp);
-extern void mem_cgroup_end_migration(struct mem_cgroup *memcg,
-	struct page *oldpage, struct page *newpage, bool migration_ok);
-
 struct mem_cgroup *mem_cgroup_iter(struct mem_cgroup *,
 				   struct mem_cgroup *,
 				   struct mem_cgroup_reclaim_cookie *);
···
 void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
 extern void mem_cgroup_print_oom_info(struct mem_cgroup *memcg,
 					struct task_struct *p);
-extern void mem_cgroup_replace_page_cache(struct page *oldpage,
-					struct page *newpage);

 static inline void mem_cgroup_oom_enable(void)
 {
···
 {
 }

+static inline void mem_cgroup_uncharge(struct page *page)
+{
+}
+
 static inline void mem_cgroup_uncharge_start(void)
 {
 }
···
 {
 }

-static inline void mem_cgroup_uncharge_page(struct page *page)
-{
-}
-
-static inline void mem_cgroup_uncharge_cache_page(struct page *page)
+static inline void mem_cgroup_migrate(struct page *oldpage,
+				      struct page *newpage,
+				      bool lrucare)
 {
 }
···
 *mem_cgroup_css(struct mem_cgroup *memcg)
 {
 	return NULL;
-}
-
-static inline void
-mem_cgroup_prepare_migration(struct page *page, struct page *newpage,
-			     struct mem_cgroup **memcgp)
-{
-}
-
-static inline void mem_cgroup_end_migration(struct mem_cgroup *memcg,
-	struct page *oldpage, struct page *newpage, bool migration_ok)
-{
 }

 static inline struct mem_cgroup *
···

 static inline
 void mem_cgroup_count_vm_event(struct mm_struct *mm, enum vm_event_item idx)
-{
-}
-static inline void mem_cgroup_replace_page_cache(struct page *oldpage,
-				struct page *newpage)
 {
 }
 #endif /* CONFIG_MEMCG */
include/linux/page_cgroup.h  +5 -38
···

 enum {
 	/* flags for mem_cgroup */
-	PCG_LOCK,  /* Lock for pc->mem_cgroup and following bits. */
-	PCG_USED, /* this object is in use. */
-	PCG_MIGRATION, /* under page migration */
+	PCG_USED = 0x01,	/* This page is charged to a memcg */
+	PCG_MEM = 0x02,		/* This page holds a memory charge */
+	PCG_MEMSW = 0x04,	/* This page holds a memory+swap charge */
 	__NR_PCG_FLAGS,
 };
···
 struct page_cgroup *lookup_page_cgroup(struct page *page);
 struct page *lookup_cgroup_page(struct page_cgroup *pc);

-#define TESTPCGFLAG(uname, lname)			\
-static inline int PageCgroup##uname(struct page_cgroup *pc)	\
-	{ return test_bit(PCG_##lname, &pc->flags); }
-
-#define SETPCGFLAG(uname, lname)			\
-static inline void SetPageCgroup##uname(struct page_cgroup *pc)\
-	{ set_bit(PCG_##lname, &pc->flags);  }
-
-#define CLEARPCGFLAG(uname, lname)			\
-static inline void ClearPageCgroup##uname(struct page_cgroup *pc)	\
-	{ clear_bit(PCG_##lname, &pc->flags);  }
-
-#define TESTCLEARPCGFLAG(uname, lname)			\
-static inline int TestClearPageCgroup##uname(struct page_cgroup *pc)	\
-	{ return test_and_clear_bit(PCG_##lname, &pc->flags);  }
-
-TESTPCGFLAG(Used, USED)
-CLEARPCGFLAG(Used, USED)
-SETPCGFLAG(Used, USED)
-
-SETPCGFLAG(Migration, MIGRATION)
-CLEARPCGFLAG(Migration, MIGRATION)
-TESTPCGFLAG(Migration, MIGRATION)
-
-static inline void lock_page_cgroup(struct page_cgroup *pc)
+static inline int PageCgroupUsed(struct page_cgroup *pc)
 {
-	/*
-	 * Don't take this lock in IRQ context.
-	 * This lock is for pc->mem_cgroup, USED, MIGRATION
-	 */
-	bit_spin_lock(PCG_LOCK, &pc->flags);
-}
-
-static inline void unlock_page_cgroup(struct page_cgroup *pc)
-{
-	bit_spin_unlock(PCG_LOCK, &pc->flags);
+	return !!(pc->flags & PCG_USED);
 }

 #else /* CONFIG_MEMCG */
include/linux/swap.h  +8 -4
···
 }
 #endif
 #ifdef CONFIG_MEMCG_SWAP
-extern void mem_cgroup_uncharge_swap(swp_entry_t ent);
+extern void mem_cgroup_swapout(struct page *page, swp_entry_t entry);
+extern void mem_cgroup_uncharge_swap(swp_entry_t entry);
 #else
-static inline void mem_cgroup_uncharge_swap(swp_entry_t ent)
+static inline void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
+{
+}
+static inline void mem_cgroup_uncharge_swap(swp_entry_t entry)
 {
 }
 #endif
···
 extern int swap_duplicate(swp_entry_t);
 extern int swapcache_prepare(swp_entry_t);
 extern void swap_free(swp_entry_t);
-extern void swapcache_free(swp_entry_t, struct page *page);
+extern void swapcache_free(swp_entry_t);
 extern int free_swap_and_cache(swp_entry_t);
 extern int swap_type_of(dev_t, sector_t, struct block_device **);
 extern unsigned int count_swap_pages(int, int);
···
 {
 }

-static inline void swapcache_free(swp_entry_t swp, struct page *page)
+static inline void swapcache_free(swp_entry_t swp)
 {
 }
mm/filemap.c  +1 -3
···
 	spin_lock_irq(&mapping->tree_lock);
 	__delete_from_page_cache(page, NULL);
 	spin_unlock_irq(&mapping->tree_lock);
-	mem_cgroup_uncharge_cache_page(page);

 	if (freepage)
 		freepage(page);
···
 		if (PageSwapBacked(new))
 			__inc_zone_page_state(new, NR_SHMEM);
 		spin_unlock_irq(&mapping->tree_lock);
-		/* mem_cgroup codes must not be called under tree_lock */
-		mem_cgroup_replace_page_cache(old, new);
+		mem_cgroup_migrate(old, new, true);
 		radix_tree_preload_end();
 		if (freepage)
 			freepage(old);
mm/memcontrol.c  +321 -507
···
 static void mem_cgroup_remove_exceeded(struct mem_cgroup_per_zone *mz,
 				       struct mem_cgroup_tree_per_zone *mctz)
 {
-	spin_lock(&mctz->lock);
+	unsigned long flags;
+
+	spin_lock_irqsave(&mctz->lock, flags);
 	__mem_cgroup_remove_exceeded(mz, mctz);
-	spin_unlock(&mctz->lock);
+	spin_unlock_irqrestore(&mctz->lock, flags);
 }

···
 		 * mem is over its softlimit.
 		 */
 		if (excess || mz->on_tree) {
-			spin_lock(&mctz->lock);
+			unsigned long flags;
+
+			spin_lock_irqsave(&mctz->lock, flags);
 			/* if on-tree, remove it */
 			if (mz->on_tree)
 				__mem_cgroup_remove_exceeded(mz, mctz);
···
 			 * If excess is 0, no tree ops.
 			 */
 			__mem_cgroup_insert_exceeded(mz, mctz, excess);
-			spin_unlock(&mctz->lock);
+			spin_unlock_irqrestore(&mctz->lock, flags);
 		}
 	}
 }
···
 {
 	struct mem_cgroup_per_zone *mz;

-	spin_lock(&mctz->lock);
+	spin_lock_irq(&mctz->lock);
 	mz = __mem_cgroup_largest_soft_limit_node(mctz);
-	spin_unlock(&mctz->lock);
+	spin_unlock_irq(&mctz->lock);
 	return mz;
 }
···
 	return val;
 }

-static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg,
-					 bool charge)
-{
-	int val = (charge) ? 1 : -1;
-	this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val);
-}
-
 static unsigned long mem_cgroup_read_events(struct mem_cgroup *memcg,
 					    enum mem_cgroup_events_index idx)
 {
···
 static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 					 struct page *page,
-					 bool anon, int nr_pages)
+					 int nr_pages)
 {
 	/*
 	 * Here, RSS means 'mapped anon' and anon's SwapCache. Shmem/tmpfs is
 	 * counted as CACHE even if it's on ANON LRU.
 	 */
-	if (anon)
+	if (PageAnon(page))
 		__this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_RSS],
 				nr_pages);
 	else
···
  */
 static void memcg_check_events(struct mem_cgroup *memcg, struct page *page)
 {
-	preempt_disable();
 	/* threshold event is triggered in finer grain than soft limit */
 	if (unlikely(mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_THRESH))) {
···
 		do_numainfo = mem_cgroup_event_ratelimit(memcg,
 						MEM_CGROUP_TARGET_NUMAINFO);
 #endif
-		preempt_enable();
-
 		mem_cgroup_threshold(memcg);
 		if (unlikely(do_softlimit))
 			mem_cgroup_update_tree(memcg, page);
···
 		if (unlikely(do_numainfo))
 			atomic_inc(&memcg->numainfo_events);
 #endif
-	} else
-		preempt_enable();
+	}
 }

 struct mem_cgroup *mem_cgroup_from_task(struct task_struct *p)
···
 	lruvec->zone = zone;
 	return lruvec;
 }
-
-/*
- * Following LRU functions are allowed to be used without PCG_LOCK.
- * Operations are called by routine of global LRU independently from memcg.
- * What we have to take care of here is validness of pc->mem_cgroup.
- *
- * Changes to pc->mem_cgroup happens when
- * 1. charge
- * 2. moving account
- * In typical case, "charge" is done before add-to-lru. Exception is SwapCache.
- * It is added to LRU before charge.
- * If PCG_USED bit is not set, page_cgroup is not added to this private LRU.
- * When moving account, the page is not on LRU. It's isolated.
- */

 /**
  * mem_cgroup_page_lruvec - return lruvec for adding an lru page
···
  *
  * Notes: Race condition
  *
- * We usually use lock_page_cgroup() for accessing page_cgroup member but
- * it tends to be costly. But considering some conditions, we doesn't need
- * to do so _always_.
+ * Charging occurs during page instantiation, while the page is
+ * unmapped and locked in page migration, or while the page table is
+ * locked in THP migration.  No race is possible.
  *
- * Considering "charge", lock_page_cgroup() is not required because all
- * file-stat operations happen after a page is attached to radix-tree. There
- * are no race with "charge".
+ * Uncharge happens to pages with zero references, no race possible.
  *
- * Considering "uncharge", we know that memcg doesn't clear pc->mem_cgroup
- * at "uncharge" intentionally. So, we always see valid pc->mem_cgroup even
- * if there are race with "uncharge". Statistics itself is properly handled
- * by flags.
- *
- * Considering "move", this is an only case we see a race. To make the race
- * small, we check memcg->moving_account and detect there are possibility
- * of race or not. If there is, we take a lock.
+ * Charge moving between groups is protected by checking mm->moving
+ * account and taking the move_lock in the slowpath.
  */

 void __mem_cgroup_begin_update_page_stat(struct page *page,
···
 	return mem_cgroup_from_id(id);
 }

+/*
+ * try_get_mem_cgroup_from_page - look up page's memcg association
+ * @page: the page
+ *
+ * Look up, get a css reference, and return the memcg that owns @page.
+ *
+ * The page must be locked to prevent racing with swap-in and page
+ * cache charges.  If coming from an unlocked page table, the caller
+ * must ensure the page is on the LRU or this can race with charging.
+ */
 struct mem_cgroup *try_get_mem_cgroup_from_page(struct page *page)
 {
 	struct mem_cgroup *memcg = NULL;
···
 	VM_BUG_ON_PAGE(!PageLocked(page), page);

 	pc = lookup_page_cgroup(page);
-	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc)) {
 		memcg = pc->mem_cgroup;
 		if (memcg && !css_tryget_online(&memcg->css))
···
 			memcg = NULL;
 		rcu_read_unlock();
 	}
-	unlock_page_cgroup(pc);
 	return memcg;
 }

+static void lock_page_lru(struct page *page, int *isolated)
+{
+	struct zone *zone = page_zone(page);
+
+	spin_lock_irq(&zone->lru_lock);
+	if (PageLRU(page)) {
+		struct lruvec *lruvec;
+
+		lruvec = mem_cgroup_page_lruvec(page, zone);
+		ClearPageLRU(page);
+		del_page_from_lru_list(page, lruvec, page_lru(page));
+		*isolated = 1;
+	} else
+		*isolated = 0;
+}
+
+static void unlock_page_lru(struct page *page, int isolated)
+{
+	struct zone *zone = page_zone(page);
+
+	if (isolated) {
+		struct lruvec *lruvec;
+
+		lruvec = mem_cgroup_page_lruvec(page, zone);
+		VM_BUG_ON_PAGE(PageLRU(page), page);
+		SetPageLRU(page);
+		add_page_to_lru_list(page, lruvec, page_lru(page));
+	}
+	spin_unlock_irq(&zone->lru_lock);
+}
+
 static void commit_charge(struct page *page, struct mem_cgroup *memcg,
-			  unsigned int nr_pages, bool anon, bool lrucare)
+			  unsigned int nr_pages, bool lrucare)
 {
 	struct page_cgroup *pc = lookup_page_cgroup(page);
-	struct zone *uninitialized_var(zone);
-	struct lruvec *lruvec;
-	bool was_on_lru = false;
+	int isolated;

-	lock_page_cgroup(pc);
 	VM_BUG_ON_PAGE(PageCgroupUsed(pc), page);
 	/*
 	 * we don't need page_cgroup_lock about tail pages, becase they are not
···
 	 * In some cases, SwapCache and FUSE(splice_buf->radixtree), the page
 	 * may already be on some other mem_cgroup's LRU. Take care of it.
 	 */
-	if (lrucare) {
-		zone = page_zone(page);
-		spin_lock_irq(&zone->lru_lock);
-		if (PageLRU(page)) {
-			lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup);
-			ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, page_lru(page));
-			was_on_lru = true;
-		}
-	}
+	if (lrucare)
+		lock_page_lru(page, &isolated);

+	/*
+	 * Nobody should be changing or seriously looking at
+	 * pc->mem_cgroup and pc->flags at this point:
+	 *
+	 * - the page is uncharged
+	 *
+	 * - the page is off-LRU
+	 *
+	 * - an anonymous fault has exclusive page access, except for
+	 *   a locked page table
+	 *
+	 * - a page cache insertion, a swapin fault, or a migration
+	 *   have the page locked
+	 */
 	pc->mem_cgroup = memcg;
-	SetPageCgroupUsed(pc);
+	pc->flags = PCG_USED | PCG_MEM | (do_swap_account ? PCG_MEMSW : 0);

-	if (lrucare) {
-		if (was_on_lru) {
-			lruvec = mem_cgroup_zone_lruvec(zone, pc->mem_cgroup);
-			VM_BUG_ON_PAGE(PageLRU(page), page);
-			SetPageLRU(page);
-			add_page_to_lru_list(page, lruvec, page_lru(page));
-		}
-		spin_unlock_irq(&zone->lru_lock);
-	}
+	if (lrucare)
+		unlock_page_lru(page, isolated);

-	mem_cgroup_charge_statistics(memcg, page, anon, nr_pages);
-	unlock_page_cgroup(pc);
-
+	local_irq_disable();
+	mem_cgroup_charge_statistics(memcg, page, nr_pages);
 	/*
 	 * "charge_statistics" updated event counter. Then, check it.
 	 * Insert ancestor (and ancestor's ancestors), to softlimit RB-tree.
 	 * if they exceeds softlimit.
 	 */
 	memcg_check_events(memcg, page);
+	local_irq_enable();
 }

 static DEFINE_MUTEX(set_limit_mutex);
···

 #ifdef CONFIG_TRANSPARENT_HUGEPAGE

-#define PCGF_NOCOPY_AT_SPLIT (1 << PCG_LOCK | 1 << PCG_MIGRATION)
 /*
  * Because tail pages are not marked as "used", set it. We're under
  * zone->lru_lock, 'splitting on pmd' and compound_lock.
···
 	for (i = 1; i < HPAGE_PMD_NR; i++) {
 		pc = head_pc + i;
 		pc->mem_cgroup = memcg;
-		pc->flags = head_pc->flags & ~PCGF_NOCOPY_AT_SPLIT;
+		pc->flags = head_pc->flags;
 	}
 	__this_cpu_sub(memcg->stat->count[MEM_CGROUP_STAT_RSS_HUGE],
 		       HPAGE_PMD_NR);
···
 {
 	unsigned long flags;
 	int ret;
-	bool anon = PageAnon(page);

 	VM_BUG_ON(from == to);
 	VM_BUG_ON_PAGE(PageLRU(page), page);
···
 	if (nr_pages > 1 && !PageTransHuge(page))
 		goto out;

-	lock_page_cgroup(pc);
+	/*
+	 * Prevent mem_cgroup_migrate() from looking at pc->mem_cgroup
+	 * of its source page while we change it: page migration takes
+	 * both pages off the LRU, but page cache replacement doesn't.
+	 */
+	if (!trylock_page(page))
+		goto out;

 	ret = -EINVAL;
 	if (!PageCgroupUsed(pc) || pc->mem_cgroup != from)
-		goto unlock;
+		goto out_unlock;

 	move_lock_mem_cgroup(from, &flags);

-	if (!anon && page_mapped(page)) {
+	if (!PageAnon(page) && page_mapped(page)) {
 		__this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
 			       nr_pages);
 		__this_cpu_add(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
···
 			       nr_pages);
 	}

-	mem_cgroup_charge_statistics(from, page, anon, -nr_pages);
+	/*
+	 * It is safe to change pc->mem_cgroup here because the page
+	 * is referenced, charged, and isolated - we can't race with
+	 * uncharging, charging, migration, or LRU putback.
+	 */

 	/* caller should have done css_get */
 	pc->mem_cgroup = to;
-	mem_cgroup_charge_statistics(to, page, anon, nr_pages);
 	move_unlock_mem_cgroup(from, &flags);
 	ret = 0;
-unlock:
-	unlock_page_cgroup(pc);
-	/*
-	 * check events
-	 */
+
+	local_irq_disable();
+	mem_cgroup_charge_statistics(to, page, nr_pages);
 	memcg_check_events(to, page);
+	mem_cgroup_charge_statistics(from, page, -nr_pages);
 	memcg_check_events(from, page);
+	local_irq_enable();
+out_unlock:
+	unlock_page(page);
 out:
 	return ret;
 }
···
 	return ret;
 }

-static void mem_cgroup_do_uncharge(struct mem_cgroup *memcg,
-				   unsigned int nr_pages,
-				   const enum charge_type ctype)
-{
-	struct memcg_batch_info *batch = NULL;
-	bool uncharge_memsw = true;
-
-	/* If swapout, usage of swap doesn't decrease */
-	if (!do_swap_account || ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT)
-		uncharge_memsw = false;
-
-	batch = &current->memcg_batch;
-	/*
-	 * In usual, we do css_get() when we remember memcg pointer.
-	 * But in this case, we keep res->usage until end of a series of
-	 * uncharges. Then, it's ok to ignore memcg's refcnt.
-	 */
-	if (!batch->memcg)
-		batch->memcg = memcg;
-	/*
-	 * do_batch > 0 when unmapping pages or inode invalidate/truncate.
-	 * In those cases, all pages freed continuously can be expected to be in
-	 * the same cgroup and we have chance to coalesce uncharges.
-	 * But we do uncharge one by one if this is killed by OOM(TIF_MEMDIE)
-	 * because we want to do uncharge as soon as possible.
-	 */
-
-	if (!batch->do_batch || test_thread_flag(TIF_MEMDIE))
-		goto direct_uncharge;
-
-	if (nr_pages > 1)
-		goto direct_uncharge;
-
-	/*
-	 * In typical case, batch->memcg == mem. This means we can
-	 * merge a series of uncharges to an uncharge of res_counter.
-	 * If not, we uncharge res_counter ony by one.
-	 */
-	if (batch->memcg != memcg)
-		goto direct_uncharge;
-	/* remember freed charge and uncharge it later */
-	batch->nr_pages++;
-	if (uncharge_memsw)
-		batch->memsw_nr_pages++;
-	return;
-direct_uncharge:
-	res_counter_uncharge(&memcg->res, nr_pages * PAGE_SIZE);
-	if (uncharge_memsw)
-		res_counter_uncharge(&memcg->memsw, nr_pages * PAGE_SIZE);
-	if (unlikely(batch->memcg != memcg))
-		memcg_oom_recover(memcg);
-}
-
-/*
- * uncharge if !page_mapped(page)
- */
-static struct mem_cgroup *
-__mem_cgroup_uncharge_common(struct page *page, enum charge_type ctype,
-			     bool end_migration)
-{
-	struct mem_cgroup *memcg = NULL;
-	unsigned int nr_pages = 1;
-	struct page_cgroup *pc;
-	bool anon;
-
-	if (mem_cgroup_disabled())
-		return NULL;
-
-	if (PageTransHuge(page)) {
-		nr_pages <<= compound_order(page);
-		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
-	}
-	/*
-	 * Check if our page_cgroup is valid
-	 */
-	pc = lookup_page_cgroup(page);
-	if (unlikely(!PageCgroupUsed(pc)))
-		return NULL;
-
-	lock_page_cgroup(pc);
-
-	memcg = pc->mem_cgroup;
-
-	if (!PageCgroupUsed(pc))
-		goto unlock_out;
-
-	anon = PageAnon(page);
-
-	switch (ctype) {
-	case MEM_CGROUP_CHARGE_TYPE_ANON:
-		/*
-		 * Generally PageAnon tells if it's the anon statistics to be
-		 * updated; but sometimes e.g. mem_cgroup_uncharge_page() is
-		 * used before page reached the stage of being marked PageAnon.
-		 */
-		anon = true;
-		/* fallthrough */
-	case MEM_CGROUP_CHARGE_TYPE_DROP:
-		/* See mem_cgroup_prepare_migration() */
-		if (page_mapped(page))
-			goto unlock_out;
-		/*
-		 * Pages under migration may not be uncharged.  But
-		 * end_migration() /must/ be the one uncharging the
-		 * unused post-migration page and so it has to call
-		 * here with the migration bit still set.  See the
-		 * res_counter handling below.
-		 */
-		if (!end_migration && PageCgroupMigration(pc))
-			goto unlock_out;
-		break;
-	case MEM_CGROUP_CHARGE_TYPE_SWAPOUT:
-		if (!PageAnon(page)) {	/* Shared memory */
-			if (page->mapping && !page_is_file_cache(page))
-				goto unlock_out;
-		} else if (page_mapped(page)) /* Anon */
-			goto unlock_out;
-		break;
-	default:
-		break;
-	}
-
-	mem_cgroup_charge_statistics(memcg, page, anon, -nr_pages);
-
-	ClearPageCgroupUsed(pc);
-	/*
-	 * pc->mem_cgroup is not cleared here. It will be accessed when it's
-	 * freed from LRU. This is safe because uncharged page is expected not
-	 * to be reused (freed soon). Exception is SwapCache, it's handled by
-	 * special functions.
-	 */
-
-	unlock_page_cgroup(pc);
-	/*
-	 * even after unlock, we have memcg->res.usage here and this memcg
-	 * will never be freed, so it's safe to call css_get().
-	 */
-	memcg_check_events(memcg, page);
-	if (do_swap_account && ctype == MEM_CGROUP_CHARGE_TYPE_SWAPOUT) {
-		mem_cgroup_swap_statistics(memcg, true);
-		css_get(&memcg->css);
-	}
-	/*
-	 * Migration does not charge the res_counter for the
-	 * replacement page, so leave it alone when phasing out the
-	 * page that is unused after the migration.
-	 */
-	if (!end_migration)
-		mem_cgroup_do_uncharge(memcg, nr_pages, ctype);
-
-	return memcg;
-
-unlock_out:
-	unlock_page_cgroup(pc);
-	return NULL;
-}
-
-void mem_cgroup_uncharge_page(struct page *page)
-{
-	/* early check.
*/ 3744 - if (page_mapped(page)) 3745 - return; 3746 - VM_BUG_ON_PAGE(page->mapping && !PageAnon(page), page); 3747 - /* 3748 - * If the page is in swap cache, uncharge should be deferred 3749 - * to the swap path, which also properly accounts swap usage 3750 - * and handles memcg lifetime. 3751 - * 3752 - * Note that this check is not stable and reclaim may add the 3753 - * page to swap cache at any time after this. However, if the 3754 - * page is not in swap cache by the time page->mapcount hits 3755 - * 0, there won't be any page table references to the swap 3756 - * slot, and reclaim will free it and not actually write the 3757 - * page to disk. 3758 - */ 3759 - if (PageSwapCache(page)) 3760 - return; 3761 - __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_ANON, false); 3762 - } 3763 - 3764 - void mem_cgroup_uncharge_cache_page(struct page *page) 3765 - { 3766 - VM_BUG_ON_PAGE(page_mapped(page), page); 3767 - VM_BUG_ON_PAGE(page->mapping, page); 3768 - __mem_cgroup_uncharge_common(page, MEM_CGROUP_CHARGE_TYPE_CACHE, false); 3769 - } 3770 - 3771 3569 /* 3772 3570 * Batch_start/batch_end is called in unmap_page_range/invlidate/trucate. 3773 3571 * In that cases, pages are freed continuously and we can expect pages ··· 3591 3763 3592 3764 void mem_cgroup_uncharge_start(void) 3593 3765 { 3766 + unsigned long flags; 3767 + 3768 + local_irq_save(flags); 3594 3769 current->memcg_batch.do_batch++; 3595 3770 /* We can do nest. */ 3596 3771 if (current->memcg_batch.do_batch == 1) { ··· 3601 3770 current->memcg_batch.nr_pages = 0; 3602 3771 current->memcg_batch.memsw_nr_pages = 0; 3603 3772 } 3773 + local_irq_restore(flags); 3604 3774 } 3605 3775 3606 3776 void mem_cgroup_uncharge_end(void) 3607 3777 { 3608 3778 struct memcg_batch_info *batch = &current->memcg_batch; 3779 + unsigned long flags; 3609 3780 3610 - if (!batch->do_batch) 3611 - return; 3612 - 3613 - batch->do_batch--; 3614 - if (batch->do_batch) /* If stacked, do nothing. 
*/ 3615 - return; 3616 - 3617 - if (!batch->memcg) 3618 - return; 3781 + local_irq_save(flags); 3782 + VM_BUG_ON(!batch->do_batch); 3783 + if (--batch->do_batch) /* If stacked, do nothing */ 3784 + goto out; 3619 3785 /* 3620 3786 * This "batch->memcg" is valid without any css_get/put etc... 3621 3787 * bacause we hide charges behind us. ··· 3624 3796 res_counter_uncharge(&batch->memcg->memsw, 3625 3797 batch->memsw_nr_pages * PAGE_SIZE); 3626 3798 memcg_oom_recover(batch->memcg); 3627 - /* forget this pointer (for sanity check) */ 3628 - batch->memcg = NULL; 3799 + out: 3800 + local_irq_restore(flags); 3629 3801 } 3630 - 3631 - #ifdef CONFIG_SWAP 3632 - /* 3633 - * called after __delete_from_swap_cache() and drop "page" account. 3634 - * memcg information is recorded to swap_cgroup of "ent" 3635 - */ 3636 - void 3637 - mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout) 3638 - { 3639 - struct mem_cgroup *memcg; 3640 - int ctype = MEM_CGROUP_CHARGE_TYPE_SWAPOUT; 3641 - 3642 - if (!swapout) /* this was a swap cache but the swap is unused ! */ 3643 - ctype = MEM_CGROUP_CHARGE_TYPE_DROP; 3644 - 3645 - memcg = __mem_cgroup_uncharge_common(page, ctype, false); 3646 - 3647 - /* 3648 - * record memcg information, if swapout && memcg != NULL, 3649 - * css_get() was called in uncharge(). 3650 - */ 3651 - if (do_swap_account && swapout && memcg) 3652 - swap_cgroup_record(ent, mem_cgroup_id(memcg)); 3653 - } 3654 - #endif 3655 3802 3656 3803 #ifdef CONFIG_MEMCG_SWAP 3657 - /* 3658 - * called from swap_entry_free(). remove record in swap_cgroup and 3659 - * uncharge "memsw" account. 
3660 - */ 3661 - void mem_cgroup_uncharge_swap(swp_entry_t ent) 3804 + static void mem_cgroup_swap_statistics(struct mem_cgroup *memcg, 3805 + bool charge) 3662 3806 { 3663 - struct mem_cgroup *memcg; 3664 - unsigned short id; 3665 - 3666 - if (!do_swap_account) 3667 - return; 3668 - 3669 - id = swap_cgroup_record(ent, 0); 3670 - rcu_read_lock(); 3671 - memcg = mem_cgroup_lookup(id); 3672 - if (memcg) { 3673 - /* 3674 - * We uncharge this because swap is freed. This memcg can 3675 - * be obsolete one. We avoid calling css_tryget_online(). 3676 - */ 3677 - res_counter_uncharge(&memcg->memsw, PAGE_SIZE); 3678 - mem_cgroup_swap_statistics(memcg, false); 3679 - css_put(&memcg->css); 3680 - } 3681 - rcu_read_unlock(); 3807 + int val = (charge) ? 1 : -1; 3808 + this_cpu_add(memcg->stat->count[MEM_CGROUP_STAT_SWAP], val); 3682 3809 } 3683 3810 3684 3811 /** ··· 3684 3901 return -EINVAL; 3685 3902 } 3686 3903 #endif 3687 - 3688 - /* 3689 - * Before starting migration, account PAGE_SIZE to mem_cgroup that the old 3690 - * page belongs to. 3691 - */ 3692 - void mem_cgroup_prepare_migration(struct page *page, struct page *newpage, 3693 - struct mem_cgroup **memcgp) 3694 - { 3695 - struct mem_cgroup *memcg = NULL; 3696 - unsigned int nr_pages = 1; 3697 - struct page_cgroup *pc; 3698 - 3699 - *memcgp = NULL; 3700 - 3701 - if (mem_cgroup_disabled()) 3702 - return; 3703 - 3704 - if (PageTransHuge(page)) 3705 - nr_pages <<= compound_order(page); 3706 - 3707 - pc = lookup_page_cgroup(page); 3708 - lock_page_cgroup(pc); 3709 - if (PageCgroupUsed(pc)) { 3710 - memcg = pc->mem_cgroup; 3711 - css_get(&memcg->css); 3712 - /* 3713 - * At migrating an anonymous page, its mapcount goes down 3714 - * to 0 and uncharge() will be called. But, even if it's fully 3715 - * unmapped, migration may fail and this page has to be 3716 - * charged again. 
We set MIGRATION flag here and delay uncharge 3717 - * until end_migration() is called 3718 - * 3719 - * Corner Case Thinking 3720 - * A) 3721 - * When the old page was mapped as Anon and it's unmap-and-freed 3722 - * while migration was ongoing. 3723 - * If unmap finds the old page, uncharge() of it will be delayed 3724 - * until end_migration(). If unmap finds a new page, it's 3725 - * uncharged when it make mapcount to be 1->0. If unmap code 3726 - * finds swap_migration_entry, the new page will not be mapped 3727 - * and end_migration() will find it(mapcount==0). 3728 - * 3729 - * B) 3730 - * When the old page was mapped but migraion fails, the kernel 3731 - * remaps it. A charge for it is kept by MIGRATION flag even 3732 - * if mapcount goes down to 0. We can do remap successfully 3733 - * without charging it again. 3734 - * 3735 - * C) 3736 - * The "old" page is under lock_page() until the end of 3737 - * migration, so, the old page itself will not be swapped-out. 3738 - * If the new page is swapped out before end_migraton, our 3739 - * hook to usual swap-out path will catch the event. 3740 - */ 3741 - if (PageAnon(page)) 3742 - SetPageCgroupMigration(pc); 3743 - } 3744 - unlock_page_cgroup(pc); 3745 - /* 3746 - * If the page is not charged at this point, 3747 - * we return here. 3748 - */ 3749 - if (!memcg) 3750 - return; 3751 - 3752 - *memcgp = memcg; 3753 - /* 3754 - * We charge new page before it's used/mapped. So, even if unlock_page() 3755 - * is called before end_migration, we can catch all events on this new 3756 - * page. In the case new page is migrated but not remapped, new page's 3757 - * mapcount will be finally 0 and we call uncharge in end_migration(). 3758 - */ 3759 - /* 3760 - * The page is committed to the memcg, but it's not actually 3761 - * charged to the res_counter since we plan on replacing the 3762 - * old one and only one page is going to be left afterwards. 
3763 - */ 3764 - commit_charge(newpage, memcg, nr_pages, PageAnon(page), false); 3765 - } 3766 - 3767 - /* remove redundant charge if migration failed*/ 3768 - void mem_cgroup_end_migration(struct mem_cgroup *memcg, 3769 - struct page *oldpage, struct page *newpage, bool migration_ok) 3770 - { 3771 - struct page *used, *unused; 3772 - struct page_cgroup *pc; 3773 - bool anon; 3774 - 3775 - if (!memcg) 3776 - return; 3777 - 3778 - if (!migration_ok) { 3779 - used = oldpage; 3780 - unused = newpage; 3781 - } else { 3782 - used = newpage; 3783 - unused = oldpage; 3784 - } 3785 - anon = PageAnon(used); 3786 - __mem_cgroup_uncharge_common(unused, 3787 - anon ? MEM_CGROUP_CHARGE_TYPE_ANON 3788 - : MEM_CGROUP_CHARGE_TYPE_CACHE, 3789 - true); 3790 - css_put(&memcg->css); 3791 - /* 3792 - * We disallowed uncharge of pages under migration because mapcount 3793 - * of the page goes down to zero, temporarly. 3794 - * Clear the flag and check the page should be charged. 3795 - */ 3796 - pc = lookup_page_cgroup(oldpage); 3797 - lock_page_cgroup(pc); 3798 - ClearPageCgroupMigration(pc); 3799 - unlock_page_cgroup(pc); 3800 - 3801 - /* 3802 - * If a page is a file cache, radix-tree replacement is very atomic 3803 - * and we can skip this check. When it was an Anon page, its mapcount 3804 - * goes down to 0. But because we added MIGRATION flage, it's not 3805 - * uncharged yet. There are several case but page->mapcount check 3806 - * and USED bit check in mem_cgroup_uncharge_page() will do enough 3807 - * check. (see prepare_charge() also) 3808 - */ 3809 - if (anon) 3810 - mem_cgroup_uncharge_page(used); 3811 - } 3812 - 3813 - /* 3814 - * At replace page cache, newpage is not under any memcg but it's on 3815 - * LRU. So, this function doesn't touch res_counter but handles LRU 3816 - * in correct way. Both pages are locked so we cannot race with uncharge. 
3817 - */ 3818 - void mem_cgroup_replace_page_cache(struct page *oldpage, 3819 - struct page *newpage) 3820 - { 3821 - struct mem_cgroup *memcg = NULL; 3822 - struct page_cgroup *pc; 3823 - 3824 - if (mem_cgroup_disabled()) 3825 - return; 3826 - 3827 - pc = lookup_page_cgroup(oldpage); 3828 - /* fix accounting on old pages */ 3829 - lock_page_cgroup(pc); 3830 - if (PageCgroupUsed(pc)) { 3831 - memcg = pc->mem_cgroup; 3832 - mem_cgroup_charge_statistics(memcg, oldpage, false, -1); 3833 - ClearPageCgroupUsed(pc); 3834 - } 3835 - unlock_page_cgroup(pc); 3836 - 3837 - /* 3838 - * When called from shmem_replace_page(), in some cases the 3839 - * oldpage has already been charged, and in some cases not. 3840 - */ 3841 - if (!memcg) 3842 - return; 3843 - /* 3844 - * Even if newpage->mapping was NULL before starting replacement, 3845 - * the newpage may be on LRU(or pagevec for LRU) already. We lock 3846 - * LRU while we overwrite pc->mem_cgroup. 3847 - */ 3848 - commit_charge(newpage, memcg, 1, false, true); 3849 - } 3850 3904 3851 3905 #ifdef CONFIG_DEBUG_VM 3852 3906 static struct page_cgroup *lookup_page_cgroup_used(struct page *page) ··· 3883 4263 gfp_mask, &nr_scanned); 3884 4264 nr_reclaimed += reclaimed; 3885 4265 *total_scanned += nr_scanned; 3886 - spin_lock(&mctz->lock); 4266 + spin_lock_irq(&mctz->lock); 3887 4267 3888 4268 /* 3889 4269 * If we failed to reclaim anything from this memory cgroup ··· 3923 4303 */ 3924 4304 /* If excess == 0, no tree ops */ 3925 4305 __mem_cgroup_insert_exceeded(mz, mctz, excess); 3926 - spin_unlock(&mctz->lock); 4306 + spin_unlock_irq(&mctz->lock); 3927 4307 css_put(&mz->memcg->css); 3928 4308 loop++; 3929 4309 /* ··· 5885 6265 if (page) { 5886 6266 pc = lookup_page_cgroup(page); 5887 6267 /* 5888 - * Do only loose check w/o page_cgroup lock. 5889 - * mem_cgroup_move_account() checks the pc is valid or not under 5890 - * the lock. 6268 + * Do only loose check w/o serialization. 
6269 + * mem_cgroup_move_account() checks the pc is valid or 6270 + * not under LRU exclusion. 5891 6271 */ 5892 6272 if (PageCgroupUsed(pc) && pc->mem_cgroup == mc.from) { 5893 6273 ret = MC_TARGET_PAGE; ··· 6349 6729 } 6350 6730 #endif 6351 6731 6732 + #ifdef CONFIG_MEMCG_SWAP 6733 + /** 6734 + * mem_cgroup_swapout - transfer a memsw charge to swap 6735 + * @page: page whose memsw charge to transfer 6736 + * @entry: swap entry to move the charge to 6737 + * 6738 + * Transfer the memsw charge of @page to @entry. 6739 + */ 6740 + void mem_cgroup_swapout(struct page *page, swp_entry_t entry) 6741 + { 6742 + struct page_cgroup *pc; 6743 + unsigned short oldid; 6744 + 6745 + VM_BUG_ON_PAGE(PageLRU(page), page); 6746 + VM_BUG_ON_PAGE(page_count(page), page); 6747 + 6748 + if (!do_swap_account) 6749 + return; 6750 + 6751 + pc = lookup_page_cgroup(page); 6752 + 6753 + /* Readahead page, never charged */ 6754 + if (!PageCgroupUsed(pc)) 6755 + return; 6756 + 6757 + VM_BUG_ON_PAGE(!(pc->flags & PCG_MEMSW), page); 6758 + 6759 + oldid = swap_cgroup_record(entry, mem_cgroup_id(pc->mem_cgroup)); 6760 + VM_BUG_ON_PAGE(oldid, page); 6761 + 6762 + pc->flags &= ~PCG_MEMSW; 6763 + css_get(&pc->mem_cgroup->css); 6764 + mem_cgroup_swap_statistics(pc->mem_cgroup, true); 6765 + } 6766 + 6767 + /** 6768 + * mem_cgroup_uncharge_swap - uncharge a swap entry 6769 + * @entry: swap entry to uncharge 6770 + * 6771 + * Drop the memsw charge associated with @entry. 
6772 + */ 6773 + void mem_cgroup_uncharge_swap(swp_entry_t entry) 6774 + { 6775 + struct mem_cgroup *memcg; 6776 + unsigned short id; 6777 + 6778 + if (!do_swap_account) 6779 + return; 6780 + 6781 + id = swap_cgroup_record(entry, 0); 6782 + rcu_read_lock(); 6783 + memcg = mem_cgroup_lookup(id); 6784 + if (memcg) { 6785 + res_counter_uncharge(&memcg->memsw, PAGE_SIZE); 6786 + mem_cgroup_swap_statistics(memcg, false); 6787 + css_put(&memcg->css); 6788 + } 6789 + rcu_read_unlock(); 6790 + } 6791 + #endif 6792 + 6352 6793 /** 6353 6794 * mem_cgroup_try_charge - try charging a page 6354 6795 * @page: page to charge ··· 6512 6831 VM_BUG_ON_PAGE(!PageTransHuge(page), page); 6513 6832 } 6514 6833 6515 - commit_charge(page, memcg, nr_pages, PageAnon(page), lrucare); 6834 + commit_charge(page, memcg, nr_pages, lrucare); 6516 6835 6517 6836 if (do_swap_account && PageSwapCache(page)) { 6518 6837 swp_entry_t entry = { .val = page_private(page) }; ··· 6552 6871 } 6553 6872 6554 6873 cancel_charge(memcg, nr_pages); 6874 + } 6875 + 6876 + /** 6877 + * mem_cgroup_uncharge - uncharge a page 6878 + * @page: page to uncharge 6879 + * 6880 + * Uncharge a page previously charged with mem_cgroup_try_charge() and 6881 + * mem_cgroup_commit_charge(). 
6882 + */ 6883 + void mem_cgroup_uncharge(struct page *page) 6884 + { 6885 + struct memcg_batch_info *batch; 6886 + unsigned int nr_pages = 1; 6887 + struct mem_cgroup *memcg; 6888 + struct page_cgroup *pc; 6889 + unsigned long pc_flags; 6890 + unsigned long flags; 6891 + 6892 + VM_BUG_ON_PAGE(PageLRU(page), page); 6893 + VM_BUG_ON_PAGE(page_count(page), page); 6894 + 6895 + if (mem_cgroup_disabled()) 6896 + return; 6897 + 6898 + pc = lookup_page_cgroup(page); 6899 + 6900 + /* Every final put_page() ends up here */ 6901 + if (!PageCgroupUsed(pc)) 6902 + return; 6903 + 6904 + if (PageTransHuge(page)) { 6905 + nr_pages <<= compound_order(page); 6906 + VM_BUG_ON_PAGE(!PageTransHuge(page), page); 6907 + } 6908 + /* 6909 + * Nobody should be changing or seriously looking at 6910 + * pc->mem_cgroup and pc->flags at this point, we have fully 6911 + * exclusive access to the page. 6912 + */ 6913 + memcg = pc->mem_cgroup; 6914 + pc_flags = pc->flags; 6915 + pc->flags = 0; 6916 + 6917 + local_irq_save(flags); 6918 + 6919 + if (nr_pages > 1) 6920 + goto direct; 6921 + if (unlikely(test_thread_flag(TIF_MEMDIE))) 6922 + goto direct; 6923 + batch = &current->memcg_batch; 6924 + if (!batch->do_batch) 6925 + goto direct; 6926 + if (batch->memcg && batch->memcg != memcg) 6927 + goto direct; 6928 + if (!batch->memcg) 6929 + batch->memcg = memcg; 6930 + if (pc_flags & PCG_MEM) 6931 + batch->nr_pages++; 6932 + if (pc_flags & PCG_MEMSW) 6933 + batch->memsw_nr_pages++; 6934 + goto out; 6935 + direct: 6936 + if (pc_flags & PCG_MEM) 6937 + res_counter_uncharge(&memcg->res, nr_pages * PAGE_SIZE); 6938 + if (pc_flags & PCG_MEMSW) 6939 + res_counter_uncharge(&memcg->memsw, nr_pages * PAGE_SIZE); 6940 + memcg_oom_recover(memcg); 6941 + out: 6942 + mem_cgroup_charge_statistics(memcg, page, -nr_pages); 6943 + memcg_check_events(memcg, page); 6944 + 6945 + local_irq_restore(flags); 6946 + } 6947 + 6948 + /** 6949 + * mem_cgroup_migrate - migrate a charge to another page 6950 + * @oldpage: 
currently charged page 6951 + * @newpage: page to transfer the charge to 6952 + * @lrucare: both pages might be on the LRU already 6953 + * 6954 + * Migrate the charge from @oldpage to @newpage. 6955 + * 6956 + * Both pages must be locked, @newpage->mapping must be set up. 6957 + */ 6958 + void mem_cgroup_migrate(struct page *oldpage, struct page *newpage, 6959 + bool lrucare) 6960 + { 6961 + unsigned int nr_pages = 1; 6962 + struct page_cgroup *pc; 6963 + int isolated; 6964 + 6965 + VM_BUG_ON_PAGE(!PageLocked(oldpage), oldpage); 6966 + VM_BUG_ON_PAGE(!PageLocked(newpage), newpage); 6967 + VM_BUG_ON_PAGE(!lrucare && PageLRU(oldpage), oldpage); 6968 + VM_BUG_ON_PAGE(!lrucare && PageLRU(newpage), newpage); 6969 + VM_BUG_ON_PAGE(PageAnon(oldpage) != PageAnon(newpage), newpage); 6970 + 6971 + if (mem_cgroup_disabled()) 6972 + return; 6973 + 6974 + /* Page cache replacement: new page already charged? */ 6975 + pc = lookup_page_cgroup(newpage); 6976 + if (PageCgroupUsed(pc)) 6977 + return; 6978 + 6979 + /* Re-entrant migration: old page already uncharged? */ 6980 + pc = lookup_page_cgroup(oldpage); 6981 + if (!PageCgroupUsed(pc)) 6982 + return; 6983 + 6984 + VM_BUG_ON_PAGE(!(pc->flags & PCG_MEM), oldpage); 6985 + VM_BUG_ON_PAGE(do_swap_account && !(pc->flags & PCG_MEMSW), oldpage); 6986 + 6987 + if (PageTransHuge(oldpage)) { 6988 + nr_pages <<= compound_order(oldpage); 6989 + VM_BUG_ON_PAGE(!PageTransHuge(oldpage), oldpage); 6990 + VM_BUG_ON_PAGE(!PageTransHuge(newpage), newpage); 6991 + } 6992 + 6993 + if (lrucare) 6994 + lock_page_lru(oldpage, &isolated); 6995 + 6996 + pc->flags = 0; 6997 + 6998 + if (lrucare) 6999 + unlock_page_lru(oldpage, isolated); 7000 + 7001 + local_irq_disable(); 7002 + mem_cgroup_charge_statistics(pc->mem_cgroup, oldpage, -nr_pages); 7003 + memcg_check_events(pc->mem_cgroup, oldpage); 7004 + local_irq_enable(); 7005 + 7006 + commit_charge(newpage, pc->mem_cgroup, nr_pages, lrucare); 6555 7007 } 6556 7008 6557 7009 /*
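The batching behavior of the new mem_cgroup_uncharge() can be sketched as a small userspace model. Everything below is illustrative: the struct names, the `oom_killed` flag standing in for TIF_MEMDIE, and the bare `res_usage` counter standing in for res_counter are assumptions of this sketch, not the kernel's actual types. The point is the decision chain: huge pages, OOM-killed tasks, pages outside an open batch, and pages from a different group than the batch all go the direct route; everything else is coalesced and flushed by the outermost mem_cgroup_uncharge_end().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct memcg { long res_usage; };

static struct {
	int do_batch;         /* nesting depth of uncharge_start() */
	struct memcg *memcg;  /* group the open batch belongs to */
	long nr_pages;
} batch;                      /* stands in for current->memcg_batch */

static void uncharge_start(void)
{
	if (++batch.do_batch == 1) {
		batch.memcg = NULL;
		batch.nr_pages = 0;
	}
}

static void uncharge_end(void)
{
	if (--batch.do_batch)
		return;                        /* still nested, keep batching */
	if (batch.memcg)
		batch.memcg->res_usage -= batch.nr_pages;
}

static void uncharge(struct memcg *memcg, unsigned int nr_pages,
		     bool oom_killed)
{
	if (nr_pages > 1 || oom_killed || !batch.do_batch ||
	    (batch.memcg && batch.memcg != memcg)) {
		memcg->res_usage -= nr_pages;  /* direct uncharge */
		return;
	}
	batch.memcg = memcg;
	batch.nr_pages += nr_pages;            /* coalesced, flushed at end */
}
```

Nesting works because only the transition to depth 1 resets the batch and only the transition back to depth 0 flushes it, mirroring the `do_batch` counter in the patch.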
mm/memory.c (-2)
··· 1292 1292 details = NULL; 1293 1293 1294 1294 BUG_ON(addr >= end); 1295 - mem_cgroup_uncharge_start(); 1296 1295 tlb_start_vma(tlb, vma); 1297 1296 pgd = pgd_offset(vma->vm_mm, addr); 1298 1297 do { ··· 1301 1302 next = zap_pud_range(tlb, vma, pgd, addr, next, details); 1302 1303 } while (pgd++, addr = next, addr != end); 1303 1304 tlb_end_vma(tlb, vma); 1304 - mem_cgroup_uncharge_end(); 1305 1305 } 1306 1306 1307 1307
mm/migrate.c (+9 -29)
··· 780 780 if (rc != MIGRATEPAGE_SUCCESS) { 781 781 newpage->mapping = NULL; 782 782 } else { 783 + mem_cgroup_migrate(page, newpage, false); 783 784 if (remap_swapcache) 784 785 remove_migration_ptes(page, newpage); 785 786 page->mapping = NULL; ··· 796 795 { 797 796 int rc = -EAGAIN; 798 797 int remap_swapcache = 1; 799 - struct mem_cgroup *mem; 800 798 struct anon_vma *anon_vma = NULL; 801 799 802 800 if (!trylock_page(page)) { ··· 821 821 lock_page(page); 822 822 } 823 823 824 - /* charge against new page */ 825 - mem_cgroup_prepare_migration(page, newpage, &mem); 826 - 827 824 if (PageWriteback(page)) { 828 825 /* 829 826 * Only in the case of a full synchronous migration is it ··· 830 833 */ 831 834 if (mode != MIGRATE_SYNC) { 832 835 rc = -EBUSY; 833 - goto uncharge; 836 + goto out_unlock; 834 837 } 835 838 if (!force) 836 - goto uncharge; 839 + goto out_unlock; 837 840 wait_on_page_writeback(page); 838 841 } 839 842 /* ··· 869 872 */ 870 873 remap_swapcache = 0; 871 874 } else { 872 - goto uncharge; 875 + goto out_unlock; 873 876 } 874 877 } 875 878 ··· 882 885 * the page migration right away (proteced by page lock). 
883 886 */ 884 887 rc = balloon_page_migrate(newpage, page, mode); 885 - goto uncharge; 888 + goto out_unlock; 886 889 } 887 890 888 891 /* ··· 901 904 VM_BUG_ON_PAGE(PageAnon(page), page); 902 905 if (page_has_private(page)) { 903 906 try_to_free_buffers(page); 904 - goto uncharge; 907 + goto out_unlock; 905 908 } 906 909 goto skip_unmap; 907 910 } ··· 920 923 if (anon_vma) 921 924 put_anon_vma(anon_vma); 922 925 923 - uncharge: 924 - mem_cgroup_end_migration(mem, page, newpage, 925 - (rc == MIGRATEPAGE_SUCCESS || 926 - rc == MIGRATEPAGE_BALLOON_SUCCESS)); 926 + out_unlock: 927 927 unlock_page(page); 928 928 out: 929 929 return rc; ··· 1780 1786 pg_data_t *pgdat = NODE_DATA(node); 1781 1787 int isolated = 0; 1782 1788 struct page *new_page = NULL; 1783 - struct mem_cgroup *memcg = NULL; 1784 1789 int page_lru = page_is_file_cache(page); 1785 1790 unsigned long mmun_start = address & HPAGE_PMD_MASK; 1786 1791 unsigned long mmun_end = mmun_start + HPAGE_PMD_SIZE; ··· 1845 1852 goto out_unlock; 1846 1853 } 1847 1854 1848 - /* 1849 - * Traditional migration needs to prepare the memcg charge 1850 - * transaction early to prevent the old page from being 1851 - * uncharged when installing migration entries. Here we can 1852 - * save the potential rollback and start the charge transfer 1853 - * only when migration is already known to end successfully. 1854 - */ 1855 - mem_cgroup_prepare_migration(page, new_page, &memcg); 1856 - 1857 1855 orig_entry = *pmd; 1858 1856 entry = mk_pmd(new_page, vma->vm_page_prot); 1859 1857 entry = pmd_mkhuge(entry); ··· 1872 1888 goto fail_putback; 1873 1889 } 1874 1890 1891 + mem_cgroup_migrate(page, new_page, false); 1892 + 1875 1893 page_remove_rmap(page); 1876 1894 1877 - /* 1878 - * Finish the charge transaction under the page table lock to 1879 - * prevent split_huge_page() from dividing up the charge 1880 - * before it's fully transferred to the new page. 
1881 - */ 1882 - mem_cgroup_end_migration(memcg, page, new_page, true); 1883 1895 spin_unlock(ptl); 1884 1896 mmu_notifier_invalidate_range_end(mm, mmun_start, mmun_end); 1885 1897
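The single mem_cgroup_migrate() call that replaces the prepare/end pair relies on its re-entrancy checks: the charge moves from the old page exactly once, and a second call, or a call against an already-charged new page, is a no-op. A minimal userspace model of just those checks, with illustrative stand-in types (PCG_USED, page_cgroup, and the `pages` statistic are simplifications, not the kernel's definitions):

```c
#include <assert.h>
#include <stddef.h>

#define PCG_USED 0x1

struct memcg { long pages; };

struct page_cgroup {
	struct memcg *mem_cgroup;
	unsigned long flags;
};

static void migrate_charge(struct page_cgroup *oldpc,
			   struct page_cgroup *newpc)
{
	if (newpc->flags & PCG_USED)      /* new page already charged? */
		return;
	if (!(oldpc->flags & PCG_USED))   /* already migrated, or never charged */
		return;
	oldpc->flags = 0;
	oldpc->mem_cgroup->pages--;       /* charge_statistics(-nr) on oldpage */
	newpc->mem_cgroup = oldpc->mem_cgroup;
	newpc->flags = PCG_USED;          /* commit_charge() on newpage */
	newpc->mem_cgroup->pages++;
}
```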
mm/rmap.c (-1)
··· 1089 1089 if (unlikely(PageHuge(page))) 1090 1090 goto out; 1091 1091 if (anon) { 1092 - mem_cgroup_uncharge_page(page); 1093 1092 if (PageTransHuge(page)) 1094 1093 __dec_zone_page_state(page, 1095 1094 NR_ANON_TRANSPARENT_HUGEPAGES);
mm/shmem.c (+2 -6)
··· 419 419 pvec.pages, indices); 420 420 if (!pvec.nr) 421 421 break; 422 - mem_cgroup_uncharge_start(); 423 422 for (i = 0; i < pagevec_count(&pvec); i++) { 424 423 struct page *page = pvec.pages[i]; 425 424 ··· 446 447 } 447 448 pagevec_remove_exceptionals(&pvec); 448 449 pagevec_release(&pvec); 449 - mem_cgroup_uncharge_end(); 450 450 cond_resched(); 451 451 index++; 452 452 } ··· 493 495 index = start; 494 496 continue; 495 497 } 496 - mem_cgroup_uncharge_start(); 497 498 for (i = 0; i < pagevec_count(&pvec); i++) { 498 499 struct page *page = pvec.pages[i]; 499 500 ··· 528 531 } 529 532 pagevec_remove_exceptionals(&pvec); 530 533 pagevec_release(&pvec); 531 - mem_cgroup_uncharge_end(); 532 534 index++; 533 535 } 534 536 ··· 831 835 } 832 836 833 837 mutex_unlock(&shmem_swaplist_mutex); 834 - swapcache_free(swap, NULL); 838 + swapcache_free(swap); 835 839 redirty: 836 840 set_page_dirty(page); 837 841 if (wbc->for_reclaim) ··· 1004 1008 */ 1005 1009 oldpage = newpage; 1006 1010 } else { 1007 - mem_cgroup_replace_page_cache(oldpage, newpage); 1011 + mem_cgroup_migrate(oldpage, newpage, false); 1008 1012 lru_cache_add_anon(newpage); 1009 1013 *pagep = newpage; 1010 1014 }
mm/swap.c (+6)
··· 62 62 del_page_from_lru_list(page, lruvec, page_off_lru(page)); 63 63 spin_unlock_irqrestore(&zone->lru_lock, flags); 64 64 } 65 + mem_cgroup_uncharge(page); 65 66 } 66 67 67 68 static void __put_single_page(struct page *page) ··· 908 907 struct lruvec *lruvec; 909 908 unsigned long uninitialized_var(flags); 910 909 910 + mem_cgroup_uncharge_start(); 911 + 911 912 for (i = 0; i < nr; i++) { 912 913 struct page *page = pages[i]; 913 914 ··· 941 938 __ClearPageLRU(page); 942 939 del_page_from_lru_list(page, lruvec, page_off_lru(page)); 943 940 } 941 + mem_cgroup_uncharge(page); 944 942 945 943 /* Clear Active bit in case of parallel mark_page_accessed */ 946 944 __ClearPageActive(page); ··· 950 946 } 951 947 if (zone) 952 948 spin_unlock_irqrestore(&zone->lru_lock, flags); 949 + 950 + mem_cgroup_uncharge_end(); 953 951 954 952 free_hot_cold_page_list(&pages_to_free, cold); 955 953 }
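The reason release_pages() now brackets its loop with mem_cgroup_uncharge_start()/end() can be shown by counting counter updates. This is a sketch with made-up names; `counter_ops` is an instrumentation field of the model, not a kernel statistic. Without the bracket, each freed page is its own res_counter update; with it, same-group pages collapse into one flush.

```c
#include <assert.h>
#include <stddef.h>

struct memcg { long usage; int counter_ops; };

static struct { int do_batch; struct memcg *memcg; long nr; } batch;

static void res_uncharge(struct memcg *m, long nr)
{
	m->usage -= nr;
	m->counter_ops++;       /* each call models one atomic RMW */
}

static void uncharge_start(void) { batch.do_batch++; }

static void uncharge_one(struct memcg *m)
{
	if (batch.do_batch && (!batch.memcg || batch.memcg == m)) {
		batch.memcg = m;
		batch.nr++;     /* coalesce */
	} else {
		res_uncharge(m, 1);
	}
}

static void uncharge_end(void)
{
	if (--batch.do_batch)
		return;
	if (batch.memcg)
		res_uncharge(batch.memcg, batch.nr);
	batch.memcg = NULL;
	batch.nr = 0;
}
```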
mm/swap_state.c (+4 -4)
··· 176 176 177 177 if (unlikely(PageTransHuge(page))) 178 178 if (unlikely(split_huge_page_to_list(page, list))) { 179 - swapcache_free(entry, NULL); 179 + swapcache_free(entry); 180 180 return 0; 181 181 } 182 182 ··· 202 202 * add_to_swap_cache() doesn't return -EEXIST, so we can safely 203 203 * clear SWAP_HAS_CACHE flag. 204 204 */ 205 - swapcache_free(entry, NULL); 205 + swapcache_free(entry); 206 206 return 0; 207 207 } 208 208 } ··· 225 225 __delete_from_swap_cache(page); 226 226 spin_unlock_irq(&address_space->tree_lock); 227 227 228 - swapcache_free(entry, page); 228 + swapcache_free(entry); 229 229 page_cache_release(page); 230 230 } 231 231 ··· 386 386 * add_to_swap_cache() doesn't return -EEXIST, so we can safely 387 387 * clear SWAP_HAS_CACHE flag. 388 388 */ 389 - swapcache_free(entry, NULL); 389 + swapcache_free(entry); 390 390 } while (err != -ENOMEM); 391 391 392 392 if (new_page)
mm/swapfile.c (+2 -5)
··· 843 843 /* 844 844 * Called after dropping swapcache to decrease refcnt to swap entries. 845 845 */ 846 - void swapcache_free(swp_entry_t entry, struct page *page) 846 + void swapcache_free(swp_entry_t entry) 847 847 { 848 848 struct swap_info_struct *p; 849 - unsigned char count; 850 849 851 850 p = swap_info_get(entry); 852 851 if (p) { 853 - count = swap_entry_free(p, entry, SWAP_HAS_CACHE); 854 - if (page) 855 - mem_cgroup_uncharge_swapcache(page, entry, count != 0); 852 + swap_entry_free(p, entry, SWAP_HAS_CACHE); 856 853 spin_unlock(&p->lock); 857 854 } 858 855 }
mm/truncate.c (-9)
··· 281 281 while (index < end && pagevec_lookup_entries(&pvec, mapping, index, 282 282 min(end - index, (pgoff_t)PAGEVEC_SIZE), 283 283 indices)) { 284 - mem_cgroup_uncharge_start(); 285 284 for (i = 0; i < pagevec_count(&pvec); i++) { 286 285 struct page *page = pvec.pages[i]; 287 286 ··· 306 307 } 307 308 pagevec_remove_exceptionals(&pvec); 308 309 pagevec_release(&pvec); 309 - mem_cgroup_uncharge_end(); 310 310 cond_resched(); 311 311 index++; 312 312 } ··· 367 369 pagevec_release(&pvec); 368 370 break; 369 371 } 370 - mem_cgroup_uncharge_start(); 371 372 for (i = 0; i < pagevec_count(&pvec); i++) { 372 373 struct page *page = pvec.pages[i]; 373 374 ··· 391 394 } 392 395 pagevec_remove_exceptionals(&pvec); 393 396 pagevec_release(&pvec); 394 - mem_cgroup_uncharge_end(); 395 397 index++; 396 398 } 397 399 cleancache_invalidate_inode(mapping); ··· 489 493 while (index <= end && pagevec_lookup_entries(&pvec, mapping, index, 490 494 min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1, 491 495 indices)) { 492 - mem_cgroup_uncharge_start(); 493 496 for (i = 0; i < pagevec_count(&pvec); i++) { 494 497 struct page *page = pvec.pages[i]; 495 498 ··· 517 522 } 518 523 pagevec_remove_exceptionals(&pvec); 519 524 pagevec_release(&pvec); 520 - mem_cgroup_uncharge_end(); 521 525 cond_resched(); 522 526 index++; 523 527 } ··· 547 553 BUG_ON(page_has_private(page)); 548 554 __delete_from_page_cache(page, NULL); 549 555 spin_unlock_irq(&mapping->tree_lock); 550 - mem_cgroup_uncharge_cache_page(page); 551 556 552 557 if (mapping->a_ops->freepage) 553 558 mapping->a_ops->freepage(page); ··· 595 602 while (index <= end && pagevec_lookup_entries(&pvec, mapping, index, 596 603 min(end - index, (pgoff_t)PAGEVEC_SIZE - 1) + 1, 597 604 indices)) { 598 - mem_cgroup_uncharge_start(); 599 605 for (i = 0; i < pagevec_count(&pvec); i++) { 600 606 struct page *page = pvec.pages[i]; 601 607 ··· 647 655 } 648 656 pagevec_remove_exceptionals(&pvec); 649 657 pagevec_release(&pvec); 650 - 
mem_cgroup_uncharge_end(); 651 658 cond_resched(); 652 659 index++; 653 660 }
mm/vmscan.c (+9 -3)
··· 577 577 578 578 if (PageSwapCache(page)) { 579 579 swp_entry_t swap = { .val = page_private(page) }; 580 + mem_cgroup_swapout(page, swap); 580 581 __delete_from_swap_cache(page); 581 582 spin_unlock_irq(&mapping->tree_lock); 582 - swapcache_free(swap, page); 583 + swapcache_free(swap); 583 584 } else { 584 585 void (*freepage)(struct page *); 585 586 void *shadow = NULL; ··· 601 600 shadow = workingset_eviction(mapping, page); 602 601 __delete_from_page_cache(page, shadow); 603 602 spin_unlock_irq(&mapping->tree_lock); 604 - mem_cgroup_uncharge_cache_page(page); 605 603 606 604 if (freepage != NULL) 607 605 freepage(page); ··· 1103 1103 */ 1104 1104 __clear_page_locked(page); 1105 1105 free_it: 1106 + mem_cgroup_uncharge(page); 1106 1107 nr_reclaimed++; 1107 1108 1108 1109 /* ··· 1133 1132 list_add(&page->lru, &ret_pages); 1134 1133 VM_BUG_ON_PAGE(PageLRU(page) || PageUnevictable(page), page); 1135 1134 } 1135 + mem_cgroup_uncharge_end(); 1136 1136 1137 1137 free_hot_cold_page_list(&free_pages, true); 1138 1138 1139 1139 list_splice(&ret_pages, page_list); 1140 1140 count_vm_events(PGACTIVATE, pgactivate); 1141 - mem_cgroup_uncharge_end(); 1141 + 1142 1142 *ret_nr_dirty += nr_dirty; 1143 1143 *ret_nr_congested += nr_congested; 1144 1144 *ret_nr_unqueued_dirty += nr_unqueued_dirty; ··· 1437 1435 __ClearPageActive(page); 1438 1436 del_page_from_lru_list(page, lruvec, lru); 1439 1437 1438 + mem_cgroup_uncharge(page); 1439 + 1440 1440 if (unlikely(PageCompound(page))) { 1441 1441 spin_unlock_irq(&zone->lru_lock); 1442 1442 (*get_compound_page_dtor(page))(page); ··· 1659 1655 __ClearPageLRU(page); 1660 1656 __ClearPageActive(page); 1661 1657 del_page_from_lru_list(page, lruvec, lru); 1658 + 1659 + mem_cgroup_uncharge(page); 1662 1660 1663 1661 if (unlikely(PageCompound(page))) { 1664 1662 spin_unlock_irq(&zone->lru_lock);
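The hand-off that lets swapcache_free() drop its page argument is the pair mem_cgroup_swapout() / mem_cgroup_uncharge_swap(): swap-out records the owning group for the swap entry and clears the page's memsw tag, but the memsw counter stays charged until the swap entry itself is freed. A userspace model of that ordering, with illustrative stand-ins (the flat `swap_record` array plays the role of swap_cgroup, and the fields are simplifications):

```c
#include <assert.h>
#include <stddef.h>

#define PCG_MEMSW 0x4

struct memcg { long memsw_usage; long swap_stat; };

struct page_cgroup {
	struct memcg *mem_cgroup;
	unsigned long flags;
};

static struct memcg *swap_record[16];     /* swap_cgroup analogue */

static void memcg_swapout(struct page_cgroup *pc, int swap_entry)
{
	if (!(pc->flags & PCG_MEMSW))
		return;                   /* readahead page, never charged */
	swap_record[swap_entry] = pc->mem_cgroup;
	pc->flags &= ~PCG_MEMSW;          /* charge now owned by the entry */
	pc->mem_cgroup->swap_stat++;
}

static void memcg_uncharge_swap(int swap_entry)
{
	struct memcg *memcg = swap_record[swap_entry];

	swap_record[swap_entry] = NULL;
	if (memcg) {
		memcg->memsw_usage--;     /* finally release the memsw page */
		memcg->swap_stat--;
	}
}
```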
mm/zswap.c (+1 -1)
··· 507 507 * add_to_swap_cache() doesn't return -EEXIST, so we can safely 508 508 * clear SWAP_HAS_CACHE flag. 509 509 */ 510 - swapcache_free(entry, NULL); 510 + swapcache_free(entry); 511 511 } while (err != -ENOMEM); 512 512 513 513 if (new_page)