Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

memcg: fix race in file_mapped accounting

Presently, memcg's FILE_MAPPED accounting has following race with
move_account (happens at rmdir()).

increment page->mapcount (rmap.c)
mem_cgroup_update_file_mapped() move_account()
lock_page_cgroup()
check page_mapped() if
page_mapped(page)>1 {
FILE_MAPPED -1 from old memcg
FILE_MAPPED +1 to old memcg
}
.....
overwrite pc->mem_cgroup
unlock_page_cgroup()
lock_page_cgroup()
FILE_MAPPED + 1 to pc->mem_cgroup
unlock_page_cgroup()

Then,
old memcg (-1 file mapped)
new memcg (+2 file mapped)

This happens because move_account see page_mapped() which is not guarded
by lock_page_cgroup(). This patch adds FILE_MAPPED flag to page_cgroup
and move account information based on it. Now, all checks are synchronous
with lock_page_cgroup().

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Balbir Singh <balbir@in.ibm.com>
Reviewed-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Andrea Righi <arighi@develer.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by

KAMEZAWA Hiroyuki and committed by
Linus Torvalds
8725d541 116354d1

+15 -9
+6
include/linux/page_cgroup.h
··· 39 39 PCG_CACHE, /* charged as cache */ 40 40 PCG_USED, /* this object is in use. */ 41 41 PCG_ACCT_LRU, /* page has been accounted for */ 42 + PCG_FILE_MAPPED, /* page is accounted as "mapped" */ 42 43 }; 43 44 44 45 #define TESTPCGFLAG(uname, lname) \ ··· 73 72 CLEARPCGFLAG(AcctLRU, ACCT_LRU) 74 73 TESTPCGFLAG(AcctLRU, ACCT_LRU) 75 74 TESTCLEARPCGFLAG(AcctLRU, ACCT_LRU) 75 + 76 + 77 + SETPCGFLAG(FileMapped, FILE_MAPPED) 78 + CLEARPCGFLAG(FileMapped, FILE_MAPPED) 79 + TESTPCGFLAG(FileMapped, FILE_MAPPED) 76 80 77 81 static inline int page_cgroup_nid(struct page_cgroup *pc) 78 82 {
+9 -9
mm/memcontrol.c
··· 1359 1359 1360 1360 lock_page_cgroup(pc); 1361 1361 mem = pc->mem_cgroup; 1362 - if (!mem) 1363 - goto done; 1364 - 1365 - if (!PageCgroupUsed(pc)) 1362 + if (!mem || !PageCgroupUsed(pc)) 1366 1363 goto done; 1367 1364 1368 1365 /* 1369 1366 * Preemption is already disabled. We can use __this_cpu_xxx 1370 1367 */ 1371 - __this_cpu_add(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED], val); 1368 + if (val > 0) { 1369 + __this_cpu_inc(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); 1370 + SetPageCgroupFileMapped(pc); 1371 + } else { 1372 + __this_cpu_dec(mem->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]); 1373 + ClearPageCgroupFileMapped(pc); 1374 + } 1372 1375 1373 1376 done: 1374 1377 unlock_page_cgroup(pc); ··· 1804 1801 static void __mem_cgroup_move_account(struct page_cgroup *pc, 1805 1802 struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge) 1806 1803 { 1807 - struct page *page; 1808 - 1809 1804 VM_BUG_ON(from == to); 1810 1805 VM_BUG_ON(PageLRU(pc->page)); 1811 1806 VM_BUG_ON(!PageCgroupLocked(pc)); 1812 1807 VM_BUG_ON(!PageCgroupUsed(pc)); 1813 1808 VM_BUG_ON(pc->mem_cgroup != from); 1814 1809 1815 - page = pc->page; 1816 - if (page_mapped(page) && !PageAnon(page)) { 1810 + if (PageCgroupFileMapped(pc)) { 1817 1811 /* Update mapped_file data for mem_cgroup */ 1818 1812 preempt_disable(); 1819 1813 __this_cpu_dec(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED]);