memcg: add lock to synchronize page accounting and migration

Introduce a new bit spin lock, PCG_MOVE_LOCK, to synchronize the page
accounting and migration code. This reworks the locking scheme of
_update_stat() and _move_account(): the new lock bit is always taken
with IRQs disabled.
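The lock bit lives in the same flags word as the status bits it protects. Below is a minimal userspace sketch of that idea. It uses C11 atomics in place of the kernel's bit_spin_lock()/bit_spin_unlock(). The bit numbers are copied from the new enum in the include/linux/page_cgroup.h diff below. `struct model_pc` and the `model_*` helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Bit numbers copied from the reordered enum in include/linux/page_cgroup.h.
 * Everything else here is a hypothetical userspace model. */
enum {
	PCG_LOCK,		/* guards pc->mem_cgroup, USED, CACHE, MIGRATION */
	PCG_CACHE,
	PCG_USED,
	PCG_MIGRATION,
	PCG_MOVE_LOCK,		/* guards the FILE_* statistics bits below */
	PCG_FILE_MAPPED,
	PCG_FILE_DIRTY,
	PCG_FILE_WRITEBACK,
	PCG_FILE_UNSTABLE_NFS,
	PCG_ACCT_LRU,		/* guarded by neither bit (under lru_lock) */
};

/* Hypothetical stand-in for struct page_cgroup: only the flags word. */
struct model_pc {
	atomic_ulong flags;
};

/* Spin until this caller is the one that flipped 'bit' from 0 to 1. */
static void model_bit_spin_lock(int bit, atomic_ulong *word)
{
	unsigned long mask = 1UL << bit;

	while (atomic_fetch_or_explicit(word, mask, memory_order_acquire) & mask)
		; /* someone else holds the bit */
}

static void model_bit_spin_unlock(int bit, atomic_ulong *word)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_fetch_and_explicit(word, ~mask,
						      memory_order_release);

	assert(old & mask); /* releasing a bit we never took is a bug */
}

static bool model_test_bit(int bit, atomic_ulong *word)
{
	return atomic_load(word) & (1UL << bit);
}
```

The lock bits and the ordinary status bits share one word. So every flag update has to be an atomic read-modify-write. A plain `flags |= mask` could silently clear a lock bit that another CPU set in the meantime.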

1. If pages are being migrated from a memcg, then updates to that
memcg's page statistics are protected by grabbing PCG_MOVE_LOCK using
move_lock_page_cgroup(). In an upcoming commit, memcg dirty page
accounting will update memcg page statistics (specifically, the number
of writeback pages) from IRQ context (softirq). Avoid a deadlocking
nested spin lock attempt by disabling IRQs on the local processor when
grabbing PCG_MOVE_LOCK.

2. The lock for update_page_stat is used only to avoid a race with
move_account(). So IRQ awareness of lock_page_cgroup() itself is not
a problem; the problem is between mem_cgroup_update_page_stat() and
mem_cgroup_move_account_page().

Trade-off:
* Changing lock_page_cgroup() to always disable IRQs (or
local_bh) has some performance impact, and I think
it's bad to disable IRQs when it's not necessary.
* Adding a new lock makes move_account() slower. The numbers
are below.

Performance impact: moving an 8G anon process.

Before:
real 0m0.792s
user 0m0.000s
sys 0m0.780s

After:
real 0m0.854s
user 0m0.000s
sys 0m0.842s
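For scale, the timings above correspond to a slowdown of roughly 8% for this workload:

```latex
\frac{0.842 - 0.780}{0.780} \approx 7.9\%\ \text{(sys)}, \qquad
\frac{0.854 - 0.792}{0.792} \approx 7.8\%\ \text{(real)}
```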

This result is bad, but planned optimization patches can reduce
the impact.

Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Andrea Righi <arighi@develer.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

authored by KAMEZAWA Hiroyuki and committed by Linus Torvalds dbd4ea78 2a7106f2

2 files changed, 35 insertions(+), 5 deletions(-)

include/linux/page_cgroup.h (+28 -3)

 
 enum {
 	/* flags for mem_cgroup */
-	PCG_LOCK, /* page cgroup is locked */
+	PCG_LOCK, /* Lock for pc->mem_cgroup and following bits. */
 	PCG_CACHE, /* charged as cache */
 	PCG_USED, /* this object is in use. */
-	PCG_ACCT_LRU, /* page has been accounted for */
+	PCG_MIGRATION, /* under page migration */
+	/* flags for mem_cgroup and file and I/O status */
+	PCG_MOVE_LOCK, /* For race between move_account v.s. following bits */
 	PCG_FILE_MAPPED, /* page is accounted as "mapped" */
 	PCG_FILE_DIRTY, /* page is dirty */
 	PCG_FILE_WRITEBACK, /* page is under writeback */
 	PCG_FILE_UNSTABLE_NFS, /* page is NFS unstable */
-	PCG_MIGRATION, /* under page migration */
+	/* No lock in page_cgroup */
+	PCG_ACCT_LRU, /* page has been accounted for (under lru_lock) */
 };
 
 #define TESTPCGFLAG(uname, lname) \
···
 
 static inline void lock_page_cgroup(struct page_cgroup *pc)
 {
+	/*
+	 * Don't take this lock in IRQ context.
+	 * This lock is for pc->mem_cgroup, USED, CACHE, MIGRATION
+	 */
 	bit_spin_lock(PCG_LOCK, &pc->flags);
 }
 
···
 static inline int page_is_cgroup_locked(struct page_cgroup *pc)
 {
 	return bit_spin_is_locked(PCG_LOCK, &pc->flags);
+}
+
+static inline void move_lock_page_cgroup(struct page_cgroup *pc,
+	unsigned long *flags)
+{
+	/*
+	 * We know updates to pc->flags of page cache's stats are from both of
+	 * usual context or IRQ context. Disable IRQ to avoid deadlock.
+	 */
+	local_irq_save(*flags);
+	bit_spin_lock(PCG_MOVE_LOCK, &pc->flags);
+}
+
+static inline void move_unlock_page_cgroup(struct page_cgroup *pc,
+	unsigned long *flags)
+{
+	bit_spin_unlock(PCG_MOVE_LOCK, &pc->flags);
+	local_irq_restore(*flags);
 }
 
 #else /* CONFIG_CGROUP_MEM_RES_CTLR */
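move_lock_page_cgroup() and move_unlock_page_cgroup() save and restore the interrupt state through `*flags`. They do not call `local_irq_disable()`/`local_irq_enable()`. The hypothetical single-CPU model below shows why that matters when the caller already runs with interrupts disabled. `irqs_enabled` and the `model_*` functions are illustrative stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one CPU's interrupt-enable state; not kernel code. */
static bool irqs_enabled = true;

/* Like local_irq_save(): remember the current state, then disable. */
static void model_irq_save(unsigned long *flags)
{
	*flags = irqs_enabled;
	irqs_enabled = false;
}

/* Like local_irq_restore(): put back whatever state was saved. */
static void model_irq_restore(unsigned long *flags)
{
	irqs_enabled = *flags;
}

/* Shape of move_lock/move_unlock_page_cgroup(), minus the bit lock. */
static void model_move_lock_unlock(void)
{
	unsigned long flags;

	model_irq_save(&flags);
	/* ... bit_spin_lock(PCG_MOVE_LOCK), update stats, bit_spin_unlock ... */
	model_irq_restore(&flags);
}
```

An unconditional enable in the unlock path would instead turn interrupts back on in the middle of such a caller's critical section.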
mm/memcontrol.c (+7 -2)

 	struct mem_cgroup *mem;
 	struct page_cgroup *pc = lookup_page_cgroup(page);
 	bool need_unlock = false;
+	unsigned long uninitialized_var(flags);
 
 	if (unlikely(!pc))
 		return;
···
 	/* pc->mem_cgroup is unstable ? */
 	if (unlikely(mem_cgroup_stealed(mem))) {
 		/* take a lock against to access pc->mem_cgroup */
-		lock_page_cgroup(pc);
+		move_lock_page_cgroup(pc, &flags);
 		need_unlock = true;
 		mem = pc->mem_cgroup;
 		if (!mem || !PageCgroupUsed(pc))
···
 
 out:
 	if (unlikely(need_unlock))
-		unlock_page_cgroup(pc);
+		move_unlock_page_cgroup(pc, &flags);
 	rcu_read_unlock();
 	return;
 }
···
 	struct mem_cgroup *from, struct mem_cgroup *to, bool uncharge)
 {
 	int ret = -EINVAL;
+	unsigned long flags;
+
 	lock_page_cgroup(pc);
 	if (PageCgroupUsed(pc) && pc->mem_cgroup == from) {
+		move_lock_page_cgroup(pc, &flags);
 		__mem_cgroup_move_account(pc, from, to, uncharge);
+		move_unlock_page_cgroup(pc, &flags);
 		ret = 0;
 	}
 	unlock_page_cgroup(pc);
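The two changed call sites can be modeled with two threads:

- One thread plays mem_cgroup_update_page_stat() and takes only the move lock.
- The other plays the move_account path and takes PCG_LOCK first, then the move lock.

This is a hypothetical userspace sketch using C11 atomics and POSIX threads. Interrupt disabling is left out. `struct model_page`, `memcg_dirty[]` and the helper names are illustrative only.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define LOCK_BIT      (1UL << 0)   /* stands in for PCG_LOCK */
#define MOVE_LOCK_BIT (1UL << 4)   /* stands in for PCG_MOVE_LOCK */
#define ITERS 20000

struct model_page {
	atomic_ulong flags;
	int memcg;        /* owning memcg index, like pc->mem_cgroup */
	long nr_dirty;    /* this page's contribution to its memcg's stat */
};

static long memcg_dirty[2];   /* per-memcg statistic for two memcgs */

static void lock_bit(atomic_ulong *w, unsigned long m)
{
	while (atomic_fetch_or_explicit(w, m, memory_order_acquire) & m)
		; /* spin */
}

static void unlock_bit(atomic_ulong *w, unsigned long m)
{
	atomic_fetch_and_explicit(w, ~m, memory_order_release);
}

/* Like mem_cgroup_update_page_stat(): only the move lock is taken. */
static void update_stat(struct model_page *p)
{
	lock_bit(&p->flags, MOVE_LOCK_BIT);
	memcg_dirty[p->memcg]++;
	p->nr_dirty++;
	unlock_bit(&p->flags, MOVE_LOCK_BIT);
}

/* Like the move_account path: PCG_LOCK outside, move lock inside. */
static void move_account(struct model_page *p, int to)
{
	lock_bit(&p->flags, LOCK_BIT);
	lock_bit(&p->flags, MOVE_LOCK_BIT);
	memcg_dirty[p->memcg] -= p->nr_dirty;
	memcg_dirty[to] += p->nr_dirty;
	p->memcg = to;
	unlock_bit(&p->flags, MOVE_LOCK_BIT);
	unlock_bit(&p->flags, LOCK_BIT);
}

static void *updater(void *arg)
{
	for (int i = 0; i < ITERS; i++)
		update_stat(arg);
	return NULL;
}

static void *mover(void *arg)
{
	for (int i = 0; i < ITERS; i++)
		move_account(arg, i & 1);   /* bounce between memcg 0 and 1 */
	return NULL;
}

/* Runs both paths concurrently and waits for them to finish. */
static void run_race(struct model_page *p)
{
	pthread_t a, b;

	pthread_create(&a, NULL, updater, p);
	pthread_create(&b, NULL, mover, p);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
}
```

The invariant to check afterwards is that the page's whole contribution sits in the counter of the memcg that currently owns it. Suppose update_stat() skipped the move lock. It could read `p->memcg` just before a move and then increment the old memcg's counter after the transfer. That would leave the counters inconsistent and the program with a data race.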