Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

memcg: add per cgroup dirty page accounting

When modifying PG_Dirty on cached file pages, update the new
MEM_CGROUP_STAT_DIRTY counter. This is done in the same places where
global NR_FILE_DIRTY is managed. The new memcg stat is visible in the
per memcg memory.stat cgroupfs file. The most recent past attempt at
this was http://thread.gmane.org/gmane.linux.kernel.cgroups/8632

The new accounting supports future efforts to add per cgroup dirty
page throttling and writeback. It also helps an administrator break
down a container's memory usage and provides evidence to understand
memcg oom kills (the new dirty count is included in memcg oom kill
messages).

The ability to move page accounting between memcgs
(memory.move_charge_at_immigrate) makes this accounting more
complicated than the global counter. The existing
mem_cgroup_{begin,end}_page_stat() lock is used to serialize move
accounting with stat updates.
Typical update operation:
    memcg = mem_cgroup_begin_page_stat(page)
    if (TestSetPageDirty()) {
            [...]
            mem_cgroup_update_page_stat(memcg)
    }
    mem_cgroup_end_page_stat(memcg)
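
As an illustration of this pattern (not part of the patch), a condensed sketch of a
__set_page_dirty_nobuffers()-style path using the v4.0-era interfaces could look as
follows; example_set_page_dirty() is a hypothetical name and error handling is trimmed:

    #include <linux/memcontrol.h>
    #include <linux/mm.h>
    #include <linux/pagemap.h>

    /* Illustrative sketch only, modeled on the page-writeback.c change below. */
    static int example_set_page_dirty(struct page *page)
    {
            struct mem_cgroup *memcg;
            int newly_dirty;

            /* Pin page->mem_cgroup and lock out inter-memcg moves. */
            memcg = mem_cgroup_begin_page_stat(page);

            newly_dirty = !TestSetPageDirty(page);
            if (newly_dirty) {
                    struct address_space *mapping = page_mapping(page);
                    unsigned long flags;

                    if (mapping) {
                            spin_lock_irqsave(&mapping->tree_lock, flags);
                            /* Bumps MEM_CGROUP_STAT_DIRTY along with NR_FILE_DIRTY. */
                            account_page_dirtied(page, mapping, memcg);
                            spin_unlock_irqrestore(&mapping->tree_lock, flags);
                    }
            }

            /* Drops move_lock (if taken) and the RCU read lock. */
            mem_cgroup_end_page_stat(memcg);
            return newly_dirty;
    }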

Summary of mem_cgroup_end_page_stat() overhead:
- Without CONFIG_MEMCG it's a no-op
- With CONFIG_MEMCG and no inter memcg task movement, it's just
rcu_read_lock()
- With CONFIG_MEMCG and inter memcg task movement, it's
rcu_read_lock() + spin_lock_irqsave()
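
For reference, the CONFIG_MEMCG unlock side looks roughly like the sketch below. It is
reconstructed from memory of v4.0-era mm/memcontrol.c, so the move_lock_task and
move_lock_flags bookkeeping should be read as illustrative rather than authoritative:

    void mem_cgroup_end_page_stat(struct mem_cgroup *memcg)
    {
            /* Only the task that took move_lock in the paired begin() drops it. */
            if (memcg && memcg->move_lock_task == current) {
                    unsigned long flags = memcg->move_lock_flags;

                    memcg->move_lock_task = NULL;
                    memcg->move_lock_flags = 0;
                    spin_unlock_irqrestore(&memcg->move_lock, flags);
            }

            /* Pairs with the rcu_read_lock() in mem_cgroup_begin_page_stat(). */
            rcu_read_unlock();
    }

When no tasks are being moved between memcgs, move_lock is never taken in the begin()
side, so ending the transaction is just the rcu_read_unlock().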

A memcg parameter is added to several routines because their callers
now grab mem_cgroup_begin_page_stat(), which returns the memcg later
needed by mem_cgroup_update_page_stat().

Because mem_cgroup_begin_page_stat() may disable interrupts, some
adjustments are needed:
- move __mark_inode_dirty() from __set_page_dirty() to its caller.
__mark_inode_dirty() locking does not want interrupts disabled.
- use spin_lock_irqsave(tree_lock) rather than spin_lock_irq() in
__delete_from_page_cache(), replace_page_cache_page(),
invalidate_complete_page2(), and __remove_mapping().
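
The delete_from_page_cache() change below shows why: the memcg transaction may already
have disabled interrupts, so the tree_lock section must save and restore the interrupt
state instead of unconditionally re-enabling it. A trimmed sketch (hypothetical function
name; freepage callback and assertions omitted):

    static void example_delete_from_page_cache(struct page *page)
    {
            struct address_space *mapping = page->mapping;
            struct mem_cgroup *memcg;
            unsigned long flags;

            /* May take memcg->move_lock with spin_lock_irqsave(), disabling IRQs. */
            memcg = mem_cgroup_begin_page_stat(page);

            /* irqsave/irqrestore preserves whatever IRQ state begin() left us in. */
            spin_lock_irqsave(&mapping->tree_lock, flags);
            __delete_from_page_cache(page, NULL, memcg);
            spin_unlock_irqrestore(&mapping->tree_lock, flags);

            mem_cgroup_end_page_stat(memcg);
    }

A plain spin_unlock_irq() here would re-enable interrupts while move_lock was still
held, which is exactly what these conversions avoid.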

text data bss dec hex filename
8925147 1774832 1785856 12485835 be84cb vmlinux-!CONFIG_MEMCG-before
8925339 1774832 1785856 12486027 be858b vmlinux-!CONFIG_MEMCG-after
+192 text bytes
8965977 1784992 1785856 12536825 bf4bf9 vmlinux-CONFIG_MEMCG-before
8966750 1784992 1785856 12537598 bf4efe vmlinux-CONFIG_MEMCG-after
+773 text bytes

Performance tests run on v4.0-rc1-36-g4f671fe2f952. Lower is better for
all metrics, they're all wall clock or cycle counts. The read and write
fault benchmarks just measure fault time, they do not include I/O time.

* CONFIG_MEMCG not set:
baseline patched
kbuild 1m25.030000(+-0.088% 3 samples) 1m25.426667(+-0.120% 3 samples)
dd write 100 MiB 0.859211561 +-15.10% 0.874162885 +-15.03%
dd write 200 MiB 1.670653105 +-17.87% 1.669384764 +-11.99%
dd write 1000 MiB 8.434691190 +-14.15% 8.474733215 +-14.77%
read fault cycles 254.0(+-0.000% 10 samples) 253.0(+-0.000% 10 samples)
write fault cycles 2021.2(+-3.070% 10 samples) 1984.5(+-1.036% 10 samples)

* CONFIG_MEMCG=y root_memcg:
baseline patched
kbuild 1m25.716667(+-0.105% 3 samples) 1m25.686667(+-0.153% 3 samples)
dd write 100 MiB 0.855650830 +-14.90% 0.887557919 +-14.90%
dd write 200 MiB 1.688322953 +-12.72% 1.667682724 +-13.33%
dd write 1000 MiB 8.418601605 +-14.30% 8.673532299 +-15.00%
read fault cycles 266.0(+-0.000% 10 samples) 266.0(+-0.000% 10 samples)
write fault cycles 2051.7(+-1.349% 10 samples) 2049.6(+-1.686% 10 samples)

* CONFIG_MEMCG=y non-root_memcg:
baseline patched
kbuild 1m26.120000(+-0.273% 3 samples) 1m25.763333(+-0.127% 3 samples)
dd write 100 MiB 0.861723964 +-15.25% 0.818129350 +-14.82%
dd write 200 MiB 1.669887569 +-13.30% 1.698645885 +-13.27%
dd write 1000 MiB 8.383191730 +-14.65% 8.351742280 +-14.52%
read fault cycles 265.7(+-0.172% 10 samples) 267.0(+-0.000% 10 samples)
write fault cycles 2070.6(+-1.512% 10 samples) 2084.4(+-2.148% 10 samples)

As expected anon page faults are not affected by this patch.

tj: Updated to apply on top of the recent cancel_dirty_page() changes.

Signed-off-by: Sha Zhengju <handai.szj@gmail.com>
Signed-off-by: Greg Thelen <gthelen@google.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@fb.com>

Authored by Greg Thelen, committed by Jens Axboe (c4843a75, 11f81bec)

Diffstat: +156 -39

Documentation/cgroups/memory.txt (+1)

···
 pgpgout        - # of uncharging events to the memory cgroup. The uncharging
                event happens each time a page is unaccounted from the cgroup.
 swap           - # of bytes of swap usage
+dirty          - # of bytes that are waiting to get written back to the disk.
 writeback      - # of bytes of file/anon cache that are queued for syncing to
                disk.
 inactive_anon  - # of bytes of anonymous and swap cache memory on inactive

fs/buffer.c (+27 -7)

···
  *
  * If warn is true, then emit a warning if the page is not uptodate and has
  * not been truncated.
+ *
+ * The caller must hold mem_cgroup_begin_page_stat() lock.
  */
-static void __set_page_dirty(struct page *page,
-                struct address_space *mapping, int warn)
+static void __set_page_dirty(struct page *page, struct address_space *mapping,
+                             struct mem_cgroup *memcg, int warn)
 {
         unsigned long flags;
 
         spin_lock_irqsave(&mapping->tree_lock, flags);
         if (page->mapping) {   /* Race with truncate? */
                 WARN_ON_ONCE(warn && !PageUptodate(page));
-                account_page_dirtied(page, mapping);
+                account_page_dirtied(page, mapping, memcg);
                 radix_tree_tag_set(&mapping->page_tree,
                                 page_index(page), PAGECACHE_TAG_DIRTY);
         }
         spin_unlock_irqrestore(&mapping->tree_lock, flags);
-        __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
 }
 
 /*
···
 int __set_page_dirty_buffers(struct page *page)
 {
         int newly_dirty;
+        struct mem_cgroup *memcg;
         struct address_space *mapping = page_mapping(page);
 
         if (unlikely(!mapping))
···
                         bh = bh->b_this_page;
                 } while (bh != head);
         }
+        /*
+         * Use mem_group_begin_page_stat() to keep PageDirty synchronized with
+         * per-memcg dirty page counters.
+         */
+        memcg = mem_cgroup_begin_page_stat(page);
         newly_dirty = !TestSetPageDirty(page);
         spin_unlock(&mapping->private_lock);
 
         if (newly_dirty)
-                __set_page_dirty(page, mapping, 1);
+                __set_page_dirty(page, mapping, memcg, 1);
+
+        mem_cgroup_end_page_stat(memcg);
+
+        if (newly_dirty)
+                __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
+
         return newly_dirty;
 }
 EXPORT_SYMBOL(__set_page_dirty_buffers);
···
 
         if (!test_set_buffer_dirty(bh)) {
                 struct page *page = bh->b_page;
+                struct address_space *mapping = NULL;
+                struct mem_cgroup *memcg;
+
+                memcg = mem_cgroup_begin_page_stat(page);
                 if (!TestSetPageDirty(page)) {
-                        struct address_space *mapping = page_mapping(page);
+                        mapping = page_mapping(page);
                         if (mapping)
-                                __set_page_dirty(page, mapping, 0);
+                                __set_page_dirty(page, mapping, memcg, 0);
                 }
+                mem_cgroup_end_page_stat(memcg);
+                if (mapping)
+                        __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
         }
 }
 EXPORT_SYMBOL(mark_buffer_dirty);

fs/xfs/xfs_aops.c (+10 -2)

···
         loff_t end_offset;
         loff_t offset;
         int newly_dirty;
+        struct mem_cgroup *memcg;
 
         if (unlikely(!mapping))
                 return !TestSetPageDirty(page);
···
                         offset += 1 << inode->i_blkbits;
                 } while (bh != head);
         }
+        /*
+         * Use mem_group_begin_page_stat() to keep PageDirty synchronized with
+         * per-memcg dirty page counters.
+         */
+        memcg = mem_cgroup_begin_page_stat(page);
         newly_dirty = !TestSetPageDirty(page);
         spin_unlock(&mapping->private_lock);
 
···
                 spin_lock_irqsave(&mapping->tree_lock, flags);
                 if (page->mapping) {   /* Race with truncate? */
                         WARN_ON_ONCE(!PageUptodate(page));
-                        account_page_dirtied(page, mapping);
+                        account_page_dirtied(page, mapping, memcg);
                         radix_tree_tag_set(&mapping->page_tree,
                                         page_index(page), PAGECACHE_TAG_DIRTY);
                 }
                 spin_unlock_irqrestore(&mapping->tree_lock, flags);
-                __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
         }
+        mem_cgroup_end_page_stat(memcg);
+        if (newly_dirty)
+                __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
         return newly_dirty;
 }
 

include/linux/memcontrol.h (+1)

···
         MEM_CGROUP_STAT_RSS,           /* # of pages charged as anon rss */
         MEM_CGROUP_STAT_RSS_HUGE,      /* # of pages charged as anon huge */
         MEM_CGROUP_STAT_FILE_MAPPED,   /* # of pages charged as file rss */
+        MEM_CGROUP_STAT_DIRTY,         /* # of dirty pages in page cache */
         MEM_CGROUP_STAT_WRITEBACK,     /* # of pages under writeback */
         MEM_CGROUP_STAT_SWAP,          /* # of pages, swapped out */
         MEM_CGROUP_STAT_NSTATS,

include/linux/mm.h (+4 -2)

···
 int __set_page_dirty_no_writeback(struct page *page);
 int redirty_page_for_writepage(struct writeback_control *wbc,
                                 struct page *page);
-void account_page_dirtied(struct page *page, struct address_space *mapping);
-void account_page_cleaned(struct page *page, struct address_space *mapping);
+void account_page_dirtied(struct page *page, struct address_space *mapping,
+                          struct mem_cgroup *memcg);
+void account_page_cleaned(struct page *page, struct address_space *mapping,
+                          struct mem_cgroup *memcg);
 int set_page_dirty(struct page *page);
 int set_page_dirty_lock(struct page *page);
 void cancel_dirty_page(struct page *page);

include/linux/pagemap.h (+2 -1)

···
 int add_to_page_cache_lru(struct page *page, struct address_space *mapping,
                                 pgoff_t index, gfp_t gfp_mask);
 extern void delete_from_page_cache(struct page *page);
-extern void __delete_from_page_cache(struct page *page, void *shadow);
+extern void __delete_from_page_cache(struct page *page, void *shadow,
+                                     struct mem_cgroup *memcg);
 int replace_page_cache_page(struct page *old, struct page *new, gfp_t gfp_mask);
 
 /*

mm/filemap.c (+22 -9)

···
  *    ->tree_lock              (page_remove_rmap->set_page_dirty)
  *    bdi.wb->list_lock        (page_remove_rmap->set_page_dirty)
  *    ->inode->i_lock          (page_remove_rmap->set_page_dirty)
+ *    ->memcg->move_lock       (page_remove_rmap->mem_cgroup_begin_page_stat)
  *    bdi.wb->list_lock        (zap_pte_range->set_page_dirty)
  *    ->inode->i_lock          (zap_pte_range->set_page_dirty)
  *    ->private_lock           (zap_pte_range->__set_page_dirty_buffers)
···
 /*
  * Delete a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
- * is safe.  The caller must hold the mapping's tree_lock.
+ * is safe.  The caller must hold the mapping's tree_lock and
+ * mem_cgroup_begin_page_stat().
  */
-void __delete_from_page_cache(struct page *page, void *shadow)
+void __delete_from_page_cache(struct page *page, void *shadow,
+                              struct mem_cgroup *memcg)
 {
         struct address_space *mapping = page->mapping;
 
···
          * anyway will be cleared before returning page into buddy allocator.
          */
         if (WARN_ON_ONCE(PageDirty(page)))
-                account_page_cleaned(page, mapping);
+                account_page_cleaned(page, mapping, memcg);
 }
 
 /**
···
 void delete_from_page_cache(struct page *page)
 {
         struct address_space *mapping = page->mapping;
+        struct mem_cgroup *memcg;
+        unsigned long flags;
+
         void (*freepage)(struct page *);
 
         BUG_ON(!PageLocked(page));
 
         freepage = mapping->a_ops->freepage;
-        spin_lock_irq(&mapping->tree_lock);
-        __delete_from_page_cache(page, NULL);
-        spin_unlock_irq(&mapping->tree_lock);
+
+        memcg = mem_cgroup_begin_page_stat(page);
+        spin_lock_irqsave(&mapping->tree_lock, flags);
+        __delete_from_page_cache(page, NULL, memcg);
+        spin_unlock_irqrestore(&mapping->tree_lock, flags);
+        mem_cgroup_end_page_stat(memcg);
 
         if (freepage)
                 freepage(page);
···
         if (!error) {
                 struct address_space *mapping = old->mapping;
                 void (*freepage)(struct page *);
+                struct mem_cgroup *memcg;
+                unsigned long flags;
 
                 pgoff_t offset = old->index;
                 freepage = mapping->a_ops->freepage;
···
                 new->mapping = mapping;
                 new->index = offset;
 
-                spin_lock_irq(&mapping->tree_lock);
-                __delete_from_page_cache(old, NULL);
+                memcg = mem_cgroup_begin_page_stat(old);
+                spin_lock_irqsave(&mapping->tree_lock, flags);
+                __delete_from_page_cache(old, NULL, memcg);
                 error = radix_tree_insert(&mapping->page_tree, offset, new);
                 BUG_ON(error);
                 mapping->nrpages++;
                 __inc_zone_page_state(new, NR_FILE_PAGES);
                 if (PageSwapBacked(new))
                         __inc_zone_page_state(new, NR_SHMEM);
-                spin_unlock_irq(&mapping->tree_lock);
+                spin_unlock_irqrestore(&mapping->tree_lock, flags);
+                mem_cgroup_end_page_stat(memcg);
                 mem_cgroup_migrate(old, new, true);
                 radix_tree_preload_end();
                 if (freepage)

mm/memcontrol.c (+23 -1)

···
         "rss",
         "rss_huge",
         "mapped_file",
+        "dirty",
         "writeback",
         "swap",
 };
···
 
         return memcg;
 }
+EXPORT_SYMBOL(mem_cgroup_begin_page_stat);
 
 /**
  * mem_cgroup_end_page_stat - finish a page state statistics transaction
···
 
         rcu_read_unlock();
 }
+EXPORT_SYMBOL(mem_cgroup_end_page_stat);
 
 /**
  * mem_cgroup_update_page_stat - update page state statistics
···
 {
         unsigned long flags;
         int ret;
+        bool anon;
 
         VM_BUG_ON(from == to);
         VM_BUG_ON_PAGE(PageLRU(page), page);
···
         if (page->mem_cgroup != from)
                 goto out_unlock;
 
+        anon = PageAnon(page);
+
         spin_lock_irqsave(&from->move_lock, flags);
 
-        if (!PageAnon(page) && page_mapped(page)) {
+        if (!anon && page_mapped(page)) {
                 __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
                                nr_pages);
                 __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_FILE_MAPPED],
                                nr_pages);
+        }
+
+        /*
+         * move_lock grabbed above and caller set from->moving_account, so
+         * mem_cgroup_update_page_stat() will serialize updates to PageDirty.
+         * So mapping should be stable for dirty pages.
+         */
+        if (!anon && PageDirty(page)) {
+                struct address_space *mapping = page_mapping(page);
+
+                if (mapping_cap_account_dirty(mapping)) {
+                        __this_cpu_sub(from->stat->count[MEM_CGROUP_STAT_DIRTY],
+                                       nr_pages);
+                        __this_cpu_add(to->stat->count[MEM_CGROUP_STAT_DIRTY],
+                                       nr_pages);
+                }
         }
 
         if (PageWriteback(page)) {

mm/page-writeback.c (+42 -8)

···
 
 /*
  * Helper function for set_page_dirty family.
+ *
+ * Caller must hold mem_cgroup_begin_page_stat().
+ *
  * NOTE: This relies on being atomic wrt interrupts.
  */
-void account_page_dirtied(struct page *page, struct address_space *mapping)
+void account_page_dirtied(struct page *page, struct address_space *mapping,
+                          struct mem_cgroup *memcg)
 {
         trace_writeback_dirty_page(page, mapping);
 
         if (mapping_cap_account_dirty(mapping)) {
                 struct backing_dev_info *bdi = inode_to_bdi(mapping->host);
 
+                mem_cgroup_inc_page_stat(memcg, MEM_CGROUP_STAT_DIRTY);
                 __inc_zone_page_state(page, NR_FILE_DIRTY);
                 __inc_zone_page_state(page, NR_DIRTIED);
                 __inc_bdi_stat(bdi, BDI_RECLAIMABLE);
···
 
 /*
  * Helper function for deaccounting dirty page without writeback.
+ *
+ * Caller must hold mem_cgroup_begin_page_stat().
  */
-void account_page_cleaned(struct page *page, struct address_space *mapping)
+void account_page_cleaned(struct page *page, struct address_space *mapping,
+                          struct mem_cgroup *memcg)
 {
         if (mapping_cap_account_dirty(mapping)) {
+                mem_cgroup_dec_page_stat(memcg, MEM_CGROUP_STAT_DIRTY);
                 dec_zone_page_state(page, NR_FILE_DIRTY);
                 dec_bdi_stat(inode_to_bdi(mapping->host), BDI_RECLAIMABLE);
                 task_io_account_cancelled_write(PAGE_CACHE_SIZE);
···
  */
 int __set_page_dirty_nobuffers(struct page *page)
 {
+        struct mem_cgroup *memcg;
+
+        memcg = mem_cgroup_begin_page_stat(page);
         if (!TestSetPageDirty(page)) {
                 struct address_space *mapping = page_mapping(page);
                 unsigned long flags;
 
-                if (!mapping)
+                if (!mapping) {
+                        mem_cgroup_end_page_stat(memcg);
                         return 1;
+                }
 
                 spin_lock_irqsave(&mapping->tree_lock, flags);
                 BUG_ON(page_mapping(page) != mapping);
                 WARN_ON_ONCE(!PagePrivate(page) && !PageUptodate(page));
-                account_page_dirtied(page, mapping);
+                account_page_dirtied(page, mapping, memcg);
                 radix_tree_tag_set(&mapping->page_tree, page_index(page),
                                    PAGECACHE_TAG_DIRTY);
                 spin_unlock_irqrestore(&mapping->tree_lock, flags);
+                mem_cgroup_end_page_stat(memcg);
+
                 if (mapping->host) {
                         /* !PageAnon && !swapper_space */
                         __mark_inode_dirty(mapping->host, I_DIRTY_PAGES);
                 }
                 return 1;
         }
+        mem_cgroup_end_page_stat(memcg);
         return 0;
 }
 EXPORT_SYMBOL(__set_page_dirty_nobuffers);
···
  */
 void cancel_dirty_page(struct page *page)
 {
-        if (TestClearPageDirty(page))
-                account_page_cleaned(page, page_mapping(page));
+        struct address_space *mapping = page_mapping(page);
+
+        if (mapping_cap_account_dirty(mapping)) {
+                struct mem_cgroup *memcg;
+
+                memcg = mem_cgroup_begin_page_stat(page);
+
+                if (TestClearPageDirty(page))
+                        account_page_cleaned(page, mapping, memcg);
+
+                mem_cgroup_end_page_stat(memcg);
+        } else {
+                ClearPageDirty(page);
+        }
 }
 EXPORT_SYMBOL(cancel_dirty_page);
 
···
 int clear_page_dirty_for_io(struct page *page)
 {
         struct address_space *mapping = page_mapping(page);
+        struct mem_cgroup *memcg;
+        int ret = 0;
 
         BUG_ON(!PageLocked(page));
 
···
                  * always locked coming in here, so we get the desired
                  * exclusion.
                  */
+                memcg = mem_cgroup_begin_page_stat(page);
                 if (TestClearPageDirty(page)) {
+                        mem_cgroup_dec_page_stat(memcg, MEM_CGROUP_STAT_DIRTY);
                         dec_zone_page_state(page, NR_FILE_DIRTY);
                         dec_bdi_stat(inode_to_bdi(mapping->host),
                                      BDI_RECLAIMABLE);
-                        return 1;
+                        ret = 1;
                 }
-                return 0;
+                mem_cgroup_end_page_stat(memcg);
+                return ret;
         }
         return TestClearPageDirty(page);
 }

mm/rmap.c (+2)

···
  *     swap_lock (in swap_duplicate, swap_info_get)
  *       mmlist_lock (in mmput, drain_mmlist and others)
  *       mapping->private_lock (in __set_page_dirty_buffers)
+ *         mem_cgroup_{begin,end}_page_stat (memcg->move_lock)
+ *           mapping->tree_lock (widely used)
  *       inode->i_lock (in set_page_dirty's __mark_inode_dirty)
  *       bdi.wb->list_lock (in set_page_dirty's __mark_inode_dirty)
  *       sb_lock (within inode_lock in fs/fs-writeback.c)

mm/truncate.c (+10 -4)

···
 static int
 invalidate_complete_page2(struct address_space *mapping, struct page *page)
 {
+        struct mem_cgroup *memcg;
+        unsigned long flags;
+
         if (page->mapping != mapping)
                 return 0;
 
         if (page_has_private(page) && !try_to_release_page(page, GFP_KERNEL))
                 return 0;
 
-        spin_lock_irq(&mapping->tree_lock);
+        memcg = mem_cgroup_begin_page_stat(page);
+        spin_lock_irqsave(&mapping->tree_lock, flags);
         if (PageDirty(page))
                 goto failed;
 
         BUG_ON(page_has_private(page));
-        __delete_from_page_cache(page, NULL);
-        spin_unlock_irq(&mapping->tree_lock);
+        __delete_from_page_cache(page, NULL, memcg);
+        spin_unlock_irqrestore(&mapping->tree_lock, flags);
+        mem_cgroup_end_page_stat(memcg);
 
         if (mapping->a_ops->freepage)
                 mapping->a_ops->freepage(page);
···
         page_cache_release(page);       /* pagecache ref */
         return 1;
 failed:
-        spin_unlock_irq(&mapping->tree_lock);
+        spin_unlock_irqrestore(&mapping->tree_lock, flags);
+        mem_cgroup_end_page_stat(memcg);
         return 0;
 }
 

mm/vmscan.c (+12 -5)

···
 static int __remove_mapping(struct address_space *mapping, struct page *page,
                             bool reclaimed)
 {
+        unsigned long flags;
+        struct mem_cgroup *memcg;
+
         BUG_ON(!PageLocked(page));
         BUG_ON(mapping != page_mapping(page));
 
-        spin_lock_irq(&mapping->tree_lock);
+        memcg = mem_cgroup_begin_page_stat(page);
+        spin_lock_irqsave(&mapping->tree_lock, flags);
         /*
          * The non racy check for a busy page.
          *
···
                 swp_entry_t swap = { .val = page_private(page) };
                 mem_cgroup_swapout(page, swap);
                 __delete_from_swap_cache(page);
-                spin_unlock_irq(&mapping->tree_lock);
+                spin_unlock_irqrestore(&mapping->tree_lock, flags);
+                mem_cgroup_end_page_stat(memcg);
                 swapcache_free(swap);
         } else {
                 void (*freepage)(struct page *);
···
                 if (reclaimed && page_is_file_cache(page) &&
                     !mapping_exiting(mapping))
                         shadow = workingset_eviction(mapping, page);
-                __delete_from_page_cache(page, shadow);
-                spin_unlock_irq(&mapping->tree_lock);
+                __delete_from_page_cache(page, shadow, memcg);
+                spin_unlock_irqrestore(&mapping->tree_lock, flags);
+                mem_cgroup_end_page_stat(memcg);
 
                 if (freepage != NULL)
                         freepage(page);
···
         return 1;
 
 cannot_free:
-        spin_unlock_irq(&mapping->tree_lock);
+        spin_unlock_irqrestore(&mapping->tree_lock, flags);
+        mem_cgroup_end_page_stat(memcg);
         return 0;
 }