mm: fork: fix kernel_stack memcg stats for various stack implementations

Depending on CONFIG_VMAP_STACK and the THREAD_SIZE / PAGE_SIZE ratio, the
space for task stacks can be allocated using __vmalloc_node_range(),
alloc_pages_node(), or kmem_cache_alloc_node().
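
For context, this is roughly what the three paths look like (a simplified
sketch of alloc_thread_stack_node() in kernel/fork.c; the cached-stack fast
path, error handling, and per-page charging are omitted):

  static unsigned long *alloc_thread_stack_node(struct task_struct *tsk,
                                                int node)
  {
  #ifdef CONFIG_VMAP_STACK
          /* Stack in vmalloc space: backing pages get page->mem_cgroup
           * set when they are charged. */
          return __vmalloc_node_range(THREAD_SIZE, THREAD_ALIGN,
                                      VMALLOC_START, VMALLOC_END,
                                      THREADINFO_GFP, PAGE_KERNEL, 0, node,
                                      __builtin_return_address(0));
  #elif THREAD_SIZE >= PAGE_SIZE
          /* Whole pages: page->mem_cgroup is set on the charged page. */
          struct page *page = alloc_pages_node(node, THREADINFO_GFP,
                                               THREAD_SIZE_ORDER);

          return page ? page_address(page) : NULL;
  #else
          /* Sub-page stacks come from a slab cache: page->mem_cgroup
           * stays NULL for slab pages. */
          return kmem_cache_alloc_node(thread_stack_cache, THREADINFO_GFP,
                                       node);
  #endif
  }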

In the first and second cases the page->mem_cgroup pointer is set, but in
the third it is not: memcg membership of a slab page must be determined
using the memcg_from_slab_page() function, which looks at
page->slab_cache->memcg_params.memcg. In this case, using
mod_memcg_page_state() (as account_kernel_stack() does) is incorrect: the
page->mem_cgroup pointer is NULL even for pages charged to a non-root
memory cgroup.
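
For reference, mod_memcg_page_state() is a thin wrapper that reads
page->mem_cgroup directly (roughly, as defined in include/linux/memcontrol.h
at the time), so for a slab-backed stack the update silently becomes a no-op:

  static inline void mod_memcg_page_state(struct page *page,
                                          int idx, int val)
  {
          /* NULL for slab pages, so the stat update is skipped */
          if (page->mem_cgroup)
                  mod_memcg_state(page->mem_cgroup, idx, val);
  }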

This can lead to the per-memcg kernel_stack counters permanently showing 0
on some architectures (depending on the configuration).

In order to fix it, let's introduce a mod_memcg_obj_state() helper, which
takes a pointer to a kernel object as its first argument, uses
mem_cgroup_from_obj() to get an RCU-protected memcg pointer, and calls
mod_memcg_state(). This handles all possible configurations
(CONFIG_VMAP_STACK and various THREAD_SIZE/PAGE_SIZE values) without
spilling any memcg/kmem specifics into fork.c.

Note: This is a special version of the patch created for stable
backports. It contains code from the following two patches:
- mm: memcg/slab: introduce mem_cgroup_from_obj()
- mm: fork: fix kernel_stack memcg stats for various stack implementations

[guro@fb.com: introduce mem_cgroup_from_obj()]
Link: http://lkml.kernel.org/r/20200324004221.GA36662@carbon.dhcp.thefacebook.com
Fixes: 4d96ba353075 ("mm: memcg/slab: stop setting page->mem_cgroup pointer for slab pages")
Signed-off-by: Roman Gushchin <guro@fb.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Bharata B Rao <bharata@linux.ibm.com>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200303233550.251375-1-guro@fb.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

---
 include/linux/memcontrol.h | 12 ++++++++++++
 kernel/fork.c              |  4 ++--
 mm/memcontrol.c            | 38 ++++++++++++++++++++++++++++++++++++++
 3 files changed, 52 insertions(+), 2 deletions(-)
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -695,6 +695,7 @@
 void __mod_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 			int val);
 void __mod_lruvec_slab_state(void *p, enum node_stat_item idx, int val);
+void mod_memcg_obj_state(void *p, int idx, int val);
 
 static inline void mod_lruvec_state(struct lruvec *lruvec,
 				    enum node_stat_item idx, int val)
@@ -1123,6 +1124,10 @@
 	__mod_node_page_state(page_pgdat(page), idx, val);
 }
 
+static inline void mod_memcg_obj_state(void *p, int idx, int val)
+{
+}
+
 static inline
 unsigned long mem_cgroup_soft_limit_reclaim(pg_data_t *pgdat, int order,
 					    gfp_t gfp_mask,
@@ -1427,6 +1432,8 @@
 	return memcg ? memcg->kmemcg_id : -1;
 }
 
+struct mem_cgroup *mem_cgroup_from_obj(void *p);
+
 #else
 
 static inline int memcg_kmem_charge(struct page *page, gfp_t gfp, int order)
@@ -1466,6 +1473,11 @@
 
 static inline void memcg_put_cache_ids(void)
 {
+}
+
+static inline struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+	return NULL;
 }
 
 #endif /* CONFIG_MEMCG_KMEM */
diff --git a/kernel/fork.c b/kernel/fork.c
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -397,8 +397,8 @@
 		mod_zone_page_state(page_zone(first_page), NR_KERNEL_STACK_KB,
 				    THREAD_SIZE / 1024 * account);
 
-		mod_memcg_page_state(first_page, MEMCG_KERNEL_STACK_KB,
-				     account * (THREAD_SIZE / 1024));
+		mod_memcg_obj_state(stack, MEMCG_KERNEL_STACK_KB,
+				    account * (THREAD_SIZE / 1024));
 	}
 }
 
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -777,6 +777,17 @@
 	rcu_read_unlock();
 }
 
+void mod_memcg_obj_state(void *p, int idx, int val)
+{
+	struct mem_cgroup *memcg;
+
+	rcu_read_lock();
+	memcg = mem_cgroup_from_obj(p);
+	if (memcg)
+		mod_memcg_state(memcg, idx, val);
+	rcu_read_unlock();
+}
+
 /**
  * __count_memcg_events - account VM events in a cgroup
  * @memcg: the memory cgroup
@@ -2661,6 +2672,33 @@
 }
 
 #ifdef CONFIG_MEMCG_KMEM
+/*
+ * Returns a pointer to the memory cgroup to which the kernel object is charged.
+ *
+ * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(),
+ * cgroup_mutex, etc.
+ */
+struct mem_cgroup *mem_cgroup_from_obj(void *p)
+{
+	struct page *page;
+
+	if (mem_cgroup_disabled())
+		return NULL;
+
+	page = virt_to_head_page(p);
+
+	/*
+	 * Slab pages don't have page->mem_cgroup set because corresponding
+	 * kmem caches can be reparented during the lifetime. That's why
+	 * memcg_from_slab_page() should be used instead.
+	 */
+	if (PageSlab(page))
+		return memcg_from_slab_page(page);
+
+	/* All other pages use page->mem_cgroup */
+	return page->mem_cgroup;
+}
+
 static int memcg_alloc_cache_id(void)
 {
 	int id, size;
··· 777 777 rcu_read_unlock(); 778 778 } 779 779 780 + void mod_memcg_obj_state(void *p, int idx, int val) 781 + { 782 + struct mem_cgroup *memcg; 783 + 784 + rcu_read_lock(); 785 + memcg = mem_cgroup_from_obj(p); 786 + if (memcg) 787 + mod_memcg_state(memcg, idx, val); 788 + rcu_read_unlock(); 789 + } 790 + 780 791 /** 781 792 * __count_memcg_events - account VM events in a cgroup 782 793 * @memcg: the memory cgroup ··· 2672 2661 } 2673 2662 2674 2663 #ifdef CONFIG_MEMCG_KMEM 2664 + /* 2665 + * Returns a pointer to the memory cgroup to which the kernel object is charged. 2666 + * 2667 + * The caller must ensure the memcg lifetime, e.g. by taking rcu_read_lock(), 2668 + * cgroup_mutex, etc. 2669 + */ 2670 + struct mem_cgroup *mem_cgroup_from_obj(void *p) 2671 + { 2672 + struct page *page; 2673 + 2674 + if (mem_cgroup_disabled()) 2675 + return NULL; 2676 + 2677 + page = virt_to_head_page(p); 2678 + 2679 + /* 2680 + * Slab pages don't have page->mem_cgroup set because corresponding 2681 + * kmem caches can be reparented during the lifetime. That's why 2682 + * memcg_from_slab_page() should be used instead. 2683 + */ 2684 + if (PageSlab(page)) 2685 + return memcg_from_slab_page(page); 2686 + 2687 + /* All other pages use page->mem_cgroup */ 2688 + return page->mem_cgroup; 2689 + } 2690 + 2675 2691 static int memcg_alloc_cache_id(void) 2676 2692 { 2677 2693 int id, size;