
memcg: simplify consume_stock

Patch series "memcg: decouple memcg and objcg stocks", v3.

The per-cpu memcg charge cache and objcg charge cache are coupled in a
single struct memcg_stock_pcp, and a single local lock is used to protect
both of the caches. This coupling makes it challenging to make memcg and
objcg charging nmi safe. Decoupling the memcg and objcg stocks would allow
us to make them nmi safe and, independently, able to work without
disabling irqs. This series completely decouples the memcg and objcg
stocks.

To evaluate the impact of this series with and without the PREEMPT_RT
config, we ran a varying number of netperf clients in different cgroups on
a 72 CPU machine.

$ netserver -6
$ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

PREEMPT_RT config:
------------------
number of clients | Without series | With series
6 | 38559.1 Mbps | 38652.6 Mbps
12 | 37388.8 Mbps | 37560.1 Mbps
18 | 30707.5 Mbps | 31378.3 Mbps
24 | 25908.4 Mbps | 26423.9 Mbps
30 | 22347.7 Mbps | 22326.5 Mbps
36 | 20235.1 Mbps | 20165.0 Mbps

!PREEMPT_RT config:
-------------------
number of clients | Without series | With series
6 | 50235.7 Mbps | 51415.4 Mbps
12 | 49336.5 Mbps | 49901.4 Mbps
18 | 46306.8 Mbps | 46482.7 Mbps
24 | 38145.7 Mbps | 38729.4 Mbps
30 | 30347.6 Mbps | 31698.2 Mbps
36 | 26976.6 Mbps | 27364.4 Mbps

No performance regression was observed.


This patch (of 4):

consume_stock() does not need to check gfp_mask for spinning; it can
simply trylock the local lock to decide whether to proceed or fail. There
is no need to spin on the local lock at all.

One of the concerns raised was that on PREEMPT_RT kernels, this trylock
can fail more often because a task holding the local_lock can be
preempted. A task that preempts the local_lock holder would then
potentially take the slow path of memcg charging.

However, this behavior impacts performance only if the memcg charging
slowpath is worse than the two context switches, and possible scheduling
delay, incurred by the current code. The network intensive workload
experiment does not suggest that this is the case.

We ran a varying number of netperf clients in different cgroups on a 72
CPU machine with the PREEMPT_RT config.

$ netserver -6
$ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K

number of clients | Without series | With series
6 | 38559.1 Mbps | 38652.6 Mbps
12 | 37388.8 Mbps | 37560.1 Mbps
18 | 30707.5 Mbps | 31378.3 Mbps
24 | 25908.4 Mbps | 26423.9 Mbps
30 | 22347.7 Mbps | 22326.5 Mbps
36 | 20235.1 Mbps | 20165.0 Mbps

We don't see any significant performance difference for the network
intensive workload with this series.

Link: https://lkml.kernel.org/r/20250506225533.2580386-1-shakeel.butt@linux.dev
Link: https://lkml.kernel.org/r/20250506225533.2580386-2-shakeel.butt@linux.dev
Signed-off-by: Shakeel Butt <shakeel.butt@linux.dev>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Muchun Song <muchun.song@linux.dev>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ ... @@
  * consume_stock: Try to consume stocked charge on this cpu.
  * @memcg: memcg to consume from.
  * @nr_pages: how many pages to charge.
- * @gfp_mask: allocation mask.
  *
- * The charges will only happen if @memcg matches the current cpu's memcg
- * stock, and at least @nr_pages are available in that stock.  Failure to
- * service an allocation will refill the stock.
+ * Consume the cached charge if enough nr_pages are present otherwise return
+ * failure. Also return failure for charge request larger than
+ * MEMCG_CHARGE_BATCH or if the local lock is already taken.
  *
  * returns true if successful, false otherwise.
  */
-static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages,
-			  gfp_t gfp_mask)
+static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages)
 {
 	struct memcg_stock_pcp *stock;
 	uint8_t stock_pages;
@@ ... @@
 	bool ret = false;
 	int i;
 
-	if (nr_pages > MEMCG_CHARGE_BATCH)
-		return ret;
-
-	if (gfpflags_allow_spinning(gfp_mask))
-		local_lock_irqsave(&memcg_stock.stock_lock, flags);
-	else if (!local_trylock_irqsave(&memcg_stock.stock_lock, flags))
+	if (nr_pages > MEMCG_CHARGE_BATCH ||
+	    !local_trylock_irqsave(&memcg_stock.stock_lock, flags))
 		return ret;
 
 	stock = this_cpu_ptr(&memcg_stock);
@@ ... @@
 	unsigned long pflags;
 
 retry:
-	if (consume_stock(memcg, nr_pages, gfp_mask))
+	if (consume_stock(memcg, nr_pages))
 		return 0;
 
 	if (!gfpflags_allow_spinning(gfp_mask))