Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: introduce res_counter_charge_nofail() for socket allocations

There is a case in __sk_mem_schedule() where an allocation
goes beyond the maximum, yet we are still allowed to proceed.
It happens under the following condition:

sk->sk_wmem_queued + size >= sk->sk_sndbuf

The network code won't revert the allocation in this case,
meaning that at some point later it will try to undo it. Since
this is never communicated to the underlying res_counter
code, the res_counter uncharge operation ends up imbalanced.

I see two ways of fixing this:

1) storing the information about those allocations somewhere
in memcg, and then deducting from that first, before
we start draining the res_counter,
2) providing a slightly different allocation function for
the res_counter, that matches the original behavior of
the network code more closely.

I decided to go for #2 here, believing it to be more elegant,
since #1 would require us to do basically that, but in a more
obscure way.
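The imbalance can be illustrated with a minimal user-space sketch (hypothetical names and a plain struct, not the kernel res_counter API): a charge that exceeds the limit is rejected and never recorded, but the caller proceeds anyway and later blindly uncharges, so usage drops below what was actually accounted.

```c
#include <assert.h>

/* Toy model of the imbalance described above. toy_charge() behaves
 * like res_counter_charge(): it rejects and does NOT record a charge
 * that would exceed the limit. toy_uncharge() mirrors the blind
 * revert the network code performs later. */

struct toy_counter {
	long usage;
	long limit;
};

static int toy_charge(struct toy_counter *c, long val)
{
	if (c->usage + val > c->limit)
		return -1;	/* over limit: charge not recorded */
	c->usage += val;
	return 0;
}

static void toy_uncharge(struct toy_counter *c, long val)
{
	c->usage -= val;	/* no check: may undo a charge that never happened */
}
```

With usage at 90 of 100, a charge of 20 is rejected (usage stays 90), but the later uncharge of 20 still runs, leaving usage at 70 — 20 units uncharged that were never charged.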

Signed-off-by: Glauber Costa <glommer@parallels.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
CC: Tejun Heo <tj@kernel.org>
CC: Li Zefan <lizf@cn.fujitsu.com>
CC: Laurent Chavey <chavey@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by Glauber Costa and committed by David S. Miller
0e90b31f 8cfd14ad

+37 -8
+6
include/linux/res_counter.h
···
  *
  * returns 0 on success and <0 if the counter->usage will exceed the
  * counter->limit _locked call expects the counter->lock to be taken
+ *
+ * charge_nofail works the same, except that it charges the resource
+ * counter unconditionally, and returns < 0 if after the current
+ * charge we are over limit.
  */

 int __must_check res_counter_charge_locked(struct res_counter *counter,
 		unsigned long val);
 int __must_check res_counter_charge(struct res_counter *counter,
+		unsigned long val, struct res_counter **limit_fail_at);
+int __must_check res_counter_charge_nofail(struct res_counter *counter,
 		unsigned long val, struct res_counter **limit_fail_at);

 /*
+4 -6
include/net/sock.h
···
 	struct res_counter *fail;
 	int ret;

-	ret = res_counter_charge(prot->memory_allocated,
-					amt << PAGE_SHIFT, &fail);
-
+	ret = res_counter_charge_nofail(prot->memory_allocated,
+					amt << PAGE_SHIFT, &fail);
 	if (ret < 0)
 		*parent_status = OVER_LIMIT;
 }
···
 }

 static inline void
-sk_memory_allocated_sub(struct sock *sk, int amt, int parent_status)
+sk_memory_allocated_sub(struct sock *sk, int amt)
 {
 	struct proto *prot = sk->sk_prot;

-	if (mem_cgroup_sockets_enabled && sk->sk_cgrp &&
-	    parent_status != OVER_LIMIT) /* Otherwise was uncharged already */
+	if (mem_cgroup_sockets_enabled && sk->sk_cgrp)
 		memcg_memory_allocated_sub(sk->sk_cgrp, amt);

 	atomic_long_sub(amt, prot->memory_allocated);
+25
kernel/res_counter.c
···
 	return ret;
 }

+int res_counter_charge_nofail(struct res_counter *counter, unsigned long val,
+			      struct res_counter **limit_fail_at)
+{
+	int ret, r;
+	unsigned long flags;
+	struct res_counter *c;
+
+	r = ret = 0;
+	*limit_fail_at = NULL;
+	local_irq_save(flags);
+	for (c = counter; c != NULL; c = c->parent) {
+		spin_lock(&c->lock);
+		r = res_counter_charge_locked(c, val);
+		if (r)
+			c->usage += val;
+		spin_unlock(&c->lock);
+		if (r < 0 && ret == 0) {
+			*limit_fail_at = c;
+			ret = r;
+		}
+	}
+	local_irq_restore(flags);
+
+	return ret;
+}
 void res_counter_uncharge_locked(struct res_counter *counter, unsigned long val)
 {
 	if (WARN_ON(counter->usage < val))
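The semantics of the nofail variant can be sketched in user space (hypothetical names, no locking or IRQ handling, a plain hierarchy walk rather than the kernel's res_counter): the charge is recorded unconditionally at every level of the hierarchy, and the return value only reports whether some ancestor went over its limit.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the charge_nofail behavior: always charge, walk up the
 * parent chain, and remember the first counter that ends up over its
 * limit. Not the kernel code — just the accounting logic. */

struct nofail_counter {
	long usage;
	long limit;
	struct nofail_counter *parent;
};

static int toy_charge_nofail(struct nofail_counter *counter, long val,
			     struct nofail_counter **limit_fail_at)
{
	int ret = 0;
	struct nofail_counter *c;

	*limit_fail_at = NULL;
	for (c = counter; c != NULL; c = c->parent) {
		c->usage += val;	/* charged unconditionally */
		if (c->usage > c->limit && ret == 0) {
			*limit_fail_at = c;	/* first counter over limit */
			ret = -1;
		}
	}
	return ret;
}
```

Because the charge always sticks, a later uncharge of the same amount is guaranteed to find matching usage at every level — which is exactly the property the network code's revert path relies on.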
+2 -2
net/core/sock.c
···
 	/* Alas. Undo changes. */
 	sk->sk_forward_alloc -= amt * SK_MEM_QUANTUM;

-	sk_memory_allocated_sub(sk, amt, parent_status);
+	sk_memory_allocated_sub(sk, amt);

 	return 0;
 }
···
 void __sk_mem_reclaim(struct sock *sk)
 {
 	sk_memory_allocated_sub(sk,
-		sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT, 0);
+		sk->sk_forward_alloc >> SK_MEM_QUANTUM_SHIFT);
 	sk->sk_forward_alloc &= SK_MEM_QUANTUM - 1;

 	if (sk_under_memory_pressure(sk) &&