Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

[TCPv4]: Improve BH latency in /proc/net/tcp

Currently the code for /proc/net/tcp disables BH while iterating
over the entire established hash table. Even though we call
cond_resched_softirq for each entry, we still won't process
softirqs as regularly as we otherwise would, which results
in poor performance when the system is loaded near capacity.

This anomaly comes from the 2.4 code where this was all in a
single function and the local_bh_disable might have made sense
as a small optimisation.

The cost of each local_bh_disable is so small when compared
against the increased latency in keeping it disabled over a
large but mostly empty TCP established hash table that we
should just move it to the individual read_lock/read_unlock
calls as we do in inet_diag.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

Authored by Herbert Xu, committed by David S. Miller
a7ab4b50 c716a81a

+5 -14
net/ipv4/tcp_ipv4.c
@@ -2045,10 +2045,7 @@
 	struct hlist_node *node;
 	struct inet_timewait_sock *tw;
 
-	/* We can reschedule _before_ having picked the target: */
-	cond_resched_softirq();
-
-	read_lock(&tcp_hashinfo.ehash[st->bucket].lock);
+	read_lock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
 	sk_for_each(sk, node, &tcp_hashinfo.ehash[st->bucket].chain) {
 		if (sk->sk_family != st->family) {
 			continue;
@@ -2062,7 +2065,7 @@
 			rc = tw;
 			goto out;
 		}
-		read_unlock(&tcp_hashinfo.ehash[st->bucket].lock);
+		read_unlock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 	}
 out:
@@ -2089,14 +2092,11 @@
 		cur = tw;
 		goto out;
 	}
-	read_unlock(&tcp_hashinfo.ehash[st->bucket].lock);
+	read_unlock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
 	st->state = TCP_SEQ_STATE_ESTABLISHED;
 
-	/* We can reschedule between buckets: */
-	cond_resched_softirq();
-
 	if (++st->bucket < tcp_hashinfo.ehash_size) {
-		read_lock(&tcp_hashinfo.ehash[st->bucket].lock);
+		read_lock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
 		sk = sk_head(&tcp_hashinfo.ehash[st->bucket].chain);
 	} else {
 		cur = NULL;
@@ -2138,7 +2144,6 @@
 
 	if (!rc) {
 		inet_listen_unlock(&tcp_hashinfo);
-		local_bh_disable();
 		st->state = TCP_SEQ_STATE_ESTABLISHED;
 		rc = established_get_idx(seq, pos);
 	}
@@ -2170,7 +2177,6 @@
 		rc = listening_get_next(seq, v);
 		if (!rc) {
 			inet_listen_unlock(&tcp_hashinfo);
-			local_bh_disable();
 			st->state = TCP_SEQ_STATE_ESTABLISHED;
 			rc = established_get_first(seq);
 		}
@@ -2201,8 +2209,7 @@
 	case TCP_SEQ_STATE_TIME_WAIT:
 	case TCP_SEQ_STATE_ESTABLISHED:
 		if (v)
-			read_unlock(&tcp_hashinfo.ehash[st->bucket].lock);
-		local_bh_enable();
+			read_unlock_bh(&tcp_hashinfo.ehash[st->bucket].lock);
 		break;
 	}
 }