Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

af_unix: Optimise hash table layout.

Commit 6dd4142fb5a9 ("Merge branch 'af_unix-per-netns-socket-hash'") and
commit 51bae889fe11 ("af_unix: Put pathname sockets in the global hash
table.") changed the hash table layout.

Before:
  unix_socket_table [0   - 255] : abstract & pathname sockets
                    [256 - 511] : unnamed sockets

After:
  per-netns table   [0   - 255] : abstract & pathname sockets
                    [256 - 511] : unnamed sockets
  bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)

Now, when looking up sockets, we traverse the global table for pathname
sockets and the first half of each per-netns hash table for abstract
sockets, where pathname sockets are also linked. Thus, the more
pathname sockets we have, the longer an abstract socket lookup takes.
This characteristic predates the layout change, but we can improve it
now.

This patch changes the per-netns hash table's layout so that sockets not
requiring lookup reside in the first half and do not impact the lookup of
abstract sockets.

  per-netns table   [0   - 255] : pathname & unnamed sockets
                    [256 - 511] : abstract sockets
  bsd_socket_table  [0   - 255] : pathname sockets (sk_bind_node)

We ran a test that bind()s 100,000 abstract and 100,000 pathname
sockets, then bind()s another abstract socket 100,000 times and
measures the time spent in __unix_find_socket_byname(). The result
shows that the patch makes each lookup faster.

Without this patch:
$ sudo ./funclatency -p 2278 --microseconds __unix_find_socket_byname.isra.44
     usec                : count     distribution
         0 -> 1          : 0        |                                        |
         2 -> 3          : 0        |                                        |
         4 -> 7          : 0        |                                        |
         8 -> 15         : 126      |                                        |
        16 -> 31         : 1438     |*                                       |
        32 -> 63         : 4150     |***                                     |
        64 -> 127        : 9049     |*******                                 |
       128 -> 255        : 37704    |*******************************         |
       256 -> 511        : 47533    |****************************************|

With this patch:
$ sudo ./funclatency -p 3648 --microseconds __unix_find_socket_byname.isra.46
     usec                : count     distribution
         0 -> 1          : 109      |                                        |
         2 -> 3          : 318      |                                        |
         4 -> 7          : 725      |                                        |
         8 -> 15         : 2501     |*                                       |
        16 -> 31         : 3061     |**                                      |
        32 -> 63         : 4028     |***                                     |
        64 -> 127        : 9312     |*******                                 |
       128 -> 255        : 51372    |****************************************|
       256 -> 511        : 28574    |**********************                  |

Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20220705233715.759-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

authored by Kuniyuki Iwashima and committed by Paolo Abeni
cf21b355 7ed5f245
+12 -6
net/unix/af_unix.c
@@ -135,7 +135,7 @@
 	hash ^= hash >> 8;
 	hash ^= sk->sk_type;
 
-	return UNIX_HASH_MOD + 1 + (hash & UNIX_HASH_MOD);
+	return hash & UNIX_HASH_MOD;
 }
 
 static unsigned int unix_bsd_hash(struct inode *i)
@@ -153,15 +153,16 @@
 	hash ^= hash >> 8;
 	hash ^= type;
 
-	return hash & UNIX_HASH_MOD;
+	return UNIX_HASH_MOD + 1 + (hash & UNIX_HASH_MOD);
 }
 
 static void unix_table_double_lock(struct net *net,
 				   unsigned int hash1, unsigned int hash2)
 {
-	/* hash1 and hash2 is never the same because
-	 * one is between 0 and UNIX_HASH_MOD, and
-	 * another is between UNIX_HASH_MOD + 1 and UNIX_HASH_SIZE - 1.
-	 */
+	if (hash1 == hash2) {
+		spin_lock(&net->unx.table.locks[hash1]);
+		return;
+	}
+
 	if (hash1 > hash2)
 		swap(hash1, hash2);
@@ -174,6 +173,11 @@
 static void unix_table_double_unlock(struct net *net,
 				     unsigned int hash1, unsigned int hash2)
 {
+	if (hash1 == hash2) {
+		spin_unlock(&net->unx.table.locks[hash1]);
+		return;
+	}
+
 	spin_unlock(&net->unx.table.locks[hash1]);
 	spin_unlock(&net->unx.table.locks[hash2]);
 }