Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: dqs: make struct dql more cache efficient

With the previous change, dql->stall_thrs will be in the hot path
(at queue side), even if DQS is disabled.

The other fields accessed in this function (last_obj_cnt and num_queued)
are in the first cache line, so move this field (stall_thrs) to the
very first cache line as well, since there is a hole there.

This does not change the structure size, since it moves a short (2
bytes) into a 4-byte hole in the first cache line.

This is the new structure format now:

struct dql {
	unsigned int               num_queued;
	unsigned int               last_obj_cnt;
	...
	short unsigned int         stall_thrs;
	/* XXX 2 bytes hole, try to pack */
	...
	/* --- cacheline 1 boundary (64 bytes) --- */
	...
	/* Longest stall detected, reported to user */
	short unsigned int         stall_max;
	/* XXX 2 bytes hole, try to pack */
};
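
To illustrate why the move is size-neutral, here is a minimal userspace sketch. It is not the real struct dql: the struct names, the reduced member list, the 64-byte cache line and the LP64 ABI (8-byte unsigned long) are all assumptions made for the illustration. It only shows that a 2-byte member can drop into an existing 4-byte alignment hole without changing sizeof, while landing in the first cache line.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CACHELINE 64	/* assumed cache line size */

/* Hypothetical "before" layout: the threshold sits past the first line. */
struct layout_before {
	unsigned int   num_queued;
	unsigned int   last_obj_cnt;
	unsigned int   adj_limit;
	/* 4-byte hole on LP64: the next member needs 8-byte alignment */
	unsigned long  history[8];
	unsigned int   min_limit;
	unsigned short stall_thrs;
	unsigned short stall_max;
	unsigned long  last_reap;
};

/* Hypothetical "after" layout: the threshold fills part of the hole. */
struct layout_after {
	unsigned int   num_queued;
	unsigned int   last_obj_cnt;
	unsigned int   adj_limit;
	unsigned short stall_thrs;	/* uses 2 of the 4 hole bytes */
	/* 2-byte hole remains */
	unsigned long  history[8];
	unsigned int   min_limit;
	unsigned short stall_max;
	/* 2-byte hole before the 8-byte aligned member below */
	unsigned long  last_reap;
};

/* Compile-time checks; these hold under the LP64 assumption above. */
static_assert(sizeof(struct layout_before) == sizeof(struct layout_after),
	      "moving the field must not change the structure size");
static_assert(offsetof(struct layout_after, stall_thrs) < SKETCH_CACHELINE,
	      "the field must land in the first cache line");

/* Index of the cache line that a byte offset falls into. */
size_t sketch_cacheline_of(size_t offset)
{
	return offset / SKETCH_CACHELINE;
}
```

On a real kernel build, pahole on the compiled object is the authoritative way to inspect the layout; the static assertions above are only a way to pin the same property in a toy example.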

Also, read the stall_thrs (now in the very first cache line) earlier,
together with dql->num_queued (also in the first cache line).
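
The read-early pattern can be sketched in userspace as follows. This is a simplified illustration, not the kernel code: struct sketch_dql, the sketch_* functions, the stall_checks counter and the volatile-cast helpers (stand-ins for the kernel's READ_ONCE()) are hypothetical names introduced here. The point is only that the threshold is sampled once, next to num_queued, and then passed by value so the stall check does not load it from the structure again.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-ins for READ_ONCE(); an assumption of this sketch. */
static inline unsigned int sketch_read_once_uint(const unsigned int *p)
{
	return *(const volatile unsigned int *)p;
}

static inline unsigned short sketch_read_once_ushort(const unsigned short *p)
{
	return *(const volatile unsigned short *)p;
}

/* Hypothetical, heavily reduced queue state. */
struct sketch_dql {
	unsigned int   num_queued;	/* hot, first cache line */
	unsigned short stall_thrs;	/* now next to num_queued */
	unsigned int   num_completed;
	unsigned int   stall_checks;	/* counts how often a check ran */
};

/* Takes the threshold by value, so it never re-reads q->stall_thrs. */
bool sketch_check_stall(struct sketch_dql *q, unsigned short stall_thrs)
{
	if (!stall_thrs)
		return false;	/* stall detection disabled */
	q->stall_checks++;
	return true;
}

bool sketch_completed(struct sketch_dql *q, unsigned int count)
{
	/* Sample both first-cache-line fields back to back. */
	unsigned int num_queued = sketch_read_once_uint(&q->num_queued);
	unsigned short stall_thrs = sketch_read_once_ushort(&q->stall_thrs);

	/* Can't complete more than what's in queue. */
	assert(count <= num_queued - q->num_completed);
	q->num_completed += count;

	/* The earlier snapshot is reused here. */
	return sketch_check_stall(q, stall_thrs);
}
```

With stall_thrs left at zero, sketch_completed() updates num_completed and returns false without running a check; with a non-zero threshold the check runs once per call.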

Suggested-by: Jakub Kicinski <kuba@kernel.org>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20240411192241.2498631-5-leitao@debian.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Breno Leitao and committed by Jakub Kicinski
4ba67ef3 721f076b

+12 -6
+3 -2
include/linux/dynamic_queue_limits.h
···
 	unsigned int	adj_limit;		/* limit + num_completed */
 	unsigned int	last_obj_cnt;		/* Count at last queuing */
 
+	/* Stall threshold (in jiffies), defined by user */
+	unsigned short	stall_thrs;
+
 	unsigned long	history_head;		/* top 58 bits of jiffies */
 	/* stall entries, a bit per entry */
 	unsigned long	history[DQL_HIST_LEN];
···
 	unsigned int	min_limit;		/* Minimum limit */
 	unsigned int	slack_hold_time;	/* Time to measure slack */
 
-	/* Stall threshold (in jiffies), defined by user */
-	unsigned short	stall_thrs;
 	/* Longest stall detected, reported to user */
 	unsigned short	stall_max;
 	unsigned long	last_reap;		/* Last reap (in jiffies) */
+9 -4
lib/dynamic_queue_limits.c
···
 #define POSDIFF(A, B) ((int)((A) - (B)) > 0 ? (A) - (B) : 0)
 #define AFTER_EQ(A, B) ((int)((A) - (B)) >= 0)
 
-static void dql_check_stall(struct dql *dql)
+static void dql_check_stall(struct dql *dql, unsigned short stall_thrs)
 {
-	unsigned short stall_thrs;
 	unsigned long now;
 
-	stall_thrs = READ_ONCE(dql->stall_thrs);
 	if (!stall_thrs)
 		return;
 
···
 {
 	unsigned int inprogress, prev_inprogress, limit;
 	unsigned int ovlimit, completed, num_queued;
+	unsigned short stall_thrs;
 	bool all_prev_completed;
 
 	num_queued = READ_ONCE(dql->num_queued);
+	/* Read stall_thrs in advance since it belongs to the same (first)
+	 * cache line as ->num_queued. This way, dql_check_stall() does not
+	 * need to touch the first cache line again later, reducing the window
+	 * of possible false sharing.
+	 */
+	stall_thrs = READ_ONCE(dql->stall_thrs);
 
 	/* Can't complete more than what's in queue */
 	BUG_ON(count > num_queued - dql->num_completed);
···
 	dql->num_completed = completed;
 	dql->prev_num_queued = num_queued;
 
-	dql_check_stall(dql);
+	dql_check_stall(dql, stall_thrs);
 }
 EXPORT_SYMBOL(dql_completed);
 