Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: add net.core.qdisc_max_burst

In the blamed commit, I added a check against the temporary queue
built in __dev_xmit_skb(). The idea was to drop packets early,
before any spinlock was acquired.

	if (unlikely(defer_count > READ_ONCE(q->limit))) {
		kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
		return NET_XMIT_DROP;
	}

It turned out that the HTB qdisc has a zero q->limit, because
HTB limits packets on a per-class basis. This made some of our
tests flaky.

Add a new sysctl, net.core.qdisc_max_burst, to control how many
packets can be stored in the temporary lockless queue.

Also add a new QDISC_BURST_DROP drop reason to better diagnose
future issues.

Thanks Neal!

Fixes: 100dfa74cad9 ("net: dev_queue_xmit() llist adoption")
Reported-and-bisected-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Link: https://patch.msgid.link/20260107104159.3669285-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Authored by Eric Dumazet, committed by Paolo Abeni (ffe4ccd3, dfdf7746)

+26 -3
Documentation/admin-guide/sysctl/net.rst (+8)

@@ -303,6 +303,14 @@
 Maximum number of packets, queued on the INPUT side, when the interface
 receives packets faster than kernel can process them.
 
+qdisc_max_burst
+------------------
+
+Maximum number of packets that can be temporarily stored before
+reaching qdisc.
+
+Default: 1000
+
 netdev_rss_key
 --------------
 
include/net/dropreason-core.h (+6)

@@ -67,6 +67,7 @@
 	FN(TC_EGRESS) \
 	FN(SECURITY_HOOK) \
 	FN(QDISC_DROP) \
+	FN(QDISC_BURST_DROP) \
 	FN(QDISC_OVERLIMIT) \
 	FN(QDISC_CONGESTED) \
 	FN(CAKE_FLOOD) \
@@ -375,6 +374,11 @@
 	 * failed to enqueue to current qdisc)
 	 */
 	SKB_DROP_REASON_QDISC_DROP,
+	/**
+	 * @SKB_DROP_REASON_QDISC_BURST_DROP: dropped when net.core.qdisc_max_burst
+	 * limit is hit.
+	 */
+	SKB_DROP_REASON_QDISC_BURST_DROP,
 	/**
 	 * @SKB_DROP_REASON_QDISC_OVERLIMIT: dropped by qdisc when a qdisc
 	 * instance exceeds its total buffer size limit.
include/net/hotdata.h (+1)

@@ -42,6 +42,7 @@
 	int netdev_budget_usecs;
 	int tstamp_prequeue;
 	int max_backlog;
+	int qdisc_max_burst;
 	int dev_tx_weight;
 	int dev_rx_weight;
 	int sysctl_max_skb_frags;
net/core/dev.c (+3 -3)

@@ -4203,8 +4203,8 @@
 	do {
 		if (first_n && !defer_count) {
 			defer_count = atomic_long_inc_return(&q->defer_count);
-			if (unlikely(defer_count > READ_ONCE(q->limit))) {
-				kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_DROP);
+			if (unlikely(defer_count > READ_ONCE(net_hotdata.qdisc_max_burst))) {
+				kfree_skb_reason(skb, SKB_DROP_REASON_QDISC_BURST_DROP);
 				return NET_XMIT_DROP;
 			}
 		}
@@ -4222,7 +4222,7 @@
 	ll_list = llist_del_all(&q->defer_list);
 	/* There is a small race because we clear defer_count not atomically
	 * with the prior llist_del_all(). This means defer_list could grow
-	 * over q->limit.
+	 * over qdisc_max_burst.
 	 */
 	atomic_long_set(&q->defer_count, 0);
 
net/core/hotdata.c (+1)

@@ -17,6 +17,7 @@
 
 	.tstamp_prequeue = 1,
 	.max_backlog = 1000,
+	.qdisc_max_burst = 1000,
 	.dev_tx_weight = 64,
 	.dev_rx_weight = 64,
 	.sysctl_max_skb_frags = MAX_SKB_FRAGS,
net/core/sysctl_net_core.c (+7)

@@ -430,6 +430,13 @@
 		.proc_handler	= proc_dointvec
 	},
 	{
+		.procname	= "qdisc_max_burst",
+		.data		= &net_hotdata.qdisc_max_burst,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec
+	},
+	{
 		.procname	= "netdev_rss_key",
 		.data		= &netdev_rss_key,
 		.maxlen		= sizeof(int),