Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

tcp: remove one ktime_get() from recvmsg() fast path

Each time some payload is consumed by user space (recvmsg() and friends),
TCP calls tcp_rcv_space_adjust() to run the DRS (Dynamic Right-Sizing)
algorithm and check whether sk->sk_rcvbuf needs to be increased.

This function is based on time sampling, and currently calls
tcp_mstamp_refresh(tp), which is a wrapper around ktime_get_ns().

ktime_get_ns() has a high cost on some platforms: for instance,
100+ cycles for rdtscp on AMD EPYC Turin.

We do not have to refresh tp->tcp_mstamp; using the last cached value
is enough. We only need to refresh it from __tcp_cleanup_rbuf()
if an ACK must be sent (this is a rare event).
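
The idea can be modelled outside the kernel. The following is a minimal
user-space sketch, not kernel code: expensive_clock_ns(), struct conn,
conn_stamp_refresh() and conn_after_read() are hypothetical names invented
for illustration, and the fake clock only counts how often it is read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an expensive clock such as ktime_get_ns(). */
static uint64_t fake_now_ns;
static unsigned int clock_reads;

static uint64_t expensive_clock_ns(void)
{
	clock_reads++;
	fake_now_ns += 1000;	/* pretend 1 us passes per read */
	return fake_now_ns;
}

/* Hypothetical per-connection state; only the cached stamp matters here. */
struct conn {
	uint64_t cached_stamp_ns;	/* plays the role of tp->tcp_mstamp */
	unsigned int acks_sent;
};

/* Slow path: refresh the cached stamp from the expensive clock. */
static void conn_stamp_refresh(struct conn *c)
{
	c->cached_stamp_ns = expensive_clock_ns();
}

/* Called after every read; reads the clock only when an ACK is due. */
static void conn_after_read(struct conn *c, bool time_to_ack)
{
	assert(c != NULL);
	if (time_to_ack) {
		conn_stamp_refresh(c);
		c->acks_sent++;
	}
	/* Otherwise the stamp cached by an earlier event is reused. */
}
```

In this model, a run of reads in which only the last one triggers an ACK
reads the clock once rather than once per read.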

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251024120707.3516550-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Eric Dumazet and committed by Jakub Kicinski
0ae1ac73 6f147c83

+11 -3
+3 -1
net/ipv4/tcp.c
@@ -1556,8 +1556,10 @@
 				time_to_ack = true;
 		}
 	}
-	if (time_to_ack)
+	if (time_to_ack) {
+		tcp_mstamp_refresh(tp);
 		tcp_send_ack(sk);
+	}
 }

 void tcp_cleanup_rbuf(struct sock *sk, int copied)
+8 -2
net/ipv4/tcp_input.c
@@ -928,9 +928,15 @@

 	trace_tcp_rcv_space_adjust(sk);

-	tcp_mstamp_refresh(tp);
+	if (unlikely(!tp->rcv_rtt_est.rtt_us))
+		return;
+
+	/* We do not refresh tp->tcp_mstamp here.
+	 * Some platforms have expensive ktime_get() implementations.
+	 * Using the last cached value is enough for DRS.
+	 */
 	time = tcp_stamp_us_delta(tp->tcp_mstamp, tp->rcvq_space.time);
-	if (time < (tp->rcv_rtt_est.rtt_us >> 3) || tp->rcv_rtt_est.rtt_us == 0)
+	if (time < (tp->rcv_rtt_est.rtt_us >> 3))
 		return;

 	/* Number of bytes copied to user in last RTT */
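
The reordered checks in tcp_rcv_space_adjust() can be read as a small
predicate. The sketch below is a user-space approximation under stated
assumptions: drs_sample_due() is a hypothetical helper, not a kernel
function, and clamping a negative delta to zero is assumed to mirror what
tcp_stamp_us_delta() does.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: decide whether enough time has elapsed to run a
 * DRS sample, using a cached timestamp instead of a fresh clock read.
 */
static bool drs_sample_due(uint64_t cached_stamp_us,
			   uint64_t last_sample_us,
			   uint32_t rtt_us)
{
	int64_t elapsed;

	/* No receiver RTT estimate yet: bail out before any time math. */
	if (rtt_us == 0)
		return false;

	/* Assumption: a negative delta is clamped to zero. */
	elapsed = (int64_t)(cached_stamp_us - last_sample_us);
	if (elapsed < 0)
		elapsed = 0;

	/* Same threshold as the patched code: rtt_us >> 3. */
	return elapsed >= (int64_t)(rtt_us >> 3);
}
```

Testing rtt_us for zero first matches the patch, which returns early
before computing the delta and drops the now-redundant second half of the
old condition.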