Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xfrm: Refactor xfrm_input lock to reduce contention with RSS

With newer NICs like mlx5 supporting RSS for IPsec crypto offload,
packets for a single Security Association (SA) are scattered across
multiple CPU cores for parallel processing. The xfrm_state spinlock
(x->lock) is held for each packet during xfrm processing.

When multiple connections or flows share the same SA, this parallelism
causes high lock contention on x->lock, creating a performance
bottleneck and limiting scalability.

The original xfrm_input() function exacerbated this issue by releasing
and immediately re-acquiring x->lock. For hardware crypto offload
paths, this unlock/relock sequence is unnecessary and introduces
significant overhead. This patch refactors the function to relocate
the type_offload->input_tail call for the offload path, performing all
necessary work while continuously holding the lock. This reordering is
safe, since packets which don't pass the checks below will still fail
them with the new code.
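
The effect on the offload path can be sketched in userspace by counting lock acquisitions per packet. This is only an illustrative model, not the kernel code: `struct mock_lock`, `old_offload_flow()` and `new_offload_flow()` are hypothetical names, the mock lock stands in for x->lock, and the comments mark where the real work would happen as read from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for x->lock that records how often it is taken. */
struct mock_lock {
	bool held;
	unsigned int acquisitions;
};

void mock_lock_acquire(struct mock_lock *l)
{
	assert(!l->held);	/* a spinlock must not be taken twice */
	l->held = true;
	l->acquisitions++;
}

void mock_lock_release(struct mock_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Old flow for an offloaded (crypto_done) packet: returns locks taken. */
unsigned int old_offload_flow(struct mock_lock *l)
{
	unsigned int start = l->acquisitions;

	mock_lock_acquire(l);	/* lock: state and replay checks */
	mock_lock_release(l);	/* dropped before the tunnel check */
	/* type_offload->input_tail() was called here, unlocked */
	mock_lock_acquire(l);	/* resume: taken again right away */
	/* bookkeeping after resume */
	mock_lock_release(l);

	return l->acquisitions - start;
}

/* New flow: input_tail runs before the lock, which is then held throughout. */
unsigned int new_offload_flow(struct mock_lock *l)
{
	unsigned int start = l->acquisitions;

	/* type_offload->input_tail() is called here, before "goto lock" */
	mock_lock_acquire(l);	/* lock: state and replay checks */
	/* tunnel check and bookkeeping, lock still held */
	mock_lock_release(l);

	return l->acquisitions - start;
}
```

In this model the old flow takes the lock twice per offloaded packet and the new flow takes it once, which is the round trip the patch removes when many cores hit the same SA.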

Performance testing with iperf using multiple parallel streams over a
single IPsec SA shows significant improvement in throughput as the
number of queues (and thus CPU cores) increases:

+-----------+---------------+--------------+-----------------+
| RX queues | Before (Gbps) | After (Gbps) | Improvement (%) |
+-----------+---------------+--------------+-----------------+
|         2 |          32.3 |         34.4 |             6.5 |
|         4 |          34.4 |         40.0 |            16.3 |
|         6 |          24.5 |         38.3 |            56.3 |
|         8 |          23.1 |         38.3 |            65.8 |
|        12 |          18.1 |         29.9 |            65.2 |
|        16 |          16.0 |         25.2 |            57.5 |
+-----------+---------------+--------------+-----------------+
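
The improvement column appears to be the relative throughput gain, (after - before) / before * 100. A small sketch that recomputes it from the table follows; the struct, array and function names are hypothetical and only the numbers come from the table above.

```c
#include <stddef.h>

/* One table row: queue count and measured throughput in Gbps. */
struct rss_result {
	unsigned int rx_queues;
	double before_gbps;
	double after_gbps;
};

/* Values copied from the table in the commit message. */
static const struct rss_result results[] = {
	{  2, 32.3, 34.4 },
	{  4, 34.4, 40.0 },
	{  6, 24.5, 38.3 },
	{  8, 23.1, 38.3 },
	{ 12, 18.1, 29.9 },
	{ 16, 16.0, 25.2 },
};

/* Relative gain in percent, assuming this is how the column was derived. */
double improvement_pct(double before, double after)
{
	return (after - before) / before * 100.0;
}

/* Index of the row with the largest relative gain. */
size_t best_row(const struct rss_result *rows, size_t n)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++) {
		if (improvement_pct(rows[i].before_gbps, rows[i].after_gbps) >
		    improvement_pct(rows[best].before_gbps, rows[best].after_gbps))
			best = i;
	}
	return best;
}
```

Under that assumption the recomputed values match the table to one decimal place, and the largest relative gain falls on the 8-queue row.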

Signed-off-by: Jianbo Liu <jianbol@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>

Authored by Jianbo Liu and committed by Steffen Klassert
10a11861 bfe62db5

+7 -7
net/xfrm/xfrm_input.c
···
 		async = 1;
 		dev_put(skb->dev);
 		seq = XFRM_SKB_CB(skb)->seq.input.low;
+		spin_lock(&x->lock);
 		goto resume;
 	}
 	/* GRO call */
···
 				XFRM_INC_STATS(net, LINUX_MIB_XFRMINHDRERROR);
 				goto drop;
 			}
+
+			nexthdr = x->type_offload->input_tail(x, skb);
 		}
 
 		goto lock;
···
 			goto drop_unlock;
 		}
 
-		spin_unlock(&x->lock);
-
 		if (xfrm_tunnel_check(skb, x, family)) {
 			XFRM_INC_STATS(net, LINUX_MIB_XFRMINSTATEMODEERROR);
-			goto drop;
+			goto drop_unlock;
 		}
 
 		seq_hi = htonl(xfrm_replay_seqhi(x, seq));
···
 		XFRM_SKB_CB(skb)->seq.input.low = seq;
 		XFRM_SKB_CB(skb)->seq.input.hi = seq_hi;
 
-		if (crypto_done) {
-			nexthdr = x->type_offload->input_tail(x, skb);
-		} else {
+		if (!crypto_done) {
+			spin_unlock(&x->lock);
 			dev_hold(skb->dev);
 
 			nexthdr = x->type->input(x, skb);
···
 				return 0;
 
 			dev_put(skb->dev);
+			spin_lock(&x->lock);
 		}
 resume:
-		spin_lock(&x->lock);
 		if (nexthdr < 0) {
 			if (nexthdr == -EBADMSG) {
 				xfrm_audit_state_icvfail(x, skb,