Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Revert "netfilter: nat: force port remap to prevent shadowing well-known ports"

This reverts commit 878aed8db324bec64f3c3f956e64d5ae7375a5de.

This change breaks existing setups where conntrack is used with
asymmetric paths.

In these cases, the NAT transformation occurs on the syn-ack instead of
the syn:

1. SYN x:12345 -> y:443 // sent by initiator, received by responder
2. SYNACK y:443 -> x:12345 // First packet seen by conntrack, as sent by responder
3. tuple_force_port_remap() gets called, sees:
   'tcp from 443 to port 12345 NAT' -> pick a new source port, initiator receives
4. SYNACK y:$RANDOM -> x:12345 // connection is never established
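
The sequence above can be illustrated with a small userspace sketch of the port test from the reverted tuple_force_port_remap(). This is not kernel code: the function name force_port_remap() and the plain host-byte-order arguments are assumptions made for illustration. The comparison itself is copied from the removed function. Because it only fires when the destination port is 32768 or higher, the comments use a hypothetical initiator port of 40000 rather than the illustrative 12345.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Userspace sketch (hypothetical helper, not a kernel API) of the port
 * comparison in the reverted tuple_force_port_remap(). Ports are given
 * in host byte order. In the reverted code this was only consulted for
 * source NAT on connections that did not originate locally.
 */
static bool force_port_remap(uint16_t sport, uint16_t dport)
{
	/* Remap when sport is "service-like" and dport is "ephemeral-like". */
	return sport < 16384 && dport >= 32768;
}

/*
 * Symmetric path: conntrack sees the SYN first.
 *   force_port_remap(40000, 443) is false, the source port is kept.
 *
 * Asymmetric path: conntrack sees the SYNACK first.
 *   force_port_remap(443, 40000) is true, so the responder's source
 *   port 443 would be rewritten and the initiator would not recognise
 *   the reply.
 */
```

The check looks only at the two port numbers, so it cannot tell a genuinely new inbound flow from the reply leg of a connection whose first packet conntrack never saw.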

While it's possible to avoid the breakage with NOTRACK rules, a kernel
update should not break working setups.

An alternative to the revert is to augment conntrack to tag
mid-stream connections and add code to the NAT core to skip NAT
for such connections. However, this would lead to more
interaction/integration between conntrack and NAT.

Therefore, revert. Users will need to add explicit NAT rules to avoid
port shadowing.

Link: https://lore.kernel.org/netfilter-devel/20220302105908.GA5852@breakpoint.cc/#R
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=2051413
Signed-off-by: Florian Westphal <fw@strlen.de>

+5 -43
+3 -40
net/netfilter/nf_nat_core.c
···
 		goto another_round;
 }
 
-static bool tuple_force_port_remap(const struct nf_conntrack_tuple *tuple)
-{
-	u16 sp, dp;
-
-	switch (tuple->dst.protonum) {
-	case IPPROTO_TCP:
-		sp = ntohs(tuple->src.u.tcp.port);
-		dp = ntohs(tuple->dst.u.tcp.port);
-		break;
-	case IPPROTO_UDP:
-	case IPPROTO_UDPLITE:
-		sp = ntohs(tuple->src.u.udp.port);
-		dp = ntohs(tuple->dst.u.udp.port);
-		break;
-	default:
-		return false;
-	}
-
-	/* IANA: System port range: 1-1023,
-	 *         user port range: 1024-49151,
-	 *      private port range: 49152-65535.
-	 *
-	 * Linux default ephemeral port range is 32768-60999.
-	 *
-	 * Enforce port remapping if sport is significantly lower
-	 * than dport to prevent NAT port shadowing, i.e.
-	 * accidental match of 'new' inbound connection vs.
-	 * existing outbound one.
-	 */
-	return sp < 16384 && dp >= 32768;
-}
-
 /* Manipulate the tuple into the range given. For NF_INET_POST_ROUTING,
  * we change the source to map into the range. For NF_INET_PRE_ROUTING
  * and NF_INET_LOCAL_OUT, we change the destination to map into the
···
 		 struct nf_conn *ct,
 		 enum nf_nat_manip_type maniptype)
 {
-	bool random_port = range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL;
 	const struct nf_conntrack_zone *zone;
 	struct net *net = nf_ct_net(ct);
 
 	zone = nf_ct_zone(ct);
-
-	if (maniptype == NF_NAT_MANIP_SRC &&
-	    !random_port &&
-	    !ct->local_origin)
-		random_port = tuple_force_port_remap(orig_tuple);
 
 	/* 1) If this srcip/proto/src-proto-part is currently mapped,
 	 * and that same mapping gives a unique tuple within the given
···
 	 * So far, we don't do local source mappings, so multiple
 	 * manips not an issue.
 	 */
-	if (maniptype == NF_NAT_MANIP_SRC && !random_port) {
+	if (maniptype == NF_NAT_MANIP_SRC &&
+	    !(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
 		/* try the original tuple first */
 		if (in_range(orig_tuple, range)) {
 			if (!nf_nat_used_tuple(orig_tuple, ct)) {
···
 	 */
 
 	/* Only bother mapping if it's not already in range and unique */
-	if (!random_port) {
+	if (!(range->flags & NF_NAT_RANGE_PROTO_RANDOM_ALL)) {
 		if (range->flags & NF_NAT_RANGE_PROTO_SPECIFIED) {
 			if (!(range->flags & NF_NAT_RANGE_PROTO_OFFSET) &&
 			    l4proto_in_range(tuple, maniptype,
+2 -3
tools/testing/selftests/netfilter/nft_nat.sh
···
 	return $ksft_skip
 fi
 
-# test default behaviour. Packet from ns1 to ns0 is not redirected
-# due to automatic port translation.
-test_port_shadow "default" "ROUTER"
+# test default behaviour. Packet from ns1 to ns0 is redirected to ns2.
+test_port_shadow "default" "CLIENT"
 
 # test packet filter based mitigation: prevent forwarding of
 # packets claiming to come from the service port.