Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

netfilter: nf_nat: remove bogus direction check

Jakub reports spurious failures of the 'conntrack_reverse_clash.sh'
selftest. A bogus test makes the NAT core resort to a port rewrite even
though there is no need for one.

By the time this test is made, nf_nat_used_tuple() would already have
caused us to return unless another CPU had added a colliding entry.
Moreover, nf_nat_used_tuple() would have ignored the colliding entry if
both entries had the same original tuple.

All that is left to check is whether the colliding entry in the hash table
is subject to NAT and, if it's not, whether our entry matches it in the
reverse direction, e.g. the hash table has

addr1:1234 -> addr2:80, and we want to commit
addr2:80 -> addr1:1234.

Because we already checked that neither the new nor the committed entry is
subject to NAT, we only have to compare the committed entry's original tuple
with our reply tuple: for non-NAT entries, the reply tuple is always the
inverted original, so the second (reply vs. original) comparison is redundant.

Just in case there are more problems, extend the error reporting in the
selftest while at it and dump the conntrack table/stats on error.

Reported-by: Jakub Kicinski <kuba@kernel.org>
Closes: https://lore.kernel.org/netdev/20251206175135.4a56591b@kernel.org/
Fixes: d8f84a9bc7c4 ("netfilter: nf_nat: don't try nat source port reallocation for reverse dir clash")
Signed-off-by: Florian Westphal <fw@strlen.de>

3 files changed, 12 insertions(+), 17 deletions(-)
net/netfilter/nf_nat_core.c (+1 -13)
@@ -294,25 +294,13 @@
 
 	ct = nf_ct_tuplehash_to_ctrack(thash);
 
-	/* NB: IP_CT_DIR_ORIGINAL should be impossible because
-	 * nf_nat_used_tuple() handles origin collisions.
-	 *
-	 * Handle remote chance other CPU confirmed its ct right after.
-	 */
-	if (thash->tuple.dst.dir != IP_CT_DIR_REPLY)
-		goto out;
-
 	/* clashing connection subject to NAT? Retry with new tuple. */
 	if (READ_ONCE(ct->status) & uses_nat)
 		goto out;
 
 	if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
-			      &ignored_ct->tuplehash[IP_CT_DIR_REPLY].tuple) &&
-	    nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
-			      &ignored_ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple)) {
+			      &ignored_ct->tuplehash[IP_CT_DIR_REPLY].tuple))
 		taken = false;
-		goto out;
-	}
 out:
 	nf_ct_put(ct);
 	return taken;
tools/testing/selftests/net/netfilter/conntrack_reverse_clash.c (+9 -4)
@@ -33,9 +33,14 @@
 	exit(111);
 }
 
-static void die_port(uint16_t got, uint16_t want)
+static void die_port(const struct sockaddr_in *sin, uint16_t want)
 {
-	fprintf(stderr, "Port number changed, wanted %d got %d\n", want, ntohs(got));
+	uint16_t got = ntohs(sin->sin_port);
+	char str[INET_ADDRSTRLEN];
+
+	inet_ntop(AF_INET, &sin->sin_addr, str, sizeof(str));
+
+	fprintf(stderr, "Port number changed, wanted %d got %d from %s\n", want, got, str);
 	exit(1);
 }
 
@@ -100,7 +105,7 @@
 				die("child recvfrom");
 
 			if (peer.sin_port != htons(PORT))
-				die_port(peer.sin_port, PORT);
+				die_port(&peer, PORT);
 		} else {
 			if (sendto(s2, buf, LEN, 0, (struct sockaddr *)&sa1, sizeof(sa1)) != LEN)
 				continue;
@@ -109,7 +114,7 @@
 				die("parent recvfrom");
 
 			if (peer.sin_port != htons((PORT + 1)))
-				die_port(peer.sin_port, PORT + 1);
+				die_port(&peer, PORT + 1);
 		}
 	}
 
tools/testing/selftests/net/netfilter/conntrack_reverse_clash.sh (+2)
@@ -45,6 +45,8 @@
 	echo "PASS: No SNAT performed for null bindings"
 else
 	echo "ERROR: SNAT performed without any matching snat rule"
+	ip netns exec "$ns0" conntrack -L
+	ip netns exec "$ns0" conntrack -S
 	exit 1
 fi
 