Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

netfilter: conntrack: do not auto-delete clash entries on reply

It's possible to have more than one packet with the same ct tuple in
flight simultaneously, e.g. when an application emits n packets on the
same UDP socket from multiple threads.

NAT rules might be applied to those packets. With the right set of rules,
n packets will be mapped to m destinations, where at least two packets end
up with the same destination.

When this happens, the existing clash resolution may merge the skb that
is processed after the first one with the identical tuple already in the
hash table.

However, it's possible that this identical tuple is a NAT_CLASH tuple.
In that case the second skb will be sent, but no reply can be received
since the reply that is processed first removes the NAT_CLASH tuple.

Do not auto-delete; this gives a 1 second window for replies to be passed
back to the originator.

Packets that arrive later (UDP stream case) will not be affected:
they match the original ct entry, not a NAT_CLASH one.

Also prevent NAT_CLASH entries from getting offloaded.

Fixes: 6a757c07e51f ("netfilter: conntrack: allow insertion of clashing entries")
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Authored by Florian Westphal and committed by Pablo Neira Ayuso
c4617214 67afbda6

+11 -17
+10 -16
net/netfilter/nf_conntrack_proto_udp.c
···
 	return false;
 }

-static void nf_conntrack_udp_refresh_unreplied(struct nf_conn *ct,
-					       struct sk_buff *skb,
-					       enum ip_conntrack_info ctinfo,
-					       u32 extra_jiffies)
-{
-	if (unlikely(ctinfo == IP_CT_ESTABLISHED_REPLY &&
-		     ct->status & IPS_NAT_CLASH))
-		nf_ct_kill(ct);
-	else
-		nf_ct_refresh_acct(ct, ctinfo, skb, extra_jiffies);
-}
-
 /* Returns verdict for packet, and may modify conntracktype */
 int nf_conntrack_udp_packet(struct nf_conn *ct,
 			    struct sk_buff *skb,
···

 		nf_ct_refresh_acct(ct, ctinfo, skb, extra);

+		/* never set ASSURED for IPS_NAT_CLASH, they time out soon */
+		if (unlikely((ct->status & IPS_NAT_CLASH)))
+			return NF_ACCEPT;
+
 		/* Also, more likely to be important, and not a probe */
 		if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
 			nf_conntrack_event_cache(IPCT_ASSURED, ct);
 	} else {
-		nf_conntrack_udp_refresh_unreplied(ct, skb, ctinfo,
-						   timeouts[UDP_CT_UNREPLIED]);
+		nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]);
 	}
 	return NF_ACCEPT;
 }
···
 	if (test_bit(IPS_SEEN_REPLY_BIT, &ct->status)) {
 		nf_ct_refresh_acct(ct, ctinfo, skb,
 				   timeouts[UDP_CT_REPLIED]);
+
+		if (unlikely((ct->status & IPS_NAT_CLASH)))
+			return NF_ACCEPT;
+
 		/* Also, more likely to be important, and not a probe */
 		if (!test_and_set_bit(IPS_ASSURED_BIT, &ct->status))
 			nf_conntrack_event_cache(IPCT_ASSURED, ct);
 	} else {
-		nf_conntrack_udp_refresh_unreplied(ct, skb, ctinfo,
-						   timeouts[UDP_CT_UNREPLIED]);
+		nf_ct_refresh_acct(ct, ctinfo, skb, timeouts[UDP_CT_UNREPLIED]);
 	}
 	return NF_ACCEPT;
 }
+1 -1
net/netfilter/nft_flow_offload.c
···
 	}

 	if (nf_ct_ext_exist(ct, NF_CT_EXT_HELPER) ||
-	    ct->status & IPS_SEQ_ADJUST)
+	    ct->status & (IPS_SEQ_ADJUST | IPS_NAT_CLASH))
 		goto out;

 	if (!nf_ct_is_confirmed(ct))
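
The offload check above tests several status bits with one mask. A minimal userspace sketch of that pattern follows; `toy_may_offload()` and the `TOY_ST_*` flags are hypothetical names with illustrative bit values, and the boolean `has_helper` merely stands in for the helper-extension test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits; the values are illustrative only. */
enum {
	TOY_ST_SEQ_ADJUST = 1u << 0,
	TOY_ST_NAT_CLASH  = 1u << 1,
	TOY_ST_OTHER      = 1u << 2,
};

/* Returns true when an entry may be offloaded in this toy model:
 * it must have no helper and none of the blocking status bits set.
 */
bool toy_may_offload(uint32_t status, bool has_helper)
{
	const uint32_t blocking = TOY_ST_SEQ_ADJUST | TOY_ST_NAT_CLASH;

	if (has_helper || (status & blocking))
		return false;

	return true;
}
```

The parentheses around the combined mask matter: in C, `&` binds tighter than `|`, so `status & A | B` would parse as `(status & A) | B` and be non-zero for every entry.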