Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

netfilter: ctnetlink: make it safer when updating ct->status

After converting to use rcu for conntrack hash, one CPU may update
the ct->status via ctnetlink, while another CPU may process the
packets and update the ct->status.

So the non-atomic operation "ct->status |= status;" via ctnetlink
becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
another CPU unexpectedly. For example:
        CPU0                            CPU1
  ctnetlink_change_status         __nf_conntrack_find_get
    old = ct->status                nf_ct_gc_expired
        -                             nf_ct_kill
        -                               test_and_set_bit(IPS_DYING_BIT
    new = old | status;                 -
    ct->status = new; <-- oops, _DYING_ is cleared!

Use a series of atomic bit operations to solve the above issue.
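The interleaving above can be replayed deterministically in userspace. The following is a minimal sketch, not kernel code: it assumes C11 `<stdatomic.h>` as a stand-in for the kernel's `set_bit()`, runs both "CPUs" on one thread in the order shown in the diagram, and uses hypothetical helper names (`replay_non_atomic`, `replay_atomic`). The bit numbers are the ones from the uapi header touched by this commit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers as they appear in nf_conntrack_common.h in this commit. */
#define IPS_DYING_BIT   9
#define IPS_DYING       (1UL << IPS_DYING_BIT)
#define IPS_HELPER_BIT  13
#define IPS_HELPER      (1UL << IPS_HELPER_BIT)

/* Hypothetical helper: replays the racy order with a plain
 * read-modify-write, as "ct->status |= status" would compile to. */
static unsigned long replay_non_atomic(unsigned long status)
{
	unsigned long ct_status = 0;
	unsigned long old, updated;

	old = ct_status;            /* CPU0: old = ct->status            */
	ct_status |= IPS_DYING;     /* CPU1: nf_ct_kill sets DYING       */
	updated = old | status;     /* CPU0: computed from a stale value */
	ct_status = updated;        /* CPU0: store overwrites DYING      */
	return ct_status;
}

/* Hypothetical helper: same order, but each update is a single
 * atomic OR, so neither side can overwrite the other's bit. */
static unsigned long replay_atomic(unsigned long status)
{
	atomic_ulong ct_status;

	atomic_init(&ct_status, 0);
	atomic_fetch_or(&ct_status, IPS_DYING);   /* CPU1 */
	atomic_fetch_or(&ct_status, status);      /* CPU0 */
	return atomic_load(&ct_status);
}
```

With `status = IPS_HELPER`, the non-atomic replay ends with IPS_DYING cleared, while the atomic replay keeps both bits set.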

Also note that users shouldn't set IPS_TEMPLATE or IPS_SEQ_ADJUST directly,
so make these two bits unchangeable too.

If IPS_TEMPLATE_BIT is set, ct will be freed by nf_ct_tmpl_free,
although it was actually allocated by nf_conntrack_alloc.
If IPS_SEQ_ADJUST_BIT is set, this may cause a NULL pointer
dereference, as nfct_seqadj(ct) may be NULL.

Lastly, add some comments describing the logic change introduced by
commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
processing"), which I found a little confusing.

Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash")
Signed-off-by: Liping Zhang <zlpnobody@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Authored by Liping Zhang and committed by Pablo Neira Ayuso
53b56da8 88be4c09

+33 -13
+9 -4
include/uapi/linux/netfilter/nf_conntrack_common.h
···
 	IPS_DYING_BIT = 9,
 	IPS_DYING = (1 << IPS_DYING_BIT),
 
-	/* Bits that cannot be altered from userland. */
-	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
-				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING),
-
 	/* Connection has fixed timeout. */
 	IPS_FIXED_TIMEOUT_BIT = 10,
 	IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT),
···
 	/* Conntrack got a helper explicitly attached via CT target. */
 	IPS_HELPER_BIT = 13,
 	IPS_HELPER = (1 << IPS_HELPER_BIT),
+
+	/* Be careful here, modifying these bits can make things messy,
+	 * so don't let users modify them directly.
+	 */
+	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
+				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING |
+				 IPS_SEQ_ADJUST | IPS_TEMPLATE),
+
+	__IPS_MAX_BIT = 14,
 };
 
 /* Connection tracking event types */
+24 -9
net/netfilter/nf_conntrack_netlink.c
···
 }
 #endif
 
+static void
+__ctnetlink_change_status(struct nf_conn *ct, unsigned long on,
+			  unsigned long off)
+{
+	unsigned int bit;
+
+	/* Ignore these unchangable bits */
+	on &= ~IPS_UNCHANGEABLE_MASK;
+	off &= ~IPS_UNCHANGEABLE_MASK;
+
+	for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
+		if (on & (1 << bit))
+			set_bit(bit, &ct->status);
+		else if (off & (1 << bit))
+			clear_bit(bit, &ct->status);
+	}
+}
+
 static int
 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 {
···
 		/* ASSURED bit can only be set */
 		return -EBUSY;
 
-	/* Be careful here, modifying NAT bits can screw up things,
-	 * so don't let users modify them directly if they don't pass
-	 * nf_nat_range. */
-	ct->status |= status & ~(IPS_NAT_DONE_MASK | IPS_NAT_MASK);
+	__ctnetlink_change_status(ct, status, 0);
 	return 0;
 }
 
···
 		if (ret < 0)
 			return ret;
 
-		ct->status |= IPS_SEQ_ADJUST;
+		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
 	if (cda[CTA_SEQ_ADJ_REPLY]) {
···
 		if (ret < 0)
 			return ret;
 
-		ct->status |= IPS_SEQ_ADJUST;
+		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
 	return 0;
···
 	/* This check is less strict than ctnetlink_change_status()
 	 * because callers often flip IPS_EXPECTED bits when sending
 	 * an NFQA_CT attribute to the kernel. So ignore the
-	 * unchangeable bits but do not error out.
+	 * unchangeable bits but do not error out. Also user programs
+	 * are allowed to clear the bits that they are allowed to change.
 	 */
-	ct->status = (status & ~IPS_UNCHANGEABLE_MASK) |
-		     (ct->status & IPS_UNCHANGEABLE_MASK);
+	__ctnetlink_change_status(ct, status, ~status);
 	return 0;
 }
 
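The two call sites differ only in the `off` argument: `(status, 0)` can only set bits, while `(status, ~status)` also clears every changeable bit the caller left unset. The sketch below is a userspace model of that behaviour, not the kernel function itself. It assumes C11 atomics in place of `set_bit()`/`clear_bit()`, and for brevity uses a reduced, hypothetical mask `MODEL_UNCHANGEABLE_MASK` containing only IPS_DYING (the real IPS_UNCHANGEABLE_MASK covers more bits). All `model_*` names are illustrative.

```c
#include <assert.h>
#include <stdatomic.h>

/* Bit numbers as they appear in nf_conntrack_common.h in this commit. */
#define IPS_DYING_BIT          9
#define IPS_FIXED_TIMEOUT_BIT  10
#define IPS_HELPER_BIT         13
#define MODEL_MAX_BIT          14  /* mirrors __IPS_MAX_BIT */

#define IPS_DYING          (1UL << IPS_DYING_BIT)
#define IPS_FIXED_TIMEOUT  (1UL << IPS_FIXED_TIMEOUT_BIT)
#define IPS_HELPER         (1UL << IPS_HELPER_BIT)

/* Assumption: a reduced mask for illustration only. */
#define MODEL_UNCHANGEABLE_MASK  IPS_DYING

/* Models __ctnetlink_change_status(): one atomic operation per bit,
 * with protected bits filtered out of both the "on" and "off" sets. */
static void model_change_status(atomic_ulong *status,
				unsigned long on, unsigned long off)
{
	unsigned int bit;

	on &= ~MODEL_UNCHANGEABLE_MASK;
	off &= ~MODEL_UNCHANGEABLE_MASK;

	for (bit = 0; bit < MODEL_MAX_BIT; bit++) {
		unsigned long mask = 1UL << bit;

		assert(bit < MODEL_MAX_BIT);
		if (on & mask)
			atomic_fetch_or(status, mask);    /* like set_bit()   */
		else if (off & mask)
			atomic_fetch_and(status, ~mask);  /* like clear_bit() */
	}
}

/* Models the ctnetlink_change_status() call: (status, 0). */
static unsigned long model_set_only(unsigned long current,
				    unsigned long requested)
{
	atomic_ulong s;

	atomic_init(&s, current);
	model_change_status(&s, requested, 0);
	return atomic_load(&s);
}

/* Models the NFQA_CT call: (status, ~status). */
static unsigned long model_set_and_clear(unsigned long current,
					 unsigned long requested)
{
	atomic_ulong s;

	atomic_init(&s, current);
	model_change_status(&s, requested, ~requested);
	return atomic_load(&s);
}
```

Starting from `IPS_DYING | IPS_FIXED_TIMEOUT` and requesting `IPS_HELPER`, the set-only path keeps all three bits, whereas the set-and-clear path drops IPS_FIXED_TIMEOUT but leaves the protected IPS_DYING bit untouched.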