
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter/IPVS/OVS fixes for net

The following patchset contains a rather large batch of Netfilter, IPVS
and OVS fixes for your net tree. This includes fixes for ctnetlink, the
userspace conntrack helper infrastructure, conntrack OVS support, the
ebtables DNAT target, and several leaks in error paths, among others.
More specifically, they are:

1) Fix reference count leak in the CT target error path, from Gao Feng.

2) Remove conntrack entry clashing with a matching expectation, patch
from Jarno Rajahalme.

3) Fix bogus EEXIST when registering two different userspace helpers,
from Liping Zhang.

4) Don't leak dummy elements in the new bitmap set type in nf_tables,
from Liping Zhang.

5) Get rid of module autoload from the conntrack update path in ctnetlink:
we don't need autoloading at this late stage, and it was happening with
the rcu read lock held, which is not good. From Liping Zhang.

6) Fix deadlock due to double-acquire of the expect_lock from conntrack
update path, this fixes a bug that was introduced when the central
spinlock got removed. Again from Liping Zhang.
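
The bug class in (6) can be modelled in a few lines of userspace C. This
sketch uses an invented demo_* lock that reports a self-deadlock instead of
spinning forever; nothing here is kernel API, it only illustrates why a
helper must not re-take a non-recursive lock its caller already holds:

```c
#include <assert.h>

/* Toy non-recursive lock: a second acquire by the owner is detected
 * instead of blocking, so the bug is observable in a test. */
struct demo_spinlock { int held; };

/* Returns 0 on success, -1 when the acquire would self-deadlock. */
static int demo_lock(struct demo_spinlock *l)
{
	if (l->held)
		return -1;	/* a real spinlock would spin here forever */
	l->held = 1;
	return 0;
}

static void demo_unlock(struct demo_spinlock *l)
{
	l->held = 0;
}

/* Buggy pattern: a helper takes the lock its caller already holds. */
static int demo_helper(struct demo_spinlock *l)
{
	if (demo_lock(l))
		return -1;
	demo_unlock(l);
	return 0;
}

static int demo_update_path(struct demo_spinlock *l)
{
	int err;

	demo_lock(l);			/* first acquire */
	err = demo_helper(l);		/* second acquire: deadlock on a real lock */
	demo_unlock(l);
	return err;
}
```

The fix in the patchset follows the usual cure for this pattern: drop the
lock from one of the two call sites rather than nest the acquisitions.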

7) Safe ct->status update from ctnetlink path, from Liping. The expect_lock
protection that was selected when the central spinlock was removed was
not really protecting anything at all.
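
The per-bit update pattern that replaces the racy read-modify-write can be
sketched in userspace C. All demo_* names and mask values below are invented
for illustration; in the kernel each bit flip is an atomic set_bit() or
clear_bit() on ct->status, so concurrent readers never observe a torn word:

```c
#include <assert.h>

#define DEMO_UNCHANGEABLE_MASK	0x30ul	/* pretend bits 4-5 are kernel-only */
#define DEMO_MAX_BIT		8

/* Apply caller-supplied "on"/"off" masks one bit at a time, after
 * stripping the bits userspace must never touch. "on" wins when a bit
 * appears in both masks. */
static void demo_change_status(unsigned long *status,
			       unsigned long on, unsigned long off)
{
	unsigned int bit;

	on &= ~DEMO_UNCHANGEABLE_MASK;
	off &= ~DEMO_UNCHANGEABLE_MASK;

	for (bit = 0; bit < DEMO_MAX_BIT; bit++) {
		if (on & (1ul << bit))
			*status |= 1ul << bit;		/* kernel: set_bit() */
		else if (off & (1ul << bit))
			*status &= ~(1ul << bit);	/* kernel: clear_bit() */
	}
}
```

Clearing is expressed by passing the complement of the requested status as
the "off" mask, which is what lets user programs clear the bits they are
allowed to change.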

8) Protect sequence adjustment under ct->lock.

9) Missing socket match with IPv6, from Peter Tirsek.

10) Adjust skb->pkt_type of DNAT'ed frames from ebtables, from
Linus Luessing.
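
The classification the fix performs after rewriting the destination MAC can
be sketched in userspace C. The enum mirrors the kernel's PACKET_* values,
but the demo_* helpers are illustrative stand-ins for the kernel's
is_multicast_ether_addr()/is_broadcast_ether_addr()/ether_addr_equal():

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum demo_pkt_type { DEMO_HOST, DEMO_BROADCAST, DEMO_MULTICAST, DEMO_OTHERHOST };

static bool demo_is_broadcast(const unsigned char *mac)
{
	static const unsigned char bcast[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };

	return memcmp(mac, bcast, 6) == 0;
}

static bool demo_is_multicast(const unsigned char *mac)
{
	return mac[0] & 0x01;	/* group bit of the first octet */
}

/* Re-derive the packet type from the (rewritten) destination MAC, the
 * way the bridge would classify the frame on receive. */
static enum demo_pkt_type demo_classify(const unsigned char *mac,
					const unsigned char *dev_addr)
{
	if (demo_is_multicast(mac))
		return demo_is_broadcast(mac) ? DEMO_BROADCAST : DEMO_MULTICAST;

	return memcmp(mac, dev_addr, 6) == 0 ? DEMO_HOST : DEMO_OTHERHOST;
}
```

Without this re-classification, a frame DNAT'ed from a broadcast MAC to the
device's own unicast MAC would keep skb->pkt_type == PACKET_BROADCAST and
be dropped or mishandled further up the stack.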

11) Don't give up on evaluating the expression on new entries added via
dynset expression in nf_tables, from Liping Zhang.

12) Use skb_checksum() when mangling icmpv6 in IPv6 NAT as this deals
with non-linear skbuffs.
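
The reason skb_checksum() works where csum_partial() on a header pointer
does not is that the RFC 1071 internet checksum can be accumulated
piecewise over scattered fragments, which is what skb_checksum() does for
non-linear skbs. A minimal userspace sketch of that folding property
(demo_* helpers are ours; for simplicity the fragments are split at even
offsets):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Accumulate 16-bit big-endian words into a running sum; an odd trailing
 * byte is padded with zero, per RFC 1071. */
static uint32_t demo_csum_add(uint32_t sum, const uint8_t *buf, size_t len)
{
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += (uint32_t)buf[i] << 8 | buf[i + 1];
	if (len & 1)
		sum += (uint32_t)buf[len - 1] << 8;
	return sum;
}

/* Fold the carries back in and take the one's complement. */
static uint16_t demo_csum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Summing the buffer in one pass or in several even-sized pieces yields the
same checksum; splitting at odd offsets needs the per-fragment byte-swap
trick the kernel applies, which is omitted here.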

13) Don't allow IPv6 service in IPVS if no IPv6 support is available,
from Paolo Abeni.

14) Missing mutex release in error path of xt_find_table_lock(), from
Dan Carpenter.

15) Update maintainers files, Netfilter section. Add Florian to the
file, refer to nftables.org and change project status from Supported
to Maintained.

16) Bail out on mismatching extensions in element updates in nf_tables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+174 -64
+3 -1
MAINTAINERS
···
 NETFILTER
 M:	Pablo Neira Ayuso <pablo@netfilter.org>
 M:	Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
+M:	Florian Westphal <fw@strlen.de>
 L:	netfilter-devel@vger.kernel.org
 L:	coreteam@netfilter.org
 W:	http://www.netfilter.org/
 W:	http://www.iptables.org/
+W:	http://www.nftables.org/
 Q:	http://patchwork.ozlabs.org/project/netfilter-devel/list/
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf.git
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next.git
-S:	Supported
+S:	Maintained
 F:	include/linux/netfilter*
 F:	include/linux/netfilter/
 F:	include/net/netfilter/
+9 -4
include/uapi/linux/netfilter/nf_conntrack_common.h
···
 	IPS_DYING_BIT = 9,
 	IPS_DYING = (1 << IPS_DYING_BIT),
 
-	/* Bits that cannot be altered from userland. */
-	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
-				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING),
-
 	/* Connection has fixed timeout. */
 	IPS_FIXED_TIMEOUT_BIT = 10,
 	IPS_FIXED_TIMEOUT = (1 << IPS_FIXED_TIMEOUT_BIT),
···
 	/* Conntrack got a helper explicitly attached via CT target. */
 	IPS_HELPER_BIT = 13,
 	IPS_HELPER = (1 << IPS_HELPER_BIT),
+
+	/* Be careful here, modifying these bits can make things messy,
+	 * so don't let users modify them directly.
+	 */
+	IPS_UNCHANGEABLE_MASK = (IPS_NAT_DONE_MASK | IPS_NAT_MASK |
+				 IPS_EXPECTED | IPS_CONFIRMED | IPS_DYING |
+				 IPS_SEQ_ADJUST | IPS_TEMPLATE),
+
+	__IPS_MAX_BIT = 14,
 };
 
 /* Connection tracking event types */
+20
net/bridge/netfilter/ebt_dnat.c
···
  */
 #include <linux/module.h>
 #include <net/sock.h>
+#include "../br_private.h"
 #include <linux/netfilter.h>
 #include <linux/netfilter/x_tables.h>
 #include <linux/netfilter_bridge/ebtables.h>
···
 ebt_dnat_tg(struct sk_buff *skb, const struct xt_action_param *par)
 {
 	const struct ebt_nat_info *info = par->targinfo;
+	struct net_device *dev;
 
 	if (!skb_make_writable(skb, 0))
 		return EBT_DROP;
 
 	ether_addr_copy(eth_hdr(skb)->h_dest, info->mac);
+
+	if (is_multicast_ether_addr(info->mac)) {
+		if (is_broadcast_ether_addr(info->mac))
+			skb->pkt_type = PACKET_BROADCAST;
+		else
+			skb->pkt_type = PACKET_MULTICAST;
+	} else {
+		if (xt_hooknum(par) != NF_BR_BROUTING)
+			dev = br_port_get_rcu(xt_in(par))->br->dev;
+		else
+			dev = xt_in(par);
+
+		if (ether_addr_equal(info->mac, dev->dev_addr))
+			skb->pkt_type = PACKET_HOST;
+		else
+			skb->pkt_type = PACKET_OTHERHOST;
+	}
+
 	return info->target;
 }
+1 -1
net/ipv6/netfilter/nf_nat_l3proto_ipv6.c
···
 		inside->icmp6.icmp6_cksum =
 			csum_ipv6_magic(&ipv6h->saddr, &ipv6h->daddr,
 					skb->len - hdrlen, IPPROTO_ICMPV6,
-					csum_partial(&inside->icmp6,
+					skb_checksum(skb, hdrlen,
 						     skb->len - hdrlen, 0));
 	}
+17 -5
net/netfilter/ipvs/ip_vs_ctl.c
···
 	return skb->len;
 }
 
+static bool ip_vs_is_af_valid(int af)
+{
+	if (af == AF_INET)
+		return true;
+#ifdef CONFIG_IP_VS_IPV6
+	if (af == AF_INET6 && ipv6_mod_enabled())
+		return true;
+#endif
+	return false;
+}
+
 static int ip_vs_genl_parse_service(struct netns_ipvs *ipvs,
 				    struct ip_vs_service_user_kern *usvc,
 				    struct nlattr *nla, int full_entry,
···
 	memset(usvc, 0, sizeof(*usvc));
 
 	usvc->af = nla_get_u16(nla_af);
-#ifdef CONFIG_IP_VS_IPV6
-	if (usvc->af != AF_INET && usvc->af != AF_INET6)
-#else
-	if (usvc->af != AF_INET)
-#endif
+	if (!ip_vs_is_af_valid(usvc->af))
 		return -EAFNOSUPPORT;
 
 	if (nla_fwmark) {
···
 	 */
 	if (udest.af == 0)
 		udest.af = svc->af;
+
+	if (!ip_vs_is_af_valid(udest.af)) {
+		ret = -EAFNOSUPPORT;
+		goto out;
+	}
 
 	if (udest.af != svc->af && cmd != IPVS_CMD_DEL_DEST) {
 		/* The synchronization protocol is incompatible
+21 -5
net/netfilter/nf_conntrack_helper.c
···
 	struct nf_conntrack_tuple_mask mask = { .src.u.all = htons(0xFFFF) };
 	unsigned int h = helper_hash(&me->tuple);
 	struct nf_conntrack_helper *cur;
-	int ret = 0;
+	int ret = 0, i;
 
 	BUG_ON(me->expect_policy == NULL);
 	BUG_ON(me->expect_class_max >= NF_CT_MAX_EXPECT_CLASSES);
···
 		return -EINVAL;
 
 	mutex_lock(&nf_ct_helper_mutex);
-	hlist_for_each_entry(cur, &nf_ct_helper_hash[h], hnode) {
-		if (nf_ct_tuple_src_mask_cmp(&cur->tuple, &me->tuple, &mask)) {
-			ret = -EEXIST;
-			goto out;
+	for (i = 0; i < nf_ct_helper_hsize; i++) {
+		hlist_for_each_entry(cur, &nf_ct_helper_hash[i], hnode) {
+			if (!strcmp(cur->name, me->name) &&
+			    (cur->tuple.src.l3num == NFPROTO_UNSPEC ||
+			     cur->tuple.src.l3num == me->tuple.src.l3num) &&
+			    cur->tuple.dst.protonum == me->tuple.dst.protonum) {
+				ret = -EEXIST;
+				goto out;
+			}
+		}
+	}
+
+	/* avoid unpredictable behaviour for auto_assign_helper */
+	if (!(me->flags & NF_CT_HELPER_F_USERSPACE)) {
+		hlist_for_each_entry(cur, &nf_ct_helper_hash[h], hnode) {
+			if (nf_ct_tuple_src_mask_cmp(&cur->tuple, &me->tuple,
+						     &mask)) {
+				ret = -EEXIST;
+				goto out;
+			}
 		}
 	}
 	hlist_add_head_rcu(&me->hnode, &nf_ct_helper_hash[h]);
+49 -40
net/netfilter/nf_conntrack_netlink.c
···
 	return -1;
 }
 
-static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb,
-				     const struct nf_conn *ct)
+static int ctnetlink_dump_ct_seq_adj(struct sk_buff *skb, struct nf_conn *ct)
 {
 	struct nf_conn_seqadj *seqadj = nfct_seqadj(ct);
 	struct nf_ct_seqadj *seq;
···
 	if (!(ct->status & IPS_SEQ_ADJUST) || !seqadj)
 		return 0;
 
+	spin_lock_bh(&ct->lock);
 	seq = &seqadj->seq[IP_CT_DIR_ORIGINAL];
 	if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_ORIG) == -1)
-		return -1;
+		goto err;
 
 	seq = &seqadj->seq[IP_CT_DIR_REPLY];
 	if (dump_ct_seq_adj(skb, seq, CTA_SEQ_ADJ_REPLY) == -1)
-		return -1;
+		goto err;
 
+	spin_unlock_bh(&ct->lock);
 	return 0;
+err:
+	spin_unlock_bh(&ct->lock);
+	return -1;
 }
 
 static int ctnetlink_dump_id(struct sk_buff *skb, const struct nf_conn *ct)
···
 }
 #endif
 
+static void
+__ctnetlink_change_status(struct nf_conn *ct, unsigned long on,
+			  unsigned long off)
+{
+	unsigned int bit;
+
+	/* Ignore these unchangable bits */
+	on &= ~IPS_UNCHANGEABLE_MASK;
+	off &= ~IPS_UNCHANGEABLE_MASK;
+
+	for (bit = 0; bit < __IPS_MAX_BIT; bit++) {
+		if (on & (1 << bit))
+			set_bit(bit, &ct->status);
+		else if (off & (1 << bit))
+			clear_bit(bit, &ct->status);
+	}
+}
+
 static int
 ctnetlink_change_status(struct nf_conn *ct, const struct nlattr * const cda[])
 {
···
 		/* ASSURED bit can only be set */
 		return -EBUSY;
 
-	/* Be careful here, modifying NAT bits can screw up things,
-	 * so don't let users modify them directly if they don't pass
-	 * nf_nat_range. */
-	ct->status |= status & ~(IPS_NAT_DONE_MASK | IPS_NAT_MASK);
+	__ctnetlink_change_status(ct, status, 0);
 	return 0;
 }
···
 		return 0;
 	}
 
+	rcu_read_lock();
 	helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
 					    nf_ct_protonum(ct));
 	if (helper == NULL) {
-#ifdef CONFIG_MODULES
-		spin_unlock_bh(&nf_conntrack_expect_lock);
-
-		if (request_module("nfct-helper-%s", helpname) < 0) {
-			spin_lock_bh(&nf_conntrack_expect_lock);
-			return -EOPNOTSUPP;
-		}
-
-		spin_lock_bh(&nf_conntrack_expect_lock);
-		helper = __nf_conntrack_helper_find(helpname, nf_ct_l3num(ct),
-						    nf_ct_protonum(ct));
-		if (helper)
-			return -EAGAIN;
-#endif
+		rcu_read_unlock();
 		return -EOPNOTSUPP;
 	}
···
 		/* update private helper data if allowed. */
 		if (helper->from_nlattr)
 			helper->from_nlattr(helpinfo, ct);
-		return 0;
+		err = 0;
 	} else
-		return -EBUSY;
+		err = -EBUSY;
+	} else {
+		/* we cannot set a helper for an existing conntrack */
+		err = -EOPNOTSUPP;
 	}
 
-	/* we cannot set a helper for an existing conntrack */
-	return -EOPNOTSUPP;
+	rcu_read_unlock();
+	return err;
 }
 
 static int ctnetlink_change_timeout(struct nf_conn *ct,
···
 	if (!seqadj)
 		return 0;
 
+	spin_lock_bh(&ct->lock);
 	if (cda[CTA_SEQ_ADJ_ORIG]) {
 		ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_ORIGINAL],
 				     cda[CTA_SEQ_ADJ_ORIG]);
 		if (ret < 0)
-			return ret;
+			goto err;
 
-		ct->status |= IPS_SEQ_ADJUST;
+		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
 	if (cda[CTA_SEQ_ADJ_REPLY]) {
 		ret = change_seq_adj(&seqadj->seq[IP_CT_DIR_REPLY],
 				     cda[CTA_SEQ_ADJ_REPLY]);
 		if (ret < 0)
-			return ret;
+			goto err;
 
-		ct->status |= IPS_SEQ_ADJUST;
+		set_bit(IPS_SEQ_ADJUST_BIT, &ct->status);
 	}
 
+	spin_unlock_bh(&ct->lock);
 	return 0;
+err:
+	spin_unlock_bh(&ct->lock);
+	return ret;
 }
 
 static int
···
 	err = -EEXIST;
 	ct = nf_ct_tuplehash_to_ctrack(h);
 	if (!(nlh->nlmsg_flags & NLM_F_EXCL)) {
-		spin_lock_bh(&nf_conntrack_expect_lock);
 		err = ctnetlink_change_conntrack(ct, cda);
-		spin_unlock_bh(&nf_conntrack_expect_lock);
 		if (err == 0) {
 			nf_conntrack_eventmask_report((1 << IPCT_REPLY) |
 						      (1 << IPCT_ASSURED) |
···
 	/* This check is less strict than ctnetlink_change_status()
 	 * because callers often flip IPS_EXPECTED bits when sending
 	 * an NFQA_CT attribute to the kernel. So ignore the
-	 * unchangeable bits but do not error out.
+	 * unchangeable bits but do not error out. Also user programs
+	 * are allowed to clear the bits that they are allowed to change.
 	 */
-	ct->status = (status & ~IPS_UNCHANGEABLE_MASK) |
-		     (ct->status & IPS_UNCHANGEABLE_MASK);
+	__ctnetlink_change_status(ct, status, ~status);
 	return 0;
 }
···
 	if (ret < 0)
 		return ret;
 
-	spin_lock_bh(&nf_conntrack_expect_lock);
-	ret = ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
-	spin_unlock_bh(&nf_conntrack_expect_lock);
-
-	return ret;
+	return ctnetlink_glue_parse_ct((const struct nlattr **)cda, ct);
 }
 
 static int ctnetlink_glue_exp_parse(const struct nlattr * const *cda,
+5
net/netfilter/nf_tables_api.c
···
 	err = set->ops->insert(ctx->net, set, &elem, &ext2);
 	if (err) {
 		if (err == -EEXIST) {
+			if (nft_set_ext_exists(ext, NFT_SET_EXT_DATA) ^
+			    nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) ||
+			    nft_set_ext_exists(ext, NFT_SET_EXT_OBJREF) ^
+			    nft_set_ext_exists(ext2, NFT_SET_EXT_OBJREF))
+				return -EBUSY;
 			if ((nft_set_ext_exists(ext, NFT_SET_EXT_DATA) &&
 			     nft_set_ext_exists(ext2, NFT_SET_EXT_DATA) &&
 			     memcmp(nft_set_ext_data(ext),
+2 -3
net/netfilter/nft_dynset.c
···
 	    nft_set_ext_exists(ext, NFT_SET_EXT_EXPIRATION)) {
 		timeout = priv->timeout ? : set->timeout;
 		*nft_set_ext_expiration(ext) = jiffies + timeout;
-	} else if (sexpr == NULL)
-		goto out;
+	}
 
 	if (sexpr != NULL)
 		sexpr->ops->eval(sexpr, regs, pkt);
···
 		regs->verdict.code = NFT_BREAK;
 		return;
 	}
-out:
+
 	if (!priv->invert)
 		regs->verdict.code = NFT_BREAK;
 }
+5
net/netfilter/nft_set_bitmap.c
···
 
 static void nft_bitmap_destroy(const struct nft_set *set)
 {
+	struct nft_bitmap *priv = nft_set_priv(set);
+	struct nft_bitmap_elem *be, *n;
+
+	list_for_each_entry_safe(be, n, &priv->list, head)
+		nft_set_elem_destroy(set, be, true);
 }
 
 static bool nft_bitmap_estimate(const struct nft_set_desc *desc, u32 features,
+3 -1
net/netfilter/x_tables.c
···
 	list_for_each_entry(t, &init_net.xt.tables[af], list) {
 		if (strcmp(t->name, name))
 			continue;
-		if (!try_module_get(t->me))
+		if (!try_module_get(t->me)) {
+			mutex_unlock(&xt[af].mutex);
 			return NULL;
+		}
 
 		mutex_unlock(&xt[af].mutex);
 		if (t->table_init(net) != 0) {
+9 -2
net/netfilter/xt_CT.c
···
 		goto err_put_timeout;
 	}
 	timeout_ext = nf_ct_timeout_ext_add(ct, timeout, GFP_ATOMIC);
-	if (timeout_ext == NULL)
+	if (!timeout_ext) {
 		ret = -ENOMEM;
+		goto err_put_timeout;
+	}
 
 	rcu_read_unlock();
 	return ret;
···
 			  struct xt_ct_target_info_v1 *info)
 {
 	struct nf_conntrack_zone zone;
+	struct nf_conn_help *help;
 	struct nf_conn *ct;
 	int ret = -EOPNOTSUPP;
···
 	if (info->timeout[0]) {
 		ret = xt_ct_set_timeout(ct, par, info->timeout);
 		if (ret < 0)
-			goto err3;
+			goto err4;
 	}
 	__set_bit(IPS_CONFIRMED_BIT, &ct->status);
 	nf_conntrack_get(&ct->ct_general);
···
 	info->ct = ct;
 	return 0;
 
+err4:
+	help = nfct_help(ct);
+	if (help)
+		module_put(help->helper->me);
 err3:
 	nf_ct_tmpl_free(ct);
 err2:
+1 -1
net/netfilter/xt_socket.c
···
 	switch (family) {
 	case NFPROTO_IPV4:
 		return nf_defrag_ipv4_enable(net);
-#ifdef XT_SOCKET_HAVE_IPV6
+#if IS_ENABLED(CONFIG_IP6_NF_IPTABLES)
 	case NFPROTO_IPV6:
 		return nf_defrag_ipv6_enable(net);
 #endif
+29 -1
net/openvswitch/conntrack.c
···
 		   u16 proto, const struct sk_buff *skb)
 {
 	struct nf_conntrack_tuple tuple;
+	struct nf_conntrack_expect *exp;
 
 	if (!nf_ct_get_tuplepr(skb, skb_network_offset(skb), proto, net, &tuple))
 		return NULL;
-	return __nf_ct_expect_find(net, zone, &tuple);
+
+	exp = __nf_ct_expect_find(net, zone, &tuple);
+	if (exp) {
+		struct nf_conntrack_tuple_hash *h;
+
+		/* Delete existing conntrack entry, if it clashes with the
+		 * expectation. This can happen since conntrack ALGs do not
+		 * check for clashes between (new) expectations and existing
+		 * conntrack entries. nf_conntrack_in() will check the
+		 * expectations only if a conntrack entry can not be found,
+		 * which can lead to OVS finding the expectation (here) in the
+		 * init direction, but which will not be removed by the
+		 * nf_conntrack_in() call, if a matching conntrack entry is
+		 * found instead. In this case all init direction packets
+		 * would be reported as new related packets, while reply
+		 * direction packets would be reported as un-related
+		 * established packets.
+		 */
+		h = nf_conntrack_find_get(net, zone, &tuple);
+		if (h) {
+			struct nf_conn *ct = nf_ct_tuplehash_to_ctrack(h);
+
+			nf_ct_delete(ct, 0, 0);
+			nf_conntrack_put(&ct->ct_general);
+		}
+	}
+
+	return exp;
 }
 
 /* This replicates logic from nf_conntrack_core.c that is not exported. */