Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2019-04-04

The following pull-request contains BPF updates for your *net* tree.

The main changes are:

1) Batch of fixes to the existing BPF flow dissector API to support
calling BPF programs from the eth_get_headlen context (support for
latter is planned to be added in bpf-next), from Stanislav.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+208 -26
+126
Documentation/networking/bpf_flow_dissector.rst
···
+.. SPDX-License-Identifier: GPL-2.0
+
+==================
+BPF Flow Dissector
+==================
+
+Overview
+========
+
+Flow dissector is a routine that parses metadata out of the packets. It's
+used in the various places in the networking subsystem (RFS, flow hash, etc).
+
+BPF flow dissector is an attempt to reimplement C-based flow dissector logic
+in BPF to gain all the benefits of BPF verifier (namely, limits on the
+number of instructions and tail calls).
+
+API
+===
+
+BPF flow dissector programs operate on an ``__sk_buff``. However, only the
+limited set of fields is allowed: ``data``, ``data_end`` and ``flow_keys``.
+``flow_keys`` is ``struct bpf_flow_keys`` and contains flow dissector input
+and output arguments.
+
+The inputs are:
+  * ``nhoff`` - initial offset of the networking header
+  * ``thoff`` - initial offset of the transport header, initialized to nhoff
+  * ``n_proto`` - L3 protocol type, parsed out of L2 header
+
+Flow dissector BPF program should fill out the rest of the ``struct
+bpf_flow_keys`` fields. Input arguments ``nhoff/thoff/n_proto`` should be
+also adjusted accordingly.
+
+The return code of the BPF program is either BPF_OK to indicate successful
+dissection, or BPF_DROP to indicate parsing error.
+
+__sk_buff->data
+===============
+
+In the VLAN-less case, this is what the initial state of the BPF flow
+dissector looks like::
+
+  +------+------+------------+-----------+
+  | DMAC | SMAC | ETHER_TYPE | L3_HEADER |
+  +------+------+------------+-----------+
+                              ^
+                              |
+                              +-- flow dissector starts here
+
+
+.. code:: c
+
+  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
+  flow_keys->thoff = nhoff
+  flow_keys->n_proto = ETHER_TYPE
+
+In case of VLAN, flow dissector can be called with the two different states.
+
+Pre-VLAN parsing::
+
+  +------+------+------+-----+-----------+-----------+
+  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+  +------+------+------+-----+-----------+-----------+
+                        ^
+                        |
+                        +-- flow dissector starts here
+
+.. code:: c
+
+  skb->data + flow_keys->nhoff point to the first byte of TCI
+  flow_keys->thoff = nhoff
+  flow_keys->n_proto = TPID
+
+Please note that TPID can be 802.1AD and, hence, BPF program would
+have to parse VLAN information twice for double tagged packets.
+
+
+Post-VLAN parsing::
+
+  +------+------+------+-----+-----------+-----------+
+  | DMAC | SMAC | TPID | TCI |ETHER_TYPE | L3_HEADER |
+  +------+------+------+-----+-----------+-----------+
+                                          ^
+                                          |
+                                          +-- flow dissector starts here
+
+.. code:: c
+
+  skb->data + flow_keys->nhoff point to the first byte of L3_HEADER
+  flow_keys->thoff = nhoff
+  flow_keys->n_proto = ETHER_TYPE
+
+In this case VLAN information has been processed before the flow dissector
+and BPF flow dissector is not required to handle it.
+
+
+The takeaway here is as follows: BPF flow dissector program can be called with
+the optional VLAN header and should gracefully handle both cases: when single
+or double VLAN is present and when it is not present. The same program
+can be called for both cases and would have to be written carefully to
+handle both cases.
+
+
+Reference Implementation
+========================
+
+See ``tools/testing/selftests/bpf/progs/bpf_flow.c`` for the reference
+implementation and ``tools/testing/selftests/bpf/flow_dissector_load.[hc]``
+for the loader. bpftool can be used to load BPF flow dissector program as well.
+
+The reference implementation is organized as follows:
+  * ``jmp_table`` map that contains sub-programs for each supported L3 protocol
+  * ``_dissect`` routine - entry point; it does input ``n_proto`` parsing and
+    does ``bpf_tail_call`` to the appropriate L3 handler
+
+Since BPF at this point doesn't support looping (or any jumping back),
+jmp_table is used instead to handle multiple levels of encapsulation (and
+IPv6 options).
+
+
+Current Limitations
+===================
+BPF flow dissector doesn't support exporting all the metadata that in-kernel
+C-based implementation can export. Notable example is single VLAN (802.1Q)
+and double VLAN (802.1AD) tags. Please refer to the ``struct bpf_flow_keys``
+for a set of information that can currently be exported from the BPF context.
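Note: the contract described in bpf_flow_dissector.rst (start from the input offsets, stay within data/data_end, fill in the remaining keys, report success or a parsing error) can be sketched in plain userspace C. This is only an illustration, not BPF code and not the reference implementation. `struct sketch_keys`, `dissect_ipv4`, `SKETCH_OK` and `SKETCH_DROP` are hypothetical names standing in for `struct bpf_flow_keys`, an L3 handler, `BPF_OK` and `BPF_DROP`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for BPF_OK / BPF_DROP (values are arbitrary here). */
enum { SKETCH_OK, SKETCH_DROP };

/* Hypothetical, reduced stand-in for struct bpf_flow_keys. */
struct sketch_keys {
	uint16_t nhoff;    /* input: offset of the network header */
	uint16_t thoff;    /* input: starts equal to nhoff; output: transport offset */
	uint16_t n_proto;  /* input: L3 protocol (host byte order in this sketch) */
	uint8_t  ip_proto; /* output: L4 protocol taken from the IPv4 header */
};

/*
 * Parse a minimal IPv4 header located at data + keys->nhoff.
 * data/data_end play the role of __sk_buff->data and __sk_buff->data_end.
 */
static int dissect_ipv4(const uint8_t *data, const uint8_t *data_end,
			struct sketch_keys *keys)
{
	const uint8_t *iph;
	uint8_t ihl;

	assert(data <= data_end);

	/* Bounds check first: a minimal IPv4 header is 20 bytes long. */
	if ((size_t)(data_end - data) < (size_t)keys->nhoff + 20)
		return SKETCH_DROP;

	iph = data + keys->nhoff;
	ihl = iph[0] & 0x0f;       /* header length in 32-bit words */
	if (ihl < 5)
		return SKETCH_DROP;

	keys->ip_proto = iph[9];   /* protocol field of the IPv4 header */
	keys->thoff = (uint16_t)(keys->nhoff + ihl * 4);
	return SKETCH_OK;
}
```

A caller would pass a buffer that begins where `skb->data` would point, with `nhoff` already set by the kernel, and then read `thoff` and `ip_proto` back from the keys.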
+1
Documentation/networking/index.rst
···
    netdev-FAQ
    af_xdp
    batman-adv
+   bpf_flow_dissector
    can
    can_ucan_protocol
    device_drivers/freescale/dpaa2/index
+3 -13
net/core/filter.c
···
 				       const struct bpf_prog *prog,
 				       struct bpf_insn_access_aux *info)
 {
-	if (type == BPF_WRITE) {
-		switch (off) {
-		case bpf_ctx_range_till(struct __sk_buff, cb[0], cb[4]):
-			break;
-		default:
-			return false;
-		}
-	}
+	if (type == BPF_WRITE)
+		return false;

 	switch (off) {
 	case bpf_ctx_range(struct __sk_buff, data):
···
 	case bpf_ctx_range_ptr(struct __sk_buff, flow_keys):
 		info->reg_type = PTR_TO_FLOW_KEYS;
 		break;
-	case bpf_ctx_range(struct __sk_buff, tc_classid):
-	case bpf_ctx_range(struct __sk_buff, data_meta):
-	case bpf_ctx_range_till(struct __sk_buff, family, local_port):
-	case bpf_ctx_range(struct __sk_buff, tstamp):
-	case bpf_ctx_range(struct __sk_buff, wire_len):
+	default:
 		return false;
 	}
+3 -1
net/core/flow_dissector.c
···
 	/* Pass parameters to the BPF program */
 	memset(flow_keys, 0, sizeof(*flow_keys));
 	cb->qdisc_cb.flow_keys = flow_keys;
+	flow_keys->n_proto = skb->protocol;
 	flow_keys->nhoff = skb_network_offset(skb);
 	flow_keys->thoff = flow_keys->nhoff;

···
 	/* Restore state */
 	memcpy(cb, &cb_saved, sizeof(cb_saved));

-	flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff, 0, skb->len);
+	flow_keys->nhoff = clamp_t(u16, flow_keys->nhoff,
+				   skb_network_offset(skb), skb->len);
 	flow_keys->thoff = clamp_t(u16, flow_keys->thoff,
 				   flow_keys->nhoff, skb->len);
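Note: the effect of the changed clamp can be illustrated with a small userspace sketch. After the BPF program returns, `nhoff` is forced into the range [network offset, packet length] and `thoff` into [`nhoff`, packet length], so a program can no longer report a network header that lies before the point where dissection started. `clamp_u16`, `struct offsets` and `sanitize_offsets` are hypothetical names; the kernel uses `clamp_t()` directly on `struct bpf_flow_keys`, and this sketch assumes the network offset does not exceed the packet length.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair of offsets as reported by a flow dissector program. */
struct offsets {
	uint16_t nhoff;
	uint16_t thoff;
};

/* Userspace stand-in for clamp_t(u16, val, lo, hi); assumes lo <= hi. */
static uint16_t clamp_u16(uint16_t val, uint16_t lo, uint16_t hi)
{
	assert(lo <= hi);
	if (val < lo)
		return lo;
	if (val > hi)
		return hi;
	return val;
}

/*
 * Mirror the post-run sanitization: nhoff may not move before the initial
 * network offset, and thoff may not precede nhoff or exceed the packet.
 */
static struct offsets sanitize_offsets(struct offsets in, uint16_t net_off,
				       uint16_t pkt_len)
{
	struct offsets out;

	out.nhoff = clamp_u16(in.nhoff, net_off, pkt_len);
	out.thoff = clamp_u16(in.thoff, out.nhoff, pkt_len);
	return out;
}
```

With a network offset of 14 and a 100-byte packet, reported offsets of 0/0 come back as 14/14, while 200/10 come back as 100/100.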
+68
tools/testing/selftests/bpf/prog_tests/flow_dissector.c
···
 	.n_proto = __bpf_constant_htons(ETH_P_IPV6),
 };

+#define VLAN_HLEN	4
+
+static struct {
+	struct ethhdr eth;
+	__u16 vlan_tci;
+	__u16 vlan_proto;
+	struct iphdr iph;
+	struct tcphdr tcp;
+} __packed pkt_vlan_v4 = {
+	.eth.h_proto = __bpf_constant_htons(ETH_P_8021Q),
+	.vlan_proto = __bpf_constant_htons(ETH_P_IP),
+	.iph.ihl = 5,
+	.iph.protocol = IPPROTO_TCP,
+	.iph.tot_len = __bpf_constant_htons(MAGIC_BYTES),
+	.tcp.urg_ptr = 123,
+	.tcp.doff = 5,
+};
+
+static struct bpf_flow_keys pkt_vlan_v4_flow_keys = {
+	.nhoff = VLAN_HLEN,
+	.thoff = VLAN_HLEN + sizeof(struct iphdr),
+	.addr_proto = ETH_P_IP,
+	.ip_proto = IPPROTO_TCP,
+	.n_proto = __bpf_constant_htons(ETH_P_IP),
+};
+
+static struct {
+	struct ethhdr eth;
+	__u16 vlan_tci;
+	__u16 vlan_proto;
+	__u16 vlan_tci2;
+	__u16 vlan_proto2;
+	struct ipv6hdr iph;
+	struct tcphdr tcp;
+} __packed pkt_vlan_v6 = {
+	.eth.h_proto = __bpf_constant_htons(ETH_P_8021AD),
+	.vlan_proto = __bpf_constant_htons(ETH_P_8021Q),
+	.vlan_proto2 = __bpf_constant_htons(ETH_P_IPV6),
+	.iph.nexthdr = IPPROTO_TCP,
+	.iph.payload_len = __bpf_constant_htons(MAGIC_BYTES),
+	.tcp.urg_ptr = 123,
+	.tcp.doff = 5,
+};
+
+static struct bpf_flow_keys pkt_vlan_v6_flow_keys = {
+	.nhoff = VLAN_HLEN * 2,
+	.thoff = VLAN_HLEN * 2 + sizeof(struct ipv6hdr),
+	.addr_proto = ETH_P_IPV6,
+	.ip_proto = IPPROTO_TCP,
+	.n_proto = __bpf_constant_htons(ETH_P_IPV6),
+};
+
 void test_flow_dissector(void)
 {
 	struct bpf_flow_keys flow_keys;
···
 	      "err %d errno %d retval %d duration %d size %u/%lu\n",
 	      err, errno, retval, duration, size, sizeof(flow_keys));
 	CHECK_FLOW_KEYS("ipv6_flow_keys", flow_keys, pkt_v6_flow_keys);
+
+	err = bpf_prog_test_run(prog_fd, 10, &pkt_vlan_v4, sizeof(pkt_vlan_v4),
+				&flow_keys, &size, &retval, &duration);
+	CHECK(size != sizeof(flow_keys) || err || retval != 1, "vlan_ipv4",
+	      "err %d errno %d retval %d duration %d size %u/%lu\n",
+	      err, errno, retval, duration, size, sizeof(flow_keys));
+	CHECK_FLOW_KEYS("vlan_ipv4_flow_keys", flow_keys,
+			pkt_vlan_v4_flow_keys);
+
+	err = bpf_prog_test_run(prog_fd, 10, &pkt_vlan_v6, sizeof(pkt_vlan_v6),
+				&flow_keys, &size, &retval, &duration);
+	CHECK(size != sizeof(flow_keys) || err || retval != 1, "vlan_ipv6",
+	      "err %d errno %d retval %d duration %d size %u/%lu\n",
+	      err, errno, retval, duration, size, sizeof(flow_keys));
+	CHECK_FLOW_KEYS("vlan_ipv6_flow_keys", flow_keys,
+			pkt_vlan_v6_flow_keys);

 	bpf_object__close(obj);
 }
+7 -12
tools/testing/selftests/bpf/progs/bpf_flow.c
···
 {
 	struct bpf_flow_keys *keys = skb->flow_keys;

-	keys->n_proto = proto;
 	switch (proto) {
 	case bpf_htons(ETH_P_IP):
 		bpf_tail_call(skb, &jmp_table, IP);
···
 SEC("flow_dissector")
 int _dissect(struct __sk_buff *skb)
 {
-	if (!skb->vlan_present)
-		return parse_eth_proto(skb, skb->protocol);
-	else
-		return parse_eth_proto(skb, skb->vlan_proto);
+	struct bpf_flow_keys *keys = skb->flow_keys;
+
+	return parse_eth_proto(skb, keys->n_proto);
 }

 /* Parses on IPPROTO_* */
···
 {
 	struct bpf_flow_keys *keys = skb->flow_keys;
 	struct vlan_hdr *vlan, _vlan;
-	__be16 proto;
-
-	/* Peek back to see if single or double-tagging */
-	if (bpf_skb_load_bytes(skb, keys->thoff - sizeof(proto), &proto,
-			       sizeof(proto)))
-		return BPF_DROP;

 	/* Account for double-tagging */
-	if (proto == bpf_htons(ETH_P_8021AD)) {
+	if (keys->n_proto == bpf_htons(ETH_P_8021AD)) {
 		vlan = bpf_flow_dissect_get_header(skb, sizeof(*vlan), &_vlan);
 		if (!vlan)
 			return BPF_DROP;
···
 		if (vlan->h_vlan_encapsulated_proto != bpf_htons(ETH_P_8021Q))
 			return BPF_DROP;

+		keys->nhoff += sizeof(*vlan);
 		keys->thoff += sizeof(*vlan);
 	}

···
 	if (!vlan)
 		return BPF_DROP;

+	keys->nhoff += sizeof(*vlan);
 	keys->thoff += sizeof(*vlan);
 	/* Only allow 8021AD + 8021Q double tagging and no triple tagging.*/
 	if (vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021AD) ||
 	    vlan->h_vlan_encapsulated_proto == bpf_htons(ETH_P_8021Q))
 		return BPF_DROP;

+	keys->n_proto = vlan->h_vlan_encapsulated_proto;
 	return parse_eth_proto(skb, vlan->h_vlan_encapsulated_proto);
 }
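Note: the VLAN handling after this change (decide single versus double tagging from `n_proto` instead of peeking backwards, advance both `nhoff` and `thoff` per tag, then store the encapsulated protocol in `n_proto`) can be approximated in userspace C as follows. This is a sketch under assumptions, not the BPF program itself: `struct vlan_keys`, `read_be16` and `parse_vlan_sketch` are hypothetical names, protocol values are kept in host byte order, and the buffer is assumed to start at the TCI of the first tag, matching the pre-VLAN state described in the documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_P_8021Q  0x8100 /* 802.1Q VLAN tag */
#define ETH_P_8021AD 0x88A8 /* 802.1ad service VLAN tag */
#define VLAN_HLEN    4      /* TCI (2 bytes) + encapsulated protocol (2 bytes) */

/* Hypothetical, reduced stand-in for struct bpf_flow_keys (host byte order). */
struct vlan_keys {
	uint16_t nhoff;
	uint16_t thoff;
	uint16_t n_proto;
};

/* Bounds-checked big-endian 16-bit read; returns -1 if out of range. */
static int read_be16(const uint8_t *buf, size_t len, size_t off, uint16_t *out)
{
	if (off + 2 > len)
		return -1;
	*out = (uint16_t)((buf[off] << 8) | buf[off + 1]);
	return 0;
}

/* Returns 0 on success and -1 where the BPF program would return BPF_DROP. */
static int parse_vlan_sketch(const uint8_t *buf, size_t len,
			     struct vlan_keys *keys)
{
	uint16_t encap;

	/* Double tagging: the outer 802.1ad tag must wrap an 802.1Q tag. */
	if (keys->n_proto == ETH_P_8021AD) {
		if (read_be16(buf, len, (size_t)keys->thoff + 2, &encap))
			return -1;
		if (encap != ETH_P_8021Q)
			return -1;
		keys->nhoff += VLAN_HLEN;
		keys->thoff += VLAN_HLEN;
	}

	/* Inner (or only) tag. */
	if (read_be16(buf, len, (size_t)keys->thoff + 2, &encap))
		return -1;
	keys->nhoff += VLAN_HLEN;
	keys->thoff += VLAN_HLEN;

	/* No triple tagging. */
	if (encap == ETH_P_8021AD || encap == ETH_P_8021Q)
		return -1;

	keys->n_proto = encap; /* the L3 handler dispatches on this next */
	return 0;
}
```

For a single 802.1Q tag followed by IPv4 the offsets end up at 4, and for 802.1ad plus 802.1Q followed by IPv6 they end up at 8, which lines up with the `VLAN_HLEN` and `VLAN_HLEN * 2` expectations in the selftest.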