Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'bpf-sk-lookup'

Joe Stringer says:

====================
This series proposes a new helper for the BPF API which allows BPF programs to
perform lookups for sockets in a network namespace. This would allow programs
to determine early on in processing whether the stack is expecting to receive
the packet, and perform some action (e.g. drop, forward somewhere) based on this
information.

The series is structured roughly into:
* Misc refactor
* Add the socket pointer type
* Add reference tracking to ensure that socket references are freed
* Extend the BPF API to add sk_lookup_xxx() / sk_release() functions
* Add tests/documentation

The helper proposed in this series includes a parameter for a tuple which must
be filled in by the caller to determine the socket to look up. The simplest
case would be filling it with the contents of the packet, i.e. mapping the
packet's 5-tuple into the parameter. In common cases, it may alternatively be
useful to reverse the direction of the tuple and perform a lookup, to find the
socket that initiated this connection; and if the BPF program ever performs a form of
IP address translation, it may further be useful to be able to look up
arbitrary tuples that are not based upon the packet, but instead based on state
held in BPF maps or hardcoded in the BPF program.

Currently, access to the socket's fields is limited to those which are
otherwise already accessible, and is restricted to read-only access.

Changes since v3:
* New patch: "bpf: Reuse canonical string formatter for ctx errs"
* Add PTR_TO_SOCKET to is_ctx_reg().
* Add a few new checks to prevent mixing of socket/non-socket pointers.
* Swap order of checks in sock_filter_is_valid_access().
* Prefix register spill macros with "bpf_".
* Add acks from previous round.
* Rebase.

Changes since v2:
* New patch: "selftests/bpf: Generalize dummy program types".
This enables adding verifier tests for socket lookup with tail calls.
* Define the semantics of the new helpers more clearly in uAPI header.
* Fix release of caller_net when netns is not specified.
* Use skb->sk to find caller net when skb->dev is unavailable.
* Fix build with !CONFIG_NET.
* Replace ptr_id defensive coding when releasing reference state with an
internal error (-EFAULT).
* Remove flags argument to sk_release().
* Add several new assembly tests suggested by Daniel.
* Add a few new C tests.
* Fix typo in verifier error message.

Changes since v1:
* Limit netns_id field to 32 bits
* Reuse reg_type_mismatch() in more places
* Reduce the number of passes at convert_ctx_access()
* Replace ptr_id defensive coding when releasing reference state with an
internal error (-EFAULT)
* Rework 'struct bpf_sock_tuple' to allow passing a packet pointer
* Allow direct packet access from helper
* Fix compile error with CONFIG_IPV6 enabled
* Improve commit messages

Changes since RFC:
* Split up sk_lookup() into sk_lookup_tcp(), sk_lookup_udp().
* Only take references on the socket when necessary.
* Make sk_release() only free the socket reference in this case.
* Fix some runtime reference leaks:
* Disallow BPF_LD_[ABS|IND] instructions while holding a reference.
* Disallow bpf_tail_call() while holding a reference.
* Prevent the same instruction from being used for a reference and
another pointer type.
* Simplify locating copies of a reference during helper calls by caching
the pointer id from the caller.
* Fix kbuild compilation warnings with particular configs.
* Improve code comments describing the new verifier pieces.
* Tested by Nitin
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

+2001 -156
+64
Documentation/networking/filter.txt
···
     PTR_TO_STACK        Frame pointer.
     PTR_TO_PACKET       skb->data.
     PTR_TO_PACKET_END   skb->data + headlen; arithmetic forbidden.
+    PTR_TO_SOCKET       Pointer to struct bpf_sock_ops, implicitly refcounted.
+    PTR_TO_SOCKET_OR_NULL
+                        Either a pointer to a socket, or NULL; socket lookup
+                        returns this type, which becomes a PTR_TO_SOCKET when
+                        checked != NULL. PTR_TO_SOCKET is reference-counted,
+                        so programs must release the reference through the
+                        socket release function before the end of the program.
+                        Arithmetic on these pointers is forbidden.
 However, a pointer may be offset from this base (as a result of pointer
 arithmetic), and this is tracked in two parts: the 'fixed offset' and 'variable
 offset'. The former is used when an exactly-known value (e.g. an immediate
···
 pointer will have a variable offset known to be 4n+2 for some n, so adding the 2
 bytes (NET_IP_ALIGN) gives a 4-byte alignment and so word-sized accesses through
 that pointer are safe.
+The 'id' field is also used on PTR_TO_SOCKET and PTR_TO_SOCKET_OR_NULL, common
+to all copies of the pointer returned from a socket lookup. This has similar
+behaviour to the handling for PTR_TO_MAP_VALUE_OR_NULL->PTR_TO_MAP_VALUE, but
+it also handles reference tracking for the pointer. PTR_TO_SOCKET implicitly
+represents a reference to the corresponding 'struct sock'. To ensure that the
+reference is not leaked, it is imperative to NULL-check the reference and, in
+the non-NULL case, pass the valid reference to the socket release function.

 Direct packet access
 --------------------
···
 from 5 to 8: R0=imm0 R10=fp
 8: (7a) *(u64 *)(r0 +0) = 1
 R0 invalid mem access 'imm'
+
+Program that performs a socket lookup then sets the pointer to NULL without
+checking it:
+  value:
+    BPF_MOV64_IMM(BPF_REG_2, 0),
+    BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+    BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+    BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+    BPF_MOV64_IMM(BPF_REG_3, 4),
+    BPF_MOV64_IMM(BPF_REG_4, 0),
+    BPF_MOV64_IMM(BPF_REG_5, 0),
+    BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+    BPF_MOV64_IMM(BPF_REG_0, 0),
+    BPF_EXIT_INSN(),
+  Error:
+    0: (b7) r2 = 0
+    1: (63) *(u32 *)(r10 -8) = r2
+    2: (bf) r2 = r10
+    3: (07) r2 += -8
+    4: (b7) r3 = 4
+    5: (b7) r4 = 0
+    6: (b7) r5 = 0
+    7: (85) call bpf_sk_lookup_tcp#65
+    8: (b7) r0 = 0
+    9: (95) exit
+  Unreleased reference id=1, alloc_insn=7
+
+Program that performs a socket lookup but does not NULL-check the returned
+value:
+    BPF_MOV64_IMM(BPF_REG_2, 0),
+    BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8),
+    BPF_MOV64_REG(BPF_REG_2, BPF_REG_10),
+    BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -8),
+    BPF_MOV64_IMM(BPF_REG_3, 4),
+    BPF_MOV64_IMM(BPF_REG_4, 0),
+    BPF_MOV64_IMM(BPF_REG_5, 0),
+    BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+    BPF_EXIT_INSN(),
+  Error:
+    0: (b7) r2 = 0
+    1: (63) *(u32 *)(r10 -8) = r2
+    2: (bf) r2 = r10
+    3: (07) r2 += -8
+    4: (b7) r3 = 4
+    5: (b7) r4 = 0
+    6: (b7) r5 = 0
+    7: (85) call bpf_sk_lookup_tcp#65
+    8: (95) exit
+  Unreleased reference id=1, alloc_insn=7

 Testing
 -------
+34
include/linux/bpf.h
···
	ARG_PTR_TO_CTX,		/* pointer to context */
	ARG_ANYTHING,		/* any (initialized) argument is ok */
+	ARG_PTR_TO_SOCKET,	/* pointer to bpf_sock */
 };

 /* type of values returned from helper functions */
···
	RET_VOID,			/* function doesn't return anything */
	RET_PTR_TO_MAP_VALUE,		/* returns a pointer to map elem value */
	RET_PTR_TO_MAP_VALUE_OR_NULL,	/* returns a pointer to map elem value or NULL */
+	RET_PTR_TO_SOCKET_OR_NULL,	/* returns a pointer to a socket or NULL */
 };

 /* eBPF function prototype used by verifier to allow BPF_CALLs from eBPF programs
···
	PTR_TO_PACKET,		/* reg points to skb->data */
	PTR_TO_PACKET_END,	/* skb->data + headlen */
	PTR_TO_FLOW_KEYS,	/* reg points to bpf_flow_keys */
+	PTR_TO_SOCKET,		/* reg points to struct bpf_sock */
+	PTR_TO_SOCKET_OR_NULL,	/* reg points to struct bpf_sock or NULL */
 };

 /* The information passed from prog-specific *_is_valid_access
···
 typedef unsigned long (*bpf_ctx_copy_t)(void *dst, const void *src,
					unsigned long off, unsigned long len);
+typedef u32 (*bpf_convert_ctx_access_t)(enum bpf_access_type type,
+					const struct bpf_insn *src,
+					struct bpf_insn *dst,
+					struct bpf_prog *prog,
+					u32 *target_size);

 u64 bpf_event_output(struct bpf_map *map, u64 flags, void *meta, u64 meta_size,
		     void *ctx, u64 ctx_size, bpf_ctx_copy_t ctx_copy);
···
 /* Shared helpers among cBPF and eBPF. */
 void bpf_user_rnd_init_once(void);
 u64 bpf_user_rnd_u32(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5);
+
+#if defined(CONFIG_NET)
+bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type,
+			      struct bpf_insn_access_aux *info);
+u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
+				const struct bpf_insn *si,
+				struct bpf_insn *insn_buf,
+				struct bpf_prog *prog,
+				u32 *target_size);
+#else
+static inline bool bpf_sock_is_valid_access(int off, int size,
+					    enum bpf_access_type type,
+					    struct bpf_insn_access_aux *info)
+{
+	return false;
+}
+static inline u32 bpf_sock_convert_ctx_access(enum bpf_access_type type,
+					      const struct bpf_insn *si,
+					      struct bpf_insn *insn_buf,
+					      struct bpf_prog *prog,
+					      u32 *target_size)
+{
+	return 0;
+}
+#endif

 #endif /* _LINUX_BPF_H */
+34 -3
include/linux/bpf_verifier.h
···
	 * offset, so they can share range knowledge.
	 * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we
	 * came from, when one is tested for != NULL.
+	 * For PTR_TO_SOCKET this is used to share which pointers retain the
+	 * same reference to the socket, to determine proper reference freeing.
	 */
	u32 id;
	/* For scalar types (SCALAR_VALUE), this represents our knowledge of
···
	u8 slot_type[BPF_REG_SIZE];
 };

+struct bpf_reference_state {
+	/* Track each reference created with a unique id, even if the same
+	 * instruction creates the reference multiple times (eg, via CALL).
+	 */
+	int id;
+	/* Instruction where the allocation of this reference occurred. This
+	 * is used purely to inform the user of a reference leak.
+	 */
+	int insn_idx;
+};
+
 /* state of the program:
  * type of all registers and stack info
  */
···
	 */
	u32 subprogno;

-	/* should be second to last. See copy_func_state() */
+	/* The following fields should be last. See copy_func_state() */
+	int acquired_refs;
+	struct bpf_reference_state *refs;
	int allocated_stack;
	struct bpf_stack_state *stack;
 };
···
	struct bpf_func_state *frame[MAX_CALL_FRAMES];
	u32 curframe;
 };
+
+#define bpf_get_spilled_reg(slot, frame)				\
+	(((slot < frame->allocated_stack / BPF_REG_SIZE) &&		\
+	  (frame->stack[slot].slot_type[0] == STACK_SPILL))		\
+	 ? &frame->stack[slot].spilled_ptr : NULL)
+
+/* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */
+#define bpf_for_each_spilled_reg(iter, frame, reg)			\
+	for (iter = 0, reg = bpf_get_spilled_reg(iter, frame);		\
+	     iter < frame->allocated_stack / BPF_REG_SIZE;		\
+	     iter++, reg = bpf_get_spilled_reg(iter, frame))

 /* linked list of verifier states used to prune search */
 struct bpf_verifier_state_list {
···
 __printf(2, 3) void bpf_verifier_log_write(struct bpf_verifier_env *env,
					   const char *fmt, ...);

-static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+static inline struct bpf_func_state *cur_func(struct bpf_verifier_env *env)
 {
	struct bpf_verifier_state *cur = env->cur_state;

-	return cur->frame[cur->curframe]->regs;
+	return cur->frame[cur->curframe];
+}
+
+static inline struct bpf_reg_state *cur_regs(struct bpf_verifier_env *env)
+{
+	return cur_func(env)->regs;
 }

 int bpf_prog_offload_verifier_prep(struct bpf_verifier_env *env);
+92 -1
include/uapi/linux/bpf.h
···
 *		request in the skb.
 *	Return
 *		0 on success, or a negative error in case of failure.
+ *
+ * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
+ *	Description
+ *		Look for a TCP socket matching *tuple*, optionally in a child
+ *		network namespace *netns*. The return value must be checked,
+ *		and if non-NULL, released via **bpf_sk_release**\ ().
+ *
+ *		The *ctx* should point to the context of the program, such as
+ *		the skb or socket (depending on the hook in use). This is used
+ *		to determine the base network namespace for the lookup.
+ *
+ *		*tuple_size* must be one of:
+ *
+ *		**sizeof**\ (*tuple*\ **->ipv4**)
+ *			Look for an IPv4 socket.
+ *		**sizeof**\ (*tuple*\ **->ipv6**)
+ *			Look for an IPv6 socket.
+ *
+ *		If the *netns* is zero, then the socket lookup table in the
+ *		netns associated with the *ctx* will be used. For the TC hooks,
+ *		this is in the netns of the device in the skb. For socket hooks,
+ *		this is in the netns of the socket. If *netns* is non-zero, then
+ *		it specifies the ID of the netns relative to the netns
+ *		associated with the *ctx*.
+ *
+ *		All values for *flags* are reserved for future usage, and must
+ *		be left at zero.
+ *
+ *		This helper is available only if the kernel was compiled with
+ *		the **CONFIG_NET** configuration option.
+ *	Return
+ *		Pointer to *struct bpf_sock*, or NULL in case of failure.
+ *
+ * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags)
+ *	Description
+ *		Look for a UDP socket matching *tuple*, optionally in a child
+ *		network namespace *netns*. The return value must be checked,
+ *		and if non-NULL, released via **bpf_sk_release**\ ().
+ *
+ *		The *ctx* should point to the context of the program, such as
+ *		the skb or socket (depending on the hook in use). This is used
+ *		to determine the base network namespace for the lookup.
+ *
+ *		*tuple_size* must be one of:
+ *
+ *		**sizeof**\ (*tuple*\ **->ipv4**)
+ *			Look for an IPv4 socket.
+ *		**sizeof**\ (*tuple*\ **->ipv6**)
+ *			Look for an IPv6 socket.
+ *
+ *		If the *netns* is zero, then the socket lookup table in the
+ *		netns associated with the *ctx* will be used. For the TC hooks,
+ *		this is in the netns of the device in the skb. For socket hooks,
+ *		this is in the netns of the socket. If *netns* is non-zero, then
+ *		it specifies the ID of the netns relative to the netns
+ *		associated with the *ctx*.
+ *
+ *		All values for *flags* are reserved for future usage, and must
+ *		be left at zero.
+ *
+ *		This helper is available only if the kernel was compiled with
+ *		the **CONFIG_NET** configuration option.
+ *	Return
+ *		Pointer to *struct bpf_sock*, or NULL in case of failure.
+ *
+ * int bpf_sk_release(struct bpf_sock *sk)
+ *	Description
+ *		Release the reference held by *sk*. *sk* must be a non-NULL
+ *		pointer that was returned from bpf_sk_lookup_xxx\ ().
+ *	Return
+ *		0 on success, or a negative error in case of failure.
 */
 #define __BPF_FUNC_MAPPER(FN)		\
	FN(unspec),			\
···
	FN(get_current_cgroup_id),	\
	FN(get_local_storage),		\
	FN(sk_select_reuseport),	\
-	FN(skb_ancestor_cgroup_id),
+	FN(skb_ancestor_cgroup_id),	\
+	FN(sk_lookup_tcp),		\
+	FN(sk_lookup_udp),		\
+	FN(sk_release),

 /* integer value in 'imm' field of BPF_CALL instruction selects which helper
  * function eBPF program intends to call
···
	__u32 src_port;		/* Allows 4-byte read.
				 * Stored in host byte order
				 */
+};
+
+struct bpf_sock_tuple {
+	union {
+		struct {
+			__be32 saddr;
+			__be32 daddr;
+			__be16 sport;
+			__be16 dport;
+		} ipv4;
+		struct {
+			__be32 saddr[4];
+			__be32 daddr[4];
+			__be16 sport;
+			__be16 dport;
+		} ipv6;
+	};
 };

 #define XDP_PACKET_HEADROOM	256
+492 -110
kernel/bpf/verifier.c
··· 1 1 /* Copyright (c) 2011-2014 PLUMgrid, http://plumgrid.com 2 2 * Copyright (c) 2016 Facebook 3 + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io 3 4 * 4 5 * This program is free software; you can redistribute it and/or 5 6 * modify it under the terms of version 2 of the GNU General Public ··· 81 80 * (like pointer plus pointer becomes SCALAR_VALUE type) 82 81 * 83 82 * When verifier sees load or store instructions the type of base register 84 - * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK. These are three pointer 85 - * types recognized by check_mem_access() function. 83 + * can be: PTR_TO_MAP_VALUE, PTR_TO_CTX, PTR_TO_STACK, PTR_TO_SOCKET. These are 84 + * four pointer types recognized by check_mem_access() function. 86 85 * 87 86 * PTR_TO_MAP_VALUE means that this register is pointing to 'map element value' 88 87 * and the range of [ptr, ptr + map's value_size) is accessible. ··· 141 140 * 142 141 * After the call R0 is set to return type of the function and registers R1-R5 143 142 * are set to NOT_INIT to indicate that they are no longer readable. 143 + * 144 + * The following reference types represent a potential reference to a kernel 145 + * resource which, after first being allocated, must be checked and freed by 146 + * the BPF program: 147 + * - PTR_TO_SOCKET_OR_NULL, PTR_TO_SOCKET 148 + * 149 + * When the verifier sees a helper call return a reference type, it allocates a 150 + * pointer id for the reference and stores it in the current function state. 151 + * Similar to the way that PTR_TO_MAP_VALUE_OR_NULL is converted into 152 + * PTR_TO_MAP_VALUE, PTR_TO_SOCKET_OR_NULL becomes PTR_TO_SOCKET when the type 153 + * passes through a NULL-check conditional. For the branch wherein the state is 154 + * changed to CONST_IMM, the verifier releases the reference. 
155 + * 156 + * For each helper function that allocates a reference, such as 157 + * bpf_sk_lookup_tcp(), there is a corresponding release function, such as 158 + * bpf_sk_release(). When a reference type passes into the release function, 159 + * the verifier also releases the reference. If any unchecked or unreleased 160 + * reference remains at the end of the program, the verifier rejects it. 144 161 */ 145 162 146 163 /* verifier_state + insn_idx are pushed to stack when branch is encountered */ ··· 208 189 int access_size; 209 190 s64 msize_smax_value; 210 191 u64 msize_umax_value; 192 + int ptr_id; 211 193 }; 212 194 213 195 static DEFINE_MUTEX(bpf_verifier_lock); ··· 269 249 type == PTR_TO_PACKET_META; 270 250 } 271 251 252 + static bool reg_type_may_be_null(enum bpf_reg_type type) 253 + { 254 + return type == PTR_TO_MAP_VALUE_OR_NULL || 255 + type == PTR_TO_SOCKET_OR_NULL; 256 + } 257 + 258 + static bool type_is_refcounted(enum bpf_reg_type type) 259 + { 260 + return type == PTR_TO_SOCKET; 261 + } 262 + 263 + static bool type_is_refcounted_or_null(enum bpf_reg_type type) 264 + { 265 + return type == PTR_TO_SOCKET || type == PTR_TO_SOCKET_OR_NULL; 266 + } 267 + 268 + static bool reg_is_refcounted(const struct bpf_reg_state *reg) 269 + { 270 + return type_is_refcounted(reg->type); 271 + } 272 + 273 + static bool reg_is_refcounted_or_null(const struct bpf_reg_state *reg) 274 + { 275 + return type_is_refcounted_or_null(reg->type); 276 + } 277 + 278 + static bool arg_type_is_refcounted(enum bpf_arg_type type) 279 + { 280 + return type == ARG_PTR_TO_SOCKET; 281 + } 282 + 283 + /* Determine whether the function releases some resources allocated by another 284 + * function call. The first reference type argument will be assumed to be 285 + * released by release_reference(). 
286 + */ 287 + static bool is_release_function(enum bpf_func_id func_id) 288 + { 289 + return func_id == BPF_FUNC_sk_release; 290 + } 291 + 272 292 /* string representation of 'enum bpf_reg_type' */ 273 293 static const char * const reg_type_str[] = { 274 294 [NOT_INIT] = "?", ··· 322 262 [PTR_TO_PACKET_META] = "pkt_meta", 323 263 [PTR_TO_PACKET_END] = "pkt_end", 324 264 [PTR_TO_FLOW_KEYS] = "flow_keys", 265 + [PTR_TO_SOCKET] = "sock", 266 + [PTR_TO_SOCKET_OR_NULL] = "sock_or_null", 325 267 }; 326 268 327 269 static char slot_type_char[] = { ··· 440 378 else 441 379 verbose(env, "=%s", types_buf); 442 380 } 381 + if (state->acquired_refs && state->refs[0].id) { 382 + verbose(env, " refs=%d", state->refs[0].id); 383 + for (i = 1; i < state->acquired_refs; i++) 384 + if (state->refs[i].id) 385 + verbose(env, ",%d", state->refs[i].id); 386 + } 443 387 verbose(env, "\n"); 444 388 } 445 389 446 - static int copy_stack_state(struct bpf_func_state *dst, 447 - const struct bpf_func_state *src) 448 - { 449 - if (!src->stack) 450 - return 0; 451 - if (WARN_ON_ONCE(dst->allocated_stack < src->allocated_stack)) { 452 - /* internal bug, make state invalid to reject the program */ 453 - memset(dst, 0, sizeof(*dst)); 454 - return -EFAULT; 455 - } 456 - memcpy(dst->stack, src->stack, 457 - sizeof(*src->stack) * (src->allocated_stack / BPF_REG_SIZE)); 458 - return 0; 390 + #define COPY_STATE_FN(NAME, COUNT, FIELD, SIZE) \ 391 + static int copy_##NAME##_state(struct bpf_func_state *dst, \ 392 + const struct bpf_func_state *src) \ 393 + { \ 394 + if (!src->FIELD) \ 395 + return 0; \ 396 + if (WARN_ON_ONCE(dst->COUNT < src->COUNT)) { \ 397 + /* internal bug, make state invalid to reject the program */ \ 398 + memset(dst, 0, sizeof(*dst)); \ 399 + return -EFAULT; \ 400 + } \ 401 + memcpy(dst->FIELD, src->FIELD, \ 402 + sizeof(*src->FIELD) * (src->COUNT / SIZE)); \ 403 + return 0; \ 459 404 } 405 + /* copy_reference_state() */ 406 + COPY_STATE_FN(reference, acquired_refs, refs, 1) 407 + 
/* copy_stack_state() */ 408 + COPY_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE) 409 + #undef COPY_STATE_FN 410 + 411 + #define REALLOC_STATE_FN(NAME, COUNT, FIELD, SIZE) \ 412 + static int realloc_##NAME##_state(struct bpf_func_state *state, int size, \ 413 + bool copy_old) \ 414 + { \ 415 + u32 old_size = state->COUNT; \ 416 + struct bpf_##NAME##_state *new_##FIELD; \ 417 + int slot = size / SIZE; \ 418 + \ 419 + if (size <= old_size || !size) { \ 420 + if (copy_old) \ 421 + return 0; \ 422 + state->COUNT = slot * SIZE; \ 423 + if (!size && old_size) { \ 424 + kfree(state->FIELD); \ 425 + state->FIELD = NULL; \ 426 + } \ 427 + return 0; \ 428 + } \ 429 + new_##FIELD = kmalloc_array(slot, sizeof(struct bpf_##NAME##_state), \ 430 + GFP_KERNEL); \ 431 + if (!new_##FIELD) \ 432 + return -ENOMEM; \ 433 + if (copy_old) { \ 434 + if (state->FIELD) \ 435 + memcpy(new_##FIELD, state->FIELD, \ 436 + sizeof(*new_##FIELD) * (old_size / SIZE)); \ 437 + memset(new_##FIELD + old_size / SIZE, 0, \ 438 + sizeof(*new_##FIELD) * (size - old_size) / SIZE); \ 439 + } \ 440 + state->COUNT = slot * SIZE; \ 441 + kfree(state->FIELD); \ 442 + state->FIELD = new_##FIELD; \ 443 + return 0; \ 444 + } 445 + /* realloc_reference_state() */ 446 + REALLOC_STATE_FN(reference, acquired_refs, refs, 1) 447 + /* realloc_stack_state() */ 448 + REALLOC_STATE_FN(stack, allocated_stack, stack, BPF_REG_SIZE) 449 + #undef REALLOC_STATE_FN 460 450 461 451 /* do_check() starts with zero-sized stack in struct bpf_verifier_state to 462 452 * make it consume minimal amount of memory. check_stack_write() access from 463 453 * the program calls into realloc_func_state() to grow the stack size. 464 - * Note there is a non-zero parent pointer inside each reg of bpf_verifier_state 465 - * which this function copies over. 
It points to corresponding reg in previous 466 - * bpf_verifier_state which is never reallocated 454 + * Note there is a non-zero 'parent' pointer inside bpf_verifier_state 455 + * which realloc_stack_state() copies over. It points to previous 456 + * bpf_verifier_state which is never reallocated. 467 457 */ 468 - static int realloc_func_state(struct bpf_func_state *state, int size, 469 - bool copy_old) 458 + static int realloc_func_state(struct bpf_func_state *state, int stack_size, 459 + int refs_size, bool copy_old) 470 460 { 471 - u32 old_size = state->allocated_stack; 472 - struct bpf_stack_state *new_stack; 473 - int slot = size / BPF_REG_SIZE; 461 + int err = realloc_reference_state(state, refs_size, copy_old); 462 + if (err) 463 + return err; 464 + return realloc_stack_state(state, stack_size, copy_old); 465 + } 474 466 475 - if (size <= old_size || !size) { 476 - if (copy_old) 467 + /* Acquire a pointer id from the env and update the state->refs to include 468 + * this new pointer reference. 469 + * On success, returns a valid pointer id to associate with the register 470 + * On failure, returns a negative errno. 471 + */ 472 + static int acquire_reference_state(struct bpf_verifier_env *env, int insn_idx) 473 + { 474 + struct bpf_func_state *state = cur_func(env); 475 + int new_ofs = state->acquired_refs; 476 + int id, err; 477 + 478 + err = realloc_reference_state(state, state->acquired_refs + 1, true); 479 + if (err) 480 + return err; 481 + id = ++env->id_gen; 482 + state->refs[new_ofs].id = id; 483 + state->refs[new_ofs].insn_idx = insn_idx; 484 + 485 + return id; 486 + } 487 + 488 + /* release function corresponding to acquire_reference_state(). Idempotent. 
*/ 489 + static int __release_reference_state(struct bpf_func_state *state, int ptr_id) 490 + { 491 + int i, last_idx; 492 + 493 + if (!ptr_id) 494 + return -EFAULT; 495 + 496 + last_idx = state->acquired_refs - 1; 497 + for (i = 0; i < state->acquired_refs; i++) { 498 + if (state->refs[i].id == ptr_id) { 499 + if (last_idx && i != last_idx) 500 + memcpy(&state->refs[i], &state->refs[last_idx], 501 + sizeof(*state->refs)); 502 + memset(&state->refs[last_idx], 0, sizeof(*state->refs)); 503 + state->acquired_refs--; 477 504 return 0; 478 - state->allocated_stack = slot * BPF_REG_SIZE; 479 - if (!size && old_size) { 480 - kfree(state->stack); 481 - state->stack = NULL; 482 505 } 483 - return 0; 484 506 } 485 - new_stack = kmalloc_array(slot, sizeof(struct bpf_stack_state), 486 - GFP_KERNEL); 487 - if (!new_stack) 488 - return -ENOMEM; 489 - if (copy_old) { 490 - if (state->stack) 491 - memcpy(new_stack, state->stack, 492 - sizeof(*new_stack) * (old_size / BPF_REG_SIZE)); 493 - memset(new_stack + old_size / BPF_REG_SIZE, 0, 494 - sizeof(*new_stack) * (size - old_size) / BPF_REG_SIZE); 495 - } 496 - state->allocated_stack = slot * BPF_REG_SIZE; 497 - kfree(state->stack); 498 - state->stack = new_stack; 507 + return -EFAULT; 508 + } 509 + 510 + /* variation on the above for cases where we expect that there must be an 511 + * outstanding reference for the specified ptr_id. 
512 + */ 513 + static int release_reference_state(struct bpf_verifier_env *env, int ptr_id) 514 + { 515 + struct bpf_func_state *state = cur_func(env); 516 + int err; 517 + 518 + err = __release_reference_state(state, ptr_id); 519 + if (WARN_ON_ONCE(err != 0)) 520 + verbose(env, "verifier internal error: can't release reference\n"); 521 + return err; 522 + } 523 + 524 + static int transfer_reference_state(struct bpf_func_state *dst, 525 + struct bpf_func_state *src) 526 + { 527 + int err = realloc_reference_state(dst, src->acquired_refs, false); 528 + if (err) 529 + return err; 530 + err = copy_reference_state(dst, src); 531 + if (err) 532 + return err; 499 533 return 0; 500 534 } 501 535 ··· 599 441 { 600 442 if (!state) 601 443 return; 444 + kfree(state->refs); 602 445 kfree(state->stack); 603 446 kfree(state); 604 447 } ··· 625 466 { 626 467 int err; 627 468 628 - err = realloc_func_state(dst, src->allocated_stack, false); 469 + err = realloc_func_state(dst, src->allocated_stack, src->acquired_refs, 470 + false); 629 471 if (err) 630 472 return err; 631 - memcpy(dst, src, offsetof(struct bpf_func_state, allocated_stack)); 473 + memcpy(dst, src, offsetof(struct bpf_func_state, acquired_refs)); 474 + err = copy_reference_state(dst, src); 475 + if (err) 476 + return err; 632 477 return copy_stack_state(dst, src); 633 478 } 634 479 ··· 1131 968 case PTR_TO_PACKET_END: 1132 969 case PTR_TO_FLOW_KEYS: 1133 970 case CONST_PTR_TO_MAP: 971 + case PTR_TO_SOCKET: 972 + case PTR_TO_SOCKET_OR_NULL: 1134 973 return true; 1135 974 default: 1136 975 return false; ··· 1157 992 enum bpf_reg_type type; 1158 993 1159 994 err = realloc_func_state(state, round_up(slot + 1, BPF_REG_SIZE), 1160 - true); 995 + state->acquired_refs, true); 1161 996 if (err) 1162 997 return err; 1163 998 /* caller checked that off % size == 0 and -MAX_BPF_STACK <= off < 0, ··· 1501 1336 return 0; 1502 1337 } 1503 1338 1339 + static int check_sock_access(struct bpf_verifier_env *env, u32 regno, int off, 
1340 + int size, enum bpf_access_type t) 1341 + { 1342 + struct bpf_reg_state *regs = cur_regs(env); 1343 + struct bpf_reg_state *reg = &regs[regno]; 1344 + struct bpf_insn_access_aux info; 1345 + 1346 + if (reg->smin_value < 0) { 1347 + verbose(env, "R%d min value is negative, either use unsigned index or do a if (index >=0) check.\n", 1348 + regno); 1349 + return -EACCES; 1350 + } 1351 + 1352 + if (!bpf_sock_is_valid_access(off, size, t, &info)) { 1353 + verbose(env, "invalid bpf_sock access off=%d size=%d\n", 1354 + off, size); 1355 + return -EACCES; 1356 + } 1357 + 1358 + return 0; 1359 + } 1360 + 1504 1361 static bool __is_pointer_value(bool allow_ptr_leaks, 1505 1362 const struct bpf_reg_state *reg) 1506 1363 { ··· 1541 1354 { 1542 1355 const struct bpf_reg_state *reg = cur_regs(env) + regno; 1543 1356 1544 - return reg->type == PTR_TO_CTX; 1357 + return reg->type == PTR_TO_CTX || 1358 + reg->type == PTR_TO_SOCKET; 1545 1359 } 1546 1360 1547 1361 static bool is_pkt_reg(struct bpf_verifier_env *env, int regno) ··· 1641 1453 * aligned. 
1642 1454 */ 1643 1455 strict = true; 1456 + break; 1457 + case PTR_TO_SOCKET: 1458 + pointer_desc = "sock "; 1644 1459 break; 1645 1460 default: 1646 1461 break; ··· 1912 1721 err = check_flow_keys_access(env, off, size); 1913 1722 if (!err && t == BPF_READ && value_regno >= 0) 1914 1723 mark_reg_unknown(env, regs, value_regno); 1724 + } else if (reg->type == PTR_TO_SOCKET) { 1725 + if (t == BPF_WRITE) { 1726 + verbose(env, "cannot write into socket\n"); 1727 + return -EACCES; 1728 + } 1729 + err = check_sock_access(env, regno, off, size, t); 1730 + if (!err && value_regno >= 0) 1731 + mark_reg_unknown(env, regs, value_regno); 1915 1732 } else { 1916 1733 verbose(env, "R%d invalid mem access '%s'\n", regno, 1917 1734 reg_type_str[reg->type]); ··· 1962 1763 if (is_ctx_reg(env, insn->dst_reg) || 1963 1764 is_pkt_reg(env, insn->dst_reg)) { 1964 1765 verbose(env, "BPF_XADD stores into R%d %s is not allowed\n", 1965 - insn->dst_reg, is_ctx_reg(env, insn->dst_reg) ? 1966 - "context" : "packet"); 1766 + insn->dst_reg, reg_type_str[insn->dst_reg]); 1967 1767 return -EACCES; 1968 1768 } 1969 1769 ··· 2142 1944 err = check_ctx_reg(env, reg, regno); 2143 1945 if (err < 0) 2144 1946 return err; 1947 + } else if (arg_type == ARG_PTR_TO_SOCKET) { 1948 + expected_type = PTR_TO_SOCKET; 1949 + if (type != expected_type) 1950 + goto err_type; 1951 + if (meta->ptr_id || !reg->id) { 1952 + verbose(env, "verifier internal error: mismatched references meta=%d, reg=%d\n", 1953 + meta->ptr_id, reg->id); 1954 + return -EFAULT; 1955 + } 1956 + meta->ptr_id = reg->id; 2145 1957 } else if (arg_type_is_mem_ptr(arg_type)) { 2146 1958 expected_type = PTR_TO_STACK; 2147 1959 /* One exception here. 
In case function allows for NULL to be ··· 2441 2233 return true; 2442 2234 } 2443 2235 2236 + static bool check_refcount_ok(const struct bpf_func_proto *fn) 2237 + { 2238 + int count = 0; 2239 + 2240 + if (arg_type_is_refcounted(fn->arg1_type)) 2241 + count++; 2242 + if (arg_type_is_refcounted(fn->arg2_type)) 2243 + count++; 2244 + if (arg_type_is_refcounted(fn->arg3_type)) 2245 + count++; 2246 + if (arg_type_is_refcounted(fn->arg4_type)) 2247 + count++; 2248 + if (arg_type_is_refcounted(fn->arg5_type)) 2249 + count++; 2250 + 2251 + /* We only support one arg being referenced at the moment, 2252 + * which is sufficient for the helper functions we have right now. 2253 + */ 2254 + return count <= 1; 2255 + } 2256 + 2444 2257 static int check_func_proto(const struct bpf_func_proto *fn) 2445 2258 { 2446 2259 return check_raw_mode_ok(fn) && 2447 - check_arg_pair_ok(fn) ? 0 : -EINVAL; 2260 + check_arg_pair_ok(fn) && 2261 + check_refcount_ok(fn) ? 0 : -EINVAL; 2448 2262 } 2449 2263 2450 2264 /* Packet data might have moved, any old PTR_TO_PACKET[_META,_END] ··· 2482 2252 if (reg_is_pkt_pointer_any(&regs[i])) 2483 2253 mark_reg_unknown(env, regs, i); 2484 2254 2485 - for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { 2486 - if (state->stack[i].slot_type[0] != STACK_SPILL) 2255 + bpf_for_each_spilled_reg(i, state, reg) { 2256 + if (!reg) 2487 2257 continue; 2488 - reg = &state->stack[i].spilled_ptr; 2489 2258 if (reg_is_pkt_pointer_any(reg)) 2490 2259 __mark_reg_unknown(reg); 2491 2260 } ··· 2499 2270 __clear_all_pkt_pointers(env, vstate->frame[i]); 2500 2271 } 2501 2272 2273 + static void release_reg_references(struct bpf_verifier_env *env, 2274 + struct bpf_func_state *state, int id) 2275 + { 2276 + struct bpf_reg_state *regs = state->regs, *reg; 2277 + int i; 2278 + 2279 + for (i = 0; i < MAX_BPF_REG; i++) 2280 + if (regs[i].id == id) 2281 + mark_reg_unknown(env, regs, i); 2282 + 2283 + bpf_for_each_spilled_reg(i, state, reg) { 2284 + if (!reg) 2285 +
continue; 2286 + if (reg_is_refcounted(reg) && reg->id == id) 2287 + __mark_reg_unknown(reg); 2288 + } 2289 + } 2290 + 2291 + /* The pointer with the specified id has released its reference to kernel 2292 + * resources. Identify all copies of the same pointer and clear the reference. 2293 + */ 2294 + static int release_reference(struct bpf_verifier_env *env, 2295 + struct bpf_call_arg_meta *meta) 2296 + { 2297 + struct bpf_verifier_state *vstate = env->cur_state; 2298 + int i; 2299 + 2300 + for (i = 0; i <= vstate->curframe; i++) 2301 + release_reg_references(env, vstate->frame[i], meta->ptr_id); 2302 + 2303 + return release_reference_state(env, meta->ptr_id); 2304 + } 2305 + 2502 2306 static int check_func_call(struct bpf_verifier_env *env, struct bpf_insn *insn, 2503 2307 int *insn_idx) 2504 2308 { 2505 2309 struct bpf_verifier_state *state = env->cur_state; 2506 2310 struct bpf_func_state *caller, *callee; 2507 - int i, subprog, target_insn; 2311 + int i, err, subprog, target_insn; 2508 2312 2509 2313 if (state->curframe + 1 >= MAX_CALL_FRAMES) { 2510 2314 verbose(env, "the call stack of %d frames is too deep\n", ··· 2575 2313 state->curframe + 1 /* frameno within this callchain */, 2576 2314 subprog /* subprog number within this prog */); 2577 2315 2316 + /* Transfer references to the callee */ 2317 + err = transfer_reference_state(callee, caller); 2318 + if (err) 2319 + return err; 2320 + 2578 2321 /* copy r1 - r5 args that callee can access. 
The copy includes parent 2579 2322 * pointers, which connects us up to the liveness chain 2580 2323 */ ··· 2612 2345 struct bpf_verifier_state *state = env->cur_state; 2613 2346 struct bpf_func_state *caller, *callee; 2614 2347 struct bpf_reg_state *r0; 2348 + int err; 2615 2349 2616 2350 callee = state->frame[state->curframe]; 2617 2351 r0 = &callee->regs[BPF_REG_0]; ··· 2631 2363 caller = state->frame[state->curframe]; 2632 2364 /* return to the caller whatever r0 had in the callee */ 2633 2365 caller->regs[BPF_REG_0] = *r0; 2366 + 2367 + /* Transfer references to the caller */ 2368 + err = transfer_reference_state(caller, callee); 2369 + if (err) 2370 + return err; 2634 2371 2635 2372 *insn_idx = callee->callsite + 1; 2636 2373 if (env->log.level) { ··· 2691 2418 bpf_map_ptr_store(aux, BPF_MAP_PTR_POISON, 2692 2419 meta->map_ptr->unpriv_array); 2693 2420 return 0; 2421 + } 2422 + 2423 + static int check_reference_leak(struct bpf_verifier_env *env) 2424 + { 2425 + struct bpf_func_state *state = cur_func(env); 2426 + int i; 2427 + 2428 + for (i = 0; i < state->acquired_refs; i++) { 2429 + verbose(env, "Unreleased reference id=%d alloc_insn=%d\n", 2430 + state->refs[i].id, state->refs[i].insn_idx); 2431 + } 2432 + return state->acquired_refs ? 
-EINVAL : 0; 2694 2433 } 2695 2434 2696 2435 static int check_helper_call(struct bpf_verifier_env *env, int func_id, int insn_idx) ··· 2783 2498 return err; 2784 2499 } 2785 2500 2501 + if (func_id == BPF_FUNC_tail_call) { 2502 + err = check_reference_leak(env); 2503 + if (err) { 2504 + verbose(env, "tail_call would lead to reference leak\n"); 2505 + return err; 2506 + } 2507 + } else if (is_release_function(func_id)) { 2508 + err = release_reference(env, &meta); 2509 + if (err) 2510 + return err; 2511 + } 2512 + 2786 2513 regs = cur_regs(env); 2787 2514 2788 2515 /* check that flags argument in get_local_storage(map, flags) is 0, ··· 2837 2540 } 2838 2541 regs[BPF_REG_0].map_ptr = meta.map_ptr; 2839 2542 regs[BPF_REG_0].id = ++env->id_gen; 2543 + } else if (fn->ret_type == RET_PTR_TO_SOCKET_OR_NULL) { 2544 + int id = acquire_reference_state(env, insn_idx); 2545 + if (id < 0) 2546 + return id; 2547 + mark_reg_known_zero(env, regs, BPF_REG_0); 2548 + regs[BPF_REG_0].type = PTR_TO_SOCKET_OR_NULL; 2549 + regs[BPF_REG_0].id = id; 2840 2550 } else { 2841 2551 verbose(env, "unknown return type %d of func %s#%d\n", 2842 2552 fn->ret_type, func_id_name(func_id), func_id); ··· 2974 2670 return -EACCES; 2975 2671 } 2976 2672 2977 - if (ptr_reg->type == PTR_TO_MAP_VALUE_OR_NULL) { 2978 - verbose(env, "R%d pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL prohibited, null-check it first\n", 2979 - dst); 2673 + switch (ptr_reg->type) { 2674 + case PTR_TO_MAP_VALUE_OR_NULL: 2675 + verbose(env, "R%d pointer arithmetic on %s prohibited, null-check it first\n", 2676 + dst, reg_type_str[ptr_reg->type]); 2980 2677 return -EACCES; 2981 - } 2982 - if (ptr_reg->type == CONST_PTR_TO_MAP) { 2983 - verbose(env, "R%d pointer arithmetic on CONST_PTR_TO_MAP prohibited\n", 2984 - dst); 2678 + case CONST_PTR_TO_MAP: 2679 + case PTR_TO_PACKET_END: 2680 + case PTR_TO_SOCKET: 2681 + case PTR_TO_SOCKET_OR_NULL: 2682 + verbose(env, "R%d pointer arithmetic on %s prohibited\n", 2683 + dst, 
reg_type_str[ptr_reg->type]); 2985 2684 return -EACCES; 2986 - } 2987 - if (ptr_reg->type == PTR_TO_PACKET_END) { 2988 - verbose(env, "R%d pointer arithmetic on PTR_TO_PACKET_END prohibited\n", 2989 - dst); 2990 - return -EACCES; 2685 + default: 2686 + break; 2991 2687 } 2992 2688 2993 2689 /* In case of 'scalar += pointer', dst_reg inherits pointer type and id. ··· 3699 3395 3700 3396 for (j = 0; j <= vstate->curframe; j++) { 3701 3397 state = vstate->frame[j]; 3702 - for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { 3703 - if (state->stack[i].slot_type[0] != STACK_SPILL) 3398 + bpf_for_each_spilled_reg(i, state, reg) { 3399 + if (!reg) 3704 3400 continue; 3705 - reg = &state->stack[i].spilled_ptr; 3706 3401 if (reg->type == type && reg->id == dst_reg->id) 3707 3402 reg->range = max(reg->range, new_range); 3708 3403 } ··· 3907 3604 } 3908 3605 } 3909 3606 3910 - static void mark_map_reg(struct bpf_reg_state *regs, u32 regno, u32 id, 3911 - bool is_null) 3607 + static void mark_ptr_or_null_reg(struct bpf_func_state *state, 3608 + struct bpf_reg_state *reg, u32 id, 3609 + bool is_null) 3912 3610 { 3913 - struct bpf_reg_state *reg = &regs[regno]; 3914 - 3915 - if (reg->type == PTR_TO_MAP_VALUE_OR_NULL && reg->id == id) { 3611 + if (reg_type_may_be_null(reg->type) && reg->id == id) { 3916 3612 /* Old offset (both fixed and variable parts) should 3917 3613 * have been known-zero, because we don't allow pointer 3918 3614 * arithmetic on pointers that might be NULL. 
··· 3924 3622 } 3925 3623 if (is_null) { 3926 3624 reg->type = SCALAR_VALUE; 3927 - } else if (reg->map_ptr->inner_map_meta) { 3928 - reg->type = CONST_PTR_TO_MAP; 3929 - reg->map_ptr = reg->map_ptr->inner_map_meta; 3930 - } else { 3931 - reg->type = PTR_TO_MAP_VALUE; 3625 + } else if (reg->type == PTR_TO_MAP_VALUE_OR_NULL) { 3626 + if (reg->map_ptr->inner_map_meta) { 3627 + reg->type = CONST_PTR_TO_MAP; 3628 + reg->map_ptr = reg->map_ptr->inner_map_meta; 3629 + } else { 3630 + reg->type = PTR_TO_MAP_VALUE; 3631 + } 3632 + } else if (reg->type == PTR_TO_SOCKET_OR_NULL) { 3633 + reg->type = PTR_TO_SOCKET; 3932 3634 } 3933 - /* We don't need id from this point onwards anymore, thus we 3934 - * should better reset it, so that state pruning has chances 3935 - * to take effect. 3936 - */ 3937 - reg->id = 0; 3635 + if (is_null || !reg_is_refcounted(reg)) { 3636 + /* We don't need id from this point onwards anymore, 3637 + * thus we should better reset it, so that state 3638 + * pruning has chances to take effect. 3639 + */ 3640 + reg->id = 0; 3641 + } 3938 3642 } 3939 3643 } 3940 3644 3941 3645 /* The logic is similar to find_good_pkt_pointers(), both could eventually 3942 3646 * be folded together at some point. 
3943 3647 */ 3944 - static void mark_map_regs(struct bpf_verifier_state *vstate, u32 regno, 3945 - bool is_null) 3648 + static void mark_ptr_or_null_regs(struct bpf_verifier_state *vstate, u32 regno, 3649 + bool is_null) 3946 3650 { 3947 3651 struct bpf_func_state *state = vstate->frame[vstate->curframe]; 3948 - struct bpf_reg_state *regs = state->regs; 3652 + struct bpf_reg_state *reg, *regs = state->regs; 3949 3653 u32 id = regs[regno].id; 3950 3654 int i, j; 3951 3655 3656 + if (reg_is_refcounted_or_null(&regs[regno]) && is_null) 3657 + __release_reference_state(state, id); 3658 + 3952 3659 for (i = 0; i < MAX_BPF_REG; i++) 3953 - mark_map_reg(regs, i, id, is_null); 3660 + mark_ptr_or_null_reg(state, &regs[i], id, is_null); 3954 3661 3955 3662 for (j = 0; j <= vstate->curframe; j++) { 3956 3663 state = vstate->frame[j]; 3957 - for (i = 0; i < state->allocated_stack / BPF_REG_SIZE; i++) { 3958 - if (state->stack[i].slot_type[0] != STACK_SPILL) 3664 + bpf_for_each_spilled_reg(i, state, reg) { 3665 + if (!reg) 3959 3666 continue; 3960 - mark_map_reg(&state->stack[i].spilled_ptr, 0, id, is_null); 3667 + mark_ptr_or_null_reg(state, reg, id, is_null); 3961 3668 } 3962 3669 } 3963 3670 } ··· 4168 3857 /* detect if R == 0 where R is returned from bpf_map_lookup_elem() */ 4169 3858 if (BPF_SRC(insn->code) == BPF_K && 4170 3859 insn->imm == 0 && (opcode == BPF_JEQ || opcode == BPF_JNE) && 4171 - dst_reg->type == PTR_TO_MAP_VALUE_OR_NULL) { 4172 - /* Mark all identical map registers in each branch as either 3860 + reg_type_may_be_null(dst_reg->type)) { 3861 + /* Mark all identical registers in each branch as either 4173 3862 * safe or unknown depending R == 0 or R != 0 conditional. 
4174 3863 */ 4175 - mark_map_regs(this_branch, insn->dst_reg, opcode == BPF_JNE); 4176 - mark_map_regs(other_branch, insn->dst_reg, opcode == BPF_JEQ); 3864 + mark_ptr_or_null_regs(this_branch, insn->dst_reg, 3865 + opcode == BPF_JNE); 3866 + mark_ptr_or_null_regs(other_branch, insn->dst_reg, 3867 + opcode == BPF_JEQ); 4177 3868 } else if (!try_match_pkt_pointers(insn, dst_reg, &regs[insn->src_reg], 4178 3869 this_branch, other_branch) && 4179 3870 is_pointer_value(env, insn->dst_reg)) { ··· 4297 3984 err = check_reg_arg(env, BPF_REG_6, SRC_OP); 4298 3985 if (err) 4299 3986 return err; 3987 + 3988 + /* Disallow usage of BPF_LD_[ABS|IND] with reference tracking, as 3989 + * gen_ld_abs() may terminate the program at runtime, leading to 3990 + * reference leak. 3991 + */ 3992 + err = check_reference_leak(env); 3993 + if (err) { 3994 + verbose(env, "BPF_LD_[ABS|IND] cannot be mixed with socket references\n"); 3995 + return err; 3996 + } 4300 3997 4301 3998 if (regs[BPF_REG_6].type != PTR_TO_CTX) { 4302 3999 verbose(env, ··· 4723 4400 case CONST_PTR_TO_MAP: 4724 4401 case PTR_TO_PACKET_END: 4725 4402 case PTR_TO_FLOW_KEYS: 4403 + case PTR_TO_SOCKET: 4404 + case PTR_TO_SOCKET_OR_NULL: 4726 4405 /* Only valid matches are exact, which memcmp() above 4727 4406 * would have accepted 4728 4407 */ ··· 4800 4475 return true; 4801 4476 } 4802 4477 4478 + static bool refsafe(struct bpf_func_state *old, struct bpf_func_state *cur) 4479 + { 4480 + if (old->acquired_refs != cur->acquired_refs) 4481 + return false; 4482 + return !memcmp(old->refs, cur->refs, 4483 + sizeof(*old->refs) * old->acquired_refs); 4484 + } 4485 + 4803 4486 /* compare two verifier states 4804 4487 * 4805 4488 * all states stored in state_list are known to be valid, since ··· 4852 4519 } 4853 4520 4854 4521 if (!stacksafe(old, cur, idmap)) 4522 + goto out_free; 4523 + 4524 + if (!refsafe(old, cur)) 4855 4525 goto out_free; 4856 4526 ret = true; 4857 4527 out_free: ··· 5013 4677 return 0; 5014 4678 } 5015 4679 
4680 + /* Return true if it's OK to have the same insn return a different type. */ 4681 + static bool reg_type_mismatch_ok(enum bpf_reg_type type) 4682 + { 4683 + switch (type) { 4684 + case PTR_TO_CTX: 4685 + case PTR_TO_SOCKET: 4686 + case PTR_TO_SOCKET_OR_NULL: 4687 + return false; 4688 + default: 4689 + return true; 4690 + } 4691 + } 4692 + 4693 + /* If an instruction was previously used with particular pointer types, then we 4694 + * need to be careful to avoid cases such as the below, where it may be ok 4695 + * for one branch accessing the pointer, but not ok for the other branch: 4696 + * 4697 + * R1 = sock_ptr 4698 + * goto X; 4699 + * ... 4700 + * R1 = some_other_valid_ptr; 4701 + * goto X; 4702 + * ... 4703 + * R2 = *(u32 *)(R1 + 0); 4704 + */ 4705 + static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev) 4706 + { 4707 + return src != prev && (!reg_type_mismatch_ok(src) || 4708 + !reg_type_mismatch_ok(prev)); 4709 + } 4710 + 5016 4711 static int do_check(struct bpf_verifier_env *env) 5017 4712 { 5018 4713 struct bpf_verifier_state *state; ··· 5137 4770 5138 4771 regs = cur_regs(env); 5139 4772 env->insn_aux_data[insn_idx].seen = true; 4773 + 5140 4774 if (class == BPF_ALU || class == BPF_ALU64) { 5141 4775 err = check_alu_op(env, insn); 5142 4776 if (err) ··· 5177 4809 */ 5178 4810 *prev_src_type = src_reg_type; 5179 4811 5180 - } else if (src_reg_type != *prev_src_type && 5181 - (src_reg_type == PTR_TO_CTX || 5182 - *prev_src_type == PTR_TO_CTX)) { 4812 + } else if (reg_type_mismatch(src_reg_type, *prev_src_type)) { 5183 4813 /* ABuser program is trying to use the same insn 5184 4814 * dst_reg = *(u32*) (src_reg + off) 5185 4815 * with different pointer types: ··· 5222 4856 5223 4857 if (*prev_dst_type == NOT_INIT) { 5224 4858 *prev_dst_type = dst_reg_type; 5225 - } else if (dst_reg_type != *prev_dst_type && 5226 - (dst_reg_type == PTR_TO_CTX || 5227 - *prev_dst_type == PTR_TO_CTX)) { 4859 + } else if 
(reg_type_mismatch(dst_reg_type, *prev_dst_type)) { 5228 4860 verbose(env, "same insn cannot be used with different pointers\n"); 5229 4861 return -EINVAL; 5230 4862 } ··· 5239 4875 return err; 5240 4876 5241 4877 if (is_ctx_reg(env, insn->dst_reg)) { 5242 - verbose(env, "BPF_ST stores into R%d context is not allowed\n", 5243 - insn->dst_reg); 4878 + verbose(env, "BPF_ST stores into R%d %s is not allowed\n", 4879 + insn->dst_reg, reg_type_str[cur_regs(env)[insn->dst_reg].type]); 5244 4880 return -EACCES; 5245 4881 } 5246 4882 ··· 5301 4937 do_print_state = true; 5302 4938 continue; 5303 4939 } 4940 + 4941 + err = check_reference_leak(env); 4942 + if (err) 4943 + return err; 5304 4944 5305 4945 /* eBPF calling convention is such that R0 is used 5306 4946 * to return the value from eBPF program. ··· 5652 5284 } 5653 5285 } 5654 5286 5655 - /* convert load instructions that access fields of 'struct __sk_buff' 5656 - * into sequence of instructions that access fields of 'struct sk_buff' 5287 + /* convert load instructions that access fields of a context type into a 5288 + * sequence of instructions that access fields of the underlying structure: 5289 + * struct __sk_buff -> struct sk_buff 5290 + * struct bpf_sock_ops -> struct sock 5657 5291 */ 5658 5292 static int convert_ctx_accesses(struct bpf_verifier_env *env) 5659 5293 { ··· 5684 5314 } 5685 5315 } 5686 5316 5687 - if (!ops->convert_ctx_access || bpf_prog_is_dev_bound(env->prog->aux)) 5317 + if (bpf_prog_is_dev_bound(env->prog->aux)) 5688 5318 return 0; 5689 5319 5690 5320 insn = env->prog->insnsi + delta; 5691 5321 5692 5322 for (i = 0; i < insn_cnt; i++, insn++) { 5323 + bpf_convert_ctx_access_t convert_ctx_access; 5324 + 5693 5325 if (insn->code == (BPF_LDX | BPF_MEM | BPF_B) || 5694 5326 insn->code == (BPF_LDX | BPF_MEM | BPF_H) || 5695 5327 insn->code == (BPF_LDX | BPF_MEM | BPF_W) || ··· 5733 5361 continue; 5734 5362 } 5735 5363 5736 - if (env->insn_aux_data[i + delta].ptr_type != PTR_TO_CTX) 5364 + switch (env->insn_aux_data[i
+ delta].ptr_type) { 5365 + case PTR_TO_CTX: 5366 + if (!ops->convert_ctx_access) 5367 + continue; 5368 + convert_ctx_access = ops->convert_ctx_access; 5369 + break; 5370 + case PTR_TO_SOCKET: 5371 + convert_ctx_access = bpf_sock_convert_ctx_access; 5372 + break; 5373 + default: 5737 5374 continue; 5375 + } 5738 5376 5739 5377 ctx_field_size = env->insn_aux_data[i + delta].ctx_field_size; 5740 5378 size = BPF_LDST_BYTES(insn); ··· 5776 5394 } 5777 5395 5778 5396 target_size = 0; 5779 - cnt = ops->convert_ctx_access(type, insn, insn_buf, env->prog, 5780 - &target_size); 5397 + cnt = convert_ctx_access(type, insn, insn_buf, env->prog, 5398 + &target_size); 5781 5399 if (cnt == 0 || cnt >= ARRAY_SIZE(insn_buf) || 5782 5400 (ctx_field_size && !target_size)) { 5783 5401 verbose(env, "bpf verifier is misconfigured\n");
+169 -12
net/core/filter.c
··· 58 58 #include <net/busy_poll.h> 59 59 #include <net/tcp.h> 60 60 #include <net/xfrm.h> 61 + #include <net/udp.h> 61 62 #include <linux/bpf_trace.h> 62 63 #include <net/xdp_sock.h> 63 64 #include <linux/inetdevice.h> 65 + #include <net/inet_hashtables.h> 66 + #include <net/inet6_hashtables.h> 64 67 #include <net/ip_fib.h> 65 68 #include <net/flow.h> 66 69 #include <net/arp.h> 67 70 #include <net/ipv6.h> 71 + #include <net/net_namespace.h> 68 72 #include <linux/seg6_local.h> 69 73 #include <net/seg6.h> 70 74 #include <net/seg6_local.h> ··· 4817 4813 }; 4818 4814 #endif /* CONFIG_IPV6_SEG6_BPF */ 4819 4815 4816 + struct sock *sk_lookup(struct net *net, struct bpf_sock_tuple *tuple, 4817 + struct sk_buff *skb, u8 family, u8 proto) 4818 + { 4819 + int dif = skb->dev->ifindex; 4820 + bool refcounted = false; 4821 + struct sock *sk = NULL; 4822 + 4823 + if (family == AF_INET) { 4824 + __be32 src4 = tuple->ipv4.saddr; 4825 + __be32 dst4 = tuple->ipv4.daddr; 4826 + int sdif = inet_sdif(skb); 4827 + 4828 + if (proto == IPPROTO_TCP) 4829 + sk = __inet_lookup(net, &tcp_hashinfo, skb, 0, 4830 + src4, tuple->ipv4.sport, 4831 + dst4, tuple->ipv4.dport, 4832 + dif, sdif, &refcounted); 4833 + else 4834 + sk = __udp4_lib_lookup(net, src4, tuple->ipv4.sport, 4835 + dst4, tuple->ipv4.dport, 4836 + dif, sdif, &udp_table, skb); 4837 + #if IS_ENABLED(CONFIG_IPV6) 4838 + } else { 4839 + struct in6_addr *src6 = (struct in6_addr *)&tuple->ipv6.saddr; 4840 + struct in6_addr *dst6 = (struct in6_addr *)&tuple->ipv6.daddr; 4841 + int sdif = inet6_sdif(skb); 4842 + 4843 + if (proto == IPPROTO_TCP) 4844 + sk = __inet6_lookup(net, &tcp_hashinfo, skb, 0, 4845 + src6, tuple->ipv6.sport, 4846 + dst6, tuple->ipv6.dport, 4847 + dif, sdif, &refcounted); 4848 + else 4849 + sk = __udp6_lib_lookup(net, src6, tuple->ipv6.sport, 4850 + dst6, tuple->ipv6.dport, 4851 + dif, sdif, &udp_table, skb); 4852 + #endif 4853 + } 4854 + 4855 + if (unlikely(sk && !refcounted && !sock_flag(sk, SOCK_RCU_FREE))) { 4856 
+ WARN_ONCE(1, "Found non-RCU, unreferenced socket!"); 4857 + sk = NULL; 4858 + } 4859 + return sk; 4860 + } 4861 + 4862 + /* bpf_sk_lookup performs the core lookup for different types of sockets, 4863 + * taking a reference on the socket if it doesn't have the flag SOCK_RCU_FREE. 4864 + * Returns the socket as an 'unsigned long' to simplify the casting in the 4865 + * callers to satisfy BPF_CALL declarations. 4866 + */ 4867 + static unsigned long 4868 + bpf_sk_lookup(struct sk_buff *skb, struct bpf_sock_tuple *tuple, u32 len, 4869 + u8 proto, u64 netns_id, u64 flags) 4870 + { 4871 + struct net *caller_net; 4872 + struct sock *sk = NULL; 4873 + u8 family = AF_UNSPEC; 4874 + struct net *net; 4875 + 4876 + family = len == sizeof(tuple->ipv4) ? AF_INET : len == sizeof(tuple->ipv6) ? AF_INET6 : AF_UNSPEC; 4877 + if (unlikely(family == AF_UNSPEC || netns_id > U32_MAX || flags)) 4878 + goto out; 4879 + 4880 + if (skb->dev) 4881 + caller_net = dev_net(skb->dev); 4882 + else 4883 + caller_net = sock_net(skb->sk); 4884 + if (netns_id) { 4885 + net = get_net_ns_by_id(caller_net, netns_id); 4886 + if (unlikely(!net)) 4887 + goto out; 4888 + sk = sk_lookup(net, tuple, skb, family, proto); 4889 + put_net(net); 4890 + } else { 4891 + net = caller_net; 4892 + sk = sk_lookup(net, tuple, skb, family, proto); 4893 + } 4894 + 4895 + if (sk) 4896 + sk = sk_to_full_sk(sk); 4897 + out: 4898 + return (unsigned long) sk; 4899 + } 4900 + 4901 + BPF_CALL_5(bpf_sk_lookup_tcp, struct sk_buff *, skb, 4902 + struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags) 4903 + { 4904 + return bpf_sk_lookup(skb, tuple, len, IPPROTO_TCP, netns_id, flags); 4905 + } 4906 + 4907 + static const struct bpf_func_proto bpf_sk_lookup_tcp_proto = { 4908 + .func = bpf_sk_lookup_tcp, 4909 + .gpl_only = false, 4910 + .pkt_access = true, 4911 + .ret_type = RET_PTR_TO_SOCKET_OR_NULL, 4912 + .arg1_type = ARG_PTR_TO_CTX, 4913 + .arg2_type = ARG_PTR_TO_MEM, 4914 + .arg3_type = ARG_CONST_SIZE, 4915 + .arg4_type = ARG_ANYTHING, 4916 + .arg5_type =
ARG_ANYTHING, 4917 + }; 4918 + 4919 + BPF_CALL_5(bpf_sk_lookup_udp, struct sk_buff *, skb, 4920 + struct bpf_sock_tuple *, tuple, u32, len, u64, netns_id, u64, flags) 4921 + { 4922 + return bpf_sk_lookup(skb, tuple, len, IPPROTO_UDP, netns_id, flags); 4923 + } 4924 + 4925 + static const struct bpf_func_proto bpf_sk_lookup_udp_proto = { 4926 + .func = bpf_sk_lookup_udp, 4927 + .gpl_only = false, 4928 + .pkt_access = true, 4929 + .ret_type = RET_PTR_TO_SOCKET_OR_NULL, 4930 + .arg1_type = ARG_PTR_TO_CTX, 4931 + .arg2_type = ARG_PTR_TO_MEM, 4932 + .arg3_type = ARG_CONST_SIZE, 4933 + .arg4_type = ARG_ANYTHING, 4934 + .arg5_type = ARG_ANYTHING, 4935 + }; 4936 + 4937 + BPF_CALL_1(bpf_sk_release, struct sock *, sk) 4938 + { 4939 + if (!sock_flag(sk, SOCK_RCU_FREE)) 4940 + sock_gen_put(sk); 4941 + return 0; 4942 + } 4943 + 4944 + static const struct bpf_func_proto bpf_sk_release_proto = { 4945 + .func = bpf_sk_release, 4946 + .gpl_only = false, 4947 + .ret_type = RET_INTEGER, 4948 + .arg1_type = ARG_PTR_TO_SOCKET, 4949 + }; 4950 + 4820 4951 bool bpf_helper_changes_pkt_data(void *func) 4821 4952 { 4822 4953 if (func == bpf_skb_vlan_push || ··· 5158 5019 case BPF_FUNC_skb_ancestor_cgroup_id: 5159 5020 return &bpf_skb_ancestor_cgroup_id_proto; 5160 5021 #endif 5022 + case BPF_FUNC_sk_lookup_tcp: 5023 + return &bpf_sk_lookup_tcp_proto; 5024 + case BPF_FUNC_sk_lookup_udp: 5025 + return &bpf_sk_lookup_udp_proto; 5026 + case BPF_FUNC_sk_release: 5027 + return &bpf_sk_release_proto; 5161 5028 default: 5162 5029 return bpf_base_func_proto(func_id); 5163 5030 } ··· 5264 5119 return &bpf_sk_redirect_hash_proto; 5265 5120 case BPF_FUNC_get_local_storage: 5266 5121 return &bpf_get_local_storage_proto; 5122 + case BPF_FUNC_sk_lookup_tcp: 5123 + return &bpf_sk_lookup_tcp_proto; 5124 + case BPF_FUNC_sk_lookup_udp: 5125 + return &bpf_sk_lookup_udp_proto; 5126 + case BPF_FUNC_sk_release: 5127 + return &bpf_sk_release_proto; 5267 5128 default: 5268 5129 return bpf_base_func_proto(func_id); 
5269 5130 } ··· 5545 5394 return size == size_default; 5546 5395 } 5547 5396 5548 - static bool sock_filter_is_valid_access(int off, int size, 5549 - enum bpf_access_type type, 5550 - const struct bpf_prog *prog, 5551 - struct bpf_insn_access_aux *info) 5397 + bool bpf_sock_is_valid_access(int off, int size, enum bpf_access_type type, 5398 + struct bpf_insn_access_aux *info) 5552 5399 { 5553 5400 if (off < 0 || off >= sizeof(struct bpf_sock)) 5554 5401 return false; 5555 5402 if (off % size != 0) 5556 5403 return false; 5557 - if (!__sock_filter_check_attach_type(off, type, 5558 - prog->expected_attach_type)) 5559 - return false; 5560 5404 if (!__sock_filter_check_size(off, size, info)) 5561 5405 return false; 5562 5406 return true; 5407 + } 5408 + 5409 + static bool sock_filter_is_valid_access(int off, int size, 5410 + enum bpf_access_type type, 5411 + const struct bpf_prog *prog, 5412 + struct bpf_insn_access_aux *info) 5413 + { 5414 + if (!bpf_sock_is_valid_access(off, size, type, info)) 5415 + return false; 5416 + return __sock_filter_check_attach_type(off, type, 5417 + prog->expected_attach_type); 5563 5418 } 5564 5419 5565 5420 static int bpf_unclone_prologue(struct bpf_insn *insn_buf, bool direct_write, ··· 6279 6122 return insn - insn_buf; 6280 6123 } 6281 6124 6282 - static u32 sock_filter_convert_ctx_access(enum bpf_access_type type, 6283 - const struct bpf_insn *si, 6284 - struct bpf_insn *insn_buf, 6285 - struct bpf_prog *prog, u32 *target_size) 6125 + u32 bpf_sock_convert_ctx_access(enum bpf_access_type type, 6126 + const struct bpf_insn *si, 6127 + struct bpf_insn *insn_buf, 6128 + struct bpf_prog *prog, u32 *target_size) 6286 6129 { 6287 6130 struct bpf_insn *insn = insn_buf; 6288 6131 int off; ··· 7194 7037 const struct bpf_verifier_ops cg_sock_verifier_ops = { 7195 7038 .get_func_proto = sock_filter_func_proto, 7196 7039 .is_valid_access = sock_filter_is_valid_access, 7197 - .convert_ctx_access = sock_filter_convert_ctx_access, 7040 + 
.convert_ctx_access = bpf_sock_convert_ctx_access, 7198 7041 }; 7199 7042 7200 7043 const struct bpf_prog_ops cg_sock_prog_ops = {
+92 -1
tools/include/uapi/linux/bpf.h
··· 2144 2144 * request in the skb. 2145 2145 * Return 2146 2146 * 0 on success, or a negative error in case of failure. 2147 + * 2148 + * struct bpf_sock *bpf_sk_lookup_tcp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags) 2149 + * Description 2150 + * Look for a TCP socket matching *tuple*, optionally in a child 2151 + * network namespace *netns*. The return value must be checked, 2152 + * and if non-NULL, released via **bpf_sk_release**\ (). 2153 + * 2154 + * The *ctx* should point to the context of the program, such as 2155 + * the skb or socket (depending on the hook in use). This is used 2156 + * to determine the base network namespace for the lookup. 2157 + * 2158 + * *tuple_size* must be one of: 2159 + * 2160 + * **sizeof**\ (*tuple*\ **->ipv4**) 2161 + * Look for an IPv4 socket. 2162 + * **sizeof**\ (*tuple*\ **->ipv6**) 2163 + * Look for an IPv6 socket. 2164 + * 2165 + * If the *netns* is zero, then the socket lookup table in the 2166 + * netns associated with the *ctx* will be used. For the TC hooks, 2167 + * this is the netns of the device in the skb. For socket hooks, 2168 + * this is the netns of the socket. If *netns* is non-zero, then 2169 + * it specifies the ID of the netns relative to the netns 2170 + * associated with the *ctx*. 2171 + * 2172 + * All values for *flags* are reserved for future usage, and must 2173 + * be left at zero. 2174 + * 2175 + * This helper is available only if the kernel was compiled with 2176 + * the **CONFIG_NET** configuration option. 2177 + * Return 2178 + * Pointer to *struct bpf_sock*, or NULL in case of failure. 2179 + * 2180 + * struct bpf_sock *bpf_sk_lookup_udp(void *ctx, struct bpf_sock_tuple *tuple, u32 tuple_size, u32 netns, u64 flags) 2181 + * Description 2182 + * Look for a UDP socket matching *tuple*, optionally in a child 2183 + * network namespace *netns*. The return value must be checked, 2184 + * and if non-NULL, released via **bpf_sk_release**\ ().
2185 + * 2186 + * The *ctx* should point to the context of the program, such as 2187 + * the skb or socket (depending on the hook in use). This is used 2188 + * to determine the base network namespace for the lookup. 2189 + * 2190 + * *tuple_size* must be one of: 2191 + * 2192 + * **sizeof**\ (*tuple*\ **->ipv4**) 2193 + * Look for an IPv4 socket. 2194 + * **sizeof**\ (*tuple*\ **->ipv6**) 2195 + * Look for an IPv6 socket. 2196 + * 2197 + * If the *netns* is zero, then the socket lookup table in the 2198 + * netns associated with the *ctx* will be used. For the TC hooks, 2199 + * this is the netns of the device in the skb. For socket hooks, 2200 + * this is the netns of the socket. If *netns* is non-zero, then 2201 + * it specifies the ID of the netns relative to the netns 2202 + * associated with the *ctx*. 2203 + * 2204 + * All values for *flags* are reserved for future usage, and must 2205 + * be left at zero. 2206 + * 2207 + * This helper is available only if the kernel was compiled with 2208 + * the **CONFIG_NET** configuration option. 2209 + * Return 2210 + * Pointer to *struct bpf_sock*, or NULL in case of failure. 2211 + * 2212 + * int bpf_sk_release(struct bpf_sock *sk) 2213 + * Description 2214 + * Release the reference held by *sk*. *sk* must be a non-NULL 2215 + * pointer that was returned from bpf_sk_lookup_xxx\ (). 2216 + * Return 2217 + * 0 on success, or a negative error in case of failure. 2147 2218 */ 2148 2219 #define __BPF_FUNC_MAPPER(FN) \ 2149 2220 FN(unspec), \ ··· 2300 2229 FN(get_current_cgroup_id), \ 2301 2230 FN(get_local_storage), \ 2302 2231 FN(sk_select_reuseport), \ 2303 - FN(skb_ancestor_cgroup_id), 2232 + FN(skb_ancestor_cgroup_id), \ 2233 + FN(sk_lookup_tcp), \ 2234 + FN(sk_lookup_udp), \ 2235 + FN(sk_release), 2304 2236 2305 2237 /* integer value in 'imm' field of BPF_CALL instruction selects which helper 2306 2238 * function eBPF program intends to call ··· 2471 2397 __u32 src_port; /* Allows 4-byte read.
2472 2398 * Stored in host byte order 2473 2399 */ 2400 + }; 2401 + 2402 + struct bpf_sock_tuple { 2403 + union { 2404 + struct { 2405 + __be32 saddr; 2406 + __be32 daddr; 2407 + __be16 sport; 2408 + __be16 dport; 2409 + } ipv4; 2410 + struct { 2411 + __be32 saddr[4]; 2412 + __be32 daddr[4]; 2413 + __be16 sport; 2414 + __be16 dport; 2415 + } ipv6; 2416 + }; 2474 2417 }; 2475 2418 2476 2419 #define XDP_PACKET_HEADROOM 256
+2 -2
tools/lib/bpf/libbpf.c
··· 228 228 }; 229 229 #define obj_elf_valid(o) ((o)->efile.elf) 230 230 231 - static void bpf_program__unload(struct bpf_program *prog) 231 + void bpf_program__unload(struct bpf_program *prog) 232 232 { 233 233 int i; 234 234 ··· 1375 1375 return ret; 1376 1376 } 1377 1377 1378 - static int 1378 + int 1379 1379 bpf_program__load(struct bpf_program *prog, 1380 1380 char *license, u32 kern_version) 1381 1381 {
+3
tools/lib/bpf/libbpf.h
··· 128 128 129 129 const char *bpf_program__title(struct bpf_program *prog, bool needs_copy); 130 130 131 + int bpf_program__load(struct bpf_program *prog, char *license, 132 + u32 kern_version); 131 133 int bpf_program__fd(struct bpf_program *prog); 132 134 int bpf_program__pin_instance(struct bpf_program *prog, const char *path, 133 135 int instance); 134 136 int bpf_program__pin(struct bpf_program *prog, const char *path); 137 + void bpf_program__unload(struct bpf_program *prog); 135 138 136 139 struct bpf_insn; 137 140
+1 -1
tools/testing/selftests/bpf/Makefile
··· 36 36 test_get_stack_rawtp.o test_sockmap_kern.o test_sockhash_kern.o \ 37 37 test_lwt_seg6local.o sendmsg4_prog.o sendmsg6_prog.o test_lirc_mode2_kern.o \ 38 38 get_cgroup_id_kern.o socket_cookie_prog.o test_select_reuseport_kern.o \ 39 - test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o 39 + test_skb_cgroup_id_kern.o bpf_flow.o netcnt_prog.o test_sk_lookup_kern.o 40 40 41 41 # Order correspond to 'make run_tests' order 42 42 TEST_PROGS := test_kmod.sh \
+12
tools/testing/selftests/bpf/bpf_helpers.h
 ···
 	(void *) BPF_FUNC_skb_cgroup_id;
 static unsigned long long (*bpf_skb_ancestor_cgroup_id)(void *ctx, int level) =
 	(void *) BPF_FUNC_skb_ancestor_cgroup_id;
+static struct bpf_sock *(*bpf_sk_lookup_tcp)(void *ctx,
+					     struct bpf_sock_tuple *tuple,
+					     int size, unsigned int netns_id,
+					     unsigned long long flags) =
+	(void *) BPF_FUNC_sk_lookup_tcp;
+static struct bpf_sock *(*bpf_sk_lookup_udp)(void *ctx,
+					     struct bpf_sock_tuple *tuple,
+					     int size, unsigned int netns_id,
+					     unsigned long long flags) =
+	(void *) BPF_FUNC_sk_lookup_udp;
+static int (*bpf_sk_release)(struct bpf_sock *sk) =
+	(void *) BPF_FUNC_sk_release;

 /* llvm builtin functions that eBPF C program may use to
  * emit BPF_LD_ABS and BPF_LD_IND instructions
+38
tools/testing/selftests/bpf/test_progs.c
 ···
 		      "sys_enter_read");
 }

+static void test_reference_tracking()
+{
+	const char *file = "./test_sk_lookup_kern.o";
+	struct bpf_object *obj;
+	struct bpf_program *prog;
+	__u32 duration;
+	int err = 0;
+
+	obj = bpf_object__open(file);
+	if (IS_ERR(obj)) {
+		error_cnt++;
+		return;
+	}
+
+	bpf_object__for_each_program(prog, obj) {
+		const char *title;
+
+		/* Ignore .text sections */
+		title = bpf_program__title(prog, false);
+		if (strstr(title, ".text") != NULL)
+			continue;
+
+		bpf_program__set_type(prog, BPF_PROG_TYPE_SCHED_CLS);
+
+		/* Expect verifier failure if test name has 'fail' */
+		if (strstr(title, "fail") != NULL) {
+			libbpf_set_print(NULL, NULL, NULL);
+			err = !bpf_program__load(prog, "GPL", 0);
+			libbpf_set_print(printf, printf, NULL);
+		} else {
+			err = bpf_program__load(prog, "GPL", 0);
+		}
+		CHECK(err, title, "\n");
+	}
+	bpf_object__close(obj);
+}
+
 int main(void)
 {
 	jit_enabled = is_jit_enabled();
 ···
 	test_get_stack_raw_tp();
 	test_task_fd_query_rawtp();
 	test_task_fd_query_tp();
+	test_reference_tracking();

 	printf("Summary: %d PASSED, %d FAILED\n", pass_cnt, error_cnt);
 	return error_cnt ? EXIT_FAILURE : EXIT_SUCCESS;
+180
tools/testing/selftests/bpf/test_sk_lookup_kern.c
 ···
+/* SPDX-License-Identifier: GPL-2.0 */
+// Copyright (c) 2018 Covalent IO, Inc. http://covalent.io
+
+#include <stddef.h>
+#include <stdbool.h>
+#include <string.h>
+#include <linux/bpf.h>
+#include <linux/if_ether.h>
+#include <linux/in.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/tcp.h>
+#include <sys/socket.h>
+#include "bpf_helpers.h"
+#include "bpf_endian.h"
+
+int _version SEC("version") = 1;
+char _license[] SEC("license") = "GPL";
+
+/* Fill 'tuple' with L3 info, and attempt to find L4. On fail, return NULL. */
+static struct bpf_sock_tuple *get_tuple(void *data, __u64 nh_off,
+					void *data_end, __u16 eth_proto,
+					bool *ipv4)
+{
+	struct bpf_sock_tuple *result = NULL;
+	__u8 proto = 0;
+	__u64 ihl_len = 0;
+
+	if (eth_proto == bpf_htons(ETH_P_IP)) {
+		struct iphdr *iph = (struct iphdr *)(data + nh_off);
+
+		if (iph + 1 > data_end)
+			return NULL;
+		ihl_len = iph->ihl * 4;
+		proto = iph->protocol;
+		*ipv4 = true;
+		result = (struct bpf_sock_tuple *)&iph->saddr;
+	} else if (eth_proto == bpf_htons(ETH_P_IPV6)) {
+		struct ipv6hdr *ip6h = (struct ipv6hdr *)(data + nh_off);
+
+		if (ip6h + 1 > data_end)
+			return NULL;
+		ihl_len = sizeof(*ip6h);
+		proto = ip6h->nexthdr;
+		*ipv4 = false;
+		result = (struct bpf_sock_tuple *)&ip6h->saddr;
+	}
+
+	if (data + nh_off + ihl_len > data_end || proto != IPPROTO_TCP)
+		return NULL;
+
+	return result;
+}
+
+SEC("sk_lookup_success")
+int bpf_sk_lookup_test0(struct __sk_buff *skb)
+{
+	void *data_end = (void *)(long)skb->data_end;
+	void *data = (void *)(long)skb->data;
+	struct ethhdr *eth = (struct ethhdr *)(data);
+	struct bpf_sock_tuple *tuple;
+	struct bpf_sock *sk;
+	size_t tuple_len;
+	bool ipv4;
+
+	if (eth + 1 > data_end)
+		return TC_ACT_SHOT;
+
+	tuple = get_tuple(data, sizeof(*eth), data_end, eth->h_proto, &ipv4);
+	if (!tuple || (void *)tuple + sizeof(*tuple) > data_end)
+		return TC_ACT_SHOT;
+
+	tuple_len = ipv4 ? sizeof(tuple->ipv4) : sizeof(tuple->ipv6);
+	sk = bpf_sk_lookup_tcp(skb, tuple, tuple_len, 0, 0);
+	if (sk)
+		bpf_sk_release(sk);
+	return sk ? TC_ACT_OK : TC_ACT_UNSPEC;
+}
+
+SEC("sk_lookup_success_simple")
+int bpf_sk_lookup_test1(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	if (sk)
+		bpf_sk_release(sk);
+	return 0;
+}
+
+SEC("fail_use_after_free")
+int bpf_sk_lookup_uaf(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+	__u32 family = 0;
+
+	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	if (sk) {
+		bpf_sk_release(sk);
+		family = sk->family;
+	}
+	return family;
+}
+
+SEC("fail_modify_sk_pointer")
+int bpf_sk_lookup_modptr(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	if (sk) {
+		sk += 1;
+		bpf_sk_release(sk);
+	}
+	return 0;
+}
+
+SEC("fail_modify_sk_or_null_pointer")
+int bpf_sk_lookup_modptr_or_null(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	sk += 1;
+	if (sk)
+		bpf_sk_release(sk);
+	return 0;
+}
+
+SEC("fail_no_release")
+int bpf_sk_lookup_test2(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+
+	bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	return 0;
+}
+
+SEC("fail_release_twice")
+int bpf_sk_lookup_test3(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	bpf_sk_release(sk);
+	bpf_sk_release(sk);
+	return 0;
+}
+
+SEC("fail_release_unchecked")
+int bpf_sk_lookup_test4(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+	struct bpf_sock *sk;
+
+	sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+	bpf_sk_release(sk);
+	return 0;
+}
+
+void lookup_no_release(struct __sk_buff *skb)
+{
+	struct bpf_sock_tuple tuple = {};
+
+	bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple), 0, 0);
+}
+
+SEC("fail_no_release_subcall")
+int bpf_sk_lookup_test5(struct __sk_buff *skb)
+{
+	lookup_no_release(skb);
+	return 0;
+}
+788 -26
tools/testing/selftests/bpf/test_verifier.c
··· 3 3 * 4 4 * Copyright (c) 2014 PLUMgrid, http://plumgrid.com 5 5 * Copyright (c) 2017 Facebook 6 + * Copyright (c) 2018 Covalent IO, Inc. http://covalent.io 6 7 * 7 8 * This program is free software; you can redistribute it and/or 8 9 * modify it under the terms of version 2 of the GNU General Public ··· 178 177 res ^= (res >> 32); 179 178 self->retval = (uint32_t)res; 180 179 } 180 + 181 + /* BPF_SK_LOOKUP contains 13 instructions, if you need to fix up maps */ 182 + #define BPF_SK_LOOKUP \ 183 + /* struct bpf_sock_tuple tuple = {} */ \ 184 + BPF_MOV64_IMM(BPF_REG_2, 0), \ 185 + BPF_STX_MEM(BPF_W, BPF_REG_10, BPF_REG_2, -8), \ 186 + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -16), \ 187 + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -24), \ 188 + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -32), \ 189 + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -40), \ 190 + BPF_STX_MEM(BPF_DW, BPF_REG_10, BPF_REG_2, -48), \ 191 + /* sk = sk_lookup_tcp(ctx, &tuple, sizeof tuple, 0, 0) */ \ 192 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_10), \ 193 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_2, -48), \ 194 + BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_sock_tuple)), \ 195 + BPF_MOV64_IMM(BPF_REG_4, 0), \ 196 + BPF_MOV64_IMM(BPF_REG_5, 0), \ 197 + BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp) 181 198 182 199 static struct bpf_test tests[] = { 183 200 { ··· 2727 2708 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 2728 2709 }, 2729 2710 { 2711 + "unpriv: spill/fill of different pointers stx - ctx and sock", 2712 + .insns = { 2713 + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), 2714 + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ 2715 + BPF_SK_LOOKUP, 2716 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), 2717 + /* u64 foo; */ 2718 + /* void *target = &foo; */ 2719 + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), 2720 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), 2721 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), 2722 + /* if (skb == NULL) *target = sock; */ 2723 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), 2724 + BPF_STX_MEM(BPF_DW, BPF_REG_6, 
BPF_REG_2, 0), 2725 + /* else *target = skb; */ 2726 + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), 2727 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), 2728 + /* struct __sk_buff *skb = *target; */ 2729 + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), 2730 + /* skb->mark = 42; */ 2731 + BPF_MOV64_IMM(BPF_REG_3, 42), 2732 + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, 2733 + offsetof(struct __sk_buff, mark)), 2734 + /* if (sk) bpf_sk_release(sk) */ 2735 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), 2736 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 2737 + BPF_MOV64_IMM(BPF_REG_0, 0), 2738 + BPF_EXIT_INSN(), 2739 + }, 2740 + .result = REJECT, 2741 + .errstr = "type=ctx expected=sock", 2742 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 2743 + }, 2744 + { 2745 + "unpriv: spill/fill of different pointers stx - leak sock", 2746 + .insns = { 2747 + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), 2748 + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ 2749 + BPF_SK_LOOKUP, 2750 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), 2751 + /* u64 foo; */ 2752 + /* void *target = &foo; */ 2753 + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), 2754 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), 2755 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), 2756 + /* if (skb == NULL) *target = sock; */ 2757 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), 2758 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), 2759 + /* else *target = skb; */ 2760 + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), 2761 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), 2762 + /* struct __sk_buff *skb = *target; */ 2763 + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), 2764 + /* skb->mark = 42; */ 2765 + BPF_MOV64_IMM(BPF_REG_3, 42), 2766 + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, 2767 + offsetof(struct __sk_buff, mark)), 2768 + BPF_EXIT_INSN(), 2769 + }, 2770 + .result = REJECT, 2771 + //.errstr = "same insn cannot be used with different pointers", 2772 + .errstr = "Unreleased reference", 2773 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 2774 + }, 2775 + { 2776 + "unpriv: spill/fill of different 
pointers stx - sock and ctx (read)", 2777 + .insns = { 2778 + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), 2779 + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ 2780 + BPF_SK_LOOKUP, 2781 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), 2782 + /* u64 foo; */ 2783 + /* void *target = &foo; */ 2784 + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), 2785 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), 2786 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), 2787 + /* if (skb) *target = skb */ 2788 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), 2789 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), 2790 + /* else *target = sock */ 2791 + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), 2792 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), 2793 + /* struct bpf_sock *sk = *target; */ 2794 + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), 2795 + /* if (sk) u32 foo = sk->mark; bpf_sk_release(sk); */ 2796 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 2), 2797 + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, 2798 + offsetof(struct bpf_sock, mark)), 2799 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 2800 + BPF_MOV64_IMM(BPF_REG_0, 0), 2801 + BPF_EXIT_INSN(), 2802 + }, 2803 + .result = REJECT, 2804 + .errstr = "same insn cannot be used with different pointers", 2805 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 2806 + }, 2807 + { 2808 + "unpriv: spill/fill of different pointers stx - sock and ctx (write)", 2809 + .insns = { 2810 + BPF_MOV64_REG(BPF_REG_8, BPF_REG_1), 2811 + /* struct bpf_sock *sock = bpf_sock_lookup(...); */ 2812 + BPF_SK_LOOKUP, 2813 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_0), 2814 + /* u64 foo; */ 2815 + /* void *target = &foo; */ 2816 + BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), 2817 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_6, -8), 2818 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_8), 2819 + /* if (skb) *target = skb */ 2820 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1), 2821 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_1, 0), 2822 + /* else *target = sock */ 2823 + BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 1), 2824 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_2, 0), 2825 + /* 
struct bpf_sock *sk = *target; */ 2826 + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_6, 0), 2827 + /* if (sk) sk->mark = 42; bpf_sk_release(sk); */ 2828 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 3), 2829 + BPF_MOV64_IMM(BPF_REG_3, 42), 2830 + BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_3, 2831 + offsetof(struct bpf_sock, mark)), 2832 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 2833 + BPF_MOV64_IMM(BPF_REG_0, 0), 2834 + BPF_EXIT_INSN(), 2835 + }, 2836 + .result = REJECT, 2837 + //.errstr = "same insn cannot be used with different pointers", 2838 + .errstr = "cannot write into socket", 2839 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 2840 + }, 2841 + { 2730 2842 "unpriv: spill/fill of different pointers ldx", 2731 2843 .insns = { 2732 2844 BPF_ALU64_REG(BPF_MOV, BPF_REG_6, BPF_REG_10), ··· 3426 3276 BPF_ST_MEM(BPF_DW, BPF_REG_1, offsetof(struct __sk_buff, mark), 0), 3427 3277 BPF_EXIT_INSN(), 3428 3278 }, 3429 - .errstr = "BPF_ST stores into R1 context is not allowed", 3279 + .errstr = "BPF_ST stores into R1 inv is not allowed", 3430 3280 .result = REJECT, 3431 3281 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 3432 3282 }, ··· 3438 3288 BPF_REG_0, offsetof(struct __sk_buff, mark), 0), 3439 3289 BPF_EXIT_INSN(), 3440 3290 }, 3441 - .errstr = "BPF_XADD stores into R1 context is not allowed", 3291 + .errstr = "BPF_XADD stores into R1 inv is not allowed", 3442 3292 .result = REJECT, 3443 3293 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 3444 3294 }, ··· 3788 3638 BPF_MOV64_IMM(BPF_REG_0, 0), 3789 3639 BPF_EXIT_INSN(), 3790 3640 }, 3791 - .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", 3641 + .errstr = "R3 pointer arithmetic on pkt_end", 3792 3642 .result = REJECT, 3793 3643 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 3794 3644 }, ··· 5046 4896 BPF_EXIT_INSN(), 5047 4897 }, 5048 4898 .fixup_map1 = { 4 }, 5049 - .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", 4899 + .errstr = "R4 pointer arithmetic on map_value_or_null", 5050 4900 .result = REJECT, 5051 4901 .prog_type = BPF_PROG_TYPE_SCHED_CLS 
5052 4902 }, ··· 5067 4917 BPF_EXIT_INSN(), 5068 4918 }, 5069 4919 .fixup_map1 = { 4 }, 5070 - .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", 4920 + .errstr = "R4 pointer arithmetic on map_value_or_null", 5071 4921 .result = REJECT, 5072 4922 .prog_type = BPF_PROG_TYPE_SCHED_CLS 5073 4923 }, ··· 5088 4938 BPF_EXIT_INSN(), 5089 4939 }, 5090 4940 .fixup_map1 = { 4 }, 5091 - .errstr = "R4 pointer arithmetic on PTR_TO_MAP_VALUE_OR_NULL", 4941 + .errstr = "R4 pointer arithmetic on map_value_or_null", 5092 4942 .result = REJECT, 5093 4943 .prog_type = BPF_PROG_TYPE_SCHED_CLS 5094 4944 }, ··· 5416 5266 .errstr_unpriv = "R2 leaks addr into mem", 5417 5267 .result_unpriv = REJECT, 5418 5268 .result = REJECT, 5419 - .errstr = "BPF_XADD stores into R1 context is not allowed", 5269 + .errstr = "BPF_XADD stores into R1 inv is not allowed", 5420 5270 }, 5421 5271 { 5422 5272 "leak pointer into ctx 2", ··· 5431 5281 .errstr_unpriv = "R10 leaks addr into mem", 5432 5282 .result_unpriv = REJECT, 5433 5283 .result = REJECT, 5434 - .errstr = "BPF_XADD stores into R1 context is not allowed", 5284 + .errstr = "BPF_XADD stores into R1 inv is not allowed", 5435 5285 }, 5436 5286 { 5437 5287 "leak pointer into ctx 3", ··· 7403 7253 BPF_EXIT_INSN(), 7404 7254 }, 7405 7255 .fixup_map_in_map = { 3 }, 7406 - .errstr = "R1 pointer arithmetic on CONST_PTR_TO_MAP prohibited", 7256 + .errstr = "R1 pointer arithmetic on map_ptr prohibited", 7407 7257 .result = REJECT, 7408 7258 }, 7409 7259 { ··· 9077 8927 BPF_MOV64_IMM(BPF_REG_0, 0), 9078 8928 BPF_EXIT_INSN(), 9079 8929 }, 9080 - .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", 8930 + .errstr = "R3 pointer arithmetic on pkt_end", 9081 8931 .result = REJECT, 9082 8932 .prog_type = BPF_PROG_TYPE_XDP, 9083 8933 }, ··· 9096 8946 BPF_MOV64_IMM(BPF_REG_0, 0), 9097 8947 BPF_EXIT_INSN(), 9098 8948 }, 9099 - .errstr = "R3 pointer arithmetic on PTR_TO_PACKET_END", 8949 + .errstr = "R3 pointer arithmetic on pkt_end", 9100 8950 .result 
= REJECT, 9101 8951 .prog_type = BPF_PROG_TYPE_XDP, 9102 8952 }, ··· 12380 12230 BPF_EXIT_INSN(), 12381 12231 }, 12382 12232 .result = REJECT, 12383 - .errstr = "BPF_XADD stores into R2 packet", 12233 + .errstr = "BPF_XADD stores into R2 ctx", 12384 12234 .prog_type = BPF_PROG_TYPE_XDP, 12385 12235 }, 12386 12236 { ··· 12708 12558 .result = ACCEPT, 12709 12559 }, 12710 12560 { 12561 + "reference tracking: leak potential reference", 12562 + .insns = { 12563 + BPF_SK_LOOKUP, 12564 + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), /* leak reference */ 12565 + BPF_EXIT_INSN(), 12566 + }, 12567 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12568 + .errstr = "Unreleased reference", 12569 + .result = REJECT, 12570 + }, 12571 + { 12572 + "reference tracking: leak potential reference on stack", 12573 + .insns = { 12574 + BPF_SK_LOOKUP, 12575 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), 12576 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), 12577 + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), 12578 + BPF_MOV64_IMM(BPF_REG_0, 0), 12579 + BPF_EXIT_INSN(), 12580 + }, 12581 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12582 + .errstr = "Unreleased reference", 12583 + .result = REJECT, 12584 + }, 12585 + { 12586 + "reference tracking: leak potential reference on stack 2", 12587 + .insns = { 12588 + BPF_SK_LOOKUP, 12589 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), 12590 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), 12591 + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), 12592 + BPF_MOV64_IMM(BPF_REG_0, 0), 12593 + BPF_ST_MEM(BPF_DW, BPF_REG_4, 0, 0), 12594 + BPF_EXIT_INSN(), 12595 + }, 12596 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12597 + .errstr = "Unreleased reference", 12598 + .result = REJECT, 12599 + }, 12600 + { 12601 + "reference tracking: zero potential reference", 12602 + .insns = { 12603 + BPF_SK_LOOKUP, 12604 + BPF_MOV64_IMM(BPF_REG_0, 0), /* leak reference */ 12605 + BPF_EXIT_INSN(), 12606 + }, 12607 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12608 + .errstr = "Unreleased reference", 12609 + .result = REJECT, 12610 + 
}, 12611 + { 12612 + "reference tracking: copy and zero potential references", 12613 + .insns = { 12614 + BPF_SK_LOOKUP, 12615 + BPF_MOV64_REG(BPF_REG_7, BPF_REG_0), 12616 + BPF_MOV64_IMM(BPF_REG_0, 0), 12617 + BPF_MOV64_IMM(BPF_REG_7, 0), /* leak reference */ 12618 + BPF_EXIT_INSN(), 12619 + }, 12620 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12621 + .errstr = "Unreleased reference", 12622 + .result = REJECT, 12623 + }, 12624 + { 12625 + "reference tracking: release reference without check", 12626 + .insns = { 12627 + BPF_SK_LOOKUP, 12628 + /* reference in r0 may be NULL */ 12629 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12630 + BPF_MOV64_IMM(BPF_REG_2, 0), 12631 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12632 + BPF_EXIT_INSN(), 12633 + }, 12634 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12635 + .errstr = "type=sock_or_null expected=sock", 12636 + .result = REJECT, 12637 + }, 12638 + { 12639 + "reference tracking: release reference", 12640 + .insns = { 12641 + BPF_SK_LOOKUP, 12642 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12643 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), 12644 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12645 + BPF_EXIT_INSN(), 12646 + }, 12647 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12648 + .result = ACCEPT, 12649 + }, 12650 + { 12651 + "reference tracking: release reference 2", 12652 + .insns = { 12653 + BPF_SK_LOOKUP, 12654 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12655 + BPF_JMP_IMM(BPF_JNE, BPF_REG_0, 0, 1), 12656 + BPF_EXIT_INSN(), 12657 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12658 + BPF_EXIT_INSN(), 12659 + }, 12660 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12661 + .result = ACCEPT, 12662 + }, 12663 + { 12664 + "reference tracking: release reference twice", 12665 + .insns = { 12666 + BPF_SK_LOOKUP, 12667 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12668 + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), 12669 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), 12670 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12671 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), 12672 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12673 
+ BPF_EXIT_INSN(), 12674 + }, 12675 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12676 + .errstr = "type=inv expected=sock", 12677 + .result = REJECT, 12678 + }, 12679 + { 12680 + "reference tracking: release reference twice inside branch", 12681 + .insns = { 12682 + BPF_SK_LOOKUP, 12683 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12684 + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), 12685 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3), /* goto end */ 12686 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12687 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), 12688 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12689 + BPF_EXIT_INSN(), 12690 + }, 12691 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12692 + .errstr = "type=inv expected=sock", 12693 + .result = REJECT, 12694 + }, 12695 + { 12696 + "reference tracking: alloc, check, free in one subbranch", 12697 + .insns = { 12698 + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 12699 + offsetof(struct __sk_buff, data)), 12700 + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, 12701 + offsetof(struct __sk_buff, data_end)), 12702 + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), 12703 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 16), 12704 + /* if (offsetof(skb, mark) > data_len) exit; */ 12705 + BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1), 12706 + BPF_EXIT_INSN(), 12707 + BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2, 12708 + offsetof(struct __sk_buff, mark)), 12709 + BPF_SK_LOOKUP, 12710 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 1), /* mark == 0? */ 12711 + /* Leak reference in R0 */ 12712 + BPF_EXIT_INSN(), 12713 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? 
*/ 12714 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12715 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12716 + BPF_EXIT_INSN(), 12717 + }, 12718 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12719 + .errstr = "Unreleased reference", 12720 + .result = REJECT, 12721 + }, 12722 + { 12723 + "reference tracking: alloc, check, free in both subbranches", 12724 + .insns = { 12725 + BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 12726 + offsetof(struct __sk_buff, data)), 12727 + BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1, 12728 + offsetof(struct __sk_buff, data_end)), 12729 + BPF_MOV64_REG(BPF_REG_0, BPF_REG_2), 12730 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 16), 12731 + /* if (offsetof(skb, mark) > data_len) exit; */ 12732 + BPF_JMP_REG(BPF_JLE, BPF_REG_0, BPF_REG_3, 1), 12733 + BPF_EXIT_INSN(), 12734 + BPF_LDX_MEM(BPF_W, BPF_REG_6, BPF_REG_2, 12735 + offsetof(struct __sk_buff, mark)), 12736 + BPF_SK_LOOKUP, 12737 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_6, 0, 4), /* mark == 0? */ 12738 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? */ 12739 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12740 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12741 + BPF_EXIT_INSN(), 12742 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), /* sk NULL? 
*/ 12743 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12744 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12745 + BPF_EXIT_INSN(), 12746 + }, 12747 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12748 + .result = ACCEPT, 12749 + }, 12750 + { 12751 + "reference tracking in call: free reference in subprog", 12752 + .insns = { 12753 + BPF_SK_LOOKUP, 12754 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */ 12755 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), 12756 + BPF_MOV64_IMM(BPF_REG_0, 0), 12757 + BPF_EXIT_INSN(), 12758 + 12759 + /* subprog 1 */ 12760 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), 12761 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 1), 12762 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12763 + BPF_EXIT_INSN(), 12764 + }, 12765 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12766 + .result = ACCEPT, 12767 + }, 12768 + { 12711 12769 "pass modified ctx pointer to helper, 1", 12712 12770 .insns = { 12713 12771 BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -612), ··· 12985 12627 .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12986 12628 .result = ACCEPT, 12987 12629 }, 12630 + { 12631 + "reference tracking in call: free reference in subprog and outside", 12632 + .insns = { 12633 + BPF_SK_LOOKUP, 12634 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), /* unchecked reference */ 12635 + BPF_MOV64_REG(BPF_REG_6, BPF_REG_0), 12636 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3), 12637 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_6), 12638 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12639 + BPF_EXIT_INSN(), 12640 + 12641 + /* subprog 1 */ 12642 + BPF_MOV64_REG(BPF_REG_2, BPF_REG_1), 12643 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_2, 0, 1), 12644 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12645 + BPF_EXIT_INSN(), 12646 + }, 12647 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12648 + .errstr = "type=inv expected=sock", 12649 + .result = REJECT, 12650 + }, 12651 + { 12652 + "reference tracking in call: alloc & leak reference in subprog", 12653 + .insns = { 12654 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), 12655 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), 12656 + 
BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 3), 12657 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12658 + BPF_MOV64_IMM(BPF_REG_0, 0), 12659 + BPF_EXIT_INSN(), 12660 + 12661 + /* subprog 1 */ 12662 + BPF_MOV64_REG(BPF_REG_6, BPF_REG_4), 12663 + BPF_SK_LOOKUP, 12664 + /* spill unchecked sk_ptr into stack of caller */ 12665 + BPF_STX_MEM(BPF_DW, BPF_REG_6, BPF_REG_0, 0), 12666 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12667 + BPF_EXIT_INSN(), 12668 + }, 12669 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12670 + .errstr = "Unreleased reference", 12671 + .result = REJECT, 12672 + }, 12673 + { 12674 + "reference tracking in call: alloc in subprog, release outside", 12675 + .insns = { 12676 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), 12677 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 4), 12678 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12679 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), 12680 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12681 + BPF_EXIT_INSN(), 12682 + 12683 + /* subprog 1 */ 12684 + BPF_SK_LOOKUP, 12685 + BPF_EXIT_INSN(), /* return sk */ 12686 + }, 12687 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12688 + .retval = POINTER_VALUE, 12689 + .result = ACCEPT, 12690 + }, 12691 + { 12692 + "reference tracking in call: sk_ptr leak into caller stack", 12693 + .insns = { 12694 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), 12695 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), 12696 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), 12697 + BPF_MOV64_IMM(BPF_REG_0, 0), 12698 + BPF_EXIT_INSN(), 12699 + 12700 + /* subprog 1 */ 12701 + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), 12702 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), 12703 + BPF_STX_MEM(BPF_DW, BPF_REG_5, BPF_REG_4, 0), 12704 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 5), 12705 + /* spill unchecked sk_ptr into stack of caller */ 12706 + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), 12707 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), 12708 + BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_5, 0), 12709 + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), 12710 + BPF_EXIT_INSN(), 12711 + 12712 + /* 
subprog 2 */ 12713 + BPF_SK_LOOKUP, 12714 + BPF_EXIT_INSN(), 12715 + }, 12716 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12717 + .errstr = "Unreleased reference", 12718 + .result = REJECT, 12719 + }, 12720 + { 12721 + "reference tracking in call: sk_ptr spill into caller stack", 12722 + .insns = { 12723 + BPF_MOV64_REG(BPF_REG_4, BPF_REG_10), 12724 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_4, -8), 12725 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 2), 12726 + BPF_MOV64_IMM(BPF_REG_0, 0), 12727 + BPF_EXIT_INSN(), 12728 + 12729 + /* subprog 1 */ 12730 + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), 12731 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), 12732 + BPF_STX_MEM(BPF_DW, BPF_REG_5, BPF_REG_4, 0), 12733 + BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 1, 0, 8), 12734 + /* spill unchecked sk_ptr into stack of caller */ 12735 + BPF_MOV64_REG(BPF_REG_5, BPF_REG_10), 12736 + BPF_ALU64_IMM(BPF_ADD, BPF_REG_5, -8), 12737 + BPF_LDX_MEM(BPF_DW, BPF_REG_4, BPF_REG_5, 0), 12738 + BPF_STX_MEM(BPF_DW, BPF_REG_4, BPF_REG_0, 0), 12739 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2), 12740 + /* now the sk_ptr is verified, free the reference */ 12741 + BPF_LDX_MEM(BPF_DW, BPF_REG_1, BPF_REG_4, 0), 12742 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12743 + BPF_EXIT_INSN(), 12744 + 12745 + /* subprog 2 */ 12746 + BPF_SK_LOOKUP, 12747 + BPF_EXIT_INSN(), 12748 + }, 12749 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12750 + .result = ACCEPT, 12751 + }, 12752 + { 12753 + "reference tracking: allow LD_ABS", 12754 + .insns = { 12755 + BPF_MOV64_REG(BPF_REG_6, BPF_REG_1), 12756 + BPF_SK_LOOKUP, 12757 + BPF_MOV64_REG(BPF_REG_1, BPF_REG_0), 12758 + BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1), 12759 + BPF_EMIT_CALL(BPF_FUNC_sk_release), 12760 + BPF_LD_ABS(BPF_B, 0), 12761 + BPF_LD_ABS(BPF_H, 0), 12762 + BPF_LD_ABS(BPF_W, 0), 12763 + BPF_EXIT_INSN(), 12764 + }, 12765 + .prog_type = BPF_PROG_TYPE_SCHED_CLS, 12766 + .result = ACCEPT, 12767 + }, 12768 + { 12769 + "reference tracking: forbid LD_ABS while holding reference", 12770 + .insns = { 12771 + 
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_SK_LOOKUP,
+			BPF_LD_ABS(BPF_B, 0),
+			BPF_LD_ABS(BPF_H, 0),
+			BPF_LD_ABS(BPF_W, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "BPF_LD_[ABS|IND] cannot be mixed with socket references",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: allow LD_IND",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_MOV64_IMM(BPF_REG_7, 1),
+			BPF_LD_IND(BPF_W, BPF_REG_7, -0x200000),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_7),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+		.retval = 1,
+	},
+	{
+		"reference tracking: forbid LD_IND while holding reference",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_1),
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_4, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_7, 1),
+			BPF_LD_IND(BPF_W, BPF_REG_7, -0x200000),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_7),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_4),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "BPF_LD_[ABS|IND] cannot be mixed with socket references",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: check reference or tail call",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
+			BPF_SK_LOOKUP,
+			/* if (sk) bpf_sk_release() */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JNE, BPF_REG_1, 0, 7),
+			/* bpf_tail_call() */
+			BPF_MOV64_IMM(BPF_REG_3, 2),
+			BPF_LD_MAP_FD(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_tail_call),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_prog1 = { 17 },
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking: release reference then tail call",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
+			BPF_SK_LOOKUP,
+			/* if (sk) bpf_sk_release() */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			/* bpf_tail_call() */
+			BPF_MOV64_IMM(BPF_REG_3, 2),
+			BPF_LD_MAP_FD(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_tail_call),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_prog1 = { 18 },
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking: leak possible reference over tail call",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
+			/* Look up socket and store in REG_6 */
+			BPF_SK_LOOKUP,
+			/* bpf_tail_call() */
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_MOV64_IMM(BPF_REG_3, 2),
+			BPF_LD_MAP_FD(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_tail_call),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			/* if (sk) bpf_sk_release() */
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_1, 0, 1),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_prog1 = { 16 },
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "tail_call would lead to reference leak",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: leak checked reference over tail call",
+		.insns = {
+			BPF_MOV64_REG(BPF_REG_7, BPF_REG_1),
+			/* Look up socket and store in REG_6 */
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			/* if (!sk) goto end */
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 7),
+			/* bpf_tail_call() */
+			BPF_MOV64_IMM(BPF_REG_3, 0),
+			BPF_LD_MAP_FD(BPF_REG_2, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_7),
+			BPF_RAW_INSN(BPF_JMP | BPF_CALL, 0, 0, 0,
+				     BPF_FUNC_tail_call),
+			BPF_MOV64_IMM(BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.fixup_prog1 = { 17 },
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "tail_call would lead to reference leak",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: mangle and release sock_or_null",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 5),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 1),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "R1 pointer arithmetic on sock_or_null prohibited",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: mangle and release sock",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, 5),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "R1 pointer arithmetic on sock prohibited",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: access member",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
+	{
+		"reference tracking: write to member",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 5),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_LD_IMM64(BPF_REG_2, 42),
+			BPF_STX_MEM(BPF_W, BPF_REG_1, BPF_REG_2,
+				    offsetof(struct bpf_sock, mark)),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_LD_IMM64(BPF_REG_0, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "cannot write into socket",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: invalid 64-bit access of member",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_LDX_MEM(BPF_DW, BPF_REG_2, BPF_REG_0, 0),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "invalid bpf_sock access off=0 size=8",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: access after release",
+		.insns = {
+			BPF_SK_LOOKUP,
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 2),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1, 0),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.errstr = "!read_ok",
+		.result = REJECT,
+	},
+	{
+		"reference tracking: direct access for lookup",
+		.insns = {
+			/* Check that the packet is at least 64B long */
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_1,
+				    offsetof(struct __sk_buff, data)),
+			BPF_LDX_MEM(BPF_W, BPF_REG_3, BPF_REG_1,
+				    offsetof(struct __sk_buff, data_end)),
+			BPF_MOV64_REG(BPF_REG_0, BPF_REG_2),
+			BPF_ALU64_IMM(BPF_ADD, BPF_REG_0, 64),
+			BPF_JMP_REG(BPF_JGT, BPF_REG_0, BPF_REG_3, 9),
+			/* sk = sk_lookup_tcp(ctx, skb->data, ...) */
+			BPF_MOV64_IMM(BPF_REG_3, sizeof(struct bpf_sock_tuple)),
+			BPF_MOV64_IMM(BPF_REG_4, 0),
+			BPF_MOV64_IMM(BPF_REG_5, 0),
+			BPF_EMIT_CALL(BPF_FUNC_sk_lookup_tcp),
+			BPF_MOV64_REG(BPF_REG_6, BPF_REG_0),
+			BPF_JMP_IMM(BPF_JEQ, BPF_REG_0, 0, 3),
+			BPF_LDX_MEM(BPF_W, BPF_REG_2, BPF_REG_0, 4),
+			BPF_MOV64_REG(BPF_REG_1, BPF_REG_6),
+			BPF_EMIT_CALL(BPF_FUNC_sk_release),
+			BPF_EXIT_INSN(),
+		},
+		.prog_type = BPF_PROG_TYPE_SCHED_CLS,
+		.result = ACCEPT,
+	},
 };
 
 static int probe_filter_length(const struct bpf_insn *fp)
···
 	return fd;
 }
 
-static int create_prog_dummy1(void)
+static int create_prog_dummy1(enum bpf_map_type prog_type)
 {
 	struct bpf_insn prog[] = {
 		BPF_MOV64_IMM(BPF_REG_0, 42),
 		BPF_EXIT_INSN(),
 	};
 
-	return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog,
+	return bpf_load_program(prog_type, prog,
 				ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
 }
 
-static int create_prog_dummy2(int mfd, int idx)
+static int create_prog_dummy2(enum bpf_map_type prog_type, int mfd, int idx)
 {
 	struct bpf_insn prog[] = {
 		BPF_MOV64_IMM(BPF_REG_3, idx),
···
 		BPF_EXIT_INSN(),
 	};
 
-	return bpf_load_program(BPF_PROG_TYPE_SOCKET_FILTER, prog,
+	return bpf_load_program(prog_type, prog,
 				ARRAY_SIZE(prog), "GPL", 0, NULL, 0);
 }
 
-static int create_prog_array(uint32_t max_elem, int p1key)
+static int create_prog_array(enum bpf_map_type prog_type, uint32_t max_elem,
+			     int p1key)
 {
 	int p2key = 1;
 	int mfd, p1fd, p2fd;
···
 		return -1;
 	}
 
-	p1fd = create_prog_dummy1();
-	p2fd = create_prog_dummy2(mfd, p2key);
+	p1fd = create_prog_dummy1(prog_type);
+	p2fd = create_prog_dummy2(prog_type, mfd, p2key);
 	if (p1fd < 0 || p2fd < 0)
 		goto out;
 	if (bpf_map_update_elem(mfd, &p1key, &p1fd, BPF_ANY) < 0)
···
 
 static char bpf_vlog[UINT_MAX >> 8];
 
-static void do_test_fixup(struct bpf_test *test, struct bpf_insn *prog,
-			  int *map_fds)
+static void do_test_fixup(struct bpf_test *test, enum bpf_map_type prog_type,
+			  struct bpf_insn *prog, int *map_fds)
 {
 	int *fixup_map1 = test->fixup_map1;
 	int *fixup_map2 = test->fixup_map2;
···
 	}
 
 	if (*fixup_prog1) {
-		map_fds[4] = create_prog_array(4, 0);
+		map_fds[4] = create_prog_array(prog_type, 4, 0);
 		do {
 			prog[*fixup_prog1].imm = map_fds[4];
 			fixup_prog1++;
···
 	}
 
 	if (*fixup_prog2) {
-		map_fds[5] = create_prog_array(8, 7);
+		map_fds[5] = create_prog_array(prog_type, 8, 7);
 		do {
 			prog[*fixup_prog2].imm = map_fds[5];
 			fixup_prog2++;
···
 	for (i = 0; i < MAX_NR_MAPS; i++)
 		map_fds[i] = -1;
 
-	do_test_fixup(test, prog, map_fds);
+	if (!prog_type)
+		prog_type = BPF_PROG_TYPE_SOCKET_FILTER;
+	do_test_fixup(test, prog_type, prog, map_fds);
 	prog_len = probe_filter_length(prog);
 
-	fd_prog = bpf_verify_program(prog_type ? : BPF_PROG_TYPE_SOCKET_FILTER,
-				     prog, prog_len, test->flags & F_LOAD_WITH_STRICT_ALIGNMENT,
+	fd_prog = bpf_verify_program(prog_type, prog, prog_len,
+				     test->flags & F_LOAD_WITH_STRICT_ALIGNMENT,
 				     "GPL", 0, bpf_vlog, sizeof(bpf_vlog), 1);
 
 	expected_ret = unpriv && test->result_unpriv != UNDEF ?