Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

bpf: Add comments for map BTF matching requirement for bpf_list_head

The old behavior of bpf_map_meta_equal was that it compared timer_off
to be equal (but not spin_lock_off, because that was not allowed), and
did memcmp of kptr_off_tab.

Now, we memcmp the btf_record of two bpf_map structs, which has all
fields.

We preserve backwards compat as we kzalloc the array, so if only spin
lock and timer exist in map, we only compare offset while the rest of
unused members in the btf_field struct are zeroed out.

In case of kptr, btf and everything else is of vmlinux or module, so as
long type is same it will match, since kernel btf, module, dtor pointer
will be same across maps.

Now with list_head in the mix, things are a bit complicated. We
implicitly add a requirement that both BTFs are same, because struct
btf_field_list_head has btf and value_rec members.

We obviously shouldn't force BTFs to be equal by default, as that breaks
backwards compatibility.

Currently it is only implicitly required due to list_head matching
struct btf and value_rec member. value_rec points back into a btf_record
stashed in the map BTF (btf member of btf_field_list_head). So that
pointer and btf member has to match exactly.

Document all these subtle details so that things don't break in the
future when touching this code.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20221118015614.2013203-19-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

authored by

Kumar Kartikeya Dwivedi and committed by
Alexei Starovoitov
c22dfdd2 534e86bc

+22
+3
kernel/bpf/btf.c
··· 3648 3648 return NULL; 3649 3649 3650 3650 cnt = ret; 3651 + /* This needs to be kzalloc to zero out padding and unused fields, see 3652 + * comment in btf_record_equal. 3653 + */ 3651 3654 rec = kzalloc(offsetof(struct btf_record, fields[cnt]), GFP_KERNEL | __GFP_NOWARN); 3652 3655 if (!rec) 3653 3656 return ERR_PTR(-ENOMEM);
+5
kernel/bpf/map_in_map.c
··· 68 68 } 69 69 inner_map_meta->field_offs = field_offs; 70 70 } 71 + /* Note: We must use the same BTF, as we also used btf_record_dup above 72 + * which relies on BTF being same for both maps, as some members like 73 + * record->fields.list_head have pointers like value_rec pointing into 74 + * inner_map->btf. 75 + */ 71 76 if (inner_map->btf) { 72 77 btf_get(inner_map->btf); 73 78 inner_map_meta->btf = inner_map->btf;
+14
kernel/bpf/syscall.c
··· 611 611 if (rec_a->cnt != rec_b->cnt) 612 612 return false; 613 613 size = offsetof(struct btf_record, fields[rec_a->cnt]); 614 + /* btf_parse_fields uses kzalloc to allocate a btf_record, so unused 615 + * members are zeroed out. So memcmp is safe to do without worrying 616 + * about padding/unused fields. 617 + * 618 + * While spin_lock, timer, and kptr have no relation to map BTF, 619 + * list_head metadata is specific to map BTF, the btf and value_rec 620 + * members in particular. btf is the map BTF, while value_rec points to 621 + * btf_record in that map BTF. 622 + * 623 + * So while by default, we don't rely on the map BTF (which the records 624 + * were parsed from) matching for both records, which is not backwards 625 + * compatible, in case list_head is part of it, we implicitly rely on 626 + * that by way of depending on memcmp succeeding for it. 627 + */ 614 628 return !memcmp(rec_a, rec_b, size); 615 629 } 616 630