Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Documentation/bpf: Document CGROUP_STORAGE map type

The mechanics and usage are not very straightforward. Given the
changes it's better to document how it works and how to use it,
rather than having to rely on the examples and implementation to
infer what is going on.

Signed-off-by: YiFei Zhu <zhuyifei@google.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/b412edfbb05cb1077c9e2a36a981a54ee23fa8b3.1595565795.git.zhuyifei@google.com

Authored by YiFei Zhu and committed by Alexei Starovoitov
4e15f460 3573f384

+178
Documentation/bpf/index.rst (+9)

    bpf_lsm
 
 
+Map types
+=========
+
+.. toctree::
+   :maxdepth: 1
+
+   map_cgroup_storage
+
+
 Testing and debugging BPF
 =========================
Documentation/bpf/map_cgroup_storage.rst (+169)

+.. SPDX-License-Identifier: GPL-2.0-only
+.. Copyright (C) 2020 Google LLC.
+
+===========================
+BPF_MAP_TYPE_CGROUP_STORAGE
+===========================
+
+The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fixed-size
+storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that
+attach to cgroups; the programs are made available by the same Kconfig. The
+storage is identified by the cgroup the program is attached to.
+
+The map provides a local storage at the cgroup that the BPF program is attached
+to. It provides faster and simpler access than the general purpose hash
+table, which performs hash table lookups and requires users to track live
+cgroups on their own.
+
+This document describes the usage and semantics of the
+``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors were changed in
+Linux 5.9 and this document will describe the differences.
+
+Usage
+=====
+
+The map uses a key of type either ``__u64 cgroup_inode_id`` or
+``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``::
+
+    struct bpf_cgroup_storage_key {
+            __u64 cgroup_inode_id;
+            __u32 attach_type;
+    };
+
+``cgroup_inode_id`` is the inode id of the cgroup directory.
+``attach_type`` is the program's attach type.
+
+Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type.
+When this key type is used, all attach types of the particular cgroup and
+map will share the same storage. Otherwise, if the type is
+``struct bpf_cgroup_storage_key``, then programs of different attach types
+are isolated and see different storages.
+
+To access the storage in a program, use ``bpf_get_local_storage``::
+
+    void *bpf_get_local_storage(void *map, u64 flags)
+
+``flags`` is reserved for future use and must be 0.
+
+There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE``
+can be accessed by multiple programs across different CPUs, and users should
+take care of synchronization by themselves. The bpf infrastructure provides
+``struct bpf_spin_lock`` to synchronize the storage. See
+``tools/testing/selftests/bpf/progs/test_spin_lock.c``.
+
+Examples
+========
+
+Usage with key type as ``struct bpf_cgroup_storage_key``::
+
+    #include <linux/bpf.h>
+    #include <bpf/bpf_helpers.h>
+
+    struct {
+            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+            __type(key, struct bpf_cgroup_storage_key);
+            __type(value, __u32);
+    } cgroup_storage SEC(".maps");
+
+    int program(struct __sk_buff *skb)
+    {
+            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
+            __sync_fetch_and_add(ptr, 1);
+
+            return 0;
+    }
+
+Userspace accessing map declared above::
+
+    #include <bpf/bpf.h>
+    #include <bpf/libbpf.h>
+
+    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
+    {
+            struct bpf_cgroup_storage_key key = {
+                    .cgroup_inode_id = cgrp,
+                    .attach_type = type,
+            };
+            __u32 value;
+            bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
+            // error checking omitted
+            return value;
+    }
+
+Alternatively, using just ``__u64 cgroup_inode_id`` as key type::
+
+    #include <linux/bpf.h>
+    #include <bpf/bpf_helpers.h>
+
+    struct {
+            __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
+            __type(key, __u64);
+            __type(value, __u32);
+    } cgroup_storage SEC(".maps");
+
+    int program(struct __sk_buff *skb)
+    {
+            __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);
+            __sync_fetch_and_add(ptr, 1);
+
+            return 0;
+    }
+
+And userspace::
+
+    #include <bpf/bpf.h>
+    #include <bpf/libbpf.h>
+
+    __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
+    {
+            __u32 value;
+            bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
+            // error checking omitted
+            return value;
+    }
+
+Semantics
+=========
+
+``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This
+per-CPU variant will have different memory regions for each CPU for each
+storage. The non-per-CPU variant will have the same memory region for each
+storage.
+
+Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and
+for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded
+that uses the map. A program may be attached to multiple cgroups or have
+multiple attach types, and each attach creates a fresh zeroed storage. The
+storage is freed upon detach.
+
+There is a one-to-one association between the map of each type (per-CPU and
+non-per-CPU) and the BPF program during load verification time. As a result,
+each map can only be used by one BPF program and each BPF program can only use
+one storage map of each type. Because a map can only be used by one BPF
+program, sharing of this cgroup's storage with other BPF programs was
+impossible.
+
+Since Linux 5.9, storage can be shared by multiple programs. When a program is
+attached to a cgroup, the kernel creates a new storage only if the map
+does not already contain an entry for the cgroup and attach type pair, or else
+the old storage is reused for the new attachment. If the map is attach type
+shared, then attach type is simply ignored during comparison. Storage is freed
+only when either the map or the cgroup attached to is being freed. Detaching
+will not directly free the storage, but it may cause the reference to the map
+to reach zero and indirectly free all storage in the map.
+
+The map is not associated with any BPF program, thus making sharing possible.
+However, the BPF program can still only associate with one map of each type
+(per-CPU and non-per-CPU). A BPF program cannot use more than one
+``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one
+``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``.
+
+In all versions, userspace may use the attach parameters of cgroup and
+attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map
+APIs to read or update the storage for a given attachment. For Linux 5.9
+attach type shared storages, only the first value in the struct, cgroup inode
+id, is used during comparison, so userspace may just specify a ``__u64``
+directly.
+
+The storage is bound at attach time. Even if the program is attached to parent
+and triggers in child, the storage still belongs to the parent.
+
+Userspace cannot create a new entry in the map or delete an existing entry.
+Program test runs always use a temporary storage.