Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: Document BPF_PROG_TYPE_CGROUP_SYSCTL

Add documentation for BPF_PROG_TYPE_CGROUP_SYSCTL, including general
info, attach type, context, return code, helpers, example and usage
considerations.

A separate file prog_cgroup_sysctl.rst is added to Documentation/bpf/.

In the future more program types can be documented in their own
prog_<name>.rst files.

Another way to place program type specific documentation would be to
group program types somehow (e.g. cgroup.rst for all cgroup-bpf
programs), but it may not scale well since some program types may belong
to different groups, e.g. BPF_PROG_TYPE_CGROUP_SKB can be documented
together with either cgroup-bpf programs or programs that access skb.

The new file is added to the index, built with `make htmldocs` and
sanity-checked with lynx.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Authored by Andrey Ignatov and committed by Alexei Starovoitov
da703149 ba02de1a

+134
+9
Documentation/bpf/index.rst
···
    bpf_devel_QA
 
 
+Program types
+=============
+
+.. toctree::
+   :maxdepth: 1
+
+   prog_cgroup_sysctl
+
+
 .. Links:
 .. _Documentation/networking/filter.txt: ../networking/filter.txt
 .. _man-pages: https://www.kernel.org/doc/man-pages/
+125
Documentation/bpf/prog_cgroup_sysctl.rst
···
+.. SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
+
+===========================
+BPF_PROG_TYPE_CGROUP_SYSCTL
+===========================
+
+This document describes ``BPF_PROG_TYPE_CGROUP_SYSCTL`` program type that
+provides cgroup-bpf hook for sysctl.
+
+The hook has to be attached to a cgroup and will be called every time a
+process inside that cgroup tries to read from or write to sysctl knob in proc.
+
+1. Attach type
+**************
+
+``BPF_CGROUP_SYSCTL`` attach type has to be used to attach
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` program to a cgroup.
+
+2. Context
+**********
+
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` provides access to the following context from
+BPF program::
+
+    struct bpf_sysctl {
+        __u32 write;
+        __u32 file_pos;
+    };
+
+* ``write`` indicates whether sysctl value is being read (``0``) or written
+  (``1``). This field is read-only.
+
+* ``file_pos`` indicates file position sysctl is being accessed at, read
+  or written. This field is read-write. Writing to the field sets the starting
+  position in sysctl proc file ``read(2)`` will be reading from or ``write(2)``
+  will be writing to. Writing zero to the field can be used e.g. to override
+  whole sysctl value by ``bpf_sysctl_set_new_value()`` on ``write(2)`` even
+  when it's called by user space on ``file_pos > 0``. Writing non-zero
+  value to the field can be used to access part of sysctl value starting from
+  specified ``file_pos``. Not all sysctl support access with ``file_pos !=
+  0``, e.g. writes to numeric sysctl entries must always be at file position
+  ``0``. See also ``kernel.sysctl_writes_strict`` sysctl.
+
+See `linux/bpf.h`_ for more details on how context field can be accessed.
+
+3. Return code
+**************
+
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` program must return one of the following
+return codes:
+
+* ``0`` means "reject access to sysctl";
+* ``1`` means "proceed with access".
+
+If program returns ``0`` user space will get ``-1`` from ``read(2)`` or
+``write(2)`` and ``errno`` will be set to ``EPERM``.
+
+4. Helpers
+**********
+
+Since sysctl knob is represented by a name and a value, sysctl specific BPF
+helpers focus on providing access to these properties:
+
+* ``bpf_sysctl_get_name()`` to get sysctl name as it is visible in
+  ``/proc/sys`` into provided by BPF program buffer;
+
+* ``bpf_sysctl_get_current_value()`` to get string value currently held by
+  sysctl into provided by BPF program buffer. This helper is available on both
+  ``read(2)`` from and ``write(2)`` to sysctl;
+
+* ``bpf_sysctl_get_new_value()`` to get new string value currently being
+  written to sysctl before actual write happens. This helper can be used only
+  on ``ctx->write == 1``;
+
+* ``bpf_sysctl_set_new_value()`` to override new string value currently being
+  written to sysctl before actual write happens. Sysctl value will be
+  overridden starting from the current ``ctx->file_pos``. If the whole value
+  has to be overridden BPF program can set ``file_pos`` to zero before calling
+  to the helper. This helper can be used only on ``ctx->write == 1``. New
+  string value set by the helper is treated and verified by kernel same way as
+  an equivalent string passed by user space.
+
+BPF program sees sysctl value same way as user space does in proc filesystem,
+i.e. as a string. Since many sysctl values represent an integer or a vector
+of integers, the following helpers can be used to get numeric value from the
+string:
+
+* ``bpf_strtol()`` to convert initial part of the string to long integer
+  similar to user space `strtol(3)`_;
+* ``bpf_strtoul()`` to convert initial part of the string to unsigned long
+  integer similar to user space `strtoul(3)`_;
+
+See `linux/bpf.h`_ for more details on helpers described here.
+
+5. Examples
+***********
+
+See `test_sysctl_prog.c`_ for an example of BPF program in C that access
+sysctl name and value, parses string value to get vector of integers and uses
+the result to make decision whether to allow or deny access to sysctl.
+
+6. Notes
+********
+
+``BPF_PROG_TYPE_CGROUP_SYSCTL`` is intended to be used in **trusted** root
+environment, for example to monitor sysctl usage or catch unreasonable values
+an application, running as root in a separate cgroup, is trying to set.
+
+Since `task_dfl_cgroup(current)` is called at `sys_read` / `sys_write` time it
+may return results different from that at `sys_open` time, i.e. process that
+opened sysctl file in proc filesystem may differ from process that is trying
+to read from / write to it and two such processes may run in different
+cgroups, what means ``BPF_PROG_TYPE_CGROUP_SYSCTL`` should not be used as a
+security mechanism to limit sysctl usage.
+
+As with any cgroup-bpf program additional care should be taken if an
+application running as root in a cgroup should not be allowed to
+detach/replace BPF program attached by administrator.
+
+.. Links
+.. _linux/bpf.h: ../../include/uapi/linux/bpf.h
+.. _strtol(3): http://man7.org/linux/man-pages/man3/strtol.3p.html
+.. _strtoul(3): http://man7.org/linux/man-pages/man3/strtoul.3p.html
+.. _test_sysctl_prog.c:
+   ../../tools/testing/selftests/bpf/progs/test_sysctl_prog.c
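
The Helpers section of the new document says that a BPF program sees a sysctl
value as a string and can convert it to numbers with bpf_strtol(). As a rough
user-space analogue only, the sketch below uses strtol(3) to parse a vector of
integers and then returns 1 ("proceed with access") or 0 ("reject access") in
the style of the hook's return code. The function names parse_sysctl_vec and
sysctl_write_verdict, the three-value layout and the bounds are hypothetical,
chosen for illustration. A real BPF program would instead call the BPF helpers
on a buffer filled by bpf_sysctl_get_new_value().

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse up to `max` whitespace-separated decimal
 * integers from a sysctl-style string such as "4096 87380 6291456\n".
 * Returns the number of values stored in `out`, or -1 if the string
 * contains anything other than integers and whitespace, overflows a
 * long, or holds more than `max` values.
 */
int parse_sysctl_vec(const char *value, long *out, size_t max)
{
    const char *p = value;
    size_t n = 0;

    while (n < max) {
        char *end;
        long v;

        errno = 0;
        v = strtol(p, &end, 10);
        if (end == p)           /* nothing converted: end of input or garbage */
            break;
        if (errno == ERANGE)    /* value does not fit in a long */
            return -1;
        out[n++] = v;
        p = end;
    }

    /* Only trailing whitespace may remain. */
    while (*p == ' ' || *p == '\t' || *p == '\n')
        p++;
    if (*p != '\0')
        return -1;

    return (int)n;
}

/*
 * Hypothetical policy in the style of the hook's return code:
 * 1 means "proceed with access", 0 means "reject access".
 * Accepts the new value only if it is exactly three integers,
 * each within [lo, hi].
 */
int sysctl_write_verdict(const char *new_value, long lo, long hi)
{
    long vec[3];
    int n = parse_sysctl_vec(new_value, vec, 3);

    if (n != 3)
        return 0;
    for (int i = 0; i < n; i++) {
        if (vec[i] < lo || vec[i] > hi)
            return 0;
    }
    return 1;
}
```

Under these assumptions, sysctl_write_verdict("4096 87380 6291456", 1024,
8388608) yields 1, while a value that contains non-numeric text or falls
outside the bounds yields 0.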