Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

bpf: mark vma->{vm_mm,vm_file} as __safe_trusted_or_null

vma->vm_mm might be NULL, and it can be accessed outside of an RCU
read-side critical section. Thus, we can mark it as trusted_or_null. With
this change, BPF programs can safely access vma->vm_mm to retrieve the
associated mm_struct from the VMA and then make policy decisions based on
the VMA.

The "trusted" part lets pointers derived from vma->vm_mm, such as
mm->owner, be passed to kfuncs marked with KF_TRUSTED_ARGS or KF_RCU, e.g.
bpf_task_get_cgroup1() and bpf_task_under_cgroup(). The "null" part, in
turn, requires every callsite that uses vma->vm_mm to perform a NULL check
first.
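To make the two halves of the annotation concrete, here is a deliberately
simplified user-space model of how a value loaded from vma->vm_mm might be
tracked. It is a sketch under stated assumptions, not verifier code: the
kernel expresses these states with the PTR_TRUSTED and PTR_MAYBE_NULL type
flags, while every enum, struct, and function name below is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of verifier pointer state.
 * The names echo the kernel's PTR_TRUSTED / PTR_MAYBE_NULL flags, but
 * this is not how the verifier is implemented. */
enum ptr_flags {
	F_TRUSTED    = 1u << 0, /* usable with trusted/RCU kfunc arguments */
	F_MAYBE_NULL = 1u << 1, /* must be NULL-checked before any use */
};

struct reg_model {
	unsigned int flags;
};

/* Loading a __safe_trusted_or_null field such as vma->vm_mm. */
static struct reg_model load_trusted_or_null_field(void)
{
	struct reg_model r = { .flags = F_TRUSTED | F_MAYBE_NULL };
	return r;
}

/* On the non-NULL branch of "if (!mm) return;" the maybe-NULL bit
 * is dropped while the trusted bit is kept. */
static void mark_checked_non_null(struct reg_model *r)
{
	r->flags &= ~(unsigned int)F_MAYBE_NULL;
}

/* A dereference such as mm->start_stack needs a non-NULL pointer. */
static bool may_dereference(const struct reg_model *r)
{
	return !(r->flags & F_MAYBE_NULL);
}

/* Passing to a trusted kfunc needs both: trusted and non-NULL. */
static bool may_pass_to_trusted_kfunc(const struct reg_model *r)
{
	return (r->flags & F_TRUSTED) && !(r->flags & F_MAYBE_NULL);
}
```

In this model the NULL check is the only step that moves a freshly loaded
vm_mm from "rejected everywhere" to "usable"; the trusted bit was there
from the load.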

The lsm selftest must be modified because it directly dereferences
vma->vm_mm without a NULL pointer check; otherwise the verifier would
reject it after this change.

For the VMA-based THP policy, the use case is as follows:

    @mm = @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
    if (!@mm)
        return;
    bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
    @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
    if (!@owner)
        goto out;
    @cgroup1 = bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);
    if (!@cgroup1) // the acquired reference may be NULL
        goto out;

    /* make the decision based on the @cgroup1 attribute */

    bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
    bpf_rcu_read_unlock();
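The use case above can be exercised outside the kernel with a small
user-space model whose only purpose is to check that every exit path stays
balanced: the RCU section is always closed and the cgroup reference is
always dropped. All types and mock_* functions here are hypothetical
stand-ins for the BPF kfuncs, not their real definitions; in particular,
the single cgrp1 pointer per task is a simplification. The sketch also
NULL-checks the result of the cgroup lookup, since an acquired reference
may be absent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; not the kernel's definitions. */
struct cgroup { int refcnt; int hierarchy_id; };
struct task_struct { struct cgroup *cgrp1; };
struct mm_struct { struct task_struct *owner; };
struct vm_area_struct { struct mm_struct *vm_mm; };

static int rcu_depth; /* models bpf_rcu_read_lock() nesting */

static void mock_rcu_read_lock(void)   { rcu_depth++; }
static void mock_rcu_read_unlock(void) { rcu_depth--; }

/* Models bpf_task_get_cgroup1(): an acquired reference, or NULL. */
static struct cgroup *mock_task_get_cgroup1(struct task_struct *t,
					    int hierarchy_id)
{
	struct cgroup *cg = t->cgrp1;

	if (!cg || cg->hierarchy_id != hierarchy_id)
		return NULL;
	cg->refcnt++;
	return cg;
}

/* Models bpf_cgroup_release(): drops the reference taken above. */
static void mock_cgroup_release(struct cgroup *cg) { cg->refcnt--; }

/* Returns true if a decision was made from the owner's cgroup.
 * Every path balances the RCU section and the cgroup reference. */
static bool decide_for_vma(struct vm_area_struct *vma, int hierarchy_id)
{
	struct mm_struct *mm = vma->vm_mm; /* trusted or NULL */
	struct task_struct *owner;
	struct cgroup *cgrp1;
	bool decided = false;

	if (!mm)
		return false;

	mock_rcu_read_lock();
	owner = mm->owner;                 /* RCU-protected, may be NULL */
	if (!owner)
		goto out;

	cgrp1 = mock_task_get_cgroup1(owner, hierarchy_id);
	if (!cgrp1)
		goto out;

	/* ... the policy decision based on cgrp1 would go here ... */
	decided = true;
	mock_cgroup_release(cgrp1);
out:
	mock_rcu_read_unlock();
	return decided;
}
```

The single "out" label mirrors the pseudocode: once the RCU lock is taken,
all early exits funnel through the unlock.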

PSI memory information can be obtained from the associated cgroup to inform
policy decisions. Since upstream PSI support is currently limited to cgroup
v2, the following example demonstrates a cgroup v2 implementation:

    @owner = @mm->owner; // under bpf_rcu_read_lock(), as above
    if (@owner) {
        // @ancestor_cgid is user-configured
        @ancestor = bpf_cgroup_from_id(@ancestor_cgid);
        if (@ancestor) {
            if (bpf_task_under_cgroup(@owner, @ancestor)) {
                @psi_group = @ancestor->psi;

                /* Extract PSI metrics from @psi_group and
                 * implement policy logic based on the values
                 */

            }
            bpf_cgroup_release(@ancestor); // drop the bpf_cgroup_from_id() reference
        }
    }
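The cgroup v2 example hinges on one question: is the owner's cgroup the
configured ancestor or one of its descendants? The user-space sketch below
answers it by walking hypothetical parent pointers and pairs it with a
made-up PSI threshold test. The struct layout, psi_some_avg10 field, and
threshold policy are all assumptions for illustration; the kernel's own
descendant check is implemented differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model of a cgroup v2 tree; not kernel code. */
struct cgroup {
	struct cgroup *parent;
	unsigned long psi_some_avg10; /* stand-in for a PSI memory metric */
};

/* Models the question bpf_task_under_cgroup() answers: is @cgrp the
 * same cgroup as @ancestor, or a descendant of it? */
static bool cgroup_is_under(const struct cgroup *cgrp,
			    const struct cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

/* Hypothetical policy: act only if the owner's cgroup sits under the
 * ancestor and the ancestor's pressure exceeds @threshold. */
static bool should_act(const struct cgroup *owner_cgrp,
		       const struct cgroup *ancestor,
		       unsigned long threshold)
{
	if (!owner_cgrp || !ancestor)
		return false; /* mirrors the NULL checks in the example */
	return cgroup_is_under(owner_cgrp, ancestor) &&
	       ancestor->psi_some_avg10 > threshold;
}
```

Reading PSI from the ancestor rather than the leaf cgroup matches the
example, where @psi_group comes from @ancestor->psi.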

The vma::vm_file can also be marked with __safe_trusted_or_null.

No additional selftests are required since vma->vm_file and vma->vm_mm are
already validated in the existing selftest suite.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20251016063929.13830-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

authored by Yafang Shao and committed by Alexei Starovoitov
7484e7cd ec8e3e27

+11 -3

kernel/bpf/verifier.c | +6

@@ -7096,6 +7096,11 @@
 	struct sock *sk;
 };
 
+BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct) {
+	struct mm_struct *vm_mm;
+	struct file *vm_file;
+};
+
 static bool type_is_rcu(struct bpf_verifier_env *env,
 			struct bpf_reg_state *reg,
 			const char *field_name, u32 btf_id)
@@ -7142,6 +7137,7 @@
 {
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct socket));
 	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct dentry));
+	BTF_TYPE_EMIT(BTF_TYPE_SAFE_TRUSTED_OR_NULL(struct vm_area_struct));
 
 	return btf_nested_type_is_trusted(&env->log, reg, field_name, btf_id,
 					  "__safe_trusted_or_null");
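The verifier hunk works by declaring an allowlist struct whose name is the
original type name plus the "__safe_trusted_or_null" suffix and whose
members name the allowed fields. As we understand btf_nested_type_is_trusted(),
it builds that suffixed name, looks it up in BTF, and accepts a field only
if both its name and its pointee type match a member. The table-driven
sketch below imitates that lookup in user space; the safe_types table and
field_is_trusted_or_null() are hypothetical and do not touch real BTF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the BTF of the allowlist struct that the
 * hunk adds: struct vm_area_struct__safe_trusted_or_null. */
struct safe_member { const char *type; const char *name; };
struct safe_type {
	const char *name;
	struct safe_member members[4]; /* terminated by { NULL, NULL } */
};

static const struct safe_type safe_types[] = {
	{ "vm_area_struct__safe_trusted_or_null",
	  { { "mm_struct", "vm_mm" }, { "file", "vm_file" }, { NULL, NULL } } },
};

/* Is @field (pointing to @field_type) of struct @tname allowlisted? */
static bool field_is_trusted_or_null(const char *tname, const char *field,
				     const char *field_type)
{
	char safe_tname[64];
	size_t i, j;
	int n = snprintf(safe_tname, sizeof(safe_tname), "%s%s", tname,
			 "__safe_trusted_or_null");

	if (n < 0 || (size_t)n >= sizeof(safe_tname))
		return false; /* name too long: treat as not allowlisted */

	for (i = 0; i < sizeof(safe_types) / sizeof(safe_types[0]); i++) {
		if (strcmp(safe_types[i].name, safe_tname) != 0)
			continue;
		for (j = 0; safe_types[i].members[j].name; j++) {
			const struct safe_member *m = &safe_types[i].members[j];

			/* Both the field name and its type must match. */
			if (!strcmp(m->name, field) &&
			    !strcmp(m->type, field_type))
				return true;
		}
	}
	return false;
}
```

Because the allowlist is keyed by field name and type, fields such as
vm_start stay outside it, and adding vm_file costs only one more member.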
tools/testing/selftests/bpf/progs/lsm.c | +5 -3

@@ -89,14 +89,16 @@
 int BPF_PROG(test_int_hook, struct vm_area_struct *vma,
 	     unsigned long reqprot, unsigned long prot, int ret)
 {
-	if (ret != 0)
+	struct mm_struct *mm = vma->vm_mm;
+
+	if (ret != 0 || !mm)
 		return ret;
 
 	__s32 pid = bpf_get_current_pid_tgid() >> 32;
 	int is_stack = 0;
 
-	is_stack = (vma->vm_start <= vma->vm_mm->start_stack &&
-		    vma->vm_end >= vma->vm_mm->start_stack);
+	is_stack = (vma->vm_start <= mm->start_stack &&
+		    vma->vm_end >= mm->start_stack);
 
 	if (is_stack && monitored_pid == pid) {
 		mprotect_count++;
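The updated test_int_hook loads vma->vm_mm once into a local, bails out on
NULL, and then uses only the checked local. The predicate itself can be
tried in user space with the hypothetical mirror structs below, which keep
only the fields the test reads. Caching the pointer matters in the BPF
version because, to our understanding, each fresh load of vma->vm_mm yields
a new maybe-NULL value that would need its own check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space mirrors of the fields the test reads. */
struct mm_struct { unsigned long start_stack; };
struct vm_area_struct {
	unsigned long vm_start, vm_end;
	struct mm_struct *vm_mm; /* may be NULL, as in the kernel */
};

/* Same predicate as the updated test_int_hook: load vm_mm once, bail
 * out on NULL, then compare against the cached pointer only. */
static bool vma_is_stack(const struct vm_area_struct *vma)
{
	const struct mm_struct *mm = vma->vm_mm;

	if (!mm)
		return false;
	return vma->vm_start <= mm->start_stack &&
	       vma->vm_end >= mm->start_stack;
}
```

For a VMA whose vm_mm is NULL the function reports "not a stack" instead of
dereferencing a NULL pointer, which is the case the verifier now forces the
test to handle.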