Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Revert "fs/exec: allow to unshare a time namespace on vfork+exec"

This reverts commit 133e2d3e81de5d9706cab2dd1d52d231c27382e5.

Alexey pointed out a few undesirable side effects of the reverted change.
First, it doesn't take into account that CLONE_VFORK can be used with
CLONE_THREAD. Second, a child process doesn't enter the target time namespace
if its parent dies before the child calls exec. This happens because the parent
clears vfork_done.

Eric W. Biederman suggests installing a time namespace as a task gets a new mm.
It includes all new processes cloned without CLONE_VM and all tasks that call
exec(). This is a user API change, but we think there aren't users that depend
on the old behavior.

It is too late to make such changes in this release, so let's roll back
this patch and introduce the right one in the next release.

Cc: Alexey Izbyshev <izbyshev@ispras.ru>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrei Vagin <avagin@gmail.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220913102551.1121611-3-avagin@google.com

Authored by Andrei Vagin and committed by Kees Cook
33a2d6bc 2b1e8921

+2 -13
-7
fs/exec.c
···
 #include <linux/io_uring.h>
 #include <linux/syscall_user_dispatch.h>
 #include <linux/coredump.h>
-#include <linux/time_namespace.h>
 
 #include <linux/uaccess.h>
 #include <asm/mmu_context.h>
···
 {
 	struct task_struct *tsk;
 	struct mm_struct *old_mm, *active_mm;
-	bool vfork;
 	int ret;
 
 	/* Notify parent that we're no longer interested in the old VM */
 	tsk = current;
-	vfork = !!tsk->vfork_done;
 	old_mm = current->mm;
 	exec_mm_release(tsk, old_mm);
 	if (old_mm)
···
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
-
-	if (vfork)
-		timens_on_fork(tsk->nsproxy, tsk);
-
 	if (old_mm) {
 		mmap_read_unlock(old_mm);
 		BUG_ON(active_mm != old_mm);
+1 -4
kernel/fork.c
···
 	/*
 	 * If the new process will be in a different time namespace
 	 * do not allow it to share VM or a thread group with the forking task.
-	 *
-	 * On vfork, the child process enters the target time namespace only
-	 * after exec.
 	 */
-	if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) {
+	if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
 		if (nsp->time_ns != nsp->time_ns_for_children)
 			return ERR_PTR(-EINVAL);
 	}
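The difference between the two flag checks in kernel/fork.c is easiest to see by evaluating them on a few flag combinations. The following is a minimal user-space sketch, not kernel code: the EX_CLONE_* macros and the function names are local to this example (the values are assumed to mirror the CLONE_* flags in the Linux UAPI headers), and each predicate models only the outer `if`, that is, whether the kernel would go on to compare time_ns with time_ns_for_children and possibly return -EINVAL.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the UAPI CLONE_* flags (names are specific to this sketch). */
#define EX_CLONE_VM     0x00000100UL
#define EX_CLONE_VFORK  0x00004000UL
#define EX_CLONE_THREAD 0x00010000UL

/* Check removed by this revert: true only for CLONE_VM without CLONE_VFORK. */
bool reverted_check_rejects(unsigned long clone_flags)
{
	return (clone_flags & (EX_CLONE_VM | EX_CLONE_VFORK)) == EX_CLONE_VM;
}

/* Check restored by this revert: true for any CLONE_THREAD or CLONE_VM. */
bool restored_check_rejects(unsigned long clone_flags)
{
	return (clone_flags & (EX_CLONE_THREAD | EX_CLONE_VM)) != 0;
}

/* Walk through the flag combinations discussed in the commit message. */
void compare_checks(void)
{
	unsigned long vfork_flags = EX_CLONE_VM | EX_CLONE_VFORK;
	unsigned long vfork_thread = vfork_flags | EX_CLONE_THREAD;

	/* Plain fork (no CLONE_VM): neither check fires. */
	assert(!reverted_check_rejects(0));
	assert(!restored_check_rejects(0));

	/* Sharing the VM without vfork: both checks fire. */
	assert(reverted_check_rejects(EX_CLONE_VM));
	assert(restored_check_rejects(EX_CLONE_VM));

	/* vfork-style clone: the reverted check let it through. */
	assert(!reverted_check_rejects(vfork_flags));
	assert(restored_check_rejects(vfork_flags));

	/* CLONE_VFORK combined with CLONE_THREAD also slipped through. */
	assert(!reverted_check_rejects(vfork_thread));
	assert(restored_check_rejects(vfork_thread));
}
```

The last case corresponds to the first side effect described in the commit message: the reverted check treated any CLONE_VFORK caller as a future exec'ing process, even when the new task was a thread.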
+1 -2
kernel/nsproxy.c
···
 	if (IS_ERR(new_ns))
 		return PTR_ERR(new_ns);
 
-	if ((flags & CLONE_VM) == 0)
-		timens_on_fork(new_ns, tsk);
+	timens_on_fork(new_ns, tsk);
 
 	tsk->nsproxy = new_ns;
 	return 0;