Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

c/r: prctl: add ability to set new mm_struct::exe_file

When we restore a process, we would like a way to set up the former
mm_struct::exe_file so that /proc/pid/exe points to the original
executable file the process had at checkpoint time.

For this, the PR_SET_MM_EXE_FILE code is introduced. This option takes a
file descriptor which will be used as the source for the new
/proc/$pid/exe symlink.
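A minimal userspace sketch of the request, assuming the prctl(PR_SET_MM, option, addr, arg4, arg5) calling convention that prctl_set_mm() uses, with the descriptor passed in the addr slot. The helper set_exe_file() is hypothetical; the fallback values (PR_SET_MM 35, PR_SET_MM_EXE_FILE 13) are for userspace headers that predate this change.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Fallbacks for older userspace headers (values from include/linux/prctl.h). */
#ifndef PR_SET_MM
#define PR_SET_MM 35
#endif
#ifndef PR_SET_MM_EXE_FILE
#define PR_SET_MM_EXE_FILE 13
#endif

/* Hypothetical helper: make /proc/self/exe point at `path`.
 * Returns 0 on success or a negative errno value. */
static int set_exe_file(const char *path)
{
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -errno;

	/* The descriptor travels in the addr slot; arg4 and arg5 are unused. */
	int ret = prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, (unsigned long)fd, 0UL, 0UL);
	int err = (ret == 0) ? 0 : -errno;	/* save errno before close() */

	/* On success set_mm_exe_file() holds its own reference, so closing is safe. */
	close(fd);
	return err;
}
```

With this change applied, the call fails with EPERM without CAP_SYS_RESOURCE, EBUSY while VM_EXECUTABLE vmas remain or once the symlink was already replaced, and EACCES if the file is not an executable regular file, so the return value should always be checked.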

Note that it only allows changing /proc/$pid/exe if there are no
VM_EXECUTABLE vmas present for the current process, simply because this
feature is specific to C/R and mm::num_exe_file_vmas becomes meaningless
after that.

To minimize the number of transitions the /proc/pid/exe symlink might
undergo, this feature is implemented in a one-shot manner: once changed,
the symlink can't be changed again. This should help sysadmins monitor
the symlinks of all processes running in a system.
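The one-shot rule can be illustrated with a small userspace model of the state prctl_set_mm_exe_file() inspects. The model is hypothetical and assumes, as removed_exe_file_vma() did in kernels of this era, that unmapping the last VM_EXECUTABLE vma clears mm->exe_file; that is what leaves a restorer exactly one successful call.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model; it does not touch any kernel state. */
struct mm_model {
	int num_exe_file_vmas;	/* stands in for mm->num_exe_file_vmas */
	const char *exe_file;	/* stands in for mm->exe_file (NULL = unset) */
};

/* Unmapping the last executable vma drops the old exe_file. */
static void model_unmap_exe_vma(struct mm_model *mm)
{
	if (mm->num_exe_file_vmas > 0 && --mm->num_exe_file_vmas == 0)
		mm->exe_file = NULL;
}

/* Mirrors the two -EBUSY checks of prctl_set_mm_exe_file(). */
static int model_set_exe_file(struct mm_model *mm, const char *file)
{
	if (mm->num_exe_file_vmas)
		return -EBUSY;	/* executable mappings still present */
	if (mm->exe_file)
		return -EBUSY;	/* already set once: one-shot */
	mm->exe_file = file;
	return 0;
}
```

Starting from a process with two executable vmas, the first request fails, succeeds only after both vmas are gone, and every later request fails again.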

In particular, one could take a snapshot of processes and raise an alarm
if there are unexpected changes of /proc/pid/exe in a system.

Note: this feature is available only if CONFIG_CHECKPOINT_RESTORE is set,
and the caller must have the CAP_SYS_RESOURCE capability; otherwise the
request to change the symlink will be rejected.
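Whether the caller holds CAP_SYS_RESOURCE can be checked from userspace before issuing the request. This sketch assumes the CapEff hex mask in /proc/self/status and bit 24 for CAP_SYS_RESOURCE from <linux/capability.h>; the helper names are hypothetical, and the check says nothing about CONFIG_CHECKPOINT_RESTORE.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define CAP_SYS_RESOURCE_BIT 24	/* CAP_SYS_RESOURCE in <linux/capability.h> */

/* Return 1 if `bit` is set in a hex capability mask such as the CapEff
 * field of /proc/<pid>/status, else 0. */
static int cap_mask_has(const char *hex, int bit)
{
	unsigned long long mask = strtoull(hex, NULL, 16);	/* skips leading blanks */
	return (int)((mask >> bit) & 1ULL);
}

/* Return 1 if the calling process has CAP_SYS_RESOURCE in its effective
 * set, 0 if not, -1 if /proc/self/status could not be read. */
static int have_cap_sys_resource(void)
{
	FILE *f = fopen("/proc/self/status", "r");
	if (!f)
		return -1;

	char line[256];
	int result = -1;
	while (fgets(line, sizeof(line), f)) {
		if (strncmp(line, "CapEff:", 7) == 0) {
			result = cap_mask_has(line + 7, CAP_SYS_RESOURCE_BIT);
			break;
		}
	}
	fclose(f);
	return result;
}
```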

Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Pavel Emelyanov <xemul@parallels.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Matt Helsley <matthltc@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Cyrill Gorcunov and committed by Linus Torvalds
(commit b32dfe37, parent fe8c7f5c)

2 files changed, 57 insertions(+)

include/linux/prctl.h (+1)

```diff
···
 # define PR_SET_MM_ENV_START		10
 # define PR_SET_MM_ENV_END		11
 # define PR_SET_MM_AUXV			12
+# define PR_SET_MM_EXE_FILE		13
 
 /*
  * Set specific pid that is allowed to ptrace the current task.
```
kernel/sys.c (+56)

```diff
···
 #include <linux/personality.h>
 #include <linux/ptrace.h>
 #include <linux/fs_struct.h>
+#include <linux/file.h>
+#include <linux/mount.h>
 #include <linux/gfp.h>
 #include <linux/syscore_ops.h>
 #include <linux/version.h>
···
 		(vma->vm_flags & banned);
 }
 
+static int prctl_set_mm_exe_file(struct mm_struct *mm, unsigned int fd)
+{
+	struct file *exe_file;
+	struct dentry *dentry;
+	int err;
+
+	/*
+	 * Setting new mm::exe_file is only allowed when no VM_EXECUTABLE vma's
+	 * remain. So perform a quick test first.
+	 */
+	if (mm->num_exe_file_vmas)
+		return -EBUSY;
+
+	exe_file = fget(fd);
+	if (!exe_file)
+		return -EBADF;
+
+	dentry = exe_file->f_path.dentry;
+
+	/*
+	 * Because the original mm->exe_file points to executable file, make
+	 * sure that this one is executable as well, to avoid breaking an
+	 * overall picture.
+	 */
+	err = -EACCES;
+	if (!S_ISREG(dentry->d_inode->i_mode) ||
+	    exe_file->f_path.mnt->mnt_flags & MNT_NOEXEC)
+		goto exit;
+
+	err = inode_permission(dentry->d_inode, MAY_EXEC);
+	if (err)
+		goto exit;
+
+	/*
+	 * The symlink can be changed only once, just to disallow arbitrary
+	 * transitions malicious software might bring in. This means one
+	 * could make a snapshot over all processes running and monitor
+	 * /proc/pid/exe changes to notice unusual activity if needed.
+	 */
+	down_write(&mm->mmap_sem);
+	if (likely(!mm->exe_file))
+		set_mm_exe_file(mm, exe_file);
+	else
+		err = -EBUSY;
+	up_write(&mm->mmap_sem);
+
+exit:
+	fput(exe_file);
+	return err;
+}
+
 static int prctl_set_mm(int opt, unsigned long addr,
 			unsigned long arg4, unsigned long arg5)
 {
···
 
 	if (!capable(CAP_SYS_RESOURCE))
 		return -EPERM;
+
+	if (opt == PR_SET_MM_EXE_FILE)
+		return prctl_set_mm_exe_file(mm, (unsigned int)addr);
 
 	if (addr >= TASK_SIZE)
 		return -EINVAL;
```
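The checks prctl_set_mm_exe_file() applies to the descriptor can be approximated from userspace before calling prctl(). This is a hypothetical pre-flight helper, precheck_exe_file(), assuming statvfs() reports noexec mounts through ST_NOEXEC; access(X_OK) checks the real rather than the effective IDs, so it only approximates inode_permission(..., MAY_EXEC).

```c
#include <errno.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <unistd.h>

/* glibc only exposes ST_NOEXEC with _GNU_SOURCE; the value is 8 on Linux. */
#ifndef ST_NOEXEC
#define ST_NOEXEC 8
#endif

/* Returns 0 if all checks pass, else a negative errno value. */
static int precheck_exe_file(const char *path)
{
	struct stat st;
	if (stat(path, &st) < 0)
		return -errno;
	if (!S_ISREG(st.st_mode))
		return -EACCES;		/* kernel: !S_ISREG(...) -> -EACCES */

	struct statvfs vfs;
	if (statvfs(path, &vfs) < 0)
		return -errno;
	if (vfs.f_flag & ST_NOEXEC)
		return -EACCES;		/* kernel: MNT_NOEXEC -> -EACCES */

	if (access(path, X_OK) < 0)
		return -errno;		/* approximates inode_permission(MAY_EXEC) */
	return 0;
}
```

Passing the pre-flight does not guarantee success, since the capability and VM_EXECUTABLE checks in the kernel still apply.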