Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

ptrace: introduce PTRACE_SET_SYSCALL_INFO request

PTRACE_SET_SYSCALL_INFO is a generic ptrace API that complements
PTRACE_GET_SYSCALL_INFO by letting the ptracer modify details of system
calls the tracee is blocked in.

This API allows ptracers to obtain and modify system call details in a
straightforward and architecture-agnostic way, providing a consistent means
of manipulating the system call number and arguments across architectures.

As in the case of PTRACE_GET_SYSCALL_INFO, PTRACE_SET_SYSCALL_INFO does not
aim to address the numerous architecture-specific peculiarities of the
system call ABI, such as differences in the number of arguments taken by
system calls like pread64 and preadv.

The current implementation supports changing only those parts of the
system call information that strace uses for system call tampering, namely
the syscall number, the syscall arguments, and the syscall return value.
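To make these three kinds of tampering concrete, here is a minimal sketch
of how a tracer might edit the buffer it obtained from
PTRACE_GET_SYSCALL_INFO before passing it back to PTRACE_SET_SYSCALL_INFO.
struct psi_mirror, the SI_* names, replace_arg(), and inject_error() are
hypothetical local stand-ins (real code would use struct
ptrace_syscall_info from <linux/ptrace.h>), and the ptrace() calls
themselves are omitted. The skip-then-fail pattern is in the spirit of
strace's fault injection and assumes that a syscall number of -1 makes the
kernel skip the call; per the comment in ptrace_set_syscall_info_entry(),
the arguments are then not written back.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror PTRACE_SYSCALL_INFO_ENTRY/EXIT/SECCOMP. */
enum { SI_ENTRY = 1, SI_EXIT = 2, SI_SECCOMP = 3 };

/* Hypothetical local mirror of struct ptrace_syscall_info after this patch. */
struct psi_mirror {
	uint8_t op;
	uint8_t reserved;
	uint16_t flags;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct {
			uint64_t nr;
			uint64_t args[6];
			uint32_t ret_data;
			uint32_t reserved2;
		} seccomp;
	};
};

/* Replace one argument at an entry or seccomp stop. The value is stored
 * zero-extended from unsigned long, the way PTRACE_GET_SYSCALL_INFO
 * reports arguments. */
static int replace_arg(struct psi_mirror *info, unsigned int idx,
		       unsigned long val)
{
	if (idx >= 6 || (info->op != SI_ENTRY && info->op != SI_SECCOMP))
		return -1;
	/* entry is a prefix of seccomp, so this works for both stop types */
	info->entry.args[idx] = val;
	return 0;
}

/* Fault injection: at the entry stop, skip the call by setting its
 * number to -1; at the exit stop, report -err as an error. */
static int inject_error(struct psi_mirror *info, int err)
{
	switch (info->op) {
	case SI_ENTRY:
		info->entry.nr = (uint64_t)-1;	/* sign-extended -1 */
		return 0;
	case SI_EXIT:
		info->exit.rval = -err;
		info->exit.is_error = 1;
		return 0;
	default:
		return -1;
	}
}
```

Calling inject_error(&info, EPERM) once at the entry stop and once at the
matching exit stop would make the tracee observe a failing call with errno
EPERM, assuming each modified buffer is written back with
PTRACE_SET_SYSCALL_INFO.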

Support for changing additional details returned by
PTRACE_GET_SYSCALL_INFO, such as the instruction pointer and the stack
pointer, could be added later if needed, using struct
ptrace_syscall_info.flags to specify which additional details should be
set. For now, the "flags" and "reserved" fields of struct
ptrace_syscall_info must be zero, and the "arch", "instruction_pointer",
and "stack_pointer" fields are ignored.
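The reserved-field rule can be modelled in a few lines. This sketch mirrors
the flags/reserved test in ptrace_set_syscall_info(); struct psi_mirror,
check_reserved(), and init_request() are hypothetical names used only for
illustration. As written in the patch, the kernel check does not look at
seccomp.reserved2.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical local mirror of struct ptrace_syscall_info after this patch. */
struct psi_mirror {
	uint8_t op;
	uint8_t reserved;
	uint16_t flags;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct {
			uint64_t nr;
			uint64_t args[6];
			uint32_t ret_data;
			uint32_t reserved2;
		} seccomp;
	};
};

/* Model of the kernel's check: a non-zero "flags" or "reserved" is
 * rejected with -EINVAL; "arch", "instruction_pointer", and
 * "stack_pointer" are not examined at all. */
static int check_reserved(const struct psi_mirror *info)
{
	return (info->flags || info->reserved) ? -EINVAL : 0;
}

/* A buffer built by hand, rather than filled in by
 * PTRACE_GET_SYSCALL_INFO, should start out zeroed so that every
 * reserved field is 0. */
static void init_request(struct psi_mirror *info, uint8_t op)
{
	memset(info, 0, sizeof(*info));
	info->op = op;
}
```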

PTRACE_SET_SYSCALL_INFO currently supports only PTRACE_SYSCALL_INFO_ENTRY,
PTRACE_SYSCALL_INFO_EXIT, and PTRACE_SYSCALL_INFO_SECCOMP operations.
Other operations could be added later if needed.
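The op handling can likewise be sketched as a pure function. This is a
user-space model, not kernel code, of the two checks in
ptrace_set_syscall_info(): info.op must equal the op of the current stop,
and only the entry, exit, and seccomp ops are accepted. check_set_op() and
the SI_* names are hypothetical; the numeric values are those of the
PTRACE_SYSCALL_INFO_* constants.

```c
#include <errno.h>
#include <stdint.h>

/* Values mirror PTRACE_SYSCALL_INFO_{NONE,ENTRY,EXIT,SECCOMP}. */
enum { SI_NONE = 0, SI_ENTRY = 1, SI_EXIT = 2, SI_SECCOMP = 3 };

/* current_op: what PTRACE_GET_SYSCALL_INFO reports for this stop.
 * requested_op: info.op in the buffer passed to PTRACE_SET_SYSCALL_INFO. */
static int check_set_op(uint8_t current_op, uint8_t requested_op)
{
	/* Changing the type of the system call stop is not supported. */
	if (current_op != requested_op)
		return -EINVAL;

	switch (requested_op) {
	case SI_ENTRY:
	case SI_EXIT:
	case SI_SECCOMP:
		return 0;
	default:
		/* e.g. SI_NONE: not stopped in a system call */
		return -EINVAL;
	}
}
```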

Ideally, PTRACE_SET_SYSCALL_INFO would have been introduced together with
PTRACE_GET_SYSCALL_INFO, but that did not happen. The last straw that
convinced me to implement PTRACE_SET_SYSCALL_INFO was the apparent failure
to provide an API for changing the first system call argument on the riscv
architecture.

ptrace(2) man page:

long ptrace(enum __ptrace_request request, pid_t pid, void *addr, void *data);
...
PTRACE_SET_SYSCALL_INFO
Modify information about the system call that caused the stop.
The "data" argument is a pointer to struct ptrace_syscall_info
that specifies the system call information to be set.
The "addr" argument should be set to sizeof(struct ptrace_syscall_info).
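A minimal read-modify-write sketch of the request described above,
assuming a tracee that is already stopped at a syscall-entry or seccomp
stop. struct psi_mirror and change_syscall_nr() are hypothetical; the
request numbers are taken from the uapi header in this patch and are only
defined locally if the libc headers lack them.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Fall back to the raw request numbers if libc does not provide them. */
#ifndef PTRACE_GET_SYSCALL_INFO
# define PTRACE_GET_SYSCALL_INFO 0x420e
#endif
#ifndef PTRACE_SET_SYSCALL_INFO
# define PTRACE_SET_SYSCALL_INFO 0x4212
#endif

/* Hypothetical local mirror of struct ptrace_syscall_info after this patch. */
struct psi_mirror {
	uint8_t op;
	uint8_t reserved;
	uint16_t flags;
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct {
			uint64_t nr;
			uint64_t args[6];
			uint32_t ret_data;
			uint32_t reserved2;
		} seccomp;
	};
};

/* Replace the syscall number of a tracee stopped at an entry stop (op 1)
 * or a seccomp stop (op 3). Returns 0, or -1 with errno set. */
static int change_syscall_nr(pid_t pid, int new_nr)
{
	struct psi_mirror info;

	memset(&info, 0, sizeof(info));
	/* For both requests, "addr" carries the size of the buffer. */
	if (ptrace(PTRACE_GET_SYSCALL_INFO, pid,
		   (void *)sizeof(info), &info) < 0)
		return -1;
	if (info.op != 1 && info.op != 3) {
		errno = EINVAL;
		return -1;
	}
	/* Sign-extend, so that e.g. -1 passes the kernel's range check. */
	info.entry.nr = (uint64_t)(int64_t)new_nr;
	if (ptrace(PTRACE_SET_SYSCALL_INFO, pid,
		   (void *)sizeof(info), &info) < 0)
		return -1;
	return 0;
}
```

In a tracer loop, this would typically be called after waitpid() reports
a syscall-entry stop, e.g. change_syscall_nr(pid, -1) to skip the call. A
kernel without this patch is expected to reject request 0x4212, typically
with EIO, since ptrace_request() returns -EIO for requests it does not
recognize.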

Link: https://lore.kernel.org/all/59505464-c84a-403d-972f-d4b2055eeaac@gmail.com/
Link: https://lkml.kernel.org/r/20250303112044.GF24170@strace.io
Signed-off-by: Dmitry V. Levin <ldv@strace.io>
Reviewed-by: Alexey Gladkov <legion@kernel.org>
Reviewed-by: Charlie Jenkins <charlie@rivosinc.com>
Tested-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Eugene Syromiatnikov <esyr@redhat.com>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Cc: Andreas Larsson <andreas@gaisler.com>
Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Cain <bcain@quicinc.com>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Zankel <chris@zankel.net>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Davide Berardi <berardi.dav@gmail.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Dinh Nguyen <dinguyen@kernel.org>
Cc: Eugene Syromyatnikov <evgsyr@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Guo Ren <guoren@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Helge Deller <deller@gmx.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Max Filippov <jcmvbkbc@gmail.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Simek <monstr@monstr.eu>
Cc: Mike Frysinger <vapier@gentoo.org>
Cc: Naveen N Rao <naveen@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Renzo Davoi <renzo@cs.unibo.it>
Cc: Richard Weinberger <richard@nod.at>
Cc: Rich Felker <dalias@libc.org>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Stafford Horne <shorne@gmail.com>
Cc: Stefan Kristiansson <stefan.kristiansson@saunalahti.fi>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vineet Gupta <vgupta@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Dmitry V. Levin and committed by Andrew Morton
26bb3276 c354ec9c

Diffstat: +126 -2 in 2 files

include/uapi/linux/ptrace.h (+6 -1)
···
 };
 
 #define PTRACE_GET_SYSCALL_INFO 0x420e
+#define PTRACE_SET_SYSCALL_INFO 0x4212
 #define PTRACE_SYSCALL_INFO_NONE 0
 #define PTRACE_SYSCALL_INFO_ENTRY 1
 #define PTRACE_SYSCALL_INFO_EXIT 2
···
 struct ptrace_syscall_info {
 	__u8 op;	/* PTRACE_SYSCALL_INFO_* */
-	__u8 pad[3];
+	__u8 reserved;
+	__u16 flags;
 	__u32 arch;
 	__u64 instruction_pointer;
 	__u64 stack_pointer;
···
 			__u64 nr;
 			__u64 args[6];
 			__u32 ret_data;
+			__u32 reserved2;
 		} seccomp;
 	};
 };
···
 	__u64 offset;
 	__u64 len;
 };
+
+/* 0x4212 is PTRACE_SET_SYSCALL_INFO */
 
 /*
  * These values are stored in task->ptrace_message
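The new uapi fields only give names to bytes that were previously padding.
The compile-time checks below are a sketch using a hypothetical local copy
of the patched layout (struct psi_mirror) and a local equivalent of the
kernel's offsetofend() macro (OFFSETOFEND). They illustrate, on ABIs where
the mirror matches the uapi header, that "arch" stays at offset 4 and that
reserved2 ends exactly at the end of the structure, leaving no unnamed
tail padding.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local copy of the post-patch layout. */
struct psi_mirror {
	uint8_t op;
	uint8_t reserved;	/* formerly part of pad[3] */
	uint16_t flags;		/* formerly part of pad[3] */
	uint32_t arch;
	uint64_t instruction_pointer;
	uint64_t stack_pointer;
	union {
		struct { uint64_t nr; uint64_t args[6]; } entry;
		struct { int64_t rval; uint8_t is_error; } exit;
		struct {
			uint64_t nr;
			uint64_t args[6];
			uint32_t ret_data;
			uint32_t reserved2;	/* names the former tail padding */
		} seccomp;
	};
};

/* Same idea as the kernel's offsetofend(). */
#define OFFSETOFEND(type, member) \
	(offsetof(type, member) + sizeof(((type *)0)->member))

/* "arch" keeps the offset it had after pad[3]. */
_Static_assert(offsetof(struct psi_mirror, arch) == 4, "arch moved");
/* The header is 24 bytes; the union starts right after it. */
_Static_assert(offsetof(struct psi_mirror, entry) == 24, "union moved");
/* No unnamed padding is left after seccomp.reserved2. */
_Static_assert(OFFSETOFEND(struct psi_mirror, seccomp.reserved2) ==
	       sizeof(struct psi_mirror), "implicit tail padding");
```

On this layout, offsetofend(struct ptrace_syscall_info, seccomp.ret_data),
which PTRACE_GET_SYSCALL_INFO returns for seccomp stops, is 84 bytes, while
sizeof(struct ptrace_syscall_info) is 88, the minimum size that
PTRACE_SET_SYSCALL_INFO accepts.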

kernel/ptrace.c (+120 -1)
···
 	ptrace_get_syscall_info_entry(child, regs, info);
 	info->seccomp.ret_data = child->ptrace_message;
 
-	/* ret_data is the last field in struct ptrace_syscall_info.seccomp */
+	/*
+	 * ret_data is the last non-reserved field
+	 * in struct ptrace_syscall_info.seccomp
+	 */
 	return offsetofend(struct ptrace_syscall_info, seccomp.ret_data);
 }
 
···
 
 	write_size = min(actual_size, user_size);
 	return copy_to_user(datavp, &info, write_size) ? -EFAULT : actual_size;
+}
+
+static int
+ptrace_set_syscall_info_entry(struct task_struct *child, struct pt_regs *regs,
+			      struct ptrace_syscall_info *info)
+{
+	unsigned long args[ARRAY_SIZE(info->entry.args)];
+	int nr = info->entry.nr;
+	int i;
+
+	/*
+	 * Check that the syscall number specified in info->entry.nr
+	 * is either a value of type "int" or a sign-extended value
+	 * of type "int".
+	 */
+	if (nr != info->entry.nr)
+		return -ERANGE;
+
+	for (i = 0; i < ARRAY_SIZE(args); i++) {
+		args[i] = info->entry.args[i];
+		/*
+		 * Check that the syscall argument specified in
+		 * info->entry.args[i] is either a value of type
+		 * "unsigned long" or a sign-extended value of type "long".
+		 */
+		if (args[i] != info->entry.args[i])
+			return -ERANGE;
+	}
+
+	syscall_set_nr(child, regs, nr);
+	/*
+	 * If the syscall number is set to -1, setting syscall arguments is not
+	 * just pointless, it would also clobber the syscall return value on
+	 * those architectures that share the same register both for the first
+	 * argument of syscall and its return value.
+	 */
+	if (nr != -1)
+		syscall_set_arguments(child, regs, args);
+
+	return 0;
+}
+
+static int
+ptrace_set_syscall_info_seccomp(struct task_struct *child, struct pt_regs *regs,
+				struct ptrace_syscall_info *info)
+{
+	/*
+	 * info->entry is currently a subset of info->seccomp,
+	 * info->seccomp.ret_data is currently ignored.
+	 */
+	return ptrace_set_syscall_info_entry(child, regs, info);
+}
+
+static int
+ptrace_set_syscall_info_exit(struct task_struct *child, struct pt_regs *regs,
+			     struct ptrace_syscall_info *info)
+{
+	long rval = info->exit.rval;
+
+	/*
+	 * Check that the return value specified in info->exit.rval
+	 * is either a value of type "long" or a sign-extended value
+	 * of type "long".
+	 */
+	if (rval != info->exit.rval)
+		return -ERANGE;
+
+	if (info->exit.is_error)
+		syscall_set_return_value(child, regs, rval, 0);
+	else
+		syscall_set_return_value(child, regs, 0, rval);
+
+	return 0;
+}
+
+static int
+ptrace_set_syscall_info(struct task_struct *child, unsigned long user_size,
+			const void __user *datavp)
+{
+	struct pt_regs *regs = task_pt_regs(child);
+	struct ptrace_syscall_info info;
+
+	if (user_size < sizeof(info))
+		return -EINVAL;
+
+	/*
+	 * The compatibility is tracked by info.op and info.flags: if user-space
+	 * does not instruct us to use unknown extra bits from future versions
+	 * of ptrace_syscall_info, we are not going to read them either.
+	 */
+	if (copy_from_user(&info, datavp, sizeof(info)))
+		return -EFAULT;
+
+	/* Reserved for future use. */
+	if (info.flags || info.reserved)
+		return -EINVAL;
+
+	/* Changing the type of the system call stop is not supported yet. */
+	if (ptrace_get_syscall_info_op(child) != info.op)
+		return -EINVAL;
+
+	switch (info.op) {
+	case PTRACE_SYSCALL_INFO_ENTRY:
+		return ptrace_set_syscall_info_entry(child, regs, &info);
+	case PTRACE_SYSCALL_INFO_EXIT:
+		return ptrace_set_syscall_info_exit(child, regs, &info);
+	case PTRACE_SYSCALL_INFO_SECCOMP:
+		return ptrace_set_syscall_info_seccomp(child, regs, &info);
+	default:
+		/* Other types of system call stops are not supported yet. */
+		return -EINVAL;
+	}
 }
 #endif /* CONFIG_HAVE_ARCH_TRACEHOOK */
 
···
 
 	case PTRACE_GET_SYSCALL_INFO:
 		ret = ptrace_get_syscall_info(child, addr, datavp);
+		break;
+
+	case PTRACE_SET_SYSCALL_INFO:
+		ret = ptrace_set_syscall_info(child, addr, datavp);
 		break;
 #endif
 
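The -ERANGE checks in ptrace_set_syscall_info_entry() and
ptrace_set_syscall_info_exit() work by truncating a 64-bit value to a
narrower type and comparing the result with the original. The following
user-space model (nr_in_range() and rval_in_range() are hypothetical names)
shows which syscall numbers pass: non-negative int values, and negative int
values sign-extended to 64 bits. Like the kernel code, it assumes that
out-of-range conversions to a signed type wrap modulo 2^N, which is what
GCC and Clang do.

```c
#include <stdint.h>

/* Model of the syscall number check: 1 if the kernel would accept v. */
static int nr_in_range(uint64_t v)
{
	int nr = (int)v;	/* keeps the low 32 bits */

	/* For the comparison, nr is converted back to 64 bits with sign
	 * extension, as in "nr != info->entry.nr". */
	return (uint64_t)nr == v;
}

/* Same idea for exit.rval, whose kernel-side type is long; on 64-bit
 * kernels every value passes. */
static int rval_in_range(int64_t v)
{
	long rval = (long)v;

	return rval == v;
}
```

Consequently, a tracer should store a negative syscall number such as -1
sign-extended (0xffffffffffffffff); the zero-extended form
0x00000000ffffffff is rejected with ERANGE.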