Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rseq, ptrace: Add PTRACE_GET_RSEQ_CONFIGURATION request

Userspace checkpoint and restore (C/R) needs a way to obtain process state
that includes the RSEQ configuration.

There are two ways this information is going to be used:
- to re-enable RSEQ for threads which had it enabled before C/R
- to detect if a thread was in a critical section during C/R
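
A minimal sketch of the first use, assuming the restorer has saved the
values reported for a thread and runs this code in the restored thread
itself (rseq registration is per thread). The helper name reenable_rseq is
hypothetical, and treating a zero ABI pointer as "rseq was not registered"
is an assumption based on the kernel reporting task->rseq directly:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical restore-side helper (not part of this commit).
 * Re-registers rseq for the calling thread with the address, size and
 * signature that were reported before the checkpoint.
 * Returns 0 on success or when there is nothing to restore, -1 on error.
 */
static int reenable_rseq(uint64_t abi_pointer, uint32_t abi_size,
			 uint32_t signature)
{
	if (abi_pointer == 0)
		return 0;	/* assumed: thread had no rseq registered */

#ifdef SYS_rseq
	/* rseq(struct rseq *rseq, uint32_t rseq_len, int flags, uint32_t sig) */
	if (syscall(SYS_rseq, (void *)(uintptr_t)abi_pointer,
		    abi_size, 0, signature) != 0)
		return -1;	/* errno is set by syscall() */
	return 0;
#else
	return -1;		/* installed headers predate rseq */
#endif
}
```

If the C library has already registered rseq for the thread, the call is
expected to fail, so a real restorer would have to account for that case.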

Since C/R preserves TLS memory and addresses, the RSEQ ABI will be restored
using the address registered before C/R.

Detecting whether the thread is in a critical section during C/R is needed
to enforce the RSEQ abort behavior during C/R. Attaching with ptrace()
before the registers are dumped does not by itself cause an RSEQ abort.
Restoring the instruction pointer within the critical section is
problematic because rseq_cs may get cleared before control is passed
to the migrated application code, leading to RSEQ invariants not being
preserved. C/R code will use the RSEQ ABI address to find the abort handler
to which the instruction pointer needs to be set.
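
The abort-handler lookup can be sketched as follows. This is an
illustration, not code from this commit: it assumes the C/R tool has
already read the critical-section descriptor that the rseq_cs field of the
thread's RSEQ ABI area points to. The struct mirrors the fields of
struct rseq_cs in include/uapi/linux/rseq.h under a different name (the
alignment attribute is omitted), and both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative copy of the rseq critical-section descriptor fields. */
struct rseq_cs_example {
	uint32_t version;
	uint32_t flags;
	uint64_t start_ip;		/* first instruction of the section */
	uint64_t post_commit_offset;	/* length of the section in bytes */
	uint64_t abort_ip;		/* abort handler address */
};

/*
 * True if ip lies in [start_ip, start_ip + post_commit_offset).
 * Unsigned wraparound makes the test fail for ip < start_ip.
 */
static bool ip_in_rseq_cs(uint64_t ip, const struct rseq_cs_example *cs)
{
	return ip - cs->start_ip < cs->post_commit_offset;
}

/*
 * Hypothetical C/R helper: pick the instruction pointer to restore.
 * cs is NULL when the thread had no active critical section.
 */
static uint64_t ip_to_restore(uint64_t dumped_ip,
			      const struct rseq_cs_example *cs)
{
	if (cs != NULL && ip_in_rseq_cs(dumped_ip, cs))
		return cs->abort_ip;	/* force the abort path */
	return dumped_ip;		/* outside any section: keep as is */
}
```

Redirecting to abort_ip mimics what an RSEQ abort would have done, so the
application restarts the sequence instead of resuming in the middle of it.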

To achieve the above goals, expose the RSEQ ABI address and the signature
value with the new ptrace request PTRACE_GET_RSEQ_CONFIGURATION.
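
A tracer-side call might look like the sketch below. It assumes the tracee
is already attached and stopped. The struct is a local copy of the layout
added by this patch, and the names rseq_conf_example, RSEQ_CONF_REQUEST and
fetch_rseq_conf are illustrative, chosen so that the example also builds
against headers that predate the request.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Request number introduced by this patch, under a local name. */
#define RSEQ_CONF_REQUEST 0x420f

/* Local copy of struct ptrace_rseq_configuration. */
struct rseq_conf_example {
	uint64_t rseq_abi_pointer;
	uint32_t rseq_abi_size;
	uint32_t signature;
	uint32_t flags;
	uint32_t pad;
};

/*
 * Fetch the rseq configuration of a stopped tracee.
 * Returns the size of the kernel's structure on success (it may exceed
 * sizeof(*out) on a kernel that has extended the structure), or -1 with
 * errno set by ptrace() on failure.
 */
static long fetch_rseq_conf(pid_t tid, struct rseq_conf_example *out)
{
	long ret;

	memset(out, 0, sizeof(*out));
	/* addr carries the buffer size, data points at the buffer */
	ret = ptrace(RSEQ_CONF_REQUEST, tid, (void *)sizeof(*out), out);
	if (ret < 0)
		return -1;
	return ret;
}
```

Because the kernel copies at most the number of bytes passed in addr, a
caller built against this layout keeps working if the structure grows.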

This new ptrace request can also be used by debuggers so they are aware
of stops within restartable sequences in progress.

Signed-off-by: Piotr Figiel <figiel@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Michal Miroslaw <emmir@google.com>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Link: https://lkml.kernel.org/r/20210226135156.1081606-1-figiel@google.com

Authored by Piotr Figiel and committed by Thomas Gleixner
90f093fa 13c2235b

+35 lines in total

include/uapi/linux/ptrace.h (+10)
···
 	};
 };

+#define PTRACE_GET_RSEQ_CONFIGURATION	0x420f
+
+struct ptrace_rseq_configuration {
+	__u64 rseq_abi_pointer;
+	__u32 rseq_abi_size;
+	__u32 signature;
+	__u32 flags;
+	__u32 pad;
+};
+
 /*
  * These values are stored in task->ptrace_message
  * by tracehook_report_syscall_* to describe the current syscall-stop.

kernel/ptrace.c (+25)
···
 #include <linux/cn_proc.h>
 #include <linux/compat.h>
 #include <linux/sched/signal.h>
+#include <linux/minmax.h>

 #include <asm/syscall.h>	/* for syscall_get_* */

···
 	return ret;
 }

+#ifdef CONFIG_RSEQ
+static long ptrace_get_rseq_configuration(struct task_struct *task,
+					  unsigned long size, void __user *data)
+{
+	struct ptrace_rseq_configuration conf = {
+		.rseq_abi_pointer = (u64)(uintptr_t)task->rseq,
+		.rseq_abi_size = sizeof(*task->rseq),
+		.signature = task->rseq_sig,
+		.flags = 0,
+	};
+
+	size = min_t(unsigned long, size, sizeof(conf));
+	if (copy_to_user(data, &conf, size))
+		return -EFAULT;
+	return sizeof(conf);
+}
+#endif
+
 #ifdef PTRACE_SINGLESTEP
 #define is_singlestep(request)		((request) == PTRACE_SINGLESTEP)
 #else
···
 	case PTRACE_SECCOMP_GET_METADATA:
 		ret = seccomp_get_metadata(child, addr, datavp);
 		break;
+
+#ifdef CONFIG_RSEQ
+	case PTRACE_GET_RSEQ_CONFIGURATION:
+		ret = ptrace_get_rseq_configuration(child, addr, datavp);
+		break;
+#endif

 	default:
 		break;