Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

x86/pkeys: Add PKRU as a parameter in signal handling functions

Assume there's a multithreaded application that runs untrusted user
code. Each thread has its stack/code protected by a non-zero PKEY, and the
PKRU register is set up such that only that particular non-zero PKEY is
enabled. Each thread also sets up an alternate signal stack to handle
signals, which is protected by PKEY zero. The PKEYs man page documents that
the PKRU will be reset to init_pkru when the signal handler is invoked,
which means that PKEY zero access will be enabled. But this reset happens
after the kernel attempts to push fpu state to the alternate stack, which
is not (yet) accessible by the kernel, which leads to a new SIGSEGV being
sent to the application, terminating it.

Enabling both the non-zero PKEY (for the thread) and PKEY zero in
userspace will not work for this use case. It cannot have the alt stack
writeable by all - the rationale here is that the code running in that
thread (using a non-zero PKEY) is untrusted and should not have access
to the alternate signal stack (that uses PKEY zero), to prevent the
return address of a function from being changed. The expectation is that
kernel should be able to set up the alternate signal stack and deliver
the signal to the application even if PKEY zero is explicitly disabled
by the application. The signal handler accessibility should not be
dictated by whatever PKRU value the thread sets up.

The PKRU register is managed by XSAVE, which means the sigframe contents
must match the register contents - which is not the case here. It's
required that the signal frame contains the user-defined PKRU value (so
that it is restored correctly from sigcontext) but the actual register must
be reset to init_pkru so that the alt stack is accessible and the signal
can be delivered to the application. It seems that the proper fix here
would be to remove PKRU from the XSAVE framework and manage it separately,
which is quite complicated. As a workaround, do this:

orig_pkru = rdpkru();
wrpkru(orig_pkru & init_pkru_value);
xsave_to_user_sigframe();
put_user(pkru_sigframe_addr, orig_pkru)

In preparation for writing PKRU to sigframe, pass PKRU as an additional
parameter down the call chain from get_sigframe().

No functional change.

Signed-off-by: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/20240802061318.2140081-2-aruna.ramakrishna@oracle.com

authored by

Aruna Ramakrishna and committed by
Thomas Gleixner
24cf2bc9 4436e6da

+6 -5
+1 -1
arch/x86/include/asm/fpu/signal.h
··· 29 29 30 30 unsigned long fpu__get_fpstate_size(void); 31 31 32 - extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size); 32 + extern bool copy_fpstate_to_sigframe(void __user *buf, void __user *fp, int size, u32 pkru); 33 33 extern void fpu__clear_user_states(struct fpu *fpu); 34 34 extern bool fpu__restore_sig(void __user *buf, int ia32_frame); 35 35
+3 -3
arch/x86/kernel/fpu/signal.c
··· 156 156 return !err; 157 157 } 158 158 159 - static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf) 159 + static inline int copy_fpregs_to_sigframe(struct xregs_state __user *buf, u32 pkru) 160 160 { 161 161 if (use_xsave()) 162 162 return xsave_to_user_sigframe(buf); ··· 185 185 * For [f]xsave state, update the SW reserved fields in the [f]xsave frame 186 186 * indicating the absence/presence of the extended state to the user. 187 187 */ 188 - bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size) 188 + bool copy_fpstate_to_sigframe(void __user *buf, void __user *buf_fx, int size, u32 pkru) 189 189 { 190 190 struct task_struct *tsk = current; 191 191 struct fpstate *fpstate = tsk->thread.fpu.fpstate; ··· 228 228 fpregs_restore_userregs(); 229 229 230 230 pagefault_disable(); 231 - ret = copy_fpregs_to_sigframe(buf_fx); 231 + ret = copy_fpregs_to_sigframe(buf_fx, pkru); 232 232 pagefault_enable(); 233 233 fpregs_unlock(); 234 234
+2 -1
arch/x86/kernel/signal.c
··· 84 84 unsigned long math_size = 0; 85 85 unsigned long sp = regs->sp; 86 86 unsigned long buf_fx = 0; 87 + u32 pkru = read_pkru(); 87 88 88 89 /* redzone */ 89 90 if (!ia32_frame) ··· 140 139 } 141 140 142 141 /* save i387 and extended state */ 143 - if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size)) 142 + if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru)) 144 143 return (void __user *)-1L; 145 144 146 145 return (void __user *)sp;