Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

powerpc/pkeys: make protection key 0 less special

Applications need the ability to associate an address range with some
key and later revert to its initial default key. Pkey-0 comes close to
providing this function but falls short, because the current
implementation does not allow applications to explicitly associate
pkey-0 with an address range.

Let's make pkey-0 less special and treat it almost like any other key.
Thus it can be explicitly associated with any address range, and can be
freed. This gives the application more flexibility and power. The
ability to free pkey-0 must be used responsibly, since pkey-0 is
associated with almost all address ranges by default.

Even with this change, pkey-0 remains slightly special in the
following respects:
(a) it is implicitly allocated.
(b) it is the default key assigned to any address range.
(c) its permissions cannot be modified by userspace.

NOTE: (c) is specific to powerpc only. pkey-0 is associated by default
with all pages including kernel pages, and pkeys are also active in
kernel mode. If any permission is denied on pkey-0, the kernel running
in the context of the application will be unable to operate.

Tested on powerpc.

Signed-off-by: Ram Pai <linuxram@us.ibm.com>
[mpe: Drop #define PKEY_0 0 in favour of plain old 0]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
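
The rule in note (c) of the commit message can be modeled in plain userspace C. The following is only a sketch under stated assumptions: every `sketch_`/`SKETCH_` name is hypothetical, the per-key update path is replaced by a stub that always succeeds, and the `pkey_disabled` check from the real function is left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for the PKEY_DISABLE_* permission bits. */
#define SKETCH_DISABLE_ACCESS 0x1UL
#define SKETCH_DISABLE_WRITE  0x2UL

/* Stub for the real per-key update path; in this sketch it always succeeds. */
static int sketch_set_access(int pkey, unsigned long init_val)
{
	(void)pkey;
	(void)init_val;
	return 0;
}

/*
 * Model of the new pkey-0 rule: a request that denies nothing on pkey-0
 * is accepted as a no-op, any request that denies a permission is rejected.
 */
int sketch_arch_set_user_pkey_access(int pkey, unsigned long init_val)
{
	if (pkey == 0)
		return init_val ? -EINVAL : 0;

	return sketch_set_access(pkey, init_val);
}

/* Small usage demonstration. */
void sketch_demo_pkey0_access(void)
{
	/* Leaving pkey-0 fully permissive is fine. */
	assert(sketch_arch_set_user_pkey_access(0, 0) == 0);
	/* Denying write on pkey-0 is refused. */
	assert(sketch_arch_set_user_pkey_access(0, SKETCH_DISABLE_WRITE) == -EINVAL);
	/* Other keys go through the normal (stubbed) path. */
	assert(sketch_arch_set_user_pkey_access(3, SKETCH_DISABLE_WRITE) == 0);
}
```

Returning 0 rather than an error for `init_val == 0` means that a caller which merely restates the default permissions on pkey-0 does not need a special case.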

authored by Ram Pai and committed by Michael Ellerman
07f522d2 a4fcc877

+27 -13
+21 -6
arch/powerpc/include/asm/pkeys.h
···

 DECLARE_STATIC_KEY_TRUE(pkey_disabled);
 extern int pkeys_total; /* total pkeys as per device tree */
-extern u32 initial_allocation_mask; /* bits set for reserved keys */
+extern u32 initial_allocation_mask; /* bits set for the initially allocated keys */
+extern u32 reserved_allocation_mask; /* bits set for reserved keys */

 #define ARCH_VM_PKEY_FLAGS (VM_PKEY_BIT0 | VM_PKEY_BIT1 | VM_PKEY_BIT2 | \
 			    VM_PKEY_BIT3 | VM_PKEY_BIT4)
···
 #define __mm_pkey_is_allocated(mm, pkey) \
 	(mm_pkey_allocation_map(mm) & pkey_alloc_mask(pkey))

-#define __mm_pkey_is_reserved(pkey) (initial_allocation_mask & \
+#define __mm_pkey_is_reserved(pkey) (reserved_allocation_mask & \
 				       pkey_alloc_mask(pkey))

 static inline bool mm_pkey_is_allocated(struct mm_struct *mm, int pkey)
 {
-	/* A reserved key is never considered as 'explicitly allocated' */
-	return ((pkey < arch_max_pkey()) &&
-		!__mm_pkey_is_reserved(pkey) &&
-		__mm_pkey_is_allocated(mm, pkey));
+	if (pkey < 0 || pkey >= arch_max_pkey())
+		return false;
+
+	/* Reserved keys are never allocated. */
+	if (__mm_pkey_is_reserved(pkey))
+		return false;
+
+	return __mm_pkey_is_allocated(mm, pkey);
 }

 /*
···
 {
 	if (static_branch_likely(&pkey_disabled))
 		return -EINVAL;
+
+	/*
+	 * userspace should not change pkey-0 permissions.
+	 * pkey-0 is associated with every page in the kernel.
+	 * If userspace denies any permission on pkey-0, the
+	 * kernel cannot operate.
+	 */
+	if (pkey == 0)
+		return init_val ? -EINVAL : 0;
+
 	return __arch_set_user_pkey_access(tsk, pkey, init_val);
 }

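
To see why splitting the masks makes pkey-0 count as allocated (and therefore freeable), the check in `mm_pkey_is_allocated()` can be modeled in userspace. This is a hedged sketch, not kernel code: `struct sketch_mm`, `SKETCH_MAX_PKEY` (an assumed stand-in for `arch_max_pkey()`), and the free helper are all hypothetical, and the free path is only an assumption about how a caller would use the check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_PKEY 32	/* assumed key count, stands in for arch_max_pkey() */

/* Hypothetical stand-in for the per-mm allocation bitmap. */
struct sketch_mm {
	uint32_t allocation_map;	/* one bit per key allocated in this mm */
};

/* Models reserved_allocation_mask: keys userspace can never own. */
uint32_t sketch_reserved_mask;

static uint32_t sketch_alloc_mask(int pkey)
{
	return UINT32_C(1) << pkey;
}

/* Mirrors the three-step check introduced by the patch. */
bool sketch_pkey_is_allocated(const struct sketch_mm *mm, int pkey)
{
	if (pkey < 0 || pkey >= SKETCH_MAX_PKEY)
		return false;

	/* Reserved keys are never allocated. */
	if (sketch_reserved_mask & sketch_alloc_mask(pkey))
		return false;

	return (mm->allocation_map & sketch_alloc_mask(pkey)) != 0;
}

/* Assumed free path: only a key that passes the check can be released. */
int sketch_pkey_free(struct sketch_mm *mm, int pkey)
{
	if (!sketch_pkey_is_allocated(mm, pkey))
		return -1;

	mm->allocation_map &= ~sketch_alloc_mask(pkey);
	return 0;
}

/* Small usage demonstration with illustrative mask values. */
void sketch_demo_is_allocated(void)
{
	struct sketch_mm mm;

	sketch_reserved_mask = (UINT32_C(1) << 1) | (UINT32_C(1) << 2);
	/* Initial allocation: the reserved keys plus pkey-0. */
	mm.allocation_map = sketch_reserved_mask | (UINT32_C(1) << 0);

	assert(sketch_pkey_is_allocated(&mm, 0));	/* pkey-0 is now visible as allocated */
	assert(!sketch_pkey_is_allocated(&mm, 1));	/* reserved, so never "allocated" */
	assert(!sketch_pkey_is_allocated(&mm, -1));	/* negative keys are rejected */
	assert(sketch_pkey_free(&mm, 0) == 0);		/* pkey-0 can be freed once */
	assert(sketch_pkey_free(&mm, 0) == -1);		/* and not a second time */
}
```

With the old single mask, bit 0 was part of the reserved set, so the same check would have reported pkey-0 as not allocated.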
+6 -7
arch/powerpc/mm/pkeys.c
···
 bool pkey_execute_disable_supported;
 int pkeys_total; /* Total pkeys as per device tree */
 bool pkeys_devtree_defined; /* pkey property exported by device tree */
-u32 initial_allocation_mask; /* Bits set for reserved keys */
+u32 initial_allocation_mask; /* Bits set for the initially allocated keys */
+u32 reserved_allocation_mask; /* Bits set for reserved keys */
 u64 pkey_amr_mask; /* Bits in AMR not to be touched */
 u64 pkey_iamr_mask; /* Bits in AMR not to be touched */
 u64 pkey_uamor_mask; /* Bits in UMOR not to be touched */
···
 #else
 	os_reserved = 0;
 #endif
-	initial_allocation_mask = (0x1 << 0) | (0x1 << 1) |
-					(0x1 << execute_only_key);
+	/* Bits are in LE format. */
+	reserved_allocation_mask = (0x1 << 1) | (0x1 << execute_only_key);

 	/* register mask is in BE format */
 	pkey_amr_mask = ~0x0ul;
···

 	/* mark the rest of the keys as reserved and hence unavailable */
 	for (i = (pkeys_total - os_reserved); i < pkeys_total; i++) {
-		initial_allocation_mask |= (0x1 << i);
+		reserved_allocation_mask |= (0x1 << i);
 		pkey_uamor_mask &= ~(0x3ul << pkeyshift(i));
 	}
+	initial_allocation_mask = reserved_allocation_mask | (0x1 << 0);

 	if (unlikely((pkeys_total - os_reserved) <= execute_only_key)) {
 		/*
···
 {
 	int pkey_shift;
 	u64 amr;
-
-	if (!pkey)
-		return true;

 	if (!is_pkey_enabled(pkey))
 		return true;
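
The way the two masks are derived in the hunks above can be worked through with concrete numbers. The sketch below is a userspace model under assumptions: `struct sketch_masks` and `sketch_compute_masks()` are hypothetical names, the inputs in the demonstration (32 keys, 5 OS-reserved keys, execute-only key 2) are illustrative only, and `pkeys_total` is assumed not to exceed 32 so the shifts stay within a 32-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the two masks the patch separates. */
struct sketch_masks {
	uint32_t reserved;	/* keys userspace can never allocate */
	uint32_t initial;	/* keys already marked allocated in a fresh mm */
};

/*
 * Models the mask setup: key 1 and the execute-only key are reserved,
 * the top os_reserved keys are reserved, and pkey-0 is added only to
 * the initial allocation mask. Assumes 0 <= os_reserved <= pkeys_total <= 32.
 */
struct sketch_masks sketch_compute_masks(int pkeys_total, int os_reserved,
					 int execute_only_key)
{
	struct sketch_masks m;
	int i;

	m.reserved = (UINT32_C(1) << 1) | (UINT32_C(1) << execute_only_key);

	/* Mark the rest of the keys as reserved and hence unavailable. */
	for (i = pkeys_total - os_reserved; i < pkeys_total; i++)
		m.reserved |= UINT32_C(1) << i;

	m.initial = m.reserved | (UINT32_C(1) << 0);
	return m;
}

/* Small usage demonstration with illustrative inputs. */
void sketch_demo_masks(void)
{
	struct sketch_masks m = sketch_compute_masks(32, 5, 2);

	/* Bits 1, 2 and 27..31 are reserved. */
	assert(m.reserved == UINT32_C(0xF8000006));
	/* pkey-0 is initially allocated but not reserved. */
	assert(m.initial == UINT32_C(0xF8000007));
	assert((m.reserved & 1u) == 0);
}
```

Keeping bit 0 out of the reserved mask is what lets the allocation check treat pkey-0 like an ordinary allocated key while it still starts out allocated in every address space.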