Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

selftests/vm/pkeys: exercise x86 XSAVE init state

On x86, there is a set of instructions used to save and restore register
state collectively known as the XSAVE architecture. There are about a
dozen different features managed with XSAVE. The protection keys
register, PKRU, is one of those features.

The hardware optimizes XSAVE by tracking when the state has not changed
from its initial (init) state. In this case, it can avoid the cost of
writing state to memory (it would usually just be a bunch of 0's).

When the pkey register is 0x0, the hardware may optionally choose to track
the register as being in the init state (optimizing away the writes). AMD
CPUs do this more aggressively than Intel CPUs.

On x86, PKRU is rarely in its (very permissive) init state. Instead, the
value defaults to something very restrictive. It is not surprising that
bugs have popped up in the rare cases when PKRU reaches its init state.
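To make the contrast between the init state and the usual default concrete, here is a minimal sketch of how a PKRU value decodes into per-key rights. It assumes the architectural layout of two bits per key (access-disable in the low bit, write-disable in the high bit) and assumes the Linux default denies access to every key except key 0; the helper names are hypothetical and are not part of this patch.

```c
#include <stdint.h>

#define NR_PKEYS         16
#define PKRU_BITS_PER_KEY 2
#define PKRU_AD          0x1u   /* access-disable bit within a key's field */
#define PKRU_WD          0x2u   /* write-disable bit within a key's field */

/* Hypothetical helper: extract the two rights bits for one key. */
static inline uint32_t pkru_key_bits(uint32_t pkru, int pkey)
{
	return (pkru >> (pkey * PKRU_BITS_PER_KEY)) & (PKRU_AD | PKRU_WD);
}

/*
 * Hypothetical helper: build a restrictive value that sets
 * access-disable for keys 1..15 and leaves key 0 fully open.
 * This is assumed to match the Linux default.
 */
static inline uint32_t pkru_restrictive_default(void)
{
	uint32_t pkru = 0;
	int pkey;

	for (pkey = 1; pkey < NR_PKEYS; pkey++)
		pkru |= PKRU_AD << (pkey * PKRU_BITS_PER_KEY);
	return pkru;
}
```

With this layout, a PKRU of 0x0 (the init state) leaves every key readable and writable, which is why it counts as very permissive.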

Add a protection key selftest which gets the protection keys register into
its init state in a way that should work on Intel and AMD. Then, do a
bunch of pkey register reads to watch for inadvertent changes.
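The "read repeatedly and watch for changes" idea can be sketched independently of the hardware. The example below is an illustration only: it takes a reader callback in place of the selftest's read_pkey_reg(), and the two fake readers are hypothetical stand-ins so the loop can run on any machine.

```c
#include <stdint.h>

typedef uint32_t (*pkey_reg_reader)(void);

/* Count how many reads differ from the value we expect to see. */
static long count_unexpected_reads(pkey_reg_reader rd, uint32_t expected,
				   long iterations)
{
	long mismatches = 0;
	long i;

	for (i = 0; i < iterations; i++)
		if (rd() != expected)
			mismatches++;
	return mismatches;
}

/* Stand-in reader: the register always stays in its init state. */
static uint32_t fake_stable_reader(void)
{
	return 0;
}

/* Stand-in reader: every 1000th read returns a stray value. */
static uint32_t fake_glitchy_reader(void)
{
	static long calls;

	return (++calls % 1000 == 0) ? 0x55555554u : 0;
}
```

In the real selftest the comparison is against a shadow copy of the register kept by the test itself, and a long loop gives the kernel many chances to context switch while PKRU is in its init state.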

This adds "-mxsave" to CFLAGS for all the x86 vm selftests in order to
allow use of the XSAVE instruction __builtin functions. This will make
the builtins available on all of the vm selftests, but is expected to be
harmless.

Link: https://lkml.kernel.org/r/20210611164202.1849B712@viggo.jf.intel.com
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ram Pai <linuxram@us.ibm.com>
Cc: Sandipan Das <sandipan@linux.ibm.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: "Desnes A. Nunes do Rosario" <desnesn@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thiago Jung Bauermann <bauerman@linux.ibm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Michal Suchanek <msuchanek@suse.de>
Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Dave Hansen and committed by Linus Torvalds
d892454b 6039ca25

+76 -2
+2 -2
tools/testing/selftests/vm/Makefile
···
 endef

 ifeq ($(CAN_BUILD_I386),1)
-$(BINARIES_32): CFLAGS += -m32
+$(BINARIES_32): CFLAGS += -m32 -mxsave
 $(BINARIES_32): LDLIBS += -lrt -ldl -lm
 $(BINARIES_32): $(OUTPUT)/%_32: %.c
 	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
···
 endif

 ifeq ($(CAN_BUILD_X86_64),1)
-$(BINARIES_64): CFLAGS += -m64
+$(BINARIES_64): CFLAGS += -m64 -mxsave
 $(BINARIES_64): LDLIBS += -lrt -ldl
 $(BINARIES_64): $(OUTPUT)/%_64: %.c
 	$(CC) $(CFLAGS) $(EXTRA_CFLAGS) $(notdir $^) $(LDLIBS) -o $@
+1
tools/testing/selftests/vm/pkey-x86.h
···

 #define XSTATE_PKEY_BIT	(9)
 #define XSTATE_PKEY		0x200
+#define XSTATE_BV_OFFSET	512

 int pkey_reg_xstate_offset(void)
 {
+73
tools/testing/selftests/vm/protection_keys.c
···
 	}
 }

+void arch_force_pkey_reg_init(void)
+{
+#if defined(__i386__) || defined(__x86_64__) /* arch */
+	u64 *buf;
+
+	/*
+	 * All keys should be allocated and set to allow reads and
+	 * writes, so the register should be all 0.  If not, just
+	 * skip the test.
+	 */
+	if (read_pkey_reg())
+		return;
+
+	/*
+	 * Just allocate an absurd amount of memory rather than
+	 * doing the XSAVE size enumeration dance.
+	 */
+	buf = mmap(NULL, 1*MB, PROT_READ|PROT_WRITE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
+
+	/* These __builtins require compiling with -mxsave */
+
+	/* XSAVE to build a valid buffer: */
+	__builtin_ia32_xsave(buf, XSTATE_PKEY);
+	/* Clear XSTATE_BV[PKRU]: */
+	buf[XSTATE_BV_OFFSET/sizeof(u64)] &= ~XSTATE_PKEY;
+	/* XRSTOR will likely get PKRU back to the init state: */
+	__builtin_ia32_xrstor(buf, XSTATE_PKEY);
+
+	munmap(buf, 1*MB);
+#endif
+}
+
+
+/*
+ * This is mostly useless on ppc for now.  But it will not
+ * hurt anything and should give some better coverage as
+ * a long-running test that continually checks the pkey
+ * register.
+ */
+void test_pkey_init_state(int *ptr, u16 pkey)
+{
+	int err;
+	int allocated_pkeys[NR_PKEYS] = {0};
+	int nr_allocated_pkeys = 0;
+	int i;
+
+	for (i = 0; i < NR_PKEYS; i++) {
+		int new_pkey = alloc_pkey();
+
+		if (new_pkey < 0)
+			continue;
+		allocated_pkeys[nr_allocated_pkeys++] = new_pkey;
+	}
+
+	dprintf3("%s()::%d\n", __func__, __LINE__);
+
+	arch_force_pkey_reg_init();
+
+	/*
+	 * Loop for a bit, hoping to exercise the kernel
+	 * context switch code.
+	 */
+	for (i = 0; i < 1000000; i++)
+		read_pkey_reg();
+
+	for (i = 0; i < nr_allocated_pkeys; i++) {
+		err = sys_pkey_free(allocated_pkeys[i]);
+		pkey_assert(!err);
+		read_pkey_reg(); /* for shadow checking */
+	}
+}
+
 /*
  * pkey 0 is special.  It is allocated by default, so you do not
  * have to call pkey_alloc() to use it first.  Make sure that it
···
 	test_implicit_mprotect_exec_only_memory,
 	test_mprotect_with_pkey_0,
 	test_ptrace_of_child,
+	test_pkey_init_state,
 	test_pkey_syscalls_on_non_allocated_pkey,
 	test_pkey_syscalls_bad_args,
 	test_pkey_alloc_exhaust,
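
The "Clear XSTATE_BV[PKRU]" step in arch_force_pkey_reg_init() is plain index and mask arithmetic, so it can be checked without executing XSAVE at all. The sketch below repeats that arithmetic on an ordinary zeroed buffer. It assumes the standard XSAVE layout, in which the 64-byte header (starting with XSTATE_BV) follows the 512-byte legacy region; the helper names are hypothetical and do not appear in the selftest.

```c
#include <stddef.h>
#include <stdint.h>

#define XSTATE_PKEY_BIT   9
#define XSTATE_PKEY       (1ULL << XSTATE_PKEY_BIT)   /* 0x200 */
#define XSTATE_BV_OFFSET  512   /* byte offset of XSTATE_BV in the buffer */

/* Index of XSTATE_BV when the XSAVE buffer is viewed as an array of u64. */
static inline size_t xstate_bv_index(void)
{
	return XSTATE_BV_OFFSET / sizeof(uint64_t);
}

/* Does the buffer claim to hold non-init PKRU state? */
static inline int xstate_bv_has_pkru(const uint64_t *buf)
{
	return (buf[xstate_bv_index()] & XSTATE_PKEY) != 0;
}

/*
 * Mark PKRU as "init" in the buffer.  An XRSTOR that requests
 * PKRU from such a buffer is what puts the register back into
 * its init state.
 */
static inline void xstate_bv_clear_pkru(uint64_t *buf)
{
	buf[xstate_bv_index()] &= ~XSTATE_PKEY;
}
```

Only the PKRU bit is cleared, so any other feature bits already present in XSTATE_BV are left untouched.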