x86/entry/64/compat: Preserve r8-r11 in int $0x80

32-bit user code that uses int $0x80 doesn't care about r8-r11. There is,
however, some 64-bit user code that intentionally uses int $0x80 to invoke
32-bit system calls. From what I've seen, basically all such code assumes
that r8-r15 are all preserved, but the kernel clobbers r8-r11. Since I
doubt that there's any code that depends on int $0x80 zeroing r8-r11,
change the kernel to preserve them.

I suspect that very little user code is broken by the old clobber, since
r8-r11 are only rarely allocated by gcc, and they're clobbered by function
calls, so the only way we'd see a problem is if the same function that
invokes int $0x80 also spills something important to one of these
registers.

The current behavior seems to date back to the historical commit
"[PATCH] x86-64 merge for 2.6.4". Before that, all regs were
preserved. I can't find any explanation of why this change was made.

Update the test_syscall_vdso_32 testcase as well to verify the new
behavior, and strengthen the test to make sure that the kernel doesn't
accidentally permute r8..r15.

Suggested-by: Denys Vlasenko <dvlasenk@redhat.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Link: https://lkml.kernel.org/r/d4c4d9985fbe64f8c9e19291886453914b48caee.1523975710.git.luto@kernel.org

authored by Andy Lutomirski and committed by Thomas Gleixner 8bb2610b 316d097c

Changed files
+25 -18
+4 -4
arch/x86/entry/entry_64_compat.S
···
 	pushq	%rdx			/* pt_regs->dx */
 	pushq	%rcx			/* pt_regs->cx */
 	pushq	$-ENOSYS		/* pt_regs->ax */
-	pushq	$0			/* pt_regs->r8  = 0 */
+	pushq	%r8			/* pt_regs->r8 */
 	xorl	%r8d, %r8d		/* nospec r8 */
-	pushq	$0			/* pt_regs->r9  = 0 */
+	pushq	%r9			/* pt_regs->r9 */
 	xorl	%r9d, %r9d		/* nospec r9 */
-	pushq	$0			/* pt_regs->r10 = 0 */
+	pushq	%r10			/* pt_regs->r10 */
 	xorl	%r10d, %r10d		/* nospec r10 */
-	pushq	$0			/* pt_regs->r11 = 0 */
+	pushq	%r11			/* pt_regs->r11 */
 	xorl	%r11d, %r11d		/* nospec r11 */
 	pushq	%rbx			/* pt_regs->rbx */
 	xorl	%ebx, %ebx		/* nospec rbx */
+21 -14
tools/testing/selftests/x86/test_syscall_vdso.c
···
 	" shl $32, %r8\n"
 	" orq $0x7f7f7f7f, %r8\n"
 	" movq %r8, %r9\n"
-	" movq %r8, %r10\n"
-	" movq %r8, %r11\n"
-	" movq %r8, %r12\n"
-	" movq %r8, %r13\n"
-	" movq %r8, %r14\n"
-	" movq %r8, %r15\n"
+	" incq %r9\n"
+	" movq %r9, %r10\n"
+	" incq %r10\n"
+	" movq %r10, %r11\n"
+	" incq %r11\n"
+	" movq %r11, %r12\n"
+	" incq %r12\n"
+	" movq %r12, %r13\n"
+	" incq %r13\n"
+	" movq %r13, %r14\n"
+	" incq %r14\n"
+	" movq %r14, %r15\n"
+	" incq %r15\n"
 	" ret\n"
 	" .code32\n"
 	" .popsection\n"
···
 	int err = 0;
 	int num = 8;
 	uint64_t *r64 = &regs64.r8;
+	uint64_t expected = 0x7f7f7f7f7f7f7f7fULL;
 
 	if (!kernel_is_64bit)
 		return 0;
 
 	do {
-		if (*r64 == 0x7f7f7f7f7f7f7f7fULL)
+		if (*r64 == expected++)
 			continue; /* register did not change */
 		if (syscall_addr != (long)&int80) {
 			/*
···
 				continue;
 			}
 		} else {
-			/* INT80 syscall entrypoint can be used by
+			/*
+			 * INT80 syscall entrypoint can be used by
 			 * 64-bit programs too, unlike SYSCALL/SYSENTER.
 			 * Therefore it must preserve R12+
 			 * (they are callee-saved registers in 64-bit C ABI).
 			 *
-			 * This was probably historically not intended,
-			 * but R8..11 are clobbered (cleared to 0).
-			 * IOW: they are the only registers which aren't
-			 * preserved across INT80 syscall.
+			 * Starting in Linux 4.17 (and any kernel that
+			 * backports the change), R8..11 are preserved.
+			 * Historically (and probably unintentionally), they
+			 * were clobbered or zeroed.
 			 */
-			if (*r64 == 0 && num <= 11)
-				continue;
 		}
 		printf("[FAIL]\tR%d has changed:%016llx\n", num, *r64);
 		err++;