Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

lib: make memzero_explicit more robust against dead store elimination

In commit 0b053c951829 ("lib: memzero_explicit: use barrier instead
of OPTIMIZER_HIDE_VAR"), we made memzero_explicit() more robust in
case LTO would decide to inline memzero_explicit() and eventually
find out it could be eliminated as a dead store.

While using barrier() works well for the case of gcc, recent efforts
from the LLVMLinux people suggest using llvm as an alternative to gcc,
and there, Stephan found with a simple stand-alone user space example
that llvm could nevertheless optimize and thus eliminate the memset().
A similar issue has been observed in the referenced llvm bug report,
which is regarded as not-a-bug.

Based on some experiments, icc is a bit special on its own: while it
doesn't seem to eliminate the memset() as such, it could replace it
with its own implementation, and then end up with similar findings as
with llvm.

The fix in this patch now works for all three compilers (also tested
with more aggressive optimization levels). Arguably, in the current
kernel tree it's more of a theoretical issue, but imho, it's better
to be pedantic about it.

It's clearly visible with gcc/llvm though, with the code below: had we
used barrier() only here, llvm would have omitted the clearing; not so
with the barrier_data() variant:

static inline void memzero_explicit(void *s, size_t count)
{
	memset(s, 0, count);
	barrier_data(s);
}

int main(void)
{
	char buff[20];

	memzero_explicit(buff, sizeof(buff));
	return 0;
}

$ gcc -O2 test.c
$ gdb a.out
(gdb) disassemble main
Dump of assembler code for function main:
0x0000000000400400 <+0>: lea -0x28(%rsp),%rax
0x0000000000400405 <+5>: movq $0x0,-0x28(%rsp)
0x000000000040040e <+14>: movq $0x0,-0x20(%rsp)
0x0000000000400417 <+23>: movl $0x0,-0x18(%rsp)
0x000000000040041f <+31>: xor %eax,%eax
0x0000000000400421 <+33>: retq
End of assembler dump.

$ clang -O2 test.c
$ gdb a.out
(gdb) disassemble main
Dump of assembler code for function main:
0x00000000004004f0 <+0>: xorps %xmm0,%xmm0
0x00000000004004f3 <+3>: movaps %xmm0,-0x18(%rsp)
0x00000000004004f8 <+8>: movl $0x0,-0x8(%rsp)
0x0000000000400500 <+16>: lea -0x18(%rsp),%rax
0x0000000000400505 <+21>: xor %eax,%eax
0x0000000000400507 <+23>: retq
End of assembler dump.

As gcc, clang, but also icc define __GNUC__, it's sufficient to define
this in compiler-gcc.h only for it to be picked up. For a fallback or
otherwise unsupported compiler, we define it as a plain barrier().
Similarly for ecc, which does not support gcc inline asm.

Reference: https://llvm.org/bugs/show_bug.cgi?id=15495
Reported-by: Stephan Mueller <smueller@chronox.de>
Tested-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Stephan Mueller <smueller@chronox.de>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: mancha security <mancha1@zoho.com>
Cc: Mark Charlebois <charlebm@gmail.com>
Cc: Behan Webster <behanw@converseincode.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Authored by Daniel Borkmann, committed by Herbert Xu (7829fb09 8c98ebd7)

+23 -2 across 4 files:
include/linux/compiler-gcc.h (+15 -1):

--- a/include/linux/compiler-gcc.h
+++ b/include/linux/compiler-gcc.h
@@ -9,10 +9,24 @@
 		     + __GNUC_MINOR__ * 100 \
 		     + __GNUC_PATCHLEVEL__)
 
-
 /* Optimization barrier */
+
 /* The "volatile" is due to gcc bugs */
 #define barrier() __asm__ __volatile__("": : :"memory")
+/*
+ * This version is i.e. to prevent dead stores elimination on @ptr
+ * where gcc and llvm may behave differently when otherwise using
+ * normal barrier(): while gcc behavior gets along with a normal
+ * barrier(), llvm needs an explicit input variable to be assumed
+ * clobbered. The issue is as follows: while the inline asm might
+ * access any memory it wants, the compiler could have fit all of
+ * @ptr into memory registers instead, and since @ptr never escaped
+ * from that, it proofed that the inline asm wasn't touching any of
+ * it. This version works well with both compilers, i.e. we're telling
+ * the compiler that the inline asm absolutely may see the contents
+ * of @ptr. See also: https://llvm.org/bugs/show_bug.cgi?id=15495
+ */
+#define barrier_data(ptr) __asm__ __volatile__("": :"r"(ptr) :"memory")
 
 /*
  * This macro obfuscates arithmetic on a variable address so that gcc
include/linux/compiler-intel.h (+3):

--- a/include/linux/compiler-intel.h
+++ b/include/linux/compiler-intel.h
@@ -13,8 +13,11 @@
 /* Intel ECC compiler doesn't support gcc specific asm stmts.
  * It uses intrinsics to do the equivalent things.
  */
+#undef barrier_data
 #undef RELOC_HIDE
 #undef OPTIMIZER_HIDE_VAR
+
+#define barrier_data(ptr) barrier()
 
 #define RELOC_HIDE(ptr, off)					\
   ({ unsigned long __ptr;					\
include/linux/compiler.h (+4):

--- a/include/linux/compiler.h
+++ b/include/linux/compiler.h
@@ -169,6 +169,10 @@
 # define barrier() __memory_barrier()
 #endif
 
+#ifndef barrier_data
+# define barrier_data(ptr) barrier()
+#endif
+
 /* Unreachable code */
 #ifndef unreachable
 # define unreachable() do { } while (1)
lib/string.c (+1 -1):

--- a/lib/string.c
+++ b/lib/string.c
@@ -607,7 +607,7 @@
 void memzero_explicit(void *s, size_t count)
 {
 	memset(s, 0, count);
-	barrier();
+	barrier_data(s);
 }
 EXPORT_SYMBOL(memzero_explicit);
 