Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kbuild/Makefile: Prepare for using macros in inline assembly code to work around asm() related GCC inlining bugs

Using macros in inline assembly allows us to work around bugs
in GCC's inlining decisions.

Compile macros.S and use it to assemble all C files.
Currently only x86 will use it.

Background:

The inlining pass of GCC doesn't include an assembler, so it's not aware
of basic properties of the generated code, such as its size in bytes,
or that there are such things as discontiguous blocks of code and data
due to the newfangled linker feature called 'sections' ...

Instead GCC uses a lazy and fragile heuristic: it does a linear count of
certain syntactic and whitespace elements in inlined assembly block source
code, such as a count of new-lines and semicolons (!), as a poor substitute
for "code size and complexity".

Unsurprisingly this heuristic falls over and breaks its neck with certain
common types of kernel code that use inline assembly, such as the frequent
practice of putting useful information into alternative sections.

As a result of this fresh, 20+ years old GCC bug, GCC's inlining decisions
are effectively disabled for inlined functions that make use of such asm()
blocks, because GCC thinks those sections of code are "large" - when in
reality they often result in just a very low number of machine
instructions.
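
The counting heuristic described above can be modelled in a few lines of C.
This is a simplified sketch, not GCC's actual source: the function name, the
template strings and the .demo_table section are hypothetical, and the only
assumption is that each newline or semicolon in the asm template counts as
one more "instruction".

```c
#include <stddef.h>

/*
 * Simplified model of the size estimate: one "instruction" per logical
 * line, where a logical line ends at a newline or a semicolon.
 * Hypothetical helper, not taken from GCC.
 */
int estimate_asm_insns(const char *tmpl)
{
    int count;

    if (tmpl == NULL || *tmpl == '\0')
        return 0;

    count = 1;
    for (; *tmpl != '\0'; tmpl++) {
        if (*tmpl == '\n' || *tmpl == ';')
            count++;
    }
    return count;
}

/* A single machine instruction with no bookkeeping. */
const char plain_template[] = "incl %0";

/*
 * The same single instruction, plus a table entry placed in another
 * section. Only the first line produces code at the point of use.
 */
const char annotated_template[] =
    "1:\tincl %0\n"
    "\t.pushsection .demo_table, \"a\"\n"
    "\t.long 1b - .\n"
    "\t.popsection";
```

Under this model plain_template counts as 1 and annotated_template counts as
4, although both leave exactly one machine instruction in the instruction
stream of the calling function.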

This absolute lack of inlining prowess when GCC comes across such asm()
blocks both increases generated kernel code size and causes performance
overhead, which is particularly noticeable on paravirt kernels, which make
frequent use of these inlining facilities in an attempt to stay out of the
way when running on bare-metal hardware.

Instead of fixing the compiler we use a workaround: we define an assembly macro
and call it from the inlined assembly block. As a result GCC considers the
inline assembly block as a single instruction. (Which it often isn't but I digress.)
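
A minimal single-file sketch of the macro trick follows. DEMO_INC_NOTE,
.demo_notes and demo_inc() are hypothetical names, not kernel code. The macro
is defined with a file-scope asm statement only to keep the sketch in one
file, which assumes the compiler emits file-scope asm before the functions
that use it; this commit instead assembles arch/x86/kernel/macros.s together
with every C file through -Wa. The asm path is restricted to GCC on x86-64
ELF targets with AT&T syntax, and other configurations use a plain C fallback.

```c
/*
 * Sketch only: DEMO_INC_NOTE, .demo_notes and demo_inc() are hypothetical.
 * The asm path assumes GCC, x86-64, ELF and AT&T syntax.
 */
#if defined(__GNUC__) && !defined(__clang__) && \
    defined(__x86_64__) && defined(__ELF__)

/*
 * Define the assembler macro once at file scope. Assumption: the compiler
 * writes this out before any function body that invokes the macro.
 */
__asm__(".macro DEMO_INC_NOTE reg\n"
        "\tincl \\reg\n"
        "\t.pushsection .demo_notes, \"a\"\n"
        "\t.long 0\n"
        "\t.popsection\n"
        ".endm\n");

static inline int demo_inc(int v)
{
    /*
     * The template is one logical line with no newline or semicolon,
     * so a line-counting size estimate sees a single instruction.
     */
    __asm__("DEMO_INC_NOTE %0" : "+r"(v) : : "cc");
    return v;
}

#else

/* Fallback for other compilers and targets: same result in plain C. */
static inline int demo_inc(int v)
{
    return v + 1;
}

#endif
```

Calling demo_inc(41) yields 42 on either path. The multi-line body now lives
in the macro definition, so the compiler only sees the one-line invocation
when it estimates the size of demo_inc() for inlining.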

This uglifies and bloats the source code - for example just the refcount
related changes have this impact:

 Makefile                 | 9 +++++++--
 arch/x86/Makefile        | 7 +++++++
 arch/x86/kernel/macros.S | 7 +++++++
 scripts/Kbuild.include   | 4 +++-
 scripts/mod/Makefile     | 2 ++
 5 files changed, 26 insertions(+), 3 deletions(-)

Yay readability and maintainability, it's not like assembly code is hard to read
and maintain ...

We also hope that GCC will eventually get fixed, but we are not holding
our breath for that. Yet we are optimistic, it might still happen, any decade now.

[ mingo: Wrote new changelog describing the background. ]

Tested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nadav Amit <namit@vmware.com>
Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Michal Marek <michal.lkml@markovi.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sam Ravnborg <sam@ravnborg.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/20181003213100.189959-3-namit@vmware.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

Authored by Nadav Amit and committed by Ingo Molnar
77b0bf55 35e76b99

+26 -3
+7 -2
Makefile
···
 # version.h and scripts_basic is processed / created.
 
 # Listed in dependency order
-PHONY += prepare archprepare prepare0 prepare1 prepare2 prepare3
+PHONY += prepare archprepare macroprepare prepare0 prepare1 prepare2 prepare3
 
 # prepare3 is used to check if we are building in a separate output directory,
 # and if so do:
···
 prepare1: prepare2 $(version_h) $(autoksyms_h) include/generated/utsrelease.h
 	$(cmd_crmodverdir)
 
-archprepare: archheaders archscripts prepare1 scripts_basic
+macroprepare: prepare1 archmacros
+
+archprepare: archheaders archscripts macroprepare scripts_basic
 
 prepare0: archprepare gcc-plugins
 	$(Q)$(MAKE) $(build)=.
···
 
 PHONY += archscripts
 archscripts:
+
+PHONY += archmacros
+archmacros:
 
 PHONY += __headers
 __headers: $(version_h) scripts_basic uapi-asm-generic archheaders archscripts
+7
arch/x86/Makefile
···
 archheaders:
 	$(Q)$(MAKE) $(build)=arch/x86/entry/syscalls all
 
+archmacros:
+	$(Q)$(MAKE) $(build)=arch/x86/kernel arch/x86/kernel/macros.s
+
+ASM_MACRO_FLAGS = -Wa,arch/x86/kernel/macros.s -Wa,-
+export ASM_MACRO_FLAGS
+KBUILD_CFLAGS += $(ASM_MACRO_FLAGS)
+
 ###
 # Kernel objects
 
+7
arch/x86/kernel/macros.S
···
+/* SPDX-License-Identifier: GPL-2.0 */
+
+/*
+ * This file includes headers whose assembly part includes macros which are
+ * commonly used. The macros are precompiled into assmebly file which is later
+ * assembled together with each compiled file.
+ */
+3 -1
scripts/Kbuild.include
···
 
 # Do not attempt to build with gcc plugins during cc-option tests.
 # (And this uses delayed resolution so the flags will be up to date.)
-CC_OPTION_CFLAGS = $(filter-out $(GCC_PLUGINS_CFLAGS),$(KBUILD_CFLAGS))
+# In addition, do not include the asm macros which are built later.
+CC_OPTION_FILTERED = $(GCC_PLUGINS_CFLAGS) $(ASM_MACRO_FLAGS)
+CC_OPTION_CFLAGS = $(filter-out $(CC_OPTION_FILTERED),$(KBUILD_CFLAGS))
 
 # cc-option
 # Usage: cflags-y += $(call cc-option,-march=winchip-c6,-march=i586)
+2
scripts/mod/Makefile
···
 hostprogs-y := modpost mk_elfconfig
 always := $(hostprogs-y) empty.o
 
+CFLAGS_REMOVE_empty.o := $(ASM_MACRO_FLAGS)
+
 modpost-objs := modpost.o file2alias.o sumversion.o
 
 devicetable-offsets-file := devicetable-offsets.h