Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kbuild: Add Propeller configuration for kernel build

Add the build support for using Clang's Propeller optimizer. Like
AutoFDO, Propeller uses hardware sampling to gather information
about the frequency of execution of different code paths within a
binary. This information is then used to guide the compiler's
optimization decisions, resulting in a more efficient binary.

The support requires Clang from LLVM 19 or later, and the
create_llvm_prof tool
(https://github.com/google/autofdo/releases/tag/v0.30.1). This
commit is limited to x86 platforms that support PMU features
such as LBR on Intel machines and BRS on AMD Zen3.

Here is an example workflow for building an AutoFDO+Propeller
optimized kernel:

1) Build the kernel on the host machine, with AutoFDO and Propeller
build config
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
then
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile>

"<autofdo_profile>" is the profile collected when doing a non-Propeller
AutoFDO build. This step builds a kernel that has the same optimization
level as AutoFDO, plus a metadata section that records basic block
information. This kernel image runs as fast as an AutoFDO optimized
kernel.

2) Install the kernel on test/production machines.

3) Run the load tests. The '-c' option in perf specifies the sample
event period. We suggest using a suitable prime number,
like 500009, for this purpose.
For Intel platforms:
$ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> \
-o <perf_file> -- <loadtest>
For AMD platforms:
The supported systems are Zen3 with BRS and Zen4 with amd_lbr_v2.
# To see if Zen3 supports LBR:
$ cat /proc/cpuinfo | grep " brs"
# To see if Zen4 supports LBR:
$ cat /proc/cpuinfo | grep amd_lbr_v2
# If the flag is present, collect the profile using:
$ perf record --pfm-events RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a \
-N -b -c <count> -o <perf_file> -- <loadtest>

4) (Optional) Download the raw perf file to the host machine.

5) Generate Propeller profile:
$ create_llvm_prof --binary=<vmlinux> --profile=<perf_file> \
--format=propeller --propeller_output_module_name \
--out=<propeller_profile_prefix>_cc_profile.txt \
--propeller_symorder=<propeller_profile_prefix>_ld_profile.txt

"create_llvm_prof" is the profile conversion tool. A prebuilt
binary for Linux is available at
https://github.com/google/autofdo/releases/tag/v0.30.1 (it can also be
built from source).

"<propeller_profile_prefix>" can be something like
"/home/user/dir/any_string".

This command generates a pair of Propeller profiles:
"<propeller_profile_prefix>_cc_profile.txt" and
"<propeller_profile_prefix>_ld_profile.txt".

6) Rebuild the kernel using the AutoFDO and Propeller profile files.
CONFIG_AUTOFDO_CLANG=y
CONFIG_PROPELLER_CLANG=y
and
$ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo_profile> \
CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
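
The prefix convention used in steps 5 and 6 can be sketched in shell as
follows. This is only an illustration and not part of the commit: the
function names propeller_cc_profile, propeller_ld_profile and
propeller_rebuild_cmd are hypothetical, while the "_cc_profile.txt" and
"_ld_profile.txt" suffixes and the make variables come from the workflow
above.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the commit) that derive the two
# Propeller profile names from <propeller_profile_prefix>.

# Compile-time profile name for a given prefix.
propeller_cc_profile() {
	printf '%s_cc_profile.txt\n' "$1"
}

# Link-time (symbol ordering) profile name for a given prefix.
propeller_ld_profile() {
	printf '%s_ld_profile.txt\n' "$1"
}

# Print the step 6 make invocation; fail if either profile is missing.
# $1 = AutoFDO profile, $2 = Propeller profile prefix.
propeller_rebuild_cmd() {
	for f in "$(propeller_cc_profile "$2")" "$(propeller_ld_profile "$2")"; do
		[ -f "$f" ] || { echo "missing profile: $f" >&2; return 1; }
	done
	printf 'make LLVM=1 CLANG_AUTOFDO_PROFILE=%s CLANG_PROPELLER_PROFILE_PREFIX=%s\n' "$1" "$2"
}
```

Checking for both files before the rebuild is a choice made in this
sketch; only the prefix is passed to make, so a missing file would
otherwise surface later in the build.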

Co-developed-by: Han Shen <shenhan@google.com>
Signed-off-by: Han Shen <shenhan@google.com>
Signed-off-by: Rong Xu <xur@google.com>
Suggested-by: Sriraman Tallam <tmsriram@google.com>
Suggested-by: Krzysztof Pszeniczny <kpszeniczny@google.com>
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Tested-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>

Authored by Rong Xu and committed by Masahiro Yamada
d5dc9583 2fd65f7a

+237 -3
+1
Documentation/dev-tools/index.rst
@@ -35,6 +35,7 @@
    checkuapi
    gpio-sloppy-logic-analyzer
    autofdo
+   propeller
 
 
 .. only:: subproject and html
+162
Documentation/dev-tools/propeller.rst
@@ -0,0 +1,162 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================================
+Using Propeller with the Linux kernel
+=====================================
+
+This enables Propeller build support for the kernel when using Clang
+compiler. Propeller is a profile-guided optimization (PGO) method used
+to optimize binary executables. Like AutoFDO, it utilizes hardware
+sampling to gather information about the frequency of execution of
+different code paths within a binary. Unlike AutoFDO, this information
+is then used right before linking phase to optimize (among others)
+block layout within and across functions.
+
+A few important notes about adopting Propeller optimization:
+
+#. Although it can be used as a standalone optimization step, it is
+   strongly recommended to apply Propeller on top of AutoFDO,
+   AutoFDO+ThinLTO or Instrument FDO. The rest of this document
+   assumes this paradigm.
+
+#. Propeller uses another round of profiling on top of
+   AutoFDO/AutoFDO+ThinLTO/iFDO. The whole build process involves
+   "build-afdo - train-afdo - build-propeller - train-propeller -
+   build-optimized".
+
+#. Propeller requires LLVM 19 release or later for Clang/Clang++
+   and the linker(ld.lld).
+
+#. In addition to LLVM toolchain, Propeller requires a profiling
+   conversion tool: https://github.com/google/autofdo with a release
+   after v0.30.1: https://github.com/google/autofdo/releases/tag/v0.30.1.
+
+The Propeller optimization process involves the following steps:
+
+#. Initial building: Build the AutoFDO or AutoFDO+ThinLTO binary as
+   you would normally do, but with a set of compile-time / link-time
+   flags, so that a special metadata section is created within the
+   kernel binary. The special section is only intend to be used by the
+   profiling tool, it is not part of the runtime image, nor does it
+   change kernel run time text sections.
+
+#. Profiling: The above kernel is then run with a representative
+   workload to gather execution frequency data. This data is collected
+   using hardware sampling, via perf. Propeller is most effective on
+   platforms supporting advanced PMU features like LBR on Intel
+   machines. This step is the same as profiling the kernel for AutoFDO
+   (the exact perf parameters can be different).
+
+#. Propeller profile generation: Perf output file is converted to a
+   pair of Propeller profiles via an offline tool.
+
+#. Optimized build: Build the AutoFDO or AutoFDO+ThinLTO optimized
+   binary as you would normally do, but with a compile-time /
+   link-time flag to pick up the Propeller compile time and link time
+   profiles. This build step uses 3 profiles - the AutoFDO profile,
+   the Propeller compile-time profile and the Propeller link-time
+   profile.
+
+#. Deployment: The optimized kernel binary is deployed and used
+   in production environments, providing improved performance
+   and reduced latency.
+
+Preparation
+===========
+
+Configure the kernel with::
+
+   CONFIG_AUTOFDO_CLANG=y
+   CONFIG_PROPELLER_CLANG=y
+
+Customization
+=============
+
+The default CONFIG_PROPELLER_CLANG setting covers kernel space objects
+for Propeller builds. One can, however, enable or disable Propeller build
+for individual files and directories by adding a line similar to the
+following to the respective kernel Makefile:
+
+- For enabling a single file (e.g. foo.o)::
+
+   PROPELLER_PROFILE_foo.o := y
+
+- For enabling all files in one directory::
+
+   PROPELLER_PROFILE := y
+
+- For disabling one file::
+
+   PROPELLER_PROFILE_foo.o := n
+
+- For disabling all files in one directory::
+
+   PROPELLER__PROFILE := n
+
+
+Workflow
+========
+
+Here is an example workflow for building an AutoFDO+Propeller kernel:
+
+1) Assuming an AutoFDO profile is already collected following
+   instructions in the AutoFDO document, build the kernel on the host
+   machine, with AutoFDO and Propeller build configs ::
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+   and ::
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<autofdo-profile-name>
+
+2) Install the kernel on the test machine.
+
+3) Run the load tests. The '-c' option in perf specifies the sample
+   event period. We suggest using a suitable prime number, like 500009,
+   for this purpose.
+
+   - For Intel platforms::
+
+      $ perf record -e BR_INST_RETIRED.NEAR_TAKEN:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+   - For AMD platforms::
+
+      $ perf record --pfm-event RETIRED_TAKEN_BRANCH_INSTRUCTIONS:k -a -N -b -c <count> -o <perf_file> -- <loadtest>
+
+   Note you can repeat the above steps to collect multiple <perf_file>s.
+
+4) (Optional) Download the raw perf file(s) to the host machine.
+
+5) Use the create_llvm_prof tool (https://github.com/google/autofdo) to
+   generate Propeller profile. ::
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=<perf_file>
+                         --format=propeller --propeller_output_module_name
+                         --out=<propeller_profile_prefix>_cc_profile.txt
+                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+   "<propeller_profile_prefix>" can be something like "/home/user/dir/any_string".
+
+   This command generates a pair of Propeller profiles:
+   "<propeller_profile_prefix>_cc_profile.txt" and
+   "<propeller_profile_prefix>_ld_profile.txt".
+
+   If there are more than 1 perf_file collected in the previous step,
+   you can create a temp list file "<perf_file_list>" with each line
+   containing one perf file name and run::
+
+      $ create_llvm_prof --binary=<vmlinux> --profile=@<perf_file_list>
+                         --format=propeller --propeller_output_module_name
+                         --out=<propeller_profile_prefix>_cc_profile.txt
+                         --propeller_symorder=<propeller_profile_prefix>_ld_profile.txt
+
+6) Rebuild the kernel using the AutoFDO and Propeller
+   profiles. ::
+
+      CONFIG_AUTOFDO_CLANG=y
+      CONFIG_PROPELLER_CLANG=y
+
+   and ::
+
+      $ make LLVM=1 CLANG_AUTOFDO_PROFILE=<profile_file> CLANG_PROPELLER_PROFILE_PREFIX=<propeller_profile_prefix>
+7
MAINTAINERS
@@ -18503,6 +18503,13 @@
 F:	include/linux/psi*
 F:	kernel/sched/psi.c
 
+PROPELLER BUILD
+M:	Rong Xu <xur@google.com>
+M:	Han Shen <shenhan@google.com>
+S:	Supported
+F:	Documentation/dev-tools/propeller.rst
+F:	scripts/Makefile.propeller
+
 PRINTK
 M:	Petr Mladek <pmladek@suse.com>
 R:	Steven Rostedt <rostedt@goodmis.org>
+1
Makefile
@@ -1024,6 +1024,7 @@
 include-$(CONFIG_KCOV) += scripts/Makefile.kcov
 include-$(CONFIG_RANDSTRUCT) += scripts/Makefile.randstruct
 include-$(CONFIG_AUTOFDO_CLANG) += scripts/Makefile.autofdo
+include-$(CONFIG_PROPELLER_CLANG) += scripts/Makefile.propeller
 include-$(CONFIG_GCC_PLUGINS) += scripts/Makefile.gcc-plugins
 
 include $(addprefix $(srctree)/, $(include-y))
+19
arch/Kconfig
@@ -831,6 +831,25 @@
 
 	  If unsure, say N.
 
+config ARCH_SUPPORTS_PROPELLER_CLANG
+	bool
+
+config PROPELLER_CLANG
+	bool "Enable Clang's Propeller build"
+	depends on ARCH_SUPPORTS_PROPELLER_CLANG
+	depends on CC_IS_CLANG && CLANG_VERSION >= 190000
+	help
+	  This option enables Clang’s Propeller build. When the Propeller
+	  profiles is specified in variable CLANG_PROPELLER_PROFILE_PREFIX
+	  during the build process, Clang uses the profiles to optimize
+	  the kernel.
+
+	  If no profile is specified, Propeller options are still passed
+	  to Clang to facilitate the collection of perf data for creating
+	  the Propeller profiles in subsequent builds.
+
+	  If unsure, say N.
+
 config ARCH_SUPPORTS_CFI_CLANG
 	bool
 	help
+1
arch/x86/Kconfig
@@ -127,6 +127,7 @@
 	select ARCH_SUPPORTS_LTO_CLANG_THIN
 	select ARCH_SUPPORTS_RT
 	select ARCH_SUPPORTS_AUTOFDO_CLANG
+	select ARCH_SUPPORTS_PROPELLER_CLANG if X86_64
 	select ARCH_USE_BUILTIN_BSWAP
 	select ARCH_USE_CMPXCHG_LOCKREF if X86_CMPXCHG64
 	select ARCH_USE_MEMTEST
+4
arch/x86/kernel/vmlinux.lds.S
@@ -443,6 +443,10 @@
 
 	STABS_DEBUG
 	DWARF_DEBUG
+#ifdef CONFIG_PROPELLER_CLANG
+	.llvm_bb_addr_map : { *(.llvm_bb_addr_map) }
+#endif
+
 	ELF_DETAILS
 
 	DISCARDS
+3 -3
include/asm-generic/vmlinux.lds.h
@@ -95,14 +95,14 @@
  * With LTO_CLANG, the linker also splits sections by default, so we need
  * these macros to combine the sections during the final link.
  *
- * With AUTOFDO_CLANG, by default, the linker splits text sections and
- * regroups functions into subsections.
+ * With AUTOFDO_CLANG and PROPELLER_CLANG, by default, the linker splits
+ * text sections and regroups functions into subsections.
  *
  * RODATA_MAIN is not used because existing code already defines .rodata.x
  * sections to be brought in with rodata.
  */
 #if defined(CONFIG_LD_DEAD_CODE_DATA_ELIMINATION) || defined(CONFIG_LTO_CLANG) || \
-	defined(CONFIG_AUTOFDO_CLANG)
+	defined(CONFIG_AUTOFDO_CLANG) || defined(CONFIG_PROPELLER_CLANG)
 #define TEXT_MAIN .text .text.[0-9a-zA-Z_]*
 #else
 #define TEXT_MAIN .text
+10
scripts/Makefile.lib
@@ -201,6 +201,16 @@
 	$(CFLAGS_AUTOFDO_CLANG))
 endif
 
+#
+# Enable Propeller build flags except some files or directories we don't want to
+# enable (depends on variables AUTOFDO_PROPELLER_obj.o and PROPELLER_PROFILE).
+#
+ifdef CONFIG_PROPELLER_CLANG
+_c_flags += $(if $(patsubst n%,, \
+	$(AUTOFDO_PROFILE_$(target-stem).o)$(AUTOFDO_PROFILE)$(PROPELLER_PROFILE))$(is-kernel-object), \
+	$(CFLAGS_PROPELLER_CLANG))
+endif
+
 # $(src) for including checkin headers from generated source files
 # $(obj) for including generated headers from checkin source files
 ifeq ($(KBUILD_EXTMOD),)
+28
scripts/Makefile.propeller
@@ -0,0 +1,28 @@
+# SPDX-License-Identifier: GPL-2.0
+
+# Enable available and selected Clang Propeller features.
+ifdef CLANG_PROPELLER_PROFILE_PREFIX
+  CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=list=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt -ffunction-sections
+  KBUILD_LDFLAGS += --symbol-ordering-file=$(CLANG_PROPELLER_PROFILE_PREFIX)_ld_profile.txt --no-warn-symbol-ordering
+else
+  CFLAGS_PROPELLER_CLANG := -fbasic-block-sections=labels
+endif
+
+# Propeller requires debug information to embed module names in the profiles.
+# If CONFIG_DEBUG_INFO is not enabled, set -gmlt option. Skip this for AutoFDO,
+# as the option should already be set.
+ifndef CONFIG_DEBUG_INFO
+  ifndef CONFIG_AUTOFDO_CLANG
+    CFLAGS_PROPELLER_CLANG += -gmlt
+  endif
+endif
+
+ifdef CONFIG_LTO_CLANG_THIN
+  ifdef CLANG_PROPELLER_PROFILE_PREFIX
+    KBUILD_LDFLAGS += --lto-basic-block-sections=$(CLANG_PROPELLER_PROFILE_PREFIX)_cc_profile.txt
+  else
+    KBUILD_LDFLAGS += --lto-basic-block-sections=labels
+  endif
+endif
+
+export CFLAGS_PROPELLER_CLANG
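
The compiler flag selection in scripts/Makefile.propeller can be
paraphrased in shell to make the two build modes easier to compare. This
is a sketch for illustration, not part of the commit: the function name
propeller_cflags and its positional arguments are hypothetical, and the
linker flags are left out.

```shell
#!/bin/sh
# Hypothetical shell paraphrase of the CFLAGS_PROPELLER_CLANG logic in
# scripts/Makefile.propeller; the real logic lives in that Makefile.
# $1 = profile prefix (empty when no profile is given)
# $2 = y if CONFIG_DEBUG_INFO is enabled
# $3 = y if CONFIG_AUTOFDO_CLANG is enabled
propeller_cflags() {
	if [ -n "$1" ]; then
		# Optimized build: consume the compile-time profile.
		flags="-fbasic-block-sections=list=${1}_cc_profile.txt -ffunction-sections"
	else
		# Profiling build: only emit basic block metadata.
		flags="-fbasic-block-sections=labels"
	fi
	# -gmlt is added only when neither option already provides debug info.
	if [ "$2" != y ] && [ "$3" != y ]; then
		flags="$flags -gmlt"
	fi
	printf '%s\n' "$flags"
}
```

For example, `propeller_cflags "" n n` corresponds to the first build of
the workflow without AutoFDO or debug info, and
`propeller_cflags /home/user/dir/any_string n y` to the final rebuild.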
+1
tools/objtool/check.c
@@ -4558,6 +4558,7 @@
 		    !strcmp(sec->name, "__mcount_loc") ||
 		    !strcmp(sec->name, ".kcfi_traps") ||
 		    !strcmp(sec->name, ".llvm.call-graph-profile") ||
+		    !strcmp(sec->name, ".llvm_bb_addr_map") ||
 		    strstr(sec->name, "__patchable_function_entries"))
 			continue;
 