Linux kernel mirror: git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86: Separate out entry text section

Put x86 entry code into a separate link section: .entry.text.

Separating out the entry text section appears to have a performance
benefit, due to more efficient instruction cache usage.

Running hackbench with perf stat --repeat showed that the change
compresses the icache footprint. The L1 icache load-miss count went
down by about 15%:

before patch:
19417627 L1-icache-load-misses ( +- 0.147% )

after patch:
16490788 L1-icache-load-misses ( +- 0.180% )

The motivation for the patch was to fix a particular kprobes
bug related to the entry text section; the performance
advantage was discovered accidentally.

The full perf output follows:

- results for current tip tree:

Performance counter stats for './hackbench/hackbench 10' (500 runs):

19417627 L1-icache-load-misses ( +- 0.147% )
2676914223 instructions # 0.497 IPC ( +- 0.079% )
5389516026 cycles ( +- 0.144% )

0.206267711 seconds time elapsed ( +- 0.138% )

- results for current tip tree with the patch applied:

Performance counter stats for './hackbench/hackbench 10' (500 runs):

16490788 L1-icache-load-misses ( +- 0.180% )
2717734941 instructions # 0.502 IPC ( +- 0.079% )
5414756975 cycles ( +- 0.148% )

0.206747566 seconds time elapsed ( +- 0.137% )

Signed-off-by: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: masami.hiramatsu.pt@hitachi.com
Cc: ananth@in.ibm.com
Cc: davem@davemloft.net
Cc: 2nddept-manager@sdl.hitachi.co.jp
LKML-Reference: <20110307181039.GB15197@jolsa.redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

Authored by Jiri Olsa, committed by Ingo Molnar (ea714547, 86cb2ec7)

18 insertions, 4 deletions

arch/x86/ia32/ia32entry.S (+2)

@@ -25,6 +25,8 @@
 #define sysretl_audit ia32_ret_from_sys_call
 #endif
 
+	.section .entry.text, "ax"
+
 #define IA32_NR_syscalls ((ia32_syscall_end - ia32_sys_call_table)/8)
 
 .macro IA32_ARG_FIXUP noebp=0
arch/x86/kernel/entry_32.S (+4 -2)

@@ -65,6 +65,8 @@
 #define sysexit_audit syscall_exit_work
 #endif
 
+	.section .entry.text, "ax"
+
 /*
  * We use macros for low-level operations which need to be overridden
  * for paravirtualization. The following will never clobber any registers:
@@ -790,7 +788,7 @@
  */
 .section .init.rodata,"a"
 ENTRY(interrupt)
-.text
+.section .entry.text, "ax"
 	.p2align 5
 	.p2align CONFIG_X86_L1_CACHE_SHIFT
 ENTRY(irq_entries_start)
@@ -809,7 +807,7 @@
 	.endif
 	.previous
 	.long 1b
-	.text
+	.section .entry.text, "ax"
 vector=vector+1
 	.endif
 	.endr
arch/x86/kernel/entry_64.S (+4 -2)

@@ -61,6 +61,8 @@
 #define __AUDIT_ARCH_LE 0x40000000
 
 	.code64
+	.section .entry.text, "ax"
+
 #ifdef CONFIG_FUNCTION_TRACER
 #ifdef CONFIG_DYNAMIC_FTRACE
 ENTRY(mcount)
@@ -746,7 +744,7 @@
  */
 .section .init.rodata,"a"
 ENTRY(interrupt)
-.text
+.section .entry.text
 	.p2align 5
 	.p2align CONFIG_X86_L1_CACHE_SHIFT
 ENTRY(irq_entries_start)
@@ -765,7 +763,7 @@
 	.endif
 	.previous
 	.quad 1b
-	.text
+	.section .entry.text
 vector=vector+1
 	.endif
 	.endr
arch/x86/kernel/vmlinux.lds.S (+1)

@@ -105,6 +105,7 @@
 		SCHED_TEXT
 		LOCK_TEXT
 		KPROBES_TEXT
+		ENTRY_TEXT
 		IRQENTRY_TEXT
 		*(.fixup)
 		*(.gnu.warning)
include/asm-generic/sections.h (+1)

@@ -11,6 +11,7 @@
 extern char _end[];
 extern char __per_cpu_load[], __per_cpu_start[], __per_cpu_end[];
 extern char __kprobes_text_start[], __kprobes_text_end[];
+extern char __entry_text_start[], __entry_text_end[];
 extern char __initdata_begin[], __initdata_end[];
 extern char __start_rodata[], __end_rodata[];
 
include/asm-generic/vmlinux.lds.h (+6)

@@ -424,6 +424,12 @@
 		*(.kprobes.text)					\
 		VMLINUX_SYMBOL(__kprobes_text_end) = .;
 
+#define ENTRY_TEXT							\
+		ALIGN_FUNCTION();					\
+		VMLINUX_SYMBOL(__entry_text_start) = .;			\
+		*(.entry.text)						\
+		VMLINUX_SYMBOL(__entry_text_end) = .;
+
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 #define IRQENTRY_TEXT						\
 		ALIGN_FUNCTION();				\