Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

x86/boot: Move compressed kernel to the end of the decompression buffer

This change makes later calculations about where the kernel is located
easier to reason about. To better understand this change, we must first
clarify what 'VO' and 'ZO' are. These values were introduced in commits
by hpa:

77d1a4999502 ("x86, boot: make symbols from the main vmlinux available")
37ba7ab5e33c ("x86, boot: make kernel_alignment adjustable; new bzImage fields")

Specifically:

All names prefixed with 'VO_':

- relate to the uncompressed kernel image

- the size of the VO image is: VO__end - VO__text ("VO_INIT_SIZE" define)

All names prefixed with 'ZO_':

- relate to the bootable compressed kernel image (boot/compressed/vmlinux),
  which is composed of the following memory areas:
    - head text
    - compressed kernel (VO image and relocs table)
    - decompressor code

- the size of the ZO image is: ZO__end - ZO_startup_32 ("ZO_INIT_SIZE" define, though see below)

The 'INIT_SIZE' value is used to find the larger of the two image sizes:

#define ZO_INIT_SIZE (ZO__end - ZO_startup_32 + ZO_z_extract_offset)
#define VO_INIT_SIZE (VO__end - VO__text)

#if ZO_INIT_SIZE > VO_INIT_SIZE
# define INIT_SIZE ZO_INIT_SIZE
#else
# define INIT_SIZE VO_INIT_SIZE
#endif
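
The same selection can be sketched as ordinary C for illustration. The
helper name and its parameters are hypothetical; the real values are
build-time constants taken from the ZO_ and VO_ symbols.

```c
#include <stdint.h>

/*
 * Hypothetical run-time mirror of the preprocessor logic above:
 * the ZO size is padded by the extract offset, then the larger of
 * the two image sizes is chosen.
 */
static uint32_t init_size(uint32_t zo_size, uint32_t zo_extract_offset,
			  uint32_t vo_size)
{
	uint32_t zo_init_size = zo_size + zo_extract_offset;

	return zo_init_size > vo_size ? zo_init_size : vo_size;
}
```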

The current code uses extract_offset to decide where to position the
copied ZO (i.e. ZO starts at extract_offset). (This is why ZO_INIT_SIZE
currently includes the extract_offset.)

Why does z_extract_offset exist? It's needed because we are trying to minimize
the amount of RAM used for the whole act of creating an uncompressed, executable,
properly relocation-linked kernel image in system memory. We do this so that
kernels can be booted on even very small systems.

To achieve the goal of minimal memory consumption we have implemented an in-place
decompression strategy: instead of cleanly separating the VO and ZO images and
also allocating some memory for the decompression code's runtime needs, we instead
create this elaborate layout of memory buffers where the output (decompressed)
stream, as it progresses, overlaps with and destroys the input (compressed)
stream. This can only be done safely if the ZO image is placed at the end of the
VO range, plus a certain amount of safety distance to make sure that when the last
bytes of the VO range are decompressed, the compressed stream pointer is safely
beyond the end of the VO range.

z_extract_offset is calculated in arch/x86/boot/compressed/mkpiggy.c during
the build process, at a point when we know the exact compressed and
uncompressed size of the kernel images and can calculate this safe minimum
offset value. (Note that the mkpiggy.c calculation is not perfect, because
we don't know the decompressor used at that stage, so the z_extract_offset
calculation is necessarily imprecise and is mostly based on gzip internals -
we'll improve that in the next patch.)
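
For orientation, here is a sketch of that gzip-oriented safety margin,
modelled on the heuristic in mkpiggy.c around the time of this patch.
The function name is hypothetical and the constants are reproduced from
memory, so check them against the tree before relying on them.

```c
/*
 * ilen: compressed length, olen: uncompressed length.
 * Returns the minimum offset at which the compressed stream may start
 * so that in-place decompression does not overrun its own input.
 */
static unsigned long extract_offset(unsigned long ilen, unsigned long olen)
{
	unsigned long offs;

	offs = (olen > ilen) ? olen - ilen : 0;
	offs += olen >> 12;		/* 8 bytes for each 32K block */
	offs += 64 * 1024 + 128;	/* 64K + 128 bytes of slack */
	offs = (offs + 4095) & ~4095UL;	/* round up to a 4K boundary */

	return offs;
}
```

The margin grows with the uncompressed size and is always a multiple of
4K, which is why it can only be computed once both lengths are known.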

When INIT_SIZE is bigger than VO_INIT_SIZE (uncommon but possible),
the copied ZO occupies the memory from extract_offset to the end of
the decompression buffer. It overlaps with the soon-to-be-uncompressed
kernel like this:

                              |-----compressed kernel image------|
                              V                                  V
  0                       extract_offset                      +INIT_SIZE
  |-----------|---------------|-------------------------|--------|
              |               |                         |        |
            VO__text      startup_32 of ZO          VO__end    ZO__end
              ^                                         ^
              |-------uncompressed kernel image---------|

When INIT_SIZE is equal to VO_INIT_SIZE (likely), there is still space
left between the end of ZO and the end of the decompression buffer:

                              |-compressed kernel image-|
                              V                         V
  0                       extract_offset                      +INIT_SIZE
  |-----------|---------------|-------------------------|--------|
              |               |                         |        |
            VO__text      startup_32 of ZO          ZO__end    VO__end
              ^                                                  ^
              |------------uncompressed kernel image-------------|

To simplify calculations and avoid special cases, it is cleaner to
always place the compressed kernel image in memory so that ZO__end
is at the end of the decompression buffer, instead of placing it at
the start of extract_offset as is currently done.

This patch adds BP_init_size (which is the INIT_SIZE as passed in from
the boot_params) into asm-offsets.c to make it visible to the assembly
code.

Then, when moving the ZO, it calculates the starting position of
the copied ZO (via BP_init_size and the ZO run size) so that ZO__end
will be at the end of the decompression buffer. To make the position
calculation safe, the end of ZO is page aligned (and a comment is added
to the existing VO alignment for good measure).
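
The position arithmetic can be sketched in C as follows. This mirrors
the "init_size minus _end, plus load address" sequence in the assembly,
so that the copied image ends exactly at the end of the decompression
buffer. The helper names and the 4K PAGE_SIZE are assumptions made for
illustration only.

```c
#include <stdint.h>

#define PAGE_SIZE 4096u	/* assumption: 4K pages */

/* Round a size up to the next page boundary. */
static uint32_t page_align(uint32_t x)
{
	return (x + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);
}

/*
 * Hypothetical helper: where the copied ZO must start so that it ends
 * at base + init_size. zo_size stands in for the page-aligned _end
 * symbol of the ZO image.
 */
static uint32_t zo_start(uint32_t base, uint32_t init_size, uint32_t zo_size)
{
	return base + (init_size - zo_size);
}
```

With a page-aligned zo_size and a page-aligned init_size, the start
address stays page aligned whenever base is.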

Signed-off-by: Yinghai Lu <yinghai@kernel.org>
[ Rewrote changelog and comments. ]
Signed-off-by: Kees Cook <keescook@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Young <dyoung@redhat.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vivek Goyal <vgoyal@redhat.com>
Cc: lasse.collin@tukaani.org
Link: http://lkml.kernel.org/r/1461888548-32439-3-git-send-email-keescook@chromium.org
[ Rewrote the changelog some more. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>

authored by Yinghai Lu and committed by Ingo Molnar
974f221c 6f9af75f

+35 -8
+9 -2
arch/x86/boot/compressed/head_32.S
···
 1:
 
 /* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /* Set up the stack */
 	leal	boot_stack_end(%ebx), %esp
···
 	/* push arguments for extract_kernel: */
 	pushl	$z_run_size	/* size of kernel with .bss and .brk */
 	pushl	$z_output_len	/* decompressed length, end of relocs */
-	leal	z_extract_offset_negative(%ebx), %ebp
+
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	movl	%ebx, %ebp
+	subl	%eax, %ebp
 	pushl	%ebp		/* output address */
+
 	pushl	$z_input_len	/* input_len */
 	leal	input_data(%ebx), %eax
 	pushl	%eax		/* input_data */
+6 -2
arch/x86/boot/compressed/head_64.S
···
 1:
 
 /* Target address to relocate to for decompression */
-	addl	$z_extract_offset, %ebx
+	movl	BP_init_size(%esi), %eax
+	subl	$_end, %eax
+	addl	%eax, %ebx
 
 /*
  * Prepare for entering 64 bit mode
···
 1:
 
 /* Target address to relocate to for decompression */
-	leaq	z_extract_offset(%rbp), %rbx
+	movl	BP_init_size(%rsi), %ebx
+	subl	$_end, %ebx
+	addq	%rbp, %rbx
 
 /* Set up the stack */
 	leaq	boot_stack_end(%rbx), %rsp
+17
arch/x86/boot/compressed/misc.c
···
 	free(phdrs);
 }
 
+/*
+ * The compressed kernel image (ZO), has been moved so that its position
+ * is against the end of the buffer used to hold the uncompressed kernel
+ * image (VO) and the execution environment (.bss, .brk), which makes sure
+ * there is room to do the in-place decompression. (See header.S for the
+ * calculations.)
+ *
+ *                             |-----compressed kernel image------|
+ *                             V                                  V
+ * 0                       extract_offset                      +INIT_SIZE
+ * |-----------|---------------|-------------------------|--------|
+ *             |               |                         |        |
+ *           VO__text      startup_32 of ZO          VO__end    ZO__end
+ *             ^                                         ^
+ *             |-------uncompressed kernel image---------|
+ *
+ */
 asmlinkage __visible void *extract_kernel(void *rmode, memptr heap,
 				  unsigned char *input_data,
 				  unsigned long input_len,
-3
arch/x86/boot/compressed/mkpiggy.c
···
 	printf("z_output_len = %lu\n", (unsigned long)olen);
 	printf(".globl z_extract_offset\n");
 	printf("z_extract_offset = 0x%lx\n", offs);
-	/* z_extract_offset_negative allows simplification of head_32.S */
-	printf(".globl z_extract_offset_negative\n");
-	printf("z_extract_offset_negative = -0x%lx\n", offs);
 	printf(".globl z_run_size\n");
 	printf("z_run_size = %lu\n", run_size);
 
+1
arch/x86/boot/compressed/vmlinux.lds.S
···
 		_epgtable = . ;
 	}
 #endif
+	. = ALIGN(PAGE_SIZE);	/* keep ZO size page aligned */
 	_end = .;
 }
+1
arch/x86/kernel/asm-offsets.c
···
 	OFFSET(BP_hardware_subarch, boot_params, hdr.hardware_subarch);
 	OFFSET(BP_version, boot_params, hdr.version);
 	OFFSET(BP_kernel_alignment, boot_params, hdr.kernel_alignment);
+	OFFSET(BP_init_size, boot_params, hdr.init_size);
 	OFFSET(BP_pref_address, boot_params, hdr.pref_address);
 	OFFSET(BP_code32_start, boot_params, hdr.code32_start);
 
+1 -1
arch/x86/kernel/vmlinux.lds.S
···
 		__brk_limit = .;
 	}
 
-	. = ALIGN(PAGE_SIZE);
+	. = ALIGN(PAGE_SIZE);		/* keep VO_INIT_SIZE page aligned */
 	_end = .;
 
 	STABS_DEBUG