Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

ARM: 7825/1: document the use of NEON in kernel mode

Add a file to Documentation/arm explaining how kernel mode NEON
is supposed to be used.

Reviewed-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>

authored by

Ard Biesheuvel and committed by
Russell King
2afd0a05 83d26d11

+121
+121
Documentation/arm/kernel_mode_neon.txt
··· 1 + Kernel mode NEON 2 + ================ 3 + 4 + TL;DR summary 5 + ------------- 6 + * Use only NEON instructions, or VFP instructions that don't rely on support 7 + code 8 + * Isolate your NEON code in a separate compilation unit, and compile it with 9 + '-mfpu=neon -mfloat-abi=softfp' 10 + * Put kernel_neon_begin() and kernel_neon_end() calls around the calls into your 11 + NEON code 12 + * Don't sleep in your NEON code, and be aware that it will be executed with 13 + preemption disabled 14 + 15 + 16 + Introduction 17 + ------------ 18 + It is possible to use NEON instructions (and in some cases, VFP instructions) in 19 + code that runs in kernel mode. However, for performance reasons, the NEON/VFP 20 + register file is not preserved and restored at every context switch or taken 21 + exception like the normal register file is, so some manual intervention is 22 + required. Furthermore, special care is required for code that may sleep [i.e., 23 + may call schedule()], as NEON or VFP instructions will be executed in a 24 + non-preemptible section for reasons outlined below. 25 + 26 + 27 + Lazy preserve and restore 28 + ------------------------- 29 + The NEON/VFP register file is managed using lazy preserve (on UP systems) and 30 + lazy restore (on both SMP and UP systems). This means that the register file is 31 + kept 'live', and is only preserved and restored when multiple tasks are 32 + contending for the NEON/VFP unit (or, in the SMP case, when a task migrates to 33 + another core). Lazy restore is implemented by disabling the NEON/VFP unit after 34 + every context switch, resulting in a trap when subsequently a NEON/VFP 35 + instruction is issued, allowing the kernel to step in and perform the restore if 36 + necessary. 37 + 38 + Any use of the NEON/VFP unit in kernel mode should not interfere with this, so 39 + it is required to do an 'eager' preserve of the NEON/VFP register file, and 40 + enable the NEON/VFP unit explicitly so no exceptions are generated on first 41 + subsequent use. This is handled by the function kernel_neon_begin(), which 42 + should be called before any kernel mode NEON or VFP instructions are issued. 43 + Likewise, the NEON/VFP unit should be disabled again after use to make sure user 44 + mode will hit the lazy restore trap upon next use. This is handled by the 45 + function kernel_neon_end(). 46 + 47 + 48 + Interruptions in kernel mode 49 + ---------------------------- 50 + For reasons of performance and simplicity, it was decided that there shall be no 51 + preserve/restore mechanism for the kernel mode NEON/VFP register contents. This 52 + implies that interruptions of a kernel mode NEON section can only be allowed if 53 + they are guaranteed not to touch the NEON/VFP registers. For this reason, the 54 + following rules and restrictions apply in the kernel: 55 + * NEON/VFP code is not allowed in interrupt context; 56 + * NEON/VFP code is not allowed to sleep; 57 + * NEON/VFP code is executed with preemption disabled. 58 + 59 + If latency is a concern, it is possible to put back to back calls to 60 + kernel_neon_end() and kernel_neon_begin() in places in your code where none of 61 + the NEON registers are live. (Additional calls to kernel_neon_begin() should be 62 + reasonably cheap if no context switch occurred in the meantime) 63 + 64 + 65 + VFP and support code 66 + -------------------- 67 + Earlier versions of VFP (prior to version 3) rely on software support for things 68 + like IEEE-754 compliant underflow handling etc. When the VFP unit needs such 69 + software assistance, it signals the kernel by raising an undefined instruction 70 + exception. The kernel responds by inspecting the VFP control registers and the 71 + current instruction and arguments, and emulates the instruction in software. 72 + 73 + Such software assistance is currently not implemented for VFP instructions 74 + executed in kernel mode. If such a condition is encountered, the kernel will 75 + fail and generate an OOPS. 76 + 77 + 78 + Separating NEON code from ordinary code 79 + --------------------------------------- 80 + The compiler is not aware of the special significance of kernel_neon_begin() and 81 + kernel_neon_end(), i.e., that it is only allowed to issue NEON/VFP instructions 82 + between calls to these respective functions. Furthermore, GCC may generate NEON 83 + instructions of its own at -O3 level if -mfpu=neon is selected, and even if the 84 + kernel is currently compiled at -O2, future changes may result in NEON/VFP 85 + instructions appearing in unexpected places if no special care is taken. 86 + 87 + Therefore, the recommended and only supported way of using NEON/VFP in the 88 + kernel is by adhering to the following rules: 89 + * isolate the NEON code in a separate compilation unit and compile it with 90 + '-mfpu=neon -mfloat-abi=softfp'; 91 + * issue the calls to kernel_neon_begin(), kernel_neon_end() as well as the calls 92 + into the unit containing the NEON code from a compilation unit which is *not* 93 + built with the GCC flag '-mfpu=neon' set. 94 + 95 + As the kernel is compiled with '-msoft-float', the above will guarantee that 96 + both NEON and VFP instructions will only ever appear in designated compilation 97 + units at any optimization level. 98 + 99 + 100 + NEON assembler 101 + -------------- 102 + NEON assembler is supported with no additional caveats as long as the rules 103 + above are followed. 104 + 105 + 106 + NEON code generated by GCC 107 + -------------------------- 108 + The GCC option -ftree-vectorize (implied by -O3) tries to exploit implicit 109 + parallelism, and generates NEON code from ordinary C source code. This is fully 110 + supported as long as the rules above are followed. 111 + 112 + 113 + NEON intrinsics 114 + --------------- 115 + NEON intrinsics are also supported. However, as code using NEON intrinsics 116 + relies on the GCC header <arm_neon.h>, (which #includes <stdint.h>), you should 117 + observe the following in addition to the rules above: 118 + * Compile the unit containing the NEON intrinsics with '-ffreestanding' so GCC 119 + uses its builtin version of <stdint.h> (this is a C99 header which the kernel 120 + does not supply); 121 + * Include <arm_neon.h> last, or at least after <linux/types.h>