Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

kmsan: add ReST documentation

Add Documentation/dev-tools/kmsan.rst and reference it in the dev-tools
index.

Link: https://lkml.kernel.org/r/20220915150417.722975-7-glider@google.com
Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Andrey Konovalov <andreyknvl@google.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers@google.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Ilya Leoshkevich <iii@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Marco Elver <elver@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Alexander Potapenko and committed by Andrew Morton
93858ae7 2b420aaf

+428
+1
Documentation/dev-tools/index.rst
    kcov
    gcov
    kasan
+   kmsan
    ubsan
    kmemleak
    kcsan
+427
Documentation/dev-tools/kmsan.rst
+.. SPDX-License-Identifier: GPL-2.0
+.. Copyright (C) 2022, Google LLC.
+
+===================================
+The Kernel Memory Sanitizer (KMSAN)
+===================================
+
+KMSAN is a dynamic error detector aimed at finding uses of uninitialized
+values. It is based on compiler instrumentation, and is quite similar to the
+userspace `MemorySanitizer tool`_.
+
+An important note is that KMSAN is not intended for production use, because it
+drastically increases kernel memory footprint and slows the whole system down.
+
+Usage
+=====
+
+Building the kernel
+-------------------
+
+In order to build a kernel with KMSAN you will need a fresh Clang (14.0.6+).
+Please refer to `LLVM documentation`_ for the instructions on how to build Clang.
+
+Now configure and build the kernel with CONFIG_KMSAN enabled.
+
+Example report
+--------------
+
+Here is an example of a KMSAN report::
+
+  =====================================================
+  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
+   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
+   kunit_run_case_internal lib/kunit/test.c:333
+   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+   kthread+0x721/0x850 kernel/kthread.c:327
+   ret_from_fork+0x1f/0x30 ??:?
+
+  Uninit was stored to memory at:
+   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
+   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+   kunit_run_case_internal lib/kunit/test.c:333
+   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
+   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
+   kthread+0x721/0x850 kernel/kthread.c:327
+   ret_from_fork+0x1f/0x30 ??:?
+
+  Local variable uninit created at:
+   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
+   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
+
+  Bytes 4-7 of 8 are uninitialized
+  Memory access of size 8 starts at ffff888083fe3da0
+
+  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
+  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
+  =====================================================
+
+The report says that the local variable ``uninit`` was created uninitialized in
+``do_uninit_local_array()``. The third stack trace corresponds to the place
+where this variable was created.
+
+The first stack trace shows where the uninit value was used (in
+``test_uninit_kmsan_check_memory()``). The tool shows the bytes which were left
+uninitialized in the local variable, as well as the stack where the value was
+copied to another memory location before use.
+
+A use of uninitialized value ``v`` is reported by KMSAN in the following cases:
+ - in a condition, e.g. ``if (v) { ... }``;
+ - in an indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
+ - when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
+ - when it is passed as an argument to a function, and
+   ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).
+
+The mentioned cases (apart from copying data to userspace or hardware, which is
+a security issue) are considered undefined behavior from the C11 Standard point
+of view.
+
+Disabling the instrumentation
+-----------------------------
+
+A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
+ignore uninitialized values in that function and mark its output as initialized.
+As a result, the user will not get KMSAN reports related to that function.
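As a rough illustration of the ``__no_kmsan_checks`` paragraph above, the sketch below shows where the attribute would sit on a function definition. The helper name `sum_bytes` is hypothetical, and the empty `#define` is only a stand-in so the snippet builds outside the kernel tree; in a real KMSAN build the kernel's own compiler headers are expected to provide the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumption: stand-in definition so this sketch compiles in userspace.
 * It expands to nothing here and does not change code generation.
 */
#ifndef __no_kmsan_checks
#define __no_kmsan_checks
#endif

/*
 * Hypothetical helper that sums a buffer which may legitimately contain
 * padding or other never-written bytes. With the attribute applied, KMSAN
 * would ignore uninitialized values inside this function and treat its
 * return value as initialized.
 */
__no_kmsan_checks
static uint32_t sum_bytes(const uint8_t *buf, size_t len)
{
    uint32_t sum = 0;

    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Callers use such a function as usual; the trade-off is that a genuine bug inside it would no longer be reported.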
+
+Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
+Applying this attribute to a function will result in KMSAN not instrumenting
+it, which can be helpful if we do not want the compiler to interfere with some
+low-level code (e.g. that marked with ``noinstr`` which implicitly adds
+``__no_sanitize_memory``).
+
+This however comes at a cost: stack allocations from such functions will have
+incorrect shadow/origin values, likely leading to false positives. Functions
+called from non-instrumented code may also receive incorrect metadata for their
+parameters.
+
+As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.
+
+It is also possible to disable KMSAN for a single file (e.g. main.o)::
+
+  KMSAN_SANITIZE_main.o := n
+
+or for the whole directory::
+
+  KMSAN_SANITIZE := n
+
+in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
+function in the file or directory. Most users won't need KMSAN_SANITIZE, unless
+their code gets broken by KMSAN (e.g. runs at early boot time).
+
+Support
+=======
+
+In order for KMSAN to work the kernel must be built with Clang, which so far is
+the only compiler that has KMSAN support. The kernel instrumentation pass is
+based on the userspace `MemorySanitizer tool`_.
+
+The runtime library only supports x86_64 at the moment.
+
+How KMSAN works
+===============
+
+KMSAN shadow memory
+-------------------
+
+KMSAN associates a metadata byte (also called shadow byte) with every byte of
+kernel memory. A bit in the shadow byte is set iff the corresponding bit of the
+kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
+setting its shadow bytes to ``0xff``) is called poisoning, marking it
+initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.
+
+When a new variable is allocated on the stack, it is poisoned by default by
+instrumentation code inserted by the compiler (unless it is a stack variable
+that is immediately initialized). Any new heap allocation done without
+``__GFP_ZERO`` is also poisoned.
+
+Compiler instrumentation also tracks the shadow values as they are used along
+the code. When needed, instrumentation code invokes the runtime library in
+``mm/kmsan/`` to persist shadow values.
+
+The shadow value of a basic or compound type is an array of bytes of the same
+length. When a constant value is written into memory, that memory is unpoisoned.
+When a value is read from memory, its shadow memory is also obtained and
+propagated into all the operations which use that value. For every instruction
+that takes one or more values the compiler generates code that calculates the
+shadow of the result depending on those values and their shadows.
+
+Example::
+
+  int a = 0xff;  // i.e. 0x000000ff
+  int b;
+  int c = a | b;
+
+In this case the shadow of ``a`` is ``0``, shadow of ``b`` is ``0xffffffff``,
+shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes of
+``c`` are uninitialized, while the lower byte is initialized.
+
+Origin tracking
+---------------
+
+Every four bytes of kernel memory also have a so-called origin mapped to them.
+This origin describes the point in program execution at which the uninitialized
+value was created. Every origin is associated with either the full allocation
+stack (for heap-allocated memory), or the function containing the uninitialized
+variable (for locals).
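The four-byte granularity mentioned above can be pictured as simple index arithmetic. The following is a toy model only: `TOY_ORIGIN_SIZE`, `toy_origin_slot()` and `toy_origin_slots_touched()` are hypothetical names, and the sketch says nothing about how the kernel actually lays out origin storage.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ORIGIN_SIZE 4u  /* one origin slot per four bytes of data */

/* Hypothetical: index of the origin slot covering byte `offset` of a region. */
static size_t toy_origin_slot(size_t offset)
{
    return offset / TOY_ORIGIN_SIZE;
}

/* Hypothetical: number of origin slots touched by an access of `size` bytes. */
static size_t toy_origin_slots_touched(size_t offset, size_t size)
{
    assert(size > 0);
    return toy_origin_slot(offset + size - 1) - toy_origin_slot(offset) + 1;
}
```

In this model, bytes 0 through 3 share slot 0, so two `short` variables packed into the same four bytes would also share one origin, while an unaligned four-byte access at offset 2 touches two slots.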
+
+When an uninitialized variable is allocated on stack or heap, a new origin
+value is created, and that variable's origin is filled with that value. When a
+value is read from memory, its origin is also read and kept together with the
+shadow. For every instruction that takes one or more values, the origin of the
+result is one of the origins corresponding to any of the uninitialized inputs.
+If a poisoned value is written into memory, its origin is written to the
+corresponding storage as well.
+
+Example 1::
+
+  int a = 42;
+  int b;
+  int c = a + b;
+
+In this case the origin of ``b`` is generated upon function entry, and is
+stored to the origin of ``c`` right before the addition result is written into
+memory.
+
+Several variables may share the same origin address, if they are stored in the
+same four-byte chunk. In this case every write to either variable updates the
+origin for all of them. We have to sacrifice precision in this case, because
+storing origins for individual bits (and even bytes) would be too costly.
+
+Example 2::
+
+  int combine(short a, short b) {
+    union ret_t {
+      int i;
+      short s[2];
+    } ret;
+    ret.s[0] = a;
+    ret.s[1] = b;
+    return ret.i;
+  }
+
+If ``a`` is initialized and ``b`` is not, the shadow of the result would be
+0xffff0000, and the origin of the result would be the origin of ``b``.
+``ret.s[0]`` would have the same origin, but it will never be used, because
+that variable is initialized.
+
+If both function arguments are uninitialized, only the origin of the second
+argument is preserved.
+
+Origin chaining
+~~~~~~~~~~~~~~~
+
+To ease debugging, KMSAN creates a new origin for every store of an
+uninitialized value to memory. The new origin references both its creation stack
+and the previous origin the value had. This may cause increased memory
+consumption, so we limit the length of origin chains in the runtime.
+
+Clang instrumentation API
+-------------------------
+
+The Clang instrumentation pass inserts calls to functions defined in
+``mm/kmsan/instrumentation.c`` into the kernel code.
+
+Shadow manipulation
+~~~~~~~~~~~~~~~~~~~
+
+For every memory access the compiler emits a call to a function that returns a
+pair of pointers to the shadow and origin addresses of the given memory::
+
+  typedef struct {
+    void *shadow, *origin;
+  } shadow_origin_ptr_t
+
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size)
+  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size)
+
+The function name depends on the memory access size.
+
+The compiler makes sure that for every loaded value its shadow and origin
+values are read from memory. When a value is stored to memory, its shadow and
+origin are also stored using the metadata pointers.
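To make the load/store behaviour described above more concrete, here is a userspace toy model of an instrumented ``c = a | b``. Everything prefixed with `toy_` is hypothetical and merely stands in for the metadata lookup; origins are left out for brevity. The shadow rule used for bitwise OR is an assumption of this sketch (a result bit counts as initialized when both inputs are initialized, or when one input supplies an initialized 1), not a statement about the code Clang actually emits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy "kernel memory" plus one shadow byte per data byte. */
enum { TOY_MEM_SIZE = 64 };
static uint8_t toy_mem[TOY_MEM_SIZE];
static uint8_t toy_shadow[TOY_MEM_SIZE];

/* Hypothetical stand-in for a metadata lookup: data offset -> shadow address. */
static uint8_t *toy_shadow_for(size_t off, size_t size)
{
    assert(off + size <= TOY_MEM_SIZE);
    return &toy_shadow[off];
}

/* A 32-bit value travelling together with its shadow. */
struct toy_val {
    uint32_t v;
    uint32_t s; /* set bits mark uninitialized bits of v */
};

/* Instrumented load: fetch the value and its shadow side by side. */
static struct toy_val toy_load32(size_t off)
{
    struct toy_val r;

    memcpy(&r.v, &toy_mem[off], sizeof(r.v));
    memcpy(&r.s, toy_shadow_for(off, sizeof(r.s)), sizeof(r.s));
    return r;
}

/* Instrumented store: write the value and persist its shadow. */
static void toy_store32(size_t off, struct toy_val x)
{
    memcpy(&toy_mem[off], &x.v, sizeof(x.v));
    memcpy(toy_shadow_for(off, sizeof(x.s)), &x.s, sizeof(x.s));
}

/* Assumed shadow rule for OR: an initialized 1 on either side defines the bit. */
static struct toy_val toy_or(struct toy_val a, struct toy_val b)
{
    struct toy_val r;

    r.v = a.v | b.v;
    r.s = (a.s & b.s) | (~a.v & b.s) | (a.s & ~b.v);
    return r;
}
```

Storing ``0xff`` with shadow ``0`` for ``a``, an arbitrary value with shadow ``0xffffffff`` for ``b``, and then storing `toy_or()` of the two loaded values reproduces the ``0xffffff00`` shadow from the ``a | b`` example in the shadow memory section.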
+
+Handling locals
+~~~~~~~~~~~~~~~
+
+A special function is used to create a new origin value for a local variable and
+set the origin of that variable to that value::
+
+  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr)
+
+Access to per-task data
+~~~~~~~~~~~~~~~~~~~~~~~
+
+At the beginning of every instrumented function KMSAN inserts a call to
+``__msan_get_context_state()``::
+
+  kmsan_context_state *__msan_get_context_state(void)
+
+``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::
+
+  struct kmsan_context_state {
+    char param_tls[KMSAN_PARAM_SIZE];
+    char retval_tls[KMSAN_RETVAL_SIZE];
+    char va_arg_tls[KMSAN_PARAM_SIZE];
+    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
+    u64 va_arg_overflow_size_tls;
+    char param_origin_tls[KMSAN_PARAM_SIZE];
+    depot_stack_handle_t retval_origin_tls;
+  };
+
+This structure is used by KMSAN to pass parameter shadows and origins between
+instrumented functions (unless the parameters are checked immediately by
+``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).
+
+Passing uninitialized values to functions
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Clang's MemorySanitizer instrumentation has an option,
+``-fsanitize-memory-param-retval``, which makes the compiler check function
+parameters passed by value, as well as function return values.
+
+The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
+enabled by default to let KMSAN report uninitialized values earlier.
+Please refer to the `LKML discussion`_ for more details.
+
+Because of the way the checks are implemented in LLVM (they are only applied to
+parameters marked as ``noundef``), not all parameters are guaranteed to be
+checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
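The role of the context state in carrying parameter and return-value shadows across a call can be sketched in userspace as follows. This is a simplified model under stated assumptions: `struct toy_context_state` is a cut-down, hypothetical stand-in for ``struct kmsan_context_state``, the buffer sizes are arbitrary, origins are omitted, and the functions only imitate what instrumented code might do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes; the kernel defines its own KMSAN_PARAM_SIZE etc. */
enum { TOY_PARAM_SIZE = 64, TOY_RETVAL_SIZE = 64 };

/* Cut-down stand-in for struct kmsan_context_state (shadows only). */
struct toy_context_state {
    char param_tls[TOY_PARAM_SIZE];
    char retval_tls[TOY_RETVAL_SIZE];
};

static struct toy_context_state toy_state;

/*
 * Imitates an instrumented callee: read the shadow of the first argument,
 * compute the result, then publish the shadow of the return value.
 */
static uint32_t toy_callee_low_byte(uint32_t x)
{
    uint32_t x_shadow, ret_shadow;

    memcpy(&x_shadow, toy_state.param_tls, sizeof(x_shadow));
    ret_shadow = x_shadow & 0xffu; /* only the low byte of x reaches the result */
    memcpy(toy_state.retval_tls, &ret_shadow, sizeof(ret_shadow));
    return x & 0xffu;
}

/*
 * Imitates an instrumented caller: publish the argument shadow before the
 * call and pick up the return-value shadow afterwards.
 */
static uint32_t toy_caller(uint32_t x, uint32_t x_shadow, uint32_t *ret_shadow)
{
    uint32_t ret;

    assert(ret_shadow != NULL);
    memcpy(toy_state.param_tls, &x_shadow, sizeof(x_shadow));
    ret = toy_callee_low_byte(x);
    memcpy(ret_shadow, toy_state.retval_tls, sizeof(*ret_shadow));
    return ret;
}
```

In this model an argument whose upper three bytes are poisoned still yields a fully initialized result, because the callee only uses the low byte. Deferring the report in this way is possible only while the shadow travels with the value instead of being checked at the call site.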
+
+String functions
+~~~~~~~~~~~~~~~~
+
+The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
+following functions. These functions are also called when data structures are
+initialized or copied, making sure shadow and origin values are copied alongside
+the data::
+
+  void *__msan_memcpy(void *dst, void *src, uintptr_t n)
+  void *__msan_memmove(void *dst, void *src, uintptr_t n)
+  void *__msan_memset(void *dst, int c, uintptr_t n)
+
+Error reporting
+~~~~~~~~~~~~~~~
+
+For each use of a value the compiler emits a shadow check that calls
+``__msan_warning()`` in the case that value is poisoned::
+
+  void __msan_warning(u32 origin)
+
+``__msan_warning()`` causes KMSAN runtime to print an error report.
+
+Inline assembly instrumentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+KMSAN instruments every inline assembly output with a call to::
+
+  void __msan_instrument_asm_store(void *addr, uintptr_t size)
+
+This call unpoisons the memory region.
+
+This approach may mask certain errors, but it also helps to avoid a lot of
+false positives in bitwise operations, atomics etc.
+
+Sometimes the pointers passed into inline assembly do not point to valid memory.
+In such cases they are ignored at runtime.
+
+
+Runtime library
+---------------
+
+The code is located in ``mm/kmsan/``.
+
+Per-task KMSAN state
+~~~~~~~~~~~~~~~~~~~~
+
+Every task_struct has an associated KMSAN task state that holds the KMSAN
+context (see above) and a per-task flag disallowing KMSAN reports::
+
+  struct kmsan_context {
+    ...
+    bool allow_reporting;
+    struct kmsan_context_state cstate;
+    ...
+  }
+
+  struct task_struct {
+    ...
+    struct kmsan_context kmsan;
+    ...
+  }
+
+KMSAN contexts
+~~~~~~~~~~~~~~
+
+When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
+hold the metadata for function parameters and return values.
+
+But in the case the kernel is running in the interrupt, softirq or NMI context,
+where ``current`` is unavailable, KMSAN switches to per-cpu interrupt state::
+
+  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
+
+Metadata allocation
+~~~~~~~~~~~~~~~~~~~
+
+There are several places in the kernel for which the metadata is stored.
+
+1. Each ``struct page`` instance contains two pointers to its shadow and
+origin pages::
+
+  struct page {
+    ...
+    struct page *shadow, *origin;
+    ...
+  };
+
+At boot-time, the kernel allocates shadow and origin pages for every available
+kernel page. This is done quite late, when the kernel address space is already
+fragmented, so normal data pages may arbitrarily interleave with the metadata
+pages.
+
+This means that in general for two contiguous memory pages their shadow/origin
+pages may not be contiguous. Consequently, if a memory access crosses the
+boundary of a memory block, accesses to shadow/origin memory may potentially
+corrupt other pages or read incorrect values from them.
+
+In practice, contiguous memory pages returned by the same ``alloc_pages()``
+call will have contiguous metadata, whereas if these pages belong to two
+different allocations their metadata pages can be fragmented.
+
+For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions
+there also are no guarantees on metadata contiguity.
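Whether a given access is exposed to the contiguity problem described above comes down to whether it spans a page boundary. A minimal sketch of such a check follows; `toy_crosses_page()` is a hypothetical helper, the 4 KiB page size is an assumption, and the real runtime may decide this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Hypothetical check: does [addr, addr + size) span more than one page? */
static bool toy_crosses_page(uintptr_t addr, size_t size)
{
    assert(size > 0);
    return (addr / TOY_PAGE_SIZE) != ((addr + size - 1) / TOY_PAGE_SIZE);
}
```

An eight-byte access starting four bytes before the end of a page crosses into the next page, so its metadata could live in two unrelated places; the same access starting eight bytes before the end does not.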
+
+In the case ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two
+pages with non-contiguous metadata, it returns pointers to fake shadow/origin regions::
+
+  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
+
+``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
+All stores to ``dummy_store_page`` are ignored.
+
+2. For vmalloc memory and modules, there is a direct mapping between the memory
+range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
+the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
+area contains shadow memory for the first quarter, the third one holds the
+origins. A small part of the fourth quarter contains shadow and origins for the
+kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
+more details.
+
+When an array of pages is mapped into a contiguous virtual memory space, their
+shadow and origin pages are similarly mapped into contiguous regions.
+
+References
+==========
+
+E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
+memory use in C++
+<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
+In Proceedings of CGO 2015.
+
+.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
+.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
+.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/