.. SPDX-License-Identifier: GPL-2.0
.. Copyright (C) 2022, Google LLC.

===============================
Kernel Memory Sanitizer (KMSAN)
===============================

KMSAN is a dynamic error detector aimed at finding uses of uninitialized
values. It is based on compiler instrumentation, and is quite similar to the
userspace `MemorySanitizer tool`_.

Note that KMSAN is not intended for production use, because it drastically
increases the kernel memory footprint and slows the whole system down.

Usage
=====

Building the kernel
-------------------

In order to build a kernel with KMSAN, you will need a fresh Clang (14.0.6+).
Please refer to the `LLVM documentation`_ for instructions on how to build Clang.

Now configure and build the kernel with ``CONFIG_KMSAN`` enabled.

Example report
--------------

Here is an example of a KMSAN report::

  =====================================================
  BUG: KMSAN: uninit-value in test_uninit_kmsan_check_memory+0x1be/0x380 [kmsan_test]
   test_uninit_kmsan_check_memory+0x1be/0x380 mm/kmsan/kmsan_test.c:273
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Uninit was stored to memory at:
   do_uninit_local_array+0xfa/0x110 mm/kmsan/kmsan_test.c:260
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271
   kunit_run_case_internal lib/kunit/test.c:333
   kunit_try_run_case+0x206/0x420 lib/kunit/test.c:374
   kunit_generic_run_threadfn_adapter+0x6d/0xc0 lib/kunit/try-catch.c:28
   kthread+0x721/0x850 kernel/kthread.c:327
   ret_from_fork+0x1f/0x30 ??:?

  Local variable uninit created at:
   do_uninit_local_array+0x4a/0x110 mm/kmsan/kmsan_test.c:256
   test_uninit_kmsan_check_memory+0x1a2/0x380 mm/kmsan/kmsan_test.c:271

  Bytes 4-7 of 8 are uninitialized
  Memory access of size 8 starts at ffff888083fe3da0

  CPU: 0 PID: 6731 Comm: kunit_try_catch Tainted: G B E 5.16.0-rc3+ #104
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
  =====================================================

The report says that the local variable ``uninit`` was created uninitialized in
``do_uninit_local_array()``. The third stack trace corresponds to the place
where this variable was created.

The first stack trace shows where the uninit value was used (in
``test_uninit_kmsan_check_memory()``). The report also shows which bytes of the
local variable were left uninitialized, and the second stack trace shows where
the value was copied to another memory location before use.

A use of an uninitialized value ``v`` is reported by KMSAN in the following cases:

- in a condition, e.g. ``if (v) { ... }``;
- in array indexing or pointer dereferencing, e.g. ``array[v]`` or ``*v``;
- when it is copied to userspace or hardware, e.g. ``copy_to_user(..., &v, ...)``;
- when it is passed as an argument to a function, and
  ``CONFIG_KMSAN_CHECK_PARAM_RETVAL`` is enabled (see below).

The mentioned cases (apart from copying data to userspace or hardware, which is
a security issue) are considered undefined behavior from the C11 Standard point
of view.

Disabling the instrumentation
-----------------------------

A function can be marked with ``__no_kmsan_checks``. Doing so makes KMSAN
ignore uninitialized values in that function and mark its output as initialized.
As a result, the user will not get KMSAN reports related to that function.

Another function attribute supported by KMSAN is ``__no_sanitize_memory``.
Applying this attribute to a function will result in KMSAN not instrumenting
it, which can be helpful if we do not want the compiler to interfere with some
low-level code (e.g. code marked with ``noinstr``, which implicitly adds
``__no_sanitize_memory``).

This, however, comes at a cost: stack allocations from such functions will have
incorrect shadow/origin values, likely leading to false positives. Functions
called from non-instrumented code may also receive incorrect metadata for their
parameters.

As a rule of thumb, avoid using ``__no_sanitize_memory`` explicitly.

It is also possible to disable KMSAN for a single file (e.g. main.o)::

  KMSAN_SANITIZE_main.o := n

or for the whole directory::

  KMSAN_SANITIZE := n

in the Makefile. Think of this as applying ``__no_sanitize_memory`` to every
function in the file or directory. Most users won't need ``KMSAN_SANITIZE``,
unless their code gets broken by KMSAN (e.g. runs at early boot time).

KMSAN checks can also be temporarily disabled for the current task using
``kmsan_disable_current()`` and ``kmsan_enable_current()`` calls. Each
``kmsan_enable_current()`` call must be preceded by a
``kmsan_disable_current()`` call; these call pairs may be nested. Be careful
with these calls: keep the disabled regions short, and prefer other ways to
disable instrumentation where possible.
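
The nesting rule above can be pictured as a per-task counter. The following
userspace sketch is a toy model, not the kernel implementation: ``struct toy_task``
and the ``toy_*`` helpers are hypothetical stand-ins for
``kmsan_disable_current()``/``kmsan_enable_current()``, and the assertion plays
the role of the "must be preceded by" rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-task state; only the nesting counter is modelled. */
struct toy_task {
	unsigned int depth; /* > 0 while KMSAN checks are disabled */
};

static void toy_disable_current(struct toy_task *t)
{
	t->depth++;
}

static void toy_enable_current(struct toy_task *t)
{
	/* Every enable must be preceded by a matching disable. */
	assert(t->depth > 0);
	t->depth--;
}

/* Reports are allowed again only once all nested regions are closed. */
static bool toy_reports_allowed(const struct toy_task *t)
{
	return t->depth == 0;
}
```

With two nested ``toy_disable_current()`` calls, reports stay suppressed until
both matching ``toy_enable_current()`` calls have run.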

Support
=======

In order for KMSAN to work, the kernel must be built with Clang, which so far is
the only compiler that has KMSAN support. The kernel instrumentation pass is
based on the userspace `MemorySanitizer tool`_.

The runtime library only supports x86_64 at the moment.

How KMSAN works
===============

KMSAN shadow memory
-------------------

KMSAN associates a metadata byte (also called shadow byte) with every byte of
kernel memory. A bit in the shadow byte is set if the corresponding bit of the
kernel memory byte is uninitialized. Marking the memory uninitialized (i.e.
setting its shadow bytes to ``0xff``) is called poisoning; marking it
initialized (setting the shadow bytes to ``0x00``) is called unpoisoning.

When a new variable is allocated on the stack, it is poisoned by default by
instrumentation code inserted by the compiler (unless it is a stack variable
that is immediately initialized). Any new heap allocation done without
``__GFP_ZERO`` is also poisoned.
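
A minimal userspace sketch of poisoning and unpoisoning, assuming one shadow byte
per data byte as described above. ``struct toy_buf``, ``toy_alloc()`` and the other
``toy_*`` helpers are hypothetical; the ``zero`` flag stands in for
``__GFP_ZERO``, and the real kernel allocators are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Toy buffer: each data byte has a shadow byte (0xff = uninitialized). */
struct toy_buf {
	unsigned char *data;
	unsigned char *shadow;
	size_t size;
};

static void toy_poison(struct toy_buf *b, size_t off, size_t len)
{
	memset(b->shadow + off, 0xff, len);
}

static void toy_unpoison(struct toy_buf *b, size_t off, size_t len)
{
	memset(b->shadow + off, 0x00, len);
}

/*
 * Model of a heap allocation: poisoned unless zeroing was requested. The
 * payload is zeroed only so that the toy never reads indeterminate bytes;
 * the shadow alone decides whether it counts as initialized.
 */
static bool toy_alloc(struct toy_buf *b, size_t size, bool zero)
{
	b->data = calloc(size, 1);
	b->shadow = malloc(size);
	b->size = size;
	if (!b->data || !b->shadow) {
		free(b->data);
		free(b->shadow);
		return false;
	}
	if (zero)
		toy_unpoison(b, 0, size);
	else
		toy_poison(b, 0, size);
	return true;
}

/* Writing a constant byte unpoisons it. */
static void toy_store_const(struct toy_buf *b, size_t off, unsigned char v)
{
	b->data[off] = v;
	toy_unpoison(b, off, 1);
}

static bool toy_is_initialized(const struct toy_buf *b, size_t off, size_t len)
{
	for (size_t i = 0; i < len; i++)
		if (b->shadow[off + i])
			return false;
	return true;
}

static void toy_free(struct toy_buf *b)
{
	free(b->data);
	free(b->shadow);
}
```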

Compiler instrumentation also tracks the shadow values as they propagate
through the code. When needed, instrumentation code invokes the runtime library
in ``mm/kmsan/`` to persist shadow values.

The shadow value of a basic or compound type is an array of bytes of the same
length. When a constant value is written into memory, that memory is unpoisoned.
When a value is read from memory, its shadow memory is also obtained and
propagated into all the operations which use that value. For every instruction
that takes one or more values, the compiler generates code that calculates the
shadow of the result depending on those values and their shadows.

Example::

  int a = 0xff;  // i.e. 0x000000ff
  int b;
  int c = a | b;

In this case the shadow of ``a`` is ``0``, the shadow of ``b`` is ``0xffffffff``,
and the shadow of ``c`` is ``0xffffff00``. This means that the upper three bytes
of ``c`` are uninitialized, while the lower byte is initialized.
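
The propagation for ``c = a | b`` can be checked with a small sketch. It assumes
the usual bit-exact MemorySanitizer rules for bitwise OR and AND: an initialized
1 fixes an OR result bit, and an initialized 0 fixes an AND result bit.
``toy_or_shadow()`` and ``toy_and_shadow()`` are hypothetical helpers, and the
code the compiler actually emits may be structured differently.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Shadow of (a | b). A result bit is uninitialized unless both input bits
 * are initialized, or one of them is an initialized 1 (which forces the
 * result bit to 1 whatever the other operand holds).
 */
static uint32_t toy_or_shadow(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) | (sa & ~b) | (sb & ~a);
}

/* Shadow of (a & b): dually, an initialized 0 forces the result bit to 0. */
static uint32_t toy_and_shadow(uint32_t a, uint32_t sa, uint32_t b, uint32_t sb)
{
	return (sa & sb) | (sa & b) | (sb & a);
}
```

For the example above, ``toy_or_shadow(0xff, 0, b, 0xffffffff)`` yields
``0xffffff00`` for any value of ``b``, matching the shadow of ``c``.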

Origin tracking
---------------

Every four bytes of kernel memory also have a so-called origin mapped to them.
This origin describes the point in program execution at which the uninitialized
value was created. Every origin is associated with either the full allocation
stack (for heap-allocated memory), or the function containing the uninitialized
variable (for locals).

When an uninitialized variable is allocated on the stack or heap, a new origin
value is created, and that variable's origin is filled with that value. When a
value is read from memory, its origin is also read and kept together with the
shadow. For every instruction that takes one or more values, the origin of the
result is one of the origins corresponding to any of the uninitialized inputs.
If a poisoned value is written into memory, its origin is written to the
corresponding storage as well.

Example::

  int a = 42;
  int b;
  int c = a + b;

In this case the origin of ``b`` is generated upon function entry, and is
stored to the origin of ``c`` right before the addition result is written into
memory.
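
The origin rule for the addition example can be sketched with a value that
carries its own shadow and origin. This is a toy model under two assumptions:
the shadow of a sum is approximated as the OR of the operand shadows (a common
approximation that ignores carries), and when both operands are uninitialized
the origin of the second one is kept, as in the ``combine()`` discussion.
``struct toy_val`` and ``toy_add()`` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* A toy value carrying its payload, shadow bits and origin ID. */
struct toy_val {
	uint32_t v;
	uint32_t shadow; /* set bits are uninitialized */
	uint32_t origin; /* only meaningful when shadow != 0 */
};

/*
 * c = a + b. The shadow is approximated as the OR of the operand shadows
 * (carries are ignored, so this is not bit-exact). The origin comes from an
 * uninitialized operand; if both are uninitialized, b's origin wins.
 */
static struct toy_val toy_add(struct toy_val a, struct toy_val b)
{
	struct toy_val c = { a.v + b.v, a.shadow | b.shadow, 0 };

	if (a.shadow)
		c.origin = a.origin;
	if (b.shadow)
		c.origin = b.origin;
	return c;
}
```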

Several variables may share the same origin address if they are stored in the
same four-byte chunk. In this case, every write of a poisoned value to either
variable updates the origin for all of them. We have to sacrifice precision in
this case, because storing origins for individual bits (and even bytes) would be
too costly.

Example::

  int combine(short a, short b) {
    union ret_t {
      int i;
      short s[2];
    } ret;
    ret.s[0] = a;
    ret.s[1] = b;
    return ret.i;
  }

If ``a`` is initialized and ``b`` is not, the shadow of the result would be
``0xffff0000`` (on little-endian x86_64), and the origin of the result would be
the origin of ``b``. ``ret.s[0]`` would have the same origin, but it will never
be used, because that variable is initialized.

If both function arguments are uninitialized, only the origin of the second
argument is preserved.
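
The ``combine()`` example can be replayed on a toy 4-byte chunk with per-byte
shadow and a single shared origin slot. The sketch assumes that the origin slot
is only written when the stored value is at least partially uninitialized,
consistent with origins being written for poisoned values as described above;
``struct toy_chunk`` and ``toy_store16()`` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 4-byte chunk: per-byte shadow, but one origin slot for all 4 bytes. */
struct toy_chunk {
	uint8_t data[4];
	uint8_t shadow[4];
	uint32_t origin;
};

/*
 * Store a 2-byte value at byte offset @off (0 or 2). The shared origin slot
 * is only overwritten when the stored value is (partially) uninitialized.
 */
static void toy_store16(struct toy_chunk *c, int off, uint16_t v,
			uint16_t shadow, uint32_t origin)
{
	assert(off == 0 || off == 2);
	memcpy(&c->data[off], &v, 2);
	memcpy(&c->shadow[off], &shadow, 2);
	if (shadow)
		c->origin = origin;
}
```

After storing an initialized ``a`` and an uninitialized ``b``, the chunk's origin
is that of ``b``. If both are uninitialized, the second store overwrites the
first origin, so only the origin of ``b`` survives.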

Origin chaining
~~~~~~~~~~~~~~~

To ease debugging, KMSAN creates a new origin for every store of an
uninitialized value to memory. The new origin references both its creation stack
and the previous origin the value had. This may cause increased memory
consumption, so we limit the length of origin chains in the runtime.
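
Origin chaining with a length limit might look roughly like this. The sketch is
hypothetical: ``TOY_MAX_ORIGIN_DEPTH`` is an arbitrary value rather than the
runtime's actual limit, plain pointers replace the stack handles used by the
runtime, and the caller provides the storage for each new origin.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_ORIGIN_DEPTH 7 /* hypothetical limit */

struct toy_origin {
	const char *where;             /* stand-in for a stack trace */
	const struct toy_origin *prev; /* previous origin, or NULL */
	unsigned int depth;            /* number of links in the chain */
};

/*
 * Called on every store of an uninitialized value: link a new origin to
 * the previous one, unless the chain is already at the limit, in which
 * case the previous origin is reused unchanged.
 */
static const struct toy_origin *
toy_chain_origin(struct toy_origin *slot, const struct toy_origin *prev,
		 const char *where)
{
	if (prev->depth >= TOY_MAX_ORIGIN_DEPTH)
		return prev;
	slot->where = where;
	slot->prev = prev;
	slot->depth = prev->depth + 1;
	return slot;
}
```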

Clang instrumentation API
-------------------------

The Clang instrumentation pass inserts calls to functions defined in
``mm/kmsan/instrumentation.c`` into the kernel code.

Shadow manipulation
~~~~~~~~~~~~~~~~~~~

For every memory access, the compiler emits a call to a function that returns a
pair of pointers to the shadow and origin addresses of the given memory::

  typedef struct {
    void *shadow, *origin;
  } shadow_origin_ptr_t;

  shadow_origin_ptr_t __msan_metadata_ptr_for_load_{1,2,4,8}(void *addr);
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_{1,2,4,8}(void *addr);
  shadow_origin_ptr_t __msan_metadata_ptr_for_load_n(void *addr, uintptr_t size);
  shadow_origin_ptr_t __msan_metadata_ptr_for_store_n(void *addr, uintptr_t size);

The function name depends on the memory access size: the ``_{1,2,4,8}`` variants
handle fixed-size accesses, while the ``_n`` variants take the size as an
argument.

The compiler makes sure that for every loaded value its shadow and origin
values are read from memory. When a value is stored to memory, its shadow and
origin are also stored using the metadata pointers.
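
To illustrate what the returned pointer pair is used for, here is a userspace
sketch with a tiny tracked memory region. ``toy_metadata_ptr()`` loosely mirrors
``__msan_metadata_ptr_for_load_n()``/``__msan_metadata_ptr_for_store_n()`` for a
single region (it ignores the size), and ``toy_copy4()`` shows the shadow and
origin traffic around a 4-byte copy; both names and the flat layout are
assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MEM_SIZE 64

/* Tracked toy memory: one shadow byte per byte, one origin per 4 bytes. */
static uint8_t toy_mem[TOY_MEM_SIZE];
static uint8_t toy_shadow[TOY_MEM_SIZE];
static uint32_t toy_origin[TOY_MEM_SIZE / 4];

typedef struct {
	void *shadow, *origin;
} toy_shadow_origin_ptr_t;

/* Map an address in toy_mem to its shadow byte and origin slot. */
static toy_shadow_origin_ptr_t toy_metadata_ptr(const void *addr)
{
	uintptr_t off = (uintptr_t)addr - (uintptr_t)toy_mem;
	toy_shadow_origin_ptr_t ret;

	assert(off < TOY_MEM_SIZE); /* only toy_mem is tracked */
	ret.shadow = &toy_shadow[off];
	ret.origin = &toy_origin[off / 4];
	return ret;
}

/* Roughly what instrumentation does around a 4-byte copy "*dst = *src". */
static void toy_copy4(void *dst, const void *src)
{
	toy_shadow_origin_ptr_t s = toy_metadata_ptr(src);
	toy_shadow_origin_ptr_t d = toy_metadata_ptr(dst);
	uint32_t shadow;

	memcpy(&shadow, s.shadow, 4); /* load the value's shadow */
	memcpy(dst, src, 4);          /* the original load and store */
	memcpy(d.shadow, &shadow, 4); /* store the shadow next to it */
	if (shadow)                   /* origins matter only if poisoned */
		*(uint32_t *)d.origin = *(uint32_t *)s.origin;
}
```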

Handling locals
~~~~~~~~~~~~~~~

A special function is used to create a new origin value for a local variable and
set the origin of that variable to that value::

  void __msan_poison_alloca(void *addr, uintptr_t size, char *descr);
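
A toy version of what ``__msan_poison_alloca()`` is described to do: create a new
origin for the local, poison the local's shadow, and point its origin slots at
the new origin. ``toy_poison_alloca()``, the origin table and the ID scheme are
hypothetical; the real runtime stores its origins differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAX_ORIGINS 16

/* Hypothetical origin table: an origin ID indexes a description. */
static const char *toy_origin_descr[TOY_MAX_ORIGINS];
static uint32_t toy_next_origin = 1; /* 0 means "no origin" */

/*
 * Toy counterpart of __msan_poison_alloca(): create a fresh origin for a
 * local variable, poison its shadow and point each of its 4-byte origin
 * slots at the new origin.
 */
static uint32_t toy_poison_alloca(uint8_t *shadow, uint32_t *origins,
				  size_t size, const char *descr)
{
	uint32_t id = toy_next_origin++;

	assert(id < TOY_MAX_ORIGINS);
	toy_origin_descr[id] = descr;
	memset(shadow, 0xff, size);
	for (size_t i = 0; i < (size + 3) / 4; i++)
		origins[i] = id;
	return id;
}
```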

Access to per-task data
~~~~~~~~~~~~~~~~~~~~~~~

At the beginning of every instrumented function, KMSAN inserts a call to
``__msan_get_context_state()``::

  struct kmsan_context_state *__msan_get_context_state(void);

``kmsan_context_state`` is declared in ``include/linux/kmsan.h``::

  struct kmsan_context_state {
    char param_tls[KMSAN_PARAM_SIZE];
    char retval_tls[KMSAN_RETVAL_SIZE];
    char va_arg_tls[KMSAN_PARAM_SIZE];
    char va_arg_origin_tls[KMSAN_PARAM_SIZE];
    u64 va_arg_overflow_size_tls;
    char param_origin_tls[KMSAN_PARAM_SIZE];
    depot_stack_handle_t retval_origin_tls;
  };

This structure is used by KMSAN to pass parameter shadows and origins between
instrumented functions (unless the parameters are checked immediately by
``CONFIG_KMSAN_CHECK_PARAM_RETVAL``).

Passing uninitialized values to functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Clang's MemorySanitizer instrumentation has an option,
``-fsanitize-memory-param-retval``, which makes the compiler check function
parameters passed by value, as well as function return values.

The option is controlled by ``CONFIG_KMSAN_CHECK_PARAM_RETVAL``, which is
enabled by default to let KMSAN report uninitialized values earlier.
Please refer to the `LKML discussion`_ for more details.

Because of the way the checks are implemented in LLVM (they are only applied to
parameters marked as ``noundef``), not all parameters are guaranteed to be
checked, so we cannot give up the metadata storage in ``kmsan_context_state``.
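
The split between eager checks and ``kmsan_context_state`` can be sketched on
the caller side as follows. ``struct toy_context_state`` is a hypothetical,
reduced stand-in for ``kmsan_context_state`` (only ``param_tls``), the
``noundef`` flag models whether LLVM marked the parameter, and the report
counter replaces a real KMSAN report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_PARAM_SLOTS 8

/* Hypothetical, reduced stand-in for kmsan_context_state. */
struct toy_context_state {
	uint64_t param_tls[TOY_PARAM_SLOTS]; /* shadow of each argument */
};

static unsigned int toy_reports; /* stands in for printed reports */

/*
 * Caller side of passing argument @idx. A noundef parameter is checked
 * eagerly at the call site (what -fsanitize-memory-param-retval enables),
 * and the callee then treats it as initialized. Any other parameter has
 * its shadow handed over through param_tls.
 */
static void toy_pass_arg(struct toy_context_state *cs, int idx,
			 uint64_t shadow, bool noundef)
{
	assert(idx >= 0 && idx < TOY_PARAM_SLOTS);
	if (noundef) {
		if (shadow)
			toy_reports++;
		cs->param_tls[idx] = 0;
		return;
	}
	cs->param_tls[idx] = shadow;
}

/* Callee side: read the shadow of argument @idx in the prologue. */
static uint64_t toy_callee_arg_shadow(const struct toy_context_state *cs, int idx)
{
	assert(idx >= 0 && idx < TOY_PARAM_SLOTS);
	return cs->param_tls[idx];
}
```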

String functions
~~~~~~~~~~~~~~~~

The compiler replaces calls to ``memcpy()``/``memmove()``/``memset()`` with the
following functions. These functions are also called when data structures are
initialized or copied, making sure shadow and origin values are copied alongside
the data::

  void *__msan_memcpy(void *dst, void *src, uintptr_t n);
  void *__msan_memmove(void *dst, void *src, uintptr_t n);
  void *__msan_memset(void *dst, int c, uintptr_t n);
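
A sketch of the idea behind ``__msan_memcpy()`` and friends: the data and its
shadow are copied together, while ``memset()`` writes a known value and
therefore unpoisons the destination. Origins are omitted for brevity, and
``toy_memcpy()``/``toy_memmove()``/``toy_memset()`` are hypothetical helpers that
only operate on one tracked buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIZE 32

static uint8_t toy_data[TOY_SIZE];
static uint8_t toy_shadow[TOY_SIZE]; /* 0xff = uninitialized byte */

/* Offset of a pointer into the single tracked buffer. */
static size_t toy_off(const void *p)
{
	uintptr_t off = (uintptr_t)p - (uintptr_t)toy_data;

	assert(off < TOY_SIZE);
	return (size_t)off;
}

/* Copy the bytes together with their shadow. */
static void *toy_memcpy(void *dst, const void *src, size_t n)
{
	memcpy(toy_shadow + toy_off(dst), toy_shadow + toy_off(src), n);
	return memcpy(dst, src, n);
}

/* Same for possibly overlapping regions. */
static void *toy_memmove(void *dst, const void *src, size_t n)
{
	memmove(toy_shadow + toy_off(dst), toy_shadow + toy_off(src), n);
	return memmove(dst, src, n);
}

/* memset() stores a known value, so the destination becomes initialized. */
static void *toy_memset(void *dst, int c, size_t n)
{
	memset(toy_shadow + toy_off(dst), 0, n);
	return memset(dst, c, n);
}
```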

Error reporting
~~~~~~~~~~~~~~~

For each use of a value, the compiler emits a shadow check that calls
``__msan_warning()`` if that value is poisoned::

  void __msan_warning(u32 origin);

``__msan_warning()`` causes the KMSAN runtime to print an error report.
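
Conceptually, the check before a branch on ``v`` behaves like the sketch below.
``toy_warning()`` is a hypothetical stand-in for ``__msan_warning()`` that only
counts and prints, and in real kernels the check is emitted inline by the
compiler rather than through a helper like ``toy_branch_on()``.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

static unsigned int toy_warnings;

/* Stand-in for __msan_warning(); the real one prints a full report. */
static void toy_warning(uint32_t origin)
{
	toy_warnings++;
	fprintf(stderr, "toy: uninit-value, origin %u\n", (unsigned int)origin);
}

/* Roughly what "if (v)" becomes: check the shadow, then branch. */
static int toy_branch_on(uint32_t v, uint32_t shadow, uint32_t origin)
{
	if (shadow)
		toy_warning(origin);
	return v != 0;
}
```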

Inline assembly instrumentation
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

KMSAN instruments every inline assembly output with a call to::

  void __msan_instrument_asm_store(void *addr, uintptr_t size);

This call unpoisons the memory region occupied by the output.

This approach may mask certain errors, but it also helps to avoid a lot of
false positives in bitwise operations, atomics, etc.

Sometimes the pointers passed into inline assembly do not point to valid memory.
In such cases they are ignored at runtime.
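
A toy model of the asm-output handling: the output region is unpoisoned, and
pointers outside the tracked memory are silently ignored.
``toy_instrument_asm_store()`` and the tracked buffer are hypothetical, and
treating a partially out-of-range region as invalid is a choice made for this
sketch only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_SIZE 16

static uint8_t toy_data[TOY_SIZE];
static uint8_t toy_shadow[TOY_SIZE]; /* 0xff = uninitialized byte */

/*
 * Toy counterpart of __msan_instrument_asm_store(): mark an asm output as
 * initialized. Regions that are not fully inside tracked memory are ignored.
 */
static void toy_instrument_asm_store(void *addr, size_t size)
{
	uintptr_t off = (uintptr_t)addr - (uintptr_t)toy_data;

	if (off >= TOY_SIZE || size > TOY_SIZE - off)
		return; /* not valid tracked memory: ignore */
	memset(toy_shadow + off, 0, size);
}
```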

Runtime library
---------------

The code is located in ``mm/kmsan/``.

Per-task KMSAN state
~~~~~~~~~~~~~~~~~~~~

Every task_struct has an associated KMSAN task state that holds the KMSAN
context (see above) and a per-task counter disallowing KMSAN reports::

  struct kmsan_context {
    ...
    unsigned int depth;
    struct kmsan_context_state cstate;
    ...
  };

  struct task_struct {
    ...
    struct kmsan_context kmsan;
    ...
  };

KMSAN contexts
~~~~~~~~~~~~~~

When running in a kernel task context, KMSAN uses ``current->kmsan.cstate`` to
hold the metadata for function parameters and return values.

When the kernel is running in interrupt, softirq or NMI context, where
``current`` is unavailable, KMSAN switches to per-CPU interrupt state::

  DEFINE_PER_CPU(struct kmsan_ctx, kmsan_percpu_ctx);
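
The choice between the two contexts can be sketched like this. ``struct toy_ctx``,
the explicit ``in_task`` flag and the CPU index are hypothetical parameters of
this toy; the kernel determines both from the current execution state.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* Hypothetical context; in the kernel it also holds parameter metadata. */
struct toy_ctx {
	unsigned int depth;
};

struct toy_task {
	struct toy_ctx kmsan; /* per-task context */
};

static struct toy_ctx toy_percpu_ctx[TOY_NR_CPUS]; /* one per CPU */

/*
 * Task context: use the current task's state. Interrupt, softirq or NMI
 * context: "current" must not be used, so fall back to the per-CPU state.
 */
static struct toy_ctx *toy_get_context(struct toy_task *cur, bool in_task, int cpu)
{
	if (in_task)
		return &cur->kmsan;
	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	return &toy_percpu_ctx[cpu];
}
```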

Metadata allocation
~~~~~~~~~~~~~~~~~~~

There are several places in the kernel where the metadata is stored.

1. Each ``struct page`` instance contains two pointers to its shadow and
origin pages::

  struct page {
    ...
    struct page *shadow, *origin;
    ...
  };

At boot time, the kernel allocates shadow and origin pages for every available
kernel page. This is done quite late, when the kernel address space is already
fragmented, so normal data pages may arbitrarily interleave with the metadata
pages.

This means that, in general, for two contiguous memory pages their
shadow/origin pages may not be contiguous. Consequently, if a memory access
crosses the boundary of a memory block, accesses to shadow/origin memory may
potentially corrupt other pages or read incorrect values from them.

In practice, contiguous memory pages returned by the same ``alloc_pages()``
call will have contiguous metadata, whereas if these pages belong to two
different allocations their metadata pages can be fragmented.

For the kernel data (``.data``, ``.bss`` etc.) and percpu memory regions,
there are also no guarantees on metadata contiguity.

If ``__msan_metadata_ptr_for_XXX_YYY()`` hits the border between two pages
with non-contiguous metadata, it returns pointers to fake shadow/origin regions::

  char dummy_load_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));
  char dummy_store_page[PAGE_SIZE] __attribute__((aligned(PAGE_SIZE)));

``dummy_load_page`` is zero-initialized, so reads from it always yield zeroes.
All stores to ``dummy_store_page`` are ignored.
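
The fallback to dummy pages can be modelled with a few tiny pages. In this
sketch, page sizes and counts are arbitrary, ``toy_shadow_index[]`` (a
hypothetical mapping) records which shadow page backs each data page, and only
shadow (not origin) lookups are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 16
#define TOY_NR_PAGES 4

/* Shadow pages, handed out to data pages in no particular order. */
static uint8_t toy_shadow_pages[TOY_NR_PAGES][TOY_PAGE_SIZE];
/* toy_shadow_index[p] is the shadow page backing data page p. */
static size_t toy_shadow_index[TOY_NR_PAGES];

static uint8_t dummy_load_page[TOY_PAGE_SIZE];  /* stays zero: reads look initialized */
static uint8_t dummy_store_page[TOY_PAGE_SIZE]; /* write sink, never read back */

/*
 * Return the shadow address for an access of @size bytes at data offset
 * @off. If the access spans data pages whose shadow pages are not
 * adjacent, fall back to a dummy page instead of touching unrelated
 * shadow memory.
 */
static uint8_t *toy_shadow_ptr(size_t off, size_t size, bool store)
{
	size_t first = off / TOY_PAGE_SIZE;
	size_t last = (off + size - 1) / TOY_PAGE_SIZE;

	assert(size > 0 && size <= TOY_PAGE_SIZE);
	assert(off + size <= TOY_NR_PAGES * TOY_PAGE_SIZE);
	for (size_t p = first; p < last; p++)
		if (toy_shadow_index[p] + 1 != toy_shadow_index[p + 1])
			return store ? dummy_store_page : dummy_load_page;
	return &toy_shadow_pages[toy_shadow_index[first]][off % TOY_PAGE_SIZE];
}
```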

2. For vmalloc memory and modules, there is a direct mapping between the memory
range, its shadow and origin. KMSAN reduces the vmalloc area by 3/4, making only
the first quarter available to ``vmalloc()``. The second quarter of the vmalloc
area contains shadow memory for the first quarter, and the third one holds the
origins. A small part of the fourth quarter contains shadow and origins for the
kernel modules. Please refer to ``arch/x86/include/asm/pgtable_64_types.h`` for
more details.

When an array of pages is mapped into a contiguous virtual memory space, their
shadow and origin pages are similarly mapped into contiguous regions.
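
The quartering of the vmalloc area boils down to fixed offsets. The sketch below
uses a hypothetical ``struct toy_vmalloc_layout`` with made-up bounds and assumes
that the origin of an address is kept at the 4-byte-aligned address in the third
quarter; see ``arch/x86/include/asm/pgtable_64_types.h`` for the real constants.
The module region in the fourth quarter is not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of the whole vmalloc area. */
struct toy_vmalloc_layout {
	uintptr_t start; /* start of the area, 4-byte aligned */
	uintptr_t size;  /* size of the area, divisible by 16 */
};

/* Only the first quarter is handed out by vmalloc(). */
static uintptr_t toy_vmalloc_usable_end(const struct toy_vmalloc_layout *l)
{
	return l->start + l->size / 4;
}

/* Second quarter: shadow of the first quarter, at a fixed offset. */
static uintptr_t toy_vmalloc_shadow(const struct toy_vmalloc_layout *l,
				    uintptr_t addr)
{
	assert(addr >= l->start && addr < toy_vmalloc_usable_end(l));
	return addr + l->size / 4;
}

/* Third quarter: origins, one 4-byte slot per aligned 4-byte chunk. */
static uintptr_t toy_vmalloc_origin(const struct toy_vmalloc_layout *l,
				    uintptr_t addr)
{
	assert(addr >= l->start && addr < toy_vmalloc_usable_end(l));
	return (addr & ~(uintptr_t)3) + 2 * (l->size / 4);
}
```

With a 4 MiB area starting at ``0x1000000``, for example, ``vmalloc()`` would hand
out ``0x1000000``-``0x10fffff``, and the shadow of ``0x1000010`` would live at
``0x1100010``.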

References
==========

E. Stepanov, K. Serebryany. `MemorySanitizer: fast detector of uninitialized
memory use in C++
<https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/43308.pdf>`_.
In Proceedings of CGO 2015.

.. _MemorySanitizer tool: https://clang.llvm.org/docs/MemorySanitizer.html
.. _LLVM documentation: https://llvm.org/docs/GettingStarted.html
.. _LKML discussion: https://lore.kernel.org/all/20220614144853.3693273-1-glider@google.com/