Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Randomized slab caches for kmalloc()

When exploiting memory vulnerabilities, "heap spraying" is a common
technique that targets flaws related to dynamic memory allocation
(i.e. the "heap"), and it often plays an important role in a
successful exploit. The basic idea is to reclaim the memory area of a
vulnerable object by triggering allocations in other subsystems or
modules, thereby obtaining a reference to the targeted memory
location. It is applicable to various types of vulnerability,
including use-after-free (UAF) and heap out-of-bounds write.

There are (at least) two reasons why the heap can be sprayed: 1) generic
slab caches are shared among different subsystems and modules, and
2) dedicated slab caches could be merged with the generic ones.
Currently these two factors cannot be prevented at a low cost: the first
one is a widely used memory allocation mechanism, and shutting down slab
merging completely via `slub_nomerge` would be overkill.
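For reference, that coarse-grained mitigation is applied via the boot
parameter documented in Documentation/admin-guide/kernel-parameters.txt,
which disables merging for every slab cache:

```
# appended to the kernel command line, e.g. in the bootloader config
slub_nomerge
```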

To efficiently prevent heap spraying, we propose the following
approach: create multiple copies of the generic slab caches that will
never be merged, and have a random one of them used at allocation. The
random selection is based on the address of the code that calls
kmalloc(), which means it is static at runtime (rather than determined
anew at each allocation, which could be bypassed by repeatedly
spraying in brute force). In other words, the randomness of the cache
selection is with respect to the code address rather than time:
allocations in different code paths will most likely pick different
caches, although kmalloc() at any given call site uses the same cache
copy every time it executes. This way, the vulnerable object and
memory allocated in other subsystems and modules will (most probably)
live in different slab caches, which prevents the object from being
sprayed over.

Meanwhile, the static random selection is further hardened with a
per-boot random seed, which prevents an attacker from finding, by
analyzing the open source code, a usable kmalloc call site that
happens to pick the same cache as the vulnerable subsystem/module. In
other words, with the per-boot seed the selection stays static while
the system is up, but differs across reboots.

The performance overhead has been measured on a 40-core x86 server by
comparing the results of `perf bench all` between kernels with and
without this patch, both based on the latest linux-next, which shows
only a minor difference. A subset of the benchmarks is listed below:

                 sched/    sched/  syscall/       mem/       mem/
              messaging      pipe     basic     memcpy     memset
                  (sec)     (sec)     (sec)   (GB/sec)   (GB/sec)

control1          0.019     5.459     0.733  15.258789  51.398026
control2          0.019     5.439     0.730  16.009221  48.828125
control3          0.019     5.282     0.735  16.009221  48.828125
control_avg       0.019     5.393     0.733  15.759077  49.684759

experiment1       0.019     5.374     0.741  15.500992  46.502976
experiment2       0.019     5.440     0.746  16.276042  51.398026
experiment3       0.019     5.242     0.752  15.258789  51.398026
experiment_avg    0.019     5.352     0.746  15.678608  49.766343

The memory overhead was measured by running `free` after boot on a
QEMU VM with 1GB of total memory, and as expected, it is positively
correlated with the number of cache copies:

            control   4 copies   8 copies  16 copies

total        969.8M     968.2M     968.2M     968.2M
used          20.0M      21.9M      24.1M      26.7M
free         936.9M     933.6M     931.4M     928.6M
available    932.2M     928.8M     926.6M     923.9M

Co-developed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: GONG, Ruiqi <gongruiqi@huaweicloud.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Acked-by: Dennis Zhou <dennis@kernel.org> # percpu
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

authored by GONG, Ruiqi and committed by Vlastimil Babka
3c615294 06c2afb8

+97 -15
+9 -3
include/linux/percpu.h
@@
 #define PCPU_BITMAP_BLOCK_BITS		(PCPU_BITMAP_BLOCK_SIZE >>	\
 					 PCPU_MIN_ALLOC_SHIFT)
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define PERCPU_DYNAMIC_SIZE_SHIFT      12
+#else
+#define PERCPU_DYNAMIC_SIZE_SHIFT      10
+#endif
+
 /*
  * Percpu allocator can serve percpu allocations before slab is
  * initialized which allows slab to depend on the percpu allocator.
@@
  * for this.  Keep PERCPU_DYNAMIC_RESERVE equal to or larger than
  * PERCPU_DYNAMIC_EARLY_SIZE.
  */
-#define PERCPU_DYNAMIC_EARLY_SIZE	(20 << 10)
+#define PERCPU_DYNAMIC_EARLY_SIZE	(20 << PERCPU_DYNAMIC_SIZE_SHIFT)
 
 /*
  * PERCPU_DYNAMIC_RESERVE indicates the amount of free area to piggy
@@
  * intelligent way to determine this would be nice.
  */
 #if BITS_PER_LONG > 32
-#define PERCPU_DYNAMIC_RESERVE		(28 << 10)
+#define PERCPU_DYNAMIC_RESERVE		(28 << PERCPU_DYNAMIC_SIZE_SHIFT)
 #else
-#define PERCPU_DYNAMIC_RESERVE		(20 << 10)
+#define PERCPU_DYNAMIC_RESERVE		(20 << PERCPU_DYNAMIC_SIZE_SHIFT)
 #endif
 
 extern void *pcpu_base_addr;
+20 -3
include/linux/slab.h
@@
 #include <linux/workqueue.h>
 #include <linux/percpu-refcount.h>
 #include <linux/cleanup.h>
+#include <linux/hash.h>
 
@@
 #define SLAB_OBJ_MIN_SIZE	(KMALLOC_MIN_SIZE < 16 ? \
 				(KMALLOC_MIN_SIZE) : 16)
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define RANDOM_KMALLOC_CACHES_NR	15 // # of cache copies
+#else
+#define RANDOM_KMALLOC_CACHES_NR	0
+#endif
+
 /*
  * Whenever changing this, take care of that kmalloc_type() and
  * create_kmalloc_caches() still work as intended.
@@
 #ifndef CONFIG_MEMCG_KMEM
 	KMALLOC_CGROUP = KMALLOC_NORMAL,
 #endif
+	KMALLOC_RANDOM_START = KMALLOC_NORMAL,
+	KMALLOC_RANDOM_END = KMALLOC_RANDOM_START + RANDOM_KMALLOC_CACHES_NR,
 #ifdef CONFIG_SLUB_TINY
 	KMALLOC_RECLAIM = KMALLOC_NORMAL,
 #else
@@
 	(IS_ENABLED(CONFIG_ZONE_DMA)   ? __GFP_DMA : 0) |	\
 	(IS_ENABLED(CONFIG_MEMCG_KMEM) ? __GFP_ACCOUNT : 0))
 
-static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags)
+extern unsigned long random_kmalloc_seed;
+
+static __always_inline enum kmalloc_cache_type kmalloc_type(gfp_t flags, unsigned long caller)
 {
 	/*
 	 * The most common case is KMALLOC_NORMAL, so test for it
 	 * with a single branch for all the relevant flags.
 	 */
 	if (likely((flags & KMALLOC_NOT_NORMAL_BITS) == 0))
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+		/* RANDOM_KMALLOC_CACHES_NR (=15) copies + the KMALLOC_NORMAL */
+		return KMALLOC_RANDOM_START + hash_64(caller ^ random_kmalloc_seed,
+						      ilog2(RANDOM_KMALLOC_CACHES_NR + 1));
+#else
 		return KMALLOC_NORMAL;
+#endif
 
 	/*
 	 * At least one of the flags has to be set. Their priorities in
@@
 		index = kmalloc_index(size);
 		return kmalloc_trace(
-				kmalloc_caches[kmalloc_type(flags)][index],
+				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, size);
 	}
 	return __kmalloc(size, flags);
@@
 		index = kmalloc_index(size);
 		return kmalloc_node_trace(
-				kmalloc_caches[kmalloc_type(flags)][index],
+				kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
 				flags, node, size);
 	}
 	return __kmalloc_node(size, flags, node);
+17
mm/Kconfig
@@
 	  which requires the taking of locks that may cause latency spikes.
 	  Typically one would choose no for a realtime system.
 
+config RANDOM_KMALLOC_CACHES
+	default n
+	depends on SLUB && !SLUB_TINY
+	bool "Randomize slab caches for normal kmalloc"
+	help
+	  A hardening feature that creates multiple copies of slab caches for
+	  normal kmalloc allocation and makes kmalloc randomly pick one based
+	  on code address, which makes the attackers more difficult to spray
+	  vulnerable memory objects on the heap for the purpose of exploiting
+	  memory vulnerabilities.
+
+	  Currently the number of copies is set to 16, a reasonably large value
+	  that effectively diverges the memory objects allocated for different
+	  subsystems or modules into different caches, at the expense of a
+	  limited degree of memory and CPU overhead that relates to hardware and
+	  system workload.
+
 endmenu # SLAB allocator options
 
 config SHUFFLE_PAGE_ALLOCATOR
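With the Kconfig entry above, the feature would be enabled by a .config
along these lines (SLUB without SLUB_TINY, per the depends clause):

```
CONFIG_SLUB=y
# CONFIG_SLUB_TINY is not set
CONFIG_RANDOM_KMALLOC_CACHES=y
```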
+5 -2
mm/kfence/kfence_test.c
@@
 static inline size_t kmalloc_cache_alignment(size_t size)
 {
-	return kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)]->align;
+	/* just to get ->align so no need to pass in the real caller */
+	enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, 0);
+	return kmalloc_caches[type][__kmalloc_index(size, false)]->align;
 }
 
 /* Must always inline to match stack trace against caller. */
@@
 	if (is_kfence_address(alloc)) {
 		struct slab *slab = virt_to_slab(alloc);
+		enum kmalloc_cache_type type = kmalloc_type(GFP_KERNEL, _RET_IP_);
 		struct kmem_cache *s = test_cache ?:
-			kmalloc_caches[kmalloc_type(GFP_KERNEL)][__kmalloc_index(size, false)];
+			kmalloc_caches[type][__kmalloc_index(size, false)];
 
 		/*
 		 * Verify that various helpers return the right values
+1 -1
mm/slab.c
@@
 		if (freelist_size > KMALLOC_MAX_CACHE_SIZE) {
 			freelist_cache_size = PAGE_SIZE << get_order(freelist_size);
 		} else {
-			freelist_cache = kmalloc_slab(freelist_size, 0u);
+			freelist_cache = kmalloc_slab(freelist_size, 0u, _RET_IP_);
 			if (!freelist_cache)
 				continue;
 			freelist_cache_size = freelist_cache->size;
+1 -1
mm/slab.h
@@
 void create_kmalloc_caches(slab_flags_t);
 
 /* Find the kmalloc slab corresponding for a certain size */
-struct kmem_cache *kmalloc_slab(size_t, gfp_t);
+struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller);
 
 void *__kmem_cache_alloc_node(struct kmem_cache *s, gfp_t gfpflags,
 			      int node, size_t orig_size,
+44 -5
mm/slab_common.c
@@
 { /* initialization for https://bugs.llvm.org/show_bug.cgi?id=42570 */ };
 EXPORT_SYMBOL(kmalloc_caches);
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+unsigned long random_kmalloc_seed __ro_after_init;
+EXPORT_SYMBOL(random_kmalloc_seed);
+#endif
+
 /*
  * Conversion table for small slabs sizes / 8 to the index in the
  * kmalloc array. This is necessary for slabs < 192 since we have non power
@@
  * Find the kmem_cache structure that serves a given size of
  * allocation
  */
-struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
+struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags, unsigned long caller)
 {
 	unsigned int index;
@@
 		index = fls(size - 1);
 	}
 
-	return kmalloc_caches[kmalloc_type(flags)][index];
+	return kmalloc_caches[kmalloc_type(flags, caller)][index];
 }
 
 size_t kmalloc_size_roundup(size_t size)
@@
 	if (size > KMALLOC_MAX_CACHE_SIZE)
 		return PAGE_SIZE << get_order(size);
 
-	/* The flags don't matter since size_index is common to all. */
-	c = kmalloc_slab(size, GFP_KERNEL);
+	/*
+	 * The flags don't matter since size_index is common to all.
+	 * Neither does the caller for just getting ->object_size.
+	 */
+	c = kmalloc_slab(size, GFP_KERNEL, 0);
 	return c ? c->object_size : 0;
 }
 EXPORT_SYMBOL(kmalloc_size_roundup);
@@
 #define KMALLOC_RCL_NAME(sz)
 #endif
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+#define __KMALLOC_RANDOM_CONCAT(a, b) a ## b
+#define KMALLOC_RANDOM_NAME(N, sz) __KMALLOC_RANDOM_CONCAT(KMA_RAND_, N)(sz)
+#define KMA_RAND_1(sz)                  .name[KMALLOC_RANDOM_START +  1] = "kmalloc-rnd-01-" #sz,
+#define KMA_RAND_2(sz)  KMA_RAND_1(sz)  .name[KMALLOC_RANDOM_START +  2] = "kmalloc-rnd-02-" #sz,
+#define KMA_RAND_3(sz)  KMA_RAND_2(sz)  .name[KMALLOC_RANDOM_START +  3] = "kmalloc-rnd-03-" #sz,
+#define KMA_RAND_4(sz)  KMA_RAND_3(sz)  .name[KMALLOC_RANDOM_START +  4] = "kmalloc-rnd-04-" #sz,
+#define KMA_RAND_5(sz)  KMA_RAND_4(sz)  .name[KMALLOC_RANDOM_START +  5] = "kmalloc-rnd-05-" #sz,
+#define KMA_RAND_6(sz)  KMA_RAND_5(sz)  .name[KMALLOC_RANDOM_START +  6] = "kmalloc-rnd-06-" #sz,
+#define KMA_RAND_7(sz)  KMA_RAND_6(sz)  .name[KMALLOC_RANDOM_START +  7] = "kmalloc-rnd-07-" #sz,
+#define KMA_RAND_8(sz)  KMA_RAND_7(sz)  .name[KMALLOC_RANDOM_START +  8] = "kmalloc-rnd-08-" #sz,
+#define KMA_RAND_9(sz)  KMA_RAND_8(sz)  .name[KMALLOC_RANDOM_START +  9] = "kmalloc-rnd-09-" #sz,
+#define KMA_RAND_10(sz) KMA_RAND_9(sz)  .name[KMALLOC_RANDOM_START + 10] = "kmalloc-rnd-10-" #sz,
+#define KMA_RAND_11(sz) KMA_RAND_10(sz) .name[KMALLOC_RANDOM_START + 11] = "kmalloc-rnd-11-" #sz,
+#define KMA_RAND_12(sz) KMA_RAND_11(sz) .name[KMALLOC_RANDOM_START + 12] = "kmalloc-rnd-12-" #sz,
+#define KMA_RAND_13(sz) KMA_RAND_12(sz) .name[KMALLOC_RANDOM_START + 13] = "kmalloc-rnd-13-" #sz,
+#define KMA_RAND_14(sz) KMA_RAND_13(sz) .name[KMALLOC_RANDOM_START + 14] = "kmalloc-rnd-14-" #sz,
+#define KMA_RAND_15(sz) KMA_RAND_14(sz) .name[KMALLOC_RANDOM_START + 15] = "kmalloc-rnd-15-" #sz,
+#else // CONFIG_RANDOM_KMALLOC_CACHES
+#define KMALLOC_RANDOM_NAME(N, sz)
+#endif
+
 #define INIT_KMALLOC_INFO(__size, __short_size)	\
 {						\
 	.name[KMALLOC_NORMAL]  = "kmalloc-" #__short_size,	\
 	KMALLOC_RCL_NAME(__short_size)		\
 	KMALLOC_CGROUP_NAME(__short_size)	\
 	KMALLOC_DMA_NAME(__short_size)		\
+	KMALLOC_RANDOM_NAME(RANDOM_KMALLOC_CACHES_NR, __short_size)	\
 	.size = __size,				\
 }
@@
 		flags |= SLAB_CACHE_DMA;
 	}
 
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+	if (type >= KMALLOC_RANDOM_START && type <= KMALLOC_RANDOM_END)
+		flags |= SLAB_NO_MERGE;
+#endif
+
 	/*
 	 * If CONFIG_MEMCG_KMEM is enabled, disable cache merging for
 	 * KMALLOC_NORMAL caches.
@@
 				new_kmalloc_cache(2, type, flags);
 		}
 	}
+#ifdef CONFIG_RANDOM_KMALLOC_CACHES
+	random_kmalloc_seed = get_random_u64();
+#endif
 
 	/* Kmalloc array is now usable */
 	slab_state = UP;
@@
 		return ret;
 	}
 
-	s = kmalloc_slab(size, flags);
+	s = kmalloc_slab(size, flags, caller);
 
 	if (unlikely(ZERO_OR_NULL_PTR(s)))
 		return s;