Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab updates from Vlastimil Babka:

- Conversion of slub_debug stack traces to stackdepot, allowing more
useful debugfs-based inspection for e.g. memory leak debugging.
Allocation and free debugfs info now includes full traces and is
sorted by the frequency of each unique trace.

The stackdepot conversion was already attempted last year but
reverted by ae14c63a9f20. The memory overhead (while not actually
enabled on boot) has meanwhile been solved by making the large
stackdepot allocation dynamic. The xfstest issues haven't been
reproduced on current kernels, either locally or in -next, so the
slab cache layout changes that originally made that bug manifest
were probably not the root cause.

- Refactoring of dma-kmalloc caches creation.

- Trivial cleanups such as removal of unused parameters, fixes and
clarifications of comments.

- Hyeonggon Yoo joins as a reviewer.

* tag 'slab-for-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
MAINTAINERS: add myself as reviewer for slab
mm/slub: remove unused kmem_cache_order_objects max
mm: slab: fix comment for __assume_kmalloc_alignment
mm: slab: fix comment for ARCH_KMALLOC_MINALIGN
mm/slub: remove unneeded return value of slab_pad_check
mm/slab_common: move dma-kmalloc caches creation into new_kmalloc_cache()
mm/slub: remove meaningless node check in ___slab_alloc()
mm/slub: remove duplicate flag in allocate_slab()
mm/slub: remove unused parameter in setup_object*()
mm/slab.c: fix comments
slab, documentation: add description of debugfs files for SLUB caches
mm/slub: sort debugfs output by frequency of stack traces
mm/slub: distinguish and print stack traces in debugfs files
mm/slub: use stackdepot to save stack trace in objects
mm/slub: move struct track init out of set_track()
lib/stackdepot: allow requesting early initialization dynamically
mm/slub, kunit: Make slub_kunit unaffected by user specified flags
mm/slab: remove some unused functions

+283 -143
+64
Documentation/vm/slub.rst
···
384 384     40,60`` range will plot only samples collected between 40th and
385 385     60th seconds).
386 386
387     +
388     + DebugFS files for SLUB
389     + ======================
390     +
391     + For more information about current state of SLUB caches with the user tracking
392     + debug option enabled, debugfs files are available, typically under
393     + /sys/kernel/debug/slab/<cache>/ (created only for caches with enabled user
394     + tracking). There are 2 types of these files with the following debug
395     + information:
396     +
397     + 1. alloc_traces::
398     +
399     +     Prints information about unique allocation traces of the currently
400     +     allocated objects. The output is sorted by frequency of each trace.
401     +
402     +     Information in the output:
403     +     Number of objects, allocating function, minimal/average/maximal jiffies since alloc,
404     +     pid range of the allocating processes, cpu mask of allocating cpus, and stack trace.
405     +
406     +     Example:::
407     +
408     +     1085 populate_error_injection_list+0x97/0x110 age=166678/166680/166682 pid=1 cpus=1::
409     +         __slab_alloc+0x6d/0x90
410     +         kmem_cache_alloc_trace+0x2eb/0x300
411     +         populate_error_injection_list+0x97/0x110
412     +         init_error_injection+0x1b/0x71
413     +         do_one_initcall+0x5f/0x2d0
414     +         kernel_init_freeable+0x26f/0x2d7
415     +         kernel_init+0xe/0x118
416     +         ret_from_fork+0x22/0x30
417     +
418     +
419     + 2. free_traces::
420     +
421     +     Prints information about unique freeing traces of the currently allocated
422     +     objects. The freeing traces thus come from the previous life-cycle of the
423     +     objects and are reported as not available for objects allocated for the first
424     +     time. The output is sorted by frequency of each trace.
425     +
426     +     Information in the output:
427     +     Number of objects, freeing function, minimal/average/maximal jiffies since free,
428     +     pid range of the freeing processes, cpu mask of freeing cpus, and stack trace.
429     +
430     +     Example:::
431     +
432     +     1980 <not-available> age=4294912290 pid=0 cpus=0
433     +     51 acpi_ut_update_ref_count+0x6a6/0x782 age=236886/237027/237772 pid=1 cpus=1
434     +         kfree+0x2db/0x420
435     +         acpi_ut_update_ref_count+0x6a6/0x782
436     +         acpi_ut_update_object_reference+0x1ad/0x234
437     +         acpi_ut_remove_reference+0x7d/0x84
438     +         acpi_rs_get_prt_method_data+0x97/0xd6
439     +         acpi_get_irq_routing_table+0x82/0xc4
440     +         acpi_pci_irq_find_prt_entry+0x8e/0x2e0
441     +         acpi_pci_irq_lookup+0x3a/0x1e0
442     +         acpi_pci_irq_enable+0x77/0x240
443     +         pcibios_enable_device+0x39/0x40
444     +         do_pci_enable_device.part.0+0x5d/0xe0
445     +         pci_enable_device_flags+0xfc/0x120
446     +         pci_enable_device+0x13/0x20
447     +         virtio_pci_probe+0x9e/0x170
448     +         local_pci_probe+0x48/0x80
449     +         pci_device_probe+0x105/0x1c0
450     +
387 451  Christoph Lameter, May 30, 2007
388 452  Sergey Senozhatsky, October 23, 2015
+1
MAINTAINERS
···
18163 18163  M:	Andrew Morton <akpm@linux-foundation.org>
18164 18164  M:	Vlastimil Babka <vbabka@suse.cz>
18165 18165  R:	Roman Gushchin <roman.gushchin@linux.dev>
18166       + R:	Hyeonggon Yoo <42.hyeyoo@gmail.com>
18166 18167  L:	linux-mm@kvack.org
18167 18168  S:	Maintained
18168 18169  T:	git git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git
+11 -4
include/linux/slab.h
···
112 112  #define SLAB_KASAN 0
113 113  #endif
114 114
115     + /*
116     +  * Ignore user specified debugging flags.
117     +  * Intended for caches created for self-tests so they have only flags
118     +  * specified in the code and other flags are ignored.
119     +  */
120     + #define SLAB_NO_USER_FLAGS ((slab_flags_t __force)0x10000000U)
121     +
115 122  /* The following flags affect the page allocator grouping pages by mobility */
116 123  /* Objects are reclaimable */
117 124  #define SLAB_RECLAIM_ACCOUNT ((slab_flags_t __force)0x00020000U)
···
197 190  /*
198 191   * Some archs want to perform DMA into kmalloc caches and need a guaranteed
199 192   * alignment larger than the alignment of a 64-bit integer.
200     -  * Setting ARCH_KMALLOC_MINALIGN in arch headers allows that.
193     +  * Setting ARCH_DMA_MINALIGN in arch headers allows that.
201 194   */
202 195  #if defined(ARCH_DMA_MINALIGN) && ARCH_DMA_MINALIGN > 8
203 196  #define ARCH_KMALLOC_MINALIGN ARCH_DMA_MINALIGN
···
217 210  #endif
218 211
219 212  /*
220     - * kmalloc and friends return ARCH_KMALLOC_MINALIGN aligned
221     - * pointers. kmem_cache_alloc and friends return ARCH_SLAB_MINALIGN
222     - * aligned pointers.
213     + * kmem_cache_alloc and friends return pointers aligned to ARCH_SLAB_MINALIGN.
214     + * kmalloc and friends return pointers aligned to both ARCH_KMALLOC_MINALIGN
215     + * and ARCH_SLAB_MINALIGN, but here we only assume the former alignment.
223 216   */
224 217  #define __assume_kmalloc_alignment __assume_aligned(ARCH_KMALLOC_MINALIGN)
225 218  #define __assume_slab_alignment __assume_aligned(ARCH_SLAB_MINALIGN)
-1
include/linux/slub_def.h
···
105 105  	struct kmem_cache_order_objects oo;
106 106
107 107  	/* Allocation and freeing of slabs */
108     - 	struct kmem_cache_order_objects max;
109 108  	struct kmem_cache_order_objects min;
110 109  	gfp_t allocflags;	/* gfp flags to use on each alloc */
111 110  	int refcount;		/* Refcount for slab cache destroy */
+22 -4
include/linux/stackdepot.h
···
20 20  			gfp_t gfp_flags, bool can_alloc);
21 21
22 22  /*
23    - * Every user of stack depot has to call this during its own init when it's
24    - * decided that it will be calling stack_depot_save() later.
23    + * Every user of stack depot has to call stack_depot_init() during its own init
24    + * when it's decided that it will be calling stack_depot_save() later. This is
25    + * recommended for e.g. modules initialized later in the boot process, when
26    + * slab_is_available() is true.
25 27   *
26 28   * The alternative is to select STACKDEPOT_ALWAYS_INIT to have stack depot
27 29   * enabled as part of mm_init(), for subsystems where it's known at compile time
28 30   * that stack depot will be used.
31    + *
32    + * Another alternative is to call stack_depot_want_early_init(), when the
33    + * decision to use stack depot is taken e.g. when evaluating kernel boot
34    + * parameters, which precedes the enablement point in mm_init().
35    + *
36    + * stack_depot_init() and stack_depot_want_early_init() can be called regardless
37    + * of CONFIG_STACKDEPOT and are no-op when disabled. The actual save/fetch/print
38    + * functions should only be called from code that makes sure CONFIG_STACKDEPOT
39    + * is enabled.
29 40   */
41    + #ifdef CONFIG_STACKDEPOT
30 42  int stack_depot_init(void);
31 43
32    - #ifdef CONFIG_STACKDEPOT_ALWAYS_INIT
33    - static inline int stack_depot_early_init(void) { return stack_depot_init(); }
44    + void __init stack_depot_want_early_init(void);
45    +
46    + /* This is supposed to be called only from mm_init() */
47    + int __init stack_depot_early_init(void);
34 48  #else
49    + static inline int stack_depot_init(void) { return 0; }
50    +
51    + static inline void stack_depot_want_early_init(void) { }
52    +
35 53  static inline int stack_depot_early_init(void) { return 0; }
36 54  #endif
37 55
+1
init/Kconfig
···
1875 1875  	default y
1876 1876  	bool "Enable SLUB debugging support" if EXPERT
1877 1877  	depends on SLUB && SYSFS
1878       + 	select STACKDEPOT if STACKTRACE_SUPPORT
1878 1879  	help
1879 1880  	  SLUB has extensive debug support features. Disabling these can
1880 1881  	  result in significant savings in code size. This also disables
+1
lib/Kconfig.debug
···
710 710  config SLUB_DEBUG_ON
711 711  	bool "SLUB debugging on by default"
712 712  	depends on SLUB && SLUB_DEBUG
713     + 	select STACKDEPOT_ALWAYS_INIT if STACKTRACE_SUPPORT
713 714  	default n
714 715  	help
715 716  	  Boot with debugging on by default. SLUB boots by default with
+5 -5
lib/slub_kunit.c
···
 12  12  static void test_clobber_zone(struct kunit *test)
 13  13  {
 14  14  	struct kmem_cache *s = kmem_cache_create("TestSlub_RZ_alloc", 64, 0,
 15      - 				SLAB_RED_ZONE, NULL);
     15  + 				SLAB_RED_ZONE|SLAB_NO_USER_FLAGS, NULL);
 16  16  	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
 17  17
 18  18  	kasan_disable_current();
···
 30  30  static void test_next_pointer(struct kunit *test)
 31  31  {
 32  32  	struct kmem_cache *s = kmem_cache_create("TestSlub_next_ptr_free", 64, 0,
 33      - 				SLAB_POISON, NULL);
     33  + 				SLAB_POISON|SLAB_NO_USER_FLAGS, NULL);
 34  34  	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
 35  35  	unsigned long tmp;
 36  36  	unsigned long *ptr_addr;
···
 75  75  static void test_first_word(struct kunit *test)
 76  76  {
 77  77  	struct kmem_cache *s = kmem_cache_create("TestSlub_1th_word_free", 64, 0,
 78      - 				SLAB_POISON, NULL);
     78  + 				SLAB_POISON|SLAB_NO_USER_FLAGS, NULL);
 79  79  	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
 80  80
 81  81  	kmem_cache_free(s, p);
···
 90  90  static void test_clobber_50th_byte(struct kunit *test)
 91  91  {
 92  92  	struct kmem_cache *s = kmem_cache_create("TestSlub_50th_word_free", 64, 0,
 93      - 				SLAB_POISON, NULL);
     93  + 				SLAB_POISON|SLAB_NO_USER_FLAGS, NULL);
 94  94  	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
 95  95
 96  96  	kmem_cache_free(s, p);
···
106 106  static void test_clobber_redzone_free(struct kunit *test)
107 107  {
108 108  	struct kmem_cache *s = kmem_cache_create("TestSlub_RZ_free", 64, 0,
109     - 				SLAB_RED_ZONE, NULL);
    109 + 				SLAB_RED_ZONE|SLAB_NO_USER_FLAGS, NULL);
110 110  	u8 *p = kmem_cache_alloc(s, GFP_KERNEL);
111 111
112 112  	kasan_disable_current();
+45 -22
lib/stackdepot.c
···
 66  66  	unsigned long entries[];	/* Variable-sized array of entries. */
 67  67  };
 68  68
     69  + static bool __stack_depot_want_early_init __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
     70  + static bool __stack_depot_early_init_passed __initdata;
     71  +
 69  72  static void *stack_slabs[STACK_ALLOC_MAX_SLABS];
 70  73
 71  74  static int depot_index;
···
165 162  }
166 163  early_param("stack_depot_disable", is_stack_depot_disabled);
167 164
168     - /*
169     -  * __ref because of memblock_alloc(), which will not be actually called after
170     -  * the __init code is gone, because at that point slab_is_available() is true
171     -  */
172     - __ref int stack_depot_init(void)
    165 + void __init stack_depot_want_early_init(void)
    166 + {
    167 + 	/* Too late to request early init now */
    168 + 	WARN_ON(__stack_depot_early_init_passed);
    169 +
    170 + 	__stack_depot_want_early_init = true;
    171 + }
    172 +
    173 + int __init stack_depot_early_init(void)
    174 + {
    175 + 	size_t size;
    176 +
    177 + 	/* This is supposed to be called only once, from mm_init() */
    178 + 	if (WARN_ON(__stack_depot_early_init_passed))
    179 + 		return 0;
    180 +
    181 + 	__stack_depot_early_init_passed = true;
    182 +
    183 + 	if (!__stack_depot_want_early_init || stack_depot_disable)
    184 + 		return 0;
    185 +
    186 + 	size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
    187 + 	pr_info("Stack Depot early init allocating hash table with memblock_alloc, %zu bytes\n",
    188 + 		size);
    189 + 	stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
    190 +
    191 + 	if (!stack_table) {
    192 + 		pr_err("Stack Depot hash table allocation failed, disabling\n");
    193 + 		stack_depot_disable = true;
    194 + 		return -ENOMEM;
    195 + 	}
    196 +
    197 + 	return 0;
    198 + }
    199 +
    200 + int stack_depot_init(void)
173 201  {
174 202  	static DEFINE_MUTEX(stack_depot_init_mutex);
    203 + 	int ret = 0;
175 204
176 205  	mutex_lock(&stack_depot_init_mutex);
177 206  	if (!stack_depot_disable && !stack_table) {
178     - 		size_t size = (STACK_HASH_SIZE * sizeof(struct stack_record *));
179     - 		int i;
180     -
181     - 		if (slab_is_available()) {
182     - 			pr_info("Stack Depot allocating hash table with kvmalloc\n");
183     - 			stack_table = kvmalloc(size, GFP_KERNEL);
184     - 		} else {
185     - 			pr_info("Stack Depot allocating hash table with memblock_alloc\n");
186     - 			stack_table = memblock_alloc(size, SMP_CACHE_BYTES);
187     - 		}
188     - 		if (stack_table) {
189     - 			for (i = 0; i < STACK_HASH_SIZE; i++)
190     - 				stack_table[i] = NULL;
191     - 		} else {
    207 + 		pr_info("Stack Depot allocating hash table with kvcalloc\n");
    208 + 		stack_table = kvcalloc(STACK_HASH_SIZE, sizeof(struct stack_record *), GFP_KERNEL);
    209 + 		if (!stack_table) {
192 210  			pr_err("Stack Depot hash table allocation failed, disabling\n");
193 211  			stack_depot_disable = true;
194     - 			mutex_unlock(&stack_depot_init_mutex);
195     - 			return -ENOMEM;
    212 + 			ret = -ENOMEM;
196 213  		}
197 214  	}
198 215  	mutex_unlock(&stack_depot_init_mutex);
199     - 	return 0;
    216 + 	return ret;
200 217  }
201 218  EXPORT_SYMBOL_GPL(stack_depot_init);
202 219
+6 -3
mm/page_owner.c
···
45 45
46 46  static int __init early_page_owner_param(char *buf)
47 47  {
48    - 	return kstrtobool(buf, &page_owner_enabled);
   48 + 	int ret = kstrtobool(buf, &page_owner_enabled);
   49 +
   50 + 	if (page_owner_enabled)
   51 + 		stack_depot_want_early_init();
   52 +
   53 + 	return ret;
49 54  }
50 55  early_param("page_owner", early_page_owner_param);
51 56
···
87 82  {
88 83  	if (!page_owner_enabled)
89 84  		return;
90    -
91    - 	stack_depot_init();
92 85
93 86  	register_dummy_stack();
94 87  	register_failure_stack();
+8 -21
mm/slab.c
···
 619  619  	return 0;
 620  620  }
 621  621
 622      - static inline void *alternate_node_alloc(struct kmem_cache *cachep,
 623      - 		gfp_t flags)
 624      - {
 625      - 	return NULL;
 626      - }
 627      -
 628      - static inline void *____cache_alloc_node(struct kmem_cache *cachep,
 629      - 		gfp_t flags, int nodeid)
 630      - {
 631      - 	return NULL;
 632      - }
 633      -
 634  622  static inline gfp_t gfp_exact_node(gfp_t flags)
 635  623  {
 636  624  	return flags & ~__GFP_NOFAIL;
 637  625  }
 638  626
 639  627  #else	/* CONFIG_NUMA */
 640  628
 641      - static void *____cache_alloc_node(struct kmem_cache *, gfp_t, int);
 642      - static void *alternate_node_alloc(struct kmem_cache *, gfp_t);
 643      -
 644  629  static struct alien_cache *__alloc_alien_cache(int node, int entries,
 645  630  		int batch, gfp_t gfp)
···
 781  796  	int slab_node = slab_nid(virt_to_slab(objp));
 782  797  	int node = numa_mem_id();
 783  798  	/*
 784      - 	 * Make sure we are not freeing a object from another node to the array
      799 + 	 * Make sure we are not freeing an object from another node to the array
 785  800  	 * cache on this cpu.
 786  801  	 */
 787  802  	if (likely(node == slab_node))
···
 832  847
 833  848  	/*
 834  849  	 * The kmem_cache_nodes don't come and go as CPUs
 835      - 	 * come and go. slab_mutex is sufficient
      850 + 	 * come and go. slab_mutex provides sufficient
 836  851  	 * protection here.
 837  852  	 */
 838  853  	cachep->node[node] = n;
···
 845  860   * Allocates and initializes node for a node on each slab cache, used for
 846  861   * either memory or cpu hotplug. If memory is being hot-added, the kmem_cache_node
 847  862   * will be allocated off-node since memory is not yet online for the new node.
 848      -  * When hotplugging memory or a cpu, existing node are not replaced if
      863 +  * When hotplugging memory or a cpu, existing nodes are not replaced if
 849  864   * already in use.
 850  865   *
 851  866   * Must hold slab_mutex.
···
1046 1061   * offline.
1047 1062   *
1048 1063   * Even if all the cpus of a node are down, we don't free the
1049      -  * kmem_cache_node of any cache. This to avoid a race between cpu_down, and
     1064 +  * kmem_cache_node of any cache. This is to avoid a race between cpu_down, and
1050 1065   * a kmalloc allocation from another cpu for memory from the node of
1051 1066   * the cpu going down. The kmem_cache_node structure is usually allocated from
1052 1067   * kmem_cache_create() and gets destroyed at kmem_cache_destroy().
···
1890 1905   * @flags: SLAB flags
1891 1906   *
1892 1907   * Returns a ptr to the cache on success, NULL on failure.
1893      -  * Cannot be called within a int, but can be interrupted.
     1908 +  * Cannot be called within an int, but can be interrupted.
1894 1909   * The @ctor is run when new pages are allocated by the cache.
1895 1910   *
1896 1911   * The flags are
···
3041 3056  }
3042 3057
3043 3058  #ifdef CONFIG_NUMA
     3059 + static void *____cache_alloc_node(struct kmem_cache *, gfp_t, int);
     3060 +
3044 3061  /*
3045 3062   * Try allocating on another node if PFA_SPREAD_SLAB is a mempolicy is set.
3046 3063   *
···
3138 3151  }
3139 3152
3140 3153  /*
3141      -  * A interface to enable slab creation on nodeid
     3154 +  * An interface to enable slab creation on nodeid
3142 3155   */
3143 3156  static void *____cache_alloc_node(struct kmem_cache *cachep, gfp_t flags,
3144 3157  	int nodeid)
+3 -2
mm/slab.h
···
331 331  			  SLAB_ACCOUNT)
332 332  #elif defined(CONFIG_SLUB)
333 333  #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE | SLAB_RECLAIM_ACCOUNT | \
334     - 			  SLAB_TEMPORARY | SLAB_ACCOUNT)
    334 + 			  SLAB_TEMPORARY | SLAB_ACCOUNT | SLAB_NO_USER_FLAGS)
335 335  #else
336 336  #define SLAB_CACHE_FLAGS (SLAB_NOLEAKTRACE)
337 337  #endif
···
350 350  			       SLAB_NOLEAKTRACE | \
351 351  			       SLAB_RECLAIM_ACCOUNT | \
352 352  			       SLAB_TEMPORARY | \
353     - 			       SLAB_ACCOUNT)
    353 + 			       SLAB_ACCOUNT | \
    354 + 			       SLAB_NO_USER_FLAGS)
354 355
355 356  bool __kmem_cache_empty(struct kmem_cache *);
356 357  int __kmem_cache_shutdown(struct kmem_cache *);
+8 -15
mm/slab_common.c
···
 24  24  #include <asm/tlbflush.h>
 25  25  #include <asm/page.h>
 26  26  #include <linux/memcontrol.h>
     27  + #include <linux/stackdepot.h>
 27  28
 28  29  #define CREATE_TRACE_POINTS
 29  30  #include <trace/events/kmem.h>
···
315 314  	 * If no slub_debug was enabled globally, the static key is not yet
316 315  	 * enabled by setup_slub_debug(). Enable it if the cache is being
317 316  	 * created with any of the debugging flags passed explicitly.
    317 + 	 * It's also possible that this is the first cache created with
    318 + 	 * SLAB_STORE_USER and we should init stack_depot for it.
318 319  	 */
319 320  	if (flags & SLAB_DEBUG_FLAGS)
320 321  		static_branch_enable(&slub_debug_enabled);
    322 + 	if (flags & SLAB_STORE_USER)
    323 + 		stack_depot_init();
321 324  #endif
322 325
323 326  	mutex_lock(&slab_mutex);
···
863 858  			return;
864 859  		}
865 860  		flags |= SLAB_ACCOUNT;
    861 + 	} else if (IS_ENABLED(CONFIG_ZONE_DMA) && (type == KMALLOC_DMA)) {
    862 + 		flags |= SLAB_CACHE_DMA;
866 863  	}
867 864
868 865  	kmalloc_caches[type][idx] = create_kmalloc_cache(
···
893 886  	/*
894 887  	 * Including KMALLOC_CGROUP if CONFIG_MEMCG_KMEM defined
895 888  	 */
896     - 	for (type = KMALLOC_NORMAL; type <= KMALLOC_RECLAIM; type++) {
    889 + 	for (type = KMALLOC_NORMAL; type < NR_KMALLOC_TYPES; type++) {
897 890  		for (i = KMALLOC_SHIFT_LOW; i <= KMALLOC_SHIFT_HIGH; i++) {
898 891  			if (!kmalloc_caches[type][i])
899 892  				new_kmalloc_cache(i, type, flags);
···
914 907
915 908  	/* Kmalloc array is now usable */
916 909  	slab_state = UP;
917     -
918     - #ifdef CONFIG_ZONE_DMA
919     - 	for (i = 0; i <= KMALLOC_SHIFT_HIGH; i++) {
920     - 		struct kmem_cache *s = kmalloc_caches[KMALLOC_NORMAL][i];
921     -
922     - 		if (s) {
923     - 			kmalloc_caches[KMALLOC_DMA][i] = create_kmalloc_cache(
924     - 				kmalloc_info[i].name[KMALLOC_DMA],
925     - 				kmalloc_info[i].size,
926     - 				SLAB_CACHE_DMA | flags, 0,
927     - 				kmalloc_info[i].size);
928     - 		}
929     - 	}
930     - #endif
931 910  }
932 911  #endif /* !CONFIG_SLOB */
933 912
+108 -66
mm/slub.c
···
  26   26  #include <linux/cpuset.h>
  27   27  #include <linux/mempolicy.h>
  28   28  #include <linux/ctype.h>
       29  + #include <linux/stackdepot.h>
  29   30  #include <linux/debugobjects.h>
  30   31  #include <linux/kallsyms.h>
  31   32  #include <linux/kfence.h>
···
  38   37  #include <linux/memcontrol.h>
  39   38  #include <linux/random.h>
  40   39  #include <kunit/test.h>
       40  + #include <linux/sort.h>
  41   41
  42   42  #include <linux/debugfs.h>
  43   43  #include <trace/events/kmem.h>
···
 266  264  #define TRACK_ADDRS_COUNT 16
 267  265  struct track {
 268  266  	unsigned long addr;	/* Called from address */
 269      - #ifdef CONFIG_STACKTRACE
 270      - 	unsigned long addrs[TRACK_ADDRS_COUNT];	/* Called from address */
      267 + #ifdef CONFIG_STACKDEPOT
      268 + 	depot_stack_handle_t handle;
 271  269  #endif
 272  270  	int cpu;	/* Was running on cpu */
 273  271  	int pid;	/* Pid context */
···
 726  724  	return kasan_reset_tag(p + alloc);
 727  725  }
 728  726
 729      - static void set_track(struct kmem_cache *s, void *object,
      727 + static void noinline set_track(struct kmem_cache *s, void *object,
 730  728  			enum track_item alloc, unsigned long addr)
 731  729  {
 732  730  	struct track *p = get_track(s, object, alloc);
 733  731
 734      - 	if (addr) {
 735      - #ifdef CONFIG_STACKTRACE
 736      - 		unsigned int nr_entries;
      732 + #ifdef CONFIG_STACKDEPOT
      733 + 	unsigned long entries[TRACK_ADDRS_COUNT];
      734 + 	unsigned int nr_entries;
 737  735
 738      - 		metadata_access_enable();
 739      - 		nr_entries = stack_trace_save(kasan_reset_tag(p->addrs),
 740      - 					      TRACK_ADDRS_COUNT, 3);
 741      - 		metadata_access_disable();
 742      -
 743      - 		if (nr_entries < TRACK_ADDRS_COUNT)
 744      - 			p->addrs[nr_entries] = 0;
      736 + 	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 3);
      737 + 	p->handle = stack_depot_save(entries, nr_entries, GFP_NOWAIT);
 745  738  #endif
 746      - 		p->addr = addr;
 747      - 		p->cpu = smp_processor_id();
 748      - 		p->pid = current->pid;
 749      - 		p->when = jiffies;
 750      - 	} else {
 751      - 		memset(p, 0, sizeof(struct track));
 752      - 	}
      739 +
      740 + 	p->addr = addr;
      741 + 	p->cpu = smp_processor_id();
      742 + 	p->pid = current->pid;
      743 + 	p->when = jiffies;
 753  744  }
 754  745
 755  746  static void init_tracking(struct kmem_cache *s, void *object)
 756  747  {
      748 + 	struct track *p;
      749 +
 757  750  	if (!(s->flags & SLAB_STORE_USER))
 758  751  		return;
 759  752
 760      - 	set_track(s, object, TRACK_FREE, 0UL);
 761      - 	set_track(s, object, TRACK_ALLOC, 0UL);
      753 + 	p = get_track(s, object, TRACK_ALLOC);
      754 + 	memset(p, 0, 2*sizeof(struct track));
 762  755  }
 763  756
 764  757  static void print_track(const char *s, struct track *t, unsigned long pr_time)
 765  758  {
      759 + 	depot_stack_handle_t handle __maybe_unused;
      760 +
 766  761  	if (!t->addr)
 767  762  		return;
 768  763
 769  764  	pr_err("%s in %pS age=%lu cpu=%u pid=%d\n",
 770  765  	       s, (void *)t->addr, pr_time - t->when, t->cpu, t->pid);
 771      - #ifdef CONFIG_STACKTRACE
 772      - 	{
 773      - 		int i;
 774      - 		for (i = 0; i < TRACK_ADDRS_COUNT; i++)
 775      - 			if (t->addrs[i])
 776      - 				pr_err("\t%pS\n", (void *)t->addrs[i]);
 777      - 			else
 778      - 				break;
 779      - 	}
      766 + #ifdef CONFIG_STACKDEPOT
      767 + 	handle = READ_ONCE(t->handle);
      768 + 	if (handle)
      769 + 		stack_depot_print(handle);
      770 + 	else
      771 + 		pr_err("object allocation/free stack trace missing\n");
 780  772  #endif
 781  773  }
 782  774
···
1017 1021  }
1018 1022
1019 1023  /* Check the pad bytes at the end of a slab page */
1020      - static int slab_pad_check(struct kmem_cache *s, struct slab *slab)
     1024 + static void slab_pad_check(struct kmem_cache *s, struct slab *slab)
1021 1025  {
1022 1026  	u8 *start;
1023 1027  	u8 *fault;
···
1027 1031  	int remainder;
1028 1032
1029 1033  	if (!(s->flags & SLAB_POISON))
1030      - 		return 1;
     1034 + 		return;
1031 1035
1032 1036  	start = slab_address(slab);
1033 1037  	length = slab_size(slab);
1034 1038  	end = start + length;
1035 1039  	remainder = length % s->size;
1036 1040  	if (!remainder)
1037      - 		return 1;
     1041 + 		return;
1038 1042
1039 1043  	pad = end - remainder;
1040 1044  	metadata_access_enable();
1041 1045  	fault = memchr_inv(kasan_reset_tag(pad), POISON_INUSE, remainder);
1042 1046  	metadata_access_disable();
1043 1047  	if (!fault)
1044      - 		return 1;
     1048 + 		return;
1045 1049  	while (end > fault && end[-1] == POISON_INUSE)
1046 1050  		end--;
1047 1051
···
1050 1054  	print_section(KERN_ERR, "Padding ", pad, remainder);
1051 1055
1052 1056  	restore_bytes(s, "slab padding", POISON_INUSE, fault, end);
1053      - 	return 0;
1054 1057  }
1055 1058
1056 1059  static int check_object(struct kmem_cache *s, struct slab *slab,
···
1263 1268  }
1264 1269
1265 1270  /* Object debug checks for alloc/free paths */
1266      - static void setup_object_debug(struct kmem_cache *s, struct slab *slab,
1267      - 					void *object)
     1271 + static void setup_object_debug(struct kmem_cache *s, void *object)
1268 1272  {
1269 1273  	if (!kmem_cache_debug_flags(s, SLAB_STORE_USER|SLAB_RED_ZONE|__OBJECT_POISON))
1270 1274  		return;
···
1528 1534  		global_slub_debug_changed = true;
1529 1535  	} else {
1530 1536  		slab_list_specified = true;
     1537 + 		if (flags & SLAB_STORE_USER)
     1538 + 			stack_depot_want_early_init();
1531 1539  	}
1532 1540  }
1533 1541
···
1547 1551  	}
1548 1552  out:
1549 1553  	slub_debug = global_flags;
     1554 + 	if (slub_debug & SLAB_STORE_USER)
     1555 + 		stack_depot_want_early_init();
1550 1556  	if (slub_debug != 0 || slub_debug_string)
1551 1557  		static_branch_enable(&slub_debug_enabled);
1552 1558  	else
···
1581 1583  	char *next_block;
1582 1584  	slab_flags_t block_flags;
1583 1585  	slab_flags_t slub_debug_local = slub_debug;
     1586 +
     1587 + 	if (flags & SLAB_NO_USER_FLAGS)
     1588 + 		return flags;
1584 1589
1585 1590  	/*
1586 1591  	 * If the slab cache is for debugging (e.g. kmemleak) then
···
1629 1628  	return flags | slub_debug_local;
1630 1629  }
1631 1630  #else /* !CONFIG_SLUB_DEBUG */
1632      - static inline void setup_object_debug(struct kmem_cache *s,
1633      - 			struct slab *slab, void *object) {}
     1631 + static inline void setup_object_debug(struct kmem_cache *s, void *object) {}
1634 1632  static inline
1635 1633  void setup_slab_debug(struct kmem_cache *s, struct slab *slab, void *addr) {}
1636 1634
···
1641 1641  			void *head, void *tail, int bulk_cnt,
1642 1642  			unsigned long addr) { return 0; }
1643 1643
1644      - static inline int slab_pad_check(struct kmem_cache *s, struct slab *slab)
1645      - 			{ return 1; }
     1644 + static inline void slab_pad_check(struct kmem_cache *s, struct slab *slab) {}
1646 1645  static inline int check_object(struct kmem_cache *s, struct slab *slab,
1647 1646  			void *object, u8 val) { return 1; }
1648 1647  static inline void add_full(struct kmem_cache *s, struct kmem_cache_node *n,
···
1771 1772  	return *head != NULL;
1772 1773  }
1773 1774
1774      - static void *setup_object(struct kmem_cache *s, struct slab *slab,
1775      - 				void *object)
     1775 + static void *setup_object(struct kmem_cache *s, void *object)
1776 1776  {
1777      - 	setup_object_debug(s, slab, object);
     1777 + 	setup_object_debug(s, object);
1778 1778  	object = kasan_init_slab_obj(s, object);
1779 1779  	if (unlikely(s->ctor)) {
1780 1780  		kasan_unpoison_object_data(s, object);
···
1892 1894  	/* First entry is used as the base of the freelist */
1893 1895  	cur = next_freelist_entry(s, slab, &pos, start, page_limit,
1894 1896  				freelist_count);
1895      - 	cur = setup_object(s, slab, cur);
     1897 + 	cur = setup_object(s, cur);
1896 1898  	slab->freelist = cur;
1897 1899
1898 1900  	for (idx = 1; idx < slab->objects; idx++) {
1899 1901  		next = next_freelist_entry(s, slab, &pos, start, page_limit,
1900 1902  					freelist_count);
1901      - 		next = setup_object(s, slab, next);
     1903 + 		next = setup_object(s, next);
1902 1904  		set_freepointer(s, cur, next);
1903 1905  		cur = next;
1904 1906  	}
···
1937 1939  	 */
1938 1940  	alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
1939 1941  	if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > oo_order(s->min))
1940      - 		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~(__GFP_RECLAIM|__GFP_NOFAIL);
     1942 + 		alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & ~__GFP_RECLAIM;
1941 1943
1942 1944  	slab = alloc_slab_page(alloc_gfp, node, oo);
1943 1945  	if (unlikely(!slab)) {
···
1969 1971
1970 1972  	if (!shuffle) {
1971 1973  		start = fixup_red_left(s, start);
1972      - 		start = setup_object(s, slab, start);
     1974 + 		start = setup_object(s, start);
1973 1975  		slab->freelist = start;
1974 1976  		for (idx = 0, p = start; idx < slab->objects - 1; idx++) {
1975 1977  			next = p + s->size;
1976      - 			next = setup_object(s, slab, next);
     1978 + 			next = setup_object(s, next);
1977 1979  			set_freepointer(s, p, next);
1978 1980  			p = next;
1979 1981  		}
···
2908 2910  	 */
2909 2911  	if (!node_isset(node, slab_nodes)) {
2910 2912  		node = NUMA_NO_NODE;
2911      - 		goto redo;
2912 2913  	} else {
2913 2914  		stat(s, ALLOC_NODE_MISMATCH);
2914 2915  		goto deactivate_slab;
···
4162 4165  	 */
4163 4166  	s->oo = oo_make(order, size);
4164 4167  	s->min = oo_make(get_order(size), size);
4165      - 	if (oo_objects(s->oo) > oo_objects(s->max))
4166      - 		s->max = s->oo;
4167 4168
4168 4169  	return !!oo_objects(s->oo);
4169 4170  }
···
4339 4344  	objp = fixup_red_left(s, objp);
4340 4345  	trackp = get_track(s, objp, TRACK_ALLOC);
4341 4346  	kpp->kp_ret = (void *)trackp->addr;
4342      - #ifdef CONFIG_STACKTRACE
4343      - 	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
4344      - 		kpp->kp_stack[i] = (void *)trackp->addrs[i];
4345      - 		if (!kpp->kp_stack[i])
4346      - 			break;
4347      - 	}
     4347 + #ifdef CONFIG_STACKDEPOT
     4348 + 	{
     4349 + 		depot_stack_handle_t handle;
     4350 + 		unsigned long *entries;
     4351 + 		unsigned int nr_entries;
4348 4352
4349      - 	trackp = get_track(s, objp, TRACK_FREE);
4350      - 	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
4351      - 		kpp->kp_free_stack[i] = (void *)trackp->addrs[i];
4352      - 		if (!kpp->kp_free_stack[i])
4353      - 			break;
     4353 + 		handle = READ_ONCE(trackp->handle);
     4354 + 		if (handle) {
     4355 + 			nr_entries = stack_depot_fetch(handle, &entries);
     4356 + 			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
     4357 + 				kpp->kp_stack[i] = (void *)entries[i];
     4358 + 		}
     4359 +
     4360 + 		trackp = get_track(s, objp, TRACK_FREE);
     4361 + 		handle = READ_ONCE(trackp->handle);
     4362 + 		if (handle) {
     4363 + 			nr_entries = stack_depot_fetch(handle, &entries);
     4364 + 			for (i = 0; i < KS_ADDRS_COUNT && i < nr_entries; i++)
     4365 + 				kpp->kp_free_stack[i] = (void *)entries[i];
     4366 + 		}
4354 4367  	}
4355 4368  #endif
4356 4369  #endif
···
5060 5057   */
5061 5058
5062 5059  struct location {
     5060 + 	depot_stack_handle_t handle;
5063 5061  	unsigned long count;
5064 5062  	unsigned long addr;
5065 5063  	long long sum_time;
···
5113 5109  {
5114 5110  	long start, end, pos;
5115 5111  	struct location *l;
5116      - 	unsigned long caddr;
     5112 + 	unsigned long caddr, chandle;
5117 5113  	unsigned long age = jiffies - track->when;
     5114 + 	depot_stack_handle_t handle = 0;
5118 5115
     5116 + #ifdef CONFIG_STACKDEPOT
     5117 + 	handle = READ_ONCE(track->handle);
     5118 + #endif
5119 5119  	start = -1;
5120 5120  	end = t->count;
5121 5121
···
5134 5126  			break;
5135 5127
5136 5128  		caddr = t->loc[pos].addr;
5137      - 		if (track->addr == caddr) {
     5129 + 		chandle = t->loc[pos].handle;
     5130 + 		if ((track->addr == caddr) && (handle == chandle)) {
5138 5131
5139 5132  			l = &t->loc[pos];
5140 5133  			l->count++;
···
5160 5151
5161 5152  		if (track->addr < caddr)
5162 5153  			end = pos;
     5154 + 		else if (track->addr == caddr && handle < chandle)
     5155 + 			end = pos;
5163 5156  		else
5164 5157  			start = pos;
5165 5158  	}
···
5184 5173  	l->max_time = age;
5185 5174  	l->min_pid = track->pid;
5186 5175  	l->max_pid = track->pid;
     5176 + 	l->handle = handle;
5187 5177  	cpumask_clear(to_cpumask(l->cpus));
5188 5178  	cpumask_set_cpu(track->cpu, to_cpumask(l->cpus));
5189 5179  	nodes_clear(l->nodes);
···
6094 6082  		seq_printf(seq, " nodes=%*pbl",
6095 6083  			   nodemask_pr_args(&l->nodes));
6096 6084
     6085 + #ifdef CONFIG_STACKDEPOT
     6086 + 		{
     6087 + 			depot_stack_handle_t handle;
     6088 + 			unsigned long *entries;
     6089 + 			unsigned int nr_entries, j;
     6090 +
     6091 + 			handle = READ_ONCE(l->handle);
     6092 + 			if (handle) {
     6093 + 				nr_entries = stack_depot_fetch(handle, &entries);
     6094 + 				seq_puts(seq, "\n");
     6095 + 				for (j = 0; j < nr_entries; j++)
     6096 + 					seq_printf(seq, " %pS\n", (void *)entries[j]);
     6097 + 			}
     6098 + 		}
     6099 + #endif
6097 6100  		seq_puts(seq, "\n");
6098 6101  	}
6099 6102
···
6131 6104  		return ppos;
6132 6105
6133 6106  	return NULL;
6134 6107  }
     6108 +
     6109 + static int cmp_loc_by_count(const void *a, const void *b, const void *data)
     6110 + {
     6111 + 	struct location *loc1 = (struct location *)a;
     6112 + 	struct location *loc2 = (struct location *)b;
     6113 +
     6114 + 	if (loc1->count > loc2->count)
     6115 + 		return -1;
     6116 + 	else
     6117 + 		return 1;
     6118 + }
6135 6119
6136 6120  static void *slab_debugfs_start(struct seq_file *seq, loff_t *ppos)
···
6204 6166  			process_slab(t, s, slab, alloc, obj_map);
6205 6167  		spin_unlock_irqrestore(&n->list_lock, flags);
6206 6168  	}
     6169 +
     6170 + 	/* Sort locations by count */
     6171 + 	sort_r(t->loc, t->count, sizeof(struct location),
     6172 + 		cmp_loc_by_count, NULL, NULL);
6207 6173
6208 6174  	bitmap_free(obj_map);
6209 6175  	return 0;