Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vfs: avoid large kmalloc()s for the fdtable

Azurit reports large increases in system time after 2.6.36 when running
Apache. It was bisected down to a892e2d7dcdfa6c76e6 ("vfs: use kmalloc()
to allocate fdmem if possible").

That patch caused the vfs to use kmalloc() for very large allocations and
this is causing excessive work (and presumably excessive reclaim) within
the page allocator.

Fix it by falling back to vmalloc() earlier - when the allocation attempt
would have been considered "costly" by reclaim.

Reported-by: azurIt <azurit@pobox.sk>
Tested-by: azurIt <azurit@pobox.sk>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Cc: Americo Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Andrew Morton and committed by Linus Torvalds
6d4831c2 e8dad694

+11 -7
fs/file.c
···
 #include <linux/module.h>
 #include <linux/fs.h>
 #include <linux/mm.h>
+#include <linux/mmzone.h>
 #include <linux/time.h>
 #include <linux/sched.h>
 #include <linux/slab.h>
···
  */
 static DEFINE_PER_CPU(struct fdtable_defer, fdtable_defer_list);
 
-static inline void *alloc_fdmem(unsigned int size)
+static void *alloc_fdmem(unsigned int size)
 {
-	void *data;
-
-	data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
-	if (data != NULL)
-		return data;
-
+	/*
+	 * Very large allocations can stress page reclaim, so fall back to
+	 * vmalloc() if the allocation size will be considered "large" by the VM.
+	 */
+	if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
+		void *data = kmalloc(size, GFP_KERNEL|__GFP_NOWARN);
+		if (data != NULL)
+			return data;
+	}
 	return vmalloc(size);
 }
 
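To make the cutoff in the patched alloc_fdmem() concrete, here is a small userspace sketch of the size check. It is not kernel code. It assumes 4 KiB pages (PAGE_SIZE varies by architecture and configuration) and uses the kernel's PAGE_ALLOC_COSTLY_ORDER value of 3. The SKETCH_* macros and the helper names are hypothetical, introduced only for illustration, and fd_array_bytes() assumes the allocation is an array of one pointer per descriptor.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumption: 4 KiB pages. The real PAGE_SIZE depends on the architecture. */
#define SKETCH_PAGE_SIZE    4096u
/* Mirrors the kernel's PAGE_ALLOC_COSTLY_ORDER, which is defined as 3. */
#define SKETCH_COSTLY_ORDER 3

/* Largest request (in bytes) for which the patched code still tries kmalloc(). */
static inline unsigned int kmalloc_limit(void)
{
	return SKETCH_PAGE_SIZE << SKETCH_COSTLY_ORDER;
}

/* Same comparison as the patch: at or below the limit, kmalloc() is tried first. */
static inline bool tries_kmalloc_first(unsigned int size)
{
	return size <= kmalloc_limit();
}

/* Hypothetical helper: bytes needed for an array of nr_fds pointers. */
static inline size_t fd_array_bytes(size_t nr_fds)
{
	return nr_fds * sizeof(void *);
}
```

Under these assumptions the limit is 32768 bytes (eight pages). With 8-byte pointers, an array of up to 4096 descriptors would still try kmalloc() first, and anything larger would go straight to vmalloc().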