Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm, page_alloc: disallow __GFP_COMP in alloc_pages_exact()

alloc_pages_exact*() allocates a page of sufficient order and then splits
it to return only the number of pages requested. That makes it
incompatible with __GFP_COMP, because compound pages cannot be split.
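
The allocate-then-split arithmetic described above can be sketched in plain userspace C. This is only an illustration, not the kernel code: the sketch_* names and the 4 KiB SKETCH_PAGE_SIZE are assumptions made for the example, and sketch_get_order() merely mimics what get_order() is understood to compute.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */

/* Smallest order such that (SKETCH_PAGE_SIZE << order) covers size. */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	assert(size > 0);
	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Number of pages the caller actually asked for, rounded up. */
static unsigned long sketch_pages_needed(size_t size)
{
	return (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
}

/*
 * Pages at the tail of the order-sized block that would have to be
 * given back after splitting. Handing back individual pages is only
 * possible if the block is not one compound page.
 */
static unsigned long sketch_pages_trimmed(size_t size)
{
	return (1UL << sketch_get_order(size)) - sketch_pages_needed(size);
}
```

Under these assumptions, a 12288-byte request needs 3 pages but is served from an order-2 (4-page) block, so 1 page is trimmed; an 8192-byte request fits an order-1 block exactly and nothing is trimmed.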

As shown by [1], things may silently work until the requested size
(which may depend on the user) stops being a power of two. Then, with
CONFIG_DEBUG_VM, a BUG_ON() triggers in split_page(). Without
CONFIG_DEBUG_VM, the consequences are unclear.

There are several options here, none of them great:

1) Don't do the splitting when __GFP_COMP is passed, and return the
whole compound page. However, if the caller then returns it via
free_pages_exact(), that will be unexpected and the freeing actions
there will be wrong.

2) Warn and remove __GFP_COMP from the flags. But the caller may have
really wanted it, so things may break later somewhere.

3) Warn and return NULL. However NULL may be unexpected, especially
for small sizes.

This patch picks option 2, because as Michal Hocko put it: "callers wanted
it" is much less probable than "caller is simply confused and more gfp
flags is surely better than fewer".
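
The warn-and-strip behaviour of option 2 can be sketched outside the kernel as follows. Everything named sketch_* is hypothetical: SKETCH_GFP_COMP uses a made-up bit value rather than the kernel's, and sketch_warn_on_once() is a simplified stand-in for WARN_ON_ONCE() that keeps a single global "already warned" state instead of one per call site.

```c
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int sketch_gfp_t;

#define SKETCH_GFP_COMP 0x4000u	/* hypothetical bit, not the kernel's value */

static unsigned int sketch_warn_count;	/* how many warnings were printed */

/*
 * Stand-in for WARN_ON_ONCE(): prints a warning the first time the
 * condition is true, and returns the condition on every call.
 */
static bool sketch_warn_on_once(bool cond)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		sketch_warn_count++;
		fprintf(stderr, "warning: compound flag passed to exact allocator\n");
	}
	return cond;
}

/* Option 2: warn, then drop the offending flag and carry on. */
static sketch_gfp_t sketch_sanitize_gfp(sketch_gfp_t gfp_mask)
{
	if (sketch_warn_on_once(gfp_mask & SKETCH_GFP_COMP))
		gfp_mask &= ~SKETCH_GFP_COMP;
	return gfp_mask;
}
```

The flag is stripped on every call, while the warning is printed only once, so a confused caller still gets a working allocation without flooding the log.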

[1] https://lore.kernel.org/lkml/20181126002805.GI18977@shao2-debian/T/#u

Link: http://lkml.kernel.org/r/0c6393eb-b28d-4607-c386-862a71f09de6@suse.cz
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mel Gorman <mgorman@techsingularity.net>
Cc: Takashi Iwai <tiwai@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Vlastimil Babka and committed by Linus Torvalds
63931eb9 5fd4ca2d

+11 -3
mm/page_alloc.c
···
 /**
  * alloc_pages_exact - allocate an exact number physically-contiguous pages.
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * This function is similar to alloc_pages(), except that it allocates the
  * minimum number of pages to satisfy the request.  alloc_pages() can only
···
 	unsigned int order = get_order(size);
 	unsigned long addr;
 
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		gfp_mask &= ~__GFP_COMP;
+
 	addr = __get_free_pages(gfp_mask, order);
 	return make_alloc_exact(addr, order, size);
 }
···
  *			   pages on a node.
  * @nid: the preferred node ID where memory should be allocated
  * @size: the number of bytes to allocate
- * @gfp_mask: GFP flags for the allocation
+ * @gfp_mask: GFP flags for the allocation, must not contain __GFP_COMP
  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
···
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
 	unsigned int order = get_order(size);
-	struct page *p = alloc_pages_node(nid, gfp_mask, order);
+	struct page *p;
+
+	if (WARN_ON_ONCE(gfp_mask & __GFP_COMP))
+		gfp_mask &= ~__GFP_COMP;
+
+	p = alloc_pages_node(nid, gfp_mask, order);
 	if (!p)
 		return NULL;
 	return make_alloc_exact((unsigned long)page_address(p), order, size);