mm: rename alloc_pages_exact_node() to __alloc_pages_node()

alloc_pages_exact_node() was introduced in commit 6484eb3e2a81 ("page
allocator: do not check NUMA node ID when the caller knows the node is
valid") as an optimized variant of alloc_pages_node() that doesn't fall
back to the current node for nid == NUMA_NO_NODE. Unfortunately, the
name of the function can easily suggest that the allocation is
restricted to the given node and fails otherwise. In truth, the node is
only preferred, unless __GFP_THISNODE is passed among the gfp flags.
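
To illustrate the two behaviours (a sketch only, not part of this
patch; 'nid' stands for some valid node id and 'page' for a local
struct page pointer):

	struct page *page;

	/* Despite the name, this only prefers 'nid' and may fall back elsewhere: */
	page = alloc_pages_exact_node(nid, GFP_KERNEL, 0);

	/* Only __GFP_THISNODE actually restricts the allocation to 'nid': */
	page = alloc_pages_exact_node(nid, GFP_KERNEL | __GFP_THISNODE, 0);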

The misleading name has led to mistakes in the past; see for example
commits 5265047ac301 ("mm, thp: really limit transparent hugepage
allocation to local node") and b360edb43f8e ("mm, mempolicy:
migrate_to_node should only migrate to node").

Another issue with the name is that there's a family of
alloc_pages_exact*() functions where 'exact' means exact size (instead
of page order), which leads to more confusion.
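
For example (again purely illustrative, with a hypothetical caller
supplying 'nid'):

	/* 'exact' here refers to size in bytes; the tail beyond 10 KB is freed */
	void *buf = alloc_pages_exact(10 * 1024, GFP_KERNEL);

	if (buf)
		free_pages_exact(buf, 10 * 1024);

	/* the node allocators take a page order instead: 2^2 = 4 pages, preferring nid */
	struct page *page = alloc_pages_exact_node(nid, GFP_KERNEL, 2);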

To prevent further mistakes, this patch effectively renames
alloc_pages_exact_node() to __alloc_pages_node() to better convey that
it's an optimized variant of alloc_pages_node() not intended for
general use. Both functions are now documented in comments.

Providing a true convenience function for allocations restricted to a
node was also considered, but the prevailing opinion is that
__GFP_THISNODE already provides that functionality and we shouldn't
duplicate the API needlessly. The number of users would be small
anyway.

Existing callers of alloc_pages_exact_node() are simply converted to
call __alloc_pages_node(), with the exception of sba_alloc_coherent(),
which open-codes the check for NUMA_NO_NODE and is therefore converted
to use alloc_pages_node() instead. This means it no longer performs
some VM_BUG_ON checks, and since the current check for nid in
alloc_pages_node() uses a 'nid < 0' comparison (which includes
NUMA_NO_NODE), it may hide wrong values that would previously have been
exposed.
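
As an illustration, consider a hypothetical buggy caller passing a
garbage node id (this caller is made up for the example, not taken from
the tree):

	struct page *page;
	int nid = -2;	/* bogus value; NUMA_NO_NODE is -1 */

	/* before: VM_BUG_ON(nid < 0 || ...) in alloc_pages_exact_node() catches this */
	page = alloc_pages_exact_node(nid, GFP_KERNEL, 0);

	/* after this patch: the 'nid < 0' check silently falls back to the local node */
	page = alloc_pages_node(nid, GFP_KERNEL, 0);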

Both differences will be rectified by the next patch.

To sum up, this patch makes no functional changes, except temporarily
hiding potentially buggy callers. Restricting the checks in
alloc_pages_node() is left for the next patch which can in turn expose
more existing buggy callers.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Robin Holt <robinmholt@gmail.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Cc: Mel Gorman <mgorman@suse.de>
Cc: David Rientjes <rientjes@google.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Gleb Natapov <gleb@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Cliff Whickman <cpw@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Vlastimil Babka, committed by Linus Torvalds
96db800f 7fadc820

39 insertions(+), 38 deletions(-)
arch/ia64/hp/common/sba_iommu.c (+1 -5)

 
 #ifdef CONFIG_NUMA
         {
-                int node = ioc->node;
                 struct page *page;
 
-                if (node == NUMA_NO_NODE)
-                        node = numa_node_id();
-
-                page = alloc_pages_exact_node(node, flags, get_order(size));
+                page = alloc_pages_node(ioc->node, flags, get_order(size));
                 if (unlikely(!page))
                         return NULL;
 
arch/ia64/kernel/uncached.c (+1 -1)

 
         /* attempt to allocate a granule's worth of cached memory pages */
 
-        page = alloc_pages_exact_node(nid,
+        page = __alloc_pages_node(nid,
                                 GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
                                 IA64_GRANULE_SHIFT-PAGE_SHIFT);
         if (!page) {
arch/ia64/sn/pci/pci_dma.c (+1 -1)

          */
         node = pcibus_to_node(pdev->bus);
         if (likely(node >=0)) {
-                struct page *p = alloc_pages_exact_node(node,
+                struct page *p = __alloc_pages_node(node,
                                                 flags, get_order(size));
 
                 if (likely(p))
arch/powerpc/platforms/cell/ras.c (+1 -1)

 
         area->nid = nid;
         area->order = order;
-        area->pages = alloc_pages_exact_node(area->nid,
+        area->pages = __alloc_pages_node(area->nid,
                                         GFP_KERNEL|__GFP_THISNODE,
                                         area->order);
 
arch/x86/kvm/vmx.c (+1 -1)

         struct page *pages;
         struct vmcs *vmcs;
 
-        pages = alloc_pages_exact_node(node, GFP_KERNEL, vmcs_config.order);
+        pages = __alloc_pages_node(node, GFP_KERNEL, vmcs_config.order);
         if (!pages)
                 return NULL;
         vmcs = page_address(pages);
drivers/misc/sgi-xp/xpc_uv.c (+1 -1)

         mq->mmr_blade = uv_cpu_to_blade_id(cpu);
 
         nid = cpu_to_node(cpu);
-        page = alloc_pages_exact_node(nid,
+        page = __alloc_pages_node(nid,
                                 GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
                                 pg_order);
         if (page == NULL) {
include/linux/gfp.h (+16 -9)

         return __alloc_pages_nodemask(gfp_mask, order, zonelist, NULL);
 }
 
+/*
+ * Allocate pages, preferring the node given as nid. The node must be valid and
+ * online. For more general interface, see alloc_pages_node().
+ */
+static inline struct page *
+__alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order)
+{
+        VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
+
+        return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
+}
+
+/*
+ * Allocate pages, preferring the node given as nid. When nid == NUMA_NO_NODE,
+ * prefer the current CPU's node.
+ */
 static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask,
                                                 unsigned int order)
 {
         /* Unknown node is current node */
         if (nid < 0)
                 nid = numa_node_id();
-
-        return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
-}
-
-static inline struct page *alloc_pages_exact_node(int nid, gfp_t gfp_mask,
-                                                unsigned int order)
-{
-        VM_BUG_ON(nid < 0 || nid >= MAX_NUMNODES || !node_online(nid));
 
         return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask));
 }
···
 
 void *alloc_pages_exact(size_t size, gfp_t gfp_mask);
 void free_pages_exact(void *virt, size_t size);
-/* This is different from alloc_pages_exact_node !!! */
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask);
 
 #define __get_free_page(gfp_mask) \
kernel/profile.c (+4 -4)

         node = cpu_to_mem(cpu);
         per_cpu(cpu_profile_flip, cpu) = 0;
         if (!per_cpu(cpu_profile_hits, cpu)[1]) {
-                page = alloc_pages_exact_node(node,
+                page = __alloc_pages_node(node,
                                         GFP_KERNEL | __GFP_ZERO,
                                         0);
                 if (!page)
···
                 per_cpu(cpu_profile_hits, cpu)[1] = page_address(page);
         }
         if (!per_cpu(cpu_profile_hits, cpu)[0]) {
-                page = alloc_pages_exact_node(node,
+                page = __alloc_pages_node(node,
                                         GFP_KERNEL | __GFP_ZERO,
                                         0);
                 if (!page)
···
                 int node = cpu_to_mem(cpu);
                 struct page *page;
 
-                page = alloc_pages_exact_node(node,
+                page = __alloc_pages_node(node,
                                 GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
                                 0);
                 if (!page)
                         goto out_cleanup;
                 per_cpu(cpu_profile_hits, cpu)[1]
                         = (struct profile_hit *)page_address(page);
-                page = alloc_pages_exact_node(node,
+                page = __alloc_pages_node(node,
                                 GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
                                 0);
                 if (!page)
mm/filemap.c (+1 -1)

         do {
                 cpuset_mems_cookie = read_mems_allowed_begin();
                 n = cpuset_mem_spread_node();
-                page = alloc_pages_exact_node(n, gfp, 0);
+                page = __alloc_pages_node(n, gfp, 0);
         } while (!page && read_mems_allowed_retry(cpuset_mems_cookie));
 
         return page;
mm/huge_memory.c (+1 -1)

          */
         up_read(&mm->mmap_sem);
 
-        *hpage = alloc_pages_exact_node(node, gfp, HPAGE_PMD_ORDER);
+        *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
         if (unlikely(!*hpage)) {
                 count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
                 *hpage = ERR_PTR(-ENOMEM);
mm/hugetlb.c (+2 -2)

 {
         struct page *page;
 
-        page = alloc_pages_exact_node(nid,
+        page = __alloc_pages_node(nid,
                 htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
                                                 __GFP_REPEAT|__GFP_NOWARN,
                 huge_page_order(h));
···
                         __GFP_REPEAT|__GFP_NOWARN,
                         huge_page_order(h));
         else
-                page = alloc_pages_exact_node(nid,
+                page = __alloc_pages_node(nid,
                         htlb_alloc_mask(h)|__GFP_COMP|__GFP_THISNODE|
                         __GFP_REPEAT|__GFP_NOWARN, huge_page_order(h));
 
mm/memory-failure.c (+1 -1)

                 return alloc_huge_page_node(page_hstate(compound_head(p)),
                                                 nid);
         else
-                return alloc_pages_exact_node(nid, GFP_HIGHUSER_MOVABLE, 0);
+                return __alloc_pages_node(nid, GFP_HIGHUSER_MOVABLE, 0);
 }
 
 /*
mm/mempolicy.c (+2 -2)

                 return alloc_huge_page_node(page_hstate(compound_head(page)),
                                         node);
         else
-                return alloc_pages_exact_node(node, GFP_HIGHUSER_MOVABLE |
+                return __alloc_pages_node(node, GFP_HIGHUSER_MOVABLE |
                                                     __GFP_THISNODE, 0);
 }
 
···
                 nmask = policy_nodemask(gfp, pol);
                 if (!nmask || node_isset(hpage_node, *nmask)) {
                         mpol_cond_put(pol);
-                        page = alloc_pages_exact_node(hpage_node,
+                        page = __alloc_pages_node(hpage_node,
                                                 gfp | __GFP_THISNODE, order);
                         goto out;
                 }
mm/migrate.c (+2 -2)

                 return alloc_huge_page_node(page_hstate(compound_head(p)),
                                         pm->node);
         else
-                return alloc_pages_exact_node(pm->node,
+                return __alloc_pages_node(pm->node,
                                 GFP_HIGHUSER_MOVABLE | __GFP_THISNODE, 0);
 }
 
···
         int nid = (int) data;
         struct page *newpage;
 
-        newpage = alloc_pages_exact_node(nid,
+        newpage = __alloc_pages_node(nid,
                                          (GFP_HIGHUSER_MOVABLE |
                                           __GFP_THISNODE | __GFP_NOMEMALLOC |
                                           __GFP_NORETRY | __GFP_NOWARN) &
mm/page_alloc.c (-2)

  *
  * Like alloc_pages_exact(), but try to allocate on node nid first before falling
  * back.
- * Note this is not alloc_pages_exact_node() which allocates on a specific node,
- * but is not exact.
  */
 void * __meminit alloc_pages_exact_nid(int nid, size_t size, gfp_t gfp_mask)
 {
mm/slab.c (+1 -1)

         if (memcg_charge_slab(cachep, flags, cachep->gfporder))
                 return NULL;
 
-        page = alloc_pages_exact_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
+        page = __alloc_pages_node(nodeid, flags | __GFP_NOTRACK, cachep->gfporder);
         if (!page) {
                 memcg_uncharge_slab(cachep, cachep->gfporder);
                 slab_out_of_memory(cachep, flags, nodeid);
mm/slob.c (+2 -2)

  * NUMA support in SLOB is fairly simplistic, pushing most of the real
  * logic down to the page allocator, and simply doing the node accounting
  * on the upper levels. In the event that a node id is explicitly
- * provided, alloc_pages_exact_node() with the specified node id is used
+ * provided, __alloc_pages_node() with the specified node id is used
  * instead. The common case (or when the node id isn't explicitly provided)
  * will default to the current node, as per numa_node_id().
  *
···
 
 #ifdef CONFIG_NUMA
         if (node != NUMA_NO_NODE)
-                page = alloc_pages_exact_node(node, gfp, order);
+                page = __alloc_pages_node(node, gfp, order);
         else
 #endif
                 page = alloc_pages(gfp, order);
mm/slub.c (+1 -1)

         if (node == NUMA_NO_NODE)
                 page = alloc_pages(flags, order);
         else
-                page = alloc_pages_exact_node(node, flags, order);
+                page = __alloc_pages_node(node, flags, order);
 
         if (!page)
                 memcg_uncharge_slab(s, order);