[PATCH] Handle all and empty zones when setting up custom zonelists for mbind

The memory allocator doesn't like empty zones (which have an
uninitialized freelist), so an x86-64 system with a node whose memory
lies entirely in GFP_DMA32 would crash on mbind.

Fix that up by putting all possible zones as fallback into the zonelist
and skipping the empty ones.
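
The ordering described above can be sketched in plain userspace C. This is only an illustration, not the kernel code: struct zone and struct node here are simplified stand-ins, MAX_NR_ZONES = 4 is an assumed value, and build_zonelist() is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value for this model; the real constant depends on the kernel. */
#define MAX_NR_ZONES 4

/* Simplified stand-in for the kernel's struct zone. */
struct zone {
	unsigned long present_pages;	/* 0 means the zone is empty */
};

/* Simplified stand-in for a node's per-zone array. */
struct node {
	struct zone node_zones[MAX_NR_ZONES];
};

/*
 * Fill out[] with zone pointers: the highest zone index first across all
 * nodes, then the next lower index, and so on. Empty zones are skipped.
 * out[] must have room for 1 + MAX_NR_ZONES * nr_nodes entries.
 * Returns the number of zones stored, not counting the NULL terminator.
 */
static size_t build_zonelist(struct node *nodes, size_t nr_nodes,
			     int highest_zone, struct zone **out)
{
	size_t num = 0;
	size_t nd;
	int k;

	for (k = highest_zone; k >= 0; k--) {
		for (nd = 0; nd < nr_nodes; nd++) {
			struct zone *z = &nodes[nd].node_zones[k];

			if (z->present_pages > 0) {
				/* Never write past the slots reserved for zones. */
				assert(num < MAX_NR_ZONES * nr_nodes);
				out[num++] = z;
			}
		}
	}
	out[num] = NULL;
	return num;
}
```

With two nodes where the second has pages only in a lower zone, the second node still contributes an entry to the list, and no empty zone is ever handed to the allocator.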

In fact the code always allocated enough space for all zones,
but only used it for the highest. This change just uses all the
memory that was allocated before.
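
As a rough illustration of that sizing, the sketch below compares the slots reserved by the formula max = 1 + MAX_NR_ZONES * nodes_weight(*nodes) with the slots the old loop filled. EXAMPLE_MAX_NR_ZONES = 4 and both helper names are assumptions made for this example only.

```c
#include <stddef.h>

/* Assumed value for illustration; the real constant depends on the kernel. */
#define EXAMPLE_MAX_NR_ZONES 4

/* Slots reserved: one per possible zone per node, plus the NULL terminator. */
static size_t zonelist_slots_allocated(size_t nr_nodes)
{
	return 1 + EXAMPLE_MAX_NR_ZONES * nr_nodes;
}

/* Slots the old loop filled: one zone per node, plus the NULL terminator. */
static size_t zonelist_slots_used_before(size_t nr_nodes)
{
	return 1 + nr_nodes;
}
```

Under these assumptions a two-node bind reserves 9 pointer slots while the old code filled only 3, so the new loop fits in the existing allocation even when every zone is populated.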

This should work fine for now, but whoever implements node hot removal
needs to fix this somewhere else too (or make sure the zone data
structures themselves never go away, only their memory).

Signed-off-by: Andi Kleen <ak@suse.de>
Acked-by: Christoph Lameter <clameter@sgi.com>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

authored by Andi Kleen and committed by Linus Torvalds dd942ae3 759b650f

+14 -4
mm/mempolicy.c
@@ -132,19 +132,29 @@
 	}
 	return nodes_subset(*nodes, node_online_map) ? 0 : -EINVAL;
 }
+
 /* Generate a custom zonelist for the BIND policy. */
 static struct zonelist *bind_zonelist(nodemask_t *nodes)
 {
 	struct zonelist *zl;
-	int num, max, nd;
+	int num, max, nd, k;
 
 	max = 1 + MAX_NR_ZONES * nodes_weight(*nodes);
-	zl = kmalloc(sizeof(void *) * max, GFP_KERNEL);
+	zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL);
 	if (!zl)
 		return NULL;
 	num = 0;
-	for_each_node_mask(nd, *nodes)
-		zl->zones[num++] = &NODE_DATA(nd)->node_zones[policy_zone];
+	/* First put in the highest zones from all nodes, then all the next
+	   lower zones etc. Avoid empty zones because the memory allocator
+	   doesn't like them. If you implement node hot removal you
+	   have to fix that. */
+	for (k = policy_zone; k >= 0; k--) {
+		for_each_node_mask(nd, *nodes) {
+			struct zone *z = &NODE_DATA(nd)->node_zones[k];
+			if (z->present_pages > 0)
+				zl->zones[num++] = z;
+		}
+	}
 	zl->zones[num] = NULL;
 	return zl;
 }