[PATCH] fix NUMA interleaving for huge pages

Since vma->vm_pgoff is in units of small pages, the lower
HPAGE_SHIFT - PAGE_SHIFT bits of vm_pgoff are always clear for huge-page
VMAs, which results in bad offsets being passed to the interleave functions.
Take this difference from small pages into account when calculating the
offset. This does add a 0-bit shift to the small-page path (via
alloc_page_vma()), but I think that is negligible. Also add a BUG_ON to
prevent the offset from growing due to a negative right-shift, which
probably shouldn't be allowed anyway.

Tested on a ppc64 NUMA box with 8 memory nodes and got the interleaving I
expected.

Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Signed-off-by: Adam Litke <agl@us.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Acked-by: Christoph Lameter <clameter@engr.sgi.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>

authored by Nishanth Aravamudan and committed by Linus Torvalds 3b98b087 1678df37

+9 -1
mm/mempolicy.c
@@ -1176,7 +1176,15 @@
 	if (vma) {
 		unsigned long off;
 
-		off = vma->vm_pgoff;
+		/*
+		 * for small pages, there is no difference between
+		 * shift and PAGE_SHIFT, so the bit-shift is safe.
+		 * for huge pages, since vm_pgoff is in units of small
+		 * pages, we need to shift off the always 0 bits to get
+		 * a useful offset.
+		 */
+		BUG_ON(shift < PAGE_SHIFT);
+		off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
 		off += (addr - vma->vm_start) >> shift;
 		return offset_il_node(pol, vma, off);
 	} else
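
The offset arithmetic changed by this hunk can be sketched in user space as follows. This is only an illustration, not kernel code: struct vma_sketch, il_offset_old(), il_offset_new() and pick_node() are hypothetical names, the shift values (4 KiB base pages, 16 MiB huge pages) are assumed, and node selection is reduced to a plain modulo, whereas the kernel's offset_il_node() maps the offset onto the policy's node mask.

```c
#include <assert.h>

/* Assumed values for illustration: 4 KiB base pages, 16 MiB huge pages. */
#define PAGE_SHIFT	12
#define HPAGE_SHIFT	24

/* Hypothetical stand-in for the two VMA fields the calculation reads. */
struct vma_sketch {
	unsigned long vm_start;	/* first address of the mapping */
	unsigned long vm_pgoff;	/* file offset, in units of base pages */
};

/* Pre-patch offset: vm_pgoff is added without rescaling. */
static unsigned long il_offset_old(const struct vma_sketch *vma,
				   unsigned long addr, int shift)
{
	unsigned long off = vma->vm_pgoff;

	off += (addr - vma->vm_start) >> shift;
	return off;
}

/* Patched offset: vm_pgoff is rescaled to units of (1 << shift) bytes. */
static unsigned long il_offset_new(const struct vma_sketch *vma,
				   unsigned long addr, int shift)
{
	unsigned long off;

	assert(shift >= PAGE_SHIFT);	/* plays the role of the BUG_ON */
	off = vma->vm_pgoff >> (shift - PAGE_SHIFT);
	off += (addr - vma->vm_start) >> shift;
	return off;
}

/* Simplifying assumption: nodes are numbered 0..nnodes-1 with no gaps. */
static unsigned int pick_node(unsigned long off, unsigned int nnodes)
{
	return (unsigned int)(off % nnodes);
}
```

Under these assumptions, take a huge-page VMA that starts three huge pages into its file (vm_pgoff = 3 << 12 = 12288) and an address two huge pages past vm_start. The pre-patch offset is 12290, which lands on node 2 of 8. The file offset contributes nothing to that result because 12288 is a multiple of 8. The patched offset is 3 + 2 = 5, which lands on node 5. When shift equals PAGE_SHIFT, both functions return the same value.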