Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: mempolicy: don't have to split pmd for huge zero page

When trying to migrate pages to obey mempolicy, the huge zero page is
split by inserting the base zero pfn into all PTEs, then the page table
walk falls back to PTE level and just skips the zero pages. Skipping the
zero page for mempolicy has been the kernel's behavior since v2.6.16 due
to commit f4598c8b3678 ("[PATCH] migration: make sure there is no attempt
to migrate reserved pages."). So it seems pointless to split the huge
zero page; it could just be skipped like the base zero page.
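
The argument above can be illustrated with a userspace toy model; this is
a sketch, not kernel code. All names (toy_result, toy_split_then_skip,
toy_skip_whole_pmd, TOY_ZERO_PFN) are hypothetical, and 512 entries per
PMD is an assumption that matches x86-64 with 4 KiB base pages. Both paths
queue nothing for migration; the split path just does more work first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PTRS_PER_PMD 512	/* assumption: x86-64, 4 KiB base pages */
#define TOY_ZERO_PFN 0UL	/* hypothetical stand-in for the zero page pfn */

struct toy_result {
	size_t ptes_visited;	/* PTE slots the walk had to look at */
	size_t pages_queued;	/* pages queued for migration */
	bool pmd_split;		/* whether the huge mapping was broken up */
};

/* Old behavior: split the huge zero page, then skip each base zero page. */
static struct toy_result toy_split_then_skip(void)
{
	unsigned long ptes[TOY_PTRS_PER_PMD];
	struct toy_result res = { 0, 0, false };
	size_t i;

	/* "Split": every PTE now points at the base zero pfn. */
	for (i = 0; i < TOY_PTRS_PER_PMD; i++)
		ptes[i] = TOY_ZERO_PFN;
	res.pmd_split = true;

	/* PTE-level fallback: zero pages are never queued. */
	for (i = 0; i < TOY_PTRS_PER_PMD; i++) {
		res.ptes_visited++;
		if (ptes[i] == TOY_ZERO_PFN)
			continue;
		res.pages_queued++;
	}
	/* Sanity check: the fallback loop touched every slot. */
	assert(res.ptes_visited == TOY_PTRS_PER_PMD);
	return res;
}

/* New behavior: skip the huge zero page at PMD level, no split needed. */
static struct toy_result toy_skip_whole_pmd(void)
{
	struct toy_result res = { 0, 0, false };

	return res;
}
```

Calling both helpers and comparing pages_queued shows the same outcome
(nothing queued), while only the first one splits the mapping and visits
every PTE slot.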

Set ACTION_CONTINUE to prevent walk_page_range() from splitting the pmd
in this case.
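
How a pmd callback can steer the walker through an action field may be
easier to see in a simplified userspace model. The sketch below is not
the kernel's walker: the enum reuses the names ACTION_SUBTREE and
ACTION_CONTINUE only as a local stand-in, and toy_walk, toy_pmd_entry,
toy_walk_pmd_range and the skip_huge_zero switch are hypothetical. The
real code in mm/pagewalk.c handles more cases than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in for the walker's action values. */
enum toy_action { ACTION_SUBTREE, ACTION_CONTINUE };

enum toy_pmd_kind { PMD_NORMAL_TABLE, PMD_HUGE_ZERO };

struct toy_walk {
	enum toy_action action;		/* set by the callback, read by the walker */
	bool skip_huge_zero;		/* hypothetical switch: new vs. old behavior */
	size_t pmds_split;		/* huge entries the walker had to split */
	size_t pte_tables_walked;	/* PMD entries descended into */
};

/* Hypothetical pmd callback: optionally asks the walker to skip the entry. */
static void toy_pmd_entry(enum toy_pmd_kind kind, struct toy_walk *walk)
{
	if (kind == PMD_HUGE_ZERO && walk->skip_huge_zero)
		walk->action = ACTION_CONTINUE;
}

static void toy_walk_pmd_range(const enum toy_pmd_kind *pmds, size_t n,
			       struct toy_walk *walk)
{
	size_t i;

	assert(pmds != NULL || n == 0);
	for (i = 0; i < n; i++) {
		walk->action = ACTION_SUBTREE;	/* default: descend */
		toy_pmd_entry(pmds[i], walk);

		/* The callback asked to skip: no split, no PTE walk. */
		if (walk->action == ACTION_CONTINUE)
			continue;

		/* A huge entry has to be split before its PTEs can be walked. */
		if (pmds[i] == PMD_HUGE_ZERO)
			walk->pmds_split++;
		walk->pte_tables_walked++;
	}
}
```

With skip_huge_zero set, a range containing one huge zero entry between
two normal entries is walked without any split; with it cleared, the same
range costs one split and one extra PTE-level walk.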

Link: https://lkml.kernel.org/r/20210609172146.3594-1-shy828301@gmail.com
Link: https://lkml.kernel.org/r/20210604203513.240709-1-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Yang Shi and committed by Linus Torvalds
e5947d23 95837924

+5 -4
mm/mempolicy.c
···
 /*
  * queue_pages_pmd() has four possible return values:
- * 0 - pages are placed on the right node or queued successfully.
+ * 0 - pages are placed on the right node or queued successfully, or
+ *     special page is met, i.e. huge zero page.
  * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
  *     specified.
  * 2 - THP was split.
···
 	page = pmd_page(*pmd);
 	if (is_huge_zero_page(page)) {
 		spin_unlock(ptl);
-		__split_huge_pmd(walk->vma, pmd, addr, false, NULL);
-		ret = 2;
+		walk->action = ACTION_CONTINUE;
 		goto out;
 	}
 	if (!queue_pages_required(page, qp))
···
  * and move them to the pagelist if they do.
  *
  * queue_pages_pte_range() has three possible return values:
- * 0 - pages are placed on the right node or queued successfully.
+ * 0 - pages are placed on the right node or queued successfully, or
+ *     special page is met, i.e. zero page.
  * 1 - there is unmovable page, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
  *     specified.
  * -EIO - only MPOL_MF_STRICT was specified and an existing page was already