mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified

When calling mbind() with MPOL_MF_{MOVE|MOVE_ALL} | MPOL_MF_STRICT, the
kernel should attempt to migrate all existing pages and return -EIO if
there is a misplaced or unmovable page.  Commit 6f4576e3687b ("mempolicy:
apply page table walker on queue_pages_range()") messed up the return
value and no longer broke out of the VMA scan early when MPOL_MF_STRICT
was specified alone.  The return value problem was fixed by commit
a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT
is specified"), but that commit made the VMA walk abort early when an
unmovable page is met, so some pages may not be migrated as expected.
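
For illustration only (this snippet is not part of the patch), a minimal
userspace sketch of the expected semantics.  The mapping size, the node
number and the use of the raw syscall instead of libnuma's wrapper are
arbitrary choices for the example:

#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <linux/mempolicy.h>

int main(void)
{
	size_t len = 64UL << 20;		/* 64MB of anonymous memory */
	unsigned long nodemask = 1UL << 0;	/* bind to node 0 */
	char *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 1;
	memset(buf, 0, len);			/* fault the pages in first */

	/* Migrate existing pages; STRICT asks for EIO if any are left behind. */
	if (syscall(SYS_mbind, buf, len, MPOL_BIND, &nodemask,
		    8 * sizeof(nodemask), MPOL_MF_MOVE | MPOL_MF_STRICT)) {
		if (errno == EIO)
			printf("some pages were misplaced or unmovable\n");
		else
			perror("mbind");
	}

	munmap(buf, len);
	return 0;
}

With MPOL_MF_MOVE alone the same call is best effort and returns 0 after
migrating whatever it can; adding MPOL_MF_STRICT is what turns any page
that stays misplaced or unmovable into EIO.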

The code should conceptually do:

if (MPOL_MF_MOVE|MOVE_ALL)
    scan all vmas
    try to migrate the existing pages
    return success
else if (MPOL_MF_MOVE* | MPOL_MF_STRICT)
    scan all vmas
    try to migrate the existing pages
    return -EIO if unmovable or migration failed
else if (MPOL_MF_STRICT alone)
    break early if meets unmovable and don't call mbind_range() at all
else /* none of those flags */
    check the ranges in test_walk, EFAULT without mbind_range() if discontig.
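
Purely as an illustration, that decision table can be restated as a small
standalone C function.  This is not kernel code; mbind_result(),
misplaced_or_unmovable and nr_migration_failures are made-up names for the
example, standing for "the walk saw a page it could not queue (or, for
STRICT alone, a page on the wrong node)" and "pages that failed to actually
migrate":

#include <errno.h>
#include <stdbool.h>

#define MPOL_MF_STRICT   (1 << 0)
#define MPOL_MF_MOVE     (1 << 1)
#define MPOL_MF_MOVE_ALL (1 << 2)

int mbind_result(unsigned int flags, bool misplaced_or_unmovable,
		 int nr_migration_failures)
{
	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
		/* All VMAs were walked and migration was attempted. */
		if ((misplaced_or_unmovable || nr_migration_failures) &&
		    (flags & MPOL_MF_STRICT))
			return -EIO;	/* MOVE* + STRICT: report the failure */
		return 0;		/* MOVE* alone: best effort, success */
	}

	if (flags & MPOL_MF_STRICT)	/* STRICT alone: walk stops early */
		return misplaced_or_unmovable ? -EIO : 0;

	return 0;	/* no flags: only the test_walk range check applies */
}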

Fixed the behavior.

Link: https://lkml.kernel.org/r/20230920223242.3425775-1-yang@os.amperecomputing.com
Fixes: a7f40cfe3b7a ("mm: mempolicy: make mbind() return -EIO when MPOL_MF_STRICT is specified")
Signed-off-by: Yang Shi <yang@os.amperecomputing.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [4.9+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

---
 mm/mempolicy.c | 39 +++++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 20 deletions(-)
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
···
 	unsigned long start;
 	unsigned long end;
 	struct vm_area_struct *first;
+	bool has_unmovable;
 };
 
 /*
···
 /*
  * queue_folios_pmd() has three possible return values:
  * 0 - folios are placed on the right node or queued successfully, or
- *     special page is met, i.e. huge zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- *     specified.
+ *     special page is met, i.e. zero page, or unmovable page is found
+ *     but continue walking (indicated by queue_pages.has_unmovable).
  * -EIO - is migration entry or only MPOL_MF_STRICT was specified and an
  *        existing folio was already on a node that does not follow the
  *        policy.
···
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
 		if (!vma_migratable(walk->vma) ||
 		    migrate_folio_add(folio, qp->pagelist, flags)) {
-			ret = 1;
+			qp->has_unmovable = true;
 			goto unlock;
 		}
 	} else
···
  *
  * queue_folios_pte_range() has three possible return values:
  * 0 - folios are placed on the right node or queued successfully, or
- *     special page is met, i.e. zero page.
- * 1 - there is unmovable folio, and MPOL_MF_MOVE* & MPOL_MF_STRICT were
- *     specified.
+ *     special page is met, i.e. zero page, or unmovable page is found
+ *     but continue walking (indicated by queue_pages.has_unmovable).
  * -EIO - only MPOL_MF_STRICT was specified and an existing folio was already
  *        on a node that does not follow the policy.
  */
···
 	struct folio *folio;
 	struct queue_pages *qp = walk->private;
 	unsigned long flags = qp->flags;
-	bool has_unmovable = false;
 	pte_t *pte, *mapped_pte;
 	pte_t ptent;
 	spinlock_t *ptl;
···
 		if (!queue_folio_required(folio, qp))
 			continue;
 		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
-			/* MPOL_MF_STRICT must be specified if we get here */
-			if (!vma_migratable(vma)) {
-				has_unmovable = true;
-				break;
-			}
+			/*
+			 * MPOL_MF_STRICT must be specified if we get here.
+			 * Continue walking vmas due to MPOL_MF_MOVE* flags.
+			 */
+			if (!vma_migratable(vma))
+				qp->has_unmovable = true;
 
 			/*
 			 * Do not abort immediately since there may be
···
 			 * need migrate other LRU pages.
 			 */
 			if (migrate_folio_add(folio, qp->pagelist, flags))
-				has_unmovable = true;
+				qp->has_unmovable = true;
 		} else
 			break;
 	}
 	pte_unmap_unlock(mapped_pte, ptl);
 	cond_resched();
-
-	if (has_unmovable)
-		return 1;
 
 	return addr != end ? -EIO : 0;
 }
···
 		 * Detecting misplaced folio but allow migrating folios which
 		 * have been queued.
 		 */
-		ret = 1;
+		qp->has_unmovable = true;
 		goto unlock;
 	}
···
 			 * Failed to isolate folio but allow migrating pages
 			 * which have been queued.
 			 */
-			ret = 1;
+			qp->has_unmovable = true;
 	}
 unlock:
 	spin_unlock(ptl);
···
 		.start = start,
 		.end = end,
 		.first = NULL,
+		.has_unmovable = false,
 	};
 	const struct mm_walk_ops *ops = lock_vma ?
 			&queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops;
 
 	err = walk_page_range(mm, start, end, ops, &qp);
 
+	if (qp.has_unmovable)
+		err = 1;
 	if (!qp.first)
 		/* whole range in hole */
 		err = -EFAULT;
···
 			putback_movable_pages(&pagelist);
 		}
 
-		if ((ret > 0) || (nr_failed && (flags & MPOL_MF_STRICT)))
+		if (((ret > 0) || nr_failed) && (flags & MPOL_MF_STRICT))
 			err = -EIO;
 	} else {
 up_out: