Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/lru: introduce TestClearPageLRU()

Currently lru_lock guards both the lru list and the page's lru bit, which
is fine. But if we want to use a specific lruvec lock for the page, we
need to pin down the page's lruvec/memcg while locking. Just taking the
lruvec lock first may be undermined by the page's memcg charge/migration.
To fix this problem, we clear the lru bit outside of the lock and use that
as the pin-down action which blocks other page isolation, such as the one
done when the page's memcg changes.

So the standard steps of page isolation are now:
1, get_page(); # pin the page so it cannot be freed
2, TestClearPageLRU(); # block other isolation, e.g. memcg change
3, spin_lock on lru_lock; # serialize lru list access
4, delete page from lru list;
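
The four steps above can be sketched in user space as follows. This is
only an illustration under stated assumptions: C11 atomics and a toy
spinlock stand in for the kernel primitives, and every toy_* name and
TOY_PG_LRU is hypothetical, not a kernel API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these are kernel definitions. */
#define TOY_PG_LRU 0x1u

struct toy_page {
	atomic_uint flags;		/* holds TOY_PG_LRU */
	atomic_int refcount;		/* stands in for the page refcount */
	struct toy_page *prev, *next;	/* linkage on the toy lru list */
};

struct toy_lru {
	atomic_flag lock;		/* stands in for lru_lock */
	struct toy_page head;		/* circular list sentinel */
};

static void toy_page_init(struct toy_page *page)
{
	atomic_init(&page->flags, 0);
	atomic_init(&page->refcount, 1);
	page->prev = page->next = NULL;
}

static void toy_lru_init(struct toy_lru *lru)
{
	atomic_flag_clear(&lru->lock);
	lru->head.prev = lru->head.next = &lru->head;
}

static void toy_lock(struct toy_lru *lru)
{
	while (atomic_flag_test_and_set_explicit(&lru->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_lru *lru)
{
	atomic_flag_clear_explicit(&lru->lock, memory_order_release);
}

/* Link the page first, then set the bit: a set bit implies "on list". */
static void toy_lru_add(struct toy_lru *lru, struct toy_page *page)
{
	toy_lock(lru);
	page->next = lru->head.next;
	page->prev = &lru->head;
	lru->head.next->prev = page;
	lru->head.next = page;
	atomic_fetch_or(&page->flags, TOY_PG_LRU);
	toy_unlock(lru);
}

/* Returns 0 on success, -1 if the lru bit was already clear. */
static int toy_isolate_page(struct toy_lru *lru, struct toy_page *page)
{
	/* 1, pin the page so it cannot be freed under us */
	atomic_fetch_add(&page->refcount, 1);

	/* 2, test-and-clear the bit; only one caller can see it set */
	if (!(atomic_fetch_and(&page->flags, ~TOY_PG_LRU) & TOY_PG_LRU)) {
		atomic_fetch_sub(&page->refcount, 1);	/* drop the pin */
		return -1;
	}

	/* 3, serialize list access */
	toy_lock(lru);
	/* 4, unlink the page from the list */
	page->prev->next = page->next;
	page->next->prev = page->prev;
	page->prev = page->next = NULL;
	toy_unlock(lru);
	return 0;
}
```

A second toy_isolate_page() call on the same page fails at step 2 and
drops its pin again, without ever touching the lock or the list.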

This patch starts with the first part: TestClearPageLRU, which combines
the PageLRU check and ClearPageLRU into one macro function. It will be
used as a page isolation precondition to prevent concurrent isolation
elsewhere. As a result, there may be !PageLRU pages on the lru list, so
the BUG() checks need to be removed accordingly.
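
To illustrate why the check and the clear are combined, here is a
user-space sketch that contrasts the two patterns with C11 atomics. The
demo_* names and DEMO_LRU_BIT are hypothetical; the kernel generates the
real helper through its page-flag macros, not through this code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_LRU_BIT (1UL << 0)	/* hypothetical bit position */

struct demo_page {
	atomic_ulong flags;
};

/*
 * Combined form: one atomic read-modify-write. The previous value tells
 * the caller whether it was the one that cleared the bit.
 */
static bool demo_test_clear_lru(struct demo_page *page)
{
	unsigned long old = atomic_fetch_and(&page->flags, ~DEMO_LRU_BIT);

	return (old & DEMO_LRU_BIT) != 0;
}

/*
 * Split form: check, then clear. Two callers can both pass the check
 * before either clears, so this is only safe when a lock serializes it.
 */
static bool demo_check_then_clear_lru(struct demo_page *page)
{
	if (!(atomic_load(&page->flags) & DEMO_LRU_BIT))
		return false;
	/* another caller may pass the same check right here */
	atomic_fetch_and(&page->flags, ~DEMO_LRU_BIT);
	return true;
}
```

With the combined form, exactly one of several concurrent callers gets
true, which is what lets the bit act as an isolation precondition
without holding lru_lock.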

There are now 2 rules for the lru bit:
1, the lru bit still indicates whether a page is on the lru list, except
that at some temporary moments (while isolating) the page may have no
lru bit while it is still on the lru list. But the page must always be
on the lru list when the lru bit is set.
2, the lru bit has to be cleared before the page is deleted from the lru
list.
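
The two rules can be restated as predicates over a page's state. This is
a sketch only: struct lru_state and both helpers are hypothetical and
exist purely to spell out which combinations the rules allow.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of one page's state; not a kernel structure. */
struct lru_state {
	bool on_list;	/* page is linked on an lru list */
	bool lru_bit;	/* page has its lru bit set */
};

/* Rule 1: a set lru bit implies that the page is on the lru list. */
static bool lru_state_valid(struct lru_state s)
{
	return !s.lru_bit || s.on_list;
}

/* Rule 2: a page may only be unlinked once its lru bit is cleared. */
static bool lru_may_unlink(struct lru_state s)
{
	return s.on_list && !s.lru_bit;
}
```

The only state rule 1 forbids is "bit set, not on list"; the transient
"on list, bit clear" state is exactly the one in which rule 2 permits
the unlink.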

As Andrew Morton mentioned, this change would dirty the cacheline for a
page which isn't on the LRU. But the loss is acceptable according to
Rong Chen's <rong.a.chen@intel.com> report:
https://lore.kernel.org/lkml/20200304090301.GB5972@shao2-debian/

Link: https://lkml.kernel.org/r/1604566549-62481-15-git-send-email-alex.shi@linux.alibaba.com
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Daniel Jordan <daniel.m.jordan@oracle.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Jann Horn <jannh@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mika Penttilä <mika.penttila@nextfour.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Yang Shi <yang.shi@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Alex Shi and committed by Linus Torvalds
d25b5bd8 13805a88

+21 -22
+1
include/linux/page-flags.h
···
 PAGEFLAG(Dirty, dirty, PF_HEAD) TESTSCFLAG(Dirty, dirty, PF_HEAD)
 	__CLEARPAGEFLAG(Dirty, dirty, PF_HEAD)
 PAGEFLAG(LRU, lru, PF_HEAD) __CLEARPAGEFLAG(LRU, lru, PF_HEAD)
+	TESTCLEARFLAG(LRU, lru, PF_HEAD)
 PAGEFLAG(Active, active, PF_HEAD) __CLEARPAGEFLAG(Active, active, PF_HEAD)
 	TESTCLEARFLAG(Active, active, PF_HEAD)
 PAGEFLAG(Workingset, workingset, PF_HEAD)
+1 -2
mm/mlock.c
···
 			 * We already have pin from follow_page_mask()
 			 * so we can spare the get_page() here.
 			 */
-			if (PageLRU(page)) {
+			if (TestClearPageLRU(page)) {
 				struct lruvec *lruvec;

-				ClearPageLRU(page);
 				lruvec = mem_cgroup_page_lruvec(page,
 							page_pgdat(page));
 				del_page_from_lru_list(page, lruvec,
+19 -20
mm/vmscan.c
···
  */
 int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 {
-	int ret = -EINVAL;
+	int ret = -EBUSY;

 	/* Only take pages on the LRU. */
 	if (!PageLRU(page))
···
 	/* Compaction should not handle unevictable pages but CMA can do so */
 	if (PageUnevictable(page) && !(mode & ISOLATE_UNEVICTABLE))
 		return ret;
-
-	ret = -EBUSY;

 	/*
 	 * To minimise LRU disruption, the caller can indicate that it only
···
 		 * sure the page is not being freed elsewhere -- the
 		 * page release code relies on it.
 		 */
-		ClearPageLRU(page);
-		ret = 0;
+		if (TestClearPageLRU(page))
+			ret = 0;
+		else
+			put_page(page);
 	}

 	return ret;
···

 		page = lru_to_page(src);
 		prefetchw_prev_lru_page(page, src, flags);
-
-		VM_BUG_ON_PAGE(!PageLRU(page), page);

 		nr_pages = compound_nr(page);
 		total_scan += nr_pages;
···
 	VM_BUG_ON_PAGE(!page_count(page), page);
 	WARN_RATELIMIT(PageTail(page), "trying to isolate tail page");

-	if (PageLRU(page)) {
+	if (TestClearPageLRU(page)) {
 		pg_data_t *pgdat = page_pgdat(page);
 		struct lruvec *lruvec;

-		spin_lock_irq(&pgdat->lru_lock);
+		get_page(page);
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);
-		if (PageLRU(page)) {
-			int lru = page_lru(page);
-			get_page(page);
-			ClearPageLRU(page);
-			del_page_from_lru_list(page, lruvec, lru);
-			ret = 0;
-		}
+		spin_lock_irq(&pgdat->lru_lock);
+		del_page_from_lru_list(page, lruvec, page_lru(page));
 		spin_unlock_irq(&pgdat->lru_lock);
+		ret = 0;
 	}
+
 	return ret;
 }

···
 		nr_pages = thp_nr_pages(page);
 		pgscanned += nr_pages;

+		/* block memcg migration during page moving between lru */
+		if (!TestClearPageLRU(page))
+			continue;
+
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
···
 		}
 		lruvec = mem_cgroup_page_lruvec(page, pgdat);

-		if (!PageLRU(page) || !PageUnevictable(page))
-			continue;
-
-		if (page_evictable(page)) {
+		if (page_evictable(page) && PageUnevictable(page)) {
 			enum lru_list lru = page_lru_base_type(page);

 			VM_BUG_ON_PAGE(PageActive(page), page);
···
 			add_page_to_lru_list(page, lruvec, lru);
 			pgrescued += nr_pages;
 		}
+		SetPageLRU(page);
 	}

 	if (pgdat) {
 		__count_vm_events(UNEVICTABLE_PGRESCUED, pgrescued);
 		__count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 		spin_unlock_irq(&pgdat->lru_lock);
+	} else if (pgscanned) {
+		count_vm_events(UNEVICTABLE_PGSCANNED, pgscanned);
 	}
 }
 EXPORT_SYMBOL_GPL(check_move_unevictable_pages);