Linux kernel mirror (for testing): git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: shmem: support large folio swap out

Shmem will support large folio allocation [1] [2] for better performance;
however, memory reclaim still splits the precious large folios when trying
to swap out shmem, which may lead to memory fragmentation and cannot take
advantage of the large folio for shmem.

Moreover, the swap code already supports swapping out large folios without
splitting them, hence this patch set adds large folio swap-out support for
shmem.

Note the i915_gem_shmem driver still needs its folios to be split when
swapping, thus add a new flag 'split_large_folio' for writeback_control to
indicate splitting the large folio.

[1] https://lore.kernel.org/all/cover.1717495894.git.baolin.wang@linux.alibaba.com/
[2] https://lore.kernel.org/all/20240515055719.32577-1-da.gomez@samsung.com/
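As a rough userspace sketch of the split decision this series adds to
shmem_writepage() (all names here — folio_stub, must_split, fallocend — are
hypothetical stand-ins, not kernel API): a large folio has to be split when
THP swap is unavailable, or when EOF falls strictly inside the folio and no
fallocate() preallocation extends past it.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGE_SIZE 4096UL

#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Hypothetical stand-in for the kernel's folio: it covers nr_pages
 * pages starting at page offset 'index' in the file. */
struct folio_stub {
	unsigned long index;    /* first page offset covered */
	unsigned long nr_pages; /* folio size in pages */
};

/*
 * Model of the decision in shmem_writepage(): split when CONFIG_THP_SWAP
 * is disabled, or when the EOF page (extended to the end of any
 * fallocate()d range beyond i_size, cf. shmem_fallocend()) falls
 * strictly inside the folio, since shrinkage beyond i_size does not
 * split swap.
 */
static bool must_split(const struct folio_stub *folio,
		       unsigned long i_size, unsigned long fallocend,
		       bool thp_swap_enabled)
{
	unsigned long eof = DIV_ROUND_UP(i_size, PAGE_SIZE);

	if (fallocend > eof)
		eof = fallocend; /* preallocation reaches past i_size */

	if (eof > folio->index && eof < folio->index + folio->nr_pages)
		return true; /* folio crosses EOF: tail pages beyond i_size */

	return !thp_swap_enabled;
}
```

For example, a 16-page folio at index 0 with i_size at 10 pages must split,
while the same folio with i_size at the folio boundary (or with a
preallocation covering the tail) can be swapped out whole when THP swap is
enabled.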

[hughd@google.com: shmem_writepage() split folio at EOF before swapout]
Link: https://lkml.kernel.org/r/aef55f8d-6040-692d-65e3-16150cce4440@google.com
[baolin.wang@linux.alibaba.com: remove the wbc->split_large_folio per Hugh]
Link: https://lkml.kernel.org/r/1236a002daa301b3b9ba73d6c0fab348427cf295.1724833399.git.baolin.wang@linux.alibaba.com
Link: https://lkml.kernel.org/r/d80c21abd20e1b0f5ca66b330f074060fb2f082d.1723434324.git.baolin.wang@linux.alibaba.com
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Huang, Ying" <ying.huang@intel.com>
Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Lance Yang <ioworker0@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Pankaj Raghav <p.raghav@samsung.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Baolin Wang; committed by Andrew Morton
809bc865 12885cbe

3 files changed, 48 insertions(+), 13 deletions(-)
include/linux/writeback.h (+3 -0):

@@ -79,6 +79,9 @@
 	 */
 	struct swap_iocb **swap_plug;
 
+	/* Target list for splitting a large folio */
+	struct list_head *list;
+
 	/* internal fields used by the ->writepages implementation: */
 	struct folio_batch fbatch;
 	pgoff_t index;
mm/shmem.c (+22 -6):

@@ -795,7 +795,6 @@
 	VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
 	VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
 	VM_BUG_ON_FOLIO(!folio_test_swapbacked(folio), folio);
-	VM_BUG_ON(expected && folio_test_large(folio));
 
 	folio_ref_add(folio, nr);
 	folio->mapping = mapping;

@@ -1459,6 +1460,7 @@
 	swp_entry_t swap;
 	pgoff_t index;
 	int nr_pages;
+	bool split = false;
 
 	/*
 	 * Our capabilities prevent regular writeback or sync from ever calling

@@ -1478,14 +1478,26 @@
 		goto redirty;
 
 	/*
-	 * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or
-	 * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages,
-	 * and its shmem_writeback() needs them to be split when swapping.
+	 * If CONFIG_THP_SWAP is not enabled, the large folio should be
+	 * split when swapping.
+	 *
+	 * And shrinkage of pages beyond i_size does not split swap, so
+	 * swapout of a large folio crossing i_size needs to split too
+	 * (unless fallocate has been used to preallocate beyond EOF).
 	 */
 	if (folio_test_large(folio)) {
+		index = shmem_fallocend(inode,
+			DIV_ROUND_UP(i_size_read(inode), PAGE_SIZE));
+		if ((index > folio->index && index < folio_next_index(folio)) ||
+		    !IS_ENABLED(CONFIG_THP_SWAP))
+			split = true;
+	}
+
+	if (split) {
+try_split:
 		/* Ensure the subpages are still dirty */
 		folio_test_set_dirty(folio);
-		if (split_huge_page(page) < 0)
+		if (split_huge_page_to_list_to_order(page, wbc->list, 0))
 			goto redirty;
 		folio = page_folio(page);
 		folio_clear_dirty(folio);

@@ -1539,8 +1527,12 @@
 	}
 
 	swap = folio_alloc_swap(folio);
-	if (!swap.val)
+	if (!swap.val) {
+		if (nr_pages > 1)
+			goto try_split;
+
 		goto redirty;
+	}
 
 	/*
 	 * Add inode to shmem_unuse()'s list of swapped-out inodes,
mm/vmscan.c (+23 -7):

@@ -628,7 +628,7 @@
  * Calls ->writepage().
  */
 static pageout_t pageout(struct folio *folio, struct address_space *mapping,
-			 struct swap_iocb **plug)
+			 struct swap_iocb **plug, struct list_head *folio_list)
 {
 	/*
 	 * If the folio is dirty, only perform writeback if that write

@@ -675,6 +675,14 @@
 			.for_reclaim = 1,
 			.swap_plug = plug,
 		};
+
+		/*
+		 * The large shmem folio can be split if CONFIG_THP_SWAP is
+		 * not enabled or contiguous swap entries are failed to
+		 * allocate.
+		 */
+		if (shmem_mapping(mapping) && folio_test_large(folio))
+			wbc.list = folio_list;
 
 		folio_set_reclaim(folio);
 		res = mapping->a_ops->writepage(&folio->page, &wbc);

@@ -1265,11 +1257,6 @@
 					goto activate_locked_split;
 				}
 			}
-		} else if (folio_test_swapbacked(folio) &&
-			   folio_test_large(folio)) {
-			/* Split shmem folio */
-			if (split_folio_to_list(folio, folio_list))
-				goto keep_locked;
 		}
 
 		/*

@@ -1365,12 +1362,25 @@
 				 * starts and then write it out here.
 				 */
 				try_to_unmap_flush_dirty();
-				switch (pageout(folio, mapping, &plug)) {
+				switch (pageout(folio, mapping, &plug, folio_list)) {
 				case PAGE_KEEP:
 					goto keep_locked;
 				case PAGE_ACTIVATE:
+					/*
+					 * If shmem folio is split when writeback
+					 * to swap, the tail pages will make their
+					 * own pass through this function and be
+					 * accounted then.
+					 */
+					if (nr_pages > 1 && !folio_test_large(folio)) {
+						sc->nr_scanned -= (nr_pages - 1);
+						nr_pages = 1;
+					}
 					goto activate_locked;
 				case PAGE_SUCCESS:
+					if (nr_pages > 1 && !folio_test_large(folio)) {
+						sc->nr_scanned -= (nr_pages - 1);
+						nr_pages = 1;
+					}
 					stat->nr_pageout += nr_pages;
 
 					if (folio_test_writeback(folio))