
mm: softdirty: add pgtable_supports_soft_dirty()

Patch series "mm: Add soft-dirty and uffd-wp support for RISC-V", v15.

This patchset adds support for the Svrsw60t59b extension [1], which is now
ratified, and adds soft-dirty and userfaultfd write-protect tracking for
RISC-V.

Patches 1 and 2 add macros that allow architectures to define their own
checks for whether the soft-dirty / uffd-wp PTE bits are available; for
RISC-V, that means checking whether the Svrsw60t59b extension is supported
on the device the kernel is running on.  Patches 1-2 also remove "#ifdef
CONFIG_MEM_SOFT_DIRTY", "#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP" and
"#ifdef CONFIG_PTE_MARKER_UFFD_WP" in favor of checks which, if not
overridden by the architecture, are expected to cause no change in
behavior.

This patchset has been tested with the kselftest mm suite: soft-dirty,
madv_populate, test_unmerge_uffd_wp, and uffd-unit-tests all run and pass,
and no regressions were observed in any of the other tests.


This patch (of 6):

Some platforms can customize the PTE/PMD soft-dirty bit, making it
unavailable even if the architecture provides the resource.

Add an API through which architectures can define their own implementation
to detect whether the soft-dirty bit is available on the device the kernel
is running on.

This patch removes "#ifdef CONFIG_MEM_SOFT_DIRTY" in favor of
pgtable_supports_soft_dirty() checks, which default to
IS_ENABLED(CONFIG_MEM_SOFT_DIRTY); if not overridden by the architecture,
no change in behavior is expected.

We make sure never to set VM_SOFTDIRTY if !pgtable_supports_soft_dirty(),
so we will never run into VM_SOFTDIRTY checks.

[lorenzo.stoakes@oracle.com: fix VMA selftests]
Link: https://lkml.kernel.org/r/dac6ddfe-773a-43d5-8f69-021b9ca4d24b@lucifer.local
Link: https://lkml.kernel.org/r/20251113072806.795029-1-zhangchunyan@iscas.ac.cn
Link: https://lkml.kernel.org/r/20251113072806.795029-2-zhangchunyan@iscas.ac.cn
Link: https://github.com/riscv-non-isa/riscv-iommu/pull/543 [1]
Signed-off-by: Chunyan Zhang <zhangchunyan@iscas.ac.cn>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Alexandre Ghiti <alex@ghiti.fr>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Axel Rasmussen <axelrasmussen@google.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Conor Dooley <conor@kernel.org>
Cc: Deepak Gupta <debug@rivosinc.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Peter Xu <peterx@redhat.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yuanchu Xie <yuanchu@google.com>
Cc: Alexandre Ghiti <alexghiti@rivosinc.com>
Cc: Andrew Jones <ajones@ventanamicro.com>
Cc: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by Chunyan Zhang, committed by Andrew Morton
277a1ae3 d85b653f

12 files changed, +59 -38
fs/proc/task_mmu.c (+6 -9)

···
 	enum clear_refs_types type;
 };
 
-#ifdef CONFIG_MEM_SOFT_DIRTY
-
 static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, pte_t pte)
 {
 	struct folio *folio;
···
 static inline void clear_soft_dirty(struct vm_area_struct *vma,
 		unsigned long addr, pte_t *pte)
 {
+	if (!pgtable_supports_soft_dirty())
+		return;
 	/*
 	 * The soft-dirty tracker uses #PF-s to catch writes
 	 * to pages, so write-protect the pte as well. See the
···
 		set_pte_at(vma->vm_mm, addr, pte, ptent);
 	}
 }
-#else
-static inline void clear_soft_dirty(struct vm_area_struct *vma,
-		unsigned long addr, pte_t *pte)
-{
-}
-#endif
 
-#if defined(CONFIG_MEM_SOFT_DIRTY) && defined(CONFIG_TRANSPARENT_HUGEPAGE)
+#if defined(CONFIG_TRANSPARENT_HUGEPAGE)
 static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
 		unsigned long addr, pmd_t *pmdp)
 {
 	pmd_t old, pmd = *pmdp;
+
+	if (!pgtable_supports_soft_dirty())
+		return;
 
 	if (pmd_present(pmd)) {
 		/* See comment in change_huge_pmd() */
include/linux/mm.h (+3)

···
 static inline void vm_flags_init(struct vm_area_struct *vma,
 				 vm_flags_t flags)
 {
+	VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
 	ACCESS_PRIVATE(vma, __vm_flags) = flags;
 }
···
 static inline void vm_flags_reset(struct vm_area_struct *vma,
 				  vm_flags_t flags)
 {
+	VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
 	vma_assert_write_locked(vma);
 	vm_flags_init(vma, flags);
 }
···
 static inline void vm_flags_clear(struct vm_area_struct *vma,
 				  vm_flags_t flags)
 {
+	VM_WARN_ON_ONCE(!pgtable_supports_soft_dirty() && (flags & VM_SOFTDIRTY));
 	vma_start_write(vma);
 	ACCESS_PRIVATE(vma, __vm_flags) &= ~flags;
 }
include/linux/pgtable.h (+12)

···
 #define arch_start_context_switch(prev)	do {} while (0)
 #endif
 
+/*
+ * Some platforms can customize the PTE soft-dirty bit making it unavailable
+ * even if the architecture provides the resource.
+ * Adding this API allows architectures to add their own checks for the
+ * devices on which the kernel is running.
+ * Note: When overriding it, please make sure the CONFIG_MEM_SOFT_DIRTY
+ * is part of this macro.
+ */
+#ifndef pgtable_supports_soft_dirty
+#define pgtable_supports_soft_dirty()	IS_ENABLED(CONFIG_MEM_SOFT_DIRTY)
+#endif
+
 #ifdef CONFIG_HAVE_ARCH_SOFT_DIRTY
 #ifndef CONFIG_ARCH_ENABLE_THP_MIGRATION
 static inline pmd_t pmd_swp_mksoft_dirty(pmd_t pmd)
mm/debug_vm_pgtable.c (+5 -5)

···
 {
 	pte_t pte = pfn_pte(args->fixed_pte_pfn, args->page_prot);
 
-	if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+	if (!pgtable_supports_soft_dirty())
 		return;
 
 	pr_debug("Validating PTE soft dirty\n");
···
 	pte_t pte;
 	softleaf_t entry;
 
-	if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+	if (!pgtable_supports_soft_dirty())
 		return;
 
 	pr_debug("Validating PTE swap soft dirty\n");
···
 {
 	pmd_t pmd;
 
-	if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+	if (!pgtable_supports_soft_dirty())
 		return;
 
 	if (!has_transparent_hugepage())
···
 {
 	pmd_t pmd;
 
-	if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) ||
-	    !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
+	if (!pgtable_supports_soft_dirty() ||
+	    !IS_ENABLED(CONFIG_ARCH_ENABLE_THP_MIGRATION))
 		return;
 
 	if (!has_transparent_hugepage())
mm/huge_memory.c (+7 -6)

···
 
 static pmd_t move_soft_dirty_pmd(pmd_t pmd)
 {
-#ifdef CONFIG_MEM_SOFT_DIRTY
-	if (unlikely(pmd_is_migration_entry(pmd)))
-		pmd = pmd_swp_mksoft_dirty(pmd);
-	else if (pmd_present(pmd))
-		pmd = pmd_mksoft_dirty(pmd);
-#endif
+	if (pgtable_supports_soft_dirty()) {
+		if (unlikely(pmd_is_migration_entry(pmd)))
+			pmd = pmd_swp_mksoft_dirty(pmd);
+		else if (pmd_present(pmd))
+			pmd = pmd_mksoft_dirty(pmd);
+	}
+
 	return pmd;
 }
mm/internal.h (+1 -1)

···
 	 * VM_SOFTDIRTY is defined as 0x0, then !(vm_flags & VM_SOFTDIRTY)
 	 * will be constantly true.
 	 */
-	if (!IS_ENABLED(CONFIG_MEM_SOFT_DIRTY))
+	if (!pgtable_supports_soft_dirty())
 		return false;
 
 	/*
mm/mmap.c (+4 -2)

···
 		return ERR_PTR(-ENOMEM);
 
 	vma_set_range(vma, addr, addr + len, 0);
-	vm_flags_init(vma, (vm_flags | mm->def_flags |
-		      VM_DONTEXPAND | VM_SOFTDIRTY) & ~VM_LOCKED_MASK);
+	vm_flags |= mm->def_flags | VM_DONTEXPAND;
+	if (pgtable_supports_soft_dirty())
+		vm_flags |= VM_SOFTDIRTY;
+	vm_flags_init(vma, vm_flags & ~VM_LOCKED_MASK);
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
 	vma->vm_ops = ops;
mm/mremap.c (+7 -6)

···
 	 * Set soft dirty bit so we can notice
 	 * in userspace the ptes were moved.
 	 */
-#ifdef CONFIG_MEM_SOFT_DIRTY
-	if (pte_present(pte))
-		pte = pte_mksoft_dirty(pte);
-	else
-		pte = pte_swp_mksoft_dirty(pte);
-#endif
+	if (pgtable_supports_soft_dirty()) {
+		if (pte_present(pte))
+			pte = pte_mksoft_dirty(pte);
+		else
+			pte = pte_swp_mksoft_dirty(pte);
+	}
+
 	return pte;
 }
mm/userfaultfd.c (+4 -6)

···
 
 	orig_dst_pte = folio_mk_pte(src_folio, dst_vma->vm_page_prot);
 	/* Set soft dirty bit so userspace can notice the pte was moved */
-#ifdef CONFIG_MEM_SOFT_DIRTY
-	orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
-#endif
+	if (pgtable_supports_soft_dirty())
+		orig_dst_pte = pte_mksoft_dirty(orig_dst_pte);
 	if (pte_dirty(orig_src_pte))
 		orig_dst_pte = pte_mkdirty(orig_dst_pte);
 	orig_dst_pte = pte_mkwrite(orig_dst_pte, dst_vma);
···
 	}
 
 	orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
-#ifdef CONFIG_MEM_SOFT_DIRTY
-	orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
-#endif
+	if (pgtable_supports_soft_dirty())
+		orig_src_pte = pte_swp_mksoft_dirty(orig_src_pte);
 	set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
 	double_pt_unlock(dst_ptl, src_ptl);
 
mm/vma.c (+4 -2)

···
 	 * then new mapped in-place (which must be aimed as
 	 * a completely new data area).
 	 */
-	vm_flags_set(vma, VM_SOFTDIRTY);
+	if (pgtable_supports_soft_dirty())
+		vm_flags_set(vma, VM_SOFTDIRTY);
 
 	vma_set_page_prot(vma);
 }
···
 	mm->data_vm += len >> PAGE_SHIFT;
 	if (vm_flags & VM_LOCKED)
 		mm->locked_vm += (len >> PAGE_SHIFT);
-	vm_flags_set(vma, VM_SOFTDIRTY);
+	if (pgtable_supports_soft_dirty())
+		vm_flags_set(vma, VM_SOFTDIRTY);
 	return 0;
 
 mas_store_fail:
mm/vma_exec.c (+4 -1)

···
 int create_init_stack_vma(struct mm_struct *mm, struct vm_area_struct **vmap,
 			  unsigned long *top_mem_p)
 {
+	unsigned long flags = VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP;
 	int err;
 	struct vm_area_struct *vma = vm_area_alloc(mm);
 
···
 	BUILD_BUG_ON(VM_STACK_FLAGS & VM_STACK_INCOMPLETE_SETUP);
 	vma->vm_end = STACK_TOP_MAX;
 	vma->vm_start = vma->vm_end - PAGE_SIZE;
-	vm_flags_init(vma, VM_SOFTDIRTY | VM_STACK_FLAGS | VM_STACK_INCOMPLETE_SETUP);
+	if (pgtable_supports_soft_dirty())
+		flags |= VM_SOFTDIRTY;
+	vm_flags_init(vma, flags);
 	vma->vm_page_prot = vm_get_page_prot(vma->vm_flags);
 
 	err = insert_vm_struct(mm, vma);
tools/testing/vma/vma_internal.h (+2)

···
 
 #define ASSERT_EXCLUSIVE_WRITER(x)
 
+#define pgtable_supports_soft_dirty() 1
+
 /**
  * swap - swap values of @a and @b
  * @a: first value