Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/huge_memory: respect MADV_COLLAPSE with PR_THP_DISABLE_EXCEPT_ADVISED

Let's allow MADV_COLLAPSE to succeed on areas that have neither
VM_HUGEPAGE nor VM_NOHUGEPAGE set when THPs are disabled unless explicitly
advised (PR_THP_DISABLE_EXCEPT_ADVISED).

MADV_COLLAPSE is a clear advice that we want to collapse.

Note that we still respect the VM_NOHUGEPAGE flag, just like
MADV_COLLAPSE always does. Consequently, with
PR_THP_DISABLE_EXCEPT_ADVISED, MADV_COLLAPSE is now refused only on
VM_NOHUGEPAGE areas, including for shmem.

Link: https://lkml.kernel.org/r/20250815135549.130506-4-usamaarif642@gmail.com
Co-developed-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: Usama Arif <usamaarif642@gmail.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Barry Song <baohua@kernel.org>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: SeongJae Park <sj@kernel.org>
Cc: Shakeel Butt <shakeel.butt@linux.dev>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Yafang <laoar.shao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by David Hildenbrand and committed by Andrew Morton
8cdc4d27 1f1c0610

+16 -7
+7 -1
include/linux/huge_mm.h
···
  * through madvise or prctl.
  */
 static inline bool vma_thp_disabled(struct vm_area_struct *vma,
-		vm_flags_t vm_flags)
+		vm_flags_t vm_flags, bool forced_collapse)
 {
 	/* Are THPs disabled for this VMA? */
 	if (vm_flags & VM_NOHUGEPAGE)
···
 	 * advise to use them?
 	 */
 	if (vm_flags & VM_HUGEPAGE)
+		return false;
+	/*
+	 * Forcing a collapse (e.g., madv_collapse), is a clear advice to
+	 * use THPs.
+	 */
+	if (forced_collapse)
 		return false;
 	return mm_flags_test(MMF_DISABLE_THP_EXCEPT_ADVISED, vma->vm_mm);
 }
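The decision order in the helper changed above can be sketched as a small standalone model. This is not kernel code: `struct mm_model`, the `*_MODEL` flag values and `vma_thp_disabled_model()` are hypothetical stand-ins. The lines between the two hunks are not shown in the diff, so the process-wide "disabled completely" check is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel VMA flags; values are illustrative. */
#define VM_HUGEPAGE_MODEL	(1u << 0)
#define VM_NOHUGEPAGE_MODEL	(1u << 1)

/* Hypothetical model of the per-process THP-disable state. */
struct mm_model {
	bool disable_thp_completely;		/* assumption: not visible in the hunks */
	bool disable_thp_except_advised;	/* models MMF_DISABLE_THP_EXCEPT_ADVISED */
};

/* Mirrors the order of checks visible in the patched helper. */
static bool vma_thp_disabled_model(const struct mm_model *mm,
				   unsigned int vm_flags, bool forced_collapse)
{
	/* VM_NOHUGEPAGE always wins, even for a forced collapse. */
	if (vm_flags & VM_NOHUGEPAGE_MODEL)
		return true;
	/* Assumed process-wide disable (elided between the hunks). */
	if (mm->disable_thp_completely)
		return true;
	/* Explicit advice on the VMA keeps THPs enabled. */
	if (vm_flags & VM_HUGEPAGE_MODEL)
		return false;
	/* New in this patch: a forced collapse counts as explicit advice. */
	if (forced_collapse)
		return false;
	return mm->disable_thp_except_advised;
}

/* Checks the cases described in the commit message against the model. */
static void check_commit_message_cases(void)
{
	const struct mm_model except_advised = {
		.disable_thp_except_advised = true,
	};

	/* Unflagged area: regular paths stay disabled, a forced collapse does not. */
	assert(vma_thp_disabled_model(&except_advised, 0, false));
	assert(!vma_thp_disabled_model(&except_advised, 0, true));
	/* VM_NOHUGEPAGE is still respected for a forced collapse. */
	assert(vma_thp_disabled_model(&except_advised, VM_NOHUGEPAGE_MODEL, true));
	/* VM_HUGEPAGE areas were already exempt before this patch. */
	assert(!vma_thp_disabled_model(&except_advised, VM_HUGEPAGE_MODEL, false));
}
```

Calling `check_commit_message_cases()` from a test harness exercises the four cases. Only the second assertion depends on the new `forced_collapse` parameter.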
+1 -1
include/uapi/linux/prctl.h
···
 #define PR_SET_THP_DISABLE	41
 /*
  * Don't disable THPs when explicitly advised (e.g., MADV_HUGEPAGE /
- * VM_HUGEPAGE).
+ * VM_HUGEPAGE, MADV_COLLAPSE).
  */
 # define PR_THP_DISABLE_EXCEPT_ADVISED	(1 << 1)
 #define PR_GET_THP_DISABLE	42
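From userspace, the documented behaviour could be exercised roughly as follows. This is a minimal sketch under several assumptions: the `prctl()` argument layout (arg2 = 1, flag in arg3) is not shown in this diff, the 2 MiB PMD size is typical for x86-64 but not universal, and `try_collapse_except_advised()` is a hypothetical helper name. The fallback `#define`s only cover older userspace headers; a kernel without the feature is expected to fail the call rather than collapse.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_THP_DISABLE_EXCEPT_ADVISED
#define PR_THP_DISABLE_EXCEPT_ADVISED (1 << 1)
#endif
#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25
#endif

#define PMD_SIZE_ASSUMED (2UL << 20)	/* assumption: 2 MiB PMD-sized THP */
#define REGION_SIZE (2 * PMD_SIZE_ASSUMED)	/* room for one aligned PMD range */

/*
 * Hypothetical helper: disable THPs except where advised, then ask for a
 * collapse on an area with neither VM_HUGEPAGE nor VM_NOHUGEPAGE set.
 * Returns 0 if the collapse was accepted, or a negative errno value.
 * Side effect: the THP-disable setting stays in place for the process.
 */
int try_collapse_except_advised(void)
{
	/* The alignment mask below needs a power-of-two size. */
	assert((PMD_SIZE_ASSUMED & (PMD_SIZE_ASSUMED - 1)) == 0);

	if (prctl(PR_SET_THP_DISABLE, 1UL,
		  (unsigned long)PR_THP_DISABLE_EXCEPT_ADVISED, 0UL, 0UL))
		return -errno;

	char *buf = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;

	/* Round up to the next PMD-aligned address inside the mapping. */
	char *aligned = (char *)(((uintptr_t)buf + PMD_SIZE_ASSUMED - 1) &
				 ~(uintptr_t)(PMD_SIZE_ASSUMED - 1));

	/* Touch the range so there are pages to collapse. */
	memset(aligned, 1, PMD_SIZE_ASSUMED);

	int ret = madvise(aligned, PMD_SIZE_ASSUMED, MADV_COLLAPSE) ? -errno : 0;

	munmap(buf, REGION_SIZE);
	return ret;
}
```

A return value of 0 would indicate that the collapse request was accepted. A negative value carries the errno from whichever call failed, for example when the kernel does not support the flag or THPs are unavailable.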
+3 -2
mm/huge_memory.c
···
 {
 	const bool smaps = type == TVA_SMAPS;
 	const bool in_pf = type == TVA_PAGEFAULT;
-	const bool enforce_sysfs = type != TVA_FORCED_COLLAPSE;
+	const bool forced_collapse = type == TVA_FORCED_COLLAPSE;
+	const bool enforce_sysfs = !forced_collapse;
 	unsigned long supported_orders;
 
 	/* Check the intersection of requested and supported orders. */
···
 	if (!vma->vm_mm)		/* vdso */
 		return 0;
 
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vm_flags, forced_collapse))
 		return 0;
 
 	/* khugepaged doesn't collapse DAX vma, but page fault is fine. */
+4 -2
mm/memory.c
···
 	 * It is too late to allocate a small folio, we already have a large
 	 * folio in the pagecache: especially s390 KVM cannot tolerate any
 	 * PMD mappings, but PTE-mapped THP are fine. So let's simply refuse any
-	 * PMD mappings if THPs are disabled.
+	 * PMD mappings if THPs are disabled. As we already have a THP,
+	 * behave as if we are forcing a collapse.
 	 */
-	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags))
+	if (thp_disabled_by_hw() || vma_thp_disabled(vma, vma->vm_flags,
+						     /* forced_collapse=*/ true))
 		return ret;
 
 	if (!thp_vma_suitable_order(vma, haddr, PMD_ORDER))
+1 -1
mm/shmem.c
···
 	vm_flags_t vm_flags = vma ? vma->vm_flags : 0;
 	unsigned int global_orders;
 
-	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
+	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags, shmem_huge_force)))
 		return 0;
 
 	global_orders = shmem_huge_global_enabled(inode, index, write_end,