arm64: hugetlb: Fix huge_ptep_get_and_clear() for non-present ptes

arm64 supports multiple huge_pte sizes. Some of the sizes are covered by
a single pte entry at a particular level (PMD_SIZE, PUD_SIZE), and some
are covered by multiple ptes at a particular level (CONT_PTE_SIZE,
CONT_PMD_SIZE). So the function has to figure out the size from the
huge_pte pointer. This was previously done by walking the pgtable to
determine the level and by using the PTE_CONT bit to determine the
number of ptes at the level.

But the PTE_CONT bit is only valid when the pte is present. For
non-present pte values (e.g. markers, migration entries), the previous
implementation was therefore erroneously determining the size. There is
at least one known caller in core-mm, move_huge_pte(), which may call
huge_ptep_get_and_clear() for a non-present pte. So we must be robust to
this case. Additionally, the "regular" ptep_get_and_clear() is robust to
being called for non-present ptes, so it makes sense to follow that
behavior.

Fix this by using the new sz parameter which is now provided to the
function. Additionally when clearing each pte in a contig range, don't
gather the access and dirty bits if the pte is not present.
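
The clearing rule above can be modelled in userspace. This is only a
sketch: struct model_pte and the model_* helpers are hypothetical
stand-ins invented for illustration, not kernel types or APIs, and the
get-and-clear here is not atomic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a pte. For a non-present entry the dirty
 * and young fields model arbitrary bits with no access/dirty meaning.
 */
struct model_pte {
	bool present;
	bool dirty;
	bool young;
};

/* Return the old entry and zero the slot (non-atomic in this model). */
static struct model_pte model_get_and_clear(struct model_pte *ptep)
{
	struct model_pte old = *ptep;

	*ptep = (struct model_pte){ 0 };
	return old;
}

/*
 * Clear ncontig entries starting at ptep and return the first one.
 * Dirty/young from the remaining entries are folded in only when the
 * first entry was present.
 */
static struct model_pte model_get_clear_contig(struct model_pte *ptep,
					       size_t ncontig)
{
	struct model_pte pte, tmp;
	bool present;

	assert(ncontig >= 1);

	pte = model_get_and_clear(ptep);
	present = pte.present;
	while (--ncontig) {
		ptep++;
		tmp = model_get_and_clear(ptep);
		if (present) {
			if (tmp.dirty)
				pte.dirty = true;
			if (tmp.young)
				pte.young = true;
		}
	}
	return pte;
}
```

In this model, every entry in the range is cleared regardless of
presence. Only the accumulation of dirty/young depends on whether the
first entry was present.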

An alternative approach that would not require API changes would be to
store the PTE_CONT bit in a spare bit in the swap entry pte for the
non-present case. But it felt cleaner to follow other APIs' lead and
just pass in the size.

As an aside, PTE_CONT is bit 52, which corresponds to bit 40 in the swap
entry offset field (layout of non-present pte). Since hugetlb is never
swapped to disk, this field will only be populated for markers, which
always set this bit to 0, and hwpoison swap entries, which set the offset
field to a PFN. So it would only ever be 1 for a 52-bit PVA system where
memory in that high half was poisoned (I think!). So in practice, this
bit would almost always be zero for non-present ptes and we would only
clear the first entry if it was actually a contiguous block. That's
probably a less severe symptom than if it was always interpreted as 1
and cleared out potentially-present neighboring PTEs.
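
The bit arithmetic in this aside can be checked with a small userspace
sketch. It assumes the swap offset field starts at pte bit 12, which is
inferred here only from the statement that pte bit 52 is offset bit 40.
The MODEL_* macros and model_* helpers are hypothetical names, not
kernel definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Bit position of PTE_CONT, as stated in the commit message. */
#define MODEL_PTE_CONT_BIT	52
/* Assumption: offset field starts at pte bit 12 (52 - 40). */
#define MODEL_SWP_OFFSET_SHIFT	12

/* Build a model non-present pte carrying only an offset field. */
static uint64_t model_swp_pte(uint64_t offset)
{
	/* Keep the shift from dropping high bits in this model. */
	assert(offset < (1ULL << 52));
	return offset << MODEL_SWP_OFFSET_SHIFT;
}

/* Would code that tests bit 52 see this value as "contiguous"? */
static int model_looks_cont(uint64_t pte)
{
	return (int)((pte >> MODEL_PTE_CONT_BIT) & 1);
}
```

Under that assumption, an offset with bit 40 set is the only kind of
value that makes model_looks_cont() return 1. Offsets that fit in the
low 40 bits always read back as non-contiguous.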

Cc: stable@vger.kernel.org
Fixes: 66b3923a1a0f ("arm64: hugetlb: add support for PTE contiguous bit")
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Link: https://lore.kernel.org/r/20250226120656.2400136-3-ryan.roberts@arm.com
Signed-off-by: Will Deacon <will@kernel.org>

authored by Ryan Roberts and committed by Will Deacon 49c87f76 02410ac7

+19 -32
arch/arm64/mm/hugetlbpage.c
···
 
 static inline int num_contig_ptes(unsigned long size, size_t *pgsize)
 {
-	int contig_ptes = 0;
+	int contig_ptes = 1;
 
 	*pgsize = size;
 
 	switch (size) {
-#ifndef __PAGETABLE_PMD_FOLDED
-	case PUD_SIZE:
-		if (pud_sect_supported())
-			contig_ptes = 1;
-		break;
-#endif
-	case PMD_SIZE:
-		contig_ptes = 1;
-		break;
 	case CONT_PMD_SIZE:
 		*pgsize = PMD_SIZE;
 		contig_ptes = CONT_PMDS;
···
 		*pgsize = PAGE_SIZE;
 		contig_ptes = CONT_PTES;
 		break;
+	default:
+		WARN_ON(!__hugetlb_valid_size(size));
 	}
 
 	return contig_ptes;
···
 			     unsigned long pgsize,
 			     unsigned long ncontig)
 {
-	pte_t orig_pte = __ptep_get(ptep);
-	unsigned long i;
+	pte_t pte, tmp_pte;
+	bool present;
 
-	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++) {
-		pte_t pte = __ptep_get_and_clear(mm, addr, ptep);
-
-		/*
-		 * If HW_AFDBM is enabled, then the HW could turn on
-		 * the dirty or accessed bit for any page in the set,
-		 * so check them all.
-		 */
-		if (pte_dirty(pte))
-			orig_pte = pte_mkdirty(orig_pte);
-
-		if (pte_young(pte))
-			orig_pte = pte_mkyoung(orig_pte);
+	pte = __ptep_get_and_clear(mm, addr, ptep);
+	present = pte_present(pte);
+	while (--ncontig) {
+		ptep++;
+		addr += pgsize;
+		tmp_pte = __ptep_get_and_clear(mm, addr, ptep);
+		if (present) {
+			if (pte_dirty(tmp_pte))
+				pte = pte_mkdirty(pte);
+			if (pte_young(tmp_pte))
+				pte = pte_mkyoung(pte);
+		}
 	}
-	return orig_pte;
+	return pte;
 }
 
 static pte_t get_clear_contig_flush(struct mm_struct *mm,
···
 {
 	int ncontig;
 	size_t pgsize;
-	pte_t orig_pte = __ptep_get(ptep);
 
-	if (!pte_cont(orig_pte))
-		return __ptep_get_and_clear(mm, addr, ptep);
-
-	ncontig = find_num_contig(mm, addr, ptep, &pgsize);
-
+	ncontig = num_contig_ptes(sz, &pgsize);
 	return get_clear_contig(mm, addr, ptep, pgsize, ncontig);
 }
 