Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

arm64: contpte: fix set_access_flags() no-op check for SMMU/ATS faults

contpte_ptep_set_access_flags() compared the gathered ptep_get() value
against the requested entry to detect no-ops. ptep_get() ORs AF/dirty
from all sub-PTEs in the CONT block, so a dirty sibling can make the
target appear already-dirty. When the gathered value matches entry, the
function returns 0 even though the target sub-PTE still has PTE_RDONLY
set in hardware.

For a CPU with FEAT_HAFDBS this gathered view is fine, since hardware may
set AF/dirty on any sub-PTE and CPU TLB behavior is effectively gathered
across the CONT range. But page-table walkers that evaluate each
descriptor individually (e.g. a CPU without DBM support, or an SMMU
without HTTU, or with HA/HD disabled in CD.TCR) can keep faulting on the
unchanged target sub-PTE, causing an infinite fault loop.

Gathering can therefore cause false no-ops when only a sibling has been
updated:
- write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
- read faults: target still lacks PTE_AF

Fix by checking each sub-PTE against the requested AF/dirty/write state
(the same bits consumed by __ptep_set_access_flags()), using raw
per-PTE values rather than the gathered ptep_get() view, before
returning no-op. Keep using the raw target PTE for the write-bit unfold
decision.

Per Arm ARM (DDI 0487) D8.7.1 ("The Contiguous bit"), any sub-PTE in a CONT
range may become the effective cached translation and software must
maintain consistent attributes across the range.

Fixes: 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings")
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: Zi Yan <ziy@nvidia.com>
Cc: Breno Leitao <leitao@debian.org>
Cc: stable@vger.kernel.org
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: James Houghton <jthoughton@google.com>
Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com>
Acked-by: Balbir Singh <balbirs@nvidia.com>
Signed-off-by: Will Deacon <will@kernel.org>

Authored by Piotr Jaroszynski, committed by Will Deacon
97c5550b 0100e495

+49 -4
arch/arm64/mm/contpte.c
···
 }
 EXPORT_SYMBOL_GPL(contpte_clear_young_dirty_ptes);
 
+static bool contpte_all_subptes_match_access_flags(pte_t *ptep, pte_t entry)
+{
+	pte_t *cont_ptep = contpte_align_down(ptep);
+	/*
+	 * PFNs differ per sub-PTE. Match only bits consumed by
+	 * __ptep_set_access_flags(): AF, DIRTY and write permission.
+	 */
+	const pteval_t cmp_mask = PTE_RDONLY | PTE_AF | PTE_WRITE | PTE_DIRTY;
+	pteval_t entry_cmp = pte_val(entry) & cmp_mask;
+	int i;
+
+	for (i = 0; i < CONT_PTES; i++) {
+		pteval_t pte_cmp = pte_val(__ptep_get(cont_ptep + i)) & cmp_mask;
+
+		if (pte_cmp != entry_cmp)
+			return false;
+	}
+
+	return true;
+}
+
 int contpte_ptep_set_access_flags(struct vm_area_struct *vma,
 				  unsigned long addr, pte_t *ptep,
 				  pte_t entry, int dirty)
···
 	int i;
 
 	/*
-	 * Gather the access/dirty bits for the contiguous range. If nothing has
-	 * changed, its a noop.
+	 * Check whether all sub-PTEs in the CONT block already match the
+	 * requested access flags/write permission, using raw per-PTE values
+	 * rather than the gathered ptep_get() view.
+	 *
+	 * __ptep_set_access_flags() can update AF, dirty and write
+	 * permission, but only to make the mapping more permissive.
+	 *
+	 * ptep_get() gathers AF/dirty state across the whole CONT block,
+	 * which is correct for a CPU with FEAT_HAFDBS. But page-table
+	 * walkers that evaluate each descriptor individually (e.g. a CPU
+	 * without DBM support, or an SMMU without HTTU, or with HA/HD
+	 * disabled in CD.TCR) can keep faulting on the target sub-PTE if
+	 * only a sibling has been updated. Gathering can therefore cause
+	 * false no-ops when only a sibling has been updated:
+	 * - write faults: target still has PTE_RDONLY (needs PTE_RDONLY cleared)
+	 * - read faults: target still lacks PTE_AF
+	 *
+	 * Per Arm ARM (DDI 0487) D8.7.1, any sub-PTE in a CONT range may
+	 * become the effective cached translation, so all entries must have
+	 * consistent attributes. Check the full CONT block before returning
+	 * no-op, and when any sub-PTE mismatches, proceed to update the whole
+	 * range.
 	 */
-	orig_pte = pte_mknoncont(ptep_get(ptep));
-	if (pte_val(orig_pte) == pte_val(entry))
+	if (contpte_all_subptes_match_access_flags(ptep, entry))
 		return 0;
+
+	/*
+	 * Use raw target pte (not gathered) for write-bit unfold decision.
+	 */
+	orig_pte = pte_mknoncont(__ptep_get(ptep));
 
 	/*
 	 * We can fix up access/dirty bits without having to unfold the contig