Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm/memory: factor out common code from vm_normal_page_*()

Let's reduce the code duplication and factor out the non-pte/pmd related
magic into __vm_normal_page().

To keep it simpler, check the pfn against both zero folios, which
shouldn't really make a difference.

It's a good question whether we can even hit the !CONFIG_ARCH_HAS_PTE_SPECIAL
scenario in the PMD case in practice: but it doesn't really matter, as it's
now all unified in __vm_normal_page().

Add kerneldoc for all involved functions.

Note that, as a side product, we now:
* Support the find_special_page() callback also for PMDs
* Don't check for is_huge_zero_pfn() anymore if we have
CONFIG_ARCH_HAS_PTE_SPECIAL and the PMD is not special. The
VM_WARN_ON_ONCE() would catch any abuse

No functional change intended.

Link: https://lkml.kernel.org/r/20250811112631.759341-10-david@redhat.com
Signed-off-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Cc: Dev Jain <dev.jain@arm.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Jann Horn <jannh@google.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Lance Yang <lance.yang@linux.dev>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mariano Pache <npache@redhat.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Wei Yang <richard.weiyang@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by David Hildenbrand and committed by Andrew Morton
af385388 ec63a440

+120 -88
mm/memory.c
···
 #define print_bad_pte(vma, addr, pte, page) \
 	print_bad_page_map(vma, addr, pte_val(pte), page, PGTABLE_LEVEL_PTE)
 
-/*
- * vm_normal_page -- This function gets the "struct page" associated with a pte.
+/**
+ * __vm_normal_page() - Get the "struct page" associated with a page table entry.
+ * @vma: The VMA mapping the page table entry.
+ * @addr: The address where the page table entry is mapped.
+ * @pfn: The PFN stored in the page table entry.
+ * @special: Whether the page table entry is marked "special".
+ * @level: The page table level for error reporting purposes only.
+ * @entry: The page table entry value for error reporting purposes only.
  *
  * "Special" mappings do not wish to be associated with a "struct page" (either
  * it doesn't exist, or it exists but they don't want to touch it). In this
···
  * Selected page table walkers (such as GUP) can still identify mappings of the
  * shared zero folios and work with the underlying "struct page".
  *
- * There are 2 broad cases. Firstly, an architecture may define a pte_special()
- * pte bit, in which case this function is trivial. Secondly, an architecture
- * may not have a spare pte bit, which requires a more complicated scheme,
- * described below.
+ * There are 2 broad cases. Firstly, an architecture may define a "special"
+ * page table entry bit, such as pte_special(), in which case this function is
+ * trivial. Secondly, an architecture may not have a spare page table
+ * entry bit, which requires a more complicated scheme, described below.
  *
  * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a
  * special mapping (even if there are underlying and valid "struct pages").
···
  * don't have to follow the strict linearity rule of PFNMAP mappings in
  * order to support COWable mappings.
  *
+ * Return: Returns the "struct page" if this is a "normal" mapping. Returns
+ * NULL if this is a "special" mapping.
+ */
+static inline struct page *__vm_normal_page(struct vm_area_struct *vma,
+		unsigned long addr, unsigned long pfn, bool special,
+		unsigned long long entry, enum pgtable_level level)
+{
+	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
+		if (unlikely(special)) {
+			if (vma->vm_ops && vma->vm_ops->find_special_page)
+				return vma->vm_ops->find_special_page(vma, addr);
+			if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
+				return NULL;
+			if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))
+				return NULL;
+
+			print_bad_page_map(vma, addr, entry, NULL, level);
+			return NULL;
+		}
+		/*
+		 * With CONFIG_ARCH_HAS_PTE_SPECIAL, any special page table
+		 * mappings (incl. shared zero folios) are marked accordingly.
+		 */
+	} else {
+		if (unlikely(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))) {
+			if (vma->vm_flags & VM_MIXEDMAP) {
+				/* If it has a "struct page", it's "normal". */
+				if (!pfn_valid(pfn))
+					return NULL;
+			} else {
+				unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT;
+
+				/* Only CoW'ed anon folios are "normal". */
+				if (pfn == vma->vm_pgoff + off)
+					return NULL;
+				if (!is_cow_mapping(vma->vm_flags))
+					return NULL;
+			}
+		}
+
+		if (is_zero_pfn(pfn) || is_huge_zero_pfn(pfn))
+			return NULL;
+	}
+
+	if (unlikely(pfn > highest_memmap_pfn)) {
+		/* Corrupted page table entry. */
+		print_bad_page_map(vma, addr, entry, NULL, level);
+		return NULL;
+	}
+	/*
+	 * NOTE! We still have PageReserved() pages in the page tables.
+	 * For example, VDSO mappings can cause them to exist.
+	 */
+	VM_WARN_ON_ONCE(is_zero_pfn(pfn) || is_huge_zero_pfn(pfn));
+	return pfn_to_page(pfn);
+}
+
+/**
+ * vm_normal_page() - Get the "struct page" associated with a PTE
+ * @vma: The VMA mapping the @pte.
+ * @addr: The address where the @pte is mapped.
+ * @pte: The PTE.
+ *
+ * Get the "struct page" associated with a PTE. See __vm_normal_page()
+ * for details on "normal" and "special" mappings.
+ *
+ * Return: Returns the "struct page" if this is a "normal" mapping. Returns
+ * NULL if this is a "special" mapping.
  */
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 		pte_t pte)
 {
-	unsigned long pfn = pte_pfn(pte);
-
-	if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL)) {
-		if (likely(!pte_special(pte)))
-			goto check_pfn;
-		if (vma->vm_ops && vma->vm_ops->find_special_page)
-			return vma->vm_ops->find_special_page(vma, addr);
-		if (vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))
-			return NULL;
-		if (is_zero_pfn(pfn))
-			return NULL;
-
-		print_bad_pte(vma, addr, pte, NULL);
-		return NULL;
-	}
-
-	/* !CONFIG_ARCH_HAS_PTE_SPECIAL case follows: */
-
-	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
-		if (vma->vm_flags & VM_MIXEDMAP) {
-			if (!pfn_valid(pfn))
-				return NULL;
-			if (is_zero_pfn(pfn))
-				return NULL;
-			goto out;
-		} else {
-			unsigned long off;
-			off = (addr - vma->vm_start) >> PAGE_SHIFT;
-			if (pfn == vma->vm_pgoff + off)
-				return NULL;
-			if (!is_cow_mapping(vma->vm_flags))
-				return NULL;
-		}
-	}
-
-	if (is_zero_pfn(pfn))
-		return NULL;
-
-check_pfn:
-	if (unlikely(pfn > highest_memmap_pfn)) {
-		print_bad_pte(vma, addr, pte, NULL);
-		return NULL;
-	}
-
-	/*
-	 * NOTE! We still have PageReserved() pages in the page tables.
-	 * eg. VDSO mappings can cause them to exist.
-	 */
-out:
-	VM_WARN_ON_ONCE(is_zero_pfn(pfn));
-	return pfn_to_page(pfn);
+	return __vm_normal_page(vma, addr, pte_pfn(pte), pte_special(pte),
+				pte_val(pte), PGTABLE_LEVEL_PTE);
 }
 
+/**
+ * vm_normal_folio() - Get the "struct folio" associated with a PTE
+ * @vma: The VMA mapping the @pte.
+ * @addr: The address where the @pte is mapped.
+ * @pte: The PTE.
+ *
+ * Get the "struct folio" associated with a PTE. See __vm_normal_page()
+ * for details on "normal" and "special" mappings.
+ *
+ * Return: Returns the "struct folio" if this is a "normal" mapping. Returns
+ * NULL if this is a "special" mapping.
+ */
 struct folio *vm_normal_folio(struct vm_area_struct *vma, unsigned long addr,
 		pte_t pte)
 {
···
 }
 
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+/**
+ * vm_normal_page_pmd() - Get the "struct page" associated with a PMD
+ * @vma: The VMA mapping the @pmd.
+ * @addr: The address where the @pmd is mapped.
+ * @pmd: The PMD.
+ *
+ * Get the "struct page" associated with a PMD. See __vm_normal_page()
+ * for details on "normal" and "special" mappings.
+ *
+ * Return: Returns the "struct page" if this is a "normal" mapping. Returns
+ * NULL if this is a "special" mapping.
+ */
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 		pmd_t pmd)
 {
-	unsigned long pfn = pmd_pfn(pmd);
-
-	if (unlikely(pmd_special(pmd)))
-		return NULL;
-
-	if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) {
-		if (vma->vm_flags & VM_MIXEDMAP) {
-			if (!pfn_valid(pfn))
-				return NULL;
-			goto out;
-		} else {
-			unsigned long off;
-			off = (addr - vma->vm_start) >> PAGE_SHIFT;
-			if (pfn == vma->vm_pgoff + off)
-				return NULL;
-			if (!is_cow_mapping(vma->vm_flags))
-				return NULL;
-		}
-	}
-
-	if (is_huge_zero_pfn(pfn))
-		return NULL;
-	if (unlikely(pfn > highest_memmap_pfn))
-		return NULL;
-
-	/*
-	 * NOTE! We still have PageReserved() pages in the page tables.
-	 * eg. VDSO mappings can cause them to exist.
-	 */
-out:
-	return pfn_to_page(pfn);
+	return __vm_normal_page(vma, addr, pmd_pfn(pmd), pmd_special(pmd),
+				pmd_val(pmd), PGTABLE_LEVEL_PMD);
 }
 
+/**
+ * vm_normal_folio_pmd() - Get the "struct folio" associated with a PMD
+ * @vma: The VMA mapping the @pmd.
+ * @addr: The address where the @pmd is mapped.
+ * @pmd: The PMD.
+ *
+ * Get the "struct folio" associated with a PMD. See __vm_normal_page()
+ * for details on "normal" and "special" mappings.
+ *
+ * Return: Returns the "struct folio" if this is a "normal" mapping. Returns
+ * NULL if this is a "special" mapping.
+ */
 struct folio *vm_normal_folio_pmd(struct vm_area_struct *vma,
 		unsigned long addr, pmd_t pmd)
 {