
mm: device exclusive memory access

Some devices require exclusive write access to shared virtual memory (SVM)
ranges to perform atomic operations on that memory. This requires CPU
page tables to be updated to deny access whilst atomic operations are
occurring.

In order to do this, introduce a new swap entry type
(SWP_DEVICE_EXCLUSIVE). When an SVM range needs to be marked for exclusive
access by a device, all page table mappings for that range are
replaced with device exclusive swap entries. This causes any CPU access
to the page to result in a fault.

Faults are resolved by replacing the faulting entry with the original
mapping. This results in MMU notifiers being called which a driver uses
to update access permissions such as revoking atomic access. After
notifiers have been called the device will no longer have exclusive access
to the region.

Walking of the page tables to find the target pages is handled by
get_user_pages() rather than a direct page table walk. A direct page
table walk similar to what migrate_vma_collect()/unmap() does could also
have been used, but it would have duplicated much of the functionality
get_user_pages() already provides, since page faulting is required to
make the PTEs present and to break COW.

[dan.carpenter@oracle.com: fix signedness bug in make_device_exclusive_range()]
Link: https://lkml.kernel.org/r/YNIz5NVnZ5GiZ3u1@mwanda

Link: https://lkml.kernel.org/r/20210616105937.23201-8-apopple@nvidia.com
Signed-off-by: Alistair Popple <apopple@nvidia.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: John Hubbard <jhubbard@nvidia.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Peter Xu <peterx@redhat.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Alistair Popple and committed by Linus Torvalds
b756a3b5 9a5cc85c

+405 -10
+17
Documentation/vm/hmm.rst
··· 405 405 406 406 The lock can now be released. 407 407 408 + Exclusive access memory 409 + ======================= 410 + 411 + Some devices have features such as atomic PTE bits that can be used to implement 412 + atomic access to system memory. To support atomic operations to a shared virtual 413 + memory page such a device needs access to that page which is exclusive of any 414 + userspace access from the CPU. The ``make_device_exclusive_range()`` function 415 + can be used to make a memory range inaccessible from userspace. 416 + 417 + This replaces all mappings for pages in the given range with special swap 418 + entries. Any attempt to access the swap entry results in a fault which is 419 + resolved by replacing the entry with the original mapping. A driver gets 420 + notified that the mapping has been changed by MMU notifiers, after which point 421 + it will no longer have exclusive access to the page. Exclusive access is 422 + guaranteed to last until the driver drops the page lock and page reference, at 423 + which point any CPU faults on the page may proceed as described. 424 + 408 425 Memory cgroup (memcg) and rss accounting 409 426 ======================================== 410 427
+6
include/linux/mmu_notifier.h
··· 42 42 * @MMU_NOTIFY_MIGRATE: used during migrate_vma_collect() invalidate to signal 43 43 * a device driver to possibly ignore the invalidation if the 44 44 * owner field matches the driver's device private pgmap owner. 45 + * 46 + * @MMU_NOTIFY_EXCLUSIVE: to signal a device driver that the device will no 47 + * longer have exclusive access to the page. When sent during creation of an 48 + * exclusive range the owner will be initialised to the value provided by the 49 + * caller of make_device_exclusive_range(), otherwise the owner will be NULL. 45 50 */ 46 51 enum mmu_notifier_event { 47 52 MMU_NOTIFY_UNMAP = 0, ··· 56 51 MMU_NOTIFY_SOFT_DIRTY, 57 52 MMU_NOTIFY_RELEASE, 58 53 MMU_NOTIFY_MIGRATE, 54 + MMU_NOTIFY_EXCLUSIVE, 59 55 }; 60 56 61 57 #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0)
+4
include/linux/rmap.h
··· 194 194 void try_to_migrate(struct page *page, enum ttu_flags flags); 195 195 void try_to_unmap(struct page *, enum ttu_flags flags); 196 196 197 + int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, 198 + unsigned long end, struct page **pages, 199 + void *arg); 200 + 197 201 /* Avoid racy checks */ 198 202 #define PVMW_SYNC (1 << 0) 199 203 /* Look for migration entries rather than present PTEs */
+7 -2
include/linux/swap.h
··· 62 62 * migrate part of a process memory to device memory. 63 63 * 64 64 * When a page is migrated from CPU to device, we set the CPU page table entry 65 - * to a special SWP_DEVICE_* entry. 65 + * to a special SWP_DEVICE_{READ|WRITE} entry. 66 + * 67 + * When a page is mapped by the device for exclusive access we set the CPU page 68 + * table entries to special SWP_DEVICE_EXCLUSIVE_* entries. 66 69 */ 67 70 #ifdef CONFIG_DEVICE_PRIVATE 68 - #define SWP_DEVICE_NUM 2 71 + #define SWP_DEVICE_NUM 4 69 72 #define SWP_DEVICE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM) 70 73 #define SWP_DEVICE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+1) 74 + #define SWP_DEVICE_EXCLUSIVE_WRITE (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+2) 75 + #define SWP_DEVICE_EXCLUSIVE_READ (MAX_SWAPFILES+SWP_HWPOISON_NUM+SWP_MIGRATION_NUM+3) 71 76 #else 72 77 #define SWP_DEVICE_NUM 0 73 78 #endif
+43 -1
include/linux/swapops.h
··· 127 127 { 128 128 return unlikely(swp_type(entry) == SWP_DEVICE_WRITE); 129 129 } 130 + 131 + static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t offset) 132 + { 133 + return swp_entry(SWP_DEVICE_EXCLUSIVE_READ, offset); 134 + } 135 + 136 + static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t offset) 137 + { 138 + return swp_entry(SWP_DEVICE_EXCLUSIVE_WRITE, offset); 139 + } 140 + 141 + static inline bool is_device_exclusive_entry(swp_entry_t entry) 142 + { 143 + return swp_type(entry) == SWP_DEVICE_EXCLUSIVE_READ || 144 + swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE; 145 + } 146 + 147 + static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) 148 + { 149 + return unlikely(swp_type(entry) == SWP_DEVICE_EXCLUSIVE_WRITE); 150 + } 130 151 #else /* CONFIG_DEVICE_PRIVATE */ 131 152 static inline swp_entry_t make_readable_device_private_entry(pgoff_t offset) 132 153 { ··· 165 144 } 166 145 167 146 static inline bool is_writable_device_private_entry(swp_entry_t entry) 147 + { 148 + return false; 149 + } 150 + 151 + static inline swp_entry_t make_readable_device_exclusive_entry(pgoff_t offset) 152 + { 153 + return swp_entry(0, 0); 154 + } 155 + 156 + static inline swp_entry_t make_writable_device_exclusive_entry(pgoff_t offset) 157 + { 158 + return swp_entry(0, 0); 159 + } 160 + 161 + static inline bool is_device_exclusive_entry(swp_entry_t entry) 162 + { 163 + return false; 164 + } 165 + 166 + static inline bool is_writable_device_exclusive_entry(swp_entry_t entry) 168 167 { 169 168 return false; 170 169 } ··· 267 226 */ 268 227 static inline bool is_pfn_swap_entry(swp_entry_t entry) 269 228 { 270 - return is_migration_entry(entry) || is_device_private_entry(entry); 229 + return is_migration_entry(entry) || is_device_private_entry(entry) || 230 + is_device_exclusive_entry(entry); 271 231 } 272 232 273 233 struct page_vma_mapped_walk;
+5
mm/hmm.c
··· 26 26 #include <linux/mmu_notifier.h> 27 27 #include <linux/memory_hotplug.h> 28 28 29 + #include "internal.h" 30 + 29 31 struct hmm_vma_walk { 30 32 struct hmm_range *range; 31 33 unsigned long last; ··· 271 269 } 272 270 273 271 if (!non_swap_entry(entry)) 272 + goto fault; 273 + 274 + if (is_device_exclusive_entry(entry)) 274 275 goto fault; 275 276 276 277 if (is_migration_entry(entry)) {
+123 -4
mm/memory.c
··· 699 699 } 700 700 #endif 701 701 702 + static void restore_exclusive_pte(struct vm_area_struct *vma, 703 + struct page *page, unsigned long address, 704 + pte_t *ptep) 705 + { 706 + pte_t pte; 707 + swp_entry_t entry; 708 + 709 + pte = pte_mkold(mk_pte(page, READ_ONCE(vma->vm_page_prot))); 710 + if (pte_swp_soft_dirty(*ptep)) 711 + pte = pte_mksoft_dirty(pte); 712 + 713 + entry = pte_to_swp_entry(*ptep); 714 + if (pte_swp_uffd_wp(*ptep)) 715 + pte = pte_mkuffd_wp(pte); 716 + else if (is_writable_device_exclusive_entry(entry)) 717 + pte = maybe_mkwrite(pte_mkdirty(pte), vma); 718 + 719 + set_pte_at(vma->vm_mm, address, ptep, pte); 720 + 721 + /* 722 + * No need to take a page reference as one was already 723 + * created when the swap entry was made. 724 + */ 725 + if (PageAnon(page)) 726 + page_add_anon_rmap(page, vma, address, false); 727 + else 728 + /* 729 + * Currently device exclusive access only supports anonymous 730 + * memory so the entry shouldn't point to a filebacked page. 731 + */ 732 + WARN_ON_ONCE(!PageAnon(page)); 733 + 734 + if (vma->vm_flags & VM_LOCKED) 735 + mlock_vma_page(page); 736 + 737 + /* 738 + * No need to invalidate - it was non-present before. However 739 + * secondary CPUs may have mappings that need invalidating. 740 + */ 741 + update_mmu_cache(vma, address, ptep); 742 + } 743 + 744 + /* 745 + * Tries to restore an exclusive pte if the page lock can be acquired without 746 + * sleeping. 747 + */ 748 + static int 749 + try_restore_exclusive_pte(pte_t *src_pte, struct vm_area_struct *vma, 750 + unsigned long addr) 751 + { 752 + swp_entry_t entry = pte_to_swp_entry(*src_pte); 753 + struct page *page = pfn_swap_entry_to_page(entry); 754 + 755 + if (trylock_page(page)) { 756 + restore_exclusive_pte(vma, page, addr, src_pte); 757 + unlock_page(page); 758 + return 0; 759 + } 760 + 761 + return -EBUSY; 762 + } 763 + 702 764 /* 703 765 * copy one vm_area from one task to the other. 
Assumes the page tables 704 766 * already present in the new task to be cleared in the whole range ··· 842 780 pte = pte_swp_mkuffd_wp(pte); 843 781 set_pte_at(src_mm, addr, src_pte, pte); 844 782 } 783 + } else if (is_device_exclusive_entry(entry)) { 784 + /* 785 + * Make device exclusive entries present by restoring the 786 + * original entry then copying as for a present pte. Device 787 + * exclusive entries currently only support private writable 788 + * (ie. COW) mappings. 789 + */ 790 + VM_BUG_ON(!is_cow_mapping(src_vma->vm_flags)); 791 + if (try_restore_exclusive_pte(src_pte, src_vma, addr)) 792 + return -EBUSY; 793 + return -ENOENT; 845 794 } 846 795 if (!userfaultfd_wp(dst_vma)) 847 796 pte = pte_swp_clear_uffd_wp(pte); ··· 1053 980 if (ret == -EIO) { 1054 981 entry = pte_to_swp_entry(*src_pte); 1055 982 break; 983 + } else if (ret == -EBUSY) { 984 + break; 985 + } else if (!ret) { 986 + progress += 8; 987 + continue; 1056 988 } 1057 - progress += 8; 1058 - continue; 989 + 990 + /* 991 + * Device exclusive entry restored, continue by copying 992 + * the now present pte. 
993 + */ 994 + WARN_ON_ONCE(ret != -ENOENT); 1059 995 } 1060 996 /* copy_present_pte() will clear `*prealloc' if consumed */ 1061 997 ret = copy_present_pte(dst_vma, src_vma, dst_pte, src_pte, ··· 1102 1020 goto out; 1103 1021 } 1104 1022 entry.val = 0; 1023 + } else if (ret == -EBUSY) { 1024 + goto out; 1105 1025 } else if (ret == -EAGAIN) { 1106 1026 prealloc = page_copy_prealloc(src_mm, src_vma, addr); 1107 1027 if (!prealloc) ··· 1371 1287 } 1372 1288 1373 1289 entry = pte_to_swp_entry(ptent); 1374 - if (is_device_private_entry(entry)) { 1290 + if (is_device_private_entry(entry) || 1291 + is_device_exclusive_entry(entry)) { 1375 1292 struct page *page = pfn_swap_entry_to_page(entry); 1376 1293 1377 1294 if (unlikely(details && details->check_mapping)) { ··· 1388 1303 1389 1304 pte_clear_not_present_full(mm, addr, pte, tlb->fullmm); 1390 1305 rss[mm_counter(page)]--; 1391 - page_remove_rmap(page, false); 1306 + 1307 + if (is_device_private_entry(entry)) 1308 + page_remove_rmap(page, false); 1309 + 1392 1310 put_page(page); 1393 1311 continue; 1394 1312 } ··· 3440 3352 EXPORT_SYMBOL(unmap_mapping_range); 3441 3353 3442 3354 /* 3355 + * Restore a potential device exclusive pte to a working pte entry 3356 + */ 3357 + static vm_fault_t remove_device_exclusive_entry(struct vm_fault *vmf) 3358 + { 3359 + struct page *page = vmf->page; 3360 + struct vm_area_struct *vma = vmf->vma; 3361 + struct mmu_notifier_range range; 3362 + 3363 + if (!lock_page_or_retry(page, vma->vm_mm, vmf->flags)) 3364 + return VM_FAULT_RETRY; 3365 + mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma, 3366 + vma->vm_mm, vmf->address & PAGE_MASK, 3367 + (vmf->address & PAGE_MASK) + PAGE_SIZE, NULL); 3368 + mmu_notifier_invalidate_range_start(&range); 3369 + 3370 + vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, 3371 + &vmf->ptl); 3372 + if (likely(pte_same(*vmf->pte, vmf->orig_pte))) 3373 + restore_exclusive_pte(vma, page, vmf->address, vmf->pte); 3374 + 3375 + 
pte_unmap_unlock(vmf->pte, vmf->ptl); 3376 + unlock_page(page); 3377 + 3378 + mmu_notifier_invalidate_range_end(&range); 3379 + return 0; 3380 + } 3381 + 3382 + /* 3443 3383 * We enter with non-exclusive mmap_lock (to exclude vma changes, 3444 3384 * but allow concurrent faults), and pte mapped but not yet locked. 3445 3385 * We return with pte unmapped and unlocked. ··· 3495 3379 if (is_migration_entry(entry)) { 3496 3380 migration_entry_wait(vma->vm_mm, vmf->pmd, 3497 3381 vmf->address); 3382 + } else if (is_device_exclusive_entry(entry)) { 3383 + vmf->page = pfn_swap_entry_to_page(entry); 3384 + ret = remove_device_exclusive_entry(vmf); 3498 3385 } else if (is_device_private_entry(entry)) { 3499 3386 vmf->page = pfn_swap_entry_to_page(entry); 3500 3387 ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
+8
mm/mprotect.c
··· 165 165 newpte = swp_entry_to_pte(entry); 166 166 if (pte_swp_uffd_wp(oldpte)) 167 167 newpte = pte_swp_mkuffd_wp(newpte); 168 + } else if (is_writable_device_exclusive_entry(entry)) { 169 + entry = make_readable_device_exclusive_entry( 170 + swp_offset(entry)); 171 + newpte = swp_entry_to_pte(entry); 172 + if (pte_swp_soft_dirty(oldpte)) 173 + newpte = pte_swp_mksoft_dirty(newpte); 174 + if (pte_swp_uffd_wp(oldpte)) 175 + newpte = pte_swp_mkuffd_wp(newpte); 168 176 } else { 169 177 newpte = oldpte; 170 178 }
+6 -3
mm/page_vma_mapped.c
··· 41 41 42 42 /* Handle un-addressable ZONE_DEVICE memory */ 43 43 entry = pte_to_swp_entry(*pvmw->pte); 44 - if (!is_device_private_entry(entry)) 44 + if (!is_device_private_entry(entry) && 45 + !is_device_exclusive_entry(entry)) 45 46 return false; 46 47 } else if (!pte_present(*pvmw->pte)) 47 48 return false; ··· 94 93 return false; 95 94 entry = pte_to_swp_entry(*pvmw->pte); 96 95 97 - if (!is_migration_entry(entry)) 96 + if (!is_migration_entry(entry) && 97 + !is_device_exclusive_entry(entry)) 98 98 return false; 99 99 100 100 pfn = swp_offset(entry); ··· 104 102 105 103 /* Handle un-addressable ZONE_DEVICE memory */ 106 104 entry = pte_to_swp_entry(*pvmw->pte); 107 - if (!is_device_private_entry(entry)) 105 + if (!is_device_private_entry(entry) && 106 + !is_device_exclusive_entry(entry)) 108 107 return false; 109 108 110 109 pfn = swp_offset(entry);
+186
mm/rmap.c
··· 2028 2028 rmap_walk(page, &rwc); 2029 2029 } 2030 2030 2031 + #ifdef CONFIG_DEVICE_PRIVATE 2032 + struct make_exclusive_args { 2033 + struct mm_struct *mm; 2034 + unsigned long address; 2035 + void *owner; 2036 + bool valid; 2037 + }; 2038 + 2039 + static bool page_make_device_exclusive_one(struct page *page, 2040 + struct vm_area_struct *vma, unsigned long address, void *priv) 2041 + { 2042 + struct mm_struct *mm = vma->vm_mm; 2043 + struct page_vma_mapped_walk pvmw = { 2044 + .page = page, 2045 + .vma = vma, 2046 + .address = address, 2047 + }; 2048 + struct make_exclusive_args *args = priv; 2049 + pte_t pteval; 2050 + struct page *subpage; 2051 + bool ret = true; 2052 + struct mmu_notifier_range range; 2053 + swp_entry_t entry; 2054 + pte_t swp_pte; 2055 + 2056 + mmu_notifier_range_init_owner(&range, MMU_NOTIFY_EXCLUSIVE, 0, vma, 2057 + vma->vm_mm, address, min(vma->vm_end, 2058 + address + page_size(page)), args->owner); 2059 + mmu_notifier_invalidate_range_start(&range); 2060 + 2061 + while (page_vma_mapped_walk(&pvmw)) { 2062 + /* Unexpected PMD-mapped THP? */ 2063 + VM_BUG_ON_PAGE(!pvmw.pte, page); 2064 + 2065 + if (!pte_present(*pvmw.pte)) { 2066 + ret = false; 2067 + page_vma_mapped_walk_done(&pvmw); 2068 + break; 2069 + } 2070 + 2071 + subpage = page - page_to_pfn(page) + pte_pfn(*pvmw.pte); 2072 + address = pvmw.address; 2073 + 2074 + /* Nuke the page table entry. */ 2075 + flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); 2076 + pteval = ptep_clear_flush(vma, address, pvmw.pte); 2077 + 2078 + /* Move the dirty bit to the page. Now the pte is gone. */ 2079 + if (pte_dirty(pteval)) 2080 + set_page_dirty(page); 2081 + 2082 + /* 2083 + * Check that our target page is still mapped at the expected 2084 + * address. 2085 + */ 2086 + if (args->mm == mm && args->address == address && 2087 + pte_write(pteval)) 2088 + args->valid = true; 2089 + 2090 + /* 2091 + * Store the pfn of the page in a special migration 2092 + * pte. 
do_swap_page() will wait until the migration 2093 + * pte is removed and then restart fault handling. 2094 + */ 2095 + if (pte_write(pteval)) 2096 + entry = make_writable_device_exclusive_entry( 2097 + page_to_pfn(subpage)); 2098 + else 2099 + entry = make_readable_device_exclusive_entry( 2100 + page_to_pfn(subpage)); 2101 + swp_pte = swp_entry_to_pte(entry); 2102 + if (pte_soft_dirty(pteval)) 2103 + swp_pte = pte_swp_mksoft_dirty(swp_pte); 2104 + if (pte_uffd_wp(pteval)) 2105 + swp_pte = pte_swp_mkuffd_wp(swp_pte); 2106 + 2107 + set_pte_at(mm, address, pvmw.pte, swp_pte); 2108 + 2109 + /* 2110 + * There is a reference on the page for the swap entry which has 2111 + * been removed, so shouldn't take another. 2112 + */ 2113 + page_remove_rmap(subpage, false); 2114 + } 2115 + 2116 + mmu_notifier_invalidate_range_end(&range); 2117 + 2118 + return ret; 2119 + } 2120 + 2121 + /** 2122 + * page_make_device_exclusive - mark the page exclusively owned by a device 2123 + * @page: the page to replace page table entries for 2124 + * @mm: the mm_struct where the page is expected to be mapped 2125 + * @address: address where the page is expected to be mapped 2126 + * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier callbacks 2127 + * 2128 + * Tries to remove all the page table entries which are mapping this page and 2129 + * replace them with special device exclusive swap entries to grant a device 2130 + * exclusive access to the page. Caller must hold the page lock. 2131 + * 2132 + * Returns false if the page is still mapped, or if it could not be unmapped 2133 + * from the expected address. Otherwise returns true (success). 
2134 + */ 2135 + static bool page_make_device_exclusive(struct page *page, struct mm_struct *mm, 2136 + unsigned long address, void *owner) 2137 + { 2138 + struct make_exclusive_args args = { 2139 + .mm = mm, 2140 + .address = address, 2141 + .owner = owner, 2142 + .valid = false, 2143 + }; 2144 + struct rmap_walk_control rwc = { 2145 + .rmap_one = page_make_device_exclusive_one, 2146 + .done = page_not_mapped, 2147 + .anon_lock = page_lock_anon_vma_read, 2148 + .arg = &args, 2149 + }; 2150 + 2151 + /* 2152 + * Restrict to anonymous pages for now to avoid potential writeback 2153 + * issues. Also tail pages shouldn't be passed to rmap_walk so skip 2154 + * those. 2155 + */ 2156 + if (!PageAnon(page) || PageTail(page)) 2157 + return false; 2158 + 2159 + rmap_walk(page, &rwc); 2160 + 2161 + return args.valid && !page_mapcount(page); 2162 + } 2163 + 2164 + /** 2165 + * make_device_exclusive_range() - Mark a range for exclusive use by a device 2166 + * @mm: mm_struct of associated target process 2167 + * @start: start of the region to mark for exclusive device access 2168 + * @end: end address of region 2169 + * @pages: returns the pages which were successfully marked for exclusive access 2170 + * @owner: passed to MMU_NOTIFY_EXCLUSIVE range notifier to allow filtering 2171 + * 2172 + * Returns: number of pages found in the range by GUP. A page is marked for 2173 + * exclusive access only if the page pointer is non-NULL. 2174 + * 2175 + * This function finds ptes mapping page(s) to the given address range, locks 2176 + * them and replaces mappings with special swap entries preventing userspace CPU 2177 + * access. On fault these entries are replaced with the original mapping after 2178 + * calling MMU notifiers. 2179 + * 2180 + * A driver using this to program access from a device must use a mmu notifier 2181 + * critical section to hold a device specific lock during programming. 
Once 2182 + * programming is complete it should drop the page lock and reference after 2183 + * which point CPU access to the page will revoke the exclusive access. 2184 + */ 2185 + int make_device_exclusive_range(struct mm_struct *mm, unsigned long start, 2186 + unsigned long end, struct page **pages, 2187 + void *owner) 2188 + { 2189 + long npages = (end - start) >> PAGE_SHIFT; 2190 + long i; 2191 + 2192 + npages = get_user_pages_remote(mm, start, npages, 2193 + FOLL_GET | FOLL_WRITE | FOLL_SPLIT_PMD, 2194 + pages, NULL, NULL); 2195 + if (npages < 0) 2196 + return npages; 2197 + 2198 + for (i = 0; i < npages; i++, start += PAGE_SIZE) { 2199 + if (!trylock_page(pages[i])) { 2200 + put_page(pages[i]); 2201 + pages[i] = NULL; 2202 + continue; 2203 + } 2204 + 2205 + if (!page_make_device_exclusive(pages[i], mm, start, owner)) { 2206 + unlock_page(pages[i]); 2207 + put_page(pages[i]); 2208 + pages[i] = NULL; 2209 + } 2210 + } 2211 + 2212 + return npages; 2213 + } 2214 + EXPORT_SYMBOL_GPL(make_device_exclusive_range); 2215 + #endif 2216 + 2031 2217 void __put_anon_vma(struct anon_vma *anon_vma) 2032 2218 { 2033 2219 struct anon_vma *root = anon_vma->root;