Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

xen/gntdev: Accommodate VMA splitting

Prior to this commit, the gntdev driver code did not handle the
following scenario correctly with paravirtualized (PV) Xen domains:

* User process sets up a gntdev mapping composed of two grant mappings
(i.e., two pages shared by another Xen domain).
* User process munmap()s one of the pages.
* User process munmap()s the remaining page.
* User process exits.

In the scenario above, the user process would cause the kernel to log
the following messages in dmesg for the first munmap(), and the second
munmap() call would result in similar log messages:

BUG: Bad page map in process doublemap.test pte:... pmd:...
page:0000000057c97bff refcount:1 mapcount:-1 \
mapping:0000000000000000 index:0x0 pfn:...
...
page dumped because: bad pte
...
file:gntdev fault:0x0 mmap:gntdev_mmap [xen_gntdev] readpage:0x0
...
Call Trace:
<TASK>
dump_stack_lvl+0x46/0x5e
print_bad_pte.cold+0x66/0xb6
unmap_page_range+0x7e5/0xdc0
unmap_vmas+0x78/0xf0
unmap_region+0xa8/0x110
__do_munmap+0x1ea/0x4e0
__vm_munmap+0x75/0x120
__x64_sys_munmap+0x28/0x40
do_syscall_64+0x38/0x90
entry_SYSCALL_64_after_hwframe+0x61/0xcb
...

For each munmap() call, the Xen hypervisor (if built with CONFIG_DEBUG)
would print out the following and trigger a general protection fault in
the affected Xen PV domain:

(XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...
(XEN) d0v... Attempt to implicitly unmap d0's grant PTE ...

As of this writing, gntdev_grant_map structure's vma field (referred to
as map->vma below) is mainly used for checking the start and end
addresses of mappings. However, with split VMAs, these may change, and
there could be more than one VMA associated with a gntdev mapping.
Hence, remove the use of map->vma and rely on map->pages_vm_start for
the original start address and on (map->count << PAGE_SHIFT) for the
original mapping size. Let the invalidate() and find_special_page()
hooks use these.

Also, given that there can be multiple VMAs associated with a gntdev
mapping, move the "mmu_interval_notifier_remove(&map->notifier)" call to
the end of gntdev_put_map, so that the MMU notifier is only removed
after the closing of the last remaining VMA.

Finally, use an atomic to prevent inadvertent gntdev mapping re-use,
instead of using the map->live_grants atomic counter and/or the map->vma
pointer (the latter of which is now removed). This prevents the
userspace from mmap()'ing (with MAP_FIXED) a gntdev mapping over the
same address range as a previously set up gntdev mapping. This scenario
can be summarized with the following call-trace, which was valid prior
to this commit:

mmap
gntdev_mmap
mmap (repeat mmap with MAP_FIXED over the same address range)
gntdev_invalidate
unmap_grant_pages (sets 'being_removed' entries to true)
gnttab_unmap_refs_async
unmap_single_vma
gntdev_mmap (maps the shared pages again)
munmap
gntdev_invalidate
unmap_grant_pages
(no-op because 'being_removed' entries are true)
unmap_single_vma (For PV domains, Xen reports that a granted page
is being unmapped and triggers a general protection fault in the
affected domain, if Xen was built with CONFIG_DEBUG)

The fix for this last scenario could be worth its own commit, but we
opted for a single commit, because removing the gntdev_grant_map
structure's vma field requires guarding the entry to gntdev_mmap(), and
the live_grants atomic counter is not sufficient on its own to prevent
the mmap() over a pre-existing mapping.

Link: https://github.com/QubesOS/qubes-issues/issues/7631
Fixes: ab31523c2fca ("xen/gntdev: allow usermode to map granted pages")
Cc: stable@vger.kernel.org
Signed-off-by: M. Vefa Bicakci <m.v.b@runbox.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20221002222006.2077-3-m.v.b@runbox.com
Signed-off-by: Juergen Gross <jgross@suse.com>

authored by

M. Vefa Bicakci and committed by
Juergen Gross
5c13a4a0 0991028c

+27 -34
+2 -1
drivers/xen/gntdev-common.h
··· 44 44 }; 45 45 46 46 struct gntdev_grant_map { 47 + atomic_t in_use; 47 48 struct mmu_interval_notifier notifier; 49 + bool notifier_init; 48 50 struct list_head next; 49 - struct vm_area_struct *vma; 50 51 int index; 51 52 int count; 52 53 int flags;
+25 -33
drivers/xen/gntdev.c
··· 286 286 */ 287 287 } 288 288 289 + if (use_ptemod && map->notifier_init) 290 + mmu_interval_notifier_remove(&map->notifier); 291 + 289 292 if (map->notify.flags & UNMAP_NOTIFY_SEND_EVENT) { 290 293 notify_remote_via_evtchn(map->notify.event); 291 294 evtchn_put(map->notify.event); ··· 301 298 static int find_grant_ptes(pte_t *pte, unsigned long addr, void *data) 302 299 { 303 300 struct gntdev_grant_map *map = data; 304 - unsigned int pgnr = (addr - map->vma->vm_start) >> PAGE_SHIFT; 301 + unsigned int pgnr = (addr - map->pages_vm_start) >> PAGE_SHIFT; 305 302 int flags = map->flags | GNTMAP_application_map | GNTMAP_contains_pte | 306 303 (1 << _GNTMAP_guest_avail0); 307 304 u64 pte_maddr; ··· 511 508 struct gntdev_priv *priv = file->private_data; 512 509 513 510 pr_debug("gntdev_vma_close %p\n", vma); 514 - if (use_ptemod) { 515 - WARN_ON(map->vma != vma); 516 - mmu_interval_notifier_remove(&map->notifier); 517 - map->vma = NULL; 518 - } 511 + 519 512 vma->vm_private_data = NULL; 520 513 gntdev_put_map(priv, map); 521 514 } ··· 539 540 struct gntdev_grant_map *map = 540 541 container_of(mn, struct gntdev_grant_map, notifier); 541 542 unsigned long mstart, mend; 543 + unsigned long map_start, map_end; 542 544 543 545 if (!mmu_notifier_range_blockable(range)) 544 546 return false; 547 + 548 + map_start = map->pages_vm_start; 549 + map_end = map->pages_vm_start + (map->count << PAGE_SHIFT); 545 550 546 551 /* 547 552 * If the VMA is split or otherwise changed the notifier is not ··· 553 550 * VMA. FIXME: It would be much more understandable to just prevent 554 551 * modifying the VMA in the first place. 555 552 */ 556 - if (map->vma->vm_start >= range->end || 557 - map->vma->vm_end <= range->start) 553 + if (map_start >= range->end || map_end <= range->start) 558 554 return true; 559 555 560 - mstart = max(range->start, map->vma->vm_start); 561 - mend = min(range->end, map->vma->vm_end); 556 + mstart = max(range->start, map_start); 557 + mend = min(range->end, map_end); 562 558 pr_debug("map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n", 563 - map->index, map->count, 564 - map->vma->vm_start, map->vma->vm_end, 565 - range->start, range->end, mstart, mend); 566 - unmap_grant_pages(map, 567 - (mstart - map->vma->vm_start) >> PAGE_SHIFT, 568 - (mend - mstart) >> PAGE_SHIFT); 559 + map->index, map->count, map_start, map_end, 560 + range->start, range->end, mstart, mend); 561 + unmap_grant_pages(map, (mstart - map_start) >> PAGE_SHIFT, 562 + (mend - mstart) >> PAGE_SHIFT); 569 563 570 564 return true; 571 565 } ··· 1042 1042 return -EINVAL; 1043 1043 1044 1044 pr_debug("map %d+%d at %lx (pgoff %lx)\n", 1045 - index, count, vma->vm_start, vma->vm_pgoff); 1045 + index, count, vma->vm_start, vma->vm_pgoff); 1046 1046 1047 1047 mutex_lock(&priv->lock); 1048 1048 map = gntdev_find_map_index(priv, index, count); 1049 1049 if (!map) 1050 1050 goto unlock_out; 1051 - if (use_ptemod && map->vma) 1051 + if (!atomic_add_unless(&map->in_use, 1, 1)) 1052 1052 goto unlock_out; 1053 - if (atomic_read(&map->live_grants)) { 1054 - err = -EAGAIN; 1055 - goto unlock_out; 1056 - } 1053 + 1057 1054 refcount_inc(&map->users); 1058 1055 1059 1056 vma->vm_ops = &gntdev_vmops; ··· 1071 1074 map->flags |= GNTMAP_readonly; 1072 1075 } 1073 1076 1077 + map->pages_vm_start = vma->vm_start; 1078 + 1074 1079 if (use_ptemod) { 1075 - map->vma = vma; 1076 1080 err = mmu_interval_notifier_insert_locked( 1077 1081 &map->notifier, vma->vm_mm, vma->vm_start, 1078 1082 vma->vm_end - vma->vm_start, &gntdev_mmu_ops); 1079 - if (err) { 1080 - map->vma = NULL; 1083 + if (err) 1081 1084 goto out_unlock_put; 1082 - } 1085 + 1086 + map->notifier_init = true; 1083 1087 } 1084 1088 mutex_unlock(&priv->lock); 1085 1089 ··· 1097 1099 */ 1098 1100 mmu_interval_read_begin(&map->notifier); 1099 1101 1100 - map->pages_vm_start = vma->vm_start; 1101 1102 err = apply_to_page_range(vma->vm_mm, vma->vm_start, 1102 1103 vma->vm_end - vma->vm_start, 1103 1104 find_grant_ptes, map); ··· 1125 1128 out_unlock_put: 1126 1129 mutex_unlock(&priv->lock); 1127 1130 out_put_map: 1128 - if (use_ptemod) { 1131 + if (use_ptemod) 1129 1132 unmap_grant_pages(map, 0, map->count); 1130 - if (map->vma) { 1131 - mmu_interval_notifier_remove(&map->notifier); 1132 - map->vma = NULL; 1133 - } 1134 - } 1135 1133 gntdev_put_map(priv, map); 1136 1134 return err; 1137 1135 }