Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

userfaultfd: wp: support write protection for userfault vma range

Add an API to enable/disable write protection for a vma range. Unlike
mprotect, this doesn't split/merge vmas.

[peterx@redhat.com:
- use the helper to find VMA;
- return -ENOENT if not found to match mcopy case;
- use the new MM_CP_UFFD_WP* flags for change_protection;
- check against mmap_changing for failures
- replace find_dst_vma with vma_find_uffd]
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Bobby Powers <bobbypowers@gmail.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Denis Plotnikov <dplotnikov@virtuozzo.com>
Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
Cc: Martin Cracauer <cracauer@cons.org>
Cc: Marty McFadden <mcfadden8@llnl.gov>
Cc: Maya Gokhale <gokhale2@llnl.gov>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Pavel Emelyanov <xemul@openvz.org>
Link: http://lkml.kernel.org/r/20200220163112.11409-13-peterx@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Authored by Shaohua Li and committed by Linus Torvalds (ffd05793 e1e267c7)

Showing 2 changed files, +57 lines

include/linux/userfaultfd_k.h (+3)

@@ -41,6 +41,9 @@
 				    unsigned long dst_start,
 				    unsigned long len,
 				    bool *mmap_changing);
+extern int mwriteprotect_range(struct mm_struct *dst_mm,
+			       unsigned long start, unsigned long len,
+			       bool enable_wp, bool *mmap_changing);
 
 /* mm helpers */
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
mm/userfaultfd.c (+54)

@@ -638,3 +638,57 @@
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
 }
+
+int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
+			unsigned long len, bool enable_wp, bool *mmap_changing)
+{
+	struct vm_area_struct *dst_vma;
+	pgprot_t newprot;
+	int err;
+
+	/*
+	 * Sanitize the command parameters:
+	 */
+	BUG_ON(start & ~PAGE_MASK);
+	BUG_ON(len & ~PAGE_MASK);
+
+	/* Does the address range wrap, or is the span zero-sized? */
+	BUG_ON(start + len <= start);
+
+	down_read(&dst_mm->mmap_sem);
+
+	/*
+	 * If memory mappings are changing because of non-cooperative
+	 * operation (e.g. mremap) running in parallel, bail out and
+	 * request the user to retry later
+	 */
+	err = -EAGAIN;
+	if (mmap_changing && READ_ONCE(*mmap_changing))
+		goto out_unlock;
+
+	err = -ENOENT;
+	dst_vma = find_dst_vma(dst_mm, start, len);
+	/*
+	 * Make sure the vma is not shared, that the dst range is
+	 * both valid and fully within a single existing vma.
+	 */
+	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+		goto out_unlock;
+	if (!userfaultfd_wp(dst_vma))
+		goto out_unlock;
+	if (!vma_is_anonymous(dst_vma))
+		goto out_unlock;
+
+	if (enable_wp)
+		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
+	else
+		newprot = vm_get_page_prot(dst_vma->vm_flags);
+
+	change_protection(dst_vma, start, start + len, newprot,
+			  enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+
+	err = 0;
+out_unlock:
+	up_read(&dst_mm->mmap_sem);
+	return err;
+}