mm/vma: add give_up_on_oom option on modify/merge, use in uffd release

Currently, if a VMA merge fails due to an OOM condition arising on commit
merge or a failure to duplicate anon_vma's, we report this so the caller
can handle it.

However, there are cases where the caller is only opportunistically
attempting a merge, and doesn't mind if it fails due to this condition.

Since we do not want to introduce an implicit assumption that we only
actually modify VMAs after OOM conditions might arise, add a 'give up on
oom' option and make an explicit contract that, should this flag be set, we
absolutely will not modify any VMAs should OOM arise and just bail out.
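
The contract described above can be illustrated with a small userspace model. This is a sketch only: `toy_vmg`, `toy_commit_merge()`, `toy_merge()`, the `inject_oom` test hook and the `MERGE_*` states are hypothetical stand-ins, not the kernel's definitions. It shows the two properties the flag relies on: allocation happens before any modification, and with the flag set an OOM leaves the state untouched rather than reporting an error.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's merge states. */
enum toy_merge_state { MERGE_START, MERGE_ERROR_NOMEM, MERGE_SUCCESS };

struct toy_vmg {
	enum toy_merge_state state;
	bool give_up_on_oom;	/* caller does not want OOM reported */
	bool inject_oom;	/* test hook: pretend preallocation failed */
	int vmas_modified;	/* counts changes made to the imaginary VMAs */
};

/* Allocate first, modify second: OOM can only arise before any change. */
static int toy_commit_merge(struct toy_vmg *vmg)
{
	if (vmg->inject_oom)
		return -1;	/* stands in for -ENOMEM; nothing touched yet */

	vmg->vmas_modified++;
	return 0;
}

/* Returns true if the merge happened, false if the VMAs were left as-is. */
static bool toy_merge(struct toy_vmg *vmg)
{
	int before = vmg->vmas_modified;

	vmg->state = MERGE_START;
	if (toy_commit_merge(vmg)) {
		/* The contract: a failed commit has modified nothing. */
		assert(vmg->vmas_modified == before);
		if (!vmg->give_up_on_oom)
			vmg->state = MERGE_ERROR_NOMEM;
		return false;
	}

	vmg->state = MERGE_SUCCESS;
	return true;
}
```

With `give_up_on_oom` set, a caller of this model sees a failed merge as "no merge happened" and never observes `MERGE_ERROR_NOMEM`.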

Since it'd be very unusual for a user to try to vma_modify() with this flag
set but be specifying a range within a VMA which ends up being split (which
can fail due to rlimit issues, not only OOM), we add a debug warning for
this condition.
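
The condition behind that warning can be written out as a small predicate. This is an illustrative userspace sketch; `toy_range`, `toy_covers_whole_vma()` and `toy_flag_misused()` are hypothetical names, and the real check in the patch is a `VM_WARN_ON()` inside `vma_modify()`.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the [vm_start, vm_end) span of a VMA. */
struct toy_range {
	unsigned long start;
	unsigned long end;
};

/* True when the requested range is exactly the VMA, so no split is needed. */
static bool toy_covers_whole_vma(struct toy_range vma,
				 unsigned long start, unsigned long end)
{
	return start == vma.start && end == vma.end;
}

/*
 * True when the flag is set but the range would force a split. A split can
 * fail for reasons other than OOM, so this combination is likely a mistake.
 */
static bool toy_flag_misused(bool give_up_on_oom, struct toy_range vma,
			     unsigned long start, unsigned long end)
{
	return give_up_on_oom && !toy_covers_whole_vma(vma, start, end);
}
```

The same whole-VMA test is what the uffd release path uses to decide whether to set the flag in the first place.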

The motivating reason for this is uffd release - syzkaller (and Pedro
Falcato's VERY astute analysis) found a way in which an injected fault on
allocation, triggering an OOM condition on commit merge, would result in
uffd code becoming confused and treating an error value as if it were a VMA
pointer.
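
To illustrate the hazard class, here is a userspace sketch of error-encoded pointers. The `toy_*` helpers are simplified reimplementations written for this example in the spirit of the kernel's `ERR_PTR()`, `IS_ERR()` and `PTR_ERR()`; they are not the kernel's code, and `toy_modify()` is a hypothetical stand-in for a modify call that can fail. Note that the patch itself does not add such a check to the release path; it avoids the error return altogether by setting the new flag.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Encode a negative errno in a pointer value (simplified). */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* Error pointers occupy the top TOY_MAX_ERRNO values of the address space. */
static inline bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

struct toy_vma {
	unsigned long flags;
};

/* Hypothetical modify step: returns the VMA, or an encoded -ENOMEM. */
static struct toy_vma *toy_modify(struct toy_vma *vma, bool oom)
{
	if (oom)
		return toy_err_ptr(-ENOMEM);

	vma->flags = 0;
	return vma;
}

/* Returns 0 on success or a negative errno; never dereferences an error. */
static long toy_release_one(struct toy_vma *vma, bool oom)
{
	struct toy_vma *ret = toy_modify(vma, oom);

	/* Without this check, ret->flags below would touch a bogus address. */
	if (toy_is_err(ret))
		return toy_ptr_err(ret);

	return (long)ret->flags;
}
```

A caller that skips the `toy_is_err()` test and uses the return value directly is in the situation the report describes: an error value treated as a VMA pointer.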

To avoid this, we make use of this new VMG flag to ensure that this never
occurs, utilising the fact that, should we be clearing entire VMAs, we do
not wish an OOM event to be reported to us.

Many thanks to Pedro Falcato for his excellent analysis and Jann Horn for
his insightful and intelligent analysis of the situation, both of whom were
instrumental in this fix.

Link: https://lkml.kernel.org/r/20250321100937.46634-1-lorenzo.stoakes@oracle.com
Reported-by: syzbot+20ed41006cf9d842c2b5@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67dc67f0.050a0220.25ae54.001e.GAE@google.com/
Fixes: 47b16d0462a4 ("mm: abort vma_modify() on merge out of memory failure")
Signed-off-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Suggested-by: Pedro Falcato <pfalcato@suse.de>
Suggested-by: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Lorenzo Stoakes and committed by Andrew Morton 1 year ago 41e6ddca 9c02223e

+66 -7

3 changed files

userfaultfd.c

vma.c

vma.h

+11 -2

mm/userfaultfd.c

···
 					     unsigned long end)
 {
 	struct vm_area_struct *ret;
+	bool give_up_on_oom = false;
+
+	/*
+	 * If we are modifying only and not splitting, just give up on the merge
+	 * if OOM prevents us from merging successfully.
+	 */
+	if (start == vma->vm_start && end == vma->vm_end)
+		give_up_on_oom = true;
 
 	/* Reset ptes for the whole vma range if wr-protected */
 	if (userfaultfd_wp(vma))
···
 
 	ret = vma_modify_flags_uffd(vmi, prev, vma, start, end,
 				    vma->vm_flags & ~__VM_UFFD_FLAGS,
-				    NULL_VM_UFFD_CTX);
+				    NULL_VM_UFFD_CTX, give_up_on_oom);
 
 	/*
 	 * In the vma_merge() successful mprotect-like case 8:
···
 		new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags;
 		vma = vma_modify_flags_uffd(&vmi, prev, vma, start, vma_end,
 					    new_flags,
-					    (struct vm_userfaultfd_ctx){ctx});
+					    (struct vm_userfaultfd_ctx){ctx},
+					    /* give_up_on_oom = */false);
 		if (IS_ERR(vma))
 			return PTR_ERR(vma);
 

+47 -4

mm/vma.c

···
 /*
  * Actually perform the VMA merge operation.
  *
+ * IMPORTANT: We guarantee that, should vmg->give_up_on_oom be set, we will not
+ * modify any VMAs or cause inconsistent state should an OOM condition arise.
+ *
  * Returns 0 on success, or an error value on failure.
  */
 static int commit_merge(struct vma_merge_struct *vmg)
···
 
 	init_multi_vma_prep(&vp, vma, vmg);
 
+	/*
+	 * If vmg->give_up_on_oom is set, we're safe, because we don't actually
+	 * manipulate any VMAs until we succeed at preallocation.
+	 *
+	 * Past this point, we will not return an error.
+	 */
 	if (vma_iter_prealloc(vmg->vmi, vma))
 		return -ENOMEM;
 
···
 		if (anon_dup)
 			unlink_anon_vmas(anon_dup);
 
-		vmg->state = VMA_MERGE_ERROR_NOMEM;
+		/*
+		 * We've cleaned up any cloned anon_vma's, no VMAs have been
+		 * modified, no harm no foul if the user requests that we not
+		 * report this and just give up, leaving the VMAs unmerged.
+		 */
+		if (!vmg->give_up_on_oom)
+			vmg->state = VMA_MERGE_ERROR_NOMEM;
 		return NULL;
 	}
 
···
 abort:
 	vma_iter_set(vmg->vmi, start);
 	vma_iter_load(vmg->vmi);
-	vmg->state = VMA_MERGE_ERROR_NOMEM;
+
+	/*
+	 * This means we have failed to clone anon_vma's correctly, but no
+	 * actual changes to VMAs have occurred, so no harm no foul - if the
+	 * user doesn't want this reported and instead just wants to give up on
+	 * the merge, allow it.
+	 */
+	if (!vmg->give_up_on_oom)
+		vmg->state = VMA_MERGE_ERROR_NOMEM;
 	return NULL;
 }
 
···
 		/* This should already have been checked by this point. */
 		VM_WARN_ON_VMG(!can_merge_remove_vma(next), vmg);
 		vma_start_write(next);
+		/*
+		 * In this case we don't report OOM, so vmg->give_up_on_oom is
+		 * safe.
+		 */
 		ret = dup_anon_vma(middle, next, &anon_dup);
 		if (ret)
 			return ret;
···
 	return 0;
 
 nomem:
-	vmg->state = VMA_MERGE_ERROR_NOMEM;
 	if (anon_dup)
 		unlink_anon_vmas(anon_dup);
+	/*
+	 * If the user requests that we just give up on OOM, we are safe to do so
+	 * here, as commit merge provides this contract to us. Nothing has been
+	 * changed - no harm no foul, just don't report it.
+	 */
+	if (!vmg->give_up_on_oom)
+		vmg->state = VMA_MERGE_ERROR_NOMEM;
 	return -ENOMEM;
 }
 
···
 	if (vmg_nomem(vmg))
 		return ERR_PTR(-ENOMEM);
 
+	/*
+	 * Split can fail for reasons other than OOM, so if the user requests
+	 * this it's probably a mistake.
+	 */
+	VM_WARN_ON(vmg->give_up_on_oom &&
+		   (vma->vm_start != start || vma->vm_end != end));
+
 	/* Split any preceding portion of the VMA. */
 	if (vma->vm_start < start) {
 		int err = split_vma(vmg->vmi, vma, start, 1);
···
 		       struct vm_area_struct *vma,
 		       unsigned long start, unsigned long end,
 		       unsigned long new_flags,
-		       struct vm_userfaultfd_ctx new_ctx)
+		       struct vm_userfaultfd_ctx new_ctx,
+		       bool give_up_on_oom)
 {
 	VMG_VMA_STATE(vmg, vmi, prev, vma, start, end);
 
 	vmg.flags = new_flags;
 	vmg.uffd_ctx = new_ctx;
+	if (give_up_on_oom)
+		vmg.give_up_on_oom = true;
 
 	return vma_modify(&vmg);
 }

+8 -1

mm/vma.h

···
 	 */
 	bool just_expand :1;
 
+	/*
+	 * If a merge is possible, but an OOM error occurs, give up and don't
+	 * execute the merge, returning NULL.
+	 */
+	bool give_up_on_oom :1;
+
 	/* Internal flags set during merge process: */
 
 	/*
···
 		       struct vm_area_struct *vma,
 		       unsigned long start, unsigned long end,
 		       unsigned long new_flags,
-		       struct vm_userfaultfd_ctx new_ctx);
+		       struct vm_userfaultfd_ctx new_ctx,
+		       bool give_up_on_oom);
 
 __must_check struct vm_area_struct
 *vma_merge_new_range(struct vma_merge_struct *vmg);