Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

um: use proper care when taking mmap lock during segfault

Segfaults can occur at times when the mmap lock cannot safely be taken, for
example while it is already held. In that case the segfault handler must not
simply take the mmap lock unconditionally.

Fix the code to use the same approach as most other architectures.
Unfortunately, this requires copying code from mm/memory.c and modifying
it slightly as UML does not have exception tables.

Signed-off-by: Benjamin Berg <benjamin.berg@intel.com>
Link: https://patch.msgid.link/20250408074524.300153-2-benjamin@sipsolutions.net
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

authored by Benjamin Berg and committed by Johannes Berg
6767e878 49caacf1

+117 -12
arch/um/kernel/trap.c
···
 #include <skas.h>

 /*
+ * NOTE: UML does not have exception tables. As such, this is almost a copy
+ * of the code in mm/memory.c, only adjusting the logic to simply check whether
+ * we are coming from the kernel instead of doing an additional lookup in the
+ * exception table.
+ * We can do this simplification because we never get here if the exception was
+ * fixable.
+ */
+static inline bool get_mmap_lock_carefully(struct mm_struct *mm, bool is_user)
+{
+	if (likely(mmap_read_trylock(mm)))
+		return true;
+
+	if (!is_user)
+		return false;
+
+	return !mmap_read_lock_killable(mm);
+}
+
+static inline bool mmap_upgrade_trylock(struct mm_struct *mm)
+{
+	/*
+	 * We don't have this operation yet.
+	 *
+	 * It should be easy enough to do: it's basically a
+	 * atomic_long_try_cmpxchg_acquire()
+	 * from RWSEM_READER_BIAS -> RWSEM_WRITER_LOCKED, but
+	 * it also needs the proper lockdep magic etc.
+	 */
+	return false;
+}
+
+static inline bool upgrade_mmap_lock_carefully(struct mm_struct *mm, bool is_user)
+{
+	mmap_read_unlock(mm);
+	if (!is_user)
+		return false;
+
+	return !mmap_write_lock_killable(mm);
+}
+
+/*
+ * Helper for page fault handling.
+ *
+ * This is kind of equivalent to "mmap_read_lock()" followed
+ * by "find_extend_vma()", except it's a lot more careful about
+ * the locking (and will drop the lock on failure).
+ *
+ * For example, if we have a kernel bug that causes a page
+ * fault, we don't want to just use mmap_read_lock() to get
+ * the mm lock, because that would deadlock if the bug were
+ * to happen while we're holding the mm lock for writing.
+ *
+ * So this checks the exception tables on kernel faults in
+ * order to only do this all for instructions that are actually
+ * expected to fault.
+ *
+ * We can also actually take the mm lock for writing if we
+ * need to extend the vma, which helps the VM layer a lot.
+ */
+static struct vm_area_struct *
+um_lock_mm_and_find_vma(struct mm_struct *mm,
+			unsigned long addr, bool is_user)
+{
+	struct vm_area_struct *vma;
+
+	if (!get_mmap_lock_carefully(mm, is_user))
+		return NULL;
+
+	vma = find_vma(mm, addr);
+	if (likely(vma && (vma->vm_start <= addr)))
+		return vma;
+
+	/*
+	 * Well, dang. We might still be successful, but only
+	 * if we can extend a vma to do so.
+	 */
+	if (!vma || !(vma->vm_flags & VM_GROWSDOWN)) {
+		mmap_read_unlock(mm);
+		return NULL;
+	}
+
+	/*
+	 * We can try to upgrade the mmap lock atomically,
+	 * in which case we can continue to use the vma
+	 * we already looked up.
+	 *
+	 * Otherwise we'll have to drop the mmap lock and
+	 * re-take it, and also look up the vma again,
+	 * re-checking it.
+	 */
+	if (!mmap_upgrade_trylock(mm)) {
+		if (!upgrade_mmap_lock_carefully(mm, is_user))
+			return NULL;
+
+		vma = find_vma(mm, addr);
+		if (!vma)
+			goto fail;
+		if (vma->vm_start <= addr)
+			goto success;
+		if (!(vma->vm_flags & VM_GROWSDOWN))
+			goto fail;
+	}
+
+	if (expand_stack_locked(vma, addr))
+		goto fail;
+
+success:
+	mmap_write_downgrade(mm);
+	return vma;
+
+fail:
+	mmap_write_unlock(mm);
+	return NULL;
+}
+
 /*
  * Note this is constrained to return 0, -EFAULT, -EACCES, -ENOMEM by
  * segv().
  */
···
 	if (is_user)
 		flags |= FAULT_FLAG_USER;
 retry:
-	mmap_read_lock(mm);
-	vma = find_vma(mm, address);
-	if (!vma)
-		goto out;
-	if (vma->vm_start <= address)
-		goto good_area;
-	if (!(vma->vm_flags & VM_GROWSDOWN))
-		goto out;
-	if (is_user && !ARCH_IS_STACKGROW(address))
-		goto out;
-	vma = expand_stack(mm, address);
+	vma = um_lock_mm_and_find_vma(mm, address, is_user);
 	if (!vma)
 		goto out_nosemaphore;

-good_area:
 	*code_out = SEGV_ACCERR;
 	if (is_write) {
 		if (!(vma->vm_flags & VM_WRITE))