Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: fix a UAF when vma->mm is freed after vma->vm_refcnt got dropped

By inducing delays in the right places, Jann Horn created a reproducer for
a hard-to-hit UAF issue that became possible after VMAs were allowed to be
recycled by adding SLAB_TYPESAFE_BY_RCU to their cache.

The race description is borrowed from Jann's discovery report:
lock_vma_under_rcu() looks up a VMA locklessly with mas_walk() under
rcu_read_lock(). At that point, the VMA may be concurrently freed, and it
can be recycled by another process. vma_start_read() then increments the
vma->vm_refcnt (if it is in an acceptable range), and if this succeeds,
vma_start_read() can return a recycled VMA.
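
The bounded "increment unless zero" step described above can be modeled in
userspace with C11 atomics. This is only a sketch: the function name, the
plain atomic_int counter, and the exact limit comparison are assumptions made
for illustration, not the kernel's refcount implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of a "take a reference only if the count is in an
 * acceptable range" operation. The reference is refused when the count is
 * zero (object is being torn down) or already at the limit (assumed rule).
 */
bool model_inc_not_zero_limited(atomic_int *cnt, int limit)
{
	int old = atomic_load_explicit(cnt, memory_order_relaxed);

	do {
		if (old == 0 || old >= limit)
			return false;
		/* On failure the CAS reloads 'old' and the checks run again. */
	} while (!atomic_compare_exchange_weak_explicit(cnt, &old, old + 1,
							memory_order_acquire,
							memory_order_relaxed));
	return true;
}
```

A successful increment says nothing about which object now owns the memory.
With a type-safe-by-RCU cache the slot may have been recycled, so the caller
still has to re-check the object's identity after taking the reference.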

In this scenario where the VMA has been recycled, lock_vma_under_rcu()
will then detect the mismatching ->vm_mm pointer and drop the VMA through
vma_end_read(), which calls vma_refcount_put(). vma_refcount_put() drops
the refcount and then calls rcuwait_wake_up() using a copy of vma->vm_mm.
This is wrong: It implicitly assumes that the caller is keeping the VMA's
mm alive, but in this scenario the caller has no relation to the VMA's mm,
so the rcuwait_wake_up() can cause UAF.

The diagram depicting the race:
    T1                        T2                          T3
    ==                        ==                          ==
lock_vma_under_rcu
  mas_walk
                          <VMA gets removed from mm>
                                                      mmap
                                                        <the same VMA is reallocated>
  vma_start_read
    __refcount_inc_not_zero_limited_acquire
                                                      munmap
                                                        __vma_enter_locked
                                                          refcount_add_not_zero
  vma_end_read
    vma_refcount_put
      __refcount_dec_and_test
                                                          rcuwait_wait_event
                                                      <finish operation>
      rcuwait_wake_up [UAF]

Note that rcuwait_wait_event() in T3 does not block because the refcount
was already dropped by T1. At this point T3 can exit and free the mm,
causing a UAF in T1.

To avoid this, move the vma->vm_mm verification into vma_start_read() and
grab vma->vm_mm to stabilize it before the vma_refcount_put() operation.
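
The ordering problem and the "grab before put" remedy can be illustrated with
a small single-threaded userspace model. Everything here is hypothetical: the
model_* types and functions are stand-ins invented for this sketch, the
"window" callback merely simulates T3 exiting at the worst moment, and the mm
is flagged as freed rather than actually released so the outcome can be
inspected safely.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for mm_struct and vm_area_struct. */
struct model_mm {
	atomic_int count;	/* models the mm reference count */
	bool freed;		/* set instead of freeing, so we can look */
	int wakeups;		/* counts simulated rcuwait_wake_up() calls */
};

struct model_vma {
	atomic_int refcnt;	/* models vma->vm_refcnt */
	struct model_mm *mm;	/* models vma->vm_mm */
};

static void model_mmgrab(struct model_mm *mm)
{
	atomic_fetch_add(&mm->count, 1);
}

static void model_mmdrop(struct model_mm *mm)
{
	/* Dropping the last reference "frees" the mm. */
	if (atomic_fetch_sub(&mm->count, 1) == 1)
		mm->freed = true;
}

/* Simulates the owner (T3) finishing its operation and exiting. */
static void owner_exits(struct model_mm *mm)
{
	model_mmdrop(mm);
}

/*
 * Models the put path: drop the vma reference, then wake a waiter through
 * a copy of the mm pointer. Returns false if the wakeup would have touched
 * an mm that was already freed (the UAF in the race).
 */
static bool model_vma_refcount_put(struct model_vma *vma,
				   void (*window)(struct model_mm *))
{
	struct model_mm *mm = vma->mm;

	atomic_fetch_sub(&vma->refcnt, 1);
	window(mm);		/* the narrow window after the drop */
	if (mm->freed)
		return false;
	mm->wakeups++;
	return true;
}

/* Runs the scenario; returns true if the wakeup was safe. */
bool run_model(bool pin_mm)
{
	struct model_mm mm = { .freed = false, .wakeups = 0 };
	struct model_vma vma = { .mm = &mm };
	bool ok;

	atomic_init(&mm.count, 1);	/* reference held by the owner */
	atomic_init(&vma.refcnt, 1);	/* reference taken by the reader */

	if (pin_mm)
		model_mmgrab(vma.mm);	/* stabilize the mm first */
	ok = model_vma_refcount_put(&vma, owner_exits);
	if (pin_mm)
		model_mmdrop(&mm);	/* the mm may go away only now */
	return ok;
}
```

In this model run_model(false) reports an unsafe wakeup, while run_model(true)
keeps the mm alive across the put and releases it only afterwards, which
mirrors the mmgrab()/vma_refcount_put()/mmdrop() sequence the fix introduces.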

[surenb@google.com: v3]
Link: https://lkml.kernel.org/r/20250729145709.2731370-1-surenb@google.com
Link: https://lkml.kernel.org/r/20250728175355.2282375-1-surenb@google.com
Fixes: 3104138517fc ("mm: make vma cache SLAB_TYPESAFE_BY_RCU")
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Reported-by: Jann Horn <jannh@google.com>
Closes: https://lore.kernel.org/all/CAG48ez0-deFbVH=E3jbkWx=X3uVbd8nWeo6kbJPQ0KoUD+m2tA@mail.gmail.com/
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Jann Horn <jannh@google.com>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Suren Baghdasaryan and committed by Andrew Morton
9bbffee6 a222439e

+33 -7
+30
include/linux/mmap_lock.h
···
 #include <linux/tracepoint-defs.h>
 #include <linux/types.h>
 #include <linux/cleanup.h>
+#include <linux/sched/mm.h>
 
 #define MMAP_LOCK_INITIALIZER(name) \
 	.mmap_lock = __RWSEM_INITIALIZER((name).mmap_lock),
···
  * reused and attached to a different mm before we lock it.
  * Returns the vma on success, NULL on failure to lock and EAGAIN if vma got
  * detached.
+ *
+ * WARNING! The vma passed to this function cannot be used if the function
+ * fails to lock it because in certain cases RCU lock is dropped and then
+ * reacquired. Once RCU lock is dropped the vma can be concurently freed.
  */
 static inline struct vm_area_struct *vma_start_read(struct mm_struct *mm,
 						    struct vm_area_struct *vma)
···
 	}
 
 	rwsem_acquire_read(&vma->vmlock_dep_map, 0, 1, _RET_IP_);
+
+	/*
+	 * If vma got attached to another mm from under us, that mm is not
+	 * stable and can be freed in the narrow window after vma->vm_refcnt
+	 * is dropped and before rcuwait_wake_up(mm) is called. Grab it before
+	 * releasing vma->vm_refcnt.
+	 */
+	if (unlikely(vma->vm_mm != mm)) {
+		/* Use a copy of vm_mm in case vma is freed after we drop vm_refcnt */
+		struct mm_struct *other_mm = vma->vm_mm;
+
+		/*
+		 * __mmdrop() is a heavy operation and we don't need RCU
+		 * protection here. Release RCU lock during these operations.
+		 * We reinstate the RCU read lock as the caller expects it to
+		 * be held when this function returns even on error.
+		 */
+		rcu_read_unlock();
+		mmgrab(other_mm);
+		vma_refcount_put(vma);
+		mmdrop(other_mm);
+		rcu_read_lock();
+		return NULL;
+	}
+
 	/*
 	 * Overflow of vm_lock_seq/mm_lock_seq might produce false locked result.
 	 * False unlocked result is impossible because we modify and check
+3 -7
mm/mmap_lock.c
···
 	 */
 
 	/* Check if the vma we locked is the right one. */
-	if (unlikely(vma->vm_mm != mm ||
-		     address < vma->vm_start || address >= vma->vm_end))
+	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
 		goto inval_end_read;
 
 	rcu_read_unlock();
···
 		goto fallback;
 	}
 
-	/*
-	 * Verify the vma we locked belongs to the same address space and it's
-	 * not behind of the last search position.
-	 */
-	if (unlikely(vma->vm_mm != mm || from_addr >= vma->vm_end))
+	/* Verify the vma is not behind the last search position. */
+	if (unlikely(from_addr >= vma->vm_end))
 		goto fallback_unlock;
 
 	/*