Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

Xarray: do not return sibling entries from xas_find_marked()

Patch series "Fixes and cleanups to xarray", v5.

This series contains some random fixes and cleanups to xarray. Patch 1-2
are fixes and patch 3-6 are cleanups. More details can be found in
respective patches.


This patch (of 5):

Similar to issue fixed in commit cbc02854331ed ("XArray: Do not return
sibling entries from xa_load()"), we may return sibling entries from
xas_find_marked as following:
Thread A: Thread B:
xa_store_range(xa, entry, 6, 7, gfp);
xa_set_mark(xa, 6, mark)
XA_STATE(xas, xa, 6);
xas_find_marked(&xas, 7, mark);
offset = xas_find_chunk(xas, advance, mark);
[offset is 6 which points to a valid entry]
xa_store_range(xa, entry, 4, 7, gfp);
entry = xa_entry(xa, node, 6);
[entry is a sibling of 4]
if (!xa_is_node(entry))
return entry;

Skip sibling entry like xas_find() does to protect caller from seeing
sibling entry from xas_find_marked() or caller may use sibling entry
as a valid entry and crash the kernel.

Besides, load_race() test is modified to catch mentioned issue and modified
load_race() only passes after this fix is merged.

Here is an example how this bug could be triggerred in tmpfs which
enables large folio in mapping:
Let's take a look at involved racer:
1. How pages could be created and dirtied in shmem file.
write
ksys_write
vfs_write
new_sync_write
shmem_file_write_iter
generic_perform_write
shmem_write_begin
shmem_get_folio
shmem_allowable_huge_orders
shmem_alloc_and_add_folios
shmem_alloc_folio
__folio_set_locked
shmem_add_to_page_cache
XA_STATE_ORDER(..., index, order)
xax_store()
shmem_write_end
folio_mark_dirty()

2. How dirty pages could be deleted in shmem file.
ioctl
do_vfs_ioctl
file_ioctl
ioctl_preallocate
vfs_fallocate
shmem_fallocate
shmem_truncate_range
shmem_undo_range
truncate_inode_folio
filemap_remove_folio
page_cache_delete
xas_store(&xas, NULL);

3. How dirty pages could be lockless searched
sync_file_range
ksys_sync_file_range
__filemap_fdatawrite_range
filemap_fdatawrite_wbc
do_writepages
writeback_use_writepage
writeback_iter
writeback_get_folio
filemap_get_folios_tag
find_get_entry
folio = xas_find_marked()
folio_try_get(folio)

Kernel will crash as following:
1.Create 2.Search 3.Delete
/* write page 2,3 */
write
...
shmem_write_begin
XA_STATE_ORDER(xas, i_pages, index = 2, order = 1)
xa_store(&xas, folio)
shmem_write_end
folio_mark_dirty()

/* sync page 2 and page 3 */
sync_file_range
...
find_get_entry
folio = xas_find_marked()
/* offset will be 2 */
offset = xas_find_chunk()

/* delete page 2 and page 3 */
ioctl
...
xas_store(&xas, NULL);

/* write page 0-3 */
write
...
shmem_write_begin
XA_STATE_ORDER(xas, i_pages, index = 0, order = 2)
xa_store(&xas, folio)
shmem_write_end
folio_mark_dirty(folio)

/* get sibling entry from offset 2 */
entry = xa_entry(.., 2)
/* use sibling entry as folio and crash kernel */
folio_try_get(folio)

Link: https://lkml.kernel.org/r/20241213122523.12764-1-shikemeng@huaweicloud.com
Link: https://lkml.kernel.org/r/20241213122523.12764-2-shikemeng@huaweicloud.com
Signed-off-by: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Mattew Wilcox <willy@infradead.org> [English fixes]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

authored by

Kemeng Shi and committed by
Andrew Morton
7e060df0 cb7c77e9

+6
+2
lib/xarray.c
··· 1387 1387 entry = xa_entry(xas->xa, xas->xa_node, xas->xa_offset); 1388 1388 if (!entry && !(xa_track_free(xas->xa) && mark == XA_FREE_MARK)) 1389 1389 continue; 1390 + if (xa_is_sibling(entry)) 1391 + continue; 1390 1392 if (!xa_is_node(entry)) 1391 1393 return entry; 1392 1394 xas->xa_node = xa_to_node(entry);
+4
tools/testing/radix-tree/multiorder.c
··· 227 227 unsigned long index = (3 << RADIX_TREE_MAP_SHIFT) - 228 228 (1 << order); 229 229 item_insert_order(tree, index, order); 230 + xa_set_mark(tree, index, XA_MARK_1); 230 231 item_delete_rcu(tree, index); 231 232 } 232 233 } ··· 243 242 244 243 rcu_register_thread(); 245 244 while (!stop_iteration) { 245 + unsigned long find_index = (2 << RADIX_TREE_MAP_SHIFT) + 1; 246 246 struct item *item = xa_load(ptr, index); 247 + assert(!xa_is_internal(item)); 248 + item = xa_find(ptr, &find_index, index, XA_MARK_1); 247 249 assert(!xa_is_internal(item)); 248 250 } 249 251 rcu_unregister_thread();