Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

mm: khugepaged: fix kernel BUG in hpage_collapse_scan_file()

Syzkaller reported the following issue:

kernel BUG at mm/khugepaged.c:1823!
invalid opcode: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 5097 Comm: syz-executor220 Not tainted 6.2.0-syzkaller-13154-g857f1268a591 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/16/2023
RIP: 0010:collapse_file mm/khugepaged.c:1823 [inline]
RIP: 0010:hpage_collapse_scan_file+0x67c8/0x7580 mm/khugepaged.c:2233
Code: 00 00 89 de e8 c9 66 a3 ff 31 ff 89 de e8 c0 66 a3 ff 45 84 f6 0f 85 28 0d 00 00 e8 22 64 a3 ff e9 dc f7 ff ff e8 18 64 a3 ff <0f> 0b f3 0f 1e fa e8 0d 64 a3 ff e9 93 f6 ff ff f3 0f 1e fa 4c 89
RSP: 0018:ffffc90003dff4e0 EFLAGS: 00010093
RAX: ffffffff81e95988 RBX: 00000000000001c1 RCX: ffff8880205b3a80
RDX: 0000000000000000 RSI: 00000000000001c0 RDI: 00000000000001c1
RBP: ffffc90003dff830 R08: ffffffff81e90e67 R09: fffffbfff1a433c3
R10: 0000000000000000 R11: dffffc0000000001 R12: 0000000000000000
R13: ffffc90003dff6c0 R14: 00000000000001c0 R15: 0000000000000000
FS: 00007fdbae5ee700(0000) GS:ffff8880b9900000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdbae6901e0 CR3: 000000007b2dd000 CR4: 00000000003506e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
madvise_collapse+0x721/0xf50 mm/khugepaged.c:2693
madvise_vma_behavior mm/madvise.c:1086 [inline]
madvise_walk_vmas mm/madvise.c:1260 [inline]
do_madvise+0x9e5/0x4680 mm/madvise.c:1439
__do_sys_madvise mm/madvise.c:1452 [inline]
__se_sys_madvise mm/madvise.c:1450 [inline]
__x64_sys_madvise+0xa5/0xb0 mm/madvise.c:1450
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

The xas_store() call during page cache scanning can put 'xas' into an
error state (with the reproducer provided by syzkaller, the error code
is -ENOMEM). However, nothing checks for this after xas_store(), so the
next xas_next() call at the start of the scanning loop doesn't advance
xa_index, and the BUG above triggers.
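
The failure mode can be sketched outside the kernel with a toy cursor.
This is only an illustrative model: struct toy_cursor, toy_store(),
toy_next() and toy_walk_unchecked() are hypothetical names, not the
XArray API. It assumes, as described above, that a cursor in the error
state ignores further stores and no longer advances its index.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for an XArray cursor; NOT the kernel's struct xa_state. */
struct toy_cursor {
	unsigned long index;	/* current position */
	int err;		/* 0, or a sticky negative errno */
};

/* Store that fails with -ENOMEM when 'can_alloc' is zero. */
static void toy_store(struct toy_cursor *c, int *slots, int value,
		      int can_alloc)
{
	if (c->err)
		return;			/* already failed: do nothing */
	if (!can_alloc) {
		c->err = -ENOMEM;	/* the error is recorded in the cursor */
		return;
	}
	slots[c->index] = value;
}

/* Advance to the next index; a cursor in the error state does not move. */
static void toy_next(struct toy_cursor *c)
{
	if (c->err)
		return;
	c->index++;
}

/*
 * Walk 'n' slots without ever checking c.err, the way the unpatched loop
 * did. The store at iteration 'fail_at' fails. Returns the final index.
 */
static unsigned long toy_walk_unchecked(int *slots, size_t n, size_t fail_at)
{
	struct toy_cursor c = { .index = 0, .err = 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		toy_store(&c, slots, 1, i != fail_at);
		toy_next(&c);
		/* The cursor can lag behind the loop counter, never lead it. */
		assert(c.index <= i + 1);
	}
	return c.index;
}
```

With four slots and a failure at iteration 2, the loop counter reaches 4
while the cursor stays at index 2. That is the kind of mismatch between
the expected index and xa_index that the commit message describes.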

This patch adds xarray state error checking after the xas_store() calls,
along with a corresponding result code (SCAN_STORE_FAILED).
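
The check-and-roll-back pattern can be sketched as follows. This is a
simplified user-space model, not the kernel code: struct toy_mapping,
model_store() and toy_fill_hole() are hypothetical, and the sketch
assumes the page count and charge were bumped just before the store,
which is what the "revert shmem_charge" comment in the patch suggests.

```c
#include <errno.h>

/* Hypothetical result codes, mirroring the idea of SCAN_STORE_FAILED. */
enum toy_scan_result { TOY_SCAN_SUCCEED, TOY_SCAN_STORE_FAILED };

/* Toy accounting state; NOT the kernel's struct address_space. */
struct toy_mapping {
	long nrpages;	/* pages accounted to the mapping */
	long charged;	/* pages charged to the owner */
};

/* Toy store: records -ENOMEM in *err when 'can_alloc' is zero. */
static void model_store(int *err, int can_alloc)
{
	if (!can_alloc)
		*err = -ENOMEM;
}

/* Fill one hole: account first, store, and undo the accounting on error. */
static enum toy_scan_result toy_fill_hole(struct toy_mapping *m,
					  int can_alloc)
{
	int err = 0;

	/* Accounting happens before the store. */
	m->nrpages++;
	m->charged++;

	model_store(&err, can_alloc);
	if (err) {
		/* Revert the accounting above, then report the failure. */
		m->nrpages--;
		m->charged--;
		return TOY_SCAN_STORE_FAILED;
	}
	return TOY_SCAN_SUCCEED;
}
```

A failed store leaves both counters where they started, so the caller
only has to propagate the result code.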

Tested via syzbot.

[akpm@linux-foundation.org: update include/trace/events/huge_memory.h's SCAN_STATUS]
Link: https://lkml.kernel.org/r/20230329145330.23191-1-ivan.orlov0322@gmail.com
Link: https://syzkaller.appspot.com/bug?id=7d6bb3760e026ece7524500fe44fb024a0e959fc
Signed-off-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Reported-by: syzbot+9578faa5475acb35fa50@syzkaller.appspotmail.com
Tested-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Himadri Pandya <himadrispandya@gmail.com>
Cc: Ivan Orlov <ivan.orlov0322@gmail.com>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

Authored by Ivan Orlov and committed by Andrew Morton
2ce0bdfe 90fd8336

+22 -1
+2 -1
include/trace/events/huge_memory.h
···
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_TRUNCATED,		"truncated")			\
-	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EMe(SCAN_STORE_FAILED,		"store_failed")
 
 #undef EM
 #undef EMe
+20
mm/khugepaged.c
···
 	SCAN_CGROUP_CHARGE_FAIL,
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
+	SCAN_STORE_FAILED,
 };
 
 #define CREATE_TRACE_POINTS
···
 					goto xa_locked;
 				}
 				xas_store(&xas, hpage);
+				if (xas_error(&xas)) {
+					/* revert shmem_charge performed
+					 * in the previous condition
+					 */
+					mapping->nrpages--;
+					shmem_uncharge(mapping->host, 1);
+					result = SCAN_STORE_FAILED;
+					goto xa_locked;
+				}
 				nr_none++;
 				continue;
 			}
···
 
 		/* Finally, replace with the new page. */
 		xas_store(&xas, hpage);
+		/* We can't get an ENOMEM here (because the allocation happened before)
+		 * but let's check for errors (XArray implementation can be
+		 * changed in the future)
+		 */
+		WARN_ON_ONCE(xas_error(&xas));
 		continue;
 out_unlock:
 		unlock_page(page);
···
 	/* Join all the small entries into a single multi-index entry */
 	xas_set_order(&xas, start, HPAGE_PMD_ORDER);
 	xas_store(&xas, hpage);
+	/* Here we can't get an ENOMEM (because entries were
+	 * previously allocated) But let's check for errors
+	 * (XArray implementation can be changed in the future)
+	 */
+	WARN_ON_ONCE(xas_error(&xas));
 xa_locked:
 	xas_unlock_irq(&xas);
 xa_unlocked: