Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

fuse: fix readahead reclaim deadlock

Commit e26ee4efbc79 ("fuse: allocate ff->release_args only if release is
needed") skips allocating ff->release_args if the server does not
implement open. However, in doing so, fuse_prepare_release() now skips
grabbing the reference on the inode, which makes it possible for an
inode to be evicted from the dcache while there are inflight readahead
requests. This causes a deadlock if the server triggers reclaim while
servicing the readahead request and reclaim attempts to evict the inode
of the file being read ahead. The folio is locked during readahead, so
when reclaim evicts the fuse inode and fuse_evict_inode() tries to
remove all of the inode's folios from the page cache
(truncate_inode_pages_range()), reclaim blocks forever waiting for the
folio lock, which readahead cannot release because it is itself blocked
in reclaim:

>>> stack_trace(1504735)
folio_wait_bit_common (mm/filemap.c:1308:4)
folio_lock (./include/linux/pagemap.h:1052:3)
truncate_inode_pages_range (mm/truncate.c:336:10)
fuse_evict_inode (fs/fuse/inode.c:161:2)
evict (fs/inode.c:704:3)
dentry_unlink_inode (fs/dcache.c:412:3)
__dentry_kill (fs/dcache.c:615:3)
shrink_kill (fs/dcache.c:1060:12)
shrink_dentry_list (fs/dcache.c:1087:3)
prune_dcache_sb (fs/dcache.c:1168:2)
super_cache_scan (fs/super.c:221:10)
do_shrink_slab (mm/shrinker.c:435:9)
shrink_slab (mm/shrinker.c:626:10)
shrink_node (mm/vmscan.c:5951:2)
shrink_zones (mm/vmscan.c:6195:3)
do_try_to_free_pages (mm/vmscan.c:6257:3)
do_swap_page (mm/memory.c:4136:11)
handle_pte_fault (mm/memory.c:5562:10)
handle_mm_fault (mm/memory.c:5870:9)
do_user_addr_fault (arch/x86/mm/fault.c:1338:10)
handle_page_fault (arch/x86/mm/fault.c:1481:3)
exc_page_fault (arch/x86/mm/fault.c:1539:2)
asm_exc_page_fault+0x22/0x27

Fix this deadlock by allocating ff->release_args and grabbing the
reference on the inode when preparing the file for release even if the
server does not implement open. The inode reference will be dropped when
the last reference on the fuse file is dropped (see fuse_file_put() ->
fuse_release_end()).

Fixes: e26ee4efbc79 ("fuse: allocate ff->release_args only if release is needed")
Cc: stable@vger.kernel.org
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reported-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
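
The lifetime rule described in the message above can be illustrated with a small userspace model. This is only a sketch: every toy_* name is hypothetical, the structures are not the kernel's, and it assumes that an inflight readahead request holds its own reference on the fuse file (which the message implies but does not spell out). It shows that the inode stays pinned across release only when release args exist to carry the reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins (hypothetical); these are not the kernel's structures. */
struct toy_inode {
	int refcount;			/* references that pin the inode */
};

struct toy_release_args {
	struct toy_inode *inode;	/* inode pinned at release time, or NULL */
};

struct toy_file {
	int refcount;			/* open handle plus inflight requests */
	struct toy_release_args *args;	/* NULL models the pre-fix no_open case */
};

/* Models ->release(): the inode is pinned only if release args exist. */
static void toy_prepare_release(struct toy_file *ff, struct toy_inode *inode)
{
	if (ff->args) {
		inode->refcount++;
		ff->args->inode = inode;
	}
}

/* Drops one file reference; the last drop also drops the inode pin. */
static void toy_file_put(struct toy_file *ff)
{
	if (--ff->refcount == 0 && ff->args && ff->args->inode) {
		ff->args->inode->refcount--;
		ff->args->inode = NULL;
	}
}

/* In this model, reclaim may evict an inode only when nothing pins it. */
static bool toy_can_evict(const struct toy_inode *inode)
{
	return inode->refcount == 0;
}

/*
 * Scenario: the file is closed while one readahead request is inflight.
 * Returns whether the inode is evictable before readahead completes.
 */
static bool toy_evictable_during_readahead(bool have_release_args)
{
	struct toy_inode inode = { .refcount = 0 };
	struct toy_release_args args = { .inode = NULL };
	struct toy_file ff = {
		.refcount = 2,	/* one for the open file, one for readahead */
		.args = have_release_args ? &args : NULL,
	};
	bool evictable;

	toy_prepare_release(&ff, &inode);	/* close(): ->release() runs */
	toy_file_put(&ff);			/* drop the open file's reference */

	evictable = toy_can_evict(&inode);	/* readahead still inflight */

	toy_file_put(&ff);			/* readahead completes */
	assert(toy_can_evict(&inode));		/* the pin never outlives the file */
	return evictable;
}
```

In this model, toy_evictable_during_readahead(false) corresponds to the behavior the commit identifies as the bug (the inode can be evicted while readahead is inflight), and toy_evictable_during_readahead(true) corresponds to the fixed behavior.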

Authored by Joanne Koong and committed by Miklos Szeredi
bd5603ea e9a6fb0b

+19 -7
fs/fuse/file.c
···
 	fuse_file_io_release(ff, ra->inode);

 	if (!args) {
-		/* Do nothing when server does not implement 'open' */
+		/* Do nothing when server does not implement 'opendir' */
+	} else if (args->opcode == FUSE_RELEASE && ff->fm->fc->no_open) {
+		fuse_release_end(ff->fm, args, 0);
 	} else if (sync) {
 		fuse_simple_request(ff->fm, args);
 		fuse_release_end(ff->fm, args, 0);
···
 	struct fuse_file *ff;
 	int opcode = isdir ? FUSE_OPENDIR : FUSE_OPEN;
 	bool open = isdir ? !fc->no_opendir : !fc->no_open;
+	bool release = !isdir || open;

-	ff = fuse_file_alloc(fm, open);
+	/*
+	 * ff->args->release_args still needs to be allocated (so we can hold an
+	 * inode reference while there are pending inflight file operations when
+	 * ->release() is called, see fuse_prepare_release()) even if
+	 * fc->no_open is set else it becomes possible for reclaim to deadlock
+	 * if while servicing the readahead request the server triggers reclaim
+	 * and reclaim evicts the inode of the file being read ahead.
+	 */
+	ff = fuse_file_alloc(fm, release);
 	if (!ff)
 		return ERR_PTR(-ENOMEM);

···
 			fuse_file_free(ff);
 			return ERR_PTR(err);
 		} else {
-			/* No release needed */
-			kfree(ff->args);
-			ff->args = NULL;
-			if (isdir)
+			if (isdir) {
+				/* No release needed */
+				kfree(ff->args);
+				ff->args = NULL;
 				fc->no_opendir = 1;
-			else
+			} else {
 				fc->no_open = 1;
+			}
 		}
 	}
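
The allocation decision introduced by `bool release = !isdir || open;` may be easier to see as a truth table. The helper below is a hypothetical userspace restatement of those two lines from the diff, with the fc->no_open and fc->no_opendir flags passed in as plain booleans; it is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical restatement of the diff's logic:
 *   bool open    = isdir ? !fc->no_opendir : !fc->no_open;
 *   bool release = !isdir || open;
 * Returns whether release args are allocated up front.
 */
static bool toy_needs_release_args(bool isdir, bool no_open, bool no_opendir)
{
	bool open = isdir ? !no_opendir : !no_open;

	return !isdir || open;
}

/* Walks the cases that matter for the fix. */
static void toy_check_release_table(void)
{
	/* Regular file, server implements open: allocate. */
	assert(toy_needs_release_args(false, false, false));
	/* Regular file, no_open set: still allocate (the change in this fix). */
	assert(toy_needs_release_args(false, true, false));
	/* Directory, server implements opendir: allocate. */
	assert(toy_needs_release_args(true, false, false));
	/* Directory, no_opendir set: skip allocation, as before. */
	assert(!toy_needs_release_args(true, false, true));
}
```

Only the regular-file case with no_open set changes relative to passing `open` directly; directories keep the earlier behavior, which matches the updated 'opendir' comment in the first hunk.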