
netfs: Fix a number of read-retry hangs

Fix a number of hangs in the netfslib read-retry code, including:

(1) netfs_reissue_read() doubles up the getting of references on
subrequests, thereby leaking the subrequest and causing inode eviction
to wait indefinitely. This can lead to the kernel reporting a hang in
the filesystem's evict_inode().

Fix this by removing the get from netfs_reissue_read() and adding one
to netfs_retry_read_subrequests() to deal with the one place that
didn't double up.

(2) The loop in netfs_retry_read_subrequests() that retries a sequence of
failed subrequests doesn't record whether or not it retried the one
that the "subreq" pointer points to when it leaves the loop. It may
not if renegotiation/repreparation of the subrequests means that fewer
subrequests are needed to span the cumulative range of the sequence.

Because it doesn't record this, the piece of code that discards
now-superfluous subrequests doesn't know whether it should discard the
one "subreq" points to - and so it doesn't.

Fix this by noting whether the last subreq it examines is superfluous
and if it is, then getting rid of it and all subsequent subrequests.

If that one wasn't superfluous, then we would have tried to go
round the previous loop again and so there can be no further unretried
subrequests in the sequence.

(3) netfs_retry_read_subrequests() gets yet an extra ref on any additional
subrequests it has to allocate because it ran out of ones it could
reuse, due to renegotiation/repreparation shrinking the subrequests.

Fix this by removing that extra ref.

(4) In netfs_retry_reads(), it was using wait_on_bit() to wait for
NETFS_SREQ_IN_PROGRESS to be cleared on all subrequests in the
sequence - but netfs_read_subreq_terminated() is now using a wait
queue on the request instead and so this wait will never finish.

Fix this by waiting on the wait queue instead. To make this work, a
new flag, NETFS_RREQ_RETRYING, is now set around the wait loop to tell
the wake-up code to wake up the wait queue rather than requeuing the
request's work item.

Note that this flag replaces the NETFS_RREQ_NEED_RETRY flag which is
no longer used.

(5) Whilst not strictly anything to do with the hang,
netfs_retry_read_subrequests() was also doubly incrementing the
subreq_counter and re-setting the debug index, leaving a gap in the
trace. This is also fixed.

One of these hangs was observed with 9p and with cifs. Others were forced
by manual code injection into fs/afs/file.c. Firstly, afs_prepare_read()
was created to provide a changing pattern of maximum subrequest sizes:

	static int afs_prepare_read(struct netfs_io_subrequest *subreq)
	{
		struct netfs_io_request *rreq = subreq->rreq;

		if (!S_ISREG(subreq->rreq->inode->i_mode))
			return 0;
		if (subreq->retry_count < 20)
			rreq->io_streams[0].sreq_max_len =
				umax(200, 2222 - subreq->retry_count * 40);
		else
			rreq->io_streams[0].sreq_max_len = 3333;
		return 0;
	}

and pointed to by afs_req_ops. Then the following:

	struct netfs_io_subrequest *subreq = op->fetch.subreq;

	if (subreq->error == 0 &&
	    S_ISREG(subreq->rreq->inode->i_mode) &&
	    subreq->retry_count < 20) {
		subreq->transferred = subreq->already_done;
		__clear_bit(NETFS_SREQ_HIT_EOF, &subreq->flags);
		__set_bit(NETFS_SREQ_NEED_RETRY, &subreq->flags);
		afs_fetch_data_notify(op);
		return;
	}

was inserted at the beginning of afs_fetch_data_success(), and struct
netfs_io_subrequest was given an extra field, "already_done", that was
set to the value in "subreq->transferred" by netfs_reissue_read().

When reading a 4K file, the subrequests would get gradually smaller, a new
subrequest would be allocated around the 3rd retry and then eventually be
rendered superfluous when the 20th retry was hit and the limit on the first
subrequest was eased.

Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/20250212222402.3618494-2-dhowells@redhat.com
Tested-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Steve French <stfrench@microsoft.com>
cc: Ihor Solodrai <ihor.solodrai@pm.me>
cc: Eric Van Hensbergen <ericvh@kernel.org>
cc: Latchesar Ionkov <lucho@ionkov.net>
cc: Dominique Martinet <asmadeus@codewreck.org>
cc: Christian Schoenebeck <linux_oss@crudebyte.com>
cc: Paulo Alcantara <pc@manguebit.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: v9fs@lists.linux.dev
cc: linux-cifs@vger.kernel.org
cc: netfs@lists.linux.dev
cc: linux-fsdevel@vger.kernel.org
Signed-off-by: Christian Brauner <brauner@kernel.org>


 fs/netfs/read_collect.c      |  6 ++--
 fs/netfs/read_retry.c        | 40 ++++++++++++++-----
 include/linux/netfs.h        |  2 +-
 include/trace/events/netfs.h |  4 +++-
 4 files changed, 38 insertions(+), 14 deletions(-)
fs/netfs/read_collect.c (+4 -2):

···
  */
 void netfs_wake_read_collector(struct netfs_io_request *rreq)
 {
-	if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags)) {
+	if (test_bit(NETFS_RREQ_OFFLOAD_COLLECTION, &rreq->flags) &&
+	    !test_bit(NETFS_RREQ_RETRYING, &rreq->flags)) {
 		if (!work_pending(&rreq->work)) {
 			netfs_get_request(rreq, netfs_rreq_trace_get_work);
 			if (!queue_work(system_unbound_wq, &rreq->work))
···
 	smp_mb__after_atomic(); /* Clear IN_PROGRESS before task state */

 	/* If we are at the head of the queue, wake up the collector. */
-	if (list_is_first(&subreq->rreq_link, &stream->subrequests))
+	if (list_is_first(&subreq->rreq_link, &stream->subrequests) ||
+	    test_bit(NETFS_RREQ_RETRYING, &rreq->flags))
 		netfs_wake_read_collector(rreq);

 	netfs_put_subrequest(subreq, true, netfs_sreq_trace_put_terminated);
fs/netfs/read_retry.c (+30 -10):

···
 {
 	__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
 	__set_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags);
-	netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
 	subreq->rreq->netfs_ops->issue_read(subreq);
 }
···
 			__clear_bit(NETFS_SREQ_MADE_PROGRESS, &subreq->flags);
 			subreq->retry_count++;
 			netfs_reset_iter(subreq);
+			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
 			netfs_reissue_read(rreq, subreq);
 		}
 	}
···
 	struct iov_iter source;
 	unsigned long long start, len;
 	size_t part;
-	bool boundary = false;
+	bool boundary = false, subreq_superfluous = false;

 	/* Go through the subreqs and find the next span of contiguous
 	 * buffer that we then rejig (cifs, for example, needs the
···
 		/* Work through the sublist. */
 		subreq = from;
 		list_for_each_entry_from(subreq, &stream->subrequests, rreq_link) {
-			if (!len)
+			if (!len) {
+				subreq_superfluous = true;
 				break;
+			}
 			subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
 			subreq->start = start - subreq->transferred;
 			subreq->len = len + subreq->transferred;
···
 			netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);
 			netfs_reissue_read(rreq, subreq);
-			if (subreq == to)
+			if (subreq == to) {
+				subreq_superfluous = false;
 				break;
+			}
 		}

 		/* If we managed to use fewer subreqs, we can discard the
 		 * excess; if we used the same number, then we're done.
 		 */
 		if (!len) {
-			if (subreq == to)
+			if (!subreq_superfluous)
 				continue;
 			list_for_each_entry_safe_from(subreq, tmp,
 						      &stream->subrequests, rreq_link) {
-				trace_netfs_sreq(subreq, netfs_sreq_trace_discard);
+				trace_netfs_sreq(subreq, netfs_sreq_trace_superfluous);
 				list_del(&subreq->rreq_link);
 				netfs_put_subrequest(subreq, false, netfs_sreq_trace_put_done);
 				if (subreq == to)
···
 		subreq->source = NETFS_DOWNLOAD_FROM_SERVER;
 		subreq->start = start;
 		subreq->len = len;
-		subreq->debug_index = atomic_inc_return(&rreq->subreq_counter);
 		subreq->stream_nr = stream->stream_nr;
 		subreq->retry_count = 1;

 		trace_netfs_sreq_ref(rreq->debug_id, subreq->debug_index,
 				     refcount_read(&subreq->ref),
 				     netfs_sreq_trace_new);
-		netfs_get_subrequest(subreq, netfs_sreq_trace_get_resubmit);

 		list_add(&subreq->rreq_link, &to->rreq_link);
 		to = list_next_entry(to, rreq_link);
···
 {
 	struct netfs_io_subrequest *subreq;
 	struct netfs_io_stream *stream = &rreq->io_streams[0];
+	DEFINE_WAIT(myself);
+
+	set_bit(NETFS_RREQ_RETRYING, &rreq->flags);

 	/* Wait for all outstanding I/O to quiesce before performing retries as
 	 * we may need to renegotiate the I/O sizes.
 	 */
 	list_for_each_entry(subreq, &stream->subrequests, rreq_link) {
-		wait_on_bit(&subreq->flags, NETFS_SREQ_IN_PROGRESS,
-			    TASK_UNINTERRUPTIBLE);
+		if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
+			continue;
+
+		trace_netfs_rreq(rreq, netfs_rreq_trace_wait_queue);
+		for (;;) {
+			prepare_to_wait(&rreq->waitq, &myself, TASK_UNINTERRUPTIBLE);
+
+			if (!test_bit(NETFS_SREQ_IN_PROGRESS, &subreq->flags))
+				break;
+
+			trace_netfs_sreq(subreq, netfs_sreq_trace_wait_for);
+			schedule();
+			trace_netfs_rreq(rreq, netfs_rreq_trace_woke_queue);
+		}
+
+		finish_wait(&rreq->waitq, &myself);
 	}
+	clear_bit(NETFS_RREQ_RETRYING, &rreq->flags);

 	trace_netfs_rreq(rreq, netfs_rreq_trace_resubmit);
 	netfs_retry_read_subrequests(rreq);
include/linux/netfs.h (+1 -1):

···
 #define NETFS_RREQ_PAUSE	11	/* Pause subrequest generation */
 #define NETFS_RREQ_USE_IO_ITER	12	/* Use ->io_iter rather than ->i_pages */
 #define NETFS_RREQ_ALL_QUEUED	13	/* All subreqs are now queued */
-#define NETFS_RREQ_NEED_RETRY	14	/* Need to try retrying */
+#define NETFS_RREQ_RETRYING	14	/* Set if we're in the retry path */
 #define NETFS_RREQ_USE_PGPRIV2	31	/* [DEPRECATED] Use PG_private_2 to mark
 					 * write to cache on read */
 	const struct netfs_request_ops *netfs_ops;
include/trace/events/netfs.h (+3 -1):

···
 	EM(netfs_sreq_trace_limited,		"LIMIT")	\
 	EM(netfs_sreq_trace_need_clear,		"N-CLR")	\
 	EM(netfs_sreq_trace_partial_read,	"PARTR")	\
-	EM(netfs_sreq_trace_need_retry,		"NRTRY")	\
+	EM(netfs_sreq_trace_need_retry,		"ND-RT")	\
 	EM(netfs_sreq_trace_prepare,		"PREP ")	\
 	EM(netfs_sreq_trace_prep_failed,	"PRPFL")	\
 	EM(netfs_sreq_trace_progress,		"PRGRS")	\
···
 	EM(netfs_sreq_trace_short,		"SHORT")	\
 	EM(netfs_sreq_trace_split,		"SPLIT")	\
 	EM(netfs_sreq_trace_submit,		"SUBMT")	\
+	EM(netfs_sreq_trace_superfluous,	"SPRFL")	\
 	EM(netfs_sreq_trace_terminated,		"TERM ")	\
+	EM(netfs_sreq_trace_wait_for,		"_WAIT")	\
 	EM(netfs_sreq_trace_write,		"WRITE")	\
 	EM(netfs_sreq_trace_write_skip,		"SKIP ")	\
 	E_(netfs_sreq_trace_write_term,		"WTERM")