Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

pnfs: Proper delay for NFS4ERR_RECALLCONFLICT in layout_get_done

An NFS4ERR_RECALLCONFLICT is returned by the server from a GET_LAYOUT
only when the server sent a RECALL due to that GET_LAYOUT, or
the RECALL and GET_LAYOUT crossed on the wire.
Either way, this means we want to wait at most until in-flight IO
is finished and the RECALL can be satisfied.

So a proper wait here is more like 1/10 of a second, not 15 seconds
like we have now. In case of a server bug we delay exponentially
longer on each retry.

Current code totally craps out performance of very large files on
most pnfs-objects layouts, because of how the map changes when the
file has grown into the next raid group.

[Stable: This will patch back to 3.9. If there are earlier
still-maintained trees, please tell me and I'll send a patch]

CC: Stable Tree <stable@vger.kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>

Authored by Boaz Harrosh and committed by Trond Myklebust
ed7e5423 471252cd

+30 -4
fs/nfs/nfs4proc.c
@@ -7409,9 +7409,9 @@
 	struct nfs_server *server = NFS_SERVER(inode);
 	struct pnfs_layout_hdr *lo;
 	struct nfs4_state *state = NULL;
-	unsigned long timeo, giveup;
+	unsigned long timeo, now, giveup;
 
-	dprintk("--> %s\n", __func__);
+	dprintk("--> %s tk_status => %d\n", __func__, -task->tk_status);
 
 	if (!nfs41_sequence_done(task, &lgp->res.seq_res))
 		goto out;
@@ -7419,12 +7419,38 @@
 	switch (task->tk_status) {
 	case 0:
 		goto out;
+	/*
+	 * NFS4ERR_LAYOUTTRYLATER is a conflict with another client
+	 * (or clients) writing to the same RAID stripe
+	 */
 	case -NFS4ERR_LAYOUTTRYLATER:
+	/*
+	 * NFS4ERR_RECALLCONFLICT is when conflict with self (must recall
+	 * existing layout before getting a new one).
+	 */
 	case -NFS4ERR_RECALLCONFLICT:
 		timeo = rpc_get_timeout(task->tk_client);
 		giveup = lgp->args.timestamp + timeo;
-		if (time_after(giveup, jiffies))
-			task->tk_status = -NFS4ERR_DELAY;
+		now = jiffies;
+		if (time_after(giveup, now)) {
+			unsigned long delay;
+
+			/* Delay for:
+			 * - Not less then NFS4_POLL_RETRY_MIN.
+			 * - One last time a jiffie before we give up
+			 * - exponential backoff (time_now minus start_attempt)
+			 */
+			delay = max_t(unsigned long, NFS4_POLL_RETRY_MIN,
+				    min((giveup - now - 1),
+					now - lgp->args.timestamp));
+
+			dprintk("%s: NFS4ERR_RECALLCONFLICT waiting %lu\n",
+				__func__, delay);
+			rpc_delay(task, delay);
+			task->tk_status = 0;
+			rpc_restart_call_prepare(task);
+			goto out; /* Do not call nfs4_async_handle_error() */
+		}
 		break;
 	case -NFS4ERR_EXPIRED:
 	case -NFS4ERR_BAD_STATEID:
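
The delay calculation in this patch can be tried out in user space. What follows is only a rough sketch, not kernel code: every `sketch_` name is hypothetical, the tick rate of 1000 per second is an assumption, and the minimum delay of one tenth of a second is taken from the commit message rather than from the kernel headers. The retry counter also assumes that each retry fails instantly, which a real server would not do.

```c
/* Assumed values for illustration only; the kernel takes these from its own config. */
#define SKETCH_HZ 1000UL                        /* ticks per second (assumption) */
#define SKETCH_POLL_RETRY_MIN (SKETCH_HZ / 10)  /* 1/10 s, per the commit message */

/* Wraparound-tolerant "a is after b", in the spirit of the kernel's time_after(). */
static int sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/*
 * Delay in ticks before the next retry, or 0 once the give-up deadline
 * (start + timeo) has been reached. Mirrors the patch's expression:
 *   max(POLL_RETRY_MIN, min(giveup - now - 1, now - start))
 */
static unsigned long sketch_layoutget_delay(unsigned long start,
					    unsigned long now,
					    unsigned long timeo)
{
	unsigned long giveup = start + timeo;
	unsigned long until_giveup, elapsed, delay;

	if (!sketch_time_after(giveup, now))
		return 0;

	until_giveup = giveup - now - 1; /* wake up one tick before giving up */
	elapsed = now - start;           /* grows each retry, so delays roughly double */
	delay = until_giveup < elapsed ? until_giveup : elapsed;
	if (delay < SKETCH_POLL_RETRY_MIN)
		delay = SKETCH_POLL_RETRY_MIN;
	return delay;
}

/*
 * Count how many retries are scheduled before giving up, assuming each
 * retry fails immediately (a simplification made for this sketch).
 */
static unsigned int sketch_count_retries(unsigned long start, unsigned long timeo)
{
	unsigned long now = start;
	unsigned long delay;
	unsigned int retries = 0;

	while ((delay = sketch_layoutget_delay(start, now, timeo)) != 0) {
		now += delay;
		retries++;
	}
	return retries;
}
```

Because the delay is bounded by the time already spent since the first attempt, the first retry comes after the minimum delay and later ones back off roughly geometrically, while the `giveup - now - 1` term keeps any single delay from overshooting the deadline.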