Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client updates from Anna Schumaker:
"New Features:
- Enable using direct IO with localio
- Added localio related tracepoints

Bugfixes:
- Sunrpc fixes for working with a very large cl_tasks list
- Fix a possible buffer overflow in nfs_sysfs_link_rpc_client()
- Fixes for handling reconnections with localio
- Fix how the NFS_FSCACHE kconfig option interacts with NETFS_SUPPORT
- Fix COPY_NOTIFY xdr_buf size calculations
- pNFS/Flexfiles fix for retrying requesting a layout segment for
reads
- Sunrpc fix to stop retrying on EKEYEXPIRED error when the TGT is
expired

Cleanups:
- Various other nfs & nfsd localio cleanups
- Preparatory patches for async copy improvements that are under
development
- Make OFFLOAD_CANCEL, LAYOUTSTATS, and LAYOUTERROR moveable to other
xprts
- Add netns inum and srcaddr to debugfs rpc_xprt info"

* tag 'nfs-for-6.14-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (28 commits)
SUNRPC: do not retry on EKEYEXPIRED when user TGT ticket expired
sunrpc: add netns inum and srcaddr to debugfs rpc_xprt info
pnfs/flexfiles: retry getting layout segment for reads
NFSv4.2: make LAYOUTSTATS and LAYOUTERROR MOVEABLE
NFSv4.2: mark OFFLOAD_CANCEL MOVEABLE
NFSv4.2: fix COPY_NOTIFY xdr buf size calculation
NFS: Rename struct nfs4_offloadcancel_data
NFS: Fix typo in OFFLOAD_CANCEL comment
NFS: CB_OFFLOAD can return NFS4ERR_DELAY
nfs: Make NFS_FSCACHE select NETFS_SUPPORT instead of depending on it
nfs: fix incorrect error handling in LOCALIO
nfs: probe for LOCALIO when v3 client reconnects to server
nfs: probe for LOCALIO when v4 client reconnects to server
nfs/localio: remove redundant code and simplify LOCALIO enablement
nfs_common: add nfs_localio trace events
nfs_common: track all open nfsd_files per LOCALIO nfs_client
nfs_common: rename nfslocalio nfs_uuid_lock to nfs_uuids_lock
nfsd: nfsd_file_acquire_local no longer returns GC'd nfsd_file
nfsd: rename nfsd_serv_ prefixed methods and variables with nfsd_net_
nfsd: update percpu_ref to manage references on nfsd_net
...

+840 -319
+50 -50
Documentation/filesystems/nfs/localio.rst
···
 ===============================
 
 LOCALIO provides the nfs_uuid_t object and associated interfaces to
-allow proper network namespace (net-ns) and NFSD object refcounting:
+allow proper network namespace (net-ns) and NFSD object refcounting.
 
-We don't want to keep a long-term counted reference on each NFSD's
-net-ns in the client because that prevents a server container from
-completely shutting down.
-
-So we avoid taking a reference at all and rely on the per-cpu
-reference to the server (detailed below) being sufficient to keep
-the net-ns active. This involves allowing the NFSD's net-ns exit
-code to iterate all active clients and clear their ->net pointers
-(which are needed to find the per-cpu-refcount for the nfsd_serv).
-
-Details:
-
- - Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head
-   that can be used to find the client. It does add the 16-byte
-   uuid_t to nfs_client so it is bigger than needed (given that
-   uuid_t is only used during the initial NFS client and server
-   LOCALIO handshake to determine if they are local to each other).
-   If that is really a problem we can find a fix.
-
- - When the nfs server confirms that the uuid_t is local, it moves
-   the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net.
-
- - When each server's net-ns is shutting down - in a "pre_exit"
-   handler, all these nfs_uuid_t have their ->net cleared. There is
-   an rcu_synchronize() call between pre_exit() handlers and exit()
-   handlers so any caller that sees nfs_uuid_t ->net as not NULL can
-   safely manage the per-cpu-refcount for nfsd_serv.
-
- - The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it
-   can safely dereference ->net in a private rcu_read_lock() section
-   to allow safe access to the associated nfsd_net and nfsd_serv.
-
-So LOCALIO required the introduction and use of NFSD's percpu_ref to
-interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each
-nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(), and
+LOCALIO required the introduction and use of NFSD's percpu nfsd_net_ref
+to interlock nfsd_shutdown_net() and nfsd_open_local_fh(), to ensure
+each net-ns is not destroyed while in use by nfsd_open_local_fh(), and
 warrants a more detailed explanation:
 
-nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its
+nfsd_open_local_fh() uses nfsd_net_try_get() before opening its
 nfsd_file handle and then the caller (NFS client) must drop the
-reference for the nfsd_file and associated nn->nfsd_serv using
-nfs_file_put_local() once it has completed its IO.
+reference for the nfsd_file and associated net-ns using
+nfsd_file_put_local() once it has completed its IO.
 
 This interlock working relies heavily on nfsd_open_local_fh() being
 afforded the ability to safely deal with the possibility that the
 NFSD's net-ns (and nfsd_net by association) may have been destroyed
-by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only
-possible given the nfs_uuid_t ->net pointer managemenet detailed
-above.
+by nfsd_destroy_serv() via nfsd_shutdown_net().
 
-All told, this elaborate interlock of the NFS client and server has been
-verified to fix an easy to hit crash that would occur if an NFSD
-instance running in a container, with a LOCALIO client mounted, is
-shutdown. Upon restart of the container and associated NFSD the client
-would go on to crash due to NULL pointer dereference that occurred due
-to the LOCALIO client's attempting to nfsd_open_local_fh(), using
-nn->nfsd_serv, without having a proper reference on nn->nfsd_serv.
+This interlock of the NFS client and server has been verified to fix an
+easy to hit crash that would occur if an NFSD instance running in a
+container, with a LOCALIO client mounted, is shutdown. Upon restart of
+the container and associated NFSD, the client would go on to crash due
+to NULL pointer dereference that occurred due to the LOCALIO client's
+attempting to nfsd_open_local_fh() without having a proper reference on
+NFSD's net-ns.
 
 NFS Client issues IO instead of Server
 ======================================
···
 the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and
 fs/nfs/localio.c:nfs_local_commit().
 
+With normal NFS that makes use of RPC to issue IO to the server, if an
+application uses O_DIRECT the NFS client will bypass the pagecache but
+the NFS server will not. The NFS server's use of buffered IO affords
+applications to be less precise with their alignment when issuing IO to
+the NFS client. But if all applications properly align their IO, LOCALIO
+can be configured to use end-to-end O_DIRECT semantics from the NFS
+client to the underlying local filesystem, that it is sharing with
+the NFS server, by setting the 'localio_O_DIRECT_semantics' nfs module
+parameter to Y, e.g.:
+
+  echo Y > /sys/module/nfs/parameters/localio_O_DIRECT_semantics
+
+Once enabled, it will cause LOCALIO to use end-to-end O_DIRECT semantics
+(but again, this may cause IO to fail if applications do not properly
+align their IO).
+
 Security
 ========
 
-Localio is only supported when UNIX-style authentication (AUTH_UNIX, aka
+LOCALIO is only supported when UNIX-style authentication (AUTH_UNIX, aka
 AUTH_SYS) is used.
 
 Care is taken to ensure the same NFS security mechanisms are used
···
 client is afforded this same level of access (albeit in terms of the NFS
 protocol via SUNRPC). No other namespaces (user, mount, etc) have been
 altered or purposely extended from the server to the client.
+
+Module Parameters
+=================
+
+/sys/module/nfs/parameters/localio_enabled (bool)
+  controls if LOCALIO is enabled, defaults to Y. If client and server are
+  local but 'localio_enabled' is set to N then LOCALIO will not be used.
+
+/sys/module/nfs/parameters/localio_O_DIRECT_semantics (bool)
+  controls if O_DIRECT extends down to the underlying filesystem, defaults
+  to N. Application IO must be logical blocksize aligned, otherwise
+  O_DIRECT will fail.
+
+/sys/module/nfsv3/parameters/nfs3_localio_probe_throttle (uint)
+  controls if NFSv3 read and write IOs will trigger (re)enabling of
+  LOCALIO every N (nfs3_localio_probe_throttle) IOs, defaults to 0
+  (disabled). Must be power-of-2, admin keeps all the pieces if they
+  misconfigure (too low a value or non-power-of-2).
 
 Testing
 =======
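The 'localio_O_DIRECT_semantics' parameter only changes IO that the application itself issued with O_DIRECT (the patch sets NFS_IOHDR_ODIRECT in fs/nfs/direct.c and fs/nfs/localio.c then checks both that flag and the module parameter before using IOCB_DIRECT), and the local filesystem will then reject misaligned IO. The userspace sketch below illustrates both conditions. The helper names are hypothetical, and it assumes the conservative rule that offset, length and buffer address must all be multiples of the logical block size; the real requirements are filesystem specific and, on Linux 6.1 and later, can be queried with statx() and STATX_DIOALIGN.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if value is a multiple of align; align must be a power of 2. */
static bool aligned_to(uint64_t value, uint64_t align)
{
	return (value & (align - 1)) == 0;
}

/*
 * Hypothetical check: does this IO meet an assumed O_DIRECT rule that
 * offset, length and buffer address are all multiples of the logical
 * block size 'lbs' (a power of 2)?
 */
static bool dio_aligned(uint64_t offset, size_t len, const void *buf,
			uint32_t lbs)
{
	return aligned_to(offset, lbs) &&
	       aligned_to(len, lbs) &&
	       aligned_to((uintptr_t)buf, lbs);
}

/*
 * Hypothetical mirror of the patch's decision: end-to-end O_DIRECT is
 * used only when the module parameter is set AND the application
 * requested O_DIRECT; otherwise a synchronous buffered kiocb is used.
 */
static bool use_direct(bool o_direct_semantics, bool app_used_o_direct)
{
	return o_direct_semantics && app_used_o_direct;
}
```

An IO at offset 512 with a 4096-byte logical block size fails dio_aligned() even though its length is fine, which is the kind of request that buffered IO on the server would previously have absorbed.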
+2 -1
fs/nfs/Kconfig
···
 
 config NFS_FSCACHE
 	bool "Provide NFS client caching support"
-	depends on NFS_FS=m && NETFS_SUPPORT || NFS_FS=y && NETFS_SUPPORT=y
+	depends on NFS_FS
+	select NETFS_SUPPORT
 	select FSCACHE
 	help
 	  Say Y here if you want NFS data to be cached locally on disc through
+1 -1
fs/nfs/callback_proc.c
···
 
 	copy = kzalloc(sizeof(struct nfs4_copy_state), GFP_KERNEL);
 	if (!copy)
-		return htonl(NFS4ERR_SERVERFAULT);
+		return cpu_to_be32(NFS4ERR_DELAY);
 
 	spin_lock(&cps->clp->cl_lock);
 	rcu_read_lock();
+3 -3
fs/nfs/client.c
···
 #include <linux/sunrpc/bc_xprt.h>
 #include <linux/nsproxy.h>
 #include <linux/pid_namespace.h>
-
+#include <linux/nfslocalio.h>
 
 #include "nfs4_fs.h"
 #include "callback.h"
···
 	seqlock_init(&clp->cl_boot_lock);
 	ktime_get_real_ts64(&clp->cl_nfssvc_boot);
 	nfs_uuid_init(&clp->cl_uuid);
-	spin_lock_init(&clp->cl_localio_lock);
+	INIT_WORK(&clp->cl_local_probe_work, nfs_local_probe_async_work);
 #endif /* CONFIG_NFS_LOCALIO */
 
 	clp->cl_principal = "*";
···
  */
 void nfs_free_client(struct nfs_client *clp)
 {
-	nfs_local_disable(clp);
+	nfs_localio_disable_client(clp);
 
 	/* -EIO all pending I/O */
 	if (!IS_ERR(clp->cl_rpcclient))
+1
fs/nfs/direct.c
···
 static void nfs_direct_pgio_init(struct nfs_pgio_header *hdr)
 {
 	get_dreq(hdr->dreq);
+	set_bit(NFS_IOHDR_ODIRECT, &hdr->flags);
 }
 
 static const struct nfs_pgio_completion_ops nfs_direct_read_completion_ops = {
+34 -18
fs/nfs/flexfilelayout/flexfilelayout.c
···
 }
 
 static struct nfsd_file *
-ff_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+ff_local_open_fh(struct pnfs_layout_segment *lseg, u32 ds_idx,
+		 struct nfs_client *clp, const struct cred *cred,
 		 struct nfs_fh *fh, fmode_t mode)
 {
-	if (mode & FMODE_WRITE) {
-		/*
-		 * Always request read and write access since this corresponds
-		 * to a rw layout.
-		 */
-		mode |= FMODE_READ;
-	}
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	struct nfs4_ff_layout_mirror *mirror = FF_LAYOUT_COMP(lseg, ds_idx);
 
-	return nfs_local_open_fh(clp, cred, fh, mode);
+	return nfs_local_open_fh(clp, cred, fh, &mirror->nfl, mode);
+#else
+	return NULL;
+#endif
 }
 
 static bool ff_mirror_match_fh(const struct nfs4_ff_layout_mirror *m1,
···
 		spin_lock_init(&mirror->lock);
 		refcount_set(&mirror->ref, 1);
 		INIT_LIST_HEAD(&mirror->mirrors);
+		nfs_localio_file_init(&mirror->nfl);
 	}
 	return mirror;
 }
···
 
 	ff_layout_remove_mirror(mirror);
 	kfree(mirror->fh_versions);
+	nfs_close_local_fh(&mirror->nfl);
 	cred = rcu_access_pointer(mirror->ro_cred);
 	put_cred(cred);
 	cred = rcu_access_pointer(mirror->rw_cred);
···
 	struct nfs4_pnfs_ds *ds;
 	u32 ds_idx;
 
+	if (NFS_SERVER(pgio->pg_inode)->flags &
+			(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
+		pgio->pg_maxretrans = io_maxretrans;
 retry:
 	pnfs_generic_pg_check_layout(pgio, req);
 	/* Use full layout for now */
···
 		if (!pgio->pg_lseg)
 			goto out_nolseg;
 	}
+	/* Reset wb_nio, since getting layout segment was successful */
+	req->wb_nio = 0;
 
 	ds = ff_layout_get_ds_for_read(pgio, &ds_idx);
 	if (!ds) {
···
 	pgm->pg_bsize = mirror->mirror_ds->ds_versions[0].rsize;
 
 	pgio->pg_mirror_idx = ds_idx;
-
-	if (NFS_SERVER(pgio->pg_inode)->flags &
-			(NFS_MOUNT_SOFT|NFS_MOUNT_SOFTERR))
-		pgio->pg_maxretrans = io_maxretrans;
 	return;
 out_nolseg:
-	if (pgio->pg_error < 0)
-		return;
+	if (pgio->pg_error < 0) {
+		if (pgio->pg_error != -EAGAIN)
+			return;
+		/* Retry getting layout segment if lower layer returned -EAGAIN */
+		if (pgio->pg_maxretrans && req->wb_nio++ > pgio->pg_maxretrans) {
+			if (NFS_SERVER(pgio->pg_inode)->flags & NFS_MOUNT_SOFTERR)
+				pgio->pg_error = -ETIMEDOUT;
+			else
+				pgio->pg_error = -EIO;
+			return;
+		}
+		pgio->pg_error = 0;
+		/* Sleep for 1 second before retrying */
+		ssleep(1);
+		goto retry;
+	}
 out_mds:
 	trace_pnfs_mds_fallback_pg_init_read(pgio->pg_inode,
 			0, NFS4_MAX_UINT64, IOMODE_READ,
···
 	hdr->mds_offset = offset;
 
 	/* Start IO accounting for local read */
-	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh, FMODE_READ);
+	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh, FMODE_READ);
 	if (localio) {
 		hdr->task.tk_start = ktime_get();
 		ff_layout_read_record_layoutstats_start(&hdr->task, hdr);
···
 	hdr->args.offset = offset;
 
 	/* Start IO accounting for local write */
-	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
 				   FMODE_READ|FMODE_WRITE);
 	if (localio) {
 		hdr->task.tk_start = ktime_get();
···
 	data->args.fh = fh;
 
 	/* Start IO accounting for local commit */
-	localio = ff_local_open_fh(ds->ds_clp, ds_cred, fh,
+	localio = ff_local_open_fh(lseg, idx, ds->ds_clp, ds_cred, fh,
 				   FMODE_READ|FMODE_WRITE);
 	if (localio) {
 		data->task.tk_start = ktime_get();
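The reworked out_nolseg path in the flexfiles read setup retries only when the layer below returned -EAGAIN, counts attempts in req->wb_nio, and gives up once that count exceeds pg_maxretrans, which is set from io_maxretrans on soft and softerr mounts; when pg_maxretrans is 0 the limit check never fires and retries continue. The model below reproduces that control flow in plain C. get_lseg_retry(), struct fake_layout and fake_try_get() are hypothetical names, and the one-second ssleep() between attempts is omitted so the example runs instantly. Because the check is a post-increment (wb_nio++ > pg_maxretrans), a limit of N allows up to N + 2 attempts in total.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the flexfiles read-path retry.
 * try_get() stands in for requesting a layout segment: it returns 0 on
 * success or a negative errno. Only -EAGAIN is retried.
 */
static int get_lseg_retry(int (*try_get)(void *), void *arg,
			  unsigned int maxretrans, bool softerr)
{
	unsigned int nio = 0;	/* plays the role of req->wb_nio */

	for (;;) {
		int err = try_get(arg);

		if (err != -EAGAIN)
			return err;	/* success (0) or a non-retryable error */
		/* maxretrans == 0: the limit check never fires */
		if (maxretrans && nio++ > maxretrans)
			return softerr ? -ETIMEDOUT : -EIO;
		/* The kernel sleeps for one second here; omitted in this model. */
	}
}

/* Test double: returns -EAGAIN 'eagain_count' times, then succeeds. */
struct fake_layout {
	int calls;
	int eagain_count;
};

static int fake_try_get(void *arg)
{
	struct fake_layout *f = arg;

	f->calls++;
	return f->calls <= f->eagain_count ? -EAGAIN : 0;
}
```

In the patch, wb_nio is reset to 0 once a segment is obtained, so a later failure on the same request starts counting from scratch; the model gets the same effect by keeping the counter local to each call.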
+1
fs/nfs/flexfilelayout/flexfilelayout.h
···
 	nfs4_stateid stateid;
 	const struct cred __rcu *ro_cred;
 	const struct cred __rcu *rw_cred;
+	struct nfs_file_localio nfl;
 	refcount_t ref;
 	spinlock_t lock;
 	unsigned long flags;
+3
fs/nfs/inode.c
···
 	ctx->lock_context.open_context = ctx;
 	INIT_LIST_HEAD(&ctx->list);
 	ctx->mdsthreshold = NULL;
+	nfs_localio_file_init(&ctx->nfl);
+
 	return ctx;
 }
 EXPORT_SYMBOL_GPL(alloc_nfs_open_context);
···
 	nfs_sb_deactive(sb);
 	put_rpccred(rcu_dereference_protected(ctx->ll_cred, 1));
 	kfree(ctx->mdsthreshold);
+	nfs_close_local_fh(&ctx->nfl);
 	kfree_rcu(ctx, rcu_head);
 }
+6 -3
fs/nfs/internal.h
···
 
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 /* localio.c */
-extern void nfs_local_disable(struct nfs_client *);
 extern void nfs_local_probe(struct nfs_client *);
+extern void nfs_local_probe_async(struct nfs_client *);
+extern void nfs_local_probe_async_work(struct work_struct *);
 extern struct nfsd_file *nfs_local_open_fh(struct nfs_client *,
 					   const struct cred *,
 					   struct nfs_fh *,
+					   struct nfs_file_localio *,
 					   const fmode_t);
 extern int nfs_local_doio(struct nfs_client *,
 			  struct nfsd_file *,
···
 extern bool nfs_server_is_local(const struct nfs_client *clp);
 
 #else /* CONFIG_NFS_LOCALIO */
-static inline void nfs_local_disable(struct nfs_client *clp) {}
 static inline void nfs_local_probe(struct nfs_client *clp) {}
+static inline void nfs_local_probe_async(struct nfs_client *clp) {}
 static inline struct nfsd_file *
 nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
-		  struct nfs_fh *fh, const fmode_t mode)
+		  struct nfs_fh *fh, struct nfs_file_localio *nfl,
+		  const fmode_t mode)
 {
 	return NULL;
 }
+177 -59
fs/nfs/localio.c
···
 	struct bio_vec *bvec;
 	struct nfs_pgio_header *hdr;
 	struct work_struct work;
+	void (*aio_complete_work)(struct work_struct *);
 	struct nfsd_file *localio;
 };
 
···
 static bool localio_enabled __read_mostly = true;
 module_param(localio_enabled, bool, 0644);
 
+static bool localio_O_DIRECT_semantics __read_mostly = false;
+module_param(localio_O_DIRECT_semantics, bool, 0644);
+MODULE_PARM_DESC(localio_O_DIRECT_semantics,
+		 "LOCALIO will use O_DIRECT semantics to filesystem.");
+
 static inline bool nfs_client_is_local(const struct nfs_client *clp)
 {
-	return !!test_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
+	return !!rcu_access_pointer(clp->cl_uuid.net);
 }
 
 bool nfs_server_is_local(const struct nfs_client *clp)
···
 };
 
 /*
- * nfs_local_enable - enable local i/o for an nfs_client
- */
-static void nfs_local_enable(struct nfs_client *clp)
-{
-	spin_lock(&clp->cl_localio_lock);
-	set_bit(NFS_CS_LOCAL_IO, &clp->cl_flags);
-	trace_nfs_local_enable(clp);
-	spin_unlock(&clp->cl_localio_lock);
-}
-
-/*
- * nfs_local_disable - disable local i/o for an nfs_client
- */
-void nfs_local_disable(struct nfs_client *clp)
-{
-	spin_lock(&clp->cl_localio_lock);
-	if (test_and_clear_bit(NFS_CS_LOCAL_IO, &clp->cl_flags)) {
-		trace_nfs_local_disable(clp);
-		nfs_uuid_invalidate_one_client(&clp->cl_uuid);
-	}
-	spin_unlock(&clp->cl_localio_lock);
-}
-
-/*
  * nfs_init_localioclient - Initialise an NFS localio client connection
  */
 static struct rpc_clnt *nfs_init_localioclient(struct nfs_client *clp)
···
 	rpc_shutdown_client(rpcclient_localio);
 
 	/* Server is only local if it initialized required struct members */
-	if (status || !clp->cl_uuid.net || !clp->cl_uuid.dom)
+	if (status || !rcu_access_pointer(clp->cl_uuid.net) || !clp->cl_uuid.dom)
 		return false;
 
 	return true;
···
 	/* Disallow localio if disabled via sysfs or AUTH_SYS isn't used */
 	if (!localio_enabled ||
 	    clp->cl_rpcclient->cl_auth->au_flavor != RPC_AUTH_UNIX) {
-		nfs_local_disable(clp);
+		nfs_localio_disable_client(clp);
 		return;
 	}
 
 	if (nfs_client_is_local(clp)) {
 		/* If already enabled, disable and re-enable */
-		nfs_local_disable(clp);
+		nfs_localio_disable_client(clp);
 	}
 
 	if (!nfs_uuid_begin(&clp->cl_uuid))
 		return;
 	if (nfs_server_uuid_is_local(clp))
-		nfs_local_enable(clp);
+		nfs_localio_enable_client(clp);
 	nfs_uuid_end(&clp->cl_uuid);
 }
 EXPORT_SYMBOL_GPL(nfs_local_probe);
 
+void nfs_local_probe_async_work(struct work_struct *work)
+{
+	struct nfs_client *clp =
+		container_of(work, struct nfs_client, cl_local_probe_work);
+
+	nfs_local_probe(clp);
+}
+
+void nfs_local_probe_async(struct nfs_client *clp)
+{
+	queue_work(nfsiod_workqueue, &clp->cl_local_probe_work);
+}
+EXPORT_SYMBOL_GPL(nfs_local_probe_async);
+
+static inline struct nfsd_file *nfs_local_file_get(struct nfsd_file *nf)
+{
+	return nfs_to->nfsd_file_get(nf);
+}
+
+static inline void nfs_local_file_put(struct nfsd_file *nf)
+{
+	nfs_to->nfsd_file_put(nf);
+}
+
 /*
- * nfs_local_open_fh - open a local filehandle in terms of nfsd_file
+ * __nfs_local_open_fh - open a local filehandle in terms of nfsd_file.
  *
- * Returns a pointer to a struct nfsd_file or NULL
+ * Returns a pointer to a struct nfsd_file or ERR_PTR.
+ * Caller must release returned nfsd_file with nfs_to_nfsd_file_put_local().
  */
-struct nfsd_file *
-nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
-		  struct nfs_fh *fh, const fmode_t mode)
+static struct nfsd_file *
+__nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		    struct nfs_fh *fh, struct nfs_file_localio *nfl,
+		    const fmode_t mode)
 {
 	struct nfsd_file *localio;
-	int status;
-
-	if (!nfs_server_is_local(clp))
-		return NULL;
-	if (mode & ~(FMODE_READ | FMODE_WRITE))
-		return NULL;
 
 	localio = nfs_open_local_fh(&clp->cl_uuid, clp->cl_rpcclient,
-				    cred, fh, mode);
+				    cred, fh, nfl, mode);
 	if (IS_ERR(localio)) {
-		status = PTR_ERR(localio);
+		int status = PTR_ERR(localio);
 		trace_nfs_local_open_fh(fh, mode, status);
 		switch (status) {
 		case -ENOMEM:
···
 			/* Revalidate localio, will disable if unsupported */
 			nfs_local_probe(clp);
 		}
-		return NULL;
 	}
 	return localio;
+}
+
+/*
+ * nfs_local_open_fh - open a local filehandle in terms of nfsd_file.
+ * First checking if the open nfsd_file is already cached, otherwise
+ * must __nfs_local_open_fh and insert the nfsd_file in nfs_file_localio.
+ *
+ * Returns a pointer to a struct nfsd_file or NULL.
+ */
+struct nfsd_file *
+nfs_local_open_fh(struct nfs_client *clp, const struct cred *cred,
+		  struct nfs_fh *fh, struct nfs_file_localio *nfl,
+		  const fmode_t mode)
+{
+	struct nfsd_file *nf, *new, __rcu **pnf;
+
+	if (!nfs_server_is_local(clp))
+		return NULL;
+	if (mode & ~(FMODE_READ | FMODE_WRITE))
+		return NULL;
+
+	if (mode & FMODE_WRITE)
+		pnf = &nfl->rw_file;
+	else
+		pnf = &nfl->ro_file;
+
+	new = NULL;
+	rcu_read_lock();
+	nf = rcu_dereference(*pnf);
+	if (!nf) {
+		rcu_read_unlock();
+		new = __nfs_local_open_fh(clp, cred, fh, nfl, mode);
+		if (IS_ERR(new))
+			return NULL;
+		/* try to swap in the pointer */
+		spin_lock(&clp->cl_uuid.lock);
+		nf = rcu_dereference_protected(*pnf, 1);
+		if (!nf) {
+			nf = new;
+			new = NULL;
+			rcu_assign_pointer(*pnf, nf);
+		}
+		spin_unlock(&clp->cl_uuid.lock);
+		rcu_read_lock();
+	}
+	nf = nfs_local_file_get(nf);
+	rcu_read_unlock();
+	if (new)
+		nfs_to_nfsd_file_put_local(new);
+	return nf;
 }
 EXPORT_SYMBOL_GPL(nfs_local_open_fh);
 
···
 		kfree(iocb);
 		return NULL;
 	}
-	init_sync_kiocb(&iocb->kiocb, file);
+
+	if (localio_O_DIRECT_semantics &&
+	    test_bit(NFS_IOHDR_ODIRECT, &hdr->flags)) {
+		iocb->kiocb.ki_filp = file;
+		iocb->kiocb.ki_flags = IOCB_DIRECT;
+	} else
+		init_sync_kiocb(&iocb->kiocb, file);
+
 	iocb->kiocb.ki_pos = hdr->args.offset;
 	iocb->hdr = hdr;
 	iocb->kiocb.ki_flags &= ~IOCB_APPEND;
+	iocb->aio_complete_work = NULL;
+
 	return iocb;
 }
 
···
 		hdr->res.op_status = NFS4_OK;
 		hdr->task.tk_status = 0;
 	} else {
-		hdr->res.op_status = nfs4_stat_to_errno(status);
+		hdr->res.op_status = nfs_localio_errno_to_nfs4_stat(status);
 		hdr->task.tk_status = status;
 	}
 }
···
 {
 	struct nfs_pgio_header *hdr = iocb->hdr;
 
-	nfs_to_nfsd_file_put_local(iocb->localio);
+	nfs_local_file_put(iocb->localio);
 	nfs_local_iocb_free(iocb);
 	nfs_local_hdr_release(hdr, hdr->task.tk_ops);
+}
+
+/*
+ * Complete the I/O from iocb->kiocb.ki_complete()
+ *
+ * Note that this function can be called from a bottom half context,
+ * hence we need to queue the rpc_call_done() etc to a workqueue
+ */
+static inline void nfs_local_pgio_aio_complete(struct nfs_local_kiocb *iocb)
+{
+	INIT_WORK(&iocb->work, iocb->aio_complete_work);
+	queue_work(nfsiod_workqueue, &iocb->work);
 }
 
 static void
···
 			status > 0 ? status : 0, hdr->res.eof);
 }
 
+static void nfs_local_read_aio_complete_work(struct work_struct *work)
+{
+	struct nfs_local_kiocb *iocb =
+		container_of(work, struct nfs_local_kiocb, work);
+
+	nfs_local_pgio_release(iocb);
+}
+
+static void nfs_local_read_aio_complete(struct kiocb *kiocb, long ret)
+{
+	struct nfs_local_kiocb *iocb =
+		container_of(kiocb, struct nfs_local_kiocb, kiocb);
+
+	nfs_local_read_done(iocb, ret);
+	nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_read_aio_complete_work */
+}
+
 static void nfs_local_call_read(struct work_struct *work)
 {
 	struct nfs_local_kiocb *iocb =
···
 	nfs_local_iter_init(&iter, iocb, READ);
 
 	status = filp->f_op->read_iter(&iocb->kiocb, &iter);
-	WARN_ON_ONCE(status == -EIOCBQUEUED);
-
-	nfs_local_read_done(iocb, status);
-	nfs_local_pgio_release(iocb);
+	if (status != -EIOCBQUEUED) {
+		nfs_local_read_done(iocb, status);
+		nfs_local_pgio_release(iocb);
+	}
 
 	revert_creds(save_cred);
 }
···
 
 	nfs_local_pgio_init(hdr, call_ops);
 	hdr->res.eof = false;
+
+	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+		iocb->kiocb.ki_complete = nfs_local_read_aio_complete;
+		iocb->aio_complete_work = nfs_local_read_aio_complete_work;
+	}
 
 	INIT_WORK(&iocb->work, nfs_local_call_read);
 	queue_work(nfslocaliod_workqueue, &iocb->work);
···
 	nfs_local_pgio_done(hdr, status);
 }
 
+static void nfs_local_write_aio_complete_work(struct work_struct *work)
+{
+	struct nfs_local_kiocb *iocb =
+		container_of(work, struct nfs_local_kiocb, work);
+
+	nfs_local_vfs_getattr(iocb);
+	nfs_local_pgio_release(iocb);
+}
+
+static void nfs_local_write_aio_complete(struct kiocb *kiocb, long ret)
+{
+	struct nfs_local_kiocb *iocb =
+		container_of(kiocb, struct nfs_local_kiocb, kiocb);
+
+	nfs_local_write_done(iocb, ret);
+	nfs_local_pgio_aio_complete(iocb); /* Calls nfs_local_write_aio_complete_work */
+}
+
 static void nfs_local_call_write(struct work_struct *work)
 {
 	struct nfs_local_kiocb *iocb =
···
 	file_start_write(filp);
 	status = filp->f_op->write_iter(&iocb->kiocb, &iter);
 	file_end_write(filp);
-	WARN_ON_ONCE(status == -EIOCBQUEUED);
-
-	nfs_local_write_done(iocb, status);
-	nfs_local_vfs_getattr(iocb);
-	nfs_local_pgio_release(iocb);
+	if (status != -EIOCBQUEUED) {
+		nfs_local_write_done(iocb, status);
+		nfs_local_vfs_getattr(iocb);
+		nfs_local_pgio_release(iocb);
+	}
 
 	revert_creds(save_cred);
 	current->flags = old_flags;
···
 	case NFS_FILE_SYNC:
 		iocb->kiocb.ki_flags |= IOCB_DSYNC|IOCB_SYNC;
 	}
+
 	nfs_local_pgio_init(hdr, call_ops);
 
 	nfs_set_local_verifier(hdr->inode, hdr->res.verf, hdr->args.stable);
+
+	if (iocb->kiocb.ki_flags & IOCB_DIRECT) {
+		iocb->kiocb.ki_complete = nfs_local_write_aio_complete;
+		iocb->aio_complete_work = nfs_local_write_aio_complete_work;
+	}
 
 	INIT_WORK(&iocb->work, nfs_local_call_write);
 	queue_work(nfslocaliod_workqueue, &iocb->work);
···
 
 	if (status != 0) {
 		if (status == -EAGAIN)
-			nfs_local_disable(clp);
-		nfs_to_nfsd_file_put_local(localio);
+			nfs_localio_disable_client(clp);
+		nfs_local_file_put(localio);
 		hdr->task.tk_status = status;
 		nfs_local_hdr_release(hdr, call_ops);
 	}
···
 		data->task.tk_status = 0;
 	} else {
 		nfs_reset_boot_verifier(data->inode);
-		data->res.op_status = nfs4_stat_to_errno(status);
+		data->res.op_status = nfs_localio_errno_to_nfs4_stat(status);
 		data->task.tk_status = status;
 	}
 }
···
 		struct nfs_commit_data *data,
 		const struct rpc_call_ops *call_ops)
 {
-	nfs_to_nfsd_file_put_local(localio);
+	nfs_local_file_put(localio);
 	call_ops->rpc_call_done(&data->task, data);
 	call_ops->rpc_release(data);
 }
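nfs_local_open_fh() now caches the opened nfsd_file in the ro_file or rw_file slot of an nfs_file_localio. It performs the open without holding the lock, then installs its file only if the slot is still empty; if another caller won the race, it drops its own copy and uses the cached one. The userspace sketch below shows the same open-then-install pattern with hypothetical names. It swaps in a C11 compare-and-exchange for the kernel's spinlock-protected check, and it deliberately ignores memory reclamation: the kernel relies on RCU to keep the cached file alive between reading the slot and taking a reference, which plain atomics do not provide, so this sketch is only safe as a single-threaded illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct nfsd_file: a refcounted handle. */
struct file_handle {
	atomic_int refs;
};

static int opens;	/* how many times the expensive open ran */
static int closes;	/* how many handles were fully released */

static struct file_handle *handle_open(void)
{
	struct file_handle *fh = malloc(sizeof(*fh));

	if (!fh)
		return NULL;
	atomic_init(&fh->refs, 1);	/* reference owned by the opener */
	opens++;
	return fh;
}

static void handle_put(struct file_handle *fh)
{
	if (atomic_fetch_sub(&fh->refs, 1) == 1) {
		free(fh);
		closes++;
	}
}

/*
 * Return a referenced handle from 'slot', opening and installing one if
 * the slot is empty. NOTE: no RCU-like protection here; see lead-in.
 */
static struct file_handle *cached_open(_Atomic(struct file_handle *) *slot)
{
	struct file_handle *nf = atomic_load(slot);
	struct file_handle *new = NULL;

	if (!nf) {
		new = handle_open();	/* open outside any lock */
		if (!new)
			return NULL;
		struct file_handle *expected = NULL;
		if (atomic_compare_exchange_strong(slot, &expected, new)) {
			nf = new;	/* installed: the slot keeps new's reference */
			new = NULL;
		} else {
			nf = expected;	/* lost the race: use the cached handle */
		}
	}
	atomic_fetch_add(&nf->refs, 1);	/* reference for the caller */
	if (new)
		handle_put(new);	/* drop our unused duplicate */
	return nf;
}
```

In this sketch the slot keeps one reference of its own and each caller gets another. In the patch, the nfs_file_localio is torn down with nfs_close_local_fh() when the open context (fs/nfs/inode.c) or the flexfiles mirror is freed.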
+43 -3
fs/nfs/nfs3proc.c
···
 	return status;
 }
 
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+
+static unsigned nfs3_localio_probe_throttle __read_mostly = 0;
+module_param(nfs3_localio_probe_throttle, uint, 0644);
+MODULE_PARM_DESC(nfs3_localio_probe_throttle,
+		 "Probe for NFSv3 LOCALIO every N IO requests. Must be power-of-2, defaults to 0 (probing disabled).");
+
+static void nfs3_localio_probe(struct nfs_server *server)
+{
+	struct nfs_client *clp = server->nfs_client;
+
+	/* Throttled to reduce nfs_local_probe_async() frequency */
+	if (!nfs3_localio_probe_throttle || nfs_server_is_local(clp))
+		return;
+
+	/*
+	 * Try (re)enabling LOCALIO if isn't enabled -- admin deems
+	 * it worthwhile to periodically check if LOCALIO possible by
+	 * setting the 'nfs3_localio_probe_throttle' module parameter.
+	 *
+	 * This is useful if LOCALIO was previously enabled, but was
+	 * disabled due to server restart, and IO has successfully
+	 * completed in terms of normal RPC.
+	 */
+	if ((clp->cl_uuid.nfs3_localio_probe_count++ &
+	     (nfs3_localio_probe_throttle - 1)) == 0) {
+		if (!nfs_server_is_local(clp))
+			nfs_local_probe_async(clp);
+	}
+}
+
+#else
+static void nfs3_localio_probe(struct nfs_server *server) {}
+#endif
+
 static int nfs3_read_done(struct rpc_task *task, struct nfs_pgio_header *hdr)
 {
 	struct inode *inode = hdr->inode;
···
 	if (nfs3_async_handle_jukebox(task, inode))
 		return -EAGAIN;
 
-	if (task->tk_status >= 0 && !server->read_hdrsize)
-		cmpxchg(&server->read_hdrsize, 0, hdr->res.replen);
+	if (task->tk_status >= 0) {
+		if (!server->read_hdrsize)
+			cmpxchg(&server->read_hdrsize, 0, hdr->res.replen);
+		nfs3_localio_probe(server);
+	}
 
 	nfs_invalidate_atime(inode);
 	nfs_refresh_inode(inode, &hdr->fattr);
···
 
 	if (nfs3_async_handle_jukebox(task, inode))
 		return -EAGAIN;
-	if (task->tk_status >= 0)
+	if (task->tk_status >= 0) {
 		nfs_writeback_update_inode(hdr);
+		nfs3_localio_probe(NFS_SERVER(inode));
+	}
 	return 0;
 }
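nfs3_localio_probe() throttles probing with a bitmask instead of a modulo, which is why the module parameter must be a power of 2. The sketch below reproduces that arithmetic in userspace; probe_due() and count_probes() are hypothetical helpers, the counter stands in for cl_uuid.nfs3_localio_probe_count, and the "is the server already local" check is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the nfs3_localio_probe() throttle.
 * 'count' stands in for cl_uuid.nfs3_localio_probe_count and
 * 'throttle' for the nfs3_localio_probe_throttle module parameter.
 */
static bool probe_due(unsigned int *count, unsigned int throttle)
{
	if (throttle == 0)	/* 0 means probing is disabled */
		return false;
	/* Masking with (throttle - 1) acts as a modulo only for powers of 2 */
	return ((*count)++ & (throttle - 1)) == 0;
}

/* Count how many of 'ios' successful IOs would trigger a probe. */
static unsigned int count_probes(unsigned int throttle, unsigned int ios)
{
	unsigned int count = 0, probes = 0;

	for (unsigned int i = 0; i < ios; i++)
		if (probe_due(&count, throttle))
			probes++;
	return probes;
}
```

With a throttle of 4 the mask is 3, so probes fire at counts 0, 4, 8, and so on. With a non-power-of-2 value such as 6 the mask becomes 5 (binary 101), and probes fire at counts 0, 2, 8, 10, ... rather than every sixth IO, which is the misconfiguration the documentation warns about.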
+12 -12
fs/nfs/nfs42proc.c
···
 	return err;
 }
 
-struct nfs42_offloadcancel_data {
+struct nfs42_offload_data {
 	struct nfs_server *seq_server;
 	struct nfs42_offload_status_args args;
 	struct nfs42_offload_status_res res;
 };
 
-static void nfs42_offload_cancel_prepare(struct rpc_task *task, void *calldata)
+static void nfs42_offload_prepare(struct rpc_task *task, void *calldata)
 {
-	struct nfs42_offloadcancel_data *data = calldata;
+	struct nfs42_offload_data *data = calldata;
 
 	nfs4_setup_sequence(data->seq_server->nfs_client,
 			    &data->args.osa_seq_args,
···
 
 static void nfs42_offload_cancel_done(struct rpc_task *task, void *calldata)
 {
-	struct nfs42_offloadcancel_data *data = calldata;
+	struct nfs42_offload_data *data = calldata;
 
 	trace_nfs4_offload_cancel(&data->args, task->tk_status);
 	nfs41_sequence_done(task, &data->res.osr_seq_res);
···
 		rpc_restart_call_prepare(task);
 }
 
-static void nfs42_free_offloadcancel_data(void *data)
+static void nfs42_offload_release(void *data)
 {
 	kfree(data);
 }
 
 static const struct rpc_call_ops nfs42_offload_cancel_ops = {
-	.rpc_call_prepare = nfs42_offload_cancel_prepare,
+	.rpc_call_prepare = nfs42_offload_prepare,
 	.rpc_call_done = nfs42_offload_cancel_done,
-	.rpc_release = nfs42_free_offloadcancel_data,
+	.rpc_release = nfs42_offload_release,
 };
 
 static int nfs42_do_offload_cancel_async(struct file *dst,
 					 nfs4_stateid *stateid)
 {
 	struct nfs_server *dst_server = NFS_SERVER(file_inode(dst));
-	struct nfs42_offloadcancel_data *data = NULL;
+	struct nfs42_offload_data *data = NULL;
 	struct nfs_open_context *ctx = nfs_file_open_context(dst);
 	struct rpc_task *task;
 	struct rpc_message msg = {
···
 		.rpc_message = &msg,
 		.callback_ops = &nfs42_offload_cancel_ops,
 		.workqueue = nfsiod_workqueue,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | RPC_TASK_MOVEABLE,
 	};
 	int status;
 
 	if (!(dst_server->caps & NFS_CAP_OFFLOAD_CANCEL))
 		return -EOPNOTSUPP;
 
-	data = kzalloc(sizeof(struct nfs42_offloadcancel_data), GFP_KERNEL);
+	data = kzalloc(sizeof(struct nfs42_offload_data), GFP_KERNEL);
 	if (data == NULL)
 		return -ENOMEM;
 
···
 		.rpc_message = &msg,
 		.callback_ops = &nfs42_layoutstat_ops,
 		.callback_data = data,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | RPC_TASK_MOVEABLE,
 	};
 	struct rpc_task *task;
 
···
 	struct rpc_task_setup task_setup = {
 		.rpc_message = &msg,
 		.callback_ops = &nfs42_layouterror_ops,
-		.flags = RPC_TASK_ASYNC,
+		.flags = RPC_TASK_ASYNC | RPC_TASK_MOVEABLE,
 	};
 	unsigned int i;
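The rename in this file makes nfs42_offload_prepare() and nfs42_offload_release() operation-neutral. Presumably this lets other offload operations share them, while each rpc_call_ops table keeps its own "done" handler. The same hunks also OR RPC_TASK_MOVEABLE into the task flags. The userspace sketch below models only that shape. Every type, both flag values, and the second "status" table are hypothetical stand-ins, not SUNRPC definitions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flag values; the real RPC_TASK_* constants are defined in
 * include/linux/sunrpc/sched.h and are not reproduced here. */
#define TASK_ASYNC    0x0001u
#define TASK_MOVEABLE 0x0002u

struct task {
	unsigned int flags;
	int status;
	char log[64];
};

/* Hypothetical per-call payload shared by every offload-style op. */
struct offload_data {
	int seq_prepared;
};

struct call_ops {
	void (*prepare)(struct task *, void *);
	void (*done)(struct task *, void *);
	void (*release)(void *);
};

/* Operation-neutral callbacks, usable by any ops table below. */
static void offload_prepare(struct task *t, void *calldata)
{
	struct offload_data *data = calldata;

	data->seq_prepared = 1;
	(void)t;
}

static void offload_release(void *calldata)
{
	free(calldata);
}

/* Operation-specific completion callbacks. */
static void cancel_done(struct task *t, void *calldata)
{
	(void)calldata;
	snprintf(t->log, sizeof(t->log), "cancel done: %d", t->status);
}

static void status_done(struct task *t, void *calldata)
{
	(void)calldata;
	snprintf(t->log, sizeof(t->log), "status done: %d", t->status);
}

static const struct call_ops cancel_ops = {
	.prepare = offload_prepare, .done = cancel_done, .release = offload_release,
};

static const struct call_ops status_ops = {
	.prepare = offload_prepare, .done = status_done, .release = offload_release,
};

/* Drive one call through its callbacks synchronously; returns whether
 * the shared prepare step ran, or -1 on allocation failure. */
static int run_call(const struct call_ops *ops, struct task *t)
{
	struct offload_data *data = calloc(1, sizeof(*data));
	int prepared;

	if (!data)
		return -1;
	t->flags = TASK_ASYNC | TASK_MOVEABLE;
	ops->prepare(t, data);
	prepared = data->seq_prepared;
	ops->done(t, data);
	ops->release(data);
	return prepared;
}
```

Sharing prepare and release keeps the per-operation surface down to the one callback that actually differs, which is the completion handling.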
+3 -1
fs/nfs/nfs42xdr.c
···
 				 decode_putfh_maxsz + \
 				 decode_offload_cancel_maxsz)
 #define NFS4_enc_copy_notify_sz	(compound_encode_hdr_maxsz + \
+				 encode_sequence_maxsz + \
 				 encode_putfh_maxsz + \
 				 encode_copy_notify_maxsz)
 #define NFS4_dec_copy_notify_sz	(compound_decode_hdr_maxsz + \
+				 decode_sequence_maxsz + \
 				 decode_putfh_maxsz + \
 				 decode_copy_notify_maxsz)
 #define NFS4_enc_deallocate_sz	(compound_encode_hdr_maxsz + \
···
 }
 
 /*
- * Encode OFFLOAD_CANEL request
+ * Encode OFFLOAD_CANCEL request
  */
 static void nfs4_xdr_enc_offload_cancel(struct rpc_rqst *req,
 					struct xdr_stream *xdr,
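The COPY_NOTIFY fix adds the SEQUENCE operation's worst-case size to both the encode and decode estimates. Each compound's buffer estimate is the sum of the maxima for every operation it carries, so leaving one out undersizes the buffer. The sketch below shows that arithmetic for the encode side. The per-operation word counts are invented for illustration and are not the values in fs/nfs/nfs4xdr.c or nfs42xdr.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical worst-case sizes in 32-bit XDR words (illustrative only). */
enum {
	compound_encode_hdr_maxsz = 5,
	encode_sequence_maxsz = 8,
	encode_putfh_maxsz = 34,
	encode_copy_notify_maxsz = 20,
};

/* Every operation in the compound contributes its own worst case. */
static size_t copy_notify_enc_words(int with_sequence)
{
	return compound_encode_hdr_maxsz +
	       (with_sequence ? encode_sequence_maxsz : 0) +
	       encode_putfh_maxsz +
	       encode_copy_notify_maxsz;
}

/* Does a request needing need_words fit a buffer sized by the estimate? */
static int fits(size_t need_words, int with_sequence)
{
	return need_words <= copy_notify_enc_words(with_sequence);
}
```

With these made-up numbers, a worst-case request that really carries SEQUENCE needs 67 words, but the estimate without SEQUENCE allows only 59.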
+1
fs/nfs/nfs4state.c
···
 	}
 	rcu_read_unlock();
 	nfs4_free_state_owners(&freeme);
+	nfs_local_probe_async(clp);
 	if (lost_locks)
 		pr_warn("NFS: %s: lost %d locks\n",
 			clp->cl_hostname, lost_locks);
-32
fs/nfs/nfstrace.h
···
 	)
 );
 
-DECLARE_EVENT_CLASS(nfs_local_client_event,
-		TP_PROTO(
-			const struct nfs_client *clp
-		),
-
-		TP_ARGS(clp),
-
-		TP_STRUCT__entry(
-			__field(unsigned int, protocol)
-			__string(server, clp->cl_hostname)
-		),
-
-		TP_fast_assign(
-			__entry->protocol = clp->rpc_ops->version;
-			__assign_str(server);
-		),
-
-		TP_printk(
-			"server=%s NFSv%u", __get_str(server), __entry->protocol
-		)
-);
-
-#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
-	DEFINE_EVENT(nfs_local_client_event, name, \
-			TP_PROTO( \
-				const struct nfs_client *clp \
-			), \
-			TP_ARGS(clp))
-
-DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_enable);
-DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_local_disable);
-
 DECLARE_EVENT_CLASS(nfs_xdr_event,
 		TP_PROTO(
 			const struct xdr_stream *xdr,
+3 -2
fs/nfs/pagelist.c
···
 	struct nfs_client *clp = NFS_SERVER(hdr->inode)->nfs_client;
 
 	struct nfsd_file *localio =
-		nfs_local_open_fh(clp, hdr->cred,
-				  hdr->args.fh, hdr->args.context->mode);
+		nfs_local_open_fh(clp, hdr->cred, hdr->args.fh,
+				  &hdr->args.context->nfl,
+				  hdr->args.context->mode);
 
 	if (NFS_SERVER(hdr->inode)->nfs_client->cl_minorversion)
 		task_flags = RPC_TASK_MOVEABLE;
+3 -3
fs/nfs/sysfs.c
···
 	char name[RPC_CLIENT_NAME_SIZE];
 	int ret;
 
-	strcpy(name, clnt->cl_program->name);
-	strcat(name, uniq ? uniq : "");
-	strcat(name, "_client");
+	strscpy(name, clnt->cl_program->name, sizeof(name));
+	strncat(name, uniq ? uniq : "", sizeof(name) - strlen(name) - 1);
+	strncat(name, "_client", sizeof(name) - strlen(name) - 1);
 
 	ret = sysfs_create_link_nowarn(&server->kobj,
 				       &clnt->cl_sysfs->kobject, name);
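The sysfs.c fix replaces the unbounded strcpy()/strcat() chain with a bounded copy followed by two strncat() calls. Each strncat() is limited to whatever space remains, less one byte for the terminator. strscpy() is kernel-only, so this userspace sketch substitutes a hypothetical copy_bounded() helper with the same contract (copy at most size - 1 bytes, always NUL-terminate, return -E2BIG on truncation). NAME_SIZE is an arbitrary stand-in for RPC_CLIENT_NAME_SIZE.

```c
#define _POSIX_C_SOURCE 200809L /* for strnlen() */
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define NAME_SIZE 16 /* hypothetical stand-in for RPC_CLIENT_NAME_SIZE */

/* Userspace approximation of strscpy(): copy at most size - 1 bytes,
 * always NUL-terminate, return -E2BIG if src did not fit. */
static ssize_t copy_bounded(char *dst, const char *src, size_t size)
{
	size_t len;

	if (size == 0)
		return -E2BIG;
	len = strnlen(src, size);
	if (len == size) {
		memcpy(dst, src, size - 1);
		dst[size - 1] = '\0';
		return -E2BIG;
	}
	memcpy(dst, src, len + 1);
	return (ssize_t)len;
}

/* Build "<program><uniq>_client" without writing past the buffer.
 * In the kernel, name is a local array, so sizeof(name) is the buffer
 * size. Here it is a parameter (a pointer), so NAME_SIZE is used. */
static void build_link_name(char *name, const char *program, const char *uniq)
{
	copy_bounded(name, program, NAME_SIZE);
	strncat(name, uniq ? uniq : "", NAME_SIZE - strlen(name) - 1);
	strncat(name, "_client", NAME_SIZE - strlen(name) - 1);
}
```

When the program name alone fills the buffer, both strncat() calls receive a limit of 0 and append nothing. The result is silently truncated rather than overflowing.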
+2 -1
fs/nfs/write.c
···
 		task_flags = RPC_TASK_MOVEABLE;
 
 	localio = nfs_local_open_fh(NFS_SERVER(inode)->nfs_client, data->cred,
-				    data->args.fh, data->context->mode);
+				    data->args.fh, &data->context->nfl,
+				    data->context->mode);
 	return nfs_initiate_commit(NFS_CLIENT(inode), data, NFS_PROTO(inode),
 				   data->mds_ops, how,
 				   RPC_TASK_CRED_NOREF | task_flags, localio);
+2 -1
fs/nfs_common/Makefile
···
 obj-$(CONFIG_NFS_ACL_SUPPORT) += nfs_acl.o
 nfs_acl-objs := nfsacl.o
 
+CFLAGS_localio_trace.o += -I$(src)
 obj-$(CONFIG_NFS_COMMON_LOCALIO_SUPPORT) += nfs_localio.o
-nfs_localio-objs := nfslocalio.o
+nfs_localio-objs := nfslocalio.o localio_trace.o
 
 obj-$(CONFIG_GRACE_PERIOD) += grace.o
 obj-$(CONFIG_NFS_V4_2_SSC_HELPER) += nfs_ssc.o
+78 -11
fs/nfs_common/common.c
···
 	{ NFS_OK, 0 },
 	{ NFSERR_PERM, -EPERM },
 	{ NFSERR_NOENT, -ENOENT },
-	{ NFSERR_IO, -errno_NFSERR_IO},
+	{ NFSERR_IO, -EIO },
 	{ NFSERR_NXIO, -ENXIO },
 /*	{ NFSERR_EAGAIN, -EAGAIN }, */
 	{ NFSERR_ACCES, -EACCES },
···
 	{ NFSERR_SERVERFAULT, -EREMOTEIO },
 	{ NFSERR_BADTYPE, -EBADTYPE },
 	{ NFSERR_JUKEBOX, -EJUKEBOX },
-	{ -1, -EIO }
 };
 
 /**
···
 {
 	int i;
 
-	for (i = 0; nfs_errtbl[i].stat != -1; i++) {
+	for (i = 0; i < ARRAY_SIZE(nfs_errtbl); i++) {
 		if (nfs_errtbl[i].stat == (int)status)
 			return nfs_errtbl[i].errno;
 	}
-	return nfs_errtbl[i].errno;
+	return -EIO;
 }
 EXPORT_SYMBOL_GPL(nfs_stat_to_errno);
 
 /*
  * We need to translate between nfs v4 status return values and
  * the local errno values which may not be the same.
+ *
+ * nfs4_errtbl_common[] is used before more specialized mappings
+ * available in nfs4_errtbl[] or nfs4_errtbl_localio[].
  */
 static const struct {
 	int stat;
 	int errno;
-} nfs4_errtbl[] = {
+} nfs4_errtbl_common[] = {
 	{ NFS4_OK, 0 },
 	{ NFS4ERR_PERM, -EPERM },
 	{ NFS4ERR_NOENT, -ENOENT },
-	{ NFS4ERR_IO, -errno_NFSERR_IO},
+	{ NFS4ERR_IO, -EIO },
 	{ NFS4ERR_NXIO, -ENXIO },
 	{ NFS4ERR_ACCESS, -EACCES },
 	{ NFS4ERR_EXIST, -EEXIST },
···
 	{ NFS4ERR_BAD_COOKIE, -EBADCOOKIE },
 	{ NFS4ERR_NOTSUPP, -ENOTSUPP },
 	{ NFS4ERR_TOOSMALL, -ETOOSMALL },
-	{ NFS4ERR_SERVERFAULT, -EREMOTEIO },
 	{ NFS4ERR_BADTYPE, -EBADTYPE },
-	{ NFS4ERR_LOCKED, -EAGAIN },
 	{ NFS4ERR_SYMLINK, -ELOOP },
-	{ NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP },
 	{ NFS4ERR_DEADLOCK, -EDEADLK },
+};
+
+static const struct {
+	int stat;
+	int errno;
+} nfs4_errtbl[] = {
+	{ NFS4ERR_SERVERFAULT, -EREMOTEIO },
+	{ NFS4ERR_LOCKED, -EAGAIN },
+	{ NFS4ERR_OP_ILLEGAL, -EOPNOTSUPP },
 	{ NFS4ERR_NOXATTR, -ENODATA },
 	{ NFS4ERR_XATTR2BIG, -E2BIG },
-	{ -1, -EIO }
 };
 
 /*
···
 int nfs4_stat_to_errno(int stat)
 {
 	int i;
-	for (i = 0; nfs4_errtbl[i].stat != -1; i++) {
+
+	/* First check nfs4_errtbl_common */
+	for (i = 0; i < ARRAY_SIZE(nfs4_errtbl_common); i++) {
+		if (nfs4_errtbl_common[i].stat == stat)
+			return nfs4_errtbl_common[i].errno;
+	}
+	/* Then check nfs4_errtbl */
+	for (i = 0; i < ARRAY_SIZE(nfs4_errtbl); i++) {
 		if (nfs4_errtbl[i].stat == stat)
 			return nfs4_errtbl[i].errno;
 	}
···
 	return -stat;
 }
 EXPORT_SYMBOL_GPL(nfs4_stat_to_errno);
+
+/*
+ * This table is useful for conversion from local errno to NFS error.
+ * It provides more logically correct mappings for use with LOCALIO
+ * (which is focused on converting from errno to NFS status).
+ */
+static const struct {
+	int stat;
+	int errno;
+} nfs4_errtbl_localio[] = {
+	/* Map errors differently than nfs4_errtbl */
+	{ NFS4ERR_IO, -EREMOTEIO },
+	{ NFS4ERR_DELAY, -EAGAIN },
+	{ NFS4ERR_FBIG, -E2BIG },
+	/* Map errors not handled by nfs4_errtbl */
+	{ NFS4ERR_STALE, -EBADF },
+	{ NFS4ERR_STALE, -EOPENSTALE },
+	{ NFS4ERR_DELAY, -ETIMEDOUT },
+	{ NFS4ERR_DELAY, -ERESTARTSYS },
+	{ NFS4ERR_DELAY, -ENOMEM },
+	{ NFS4ERR_IO, -ETXTBSY },
+	{ NFS4ERR_IO, -EBUSY },
+	{ NFS4ERR_SERVERFAULT, -ESERVERFAULT },
+	{ NFS4ERR_SERVERFAULT, -ENFILE },
+	{ NFS4ERR_IO, -EUCLEAN },
+	{ NFS4ERR_PERM, -ENOKEY },
+};
+
+/*
+ * Convert an errno to an NFS error code for LOCALIO.
+ */
+__u32 nfs_localio_errno_to_nfs4_stat(int errno)
+{
+	int i;
+
+	/* First check nfs4_errtbl_common */
+	for (i = 0; i < ARRAY_SIZE(nfs4_errtbl_common); i++) {
+		if (nfs4_errtbl_common[i].errno == errno)
+			return nfs4_errtbl_common[i].stat;
+	}
+	/* Then check nfs4_errtbl_localio */
+	for (i = 0; i < ARRAY_SIZE(nfs4_errtbl_localio); i++) {
+		if (nfs4_errtbl_localio[i].errno == errno)
+			return nfs4_errtbl_localio[i].stat;
+	}
+	/* If we cannot translate the error, the recovery routines should
+	 * handle it.
+	 * Note: remaining NFSv4 error codes have values > 10000, so should
+	 * not conflict with native Linux error codes.
+	 */
+	return NFS4ERR_SERVERFAULT;
+}
+EXPORT_SYMBOL_GPL(nfs_localio_errno_to_nfs4_stat);
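The common.c changes drop the { -1, -EIO } sentinel rows in favour of ARRAY_SIZE() bounds. They also split the NFSv4 table into a shared part that is consulted first and two direction-specific tables. The sketch below reproduces that two-tier lookup on a trimmed set of entries. The status numbers are the NFSv4 protocol values. The fallback in stat_to_errno() is simplified: the kernel's nfs4_stat_to_errno() has its own fallback logic after the loops.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* A few NFSv4 status codes (protocol values). */
enum {
	NFS4_OK = 0,
	NFS4ERR_PERM = 1,
	NFS4ERR_NOENT = 2,
	NFS4ERR_IO = 5,
	NFS4ERR_STALE = 70,
	NFS4ERR_SERVERFAULT = 10006,
	NFS4ERR_DELAY = 10008,
	NFS4ERR_LOCKED = 10012,
};

/* The kernel names the second field "errno"; in userspace errno is a
 * macro, so the field is renamed here. */
struct errmap {
	int stat;
	int errno_val;
};

/* Mappings valid in both directions, consulted first. */
static const struct errmap common_tbl[] = {
	{ NFS4_OK, 0 },
	{ NFS4ERR_PERM, -EPERM },
	{ NFS4ERR_NOENT, -ENOENT },
	{ NFS4ERR_IO, -EIO },
};

/* stat -> errno only (decoding server replies). */
static const struct errmap stat_tbl[] = {
	{ NFS4ERR_SERVERFAULT, -EREMOTEIO },
	{ NFS4ERR_LOCKED, -EAGAIN },
};

/* errno -> stat only (LOCALIO); several errnos may share one status. */
static const struct errmap localio_tbl[] = {
	{ NFS4ERR_IO, -EREMOTEIO },
	{ NFS4ERR_DELAY, -EAGAIN },
	{ NFS4ERR_STALE, -EBADF },
	{ NFS4ERR_DELAY, -ETIMEDOUT },
};

/* No sentinel row: ARRAY_SIZE() bounds each scan. */
static int stat_to_errno(int stat)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(common_tbl); i++)
		if (common_tbl[i].stat == stat)
			return common_tbl[i].errno_val;
	for (i = 0; i < ARRAY_SIZE(stat_tbl); i++)
		if (stat_tbl[i].stat == stat)
			return stat_tbl[i].errno_val;
	return -EIO; /* simplified fallback */
}

static unsigned int errno_to_stat(int err)
{
	size_t i;

	for (i = 0; i < ARRAY_SIZE(common_tbl); i++)
		if (common_tbl[i].errno_val == err)
			return common_tbl[i].stat;
	for (i = 0; i < ARRAY_SIZE(localio_tbl); i++)
		if (localio_tbl[i].errno_val == err)
			return localio_tbl[i].stat;
	return NFS4ERR_SERVERFAULT;
}
```

The two directions are deliberately asymmetric. NFS4ERR_LOCKED decodes to -EAGAIN, but -EAGAIN from a local file operation encodes as NFS4ERR_DELAY.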
+10
fs/nfs_common/localio_trace.c
···
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2024 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#include <linux/nfs_fs.h>
+#include <linux/namei.h>
+
+#define CREATE_TRACE_POINTS
+#include "localio_trace.h"
+56
fs/nfs_common/localio_trace.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2024 Trond Myklebust <trond.myklebust@hammerspace.com>
+ * Copyright (C) 2024 Mike Snitzer <snitzer@hammerspace.com>
+ */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM nfs_localio
+
+#if !defined(_TRACE_NFS_COMMON_LOCALIO_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_NFS_COMMON_LOCALIO_H
+
+#include <linux/tracepoint.h>
+
+#include <trace/misc/fs.h>
+#include <trace/misc/nfs.h>
+#include <trace/misc/sunrpc.h>
+
+DECLARE_EVENT_CLASS(nfs_local_client_event,
+		TP_PROTO(
+			const struct nfs_client *clp
+		),
+
+		TP_ARGS(clp),
+
+		TP_STRUCT__entry(
+			__field(unsigned int, protocol)
+			__string(server, clp->cl_hostname)
+		),
+
+		TP_fast_assign(
+			__entry->protocol = clp->rpc_ops->version;
+			__assign_str(server);
+		),
+
+		TP_printk(
+			"server=%s NFSv%u", __get_str(server), __entry->protocol
+		)
+);
+
+#define DEFINE_NFS_LOCAL_CLIENT_EVENT(name) \
+	DEFINE_EVENT(nfs_local_client_event, name, \
+			TP_PROTO( \
+				const struct nfs_client *clp \
+			), \
+			TP_ARGS(clp))
+
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_localio_enable_client);
+DEFINE_NFS_LOCAL_CLIENT_EVENT(nfs_localio_disable_client);
+
+#endif /* _TRACE_NFS_COMMON_LOCALIO_H */
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH .
+#define TRACE_INCLUDE_FILE localio_trace
+/* This part must be outside protection */
+#include <trace/define_trace.h>
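In the new header, DECLARE_EVENT_CLASS() describes the record layout and output format once, and each DEFINE_NFS_LOCAL_CLIENT_EVENT() expansion stamps out a named event that reuses that class. The plain-C analogy below mimics only that "one class, many named events" macro pattern. It does not use the tracepoint API, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the struct nfs_client fields being traced. */
struct client {
	const char *hostname;
	unsigned int version;
};

/* Shared "class": one record layout and one format string. */
struct client_event {
	char name[32];
	char text[96];
};

static void client_event_record(struct client_event *ev, const char *name,
				const struct client *clp)
{
	snprintf(ev->name, sizeof(ev->name), "%s", name);
	snprintf(ev->text, sizeof(ev->text), "server=%s NFSv%u",
		 clp->hostname, clp->version);
}

/* Each expansion defines one named event that reuses the class above. */
#define DEFINE_CLIENT_EVENT(evname)				\
	static void trace_##evname(struct client_event *ev,	\
				   const struct client *clp)	\
	{							\
		client_event_record(ev, #evname, clp);		\
	}

DEFINE_CLIENT_EVENT(localio_enable_client)
DEFINE_CLIENT_EVENT(localio_disable_client)
```

Adding a third event is one more macro line, and every event of the class formats identically.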
+203 -55
fs/nfs_common/nfslocalio.c
···
 #include <linux/module.h>
 #include <linux/list.h>
 #include <linux/nfslocalio.h>
+#include <linux/nfs3.h>
+#include <linux/nfs4.h>
+#include <linux/nfs_fs.h>
 #include <net/netns/generic.h>
+
+#include "localio_trace.h"
 
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("NFS localio protocol bypass support");
 
-static DEFINE_SPINLOCK(nfs_uuid_lock);
+static DEFINE_SPINLOCK(nfs_uuids_lock);
 
 /*
  * Global list of nfs_uuid_t instances
- * that is protected by nfs_uuid_lock.
+ * that is protected by nfs_uuids_lock.
  */
 static LIST_HEAD(nfs_uuids);
 
+/*
+ * Lock ordering:
+ * 1: nfs_uuid->lock
+ * 2: nfs_uuids_lock
+ * 3: nfs_uuid->list_lock (aka nn->local_clients_lock)
+ *
+ * May skip locks in select cases, but never hold multiple
+ * locks out of order.
+ */
+
 void nfs_uuid_init(nfs_uuid_t *nfs_uuid)
 {
-	nfs_uuid->net = NULL;
+	RCU_INIT_POINTER(nfs_uuid->net, NULL);
 	nfs_uuid->dom = NULL;
+	nfs_uuid->list_lock = NULL;
 	INIT_LIST_HEAD(&nfs_uuid->list);
+	INIT_LIST_HEAD(&nfs_uuid->files);
+	spin_lock_init(&nfs_uuid->lock);
+	nfs_uuid->nfs3_localio_probe_count = 0;
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_init);
 
 bool nfs_uuid_begin(nfs_uuid_t *nfs_uuid)
 {
-	spin_lock(&nfs_uuid_lock);
-	/* Is this nfs_uuid already in use? */
-	if (!list_empty(&nfs_uuid->list)) {
-		spin_unlock(&nfs_uuid_lock);
+	spin_lock(&nfs_uuid->lock);
+	if (rcu_access_pointer(nfs_uuid->net)) {
+		/* This nfs_uuid is already in use */
+		spin_unlock(&nfs_uuid->lock);
 		return false;
 	}
-	uuid_gen(&nfs_uuid->uuid);
+
+	spin_lock(&nfs_uuids_lock);
+	if (!list_empty(&nfs_uuid->list)) {
+		/* This nfs_uuid is already in use */
+		spin_unlock(&nfs_uuids_lock);
+		spin_unlock(&nfs_uuid->lock);
+		return false;
+	}
 	list_add_tail(&nfs_uuid->list, &nfs_uuids);
-	spin_unlock(&nfs_uuid_lock);
+	spin_unlock(&nfs_uuids_lock);
+
+	uuid_gen(&nfs_uuid->uuid);
+	spin_unlock(&nfs_uuid->lock);
 
 	return true;
 }
···
 
 void nfs_uuid_end(nfs_uuid_t *nfs_uuid)
 {
-	if (nfs_uuid->net == NULL) {
-		spin_lock(&nfs_uuid_lock);
-		if (nfs_uuid->net == NULL)
+	if (!rcu_access_pointer(nfs_uuid->net)) {
+		spin_lock(&nfs_uuid->lock);
+		if (!rcu_access_pointer(nfs_uuid->net)) {
+			/* Not local, remove from nfs_uuids */
+			spin_lock(&nfs_uuids_lock);
 			list_del_init(&nfs_uuid->list);
-		spin_unlock(&nfs_uuid_lock);
-	}
+			spin_unlock(&nfs_uuids_lock);
+		}
+		spin_unlock(&nfs_uuid->lock);
+	}
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_end);
 
···
 static struct module *nfsd_mod;
 
 void nfs_uuid_is_local(const uuid_t *uuid, struct list_head *list,
-		       struct net *net, struct auth_domain *dom,
-		       struct module *mod)
+		       spinlock_t *list_lock, struct net *net,
+		       struct auth_domain *dom, struct module *mod)
 {
 	nfs_uuid_t *nfs_uuid;
 
-	spin_lock(&nfs_uuid_lock);
+	spin_lock(&nfs_uuids_lock);
 	nfs_uuid = nfs_uuid_lookup_locked(uuid);
-	if (nfs_uuid) {
-		kref_get(&dom->ref);
-		nfs_uuid->dom = dom;
-		/*
-		 * We don't hold a ref on the net, but instead put
-		 * ourselves on a list so the net pointer can be
-		 * invalidated.
-		 */
-		list_move(&nfs_uuid->list, list);
-		rcu_assign_pointer(nfs_uuid->net, net);
-
-		__module_get(mod);
-		nfsd_mod = mod;
+	if (!nfs_uuid) {
+		spin_unlock(&nfs_uuids_lock);
+		return;
 	}
-	spin_unlock(&nfs_uuid_lock);
+
+	/*
+	 * We don't hold a ref on the net, but instead put
+	 * ourselves on @list (nn->local_clients) so the net
+	 * pointer can be invalidated.
+	 */
+	spin_lock(list_lock); /* list_lock is nn->local_clients_lock */
+	list_move(&nfs_uuid->list, list);
+	spin_unlock(list_lock);
+
+	spin_unlock(&nfs_uuids_lock);
+	/* Once nfs_uuid is parented to @list, avoid global nfs_uuids_lock */
+	spin_lock(&nfs_uuid->lock);
+
+	__module_get(mod);
+	nfsd_mod = mod;
+
+	nfs_uuid->list_lock = list_lock;
+	kref_get(&dom->ref);
+	nfs_uuid->dom = dom;
+	rcu_assign_pointer(nfs_uuid->net, net);
+	spin_unlock(&nfs_uuid->lock);
 }
 EXPORT_SYMBOL_GPL(nfs_uuid_is_local);
 
-static void nfs_uuid_put_locked(nfs_uuid_t *nfs_uuid)
+void nfs_localio_enable_client(struct nfs_client *clp)
 {
-	if (nfs_uuid->net) {
-		module_put(nfsd_mod);
-		nfs_uuid->net = NULL;
+	/* nfs_uuid_is_local() does the actual enablement */
+	trace_nfs_localio_enable_client(clp);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_enable_client);
+
+/*
+ * Cleanup the nfs_uuid_t embedded in an nfs_client.
+ * This is the long-form of nfs_uuid_init().
+ */
+static bool nfs_uuid_put(nfs_uuid_t *nfs_uuid)
+{
+	LIST_HEAD(local_files);
+	struct nfs_file_localio *nfl, *tmp;
+
+	spin_lock(&nfs_uuid->lock);
+	if (unlikely(!rcu_access_pointer(nfs_uuid->net))) {
+		spin_unlock(&nfs_uuid->lock);
+		return false;
 	}
+	RCU_INIT_POINTER(nfs_uuid->net, NULL);
+
 	if (nfs_uuid->dom) {
 		auth_domain_put(nfs_uuid->dom);
 		nfs_uuid->dom = NULL;
 	}
-	list_del_init(&nfs_uuid->list);
+
+	list_splice_init(&nfs_uuid->files, &local_files);
+	spin_unlock(&nfs_uuid->lock);
+
+	/* Walk list of files and ensure their last references dropped */
+	list_for_each_entry_safe(nfl, tmp, &local_files, list) {
+		nfs_close_local_fh(nfl);
+		cond_resched();
+	}
+
+	spin_lock(&nfs_uuid->lock);
+	BUG_ON(!list_empty(&nfs_uuid->files));
+
+	/* Remove client from nn->local_clients */
+	if (nfs_uuid->list_lock) {
+		spin_lock(nfs_uuid->list_lock);
+		BUG_ON(list_empty(&nfs_uuid->list));
+		list_del_init(&nfs_uuid->list);
+		spin_unlock(nfs_uuid->list_lock);
+		nfs_uuid->list_lock = NULL;
+	}
+
+	module_put(nfsd_mod);
+	spin_unlock(&nfs_uuid->lock);
+
+	return true;
 }
 
-void nfs_uuid_invalidate_clients(struct list_head *list)
+void nfs_localio_disable_client(struct nfs_client *clp)
 {
+	if (nfs_uuid_put(&clp->cl_uuid))
+		trace_nfs_localio_disable_client(clp);
+}
+EXPORT_SYMBOL_GPL(nfs_localio_disable_client);
+
+void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
+				    spinlock_t *nn_local_clients_lock)
+{
+	LIST_HEAD(local_clients);
 	nfs_uuid_t *nfs_uuid, *tmp;
+	struct nfs_client *clp;
 
-	spin_lock(&nfs_uuid_lock);
-	list_for_each_entry_safe(nfs_uuid, tmp, list, list)
-		nfs_uuid_put_locked(nfs_uuid);
-	spin_unlock(&nfs_uuid_lock);
-}
-EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_clients);
-
-void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid)
-{
-	if (nfs_uuid->net) {
-		spin_lock(&nfs_uuid_lock);
-		nfs_uuid_put_locked(nfs_uuid);
-		spin_unlock(&nfs_uuid_lock);
+	spin_lock(nn_local_clients_lock);
+	list_splice_init(nn_local_clients, &local_clients);
+	spin_unlock(nn_local_clients_lock);
+	list_for_each_entry_safe(nfs_uuid, tmp, &local_clients, list) {
+		if (WARN_ON(nfs_uuid->list_lock != nn_local_clients_lock))
+			break;
+		clp = container_of(nfs_uuid, struct nfs_client, cl_uuid);
+		nfs_localio_disable_client(clp);
 	}
 }
-EXPORT_SYMBOL_GPL(nfs_uuid_invalidate_one_client);
+EXPORT_SYMBOL_GPL(nfs_localio_invalidate_clients);
 
+static void nfs_uuid_add_file(nfs_uuid_t *nfs_uuid, struct nfs_file_localio *nfl)
+{
+	/* Add nfl to nfs_uuid->files if it isn't already */
+	spin_lock(&nfs_uuid->lock);
+	if (list_empty(&nfl->list)) {
+		rcu_assign_pointer(nfl->nfs_uuid, nfs_uuid);
+		list_add_tail(&nfl->list, &nfs_uuid->files);
+	}
+	spin_unlock(&nfs_uuid->lock);
+}
+
+/*
+ * Caller is responsible for calling nfsd_net_put and
+ * nfsd_file_put (via nfs_to_nfsd_file_put_local).
+ */
 struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *uuid,
 		   struct rpc_clnt *rpc_clnt, const struct cred *cred,
-		   const struct nfs_fh *nfs_fh, const fmode_t fmode)
+		   const struct nfs_fh *nfs_fh, struct nfs_file_localio *nfl,
+		   const fmode_t fmode)
 {
 	struct net *net;
 	struct nfsd_file *localio;
···
 	 * Not running in nfsd context, so must safely get reference on nfsd_serv.
 	 * But the server may already be shutting down, if so disallow new localio.
 	 * uuid->net is NOT a counted reference, but rcu_read_lock() ensures that
-	 * if uuid->net is not NULL, then calling nfsd_serv_try_get() is safe
+	 * if uuid->net is not NULL, then calling nfsd_net_try_get() is safe
 	 * and if it succeeds we will have an implied reference to the net.
 	 *
 	 * Otherwise NFS may not have ref on NFSD and therefore cannot safely
···
 	 */
 	rcu_read_lock();
 	net = rcu_dereference(uuid->net);
-	if (!net || !nfs_to->nfsd_serv_try_get(net)) {
+	if (!net || !nfs_to->nfsd_net_try_get(net)) {
 		rcu_read_unlock();
 		return ERR_PTR(-ENXIO);
 	}
 	rcu_read_unlock();
-	/* We have an implied reference to net thanks to nfsd_serv_try_get */
+	/* We have an implied reference to net thanks to nfsd_net_try_get */
 	localio = nfs_to->nfsd_open_local_fh(net, uuid->dom, rpc_clnt,
 					     cred, nfs_fh, fmode);
 	if (IS_ERR(localio))
 		nfs_to_nfsd_net_put(net);
+	else
+		nfs_uuid_add_file(uuid, nfl);
 
 	return localio;
 }
 EXPORT_SYMBOL_GPL(nfs_open_local_fh);
+
+void nfs_close_local_fh(struct nfs_file_localio *nfl)
+{
+	struct nfsd_file *ro_nf = NULL;
+	struct nfsd_file *rw_nf = NULL;
+	nfs_uuid_t *nfs_uuid;
+
+	rcu_read_lock();
+	nfs_uuid = rcu_dereference(nfl->nfs_uuid);
+	if (!nfs_uuid) {
+		/* regular (non-LOCALIO) NFS will hammer this */
+		rcu_read_unlock();
+		return;
+	}
+
+	ro_nf = rcu_access_pointer(nfl->ro_file);
+	rw_nf = rcu_access_pointer(nfl->rw_file);
+	if (ro_nf || rw_nf) {
+		spin_lock(&nfs_uuid->lock);
+		if (ro_nf)
+			ro_nf = rcu_dereference_protected(xchg(&nfl->ro_file, NULL), 1);
+		if (rw_nf)
+			rw_nf = rcu_dereference_protected(xchg(&nfl->rw_file, NULL), 1);
+
+		/* Remove nfl from nfs_uuid->files list */
+		RCU_INIT_POINTER(nfl->nfs_uuid, NULL);
+		list_del_init(&nfl->list);
+		spin_unlock(&nfs_uuid->lock);
+		rcu_read_unlock();
+
+		if (ro_nf)
+			nfs_to_nfsd_file_put_local(ro_nf);
+		if (rw_nf)
+			nfs_to_nfsd_file_put_local(rw_nf);
+		return;
+	}
+	rcu_read_unlock();
+}
+EXPORT_SYMBOL_GPL(nfs_close_local_fh);
 
 /*
  * The NFS LOCALIO code needs to call into NFSD using various symbols,
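nfs_close_local_fh() detaches ro_file and rw_file with xchg(), so only the caller that actually swaps out a non-NULL pointer drops that reference. This holds whether the caller is a normal close or nfs_uuid_put() walking nfs_uuid->files during shutdown. The userspace sketch below models only that one-shot ownership transfer. It uses C11 atomic_exchange() in place of the kernel's xchg() and leaves out the RCU protection, nfs_uuid->lock, and list removal that the real function also performs. The file_ref and local_file types are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a reference-counted nfsd_file. */
struct file_ref {
	atomic_int refs;
};

/* Hypothetical per-open cache of read-only and read-write handles. */
struct local_file {
	_Atomic(struct file_ref *) ro_file;
	_Atomic(struct file_ref *) rw_file;
};

static void put_ref(struct file_ref *f)
{
	atomic_fetch_sub(&f->refs, 1);
}

/* Swap each cached pointer to NULL. Only the caller that receives a
 * non-NULL value owns (and drops) that reference, so concurrent closers
 * cannot double-put. Returns how many references this call dropped. */
static int close_local(struct local_file *lf)
{
	struct file_ref *ro = atomic_exchange(&lf->ro_file, NULL);
	struct file_ref *rw = atomic_exchange(&lf->rw_file, NULL);
	int dropped = 0;

	if (ro) {
		put_ref(ro);
		dropped++;
	}
	if (rw) {
		put_ref(rw);
		dropped++;
	}
	return dropped;
}
```

A second close_local() on the same object finds both slots already NULL and drops nothing. That is the property that lets a regular close and a shutdown walk race safely.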
+14 -6
fs/nfsd/filecache.c
···
 #include <linux/fsnotify.h>
 #include <linux/seq_file.h>
 #include <linux/rhashtable.h>
+#include <linux/nfslocalio.h>
 
 #include "vfs.h"
 #include "nfsd.h"
···
 }
 
 /**
- * nfsd_file_put_local - put nfsd_file reference and arm nfsd_serv_put in caller
+ * nfsd_file_put_local - put nfsd_file reference and arm nfsd_net_put in caller
  * @nf: nfsd_file of which to put the reference
  *
  * First save the associated net to return to caller, then put
···
 	struct nfsd_file *nf;
 	LIST_HEAD(dispose);
 
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	if (net) {
+		struct nfsd_net *nn = net_generic(net, nfsd_net_id);
+		nfs_localio_invalidate_clients(&nn->local_clients,
+					       &nn->local_clients_lock);
+	}
+#endif
+
 	rhltable_walk_enter(&nfsd_file_rhltable, &iter);
 	do {
 		rhashtable_walk_start(&iter);
···
  * a file. The security implications of this should be carefully
  * considered before use.
  *
- * The nfsd_file object returned by this API is reference-counted
- * and garbage-collected. The object is retained for a few
- * seconds after the final nfsd_file_put() in case the caller
- * wants to re-use it.
+ * The nfsd_file_object returned by this API is reference-counted
+ * but not garbage-collected. The object is unhashed after the
+ * final nfsd_file_put().
  *
  * Return values:
  * %nfs_ok - @pnf points to an nfsd_file with its reference
···
 	__be32 beres;
 
 	beres = nfsd_file_do_acquire(NULL, net, cred, client,
-				     fhp, may_flags, NULL, pnf, true);
+				     fhp, may_flags, NULL, pnf, false);
 	put_cred(revert_creds(save_cred));
 	return beres;
 }
+6 -3
fs/nfsd/localio.c
···
 #include "cache.h"
 
 static const struct nfsd_localio_operations nfsd_localio_ops = {
-	.nfsd_serv_try_get = nfsd_serv_try_get,
-	.nfsd_serv_put = nfsd_serv_put,
+	.nfsd_net_try_get = nfsd_net_try_get,
+	.nfsd_net_put = nfsd_net_put,
 	.nfsd_open_local_fh = nfsd_open_local_fh,
 	.nfsd_file_put_local = nfsd_file_put_local,
+	.nfsd_file_get = nfsd_file_get,
+	.nfsd_file_put = nfsd_file_put,
 	.nfsd_file_file = nfsd_file_file,
 };
 
···
  * avoid all the NFS overhead with reads, writes and commits.
  *
  * On successful return, returned nfsd_file will have its nf_net member
- * set. Caller (NFS client) is responsible for calling nfsd_serv_put and
+ * set. Caller (NFS client) is responsible for calling nfsd_net_put and
  * nfsd_file_put (via nfs_to_nfsd_file_put_local).
  */
 struct nfsd_file *
···
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	nfs_uuid_is_local(&argp->uuid, &nn->local_clients,
+			  &nn->local_clients_lock,
 			  net, rqstp->rq_client, THIS_MODULE);
 
 	return rpc_success;
+7 -5
fs/nfsd/netns.h
···
 
 	struct svc_info nfsd_info;
 #define nfsd_serv nfsd_info.serv
-	struct percpu_ref nfsd_serv_ref;
-	struct completion nfsd_serv_confirm_done;
-	struct completion nfsd_serv_free_done;
+
+	struct percpu_ref nfsd_net_ref;
+	struct completion nfsd_net_confirm_done;
+	struct completion nfsd_net_free_done;
 
 	/*
 	 * clientid and stateid data for construction of net unique COPY
···
 
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 	/* Local clients to be invalidated when net is shut down */
+	spinlock_t local_clients_lock;
 	struct list_head local_clients;
 #endif
 };
···
 extern bool nfsd_support_version(int vers);
 extern unsigned int nfsd_net_id;
 
-bool nfsd_serv_try_get(struct net *net);
-void nfsd_serv_put(struct net *net);
+bool nfsd_net_try_get(struct net *net);
+void nfsd_net_put(struct net *net);
 
 void nfsd_copy_write_verifier(__be32 verf[2], struct nfsd_net *nn);
 void nfsd_reset_write_verifier(struct nfsd_net *nn);
+4 -2
fs/nfsd/nfsctl.c
···
 	seqlock_init(&nn->writeverf_lock);
 	nfsd_proc_stat_init(net);
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	spin_lock_init(&nn->local_clients_lock);
 	INIT_LIST_HEAD(&nn->local_clients);
 #endif
 	return 0;
···
  * nfsd_net_pre_exit - Disconnect localio clients from net namespace
  * @net: a network namespace that is about to be destroyed
  *
- * This invalidated ->net pointers held by localio clients
+ * This invalidates ->net pointers held by localio clients
  * while they can still safely access nn->counter.
  */
 static __net_exit void nfsd_net_pre_exit(struct net *net)
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
-	nfs_uuid_invalidate_clients(&nn->local_clients);
+	nfs_localio_invalidate_clients(&nn->local_clients,
+				       &nn->local_clients_lock);
 }
 #endif
+21 -19
fs/nfsd/nfssvc.c
···
 	return 0;
 }
 
-bool nfsd_serv_try_get(struct net *net) __must_hold(rcu)
+bool nfsd_net_try_get(struct net *net) __must_hold(rcu)
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
-	return (nn && percpu_ref_tryget_live(&nn->nfsd_serv_ref));
+	return (nn && percpu_ref_tryget_live(&nn->nfsd_net_ref));
 }
 
-void nfsd_serv_put(struct net *net) __must_hold(rcu)
+void nfsd_net_put(struct net *net) __must_hold(rcu)
 {
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
-	percpu_ref_put(&nn->nfsd_serv_ref);
+	percpu_ref_put(&nn->nfsd_net_ref);
 }
 
-static void nfsd_serv_done(struct percpu_ref *ref)
+static void nfsd_net_done(struct percpu_ref *ref)
 {
-	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_net_ref);
 
-	complete(&nn->nfsd_serv_confirm_done);
+	complete(&nn->nfsd_net_confirm_done);
 }
 
-static void nfsd_serv_free(struct percpu_ref *ref)
+static void nfsd_net_free(struct percpu_ref *ref)
 {
-	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_serv_ref);
+	struct nfsd_net *nn = container_of(ref, struct nfsd_net, nfsd_net_ref);
 
-	complete(&nn->nfsd_serv_free_done);
+	complete(&nn->nfsd_net_free_done);
 }
 
 /*
···
 
 	if (!nn->nfsd_net_up)
 		return;
+
+	percpu_ref_kill_and_confirm(&nn->nfsd_net_ref, nfsd_net_done);
+	wait_for_completion(&nn->nfsd_net_confirm_done);
+
 	nfsd_export_flush(net);
 	nfs4_state_shutdown_net(net);
 	nfsd_reply_cache_shutdown(nn);
···
 		lockd_down(net);
 		nn->lockd_up = false;
 	}
-	percpu_ref_exit(&nn->nfsd_serv_ref);
+
+	wait_for_completion(&nn->nfsd_net_free_done);
+	percpu_ref_exit(&nn->nfsd_net_ref);
+
 	nn->nfsd_net_up = false;
 	nfsd_shutdown_generic();
 }
···
 	struct svc_serv *serv = nn->nfsd_serv;
 
 	lockdep_assert_held(&nfsd_mutex);
-
-	percpu_ref_kill_and_confirm(&nn->nfsd_serv_ref, nfsd_serv_done);
-	wait_for_completion(&nn->nfsd_serv_confirm_done);
-	wait_for_completion(&nn->nfsd_serv_free_done);
-	/* percpu_ref_exit is called in nfsd_shutdown_net */
 
 	spin_lock(&nfsd_notifier_lock);
 	nn->nfsd_serv = NULL;
···
 	if (nn->nfsd_serv)
 		return 0;
 
-	error = percpu_ref_init(&nn->nfsd_serv_ref, nfsd_serv_free,
+	error = percpu_ref_init(&nn->nfsd_net_ref, nfsd_net_free,
 				0, GFP_KERNEL);
 	if (error)
 		return error;
-	init_completion(&nn->nfsd_serv_free_done);
-	init_completion(&nn->nfsd_serv_confirm_done);
+	init_completion(&nn->nfsd_net_free_done);
+	init_completion(&nn->nfsd_net_confirm_done);
 
 	if (nfsd_max_blksize == 0)
 		nfsd_max_blksize = nfsd_get_default_max_blksize();
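After this change, nfsd_shutdown_net() begins by killing nfsd_net_ref and waiting for the kill to be confirmed, so no new LOCALIO opens can succeed. It then tears down per-net state. Only just before percpu_ref_exit() does it wait for the last outstanding reference to be released. The sketch below is a single-threaded model of those semantics with one atomic counter and a "live" flag. It is not a concurrent implementation (the tryget check is not race-free), and all names are hypothetical rather than the percpu_ref API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a percpu_ref: the owner holds one base
 * reference, and "live" gates new tryget calls. */
struct net_ref {
	atomic_long count;
	atomic_bool live;
	atomic_bool freed; /* set when the last reference is dropped */
};

static void ref_init(struct net_ref *r)
{
	atomic_init(&r->count, 1); /* base reference */
	atomic_init(&r->live, true);
	atomic_init(&r->freed, false);
}

static void ref_put(struct net_ref *r)
{
	if (atomic_fetch_sub(&r->count, 1) == 1)
		atomic_store(&r->freed, true); /* like the release callback */
}

/* Like percpu_ref_tryget_live(): refuse new references once killed. */
static bool ref_tryget_live(struct net_ref *r)
{
	if (!atomic_load(&r->live))
		return false;
	atomic_fetch_add(&r->count, 1);
	return true;
}

/* Like percpu_ref_kill(): stop new gets, then drop the base reference. */
static void ref_kill(struct net_ref *r)
{
	atomic_store(&r->live, false);
	ref_put(r);
}
```

In the usage sequence, a LOCALIO user takes a reference, and shutdown kills the ref. New tryget calls then fail while the existing user still pins the object. Only that user's put sets "freed", which is the point where the kernel's wait on nfsd_net_free_done returns.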
+2 -1
include/linux/nfs_common.h
···
 #include <uapi/linux/nfs.h>
 
 /* Mapping from NFS error code to "errno" error code. */
-#define errno_NFSERR_IO EIO
 
 int nfs_stat_to_errno(enum nfs_stat status);
 int nfs4_stat_to_errno(int stat);
+
+__u32 nfs_localio_errno_to_nfs4_stat(int errno);
 
 #endif /* _LINUX_NFS_COMMON_H */
+20 -2
include/linux/nfs_fs.h
···
 	struct rcu_head rcu_head;
 };
 
+struct nfs_file_localio {
+	struct nfsd_file __rcu *ro_file;
+	struct nfsd_file __rcu *rw_file;
+	struct list_head list;
+	void __rcu *nfs_uuid; /* opaque pointer to 'nfs_uuid_t' */
+};
+
+static inline void nfs_localio_file_init(struct nfs_file_localio *nfl)
+{
+#if IS_ENABLED(CONFIG_NFS_LOCALIO)
+	nfl->ro_file = NULL;
+	nfl->rw_file = NULL;
+	INIT_LIST_HEAD(&nfl->list);
+	nfl->nfs_uuid = NULL;
+#endif
+}
+
 struct nfs4_state;
 struct nfs_open_context {
 	struct nfs_lock_context lock_context;
···
 	struct nfs4_state *state;
 	fmode_t mode;
 
+	int error;
 	unsigned long flags;
 #define NFS_CONTEXT_BAD (2)
 #define NFS_CONTEXT_UNLOCK (3)
 #define NFS_CONTEXT_FILE_OPEN (4)
-	int error;
 
-	struct list_head list;
 	struct nfs4_threshold *mdsthreshold;
+	struct list_head list;
 	struct rcu_head rcu_head;
+	struct nfs_file_localio nfl;
 };
 
 struct nfs_open_dir_context {
+1 -2
include/linux/nfs_fs_sb.h
···
#define NFS_CS_DS 7 /* - Server is a DS */
#define NFS_CS_REUSEPORT 8 /* - reuse src port on reconnect */
#define NFS_CS_PNFS 9 /* - Server used for pnfs */
-#define NFS_CS_LOCAL_IO 10 /* - client is local */
 	struct sockaddr_storage cl_addr; /* server identifier */
 	size_t cl_addrlen;
 	char * cl_hostname; /* hostname of server */
···
 	struct timespec64 cl_nfssvc_boot;
 	seqlock_t cl_boot_lock;
 	nfs_uuid_t cl_uuid;
-	spinlock_t cl_localio_lock;
+	struct work_struct cl_local_probe_work;
 #endif /* CONFIG_NFS_LOCALIO */
 };
 
+1
include/linux/nfs_xdr.h
···
 	NFS_IOHDR_RESEND_PNFS,
 	NFS_IOHDR_RESEND_MDS,
 	NFS_IOHDR_UNSTABLE_WRITES,
+	NFS_IOHDR_ODIRECT,
 };
 
 struct nfs_io_completion;
+34 -14
include/linux/nfslocalio.h
···
 #ifndef __LINUX_NFSLOCALIO_H
 #define __LINUX_NFSLOCALIO_H
 
-/* nfsd_file structure is purposely kept opaque to NFS client */
-struct nfsd_file;
-
 #if IS_ENABLED(CONFIG_NFS_LOCALIO)
 
 #include <linux/module.h>
···
 #include <linux/nfs.h>
 #include <net/net_namespace.h>
 
+struct nfs_client;
+struct nfs_file_localio;
+
 /*
  * Useful to allow a client to negotiate if localio
  * possible with its server.
···
  */
 typedef struct {
 	uuid_t uuid;
+	unsigned nfs3_localio_probe_count;
+	/* this struct is over a cacheline, avoid bouncing */
+	spinlock_t ____cacheline_aligned lock;
 	struct list_head list;
+	spinlock_t *list_lock; /* nn->local_clients_lock */
 	struct net __rcu *net; /* nfsd's network namespace */
 	struct auth_domain *dom; /* auth_domain for localio */
+	/* Local files to close when net is shut down or exports change */
+	struct list_head files;
 } nfs_uuid_t;
 
 void nfs_uuid_init(nfs_uuid_t *);
 bool nfs_uuid_begin(nfs_uuid_t *);
 void nfs_uuid_end(nfs_uuid_t *);
-void nfs_uuid_is_local(const uuid_t *, struct list_head *,
+void nfs_uuid_is_local(const uuid_t *, struct list_head *, spinlock_t *,
 		       struct net *, struct auth_domain *, struct module *);
-void nfs_uuid_invalidate_clients(struct list_head *list);
-void nfs_uuid_invalidate_one_client(nfs_uuid_t *nfs_uuid);
+
+void nfs_localio_enable_client(struct nfs_client *clp);
+void nfs_localio_disable_client(struct nfs_client *clp);
+void nfs_localio_invalidate_clients(struct list_head *nn_local_clients,
+				    spinlock_t *nn_local_clients_lock);
 
 /* localio needs to map filehandle -> struct nfsd_file */
 extern struct nfsd_file *
 nfsd_open_local_fh(struct net *, struct auth_domain *, struct rpc_clnt *,
 		   const struct cred *, const struct nfs_fh *,
 		   const fmode_t) __must_hold(rcu);
+void nfs_close_local_fh(struct nfs_file_localio *);
 
 struct nfsd_localio_operations {
-	bool (*nfsd_serv_try_get)(struct net *);
-	void (*nfsd_serv_put)(struct net *);
+	bool (*nfsd_net_try_get)(struct net *);
+	void (*nfsd_net_put)(struct net *);
 	struct nfsd_file *(*nfsd_open_local_fh)(struct net *,
 						struct auth_domain *,
 						struct rpc_clnt *,
···
 						const struct nfs_fh *,
 						const fmode_t);
 	struct net *(*nfsd_file_put_local)(struct nfsd_file *);
+	struct nfsd_file *(*nfsd_file_get)(struct nfsd_file *);
+	void (*nfsd_file_put)(struct nfsd_file *);
 	struct file *(*nfsd_file_file)(struct nfsd_file *);
 } ____cacheline_aligned;
 
···
 
 struct nfsd_file *nfs_open_local_fh(nfs_uuid_t *,
 		   struct rpc_clnt *, const struct cred *,
-		   const struct nfs_fh *, const fmode_t);
+		   const struct nfs_fh *, struct nfs_file_localio *,
+		   const fmode_t);
 
 static inline void nfs_to_nfsd_net_put(struct net *net)
 {
 	/*
-	 * Once reference to nfsd_serv is dropped, NFSD could be
-	 * unloaded, so ensure safe return from nfsd_file_put_local()
-	 * by always taking RCU.
+	 * Once reference to net (and associated nfsd_serv) is dropped, NFSD
+	 * could be unloaded, so ensure safe return from nfsd_net_put() by
+	 * always taking RCU.
 	 */
 	rcu_read_lock();
-	nfs_to->nfsd_serv_put(net);
+	nfs_to->nfsd_net_put(net);
 	rcu_read_unlock();
 }
 
···
 }
 
 #else /* CONFIG_NFS_LOCALIO */
+
+struct nfs_file_localio;
+static inline void nfs_close_local_fh(struct nfs_file_localio *nfl)
+{
+}
 static inline void nfsd_localio_ops_init(void)
 {
 }
-static inline void nfs_to_nfsd_file_put_local(struct nfsd_file *localio)
+struct nfs_client;
+static inline void nfs_localio_disable_client(struct nfs_client *clp)
 {
 }
+
 #endif /* CONFIG_NFS_LOCALIO */
 
 #endif /* __LINUX_NFSLOCALIO_H */
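The reworked nfs_uuid_t adds a spinlock marked ____cacheline_aligned, and the header notes that the struct spans more than one cache line. Aligning the member makes `lock` start on a fresh cache line, so it does not share a line with the uuid and probe counter in front of it. The sketch below reproduces that placement in C11 with alignas. Several details are assumptions: the 64-byte line size (the kernel uses SMP_CACHE_BYTES), atomic_flag standing in for spinlock_t, plain pointers standing in for the net and auth_domain types, and the *_model names, which are hypothetical.

```c
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumed cache line size; the kernel uses SMP_CACHE_BYTES for this. */
#define MODEL_CACHELINE 64

struct list_node_model {
	struct list_node_model *next, *prev;
};

/* Userspace mirror of the nfs_uuid_t field order (types simplified). */
struct nfs_uuid_model {
	unsigned char uuid[16];
	unsigned nfs3_localio_probe_count;
	/* Equivalent of ____cacheline_aligned: lock begins a new cache line. */
	alignas(MODEL_CACHELINE) atomic_flag lock;
	struct list_node_model list;
	atomic_flag *list_lock;
	void *net;
	void *dom;
	struct list_node_model files;
};
```

An aligned member also raises the alignment of the whole struct, so the compiler pads the struct size to a multiple of the line size.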
+1
include/linux/sunrpc/clnt.h
···
 	const struct cred *cl_cred;
 	unsigned int cl_max_connect; /* max number of transports not to the same IP */
 	struct super_block *pipefs_sb;
+	atomic_t cl_task_count;
 };
 
 /*
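The new cl_task_count field counts every task bound to an rpc_clnt. In the clnt.c changes that follow, the counter is incremented where the task's flags are set from the client. It is decremented right after tk_client is cleared. A task is now added to cl_tasks only once a request slot has been allocated, in the branch that sets tk_action to call_refresh. Shutdown therefore does a second wait, for up to 1*HZ (one second), for tasks "still in workqueue or waitqueue" that never reached cl_tasks. The sketch below is a userspace analogue that uses C11 atomics and a POSIX condition variable in place of atomic_t and wait_event_timeout(). The task_counter_* names are hypothetical.

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

struct task_counter {
	atomic_int count;     /* analogue of clnt->cl_task_count */
	pthread_mutex_t lock;
	pthread_cond_t zero;  /* analogue of destroy_wait */
};

static inline void task_counter_init(struct task_counter *tc)
{
	atomic_init(&tc->count, 0);
	pthread_mutex_init(&tc->lock, NULL);
	pthread_cond_init(&tc->zero, NULL);
}

/* Analogue of atomic_inc(&clnt->cl_task_count) when a task is bound. */
static inline void task_counter_get(struct task_counter *tc)
{
	atomic_fetch_add(&tc->count, 1);
}

/* Analogue of atomic_dec() on release; the last put wakes any waiter. */
static inline void task_counter_put(struct task_counter *tc)
{
	if (atomic_fetch_sub(&tc->count, 1) == 1) {
		pthread_mutex_lock(&tc->lock);
		pthread_cond_broadcast(&tc->zero);
		pthread_mutex_unlock(&tc->lock);
	}
}

/* Like wait_event_timeout(): true if the count hit zero before the deadline. */
static inline bool task_counter_wait_zero(struct task_counter *tc, int timeout_sec)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	/* TIME_UTC matches the condvar's default clock (CLOCK_REALTIME) on Linux. */
	timespec_get(&deadline, TIME_UTC);
	deadline.tv_sec += timeout_sec;

	pthread_mutex_lock(&tc->lock);
	while (atomic_load(&tc->count) != 0 && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&tc->zero, &tc->lock, &deadline);
	done = atomic_load(&tc->count) == 0;
	pthread_mutex_unlock(&tc->lock);
	return done;
}
```

The waiter checks the count while holding the mutex, and the last putter takes the same mutex before broadcasting. This ordering means a wakeup cannot slip in between the check and the wait.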
+20 -9
net/sunrpc/clnt.c
···
 
 	trace_rpc_clnt_shutdown(clnt);
 
+	clnt->cl_shutdown = 1;
 	while (!list_empty(&clnt->cl_tasks)) {
 		rpc_killall_tasks(clnt);
 		wait_event_timeout(destroy_wait,
 			list_empty(&clnt->cl_tasks), 1*HZ);
 	}
+
+	/* wait for tasks still in workqueue or waitqueue */
+	wait_event_timeout(destroy_wait,
+			   atomic_read(&clnt->cl_task_count) == 0, 1 * HZ);
 
 	rpc_release_client(clnt);
 }
···
 	list_del(&task->tk_task);
 	spin_unlock(&clnt->cl_lock);
 	task->tk_client = NULL;
+	atomic_dec(&clnt->cl_task_count);
 
 	rpc_release_client(clnt);
 }
···
 		task->tk_flags |= RPC_TASK_TIMEOUT;
 	if (clnt->cl_noretranstimeo)
 		task->tk_flags |= RPC_TASK_NO_RETRANS_TIMEOUT;
-	/* Add to the client's list of all tasks */
-	spin_lock(&clnt->cl_lock);
-	list_add_tail(&task->tk_task, &clnt->cl_tasks);
-	spin_unlock(&clnt->cl_lock);
+	atomic_inc(&clnt->cl_task_count);
 }
 
 static void
···
 	if (status >= 0) {
 		if (task->tk_rqstp) {
 			task->tk_action = call_refresh;
+
+			/* Add to the client's list of all tasks */
+			spin_lock(&task->tk_client->cl_lock);
+			if (list_empty(&task->tk_task))
+				list_add_tail(&task->tk_task, &task->tk_client->cl_tasks);
+			spin_unlock(&task->tk_client->cl_lock);
 			return;
 		}
-
 		rpc_call_rpcerror(task, -EIO);
 		return;
 	}
···
 		fallthrough;
 	case -EAGAIN:
 		status = -EACCES;
-		fallthrough;
-	case -EKEYEXPIRED:
 		if (!task->tk_cred_retry)
 			break;
 		task->tk_cred_retry--;
 		trace_rpc_retry_refresh_status(task);
 		return;
+	case -EKEYEXPIRED:
+		break;
 	case -ENOMEM:
 		rpc_delay(task, HZ >> 4);
 		return;
···
 EXPORT_SYMBOL_GPL(rpc_clnt_xprt_switch_has_addr);
 
 #if IS_ENABLED(CONFIG_SUNRPC_DEBUG)
-static void rpc_show_header(void)
+static void rpc_show_header(struct rpc_clnt *clnt)
 {
+	printk(KERN_INFO "clnt[%pISpc] RPC tasks[%d]\n",
+	       (struct sockaddr *)&clnt->cl_xprt->addr,
+	       atomic_read(&clnt->cl_task_count));
 	printk(KERN_INFO "-pid- flgs status -client- --rqstp- "
 		"-timeout ---ops--\n");
 }
···
 	spin_lock(&clnt->cl_lock);
 	list_for_each_entry(task, &clnt->cl_tasks, tk_task) {
 		if (!header) {
-			rpc_show_header();
+			rpc_show_header(clnt);
 			header++;
 		}
 		rpc_show_task(clnt, task);
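In the credential-refresh status switch, -EKEYEXPIRED used to fall through into the -EAGAIN retry path and consume tk_cred_retry attempts. After this change it breaks out immediately, which matches the "do not retry on EKEYEXPIRED when user TGT ticket expired" commit. The function below is a hypothetical, simplified decision table that covers only the cases visible in the hunk. The enum and function names are invented, and treating every other status as a failure is an assumption of the sketch. EKEYEXPIRED is a Linux-specific errno.

```c
#include <errno.h>

/* Hypothetical outcomes standing in for the kernel's control flow. */
enum refresh_action {
	REFRESH_RETRY, /* "return" after decrementing tk_cred_retry */
	REFRESH_DELAY, /* rpc_delay(), then "return" */
	REFRESH_FAIL,  /* "break" out of the switch: the error is reported */
};

static inline enum refresh_action refresh_status_action(int *status,
							unsigned *cred_retry)
{
	switch (*status) {
	case -EAGAIN:
		*status = -EACCES;
		if (*cred_retry == 0)
			break;
		(*cred_retry)--;
		return REFRESH_RETRY;
	case -EKEYEXPIRED:
		/* New behaviour: an expired TGT is not retried. */
		break;
	case -ENOMEM:
		return REFRESH_DELAY;
	}
	return REFRESH_FAIL;
}
```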
+15
net/sunrpc/debugfs.c
···
 {
 	struct rpc_clnt *clnt = f->private;
 	spin_unlock(&clnt->cl_lock);
+	seq_printf(f, "clnt[%pISpc] RPC tasks[%d]\n",
+		   (struct sockaddr *)&clnt->cl_xprt->addr,
+		   atomic_read(&clnt->cl_task_count));
 }
 
 static const struct seq_operations tasks_seq_operations = {
···
 	seq_printf(f, "addr: %s\n", xprt->address_strings[RPC_DISPLAY_ADDR]);
 	seq_printf(f, "port: %s\n", xprt->address_strings[RPC_DISPLAY_PORT]);
 	seq_printf(f, "state: 0x%lx\n", xprt->state);
+	seq_printf(f, "netns: %u\n", xprt->xprt_net->ns.inum);
+
+	if (xprt->ops->get_srcaddr) {
+		int ret, buflen;
+		char buf[INET6_ADDRSTRLEN];
+
+		buflen = ARRAY_SIZE(buf);
+		ret = xprt->ops->get_srcaddr(xprt, buf, buflen);
+		if (ret < 0)
+			ret = sprintf(buf, "<closed>");
+		seq_printf(f, "saddr: %.*s\n", ret, buf);
+	}
 	return 0;
 }
 
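The debugfs hunk adds two lines to the rpc_xprt info file. `netns:` shows the transport's network-namespace inode number. `saddr:` shows the local source address when the transport implements get_srcaddr. Judging by the `ret < 0` check, get_srcaddr returns a negative value on failure and otherwise the length of the text it wrote, and `%.*s` then prints exactly that many bytes of buf. The sketch below reproduces this formatting in userspace with inet_ntop(). model_get_srcaddr() and its `connected` flag are hypothetical stand-ins for the transport callback and a closed socket.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for xprt->ops->get_srcaddr(): writes the source
 * address as text and returns its length, or a negative value when the
 * transport has no socket (modelled by "connected" being 0).
 */
static int model_get_srcaddr(const struct sockaddr_storage *ss, int connected,
			     char *buf, size_t buflen)
{
	const void *addr;

	if (!connected)
		return -1;
	if (ss->ss_family == AF_INET)
		addr = &((const struct sockaddr_in *)ss)->sin_addr;
	else if (ss->ss_family == AF_INET6)
		addr = &((const struct sockaddr_in6 *)ss)->sin6_addr;
	else
		return -1;
	if (inet_ntop(ss->ss_family, addr, buf, (socklen_t)buflen) == NULL)
		return -1;
	return (int)strlen(buf);
}

/* Mirrors the debugfs logic: fall back to "<closed>" on error. */
static int model_format_saddr(char *out, size_t outlen,
			      const struct sockaddr_storage *ss, int connected)
{
	char buf[INET6_ADDRSTRLEN];
	int ret = model_get_srcaddr(ss, connected, buf, sizeof(buf));

	if (ret < 0)
		ret = snprintf(buf, sizeof(buf), "<closed>");
	/* %.*s prints at most ret bytes of buf. */
	return snprintf(out, outlen, "saddr: %.*s\n", ret, buf);
}
```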