Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

NFSv4.1: fix pnfs MDS=DS session trunking

Currently, when GETDEVICEINFO returns multiple locations where each
is a different IP but the server's identity is the same as the MDS's,
nfs4_set_ds_client() finds the existing nfs_client structure, which
carries the MDS's max_connect value. If that value is 1, the 1st IP
on the DS's list gets dropped due to MDS trunking rules. The other
IPs are added because they fall under the pnfs trunking rules.

For the list of IPs, the 1st goes through nfs4_set_ds_client(),
which eventually calls nfs4_add_trunk() and then calls into
rpc_clnt_test_and_add_xprt(), which has the check for MDS trunking.
The other IPs (after the 1st one) call rpc_clnt_add_xprt(),
which doesn't go through that check.
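
The asymmetry between the 1st IP and the remaining ones can be sketched
in userspace C. This is only a model under stated assumptions: the
model_* structures and functions are hypothetical stand-ins, not kernel
APIs, and the real rpc_clnt_test_and_add_xprt() does more than bump a
counter.

```c
#include <stdbool.h>

/*
 * Hypothetical userspace stand-ins for the kernel structures. Only the
 * fields that matter for the limit check are modelled.
 */
struct model_clnt {
	int cl_max_connect;		/* limit taken from the MDS mount */
};

struct model_xps {
	int nunique_destaddr_xprts;	/* transports already attached */
};

/*
 * Pre-patch behaviour for the 1st DS address: the limit always comes
 * from the existing (MDS) rpc client.
 */
static bool model_test_and_add(const struct model_clnt *clnt,
			       struct model_xps *xps)
{
	if (xps->nunique_destaddr_xprts + 1 > clnt->cl_max_connect)
		return false;		/* address is dropped */
	xps->nunique_destaddr_xprts++;
	return true;
}

/* Path taken by the remaining DS addresses: no limit check at all. */
static void model_add_unchecked(struct model_xps *xps)
{
	xps->nunique_destaddr_xprts++;
}
```

With cl_max_connect=1 and the MDS transport already counted, the
checked call rejects the 1st DS address while the unchecked call still
adds the 2nd one, which is the behaviour described above.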

nfs4_add_trunk() is called when MDS trunking is happening and it
needs to enforce the max_connect mount option of the
1st mount. However, this shouldn't be applied to the pnfs flow.

Instead, this patch proposes to treat MDS=DS as DS trunking and
make sure that the MDS's max_connect limit does not apply to the
1st IP returned in the GETDEVICEINFO list. It does so by
marking the newly created client with a new flag, NFS_CS_PNFS,
which is then used to choose the max_connect value passed into
rpc_clnt_test_and_add_xprt() instead of the existing rpc
client's max_connect value set by the MDS connection.

For example, the mount was done without a max_connect value set,
so the MDS's rpc client has cl_max_connect=1. Upon calling into
rpc_clnt_test_and_add_xprt(), instead of using the rpc client's
value, the caller passes in the max_connect value that was
previously set in the pnfs path (as a part of handling the
GETDEVICEINFO list of IPs) in nfs4_set_ds_client().

However, when the NFS_CS_PNFS flag is not set and we know we
are doing MDS trunking (comparing a new IP of the same
server), we set the max_connect value to the
existing MDS's value and pass that into
rpc_clnt_test_and_add_xprt().
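
The two cases above boil down to a single selection, sketched here as
a userspace model. The model_* names are hypothetical, a bool stands in
for the NFS_CS_PNFS bit in cl_flags, and the value 16 for
MODEL_MAX_TRANSPORTS is an assumption made for the example rather than
something stated in this patch.

```c
#include <stdbool.h>

/* Assumed stand-in for NFS_MAX_TRANSPORTS; the value is illustrative. */
#define MODEL_MAX_TRANSPORTS 16

/* Hypothetical, reduced model of struct nfs_client. */
struct model_nfs_client {
	bool cs_pnfs;		/* models the NFS_CS_PNFS flag */
	int cl_max_connect;	/* limit recorded on this client */
};

/*
 * Choose the limit handed to the add-transport check:
 *  - pnfs (DS) client: use the value set in the pnfs path,
 *  - otherwise (MDS trunking): use the existing MDS client's value.
 */
static int model_pick_max_connect(const struct model_nfs_client *clp,
				  const struct model_nfs_client *old)
{
	return clp->cs_pnfs ? clp->cl_max_connect : old->cl_max_connect;
}
```

For an MDS mounted without max_connect (limit 1), a client flagged as
pnfs with its limit set to MODEL_MAX_TRANSPORTS yields 16, while an
unflagged client of the same server falls back to the MDS's limit of 1.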

Fixes: dc48e0abee24 ("SUNRPC enforce creation of no more than max_connect xprts")
Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Authored by Olga Kornievskaia and committed by Anna Schumaker
806a3bc4 e86fcf08

+13 -5
+5 -1
fs/nfs/nfs4client.c
···
 		.net = old->cl_net,
 		.servername = old->cl_hostname,
 	};
+	int max_connect = test_bit(NFS_CS_PNFS, &clp->cl_flags) ?
+		clp->cl_max_connect : old->cl_max_connect;
 
 	if (clp->cl_proto != old->cl_proto)
 		return;
···
 	xprt_args.addrlen = clp_salen;
 
 	rpc_clnt_add_xprt(old->cl_rpcclient, &xprt_args,
-			  rpc_clnt_test_and_add_xprt, NULL);
+			  rpc_clnt_test_and_add_xprt, &max_connect);
 }
 
 /**
···
 		__set_bit(NFS_CS_NORESVPORT, &cl_init.init_flags);
 
 	__set_bit(NFS_CS_DS, &cl_init.init_flags);
+	__set_bit(NFS_CS_PNFS, &cl_init.init_flags);
+	cl_init.max_connect = NFS_MAX_TRANSPORTS;
 	/*
 	 * Set an authflavor equual to the MDS value. Use the MDS nfs_client
 	 * cl_ipaddr so as to use the same EXCHANGE_ID co_ownerid as the MDS
+1
include/linux/nfs_fs_sb.h
···
 #define NFS_CS_NOPING		6		/* - don't ping on connect */
 #define NFS_CS_DS		7		/* - Server is a DS */
 #define NFS_CS_REUSEPORT	8		/* - reuse src port on reconnect */
+#define NFS_CS_PNFS		9		/* - Server used for pnfs */
 	struct sockaddr_storage	cl_addr;	/* server identifier */
 	size_t			cl_addrlen;
 	char *			cl_hostname;	/* hostname of server */
+7 -4
net/sunrpc/clnt.c
···
  * @clnt: pointer to struct rpc_clnt
  * @xps: pointer to struct rpc_xprt_switch,
  * @xprt: pointer struct rpc_xprt
- * @dummy: unused
+ * @in_max_connect: pointer to the max_connect value for the passed in xprt transport
  */
 int rpc_clnt_test_and_add_xprt(struct rpc_clnt *clnt,
 		struct rpc_xprt_switch *xps, struct rpc_xprt *xprt,
-		void *dummy)
+		void *in_max_connect)
 {
 	struct rpc_cb_add_xprt_calldata *data;
 	struct rpc_task *task;
+	int max_connect = clnt->cl_max_connect;
 
-	if (xps->xps_nunique_destaddr_xprts + 1 > clnt->cl_max_connect) {
+	if (in_max_connect)
+		max_connect = *(int *)in_max_connect;
+	if (xps->xps_nunique_destaddr_xprts + 1 > max_connect) {
 		rcu_read_lock();
 		pr_warn("SUNRPC: reached max allowed number (%d) did not add "
-			"transport to server: %s\n", clnt->cl_max_connect,
+			"transport to server: %s\n", max_connect,
 			rpc_peeraddr2str(clnt, RPC_DISPLAY_ADDR));
 		rcu_read_unlock();
 		return -EINVAL;
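
The net/sunrpc/clnt.c change turns the previously unused callback
argument into an optional override: a NULL pointer keeps the rpc
client's own limit, a non-NULL pointer supplies the caller's limit.
A minimal userspace sketch of that pattern follows; the model_*
functions are hypothetical and reproduce only the limit selection and
comparison, not the transport setup.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Resolve the effective limit: default to the client's value and let
 * a non-NULL opaque argument override it.
 */
static int model_effective_limit(int client_limit, void *in_max_connect)
{
	int max_connect = client_limit;

	if (in_max_connect)
		max_connect = *(int *)in_max_connect;
	return max_connect;
}

/*
 * Return 0 if one more transport fits under the effective limit,
 * -EINVAL otherwise (mirroring the error value used in the hunk).
 */
static int model_check_add(int nunique_xprts, int client_limit,
			   void *in_max_connect)
{
	int max_connect = model_effective_limit(client_limit, in_max_connect);

	if (nunique_xprts + 1 > max_connect)
		return -EINVAL;
	return 0;
}
```

Callers that pass NULL see the old behaviour unchanged, which is why
the signature can stay a generic void * callback argument.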