Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

xprtrdma: Boost maximum transport header size

Although I haven't seen any performance results that justify it,
I've received several complaints that NFS/RDMA no longer supports
a maximum rsize and wsize of 1MB. These days the maximum is somewhat
smaller.

To simplify the logic that determines whether a chunk list is
necessary, the implementation uses a fixed maximum size of the
transport header. Currently that maximum size is 256 bytes, one
quarter of the default inline threshold size for RPC/RDMA v1.
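
A fixed worst-case header size reduces the "is a chunk list needed?" question to a single comparison. The following is a minimal userspace sketch of that idea, not the kernel's marshaling code. The names `SKETCH_MAX_HDR_SIZE`, `SKETCH_DEFAULT_INLINE_THRESH` and `sketch_needs_chunk_list()` are hypothetical. The 256-byte and 1024-byte values are the ones quoted above, and the sketch covers only the Send direction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: the fixed worst-case transport header size quoted in
 * the commit message (256 bytes, before this change). */
#define SKETCH_MAX_HDR_SIZE 256u

/* Default RPC/RDMA v1 inline threshold, in bytes. */
#define SKETCH_DEFAULT_INLINE_THRESH 1024u

/* Hypothetical helper: because the transport header is assumed never to
 * exceed SKETCH_MAX_HDR_SIZE, an RPC message can be sent inline whenever
 * it fits in the space left after the worst-case header. Otherwise part
 * of the message has to be conveyed through a chunk list. */
static bool sketch_needs_chunk_list(size_t rpc_msg_len, size_t inline_thresh)
{
	/* The worst-case header must leave room for some payload. */
	assert(inline_thresh > SKETCH_MAX_HDR_SIZE);
	return rpc_msg_len > inline_thresh - SKETCH_MAX_HDR_SIZE;
}
```

With the default 1024-byte threshold, a 768-byte RPC message still fits inline and a 769-byte message does not.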

Since commit a78868497c2e ("xprtrdma: Reduce max_frwr_depth"), the
maximum chunk size is also smaller, in order to take advantage of
inline page lists in device-internal MR data structures.

The combination of these two design choices has reduced the maximum
NFS rsize and wsize that can be used for most RNIC/HCAs. Increasing
the maximum transport header size and the maximum number of RDMA
segments it can contain increases the negotiated maximum rsize/wsize
on common RNIC/HCAs.
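
A back-of-the-envelope model shows why capping the number of header segments also caps rsize/wsize. This sketch assumes that each RDMA segment in a chunk list is backed by one MR covering at most `depth` pages, so the largest payload is roughly segments × depth × page size, clamped to the 1MB ceiling mentioned above. The function name, the 4KB page size and the example depth of 16 pages are hypothetical illustrations. They do not reproduce the kernel's exact calculation or any particular device's limit.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE   4096u           /* assumed 4KB pages */
#define SKETCH_MAX_PAYLOAD (1024u * 1024u) /* the 1MB rsize/wsize ceiling */

/* Hypothetical simplified model: each of max_segs RDMA segments can
 * describe at most depth pages (one MR per segment), so the largest
 * transferable payload is max_segs * depth pages, never more than 1MB. */
static size_t sketch_max_payload(unsigned int max_segs, unsigned int depth)
{
	assert(max_segs > 0 && depth > 0);

	size_t bytes = (size_t)max_segs * depth * SKETCH_PAGE_SIZE;

	return bytes < SKETCH_MAX_PAYLOAD ? bytes : SKETCH_MAX_PAYLOAD;
}
```

Under these assumed numbers, raising the segment cap from 8 to 16 doubles the model's ceiling from 512KB to 1MB, which is the direction of change the commit describes.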

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>

Authored by Chuck Lever and committed by Anna Schumaker.
f3c66a2f 36bdd905

2 files changed, +18 -14

net/sunrpc/xprtrdma/verbs.c  +8 -1
@@ -53,6 +53,7 @@
 #include <linux/slab.h>
 #include <linux/sunrpc/addr.h>
 #include <linux/sunrpc/svc_rdma.h>
+#include <linux/log2.h>
 
 #include <asm-generic/barrier.h>
 #include <asm/bitops.h>
@@ -1001,12 +1000,18 @@
 	struct rpcrdma_buffer *buffer = &r_xprt->rx_buf;
 	struct rpcrdma_regbuf *rb;
 	struct rpcrdma_req *req;
+	size_t maxhdrsize;
 
 	req = kzalloc(sizeof(*req), flags);
 	if (req == NULL)
 		goto out1;
 
-	rb = rpcrdma_regbuf_alloc(RPCRDMA_HDRBUF_SIZE, DMA_TO_DEVICE, flags);
+	/* Compute maximum header buffer size in bytes */
+	maxhdrsize = rpcrdma_fixed_maxsz + 3 +
+		     r_xprt->rx_ia.ri_max_segs * rpcrdma_readchunk_maxsz;
+	maxhdrsize *= sizeof(__be32);
+	rb = rpcrdma_regbuf_alloc(__roundup_pow_of_two(maxhdrsize),
+				  DMA_TO_DEVICE, flags);
 	if (!rb)
 		goto out2;
 	req->rl_rdmabuf = rb;
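
The new header buffer size can be reproduced in userspace to see what it works out to. This sketch assumes that `rpcrdma_fixed_maxsz` is 4 XDR words and `rpcrdma_readchunk_maxsz` is 6 XDR words. The second value is consistent with the 244-byte figure in the old xprt_rdma.h comment. `sketch_roundup_pow_of_two()` is a hypothetical stand-in for the kernel's `__roundup_pow_of_two()`, and `max_segs` is a parameter because the real `ri_max_segs` value depends on the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed XDR word counts (one XDR word = 4 bytes, i.e. sizeof(__be32)). */
enum {
	sketch_fixed_maxsz     = 4, /* assumed value of rpcrdma_fixed_maxsz */
	sketch_readchunk_maxsz = 6, /* assumed value of rpcrdma_readchunk_maxsz */
};

/* Hypothetical userspace stand-in for __roundup_pow_of_two():
 * returns the smallest power of two that is >= n, for n >= 1. */
static size_t sketch_roundup_pow_of_two(size_t n)
{
	size_t p = 1;

	assert(n >= 1);
	while (p < n)
		p <<= 1;
	return p;
}

/* Mirrors the maxhdrsize computation added to verbs.c. */
static size_t sketch_hdrbuf_size(unsigned int max_segs)
{
	size_t maxhdrsize;

	/* Size in XDR words; the "+ 3" is copied verbatim from the diff. */
	maxhdrsize = sketch_fixed_maxsz + 3 +
		     (size_t)max_segs * sketch_readchunk_maxsz;
	maxhdrsize *= sizeof(uint32_t); /* XDR words -> bytes */
	return sketch_roundup_pow_of_two(maxhdrsize);
}
```

Under these assumptions, a device limited to 8 segments gets 55 words (220 bytes), rounded up to the same 256 bytes as the old RPCRDMA_HDRBUF_SIZE. A device that reaches the new cap of 16 segments gets 103 words (412 bytes), rounded up to 512 bytes.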

net/sunrpc/xprtrdma/xprt_rdma.h  +10 -13
@@ -155,25 +155,22 @@
 
 /* To ensure a transport can always make forward progress,
  * the number of RDMA segments allowed in header chunk lists
- * is capped at 8. This prevents less-capable devices and
- * memory registrations from overrunning the Send buffer
- * while building chunk lists.
+ * is capped at 16. This prevents less-capable devices from
+ * overrunning the Send buffer while building chunk lists.
  *
  * Elements of the Read list take up more room than the
- * Write list or Reply chunk. 8 read segments means the Read
- * list (or Write list or Reply chunk) cannot consume more
- * than
+ * Write list or Reply chunk. 16 read segments means the
+ * chunk lists cannot consume more than
  *
- * ((8 + 2) * read segment size) + 1 XDR words, or 244 bytes.
+ * ((16 + 2) * read segment size) + 1 XDR words,
  *
- * And the fixed part of the header is another 24 bytes.
- *
- * The smallest inline threshold is 1024 bytes, ensuring that
- * at least 750 bytes are available for RPC messages.
+ * or about 400 bytes. The fixed part of the header is
+ * another 24 bytes. Thus when the inline threshold is
+ * 1024 bytes, at least 600 bytes are available for RPC
+ * message bodies.
  */
 enum {
-	RPCRDMA_MAX_HDR_SEGS = 8,
-	RPCRDMA_HDRBUF_SIZE = 256,
+	RPCRDMA_MAX_HDR_SEGS = 16,
 };
 
 /*
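
The size estimate in the header comment can be checked with a little arithmetic. This sketch assumes that a read segment occupies 6 XDR words. That assumption reproduces the old comment's 244-byte figure for 8 segments exactly. The 24-byte fixed header comes from the comment itself, and the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_XDR_UNIT      4u  /* bytes per XDR word */
#define SKETCH_READSEG_WORDS 6u  /* assumed read segment size in XDR words */
#define SKETCH_FIXED_HDR     24u /* fixed header bytes, per the comment */

/* ((segs + 2) * read segment size) + 1 XDR words, converted to bytes. */
static size_t sketch_chunk_list_bytes(unsigned int segs)
{
	return ((size_t)(segs + 2) * SKETCH_READSEG_WORDS + 1) * SKETCH_XDR_UNIT;
}

/* Bytes left for the RPC message once the fixed header and the
 * worst-case chunk lists are subtracted from the inline threshold. */
static size_t sketch_inline_room(size_t inline_thresh, unsigned int segs)
{
	size_t used = SKETCH_FIXED_HDR + sketch_chunk_list_bytes(segs);

	/* Guard against unsigned underflow for very small thresholds. */
	assert(inline_thresh >= used);
	return inline_thresh - used;
}
```

With 8 segments this gives 244 bytes of chunk lists and 756 bytes of room, which matches the old comment. With 16 segments it gives 436 bytes and 564 bytes of room. Under this assumption, the new comment's "about 400" and "at least 600" figures are round numbers, not exact bounds.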