Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

svcrdma: Report Write/Reply chunk overruns

Observed at Connectathon 2017.

If a client has underestimated the size of a Write or Reply chunk,
the Linux server writes as much payload data as it can, then it
recognizes there was a problem and closes the connection without
sending the transport header.

This creates a couple of problems:

<> The client never receives indication of the server-side failure,
so it continues to retransmit the bad RPC. Forward progress on
the transport is blocked.

<> The reply payload pages are not moved out of the svc_rqst, thus
they can be released by the RPC server before the RDMA Writes
have completed.

The new rdma_rw-ized helpers return a distinct error code when a
Write/Reply chunk overrun occurs, so it's now easy for the caller
(svc_rdma_sendto) to recognize this case.
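The distinct error code is -E2BIG, as the `if (ret != -E2BIG)` test in the patch below shows. A minimal user-space sketch of the resulting dispatch in the caller (names here are hypothetical, not kernel API): a chunk-builder result of -E2BIG takes the new RDMA_ERROR path, while any other failure still drops the connection.

```c
#include <errno.h>

/* Hypothetical model of the error dispatch in svc_rdma_sendto():
 * the rdma_rw-ized chunk builders return -E2BIG when the client's
 * Write/Reply chunk was too small for the reply payload.  Only that
 * code takes the new "post an RDMA_ERROR" path; any other error
 * still closes the connection.
 */
enum reply_action {
	REPLY_SENT,		/* normal reply transmitted */
	REPLY_ERROR_MSG,	/* chunk overrun: send RDMA_ERROR instead */
	REPLY_CLOSE		/* unrecoverable: drop the connection */
};

static enum reply_action classify_send_result(int ret)
{
	if (ret >= 0)
		return REPLY_SENT;
	if (ret == -E2BIG)
		return REPLY_ERROR_MSG;
	return REPLY_CLOSE;
}
```

Because the overrun is detected by a distinct return value rather than inferred after the fact, the caller can choose a recovery path while the transport is still usable.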

Instead of dropping the connection, post an RDMA_ERROR message. The
client now sees an RDMA_ERROR and can properly terminate the RPC
transaction.
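Concretely, the error reply keeps the first three words of the client's transport header (XID, version, credits) and overwrites the proc word with rdma_error plus an err_chunk error code, giving the 20-byte (5-word) header that the patch maps for the Send. A user-space model of that layout, with uint32_t standing in for __be32 and byte order ignored; the constant values follow the RPC-over-RDMA protocol (RFC 5666) but are assumptions of this sketch, not taken from the patch:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t be32;	/* stand-in for the kernel's __be32 */

/* Values per RPC-over-RDMA (RFC 5666); illustrative in this model */
enum {
	MODEL_RDMA_ERROR = 4,	/* proc: this message reports an error */
	MODEL_ERR_CHUNK  = 2,	/* errcode: chunk was too small */
};

/* Model of the header rewrite in the patch: words 0-2 (XID, version,
 * credits) are preserved from the original header, word 3 becomes the
 * RDMA_ERROR proc, word 4 the err_chunk error code.  The result is a
 * 5-word header -- the 20 bytes passed to svc_rdma_map_reply_hdr().
 */
static size_t build_error_header(be32 *rdma_resp)
{
	be32 *p = rdma_resp + 3;	/* skip XID, vers, credits */

	*p++ = MODEL_RDMA_ERROR;
	*p = MODEL_ERR_CHUNK;
	return 5 * sizeof(be32);	/* 20 bytes */
}
```

Preserving the XID is what lets the client match the RDMA_ERROR back to the outstanding RPC and terminate it instead of retransmitting.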

As part of the new logic, set up the same delayed release for these
payload pages as would have occurred in the normal case.
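The delayed release works by transferring ownership of the payload pages from the svc_rqst to the Send context (svc_rdma_save_io_pages() in the patch), so the generic RPC server code cannot free them while RDMA Writes are still in flight; the Send completion handler releases them later. A simplified user-space model of that ownership handoff (structures hypothetical, not the kernel's):

```c
#include <stddef.h>

#define MODEL_MAX_PAGES 4

/* Hypothetical stand-ins for svc_rqst / svc_rdma_op_ctxt page arrays */
struct rqst_model { void *pages[MODEL_MAX_PAGES]; int count; };
struct ctxt_model { void *pages[MODEL_MAX_PAGES]; int count; };

/* Move page ownership from the request to the send context and clear
 * the request's slots, so a later release of the svc_rqst cannot free
 * pages that the RDMA hardware may still be reading.
 */
static void save_io_pages(struct rqst_model *rqstp, struct ctxt_model *ctxt)
{
	int i;

	for (i = 0; i < rqstp->count; i++) {
		ctxt->pages[ctxt->count++] = rqstp->pages[i];
		rqstp->pages[i] = NULL;	/* rqst no longer owns the page */
	}
	rqstp->count = 0;
}
```

The error path needs this handoff for the same reason the normal reply path does: the RDMA Writes for the partial payload were already posted before the overrun was detected.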

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>

Authored by Chuck Lever, committed by J. Bruce Fields (4757d90b, 6b19cc5c)

+56 -2
net/sunrpc/xprtrdma/svc_rdma_sendto.c
@@ -621,6 +621,48 @@
 	return ret;
 }
 
+/* Given the client-provided Write and Reply chunks, the server was not
+ * able to form a complete reply. Return an RDMA_ERROR message so the
+ * client can retire this RPC transaction. As above, the Send completion
+ * routine releases payload pages that were part of a previous RDMA Write.
+ *
+ * Remote Invalidation is skipped for simplicity.
+ */
+static int svc_rdma_send_error_msg(struct svcxprt_rdma *rdma,
+				   __be32 *rdma_resp, struct svc_rqst *rqstp)
+{
+	struct svc_rdma_op_ctxt *ctxt;
+	__be32 *p;
+	int ret;
+
+	ctxt = svc_rdma_get_context(rdma);
+
+	/* Replace the original transport header with an
+	 * RDMA_ERROR response. XID etc are preserved.
+	 */
+	p = rdma_resp + 3;
+	*p++ = rdma_error;
+	*p = err_chunk;
+
+	ret = svc_rdma_map_reply_hdr(rdma, ctxt, rdma_resp, 20);
+	if (ret < 0)
+		goto err;
+
+	svc_rdma_save_io_pages(rqstp, ctxt);
+
+	ret = svc_rdma_post_send_wr(rdma, ctxt, 1 + ret, 0);
+	if (ret)
+		goto err;
+
+	return 0;
+
+err:
+	pr_err("svcrdma: failed to post Send WR (%d)\n", ret);
+	svc_rdma_unmap_dma(ctxt);
+	svc_rdma_put_context(ctxt, 1);
+	return ret;
+}
+
 void svc_rdma_prep_reply_hdr(struct svc_rqst *rqstp)
 {
 }
@@ -725,13 +683,13 @@
 		/* XXX: Presume the client sent only one Write chunk */
 		ret = svc_rdma_send_write_chunk(rdma, wr_lst, xdr);
 		if (ret < 0)
-			goto err1;
+			goto err2;
 		svc_rdma_xdr_encode_write_list(rdma_resp, wr_lst, ret);
 	}
 	if (rp_ch) {
 		ret = svc_rdma_send_reply_chunk(rdma, rp_ch, wr_lst, xdr);
 		if (ret < 0)
-			goto err1;
+			goto err2;
 		svc_rdma_xdr_encode_reply_chunk(rdma_resp, rp_ch, ret);
 	}
 
@@ -740,6 +698,18 @@
 		goto err1;
 	ret = svc_rdma_send_reply_msg(rdma, rdma_argp, rdma_resp, rqstp,
 				      wr_lst, rp_ch);
+	if (ret < 0)
+		goto err0;
+	return 0;
+
+err2:
+	if (ret != -E2BIG)
+		goto err1;
+
+	ret = svc_rdma_post_recv(rdma, GFP_KERNEL);
+	if (ret)
+		goto err1;
+	ret = svc_rdma_send_error_msg(rdma, rdma_resp, rqstp);
 	if (ret < 0)
 		goto err0;
 	return 0;