Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

rds: Use sendmsg(MSG_SPLICE_PAGES) rather than sendpage

When transmitting data, call down into TCP using a single sendmsg with
MSG_SPLICE_PAGES to indicate that content should be spliced.

To make this work, the data is assembled in a bio_vec array and attached to
a BVEC-type iterator.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
cc: Jens Axboe <axboe@kernel.dk>
cc: Matthew Wilcox <willy@infradead.org>
cc: rds-devel@oss.oracle.com
Link: https://lore.kernel.org/r/20230623225513.2732256-6-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by David Howells and committed by Jakub Kicinski
572efade fa094cca

+11 -10
net/rds/tcp_send.c
···
 {
 	struct rds_conn_path *cp = rm->m_inc.i_conn_path;
 	struct rds_tcp_connection *tc = cp->cp_transport_data;
+	struct msghdr msg = {};
+	struct bio_vec bvec;
 	int done = 0;
 	int ret = 0;
-	int more;
 
 	if (hdr_off == 0) {
 		/*
···
 			goto out;
 	}
 
-	more = rm->data.op_nents > 1 ? (MSG_MORE | MSG_SENDPAGE_NOTLAST) : 0;
 	while (sg < rm->data.op_nents) {
-		int flags = MSG_DONTWAIT | MSG_NOSIGNAL | more;
+		msg.msg_flags = MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL;
+		if (sg + 1 < rm->data.op_nents)
+			msg.msg_flags |= MSG_MORE;
 
-		ret = tc->t_sock->ops->sendpage(tc->t_sock,
-						sg_page(&rm->data.op_sg[sg]),
-						rm->data.op_sg[sg].offset + off,
-						rm->data.op_sg[sg].length - off,
-						flags);
+		bvec_set_page(&bvec, sg_page(&rm->data.op_sg[sg]),
+			      rm->data.op_sg[sg].length - off,
+			      rm->data.op_sg[sg].offset + off);
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1,
+			      rm->data.op_sg[sg].length - off);
+		ret = sock_sendmsg(tc->t_sock, &msg);
 		rdsdebug("tcp sendpage %p:%u:%u ret %d\n", (void *)sg_page(&rm->data.op_sg[sg]),
 			 rm->data.op_sg[sg].offset + off, rm->data.op_sg[sg].length - off,
 			 ret);
···
 			off = 0;
 			sg++;
 		}
-		if (sg == rm->data.op_nents - 1)
-			more = 0;
 	}
 
 out:
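
The per-segment flag choice and the partial-send bookkeeping in the patched loop can be modelled in userspace. The following is a minimal sketch under stated assumptions, not the kernel code itself: `model_seg`, `model_flags`, `model_send`, and `model_xmit` are hypothetical names introduced for illustration, `MODEL_MSG_SPLICE_PAGES` is a stand-in for the kernel-internal flag, and the mock transport simply accepts at most `cap` bytes per call instead of touching a socket.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>	/* MSG_MORE, MSG_DONTWAIT, MSG_NOSIGNAL (Linux) */

/* Stand-in for the kernel-internal MSG_SPLICE_PAGES flag. The value is used
 * for illustration only and is never passed to a real socket call. */
#define MODEL_MSG_SPLICE_PAGES 0x8000000

/* Hypothetical stand-in for one scatterlist entry (offset and length only). */
struct model_seg {
	unsigned int offset;
	unsigned int length;
};

/* Flags the patched loop would choose for segment `sg` of `nents`:
 * MSG_MORE is set on every segment except the last one. */
static int model_flags(unsigned int sg, unsigned int nents)
{
	int flags = MODEL_MSG_SPLICE_PAGES | MSG_DONTWAIT | MSG_NOSIGNAL;

	if (sg + 1 < nents)
		flags |= MSG_MORE;
	return flags;
}

/* Mock transport (an assumption of this sketch): accepts at most `cap`
 * bytes per call and returns the number of bytes accepted. */
static int model_send(unsigned int start, unsigned int len, int flags,
		      unsigned int cap)
{
	(void)start;
	(void)flags;
	return (int)(len < cap ? len : cap);
}

/* Walk the segments the way the patched loop does. After a short send,
 * the next call describes the remainder of the same segment, starting at
 * offset + off with length - off. Returns the total bytes sent and stores
 * the number of send calls in *calls. */
static unsigned int model_xmit(const struct model_seg *segs,
			       unsigned int nents, unsigned int cap,
			       unsigned int *calls)
{
	unsigned int sg = 0, off = 0, done = 0;

	assert(segs != NULL || nents == 0);
	*calls = 0;
	while (sg < nents) {
		int flags = model_flags(sg, nents);
		int ret = model_send(segs[sg].offset + off,
				     segs[sg].length - off, flags, cap);

		(*calls)++;
		if (ret <= 0)
			break;
		off += (unsigned int)ret;
		done += (unsigned int)ret;
		if (off == segs[sg].length) {
			off = 0;
			sg++;
		}
	}
	return done;
}
```

In this model, two segments of 100 and 50 bytes sent through a transport capped at 64 bytes per call take three calls: the first segment is split into 64 and 36 bytes, both flagged with MSG_MORE, and the final segment goes out without it. Note also that in the diff above bvec_set_page() takes the length before the offset, the reverse of the argument order used by the removed sendpage call.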