Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'splice-net-rewrite-splice-to-socket-fix-splice_f_more-and-handle-msg_splice_pages-in-af_tls'

David Howells says:

====================
splice, net: Rewrite splice-to-socket, fix SPLICE_F_MORE and handle MSG_SPLICE_PAGES in AF_TLS

Here are patches to do the following:

(1) Block MSG_SENDPAGE_* flags from leaking into ->sendmsg() from
userspace, whilst allowing splice_to_socket() to pass them in.

(2) Allow MSG_SPLICE_PAGES to be passed into tls_*_sendmsg(). Until
support is added, it will be ignored and a splice-driven sendmsg()
will be treated like a normal sendmsg(). TCP, UDP, AF_UNIX and
Chelsio-TLS already handle the flag in net-next.

(3) Replace a chain of functions to splice-to-sendpage with a single
function to splice via sendmsg() with MSG_SPLICE_PAGES. This allows a
bunch of pages to be spliced from a pipe in a single call using a
bio_vec[] and pushes the main processing loop down into the bowels of
the protocol driver rather than repeatedly calling in with a page at a
time.

(4) Provide a ->splice_eof() op[2] that allows splice to signal to its
output that the input observed a premature EOF and that the caller
didn't flag SPLICE_F_MORE, thereby allowing a corked socket to be
flushed. This attempts to maintain the current behaviour. It is also
not called if we didn't manage to read any data and so never called
the actor function.

This needs routing through several layers to get it down to the network
protocol.

[!] Note that I chose not to pass in any flags - I'm not sure it's
particularly useful to pass in the splice flags; I also elected
not to return any error code - though we might actually want to do
that.

(5) Provide tls_{device,sw}_splice_eof() to flush a pending TLS record if
there is one.

(6) Provide splice_eof() for UDP, TCP, Chelsio-TLS and AF_KCM. AF_UNIX
doesn't seem to pay attention to the MSG_MORE or MSG_SENDPAGE_NOTLAST
flags.

(7) Alter the behaviour of sendfile() and fix SPLICE_F_MORE/MSG_MORE
signalling[1] such that SPLICE_F_MORE is always signalled until we
have read sufficient data to finish the request. If we get a
zero-length read before we've managed to splice sufficient data, we
now leave the socket expecting more data and leave it to userspace to
deal with it.

(8) Make AF_TLS handle the MSG_SPLICE_PAGES internal sendmsg flag.
MSG_SPLICE_PAGES is an internal hint that tells the protocol that it
should splice the pages supplied if it can. Its sendpage
implementations are then turned into wrappers around that.

Link: https://lore.kernel.org/r/499791.1685485603@warthog.procyon.org.uk/ [1]
Link: https://lore.kernel.org/r/CAHk-=wh=V579PDYvkpnTobCLGczbgxpMgGmmhqiTyE34Cpi5Gg@mail.gmail.com/ [2]
Link: https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=51c78a4d532efe9543a4df019ff405f05c6157f6 # part 1
Link: https://lore.kernel.org/r/20230524153311.3625329-1-dhowells@redhat.com/ # v1
====================

Link: https://lore.kernel.org/r/20230607181920.2294972-1-dhowells@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

+505 -265
+1
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls.h
···
 int chtls_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 int chtls_recvmsg(struct sock *sk, struct msghdr *msg,
		   size_t len, int flags, int *addr_len);
+void chtls_splice_eof(struct socket *sock);
 int chtls_sendpage(struct sock *sk, struct page *page,
		    int offset, size_t size, int flags);
 int send_tx_flowc_wr(struct sock *sk, int compl,
+9
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_io.c
···
	goto done;
 }

+void chtls_splice_eof(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+
+	lock_sock(sk);
+	chtls_tcp_push(sk, 0);
+	release_sock(sk);
+}
+
 int chtls_sendpage(struct sock *sk, struct page *page,
		    int offset, size_t size, int flags)
 {
+1
drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_main.c
···
	chtls_cpl_prot.destroy = chtls_destroy_sock;
	chtls_cpl_prot.shutdown = chtls_shutdown;
	chtls_cpl_prot.sendmsg = chtls_sendmsg;
+	chtls_cpl_prot.splice_eof = chtls_splice_eof;
	chtls_cpl_prot.sendpage = chtls_sendpage;
	chtls_cpl_prot.recvmsg = chtls_recvmsg;
	chtls_cpl_prot.setsockopt = chtls_setsockopt;
+167 -40
fs/splice.c
···
 #include <linux/fsnotify.h>
 #include <linux/security.h>
 #include <linux/gfp.h>
+#include <linux/net.h>
 #include <linux/socket.h>
 #include <linux/sched/signal.h>
···
 };
 EXPORT_SYMBOL(nosteal_pipe_buf_ops);

-/*
- * Send 'sd->len' bytes to socket from 'sd->file' at position 'sd->pos'
- * using sendpage(). Return the number of bytes sent.
- */
-static int pipe_to_sendpage(struct pipe_inode_info *pipe,
-			    struct pipe_buffer *buf, struct splice_desc *sd)
-{
-	struct file *file = sd->u.file;
-	loff_t pos = sd->pos;
-	int more;
-
-	if (!likely(file->f_op->sendpage))
-		return -EINVAL;
-
-	more = (sd->flags & SPLICE_F_MORE) ? MSG_MORE : 0;
-
-	if (sd->len < sd->total_len &&
-	    pipe_occupancy(pipe->head, pipe->tail) > 1)
-		more |= MSG_SENDPAGE_NOTLAST;
-
-	return file->f_op->sendpage(file, buf->page, buf->offset,
-				    sd->len, &pos, more);
-}
-
 static void wakeup_pipe_writers(struct pipe_inode_info *pipe)
 {
	smp_mb();
···
 * Description:
 *    This function does little more than loop over the pipe and call
 *    @actor to do the actual moving of a single struct pipe_buffer to
- *    the desired destination. See pipe_to_file, pipe_to_sendpage, or
+ *    the desired destination. See pipe_to_file, pipe_to_sendmsg, or
 *    pipe_to_user.
 *
 */
···
 EXPORT_SYMBOL(iter_file_splice_write);

+#ifdef CONFIG_NET
 /**
- * generic_splice_sendpage - splice data from a pipe to a socket
+ * splice_to_socket - splice data from a pipe to a socket
 * @pipe:	pipe to splice from
 * @out:	socket to write to
 * @ppos:	position in @out
···
 *    is involved.
 *
 */
-ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe, struct file *out,
-				loff_t *ppos, size_t len, unsigned int flags)
+ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+			 loff_t *ppos, size_t len, unsigned int flags)
 {
-	return splice_from_pipe(pipe, out, ppos, len, flags, pipe_to_sendpage);
-}
+	struct socket *sock = sock_from_file(out);
+	struct bio_vec bvec[16];
+	struct msghdr msg = {};
+	ssize_t ret = 0;
+	size_t spliced = 0;
+	bool need_wakeup = false;

-EXPORT_SYMBOL(generic_splice_sendpage);
+	pipe_lock(pipe);
+
+	while (len > 0) {
+		unsigned int head, tail, mask, bc = 0;
+		size_t remain = len;
+
+		/*
+		 * Check for signal early to make process killable when there
+		 * are always buffers available
+		 */
+		ret = -ERESTARTSYS;
+		if (signal_pending(current))
+			break;
+
+		while (pipe_empty(pipe->head, pipe->tail)) {
+			ret = 0;
+			if (!pipe->writers)
+				goto out;
+
+			if (spliced)
+				goto out;
+
+			ret = -EAGAIN;
+			if (flags & SPLICE_F_NONBLOCK)
+				goto out;
+
+			ret = -ERESTARTSYS;
+			if (signal_pending(current))
+				goto out;
+
+			if (need_wakeup) {
+				wakeup_pipe_writers(pipe);
+				need_wakeup = false;
+			}
+
+			pipe_wait_readable(pipe);
+		}
+
+		head = pipe->head;
+		tail = pipe->tail;
+		mask = pipe->ring_size - 1;
+
+		while (!pipe_empty(head, tail)) {
+			struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+			size_t seg;
+
+			if (!buf->len) {
+				tail++;
+				continue;
+			}
+
+			seg = min_t(size_t, remain, buf->len);
+			seg = min_t(size_t, seg, PAGE_SIZE);
+
+			ret = pipe_buf_confirm(pipe, buf);
+			if (unlikely(ret)) {
+				if (ret == -ENODATA)
+					ret = 0;
+				break;
+			}
+
+			bvec_set_page(&bvec[bc++], buf->page, seg, buf->offset);
+			remain -= seg;
+			if (seg >= buf->len)
+				tail++;
+			if (bc >= ARRAY_SIZE(bvec))
+				break;
+		}
+
+		if (!bc)
+			break;
+
+		msg.msg_flags = MSG_SPLICE_PAGES;
+		if (flags & SPLICE_F_MORE)
+			msg.msg_flags |= MSG_MORE;
+		if (remain && pipe_occupancy(pipe->head, tail) > 0)
+			msg.msg_flags |= MSG_MORE;
+
+		iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, bvec, bc,
+			      len - remain);
+		ret = sock_sendmsg(sock, &msg);
+		if (ret <= 0)
+			break;
+
+		spliced += ret;
+		len -= ret;
+		tail = pipe->tail;
+		while (ret > 0) {
+			struct pipe_buffer *buf = &pipe->bufs[tail & mask];
+			size_t seg = min_t(size_t, ret, buf->len);
+
+			buf->offset += seg;
+			buf->len -= seg;
+			ret -= seg;
+
+			if (!buf->len) {
+				pipe_buf_release(pipe, buf);
+				tail++;
+			}
+		}
+
+		if (tail != pipe->tail) {
+			pipe->tail = tail;
+			if (pipe->files)
+				need_wakeup = true;
+		}
+	}
+
+out:
+	pipe_unlock(pipe);
+	if (need_wakeup)
+		wakeup_pipe_writers(pipe);
+	return spliced ?: ret;
+}
+#endif

 static int warn_unsupported(struct file *file, const char *op)
 {
···
	if (unlikely(!out->f_op->splice_write))
		return warn_unsupported(out, "write");
	return out->f_op->splice_write(pipe, out, ppos, len, flags);
+}
+
+/*
+ * Indicate to the caller that there was a premature EOF when reading from the
+ * source and the caller didn't indicate they would be sending more data after
+ * this.
+ */
+static void do_splice_eof(struct splice_desc *sd)
+{
+	if (sd->splice_eof)
+		sd->splice_eof(sd);
 }
···
	 */
	bytes = 0;
	len = sd->total_len;
+
+	/* Don't block on output, we have to drain the direct pipe. */
	flags = sd->flags;
+	sd->flags &= ~SPLICE_F_NONBLOCK;

	/*
-	 * Don't block on output, we have to drain the direct pipe.
+	 * We signal MORE until we've read sufficient data to fulfill the
+	 * request and we keep signalling it if the caller set it.
	 */
-	sd->flags &= ~SPLICE_F_NONBLOCK;
	more = sd->flags & SPLICE_F_MORE;
+	sd->flags |= SPLICE_F_MORE;

	WARN_ON_ONCE(!pipe_empty(pipe->head, pipe->tail));
···
		ret = do_splice_to(in, &pos, pipe, len, flags);
		if (unlikely(ret <= 0))
-			goto out_release;
+			goto read_failure;

		read_len = ret;
		sd->total_len = read_len;

		/*
-		 * If more data is pending, set SPLICE_F_MORE
-		 * If this is the last data and SPLICE_F_MORE was not set
-		 * initially, clears it.
+		 * If we now have sufficient data to fulfill the request then
+		 * we clear SPLICE_F_MORE if it was not set initially.
		 */
-		if (read_len < len)
-			sd->flags |= SPLICE_F_MORE;
-		else if (!more)
+		if (read_len >= len && !more)
			sd->flags &= ~SPLICE_F_MORE;
+
		/*
		 * NOTE: nonblocking mode only applies to the input. We
		 * must not do the output in nonblocking mode as then we
···
	file_accessed(in);
	return bytes;

+read_failure:
+	/*
+	 * If the user did *not* set SPLICE_F_MORE *and* we didn't hit that
+	 * "use all of len" case that cleared SPLICE_F_MORE, *and* we did a
+	 * "->splice_in()" that returned EOF (ie zero) *and* we have sent at
+	 * least 1 byte *then* we will also do the ->splice_eof() call.
+	 */
+	if (ret == 0 && !more && len > 0 && bytes)
+		do_splice_eof(sd);
 out_release:
	/*
	 * If we did an incomplete transfer we must release
···
			      sd->flags);
 }

+static void direct_file_splice_eof(struct splice_desc *sd)
+{
+	struct file *file = sd->u.file;
+
+	if (file->f_op->splice_eof)
+		file->f_op->splice_eof(file);
+}
+
 /**
 * do_splice_direct - splices data directly between two files
 * @in:		file to splice from
···
		.flags		= flags,
		.pos		= *ppos,
		.u.file		= out,
+		.splice_eof	= direct_file_splice_eof,
		.opos		= opos,
	};
	long ret;
+1 -2
include/linux/fs.h
···
	int (*flock) (struct file *, int, struct file_lock *);
	ssize_t (*splice_write)(struct pipe_inode_info *, struct file *, loff_t *, size_t, unsigned int);
	ssize_t (*splice_read)(struct file *, loff_t *, struct pipe_inode_info *, size_t, unsigned int);
+	void (*splice_eof)(struct file *file);
	int (*setlease)(struct file *, long, struct file_lock **, void **);
	long (*fallocate)(struct file *file, int mode, loff_t offset,
			  loff_t len);
···
			struct pipe_inode_info *, size_t, unsigned int);
 extern ssize_t iter_file_splice_write(struct pipe_inode_info *,
		struct file *, loff_t *, size_t, unsigned int);
-extern ssize_t generic_splice_sendpage(struct pipe_inode_info *pipe,
-		struct file *out, loff_t *, size_t len, unsigned int flags);
 extern long do_splice_direct(struct file *in, loff_t *ppos, struct file *out,
		loff_t *opos, size_t len, unsigned int flags);
+1
include/linux/net.h
···
			      int offset, size_t size, int flags);
	ssize_t (*splice_read)(struct socket *sock, loff_t *ppos,
			       struct pipe_inode_info *pipe, size_t len, unsigned int flags);
+	void (*splice_eof)(struct socket *sock);
	int (*set_peek_off)(struct sock *sk, int val);
	int (*peek_len)(struct socket *sock);
+3 -1
include/linux/socket.h
···
 #endif

 /* Flags to be cleared on entry by sendmsg and sendmmsg syscalls */
-#define MSG_INTERNAL_SENDMSG_FLAGS (MSG_SPLICE_PAGES)
+#define MSG_INTERNAL_SENDMSG_FLAGS \
+	(MSG_SPLICE_PAGES | MSG_SENDPAGE_NOPOLICY | MSG_SENDPAGE_NOTLAST | \
+	 MSG_SENDPAGE_DECRYPTED)

 /* Setsockoptions(2) level. Thanks to BSD these must match IPPROTO_xxx */
 #define SOL_IP 0
+3
include/linux/splice.h
···
		struct file *file;	/* file to read/write */
		void *data;		/* cookie */
	} u;
+	void (*splice_eof)(struct splice_desc *sd); /* Unexpected EOF handler */
	loff_t pos;			/* file position */
	loff_t *opos;			/* sendfile: output position */
	size_t num_spliced;		/* number of bytes already spliced */
···
 extern long do_tee(struct file *in, struct file *out, size_t len,
		   unsigned int flags);
+extern ssize_t splice_to_socket(struct pipe_inode_info *pipe, struct file *out,
+				loff_t *ppos, size_t len, unsigned int flags);

 /*
 * for dynamic pipe sizing
+1
include/net/inet_common.h
···
		  struct sock *newsk);
 int inet_send_prepare(struct sock *sk);
 int inet_sendmsg(struct socket *sock, struct msghdr *msg, size_t size);
+void inet_splice_eof(struct socket *sock);
 ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
		      size_t size, int flags);
 int inet_recvmsg(struct socket *sock, struct msghdr *msg, size_t size,
+1
include/net/sock.h
···
				    size_t len, int flags, int *addr_len);
	int (*sendpage)(struct sock *sk, struct page *page,
			int offset, size_t size, int flags);
+	void (*splice_eof)(struct socket *sock);
	int (*bind)(struct sock *sk,
		    struct sockaddr *addr, int addr_len);
	int (*bind_add)(struct sock *sk,
+1
include/net/tcp.h
···
 int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size);
 int tcp_sendmsg_fastopen(struct sock *sk, struct msghdr *msg, int *copied,
			 size_t size, struct ubuf_info *uarg);
+void tcp_splice_eof(struct socket *sock);
 int tcp_sendpage(struct sock *sk, struct page *page, int offset, size_t size,
		 int flags);
 int tcp_sendpage_locked(struct sock *sk, struct page *page, int offset,
+1
include/net/udp.h
···
 int udp_err(struct sk_buff *, u32);
 int udp_abort(struct sock *sk, int err);
 int udp_sendmsg(struct sock *sk, struct msghdr *msg, size_t len);
+void udp_splice_eof(struct socket *sock);
 int udp_push_pending_frames(struct sock *sk);
 void udp_flush_pending_frames(struct sock *sk);
 int udp_cmsg_send(struct sock *sk, struct msghdr *msg, u16 *gso_size);
+18
net/ipv4/af_inet.c
···
 }
 EXPORT_SYMBOL(inet_sendmsg);

+void inet_splice_eof(struct socket *sock)
+{
+	const struct proto *prot;
+	struct sock *sk = sock->sk;
+
+	if (unlikely(inet_send_prepare(sk)))
+		return;
+
+	/* IPV6_ADDRFORM can change sk->sk_prot under us. */
+	prot = READ_ONCE(sk->sk_prot);
+	if (prot->splice_eof)
+		prot->splice_eof(sock);
+}
+EXPORT_SYMBOL_GPL(inet_splice_eof);
+
 ssize_t inet_sendpage(struct socket *sock, struct page *page, int offset,
		      size_t size, int flags)
 {
···
 #ifdef CONFIG_MMU
	.mmap = tcp_mmap,
 #endif
+	.splice_eof = inet_splice_eof,
	.sendpage = inet_sendpage,
	.splice_read = tcp_splice_read,
	.read_sock = tcp_read_sock,
···
	.read_skb = udp_read_skb,
	.recvmsg = inet_recvmsg,
	.mmap = sock_no_mmap,
+	.splice_eof = inet_splice_eof,
	.sendpage = inet_sendpage,
	.set_peek_off = sk_set_peek_off,
 #ifdef CONFIG_COMPAT
···
	.sendmsg = inet_sendmsg,
	.recvmsg = inet_recvmsg,
	.mmap = sock_no_mmap,
+	.splice_eof = inet_splice_eof,
	.sendpage = inet_sendpage,
 #ifdef CONFIG_COMPAT
	.compat_ioctl = inet_compat_ioctl,
+16
net/ipv4/tcp.c
···
 }
 EXPORT_SYMBOL(tcp_sendmsg);

+void tcp_splice_eof(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct tcp_sock *tp = tcp_sk(sk);
+	int mss_now, size_goal;
+
+	if (!tcp_write_queue_tail(sk))
+		return;
+
+	lock_sock(sk);
+	mss_now = tcp_send_mss(sk, &size_goal, 0);
+	tcp_push(sk, 0, mss_now, tp->nonagle, size_goal);
+	release_sock(sk);
+}
+EXPORT_SYMBOL_GPL(tcp_splice_eof);
+
 /*
 *	Handle reading urgent data. BSD has very simple semantics for
 *	this, no blocking and very strange errors 8)
+1
net/ipv4/tcp_ipv4.c
···
	.keepalive = tcp_set_keepalive,
	.recvmsg = tcp_recvmsg,
	.sendmsg = tcp_sendmsg,
+	.splice_eof = tcp_splice_eof,
	.sendpage = tcp_sendpage,
	.backlog_rcv = tcp_v4_do_rcv,
	.release_cb = tcp_release_cb,
+16
net/ipv4/udp.c
···
 }
 EXPORT_SYMBOL(udp_sendmsg);

+void udp_splice_eof(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct udp_sock *up = udp_sk(sk);
+
+	if (!up->pending || READ_ONCE(up->corkflag))
+		return;
+
+	lock_sock(sk);
+	if (up->pending && !READ_ONCE(up->corkflag))
+		udp_push_pending_frames(sk);
+	release_sock(sk);
+}
+EXPORT_SYMBOL_GPL(udp_splice_eof);
+
 int udp_sendpage(struct sock *sk, struct page *page, int offset,
		 size_t size, int flags)
 {
···
	.getsockopt = udp_getsockopt,
	.sendmsg = udp_sendmsg,
	.recvmsg = udp_recvmsg,
+	.splice_eof = udp_splice_eof,
	.sendpage = udp_sendpage,
	.release_cb = ip4_datagram_release_cb,
	.hash = udp_lib_hash,
+1
net/ipv6/af_inet6.c
···
 #ifdef CONFIG_MMU
	.mmap = tcp_mmap,
 #endif
+	.splice_eof = inet_splice_eof,
	.sendpage = inet_sendpage,
	.sendmsg_locked = tcp_sendmsg_locked,
	.sendpage_locked = tcp_sendpage_locked,
+1
net/ipv6/tcp_ipv6.c
···
	.keepalive = tcp_set_keepalive,
	.recvmsg = tcp_recvmsg,
	.sendmsg = tcp_sendmsg,
+	.splice_eof = tcp_splice_eof,
	.sendpage = tcp_sendpage,
	.backlog_rcv = tcp_v6_do_rcv,
	.release_cb = tcp_release_cb,
+15
net/ipv6/udp.c
···
 }
 EXPORT_SYMBOL(udpv6_sendmsg);

+static void udpv6_splice_eof(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct udp_sock *up = udp_sk(sk);
+
+	if (!up->pending || READ_ONCE(up->corkflag))
+		return;
+
+	lock_sock(sk);
+	if (up->pending && !READ_ONCE(up->corkflag))
+		udp_v6_push_pending_frames(sk);
+	release_sock(sk);
+}
+
 void udpv6_destroy_sock(struct sock *sk)
 {
	struct udp_sock *up = udp_sk(sk);
···
	.getsockopt = udpv6_getsockopt,
	.sendmsg = udpv6_sendmsg,
	.recvmsg = udpv6_recvmsg,
+	.splice_eof = udpv6_splice_eof,
	.release_cb = ip6_datagram_release_cb,
	.hash = udp_lib_hash,
	.unhash = udp_lib_unhash,
+15
net/kcm/kcmsock.c
···
	return err;
 }

+static void kcm_splice_eof(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct kcm_sock *kcm = kcm_sk(sk);
+
+	if (skb_queue_empty_lockless(&sk->sk_write_queue))
+		return;
+
+	lock_sock(sk);
+	kcm_write_msgs(kcm);
+	release_sock(sk);
+}
+
 static ssize_t kcm_sendpage(struct socket *sock, struct page *page,
			    int offset, size_t size, int flags)
···
	.sendmsg = kcm_sendmsg,
	.recvmsg = kcm_recvmsg,
	.mmap = sock_no_mmap,
+	.splice_eof = kcm_splice_eof,
	.sendpage = kcm_sendpage,
 };
···
	.sendmsg = kcm_sendmsg,
	.recvmsg = kcm_recvmsg,
	.mmap = sock_no_mmap,
+	.splice_eof = kcm_splice_eof,
	.sendpage = kcm_sendpage,
	.splice_read = kcm_splice_read,
 };
+12 -24
net/socket.c
···
 #include <linux/mm.h>
 #include <linux/socket.h>
 #include <linux/file.h>
+#include <linux/splice.h>
 #include <linux/net.h>
 #include <linux/interrupt.h>
 #include <linux/thread_info.h>
···
			      unsigned int cmd, unsigned long arg);
 #endif
 static int sock_fasync(int fd, struct file *filp, int on);
-static ssize_t sock_sendpage(struct file *file, struct page *page,
-			     int offset, size_t size, loff_t *ppos, int more);
 static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
				struct pipe_inode_info *pipe, size_t len,
				unsigned int flags);
+static void sock_splice_eof(struct file *file);

 #ifdef CONFIG_PROC_FS
 static void sock_show_fdinfo(struct seq_file *m, struct file *f)
···
	.mmap = sock_mmap,
	.release = sock_close,
	.fasync = sock_fasync,
-	.sendpage = sock_sendpage,
-	.splice_write = generic_splice_sendpage,
+	.splice_write = splice_to_socket,
	.splice_read = sock_splice_read,
+	.splice_eof = sock_splice_eof,
	.show_fdinfo = sock_show_fdinfo,
 };
···
 }
 EXPORT_SYMBOL(kernel_recvmsg);

-static ssize_t sock_sendpage(struct file *file, struct page *page,
-			     int offset, size_t size, loff_t *ppos, int more)
-{
-	struct socket *sock;
-	int flags;
-	int ret;
-
-	sock = file->private_data;
-
-	flags = (file->f_flags & O_NONBLOCK) ? MSG_DONTWAIT : 0;
-	/* more is a combination of MSG_MORE and MSG_SENDPAGE_NOTLAST */
-	flags |= more;
-
-	ret = kernel_sendpage(sock, page, offset, size, flags);
-
-	if (trace_sock_send_length_enabled())
-		call_trace_sock_send_length(sock->sk, ret, 0);
-	return ret;
-}
-
 static ssize_t sock_splice_read(struct file *file, loff_t *ppos,
				struct pipe_inode_info *pipe, size_t len,
				unsigned int flags)
···
		return generic_file_splice_read(file, ppos, pipe, len, flags);

	return sock->ops->splice_read(sock, ppos, pipe, len, flags);
+}
+
+static void sock_splice_eof(struct file *file)
+{
+	struct socket *sock = file->private_data;
+
+	if (sock->ops->splice_eof)
+		sock->ops->splice_eof(sock);
 }

 static ssize_t sock_read_iter(struct kiocb *iocb, struct iov_iter *to)
+2
net/tls/tls.h
···
 void tls_sw_strparser_arm(struct sock *sk, struct tls_context *ctx);
 void tls_sw_strparser_done(struct tls_context *tls_ctx);
 int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+void tls_sw_splice_eof(struct socket *sock);
 int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
			   int offset, size_t size, int flags);
 int tls_sw_sendpage(struct sock *sk, struct page *page,
···
		       size_t len, unsigned int flags);

 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
+void tls_device_splice_eof(struct socket *sock);
 int tls_device_sendpage(struct sock *sk, struct page *page,
			int offset, size_t size, int flags);
 int tls_tx_records(struct sock *sk, int flags);
+57 -53
net/tls/tls_device.c
···
	return 0;
 }

-union tls_iter_offset {
-	struct iov_iter *msg_iter;
-	int offset;
-};
-
 static int tls_push_data(struct sock *sk,
-			 union tls_iter_offset iter_offset,
+			 struct iov_iter *iter,
			 size_t size, int flags,
-			 unsigned char record_type,
-			 struct page *zc_page)
+			 unsigned char record_type)
 {
	struct tls_context *tls_ctx = tls_get_ctx(sk);
	struct tls_prot_info *prot = &tls_ctx->prot_info;
···
	long timeo;

	if (flags &
-	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST))
+	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+	      MSG_SPLICE_PAGES))
		return -EOPNOTSUPP;

	if (unlikely(sk->sk_err))
···
		record = ctx->open_record;

		copy = min_t(size_t, size, max_open_record_len - record->len);
-		if (copy && zc_page) {
+		if (copy && (flags & MSG_SPLICE_PAGES)) {
			struct page_frag zc_pfrag;
+			struct page **pages = &zc_pfrag.page;
+			size_t off;

-			zc_pfrag.page = zc_page;
-			zc_pfrag.offset = iter_offset.offset;
+			rc = iov_iter_extract_pages(iter, &pages,
+						    copy, 1, 0, &off);
+			if (rc <= 0) {
+				if (rc == 0)
+					rc = -EIO;
+				goto handle_error;
+			}
+			copy = rc;
+
+			if (WARN_ON_ONCE(!sendpage_ok(zc_pfrag.page))) {
+				iov_iter_revert(iter, copy);
+				rc = -EIO;
+				goto handle_error;
+			}
+
+			zc_pfrag.offset = off;
			zc_pfrag.size = copy;
			tls_append_frag(record, &zc_pfrag, copy);
-
-			iter_offset.offset += copy;
		} else if (copy) {
			copy = min_t(size_t, copy, pfrag->size - pfrag->offset);

			rc = tls_device_copy_data(page_address(pfrag->page) +
						  pfrag->offset, copy,
-						  iter_offset.msg_iter);
+						  iter);
			if (rc)
				goto handle_error;
			tls_append_frag(record, pfrag, copy);
···
 {
	unsigned char record_type = TLS_RECORD_TYPE_DATA;
	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter;
	int rc;
+
+	if (!tls_ctx->zerocopy_sendfile)
+		msg->msg_flags &= ~MSG_SPLICE_PAGES;

	mutex_lock(&tls_ctx->tx_lock);
	lock_sock(sk);
···
		goto out;
	}

-	iter.msg_iter = &msg->msg_iter;
-	rc = tls_push_data(sk, iter, size, msg->msg_flags, record_type, NULL);
+	rc = tls_push_data(sk, &msg->msg_iter, size, msg->msg_flags,
+			   record_type);

 out:
	release_sock(sk);
···
	return rc;
 }

-int tls_device_sendpage(struct sock *sk, struct page *page,
-			int offset, size_t size, int flags)
+void tls_device_splice_eof(struct socket *sock)
 {
+	struct sock *sk = sock->sk;
	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	union tls_iter_offset iter_offset;
-	struct iov_iter msg_iter;
-	char *kaddr;
-	struct kvec iov;
-	int rc;
+	struct iov_iter iter = {};

-	if (flags & MSG_SENDPAGE_NOTLAST)
-		flags |= MSG_MORE;
+	if (!tls_is_partially_sent_record(tls_ctx))
+		return;

	mutex_lock(&tls_ctx->tx_lock);
	lock_sock(sk);

-	if (flags & MSG_OOB) {
-		rc = -EOPNOTSUPP;
-		goto out;
+	if (tls_is_partially_sent_record(tls_ctx)) {
+		iov_iter_bvec(&iter, ITER_SOURCE, NULL, 0, 0);
+		tls_push_data(sk, &iter, 0, 0, TLS_RECORD_TYPE_DATA);
	}

-	if (tls_ctx->zerocopy_sendfile) {
-		iter_offset.offset = offset;
-		rc = tls_push_data(sk, iter_offset, size,
-				   flags, TLS_RECORD_TYPE_DATA, page);
-		goto out;
-	}
-
-	kaddr = kmap(page);
-	iov.iov_base = kaddr + offset;
-	iov.iov_len = size;
-	iov_iter_kvec(&msg_iter, ITER_SOURCE, &iov, 1, size);
-	iter_offset.msg_iter = &msg_iter;
-	rc = tls_push_data(sk, iter_offset, size, flags, TLS_RECORD_TYPE_DATA,
-			   NULL);
-	kunmap(page);
-
-out:
	release_sock(sk);
	mutex_unlock(&tls_ctx->tx_lock);
-	return rc;
+}
+
+int tls_device_sendpage(struct sock *sk, struct page *page,
+			int offset, size_t size, int flags)
+{
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
+
+	if (flags & MSG_OOB)
+		return -EOPNOTSUPP;
+
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_device_sendmsg(sk, &msg, size);
 }

 struct tls_record_info *tls_get_record(struct tls_offload_context_tx *context,
···
 static int tls_device_push_pending_record(struct sock *sk, int flags)
 {
-	union tls_iter_offset iter;
-	struct iov_iter msg_iter;
+	struct iov_iter iter;

-	iov_iter_kvec(&msg_iter, ITER_SOURCE, NULL, 0, 0);
-	iter.msg_iter = &msg_iter;
-	return tls_push_data(sk, iter, 0, flags, TLS_RECORD_TYPE_DATA, NULL);
+	iov_iter_kvec(&iter, ITER_SOURCE, NULL, 0, 0);
+	return tls_push_data(sk, &iter, 0, flags, TLS_RECORD_TYPE_DATA);
 }

 void tls_device_write_space(struct sock *sk, struct tls_context *ctx)
+4
net/tls/tls_main.c
···
	ops[TLS_BASE][TLS_BASE] = *base;

	ops[TLS_SW ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
+	ops[TLS_SW ][TLS_BASE].splice_eof = tls_sw_splice_eof;
	ops[TLS_SW ][TLS_BASE].sendpage_locked = tls_sw_sendpage_locked;

	ops[TLS_BASE][TLS_SW ] = ops[TLS_BASE][TLS_BASE];
···
	prot[TLS_SW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
	prot[TLS_SW][TLS_BASE].sendmsg = tls_sw_sendmsg;
+	prot[TLS_SW][TLS_BASE].splice_eof = tls_sw_splice_eof;
	prot[TLS_SW][TLS_BASE].sendpage = tls_sw_sendpage;

	prot[TLS_BASE][TLS_SW] = prot[TLS_BASE][TLS_BASE];
···
 #ifdef CONFIG_TLS_DEVICE
	prot[TLS_HW][TLS_BASE] = prot[TLS_BASE][TLS_BASE];
	prot[TLS_HW][TLS_BASE].sendmsg = tls_device_sendmsg;
+	prot[TLS_HW][TLS_BASE].splice_eof = tls_device_splice_eof;
	prot[TLS_HW][TLS_BASE].sendpage = tls_device_sendpage;

	prot[TLS_HW][TLS_SW] = prot[TLS_BASE][TLS_SW];
	prot[TLS_HW][TLS_SW].sendmsg = tls_device_sendmsg;
+	prot[TLS_HW][TLS_SW].splice_eof = tls_device_splice_eof;
	prot[TLS_HW][TLS_SW].sendpage = tls_device_sendpage;

	prot[TLS_BASE][TLS_HW] = prot[TLS_BASE][TLS_SW];
net/tls/tls_sw.c | +157 -145
···
 					  &copied, flags);
 }
 
-int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
+static int tls_sw_sendmsg_splice(struct sock *sk, struct msghdr *msg,
+				 struct sk_msg *msg_pl, size_t try_to_copy,
+				 ssize_t *copied)
+{
+	struct page *page = NULL, **pages = &page;
+
+	do {
+		ssize_t part;
+		size_t off;
+
+		part = iov_iter_extract_pages(&msg->msg_iter, &pages,
+					      try_to_copy, 1, 0, &off);
+		if (part <= 0)
+			return part ?: -EIO;
+
+		if (WARN_ON_ONCE(!sendpage_ok(page))) {
+			iov_iter_revert(&msg->msg_iter, part);
+			return -EIO;
+		}
+
+		sk_msg_page_add(msg_pl, page, part, off);
+		sk_mem_charge(sk, part);
+		*copied += part;
+		try_to_copy -= part;
+	} while (try_to_copy && !sk_msg_full(msg_pl));
+
+	return 0;
+}
+
+static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
+				 size_t size)
 {
 	long timeo = sock_sndtimeo(sk, msg->msg_flags & MSG_DONTWAIT);
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
···
 	int orig_size;
 	int ret = 0;
 	int pending;
-
-	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-			       MSG_CMSG_COMPAT))
-		return -EOPNOTSUPP;
-
-	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
-	if (ret)
-		return ret;
-	lock_sock(sk);
 
 	if (unlikely(msg->msg_controllen)) {
 		ret = tls_process_cmsg(sk, msg, &record_type);
···
 			 */
 			try_to_copy -= required_size - msg_en->sg.size;
 			full_record = true;
+		}
+
+		if (try_to_copy && (msg->msg_flags & MSG_SPLICE_PAGES)) {
+			ret = tls_sw_sendmsg_splice(sk, msg, msg_pl,
+						    try_to_copy, &copied);
+			if (ret < 0)
+				goto send_end;
+			tls_ctx->pending_open_record_frags = true;
+			if (full_record || eor || sk_msg_full(msg_pl))
+				goto copied;
+			continue;
 		}
 
 		if (!is_kvec && (full_record || eor) && !async_capable) {
···
 		 */
 		tls_ctx->pending_open_record_frags = true;
 		copied += try_to_copy;
+copied:
 		if (full_record || eor) {
 			ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
 						  record_type, &copied,
···
 
send_end:
 	ret = sk_stream_error(sk, msg->msg_flags, ret);
-
-	release_sock(sk);
-	mutex_unlock(&tls_ctx->tx_lock);
 	return copied > 0 ? copied : ret;
 }
 
-static int tls_sw_do_sendpage(struct sock *sk, struct page *page,
-			      int offset, size_t size, int flags)
-{
-	long timeo = sock_sndtimeo(sk, flags & MSG_DONTWAIT);
-	struct tls_context *tls_ctx = tls_get_ctx(sk);
-	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
-	struct tls_prot_info *prot = &tls_ctx->prot_info;
-	unsigned char record_type = TLS_RECORD_TYPE_DATA;
-	struct sk_msg *msg_pl;
-	struct tls_rec *rec;
-	int num_async = 0;
-	ssize_t copied = 0;
-	bool full_record;
-	int record_room;
-	int ret = 0;
-	bool eor;
-
-	eor = !(flags & MSG_SENDPAGE_NOTLAST);
-	sk_clear_bit(SOCKWQ_ASYNC_NOSPACE, sk);
-
-	/* Call the sk_stream functions to manage the sndbuf mem. */
-	while (size > 0) {
-		size_t copy, required_size;
-
-		if (sk->sk_err) {
-			ret = -sk->sk_err;
-			goto sendpage_end;
-		}
-
-		if (ctx->open_rec)
-			rec = ctx->open_rec;
-		else
-			rec = ctx->open_rec = tls_get_rec(sk);
-		if (!rec) {
-			ret = -ENOMEM;
-			goto sendpage_end;
-		}
-
-		msg_pl = &rec->msg_plaintext;
-
-		full_record = false;
-		record_room = TLS_MAX_PAYLOAD_SIZE - msg_pl->sg.size;
-		copy = size;
-		if (copy >= record_room) {
-			copy = record_room;
-			full_record = true;
-		}
-
-		required_size = msg_pl->sg.size + copy + prot->overhead_size;
-
-		if (!sk_stream_memory_free(sk))
-			goto wait_for_sndbuf;
-alloc_payload:
-		ret = tls_alloc_encrypted_msg(sk, required_size);
-		if (ret) {
-			if (ret != -ENOSPC)
-				goto wait_for_memory;
-
-			/* Adjust copy according to the amount that was
-			 * actually allocated. The difference is due
-			 * to max sg elements limit
-			 */
-			copy -= required_size - msg_pl->sg.size;
-			full_record = true;
-		}
-
-		sk_msg_page_add(msg_pl, page, copy, offset);
-		sk_mem_charge(sk, copy);
-
-		offset += copy;
-		size -= copy;
-		copied += copy;
-
-		tls_ctx->pending_open_record_frags = true;
-		if (full_record || eor || sk_msg_full(msg_pl)) {
-			ret = bpf_exec_tx_verdict(msg_pl, sk, full_record,
-						  record_type, &copied, flags);
-			if (ret) {
-				if (ret == -EINPROGRESS)
-					num_async++;
-				else if (ret == -ENOMEM)
-					goto wait_for_memory;
-				else if (ret != -EAGAIN) {
-					if (ret == -ENOSPC)
-						ret = 0;
-					goto sendpage_end;
-				}
-			}
-		}
-		continue;
-wait_for_sndbuf:
-		set_bit(SOCK_NOSPACE, &sk->sk_socket->flags);
-wait_for_memory:
-		ret = sk_stream_wait_memory(sk, &timeo);
-		if (ret) {
-			if (ctx->open_rec)
-				tls_trim_both_msgs(sk, msg_pl->sg.size);
-			goto sendpage_end;
-		}
-
-		if (ctx->open_rec)
-			goto alloc_payload;
-	}
-
-	if (num_async) {
-		/* Transmit if any encryptions have completed */
-		if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
-			cancel_delayed_work(&ctx->tx_work.work);
-			tls_tx_records(sk, flags);
-		}
-	}
-sendpage_end:
-	ret = sk_stream_error(sk, flags, ret);
-	return copied > 0 ? copied : ret;
-}
-
-int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
-			   int offset, size_t size, int flags)
-{
-	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
-		      MSG_NO_SHARED_FRAGS))
-		return -EOPNOTSUPP;
-
-	return tls_sw_do_sendpage(sk, page, offset, size, flags);
-}
-
-int tls_sw_sendpage(struct sock *sk, struct page *page,
-		    int offset, size_t size, int flags)
+int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
 	int ret;
 
-	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+			       MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
 
 	ret = mutex_lock_interruptible(&tls_ctx->tx_lock);
 	if (ret)
 		return ret;
 	lock_sock(sk);
-	ret = tls_sw_do_sendpage(sk, page, offset, size, flags);
+	ret = tls_sw_sendmsg_locked(sk, msg, size);
 	release_sock(sk);
 	mutex_unlock(&tls_ctx->tx_lock);
 	return ret;
+}
+
+/*
+ * Handle unexpected EOF during splice without SPLICE_F_MORE set.
+ */
+void tls_sw_splice_eof(struct socket *sock)
+{
+	struct sock *sk = sock->sk;
+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_tx *ctx = tls_sw_ctx_tx(tls_ctx);
+	struct tls_rec *rec;
+	struct sk_msg *msg_pl;
+	ssize_t copied = 0;
+	bool retrying = false;
+	int ret = 0;
+	int pending;
+
+	if (!ctx->open_rec)
+		return;
+
+	mutex_lock(&tls_ctx->tx_lock);
+	lock_sock(sk);
+
+retry:
+	rec = ctx->open_rec;
+	if (!rec)
+		goto unlock;
+
+	msg_pl = &rec->msg_plaintext;
+
+	/* Check the BPF advisor and perform transmission. */
+	ret = bpf_exec_tx_verdict(msg_pl, sk, false, TLS_RECORD_TYPE_DATA,
+				  &copied, 0);
+	switch (ret) {
+	case 0:
+	case -EAGAIN:
+		if (retrying)
+			goto unlock;
+		retrying = true;
+		goto retry;
+	case -EINPROGRESS:
+		break;
+	default:
+		goto unlock;
+	}
+
+	/* Wait for pending encryptions to get completed */
+	spin_lock_bh(&ctx->encrypt_compl_lock);
+	ctx->async_notify = true;
+
+	pending = atomic_read(&ctx->encrypt_pending);
+	spin_unlock_bh(&ctx->encrypt_compl_lock);
+	if (pending)
+		crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+	else
+		reinit_completion(&ctx->async_wait.completion);
+
+	/* There can be no concurrent accesses, since we have no pending
+	 * encrypt operations
+	 */
+	WRITE_ONCE(ctx->async_notify, false);
+
+	if (ctx->async_wait.err)
+		goto unlock;
+
+	/* Transmit if any encryptions have completed */
+	if (test_and_clear_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)) {
+		cancel_delayed_work(&ctx->tx_work.work);
+		tls_tx_records(sk, 0);
+	}
+
+unlock:
+	release_sock(sk);
+	mutex_unlock(&tls_ctx->tx_lock);
+}
+
+int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
+			   int offset, size_t size, int flags)
+{
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
+	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
+		      MSG_NO_SHARED_FRAGS))
+		return -EOPNOTSUPP;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
+
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_sw_sendmsg_locked(sk, &msg, size);
+}
+
+int tls_sw_sendpage(struct sock *sk, struct page *page,
+		    int offset, size_t size, int flags)
+{
+	struct bio_vec bvec;
+	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
+
+	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
+		return -EOPNOTSUPP;
+	if (flags & MSG_SENDPAGE_NOTLAST)
+		msg.msg_flags |= MSG_MORE;
+
+	bvec_set_page(&bvec, page, size, offset);
+	iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, size);
+	return tls_sw_sendmsg(sk, &msg, size);
 }
 
 static int