Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'vsock-virtio-vhost-zerocopy'

Arseniy Krasnov says:

====================
vsock/virtio/vhost: MSG_ZEROCOPY preparations

This patchset is the first of three parts of a larger patchset adding
MSG_ZEROCOPY flag support:
https://lore.kernel.org/netdev/20230701063947.3422088-1-AVKrasnov@sberdevices.ru/

During review of this series, Stefano Garzarella <sgarzare@redhat.com>
suggested splitting it into three parts to simplify review and merging:

1) virtio and vhost updates (for fragged skbs) <--- this patchset
2) AF_VSOCK updates (allow enabling MSG_ZEROCOPY mode and reading
tx completions) and an update for Documentation/.
3) Updates for tests and utils.

This series enables handling of fragged skbs in the virtio and vhost parts.
The new logic won't be triggered yet, because the SO_ZEROCOPY option still
cannot be enabled at this point (the next batch of patches from the big
set above will enable it).

I've included a changelog for some patches anyway, because there were
comments during review of the last big patchset linked above.

Head for this patchset is:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=f2fa1c812c91e99d0317d1fc7d845e1e05f39716

Link to v1:
https://lore.kernel.org/netdev/20230717210051.856388-1-AVKrasnov@sberdevices.ru/
Link to v2:
https://lore.kernel.org/netdev/20230718180237.3248179-1-AVKrasnov@sberdevices.ru/
Link to v3:
https://lore.kernel.org/netdev/20230720214245.457298-1-AVKrasnov@sberdevices.ru/
Link to v4:
https://lore.kernel.org/netdev/20230727222627.1895355-1-AVKrasnov@sberdevices.ru/
Link to v5:
https://lore.kernel.org/netdev/20230730085905.3420811-1-AVKrasnov@sberdevices.ru/
Link to v6:
https://lore.kernel.org/netdev/20230814212720.3679058-1-AVKrasnov@sberdevices.ru/
Link to v7:
https://lore.kernel.org/netdev/20230827085436.941183-1-avkrasnov@salutedevices.com/
Link to v8:
https://lore.kernel.org/netdev/20230911202234.1932024-1-avkrasnov@salutedevices.com/

Changelog:
v3 -> v4:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
v4 -> v5:
* See per-patch changelog after ---.
v5 -> v6:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v6 -> v7:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v7 -> v8:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
v8 -> v9:
* Patchset rebased and tested on new HEAD of net-next (see hash above).
* See per-patch changelog after ---.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+1170 -17
+11 -2
Documentation/networking/msg_zerocopy.rst
···
 =====
 
 The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
-The feature is currently implemented for TCP and UDP sockets.
+The feature is currently implemented for TCP, UDP and VSOCK (with
+virtio transport) sockets.
 
 
 Opportunity and Caveats
···
 is encoded in the standard error format, sock_extended_err.
 
 The level and type fields in the control data are protocol family
-specific, IP_RECVERR or IPV6_RECVERR.
+specific, IP_RECVERR or IPV6_RECVERR (for TCP or UDP socket).
+For VSOCK socket, cmsg_level will be SOL_VSOCK and cmsg_type will be
+VSOCK_RECVERR.
 
 Error origin is the new type SO_EE_ORIGIN_ZEROCOPY. ee_errno is zero,
 as explained before, to avoid blocking read and write system calls on
···
 Loopback
 --------
 
+For TCP and UDP:
 Data sent to local sockets can be queued indefinitely if the receive
 process does not read its socket. Unbound notification latency is not
 acceptable. For this reason all packets generated with MSG_ZEROCOPY
 that are looped to a local socket will incur a deferred copy. This
 includes looping onto packet sockets (e.g., tcpdump) and tun devices.
 
+For VSOCK:
+Data path sent to local sockets is the same as for non-local sockets.
 
 Testing
 =======
···
 namespaces, the test will not show any improvement. For testing, the
 loopback restriction can be temporarily relaxed by making
 skb_orphan_frags_rx identical to skb_orphan_frags.
+
+For VSOCK type of socket example can be found in
+tools/testing/vsock/vsock_test_zerocopy.c.
+7
drivers/vhost/vsock.c
···
 	return val < vq->num;
 }
 
+static bool vhost_transport_msgzerocopy_allow(void)
+{
+	return true;
+}
+
 static bool vhost_transport_seqpacket_allow(u32 remote_cid);
 
 static struct virtio_transport vhost_transport = {
···
 		.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,
 		.seqpacket_allow = vhost_transport_seqpacket_allow,
 		.seqpacket_has_data = virtio_transport_seqpacket_has_data,
+
+		.msgzerocopy_allow = vhost_transport_msgzerocopy_allow,
 
 		.notify_poll_in = virtio_transport_notify_poll_in,
 		.notify_poll_out = virtio_transport_notify_poll_out,
+1
include/linux/socket.h
···
 #define SOL_MPTCP	284
 #define SOL_MCTP	285
 #define SOL_SMC		286
+#define SOL_VSOCK	287
 
 /* IPX options */
 #define IPX_TYPE	1
+7
include/net/af_vsock.h
···
 
 	/* Read a single skb */
 	int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
+
+	/* Zero-copy. */
+	bool (*msgzerocopy_allow)(void);
 };
 
 /**** CORE ****/
···
 {}
 #endif
 
+static inline bool vsock_msgzerocopy_allow(const struct vsock_transport *t)
+{
+	return t->msgzerocopy_allow && t->msgzerocopy_allow();
+}
 #endif /* __AF_VSOCK_H__ */
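The helper added above treats msgzerocopy_allow as an optional callback: a transport that leaves the pointer NULL is taken to not support MSG_ZEROCOPY. A minimal userspace sketch of the same pattern, with a made-up 'struct demo_transport' standing in for 'struct vsock_transport' (all demo_* names are hypothetical):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for 'struct vsock_transport'. */
struct demo_transport {
	/* Optional: NULL means "zerocopy not supported". */
	bool (*msgzerocopy_allow)(void);
};

/* Example callback, mirroring the transports in this commit. */
bool demo_allow(void)
{
	return true;
}

/* Same shape as vsock_msgzerocopy_allow(): short-circuit on a NULL
 * callback so that older transports need no changes.
 */
bool demo_msgzerocopy_allow(const struct demo_transport *t)
{
	assert(t != NULL);

	return t->msgzerocopy_allow && t->msgzerocopy_allow();
}
```

Making the callback optional means only the transports that opt in (vhost, virtio and loopback in this commit) have to be touched.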
+17
include/uapi/linux/vm_sockets.h
···
 
 #define IOCTL_VM_SOCKETS_GET_LOCAL_CID		_IO(7, 0xb9)
 
+/* MSG_ZEROCOPY notifications are encoded in the standard error format,
+ * sock_extended_err. See Documentation/networking/msg_zerocopy.rst in
+ * kernel source tree for more details.
+ */
+
+/* 'cmsg_level' field value of 'struct cmsghdr' for notification parsing
+ * when MSG_ZEROCOPY flag is used on transmissions.
+ */
+
+#define SOL_VSOCK	287
+
+/* 'cmsg_type' field value of 'struct cmsghdr' for notification parsing
+ * when MSG_ZEROCOPY flag is used on transmissions.
+ */
+
+#define VSOCK_RECVERR	1
+
 #endif /* _UAPI_VM_SOCKETS_H */
+61 -2
net/vmw_vsock/af_vsock.c
···
 #include <linux/types.h>
 #include <linux/bitops.h>
 #include <linux/cred.h>
+#include <linux/errqueue.h>
 #include <linux/init.h>
 #include <linux/io.h>
 #include <linux/kernel.h>
···
 #include <linux/workqueue.h>
 #include <net/sock.h>
 #include <net/af_vsock.h>
+#include <uapi/linux/vm_sockets.h>
 
 static int __vsock_bind(struct sock *sk, struct sockaddr_vm *addr);
 static void vsock_sk_destruct(struct sock *sk);
···
 	poll_wait(file, sk_sleep(sk), wait);
 	mask = 0;
 
-	if (sk->sk_err)
+	if (sk->sk_err || !skb_queue_empty_lockless(&sk->sk_error_queue))
 		/* Signify that there has been an error on this socket. */
 		mask |= EPOLLERR;
 
···
 			goto out;
 		}
 
+		if (vsock_msgzerocopy_allow(transport)) {
+			set_bit(SOCK_SUPPORT_ZC, &sk->sk_socket->flags);
+		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
+			/* If this option was set before 'connect()',
+			 * when transport was unknown, check that this
+			 * feature is supported here.
+			 */
+			err = -EOPNOTSUPP;
+			goto out;
+		}
+
 		err = vsock_auto_bind(vsk);
 		if (err)
 			goto out;
···
 		} else {
 			newsock->state = SS_CONNECTED;
 			sock_graft(connected, newsock);
+			if (vsock_msgzerocopy_allow(vconnected->transport))
+				set_bit(SOCK_SUPPORT_ZC,
+					&connected->sk_socket->flags);
 		}
 
 		release_sock(connected);
···
 	const struct vsock_transport *transport;
 	u64 val;
 
-	if (level != AF_VSOCK)
+	if (level != AF_VSOCK && level != SOL_SOCKET)
 		return -ENOPROTOOPT;
 
 #define COPY_IN(_v) \
···
 	lock_sock(sk);
 
 	transport = vsk->transport;
+
+	if (level == SOL_SOCKET) {
+		int zerocopy;
+
+		if (optname != SO_ZEROCOPY) {
+			release_sock(sk);
+			return sock_setsockopt(sock, level, optname, optval, optlen);
+		}
+
+		/* Use 'int' type here, because variable to
+		 * set this option usually has this type.
+		 */
+		COPY_IN(zerocopy);
+
+		if (zerocopy < 0 || zerocopy > 1) {
+			err = -EINVAL;
+			goto exit;
+		}
+
+		if (transport && !vsock_msgzerocopy_allow(transport)) {
+			err = -EOPNOTSUPP;
+			goto exit;
+		}
+
+		sock_valbool_flag(sk, SOCK_ZEROCOPY, zerocopy);
+		goto exit;
+	}
 
 	switch (optname) {
 	case SO_VM_SOCKETS_BUFFER_SIZE:
···
 
 	if (!vsock_addr_bound(&vsk->remote_addr)) {
 		err = -EDESTADDRREQ;
+		goto out;
+	}
+
+	if (msg->msg_flags & MSG_ZEROCOPY &&
+	    !vsock_msgzerocopy_allow(transport)) {
+		err = -EOPNOTSUPP;
 		goto out;
 	}
 
···
 	int err;
 
 	sk = sock->sk;
+
+	if (unlikely(flags & MSG_ERRQUEUE))
+		return sock_recv_errqueue(sk, msg, len, SOL_VSOCK, VSOCK_RECVERR);
+
 	vsk = vsock_sk(sk);
 	err = 0;
 
···
 			return ret;
 		}
 	}
+
+	/* SOCK_DGRAM doesn't have 'setsockopt' callback set in its
+	 * proto_ops, so there is no handler for custom logic.
+	 */
+	if (sock_type_connectible(sock->type))
+		set_bit(SOCK_CUSTOM_SOCKOPT, &sk->sk_socket->flags);
 
 	vsock_insert_unbound(vsk);
 
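The af_vsock.c hunks above check SO_ZEROCOPY at two points: in setsockopt(), where the transport may still be unknown, and again in connect(), once the transport has been assigned. The ordering can be modelled in plain userspace C as below. This is only a simplified sketch: the demo_* types and functions are hypothetical, locking is omitted, and transport assignment is folded into the connect step.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures. */
struct demo_transport {
	bool zc_supported;
};

struct demo_sock {
	const struct demo_transport *transport;	/* NULL before connect() */
	bool so_zerocopy;			/* models SOCK_ZEROCOPY */
	bool support_zc;			/* models SOCK_SUPPORT_ZC */
};

/* Models the SOL_SOCKET/SO_ZEROCOPY branch of setsockopt(). */
int demo_set_so_zerocopy(struct demo_sock *sk, int val)
{
	assert(sk != NULL);

	if (val < 0 || val > 1)
		return -EINVAL;

	/* Only reject when the transport is already known. */
	if (sk->transport && !sk->transport->zc_supported)
		return -EOPNOTSUPP;

	sk->so_zerocopy = val;
	return 0;
}

/* Models the check added to connect(). */
int demo_connect_check(struct demo_sock *sk, const struct demo_transport *t)
{
	assert(sk != NULL && t != NULL);

	sk->transport = t;

	if (t->zc_supported) {
		sk->support_zc = true;
		return 0;
	}

	/* Option was set before the transport was known. */
	if (sk->so_zerocopy)
		return -EOPNOTSUPP;

	return 0;
}
```

The second check exists because a socket can have the option set while its transport is still NULL; without it, an unsupported transport would silently inherit the flag.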
+7
net/vmw_vsock/virtio_transport.c
···
 	return res;
 }
 
+static bool virtio_transport_msgzerocopy_allow(void)
+{
+	return true;
+}
+
 static bool virtio_transport_seqpacket_allow(u32 remote_cid);
 
 static struct virtio_transport virtio_transport = {
···
 		.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,
 		.seqpacket_allow = virtio_transport_seqpacket_allow,
 		.seqpacket_has_data = virtio_transport_seqpacket_has_data,
+
+		.msgzerocopy_allow = virtio_transport_msgzerocopy_allow,
 
 		.notify_poll_in = virtio_transport_notify_poll_in,
 		.notify_poll_out = virtio_transport_notify_poll_out,
+6
net/vmw_vsock/vsock_loopback.c
···
 }
 
 static bool vsock_loopback_seqpacket_allow(u32 remote_cid);
+static bool vsock_loopback_msgzerocopy_allow(void)
+{
+	return true;
+}
 
 static struct virtio_transport loopback_transport = {
 	.transport = {
···
 		.seqpacket_enqueue = virtio_transport_seqpacket_enqueue,
 		.seqpacket_allow = vsock_loopback_seqpacket_allow,
 		.seqpacket_has_data = virtio_transport_seqpacket_has_data,
+
+		.msgzerocopy_allow = vsock_loopback_msgzerocopy_allow,
 
 		.notify_poll_in = virtio_transport_notify_poll_in,
 		.notify_poll_out = virtio_transport_notify_poll_out,
+1
tools/testing/vsock/.gitignore
···
 vsock_test
 vsock_diag_test
 vsock_perf
+vsock_uring_test
+7 -4
tools/testing/vsock/Makefile
···
 # SPDX-License-Identifier: GPL-2.0-only
 all: test vsock_perf
-test: vsock_test vsock_diag_test
-vsock_test: vsock_test.o timeout.o control.o util.o
+test: vsock_test vsock_diag_test vsock_uring_test
+vsock_test: vsock_test.o vsock_test_zerocopy.o timeout.o control.o util.o msg_zerocopy_common.o
 vsock_diag_test: vsock_diag_test.o timeout.o control.o util.o
-vsock_perf: vsock_perf.o
+vsock_perf: vsock_perf.o msg_zerocopy_common.o
+
+vsock_uring_test: LDLIBS = -luring
+vsock_uring_test: control.o util.o vsock_uring_test.o timeout.o msg_zerocopy_common.o
 
 CFLAGS += -g -O2 -Werror -Wall -I. -I../../include -I../../../usr/include -Wno-pointer-sign -fno-strict-overflow -fno-strict-aliasing -fno-common -MMD -U_FORTIFY_SOURCE -D_GNU_SOURCE
 .PHONY: all test clean
 clean:
-	${RM} *.o *.d vsock_test vsock_diag_test vsock_perf
+	${RM} *.o *.d vsock_test vsock_diag_test vsock_perf vsock_uring_test
 -include *.d
+87
tools/testing/vsock/msg_zerocopy_common.c
···
+// SPDX-License-Identifier: GPL-2.0-only
+/* Some common code for MSG_ZEROCOPY logic
+ *
+ * Copyright (C) 2023 SberDevices.
+ *
+ * Author: Arseniy Krasnov <avkrasnov@salutedevices.com>
+ */
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <sys/types.h>
+#include <sys/socket.h>
+#include <linux/errqueue.h>
+
+#include "msg_zerocopy_common.h"
+
+void enable_so_zerocopy(int fd)
+{
+	int val = 1;
+
+	if (setsockopt(fd, SOL_SOCKET, SO_ZEROCOPY, &val, sizeof(val))) {
+		perror("setsockopt");
+		exit(EXIT_FAILURE);
+	}
+}
+
+void vsock_recv_completion(int fd, const bool *zerocopied)
+{
+	struct sock_extended_err *serr;
+	struct msghdr msg = { 0 };
+	char cmsg_data[128];
+	struct cmsghdr *cm;
+	ssize_t res;
+
+	msg.msg_control = cmsg_data;
+	msg.msg_controllen = sizeof(cmsg_data);
+
+	res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+	if (res) {
+		fprintf(stderr, "failed to read error queue: %zi\n", res);
+		exit(EXIT_FAILURE);
+	}
+
+	cm = CMSG_FIRSTHDR(&msg);
+	if (!cm) {
+		fprintf(stderr, "cmsg: no cmsg\n");
+		exit(EXIT_FAILURE);
+	}
+
+	if (cm->cmsg_level != SOL_VSOCK) {
+		fprintf(stderr, "cmsg: unexpected 'cmsg_level'\n");
+		exit(EXIT_FAILURE);
+	}
+
+	if (cm->cmsg_type != VSOCK_RECVERR) {
+		fprintf(stderr, "cmsg: unexpected 'cmsg_type'\n");
+		exit(EXIT_FAILURE);
+	}
+
+	serr = (void *)CMSG_DATA(cm);
+	if (serr->ee_origin != SO_EE_ORIGIN_ZEROCOPY) {
+		fprintf(stderr, "serr: wrong origin: %u\n", serr->ee_origin);
+		exit(EXIT_FAILURE);
+	}
+
+	if (serr->ee_errno) {
+		fprintf(stderr, "serr: wrong error code: %u\n", serr->ee_errno);
+		exit(EXIT_FAILURE);
+	}
+
+	/* This flag is used for tests, to check that transmission was
+	 * performed as expected: zerocopy or fallback to copy. If NULL
+	 * - don't care.
+	 */
+	if (!zerocopied)
+		return;
+
+	if (*zerocopied && (serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
+		fprintf(stderr, "serr: was copy instead of zerocopy\n");
+		exit(EXIT_FAILURE);
+	}
+
+	if (!*zerocopied && !(serr->ee_code & SO_EE_CODE_ZEROCOPY_COPIED)) {
+		fprintf(stderr, "serr: was zerocopy instead of copy\n");
+		exit(EXIT_FAILURE);
+	}
+}
+18
tools/testing/vsock/msg_zerocopy_common.h
···
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef MSG_ZEROCOPY_COMMON_H
+#define MSG_ZEROCOPY_COMMON_H
+
+#include <stdbool.h>
+
+#ifndef SOL_VSOCK
+#define SOL_VSOCK	287
+#endif
+
+#ifndef VSOCK_RECVERR
+#define VSOCK_RECVERR	1
+#endif
+
+void enable_so_zerocopy(int fd);
+void vsock_recv_completion(int fd, const bool *zerocopied);
+
+#endif /* MSG_ZEROCOPY_COMMON_H */
+133
tools/testing/vsock/util.c
···
 #include <stdio.h>
 #include <stdint.h>
 #include <stdlib.h>
+#include <string.h>
 #include <signal.h>
 #include <unistd.h>
 #include <assert.h>
 #include <sys/epoll.h>
+#include <sys/mman.h>
 
 #include "timeout.h"
 #include "control.h"
···
 	}
 
 	return hash;
+}
+
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum)
+{
+	size_t bytes;
+	int i;
+
+	for (bytes = 0, i = 0; i < iovnum; i++)
+		bytes += iov[i].iov_len;
+
+	return bytes;
+}
+
+unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum)
+{
+	unsigned long hash;
+	size_t iov_bytes;
+	size_t offs;
+	void *tmp;
+	int i;
+
+	iov_bytes = iovec_bytes(iov, iovnum);
+
+	tmp = malloc(iov_bytes);
+	if (!tmp) {
+		perror("malloc");
+		exit(EXIT_FAILURE);
+	}
+
+	for (offs = 0, i = 0; i < iovnum; i++) {
+		memcpy(tmp + offs, iov[i].iov_base, iov[i].iov_len);
+		offs += iov[i].iov_len;
+	}
+
+	hash = hash_djb2(tmp, iov_bytes);
+	free(tmp);
+
+	return hash;
+}
+
+/* Allocates and returns new 'struct iovec *' according pattern
+ * in the 'test_iovec'. For each element in the 'test_iovec' it
+ * allocates new element in the resulting 'iovec'. 'iov_len'
+ * of the new element is copied from 'test_iovec'. 'iov_base' is
+ * allocated depending on the 'iov_base' of 'test_iovec':
+ *
+ * 'iov_base' == NULL -> valid buf: mmap('iov_len').
+ *
+ * 'iov_base' == MAP_FAILED -> invalid buf:
+ *               mmap('iov_len'), then munmap('iov_len').
+ *               'iov_base' still contains result of
+ *               mmap().
+ *
+ * 'iov_base' == number -> unaligned valid buf:
+ *               mmap('iov_len') + number.
+ *
+ * 'iovnum' is number of elements in 'test_iovec'.
+ *
+ * Returns new 'iovec' or calls 'exit()' on error.
+ */
+struct iovec *alloc_test_iovec(const struct iovec *test_iovec, int iovnum)
+{
+	struct iovec *iovec;
+	int i;
+
+	iovec = malloc(sizeof(*iovec) * iovnum);
+	if (!iovec) {
+		perror("malloc");
+		exit(EXIT_FAILURE);
+	}
+
+	for (i = 0; i < iovnum; i++) {
+		iovec[i].iov_len = test_iovec[i].iov_len;
+
+		iovec[i].iov_base = mmap(NULL, iovec[i].iov_len,
+					 PROT_READ | PROT_WRITE,
+					 MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE,
+					 -1, 0);
+		if (iovec[i].iov_base == MAP_FAILED) {
+			perror("mmap");
+			exit(EXIT_FAILURE);
+		}
+
+		if (test_iovec[i].iov_base != MAP_FAILED)
+			iovec[i].iov_base += (uintptr_t)test_iovec[i].iov_base;
+	}
+
+	/* Unmap "invalid" elements. */
+	for (i = 0; i < iovnum; i++) {
+		if (test_iovec[i].iov_base == MAP_FAILED) {
+			if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+				perror("munmap");
+				exit(EXIT_FAILURE);
+			}
+		}
+	}
+
+	for (i = 0; i < iovnum; i++) {
+		int j;
+
+		if (test_iovec[i].iov_base == MAP_FAILED)
+			continue;
+
+		for (j = 0; j < iovec[i].iov_len; j++)
+			((uint8_t *)iovec[i].iov_base)[j] = rand() & 0xff;
+	}
+
+	return iovec;
+}
+
+/* Frees 'iovec *', previously allocated by 'alloc_test_iovec()'.
+ * On error calls 'exit()'.
+ */
+void free_test_iovec(const struct iovec *test_iovec,
+		     struct iovec *iovec, int iovnum)
+{
+	int i;
+
+	for (i = 0; i < iovnum; i++) {
+		if (test_iovec[i].iov_base != MAP_FAILED) {
+			if (test_iovec[i].iov_base)
+				iovec[i].iov_base -= (uintptr_t)test_iovec[i].iov_base;
+
+			if (munmap(iovec[i].iov_base, iovec[i].iov_len)) {
+				perror("munmap");
+				exit(EXIT_FAILURE);
+			}
+		}
+	}
+
+	free(iovec);
 }
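The iovec helpers in util.c above first sum the segment lengths and then gather the segments into one contiguous buffer before hashing. A small standalone sketch of that sum-then-gather step follows. The demo_* names are hypothetical; unlike iovec_hash_djb2(), this version returns NULL on allocation failure instead of calling exit(), and it skips zero-length segments.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Total payload length of an iovec array. */
size_t demo_iovec_bytes(const struct iovec *iov, size_t iovnum)
{
	size_t bytes = 0;
	size_t i;

	for (i = 0; i < iovnum; i++)
		bytes += iov[i].iov_len;

	return bytes;
}

/* Gather all segments into one malloc()ed buffer. The caller frees
 * the result. Returns NULL if the allocation fails.
 */
void *demo_iovec_flatten(const struct iovec *iov, size_t iovnum, size_t *len)
{
	size_t total = demo_iovec_bytes(iov, iovnum);
	size_t offs = 0;
	size_t i;
	char *buf;

	assert(len != NULL);

	/* Allocate at least one byte so that an empty iovec still
	 * yields a valid pointer.
	 */
	buf = malloc(total ? total : 1);
	if (!buf)
		return NULL;

	for (i = 0; i < iovnum; i++) {
		if (!iov[i].iov_len)
			continue;

		memcpy(buf + offs, iov[i].iov_base, iov[i].iov_len);
		offs += iov[i].iov_len;
	}

	*len = total;
	return buf;
}
```

Hashing the flattened copy lets the sender compare against a receiver that read the same bytes into a single buffer, which is how the client and server tests in this commit agree on the payload.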
+5
tools/testing/vsock/util.h
···
 void skip_test(struct test_case *test_cases, size_t test_cases_len,
 	       const char *test_id_str);
 unsigned long hash_djb2(const void *data, size_t len);
+size_t iovec_bytes(const struct iovec *iov, size_t iovnum);
+unsigned long iovec_hash_djb2(const struct iovec *iov, size_t iovnum);
+struct iovec *alloc_test_iovec(const struct iovec *test_iovec, int iovnum);
+void free_test_iovec(const struct iovec *test_iovec,
+		     struct iovec *iovec, int iovnum);
 #endif /* UTIL_H */
+71 -9
tools/testing/vsock/vsock_perf.c
···
 #include <poll.h>
 #include <sys/socket.h>
 #include <linux/vm_sockets.h>
+#include <sys/mman.h>
+
+#include "msg_zerocopy_common.h"
 
 #define DEFAULT_BUF_SIZE_BYTES	(128 * 1024)
 #define DEFAULT_TO_SEND_BYTES	(64 * 1024)
···
 static unsigned int port = DEFAULT_PORT;
 static unsigned long buf_size_bytes = DEFAULT_BUF_SIZE_BYTES;
 static unsigned long vsock_buf_bytes = DEFAULT_VSOCK_BUF_BYTES;
+static bool zerocopy;
 
 static void error(const char *s)
 {
···
 	time_t tx_begin_ns;
 	time_t tx_total_ns;
 	size_t total_send;
+	time_t time_in_send;
 	void *data;
 	int fd;
 
-	printf("Run as sender\n");
+	if (zerocopy)
+		printf("Run as sender MSG_ZEROCOPY\n");
+	else
+		printf("Run as sender\n");
+
 	printf("Connect to %i:%u\n", peer_cid, port);
 	printf("Send %lu bytes\n", to_send_bytes);
 	printf("TX buffer %lu bytes\n", buf_size_bytes);
···
 	if (fd < 0)
 		exit(EXIT_FAILURE);
 
-	data = malloc(buf_size_bytes);
+	if (zerocopy) {
+		enable_so_zerocopy(fd);
 
-	if (!data) {
-		fprintf(stderr, "'malloc()' failed\n");
-		exit(EXIT_FAILURE);
+		data = mmap(NULL, buf_size_bytes, PROT_READ | PROT_WRITE,
+			    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+		if (data == MAP_FAILED) {
+			perror("mmap");
+			exit(EXIT_FAILURE);
+		}
+	} else {
+		data = malloc(buf_size_bytes);
+
+		if (!data) {
+			fprintf(stderr, "'malloc()' failed\n");
+			exit(EXIT_FAILURE);
+		}
 	}
 
 	memset(data, 0, buf_size_bytes);
 	total_send = 0;
+	time_in_send = 0;
 	tx_begin_ns = current_nsec();
 
 	while (total_send < to_send_bytes) {
 		ssize_t sent;
+		size_t rest_bytes;
+		time_t before;
 
-		sent = write(fd, data, buf_size_bytes);
+		rest_bytes = to_send_bytes - total_send;
+
+		before = current_nsec();
+		sent = send(fd, data, (rest_bytes > buf_size_bytes) ?
+			    buf_size_bytes : rest_bytes,
+			    zerocopy ? MSG_ZEROCOPY : 0);
+		time_in_send += (current_nsec() - before);
 
 		if (sent <= 0)
 			error("write");
 
 		total_send += sent;
+
+		if (zerocopy) {
+			struct pollfd fds = { 0 };
+
+			fds.fd = fd;
+
+			if (poll(&fds, 1, -1) < 0) {
+				perror("poll");
+				exit(EXIT_FAILURE);
+			}
+
+			if (!(fds.revents & POLLERR)) {
+				fprintf(stderr, "POLLERR expected\n");
+				exit(EXIT_FAILURE);
+			}
+
+			vsock_recv_completion(fd, NULL);
+		}
 	}
 
 	tx_total_ns = current_nsec() - tx_begin_ns;
 
 	printf("total bytes sent: %zu\n", total_send);
 	printf("tx performance: %f Gbits/s\n",
-	       get_gbps(total_send * 8, tx_total_ns));
-	printf("total time in 'write()': %f sec\n",
+	       get_gbps(total_send * 8, time_in_send));
+	printf("total time in tx loop: %f sec\n",
 	       (float)tx_total_ns / NSEC_PER_SEC);
+	printf("time in 'send()': %f sec\n",
+	       (float)time_in_send / NSEC_PER_SEC);
 
 	close(fd);
-	free(data);
+
+	if (zerocopy)
+		munmap(data, buf_size_bytes);
+	else
+		free(data);
 }
 
 static const char optstring[] = "";
···
 		.has_arg = required_argument,
 		.val = 'R',
 	},
+	{
+		.name = "zerocopy",
+		.has_arg = no_argument,
+		.val = 'Z',
+	},
 	{},
 };
 
···
 	       " --help This message\n"
 	       " --sender <cid> Sender mode (receiver default)\n"
 	       " <cid> of the receiver to connect to\n"
+	       " --zerocopy Enable zerocopy (for sender mode only)\n"
 	       " --port <port> Port (default %d)\n"
 	       " --bytes <bytes>KMG Bytes to send (default %d)\n"
 	       " --buf-size <bytes>KMG Data buffer size (default %d). In sender mode\n"
···
 			break;
 		case 'H': /* Help. */
 			usage();
+			break;
+		case 'Z': /* Zerocopy. */
+			zerocopy = true;
 			break;
 		default:
 			usage();
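One behavioural change in the vsock_perf.c sender loop above is that each send() is now clamped to the bytes still outstanding, where the old write() always passed the full buffer size. The clamping arithmetic can be sketched on its own as below. The demo_* names are hypothetical, and the loop assumes every call transfers the whole chunk, which a real send() does not guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Size of the next chunk: the remaining bytes, capped at the buffer
 * size. Mirrors the '(rest_bytes > buf_size_bytes) ? ...' expression
 * in the diff.
 */
size_t demo_next_chunk(size_t to_send, size_t total_sent, size_t buf_size)
{
	size_t rest;

	assert(total_sent <= to_send);

	rest = to_send - total_sent;
	return rest > buf_size ? buf_size : rest;
}

/* Number of send calls needed, assuming no short sends. */
size_t demo_count_sends(size_t to_send, size_t buf_size)
{
	size_t sent = 0;
	size_t calls = 0;

	assert(buf_size > 0);

	while (sent < to_send) {
		sent += demo_next_chunk(to_send, sent, buf_size);
		calls++;
	}

	return calls;
}
```

With the clamp in place the loop can end with total_send equal to to_send_bytes even when that total is not a multiple of the buffer size.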
+16
tools/testing/vsock/vsock_test.c
···
 #include <poll.h>
 #include <signal.h>
 
+#include "vsock_test_zerocopy.h"
 #include "timeout.h"
 #include "control.h"
 #include "util.h"
···
 		.name = "SOCK_STREAM SHUT_RD",
 		.run_client = test_stream_shutrd_client,
 		.run_server = test_stream_shutrd_server,
+	},
+	{
+		.name = "SOCK_STREAM MSG_ZEROCOPY",
+		.run_client = test_stream_msgzcopy_client,
+		.run_server = test_stream_msgzcopy_server,
+	},
+	{
+		.name = "SOCK_SEQPACKET MSG_ZEROCOPY",
+		.run_client = test_seqpacket_msgzcopy_client,
+		.run_server = test_seqpacket_msgzcopy_server,
+	},
+	{
+		.name = "SOCK_STREAM MSG_ZEROCOPY empty MSG_ERRQUEUE",
+		.run_client = test_stream_msgzcopy_empty_errq_client,
+		.run_server = test_stream_msgzcopy_empty_errq_server,
 	},
 	{},
 };
+358
tools/testing/vsock/vsock_test_zerocopy.c
··· 1 + // SPDX-License-Identifier: GPL-2.0-only 2 + /* MSG_ZEROCOPY feature tests for vsock 3 + * 4 + * Copyright (C) 2023 SberDevices. 5 + * 6 + * Author: Arseniy Krasnov <avkrasnov@salutedevices.com> 7 + */ 8 + 9 + #include <stdio.h> 10 + #include <stdlib.h> 11 + #include <string.h> 12 + #include <sys/mman.h> 13 + #include <unistd.h> 14 + #include <poll.h> 15 + #include <linux/errqueue.h> 16 + #include <linux/kernel.h> 17 + #include <errno.h> 18 + 19 + #include "control.h" 20 + #include "vsock_test_zerocopy.h" 21 + #include "msg_zerocopy_common.h" 22 + 23 + #ifndef PAGE_SIZE 24 + #define PAGE_SIZE 4096 25 + #endif 26 + 27 + #define VSOCK_TEST_DATA_MAX_IOV 3 28 + 29 + struct vsock_test_data { 30 + /* This test case if for SOCK_STREAM only. */ 31 + bool stream_only; 32 + /* Data must be zerocopied. This field is checked against 33 + * field 'ee_code' of the 'struct sock_extended_err', which 34 + * contains bit to detect that zerocopy transmission was 35 + * fallbacked to copy mode. 36 + */ 37 + bool zerocopied; 38 + /* Enable SO_ZEROCOPY option on the socket. Without enabled 39 + * SO_ZEROCOPY, every MSG_ZEROCOPY transmission will behave 40 + * like without MSG_ZEROCOPY flag. 41 + */ 42 + bool so_zerocopy; 43 + /* 'errno' after 'sendmsg()' call. */ 44 + int sendmsg_errno; 45 + /* Number of valid elements in 'vecs'. */ 46 + int vecs_cnt; 47 + struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV]; 48 + }; 49 + 50 + static struct vsock_test_data test_data_array[] = { 51 + /* Last element has non-page aligned size. */ 52 + { 53 + .zerocopied = true, 54 + .so_zerocopy = true, 55 + .sendmsg_errno = 0, 56 + .vecs_cnt = 3, 57 + { 58 + { NULL, PAGE_SIZE }, 59 + { NULL, PAGE_SIZE }, 60 + { NULL, 200 } 61 + } 62 + }, 63 + /* All elements have page aligned base and size. 
*/ 64 + { 65 + .zerocopied = true, 66 + .so_zerocopy = true, 67 + .sendmsg_errno = 0, 68 + .vecs_cnt = 3, 69 + { 70 + { NULL, PAGE_SIZE }, 71 + { NULL, PAGE_SIZE * 2 }, 72 + { NULL, PAGE_SIZE * 3 } 73 + } 74 + }, 75 + /* All elements have page aligned base and size. But 76 + * data length is bigger than 64Kb. 77 + */ 78 + { 79 + .zerocopied = true, 80 + .so_zerocopy = true, 81 + .sendmsg_errno = 0, 82 + .vecs_cnt = 3, 83 + { 84 + { NULL, PAGE_SIZE * 16 }, 85 + { NULL, PAGE_SIZE * 16 }, 86 + { NULL, PAGE_SIZE * 16 } 87 + } 88 + }, 89 + /* Middle element has both non-page aligned base and size. */ 90 + { 91 + .zerocopied = true, 92 + .so_zerocopy = true, 93 + .sendmsg_errno = 0, 94 + .vecs_cnt = 3, 95 + { 96 + { NULL, PAGE_SIZE }, 97 + { (void *)1, 100 }, 98 + { NULL, PAGE_SIZE } 99 + } 100 + }, 101 + /* Middle element is unmapped. */ 102 + { 103 + .zerocopied = false, 104 + .so_zerocopy = true, 105 + .sendmsg_errno = ENOMEM, 106 + .vecs_cnt = 3, 107 + { 108 + { NULL, PAGE_SIZE }, 109 + { MAP_FAILED, PAGE_SIZE }, 110 + { NULL, PAGE_SIZE } 111 + } 112 + }, 113 + /* Valid data, but SO_ZEROCOPY is off. This 114 + * will trigger fallback to copy. 115 + */ 116 + { 117 + .zerocopied = false, 118 + .so_zerocopy = false, 119 + .sendmsg_errno = 0, 120 + .vecs_cnt = 1, 121 + { 122 + { NULL, PAGE_SIZE } 123 + } 124 + }, 125 + /* Valid data, but message is bigger than peer's 126 + * buffer, so this will trigger fallback to copy. 127 + * This test is for SOCK_STREAM only, because 128 + * for SOCK_SEQPACKET, 'sendmsg()' returns EMSGSIZE. 
+	 */
+	{
+		.stream_only = true,
+		.zerocopied = false,
+		.so_zerocopy = true,
+		.sendmsg_errno = 0,
+		.vecs_cnt = 1,
+		{
+			{ NULL, 100 * PAGE_SIZE }
+		}
+	},
+};
+
+#define POLL_TIMEOUT_MS		100
+
+static void test_client(const struct test_opts *opts,
+			const struct vsock_test_data *test_data,
+			bool sock_seqpacket)
+{
+	struct pollfd fds = { 0 };
+	struct msghdr msg = { 0 };
+	ssize_t sendmsg_res;
+	struct iovec *iovec;
+	int fd;
+
+	if (sock_seqpacket)
+		fd = vsock_seqpacket_connect(opts->peer_cid, 1234);
+	else
+		fd = vsock_stream_connect(opts->peer_cid, 1234);
+
+	if (fd < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	if (test_data->so_zerocopy)
+		enable_so_zerocopy(fd);
+
+	iovec = alloc_test_iovec(test_data->vecs, test_data->vecs_cnt);
+
+	msg.msg_iov = iovec;
+	msg.msg_iovlen = test_data->vecs_cnt;
+
+	errno = 0;
+
+	sendmsg_res = sendmsg(fd, &msg, MSG_ZEROCOPY);
+	if (errno != test_data->sendmsg_errno) {
+		fprintf(stderr, "expected 'errno' == %i, got %i\n",
+			test_data->sendmsg_errno, errno);
+		exit(EXIT_FAILURE);
+	}
+
+	if (!errno) {
+		if (sendmsg_res != iovec_bytes(iovec, test_data->vecs_cnt)) {
+			fprintf(stderr, "expected 'sendmsg()' == %li, got %li\n",
+				iovec_bytes(iovec, test_data->vecs_cnt),
+				sendmsg_res);
+			exit(EXIT_FAILURE);
+		}
+	}
+
+	fds.fd = fd;
+	fds.events = 0;
+
+	if (poll(&fds, 1, POLL_TIMEOUT_MS) < 0) {
+		perror("poll");
+		exit(EXIT_FAILURE);
+	}
+
+	if (fds.revents & POLLERR) {
+		vsock_recv_completion(fd, &test_data->zerocopied);
+	} else if (test_data->so_zerocopy && !test_data->sendmsg_errno) {
+		/* If we don't have data in the error queue, but
+		 * SO_ZEROCOPY was enabled and 'sendmsg()' was
+		 * successful - this is an error.
+		 */
+		fprintf(stderr, "POLLERR expected\n");
+		exit(EXIT_FAILURE);
+	}
+
+	if (!test_data->sendmsg_errno)
+		control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+	else
+		control_writeulong(0);
+
+	control_writeln("DONE");
+	free_test_iovec(test_data->vecs, iovec, test_data->vecs_cnt);
+	close(fd);
+}
+
+void test_stream_msgzcopy_client(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+		test_client(opts, &test_data_array[i], false);
+}
+
+void test_seqpacket_msgzcopy_client(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
+		if (test_data_array[i].stream_only)
+			continue;
+
+		test_client(opts, &test_data_array[i], true);
+	}
+}
+
+static void test_server(const struct test_opts *opts,
+			const struct vsock_test_data *test_data,
+			bool sock_seqpacket)
+{
+	unsigned long remote_hash;
+	unsigned long local_hash;
+	ssize_t total_bytes_rec;
+	unsigned char *data;
+	size_t data_len;
+	int fd;
+
+	if (sock_seqpacket)
+		fd = vsock_seqpacket_accept(VMADDR_CID_ANY, 1234, NULL);
+	else
+		fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
+
+	data = malloc(data_len);
+	if (!data) {
+		perror("malloc");
+		exit(EXIT_FAILURE);
+	}
+
+	total_bytes_rec = 0;
+
+	while (total_bytes_rec != data_len) {
+		ssize_t bytes_rec;
+
+		bytes_rec = read(fd, data + total_bytes_rec,
+				 data_len - total_bytes_rec);
+		if (bytes_rec <= 0)
+			break;
+
+		total_bytes_rec += bytes_rec;
+	}
+
+	if (test_data->sendmsg_errno == 0)
+		local_hash = hash_djb2(data, data_len);
+	else
+		local_hash = 0;
+
+	free(data);
+
+	/* Waiting for some result. */
+	remote_hash = control_readulong();
+	if (remote_hash != local_hash) {
+		fprintf(stderr, "hash mismatch\n");
+		exit(EXIT_FAILURE);
+	}
+
+	control_expectln("DONE");
+	close(fd);
+}
+
+void test_stream_msgzcopy_server(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+		test_server(opts, &test_data_array[i], false);
+}
+
+void test_seqpacket_msgzcopy_server(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++) {
+		if (test_data_array[i].stream_only)
+			continue;
+
+		test_server(opts, &test_data_array[i], true);
+	}
+}
+
+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts)
+{
+	struct msghdr msg = { 0 };
+	char cmsg_data[128];
+	ssize_t res;
+	int fd;
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	if (fd < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	msg.msg_control = cmsg_data;
+	msg.msg_controllen = sizeof(cmsg_data);
+
+	res = recvmsg(fd, &msg, MSG_ERRQUEUE);
+	if (res != -1) {
+		fprintf(stderr, "expected 'recvmsg(2)' failure, got %zi\n",
+			res);
+		exit(EXIT_FAILURE);
+	}
+
+	control_writeln("DONE");
+	close(fd);
+}
+
+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts)
+{
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	control_expectln("DONE");
+	close(fd);
+}
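The client and server in this file agree on the payload by exchanging a djb2 hash over the control channel: the client hashes its iovec, the server hashes the bytes it received. The helpers `hash_djb2()` and `iovec_hash_djb2()` are not part of this file, so the following is only a minimal sketch of the classic djb2 algorithm, with hypothetical `sketch_` names, to illustrate why hashing a scattered iovec and hashing the reassembled buffer can be compared directly. It is not claimed to match the helpers' actual implementation.

```c
#include <stddef.h>
#include <sys/uio.h>

/* Classic djb2: hash = hash * 33 + byte, seeded with 5381.
 * Hypothetical stand-in for the test suite's hash helper.
 */
static unsigned long sketch_hash_djb2(const void *data, size_t len)
{
	const unsigned char *p = data;
	unsigned long hash = 5381;
	size_t i;

	for (i = 0; i < len; i++)
		hash = ((hash << 5) + hash) + p[i];

	return hash;
}

/* Hash all iovec elements in order, as if they formed one buffer.
 * Hypothetical stand-in; it walks the elements without copying.
 */
static unsigned long sketch_iovec_hash_djb2(const struct iovec *vecs,
					    size_t cnt)
{
	unsigned long hash = 5381;
	size_t i, j;

	for (i = 0; i < cnt; i++) {
		const unsigned char *p = vecs[i].iov_base;

		for (j = 0; j < vecs[i].iov_len; j++)
			hash = ((hash << 5) + hash) + p[j];
	}

	return hash;
}
```

Because djb2 is a running hash over the byte stream, the way the sender splits the data across iovec elements does not affect the result, which is what lets the receiver check a single contiguous buffer.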
+15
tools/testing/vsock/vsock_test_zerocopy.h
+/* SPDX-License-Identifier: GPL-2.0-only */
+#ifndef VSOCK_TEST_ZEROCOPY_H
+#define VSOCK_TEST_ZEROCOPY_H
+#include "util.h"
+
+void test_stream_msgzcopy_client(const struct test_opts *opts);
+void test_stream_msgzcopy_server(const struct test_opts *opts);
+
+void test_seqpacket_msgzcopy_client(const struct test_opts *opts);
+void test_seqpacket_msgzcopy_server(const struct test_opts *opts);
+
+void test_stream_msgzcopy_empty_errq_client(const struct test_opts *opts);
+void test_stream_msgzcopy_empty_errq_server(const struct test_opts *opts);
+
+#endif /* VSOCK_TEST_ZEROCOPY_H */
+342
tools/testing/vsock/vsock_uring_test.c
+// SPDX-License-Identifier: GPL-2.0-only
+/* io_uring tests for vsock
+ *
+ * Copyright (C) 2023 SberDevices.
+ *
+ * Author: Arseniy Krasnov <avkrasnov@salutedevices.com>
+ */
+
+#include <getopt.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <liburing.h>
+#include <unistd.h>
+#include <sys/mman.h>
+#include <linux/kernel.h>
+#include <error.h>
+
+#include "util.h"
+#include "control.h"
+#include "msg_zerocopy_common.h"
+
+#ifndef PAGE_SIZE
+#define PAGE_SIZE		4096
+#endif
+
+#define RING_ENTRIES_NUM	4
+
+#define VSOCK_TEST_DATA_MAX_IOV 3
+
+struct vsock_io_uring_test {
+	/* Number of valid elements in 'vecs'. */
+	int vecs_cnt;
+	struct iovec vecs[VSOCK_TEST_DATA_MAX_IOV];
+};
+
+static struct vsock_io_uring_test test_data_array[] = {
+	/* All elements have page aligned base and size. */
+	{
+		.vecs_cnt = 3,
+		{
+			{ NULL, PAGE_SIZE },
+			{ NULL, 2 * PAGE_SIZE },
+			{ NULL, 3 * PAGE_SIZE },
+		}
+	},
+	/* Middle element has both non-page aligned base and size. */
+	{
+		.vecs_cnt = 3,
+		{
+			{ NULL, PAGE_SIZE },
+			{ (void *)1, 200 },
+			{ NULL, 3 * PAGE_SIZE },
+		}
+	}
+};
+
+static void vsock_io_uring_client(const struct test_opts *opts,
+				  const struct vsock_io_uring_test *test_data,
+				  bool msg_zerocopy)
+{
+	struct io_uring_sqe *sqe;
+	struct io_uring_cqe *cqe;
+	struct io_uring ring;
+	struct iovec *iovec;
+	struct msghdr msg;
+	int fd;
+
+	fd = vsock_stream_connect(opts->peer_cid, 1234);
+	if (fd < 0) {
+		perror("connect");
+		exit(EXIT_FAILURE);
+	}
+
+	if (msg_zerocopy)
+		enable_so_zerocopy(fd);
+
+	iovec = alloc_test_iovec(test_data->vecs, test_data->vecs_cnt);
+
+	if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
+		error(1, errno, "io_uring_queue_init");
+
+	if (io_uring_register_buffers(&ring, iovec, test_data->vecs_cnt))
+		error(1, errno, "io_uring_register_buffers");
+
+	memset(&msg, 0, sizeof(msg));
+	msg.msg_iov = iovec;
+	msg.msg_iovlen = test_data->vecs_cnt;
+	sqe = io_uring_get_sqe(&ring);
+
+	if (msg_zerocopy)
+		io_uring_prep_sendmsg_zc(sqe, fd, &msg, 0);
+	else
+		io_uring_prep_sendmsg(sqe, fd, &msg, 0);
+
+	if (io_uring_submit(&ring) != 1)
+		error(1, errno, "io_uring_submit");
+
+	if (io_uring_wait_cqe(&ring, &cqe))
+		error(1, errno, "io_uring_wait_cqe");
+
+	io_uring_cqe_seen(&ring, cqe);
+
+	control_writeulong(iovec_hash_djb2(iovec, test_data->vecs_cnt));
+
+	control_writeln("DONE");
+	io_uring_queue_exit(&ring);
+	free_test_iovec(test_data->vecs, iovec, test_data->vecs_cnt);
+	close(fd);
+}
+
+static void vsock_io_uring_server(const struct test_opts *opts,
+				  const struct vsock_io_uring_test *test_data)
+{
+	unsigned long remote_hash;
+	unsigned long local_hash;
+	struct io_uring ring;
+	size_t data_len;
+	size_t recv_len;
+	void *data;
+	int fd;
+
+	fd = vsock_stream_accept(VMADDR_CID_ANY, 1234, NULL);
+	if (fd < 0) {
+		perror("accept");
+		exit(EXIT_FAILURE);
+	}
+
+	data_len = iovec_bytes(test_data->vecs, test_data->vecs_cnt);
+
+	data = malloc(data_len);
+	if (!data) {
+		perror("malloc");
+		exit(EXIT_FAILURE);
+	}
+
+	if (io_uring_queue_init(RING_ENTRIES_NUM, &ring, 0))
+		error(1, errno, "io_uring_queue_init");
+
+	recv_len = 0;
+
+	while (recv_len < data_len) {
+		struct io_uring_sqe *sqe;
+		struct io_uring_cqe *cqe;
+		struct iovec iovec;
+
+		sqe = io_uring_get_sqe(&ring);
+		iovec.iov_base = data + recv_len;
+		iovec.iov_len = data_len;
+
+		io_uring_prep_readv(sqe, fd, &iovec, 1, 0);
+
+		if (io_uring_submit(&ring) != 1)
+			error(1, errno, "io_uring_submit");
+
+		if (io_uring_wait_cqe(&ring, &cqe))
+			error(1, errno, "io_uring_wait_cqe");
+
+		recv_len += cqe->res;
+		io_uring_cqe_seen(&ring, cqe);
+	}
+
+	if (recv_len != data_len) {
+		fprintf(stderr, "expected %zu, got %zu\n", data_len,
+			recv_len);
+		exit(EXIT_FAILURE);
+	}
+
+	local_hash = hash_djb2(data, data_len);
+
+	remote_hash = control_readulong();
+	if (remote_hash != local_hash) {
+		fprintf(stderr, "hash mismatch\n");
+		exit(EXIT_FAILURE);
+	}
+
+	control_expectln("DONE");
+	io_uring_queue_exit(&ring);
+	free(data);
+}
+
+void test_stream_uring_server(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+		vsock_io_uring_server(opts, &test_data_array[i]);
+}
+
+void test_stream_uring_client(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+		vsock_io_uring_client(opts, &test_data_array[i], false);
+}
+
+void test_stream_uring_msg_zc_server(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+		vsock_io_uring_server(opts, &test_data_array[i]);
+}
+
+void test_stream_uring_msg_zc_client(const struct test_opts *opts)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(test_data_array); i++)
+		vsock_io_uring_client(opts, &test_data_array[i], true);
+}
+
+static struct test_case test_cases[] = {
+	{
+		.name = "SOCK_STREAM io_uring test",
+		.run_server = test_stream_uring_server,
+		.run_client = test_stream_uring_client,
+	},
+	{
+		.name = "SOCK_STREAM io_uring MSG_ZEROCOPY test",
+		.run_server = test_stream_uring_msg_zc_server,
+		.run_client = test_stream_uring_msg_zc_client,
+	},
+	{},
+};
+
+static const char optstring[] = "";
+static const struct option longopts[] = {
+	{
+		.name = "control-host",
+		.has_arg = required_argument,
+		.val = 'H',
+	},
+	{
+		.name = "control-port",
+		.has_arg = required_argument,
+		.val = 'P',
+	},
+	{
+		.name = "mode",
+		.has_arg = required_argument,
+		.val = 'm',
+	},
+	{
+		.name = "peer-cid",
+		.has_arg = required_argument,
+		.val = 'p',
+	},
+	{
+		.name = "help",
+		.has_arg = no_argument,
+		.val = '?',
+	},
+	{},
+};
+
+static void usage(void)
+{
+	fprintf(stderr, "Usage: vsock_uring_test [--help] [--control-host=<host>] --control-port=<port> --mode=client|server --peer-cid=<cid>\n"
+		"\n"
+		"  Server: vsock_uring_test --control-port=1234 --mode=server --peer-cid=3\n"
+		"  Client: vsock_uring_test --control-host=192.168.0.1 --control-port=1234 --mode=client --peer-cid=2\n"
+		"\n"
+		"Run transmission tests using io_uring. Usage is the same as\n"
+		"in ./vsock_test\n"
+		"\n"
+		"Options:\n"
+		"  --help                 This help message\n"
+		"  --control-host <host>  Server IP address to connect to\n"
+		"  --control-port <port>  Server port to listen on/connect to\n"
+		"  --mode client|server   Server or client mode\n"
+		"  --peer-cid <cid>       CID of the other side\n"
+		);
+	exit(EXIT_FAILURE);
+}
+
+int main(int argc, char **argv)
+{
+	const char *control_host = NULL;
+	const char *control_port = NULL;
+	struct test_opts opts = {
+		.mode = TEST_MODE_UNSET,
+		.peer_cid = VMADDR_CID_ANY,
+	};
+
+	init_signals();
+
+	for (;;) {
+		int opt = getopt_long(argc, argv, optstring, longopts, NULL);
+
+		if (opt == -1)
+			break;
+
+		switch (opt) {
+		case 'H':
+			control_host = optarg;
+			break;
+		case 'm':
+			if (strcmp(optarg, "client") == 0) {
+				opts.mode = TEST_MODE_CLIENT;
+			} else if (strcmp(optarg, "server") == 0) {
+				opts.mode = TEST_MODE_SERVER;
+			} else {
+				fprintf(stderr, "--mode must be \"client\" or \"server\"\n");
+				return EXIT_FAILURE;
+			}
+			break;
+		case 'p':
+			opts.peer_cid = parse_cid(optarg);
+			break;
+		case 'P':
+			control_port = optarg;
+			break;
+		case '?':
+		default:
+			usage();
+		}
+	}
+
+	if (!control_port)
+		usage();
+	if (opts.mode == TEST_MODE_UNSET)
+		usage();
+	if (opts.peer_cid == VMADDR_CID_ANY)
+		usage();
+
+	if (!control_host) {
+		if (opts.mode != TEST_MODE_SERVER)
+			usage();
+		control_host = "0.0.0.0";
+	}
+
+	control_init(control_host, control_port,
+		     opts.mode == TEST_MODE_SERVER);
+
+	run_tests(test_cases, &opts);
+
+	control_cleanup();
+
+	return 0;
+}
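A note on the receive loop in `vsock_io_uring_server()`: each iteration sets `iov_len` to the full `data_len` rather than to the space remaining after `recv_len` bytes, and adds `cqe->res` to `recv_len` without checking its sign. For a test that receives everything in one completion this makes no difference, but a more defensive variant would bound each request and stop on error or EOF. The sketch below shows that pattern with plain `read(2)` instead of io_uring, purely to stay dependency-free; `sketch_read_full()` is a hypothetical helper, not something this patch provides.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to 'len' bytes from 'fd', stopping early only on EOF or error.
 * Returns the number of bytes received, or -1 on error.
 */
static ssize_t sketch_read_full(int fd, void *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		/* Ask only for the space that is still free in 'buf'. */
		ssize_t n = read(fd, (char *)buf + done, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* Interrupted, retry. */
			return -1;
		}
		if (n == 0)
			break;			/* Peer closed the connection. */

		done += n;
	}

	return done;
}
```

In the io_uring loop the same idea would presumably amount to using `data_len - recv_len` for `iov_len` and treating a non-positive `cqe->res` as a failure before accumulating it.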