Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

vsock: add G2H fallback for CIDs not owned by H2G transport

When no H2G transport is loaded, vsock currently routes all CIDs to the
G2H transport (commit 65b422d9b61b ("vsock: forward all packets to the
host when no H2G is registered")). Extend that existing behavior: when
an H2G transport is loaded but does not claim a given CID, the
connection falls back to G2H in the same way.

This matters in environments like Nitro Enclaves, where an instance may
run nested VMs via vhost-vsock (H2G) while also needing to reach sibling
enclaves at higher CIDs through virtio-vsock-pci (G2H). With the old
code, any CID > 2 was unconditionally routed to H2G when vhost was
loaded, making those enclaves unreachable without setting
VMADDR_FLAG_TO_HOST explicitly on every connect.

Requiring every application to set VMADDR_FLAG_TO_HOST creates friction:
tools like socat, iperf, and others would all need to learn about it.
The flag was introduced 6 years ago and I am still not aware of any tool
that supports it. Even if there were support, it would be cumbersome to
use. The most natural experience is a single CID address space where H2G
only wins for CIDs it actually owns, and everything else falls through to
G2H, extending the behavior that already exists when H2G is absent.

To give user space at least a hint that the kernel applied this logic,
automatically set VMADDR_FLAG_TO_HOST on the remote address so that user
space can determine the path taken via getpeername().

Add a per-network namespace sysctl net.vsock.g2h_fallback (default 1).
When set to 0, it forces strict routing: H2G always wins for
CID > VMADDR_CID_HOST, and the connection fails with ENODEV if H2G is not
loaded.

Signed-off-by: Alexander Graf <graf@amazon.com>
Tested-by: syzbot@syzkaller.appspotmail.com
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Link: https://patch.msgid.link/20260304230027.59857-1-graf@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

authored by Alexander Graf and committed by Paolo Abeni
0de607dc 17edc4e8

+89 -5
+28
Documentation/admin-guide/sysctl/net.rst
···
 
 A namespace with ``ns_mode`` set to ``local`` cannot change
 ``child_ns_mode`` to ``global`` (returns ``-EPERM``).
+
+g2h_fallback
+------------
+
+Controls whether connections to CIDs not owned by the host-to-guest (H2G)
+transport automatically fall back to the guest-to-host (G2H) transport.
+
+When enabled, if a connect targets a CID that the H2G transport (e.g.
+vhost-vsock) does not serve, or if no H2G transport is loaded at all, the
+connection is routed via the G2H transport (e.g. virtio-vsock) instead. This
+allows a host running both nested VMs (via vhost-vsock) and sibling VMs
+reachable through the hypervisor (e.g. Nitro Enclaves) to address both using
+a single CID space, without requiring applications to set
+``VMADDR_FLAG_TO_HOST``.
+
+When the fallback is taken, ``VMADDR_FLAG_TO_HOST`` is automatically set on
+the remote address so that userspace can determine the path via
+``getpeername()``.
+
+Note: With this sysctl enabled, user space that attempts to talk to a guest
+CID which is not implemented by the H2G transport will create host vsock
+traffic. Environments that rely on H2G-only isolation should set it to 0.
+
+Values:
+
+- 0 - Connections to CIDs <= 2 or with VMADDR_FLAG_TO_HOST use G2H;
+  all others use H2G (or fail with ENODEV if H2G is not loaded).
+- 1 - Connections to CIDs not owned by H2G fall back to G2H. (default)
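The documentation above says user space can detect the fallback through ``getpeername()``. A minimal sketch of such a check follows, assuming a connected AF_VSOCK socket and UAPI headers recent enough to provide the `svm_flags` field. The helper names `peer_routed_to_host` and `socket_routed_to_host` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/vm_sockets.h>

/* Fallback definition for older UAPI headers (value taken from vm_sockets.h). */
#ifndef VMADDR_FLAG_TO_HOST
#define VMADDR_FLAG_TO_HOST 0x01
#endif

/* Hypothetical helper: true if the peer address carries VMADDR_FLAG_TO_HOST. */
bool peer_routed_to_host(const struct sockaddr_vm *peer)
{
	assert(peer != NULL);
	return (peer->svm_flags & VMADDR_FLAG_TO_HOST) != 0;
}

/*
 * Hypothetical helper: query a connected vsock socket.
 * Returns 1 if the flag is set, 0 if it is clear, and -1 if the peer
 * address cannot be read or is not an AF_VSOCK address.
 */
int socket_routed_to_host(int fd)
{
	struct sockaddr_vm peer;
	socklen_t len = sizeof(peer);

	memset(&peer, 0, sizeof(peer));
	if (getpeername(fd, (struct sockaddr *)&peer, &len) < 0)
		return -1;
	if (peer.svm_family != AF_VSOCK)
		return -1;
	return peer_routed_to_host(&peer) ? 1 : 0;
}
```

The flag is also set when the application requested ``VMADDR_FLAG_TO_HOST`` itself, so a result of 1 only means the G2H path was used, not that the kernel chose it on its own.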
+13
drivers/vhost/vsock.c
···
 	return NULL;
 }
 
+static bool vhost_transport_has_remote_cid(struct vsock_sock *vsk, u32 cid)
+{
+	struct sock *sk = sk_vsock(vsk);
+	struct net *net = sock_net(sk);
+	bool found;
+
+	rcu_read_lock();
+	found = !!vhost_vsock_get(cid, net);
+	rcu_read_unlock();
+	return found;
+}
+
 static void
 vhost_transport_do_send_pkt(struct vhost_vsock *vsock,
 			    struct vhost_virtqueue *vq)
···
 		.module = THIS_MODULE,
 
 		.get_local_cid = vhost_transport_get_local_cid,
+		.has_remote_cid = vhost_transport_has_remote_cid,
 
 		.init = virtio_transport_do_socket_init,
 		.destruct = virtio_transport_destruct,
+9
include/net/af_vsock.h
···
 	/* Addressing. */
 	u32 (*get_local_cid)(void);
 
+	/* Check if this transport serves a specific remote CID.
+	 * For H2G transports: return true if the CID belongs to a registered
+	 * guest. If not implemented, all CIDs > VMADDR_CID_HOST go to H2G.
+	 * For G2H transports: return true if the transport can reach arbitrary
+	 * CIDs via the hypervisor (i.e. supports the fallback overlay). VMCI
+	 * does not implement this as it only serves CIDs 0 and 2.
+	 */
+	bool (*has_remote_cid)(struct vsock_sock *vsk, u32 remote_cid);
+
 	/* Read a single skb */
 	int (*read_skb)(struct vsock_sock *, skb_read_actor_t);
 
+2
include/net/netns/vsock.h
···
 
 	/* 0 = unlocked, 1 = locked to global, 2 = locked to local */
 	int child_ns_mode_locked;
+
+	int g2h_fallback;
 };
 #endif /* __NET_NET_NAMESPACE_VSOCK_H */
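The per-namespace field added here backs the `g2h_fallback` sysctl, which is registered under "net/vsock" and should therefore appear as /proc/sys/net/vsock/g2h_fallback in the calling process's network namespace. A minimal sketch of reading it from user space follows. The helper names `parse_g2h_fallback` and `read_g2h_fallback` are hypothetical, and the sketch assumes the file holds a single 0 or 1.

```c
#include <stdio.h>

/*
 * Hypothetical helper: parse the sysctl text ("0\n" or "1\n").
 * Returns 0 or 1, or -1 if the text is missing or out of range.
 */
int parse_g2h_fallback(const char *text)
{
	int value;

	if (!text || sscanf(text, "%d", &value) != 1)
		return -1;
	return (value == 0 || value == 1) ? value : -1;
}

/*
 * Hypothetical helper: read the knob from the given path, normally
 * "/proc/sys/net/vsock/g2h_fallback". Returns -1 if the file cannot be
 * read, e.g. on a kernel without this patch.
 */
int read_g2h_fallback(const char *path)
{
	char buf[16];
	char *line;
	FILE *f = fopen(path, "r");

	if (!f)
		return -1;
	line = fgets(buf, sizeof(buf), f);
	fclose(f);
	return line ? parse_g2h_fallback(line) : -1;
}
```

A caller that needs H2G-only isolation could treat any result other than 0 as a reason to refuse to start, or to write 0 to the same path with sufficient privileges.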
+30 -5
net/vmw_vsock/af_vsock.c
···
  * The vsk->remote_addr is used to decide which transport to use:
  *  - remote CID == VMADDR_CID_LOCAL or g2h->local_cid or VMADDR_CID_HOST if
  *    g2h is not loaded, will use local transport;
- *  - remote CID <= VMADDR_CID_HOST or h2g is not loaded or remote flags field
- *    includes VMADDR_FLAG_TO_HOST flag value, will use guest->host transport;
- *  - remote CID > VMADDR_CID_HOST will use host->guest transport;
+ *  - remote CID <= VMADDR_CID_HOST or remote flags field includes
+ *    VMADDR_FLAG_TO_HOST, will use guest->host transport;
+ *  - remote CID > VMADDR_CID_HOST and h2g is loaded and h2g claims that CID,
+ *    will use host->guest transport;
+ *  - h2g not loaded or h2g does not claim that CID and g2h claims the CID via
+ *    has_remote_cid, will use guest->host transport (when g2h_fallback=1)
+ *  - anything else goes to h2g or returns -ENODEV if no h2g is available
  */
 int vsock_assign_transport(struct vsock_sock *vsk, struct vsock_sock *psk)
 {
···
 	case SOCK_SEQPACKET:
 		if (vsock_use_local_transport(remote_cid))
 			new_transport = transport_local;
-		else if (remote_cid <= VMADDR_CID_HOST || !transport_h2g ||
+		else if (remote_cid <= VMADDR_CID_HOST ||
 			 (remote_flags & VMADDR_FLAG_TO_HOST))
 			new_transport = transport_g2h;
-		else
+		else if (transport_h2g &&
+			 (!transport_h2g->has_remote_cid ||
+			  transport_h2g->has_remote_cid(vsk, remote_cid)))
 			new_transport = transport_h2g;
+		else if (sock_net(sk)->vsock.g2h_fallback &&
+			 transport_g2h && transport_g2h->has_remote_cid &&
+			 transport_g2h->has_remote_cid(vsk, remote_cid)) {
+			vsk->remote_addr.svm_flags |= VMADDR_FLAG_TO_HOST;
+			new_transport = transport_g2h;
+		} else {
+			new_transport = transport_h2g;
+		}
 		break;
 	default:
 		ret = -ESOCKTNOSUPPORT;
···
 		.mode = 0644,
 		.proc_handler = vsock_net_child_mode_string
 	},
+	{
+		.procname = "g2h_fallback",
+		.data = &init_net.vsock.g2h_fallback,
+		.maxlen = sizeof(int),
+		.mode = 0644,
+		.proc_handler = proc_dointvec_minmax,
+		.extra1 = SYSCTL_ZERO,
+		.extra2 = SYSCTL_ONE,
+	},
 };
 
 static int __net_init vsock_sysctl_register(struct net *net)
···
 
 		table[0].data = &net->vsock.mode;
 		table[1].data = &net->vsock.child_ns_mode;
+		table[2].data = &net->vsock.g2h_fallback;
 	}
 
 	net->vsock.sysctl_hdr = register_net_sysctl_sz(net, "net/vsock", table,
···
 	net->vsock.mode = vsock_net_child_mode(current->nsproxy->net_ns);
 
 	net->vsock.child_ns_mode = net->vsock.mode;
+	net->vsock.g2h_fallback = 1;
 }
 
 static __net_init int vsock_sysctl_init_net(struct net *net)
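The new branch order in vsock_assign_transport() can be easier to follow as a stand-alone model. The sketch below is a user-space approximation only, not kernel code: every `model_*` and `MODEL_*` name is hypothetical, the local-transport check is omitted, and `ROUTE_NONE` stands in for the -ENODEV result when the selected transport is not loaded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_CID_HOST     2u     /* mirrors VMADDR_CID_HOST */
#define MODEL_FLAG_TO_HOST 0x01u  /* mirrors VMADDR_FLAG_TO_HOST */

enum model_route { ROUTE_NONE, ROUTE_G2H, ROUTE_H2G };

/* Hypothetical stand-in for struct vsock_transport: only the new callback. */
struct model_transport {
	bool (*has_remote_cid)(uint32_t cid);	/* may be NULL */
};

struct model_state {
	const struct model_transport *h2g;	/* NULL when no H2G is loaded */
	const struct model_transport *g2h;	/* NULL when no G2H is loaded */
	bool g2h_fallback;			/* models net.vsock.g2h_fallback */
};

/* Example callbacks: an H2G that owns only CID 3, a G2H that accepts any CID. */
bool model_claims_cid_3(uint32_t cid) { return cid == 3; }
bool model_claims_any(uint32_t cid) { (void)cid; return true; }

/* Follows the branch order of the patched SOCK_STREAM/SOCK_SEQPACKET case. */
enum model_route model_pick_route(const struct model_state *st, uint32_t cid,
				  uint32_t *flags)
{
	assert(st != NULL && flags != NULL);

	/* Well-known CIDs and explicit TO_HOST requests always use G2H. */
	if (cid <= MODEL_CID_HOST || (*flags & MODEL_FLAG_TO_HOST))
		return st->g2h ? ROUTE_G2H : ROUTE_NONE;

	/* H2G wins if it has no callback or if it claims the CID. */
	if (st->h2g && (!st->h2g->has_remote_cid || st->h2g->has_remote_cid(cid)))
		return ROUTE_H2G;

	/* Fallback: G2H takes unclaimed CIDs and the flag records that. */
	if (st->g2h_fallback && st->g2h && st->g2h->has_remote_cid &&
	    st->g2h->has_remote_cid(cid)) {
		*flags |= MODEL_FLAG_TO_HOST;
		return ROUTE_G2H;
	}

	/* Strict routing: H2G if loaded, otherwise no transport. */
	return st->h2g ? ROUTE_H2G : ROUTE_NONE;
}
```

With an H2G that claims only CID 3 and a G2H that accepts everything, CID 3 goes to H2G and CID 16 falls back to G2H with the flag set. Turning `g2h_fallback` off sends CID 16 back to H2G, matching the strict mode described in the commit message.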
+7
net/vmw_vsock/virtio_transport.c
···
 static bool virtio_transport_seqpacket_allow(struct vsock_sock *vsk,
 					     u32 remote_cid);
 
+static bool virtio_transport_has_remote_cid(struct vsock_sock *vsk, u32 cid)
+{
+	/* The CID could be implemented by the host. Always assume it is. */
+	return true;
+}
+
 static struct virtio_transport virtio_transport = {
 	.transport = {
 		.module = THIS_MODULE,
 
 		.get_local_cid = virtio_transport_get_local_cid,
+		.has_remote_cid = virtio_transport_has_remote_cid,
 
 		.init = virtio_transport_do_socket_init,
 		.destruct = virtio_transport_destruct,