Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

net: Add queue-create operation

Add a ynl netdev family operation called queue-create that creates a
new queue on a netdevice:

  name: queue-create
  attribute-set: queue
  flags: [admin-perm]
  do:
    request:
      attributes:
        - ifindex
        - type
        - lease
    reply: &queue-create-op
      attributes:
        - id

This is a generic operation so that it can be extended for various
use cases in the future. For now it is mandatory to specify an ifindex,
a queue type (which must be rx) and a lease. The id of the newly
created queue is returned to the caller.
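
As a rough illustration of the request rules above, the following C sketch
models the mandatory attributes and the value bounds that the netlink policy
in this commit places on them (ifindex at least 1, type at most 1, netns-id
at least 0). The struct, the function name and the error codes are
assumptions made for this example; they are not kernel code, and the rx-only
enforcement is deliberately left out because the policy alone does not
express it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded request; field names are illustrative only. */
struct sketch_queue_create_req {
	bool has_ifindex, has_type, has_lease, has_netns_id;
	uint32_t ifindex;	/* device to create the queue on */
	uint32_t type;		/* queue type value, policy allows at most 1 */
	uint32_t lease_ifindex;	/* nested lease: device to lease from */
	int32_t netns_id;	/* nested lease: optional netns id */
};

/*
 * Returns 0 when all mandatory attributes are present and within the
 * policy bounds, otherwise a negative errno (codes chosen for the sketch).
 */
static int sketch_queue_create_validate(const struct sketch_queue_create_req *req)
{
	assert(req);

	/* ifindex, type and lease are all mandatory for now. */
	if (!req->has_ifindex || !req->has_type || !req->has_lease)
		return -EINVAL;
	/* Both ifindex values must be at least 1. */
	if (req->ifindex < 1 || req->lease_ifindex < 1)
		return -ERANGE;
	/* The type attribute is capped at 1 by the policy. */
	if (req->type > 1)
		return -ERANGE;
	/* netns-id is optional, but must not be negative when given. */
	if (req->has_netns_id && req->netns_id < 0)
		return -ERANGE;
	return 0;
}
```

A caller would fill in the struct from the parsed attributes and reject the
request on any non-zero return before looking at device state.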

A queue from a virtual device can have a lease which refers to another
queue from a physical device. This is useful for memory providers
and AF_XDP operations which take an ifindex and queue id to allow
applications to bind against virtual devices in containers. The lease
couples both queues together and allows operations to be proxied from
a virtual device in a container to the physical device.
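
To make the coupling concrete, here is a minimal C sketch of a lease as a
pair of (ifindex, queue id) references, with a lookup that maps a virtual
queue to the physical queue behind it. All type and function names are
hypothetical and the flat array is only a stand-in; how the kernel actually
stores and resolves leases is not described in this commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Identifies one queue on one device, the way ifindex + queue id binds do. */
struct model_queue_ref {
	uint32_t ifindex;
	uint32_t queue_id;
};

/* Hypothetical lease record: a virtual-device queue backed by a physical one. */
struct model_lease {
	struct model_queue_ref virt;
	struct model_queue_ref phys;
};

/* Return the physical queue behind a virtual one, or NULL if no lease matches. */
static const struct model_queue_ref *
model_lease_resolve(const struct model_lease *leases, size_t n,
		    struct model_queue_ref virt)
{
	assert(leases || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (leases[i].virt.ifindex == virt.ifindex &&
		    leases[i].virt.queue_id == virt.queue_id)
			return &leases[i].phys;
	}
	return NULL;
}
```

In this model, an operation issued against the virtual queue would first
resolve the lease and then be applied to the returned physical queue.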

In the future, the nested lease attribute can be lifted and made
optional for other use cases such as dynamic queue creation for
physical netdevs. Omitting the lease and specifying a physical device
as the ifindex would then imply that a real queue needs to be
allocated. Similarly, the queue type enforcement to rx can then be
lifted as well to support tx.

An early implementation had only driver-specific integration [0], but
so that other virtual devices can reuse it, it makes sense to provide
this as a generic API in core net.

For leasing queues, the virtual netdev must have real_num_rx_queues
less than num_rx_queues at the time of calling queue-create. The
queue-type must be rx as only rx queues are supported for leasing
for now. We also enforce that the queue-create ifindex must point
to a virtual device, and that the nested lease attribute's ifindex
must point to a physical device. The nested lease attribute set
contains a netns-id attribute which is optional and can specify a
netns-id relative to the caller's netns. It requires CAP_NET_ADMIN,
and if the netns-id attribute is not specified, the lease ifindex
is looked up in the current netns. The attribute is modeled as an
s32 type, as is done elsewhere in the stack.
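
The leasing preconditions listed above can be summarized in a small C model.
This is only a sketch: the struct is a simplified stand-in rather than the
kernel's struct net_device, the is_virtual flag glosses over how a virtual
device is actually detected, and the error codes are assumptions. Note that
at this commit the real handler is a stub that returns -EOPNOTSUPP.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a netdevice. */
struct model_netdev {
	uint32_t ifindex;
	bool is_virtual;			/* assumption: detection not modeled */
	unsigned int num_rx_queues;		/* allocated rx queue slots */
	unsigned int real_num_rx_queues;	/* rx queues currently in use */
};

enum model_queue_type { MODEL_QUEUE_RX = 0, MODEL_QUEUE_TX = 1 };

/* Returns 0 if the request passes the checks from the text, else -errno. */
static int model_queue_create_check(const struct model_netdev *virt,
				    enum model_queue_type type,
				    const struct model_netdev *phys)
{
	assert(virt && phys);

	if (type != MODEL_QUEUE_RX)
		return -EOPNOTSUPP;	/* only rx queues can be leased for now */
	if (!virt->is_virtual)
		return -EINVAL;		/* queue-create ifindex must be virtual */
	if (phys->is_virtual)
		return -EINVAL;		/* lease ifindex must be physical */
	if (virt->real_num_rx_queues >= virt->num_rx_queues)
		return -ENOSPC;		/* no spare rx slot on the virtual device */
	return 0;
}
```

The last check reflects the requirement that real_num_rx_queues is less than
num_rx_queues when queue-create is called, so the virtual device still has a
free slot for the leased queue.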

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Co-developed-by: David Wei <dw@davidwei.uk>
Signed-off-by: David Wei <dw@davidwei.uk>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0]
Link: https://patch.msgid.link/20260402231031.447597-2-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Authored by Daniel Borkmann and committed by Jakub Kicinski
7789c6bb 9700282a

+95
+46
Documentation/netlink/specs/netdev.yaml
···
         doc: XSK information for this queue, if any.
         type: nest
         nested-attributes: xsk-info
+      -
+        name: lease
+        doc: |
+          A queue from a virtual device can have a lease which refers to
+          another queue from a physical device. This is useful for memory
+          providers and AF_XDP operations which take an ifindex and queue id
+          to allow applications to bind against virtual devices in containers.
+        type: nest
+        nested-attributes: lease
   -
     name: qstats
     doc: |
···
       -
         name: type
   -
+    name: lease
+    attributes:
+      -
+        name: ifindex
+        doc: The netdev ifindex to lease the queue from.
+        type: u32
+        checks:
+          min: 1
+      -
+        name: queue
+        doc: The netdev queue to lease from.
+        type: nest
+        nested-attributes: queue-id
+      -
+        name: netns-id
+        doc: The network namespace id of the netdev.
+        type: s32
+        checks:
+          min: 0
+  -
     name: dmabuf
     attributes:
       -
···
             - dmabuf
             - io-uring
             - xsk
+            - lease
       dump:
         request:
           attributes:
···
             - ifindex
             - fd
         reply:
+          attributes:
+            - id
+    -
+      name: queue-create
+      doc: |
+        Create a new queue for the given netdevice. Whether this operation
+        is supported depends on the device and the driver.
+      attribute-set: queue
+      flags: [admin-perm]
+      do:
+        request:
+          attributes:
+            - ifindex
+            - type
+            - lease
+        reply: &queue-create-op
           attributes:
             - id
+11
include/uapi/linux/netdev.h
···
 	NETDEV_A_QUEUE_DMABUF,
 	NETDEV_A_QUEUE_IO_URING,
 	NETDEV_A_QUEUE_XSK,
+	NETDEV_A_QUEUE_LEASE,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
···
 };
 
 enum {
+	NETDEV_A_LEASE_IFINDEX = 1,
+	NETDEV_A_LEASE_QUEUE,
+	NETDEV_A_LEASE_NETNS_ID,
+
+	__NETDEV_A_LEASE_MAX,
+	NETDEV_A_LEASE_MAX = (__NETDEV_A_LEASE_MAX - 1)
+};
+
+enum {
 	NETDEV_A_DMABUF_IFINDEX = 1,
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
···
 	NETDEV_CMD_BIND_RX,
 	NETDEV_CMD_NAPI_SET,
 	NETDEV_CMD_BIND_TX,
+	NETDEV_CMD_QUEUE_CREATE,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)
+20
net/core/netdev-genl-gen.c
···
 };
 
 /* Common nested types */
+const struct nla_policy netdev_lease_nl_policy[NETDEV_A_LEASE_NETNS_ID + 1] = {
+	[NETDEV_A_LEASE_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
+	[NETDEV_A_LEASE_QUEUE] = NLA_POLICY_NESTED(netdev_queue_id_nl_policy),
+	[NETDEV_A_LEASE_NETNS_ID] = NLA_POLICY_MIN(NLA_S32, 0),
+};
+
 const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1] = {
 	[NETDEV_A_PAGE_POOL_ID] = NLA_POLICY_FULL_RANGE(NLA_UINT, &netdev_a_page_pool_id_range),
 	[NETDEV_A_PAGE_POOL_IFINDEX] = NLA_POLICY_FULL_RANGE(NLA_U32, &netdev_a_page_pool_ifindex_range),
···
 static const struct nla_policy netdev_bind_tx_nl_policy[NETDEV_A_DMABUF_FD + 1] = {
 	[NETDEV_A_DMABUF_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
 	[NETDEV_A_DMABUF_FD] = { .type = NLA_U32, },
+};
+
+/* NETDEV_CMD_QUEUE_CREATE - do */
+static const struct nla_policy netdev_queue_create_nl_policy[NETDEV_A_QUEUE_LEASE + 1] = {
+	[NETDEV_A_QUEUE_IFINDEX] = NLA_POLICY_MIN(NLA_U32, 1),
+	[NETDEV_A_QUEUE_TYPE] = NLA_POLICY_MAX(NLA_U32, 1),
+	[NETDEV_A_QUEUE_LEASE] = NLA_POLICY_NESTED(netdev_lease_nl_policy),
 };
 
 /* Ops table for netdev */
···
 		.policy		= netdev_bind_tx_nl_policy,
 		.maxattr	= NETDEV_A_DMABUF_FD,
 		.flags		= GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= NETDEV_CMD_QUEUE_CREATE,
+		.doit		= netdev_nl_queue_create_doit,
+		.policy		= netdev_queue_create_nl_policy,
+		.maxattr	= NETDEV_A_QUEUE_LEASE,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
 	},
 };
+2
net/core/netdev-genl-gen.h
···
 #include <net/netdev_netlink.h>
 
 /* Common nested types */
+extern const struct nla_policy netdev_lease_nl_policy[NETDEV_A_LEASE_NETNS_ID + 1];
 extern const struct nla_policy netdev_page_pool_info_nl_policy[NETDEV_A_PAGE_POOL_IFINDEX + 1];
 extern const struct nla_policy netdev_queue_id_nl_policy[NETDEV_A_QUEUE_TYPE + 1];
 
···
 int netdev_nl_bind_rx_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_napi_set_doit(struct sk_buff *skb, struct genl_info *info);
 int netdev_nl_bind_tx_doit(struct sk_buff *skb, struct genl_info *info);
+int netdev_nl_queue_create_doit(struct sk_buff *skb, struct genl_info *info);
 
 enum {
 	NETDEV_NLGRP_MGMT,
+5
net/core/netdev-genl.c
···
 	return err;
 }
 
+int netdev_nl_queue_create_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
 void netdev_nl_sock_priv_init(struct netdev_nl_sock *priv)
 {
 	INIT_LIST_HEAD(&priv->bindings);
+11
tools/include/uapi/linux/netdev.h
···
 	NETDEV_A_QUEUE_DMABUF,
 	NETDEV_A_QUEUE_IO_URING,
 	NETDEV_A_QUEUE_XSK,
+	NETDEV_A_QUEUE_LEASE,
 
 	__NETDEV_A_QUEUE_MAX,
 	NETDEV_A_QUEUE_MAX = (__NETDEV_A_QUEUE_MAX - 1)
···
 };
 
 enum {
+	NETDEV_A_LEASE_IFINDEX = 1,
+	NETDEV_A_LEASE_QUEUE,
+	NETDEV_A_LEASE_NETNS_ID,
+
+	__NETDEV_A_LEASE_MAX,
+	NETDEV_A_LEASE_MAX = (__NETDEV_A_LEASE_MAX - 1)
+};
+
+enum {
 	NETDEV_A_DMABUF_IFINDEX = 1,
 	NETDEV_A_DMABUF_QUEUES,
 	NETDEV_A_DMABUF_FD,
···
 	NETDEV_CMD_BIND_RX,
 	NETDEV_CMD_NAPI_SET,
 	NETDEV_CMD_BIND_TX,
+	NETDEV_CMD_QUEUE_CREATE,
 
 	__NETDEV_CMD_MAX,
 	NETDEV_CMD_MAX = (__NETDEV_CMD_MAX - 1)