Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

dccp: Policy-based packet dequeueing infrastructure

This patch adds a generic infrastructure for policy-based dequeueing of
TX packets and provides two policies:
* a simple FIFO policy (which is the default) and
* a priority-based policy (set via socket options).
Both policies honour the tx_qlen sysctl for the maximum size of the write
queue (can be overridden via socket options).

The priority policy uses skb->priority internally to assign a u32 priority
identifier, using the same ranking as SO_PRIORITY. The skb->priority field
is set to 0 when the packet leaves DCCP. The priority is supplied as ancillary
data using cmsg(3); the patch also provides the requisite parsing routines.

Signed-off-by: Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>

Authored by Tomasz Grobelny, committed by Gerrit Renker
871a2c16 cfa969e3

+248 -9
+20
Documentation/networking/dccp.txt
@@ -47,6 +47,26 @@
 
 Socket options
 ==============
+DCCP_SOCKOPT_QPOLICY_ID sets the dequeuing policy for outgoing packets. It takes
+a policy ID as argument and can only be set before the connection (i.e. changes
+during an established connection are not supported). Currently, two policies are
+defined: the "simple" policy (DCCPQ_POLICY_SIMPLE), which does nothing special,
+and a priority-based variant (DCCPQ_POLICY_PRIO). The latter allows to pass an
+u32 priority value as ancillary data to sendmsg(), where higher numbers indicate
+a higher packet priority (similar to SO_PRIORITY). This ancillary data needs to
+be formatted using a cmsg(3) message header filled in as follows:
+	cmsg->cmsg_level = SOL_DCCP;
+	cmsg->cmsg_type = DCCP_SCM_PRIORITY;
+	cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t)); /* or CMSG_LEN(4) */
+
+DCCP_SOCKOPT_QPOLICY_TXQLEN sets the maximum length of the output queue. A zero
+value is always interpreted as unbounded queue length. If different from zero,
+the interpretation of this parameter depends on the current dequeuing policy
+(see above): the "simple" policy will enforce a fixed queue size by returning
+EAGAIN, whereas the "prio" policy enforces a fixed queue length by dropping the
+lowest-priority packet first. The default value for this parameter is
+initialised from /proc/sys/net/dccp/default/tx_qlen.
+
 DCCP_SOCKOPT_SERVICE sets the service. The specification mandates use of
 service codes (RFC 4340, sec. 8.1.2); if this socket option is not set,
 the socket will fall back to 0 (which means that no meaningful service code
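The cmsg(3) header layout described in the dccp.txt text above can be sketched in user space as follows. This is a minimal sketch, not code from the patch: dccp_msg_set_priority() and DCCP_PRIO_CBUF_LEN are hypothetical names, and the fallback values for SOL_DCCP (269 on Linux) and DCCP_SCM_PRIORITY (1, as added to include/linux/dccp.h by this patch) are only used when the installed headers do not define them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks (assumptions): only used if the system headers lack them. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SCM_PRIORITY
#define DCCP_SCM_PRIORITY 1	/* value from this patch's enum dccp_cmsg_type */
#endif

/* Size of a control buffer that holds exactly one u32 priority. */
#define DCCP_PRIO_CBUF_LEN CMSG_SPACE(sizeof(uint32_t))

/*
 * Hypothetical helper, not part of the patch: point msg at cbuf and fill in
 * a single DCCP_SCM_PRIORITY control message carrying prio.
 * cbuf must be aligned for struct cmsghdr and at least DCCP_PRIO_CBUF_LEN long.
 */
static void dccp_msg_set_priority(struct msghdr *msg, void *cbuf,
				  size_t cbuf_len, uint32_t prio)
{
	struct cmsghdr *cmsg;

	assert(cbuf_len >= DCCP_PRIO_CBUF_LEN);
	memset(cbuf, 0, cbuf_len);
	msg->msg_control = cbuf;
	msg->msg_controllen = DCCP_PRIO_CBUF_LEN;

	cmsg = CMSG_FIRSTHDR(msg);
	cmsg->cmsg_level = SOL_DCCP;
	cmsg->cmsg_type = DCCP_SCM_PRIORITY;
	cmsg->cmsg_len = CMSG_LEN(sizeof(uint32_t));
	/* memcpy avoids alignment assumptions about CMSG_DATA() */
	memcpy(CMSG_DATA(cmsg), &prio, sizeof(prio));
}
```

A caller would fill in msg_iov as usual and pass the msghdr to sendmsg() on a DCCP socket. According to the documentation text, the priority is only honoured when DCCP_SOCKOPT_QPOLICY_ID was set to DCCPQ_POLICY_PRIO before the connection was established.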
+21
include/linux/dccp.h
@@ -197,6 +197,21 @@
 	DCCPF_MAX_CCID_SPECIFIC = 255,
 };
 
+/* DCCP socket control message types for cmsg */
+enum dccp_cmsg_type {
+	DCCP_SCM_PRIORITY = 1,
+	DCCP_SCM_QPOLICY_MAX = 0xFFFF,
+	/* ^-- Up to here reserved exclusively for qpolicy parameters */
+	DCCP_SCM_MAX
+};
+
+/* DCCP priorities for outgoing/queued packets */
+enum dccp_packet_dequeueing_policy {
+	DCCPQ_POLICY_SIMPLE,
+	DCCPQ_POLICY_PRIO,
+	DCCPQ_POLICY_MAX
+};
+
 /* DCCP socket options */
 #define DCCP_SOCKOPT_PACKET_SIZE	1 /* XXX deprecated, without effect */
 #define DCCP_SOCKOPT_SERVICE		2
@@ -210,6 +225,8 @@
 #define DCCP_SOCKOPT_CCID		13
 #define DCCP_SOCKOPT_TX_CCID		14
 #define DCCP_SOCKOPT_RX_CCID		15
+#define DCCP_SOCKOPT_QPOLICY_ID		16
+#define DCCP_SOCKOPT_QPOLICY_TXQLEN	17
 #define DCCP_SOCKOPT_CCID_RX_INFO	128
 #define DCCP_SOCKOPT_CCID_TX_INFO	192
 
@@ -458,6 +475,8 @@
  * @dccps_hc_rx_ccid - CCID used for the receiver (or receiving half-connection)
  * @dccps_hc_tx_ccid - CCID used for the sender (or sending half-connection)
  * @dccps_options_received - parsed set of retrieved options
+ * @dccps_qpolicy - TX dequeueing policy, one of %dccp_packet_dequeueing_policy
+ * @dccps_tx_qlen - maximum length of the TX queue
  * @dccps_role - role of this sock, one of %dccp_role
  * @dccps_hc_rx_insert_options - receiver wants to add options when acking
  * @dccps_hc_tx_insert_options - sender wants to add options when sending
@@ -500,6 +519,8 @@
 	struct ccid			*dccps_hc_rx_ccid;
 	struct ccid			*dccps_hc_tx_ccid;
 	struct dccp_options_received	dccps_options_received;
+	__u8				dccps_qpolicy;
+	__u32				dccps_tx_qlen;
 	enum dccp_role			dccps_role:2;
 	__u8				dccps_hc_rx_insert_options:1;
 	__u8				dccps_hc_tx_insert_options:1;
+2 -2
net/dccp/Makefile
@@ -1,7 +1,7 @@
 obj-$(CONFIG_IP_DCCP) += dccp.o dccp_ipv4.o
 
-dccp-y := ccid.o feat.o input.o minisocks.o options.o output.o proto.o timer.o
-
+dccp-y := ccid.o feat.o input.o minisocks.o options.o output.o proto.o timer.o \
+	  qpolicy.o
 #
 # CCID algorithms to be used by dccp.ko
 #
+12
net/dccp/dccp.h
@@ -243,6 +243,18 @@
 extern void dccp_send_sync(struct sock *sk, const u64 seq,
 			   const enum dccp_pkt_type pkt_type);
 
+/*
+ * TX Packet Dequeueing Interface
+ */
+extern void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb);
+extern bool dccp_qpolicy_full(struct sock *sk);
+extern void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb);
+extern struct sk_buff *dccp_qpolicy_top(struct sock *sk);
+extern struct sk_buff *dccp_qpolicy_pop(struct sock *sk);
+
+/*
+ * TX Packet Output and TX Timers
+ */
 extern void dccp_write_xmit(struct sock *sk);
 extern void dccp_write_space(struct sock *sk);
 extern void dccp_flush_write_queue(struct sock *sk, long *time_budget);
+3 -4
net/dccp/output.c
@@ -242,7 +242,7 @@
 {
 	int err, len;
 	struct dccp_sock *dp = dccp_sk(sk);
-	struct sk_buff *skb = skb_dequeue(&sk->sk_write_queue);
+	struct sk_buff *skb = dccp_qpolicy_pop(sk);
 
 	if (unlikely(skb == NULL))
 		return;
@@ -345,7 +345,7 @@
 	struct dccp_sock *dp = dccp_sk(sk);
 	struct sk_buff *skb;
 
-	while ((skb = skb_peek(&sk->sk_write_queue))) {
+	while ((skb = dccp_qpolicy_top(sk))) {
 		int rc = ccid_hc_tx_send_packet(dp->dccps_hc_tx_ccid, sk, skb);
 
 		switch (ccid_packet_dequeue_eval(rc)) {
@@ -359,8 +359,7 @@
 			dccp_xmit_packet(sk);
 			break;
 		case CCID_PACKET_ERR:
-			skb_dequeue(&sk->sk_write_queue);
-			kfree_skb(skb);
+			dccp_qpolicy_drop(sk, skb);
 			dccp_pr_debug("packet discarded due to err=%d\n", rc);
 		}
 	}
+64 -3
net/dccp/proto.c
@@ -185,6 +185,7 @@
 	dp->dccps_role		= DCCP_ROLE_UNDEFINED;
 	dp->dccps_service	= DCCP_SERVICE_CODE_IS_ABSENT;
 	dp->dccps_l_ack_ratio	= dp->dccps_r_ack_ratio = 1;
+	dp->dccps_tx_qlen	= sysctl_dccp_tx_qlen;
 
 	dccp_init_xmit_timers(sk);
 
@@ -532,6 +533,20 @@
 	case DCCP_SOCKOPT_RECV_CSCOV:
 		err = dccp_setsockopt_cscov(sk, val, true);
 		break;
+	case DCCP_SOCKOPT_QPOLICY_ID:
+		if (sk->sk_state != DCCP_CLOSED)
+			err = -EISCONN;
+		else if (val < 0 || val >= DCCPQ_POLICY_MAX)
+			err = -EINVAL;
+		else
+			dp->dccps_qpolicy = val;
+		break;
+	case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+		if (val < 0)
+			err = -EINVAL;
+		else
+			dp->dccps_tx_qlen = val;
+		break;
 	default:
 		err = -ENOPROTOOPT;
 		break;
@@ -639,6 +654,12 @@
 	case DCCP_SOCKOPT_RECV_CSCOV:
 		val = dp->dccps_pcrlen;
 		break;
+	case DCCP_SOCKOPT_QPOLICY_ID:
+		val = dp->dccps_qpolicy;
+		break;
+	case DCCP_SOCKOPT_QPOLICY_TXQLEN:
+		val = dp->dccps_tx_qlen;
+		break;
 	case 128 ... 191:
 		return ccid_hc_rx_getsockopt(dp->dccps_hc_rx_ccid, sk, optname,
 					     len, (u32 __user *)optval, optlen);
@@ -681,6 +702,43 @@
 EXPORT_SYMBOL_GPL(compat_dccp_getsockopt);
 #endif
 
+static int dccp_msghdr_parse(struct msghdr *msg, struct sk_buff *skb)
+{
+	struct cmsghdr *cmsg = CMSG_FIRSTHDR(msg);
+
+	/*
+	 * Assign an (opaque) qpolicy priority value to skb->priority.
+	 *
+	 * We are overloading this skb field for use with the qpolicy subystem.
+	 * The skb->priority is normally used for the SO_PRIORITY option, which
+	 * is initialised from sk_priority. Since the assignment of sk_priority
+	 * to skb->priority happens later (on layer 3), we overload this field
+	 * for use with queueing priorities as long as the skb is on layer 4.
+	 * The default priority value (if nothing is set) is 0.
+	 */
+	skb->priority = 0;
+
+	for (; cmsg != NULL; cmsg = CMSG_NXTHDR(msg, cmsg)) {
+
+		if (!CMSG_OK(msg, cmsg))
+			return -EINVAL;
+
+		if (cmsg->cmsg_level != SOL_DCCP)
+			continue;
+
+		switch (cmsg->cmsg_type) {
+		case DCCP_SCM_PRIORITY:
+			if (cmsg->cmsg_len != CMSG_LEN(sizeof(__u32)))
+				return -EINVAL;
+			skb->priority = *(__u32 *)CMSG_DATA(cmsg);
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
+	return 0;
+}
+
 int dccp_sendmsg(struct kiocb *iocb, struct sock *sk, struct msghdr *msg,
 		 size_t len)
 {
@@ -696,8 +754,7 @@
 
 	lock_sock(sk);
 
-	if (sysctl_dccp_tx_qlen &&
-	    (sk->sk_write_queue.qlen >= sysctl_dccp_tx_qlen)) {
+	if (dccp_qpolicy_full(sk)) {
 		rc = -EAGAIN;
 		goto out_release;
 	}
@@ -725,7 +782,11 @@
 	if (rc != 0)
 		goto out_discard;
 
-	skb_queue_tail(&sk->sk_write_queue, skb);
+	rc = dccp_msghdr_parse(msg, skb);
+	if (rc != 0)
+		goto out_discard;
+
+	dccp_qpolicy_push(sk, skb);
 	/*
 	 * The xmit_timer is set if the TX CCID is rate-based and will expire
 	 * when congestion control permits to release further packets into the
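The accept/reject rules of dccp_msghdr_parse() above can be tried out in user space with a small model. This is only a sketch under stated assumptions: model_msghdr_parse() is a hypothetical name, the kernel-only CMSG_OK() check is omitted, the result is written to a plain uint32_t instead of skb->priority, and the SOL_DCCP and DCCP_SCM_PRIORITY fallbacks are used only when the system headers do not define them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Fallbacks (assumptions): only used if the system headers lack them. */
#ifndef SOL_DCCP
#define SOL_DCCP 269
#endif
#ifndef DCCP_SCM_PRIORITY
#define DCCP_SCM_PRIORITY 1	/* value from this patch's enum dccp_cmsg_type */
#endif

/*
 * Hypothetical user-space model of dccp_msghdr_parse().
 * Returns 0 and stores the priority in *prio (0 if none was supplied),
 * or -EINVAL for a malformed or unknown SOL_DCCP control message.
 */
static int model_msghdr_parse(struct msghdr *msg, uint32_t *prio)
{
	struct cmsghdr *cmsg;

	assert(msg != NULL && prio != NULL);
	*prio = 0;	/* default priority when nothing is set */

	for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
	     cmsg = CMSG_NXTHDR(msg, cmsg)) {
		if (cmsg->cmsg_level != SOL_DCCP)
			continue;	/* messages for other levels are skipped */

		switch (cmsg->cmsg_type) {
		case DCCP_SCM_PRIORITY:
			if (cmsg->cmsg_len != CMSG_LEN(sizeof(uint32_t)))
				return -EINVAL;
			memcpy(prio, CMSG_DATA(cmsg), sizeof(*prio));
			break;
		default:
			return -EINVAL;	/* unknown SOL_DCCP message type */
		}
	}
	return 0;
}
```

As in the kernel routine, a control message at another level is ignored, while an unrecognised SOL_DCCP type or a payload that is not exactly four bytes makes the whole call fail.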
+126
net/dccp/qpolicy.c
@@ -0,0 +1,126 @@
+/*
+ * net/dccp/qpolicy.c
+ *
+ * Policy-based packet dequeueing interface for DCCP.
+ *
+ * Copyright (c) 2008 Tomasz Grobelny <tomasz@grobelny.oswiecenia.net>
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License v2
+ * as published by the Free Software Foundation.
+ */
+#include "dccp.h"
+
+/*
+ * Simple Dequeueing Policy:
+ * If tx_qlen is different from 0, enqueue up to tx_qlen elements.
+ */
+static void qpolicy_simple_push(struct sock *sk, struct sk_buff *skb)
+{
+	skb_queue_tail(&sk->sk_write_queue, skb);
+}
+
+static bool qpolicy_simple_full(struct sock *sk)
+{
+	return dccp_sk(sk)->dccps_tx_qlen &&
+	       sk->sk_write_queue.qlen >= dccp_sk(sk)->dccps_tx_qlen;
+}
+
+static struct sk_buff *qpolicy_simple_top(struct sock *sk)
+{
+	return skb_peek(&sk->sk_write_queue);
+}
+
+/*
+ * Priority-based Dequeueing Policy:
+ * If tx_qlen is different from 0 and the queue has reached its upper bound
+ * of tx_qlen elements, replace older packets lowest-priority-first.
+ */
+static struct sk_buff *qpolicy_prio_best_skb(struct sock *sk)
+{
+	struct sk_buff *skb, *best = NULL;
+
+	skb_queue_walk(&sk->sk_write_queue, skb)
+		if (best == NULL || skb->priority > best->priority)
+			best = skb;
+	return best;
+}
+
+static struct sk_buff *qpolicy_prio_worst_skb(struct sock *sk)
+{
+	struct sk_buff *skb, *worst = NULL;
+
+	skb_queue_walk(&sk->sk_write_queue, skb)
+		if (worst == NULL || skb->priority < worst->priority)
+			worst = skb;
+	return worst;
+}
+
+static bool qpolicy_prio_full(struct sock *sk)
+{
+	if (qpolicy_simple_full(sk))
+		dccp_qpolicy_drop(sk, qpolicy_prio_worst_skb(sk));
+	return false;
+}
+
+/**
+ * struct dccp_qpolicy_operations - TX Packet Dequeueing Interface
+ * @push: add a new @skb to the write queue
+ * @full: indicates that no more packets will be admitted
+ * @top: peeks at whatever the queueing policy defines as its `top'
+ */
+static struct dccp_qpolicy_operations {
+	void (*push) (struct sock *sk, struct sk_buff *skb);
+	bool (*full) (struct sock *sk);
+	struct sk_buff* (*top) (struct sock *sk);
+
+} qpol_table[DCCPQ_POLICY_MAX] = {
+	[DCCPQ_POLICY_SIMPLE] = {
+		.push = qpolicy_simple_push,
+		.full = qpolicy_simple_full,
+		.top = qpolicy_simple_top,
+	},
+	[DCCPQ_POLICY_PRIO] = {
+		.push = qpolicy_simple_push,
+		.full = qpolicy_prio_full,
+		.top = qpolicy_prio_best_skb,
+	},
+};
+
+/*
+ * Externally visible interface
+ */
+void dccp_qpolicy_push(struct sock *sk, struct sk_buff *skb)
+{
+	qpol_table[dccp_sk(sk)->dccps_qpolicy].push(sk, skb);
+}
+
+bool dccp_qpolicy_full(struct sock *sk)
+{
+	return qpol_table[dccp_sk(sk)->dccps_qpolicy].full(sk);
+}
+
+void dccp_qpolicy_drop(struct sock *sk, struct sk_buff *skb)
+{
+	if (skb != NULL) {
+		skb_unlink(skb, &sk->sk_write_queue);
+		kfree_skb(skb);
+	}
+}
+
+struct sk_buff *dccp_qpolicy_top(struct sock *sk)
+{
+	return qpol_table[dccp_sk(sk)->dccps_qpolicy].top(sk);
+}
+
+struct sk_buff *dccp_qpolicy_pop(struct sock *sk)
+{
+	struct sk_buff *skb = dccp_qpolicy_top(sk);
+
+	if (skb != NULL) {
+		/* Clear any skb fields that we used internally */
+		skb->priority = 0;
+		skb_unlink(skb, &sk->sk_write_queue);
+	}
+	return skb;
+}
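The behaviour of the "prio" policy in qpolicy.c above can be illustrated with a small stand-alone model. This is a sketch, not kernel code: struct model_queue, MODEL_QCAP and the mq_*() functions are hypothetical names, and a fixed-size array of priorities stands in for sk_write_queue. It mirrors the call order in dccp_sendmsg(), where full() is consulted before the new packet is pushed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_QCAP 16	/* hypothetical hard capacity of the model array */

/* Hypothetical stand-in for sk_write_queue plus dccps_tx_qlen. */
struct model_queue {
	uint32_t prio[MODEL_QCAP];	/* priorities in arrival order */
	size_t len;			/* number of queued packets */
	size_t tx_qlen;			/* 0 means no policy limit */
};

/* Remove the element at idx, keeping the arrival order of the rest. */
static void mq_unlink(struct model_queue *q, size_t idx)
{
	assert(idx < q->len);
	for (size_t i = idx + 1; i < q->len; i++)
		q->prio[i - 1] = q->prio[i];
	q->len--;
}

/* Like qpolicy_prio_best_skb(): first element with the highest priority. */
static size_t mq_best(const struct model_queue *q)
{
	size_t best = 0;

	for (size_t i = 1; i < q->len; i++)
		if (q->prio[i] > q->prio[best])
			best = i;
	return best;
}

/* Like qpolicy_prio_worst_skb(): first element with the lowest priority. */
static size_t mq_worst(const struct model_queue *q)
{
	size_t worst = 0;

	for (size_t i = 1; i < q->len; i++)
		if (q->prio[i] < q->prio[worst])
			worst = i;
	return worst;
}

/* Like qpolicy_prio_full(): make room by dropping the worst, never refuse. */
static bool mq_prio_full(struct model_queue *q)
{
	if (q->tx_qlen && q->len >= q->tx_qlen)
		mq_unlink(q, mq_worst(q));
	return false;
}

/* Like the dccp_sendmsg() path: check full(), then append at the tail. */
static bool mq_send(struct model_queue *q, uint32_t prio)
{
	if (mq_prio_full(q))
		return false;	/* would be -EAGAIN; not reached for "prio" */
	assert(q->len < MODEL_QCAP);
	q->prio[q->len++] = prio;
	return true;
}

/* Like dccp_qpolicy_pop(): remove the best element; false if empty. */
static bool mq_pop(struct model_queue *q, uint32_t *prio)
{
	size_t idx;

	if (q->len == 0)
		return false;
	idx = mq_best(q);
	*prio = q->prio[idx];
	mq_unlink(q, idx);
	return true;
}
```

With tx_qlen set to 3, sending priorities 5, 1, 9 and then 4 evicts the packet with priority 1, and subsequent pops return 9, 5, 4. Note that, as in qpolicy_prio_full(), the eviction happens before the new packet is enqueued, so its own priority is not compared against the queued ones; packets of equal priority leave in arrival order because both scans use strict comparisons.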