Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge branch 'net-sched-taprio-change-schedules'

Vinicius Costa Gomes says:

====================
net/sched: taprio change schedules

Changes from RFC:
- Removed the patches for taprio offloading, because of the lack of
in-tree users;
- Updated the links to point to the PATCH version of this series;

Original cover letter:

Overview
--------

This RFC has two objectives: it adds support for changing the running
schedules during "runtime" (explained in more detail later), and it
proposes an interface between taprio and the drivers for hardware
offloading.

These two different features are presented together so it's clear what
the "final state" would look like. But after the RFC stage, they can
be proposed (and reviewed) separately.

Changing the schedules without disrupting traffic is important for
handling dynamic use cases, for example, when streams are
added/removed and when the network configuration changes.

Hardware offloading support allows schedules to be more precise and
have lower resource usage.

Changing schedules
------------------

As with the other interfaces we have proposed, we try to follow the
concepts of the IEEE 802.1Q-2018 specification. So, for changing
schedules, there are an "oper" (operational) schedule and an "admin"
schedule. The "admin" schedule is mutable and not in use; the "oper"
schedule is immutable and in use.

That is, when the user first adds a schedule it is in the "admin"
state, and it becomes "oper" when its base-time (basically, when it
starts) is reached.

In practice, this means it's now possible to create a taprio qdisc with a schedule:

$ tc qdisc add dev IFACE parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 2@2 \
base-time 10000000 \
sched-entry S 03 300000 \
sched-entry S 02 300000 \
sched-entry S 06 400000 \
clockid CLOCK_TAI

And then, later, after the previous schedule is "promoted" to "oper",
add a new ("admin") schedule to be used some time later:

$ tc qdisc change dev IFACE parent root handle 100 taprio \
base-time 1553121866000000000 \
sched-entry S 02 500000 \
sched-entry S 0f 400000 \
clockid CLOCK_TAI

When enabling the ability to change schedules, it makes sense to add
two more knobs to schedules: "cycle-time" truncates a cycle to a given
value, so the schedule repeats after a well-defined period;
"cycle-time-extension" controls by how much the last entry of a cycle
may be extended when a schedule change is imminent. The reason for the
latter is to avoid running a very small cycle when transitioning from
one schedule to another.

With these, taprio in the software mode should provide a fairly
complete implementation of what's defined in the Enhancements for
Scheduled Traffic parts of the specification.

Hardware offload
----------------

Some workloads require better guarantees from their schedules than
what's provided by the software implementation. This series proposes
an interface for configuring schedules into compatible network
controllers.

This part is proposed together with the support for changing schedules
because it raises questions such as: should the "qdisc" side be
responsible for providing visibility into the schedules, or should the
driver?

In this proposal, the driver is called with the new schedule as soon
as it is validated, and the "core" qdisc takes care of displaying
(".dump()") the correct schedules at all times. This means that some
logic would need to be duplicated in the driver if the hardware
doesn't support multiple schedules. But as taprio doesn't have enough
information about the underlying controller to know how far in advance
a schedule needs to be handed to the hardware, this feels like a fair
compromise.

The hardware offloading part of this proposal also tries to define an
interface for frame preemption and how it interacts with the
scheduling of traffic; see Section 8.6.8.4 of IEEE 802.1Q-2018 for
more information.

One important difference between the qdisc interface and the
qdisc-driver interface is that the "gate mask" on the qdisc side
references traffic classes: bit 0 of the gate mask means Traffic
Class 0. In the driver interface it references queues: bit 0 means
queue 0. That is, taprio converts references to traffic classes into
references to queues before sending the offloading request to the
driver.

Request for help
----------------

I would like interested driver maintainers to take a look at the
proposed interface and see if it would be too awkward for any
particular device. Pointers to available documentation would also be
appreciated. The idea here is to start a discussion so we can arrive
at an interface that works for multiple vendors.

Links
-----

kernel patches:
https://github.com/vcgomes/net-next/tree/taprio-add-support-for-change-v3

iproute2 patches:
https://github.com/vcgomes/iproute2/tree/taprio-add-support-for-change-v3
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

+415 -209
+13
include/uapi/linux/pkt_sched.h

···
 
 #define TCA_TAPRIO_SCHED_MAX (__TCA_TAPRIO_SCHED_MAX - 1)
 
+/* The format for the admin sched (dump only):
+ * [TCA_TAPRIO_SCHED_ADMIN_SCHED]
+ * [TCA_TAPRIO_ATTR_SCHED_BASE_TIME]
+ * [TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]
+ * [TCA_TAPRIO_ATTR_SCHED_ENTRY]
+ * [TCA_TAPRIO_ATTR_SCHED_ENTRY_CMD]
+ * [TCA_TAPRIO_ATTR_SCHED_ENTRY_GATES]
+ * [TCA_TAPRIO_ATTR_SCHED_ENTRY_INTERVAL]
+ */
+
 enum {
 	TCA_TAPRIO_ATTR_UNSPEC,
 	TCA_TAPRIO_ATTR_PRIOMAP, /* struct tc_mqprio_qopt */
···
 	TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY, /* single entry */
 	TCA_TAPRIO_ATTR_SCHED_CLOCKID, /* s32 */
 	TCA_TAPRIO_PAD,
+	TCA_TAPRIO_ATTR_ADMIN_SCHED, /* The admin sched, only used in dump */
+	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME, /* s64 */
+	TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION, /* s64 */
 	__TCA_TAPRIO_ATTR_MAX,
 };
+402 -209
net/sched/sch_taprio.c

···
 #include <linux/math64.h>
 #include <linux/module.h>
 #include <linux/spinlock.h>
+#include <linux/rcupdate.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/pkt_cls.h>
···
 	u8 command;
 };
 
+struct sched_gate_list {
+	struct rcu_head rcu;
+	struct list_head entries;
+	size_t num_entries;
+	ktime_t cycle_close_time;
+	s64 cycle_time;
+	s64 cycle_time_extension;
+	s64 base_time;
+};
+
 struct taprio_sched {
 	struct Qdisc **qdiscs;
 	struct Qdisc *root;
-	s64 base_time;
 	int clockid;
 	atomic64_t picos_per_byte; /* Using picoseconds because for 10Gbps+
 				    * speeds it's sub-nanoseconds per byte
 				    */
-	size_t num_entries;
 
 	/* Protects the update side of the RCU protected current_entry */
 	spinlock_t current_entry_lock;
 	struct sched_entry __rcu *current_entry;
-	struct list_head entries;
+	struct sched_gate_list __rcu *oper_sched;
+	struct sched_gate_list __rcu *admin_sched;
 	ktime_t (*get_time)(void);
 	struct hrtimer advance_timer;
 	struct list_head taprio_list;
 };
+
+static ktime_t sched_base_time(const struct sched_gate_list *sched)
+{
+	if (!sched)
+		return KTIME_MAX;
+
+	return ns_to_ktime(sched->base_time);
+}
+
+static void taprio_free_sched_cb(struct rcu_head *head)
+{
+	struct sched_gate_list *sched = container_of(head, struct sched_gate_list, rcu);
+	struct sched_entry *entry, *n;
+
+	if (!sched)
+		return;
+
+	list_for_each_entry_safe(entry, n, &sched->entries, list) {
+		list_del(&entry->list);
+		kfree(entry);
+	}
+
+	kfree(sched);
+}
+
+static void switch_schedules(struct taprio_sched *q,
+			     struct sched_gate_list **admin,
+			     struct sched_gate_list **oper)
+{
+	rcu_assign_pointer(q->oper_sched, *admin);
+	rcu_assign_pointer(q->admin_sched, NULL);
+
+	if (*oper)
+		call_rcu(&(*oper)->rcu, taprio_free_sched_cb);
+
+	*oper = *admin;
+	*admin = NULL;
+}
+
+static ktime_t get_cycle_time(struct sched_gate_list *sched)
+{
+	struct sched_entry *entry;
+	ktime_t cycle = 0;
+
+	if (sched->cycle_time != 0)
+		return sched->cycle_time;
+
+	list_for_each_entry(entry, &sched->entries, list)
+		cycle = ktime_add_ns(cycle, entry->interval);
+
+	sched->cycle_time = cycle;
+
+	return cycle;
+}
 
 static int taprio_enqueue(struct sk_buff *skb, struct Qdisc *sch,
 			  struct sk_buff **to_free)
···
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct sk_buff *skb = NULL;
 	struct sched_entry *entry;
-	struct sk_buff *skb;
 	u32 gate_mask;
 	int i;
···
 	 * "AdminGateSates"
 	 */
 	gate_mask = entry ? entry->gate_mask : TAPRIO_ALL_GATES_OPEN;
-	rcu_read_unlock();
 
 	if (!gate_mask)
-		return NULL;
+		goto done;
 
 	for (i = 0; i < dev->num_tx_queues; i++) {
 		struct Qdisc *child = q->qdiscs[i];
···
 		skb = child->ops->dequeue(child);
 		if (unlikely(!skb))
-			return NULL;
+			goto done;
 
 		qdisc_bstats_update(sch, skb);
 		qdisc_qstats_backlog_dec(sch, skb);
 		sch->q.qlen--;
 
-		return skb;
+		goto done;
 	}
 
-	return NULL;
+done:
+	rcu_read_unlock();
+
+	return skb;
+}
+
+static bool should_restart_cycle(const struct sched_gate_list *oper,
+				 const struct sched_entry *entry)
+{
+	if (list_is_last(&entry->list, &oper->entries))
+		return true;
+
+	if (ktime_compare(entry->close_time, oper->cycle_close_time) == 0)
+		return true;
+
+	return false;
+}
+
+static bool should_change_schedules(const struct sched_gate_list *admin,
+				    const struct sched_gate_list *oper,
+				    ktime_t close_time)
+{
+	ktime_t next_base_time, extension_time;
+
+	if (!admin)
+		return false;
+
+	next_base_time = sched_base_time(admin);
+
+	/* This is the simple case, the close_time would fall after
+	 * the next schedule base_time.
+	 */
+	if (ktime_compare(next_base_time, close_time) <= 0)
+		return true;
+
+	/* This is the cycle_time_extension case, if the close_time
+	 * plus the amount that can be extended would fall after the
+	 * next schedule base_time, we can extend the current schedule
+	 * for that amount.
+	 */
+	extension_time = ktime_add_ns(close_time, oper->cycle_time_extension);
+
+	/* FIXME: the IEEE 802.1Q-2018 Specification isn't clear about
+	 * how precisely the extension should be made. So after
+	 * conformance testing, this logic may change.
+	 */
+	if (ktime_compare(next_base_time, extension_time) <= 0)
+		return true;
+
+	return false;
 }
 
 static enum hrtimer_restart advance_sched(struct hrtimer *timer)
 {
 	struct taprio_sched *q = container_of(timer, struct taprio_sched,
 					      advance_timer);
+	struct sched_gate_list *oper, *admin;
 	struct sched_entry *entry, *next;
 	struct Qdisc *sch = q->root;
 	ktime_t close_time;
···
 	spin_lock(&q->current_entry_lock);
 	entry = rcu_dereference_protected(q->current_entry,
 					  lockdep_is_held(&q->current_entry_lock));
+	oper = rcu_dereference_protected(q->oper_sched,
+					 lockdep_is_held(&q->current_entry_lock));
+	admin = rcu_dereference_protected(q->admin_sched,
+					  lockdep_is_held(&q->current_entry_lock));
 
-	/* This is the case that it's the first time that the schedule
-	 * runs, so it only happens once per schedule. The first entry
-	 * is pre-calculated during the schedule initialization.
+	if (!oper)
+		switch_schedules(q, &admin, &oper);
+
+	/* This can happen in two cases: 1. this is the very first run
+	 * of this function (i.e. we weren't running any schedule
+	 * previously); 2. The previous schedule just ended. The first
+	 * entry of all schedules are pre-calculated during the
+	 * schedule initialization.
 	 */
-	if (unlikely(!entry)) {
-		next = list_first_entry(&q->entries, struct sched_entry,
+	if (unlikely(!entry || entry->close_time == oper->base_time)) {
+		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
 		close_time = next->close_time;
 		goto first_run;
 	}
 
-	if (list_is_last(&entry->list, &q->entries))
-		next = list_first_entry(&q->entries, struct sched_entry,
+	if (should_restart_cycle(oper, entry)) {
+		next = list_first_entry(&oper->entries, struct sched_entry,
 					list);
-	else
+		oper->cycle_close_time = ktime_add_ns(oper->cycle_close_time,
+						      oper->cycle_time);
+	} else {
 		next = list_next_entry(entry, list);
+	}
 
 	close_time = ktime_add_ns(entry->close_time, next->interval);
+	close_time = min_t(ktime_t, close_time, oper->cycle_close_time);
+
+	if (should_change_schedules(admin, oper, close_time)) {
+		/* Set things so the next time this runs, the new
+		 * schedule runs.
+		 */
+		close_time = sched_base_time(admin);
+		switch_schedules(q, &admin, &oper);
+	}
 
 	next->close_time = close_time;
 	taprio_set_budget(q, next);
···
 	[TCA_TAPRIO_ATTR_PRIOMAP] = {
 		.len = sizeof(struct tc_mqprio_qopt)
 	},
-	[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST] = { .type = NLA_NESTED },
-	[TCA_TAPRIO_ATTR_SCHED_BASE_TIME] = { .type = NLA_S64 },
-	[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] = { .type = NLA_NESTED },
-	[TCA_TAPRIO_ATTR_SCHED_CLOCKID] = { .type = NLA_S32 },
+	[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST]           = { .type = NLA_NESTED },
+	[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]            = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]         = { .type = NLA_NESTED },
+	[TCA_TAPRIO_ATTR_SCHED_CLOCKID]              = { .type = NLA_S32 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]           = { .type = NLA_S64 },
+	[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION] = { .type = NLA_S64 },
 };
 
 static int fill_sched_entry(struct nlattr **tb, struct sched_entry *entry,
···
 	return fill_sched_entry(tb, entry, extack);
 }
 
-/* Returns the number of entries in case of success */
-static int parse_sched_single_entry(struct nlattr *n,
-				    struct taprio_sched *q,
-				    struct netlink_ext_ack *extack)
-{
-	struct nlattr *tb_entry[TCA_TAPRIO_SCHED_ENTRY_MAX + 1] = { };
-	struct nlattr *tb_list[TCA_TAPRIO_SCHED_MAX + 1] = { };
-	struct sched_entry *entry;
-	bool found = false;
-	u32 index;
-	int err;
-
-	err = nla_parse_nested_deprecated(tb_list, TCA_TAPRIO_SCHED_MAX, n,
-					  entry_list_policy, NULL);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Could not parse nested entry");
-		return -EINVAL;
-	}
-
-	if (!tb_list[TCA_TAPRIO_SCHED_ENTRY]) {
-		NL_SET_ERR_MSG(extack, "Single-entry must include an entry");
-		return -EINVAL;
-	}
-
-	err = nla_parse_nested_deprecated(tb_entry,
-					  TCA_TAPRIO_SCHED_ENTRY_MAX,
-					  tb_list[TCA_TAPRIO_SCHED_ENTRY],
-					  entry_policy, NULL);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Could not parse nested entry");
-		return -EINVAL;
-	}
-
-	if (!tb_entry[TCA_TAPRIO_SCHED_ENTRY_INDEX]) {
-		NL_SET_ERR_MSG(extack, "Entry must specify an index\n");
-		return -EINVAL;
-	}
-
-	index = nla_get_u32(tb_entry[TCA_TAPRIO_SCHED_ENTRY_INDEX]);
-	if (index >= q->num_entries) {
-		NL_SET_ERR_MSG(extack, "Index for single entry exceeds number of entries in schedule");
-		return -EINVAL;
-	}
-
-	list_for_each_entry(entry, &q->entries, list) {
-		if (entry->index == index) {
-			found = true;
-			break;
-		}
-	}
-
-	if (!found) {
-		NL_SET_ERR_MSG(extack, "Could not find entry");
-		return -ENOENT;
-	}
-
-	err = fill_sched_entry(tb_entry, entry, extack);
-	if (err < 0)
-		return err;
-
-	return q->num_entries;
-}
-
 static int parse_sched_list(struct nlattr *list,
-			    struct taprio_sched *q,
+			    struct sched_gate_list *sched,
 			    struct netlink_ext_ack *extack)
 {
 	struct nlattr *n;
···
 			return err;
 		}
 
-		list_add_tail(&entry->list, &q->entries);
+		list_add_tail(&entry->list, &sched->entries);
 		i++;
 	}
 
-	q->num_entries = i;
+	sched->num_entries = i;
 
 	return i;
 }
 
-/* Returns the number of entries in case of success */
-static int parse_taprio_opt(struct nlattr **tb, struct taprio_sched *q,
-			    struct netlink_ext_ack *extack)
+static int parse_taprio_schedule(struct nlattr **tb,
+				 struct sched_gate_list *new,
+				 struct netlink_ext_ack *extack)
 {
 	int err = 0;
-	int clockid;
 
-	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST] &&
-	    tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY])
-		return -EINVAL;
-
-	if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY] && q->num_entries == 0)
-		return -EINVAL;
-
-	if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID])
-		return -EINVAL;
+	if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY]) {
+		NL_SET_ERR_MSG(extack, "Adding a single entry is not supported");
+		return -ENOTSUPP;
+	}
 
 	if (tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME])
-		q->base_time = nla_get_s64(
-			tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
+		new->base_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_BASE_TIME]);
 
-	if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
-		clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION])
+		new->cycle_time_extension = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION]);
 
-		/* We only support static clockids and we don't allow
-		 * for it to be modified after the first init.
-		 */
-		if (clockid < 0 || (q->clockid != -1 && q->clockid != clockid))
-			return -EINVAL;
-
-		q->clockid = clockid;
-	}
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME])
+		new->cycle_time = nla_get_s64(tb[TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME]);
 
 	if (tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST])
 		err = parse_sched_list(
-			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], q, extack);
-	else if (tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY])
-		err = parse_sched_single_entry(
-			tb[TCA_TAPRIO_ATTR_SCHED_SINGLE_ENTRY], q, extack);
+			tb[TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST], new, extack);
+	if (err < 0)
+		return err;
 
-	/* parse_sched_* return the number of entries in the schedule,
-	 * a schedule with zero entries is an error.
-	 */
-	if (err == 0) {
-		NL_SET_ERR_MSG(extack, "The schedule should contain at least one entry");
-		return -EINVAL;
-	}
-
-	return err;
+	return 0;
 }
 
 static int taprio_parse_mqprio_opt(struct net_device *dev,
···
 {
 	int i, j;
 
-	if (!qopt) {
+	if (!qopt && !dev->num_tc) {
 		NL_SET_ERR_MSG(extack, "'mqprio' configuration is necessary");
 		return -EINVAL;
 	}
+
+	/* If num_tc is already set, it means that the user already
+	 * configured the mqprio part
+	 */
+	if (dev->num_tc)
+		return 0;
 
 	/* Verify num_tc is not out of max range */
 	if (qopt->num_tc > TC_MAX_QUEUE) {
···
 	return 0;
 }
 
-static int taprio_get_start_time(struct Qdisc *sch, ktime_t *start)
+static int taprio_get_start_time(struct Qdisc *sch,
+				 struct sched_gate_list *sched,
+				 ktime_t *start)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
-	struct sched_entry *entry;
 	ktime_t now, base, cycle;
 	s64 n;
 
-	base = ns_to_ktime(q->base_time);
+	base = sched_base_time(sched);
 	now = q->get_time();
 
 	if (ktime_after(base, now)) {
···
 		return 0;
 	}
 
-	/* Calculate the cycle_time, by summing all the intervals.
-	 */
-	cycle = 0;
-	list_for_each_entry(entry, &q->entries, list)
-		cycle = ktime_add_ns(cycle, entry->interval);
+	cycle = get_cycle_time(sched);
 
 	/* The qdisc is expected to have at least one sched_entry. Moreover,
 	 * any entry must have 'interval' > 0. Thus if the cycle time is zero,
···
 	return 0;
 }
 
-static void taprio_start_sched(struct Qdisc *sch, ktime_t start)
+static void setup_first_close_time(struct taprio_sched *q,
+				   struct sched_gate_list *sched, ktime_t base)
 {
-	struct taprio_sched *q = qdisc_priv(sch);
 	struct sched_entry *first;
-	unsigned long flags;
+	ktime_t cycle;
 
-	spin_lock_irqsave(&q->current_entry_lock, flags);
+	first = list_first_entry(&sched->entries,
+				 struct sched_entry, list);
 
-	first = list_first_entry(&q->entries, struct sched_entry,
-				 list);
+	cycle = get_cycle_time(sched);
 
-	first->close_time = ktime_add_ns(start, first->interval);
+	/* FIXME: find a better place to do this */
+	sched->cycle_close_time = ktime_add_ns(base, cycle);
+
+	first->close_time = ktime_add_ns(base, first->interval);
 	taprio_set_budget(q, first);
 	rcu_assign_pointer(q->current_entry, NULL);
+}
 
-	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+static void taprio_start_sched(struct Qdisc *sch,
+			       ktime_t start, struct sched_gate_list *new)
+{
+	struct taprio_sched *q = qdisc_priv(sch);
+	ktime_t expires;
+
+	expires = hrtimer_get_expires(&q->advance_timer);
+	if (expires == 0)
+		expires = KTIME_MAX;
+
+	/* If the new schedule starts before the next expiration, we
+	 * reprogram it to the earliest one, so we change the admin
+	 * schedule to the operational one at the right time.
+	 */
+	start = min_t(ktime_t, start, expires);
 
 	hrtimer_start(&q->advance_timer, start, HRTIMER_MODE_ABS);
 }
···
 			 struct netlink_ext_ack *extack)
 {
 	struct nlattr *tb[TCA_TAPRIO_ATTR_MAX + 1] = { };
+	struct sched_gate_list *oper, *admin, *new_admin;
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
 	struct tc_mqprio_qopt *mqprio = NULL;
-	int i, err, size;
+	int i, err, clockid;
+	unsigned long flags;
 	ktime_t start;
 
 	err = nla_parse_nested_deprecated(tb, TCA_TAPRIO_ATTR_MAX, opt,
···
 	if (err < 0)
 		return err;
 
-	/* A schedule with less than one entry is an error */
-	size = parse_taprio_opt(tb, q, extack);
-	if (size < 0)
-		return size;
+	new_admin = kzalloc(sizeof(*new_admin), GFP_KERNEL);
+	if (!new_admin) {
+		NL_SET_ERR_MSG(extack, "Not enough memory for a new schedule");
+		return -ENOMEM;
+	}
+	INIT_LIST_HEAD(&new_admin->entries);
 
-	hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
-	q->advance_timer.function = advance_sched;
+	rcu_read_lock();
+	oper = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
+	rcu_read_unlock();
 
-	switch (q->clockid) {
-	case CLOCK_REALTIME:
-		q->get_time = ktime_get_real;
-		break;
-	case CLOCK_MONOTONIC:
-		q->get_time = ktime_get;
-		break;
-	case CLOCK_BOOTTIME:
-		q->get_time = ktime_get_boottime;
-		break;
-	case CLOCK_TAI:
-		q->get_time = ktime_get_clocktai;
-		break;
-	default:
-		return -ENOTSUPP;
+	if (mqprio && (oper || admin)) {
+		NL_SET_ERR_MSG(extack, "Changing the traffic mapping of a running schedule is not supported");
+		err = -ENOTSUPP;
+		goto free_sched;
 	}
 
-	for (i = 0; i < dev->num_tx_queues; i++) {
-		struct netdev_queue *dev_queue;
-		struct Qdisc *qdisc;
+	err = parse_taprio_schedule(tb, new_admin, extack);
+	if (err < 0)
+		goto free_sched;
 
-		dev_queue = netdev_get_tx_queue(dev, i);
-		qdisc = qdisc_create_dflt(dev_queue,
-					  &pfifo_qdisc_ops,
-					  TC_H_MAKE(TC_H_MAJ(sch->handle),
-						    TC_H_MIN(i + 1)),
-					  extack);
-		if (!qdisc)
-			return -ENOMEM;
+	if (new_admin->num_entries == 0) {
+		NL_SET_ERR_MSG(extack, "There should be at least one entry in the schedule");
+		err = -EINVAL;
+		goto free_sched;
+	}
 
-		if (i < dev->real_num_tx_queues)
-			qdisc_hash_add(qdisc, false);
+	if (tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+		clockid = nla_get_s32(tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]);
 
-		q->qdiscs[i] = qdisc;
+		/* We only support static clockids and we don't allow
+		 * for it to be modified after the first init.
+		 */
+		if (clockid < 0 ||
+		    (q->clockid != -1 && q->clockid != clockid)) {
+			NL_SET_ERR_MSG(extack, "Changing the 'clockid' of a running schedule is not supported");
+			err = -ENOTSUPP;
+			goto free_sched;
+		}
+
+		q->clockid = clockid;
+	}
+
+	if (q->clockid == -1 && !tb[TCA_TAPRIO_ATTR_SCHED_CLOCKID]) {
+		NL_SET_ERR_MSG(extack, "Specifying a 'clockid' is mandatory");
+		err = -EINVAL;
+		goto free_sched;
+	}
+
+	taprio_set_picos_per_byte(dev, q);
+
+	/* Protects against enqueue()/dequeue() */
+	spin_lock_bh(qdisc_lock(sch));
+
+	if (!hrtimer_active(&q->advance_timer)) {
+		hrtimer_init(&q->advance_timer, q->clockid, HRTIMER_MODE_ABS);
+		q->advance_timer.function = advance_sched;
 	}
 
 	if (mqprio) {
···
 				    mqprio->prio_tc_map[i]);
 	}
 
-	taprio_set_picos_per_byte(dev, q);
-
-	err = taprio_get_start_time(sch, &start);
-	if (err < 0) {
-		NL_SET_ERR_MSG(extack, "Internal error: failed get start time");
-		return err;
+	switch (q->clockid) {
+	case CLOCK_REALTIME:
+		q->get_time = ktime_get_real;
+		break;
+	case CLOCK_MONOTONIC:
+		q->get_time = ktime_get;
+		break;
+	case CLOCK_BOOTTIME:
+		q->get_time = ktime_get_boottime;
+		break;
+	case CLOCK_TAI:
+		q->get_time = ktime_get_clocktai;
+		break;
+	default:
+		NL_SET_ERR_MSG(extack, "Invalid 'clockid'");
+		err = -EINVAL;
+		goto unlock;
 	}
 
-	taprio_start_sched(sch, start);
+	err = taprio_get_start_time(sch, new_admin, &start);
+	if (err < 0) {
+		NL_SET_ERR_MSG(extack, "Internal error: failed get start time");
+		goto unlock;
+	}
 
-	return 0;
+	setup_first_close_time(q, new_admin, start);
+
+	/* Protects against advance_sched() */
+	spin_lock_irqsave(&q->current_entry_lock, flags);
+
+	taprio_start_sched(sch, start, new_admin);
+
+	rcu_assign_pointer(q->admin_sched, new_admin);
+	if (admin)
+		call_rcu(&admin->rcu, taprio_free_sched_cb);
+	new_admin = NULL;
+
+	spin_unlock_irqrestore(&q->current_entry_lock, flags);
+
+	err = 0;
+
+unlock:
+	spin_unlock_bh(qdisc_lock(sch));
+
+free_sched:
+	kfree(new_admin);
+
+	return err;
 }
 
 static void taprio_destroy(struct Qdisc *sch)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
-	struct sched_entry *entry, *n;
 	unsigned int i;
 
 	spin_lock(&taprio_list_lock);
···
 
 	netdev_set_num_tc(dev, 0);
 
-	list_for_each_entry_safe(entry, n, &q->entries, list) {
-		list_del(&entry->list);
-		kfree(entry);
-	}
+	if (q->oper_sched)
+		call_rcu(&q->oper_sched->rcu, taprio_free_sched_cb);
+
+	if (q->admin_sched)
+		call_rcu(&q->admin_sched->rcu, taprio_free_sched_cb);
 }
 
 static int taprio_init(struct Qdisc *sch, struct nlattr *opt,
···
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	int i;
 
-	INIT_LIST_HEAD(&q->entries);
 	spin_lock_init(&q->current_entry_lock);
 
-	/* We may overwrite the configuration later */
 	hrtimer_init(&q->advance_timer, CLOCK_TAI, HRTIMER_MODE_ABS);
+	q->advance_timer.function = advance_sched;
 
 	q->root = sch;
···
 	spin_lock(&taprio_list_lock);
 	list_add(&q->taprio_list, &taprio_list);
 	spin_unlock(&taprio_list_lock);
+
+	for (i = 0; i < dev->num_tx_queues; i++) {
+		struct netdev_queue *dev_queue;
+		struct Qdisc *qdisc;
+
+		dev_queue = netdev_get_tx_queue(dev, i);
+		qdisc = qdisc_create_dflt(dev_queue,
+					  &pfifo_qdisc_ops,
+					  TC_H_MAKE(TC_H_MAJ(sch->handle),
+						    TC_H_MIN(i + 1)),
+					  extack);
+		if (!qdisc)
+			return -ENOMEM;
+
+		if (i < dev->real_num_tx_queues)
+			qdisc_hash_add(qdisc, false);
+
+		q->qdiscs[i] = qdisc;
+	}
 
 	return taprio_change(sch, opt, extack);
 }
···
 	return -1;
 }
 
+static int dump_schedule(struct sk_buff *msg,
+			 const struct sched_gate_list *root)
+{
+	struct nlattr *entry_list;
+	struct sched_entry *entry;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_BASE_TIME,
+			root->base_time, TCA_TAPRIO_PAD))
+		return -1;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME,
+			root->cycle_time, TCA_TAPRIO_PAD))
+		return -1;
+
+	if (nla_put_s64(msg, TCA_TAPRIO_ATTR_SCHED_CYCLE_TIME_EXTENSION,
+			root->cycle_time_extension, TCA_TAPRIO_PAD))
+		return -1;
+
+	entry_list = nla_nest_start_noflag(msg,
+					   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
+	if (!entry_list)
+		goto error_nest;
+
+	list_for_each_entry(entry, &root->entries, list) {
+		if (dump_entry(msg, entry) < 0)
+			goto error_nest;
+	}
+
+	nla_nest_end(msg, entry_list);
+	return 0;
+
+error_nest:
+	nla_nest_cancel(msg, entry_list);
+	return -1;
+}
+
 static int taprio_dump(struct Qdisc *sch, struct sk_buff *skb)
 {
 	struct taprio_sched *q = qdisc_priv(sch);
 	struct net_device *dev = qdisc_dev(sch);
+	struct sched_gate_list *oper, *admin;
 	struct tc_mqprio_qopt opt = { 0 };
-	struct nlattr *nest, *entry_list;
-	struct sched_entry *entry;
+	struct nlattr *nest, *sched_nest;
 	unsigned int i;
+
+	rcu_read_lock();
+	oper = rcu_dereference(q->oper_sched);
+	admin = rcu_dereference(q->admin_sched);
 
 	opt.num_tc = netdev_get_num_tc(dev);
 	memcpy(opt.prio_tc_map, dev->prio_tc_map, sizeof(opt.prio_tc_map));
···
 
 	nest = nla_nest_start_noflag(skb, TCA_OPTIONS);
 	if (!nest)
-		return -ENOSPC;
+		goto start_error;
 
 	if (nla_put(skb, TCA_TAPRIO_ATTR_PRIOMAP, sizeof(opt), &opt))
 		goto options_error;
 
-	if (nla_put_s64(skb, TCA_TAPRIO_ATTR_SCHED_BASE_TIME,
-			q->base_time, TCA_TAPRIO_PAD))
-		goto options_error;
-
 	if (nla_put_s32(skb, TCA_TAPRIO_ATTR_SCHED_CLOCKID, q->clockid))
 		goto options_error;
 
-	entry_list = nla_nest_start_noflag(skb,
-					   TCA_TAPRIO_ATTR_SCHED_ENTRY_LIST);
-	if (!entry_list)
+	if (oper && dump_schedule(skb, oper))
 		goto options_error;
 
-	list_for_each_entry(entry, &q->entries, list) {
-		if (dump_entry(skb, entry) < 0)
-			goto options_error;
-	}
+	if (!admin)
+		goto done;
 
-	nla_nest_end(skb, entry_list);
+	sched_nest = nla_nest_start_noflag(skb, TCA_TAPRIO_ATTR_ADMIN_SCHED);
+
+	if (dump_schedule(skb, admin))
+		goto admin_error;
+
+	nla_nest_end(skb, sched_nest);
+
+done:
+	rcu_read_unlock();
 
 	return nla_nest_end(skb, nest);
 
+admin_error:
+	nla_nest_cancel(skb, sched_nest);
+
 options_error:
 	nla_nest_cancel(skb, nest);
-	return -1;
+
+start_error:
+	rcu_read_unlock();
+	return -ENOSPC;
 }
 
 static struct Qdisc *taprio_leaf(struct Qdisc *sch, unsigned long cl)
···
 	.id		= "taprio",
 	.priv_size	= sizeof(struct taprio_sched),
 	.init		= taprio_init,
+	.change		= taprio_change,
 	.destroy	= taprio_destroy,
 	.peek		= taprio_peek,
 	.dequeue	= taprio_dequeue,