Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

taprio: fix panic while hw offload sched list swap

Don't swap oper and admin schedules too early, it's not correct and
causes crash.

Steps to reproduce:

1)
tc qdisc replace dev eth0 parent root handle 100 taprio \
num_tc 3 \
map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
queues 1@0 1@1 1@2 \
base-time $SOME_BASE_TIME \
sched-entry S 01 80000 \
sched-entry S 02 15000 \
sched-entry S 04 40000 \
flags 2

2)
tc qdisc replace dev eth0 parent root handle 100 taprio \
base-time $SOME_BASE_TIME \
sched-entry S 01 90000 \
sched-entry S 02 20000 \
sched-entry S 04 40000 \
flags 2

3)
tc qdisc replace dev eth0 parent root handle 100 taprio \
base-time $SOME_BASE_TIME \
sched-entry S 01 150000 \
sched-entry S 02 200000 \
sched-entry S 04 40000 \
flags 2

Do 2 3 2 .. steps more times if not happens and observe:

[ 305.832319] Unable to handle kernel write to read-only memory at
virtual address ffff0000087ce7f0
[ 305.910887] CPU: 0 PID: 0 Comm: swapper/0 Not tainted
[ 305.919306] Hardware name: Texas Instruments AM654 Base Board (DT)

[...]

[ 306.017119] x1 : ffff800848031d88 x0 : ffff800848031d80
[ 306.022422] Call trace:
[ 306.024866] taprio_free_sched_cb+0x4c/0x98
[ 306.029040] rcu_process_callbacks+0x25c/0x410
[ 306.033476] __do_softirq+0x10c/0x208
[ 306.037132] irq_exit+0xb8/0xc8
[ 306.040267] __handle_domain_irq+0x64/0xb8
[ 306.044352] gic_handle_irq+0x7c/0x178
[ 306.048092] el1_irq+0xb0/0x128
[ 306.051227] arch_cpu_idle+0x10/0x18
[ 306.054795] do_idle+0x120/0x138
[ 306.058015] cpu_startup_entry+0x20/0x28
[ 306.061931] rest_init+0xcc/0xd8
[ 306.065154] start_kernel+0x3bc/0x3e4
[ 306.068810] Code: f2fbd5b7 f2fbd5b6 d503201f f9400422 (f9000662)
[ 306.074900] ---[ end trace 96c8e2284a9d9d6e ]---
[ 306.079507] Kernel panic - not syncing: Fatal exception in interrupt
[ 306.085847] SMP: stopping secondary CPUs
[ 306.089765] Kernel Offset: disabled

Try to explain one of the possible crash cases:

The "real" admin list is assigned when admin_sched is set to
new_admin, it happens after "swap", that assigns to oper_sched NULL.
Thus if call qdisc show it can crash.

Farther, next second time, when sched list is updated, the admin_sched
is not NULL and becomes the oper_sched, previous oper_sched was NULL so
just skipped. But then admin_sched is assigned new_admin, but schedules
to free previous assigned admin_sched (that already became oper_sched).

Farther, next third time, when sched list is updated,
while one more swap, oper_sched is not null, but it was happy to be
freed already (while prev. admin update), so while try to free
oper_sched the kernel panic happens at taprio_free_sched_cb().

So, move the "swap emulation" where it should be according to function
comment from code.

Fixes: 9c66d15646760e ("taprio: Add support for hardware offloading")
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

authored by

Ivan Khoronzhuk and committed by
David S. Miller
0763b3e8 fc564e09

+3 -2
+3 -2
net/sched/sch_taprio.c
··· 1224 1224 goto done; 1225 1225 } 1226 1226 1227 - taprio_offload_config_changed(q); 1228 - 1229 1227 done: 1230 1228 taprio_offload_free(offload); 1231 1229 ··· 1503 1505 call_rcu(&admin->rcu, taprio_free_sched_cb); 1504 1506 1505 1507 spin_unlock_irqrestore(&q->current_entry_lock, flags); 1508 + 1509 + if (FULL_OFFLOAD_IS_ENABLED(taprio_flags)) 1510 + taprio_offload_config_changed(q); 1506 1511 } 1507 1512 1508 1513 new_admin = NULL;