Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
kernel os linux

netfilter: x_tables: fix ordering of jumpstack allocation and table update

During kernel stability testing on an SMP ARMv7 system, Yalin Wang
reported the following panic from the netfilter code:

1fe0: 0000001c 5e2d3b10 4007e779 4009e110 60000010 00000032 ff565656 ff545454
[<c06c48dc>] (ipt_do_table+0x448/0x584) from [<c0655ef0>] (nf_iterate+0x48/0x7c)
[<c0655ef0>] (nf_iterate+0x48/0x7c) from [<c0655f7c>] (nf_hook_slow+0x58/0x104)
[<c0655f7c>] (nf_hook_slow+0x58/0x104) from [<c0683bbc>] (ip_local_deliver+0x88/0xa8)
[<c0683bbc>] (ip_local_deliver+0x88/0xa8) from [<c0683718>] (ip_rcv_finish+0x418/0x43c)
[<c0683718>] (ip_rcv_finish+0x418/0x43c) from [<c062b1c4>] (__netif_receive_skb+0x4cc/0x598)
[<c062b1c4>] (__netif_receive_skb+0x4cc/0x598) from [<c062b314>] (process_backlog+0x84/0x158)
[<c062b314>] (process_backlog+0x84/0x158) from [<c062de84>] (net_rx_action+0x70/0x1dc)
[<c062de84>] (net_rx_action+0x70/0x1dc) from [<c0088230>] (__do_softirq+0x11c/0x27c)
[<c0088230>] (__do_softirq+0x11c/0x27c) from [<c008857c>] (do_softirq+0x44/0x50)
[<c008857c>] (do_softirq+0x44/0x50) from [<c0088614>] (local_bh_enable_ip+0x8c/0xd0)
[<c0088614>] (local_bh_enable_ip+0x8c/0xd0) from [<c06b0330>] (inet_stream_connect+0x164/0x298)
[<c06b0330>] (inet_stream_connect+0x164/0x298) from [<c061d68c>] (sys_connect+0x88/0xc8)
[<c061d68c>] (sys_connect+0x88/0xc8) from [<c000e340>] (ret_fast_syscall+0x0/0x30)
Code: 2a000021 e59d2028 e59de01c e59f011c (e7824103)
---[ end trace da227214a82491bd ]---
Kernel panic - not syncing: Fatal exception in interrupt

This comes about because CPU1 is executing xt_replace_table in response
to a setsockopt syscall, resulting in:

ret = xt_jumpstack_alloc(newinfo);
--> newinfo->jumpstack = kzalloc(size, GFP_KERNEL);

[...]

table->private = newinfo;
newinfo->initial_entries = private->initial_entries;

Meanwhile, CPU0 is handling the network receive path and ends up in
ipt_do_table, resulting in:

private = table->private;

[...]

jumpstack = (struct ipt_entry **)private->jumpstack[cpu];

On weakly ordered memory architectures, the writes to table->private
and newinfo->jumpstack from CPU1 can be observed out of order by CPU0.
Furthermore, on architectures which don't respect ordering of address
dependencies (i.e. Alpha), the reads from CPU0 can also be re-ordered.

This patch adds an smp_wmb() before the assignment to table->private
(which is essentially publishing newinfo) to ensure that all writes to
newinfo will be observed before plugging it into the table structure.
A dependent-read barrier is also added on the consumer sides, to ensure
the same ordering requirements are also respected there.

Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Reported-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Tested-by: Wang, Yalin <Yalin.Wang@sonymobile.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

authored by

Will Deacon and committed by
Pablo Neira Ayuso
b416c144 278f2b3e

+21 -1
+5
net/ipv4/netfilter/arp_tables.c
··· 271 271 local_bh_disable(); 272 272 addend = xt_write_recseq_begin(); 273 273 private = table->private; 274 + /* 275 + * Ensure we load private-> members after we've fetched the base 276 + * pointer. 277 + */ 278 + smp_read_barrier_depends(); 274 279 table_base = private->entries[smp_processor_id()]; 275 280 276 281 e = get_entry(table_base, private->hook_entry[hook]);
+5
net/ipv4/netfilter/ip_tables.c
··· 327 327 addend = xt_write_recseq_begin(); 328 328 private = table->private; 329 329 cpu = smp_processor_id(); 330 + /* 331 + * Ensure we load private-> members after we've fetched the base 332 + * pointer. 333 + */ 334 + smp_read_barrier_depends(); 330 335 table_base = private->entries[cpu]; 331 336 jumpstack = (struct ipt_entry **)private->jumpstack[cpu]; 332 337 stackptr = per_cpu_ptr(private->stackptr, cpu);
+5
net/ipv6/netfilter/ip6_tables.c
··· 349 349 local_bh_disable(); 350 350 addend = xt_write_recseq_begin(); 351 351 private = table->private; 352 + /* 353 + * Ensure we load private-> members after we've fetched the base 354 + * pointer. 355 + */ 356 + smp_read_barrier_depends(); 352 357 cpu = smp_processor_id(); 353 358 table_base = private->entries[cpu]; 354 359 jumpstack = (struct ip6t_entry **)private->jumpstack[cpu];
+6 -1
net/netfilter/x_tables.c
··· 845 845 return NULL; 846 846 } 847 847 848 - table->private = newinfo; 849 848 newinfo->initial_entries = private->initial_entries; 849 + /* 850 + * Ensure contents of newinfo are visible before assigning to 851 + * private. 852 + */ 853 + smp_wmb(); 854 + table->private = newinfo; 850 855 851 856 /* 852 857 * Even though table entries have now been swapped, other CPU's