commits

When the scheduler sets TIF_NEED_RESCHED & we call into the scheduler
from arch/mips/kernel/entry.S we disable interrupts. This is true
regardless of whether we reach work_resched from syscall_exit_work,
resume_userspace or by looping after calling schedule(). Although we
disable interrupts in these paths we don't call trace_hardirqs_off()
before calling into C code which may acquire locks, and we therefore
leave lockdep with an inconsistent view of whether interrupts are
disabled or not when CONFIG_PROVE_LOCKING & CONFIG_DEBUG_LOCKDEP are
both enabled.

Without tracing this interrupt state lockdep will print warnings such
as the following once a task returns from a syscall via
syscall_exit_partial with TIF_NEED_RESCHED set:

[ 49.927678] ------------[ cut here ]------------
[ 49.934445] WARNING: CPU: 0 PID: 1 at kernel/locking/lockdep.c:3687 check_flags.part.41+0x1dc/0x1e8
[ 49.946031] DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
[ 49.946355] CPU: 0 PID: 1 Comm: init Not tainted 4.10.0-00439-gc9fd5d362289-dirty #197
[ 49.963505] Stack : 0000000000000000 ffffffff81bb5d6a 0000000000000006 ffffffff801ce9c4
[ 49.974431] 0000000000000000 0000000000000000 0000000000000000 000000000000004a
[ 49.985300] ffffffff80b7e487 ffffffff80a24498 a8000000ff160000 ffffffff80ede8b8
[ 49.996194] 0000000000000001 0000000000000000 0000000000000000 0000000077c8030c
[ 50.007063] 000000007fd8a510 ffffffff801cd45c 0000000000000000 a8000000ff127c88
[ 50.017945] 0000000000000000 ffffffff801cf928 0000000000000001 ffffffff80a24498
[ 50.028827] 0000000000000000 0000000000000001 0000000000000000 0000000000000000
[ 50.039688] 0000000000000000 a8000000ff127bd0 0000000000000000 ffffffff805509bc
[ 50.050575] 00000000140084e0 0000000000000000 0000000000000000 0000000000040a00
[ 50.061448] 0000000000000000 ffffffff8010e1b0 0000000000000000 ffffffff805509bc
[ 50.072327] ...
[ 50.076087] Call Trace:
[ 50.079869] [<ffffffff8010e1b0>] show_stack+0x80/0xa8
[ 50.086577] [<ffffffff805509bc>] dump_stack+0x10c/0x190
[ 50.093498] [<ffffffff8015dde0>] __warn+0xf0/0x108
[ 50.099889] [<ffffffff8015de34>] warn_slowpath_fmt+0x3c/0x48
[ 50.107241] [<ffffffff801c15b4>] check_flags.part.41+0x1dc/0x1e8
[ 50.114961] [<ffffffff801c239c>] lock_is_held_type+0x8c/0xb0
[ 50.122291] [<ffffffff809461b8>] __schedule+0x8c0/0x10f8
[ 50.129221] [<ffffffff80946a60>] schedule+0x30/0x98
[ 50.135659] [<ffffffff80106278>] work_resched+0x8/0x34
[ 50.142397] ---[ end trace 0cb4f6ef5b99fe21 ]---
[ 50.148405] possible reason: unannotated irqs-off.
[ 50.154600] irq event stamp: 400463
[ 50.159566] hardirqs last enabled at (400463): [<ffffffff8094edc8>] _raw_spin_unlock_irqrestore+0x40/0xa8
[ 50.171981] hardirqs last disabled at (400462): [<ffffffff8094eb98>] _raw_spin_lock_irqsave+0x30/0xb0
[ 50.183897] softirqs last enabled at (400450): [<ffffffff8016580c>] __do_softirq+0x4ac/0x6a8
[ 50.195015] softirqs last disabled at (400425): [<ffffffff80165e78>] irq_exit+0x110/0x128

Fix this by using the TRACE_IRQS_OFF macro to call trace_hardirqs_off()
when CONFIG_TRACE_IRQFLAGS is enabled. This is done before invoking
schedule() following the work_resched label because:

1) Interrupts are disabled regardless of the path we take to reach
work_resched() & schedule().

2) Performing the tracing here avoids the need to do it in paths which
disable interrupts but don't call out to C code before hitting a
path which uses the RESTORE_SOME macro that will call
trace_hardirqs_on() or trace_hardirqs_off() as appropriate.

We call trace_hardirqs_on() using the TRACE_IRQS_ON macro before calling
syscall_trace_leave() for similar reasons, ensuring that lockdep has a
consistent view of state after we re-enable interrupts.

Signed-off-by: Paul Burton <paul.burton@imgtec.com>
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Cc: linux-mips@linux-mips.org
Cc: stable <stable@vger.kernel.org>
Patchwork: https://patchwork.linux-mips.org/patch/15385/
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

8y ago

Linus Torvalds

46589d7a

Merge tag 'pinctrl-v4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

8y ago

Ingo Molnar

e3c2c4fb

Merge tag 'perf-urgent-for-mingo-4.12-20170626' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

8y ago

Baoquan He

8eabf42a

x86/boot/KASLR: Fix kexec crash due to 'virt_addr' calculation bug

8y ago

Yisheng Xie

bbeedfda

ARM: 8681/1: make VMSPLIT_3G_OPT depends on !ARM_LPAE

8y ago

Paul Burton

161c51cc

MIPS: pm-cps: Drop manual cache-line alignment of ready_count

8y ago

Linus Torvalds

fc93274a

Merge tag 'gpio-v4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

8y ago

Brian Norris

1d80df93

Revert "pinctrl: rockchip: avoid hardirq-unsafe functions in irq_chip"

8y ago

Linus Torvalds

c0bc126f

Linux 4.12-rc7 v4.12-rc7

8y ago

Jiri Olsa

3f938ee2

perf machine: Fix segfault for kernel.kptr_restrict=2

8y ago

Baoquan He

b892cb87

x86/boot/KASLR: Add checking for the offset of kernel virtual address randomization

8y ago

Ard Biesheuvel

60ce2858

ARM: 8680/1: boot/compressed: fix inappropriate Thumb2 mnemonic for __nop

8y ago

Aleksandar Markovic

ddbfff74

MIPS: math-emu: Handle zero accumulator case in MADDF and MSUBF separately

8y ago

Linus Torvalds

c0a0c7a4

Merge tag 'trace-v4.12-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

8y ago

Hans de Goede

c06632ea

gpio: acpi: Skip _AEI entries without a handler rather then aborting the scan

8y ago

Linus Torvalds

a4fd8b3a

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Ingo Molnar

977282ed

Merge tag 'perf-urgent-for-mingo-4.12-20170622' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

8y ago

Kan Liang

80c65fdb

perf/x86/intel/uncore: Fix wrong box pointer check

8y ago

Ard Biesheuvel

06a4b6d0

ARM: 8677/1: boot/compressed: fix decompressor header layout for v7-M

8y ago

Karl Beldan

25d8b92e

MIPS: head: Reorder instructions missing a delay slot

8y ago

Zack Weinberg

fbd57629

uapi/linux/a.out.h: don't use deprecated system-specific predefines.

8y ago

Sabrina Dubroca

9e52b325

tracing/kprobes: Allow to create probe with a module name starting with a digit

8y ago

Bartosz Golaszewski

ad537b82

gpiolib: fix filtering out unwanted events

8y ago

Linus Torvalds

5f4b37d8

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Thomas Gleixner

26fcd952

x86/mshyperv: Remove excess #includes from mshyperv.h

8y ago

Kan Liang

fb3a5055

perf/x86/intel: Add 1G DTLB load/store miss support for SKL

8y ago

Björn Töpel

7598f8bc

perf probe: Fix probe definition for inlined functions

In commit 613f050d68a8 ("perf probe: Fix to probe on gcc generated
functions in modules"), the offset from symbol is, incorrectly, added
to the trace point address. This leads to incorrect probe trace points
for inlined functions and when using relative line number on symbols.

Prior this patch:
$ perf probe -m nf_nat -D in_range
p:probe/in_range nf_nat:in_range.isra.9+0
$ perf probe -m i40e -D i40e_clean_rx_irq
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2212
$ perf probe -m i40e -D i40e_clean_rx_irq:16
p:probe/i40e_clean_rx_irq i40e:i40e_lan_xmit_frame+626

After:
$ perf probe -m nf_nat -D in_range
p:probe/in_range nf_nat:in_range.isra.9+0
$ perf probe -m i40e -D i40e_clean_rx_irq
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+1106
$ perf probe -m i40e -D i40e_clean_rx_irq:16
p:probe/i40e_clean_rx_irq i40e:i40e_napi_poll+2665

Committer testing:

Using 'pfunct', a tool found in the 'dwarves' package [1], one can ask what are
the functions that while not being explicitely marked as inline, were inlined
by the compiler:

# pfunct --cc_inlined /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko | head
__ew32
e1000_regdump
e1000e_dump_ps_pages
e1000_desc_unused
e1000e_systim_to_hwtstamp
e1000e_rx_hwtstamp
e1000e_update_rdt_wa
e1000e_update_tdt_wa
e1000_put_txbuf
e1000_consume_page

Then ask 'perf probe' to produce the kprobe_tracer probe definitions for two of
them:

# perf probe -m e1000e -D e1000e_rx_hwtstamp
p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+74

# perf probe -m e1000e -D e1000_consume_page
p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+876
p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+1506
p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074

Now lets concentrate on the 'e1000_consume_page' one, that was inlined twice in
e1000_clean_jumbo_rx_irq(), lets see what readelf says about the DWARF tags for
that function:

$ readelf -wi /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
<SNIP>
<1><13e27b>: Abbrev Number: 121 (DW_TAG_subprogram)
<13e27c> DW_AT_name : (indirect string, offset: 0xa8945): e1000_clean_jumbo_rx_irq
<13e287> DW_AT_low_pc : 0x17a30
<3><13e6ef>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13e6f0> DW_AT_abstract_origin: <0x13ed2c>
<13e6f4> DW_AT_low_pc : 0x17be6
<SNIP>
<1><13ed2c>: Abbrev Number: 142 (DW_TAG_subprogram)
<13ed2e> DW_AT_name : (indirect string, offset: 0xa54c3): e1000_consume_page

So, the first time in e1000_clean_jumbo_rx_irq() where e1000_consume_page() is
inlined is at PC 0x17be6, which subtracted from e1000_clean_jumbo_rx_irq()'s
address, gives us the offset we should use in the probe definition:

0x17be6 - 0x17a30 = 438

but above we have 876, which is twice as much.

Lets see the second inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():

<3><13e86e>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13e86f> DW_AT_abstract_origin: <0x13ed2c>
<13e873> DW_AT_low_pc : 0x17d21

0x17d21 - 0x17a30 = 753

So we where adding it at twice the offset from the containing function as we
should.

And then after this patch:

# perf probe -m e1000e -D e1000e_rx_hwtstamp
p:probe/e1000e_rx_hwtstamp e1000e:e1000_receive_skb+37

# perf probe -m e1000e -D e1000_consume_page
p:probe/e1000_consume_page e1000e:e1000_clean_jumbo_rx_irq+438
p:probe/e1000_consume_page_1 e1000e:e1000_clean_jumbo_rx_irq+753
p:probe/e1000_consume_page_2 e1000e:e1000_clean_jumbo_rx_irq+1353
#

Which matches the two first expansions and shows that because we were
doubling the offset it would spill over the next function:

readelf -sw /lib/modules/4.12.0-rc4+/kernel/drivers/net/ethernet/intel/e1000e/e1000e.ko
673: 0000000000017a30 1626 FUNC LOCAL DEFAULT 2 e1000_clean_jumbo_rx_irq
674: 0000000000018090 2013 FUNC LOCAL DEFAULT 2 e1000_clean_rx_irq_ps

This is the 3rd inline expansion of e1000_consume_page() in
e1000_clean_jumbo_rx_irq():

<3><13ec77>: Abbrev Number: 119 (DW_TAG_inlined_subroutine)
<13ec78> DW_AT_abstract_origin: <0x13ed2c>
<13ec7c> DW_AT_low_pc : 0x17f79

0x17f79 - 0x17a30 = 1353

So:

0x17a30 + 2 * 1353 = 0x184c2

And:

0x184c2 - 0x18090 = 1074

Which explains the bogus third expansion for e1000_consume_page() to end up at:

p:probe/e1000_consume_page_2 e1000e:e1000_clean_rx_irq_ps+1074

All fixed now :-)

[1] https://git.kernel.org/pub/scm/devel/pahole/pahole.git/

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 613f050d68a8 ("perf probe: Fix to probe on gcc generated functions in modules")
Link: http://lkml.kernel.org/r/20170621164134.5701-1-bjorn.topel@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

8y ago

Jérôme Glisse

98fe3633

x86/mm/hotplug: Fix BUG_ON() after hot-remove by not freeing PUD

8y ago

Vladimir Murzin

7ef4783e

ARM: 8676/1: NOMMU: provide pgprot_device() macro

8y ago

Jakub Kicinski

dbd18777

hashtable: remove repeated phrase from a comment

8y ago

Steven Rostedt (VMware)

0f179765

ftrace: Fix regression with module command in stack_trace_filter

8y ago

Linus Torvalds

35d8d5d4

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Thomas Gleixner

8e6cec1c

Merge branch 'clockevents/4.12-fixes' of https://git.linaro.org/people/daniel.lezcano/linux into timers/urgent

8y ago

Linus Torvalds

8d829b9b

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

8y ago

Hendrik Brueckner

8a1898db

perf/aux: Correct return code of rb_alloc_aux() if !has_aux(ev)

8y ago

Sudeep Holla

29226b19

ARM: 8675/1: MCPM: ensure not to enter __hyp_soft_restart from loopback and cpu_power_down

8y ago

Linus Torvalds

b4df2e35

Merge tag 'powerpc-4.12-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

8y ago

Linus Torvalds

32c1431e

Linux 4.12-rc5 v4.12-rc5

8y ago

Linus Torvalds

1a8cca18

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Will Deacon

dbb236c1

arm64/vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW

8y ago

Stephen Rothwell

459fa246

clocksource: Explicitly include linux/clocksource.h when needed

8y ago

Linus Torvalds

48b6bbef

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix refcounting wrt timers which hold onto inet6 address objects,
from Xin Long.

2) Fix an ancient bug in wireless wext ioctls, from Johannes Berg.

3) Firmware handling fixes in brcm80211 driver, from Arend Van Spriel.

4) Several mlx5 driver fixes (firmware readiness, timestamp cap
reporting, devlink command validity checking, tc offloading, etc.)
From Eli Cohen, Maor Dickman, Chris Mi, and Or Gerlitz.

5) Fix dst leak in IP/IP6 tunnels, from Haishuang Yan.

6) Fix dst refcount bug in decnet, from Wei Wang.

7) Netdev can be double freed in register_vlan_device(). Fix from Gao
Feng.

8) Don't allow object to be destroyed while it is being dumped in SCTP,
from Xin Long.

9) Fix dpaa_eth build when modular, from Madalin Bucur.

10) Fix throw route leaks, from Serhey Popovych.

11) IFLA_GROUP missing from if_nlmsg_size() and ifla_policy[] table,
also from Serhey Popovych.

12) Fix premature TX SKB free in stmmac, from Niklas Cassel.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (36 commits)
igmp: add a missing spin_lock_init()
net: stmmac: free an skb first when there are no longer any descriptors using it
sfc: remove duplicate up_write on VF filter_sem
rtnetlink: add IFLA_GROUP to ifla_policy
ipv6: Do not leak throw route references
dt-bindings: net: sms911x: Add missing optional VDD regulators
dpaa_eth: reuse the dma_ops provided by the FMan MAC device
fsl/fman: propagate dma_ops
net/core: remove explicit do_softirq() from busy_poll_stop()
fib_rules: Resolve goto rules target on delete
sctp: ensure ep is not destroyed before doing the dump
net/hns:bugfix of ethtool -t phy self_test
net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev
cxgb4: notify uP to route ctrlq compl to rdma rspq
ip6_tunnel: Correct tos value in collect_md mode
decnet: always not take dst->__refcnt when inserting dst into hash table
ip6_tunnel: fix potential issue in __ip6_tnl_rcv
ip_tunnel: fix potential issue in ip_tunnel_rcv
brcmfmac: fix uninitialized warning in brcmf_usb_probe_phase2()
net/mlx5e: Avoid doing a cleanup call if the profile doesn't have it
...