commits

Pull ftrace fixes from Steven Rostedt:
"During testing Sedat Dilek hit a "suspicious RCU usage" splat that
pointed out a real bug. During suspend and resume the tlb_flush
tracepoint is called when the CPU is going offline. As the CPU has
been noted as offline, RCU is ignoring that CPU, which means that it
can not use RCU protected locks. When tracepoints are activated, they
require RCU locking, and if RCU is ignoring a CPU that runs a
tracepoint, there is a chance that the tracepoint could cause
corruption.

The solution was to change the tracepoint into a
TRACE_EVENT_CONDITION() which allows us to check a condition to
determine if the tracepoint should be called or not. If the condition
is not met, the rcu protected code will not be executed. By adding
the condition "cpu_online(smp_processor_id())", this will prevent the
RCU protected code from being executed if the CPU is marked offline.

After adding this, another bug was discovered. As RCU checks rcu
callers, if a rcu call is not done, there is no check (obviously). We
found that tracepoints could be added in RCU ignored locations and not
have lockdep complain until the tracepoint is activated. This missed
places where tracepoints were added in places they should not have
been. To fix this, code was added in 3.18 that if lockdep is enabled,
any tracepoint will still call the rcu checks even if the tracepoint
is not enabled. The bug here, is that the check does not take the
CONDITION into account. As the condition may prevent tracepoints from
being activated in RCU ignored areas (as the one patch does), we get
false positives when we enable lockdep and hit a tracepoint that the
condition prevents it from being called in a RCU ignored location.

The fix for this is to add the CONDITION to the rcu checks, even if
the tracepoint is not enabled"

* tag 'trace-fixes-v3.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/tlb/trace: Do not trace on CPU that is offline
tracing: Add condition check to RCU lockdep checks

11y ago

Dave Chinner

9c9ce763

aio: annotate aio_read_event_ring for sleep patterns

11y ago

Linus Torvalds

e36f014e

Linux 3.19-rc7 v3.19-rc7

11y ago

Linus Torvalds

0b1ce1a8

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

11y ago

Steven Rostedt (Red Hat)

6c8465a8

x86/tlb/trace: Do not trace on CPU that is offline

When taking a CPU down for suspend and resume, a tracepoint may be called
when the CPU has been designated offline. As tracepoints require RCU for
protection, they must not be called if the current CPU is offline.

Unfortunately, trace_tlb_flush() is called in this scenario as was noted
by LOCKDEP:

...

Disabling non-boot CPUs ...
intel_pstate CPU 1 exiting

===============================
smpboot: CPU 1 didn't die...
[ INFO: suspicious RCU usage. ]
3.19.0-rc7-next-20150204.1-iniza-small #1 Not tainted
-------------------------------
include/trace/events/tlb.h:35 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

RCU used illegally from offline CPU!
rcu_scheduler_active = 1, debug_locks = 0
no locks held by swapper/1/0.

stack backtrace:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.19.0-rc7-next-20150204.1-iniza-small #1
Hardware name: SAMSUNG ELECTRONICS CO., LTD. 530U3BI/530U4BI/530U4BH/530U3BI/530U4BI/530U4BH, BIOS 13XK 03/28/2013
0000000000000001 ffff88011a44fe18 ffffffff817e370d 0000000000000011
ffff88011a448290 ffff88011a44fe48 ffffffff810d6847 ffff8800c66b9600
0000000000000001 ffff88011a44c000 ffffffff81cb3900 ffff88011a44fe78
Call Trace:
[<ffffffff817e370d>] dump_stack+0x4c/0x65
[<ffffffff810d6847>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff810b71a5>] idle_task_exit+0x205/0x2c0
[<ffffffff81054c4e>] play_dead_common+0xe/0x50
[<ffffffff81054ca5>] native_play_dead+0x15/0x140
[<ffffffff8102963f>] arch_cpu_idle_dead+0xf/0x20
[<ffffffff810cd89e>] cpu_startup_entry+0x37e/0x580
[<ffffffff81053e20>] start_secondary+0x140/0x150
intel_pstate CPU 2 exiting

...

By converting the tlb_flush tracepoint to a TRACE_EVENT_CONDITION where the
condition is cpu_online(smp_processor_id()), we can avoid calling RCU protected
code when the CPU is offline.

Link: http://lkml.kernel.org/r/CA+icZUUGiGDoL5NU8RuxKzFjoLjEKRtUWx=JB8B9a0EQv-eGzQ@mail.gmail.com

Cc: stable@vger.kernel.org # 3.17+
Fixes: d17d8f9dedb9 "x86/mm: Add tracepoints for TLB flushes"
Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Suggested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Acked-by: Dave Hansen <dave@sr71.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

11y ago

Linus Torvalds

0f98c38d

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

11y ago

Linus Torvalds

fba7e994

Merge tag 'armsoc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

11y ago

Linus Torvalds

bdfeb5a1

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs

11y ago

Yann Droneaud

43c61165

Revert "IB/core: Add support for extended query device caps"

11y ago

Steven Rostedt (Red Hat)

a05d59a5

tracing: Add condition check to RCU lockdep checks

11y ago

Linus Torvalds

0dc17d14

Merge tag 'gpio-v3.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-gpio

11y ago

Ming Lei

e09aae7e

blk-mq: release mq's kobjects in blk_release_queue()

11y ago

Linus Torvalds

3441456b

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

11y ago

Olof Johansson

28111dda

Merge tag 'renesas-soc-fixes3-for-v3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes

11y ago

Linus Torvalds

26cdd1f7

Merge branches 'timers-urgent-for-linus' and 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

11y ago

Forrest Liu

3da5ab56

Btrfs: add missing blk_finish_plug in btrfs_sync_log()

11y ago

Linus Torvalds

9d82f5eb

MMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Stretch ACKs can kill performance with Reno and CUBIC congestion
control, largely due to LRO and GRO. Fix from Neal Cardwell.

2) Fix userland breakage because we accidently emit zero length netlink
messages from the bridging code. From Roopa Prabhu.

3) Carry handling in generic csum_tcpudp_nofold is broken, fix from
Karl Beldan.

4) Remove bogus dev_set_net() calls from CAIF driver, from Nicolas
Dichtel.

5) Make sure PPP deflation never returns a length greater then the
output buffer, otherwise we overflow and trigger skb_over_panic().
Fix from Florian Westphal.

6) COSA driver needs VIRT_TO_BUS Kconfig dependencies, from Arnd
Bergmann.

7) Don't increase route cached MTU on datagram too big ICMPs. From Li
Wei.

8) Fix error path leaks in nf_tables, from Pablo Neira Ayuso.

9) Fix bitmask handling regression in netlink that broke things like
acpi userland tools. From Pablo Neira Ayuso.

10) Wrong header pointer passed to param_type2af() in SCTP code, from
Saran Maruti Ramanara.

11) Stacked vlans not handled correctly by vlan_get_protocol(), from
Toshiaki Makita.

12) Add missing DMA memory barrier to xgene driver, from Iyappan
Subramanian.

13) Fix crash in rate estimators, from Eric Dumazet.

14) We've been adding various workarounds, one after another, for the
change which added the per-net tcp_sock. It was meant to reduce
socket contention but added lots of problems.

Reduce this instead to a proper per-cpu socket and that rids us of
all the daemons.

From Eric Dumazet.

15) Fix memory corruption and OOPS in mlx4 driver, from Jack
Morgenstein.

16) When we disabled UFO in the virtio_net device, it introduces some
serious performance regressions. The orignal problem was IPV6
fragment ID generation, so fix that properly instead. From Vlad
Yasevich.

17) sr9700 driver build breaks on xtensa because it defines macros with
the same name as those used by the arch code. Use more unique
names. From Chen Gang.

18) Fix endianness in new virio 1.0 mode of the vhost net driver, from
Michael S Tsirkin.

19) Several sysctls were setting the maxlen attribute incorrectly, from
Sasha Levin.

20) Don't accept an FQ scheduler quantum of zero, that leads to crashes.
From Kenneth Klette Jonassen.

21) Fix dumping of non-existing actions in the packet scheduler
classifier. From Ignacy Gawędzki.

22) Return the write work_done value when doing TX work in the qlcnic
driver.

23) ip6gre_err accesses the info field with the wrong endianness, from
Sabrina Dubroca.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (54 commits)
sit: fix some __be16/u16 mismatches
ipv6: fix sparse errors in ip6_make_flowlabel()
net: remove some sparse warnings
flow_keys: n_proto type should be __be16
ip6_gre: fix endianness errors in ip6gre_err
qlcnic: Fix NAPI poll routine for Tx completion
amd-xgbe: Set RSS enablement based on hardware features
amd-xgbe: Adjust for zero-based traffic class count
cls_api.c: Fix dumping of non-existing actions' stats.
pkt_sched: fq: avoid hang when quantum 0
net: rds: use correct size for max unacked packets and bytes
vhost/net: fix up num_buffers endian-ness
gianfar: correct the bad expression while writing bit-pattern
net: usb: sr9700: Use 'SR_' prefix for the common register macros
Revert "drivers/net: Disable UFO through virtio"
Revert "drivers/net, ipv6: Select IPv6 fragment idents for virtio UFO packets"
ipv6: Select fragment id during UFO segmentation if not set.
xen-netback: stop the guest rx thread after a fatal error
net/mlx4_core: Fix kernel Oops (mem corruption) when working with more than 80 VFs
isdn: off by one in connect_res()
...

11y ago

Steven Rostedt (Red Hat)

ce1039bd

tracing: Fix enabling of syscall events on the command line

11y ago

Johan Hovold

49d2ca84

gpio: sysfs: fix memory leak in gpiod_sysfs_set_active_low

11y ago

Ming Lei

74170118

Revert "blk-mq: fix hctx/ctx kobject use-after-free"

11y ago

Linus Torvalds

00845eb9

sched: don't cause task state changes in nested sleep debugging

Commit 8eb23b9f35aa ("sched: Debug nested sleeps") added code to report
on nested sleep conditions, which we generally want to avoid because the
inner sleeping operation can re-set the thread state to TASK_RUNNING,
but that will then cause the outer sleep loop not actually sleep when it
calls schedule.

However, that's actually valid traditional behavior, with the inner
sleep being some fairly rare case (like taking a sleeping lock that
normally doesn't actually need to sleep).

And the debug code would actually change the state of the task to
TASK_RUNNING internally, which makes that kind of traditional and
working code not work at all, because now the nested sleep doesn't just
sometimes cause the outer one to not block, but will cause it to happen
every time.

In particular, it will cause the cardbus kernel daemon (pccardd) to
basically busy-loop doing scheduling, converting a laptop into a heater,
as reported by Bruno Prémont. But there may be other legacy uses of
that nested sleep model in other drivers that are also likely to never
get converted to the new model.

This fixes both cases:

- don't set TASK_RUNNING when the nested condition happens (note: even
if WARN_ONCE() only _warns_ once, the return value isn't whether the
warning happened, but whether the condition for the warning was true.
So despite the warning only happening once, the "if (WARN_ON(..))"
would trigger for every nested sleep.

- in the cases where we knowingly disable the warning by using
"sched_annotate_sleep()", don't change the task state (that is used
for all core scheduling decisions), instead use '->task_state_change'
that is used for the debugging decision itself.

(Credit for the second part of the fix goes to Oleg Nesterov: "Can't we
avoid this subtle change in behaviour DEBUG_ATOMIC_SLEEP adds?" with the
suggested change to use 'task_state_change' as part of the test)

Reported-and-bisected-by: Bruno Prémont <bonbons@linux-vserver.org>
Tested-by: Rafael J Wysocki <rjw@rjwysocki.net>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Cc: Ilya Dryomov <ilya.dryomov@inktank.com>,
Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Hurley <peter@hurleysoftware.com>,
Cc: Davidlohr Bueso <dave@stgolabs.net>,
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11y ago

Rainer Koenig

47c1ffb2

Input: elantech - add more Fujtisu notebooks to force crc_enabled

11y ago

Laurent Pinchart

eab8d653

arm: dma-mapping: Set DMA IOMMU ops in arm_iommu_attach_device()

11y ago

Magnus Damm

77cf5166

ARM: shmobile: r8a7790: Instantiate GIC from C board code in legacy builds

11y ago

Linus Torvalds

396e9099

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

11y ago

John Stultz

2d926c15

hrtimer: Fix incorrect tai offset calculation for non high-res timer systems

11y ago

Ingo Molnar

6d84d1d1

Merge tag 'microcode_fix_for_3.19' of git://git.kernel.org/pub/scm/linux/kernel/git/bp/bp into x86/urgent

11y ago

Gui Hecheng

063c54dc

btrfs: fix raid56 scrub failed in xfstests btrfs/072

11y ago

Linus Torvalds

14365ea2

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

11y ago

Eric Dumazet

a409caec

sit: fix some __be16/u16 mismatches

11y ago

Steven Rostedt (Red Hat)

83829b74

tracing: Remove extra call to init_ftrace_syscalls()

11y ago

Johan Hovold

0f303db0

gpio: sysfs: fix memory leak in gpiod_export_link

11y ago

Linus Torvalds

c59c961c

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

11y ago

Linus Torvalds

788807d7

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

11y ago

Jochen Hein

1d90d6d5

Input: i8042 - add noloop quirk for Medion Akoya E7225 (MD98857)

11y ago

Olof Johansson

764e2c70

Merge tag 'mvebu-fixes-3.19-6' of git://git.infradead.org/linux-mvebu into fixes

11y ago

Magnus Damm

974b072f

ARM: shmobile: r8a73a4: Instantiate GIC from C board code in legacy builds

11y ago

Linus Torvalds

29f12c48

Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

11y ago

Peter Zijlstra

40767b0d

sched/deadline: Fix deadline parameter modification handling

11y ago

Linus Torvalds

dc6d6844

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

11y ago

Boris Ostrovsky

da63865a

x86, microcode: Return error from driver init code when loader is disabled

11y ago

Qu Wenruo

a53f4f8e

btrfs: Don't call btrfs_start_transaction() on frozen fs to avoid deadlock.

Commit 6b5fe46dfa52 (btrfs: do commit in sync_fs if there are pending
changes) will call btrfs_start_transaction() in sync_fs(), to handle
some operations needed to be done in next transaction.

However this can cause deadlock if the filesystem is frozen, with the
following sys_r+w output:
[ 143.255932] Call Trace:
[ 143.255936] [<ffffffff816c0e09>] schedule+0x29/0x70
[ 143.255939] [<ffffffff811cb7f3>] __sb_start_write+0xb3/0x100
[ 143.255971] [<ffffffffa040ec06>] start_transaction+0x2e6/0x5a0
[btrfs]
[ 143.255992] [<ffffffffa040f1eb>] btrfs_start_transaction+0x1b/0x20
[btrfs]
[ 143.256003] [<ffffffffa03dc0ba>] btrfs_sync_fs+0xca/0xd0 [btrfs]
[ 143.256007] [<ffffffff811f7be0>] sync_fs_one_sb+0x20/0x30
[ 143.256011] [<ffffffff811cbd01>] iterate_supers+0xe1/0xf0
[ 143.256014] [<ffffffff811f7d75>] sys_sync+0x55/0x90
[ 143.256017] [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17
[ 143.256111] Call Trace:
[ 143.256114] [<ffffffff816c0e09>] schedule+0x29/0x70
[ 143.256119] [<ffffffff816c3405>] rwsem_down_write_failed+0x1c5/0x2d0
[ 143.256123] [<ffffffff8133f013>] call_rwsem_down_write_failed+0x13/0x20
[ 143.256131] [<ffffffff811caae8>] thaw_super+0x28/0xc0
[ 143.256135] [<ffffffff811db3e5>] do_vfs_ioctl+0x3f5/0x540
[ 143.256187] [<ffffffff811db5c1>] SyS_ioctl+0x91/0xb0
[ 143.256213] [<ffffffff816c49d2>] system_call_fastpath+0x12/0x17

The reason is like the following:
(Holding s_umount)
VFS sync_fs staff:
|- btrfs_sync_fs()
|- btrfs_start_transaction()
|- sb_start_intwrite()
(Waiting thaw_fs to unfreeze)
VFS thaw_fs staff:
thaw_fs()
(Waiting sync_fs to release
s_umount)

So deadlock happens.
This can be easily triggered by fstest/generic/068 with inode_cache
mount option.

The fix is to check if the fs is frozen, if the fs is frozen, just
return and waiting for the next transaction.

Cc: David Sterba <dsterba@suse.cz>
Reported-by: Gui Hecheng <guihc.fnst@cn.fujitsu.com>
Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com>
[enhanced comment, changed to SB_FREEZE_WRITE]
Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <clm@fb.com>

11y ago

Linus Torvalds

42345d63

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

11y ago

Brian King

3a9794d3

sd: Fix max transfer length for 4k disks

11y ago

Eric Dumazet

67765146

ipv6: fix sparse errors in ip6_make_flowlabel()

11y ago

Steven Rostedt (Red Hat)

237d28db

ftrace/jprobes/x86: Fix conflict between jprobes and function graph tracing

If the function graph tracer traces a jprobe callback, the system will
crash. This can easily be demonstrated by compiling the jprobe
sample module that is in the kernel tree, loading it and running the
function graph tracer.

# modprobe jprobe_example.ko
# echo function_graph > /sys/kernel/debug/tracing/current_tracer
# ls

The first two commands end up in a nice crash after the first fork.
(do_fork has a jprobe attached to it, so "ls" just triggers that fork)

The problem is caused by the jprobe_return() that all jprobe callbacks
must end with. The way jprobes works is that the function a jprobe
is attached to has a breakpoint placed at the start of it (or it uses
ftrace if fentry is supported). The breakpoint handler (or ftrace callback)
will copy the stack frame and change the ip address to return to the
jprobe handler instead of the function. The jprobe handler must end
with jprobe_return() which swaps the stack and does an int3 (breakpoint).
This breakpoint handler will then put back the saved stack frame,
simulate the instruction at the beginning of the function it added
a breakpoint to, and then continue on.

For function tracing to work, it hijakes the return address from the
stack frame, and replaces it with a hook function that will trace
the end of the call. This hook function will restore the return
address of the function call.

If the function tracer traces the jprobe handler, the hook function
for that handler will not be called, and its saved return address
will be used for the next function. This will result in a kernel crash.

To solve this, pause function tracing before the jprobe handler is called
and unpause it before it returns back to the function it probed.

Some other updates:

Used a variable "saved_sp" to hold kcb->jprobe_saved_sp. This makes the
code look a bit cleaner and easier to understand (various tries to fix
this bug required this change).

Note, if fentry is being used, jprobes will change the ip address before
the function graph tracer runs and it will not be able to trace the
function that the jprobe is probing.

Link: http://lkml.kernel.org/r/20150114154329.552437962@goodmis.org

Cc: stable@vger.kernel.org # 2.6.30+
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>

11y ago

Sonic Zhang

b184c388

gpio: mcp23s08: handle default gpio base

11y ago

Linus Torvalds

59343cd7

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Don't OOPS on socket AIO, from Christoph Hellwig.

2) Scheduled scans should be aborted upon RFKILL, from Emmanuel
Grumbach.

3) Fix sleep in atomic context in kvaser_usb, from Ahmed S Darwish.

4) Fix RCU locking across copy_to_user() in bpf code, from Alexei
Starovoitov.

5) Lots of crash, memory leak, short TX packet et al bug fixes in
sh_eth from Ben Hutchings.

6) Fix memory corruption in SCTP wrt. INIT collitions, from Daniel
Borkmann.

7) Fix return value logic for poll handlers in netxen, enic, and bnx2x.
From Eric Dumazet and Govindarajulu Varadarajan.

8) Header length calculation fix in mac80211 from Fred Chou.

9) mv643xx_eth doesn't handle highmem correctly in non-TSO code paths.
From Ezequiel Garcia.

10) udp_diag has bogus logic in it's hash chain skipping, copy same fix
tcp diag used. From Herbert Xu.

11) amd-xgbe programs wrong rx flow control register, from Thomas
Lendacky.

12) Fix race leading to use after free in ping receive path, from Subash
Abhinov Kasiviswanathan.

13) Cache redirect routes otherwise we can get a heavy backlog of rcu
jobs liberating DST_NOCACHE entries. From Hannes Frederic Sowa.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (48 commits)
net: don't OOPS on socket aio
stmmac: prevent probe drivers to crash kernel
bnx2x: fix napi poll return value for repoll
ipv6: replacing a rt6_info needs to purge possible propagated rt6_infos too
sh_eth: Fix DMA-API usage for RX buffers
sh_eth: Check for DMA mapping errors on transmit
sh_eth: Ensure DMA engines are stopped before freeing buffers
sh_eth: Remove RX overflow log messages
ping: Fix race in free in receive path
udp_diag: Fix socket skipping within chain
can: kvaser_usb: Fix state handling upon BUS_ERROR events
can: kvaser_usb: Retry the first bulk transfer on -ETIMEDOUT
can: kvaser_usb: Send correct context to URB completion
can: kvaser_usb: Do not sleep in atomic context
ipv4: try to cache dst_entries which would cause a redirect
samples: bpf: relax test_maps check
bpf: rcu lock must not be held when calling copy_to_user()
net: sctp: fix slab corruption from use after free on INIT collisions
net: mv643xx_eth: Fix highmem support in non-TSO egress path
sh_eth: Fix serialisation of interrupt disable with interrupt & NAPI handlers
...