commits
Pull timer fixes from Thomas Gleixner:
"Two fixes in the timer area:
- a long-standing lock inversion due to a printk
- suspend-related hrtimer corruption in sched_clock"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
sched_clock: Avoid corrupting hrtimer tree during suspend
Pull ARM fixes from Russell King:
"A few fixes for ARM. Some of these are correctness issues:
- TLBs must be flushed after the old mappings are removed by the DMA
mapping code, but before the new mappings are established.
- An off-by-one entry error in the Keystone LPAE setup code.
Fixes include:
- ensuring that the identity mapping for LPAE does not remove the
kernel image from the identity map.
- preventing userspace from trapping into kgdb.
- fixing a preemption issue in the Intel iwmmxt code.
- fixing a build error with nommu.
Other changes include:
- Adding a note about which areas of memory are expected to be
accessible while the identity mapping tables are in place"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
ARM: idmap: add identity mapping usage note
ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
ARM: fix alignment of keystone page table fixup
ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled
ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable()
ARM: DMA: ensure that old section mappings are flushed from the TLB
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:
======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
(&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c
but task is already holding lock:
(hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #5 (hrtimer_bases.lock){-.-...}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
[<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
[<81080792>] task_clock_event_start+0x3a/0x3f
[<810807a4>] task_clock_event_add+0xd/0x14
[<8108259a>] event_sched_in+0xb6/0x17a
[<810826a2>] group_sched_in+0x44/0x122
[<81082885>] ctx_sched_in.isra.67+0x105/0x11f
[<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
[<81082bf6>] __perf_install_in_context+0x8b/0xa3
[<8107eb8e>] remote_function+0x12/0x2a
[<8105f5af>] smp_call_function_single+0x2d/0x53
[<8107e17d>] task_function_call+0x30/0x36
[<8107fb82>] perf_install_in_context+0x87/0xbb
[<810852c9>] SYSC_perf_event_open+0x5c6/0x701
[<810856f9>] SyS_perf_event_open+0x17/0x19
[<8142f8ee>] syscall_call+0x7/0xb
-> #4 (&ctx->lock){......}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f04c>] _raw_spin_lock+0x21/0x30
[<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
[<8142cacc>] __schedule+0x4c6/0x4cb
[<8142cae0>] schedule+0xf/0x11
[<8142f9a6>] work_resched+0x5/0x30
-> #3 (&rq->lock){-.-.-.}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f04c>] _raw_spin_lock+0x21/0x30
[<81040873>] __task_rq_lock+0x33/0x3a
[<8104184c>] wake_up_new_task+0x25/0xc2
[<8102474b>] do_fork+0x15c/0x2a0
[<810248a9>] kernel_thread+0x1a/0x1f
[<814232a2>] rest_init+0x1a/0x10e
[<817af949>] start_kernel+0x303/0x308
[<817af2ab>] i386_start_kernel+0x79/0x7d
-> #2 (&p->pi_lock){-.-...}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<810413dd>] try_to_wake_up+0x1d/0xd6
[<810414cd>] default_wake_function+0xb/0xd
[<810461f3>] __wake_up_common+0x39/0x59
[<81046346>] __wake_up+0x29/0x3b
[<811b8733>] tty_wakeup+0x49/0x51
[<811c3568>] uart_write_wakeup+0x17/0x19
[<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
[<811c5f28>] serial8250_handle_irq+0x54/0x6a
[<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
[<811c56d8>] serial8250_interrupt+0x38/0x9e
[<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
[<81051296>] handle_irq_event+0x2c/0x43
[<81052cee>] handle_level_irq+0x57/0x80
[<81002a72>] handle_irq+0x46/0x5c
[<810027df>] do_IRQ+0x32/0x89
[<8143036e>] common_interrupt+0x2e/0x33
[<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
[<811c25a4>] uart_start+0x2d/0x32
[<811c2c04>] uart_write+0xc7/0xd6
[<811bc6f6>] n_tty_write+0xb8/0x35e
[<811b9beb>] tty_write+0x163/0x1e4
[<811b9cd9>] redirected_tty_write+0x6d/0x75
[<810b6ed6>] vfs_write+0x75/0xb0
[<810b7265>] SyS_write+0x44/0x77
[<8142f8ee>] syscall_call+0x7/0xb
-> #1 (&tty->write_wait){-.....}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<81046332>] __wake_up+0x15/0x3b
[<811b8733>] tty_wakeup+0x49/0x51
[<811c3568>] uart_write_wakeup+0x17/0x19
[<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
[<811c5f28>] serial8250_handle_irq+0x54/0x6a
[<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
[<811c56d8>] serial8250_interrupt+0x38/0x9e
[<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
[<81051296>] handle_irq_event+0x2c/0x43
[<81052cee>] handle_level_irq+0x57/0x80
[<81002a72>] handle_irq+0x46/0x5c
[<810027df>] do_IRQ+0x32/0x89
[<8143036e>] common_interrupt+0x2e/0x33
[<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
[<811c25a4>] uart_start+0x2d/0x32
[<811c2c04>] uart_write+0xc7/0xd6
[<811bc6f6>] n_tty_write+0xb8/0x35e
[<811b9beb>] tty_write+0x163/0x1e4
[<811b9cd9>] redirected_tty_write+0x6d/0x75
[<810b6ed6>] vfs_write+0x75/0xb0
[<810b7265>] SyS_write+0x44/0x77
[<8142f8ee>] syscall_call+0x7/0xb
-> #0 (&port_lock_key){-.....}:
[<8104a62d>] __lock_acquire+0x9ea/0xc6d
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<811c60be>] serial8250_console_write+0x8c/0x10c
[<8104e402>] call_console_drivers.constprop.31+0x87/0x118
[<8104f5d5>] console_unlock+0x1d7/0x398
[<8104fb70>] vprintk_emit+0x3da/0x3e4
[<81425f76>] printk+0x17/0x19
[<8105bfa0>] clockevents_program_min_delta+0x104/0x116
[<8105c548>] clockevents_program_event+0xe7/0xf3
[<8105cc1c>] tick_program_event+0x1e/0x23
[<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
[<8103c49e>] __remove_hrtimer+0x5b/0x79
[<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
[<8103cb4b>] hrtimer_cancel+0xd/0x18
[<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
[<81080705>] task_clock_event_stop+0x20/0x64
[<81080756>] task_clock_event_del+0xd/0xf
[<81081350>] event_sched_out+0xab/0x11e
[<810813e0>] group_sched_out+0x1d/0x66
[<81081682>] ctx_sched_out+0xaf/0xbf
[<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
[<8142cacc>] __schedule+0x4c6/0x4cb
[<8142cae0>] schedule+0xf/0x11
[<8142f9a6>] work_resched+0x5/0x30
other info that might help us debug this:
Chain exists of:
&port_lock_key --> &ctx->lock --> hrtimer_bases.lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(hrtimer_bases.lock);
lock(&ctx->lock);
lock(hrtimer_bases.lock);
lock(&port_lock_key);
*** DEADLOCK ***
4 locks held by trinity-main/74:
#0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
#1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
#2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
#3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4
stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
[<81426f69>] dump_stack+0x16/0x18
[<81425a99>] print_circular_bug+0x18f/0x19c
[<8104a62d>] __lock_acquire+0x9ea/0xc6d
[<8104a942>] lock_acquire+0x92/0x101
[<811c60be>] ? serial8250_console_write+0x8c/0x10c
[<811c6032>] ? wait_for_xmitr+0x76/0x76
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<811c60be>] ? serial8250_console_write+0x8c/0x10c
[<811c60be>] serial8250_console_write+0x8c/0x10c
[<8104af87>] ? lock_release+0x191/0x223
[<811c6032>] ? wait_for_xmitr+0x76/0x76
[<8104e402>] call_console_drivers.constprop.31+0x87/0x118
[<8104f5d5>] console_unlock+0x1d7/0x398
[<8104fb70>] vprintk_emit+0x3da/0x3e4
[<81425f76>] printk+0x17/0x19
[<8105bfa0>] clockevents_program_min_delta+0x104/0x116
[<8105cc1c>] tick_program_event+0x1e/0x23
[<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
[<8103c49e>] __remove_hrtimer+0x5b/0x79
[<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
[<8103cb4b>] hrtimer_cancel+0xd/0x18
[<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
[<81080705>] task_clock_event_stop+0x20/0x64
[<81080756>] task_clock_event_del+0xd/0xf
[<81081350>] event_sched_out+0xab/0x11e
[<810813e0>] group_sched_out+0x1d/0x66
[<81081682>] ctx_sched_out+0xaf/0xbf
[<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
[<8104416d>] ? __dequeue_entity+0x23/0x27
[<81044505>] ? pick_next_task_fair+0xb1/0x120
[<8142cacc>] __schedule+0x4c6/0x4cb
[<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
[<810475b0>] ? trace_hardirqs_off+0xb/0xd
[<81056346>] ? rcu_irq_exit+0x64/0x77
Fix the problem by using printk_deferred() which does not call into the
scheduler.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull vfs fixes from Al Viro:
"This contains a couple of fixes - one is the aio fix from Christoph,
the other a fallocate() one from Eric"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
vfs: fix check for fallocate on active swapfile
direct-io: fix AIO regression
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn)
should only be entered when a kgdb break instruction is executed
from the kernel. Otherwise, if kgdb is enabled, a userspace program
can cause the kernel to drop into the debugger by executing either
KGDB_BREAKINST or KGDB_COMPILED_BREAK.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.
Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.
Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fix from Peter Anvin:
"A single fix to not invoke the espfix code on Xen PV, as it turns out
to oops the guest when invoked after all. This patch leaves some
amount of dead code, in particular unnecessary initialization of the
espfix stacks when they won't be used, but in the interest of keeping
the patch minimal that cleanup can wait for the next cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64/entry/xen: Do not invoke espfix64 on Xen
Fix the broken check for calling sys_fallocate() on an active swapfile,
introduced by commit 0790b31b69374ddadefe ("fs: disallow all fallocate
operation on active swapfile").
Signed-off-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Add a note about the usage of the identity mapping; we do not support
accesses outside of the identity map region and kernel image while a
CPU is using the identity map. This is because the identity mapping
may overwrite vmalloc space, IO mappings, the vectors pages, etc.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull nfsd bugfix from Bruce Fields:
"Another regression from the xdr encoding rewrite"
* 'for-3.16' of git://linux-nfs.org/~bfields/linux:
NFSD: Fix crash encoding lock reply on 32-bit
Pull staging driver bugfixes from Greg KH:
"Here are some tiny staging driver bugfixes that I've had in my tree
for the past week that resolve some reported issues. Nothing major at
all, but it would be good to get them merged for 3.16-rc8 or -final"
* tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vt6655: Fix disassociated messages every 10 seconds
staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
staging: rtl8723au: rtw_resume(): release semaphore before exit on error
iio:bma180: Missing check for frequency fractional part
iio:bma180: Fix scale factors to report correct acceleration units
iio: buffer: Fix demux table creation
This moves the espfix64 logic into native_iret. To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).
[ hpa: this is a nonzero cost on native, but probably not enough to
measure. Xen needs to fix this in their own code, probably doing
something equivalent to espfix64. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
The direct-io.c rewrite to use the iov_iter infrastructure stopped updating
the size field in struct dio_submit, and thus rendered the check for
allowing asynchronous completions to always return false. Fix this by
comparing it to the count of bytes in the iov_iter instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.
When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir. If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.
However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.
When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings, particularly the mappings of the
kernel itself.
[rewrote commit message and added code comment -- rmk]
Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull arm64 fix from Catalin Marinas:
"Fix arm64 regression introduced by limiting the CMA buffer to ZONE_DMA
on platforms where RAM starts above 4GB (and ZONE_DMA becoming 0)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB
Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space" forgot to free conf->data in nfsd4_encode_lockt and before
sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.
Worse, kfree() can be called on an uninitialized pointer in the case of
a succesful lock (or one that fails for a reason other than a conflict).
(Note that lock->lk_denied.ld_owner.data appears it should be zero here,
until you notice that it's one arm of a union the other arm of which is
written to in the succesful case by the
memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));
in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull device mapper fixes from Mike Snitzer:
"Fix dm bufio shrinker to properly zero-fill all fields.
Fix race in dm cache that caused improper reporting of the number of
dirty blocks in the cache"
* tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix race affecting dirty block count
dm bufio: fully initialize shrinker
byReAssocCount is incremented every second resulting in
disassociated message being send every 10 seconds whether
connection or not.
byReAssocCount should only advance while eCommandState
is in WLAN_ASSOCIATE_WAIT
Change existing scope to if condition.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull ACPI fix from Rafael Wysocki:
"One commit that fixes a problem causing PNP devices to be associated
with wrong ACPI device objects sometimes during device enumeration due
to an incorrect check in a matching function.
That problem was uncovered by the ACPI device enumeration rework in
3.14"
* tag 'pm+acpi-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PNP: Fix acpi_pnp_match()
If init_mm.brk is not section aligned, the LPAE fixup code will miss
updating the final PMD. Fix this by aligning map_end.
Fixes: a77e0c7b2774 ("ARM: mm: Recreate kernel mappings in early_paging_init()")
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull Xtensa fixes from Chris Zankel:
- resolve FIXMEs in double exception handler for window overflow. This
fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.
* tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux:
xtensa: fix sysmem reservation at the end of existing block
xtensa: add fixup for double exception raised in window overflow
ZONE_DMA is created to allow 32-bit only devices to access memory in the
absence of an IOMMU. On systems where the memory starts above 4GB, it is
expected that some devices have a DMA offset hardwired to be able to
access the bottom of the memory. Linux currently supports DT bindings
for the DMA offsets but they are not (easily) available early during
boot.
This patch tries to guess a DMA offset and assumes that ZONE_DMA
corresponds to the 32-bit mask above the start of DRAM.
Fixes: 2d5a5612bc (arm64: Limit the CMA buffer to 32-bit if ZONE_DMA)
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Anup Patel <anup.patel@linaro.org>
Introduced by commit 561f0ed498 (nfsd4: allow large readdirs).
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull ARM straggler SoC fix from Olof Johansson:
"A DT bugfix for Nomadik that had an ambigouos double-inversion of a
gpio line, and one MAINTAINER URL update that might as well go in now.
We could hold off until the merge window, but then we'll just have to
mark the DT fix for stable and it just seems like in total causing
more work"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: Update Tegra Git URL
ARM: nomadik: fix up double inversion in DT
nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks. This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.
People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648
Fix this by using an atomic_t for nr_dirty.
Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0()
irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts
Using spin_lock_irqsave appears to fix this.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull perf fixes from Thomas Gleixner:
"A bunch of fixes for perf and kprobes:
- revert a commit that caused a perf group regression
- silence dmesg spam
- fix kprobe probing errors on ia64 and ppc64
- filter kprobe faults from userspace
- lockdep fix for perf exit path
- prevent perf #GP in KVM guest
- correct perf event and filters"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
kprobes/x86: Don't try to resolve kprobe faults from userspace
perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
perf/x86/intel: Protect LBR and extra_regs against KVM lying
perf: Fix lockdep warning on process exit
perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
perf: Revert ("perf: Always destroy groups on exit")
Pull clock driver fix from Mike Turquette:
"A single patch to re-enable audio which is broken on all DRA7
SoC-based platforms. Missed this one from the last set of fixes"
* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
clk: ti: clk-7xx: Correct ABE DPLL configuration
The acpi_pnp_match() function is used for finding the ACPI device
object that should be associated with the given PNP device.
Unfortunately, the check used by that function is not strict enough
and may cause success to be returned for a wrong ACPI device object.
To fix that, use the observation that the pointer to the ACPI
device object in question is already stored in the data field
in struct pnp_dev, so acpi_pnp_match() can simply use that
field to do its job.
This problem was uncovered in 3.14 by commit 202317a573b2 (ACPI / scan:
Add acpi_device objects for all device nodes in the namespace).
Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace)
Reported-and-tested-by: Vinson Lee <vlee@twopensource.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This fixes the following warning:
warning: (ARCH_MULTIPLATFORM && ARCH_INTEGRATOR && ARCH_SHMOBILE_LEGACY) selects ARM_PATCH_PHYS_VIRT which has unmet direct dependencies (!XIP_KERNEL && MMU && (!ARCH_REALVIEW || !SPARSEMEM))
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for the v3.16 series. Sorry that
some of these arrive late, the summer heat in Sweden makes me slow.
- an IRQ handling fix for the STi driver, also for stable
- another IRQ fix for the RCAR GPIO driver
- a MAINTAINERS entry"
* tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
gpio: rcar: Add support for DT IRQ flags
MAINTAINERS: Add entry for the Renesas pin controller driver
pinctrl: st: Fix irqmux handler
Xtensa fixes for 3.16:
- resolve FIXMEs in double exception handler for window overflow. This
fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.
include/linux/sched.h implements TASK_SIZE_OF as TASK_SIZE if it
is not set by the architecture headers. TASK_SIZE uses the
current task to determine the size of the virtual address space.
On a 64-bit kernel this will cause reading /proc/pid/pagemap of a
64-bit process from a 32-bit process to return EOF when it reads
past 0xffffffff.
Implement TASK_SIZE_OF exactly the same as TASK_SIZE with
test_tsk_thread_flag instead of test_thread_flag.
Cc: stable@vger.kernel.org
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding,
but truncates the packet to the padding-less size.
Fix by taking the padding into consideration when truncating the packet.
Symptoms:
# ll /mnt/
ls: cannot read symbolic link /mnt/test: Input/output error
total 4
-rw-r--r--. 1 root root 0 Jun 14 01:21 123456
lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test
drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp
drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation)
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
swarren/linux-tegra.git is a stale location; it has moved to
tegra/linux.git.
While the git protocol re-directs to the new location, HTTP does not.
Besides, MAINTAINERS should contain the canonical URL.
Signed-off-by: Andreas Färber <afaerber@suse.de>
[swarren, updated commit message]
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to
struct shrinker assuming that all shrinkers were zero filled. The dm
bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE.
But there are proposed patches which add other bits to shrinker.flags
(e.g. memcg awareness).
Rather than simply initializing the shrinker, this patch uses kzalloc()
when allocating the dm_bufio_client to ensure that the embedded shrinker
and any other similar structures are zeroed.
This fixes theoretical over aggressive shrinking of dm bufio objects.
If the uninitialized dm_bufio_client.shrinker.flags contains
SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
each numa node rather than just once. This has been broken since 3.12.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.12+
Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull x86 fixes from Peter Anvin:
"A couple of crash fixes, plus a fix that on 32 bits would cause a
missing -ENOSYS for nonexistent system calls"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu: Fix cache topology for early P4-SMT
x86_32, entry: Store badsys error code in %eax
x86, MCE: Robustify mcheck_init_device
On ia64 and ppc64, function pointers do not point to the
entry address of the function, but to the address of a
function descriptor (which contains the entry address and misc
data).
Since the kprobes code passes the function pointer stored
by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
initalizing its blacklist, it fails and reports many errors,
such as:
Failed to find blacklist 0001013168300000
Failed to find blacklist 0001013000f0a000
[...]
To fix this bug, use arch_deref_entry_point() to get the
function entry address for kallsyms_lookup_size_offset()
instead of the raw function pointer.
Suzuki also pointed out that blacklist entries should also
be updated as well.
Reported-by: Tony Luck <tony.luck@gmail.com>
Fixed-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (for powerpc)
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: sparse@chrisli.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: akataria@vmware.com
Cc: anil.s.keshavamurthy@intel.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: rdunlap@infradead.org
Cc: dl9pf@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140717114411.13401.2632.stgit@kbuild-fedora.novalocal
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull crypto fix from Herbert Xu:
"This adds missing SELinux labeling to AF_ALG sockets which apparently
causes SELinux (or at least the SELinux people) to misbehave :)"
* git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - properly label AF_ALG socket
ABE DPLL frequency need to be lowered from 361267200
to 180633600 to facilitate the ATL requironments.
The dpll_abe_m2x2_ck clock need to be set to double
of ABE DPLL rate in order to have correct clocks
for audio.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
commit 431a84b1a4f7d1a0085d5b91330c5053cc8e8b12
("ARM: 8034/1: Disable preemption in iwmmxt_task_enable()")
introduced macros {inc,dec}_preempt_count to iwmmxt_task_enable
to make it run with preemption disabled.
Unfortunately, other functions in iwmmxt.S also use concan_{save,dump,load}
sections located in iwmmxt_task_enable() to deal with iWMMXt coprocessor.
This causes an unbalanced preempt_count due to excessive dec_preempt_count
and destroyed return addresses in callers of concan_ labels due to a register
collision:
Linux version 3.16.0-rc3-00062-gd92a333-dirty (jef@armhf) (gcc version 4.8.3 (Debian 4.8.3-4) ) #5 PREEMPT Thu Jul 3 19:46:39 CEST 2014
CPU: ARMv7 Processor [560f5815] revision 5 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine model: SolidRun CuBox
...
PJ4 iWMMXt v2 coprocessor enabled.
...
Unable to handle kernel paging request at virtual address fffffffe
pgd = bb25c000
[fffffffe] *pgd=3bfde821, *pte=00000000, *ppte=00000000
Internal error: Oops: 80000007 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 62 Comm: startpar Not tainted 3.16.0-rc3-00062-gd92a333-dirty #5
task: bb230b80 ti: bb256000 task.ti: bb256000
PC is at 0xfffffffe
LR is at iwmmxt_task_copy+0x44/0x4c
pc : [<fffffffe>] lr : [<800130ac>] psr: 40000033
sp : bb257de8 ip : 00000013 fp : bb257ea4
r10: bb256000 r9 : fffffdfe r8 : 76e898e6
r7 : bb257ec8 r6 : bb256000 r5 : 7ea12760 r4 : 000000a0
r3 : ffffffff r2 : 00000003 r1 : bb257df8 r0 : 00000000
Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
Control: 10c5387d Table: 3b25c019 DAC: 00000015
Process startpar (pid: 62, stack limit = 0xbb256248)
This patch fixes the issue by moving concan_{save,dump,load} into separate
code sections and make iwmmxt_task_enable() call them in the same way the
other functions use concan_ symbols. The test for valid ownership is moved
to concan_save and is safe for the other user of it, iwmmxt_task_disable().
The register collision is also resolved by moving concan_ symbols as
{inc,dec}_preempt_count are now local to iwmmxt_task_enable().
Fixes: 431a84b1a4f7 ("ARM: 8034/1: Disable preemption in iwmmxt_task_enable()")
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull libata regression fix from Tejun Heo:
"The last libata/for-3.16-fixes pull contained a regression introduced
by 1871ee134b73 ("libata: support the ata host which implements a
queue depth less than 32") which in turn was a fix for a regression
introduced earlier while changing queue tag order to accomodate hard
drives which perform poorly if tags are not allocated in circular
order (ugh...).
The regression happens only for SAS controllers making use of libata
to serve ATA devices. They don't fill an ata_host field which is used
by the new tag allocation function leading to NULL dereference.
This patch adds a new intermediate field ata_host->n_tags which is
initialized for both SAS and !SAS cases to fix the issue"
* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: introduce ata_host->n_tags to avoid oops on SAS controllers
The gpio-rcar driver has no IRQ domain OF xlate function and thus
ignores IRQ flags specified in DT. Fix this by using the two-cell xlate
function.
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
When sysmem reservation occurs exactly at the end of an existing block
that block is deleted, because it is incorrectly included in the range
of memblocks to remove. Fix that by skipping such block.
Cc: stable@vger.kernel.org
Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
The __cpu_clear_user_page() and __cpu_copy_user_page() functions
are not currently exported. This prevents modules from using
clear_user_page() and copy_user_page().
Signed-off-by: Mark Salter <msalter@redhat.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.
The vfs, on the other hand, wants null-terminated data.
The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.
The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.
But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.
Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.
In the NFSv2/v3 cases this actually appears to be safe:
- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.
In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.
The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.
Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The GPIO pin connected to card detect was inverted twice: once by
the argument to the GPIO line itself where it was magically marked
as active low by the flag GPIO_ACTIVE_LOW (0x01) in the third cell,
and also marked active low AGAIN by explicitly stating
"cd-inverted" (a deprecated method).
After commit 78f87df2b4f8760954d7d80603d0cfcbd4759683
"mmc: mmci: Use the common mmc DT parser" this results in the
line being inverted twice so it was effectively uninverted, while
the old code would not have this effect, instead disregarding the
flag on the GPIO line altogether, which is a bug. I admit the
semantics may be unclear but inverting twice is as good a
definition as any on how this should work.
So fix up the buggy device tree. Use proper #includes so the DTS
is clear and readable.
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Jonathan writes:
Fifth set of fixes for IIO in the 3.16 cycle.
One nasty one that has been around a long time and a couple of non compliant
ABI fixes.
* The demux code used to split out desired channels for devices that only
support reading sets of channels at one time had a bug where it was
building it's conversion tables against the wrong bitmap resulting in
it never actually demuxing anything. This is an old bug, but will be
effecting an increasing number of drivers as it is often used to avoid
some fiddly code in the individual drivers.
* bma180 and mma8452 weren't obeying the ABI wrt to units for acceleration.
Were in G rather than m/s^2. A little input check was missing from bma180
that might lead to acceptance of incorrect values. This last one is minor
but might lead to incorrect userspace code working and problems in the
future.
Pull vfs fixes from Christoph Hellwig:
"A vfsmount leak fix, and a compile warning fix"
* 'vfs-for-3.16' of git://git.infradead.org/users/hch/vfs:
fs: umount on symlink leaks mnt count
direct-io: fix uninitialized warning in do_direct_IO()
Promote one fix for 3.16
This fix was necessary after
9c15a24b038f ("x86/mce: Improve mcheck_init_device() error handling")
went in. What this patch did was, among others, check the return value
of misc_register and exit early if it encountered an error. Original
code sloppily didn't do that.
However,
cef12ee52b05 ("xen/mce: Add mcelog support for Xen platform")
made it so that xen's init routine xen_late_init_mcelog runs first. This
was needed for the xen mcelog device which is supposed to be independent
from the baremetal one.
Initially it was reported that misc_register() fails often on xen and
that's why it needed fixing. However, it is *supposed* to fail by
design, when running in dom0 so that the xen mcelog device file gets
registered first.
And *then* you need the notifier *not* unregistered on the error path so
that the timer does get deleted properly in the CPU hotplug notifier.
Btw, this fix is needed also on baremetal in the unlikely event that
misc_register(&mce_chrdev_device) fails there too.
I was unsure whether to rush it in now and decided to delay it to 3.17.
However, xen people wanted it promoted as it breaks xen when doing cpu
hotplug there. So, after a bit of simmering in tip/master for initial
smoke testing, let's move it to 3.16. It fixes a semi-regression which
got introduced in 3.16 so no need for stable tagging.
tip/x86/ras contains that exact same commit but we can't remove it
there as it is not the last one. It won't cause any merge issues, as I
confirmed locally but I should state here the special situation of this
one fix explicitly anyway.
Thanks.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This commit:
commit 6f6343f53d133bae516caf3d254bce37d8774625
Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Date: Thu Apr 17 17:17:33 2014 +0900
kprobes/x86: Call exception handlers directly from do_int3/do_debug
appears to have inadvertently dropped a check that the int3 came
from kernel mode. Trying to dereference addr when addr is
user-controlled is completely bogus.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/c4e339882c121aa76254f2adde3fcbdf502faec2.1405099506.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull SCSI barrier fix from James Bottomley:
"This is a potential data corruption fix: If we get an error sending
down a barrier, we simply ignore it meaning the barrier semantics get
violated without anyone being any the wiser. If the system crashes at
this point, the filesystem potentially becomes corrupt. Fix is to
report errors on failed barriers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: handle flush errors properly
Th AF_ALG socket was missing a security label (e.g. SELinux)
which means that socket was in "unlabeled" state.
This was recently demonstrated in the cryptsetup package
(cryptsetup v1.6.5 and later.)
See https://bugzilla.redhat.com/show_bug.cgi?id=1115120
This patch clones the sock's label from the parent sock
and resolves the issue (similar to AF_BLUETOOTH protocol family).
Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When setting up the CMA region, we must ensure that the old section
mappings are flushed from the TLB before replacing them with page
tables, otherwise we can suffer from mismatched aliases if the CPU
speculatively prefetches from these mappings at an inopportune time.
A mismatched alias can occur when the TLB contains a section mapping,
but a subsequent prefetch causes it to load a page table mapping,
resulting in the possibility of the TLB containing two matching
mappings for the same virtual address region.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull input layer fixes from Dmitry Torokhov:
"A few fixups for the input subsystem"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: document INPUT_PROP_TOPBUTTONPAD
Input: fix defuzzing logic
Input: sirfsoc-onkey - fix GPL v2 license string typo
Input: st-keyscan - fix 'defined but not used' compiler warnings
Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)
Input: i8042 - add Acer Aspire 5710 to nomux blacklist
Input: ti_am335x_tsc - warn about incorrect spelling
Input: wacom - cleanup multitouch code when touch_max is 2
1871ee134b73 ("libata: support the ata host which implements a queue
depth less than 32") directly used ata_port->scsi_host->can_queue from
ata_qc_new() to determine the number of tags supported by the host;
unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
leading to the following oops.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
RIP: 0010:[<ffffffff814e0618>] [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
RSP: 0018:ffff88061a003ae8 EFLAGS: 00010012
RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
FS: 00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
Stack:
ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
Call Trace:
[<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
[<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
[<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
[<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
[<ffffffff81317613>] __blk_run_queue+0x33/0x40
[<ffffffff8131781a>] queue_unplugged+0x2a/0x90
[<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
[<ffffffff8131d274>] blk_finish_plug+0x14/0x50
[<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
[<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
[<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
[<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
[<ffffffff81219897>] blkdev_read_iter+0x37/0x40
[<ffffffff811e307e>] new_sync_read+0x7e/0xb0
[<ffffffff811e3734>] vfs_read+0x94/0x170
[<ffffffff811e43c6>] SyS_read+0x46/0xb0
[<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
[<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00
Fix it by introducing ata_host->n_tags which is initialized to
ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
scsi_host_template->can_queue in ata_host_register() for !SAS ones.
As SAS hosts are never registered, this will give them the same
ATA_MAX_QUEUE - 1 as before. Note that we can't use
scsi_host->can_queue directly for SAS hosts anyway as they can go
higher than the libata maximum.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Reported-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fixes: 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org
Pull timer fixes from Thomas Gleixner:
"Two fixes in the timer area:
- a long-standing lock inversion due to a printk
- suspend-related hrtimer corruption in sched_clock"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timer: Fix lock inversion between hrtimer_bases.lock and scheduler locks
sched_clock: Avoid corrupting hrtimer tree during suspend
Pull ARM fixes from Russell King:
"A few fixes for ARM. Some of these are correctness issues:
- TLBs must be flushed after the old mappings are removed by the DMA
mapping code, but before the new mappings are established.
- An off-by-one entry error in the Keystone LPAE setup code.
Fixes include:
- ensuring that the identity mapping for LPAE does not remove the
kernel image from the identity map.
- preventing userspace from trapping into kgdb.
- fixing a preemption issue in the Intel iwmmxt code.
- fixing a build error with nommu.
Other changes include:
- Adding a note about which areas of memory are expected to be
accessible while the identity mapping tables are in place"
* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
ARM: 8124/1: don't enter kgdb when userspace executes a kgdb break instruction
ARM: idmap: add identity mapping usage note
ARM: 8115/1: LPAE: reduce damage caused by idmap to virtual memory layout
ARM: fix alignment of keystone page table fixup
ARM: 8112/1: only select ARM_PATCH_PHYS_VIRT if MMU is enabled
ARM: 8100/1: Fix preemption disable in iwmmxt_task_enable()
ARM: DMA: ensure that old section mappings are flushed from the TLB
clockevents_increase_min_delta() calls printk() from under
hrtimer_bases.lock. That causes lock inversion on scheduler locks because
printk() can call into the scheduler. Lockdep puts it as:
======================================================
[ INFO: possible circular locking dependency detected ]
3.15.0-rc8-06195-g939f04b #2 Not tainted
-------------------------------------------------------
trinity-main/74 is trying to acquire lock:
(&port_lock_key){-.....}, at: [<811c60be>] serial8250_console_write+0x8c/0x10c
but task is already holding lock:
(hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #5 (hrtimer_bases.lock){-.-...}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<8103c918>] __hrtimer_start_range_ns+0x1c/0x197
[<8107ec20>] perf_swevent_start_hrtimer.part.41+0x7a/0x85
[<81080792>] task_clock_event_start+0x3a/0x3f
[<810807a4>] task_clock_event_add+0xd/0x14
[<8108259a>] event_sched_in+0xb6/0x17a
[<810826a2>] group_sched_in+0x44/0x122
[<81082885>] ctx_sched_in.isra.67+0x105/0x11f
[<810828e6>] perf_event_sched_in.isra.70+0x47/0x4b
[<81082bf6>] __perf_install_in_context+0x8b/0xa3
[<8107eb8e>] remote_function+0x12/0x2a
[<8105f5af>] smp_call_function_single+0x2d/0x53
[<8107e17d>] task_function_call+0x30/0x36
[<8107fb82>] perf_install_in_context+0x87/0xbb
[<810852c9>] SYSC_perf_event_open+0x5c6/0x701
[<810856f9>] SyS_perf_event_open+0x17/0x19
[<8142f8ee>] syscall_call+0x7/0xb
-> #4 (&ctx->lock){......}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f04c>] _raw_spin_lock+0x21/0x30
[<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
[<8142cacc>] __schedule+0x4c6/0x4cb
[<8142cae0>] schedule+0xf/0x11
[<8142f9a6>] work_resched+0x5/0x30
-> #3 (&rq->lock){-.-.-.}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f04c>] _raw_spin_lock+0x21/0x30
[<81040873>] __task_rq_lock+0x33/0x3a
[<8104184c>] wake_up_new_task+0x25/0xc2
[<8102474b>] do_fork+0x15c/0x2a0
[<810248a9>] kernel_thread+0x1a/0x1f
[<814232a2>] rest_init+0x1a/0x10e
[<817af949>] start_kernel+0x303/0x308
[<817af2ab>] i386_start_kernel+0x79/0x7d
-> #2 (&p->pi_lock){-.-...}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<810413dd>] try_to_wake_up+0x1d/0xd6
[<810414cd>] default_wake_function+0xb/0xd
[<810461f3>] __wake_up_common+0x39/0x59
[<81046346>] __wake_up+0x29/0x3b
[<811b8733>] tty_wakeup+0x49/0x51
[<811c3568>] uart_write_wakeup+0x17/0x19
[<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
[<811c5f28>] serial8250_handle_irq+0x54/0x6a
[<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
[<811c56d8>] serial8250_interrupt+0x38/0x9e
[<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
[<81051296>] handle_irq_event+0x2c/0x43
[<81052cee>] handle_level_irq+0x57/0x80
[<81002a72>] handle_irq+0x46/0x5c
[<810027df>] do_IRQ+0x32/0x89
[<8143036e>] common_interrupt+0x2e/0x33
[<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
[<811c25a4>] uart_start+0x2d/0x32
[<811c2c04>] uart_write+0xc7/0xd6
[<811bc6f6>] n_tty_write+0xb8/0x35e
[<811b9beb>] tty_write+0x163/0x1e4
[<811b9cd9>] redirected_tty_write+0x6d/0x75
[<810b6ed6>] vfs_write+0x75/0xb0
[<810b7265>] SyS_write+0x44/0x77
[<8142f8ee>] syscall_call+0x7/0xb
-> #1 (&tty->write_wait){-.....}:
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<81046332>] __wake_up+0x15/0x3b
[<811b8733>] tty_wakeup+0x49/0x51
[<811c3568>] uart_write_wakeup+0x17/0x19
[<811c5dc1>] serial8250_tx_chars+0xbc/0xfb
[<811c5f28>] serial8250_handle_irq+0x54/0x6a
[<811c5f57>] serial8250_default_handle_irq+0x19/0x1c
[<811c56d8>] serial8250_interrupt+0x38/0x9e
[<810510e7>] handle_irq_event_percpu+0x5f/0x1e2
[<81051296>] handle_irq_event+0x2c/0x43
[<81052cee>] handle_level_irq+0x57/0x80
[<81002a72>] handle_irq+0x46/0x5c
[<810027df>] do_IRQ+0x32/0x89
[<8143036e>] common_interrupt+0x2e/0x33
[<8142f23c>] _raw_spin_unlock_irqrestore+0x3f/0x49
[<811c25a4>] uart_start+0x2d/0x32
[<811c2c04>] uart_write+0xc7/0xd6
[<811bc6f6>] n_tty_write+0xb8/0x35e
[<811b9beb>] tty_write+0x163/0x1e4
[<811b9cd9>] redirected_tty_write+0x6d/0x75
[<810b6ed6>] vfs_write+0x75/0xb0
[<810b7265>] SyS_write+0x44/0x77
[<8142f8ee>] syscall_call+0x7/0xb
-> #0 (&port_lock_key){-.....}:
[<8104a62d>] __lock_acquire+0x9ea/0xc6d
[<8104a942>] lock_acquire+0x92/0x101
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<811c60be>] serial8250_console_write+0x8c/0x10c
[<8104e402>] call_console_drivers.constprop.31+0x87/0x118
[<8104f5d5>] console_unlock+0x1d7/0x398
[<8104fb70>] vprintk_emit+0x3da/0x3e4
[<81425f76>] printk+0x17/0x19
[<8105bfa0>] clockevents_program_min_delta+0x104/0x116
[<8105c548>] clockevents_program_event+0xe7/0xf3
[<8105cc1c>] tick_program_event+0x1e/0x23
[<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
[<8103c49e>] __remove_hrtimer+0x5b/0x79
[<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
[<8103cb4b>] hrtimer_cancel+0xd/0x18
[<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
[<81080705>] task_clock_event_stop+0x20/0x64
[<81080756>] task_clock_event_del+0xd/0xf
[<81081350>] event_sched_out+0xab/0x11e
[<810813e0>] group_sched_out+0x1d/0x66
[<81081682>] ctx_sched_out+0xaf/0xbf
[<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
[<8142cacc>] __schedule+0x4c6/0x4cb
[<8142cae0>] schedule+0xf/0x11
[<8142f9a6>] work_resched+0x5/0x30
other info that might help us debug this:
Chain exists of:
&port_lock_key --> &ctx->lock --> hrtimer_bases.lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(hrtimer_bases.lock);
lock(&ctx->lock);
lock(hrtimer_bases.lock);
lock(&port_lock_key);
*** DEADLOCK ***
4 locks held by trinity-main/74:
#0: (&rq->lock){-.-.-.}, at: [<8142c6f3>] __schedule+0xed/0x4cb
#1: (&ctx->lock){......}, at: [<81081df3>] __perf_event_task_sched_out+0x1dc/0x34f
#2: (hrtimer_bases.lock){-.-...}, at: [<8103caeb>] hrtimer_try_to_cancel+0x13/0x66
#3: (console_lock){+.+...}, at: [<8104fb5d>] vprintk_emit+0x3c7/0x3e4
stack backtrace:
CPU: 0 PID: 74 Comm: trinity-main Not tainted 3.15.0-rc8-06195-g939f04b #2
00000000 81c3a310 8b995c14 81426f69 8b995c44 81425a99 8161f671 8161f570
8161f538 8161f559 8161f538 8b995c78 8b142bb0 00000004 8b142fdc 8b142bb0
8b995ca8 8104a62d 8b142fac 000016f2 81c3a310 00000001 00000001 00000003
Call Trace:
[<81426f69>] dump_stack+0x16/0x18
[<81425a99>] print_circular_bug+0x18f/0x19c
[<8104a62d>] __lock_acquire+0x9ea/0xc6d
[<8104a942>] lock_acquire+0x92/0x101
[<811c60be>] ? serial8250_console_write+0x8c/0x10c
[<811c6032>] ? wait_for_xmitr+0x76/0x76
[<8142f11d>] _raw_spin_lock_irqsave+0x2e/0x3e
[<811c60be>] ? serial8250_console_write+0x8c/0x10c
[<811c60be>] serial8250_console_write+0x8c/0x10c
[<8104af87>] ? lock_release+0x191/0x223
[<811c6032>] ? wait_for_xmitr+0x76/0x76
[<8104e402>] call_console_drivers.constprop.31+0x87/0x118
[<8104f5d5>] console_unlock+0x1d7/0x398
[<8104fb70>] vprintk_emit+0x3da/0x3e4
[<81425f76>] printk+0x17/0x19
[<8105bfa0>] clockevents_program_min_delta+0x104/0x116
[<8105cc1c>] tick_program_event+0x1e/0x23
[<8103c43c>] hrtimer_force_reprogram+0x88/0x8f
[<8103c49e>] __remove_hrtimer+0x5b/0x79
[<8103cb21>] hrtimer_try_to_cancel+0x49/0x66
[<8103cb4b>] hrtimer_cancel+0xd/0x18
[<8107f102>] perf_swevent_cancel_hrtimer.part.60+0x2b/0x30
[<81080705>] task_clock_event_stop+0x20/0x64
[<81080756>] task_clock_event_del+0xd/0xf
[<81081350>] event_sched_out+0xab/0x11e
[<810813e0>] group_sched_out+0x1d/0x66
[<81081682>] ctx_sched_out+0xaf/0xbf
[<81081e04>] __perf_event_task_sched_out+0x1ed/0x34f
[<8104416d>] ? __dequeue_entity+0x23/0x27
[<81044505>] ? pick_next_task_fair+0xb1/0x120
[<8142cacc>] __schedule+0x4c6/0x4cb
[<81047574>] ? trace_hardirqs_off_caller+0xd7/0x108
[<810475b0>] ? trace_hardirqs_off+0xb/0xd
[<81056346>] ? rcu_irq_exit+0x64/0x77
Fix the problem by using printk_deferred() which does not call into the
scheduler.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Cc: stable@vger.kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
The kgdb breakpoint hooks (kgdb_brk_fn and kgdb_compiled_brk_fn)
should only be entered when a kgdb break instruction is executed
from the kernel. Otherwise, if kgdb is enabled, a userspace program
can cause the kernel to drop into the debugger by executing either
KGDB_BREAKINST or KGDB_COMPILED_BREAK.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Omar Sandoval <osandov@osandov.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
During suspend we call sched_clock_poll() to update the epoch and
accumulated time and reprogram the sched_clock_timer to fire
before the next wrap-around time. Unfortunately,
sched_clock_poll() doesn't restart the timer, instead it relies
on the hrtimer layer to do that and during suspend we aren't
calling that function from the hrtimer layer. Instead, we're
reprogramming the expires time while the hrtimer is enqueued,
which can cause the hrtimer tree to be corrupted. Furthermore, we
restart the timer during suspend but we update the epoch during
resume which seems counter-intuitive.
Let's fix this by saving the accumulated state and canceling the
timer during suspend. On resume we can update the epoch and
restart the timer similar to what we would do if we were starting
the clock for the first time.
Fixes: a08ca5d1089d "sched_clock: Use an hrtimer instead of timer"
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Link: http://lkml.kernel.org/r/1406174630-23458-1-git-send-email-john.stultz@linaro.org
Cc: Ingo Molnar <mingo@kernel.org>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Pull x86 fix from Peter Anvin:
"A single fix to not invoke the espfix code on Xen PV, as it turns out
to oops the guest when invoked after all. This patch leaves some
amount of dead code, in particular unnecessary initialization of the
espfix stacks when they won't be used, but in the interest of keeping
the patch minimal that cleanup can wait for the next cycle"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86_64/entry/xen: Do not invoke espfix64 on Xen
Add a note about the usage of the identity mapping; we do not support
accesses outside of the identity map region and kernel image while a
CPU is using the identity map. This is because the identity mapping
may overwrite vmalloc space, IO mappings, the vectors pages, etc.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull staging driver bugfixes from Greg KH:
"Here are some tiny staging driver bugfixes that I've had in my tree
for the past week that resolve some reported issues. Nothing major at
all, but it would be good to get them merged for 3.16-rc8 or -final"
* tag 'staging-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: vt6655: Fix disassociated messages every 10 seconds
staging: vt6655: Fix Warning on boot handle_irq_event_percpu.
staging: rtl8723au: rtw_resume(): release semaphore before exit on error
iio:bma180: Missing check for frequency fractional part
iio:bma180: Fix scale factors to report correct acceleration units
iio: buffer: Fix demux table creation
This moves the espfix64 logic into native_iret. To make this work,
it gets rid of the native patch for INTERRUPT_RETURN:
INTERRUPT_RETURN on native kernels is now 'jmp native_iret'.
This changes the 16-bit SS behavior on Xen from OOPSing to leaking
some bits of the Xen hypervisor's RSP (I think).
[ hpa: this is a nonzero cost on native, but probably not enough to
measure. Xen needs to fix this in their own code, probably doing
something equivalent to espfix64. ]
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Link: http://lkml.kernel.org/r/7b8f1d8ef6597cb16ae004a43c56980a7de3cf94.1406129132.git.luto@amacapital.net
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
The direct-io.c rewrite to use the iov_iter infrastructure stopped updating
the size field in struct dio_submit, and thus rendered the check for
allowing asynchronous completions to always return false. Fix this by
comparing it to the count of bytes in the iov_iter instead.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Tim Chen <tim.c.chen@linux.intel.com>
On LPAE, each level 1 (pgd) page table entry maps 1GiB, and the level 2
(pmd) entries map 2MiB.
When the identity mapping is created on LPAE, the pgd pointers are copied
from the swapper_pg_dir. If we find that we need to modify the contents
of a pmd, we allocate a new empty pmd table and insert it into the
appropriate 1GB slot, before then filling it with the identity mapping.
However, if the 1GB slot covers the kernel lowmem mappings, we obliterate
those mappings.
When replacing a PMD, first copy the old PMD contents to the new PMD, so
that we preserve the existing mappings, particularly the mappings of the
kernel itself.
[rewrote commit message and added code comment -- rmk]
Fixes: ae2de101739c ("ARM: LPAE: Add identity mapping support for the 3-level page table format")
Signed-off-by: Konstantin Khlebnikov <k.khlebnikov@samsung.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull arm64 fix from Catalin Marinas:
"Fix arm64 regression introduced by limiting the CMA buffer to ZONE_DMA
on platforms where RAM starts above 4GB (and ZONE_DMA becoming 0)"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Create non-empty ZONE_DMA when DRAM starts above 4GB
Commit 8c7424cff6 "nfsd4: don't try to encode conflicting owner if low
on space" forgot to free conf->data in nfsd4_encode_lockt and before
sign conf->data to NULL in nfsd4_encode_lock_denied, causing a leak.
Worse, kfree() can be called on an uninitialized pointer in the case of
a succesful lock (or one that fails for a reason other than a conflict).
(Note that lock->lk_denied.ld_owner.data appears it should be zero here,
until you notice that it's one arm of a union the other arm of which is
written to in the succesful case by the
memcpy(&lock->lk_resp_stateid, &lock_stp->st_stid.sc_stateid,
sizeof(stateid_t));
in nfsd4_lock(). In the 32-bit case this overwrites ld_owner.data.)
Signed-off-by: Kinglong Mee <kinglongmee@gmail.com>
Fixes: 8c7424cff6 ""nfsd4: don't try to encode conflicting owner if low on space"
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Pull device mapper fixes from Mike Snitzer:
"Fix dm bufio shrinker to properly zero-fill all fields.
Fix race in dm cache that caused improper reporting of the number of
dirty blocks in the cache"
* tag 'dm-3.16-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm cache: fix race affecting dirty block count
dm bufio: fully initialize shrinker
byReAssocCount is incremented every second resulting in
disassociated message being send every 10 seconds whether
connection or not.
byReAssocCount should only advance while eCommandState
is in WLAN_ASSOCIATE_WAIT
Change existing scope to if condition.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull ACPI fix from Rafael Wysocki:
"One commit that fixes a problem causing PNP devices to be associated
with wrong ACPI device objects sometimes during device enumeration due
to an incorrect check in a matching function.
That problem was uncovered by the ACPI device enumeration rework in
3.14"
* tag 'pm+acpi-3.16-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / PNP: Fix acpi_pnp_match()
Pull Xtensa fixes from Chris Zankel:
- resolve FIXMEs in double exception handler for window overflow. This
fix makes native building of linux on xtensa host possible;
- fix sysmem region removal issue introduced in 3.15.
* tag 'xtensa-next-20140721' of git://github.com/czankel/xtensa-linux:
xtensa: fix sysmem reservation at the end of existing block
xtensa: add fixup for double exception raised in window overflow
ZONE_DMA is created to allow 32-bit only devices to access memory in the
absence of an IOMMU. On systems where the memory starts above 4GB, it is
expected that some devices have a DMA offset hardwired to be able to
access the bottom of the memory. Linux currently supports DT bindings
for the DMA offsets but they are not (easily) available early during
boot.
This patch tries to guess a DMA offset and assumes that ZONE_DMA
corresponds to the 32-bit mask above the start of DRAM.
Fixes: 2d5a5612bc (arm64: Limit the CMA buffer to 32-bit if ZONE_DMA)
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Mark Salter <msalter@redhat.com>
Tested-by: Mark Salter <msalter@redhat.com>
Tested-by: Anup Patel <anup.patel@linaro.org>
Pull ARM straggler SoC fix from Olof Johansson:
"A DT bugfix for Nomadik that had an ambigouos double-inversion of a
gpio line, and one MAINTAINER URL update that might as well go in now.
We could hold off until the merge window, but then we'll just have to
mark the DT fix for stable and it just seems like in total causing
more work"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
MAINTAINERS: Update Tegra Git URL
ARM: nomadik: fix up double inversion in DT
nr_dirty is updated without locking, causing it to drift so that it is
non-zero (either a small positive integer, or a very large one when an
underflow occurs) even when there are no actual dirty blocks. This was
due to a race between the workqueue and map function accessing nr_dirty
in parallel without proper protection.
People were seeing under runs due to a race on increment/decrement of
nr_dirty, see: https://lkml.org/lkml/2014/6/3/648
Fix this by using an atomic_t for nr_dirty.
Reported-by: roma1390@gmail.com
Signed-off-by: Anssi Hannula <anssi.hannula@iki.fi>
Signed-off-by: Joe Thornber <ejt@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
WARNING: CPU: 0 PID: 929 at /home/apw/COD/linux/kernel/irq/handle.c:147 handle_irq_event_percpu+0x1d1/0x1e0()
irq 17 handler device_intr+0x0/0xa80 [vt6655_stage] enabled interrupts
Using spin_lock_irqsave appears to fix this.
Signed-off-by: Malcolm Priestley <tvboxspy@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Pull perf fixes from Thomas Gleixner:
"A bunch of fixes for perf and kprobes:
- revert a commit that caused a perf group regression
- silence dmesg spam
- fix kprobe probing errors on ia64 and ppc64
- filter kprobe faults from userspace
- lockdep fix for perf exit path
- prevent perf #GP in KVM guest
- correct perf event and filters"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes: Fix "Failed to find blacklist" probing errors on ia64 and ppc64
kprobes/x86: Don't try to resolve kprobe faults from userspace
perf/x86/intel: Avoid spamming kernel log for BTS buffer failure
perf/x86/intel: Protect LBR and extra_regs against KVM lying
perf: Fix lockdep warning on process exit
perf/x86/intel/uncore: Fix SNB-EP/IVT Cbox filter mappings
perf/x86/intel: Use proper dTLB-load-misses event on IvyBridge
perf: Revert ("perf: Always destroy groups on exit")
The acpi_pnp_match() function is used for finding the ACPI device
object that should be associated with the given PNP device.
Unfortunately, the check used by that function is not strict enough
and may cause success to be returned for a wrong ACPI device object.
To fix that, use the observation that the pointer to the ACPI
device object in question is already stored in the data field
in struct pnp_dev, so acpi_pnp_match() can simply use that
field to do its job.
This problem was uncovered in 3.14 by commit 202317a573b2 (ACPI / scan:
Add acpi_device objects for all device nodes in the namespace).
Fixes: 202317a573b2 (ACPI / scan: Add acpi_device objects for all device nodes in the namespace)
Reported-and-tested-by: Vinson Lee <vlee@twopensource.com>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This fixes the following warning:
warning: (ARCH_MULTIPLATFORM && ARCH_INTEGRATOR && ARCH_SHMOBILE_LEGACY) selects ARM_PATCH_PHYS_VIRT which has unmet direct dependencies (!XIP_KERNEL && MMU && (!ARCH_REALVIEW || !SPARSEMEM))
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull pin control fixes from Linus Walleij:
"Here are three pin control fixes for the v3.16 series. Sorry that
some of these arrive late, the summer heat in Sweden makes me slow.
- an IRQ handling fix for the STi driver, also for stable
- another IRQ fix for the RCAR GPIO driver
- a MAINTAINERS entry"
* tag 'pinctrl-v3.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
gpio: rcar: Add support for DT IRQ flags
MAINTAINERS: Add entry for the Renesas pin controller driver
pinctrl: st: Fix irqmux handler
include/linux/sched.h implements TASK_SIZE_OF as TASK_SIZE if it
is not set by the architecture headers. TASK_SIZE uses the
current task to determine the size of the virtual address space.
On a 64-bit kernel this will cause reading /proc/pid/pagemap of a
64-bit process from a 32-bit process to return EOF when it reads
past 0xffffffff.
Implement TASK_SIZE_OF exactly the same as TASK_SIZE with
test_tsk_thread_flag instead of test_thread_flag.
Cc: stable@vger.kernel.org
Signed-off-by: Colin Cross <ccross@android.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
XDR requires 4-byte alignment; nfs4d READLINK reply writes out the padding,
but truncates the packet to the padding-less size.
Fix by taking the padding into consideration when truncating the packet.
Symptoms:
# ll /mnt/
ls: cannot read symbolic link /mnt/test: Input/output error
total 4
-rw-r--r--. 1 root root 0 Jun 14 01:21 123456
lrwxrwxrwx. 1 root root 6 Jul 2 03:33 test
drwxr-xr-x. 1 root root 0 Jul 2 23:50 tmp
drwxr-xr-x. 1 root root 60 Jul 2 23:44 tree
Signed-off-by: Avi Kivity <avi@cloudius-systems.com>
Fixes: 476a7b1f4b2c (nfsd4: don't treat readlink like a zero-copy operation)
Reviewed-by: Kinglong Mee <kinglongmee@gmail.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
swarren/linux-tegra.git is a stale location; it has moved to
tegra/linux.git.
While the git protocol re-directs to the new location, HTTP does not.
Besides, MAINTAINERS should contain the canonical URL.
Signed-off-by: Andreas Färber <afaerber@suse.de>
[swarren, updated commit message]
Signed-off-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
1d3d4437eae1 ("vmscan: per-node deferred work") added a flags field to
struct shrinker assuming that all shrinkers were zero filled. The dm
bufio shrinker is not zero filled, which leaves arbitrary kmalloc() data
in flags. So far the only defined flags bit is SHRINKER_NUMA_AWARE.
But there are proposed patches which add other bits to shrinker.flags
(e.g. memcg awareness).
Rather than simply initializing the shrinker, this patch uses kzalloc()
when allocating the dm_bufio_client to ensure that the embedded shrinker
and any other similar structures are zeroed.
This fixes theoretical over aggressive shrinking of dm bufio objects.
If the uninitialized dm_bufio_client.shrinker.flags contains
SHRINKER_NUMA_AWARE then shrink_slab() would call the dm shrinker for
each numa node rather than just once. This has been broken since 3.12.
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # v3.12+
Pull x86 fixes from Peter Anvin:
"A couple of crash fixes, plus a fix that on 32 bits would cause a
missing -ENOSYS for nonexistent system calls"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, cpu: Fix cache topology for early P4-SMT
x86_32, entry: Store badsys error code in %eax
x86, MCE: Robustify mcheck_init_device
On ia64 and ppc64, function pointers do not point to the
entry address of the function, but to the address of a
function descriptor (which contains the entry address and misc
data).
Since the kprobes code passes the function pointer stored
by NOKPROBE_SYMBOL() to kallsyms_lookup_size_offset() for
initalizing its blacklist, it fails and reports many errors,
such as:
Failed to find blacklist 0001013168300000
Failed to find blacklist 0001013000f0a000
[...]
To fix this bug, use arch_deref_entry_point() to get the
function entry address for kallsyms_lookup_size_offset()
instead of the raw function pointer.
Suzuki also pointed out that blacklist entries should also
be updated as well.
Reported-by: Tony Luck <tony.luck@gmail.com>
Fixed-by: Suzuki K. Poulose <suzuki@in.ibm.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (for powerpc)
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: sparse@chrisli.org
Cc: Paul Mackerras <paulus@samba.org>
Cc: akataria@vmware.com
Cc: anil.s.keshavamurthy@intel.com
Cc: Fenghua Yu <fenghua.yu@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Chris Wright <chrisw@sous-sol.org>
Cc: yrl.pp-manager.tt@hitachi.com
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: rdunlap@infradead.org
Cc: dl9pf@gmx.de
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: linux-ia64@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/20140717114411.13401.2632.stgit@kbuild-fedora.novalocal
Signed-off-by: Ingo Molnar <mingo@kernel.org>
ABE DPLL frequency need to be lowered from 361267200
to 180633600 to facilitate the ATL requironments.
The dpll_abe_m2x2_ck clock need to be set to double
of ABE DPLL rate in order to have correct clocks
for audio.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com>
Acked-by: Tero Kristo <t-kristo@ti.com>
Signed-off-by: Mike Turquette <mturquette@linaro.org>
commit 431a84b1a4f7d1a0085d5b91330c5053cc8e8b12
("ARM: 8034/1: Disable preemption in iwmmxt_task_enable()")
introduced macros {inc,dec}_preempt_count to iwmmxt_task_enable
to make it run with preemption disabled.
Unfortunately, other functions in iwmmxt.S also use concan_{save,dump,load}
sections located in iwmmxt_task_enable() to deal with iWMMXt coprocessor.
This causes an unbalanced preempt_count due to excessive dec_preempt_count
and destroyed return addresses in callers of concan_ labels due to a register
collision:
Linux version 3.16.0-rc3-00062-gd92a333-dirty (jef@armhf) (gcc version 4.8.3 (Debian 4.8.3-4) ) #5 PREEMPT Thu Jul 3 19:46:39 CEST 2014
CPU: ARMv7 Processor [560f5815] revision 5 (ARMv7), cr=10c5387d
CPU: PIPT / VIPT nonaliasing data cache, PIPT instruction cache
Machine model: SolidRun CuBox
...
PJ4 iWMMXt v2 coprocessor enabled.
...
Unable to handle kernel paging request at virtual address fffffffe
pgd = bb25c000
[fffffffe] *pgd=3bfde821, *pte=00000000, *ppte=00000000
Internal error: Oops: 80000007 [#1] PREEMPT ARM
Modules linked in:
CPU: 0 PID: 62 Comm: startpar Not tainted 3.16.0-rc3-00062-gd92a333-dirty #5
task: bb230b80 ti: bb256000 task.ti: bb256000
PC is at 0xfffffffe
LR is at iwmmxt_task_copy+0x44/0x4c
pc : [<fffffffe>] lr : [<800130ac>] psr: 40000033
sp : bb257de8 ip : 00000013 fp : bb257ea4
r10: bb256000 r9 : fffffdfe r8 : 76e898e6
r7 : bb257ec8 r6 : bb256000 r5 : 7ea12760 r4 : 000000a0
r3 : ffffffff r2 : 00000003 r1 : bb257df8 r0 : 00000000
Flags: nZcv IRQs on FIQs on Mode SVC_32 ISA Thumb Segment user
Control: 10c5387d Table: 3b25c019 DAC: 00000015
Process startpar (pid: 62, stack limit = 0xbb256248)
This patch fixes the issue by moving concan_{save,dump,load} into separate
code sections and make iwmmxt_task_enable() call them in the same way the
other functions use concan_ symbols. The test for valid ownership is moved
to concan_save and is safe for the other user of it, iwmmxt_task_disable().
The register collision is also resolved by moving concan_ symbols as
{inc,dec}_preempt_count are now local to iwmmxt_task_enable().
Fixes: 431a84b1a4f7 ("ARM: 8034/1: Disable preemption in iwmmxt_task_enable()")
Signed-off-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Reported-by: Jean-Francois Moine <moinejf@free.fr>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull libata regression fix from Tejun Heo:
"The last libata/for-3.16-fixes pull contained a regression introduced
by 1871ee134b73 ("libata: support the ata host which implements a
queue depth less than 32") which in turn was a fix for a regression
introduced earlier while changing queue tag order to accomodate hard
drives which perform poorly if tags are not allocated in circular
order (ugh...).
The regression happens only for SAS controllers making use of libata
to serve ATA devices. They don't fill an ata_host field which is used
by the new tag allocation function leading to NULL dereference.
This patch adds a new intermediate field ata_host->n_tags which is
initialized for both SAS and !SAS cases to fix the issue"
* 'for-3.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/libata:
libata: introduce ata_host->n_tags to avoid oops on SAS controllers
An NFS operation that creates a new symlink includes the symlink data,
which is xdr-encoded as a length followed by the data plus 0 to 3 bytes
of zero-padding as required to reach a 4-byte boundary.
The vfs, on the other hand, wants null-terminated data.
The simple way to handle this would be by copying the data into a newly
allocated buffer with space for the final null.
The current nfsd_symlink code tries to be more clever by skipping that
step in the (likely) case where the byte following the string is already
0.
But that assumes that the byte following the string is ours to look at.
In fact, it might be the first byte of a page that we can't read, or of
some object that another task might modify.
Worse, the NFSv4 code tries to fix the problem by actually writing to
that byte.
In the NFSv2/v3 cases this actually appears to be safe:
- nfs3svc_decode_symlinkargs explicitly null-terminates the data
(after first checking its length and copying it to a new
page).
- NFSv2 limits symlinks to 1k. The buffer holding the rpc
request is always at least a page, and the link data (and
previous fields) have maximum lengths that prevent the request
from reaching the end of a page.
In the NFSv4 case the CREATE op is potentially just one part of a long
compound so can end up on the end of a page if you're unlucky.
The minimal fix here is to copy and null-terminate in the NFSv4 case.
The nfsd_symlink() interface here seems too fragile, though. It should
really either do the copy itself every time or just require a
null-terminated string.
Reported-by: Jeff Layton <jlayton@primarydata.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
The GPIO pin connected to card detect was inverted twice: once by
the argument to the GPIO line itself where it was magically marked
as active low by the flag GPIO_ACTIVE_LOW (0x01) in the third cell,
and also marked active low AGAIN by explicitly stating
"cd-inverted" (a deprecated method).
After commit 78f87df2b4f8760954d7d80603d0cfcbd4759683
"mmc: mmci: Use the common mmc DT parser" this results in the
line being inverted twice so it was effectively uninverted, while
the old code would not have this effect, instead disregarding the
flag on the GPIO line altogether, which is a bug. I admit the
semantics may be unclear but inverting twice is as good a
definition as any on how this should work.
So fix up the buggy device tree. Use proper #includes so the DTS
is clear and readable.
Cc: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
Jonathan writes:
Fifth set of fixes for IIO in the 3.16 cycle.
One nasty one that has been around a long time and a couple of non compliant
ABI fixes.
* The demux code used to split out desired channels for devices that only
support reading sets of channels at one time had a bug where it was
building it's conversion tables against the wrong bitmap resulting in
it never actually demuxing anything. This is an old bug, but will be
effecting an increasing number of drivers as it is often used to avoid
some fiddly code in the individual drivers.
* bma180 and mma8452 weren't obeying the ABI wrt to units for acceleration.
Were in G rather than m/s^2. A little input check was missing from bma180
that might lead to acceptance of incorrect values. This last one is minor
but might lead to incorrect userspace code working and problems in the
future.
Promote one fix for 3.16
This fix was necessary after
9c15a24b038f ("x86/mce: Improve mcheck_init_device() error handling")
went in. What this patch did was, among others, check the return value
of misc_register and exit early if it encountered an error. Original
code sloppily didn't do that.
However,
cef12ee52b05 ("xen/mce: Add mcelog support for Xen platform")
made it so that xen's init routine xen_late_init_mcelog runs first. This
was needed for the xen mcelog device which is supposed to be independent
from the baremetal one.
Initially it was reported that misc_register() fails often on xen and
that's why it needed fixing. However, it is *supposed* to fail by
design, when running in dom0 so that the xen mcelog device file gets
registered first.
And *then* you need the notifier *not* unregistered on the error path so
that the timer does get deleted properly in the CPU hotplug notifier.
Btw, this fix is needed also on baremetal in the unlikely event that
misc_register(&mce_chrdev_device) fails there too.
I was unsure whether to rush it in now and decided to delay it to 3.17.
However, xen people wanted it promoted as it breaks xen when doing cpu
hotplug there. So, after a bit of simmering in tip/master for initial
smoke testing, let's move it to 3.16. It fixes a semi-regression which
got introduced in 3.16 so no need for stable tagging.
tip/x86/ras contains that exact same commit but we can't remove it
there as it is not the last one. It won't cause any merge issues, as I
confirmed locally but I should state here the special situation of this
one fix explicitly anyway.
Thanks.
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
This commit:
commit 6f6343f53d133bae516caf3d254bce37d8774625
Author: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Date: Thu Apr 17 17:17:33 2014 +0900
kprobes/x86: Call exception handlers directly from do_int3/do_debug
appears to have inadvertently dropped a check that the int3 came
from kernel mode. Trying to dereference addr when addr is
user-controlled is completely bogus.
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Link: http://lkml.kernel.org/r/c4e339882c121aa76254f2adde3fcbdf502faec2.1405099506.git.luto@amacapital.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull SCSI barrier fix from James Bottomley:
"This is a potential data corruption fix: If we get an error sending
down a barrier, we simply ignore it meaning the barrier semantics get
violated without anyone being any the wiser. If the system crashes at
this point, the filesystem potentially becomes corrupt. Fix is to
report errors on failed barriers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: handle flush errors properly
Th AF_ALG socket was missing a security label (e.g. SELinux)
which means that socket was in "unlabeled" state.
This was recently demonstrated in the cryptsetup package
(cryptsetup v1.6.5 and later.)
See https://bugzilla.redhat.com/show_bug.cgi?id=1115120
This patch clones the sock's label from the parent sock
and resolves the issue (similar to AF_BLUETOOTH protocol family).
Cc: stable@vger.kernel.org
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When setting up the CMA region, we must ensure that the old section
mappings are flushed from the TLB before replacing them with page
tables, otherwise we can suffer from mismatched aliases if the CPU
speculatively prefetches from these mappings at an inopportune time.
A mismatched alias can occur when the TLB contains a section mapping,
but a subsequent prefetch causes it to load a page table mapping,
resulting in the possibility of the TLB containing two matching
mappings for the same virtual address region.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Pull input layer fixes from Dmitry Torokhov:
"A few fixups for the input subsystem"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: document INPUT_PROP_TOPBUTTONPAD
Input: fix defuzzing logic
Input: sirfsoc-onkey - fix GPL v2 license string typo
Input: st-keyscan - fix 'defined but not used' compiler warnings
Input: synaptics - add min/max quirk for pnp-id LEN2002 (Edge E531)
Input: i8042 - add Acer Aspire 5710 to nomux blacklist
Input: ti_am335x_tsc - warn about incorrect spelling
Input: wacom - cleanup multitouch code when touch_max is 2
1871ee134b73 ("libata: support the ata host which implements a queue
depth less than 32") directly used ata_port->scsi_host->can_queue from
ata_qc_new() to determine the number of tags supported by the host;
unfortunately, SAS controllers doing SATA don't initialize ->scsi_host
leading to the following oops.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000058
IP: [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
PGD 0
Oops: 0002 [#1] SMP
Modules linked in: isci libsas scsi_transport_sas mgag200 drm_kms_helper ttm
CPU: 1 PID: 518 Comm: udevd Not tainted 3.16.0-rc6+ #62
Hardware name: Intel Corporation S2600CO/S2600CO, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
task: ffff880c1a00b280 ti: ffff88061a000000 task.ti: ffff88061a000000
RIP: 0010:[<ffffffff814e0618>] [<ffffffff814e0618>] ata_qc_new_init+0x188/0x1b0
RSP: 0018:ffff88061a003ae8 EFLAGS: 00010012
RAX: 0000000000000001 RBX: ffff88000241ca80 RCX: 00000000000000fa
RDX: 0000000000000020 RSI: 0000000000000020 RDI: ffff8806194aa298
RBP: ffff88061a003ae8 R08: ffff8806194a8000 R09: 0000000000000000
R10: 0000000000000000 R11: ffff88000241ca80 R12: ffff88061ad58200
R13: ffff8806194aa298 R14: ffffffff814e67a0 R15: ffff8806194a8000
FS: 00007f3ad7fe3840(0000) GS:ffff880627620000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000058 CR3: 000000061a118000 CR4: 00000000001407e0
Stack:
ffff88061a003b20 ffffffff814e96e1 ffff88000241ca80 ffff88061ad58200
ffff8800b6bf6000 ffff880c1c988000 ffff880619903850 ffff88061a003b68
ffffffffa0056ce1 ffff88061a003b48 0000000013d6e6f8 ffff88000241ca80
Call Trace:
[<ffffffff814e96e1>] ata_sas_queuecmd+0xa1/0x430
[<ffffffffa0056ce1>] sas_queuecommand+0x191/0x220 [libsas]
[<ffffffff8149afee>] scsi_dispatch_cmd+0x10e/0x300
[<ffffffff814a3bc5>] scsi_request_fn+0x2f5/0x550
[<ffffffff81317613>] __blk_run_queue+0x33/0x40
[<ffffffff8131781a>] queue_unplugged+0x2a/0x90
[<ffffffff8131ceb4>] blk_flush_plug_list+0x1b4/0x210
[<ffffffff8131d274>] blk_finish_plug+0x14/0x50
[<ffffffff8117eaa8>] __do_page_cache_readahead+0x198/0x1f0
[<ffffffff8117ee21>] force_page_cache_readahead+0x31/0x50
[<ffffffff8117ee7e>] page_cache_sync_readahead+0x3e/0x50
[<ffffffff81172ac6>] generic_file_read_iter+0x496/0x5a0
[<ffffffff81219897>] blkdev_read_iter+0x37/0x40
[<ffffffff811e307e>] new_sync_read+0x7e/0xb0
[<ffffffff811e3734>] vfs_read+0x94/0x170
[<ffffffff811e43c6>] SyS_read+0x46/0xb0
[<ffffffff811e33d1>] ? SyS_lseek+0x91/0xb0
[<ffffffff8171ee29>] system_call_fastpath+0x16/0x1b
Code: 00 00 00 88 50 29 83 7f 08 01 19 d2 83 e2 f0 83 ea 50 88 50 34 c6 81 1d 02 00 00 40 c6 81 17 02 00 00 00 5d c3 66 0f 1f 44 00 00 <89> 14 25 58 00 00 00
Fix it by introducing ata_host->n_tags which is initialized to
ATA_MAX_QUEUE - 1 in ata_host_init() for SAS controllers and set to
scsi_host_template->can_queue in ata_host_register() for !SAS ones.
As SAS hosts are never registered, this will give them the same
ATA_MAX_QUEUE - 1 as before. Note that we can't use
scsi_host->can_queue directly for SAS hosts anyway as they can go
higher than the libata maximum.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Mike Qiu <qiudayu@linux.vnet.ibm.com>
Reported-by: Jesse Brandeburg <jesse.brandeburg@gmail.com>
Reported-by: Peter Hurley <peter@hurleysoftware.com>
Reported-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Fixes: 1871ee134b73 ("libata: support the ata host which implements a queue depth less than 32")
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: stable@vger.kernel.org