commits
Pull two powerpc fixes from Ben Herrenschmidt:
"Here are a couple of fixes for 3.15. One from Anton fixes a nasty
regression I introduced when trying to fix a loss of irq_work whose
consequences is that we can completely lose timer interrupts on a
CPU... not pretty.
The other one is a change to our PCIe reset hook to use a firmware
call instead of direct config space accesses to trigger a fundamental
reset on the root port. This is necessary so that the FW gets a
chance to disable the link down error monitoring, which would
otherwise trip and cause subsequent fatal EEH error"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: irq work racing with timer interrupt can result in timer interrupt hang
powerpc/powernv: Reset root port in firmware
Pull two btrfs fixes from Chris Mason:
"This has two fixes that we've been testing for 3.16, but since both
are safe and fix real bugs, it makes sense to send for 3.15 instead"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: send, fix incorrect ref access when using extrefs
Btrfs: fix EIO on reading file after ioctl clone works on it
I am seeing an issue where a CPU running perf eventually hangs.
Traces show timer interrupts happening every 4 seconds even
when a userspace task is running on the CPU. /proc/timer_list
also shows pending hrtimers have not run in over an hour,
including the scheduler.
Looking closer, decrementers_next_tb is getting set to
0xffffffffffffffff, and at that point we will never take
a timer interrupt again.
In __timer_interrupt() we set decrementers_next_tb to
0xffffffffffffffff and rely on ->event_handler to update it:
*next_tb = ~(u64)0;
if (evt->event_handler)
evt->event_handler(evt);
In this case ->event_handler is hrtimer_interrupt. This will eventually
call back through the clockevents code with the next event to be
programmed:
static int decrementer_set_next_event(unsigned long evt,
struct clock_event_device *dev)
{
/* Don't adjust the decrementer if some irq work is pending */
if (test_irq_work_pending())
return 0;
__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
If irq work came in between these two points, we will return
before updating decrementers_next_tb and we never process a timer
interrupt again.
This looks to have been introduced by 0215f7d8c53f (powerpc: Fix races
with irq_work). Fix it by removing the early exit and relying on
code later on in the function to force an early decrementer:
/* We may have raced with new irq work */
if (test_irq_work_pending())
set_dec(1);
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull two ceph fixes from Sage Weil:
"The first patch fixes a problem when we have a page count of 0 for
sendpage which is triggered by zfs. The second fixes a bug in CRUSH
that was resolved in the userland code a while back but fell through
the cracks on the kernel side"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
crush: decode and initialize chooseleaf_vary_r
libceph: fix corruption when using page_count 0 page in rbd
When running send, if an inode only has extended reference items
associated to it and no regular references, send.c:get_first_ref()
was incorrectly assuming the reference it found was of type
BTRFS_INODE_REF_KEY due to use of the wrong key variable.
This caused weird behaviour when using the found item has a regular
reference, such as weird path string, and occasionally (when lucky)
a crash:
[ 190.600652] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 190.600994] Modules linked in: btrfs xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc psmouse serio_raw evbug pcspkr i2c_piix4 e1000 floppy
[ 190.602565] CPU: 2 PID: 14520 Comm: btrfs Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 190.602728] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 190.602868] task: ffff8800d447c920 ti: ffff8801fa79e000 task.ti: ffff8801fa79e000
[ 190.603030] RIP: 0010:[<ffffffff813266b4>] [<ffffffff813266b4>] memcpy+0x54/0x110
[ 190.603262] RSP: 0018:ffff8801fa79f880 EFLAGS: 00010202
[ 190.603395] RAX: ffff8800d4326e3f RBX: 000000000000036a RCX: ffff880000000000
[ 190.603553] RDX: 000000000000032a RSI: ffe708844042936a RDI: ffff8800d43271a9
[ 190.603710] RBP: ffff8801fa79f8c8 R08: 00000000003a4ef0 R09: 0000000000000000
[ 190.603867] R10: 793a4ef09f000000 R11: 9f0000000053726f R12: ffff8800d43271a9
[ 190.604020] R13: 0000160000000000 R14: ffff8802110134f0 R15: 000000000000036a
[ 190.604020] FS: 00007fb423d09b80(0000) GS:ffff880216200000(0000) knlGS:0000000000000000
[ 190.604020] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 190.604020] CR2: 00007fb4229d4b78 CR3: 00000001f5d76000 CR4: 00000000000006e0
[ 190.604020] Stack:
[ 190.604020] ffffffffa01f4d49 ffff8801fa79f8f0 00000000000009f9 ffff8801fa79f8c8
[ 190.604020] 00000000000009f9 ffff880211013260 000000000000f971 ffff88021147dba8
[ 190.604020] 00000000000009f9 ffff8801fa79f918 ffffffffa02367f5 ffff8801fa79f928
[ 190.604020] Call Trace:
[ 190.604020] [<ffffffffa01f4d49>] ? read_extent_buffer+0xb9/0x120 [btrfs]
[ 190.604020] [<ffffffffa02367f5>] fs_path_add_from_extent_buffer+0x45/0x60 [btrfs]
[ 190.604020] [<ffffffffa0238806>] get_first_ref+0x1f6/0x210 [btrfs]
[ 190.604020] [<ffffffffa0238994>] __get_cur_name_and_parent+0x174/0x3a0 [btrfs]
[ 190.604020] [<ffffffff8118df3d>] ? kmem_cache_alloc_trace+0x11d/0x1e0
[ 190.604020] [<ffffffffa0236674>] ? fs_path_alloc+0x24/0x60 [btrfs]
[ 190.604020] [<ffffffffa0238c91>] get_cur_path+0xd1/0x240 [btrfs]
(...)
Steps to reproduce (either crash or some weirdness like an odd path string):
mkfs.btrfs -f -O extref /dev/sdd
mount /dev/sdd /mnt
mkdir /mnt/testdir
touch /mnt/testdir/foobar
for i in `seq 1 2550`; do
ln /mnt/testdir/foobar /mnt/testdir/foobar_link_`printf "%04d" $i`
done
ln /mnt/testdir/foobar /mnt/testdir/final_foobar_name
rm -f /mnt/testdir/foobar
for i in `seq 1 2550`; do
rm -f /mnt/testdir/foobar_link_`printf "%04d" $i`
done
btrfs subvolume snapshot -r /mnt /mnt/mysnap
btrfs send /mnt/mysnap -f /tmp/mysnap.send
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Resetting root port has more stuff to do than that for PCIe switch
ports and we should have resetting root port done in firmware instead
of the kernel itself. The problem was introduced by commit 5b2e198e
("powerpc/powernv: Rework EEH reset").
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull xfs fixes from Dave Chinner:
"Code inspection of the XFS error number sign translations found a
bunch of issues, including returning incorrectly signed errors for
some data integrity operations.
These leak to userspace and result in applications not getting the
errors correctly reported. Hence they need fixing sooner rather than
later.
A couple of the bugs are in data integrity operations, a couple more
are in the new COLLAPSE_RANGE code. One of these came in through a
recent ext4 merge and so I had to update the base tree to 3.15-rc5
before fixing the issues"
* tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs:
xfs: list_lru_init returns a negative error
xfs: negate xfs_icsb_init_counters error value
xfs: negate mount workqueue init error value
xfs: fix wrong err sign on xfs_set_acl()
xfs: fix wrong errno from xfs_initxattrs
xfs: correct error sign on COLLAPSE_RANGE errors
xfs: xfs_commit_metadata returns wrong errno
xfs: fix incorrect error sign in xfs_file_aio_read
xfs: xfs_dir_fsync() returns positive errno
Commit e2b149cc4ba0 ("crush: add chooseleaf_vary_r tunable") added the
crush_map::chooseleaf_vary_r field but missed the decode part. This
lead to misdirected requests caused by incorrect raw crush mapping
sets.
Fixes: http://tracker.ceph.com/issues/8226
Reported-and-Tested-by: Dmitry Smirnov <onlyjob@member.fsf.org>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
For inline data extent, we need to make its length aligned, otherwise,
we can get a phantom extent map which confuses readpages() to return -EIO.
This can be detected by xfstests/btrfs/035.
Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
This patch fixes this section mismatch:
WARNING: vmlinux.o(.text+0x1efc4): Section mismatch in reference from
the function apm821xx_pciex_init_port_hw() to the function
.init.text:ppc4xx_pciex_wait_on_sdr.isra.9()
The function apm821xx_pciex_init_port_hw() references the function
__init ppc4xx_pciex_wait_on_sdr.isra.9(). This is often because
apm821xx_pciex_init_port_hw lacks a __init annotation or the
annotation of ppc4xx_pciex_wait_on_sdr.isra.9 is wrong.
apm821xx_pciex_init_port_hw is only referenced by a struct in
__initdata, so it should be safe to add __init to
apm821xx_pciex_init_port_hw.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull renameat2 arch support from Miklos Szeredi:
"I've collected architecture patches for the renameat2 syscall that
maintainers acked and/or asked me to queue.
This adds architecture support for the renameat2 syscall to m68k,
parisc, ia64 and through asm-generic to arc, arm64, c6x, hexagon,
metag, openrisc, score, tile, unicore32"
* 'renameat2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
scripts/checksyscalls.sh: Make renameat optional
asm-generic: Add renameat2 syscall
ia64: add renameat2 syscall
parisc: add renameat2 syscall
m68k: add renameat2 syscall
And we don't invert it properly when initialising the dquot lru
list.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:
https://github.com/zfsonlinux/spl/issues/241
http://tracker.ceph.com/issues/7790
The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.
This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.
Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
fs_path_ensure_buf is used to make sure our path buffers for
send are big enough for the path names as we construct them.
The buffer size is limited to 32K by the length field in
the struct.
But bugs in the path construction can end up trying to build
a huge buffer, and we'll do invalid memmmoves when the
buffer length field wraps.
This patch is step one, preventing the overflows.
Signed-off-by: Chris Mason <clm@fb.com>
When the guest cedes the vcpu or the vcpu has no guest to
run it naps. Clear the runlatch bit of the vcpu before
napping to indicate an idle cpu.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull iommu fixes from Joerg Roedel:
"Three fixes for the AMD IOMMU driver:
- fix a locking issue around get_user_pages()
- fix two issues with device aliasing and exclusion range handling"
* tag 'iommu-fixes-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: fix enabling exclusion range for an exact device
iommu/amd: Take mmap_sem when calling get_user_pages
iommu/amd: Fix interrupt remapping for aliased devices
The new renameat2 syscall provides all the functionality of renameat
with an additional flags argument, so make renameat optional so that
future architectures can omit it without getting a warning.
This patch doesn't affect existing architectures.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
If we had to retry on the profiles seqlock (due to a concurrent write), we
would set bits on the input flags that corresponded both to the current
profile and to previous values of the profile.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
The secondary threads in the core are kept offline before launching guests
in kvm on powerpc: "371fefd6f2dc4666:KVM: PPC: Allow book3s_hv guests to use
SMT processor modes."
Hence their runlatch bits are cleared. When the secondary threads are called
in to start a guest, their runlatch bits need to be set to indicate that they
are busy. The primary thread has its runlatch bit set though, but there is no
harm in setting this bit once again. Hence set the runlatch bit for all
threads before they start guest.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull iscsi_ibft fix from Konrad Rzeszutek Wilk:
"Fix iBFT regression on Broadcom NICs introduced in 3.2"
* tag 'stable/for-linus-3.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/ibft:
iscsi_ibft: Fix finding Broadcom specific ibft sign
set_device_exclusion_range(u16 devid, struct ivmd_header *m) enables
exclusion range for ONE device. IOMMU does not translate the access
to the exclusion range from the device.
The device is specified by input argument 'devid'. But 'devid' is not
passed to the actual set function set_dev_entry_bit(), instead
'm->devid' is passed. 'm->devid' does not specify the exact device
which needs enable the exclusion range. 'm->devid' represents DeviceID
field of IVMD, which has different meaning depends on IVMD type.
The caller init_exclusion_range() sets 'devid' for ONE device. When
m->type is equal to ACPI_IVMD_TYPE_ALL or ACPI_IVMD_TYPE_RANGE,
'm->devid' is not equal to 'devid'.
This patch fixes 'm->devid' to 'devid'.
Signed-off-by: Su Friendy <friendy.su@sony.com.cn>
Signed-off-by: Tamori Masahiro <Masahiro.Tamori@jp.sony.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add the renameat2 syscall to the generic syscall list, which is used by the
following architectures: arc, arm64, c6x, hexagon, metag, openrisc, score,
tile, unicore32.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull x86 fixes from Peter Anvin:
"A somewhat unpleasantly large collection of small fixes. The big ones
are the __visible tree sweep and a fix for 'earlyprintk=efi,keep'. It
was using __init functions with predictably suboptimal results.
Another key fix is a build fix which would produce output that simply
would not decompress correctly in some configuration, due to the
existing Makefiles picking up an unfortunate local label and mistaking
it for the global symbol _end.
Additional fixes include the handling of 64-bit numbers when setting
the vdso data page (a latent bug which became manifest when i386
started exporting a vdso with time functions), a fix to the new MSR
manipulation accessors which would cause features to not get properly
unblocked, a build fix for 32-bit userland, and a few new platform
quirks"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
x86: Fix typo preventing msr_set/clear_bit from having an effect
x86/intel: Add quirk to disable HPET for the Baytrail platform
x86/hpet: Make boot_hpet_disable extern
x86-64, build: Fix stack protector Makefile breakage with 32-bit userland
x86/reboot: Add reboot quirk for Certec BPC600
asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/*
asmlinkage, x86: Add explicit __visible to arch/x86/*
asmlinkage: Revert "lto: Make asmlinkage __visible"
x86, build: Don't get confused by local symbols
x86/efi: earlyprintk=efi,keep fix
If skinny metadata is enabled and our first tree search fails to find a
skinny extent item, we may repeat a tree search for a "fat" extent item
(if the previous item in the leaf is not the "fat" extent we're looking
for). However we were not setting the new key's objectid to the right
value, as we previously used the same key variable to peek at the previous
item in the leaf, which has a different objectid. So just set the right
objectid to avoid modifying/deleting a wrong item if we repeat the tree
search.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Up until now we have been setting the runlatch bits for a busy CPU and
clearing it when a CPU enters idle state. The runlatch bit has thus
been consistent with the utilization of a CPU as long as the CPU is online.
However when a CPU is hotplugged out the runlatch bit is not cleared. It
needs to be cleared to indicate an unused CPU. Hence this patch has the
runlatch bit cleared for an offline CPU just before entering an idle state
and sets it immediately after it exits the idle state.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull SH driver fix from Simon Horman:
"Compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI
This resolves a regression introduced in v3.14 by commit bf98c1eac1d4
("ARM: Rename ARCH_SHMOBILE to ARCH_SHMOBILE_LEGACY")"
* tag 'renesas-sh-drivers-for-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
drivers: sh: compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI
Search for Broadcom specific ibft sign "BIFT"
along with other possible values on UEFI
This patch is fix for regression introduced in
“935a9fee51c945b8942be2d7b4bae069167b4886”.
https://lkml.org/lkml/2011/12/16/353
This impacts Broadcom CNA for iSCSI Boot on UEFI platform.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
get_user_pages requires caller to hold a read lock on mmap_sem.
Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull xfs fixes from Dave Chinner:
"The main fix is adding support for default ACLs on O_TMPFILE opened
inodes to bring XFS into line with other filesystems. Metadata CRCs
are now also considered well enough tested to be fully supported, so
we're removing the shouty warnings issued at mount time for
filesystems with that format. And there's transaction block
reservation overrun fix.
Summary:
- fix a remote attribute size calculation bug that leads to a
transaction overrun
- add default ACLs to O_TMPFILE files
- Remove the EXPERIMENTAL tag from filesystems with metadata CRC
support"
* tag 'xfs-for-linus-3.15-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: remote attribute overwrite causes transaction overrun
xfs: initialize default acls for ->tmpfile()
xfs: fully support v5 format filesystems
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)
So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.
We need an explicit cast.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Currently, with inode cache enabled, we will reuse its inode id immediately
after unlinking file, we may hit something like following:
|->iput inode
|->return inode id into inode cache
|->create dir,fsync
|->power off
An easy way to reproduce this problem is:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt -o inode_cache,commit=100
dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
inode_id=`ls -i /mnt/data | awk '{print $1}'`
rm -f /mnt/data
i=1
while [ 1 ]
do
mkdir /mnt/dir_$i
test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
if [ $test1 -eq $inode_id ]
then
dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
echo b > /proc/sysrq-trigger
fi
sleep 1
i=$(($i+1))
done
mount /dev/sdb /mnt
umount /dev/sdb
btrfs check /dev/sdb
We fix this problem by adding unlinked inode's id into pinned tree,
and we can not reuse them until committing transaction.
Cc: stable@vger.kernel.org
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
While testing memory hot-remove, I found following dead lock:
Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove file
"online" under dir memory499, in kernfs_drain(), at
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
Process #1120 is trying to online memory499 by
echo 1 > memory499/online
In .kernfs_fop_write, it uses kernfs_get_active() to increase
&kn->active, thus blocking process #1141. While itself is blocked later
when trying to acquire memory_hotplug_mutex, which is held by process
The backtrace of both processes are shown below:
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c000000000263ca4>] .online_pages+0x74/0x7b0
[<c00000000055b40c>] .memory_subsys_online+0x9c/0x150
[<c00000000053cbe8>] .device_online+0xb8/0x120
[<c00000000053cd04>] .online_store+0xb4/0xc0
[<c000000000538ce4>] .dev_attr_store+0x64/0xa0
[<c00000000030f4ec>] .sysfs_kf_write+0x7c/0xb0
[<c00000000030e574>] .kernfs_fop_write+0x154/0x1e0
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c00000000030be14>] .__kernfs_remove+0x204/0x300
[<c00000000030d428>] .kernfs_remove_by_name_ns+0x68/0xf0
[<c00000000030fb38>] .sysfs_remove_file_ns+0x38/0x60
[<c000000000539354>] .device_remove_attrs+0x54/0xc0
[<c000000000539fd8>] .device_del+0x158/0x250
[<c00000000053a104>] .device_unregister+0x34/0xa0
[<c00000000055bc14>] .unregister_memory_section+0x164/0x170
[<c00000000024ee18>] .__remove_pages+0x108/0x4c0
[<c00000000004b590>] .arch_remove_memory+0x60/0xc0
[<c00000000026446c>] .remove_memory+0x8c/0xe0
[<c00000000007f9f4>] .pseries_remove_memblock+0xd4/0x160
[<c00000000007fcfc>] .pseries_memory_notifier+0x27c/0x290
[<c0000000008ae6cc>] .notifier_call_chain+0x8c/0x100
[<c0000000000d858c>] .__blocking_notifier_call_chain+0x6c/0xe0
[<c00000000071ddec>] .of_property_notify+0x7c/0xc0
[<c00000000071ed3c>] .of_update_property+0x3c/0x1b0
[<c0000000000756cc>] .ofdt_write+0x3dc/0x740
[<c0000000002f60fc>] .proc_reg_write+0xac/0x110
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), which is also stated before function
remove_memory():
* NOTE: The caller must call lock_device_hotplug() to serialize hotplug
* and online/offline operations before this call, as required by
* try_offline_node().
*/
void __ref remove_memory(int nid, u64 start, u64 size)
With this lock held, the other process(#1120 above) trying to online the
memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally find No such device error.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull media fixes from Mauro Carvalho Chehab:
"Most of the changes are drivers fixes (rtl28xuu, fc2580, ov7670,
davinci, gspca, s5p-fimc and s5c73m3).
There is also a compat32 fix and one infoleak fixup at the media
controller"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode
[media] V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space
[media] media-device: fix infoleak in ioctl media_enum_entities()
[media] fc2580: fix tuning failure on 32-bit arch
[media] Prefer gspca_sonixb over sn9c102 for all devices
[media] media: davinci: vpfe: make sure all the buffers unmapped and released
[media] staging: media: davinci: vpfe: make sure all the buffers are released
[media] media: davinci: vpbe_display: fix releasing of active buffers
[media] media: davinci: vpif_display: fix releasing of active buffers
[media] media: davinci: vpif_capture: fix releasing of active buffers
[media] s5p-fimc: Fix YUV422P depth
[media] s5c73m3: Add missing rename of v4l2_of_get_next_endpoint() function
[media] rtl28xxu: silence error log about disabled rtl2832_sdr module
[media] rtl28xxu: do not hard depend on staging SDR module
If the kernel is built to support multi-ARM configuration with shmobile
support built in, then drivers/sh is not built. This contains the PM
runtime code in drivers/sh/pm_runtime.c, which implicitly enables the
module clocks for all devices, and thus is quite essential.
Without this, the state of clocks depends on implicit reset state, or on
the bootloader.
If ARCH_SHMOBILE_MULTI then build the drivers/sh directory, but ensure that
bits that may conflict (drivers/sh/clk if the common clock framework is
enabled) or are not used (drivers/sh/intc), are not built.
Also, only enable the PM runtime code when actually running on a shmobile
SoCs that needs it.
ARCH_SHMOBILE_MULTI was added a while ago by commit
efacfce5f8a523457e9419a25d52fe39db00b26a ("ARM: shmobile: Introduce
ARCH_SHMOBILE_MULTI"), but drivers/sh was compiled for both
ARCH_SHMOBILE_LEGACY and ARCH_SHMOBILE_MULTI until commit
bf98c1eac1d4a6bcf00532e4fa41d8126cd6c187 ("ARM: Rename ARCH_SHMOBILE to
ARCH_SHMOBILE_LEGACY").
Inspired by a patch from Ben Dooks <ben.dooks@codethink.co.uk>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
An apparent cut and paste error prevents the correct flags from being
set on the alias device resulting in MSI on conventional PCI devices
failing to work. This also produces error events from the IOMMU like:
AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:14.4 address=0x000000fdf8000000 flags=0x0a00]
Where 14.4 is a PCIe-to-PCI bridge with a device behind it trying to
use MSI interrupts.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Helge Deller <deller@gmx.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull tracing fixes from Steven Rostedt:
"This contains two fixes.
The first is a long standing bug that causes bogus data to show up in
the refcnt field of the module_refcnt tracepoint. It was introduced
by a merge conflict resolution back in 2.6.35-rc days.
The result should be 'refcnt = incs - decs', but instead it did
'refcnt = incs + decs'.
The second fix is to a bug that was introduced in this merge window
that allowed for a tracepoint funcs pointer to be used after it was
freed. Moving the location of where the probes are released solved
the problem"
* tag 'trace-fixes-v3.15-rc4-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Fix use of tracepoint funcs after rcu free
trace: module: Maintain a valid user count
Commit e461fcb ("xfs: remote attribute lookups require the value
length") passes the remote attribute length in the xfs_da_args
structure on lookup so that CRC calculations and validity checking
can be performed correctly by related code. This, unfortunately has
the side effect of changing the args->valuelen parameter in cases
where it shouldn't.
That is, when we replace a remote attribute, the incoming
replacement stores the value and length in args->value and
args->valuelen, but then the lookup which finds the existing remote
attribute overwrites args->valuelen with the length of the remote
attribute being replaced. Hence when we go to create the new
attribute, we create it of the size of the existing remote
attribute, not the size it is supposed to be. When the new attribute
is much smaller than the old attribute, this results in a
transaction overrun and an ASSERT() failure on a debug kernel:
XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
Fix this by keeping the remote attribute value length separate to
the attribute value length in the xfs_da_args structure. The enables
us to pass the length of the remote attribute to be removed without
overwriting the new attribute's length.
Also, ensure that when we save remote block contexts for a later
rename we zero the original state variables so that we don't confuse
the state of the attribute to be removes with the state of the new
attribute that we just added. [Spotted by Brain Foster.]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The spuriously added semicolon didn't have any effect because the
macro isn't currently in use.
c0a639ad0bc6b178b46996bd1f821a04643e2bde
Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Fix possible memory leaks in the following error handling paths:
read_tree_block()
btrfs_recover_log_trees
btrfs_commit_super()
btrfs_find_orphan_roots()
btrfs_cleanup_fs_roots()
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
module_init should return 0 or a negative errno.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull staging driver fixes from Greg KH:
"Here are five staging driver fixes for 3.15-rc6 that resolve some
reported issues. They are for the imx and rtl8723au drivers"
* tag 'staging-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723au: Do not reset wdev->iftype in netdev_close()
staging: rtl8723au: Use correct pipe type for USB interrupts
imx-drm: imx-tve: correct DDC property name to 'ddc-i2c-bus'
imx-drm: imx-drm-core: skip components whose parent device is disabled
imx-drm: imx-drm-core: fix imx_drm_encoder_get_mux_id
If a struct contains 64-bit fields, it is aligned on 64-bit boundaries
within containing structs in 64-bit compilations. This is the case with
struct v4l2_window, which contains pointers and is embedded into struct
v4l2_format, and that one is embedded into struct v4l2_create_buffers.
Unlike some other structs, used as a part of the kernel ABI as ioctl()
arguments, that are packed, these structs aren't packed. This isn't a
problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains
a bug, that triggers in such 64-bit builds. That code wrongly assumes,
that in struct v4l2_create_buffers, struct v4l2_format immediately follows
the __u32 memory field, which in fact isn't the case. This bug wasn't
visible until now, because until recently hardly any applications used
this ioctl() and mostly embedded 32-bit only drivers implemented it. This
is changing now with addition of this ioctl() to some USB drivers, e.g.
UVC. This patch fixes the bug by copying parts of struct
v4l2_create_buffers separately.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: stable@vger.kernel.org
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Jie Liu <jeff.liu@oracle.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Pull input subsystem fixes from Dmitry Torokhov:
"Just a few fixups to various drivers"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - fix touchpad initialization on Gigabyte U2442
Input: tca8418 - fix loading this driver as a module from a device tree
Input: bma150 - extend chip detection for bma180
Input: atkbd - fix keyboard not working on some LG laptops
Input: synaptics - add min/max quirk for ThinkPad Edge E431
Commit de7b2973903c "tracepoint: Use struct pointer instead of name hash
for reg/unreg tracepoints" introduces a use after free by calling
release_probes on the old struct tracepoint array before the newly
allocated array is published with rcu_assign_pointer. There is a race
window where tracepoints (RCU readers) can perform a
"use-after-grace-period-after-free", which shows up as a GPF in
stress-tests.
Link: http://lkml.kernel.org/r/53698021.5020108@oracle.com
Link: http://lkml.kernel.org/p/1399549669-25465-1-git-send-email-mathieu.desnoyers@efficios.com
Reported-by: Sasha Levin <sasha.levin@oracle.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Dave Jones <davej@redhat.com>
Fixes: de7b2973903c "tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints"
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current tmpfile handler does not initialize default ACLs. Doing so
within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
which is already used as a common create handler.
xfs_vn_mknod() does not currently have a mechanism to determine whether
to link the file into the namespace. Therefore, further abstract
xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
on the dentry when called via ->tmpfile().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Due to a typo the msr accessor function introduced in
22085a66c2fab6cf9b9393c056a3600a6b4735de didn't have any lasting
effects because they accidentally wrote the old value back.
After c0a639ad0bc6b178b46996bd1f821a04643e2bde this at the very least
this causes cpuid limits not to be lifted on some cpus leading to
missing capabilities for those.
Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When running stress test(including snapshots,balance,fstress), we trigger
the following BUG_ON() which is because we fail to start inode caching task.
[ 181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
[ 181.137963] invalid opcode: 0000 [#1] SMP
[ 181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
[ 181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
[ 181.367506] Call Trace:
[ 181.371107] [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs]
[ 181.379191] [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
[ 181.387464] [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40
[ 181.395642] [<ffffffff811dc5fe>] evict+0x9e/0x190
[ 181.401882] [<ffffffff811dcde3>] iput+0xf3/0x180
[ 181.408025] [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
[ 181.416614] [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
[ 181.425399] [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
[ 181.435059] [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
[ 181.444148] [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs]
[ 181.451971] [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80
[ 181.459509] [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520
[ 181.467046] [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0
[ 181.474393] [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0
[ 181.481450] [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0
[ 181.488021] [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b
We should avoid triggering BUG_ON() here, instead, we output warning messages
and clear inode_cache option.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Bump the boot wrapper BOOT_COMMAND_LINE_SIZE to match the
kernel.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull driver core fixes from Greg KH:
"Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
some reported issues and a regression from 3.13"
* tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: make sure read buffer is zeroed
kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs
Pull two powerpc fixes from Ben Herrenschmidt:
"Here are a couple of fixes for 3.15. One from Anton fixes a nasty
regression I introduced when trying to fix a loss of irq_work whose
consequences is that we can completely lose timer interrupts on a
CPU... not pretty.
The other one is a change to our PCIe reset hook to use a firmware
call instead of direct config space accesses to trigger a fundamental
reset on the root port. This is necessary so that the FW gets a
chance to disable the link down error monitoring, which would
otherwise trip and cause subsequent fatal EEH error"
* 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc:
powerpc: irq work racing with timer interrupt can result in timer interrupt hang
powerpc/powernv: Reset root port in firmware
Pull two btrfs fixes from Chris Mason:
"This has two fixes that we've been testing for 3.16, but since both
are safe and fix real bugs, it makes sense to send for 3.15 instead"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: send, fix incorrect ref access when using extrefs
Btrfs: fix EIO on reading file after ioctl clone works on it
I am seeing an issue where a CPU running perf eventually hangs.
Traces show timer interrupts happening every 4 seconds even
when a userspace task is running on the CPU. /proc/timer_list
also shows pending hrtimers have not run in over an hour,
including the scheduler.
Looking closer, decrementers_next_tb is getting set to
0xffffffffffffffff, and at that point we will never take
a timer interrupt again.
In __timer_interrupt() we set decrementers_next_tb to
0xffffffffffffffff and rely on ->event_handler to update it:
*next_tb = ~(u64)0;
if (evt->event_handler)
evt->event_handler(evt);
In this case ->event_handler is hrtimer_interrupt. This will eventually
call back through the clockevents code with the next event to be
programmed:
static int decrementer_set_next_event(unsigned long evt,
struct clock_event_device *dev)
{
/* Don't adjust the decrementer if some irq work is pending */
if (test_irq_work_pending())
return 0;
__get_cpu_var(decrementers_next_tb) = get_tb_or_rtc() + evt;
If irq work came in between these two points, we will return
before updating decrementers_next_tb and we never process a timer
interrupt again.
This looks to have been introduced by 0215f7d8c53f (powerpc: Fix races
with irq_work). Fix it by removing the early exit and relying on
code later on in the function to force an early decrementer:
/* We may have raced with new irq work */
if (test_irq_work_pending())
set_dec(1);
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: stable@vger.kernel.org # 3.14+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull two ceph fixes from Sage Weil:
"The first patch fixes a problem when we have a page count of 0 for
sendpage which is triggered by zfs. The second fixes a bug in CRUSH
that was resolved in the userland code a while back but fell through
the cracks on the kernel side"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
crush: decode and initialize chooseleaf_vary_r
libceph: fix corruption when using page_count 0 page in rbd
When running send, if an inode only has extended reference items
associated to it and no regular references, send.c:get_first_ref()
was incorrectly assuming the reference it found was of type
BTRFS_INODE_REF_KEY due to use of the wrong key variable.
This caused weird behaviour when using the found item has a regular
reference, such as weird path string, and occasionally (when lucky)
a crash:
[ 190.600652] general protection fault: 0000 [#1] SMP DEBUG_PAGEALLOC
[ 190.600994] Modules linked in: btrfs xor raid6_pq binfmt_misc nfsd auth_rpcgss oid_registry nfs_acl nfs lockd fscache sunrpc psmouse serio_raw evbug pcspkr i2c_piix4 e1000 floppy
[ 190.602565] CPU: 2 PID: 14520 Comm: btrfs Not tainted 3.13.0-fdm-btrfs-next-26+ #1
[ 190.602728] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 190.602868] task: ffff8800d447c920 ti: ffff8801fa79e000 task.ti: ffff8801fa79e000
[ 190.603030] RIP: 0010:[<ffffffff813266b4>] [<ffffffff813266b4>] memcpy+0x54/0x110
[ 190.603262] RSP: 0018:ffff8801fa79f880 EFLAGS: 00010202
[ 190.603395] RAX: ffff8800d4326e3f RBX: 000000000000036a RCX: ffff880000000000
[ 190.603553] RDX: 000000000000032a RSI: ffe708844042936a RDI: ffff8800d43271a9
[ 190.603710] RBP: ffff8801fa79f8c8 R08: 00000000003a4ef0 R09: 0000000000000000
[ 190.603867] R10: 793a4ef09f000000 R11: 9f0000000053726f R12: ffff8800d43271a9
[ 190.604020] R13: 0000160000000000 R14: ffff8802110134f0 R15: 000000000000036a
[ 190.604020] FS: 00007fb423d09b80(0000) GS:ffff880216200000(0000) knlGS:0000000000000000
[ 190.604020] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 190.604020] CR2: 00007fb4229d4b78 CR3: 00000001f5d76000 CR4: 00000000000006e0
[ 190.604020] Stack:
[ 190.604020] ffffffffa01f4d49 ffff8801fa79f8f0 00000000000009f9 ffff8801fa79f8c8
[ 190.604020] 00000000000009f9 ffff880211013260 000000000000f971 ffff88021147dba8
[ 190.604020] 00000000000009f9 ffff8801fa79f918 ffffffffa02367f5 ffff8801fa79f928
[ 190.604020] Call Trace:
[ 190.604020] [<ffffffffa01f4d49>] ? read_extent_buffer+0xb9/0x120 [btrfs]
[ 190.604020] [<ffffffffa02367f5>] fs_path_add_from_extent_buffer+0x45/0x60 [btrfs]
[ 190.604020] [<ffffffffa0238806>] get_first_ref+0x1f6/0x210 [btrfs]
[ 190.604020] [<ffffffffa0238994>] __get_cur_name_and_parent+0x174/0x3a0 [btrfs]
[ 190.604020] [<ffffffff8118df3d>] ? kmem_cache_alloc_trace+0x11d/0x1e0
[ 190.604020] [<ffffffffa0236674>] ? fs_path_alloc+0x24/0x60 [btrfs]
[ 190.604020] [<ffffffffa0238c91>] get_cur_path+0xd1/0x240 [btrfs]
(...)
Steps to reproduce (either crash or some weirdness like an odd path string):
mkfs.btrfs -f -O extref /dev/sdd
mount /dev/sdd /mnt
mkdir /mnt/testdir
touch /mnt/testdir/foobar
for i in `seq 1 2550`; do
ln /mnt/testdir/foobar /mnt/testdir/foobar_link_`printf "%04d" $i`
done
ln /mnt/testdir/foobar /mnt/testdir/final_foobar_name
rm -f /mnt/testdir/foobar
for i in `seq 1 2550`; do
rm -f /mnt/testdir/foobar_link_`printf "%04d" $i`
done
btrfs subvolume snapshot -r /mnt /mnt/mysnap
btrfs send /mnt/mysnap -f /tmp/mysnap.send
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
Resetting root port has more stuff to do than that for PCIe switch
ports and we should have resetting root port done in firmware instead
of the kernel itself. The problem was introduced by commit 5b2e198e
("powerpc/powernv: Rework EEH reset").
Cc: linux-stable <stable@vger.kernel.org>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull xfs fixes from Dave Chinner:
"Code inspection of the XFS error number sign translations found a
bunch of issues, including returning incorrectly signed errors for
some data integrity operations.
These leak to userspace and result in applications not getting the
errors correctly reported. Hence they need fixing sooner rather than
later.
A couple of the bugs are in data integrity operations, a couple more
are in the new COLLAPSE_RANGE code. One of these came in through a
recent ext4 merge and so I had to update the base tree to 3.15-rc5
before fixing the issues"
* tag 'xfs-for-linus-3.15-rc6' of git://oss.sgi.com/xfs/xfs:
xfs: list_lru_init returns a negative error
xfs: negate xfs_icsb_init_counters error value
xfs: negate mount workqueue init error value
xfs: fix wrong err sign on xfs_set_acl()
xfs: fix wrong errno from xfs_initxattrs
xfs: correct error sign on COLLAPSE_RANGE errors
xfs: xfs_commit_metadata returns wrong errno
xfs: fix incorrect error sign in xfs_file_aio_read
xfs: xfs_dir_fsync() returns positive errno
Commit e2b149cc4ba0 ("crush: add chooseleaf_vary_r tunable") added the
crush_map::chooseleaf_vary_r field but missed the decode part. This
lead to misdirected requests caused by incorrect raw crush mapping
sets.
Fixes: http://tracker.ceph.com/issues/8226
Reported-and-Tested-by: Dmitry Smirnov <onlyjob@member.fsf.org>
Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>
For inline data extent, we need to make its length aligned, otherwise,
we can get a phantom extent map which confuses readpages() to return -EIO.
This can be detected by xfstests/btrfs/035.
Reported-by: David Disseldorp <ddiss@suse.de>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
Signed-off-by: Chris Mason <clm@fb.com>
This patch fixes this section mismatch:
WARNING: vmlinux.o(.text+0x1efc4): Section mismatch in reference from
the function apm821xx_pciex_init_port_hw() to the function
.init.text:ppc4xx_pciex_wait_on_sdr.isra.9()
The function apm821xx_pciex_init_port_hw() references the function
__init ppc4xx_pciex_wait_on_sdr.isra.9(). This is often because
apm821xx_pciex_init_port_hw lacks a __init annotation or the
annotation of ppc4xx_pciex_wait_on_sdr.isra.9 is wrong.
apm821xx_pciex_init_port_hw is only referenced by a struct in
__initdata, so it should be safe to add __init to
apm821xx_pciex_init_port_hw.
Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull renameat2 arch support from Miklos Szeredi:
"I've collected architecture patches for the renameat2 syscall that
maintainers acked and/or asked me to queue.
This adds architecture support for the renameat2 syscall to m68k,
parisc, ia64 and through asm-generic to arc, arm64, c6x, hexagon,
metag, openrisc, score, tile, unicore32"
* 'renameat2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
scripts/checksyscalls.sh: Make renameat optional
asm-generic: Add renameat2 syscall
ia64: add renameat2 syscall
parisc: add renameat2 syscall
m68k: add renameat2 syscall
It has been reported that using ZFSonLinux on rbd will result in memory
corruption. The bug report can be found here:
https://github.com/zfsonlinux/spl/issues/241
http://tracker.ceph.com/issues/7790
The reason is that ZFS will send pages with page_count 0 into rbd, which in
turns send them to tcp_sendpage. However, tcp_sendpage cannot deal with
page_count 0, as it will do get_page and put_page, and erroneously free the
page.
This type of issue has been noted before, and handled in iscsi, drbd,
etc. So, rbd should also handle this. This fix address this issue by fall back
to slower sendmsg when page_count 0 detected.
Cc: Sage Weil <sage@inktank.com>
Cc: Yehuda Sadeh <yehuda@inktank.com>
Cc: stable@vger.kernel.org
Signed-off-by: Chunwei Chen <tuxoko@gmail.com>
Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
fs_path_ensure_buf is used to make sure our path buffers for
send are big enough for the path names as we construct them.
The buffer size is limited to 32K by the length field in
the struct.
But bugs in the path construction can end up trying to build
a huge buffer, and we'll do invalid memmmoves when the
buffer length field wraps.
This patch is step one, preventing the overflows.
Signed-off-by: Chris Mason <clm@fb.com>
When the guest cedes the vcpu or the vcpu has no guest to
run it naps. Clear the runlatch bit of the vcpu before
napping to indicate an idle cpu.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull iommu fixes from Joerg Roedel:
"Three fixes for the AMD IOMMU driver:
- fix a locking issue around get_user_pages()
- fix two issues with device aliasing and exclusion range handling"
* tag 'iommu-fixes-v3.15-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu/amd: fix enabling exclusion range for an exact device
iommu/amd: Take mmap_sem when calling get_user_pages
iommu/amd: Fix interrupt remapping for aliased devices
The new renameat2 syscall provides all the functionality of renameat
with an additional flags argument, so make renameat optional so that
future architectures can omit it without getting a warning.
This patch doesn't affect existing architectures.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org
The secondary threads in the core are kept offline before launching guests
in kvm on powerpc: "371fefd6f2dc4666:KVM: PPC: Allow book3s_hv guests to use
SMT processor modes."
Hence their runlatch bits are cleared. When the secondary threads are called
in to start a guest, their runlatch bits need to be set to indicate that they
are busy. The primary thread has its runlatch bit set though, but there is no
harm in setting this bit once again. Hence set the runlatch bit for all
threads before they start guest.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
set_device_exclusion_range(u16 devid, struct ivmd_header *m) enables
exclusion range for ONE device. IOMMU does not translate the access
to the exclusion range from the device.
The device is specified by input argument 'devid'. But 'devid' is not
passed to the actual set function set_dev_entry_bit(), instead
'm->devid' is passed. 'm->devid' does not specify the exact device
which needs enable the exclusion range. 'm->devid' represents DeviceID
field of IVMD, which has different meaning depends on IVMD type.
The caller init_exclusion_range() sets 'devid' for ONE device. When
m->type is equal to ACPI_IVMD_TYPE_ALL or ACPI_IVMD_TYPE_RANGE,
'm->devid' is not equal to 'devid'.
This patch fixes 'm->devid' to 'devid'.
Signed-off-by: Su Friendy <friendy.su@sony.com.cn>
Signed-off-by: Tamori Masahiro <Masahiro.Tamori@jp.sony.com>
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Add the renameat2 syscall to the generic syscall list, which is used by the
following architectures: arc, arm64, c6x, hexagon, metag, openrisc, score,
tile, unicore32.
Signed-off-by: James Hogan <james.hogan@imgtec.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Cc: linux-arch@vger.kernel.org
Cc: Vineet Gupta <vgupta@synopsys.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Salter <msalter@redhat.com>
Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
Cc: Richard Kuo <rkuo@codeaurora.org>
Cc: linux-hexagon@vger.kernel.org
Cc: linux-metag@vger.kernel.org
Cc: Jonas Bonn <jonas@southpole.se>
Cc: Chen Liqin <liqin.linux@gmail.com>
Cc: Lennox Wu <lennox.wu@gmail.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
Pull x86 fixes from Peter Anvin:
"A somewhat unpleasantly large collection of small fixes. The big ones
are the __visible tree sweep and a fix for 'earlyprintk=efi,keep'. It
was using __init functions with predictably suboptimal results.
Another key fix is a build fix which would produce output that simply
would not decompress correctly in some configuration, due to the
existing Makefiles picking up an unfortunate local label and mistaking
it for the global symbol _end.
Additional fixes include the handling of 64-bit numbers when setting
the vdso data page (a latent bug which became manifest when i386
started exporting a vdso with time functions), a fix to the new MSR
manipulation accessors which would cause features to not get properly
unblocked, a build fix for 32-bit userland, and a few new platform
quirks"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, vdso, time: Cast tv_nsec to u64 for proper shifting in update_vsyscall()
x86: Fix typo in MSR_IA32_MISC_ENABLE_LIMIT_CPUID macro
x86: Fix typo preventing msr_set/clear_bit from having an effect
x86/intel: Add quirk to disable HPET for the Baytrail platform
x86/hpet: Make boot_hpet_disable extern
x86-64, build: Fix stack protector Makefile breakage with 32-bit userland
x86/reboot: Add reboot quirk for Certec BPC600
asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/*
asmlinkage, x86: Add explicit __visible to arch/x86/*
asmlinkage: Revert "lto: Make asmlinkage __visible"
x86, build: Don't get confused by local symbols
x86/efi: earlyprintk=efi,keep fix
If skinny metadata is enabled and our first tree search fails to find a
skinny extent item, we may repeat a tree search for a "fat" extent item
(if the previous item in the leaf is not the "fat" extent we're looking
for). However we were not setting the new key's objectid to the right
value, as we previously used the same key variable to peek at the previous
item in the leaf, which has a different objectid. So just set the right
objectid to avoid modifying/deleting a wrong item if we repeat the tree
search.
Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com>
Signed-off-by: Chris Mason <clm@fb.com>
Up until now we have been setting the runlatch bits for a busy CPU and
clearing it when a CPU enters idle state. The runlatch bit has thus
been consistent with the utilization of a CPU as long as the CPU is online.
However when a CPU is hotplugged out the runlatch bit is not cleared. It
needs to be cleared to indicate an unused CPU. Hence this patch has the
runlatch bit cleared for an offline CPU just before entering an idle state
and sets it immediately after it exits the idle state.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull SH driver fix from Simon Horman:
"Compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI
This resolves a regression introduced in v3.14 by commit bf98c1eac1d4
("ARM: Rename ARCH_SHMOBILE to ARCH_SHMOBILE_LEGACY")"
* tag 'renesas-sh-drivers-for-v3.15' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas:
drivers: sh: compile drivers/sh/pm_runtime.c if ARCH_SHMOBILE_MULTI
Search for Broadcom specific ibft sign "BIFT"
along with other possible values on UEFI
This patch is fix for regression introduced in
“935a9fee51c945b8942be2d7b4bae069167b4886”.
https://lkml.org/lkml/2011/12/16/353
This impacts Broadcom CNA for iSCSI Boot on UEFI platform.
Signed-off-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Pull xfs fixes from Dave Chinner:
"The main fix is adding support for default ACLs on O_TMPFILE opened
inodes to bring XFS into line with other filesystems. Metadata CRCs
are now also considered well enough tested to be fully supported, so
we're removing the shouty warnings issued at mount time for
filesystems with that format. And there's transaction block
reservation overrun fix.
Summary:
- fix a remote attribute size calculation bug that leads to a
transaction overrun
- add default ACLs to O_TMPFILE files
- Remove the EXPERIMENTAL tag from filesystems with metadata CRC
support"
* tag 'xfs-for-linus-3.15-rc5' of git://oss.sgi.com/xfs/xfs:
xfs: remote attribute overwrite causes transaction overrun
xfs: initialize default acls for ->tmpfile()
xfs: fully support v5 format filesystems
With tk->wall_to_monotonic.tv_nsec being a 32-bit value on 32-bit
systems, (tk->wall_to_monotonic.tv_nsec << tk->shift) in update_vsyscall()
may lose upper bits or, worse, add them since compiler will do this:
(u64)(tk->wall_to_monotonic.tv_nsec << tk->shift)
instead of
((u64)tk->wall_to_monotonic.tv_nsec << tk->shift)
So if, for example, tv_nsec is 0x800000 and shift is 8 we will end up
with 0xffffffff80000000 instead of 0x80000000. And then we are stuck in
the subsequent 'while' loop.
We need an explicit cast.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: http://lkml.kernel.org/r/1399648287-15178-1-git-send-email-boris.ostrovsky@oracle.com
Acked-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: <stable@vger.kernel.org> # v3.14
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Currently, with inode cache enabled, we will reuse its inode id immediately
after unlinking file, we may hit something like following:
|->iput inode
|->return inode id into inode cache
|->create dir,fsync
|->power off
An easy way to reproduce this problem is:
mkfs.btrfs -f /dev/sdb
mount /dev/sdb /mnt -o inode_cache,commit=100
dd if=/dev/zero of=/mnt/data bs=1M count=10 oflag=sync
inode_id=`ls -i /mnt/data | awk '{print $1}'`
rm -f /mnt/data
i=1
while [ 1 ]
do
mkdir /mnt/dir_$i
test1=`stat /mnt/dir_$i | grep Inode: | awk '{print $4}'`
if [ $test1 -eq $inode_id ]
then
dd if=/dev/zero of=/mnt/dir_$i/data bs=1M count=1 oflag=sync
echo b > /proc/sysrq-trigger
fi
sleep 1
i=$(($i+1))
done
mount /dev/sdb /mnt
umount /dev/sdb
btrfs check /dev/sdb
We fix this problem by adding unlinked inode's id into pinned tree,
and we can not reuse them until committing transaction.
Cc: stable@vger.kernel.org
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
While testing memory hot-remove, I found following dead lock:
Process #1141 is drmgr, trying to remove some memory, i.e. memory499.
It holds the memory_hotplug_mutex, and blocks when trying to remove file
"online" under dir memory499, in kernfs_drain(), at
wait_event(root->deactivate_waitq,
atomic_read(&kn->active) == KN_DEACTIVATED_BIAS);
Process #1120 is trying to online memory499 by
echo 1 > memory499/online
In .kernfs_fop_write, it uses kernfs_get_active() to increase
&kn->active, thus blocking process #1141. While itself is blocked later
when trying to acquire memory_hotplug_mutex, which is held by process
The backtrace of both processes are shown below:
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c000000000263ca4>] .online_pages+0x74/0x7b0
[<c00000000055b40c>] .memory_subsys_online+0x9c/0x150
[<c00000000053cbe8>] .device_online+0xb8/0x120
[<c00000000053cd04>] .online_store+0xb4/0xc0
[<c000000000538ce4>] .dev_attr_store+0x64/0xa0
[<c00000000030f4ec>] .sysfs_kf_write+0x7c/0xb0
[<c00000000030e574>] .kernfs_fop_write+0x154/0x1e0
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
[<c000000001b18600>] 0xc000000001b18600
[<c000000000015044>] .__switch_to+0x144/0x200
[<c00000000030be14>] .__kernfs_remove+0x204/0x300
[<c00000000030d428>] .kernfs_remove_by_name_ns+0x68/0xf0
[<c00000000030fb38>] .sysfs_remove_file_ns+0x38/0x60
[<c000000000539354>] .device_remove_attrs+0x54/0xc0
[<c000000000539fd8>] .device_del+0x158/0x250
[<c00000000053a104>] .device_unregister+0x34/0xa0
[<c00000000055bc14>] .unregister_memory_section+0x164/0x170
[<c00000000024ee18>] .__remove_pages+0x108/0x4c0
[<c00000000004b590>] .arch_remove_memory+0x60/0xc0
[<c00000000026446c>] .remove_memory+0x8c/0xe0
[<c00000000007f9f4>] .pseries_remove_memblock+0xd4/0x160
[<c00000000007fcfc>] .pseries_memory_notifier+0x27c/0x290
[<c0000000008ae6cc>] .notifier_call_chain+0x8c/0x100
[<c0000000000d858c>] .__blocking_notifier_call_chain+0x6c/0xe0
[<c00000000071ddec>] .of_property_notify+0x7c/0xc0
[<c00000000071ed3c>] .of_update_property+0x3c/0x1b0
[<c0000000000756cc>] .ofdt_write+0x3dc/0x740
[<c0000000002f60fc>] .proc_reg_write+0xac/0x110
[<c000000000268450>] .vfs_write+0xe0/0x260
[<c000000000269144>] .SyS_write+0x64/0x110
[<c000000000009ffc>] syscall_exit+0x0/0x7c
This patch uses lock_device_hotplug() to protect remove_memory() called
in pseries_remove_memblock(), which is also stated before function
remove_memory():
* NOTE: The caller must call lock_device_hotplug() to serialize hotplug
* and online/offline operations before this call, as required by
* try_offline_node().
*/
void __ref remove_memory(int nid, u64 start, u64 size)
With this lock held, the other process(#1120 above) trying to online the
memory block will retry the system call when calling
lock_device_hotplug_sysfs(), and finally find No such device error.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull media fixes from Mauro Carvalho Chehab:
"Most of the changes are drivers fixes (rtl28xuu, fc2580, ov7670,
davinci, gspca, s5p-fimc and s5c73m3).
There is also a compat32 fix and one infoleak fixup at the media
controller"
* 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
[media] V4L2: fix VIDIOC_CREATE_BUFS in 64- / 32-bit compatibility mode
[media] V4L2: ov7670: fix a wrong index, potentially Oopsing the kernel from user-space
[media] media-device: fix infoleak in ioctl media_enum_entities()
[media] fc2580: fix tuning failure on 32-bit arch
[media] Prefer gspca_sonixb over sn9c102 for all devices
[media] media: davinci: vpfe: make sure all the buffers unmapped and released
[media] staging: media: davinci: vpfe: make sure all the buffers are released
[media] media: davinci: vpbe_display: fix releasing of active buffers
[media] media: davinci: vpif_display: fix releasing of active buffers
[media] media: davinci: vpif_capture: fix releasing of active buffers
[media] s5p-fimc: Fix YUV422P depth
[media] s5c73m3: Add missing rename of v4l2_of_get_next_endpoint() function
[media] rtl28xxu: silence error log about disabled rtl2832_sdr module
[media] rtl28xxu: do not hard depend on staging SDR module
If the kernel is built to support multi-ARM configuration with shmobile
support built in, then drivers/sh is not built. This contains the PM
runtime code in drivers/sh/pm_runtime.c, which implicitly enables the
module clocks for all devices, and thus is quite essential.
Without this, the state of clocks depends on implicit reset state, or on
the bootloader.
If ARCH_SHMOBILE_MULTI then build the drivers/sh directory, but ensure that
bits that may conflict (drivers/sh/clk if the common clock framework is
enabled) or are not used (drivers/sh/intc), are not built.
Also, only enable the PM runtime code when actually running on a shmobile
SoCs that needs it.
ARCH_SHMOBILE_MULTI was added a while ago by commit
efacfce5f8a523457e9419a25d52fe39db00b26a ("ARM: shmobile: Introduce
ARCH_SHMOBILE_MULTI"), but drivers/sh was compiled for both
ARCH_SHMOBILE_LEGACY and ARCH_SHMOBILE_MULTI until commit
bf98c1eac1d4a6bcf00532e4fa41d8126cd6c187 ("ARM: Rename ARCH_SHMOBILE to
ARCH_SHMOBILE_LEGACY").
Inspired by a patch from Ben Dooks <ben.dooks@codethink.co.uk>.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Simon Horman <horms+renesas@verge.net.au>
An apparent cut and paste error prevents the correct flags from being
set on the alias device resulting in MSI on conventional PCI devices
failing to work. This also produces error events from the IOMMU like:
AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=00:14.4 address=0x000000fdf8000000 flags=0x0a00]
Where 14.4 is a PCIe-to-PCI bridge with a device behind it trying to
use MSI interrupts.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Joerg Roedel <joro@8bytes.org>
Pull tracing fixes from Steven Rostedt:
"This contains two fixes.
The first is a long standing bug that causes bogus data to show up in
the refcnt field of the module_refcnt tracepoint. It was introduced
by a merge conflict resolution back in 2.6.35-rc days.
The result should be 'refcnt = incs - decs', but instead it did
'refcnt = incs + decs'.
The second fix is to a bug that was introduced in this merge window
that allowed for a tracepoint funcs pointer to be used after it was
freed. Moving the location of where the probes are released solved
the problem"
* tag 'trace-fixes-v3.15-rc4-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracepoint: Fix use of tracepoint funcs after rcu free
trace: module: Maintain a valid user count
Commit e461fcb ("xfs: remote attribute lookups require the value
length") passes the remote attribute length in the xfs_da_args
structure on lookup so that CRC calculations and validity checking
can be performed correctly by related code. This, unfortunately has
the side effect of changing the args->valuelen parameter in cases
where it shouldn't.
That is, when we replace a remote attribute, the incoming
replacement stores the value and length in args->value and
args->valuelen, but then the lookup which finds the existing remote
attribute overwrites args->valuelen with the length of the remote
attribute being replaced. Hence when we go to create the new
attribute, we create it of the size of the existing remote
attribute, not the size it is supposed to be. When the new attribute
is much smaller than the old attribute, this results in a
transaction overrun and an ASSERT() failure on a debug kernel:
XFS: Assertion failed: tp->t_blk_res_used <= tp->t_blk_res, file: fs/xfs/xfs_trans.c, line: 331
Fix this by keeping the remote attribute value length separate to
the attribute value length in the xfs_da_args structure. The enables
us to pass the length of the remote attribute to be removed without
overwriting the new attribute's length.
Also, ensure that when we save remote block contexts for a later
rename we zero the original state variables so that we don't confuse
the state of the attribute to be removes with the state of the new
attribute that we just added. [Spotted by Brain Foster.]
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
The spuriously added semicolon didn't have any effect because the
macro isn't currently in use.
c0a639ad0bc6b178b46996bd1f821a04643e2bde
Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-3-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Pull staging driver fixes from Greg KH:
"Here are five staging driver fixes for 3.15-rc6 that resolve some
reported issues. They are for the imx and rtl8723au drivers"
* tag 'staging-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
staging: rtl8723au: Do not reset wdev->iftype in netdev_close()
staging: rtl8723au: Use correct pipe type for USB interrupts
imx-drm: imx-tve: correct DDC property name to 'ddc-i2c-bus'
imx-drm: imx-drm-core: skip components whose parent device is disabled
imx-drm: imx-drm-core: fix imx_drm_encoder_get_mux_id
If a struct contains 64-bit fields, it is aligned on 64-bit boundaries
within containing structs in 64-bit compilations. This is the case with
struct v4l2_window, which contains pointers and is embedded into struct
v4l2_format, and that one is embedded into struct v4l2_create_buffers.
Unlike some other structs, used as a part of the kernel ABI as ioctl()
arguments, that are packed, these structs aren't packed. This isn't a
problem per se, but the ioctl-compat code for VIDIOC_CREATE_BUFS contains
a bug, that triggers in such 64-bit builds. That code wrongly assumes,
that in struct v4l2_create_buffers, struct v4l2_format immediately follows
the __u32 memory field, which in fact isn't the case. This bug wasn't
visible until now, because until recently hardly any applications used
this ioctl() and mostly embedded 32-bit only drivers implemented it. This
is changing now with addition of this ioctl() to some USB drivers, e.g.
UVC. This patch fixes the bug by copying parts of struct
v4l2_create_buffers separately.
Signed-off-by: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Acked-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
Cc: stable@vger.kernel.org
Pull input subsystem fixes from Dmitry Torokhov:
"Just a few fixups to various drivers"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elantech - fix touchpad initialization on Gigabyte U2442
Input: tca8418 - fix loading this driver as a module from a device tree
Input: bma150 - extend chip detection for bma180
Input: atkbd - fix keyboard not working on some LG laptops
Input: synaptics - add min/max quirk for ThinkPad Edge E431
Commit de7b2973903c "tracepoint: Use struct pointer instead of name hash
for reg/unreg tracepoints" introduces a use after free by calling
release_probes on the old struct tracepoint array before the newly
allocated array is published with rcu_assign_pointer. There is a race
window where tracepoints (RCU readers) can perform a
"use-after-grace-period-after-free", which shows up as a GPF in
stress-tests.
Link: http://lkml.kernel.org/r/53698021.5020108@oracle.com
Link: http://lkml.kernel.org/p/1399549669-25465-1-git-send-email-mathieu.desnoyers@efficios.com
Reported-by: Sasha Levin <sasha.levin@oracle.com>
CC: Oleg Nesterov <oleg@redhat.com>
CC: Dave Jones <davej@redhat.com>
Fixes: de7b2973903c "tracepoint: Use struct pointer instead of name hash for reg/unreg tracepoints"
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
The current tmpfile handler does not initialize default ACLs. Doing so
within xfs_vn_tmpfile() makes it roughly equivalent to xfs_vn_mknod(),
which is already used as a common create handler.
xfs_vn_mknod() does not currently have a mechanism to determine whether
to link the file into the namespace. Therefore, further abstract
xfs_vn_mknod() into a new xfs_generic_create() handler with a tmpfile
parameter. This new handler calls xfs_create_tmpfile() and d_tmpfile()
on the dentry when called via ->tmpfile().
Signed-off-by: Brian Foster <bfoster@redhat.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Due to a typo the msr accessor function introduced in
22085a66c2fab6cf9b9393c056a3600a6b4735de didn't have any lasting
effects because they accidentally wrote the old value back.
After c0a639ad0bc6b178b46996bd1f821a04643e2bde this at the very least
this causes cpuid limits not to be lifted on some cpus leading to
missing capabilities for those.
Signed-off-by: Andres Freund <andres@anarazel.de>
Link: http://lkml.kernel.org/r/1399598957-7011-2-git-send-email-andres@anarazel.de
Cc: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
When running stress test(including snapshots,balance,fstress), we trigger
the following BUG_ON() which is because we fail to start inode caching task.
[ 181.131945] kernel BUG at fs/btrfs/inode-map.c:179!
[ 181.137963] invalid opcode: 0000 [#1] SMP
[ 181.217096] CPU: 11 PID: 2532 Comm: btrfs Not tainted 3.14.0 #1
[ 181.240521] task: ffff88013b621b30 ti: ffff8800b6ada000 task.ti: ffff8800b6ada000
[ 181.367506] Call Trace:
[ 181.371107] [<ffffffffa036c1be>] btrfs_return_ino+0x9e/0x110 [btrfs]
[ 181.379191] [<ffffffffa038082b>] btrfs_evict_inode+0x46b/0x4c0 [btrfs]
[ 181.387464] [<ffffffff810b5a70>] ? autoremove_wake_function+0x40/0x40
[ 181.395642] [<ffffffff811dc5fe>] evict+0x9e/0x190
[ 181.401882] [<ffffffff811dcde3>] iput+0xf3/0x180
[ 181.408025] [<ffffffffa03812de>] btrfs_orphan_cleanup+0x1ee/0x430 [btrfs]
[ 181.416614] [<ffffffffa03a6abd>] btrfs_mksubvol.isra.29+0x3bd/0x450 [btrfs]
[ 181.425399] [<ffffffffa03a6cd6>] btrfs_ioctl_snap_create_transid+0x186/0x190 [btrfs]
[ 181.435059] [<ffffffffa03a6e3b>] btrfs_ioctl_snap_create_v2+0xeb/0x130 [btrfs]
[ 181.444148] [<ffffffffa03a9656>] btrfs_ioctl+0xf76/0x2b90 [btrfs]
[ 181.451971] [<ffffffff8117e565>] ? handle_mm_fault+0x475/0xe80
[ 181.459509] [<ffffffff8167ba0c>] ? __do_page_fault+0x1ec/0x520
[ 181.467046] [<ffffffff81185b35>] ? do_mmap_pgoff+0x2f5/0x3c0
[ 181.474393] [<ffffffff811d4da8>] do_vfs_ioctl+0x2d8/0x4b0
[ 181.481450] [<ffffffff811d5001>] SyS_ioctl+0x81/0xa0
[ 181.488021] [<ffffffff81680b69>] system_call_fastpath+0x16/0x1b
We should avoid triggering BUG_ON() here, instead, we output warning messages
and clear inode_cache option.
Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull driver core fixes from Greg KH:
"Here are two driver core (well, sysfs) fixes for 3.15-rc6 that resolve
some reported issues and a regression from 3.13"
* tag 'driver-core-3.15-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
sysfs: make sure read buffer is zeroed
kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs