commits
Pull fuse fix from Miklos Szeredi:
"This makes sure userspace filesystems are not broken by the parallel
lookups and readdir feature"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: serialize dirops by default
Pull overlayfs fixes from Miklos Szeredi:
"This contains fixes for a dentry leak, a regression in 4.6 noticed by
Docker users and missing write access checking in truncate"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: warn instead of error if d_type is not supported
ovl: get_write_access() in truncate
ovl: fix dentry leak for default_permissions
Negotiate with userspace filesystems whether they support parallel readdir
and lookup. Disable parallelism by default for fear of breaking fuse
filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c01a ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd62 ("fuse: switch to ->iterate_shared()")
Pull MIPS fix from Ralf Baechle:
"Only a single fix for 4.7 pending at this point. It fixes an issue
that may lead to corruption of the cache mode bits in the page table"
* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus:
MIPS: Fix possible corruption of cache mode by mprotect.
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.
But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.
https://github.com/docker/docker/issues/22937#issuecomment-229881315
So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f67 ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
Pull powerpc fixes from Michael Ellerman:
- tm: Always reclaim in start_thread() for exec() class syscalls from
Cyril Bur
- tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael
Neuling
- eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
- Initialise pci_io_base as early as possible from Darren Stevens
* tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Initialise pci_io_base as early as possible
powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
The following testcase may result in a page table entries with a invalid
CCA field being generated:
static void *bindstack;
static int sysrqfd;
static void protect_low(int protect)
{
mprotect(bindstack, BINDSTACK_SIZE, protect);
}
static void sigbus_handler(int signal, siginfo_t * info, void *context)
{
void *addr = info->si_addr;
write(sysrqfd, "x", 1);
printf("sigbus, fault address %p (should not happen, but might)\n",
addr);
abort();
}
static void run_bind_test(void)
{
unsigned int *p = bindstack;
p[0] = 0xf001f001;
write(sysrqfd, "x", 1);
/* Set trap on access to p[0] */
protect_low(PROT_NONE);
write(sysrqfd, "x", 1);
/* Clear trap on access to p[0] */
protect_low(PROT_READ | PROT_WRITE | PROT_EXEC);
write(sysrqfd, "x", 1);
/* Check the contents of p[0] */
if (p[0] != 0xf001f001) {
write(sysrqfd, "x", 1);
/* Reached, but shouldn't be */
printf("badness, shouldn't happen but does\n");
abort();
}
}
int main(void)
{
struct sigaction sa;
sysrqfd = open("/proc/sysrq-trigger", O_WRONLY);
if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) {
perror("sigprocmask");
return 0;
}
sa.sa_sigaction = sigbus_handler;
sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART;
if (sigaction(SIGBUS, &sa, NULL)) {
perror("sigaction");
return 0;
}
bindstack = mmap(NULL,
BINDSTACK_SIZE,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (bindstack == MAP_FAILED) {
perror("mmap bindstack");
return 0;
}
printf("bindstack: %p\n", bindstack);
run_bind_test();
printf("done\n");
return 0;
}
There are multiple ingredients for this:
1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3
on all platforms except SB1 where it's CCA 5.
2) _page_cachable_default must have bits set which are not set
_CACHE_CACHABLE_NONCOHERENT.
3) Either the defective version of pte_modify for XPA or the standard
version must be in used. However pte_modify for the 36 bit address
space support is no affected.
In that case additional bits in the final CCA mode may generate an invalid
value for the CCA field. On the R10000 system where this was tracked
down for example a CCA 7 has been observed, which is Uncached Accelerated.
Fixed by:
1) Using the proper CCA mode for PAGE_NONE just like for all the other
PAGE_* pte/pmd bits.
2) Fix the two affected variants of pte_modify.
Further code inspection also shows the same issue to exist in pmd_modify
which would affect huge page systems.
Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE
and pmd_modify issue found by me.
The history of this goes back beyond Linus' git history. Chris Dearman's
commit 351336929ccf222ae38ff0cb7a8dd5fd5c6236a0 ("[MIPS] Allow setting of
the cache attribute at run time.") missed the opportunity to fix this
but it was originally introduced in lmo commit
d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.")
and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option
CONFIG_MIPS_UNCACHED.")
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
When truncating a file we should check write access on the underlying
inode. And we should do so on the lower file as well (before copy-up) for
consistency.
Original patch and test case by Aihua Zhang.
- - >o >o - - test.c - - >o >o - -
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
int ret;
ret = truncate(argv[0], 4096);
if (ret != -1) {
fprintf(stderr, "truncate(argv[0]) should have failed\n");
return 1;
}
if (errno != ETXTBSY) {
perror("truncate(argv[0])");
return 1;
}
return 0;
}
- - >o >o - - >o >o - - >o >o - -
Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.
One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
Pull drm fixes frlm Dave Airlie:
"Just some AMD and Intel fixes, the AMD ones are further production
Polaris fixes, and the Intel ones fix some early timeouts, some PCI ID
changes and a couple of other fixes.
Still a bit Internet challenged here, hopefully end of next week will
solve it"
* tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Fix missing unlock on error in i915_ppgtt_info()
drm/amd/powerplay: workaround for UVD clock issue
drm/amdgpu: add ACLK_CNTL setting for polaris10
drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
drm/i915: Add more Kabylake PCI IDs.
drm/i915: Avoid early timeout during AUX transfers
drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
drm/i915/lpt: Avoid early timeout during FDI PHY reset
drm/i915/bxt: Avoid early timeout during PLL enable
drm/i915: Refresh cached DP port register value on resume
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
drm/amd/powerplay: disable FFC.
drm/amd/powerplay: add some definition for FFC feature on polaris.
Commit d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for
radix") turned kernel memory and IO addresses from #defined constants to
variables initialised at runtime.
On PA6T (pasemi) systems the setup_arch() machine call initialises the
onboard PCI-e root-ports, and uses pci_io_base to do this, which is now
before its value has been set, resulting in a panic early in boot before
console IO is initialised.
Move the pci_io_base initialisation to the same place as vmalloc ranges
are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the
earliest possible place we can initialise it.
Fixes: d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add #ifdef CONFIG_PCI, massage change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
When using the 'default_permissions' mount option, ovl_permission() on
non-directories was missing a dput(alias), resulting in "BUG Dentry still
in use".
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 8d3095f4ad47 ("ovl: default permissions")
Cc: <stable@vger.kernel.org> # v4.5+
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Pull spi fixes from Mark Brown:
"A few small driver-specific fixes for SPI, all in the normal important
if you hit them category especially the rockchip driver fix which
addresses a race which has been exposed more frequently with some
recent performance improvements"
* tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sunxi: fix transfer timeout
spi: sun4i: fix FIFO limit
spi: rockchip: Signal unfinished DMA transfers
spi: spi-ti-qspi: Suspend the queue before removing the device
here's a batch of i915 fixes for 4.7.
* tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix missing unlock on error in i915_ppgtt_info()
drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
drm/i915: Add more Kabylake PCI IDs.
drm/i915: Avoid early timeout during AUX transfers
drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
drm/i915/lpt: Avoid early timeout during FDI PHY reset
drm/i915/bxt: Avoid early timeout during PLL enable
drm/i915: Refresh cached DP port register value on resume
Currently we have 2 segments that are bolted for the kernel linear
mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel
stacks. Anything accessed outside of these regions may need to be
faulted in. (In practice machines with TM always have 1T segments)
If a machine has < 2TB of memory we never fault on the kernel linear
mapping as these two segments cover all physical memory. If a machine
has > 2TB of memory, there may be structures outside of these two
segments that need to be faulted in. This faulting can occur when
running as a guest as the hypervisor may remove any SLB that's not
bolted.
When we treclaim and trecheckpoint we have a window where we need to
run with the userspace GPRs. This means that we no longer have a valid
stack pointer in r1. For this window we therefore clear MSR RI to
indicate that any exceptions taken at this point won't be able to be
handled. This means that we can't take segment misses in this RI=0
window.
In this RI=0 region, we currently access the thread_struct for the
process being context switched to or from. This thread_struct access
may cause a segment fault since it's not guaranteed to be covered by
the two bolted segment entries described above.
We've seen this with a crash when running as a guest with > 2TB of
memory on PowerVM:
Unrecoverable exception 4100 at c00000000004f138
Oops: Unrecoverable exception, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1
task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000
NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20
REGS: c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default)
MSR: 8000000100001031 <SF,ME,IR,DR,LE> CR: 24000048 XER: 00000000
CFAR: c00000000004ed18 SOFTE: 0
GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288
GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620
GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0
GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211
GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110
GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050
GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30
GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680
NIP [c00000000004f138] restore_gprs+0xd0/0x16c
LR [0000000010003a24] 0x10003a24
Call Trace:
[c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable)
[c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0
[c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350
[c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0
[c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0
[c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0
[c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130
[c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Instruction dump:
7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6
38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098
---[ end trace 602126d0a1dedd54 ]---
This fixes this by copying the required data from the thread_struct to
the stack before we clear MSR RI. Then once we clear RI, we only access
the stack, guaranteeing there's no segment miss.
We also tighten the region over which we set RI=0 on the treclaim()
path. This may have a slight performance impact since we're adding an
mtmsr instruction.
Fixes: 090b9284d725 ("powerpc/tm: Clear MSR RI in non-recoverable TM code")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
Commit fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i<4; i++)); do
find . >/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
The untagged command case in the 53c700 driver has been broken since
host wide tags were enabled because the replaced scsi_find_tag()
function had a special case for the tag value SCSI_NO_TAG to retrieve
sdev->current_cmnd. The replacement function scsi_host_find_tag() has
no such special case and returns NULL causing untagged commands to
trigger a BUG() in the driver. Inspection shows that the 53c700 is the
only driver using this SCSI_NO_TAG case, so a local fix in the driver
suffices to fix this problem globally.
Fixes: 64d513ac31b - "scsi: use host wide tags by default"
Cc: stable@vger.kernel.org # 4.4+
Reported-by: Helge Deller <deller@gmx.de>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull regulator fixes from Mark Brown:
"Two small fixes for the regulator subsystem - one fixing a crash with
one of the devices supported by the max77620 driver, another fixing
startup for the anatop regulator when it starts up with the regulator
in bypass mode"
* tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max77620: check for valid regulator info
regulator: anatop: allow regulator to be in bypass mode
Just a few more late fixes for Polaris cards.
* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: workaround for UVD clock issue
drm/amdgpu: add ACLK_CNTL setting for polaris10
drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
drm/amd/powerplay: disable FFC.
drm/amd/powerplay: add some definition for FFC feature on polaris.
Add the missing unlock before return from function i915_ppgtt_info()
in the error handling case.
Fixes: 1d2ac403ae3b(drm: Protect dev->filelist with its own mutex)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1465861320-26221-1-git-send-email-weiyj_lk@163.com
(cherry picked from commit b0212486909de4f239ca9f20d032de1b1f2dc52e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug
case, @rmv_data instead of its address is the proper argument.
Otherwise, the stack frame is corrupted when writing to
@rmv_data (actually its address) in eeh_rmv_device(). It results in
kernel crash as observed.
This fixes the issue by passing @rmv_data, not its address to
eeh_rmv_device() in eeh_reset_device().
Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE")
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"
* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM
Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.
This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull SCSI fixes from James Bottomley:
"This is a set of four fixes noticed in the merge window. The aacraid
one is an optimisation, the mp3sas one fixes a spurious printk, the
sd_check_events one fixes a theoretical race and the failed zero
length commands fixes a bug in our completion/retry routines that has
been causing problems in the field"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
aacraid: do not activate events on non-SRC adapters
mpt3sas: add missing curly braces
sd: get disk reference in sd_check_events()
scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
For historic reasons, io_opt is in bytes and max_sectors in block layer
sectors. This interface inconsistency is error prone and should be
fixed. But for 4.4--4.7 let's make the unit difference explicit via a
wrapper function.
Fixes: d0eb20a863ba ("sd: Optimal I/O size is in bytes, not sectors")
Cc: stable@vger.kernel.org # 4.4+
Reported-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.
It will lead to permanently inequality between ->host_failed and
->host_busy, and scsi error handler thread won't start running. IO
errors after that won't be handled.
Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
the strategy handler to fix this race.
Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull clk fixes from Stephen Boyd:
"A small fix for the newly added oxnas clk driver and a handful of
rockchip clk driver fixes for newly added rk3399 support"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Fix return value check in oxnas_stdclk_probe()
clk: rockchip: release io resource when failing to init clk on rk3399
clk: rockchip: fix cpuclk registration error handling
clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization"
clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src
clk: rockchip: mark rk3399 GIC clocks as critical
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
When using DMA, the transfer_one callback should return 1 because the
transfer hasn't finished yet.
A previous commit changed the function to return 0 when the DMA channels
were correctly prepared.
This manifested in Veyron boards with this message:
[ 1.983605] cros-ec-spi spi0.0: EC failed to respond in time
Fixes: ea9849113343 ("spi: rockchip: check return value of dmaengine_prep_slave_sg")
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
The trasfer timeout is fixed at 1000 ms. Reading a 4Mbyte flash over
1MHz SPI bus takes way longer than that. Calculate the timeout from the
actual time the transfer is supposed to take and multiply by 2 for good
measure.
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Before disabling the pm_runtime, we must ensure that there is no transfer
in progress nor will a new one be started. Otherwise the message pump will
fail and in the end, the process requesting the transfer will be stuck.
This behavior has been observed when transferring data from a SPI flash
with dd while removing the module on a DRA7x-evm.
Signed-off-by: Jean-Jacques Hiblot <jjhiblot@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull ACPI fix from Rafael Wysocki:
"Fix an expression in the ACPI PCI IRQ management code added by a
recent commit that overlooked missing parens in it, so the result of
the computation is incorrect in some cases (Sinan Kaya)"
* tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI,PCI,IRQ: correct operator precedence
workaround issue that when uvd dpm disabled,
uvd clock remain high on polaris10. Manually turn
off the clocks.
Signed-off-by: Rex Zhu <Rex.Zhu@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This is unusual. Usually IDs listed on early stages of platform
definition are kept there as reserved for later use.
However these IDs here are not listed anymore in any of steppings
and devices IDs tables for Kabylake on configurations overview
section of BSpec.
So it is better removing them before they become used in any
other future platform.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466718636-19675-2-git-send-email-rodrigo.vivi@intel.com
(cherry picked from commit a922eb8d4581c883c37ce6e12dca9ff2cb1ea723)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.
Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.
Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.
This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()
Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
Oops: Bad kernel stack pointer, sig: 6 [#1]
CPU: 0 PID: 2006 Comm: tm-execed Not tainted
NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
REGS: c00000003ffefd40 TRAP: 0700 Not tainted
MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000
CFAR: c0000000000098b4 SOFTE: 0
PACATMSCRATCH: b00000010000d033
GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
NIP [c000000000009980] fast_exception_return+0xb0/0xb8
LR [0000000000000000] (null)
Call Trace:
Instruction dump:
f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b
Kernel BUG at c000000000043e80 [verbose debug info unavailable]
Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
Oops: Unrecoverable exception, sig: 6 [#2]
CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D
task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D
MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000
CFAR: c000000000015a20 SOFTE: 0
PACATMSCRATCH: b00000010000d033
GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
LR [c000000000015a24] __switch_to+0x1f4/0x420
Call Trace:
Instruction dump:
7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
This fixes CVE-2016-5828.
Fixes: bc2a9408fa65 ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull x86 kprobe fix from Thomas Gleixner:
"A single fix clearing the TF bit when a fault is single stepped"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
kprobes/x86: Clear TF bit in fault on single-stepping
syzkaller fuzzer spotted a potential use-after-free case in snd-dummy
driver when hrtimer is used as backend:
> ==================================================================
> BUG: KASAN: use-after-free in rb_erase+0x1b17/0x2010 at addr ffff88005e5b6f68
> Read of size 8 by task syz-executor/8984
> =============================================================================
> BUG kmalloc-192 (Not tainted): kasan: bad access detected
> -----------------------------------------------------------------------------
>
> Disabling lock debugging due to kernel taint
> INFO: Allocated in 0xbbbbbbbbbbbbbbbb age=18446705582212484632
> ....
> [< none >] dummy_hrtimer_create+0x49/0x1a0 sound/drivers/dummy.c:464
> ....
> INFO: Freed in 0xfffd8e09 age=18446705496313138713 cpu=2164287125 pid=-1
> [< none >] dummy_hrtimer_free+0x68/0x80 sound/drivers/dummy.c:481
> ....
> Call Trace:
> [<ffffffff8179e59e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:333
> [< inline >] rb_set_parent include/linux/rbtree_augmented.h:111
> [< inline >] __rb_erase_augmented include/linux/rbtree_augmented.h:218
> [<ffffffff82ca5787>] rb_erase+0x1b17/0x2010 lib/rbtree.c:427
> [<ffffffff82cb02e8>] timerqueue_del+0x78/0x170 lib/timerqueue.c:86
> [<ffffffff814d0c80>] __remove_hrtimer+0x90/0x220 kernel/time/hrtimer.c:903
> [< inline >] remove_hrtimer kernel/time/hrtimer.c:945
> [<ffffffff814d23da>] hrtimer_try_to_cancel+0x22a/0x570 kernel/time/hrtimer.c:1046
> [<ffffffff814d2742>] hrtimer_cancel+0x22/0x40 kernel/time/hrtimer.c:1066
> [<ffffffff85420531>] dummy_hrtimer_stop+0x91/0xb0 sound/drivers/dummy.c:417
> [<ffffffff854228bf>] dummy_pcm_trigger+0x17f/0x1e0 sound/drivers/dummy.c:507
> [<ffffffff85392170>] snd_pcm_do_stop+0x160/0x1b0 sound/core/pcm_native.c:1106
> [<ffffffff85391b26>] snd_pcm_action_single+0x76/0x120 sound/core/pcm_native.c:956
> [<ffffffff85391e01>] snd_pcm_action+0x231/0x290 sound/core/pcm_native.c:974
> [< inline >] snd_pcm_stop sound/core/pcm_native.c:1139
> [<ffffffff8539754d>] snd_pcm_drop+0x12d/0x1d0 sound/core/pcm_native.c:1784
> [<ffffffff8539d3be>] snd_pcm_common_ioctl1+0xfae/0x2150 sound/core/pcm_native.c:2805
> [<ffffffff8539ee91>] snd_pcm_capture_ioctl1+0x2a1/0x5e0 sound/core/pcm_native.c:2976
> [<ffffffff8539f2ec>] snd_pcm_kernel_ioctl+0x11c/0x160 sound/core/pcm_native.c:3020
> [<ffffffff853d9a44>] snd_pcm_oss_sync+0x3a4/0xa30 sound/core/oss/pcm_oss.c:1693
> [<ffffffff853da27d>] snd_pcm_oss_release+0x1ad/0x280 sound/core/oss/pcm_oss.c:2483
> .....
A workaround is to call hrtimer_cancel() in dummy_hrtimer_sync() which
is called certainly before other blocking ops.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When doing truncate operation, btrfs_setsize() will first call
truncate_setsize() to set new inode->i_size, but if later
btrfs_truncate() fails, btrfs_setsize() will call
"i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the
inmemory inode size, now bug occurs. It's because for truncate
case btrfs_ordered_update_i_size() directly uses inode->i_size
to update BTRFS_I(inode)->disk_i_size, indeed we should use the
"offset" argument to update disk_i_size. Here is the call graph:
==>btrfs_truncate()
====>btrfs_truncate_inode_items()
======>btrfs_ordered_update_i_size(inode, last_size, NULL);
Here btrfs_ordered_update_i_size()'s offset argument is last_size.
And below test case can reveal this bug:
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint
echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
i=$((i + 1))
dst_offset=$((blocksize * i))
xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
testfile > /dev/null
done
sync
truncate --size 0 testfile
ls -l testfile
du -sh testfile
exit
In this case, truncate operation will fail for enospc reason and
"du -sh testfile" returns value greater than 0, but testfile's
size is 0, we need to reflect correct inode->i_size.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull UDF fixes and a reiserfs fix from Jan Kara:
"A couple of udf fixes (most notably a bug in parsing UDF partitions
which led to inability to mount recent Windows installation media) and
a reiserfs fix for handling kstrdup failure"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: check kstrdup failure
udf: Use correct partition reference number for metadata
udf: Use IS_ERR when loading metadata mirror file entry
udf: Don't BUG on missing metadata partition descriptor
Only SRC-based adapters support the AifReqEvent function, so there is no
point in trying to activate it on older, non-SRC based adapters. Doing
so lead to crashes on older adapters.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAaditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linux fails to boot as a guest with a QEMU CD-ROM:
[ 4.439488] ata2.00: ATAPI: QEMU CD-ROM, 0.8.2, max UDMA/100
[ 4.443649] ata2.00: configured for MWDMA2
[ 4.450267] scsi 1:0:0:0: CD-ROM QEMU QEMU CD-ROM 0.8. PQ: 0 ANSI: 5
[ 4.464317] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4.464319] ata2.00: BMDMA stat 0x5
[ 4.464339] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
[ 4.464339] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 4.464341] ata2.00: status: { DRDY DRQ }
[ 4.465864] ata2: soft resetting link
[ 4.625971] ata2.00: configured for MWDMA2
[ 4.628290] ata2: EH complete
[ 4.646670] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4.646671] ata2.00: BMDMA stat 0x5
[ 4.646683] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
[ 4.646683] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 4.646685] ata2.00: status: { DRDY DRQ }
[ 4.648193] ata2: soft resetting link
...
Fix this by suppressing VPD inquiry for this device.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A bunch of fixes. Some for the newly added rk3399 clock tree, some
concerning error handling and initialization and a revert of the
mmc-phase clock initialization, as this could conflict with the
bootloader setting of this clock and a real solution to initing
the phase correctly from dw_mmc went in as fix for 4.7 through
the mmc tree.
* tag 'v4.7-rockchip-clk-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
clk: rockchip: release io resource when failing to init clk on rk3399
clk: rockchip: fix cpuclk registration error handling
clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization"
clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src
clk: rockchip: mark rk3399 GIC clocks as critical
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
Bypass support was added in commit d38018f2019c ("regulator: anatop: Add
bypass support to digital LDOs"). A check for valid voltage selectors was
added in commit da0607c8df5c ("regulator: anatop: Fail on invalid voltage
selector") but it also discards all regulators that are in bypass mode. Add
check for the bypass setting. Errors below were seen on a Variscite mx6
board.
anatop_regulator 20c8000.anatop:regulator-vddcore@140: Failed to read a valid default voltage selector.
anatop_regulator: probe of 20c8000.anatop:regulator-vddcore@140 failed with error -22
anatop_regulator 20c8000.anatop:regulator-vddsoc@140: Failed to read a valid default voltage selector.
anatop_regulator: probe of 20c8000.anatop:regulator-vddsoc@140 failed with error -22
Fixes: da0607c8df5c ("regulator: anatop: Fail on invalid voltage selector")
Signed-off-by: Mika Båtsman <mbatsman@mvista.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
SD4 regulator is not registered with regulator core
framework in probe as there is no support in MAX77620 PMIC,
removing SD4 entry from MAX77620 regulator information list
and checking for valid regulator information data before
configuring FPS source and FPS power up/down period to avoid
NULL pointer exception if regulator not registered with core.
Signed-off-by: Venkat Reddy Talla <vreddytalla@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
PTR_ERR should access the value just tested by IS_ERR.
The semantic patch that makes this change is available
in scripts/coccinelle/tests/odd_ptr_err.cocci.
Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When testing SPI without DMA I noticed that filling the FIFO on the
spi controller causes timeout.
Always leave room for one byte in the FIFO.
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Pull power management fixes from Rafael Wysocki:
"Three cpufreq fixes, one in the core (stable-candidate) and two in
drivers (intel_pstate and cpufreq-dt).
Specifics:
- Fix a recent intel_pstate regression that caused the number of
wakeups to increase significantly on an idle system in some cases
due to excessive synchronize_sched() invocations (Rafael Wysocki).
- Fix unnecessary invocations of WARN_ON() in the cpufreq core after
cpufreq has been suspended introduced during the 4.6 cycla (Rafael
Wysocki).
- Fix an error code path in the cpufreq-dt-platdev driver that
forgets to drop a reference to a DT node (Masahiro Yamada)"
* tag 'pm-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy()
cpufreq: dt: call of_node_put() before error out
intel_pstate: Do not clear utilization update hooks on policy changes
The omitted parenthesis prevents the addition operation when
acpi_penalize_isa_irq function is called.
Fixes: 103544d86976 (ACPI,PCI,IRQ: reduce resource requirements)
Signed-off-by: Sinan Kaya <okaya@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is a temporary workaround for early boards.
Signed-off-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Rex Zhu <Rex.Zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Pull overlayfs fixes from Miklos Szeredi:
"This contains fixes for a dentry leak, a regression in 4.6 noticed by
Docker users and missing write access checking in truncate"
* 'overlayfs-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
ovl: warn instead of error if d_type is not supported
ovl: get_write_access() in truncate
ovl: fix dentry leak for default_permissions
Negotiate with userspace filesystems whether they support parallel readdir
and lookup. Disable parallelism by default for fear of breaking fuse
filesystems.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 9902af79c01a ("parallel lookups: actual switch to rwsem")
Fixes: d9b3dbdcfd62 ("fuse: switch to ->iterate_shared()")
overlay needs underlying fs to support d_type. Recently I put in a
patch in to detect this condition and started failing mount if
underlying fs did not support d_type.
But this breaks existing configurations over kernel upgrade. Those who
are running docker (partially broken configuration) with xfs not
supporting d_type, are surprised that after kernel upgrade docker does
not run anymore.
https://github.com/docker/docker/issues/22937#issuecomment-229881315
So instead of erroring out, detect broken configuration and warn
about it. This should allow existing docker setups to continue
working after kernel upgrade.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 45aebeaf4f67 ("ovl: Ensure upper filesystem supports d_type")
Cc: <stable@vger.kernel.org> 4.6
Pull powerpc fixes from Michael Ellerman:
- tm: Always reclaim in start_thread() for exec() class syscalls from
Cyril Bur
- tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0 from Michael
Neuling
- eeh: Fix wrong argument passed to eeh_rmv_device() from Gavin Shan
- Initialise pci_io_base as early as possible from Darren Stevens
* tag 'powerpc-4.7-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Initialise pci_io_base as early as possible
powerpc/tm: Avoid SLB faults in treclaim/trecheckpoint when RI=0
powerpc/eeh: Fix wrong argument passed to eeh_rmv_device()
powerpc/tm: Always reclaim in start_thread() for exec() class syscalls
The following testcase may result in a page table entries with a invalid
CCA field being generated:
static void *bindstack;
static int sysrqfd;
static void protect_low(int protect)
{
mprotect(bindstack, BINDSTACK_SIZE, protect);
}
static void sigbus_handler(int signal, siginfo_t * info, void *context)
{
void *addr = info->si_addr;
write(sysrqfd, "x", 1);
printf("sigbus, fault address %p (should not happen, but might)\n",
addr);
abort();
}
static void run_bind_test(void)
{
unsigned int *p = bindstack;
p[0] = 0xf001f001;
write(sysrqfd, "x", 1);
/* Set trap on access to p[0] */
protect_low(PROT_NONE);
write(sysrqfd, "x", 1);
/* Clear trap on access to p[0] */
protect_low(PROT_READ | PROT_WRITE | PROT_EXEC);
write(sysrqfd, "x", 1);
/* Check the contents of p[0] */
if (p[0] != 0xf001f001) {
write(sysrqfd, "x", 1);
/* Reached, but shouldn't be */
printf("badness, shouldn't happen but does\n");
abort();
}
}
int main(void)
{
struct sigaction sa;
sysrqfd = open("/proc/sysrq-trigger", O_WRONLY);
if (sigprocmask(SIG_BLOCK, NULL, &sa.sa_mask)) {
perror("sigprocmask");
return 0;
}
sa.sa_sigaction = sigbus_handler;
sa.sa_flags = SA_SIGINFO | SA_NODEFER | SA_RESTART;
if (sigaction(SIGBUS, &sa, NULL)) {
perror("sigaction");
return 0;
}
bindstack = mmap(NULL,
BINDSTACK_SIZE,
PROT_READ | PROT_WRITE | PROT_EXEC,
MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
if (bindstack == MAP_FAILED) {
perror("mmap bindstack");
return 0;
}
printf("bindstack: %p\n", bindstack);
run_bind_test();
printf("done\n");
return 0;
}
There are multiple ingredients for this:
1) PAGE_NONE is defined to _CACHE_CACHABLE_NONCOHERENT, which is CCA 3
on all platforms except SB1 where it's CCA 5.
2) _page_cachable_default must have bits set which are not set
_CACHE_CACHABLE_NONCOHERENT.
3) Either the defective version of pte_modify for XPA or the standard
version must be in used. However pte_modify for the 36 bit address
space support is no affected.
In that case additional bits in the final CCA mode may generate an invalid
value for the CCA field. On the R10000 system where this was tracked
down for example a CCA 7 has been observed, which is Uncached Accelerated.
Fixed by:
1) Using the proper CCA mode for PAGE_NONE just like for all the other
PAGE_* pte/pmd bits.
2) Fix the two affected variants of pte_modify.
Further code inspection also shows the same issue to exist in pmd_modify
which would affect huge page systems.
Issue in pte_modify tracked down by Alastair Bridgewater, PAGE_NONE
and pmd_modify issue found by me.
The history of this goes back beyond Linus' git history. Chris Dearman's
commit 351336929ccf222ae38ff0cb7a8dd5fd5c6236a0 ("[MIPS] Allow setting of
the cache attribute at run time.") missed the opportunity to fix this
but it was originally introduced in lmo commit
d523832cf12007b3242e50bb77d0c9e63e0b6518 ("Missing from last commit.")
and 32cc38229ac7538f2346918a09e75413e8861f87 ("New configuration option
CONFIG_MIPS_UNCACHED.")
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Reported-by: Alastair Bridgewater <alastair.bridgewater@gmail.com>
When truncating a file we should check write access on the underlying
inode. And we should do so on the lower file as well (before copy-up) for
consistency.
Original patch and test case by Aihua Zhang.
- - >o >o - - test.c - - >o >o - -
#include <stdio.h>
#include <errno.h>
#include <unistd.h>
int main(int argc, char *argv[])
{
int ret;
ret = truncate(argv[0], 4096);
if (ret != -1) {
fprintf(stderr, "truncate(argv[0]) should have failed\n");
return 1;
}
if (errno != ETXTBSY) {
perror("truncate(argv[0])");
return 1;
}
return 0;
}
- - >o >o - - >o >o - - >o >o - -
Reported-by: Aihua Zhang <zhangaihua1@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Cc: <stable@vger.kernel.org>
Pull SCSI fixes from James Bottomley:
"Two straightforward fixes.
One is a concurrency issue only affecting SAS connected SATA drives,
but which could hang the storage subsystem if it triggers (because the
outstanding command count on error never goes back to zero) and the
other is a NO_TAG fallout from the switch to hostwide tags which
causes the system to crash on module insertion (we've checked
carefully and only the 53c700 family of drivers is vulnerable to this
issue)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
53c700: fix BUG on untagged commands
scsi: fix race between simultaneous decrements of ->host_failed
Pull drm fixes frlm Dave Airlie:
"Just some AMD and Intel fixes, the AMD ones are further production
Polaris fixes, and the Intel ones fix some early timeouts, some PCI ID
changes and a couple of other fixes.
Still a bit Internet challenged here, hopefully end of next week will
solve it"
* tag 'drm-fixes-for-v4.7-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Fix missing unlock on error in i915_ppgtt_info()
drm/amd/powerplay: workaround for UVD clock issue
drm/amdgpu: add ACLK_CNTL setting for polaris10
drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
drm/i915: Add more Kabylake PCI IDs.
drm/i915: Avoid early timeout during AUX transfers
drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
drm/i915/lpt: Avoid early timeout during FDI PHY reset
drm/i915/bxt: Avoid early timeout during PLL enable
drm/i915: Refresh cached DP port register value on resume
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
drm/amd/powerplay: disable FFC.
drm/amd/powerplay: add some definition for FFC feature on polaris.
Commit d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for
radix") turned kernel memory and IO addresses from #defined constants to
variables initialised at runtime.
On PA6T (pasemi) systems the setup_arch() machine call initialises the
onboard PCI-e root-ports, and uses pci_io_base to do this, which is now
before its value has been set, resulting in a panic early in boot before
console IO is initialised.
Move the pci_io_base initialisation to the same place as vmalloc ranges
are set (hash__early_init_mmu()/radix__early_init_mmu()) - this is the
earliest possible place we can initialise it.
Fixes: d6a9996e84ac ("powerpc/mm: vmalloc abstraction in preparation for radix")
Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Darren Stevens <darren@stevens-zone.net>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
[mpe: Add #ifdef CONFIG_PCI, massage change log slightly]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull btrfs fixes part 2 from Chris Mason:
"This has one patch from Omar to bring iterate_shared back to btrfs.
We have a tree of work we queue up for directory items and it doesn't
lend itself well to shared access. While we're cleaning it up, Omar
has changed things to use an exclusive lock when there are delayed
items"
* 'for-linus-4.7-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: fix ->iterate_shared() by upgrading i_rwsem for delayed nodes
Pull spi fixes from Mark Brown:
"A few small driver-specific fixes for SPI, all in the normal important
if you hit them category especially the rockchip driver fix which
addresses a race which has been exposed more frequently with some
recent performance improvements"
* tag 'spi-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: sunxi: fix transfer timeout
spi: sun4i: fix FIFO limit
spi: rockchip: Signal unfinished DMA transfers
spi: spi-ti-qspi: Suspend the queue before removing the device
here's a batch of i915 fixes for 4.7.
* tag 'drm-intel-fixes-2016-06-30' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Fix missing unlock on error in i915_ppgtt_info()
drm/i915: Removing PCI IDs that are no longer listed as Kabylake.
drm/i915: Add more Kabylake PCI IDs.
drm/i915: Avoid early timeout during AUX transfers
drm/i915/hsw: Avoid early timeout during LCPLL disable/restore
drm/i915/lpt: Avoid early timeout during FDI PHY reset
drm/i915/bxt: Avoid early timeout during PLL enable
drm/i915: Refresh cached DP port register value on resume
Currently we have 2 segments that are bolted for the kernel linear
mapping (ie 0xc000... addresses). This is 0 to 1TB and also the kernel
stacks. Anything accessed outside of these regions may need to be
faulted in. (In practice machines with TM always have 1T segments)
If a machine has < 2TB of memory we never fault on the kernel linear
mapping as these two segments cover all physical memory. If a machine
has > 2TB of memory, there may be structures outside of these two
segments that need to be faulted in. This faulting can occur when
running as a guest as the hypervisor may remove any SLB that's not
bolted.
When we treclaim and trecheckpoint we have a window where we need to
run with the userspace GPRs. This means that we no longer have a valid
stack pointer in r1. For this window we therefore clear MSR RI to
indicate that any exceptions taken at this point won't be able to be
handled. This means that we can't take segment misses in this RI=0
window.
In this RI=0 region, we currently access the thread_struct for the
process being context switched to or from. This thread_struct access
may cause a segment fault since it's not guaranteed to be covered by
the two bolted segment entries described above.
We've seen this with a crash when running as a guest with > 2TB of
memory on PowerVM:
Unrecoverable exception 4100 at c00000000004f138
Oops: Unrecoverable exception, sig: 6 [#1]
SMP NR_CPUS=2048 NUMA pSeries
CPU: 1280 PID: 7755 Comm: kworker/1280:1 Tainted: G X 4.4.13-46-default #1
task: c000189001df4210 ti: c000189001d5c000 task.ti: c000189001d5c000
NIP: c00000000004f138 LR: 0000000010003a24 CTR: 0000000010001b20
REGS: c000189001d5f730 TRAP: 4100 Tainted: G X (4.4.13-46-default)
MSR: 8000000100001031 <SF,ME,IR,DR,LE> CR: 24000048 XER: 00000000
CFAR: c00000000004ed18 SOFTE: 0
GPR00: ffffffffc58d7b60 c000189001d5f9b0 00000000100d7d00 000000003a738288
GPR04: 0000000000002781 0000000000000006 0000000000000000 c0000d1f4d889620
GPR08: 000000000000c350 00000000000008ab 00000000000008ab 00000000100d7af0
GPR12: 00000000100d7ae8 00003ffe787e67a0 0000000000000000 0000000000000211
GPR16: 0000000010001b20 0000000000000000 0000000000800000 00003ffe787df110
GPR20: 0000000000000001 00000000100d1e10 0000000000000000 00003ffe787df050
GPR24: 0000000000000003 0000000000010000 0000000000000000 00003fffe79e2e30
GPR28: 00003fffe79e2e68 00000000003d0f00 00003ffe787e67a0 00003ffe787de680
NIP [c00000000004f138] restore_gprs+0xd0/0x16c
LR [0000000010003a24] 0x10003a24
Call Trace:
[c000189001d5f9b0] [c000189001d5f9f0] 0xc000189001d5f9f0 (unreliable)
[c000189001d5fb90] [c00000000001583c] tm_recheckpoint+0x6c/0xa0
[c000189001d5fbd0] [c000000000015c40] __switch_to+0x2c0/0x350
[c000189001d5fc30] [c0000000007e647c] __schedule+0x32c/0x9c0
[c000189001d5fcb0] [c0000000007e6b58] schedule+0x48/0xc0
[c000189001d5fce0] [c0000000000deabc] worker_thread+0x22c/0x5b0
[c000189001d5fd80] [c0000000000e7000] kthread+0x110/0x130
[c000189001d5fe30] [c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Instruction dump:
7cb103a6 7cc0e3a6 7ca222a6 78a58402 38c00800 7cc62838 08860000 7cc000a6
38a00006 78c60022 7cc62838 0b060000 <e8c701a0> 7ccff120 e8270078 e8a70098
---[ end trace 602126d0a1dedd54 ]---
This fixes this by copying the required data from the thread_struct to
the stack before we clear MSR RI. Then once we clear RI, we only access
the stack, guaranteeing there's no segment miss.
We also tighten the region over which we set RI=0 on the treclaim()
path. This may have a slight performance impact since we're adding an
mtmsr instruction.
Fixes: 090b9284d725 ("powerpc/tm: Clear MSR RI in non-recoverable TM code")
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull btrfs fixes from Chris Mason:
"I have a two part pull this time because one of the patches Dave
Sterba collected needed to be against v4.7-rc2 or higher (we used
rc4). I try to make my for-linus-xx branch testable on top of the
last major so we can hand fixes to people on the list more easily, so
I've split this pull in two.
This first part has some fixes and two performance improvements that
we've been testing for some time.
Josef's two performance fixes are most notable. The transid tracking
patch makes a big improvement on pretty much every workload"
* 'for-linus-4.7' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
Btrfs: Force stripesize to the value of sectorsize
btrfs: fix disk_i_size update bug when fallocate() fails
Btrfs: fix error handling in map_private_extent_buffer
Btrfs: fix error return code in btrfs_init_test_fs()
Btrfs: don't do nocow check unless we have to
btrfs: fix deadlock in delayed_ref_async_start
Btrfs: track transid for delayed ref flushing
Commit fe742fd4f90f ("Revert "btrfs: switch to ->iterate_shared()"")
backed out the conversion to ->iterate_shared() for Btrfs because the
delayed inode handling in btrfs_real_readdir() is racy. However, we can
still do readdir in parallel if there are no delayed nodes.
This is a temporary fix which upgrades the shared inode lock to an
exclusive lock only when we have delayed items until we come up with a
more complete solution. While we're here, rename the
btrfs_{get,put}_delayed_items functions to make it very clear that
they're just for readdir.
Tested with xfstests and by doing a parallel kernel build:
while make tinyconfig && make -j4 && git clean dqfx; do
:
done
along with a bunch of parallel finds in another shell:
while true; do
for ((i=0; i<4; i++)); do
find . >/dev/null &
done
wait
done
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
The untagged command case in the 53c700 driver has been broken since
host wide tags were enabled because the replaced scsi_find_tag()
function had a special case for the tag value SCSI_NO_TAG to retrieve
sdev->current_cmnd. The replacement function scsi_host_find_tag() has
no such special case and returns NULL causing untagged commands to
trigger a BUG() in the driver. Inspection shows that the 53c700 is the
only driver using this SCSI_NO_TAG case, so a local fix in the driver
suffices to fix this problem globally.
Fixes: 64d513ac31b - "scsi: use host wide tags by default"
Cc: stable@vger.kernel.org # 4.4+
Reported-by: Helge Deller <deller@gmx.de>
Tested-by: Helge Deller <deller@gmx.de>
Signed-off-by: James Bottomley <jejb@linux.vnet.ibm.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull regulator fixes from Mark Brown:
"Two small fixes for the regulator subsystem - one fixing a crash with
one of the devices supported by the max77620 driver, another fixing
startup for the anatop regulator when it starts up with the regulator
in bypass mode"
* tag 'regulator-fix-v4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: max77620: check for valid regulator info
regulator: anatop: allow regulator to be in bypass mode
Just a few more late fixes for Polaris cards.
* 'drm-fixes-4.7' of git://people.freedesktop.org/~agd5f/linux:
drm/amd/powerplay: workaround for UVD clock issue
drm/amdgpu: add ACLK_CNTL setting for polaris10
drm/amd/powerplay: fix issue uvd dpm can't enabled on Polaris11.
drm/amd/powerplay: Workaround for Memory EDC Error on Polaris10.
drm/amd/powerplay: Update CKS on/ CKS off voltage offset calculation
drm/amd/powerplay: disable FFC.
drm/amd/powerplay: add some definition for FFC feature on polaris.
Add the missing unlock before return from function i915_ppgtt_info()
in the error handling case.
Fixes: 1d2ac403ae3b(drm: Protect dev->filelist with its own mutex)
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: http://patchwork.freedesktop.org/patch/msgid/1465861320-26221-1-git-send-email-weiyj_lk@163.com
(cherry picked from commit b0212486909de4f239ca9f20d032de1b1f2dc52e)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
When calling eeh_rmv_device() in eeh_reset_device() for partial hotplug
case, @rmv_data instead of its address is the proper argument.
Otherwise, the stack frame is corrupted when writing to
@rmv_data (actually its address) in eeh_rmv_device(). It results in
kernel crash as observed.
This fixes the issue by passing @rmv_data, not its address to
eeh_rmv_device() in eeh_reset_device().
Fixes: 67086e32b564 ("powerpc/eeh: powerpc/eeh: Support error recovery for VF PE")
Reported-by: Pridhiviraj Paidipeddi <ppaidipe@in.ibm.com>
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull sound fixes from Takashi Iwai:
"Again pretty calm weeks: we've had only a few trivial / stable
HD-audio fixes in addition to a possible race fix for snd-dummy driver
spotted by syzkaller"
* tag 'sound-4.7-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: dummy: Fix a use-after-free at closing
ALSA: hda / realtek - add two more Thinkpad IDs (5050,5053) for tpt460 fixup
ALSA: hda - Fix the headset mic jack detection on Dell machine
ALSA: hda/tegra: iomem fixups for sparse warnings
ALSA: hdac_regmap - fix the register access for runtime PM
Btrfs code currently assumes stripesize to be same as
sectorsize. However Btrfs-progs (until commit
df05c7ed455f519e6e15e46196392e4757257305) has been setting
btrfs_super_block->stripesize to a value of 4096.
This commit makes sure that the value of btrfs_super_block->stripesize
is a power of 2. Later, it unconditionally sets btrfs_root->stripesize
to sectorsize.
Signed-off-by: Chandan Rajendra <chandan@linux.vnet.ibm.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull SCSI fixes from James Bottomley:
"This is a set of four fixes noticed in the merge window. The aacraid
one is an optimisation, the mp3sas one fixes a spurious printk, the
sd_check_events one fixes a theoretical race and the failed zero
length commands fixes a bug in our completion/retry routines that has
been causing problems in the field"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
aacraid: do not activate events on non-SRC adapters
mpt3sas: add missing curly braces
sd: get disk reference in sd_check_events()
scsi_lib: correctly retry failed zero length REQ_TYPE_FS commands
For historic reasons, io_opt is in bytes and max_sectors in block layer
sectors. This interface inconsistency is error prone and should be
fixed. But for 4.4--4.7 let's make the unit difference explicit via a
wrapper function.
Fixes: d0eb20a863ba ("sd: Optimal I/O size is in bytes, not sectors")
Cc: stable@vger.kernel.org # 4.4+
Reported-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: Andrew Patterson <andrew.patterson@hpe.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sas_ata_strategy_handler() adds the works of the ata error handler to
system_unbound_wq. This workqueue asynchronously runs work items, so the
ata error handler will be performed concurrently on different CPUs. In
this case, ->host_failed will be decreased simultaneously in
scsi_eh_finish_cmd() on different CPUs, and become abnormal.
It will lead to permanently inequality between ->host_failed and
->host_busy, and scsi error handler thread won't start running. IO
errors after that won't be handled.
Since all scmds must have been handled in the strategy handler, just
remove the decrement in scsi_eh_finish_cmd() and zero ->host_busy after
the strategy handler to fix this race.
Fixes: 50824d6c5657 ("[SCSI] libsas: async ata-eh")
Cc: stable@vger.kernel.org
Signed-off-by: Wei Fang <fangwei1@huawei.com>
Reviewed-by: James Bottomley <jejb@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull clk fixes from Stephen Boyd:
"A small fix for the newly added oxnas clk driver and a handful of
rockchip clk driver fixes for newly added rk3399 support"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: Fix return value check in oxnas_stdclk_probe()
clk: rockchip: release io resource when failing to init clk on rk3399
clk: rockchip: fix cpuclk registration error handling
clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization"
clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src
clk: rockchip: mark rk3399 GIC clocks as critical
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
When using DMA, the transfer_one callback should return 1 because the
transfer hasn't finished yet.
A previous commit changed the function to return 0 when the DMA channels
were correctly prepared.
This manifested in Veyron boards with this message:
[ 1.983605] cros-ec-spi spi0.0: EC failed to respond in time
Fixes: ea9849113343 ("spi: rockchip: check return value of dmaengine_prep_slave_sg")
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
The trasfer timeout is fixed at 1000 ms. Reading a 4Mbyte flash over
1MHz SPI bus takes way longer than that. Calculate the timeout from the
actual time the transfer is supposed to take and multiply by 2 for good
measure.
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Before disabling the pm_runtime, we must ensure that there is no transfer
in progress nor will a new one be started. Otherwise the message pump will
fail and in the end, the process requesting the transfer will be stuck.
This behavior has been observed when transferring data from a SPI flash
with dd while removing the module on a DRA7x-evm.
Signed-off-by: Jean-Jacques Hiblot <jjhiblot@ti.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Pull ACPI fix from Rafael Wysocki:
"Fix an expression in the ACPI PCI IRQ management code added by a
recent commit that overlooked missing parens in it, so the result of
the computation is incorrect in some cases (Sinan Kaya)"
* tag 'acpi-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI,PCI,IRQ: correct operator precedence
This is unusual. Usually IDs listed on early stages of platform
definition are kept there as reserved for later use.
However these IDs here are not listed anymore in any of steppings
and devices IDs tables for Kabylake on configurations overview
section of BSpec.
So it is better removing them before they become used in any
other future platform.
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/1466718636-19675-2-git-send-email-rodrigo.vivi@intel.com
(cherry picked from commit a922eb8d4581c883c37ce6e12dca9ff2cb1ea723)
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Userspace can quite legitimately perform an exec() syscall with a
suspended transaction. exec() does not return to the old process, rather
it load a new one and starts that, the expectation therefore is that the
new process starts not in a transaction. Currently exec() is not treated
any differently to any other syscall which creates problems.
Firstly it could allow a new process to start with a suspended
transaction for a binary that no longer exists. This means that the
checkpointed state won't be valid and if the suspended transaction were
ever to be resumed and subsequently aborted (a possibility which is
exceedingly likely as exec()ing will likely doom the transaction) the
new process will jump to invalid state.
Secondly the incorrect attempt to keep the transactional state while
still zeroing state for the new process creates at least two TM Bad
Things. The first triggers on the rfid to return to userspace as
start_thread() has given the new process a 'clean' MSR but the suspend
will still be set in the hardware MSR. The second TM Bad Thing triggers
in __switch_to() as the processor is still transactionally suspended but
__switch_to() wants to zero the TM sprs for the new process.
This is an example of the outcome of calling exec() with a suspended
transaction. Note the first 700 is likely the first TM bad thing
decsribed earlier only the kernel can't report it as we've loaded
userspace registers. c000000000009980 is the rfid in
fast_exception_return()
Bad kernel stack pointer 3fffcfa1a370 at c000000000009980
Oops: Bad kernel stack pointer, sig: 6 [#1]
CPU: 0 PID: 2006 Comm: tm-execed Not tainted
NIP: c000000000009980 LR: 0000000000000000 CTR: 0000000000000000
REGS: c00000003ffefd40 TRAP: 0700 Not tainted
MSR: 8000000300201031 <SF,ME,IR,DR,LE,TM[SE]> CR: 00000000 XER: 00000000
CFAR: c0000000000098b4 SOFTE: 0
PACATMSCRATCH: b00000010000d033
GPR00: 0000000000000000 00003fffcfa1a370 0000000000000000 0000000000000000
GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR12: 00003fff966611c0 0000000000000000 0000000000000000 0000000000000000
NIP [c000000000009980] fast_exception_return+0xb0/0xb8
LR [0000000000000000] (null)
Call Trace:
Instruction dump:
f84d0278 e9a100d8 7c7b03a6 e84101a0 7c4ff120 e8410170 7c5a03a6 e8010070
e8410080 e8610088 e8810090 e8210078 <4c000024> 48000000 e8610178 88ed023b
Kernel BUG at c000000000043e80 [verbose debug info unavailable]
Unexpected TM Bad Thing exception at c000000000043e80 (msr 0x201033)
Oops: Unrecoverable exception, sig: 6 [#2]
CPU: 0 PID: 2006 Comm: tm-execed Tainted: G D
task: c0000000fbea6d80 ti: c00000003ffec000 task.ti: c0000000fb7ec000
NIP: c000000000043e80 LR: c000000000015a24 CTR: 0000000000000000
REGS: c00000003ffef7e0 TRAP: 0700 Tainted: G D
MSR: 8000000300201033 <SF,ME,IR,DR,RI,LE,TM[SE]> CR: 28002828 XER: 00000000
CFAR: c000000000015a20 SOFTE: 0
PACATMSCRATCH: b00000010000d033
GPR00: 0000000000000000 c00000003ffefa60 c000000000db5500 c0000000fbead000
GPR04: 8000000300001033 2222222222222222 2222222222222222 00000000ff160000
GPR08: 0000000000000000 800000010000d033 c0000000fb7e3ea0 c00000000fe00004
GPR12: 0000000000002200 c00000000fe00000 0000000000000000 0000000000000000
GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
GPR20: 0000000000000000 0000000000000000 c0000000fbea7410 00000000ff160000
GPR24: c0000000ffe1f600 c0000000fbea8700 c0000000fbea8700 c0000000fbead000
GPR28: c000000000e20198 c0000000fbea6d80 c0000000fbeab680 c0000000fbea6d80
NIP [c000000000043e80] tm_restore_sprs+0xc/0x1c
LR [c000000000015a24] __switch_to+0x1f4/0x420
Call Trace:
Instruction dump:
7c800164 4e800020 7c0022a6 f80304a8 7c0222a6 f80304b0 7c0122a6 f80304b8
4e800020 e80304a8 7c0023a6 e80304b0 <7c0223a6> e80304b8 7c0123a6 4e800020
This fixes CVE-2016-5828.
Fixes: bc2a9408fa65 ("powerpc: Hook in new transactional memory code")
Cc: stable@vger.kernel.org # v3.9+
Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
syzkaller fuzzer spotted a potential use-after-free case in snd-dummy
driver when hrtimer is used as backend:
> ==================================================================
> BUG: KASAN: use-after-free in rb_erase+0x1b17/0x2010 at addr ffff88005e5b6f68
> Read of size 8 by task syz-executor/8984
> =============================================================================
> BUG kmalloc-192 (Not tainted): kasan: bad access detected
> -----------------------------------------------------------------------------
>
> Disabling lock debugging due to kernel taint
> INFO: Allocated in 0xbbbbbbbbbbbbbbbb age=18446705582212484632
> ....
> [< none >] dummy_hrtimer_create+0x49/0x1a0 sound/drivers/dummy.c:464
> ....
> INFO: Freed in 0xfffd8e09 age=18446705496313138713 cpu=2164287125 pid=-1
> [< none >] dummy_hrtimer_free+0x68/0x80 sound/drivers/dummy.c:481
> ....
> Call Trace:
> [<ffffffff8179e59e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:333
> [< inline >] rb_set_parent include/linux/rbtree_augmented.h:111
> [< inline >] __rb_erase_augmented include/linux/rbtree_augmented.h:218
> [<ffffffff82ca5787>] rb_erase+0x1b17/0x2010 lib/rbtree.c:427
> [<ffffffff82cb02e8>] timerqueue_del+0x78/0x170 lib/timerqueue.c:86
> [<ffffffff814d0c80>] __remove_hrtimer+0x90/0x220 kernel/time/hrtimer.c:903
> [< inline >] remove_hrtimer kernel/time/hrtimer.c:945
> [<ffffffff814d23da>] hrtimer_try_to_cancel+0x22a/0x570 kernel/time/hrtimer.c:1046
> [<ffffffff814d2742>] hrtimer_cancel+0x22/0x40 kernel/time/hrtimer.c:1066
> [<ffffffff85420531>] dummy_hrtimer_stop+0x91/0xb0 sound/drivers/dummy.c:417
> [<ffffffff854228bf>] dummy_pcm_trigger+0x17f/0x1e0 sound/drivers/dummy.c:507
> [<ffffffff85392170>] snd_pcm_do_stop+0x160/0x1b0 sound/core/pcm_native.c:1106
> [<ffffffff85391b26>] snd_pcm_action_single+0x76/0x120 sound/core/pcm_native.c:956
> [<ffffffff85391e01>] snd_pcm_action+0x231/0x290 sound/core/pcm_native.c:974
> [< inline >] snd_pcm_stop sound/core/pcm_native.c:1139
> [<ffffffff8539754d>] snd_pcm_drop+0x12d/0x1d0 sound/core/pcm_native.c:1784
> [<ffffffff8539d3be>] snd_pcm_common_ioctl1+0xfae/0x2150 sound/core/pcm_native.c:2805
> [<ffffffff8539ee91>] snd_pcm_capture_ioctl1+0x2a1/0x5e0 sound/core/pcm_native.c:2976
> [<ffffffff8539f2ec>] snd_pcm_kernel_ioctl+0x11c/0x160 sound/core/pcm_native.c:3020
> [<ffffffff853d9a44>] snd_pcm_oss_sync+0x3a4/0xa30 sound/core/oss/pcm_oss.c:1693
> [<ffffffff853da27d>] snd_pcm_oss_release+0x1ad/0x280 sound/core/oss/pcm_oss.c:2483
> .....
A workaround is to call hrtimer_cancel() in dummy_hrtimer_sync() which
is called certainly before other blocking ops.
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
When doing truncate operation, btrfs_setsize() will first call
truncate_setsize() to set new inode->i_size, but if later
btrfs_truncate() fails, btrfs_setsize() will call
"i_size_write(inode, BTRFS_I(inode)->disk_i_size)" to reset the
inmemory inode size, now bug occurs. It's because for truncate
case btrfs_ordered_update_i_size() directly uses inode->i_size
to update BTRFS_I(inode)->disk_i_size, indeed we should use the
"offset" argument to update disk_i_size. Here is the call graph:
==>btrfs_truncate()
====>btrfs_truncate_inode_items()
======>btrfs_ordered_update_i_size(inode, last_size, NULL);
Here btrfs_ordered_update_i_size()'s offset argument is last_size.
And below test case can reveal this bug:
dd if=/dev/zero of=fs.img bs=$((1024*1024)) count=100
dev=$(losetup --show -f fs.img)
mkdir -p /mnt/mntpoint
mkfs.btrfs -f $dev
mount $dev /mnt/mntpoint
cd /mnt/mntpoint
echo "workdir is: /mnt/mntpoint"
blocksize=$((128 * 1024))
dd if=/dev/zero of=testfile bs=$blocksize count=1
sync
count=$((17*1024*1024*1024/blocksize))
echo "file size is:" $((count*blocksize))
for ((i = 1; i <= $count; i++)); do
i=$((i + 1))
dst_offset=$((blocksize * i))
xfs_io -f -c "reflink testfile 0 $dst_offset $blocksize"\
testfile > /dev/null
done
sync
truncate --size 0 testfile
ls -l testfile
du -sh testfile
exit
In this case, truncate operation will fail for enospc reason and
"du -sh testfile" returns value greater than 0, but testfile's
size is 0, we need to reflect correct inode->i_size.
Signed-off-by: Wang Xiaoguang <wangxg.fnst@cn.fujitsu.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Chris Mason <clm@fb.com>
Pull UDF fixes and a reiserfs fix from Jan Kara:
"A couple of udf fixes (most notably a bug in parsing UDF partitions
which led to inability to mount recent Windows installation media) and
a reiserfs fix for handling kstrdup failure"
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
reiserfs: check kstrdup failure
udf: Use correct partition reference number for metadata
udf: Use IS_ERR when loading metadata mirror file entry
udf: Don't BUG on missing metadata partition descriptor
Only SRC-based adapters support the AifReqEvent function, so there is no
point in trying to activate it on older, non-SRC based adapters. Doing
so lead to crashes on older adapters.
Signed-off-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.com>
Reviewed-by: Raghava Aditya Renukunta <RaghavaAaditya.Renukunta@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Linux fails to boot as a guest with a QEMU CD-ROM:
[ 4.439488] ata2.00: ATAPI: QEMU CD-ROM, 0.8.2, max UDMA/100
[ 4.443649] ata2.00: configured for MWDMA2
[ 4.450267] scsi 1:0:0:0: CD-ROM QEMU QEMU CD-ROM 0.8. PQ: 0 ANSI: 5
[ 4.464317] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4.464319] ata2.00: BMDMA stat 0x5
[ 4.464339] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
[ 4.464339] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 4.464341] ata2.00: status: { DRDY DRQ }
[ 4.465864] ata2: soft resetting link
[ 4.625971] ata2.00: configured for MWDMA2
[ 4.628290] ata2: EH complete
[ 4.646670] ata2.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[ 4.646671] ata2.00: BMDMA stat 0x5
[ 4.646683] ata2.00: cmd a0/01:00:00:00:01/00:00:00:00:00/a0 tag 0 dma 16640 in
[ 4.646683] Inquiry 12 01 00 00 ff 00res 48/20:02:00:24:00/00:00:00:00:00/a0 Emask 0x2 (HSM violation)
[ 4.646685] ata2.00: status: { DRDY DRQ }
[ 4.648193] ata2: soft resetting link
...
Fix this by suppressing VPD inquiry for this device.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Reported-by: Jan Stancek <jstancek@redhat.com>
Tested-by: Jan Stancek <jstancek@redhat.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A bunch of fixes. Some for the newly added rk3399 clock tree, some
concerning error handling and initialization and a revert of the
mmc-phase clock initialization, as this could conflict with the
bootloader setting of this clock and a real solution to initing
the phase correctly from dw_mmc went in as fix for 4.7 through
the mmc tree.
* tag 'v4.7-rockchip-clk-fixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip:
clk: rockchip: release io resource when failing to init clk on rk3399
clk: rockchip: fix cpuclk registration error handling
clk: rockchip: Revert "clk: rockchip: reset init state before mmc card initialization"
clk: rockchip: fix incorrect parent for rk3399's {c,g}pll_aclk_perihp_src
clk: rockchip: mark rk3399 GIC clocks as critical
clk: rockchip: initialize flags of clk_init_data in mmc-phase clock
Bypass support was added in commit d38018f2019c ("regulator: anatop: Add
bypass support to digital LDOs"). A check for valid voltage selectors was
added in commit da0607c8df5c ("regulator: anatop: Fail on invalid voltage
selector") but it also discards all regulators that are in bypass mode. Add
check for the bypass setting. Errors below were seen on a Variscite mx6
board.
anatop_regulator 20c8000.anatop:regulator-vddcore@140: Failed to read a valid default voltage selector.
anatop_regulator: probe of 20c8000.anatop:regulator-vddcore@140 failed with error -22
anatop_regulator 20c8000.anatop:regulator-vddsoc@140: Failed to read a valid default voltage selector.
anatop_regulator: probe of 20c8000.anatop:regulator-vddsoc@140 failed with error -22
Fixes: da0607c8df5c ("regulator: anatop: Fail on invalid voltage selector")
Signed-off-by: Mika Båtsman <mbatsman@mvista.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
SD4 regulator is not registered with regulator core
framework in probe as there is no support in MAX77620 PMIC,
removing SD4 entry from MAX77620 regulator information list
and checking for valid regulator information data before
configuring FPS source and FPS power up/down period to avoid
NULL pointer exception if regulator not registered with core.
Signed-off-by: Venkat Reddy Talla <vreddytalla@nvidia.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
When testing SPI without DMA I noticed that filling the FIFO on the
spi controller causes timeout.
Always leave room for one byte in the FIFO.
Signed-off-by: Michal Suchanek <hramrach@gmail.com>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Pull power management fixes from Rafael Wysocki:
"Three cpufreq fixes, one in the core (stable-candidate) and two in
drivers (intel_pstate and cpufreq-dt).
Specifics:
- Fix a recent intel_pstate regression that caused the number of
wakeups to increase significantly on an idle system in some cases
due to excessive synchronize_sched() invocations (Rafael Wysocki).
- Fix unnecessary invocations of WARN_ON() in the cpufreq core after
cpufreq has been suspended introduced during the 4.6 cycla (Rafael
Wysocki).
- Fix an error code path in the cpufreq-dt-platdev driver that
forgets to drop a reference to a DT node (Masahiro Yamada)"
* tag 'pm-4.7-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Avoid false-positive WARN_ON()s in cpufreq_update_policy()
cpufreq: dt: call of_node_put() before error out
intel_pstate: Do not clear utilization update hooks on policy changes