commits
Pull libnvdimm fixes from Dan Williams:
"These fixes are all tagged for -stable and have received a build
success notification from the kbuild robot.
- NVDIMM namespaces, configured to enforce 1GB alignment, fail to
initialize on platforms that mis-align the start or end of the
physical address range.
- The Linux implementation of the BTT (Block Translation Table) is
incompatible with the UEFI 2.7 definition of the BTT format. The
BTT layers a software atomic sector semantic on top of an NVDIMM
namespace. Linux needs to be compatible with the UEFI definition to
enable boot support or any pre-OS access of data on a BTT enabled
namespace.
- A fix for ACPI SMART notification events, this allows a userspace
monitor to register for health events rather than poll. This has
been broken since it was initially merged as the unit test
inadvertently worked around the problem. The urgency for fixing
this during the -rc series is driven by how expensive it is to poll
for this data (System Management Mode entry)"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, btt: Fix an incompatibility in the log layout
libnvdimm, btt: add a couple of missing kernel-doc lines
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
libnvdimm, pfn: fix start_pad handling for aligned namespaces
acpi, nfit: fix health event notification
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vysycall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
Due to a spec misinterpretation, the Linux implementation of the BTT log
area had different padding scheme from other implementations, such as
UEFI and NVML.
This fixes the padding scheme, and defaults to it for new BTT layouts.
We attempt to detect the padding scheme in use when probing for an
existing BTT. If we detect the older/incompatible scheme, we continue
using it.
Reported-by: Juston Li <juston.li@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull powerpc fixes from Michael Ellerman:
"This is all fairly boring, except that there's two KVM fixes that
you'd normally get via Paul's kvm-ppc tree. He's away so I picked them
up. I was waiting to see if he would apply them, which is why they
have only been in my tree since today. But they were on the list for a
while and have been tested on the relevant hardware.
Of note is two fixes for KVM XIVE (Power9 interrupt controller). These
would normally go via the KVM tree but Paul is away so I've picked
them up.
Other than that, two fixes for error handling in the IMC driver, and
one for a potential oops in the BHRB code if the hardware records a
branch address that has subsequently been unmapped, and finally a
s/%p/%px/ in our oops code.
Thanks to: Anju T Sudhakar, Cédric Le Goater, Laurent Vivier, Madhavan
Srinivasan, Naveen N. Rao, Ravi Bangoria"
* tag 'powerpc-4.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()
KVM: PPC: Book3S: fix XIVE migration of pending interrupts
powerpc/kernel: Print actual address of regs when oopsing
powerpc/perf: Fix kfree memory allocated for nest pmus
powerpc/perf/imc: Fix nest-imc cpuhotplug callback failure
powerpc/perf: Dereference BHRB entries safely
The loop which populates the CPU entry area PMDs can wrap around on 32bit
machines when the number of CPUs is small.
It worked wonderful for NR_CPUS=64 for whatever reason and the moron who
wrote that code did not bother to test it with !SMP.
Check for the wraparound to fix it.
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas "Feels stupid" Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Recent updates to btt.h neglected to add corresponding kernel-doc lines
for new structure members. Add them.
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull xen fixes from Juergen Gross:
"This contains two fixes for running under Xen:
- a fix avoiding resource conflicts between adding mmio areas and
memory hotplug
- a fix setting NX bits in page table entries copied from Xen when
running a PV guest"
* tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: Mark unallocated host memory as UNUSABLE
x86-64/Xen: eliminate W+X mappings
When we migrate a VM from a POWER8 host (XICS) to a POWER9 host
(XICS-on-XIVE), we have an error:
qemu-kvm: Unable to restore KVM interrupt controller state \
(0xff000000) for CPU 0: Invalid argument
This is because kvmppc_xics_set_icp() checks the new state
is internaly consistent, and especially:
...
1129 if (xisr == 0) {
1130 if (pending_pri != 0xff)
1131 return -EINVAL;
...
On the other side, kvmppc_xive_get_icp() doesn't set
neither the pending_pri value, nor the xisr value (set to 0)
(and kvmppc_xive_set_icp() ignores the pending_pri value)
As xisr is 0, pending_pri must be set to 0xff.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
init_espfix_bsp() needs to be invoked before the page table isolation
initialization. Move it into mm_init() which is the place where pti_init()
will be added.
While at it get rid of the #ifdeffery and provide proper stub functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Cc: <stable@vger.kernel.org>
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull xfs fixes from Darrick Wong:
"Here are some XFS fixes for 4.15-rc5. Apologies for the unusually
large number of patches this late, but I wanted to make sure the
corruption fixes were really ready to go.
Changes since last update:
- Fix a locking problem during xattr block conversion that could lead
to the log checkpointing thread to try to write an incomplete
buffer to disk, which leads to a corruption shutdown
- Fix a null pointer dereference when removing delayed allocation
extents
- Remove post-eof speculative allocations when reflinking a block
past current inode size so that we don't just leave them there and
assert on inode reclaim
- Relax an assert which didn't accurately reflect the way locking
works and would trigger under heavy io load
- Avoid infinite loop when cancelling copy on write extents after a
writeback failure
- Try to avoid copy on write transaction reservation overflows when
remapping after a successful write
- Fix various problems with the copy-on-write reservation automatic
garbage collection not being cleaned up properly during a ro
remount
- Fix problems with rmap log items being processed in the wrong
order, leading to corruption shutdowns
- Fix problems with EFI recovery wherein the "remove any rmapping if
present" mechanism wasn't actually doing anything, which would lead
to corruption problems later when the extent is reallocated,
leading to multiple rmaps for the same extent"
* tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only skip rmap owner checks for unknown-owner rmap removal
xfs: always honor OWN_UNKNOWN rmap removal requests
xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
xfs: set cowblocks tag for direct cow writes too
xfs: remove leftover CoW reservations when remounting ro
xfs: don't be so eager to clear the cowblocks tag on truncate
xfs: track cowblocks separately in i_flags
xfs: allow CoW remap transactions to use reserve blocks
xfs: avoid infinite loop when cancelling CoW blocks after writeback failure
xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping
xfs: remove dest file's post-eof preallocations before reflinking
xfs: move xfs_iext_insert tracepoint to report useful information
xfs: account for null transactions in bunmapi
xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
xfs: add the ability to join a held buffer to a defer_ops
Commit f5775e0b6116 ("x86/xen: discard RAM regions above the maximum
reservation") left host memory not assigned to dom0 as available for
memory hotplug.
Unfortunately this also meant that those regions could be used by
others. Specifically, commit fa564ad96366 ("x86/PCI: Enable a 64bit BAR
on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those
addresses as MMIO.
To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus
effectively reverting f5775e0b6116) and keep track of that region as
a hostmem resource that can be used for the hotplug.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
When restoring a pending interrupt, we are setting the Q bit to force
a retrigger in xive_finish_unmask(). But we also need to force an EOI
in this case to reach the same initial state : P=1, Q=0.
This can be done by not setting 'old_p' for pending interrupts which
will inform xive_finish_unmask() that an EOI needs to be sent.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
and 0-day already hit a case where the fixmap PTEs were cleared by
cleanup_highmap().
Aside of that the fixmap API is a pain as it's all backwards.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The alignment checks at pfn driver startup fail to properly account for
the 'start_pad' in the case where the namespace is misaligned relative
to its internal alignment. This is typically triggered in 1G aligned
namespace, but could theoretically trigger with small namespace
alignments. When this triggers the kernel reports messages of the form:
dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000
Cc: <stable@vger.kernel.org>
Fixes: 1ee6667cd8d1 ("libnvdimm, pfn, dax: fix initialization vs autodetect...")
Reported-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- fix chacha20 crash on zero-length input due to unset IV
- fix potential race conditions in mcryptd with spinlock
- only wait once at top of algif recvmsg to avoid inconsistencies
- fix potential use-after-free in algif_aead/algif_skcipher"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - fix race accessing cipher request
crypto: mcryptd - protect the per-CPU queue with a lock
crypto: af_alg - wait for data at beginning of recvmsg
crypto: skcipher - set walk.iv for zero-length inputs
For rmap removal, refactor the rmap owner checks into a separate
function, then skip the checks if we are performing an unknown-owner
removal.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
A few thousand such pages are usually left around due to the re-use of
L1 tables having been provided by the hypervisor (Dom0) or tool stack
(DomU). Set NX in the direct map variant, which needs to be done in L2
due to the dual use of the re-used L1s.
For x86_configure_nx() to actually do what it is supposed to do, call
get_cpu_cap() first. This was broken by commit 4763ed4d45 ("x86, mm:
Clean up and simplify NX enablement") when switching away from the
direct EFER read.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When we oops or otherwise call show_regs() we print the address of the
regs structure. Being able to see the address is fairly useful,
firstly to verify that the regs pointer is not completely bogus, and
secondly it allows you to dump the regs and surrounding memory with a
debugger if you have one.
In the normal case the regs will be located somewhere on the stack, so
printing their location discloses no further information than printing
the stack pointer does already.
So switch to %px and print the actual address, not the hashed value.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Integration testing with a BIOS that generates injected health event
notifications fails to communicate those events to userspace. The nfit
driver neglects to link the ACPI DIMM device with the necessary driver
data so acpi_nvdimm_notify() fails this lookup:
nfit_mem = dev_get_drvdata(dev);
if (nfit_mem && nfit_mem->flags_attr)
sysfs_notify_dirent(nfit_mem->flags_attr);
Add the necessary linkage when installing the notification handler and
clean it up when the nfit driver instance is torn down.
Cc: <stable@vger.kernel.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: ba9c8dd3c222 ("acpi, nfit: add dimm device notification support")
Reported-by: Daniel Osawa <daniel.k.osawa@intel.com>
Tested-by: Daniel Osawa <daniel.k.osawa@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull pin control fix from Linus Walleij:
"A single pin control fix for Intel machines, affecting a bunch of
Chromebooks. Nothing else collected up amazingly"
* tag 'pinctrl-v4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
When invoking an asynchronous cipher operation, the invocation of the
callback may be performed before the subsequent operations in the
initial code path are invoked. The callback deletes the cipher request
data structure which implies that after the invocation of the
asynchronous cipher operation, this data structure must not be accessed
any more.
The setting of the return code size with the request data structure must
therefore be moved before the invocation of the asynchronous cipher
operation.
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.
Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.
There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.
Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Add a respective dependency.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
imc_common_cpuhp_mem_free() is the common function for all
IMC (In-memory Collection counters) domains to unregister cpuhotplug
callback and free memory. Since kfree of memory allocated for
nest-imc (per_nest_pmu_arr) is in the common code, all
domains (core/nest/thread) can do the kfree in the failure case.
This could potentially create a call trace as shown below, where
core(/thread/nest) imc pmu initialization fails and in the failure
path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr),
which is allocated by successfully registered nest units.
The call trace is generated in a scenario where core-imc
initialization is made to fail and a cpuhotplug is performed in a p9
system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access
per_nest_pmu_arr, which is already freed by core-imc.
NIP [c000000000cb6a94] mutex_lock+0x34/0x90
LR [c000000000cb6a88] mutex_lock+0x28/0x90
Call Trace:
mutex_lock+0x28/0x90 (unreliable)
perf_pmu_migrate_context+0x90/0x3a0
ppc_nest_imc_cpu_offline+0x190/0x1f0
cpuhp_invoke_callback+0x160/0x820
cpuhp_thread_fun+0x1bc/0x270
smpboot_thread_fn+0x250/0x290
kthread+0x1a8/0x1b0
ret_from_kernel_thread+0x5c/0x74
To address this scenario do the kfree(per_nest_pmu_arr) only in case
of nest-imc initialization failure, and when there is no other nest
units registered.
Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Unclutter tlbflush.h a little.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull drm fixes from Dave Airlie:
"I've got most of two weeks worth of fixes here due to being on
holidays last week.
The main things are:
- Core:
* Syncobj fd reference count fix
* Leasing ioctl misuse fix
- nouveau regression fixes
- further amdgpu DC fixes
- sun4i regression fixes
I'm not sure I'll see many fixes over next couple of weeks, we'll see
how we go"
* tag 'drm-fixes-for-v4.15-rc5' of git://people.freedesktop.org/~airlied/linux: (27 commits)
drm/syncobj: Stop reusing the same struct file for all syncobj -> fd
drm: move lease init after validation in drm_lease_create
drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
drm/sun4i: hdmi: Move the mode_valid callback to the encoder
drm/nouveau: fix obvious memory leak
drm/i915: Protect DDI port to DPLL map from theoretical race.
drm/i915/lpe: Remove double-encapsulation of info string
drm/sun4i: Fix error path handling
drm/nouveau: use alternate memory type for system-memory buffers with kind != 0
drm/nouveau: avoid GPU page sizes > PAGE_SIZE for buffer objects in host memory
drm/nouveau/mmu/gp10b: use correct implementation
drm/nouveau/pci: do a msi rearm on init
drm/nouveau/imem/nv50: fix refcount_t warning
drm/nouveau/bios/dp: support DP Info Table 2.0
drm/nouveau/fbcon: fix NULL pointer access in nouveau_fbcon_destroy
drm/amd/display: Fix rehook MST display not light back on
drm/amd/display: fix missing pixel clock adjustment for dongle
drm/amd/display: set chroma taps to 1 when not scaling
drm/amd/display: add pipe locking before front end programing
drm/sun4i: validate modes for HDMI
...
Guenter Roeck reported an interrupt storm on a prototype system which is
based on Cyan Chromebook. The root cause turned out to be a incorrectly
configured pin that triggers spurious interrupts. This will be fixed in
coreboot but currently we need to prevent the interrupt storm from
happening by masking all interrupts (but not GPEs) on those systems.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197953
Fixes: bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in probe")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
mcryptd_enqueue_request() grabs the per-CPU queue struct and protects
access to it with disabled preemption. Then it schedules a worker on the
same CPU. The worker in mcryptd_queue_worker() guards access to the same
per-CPU variable with disabled preemption.
If we take CPU-hotplug into account then it is possible that between
queue_work_on() and the actual invocation of the worker the CPU goes
down and the worker will be scheduled on _another_ CPU. And here the
preempt_disable() protection does not work anymore. The easiest thing is
to add a spin_lock() to guard access to the list.
Another detail: mcryptd_queue_worker() is not processing more than
MCRYPTD_BATCH invocation in a row. If there are still items left, then
it will invoke queue_work() to proceed with more later. *I* would
suggest to simply drop that check because it does not use a system
workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if
preemption is required then the scheduler should do it.
However if queue_work() is used then the work item is marked as CPU
unbound. That means it will try to run on the local CPU but it may run
on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y.
Again, the preempt_disable() won't work here but lock which was
introduced will help.
In order to keep work-item on the local CPU (and avoid RR) I changed it
to queue_work_on().
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay. For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi. However, the
initial refcount code got the ordering wrong in the free side of things
because it queued refcount free op and an EFI and the refcount free op
queued a rmap free op, resulting in the order cui -> efi -> rui.
If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up. This didn't ever happen due to other screws up in handling
unknown owner rmap removals, but those other screw ups broke recovery in
other ways, so fix the ordering to follow the intended rules.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Unconditionally reporting a value seen on the P4 or older invokes
functionality like io_apic_get_unique_id() on 32-bit builds, resulting
in a panic() with sufficiently many CPUs and/or IO-APICs. Doing what
that function does would be the hypervisor's responsibility anyway, so
makes no sense to be used when running on Xen. Uniformly report a more
modern version; this shouldn't matter much as both LAPIC and IO-APIC are
being managed entirely / mostly by the hypervisor.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Oops is observed during boot:
Faulting instruction address: 0xc000000000248340
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850]
pc: c000000000248340: event_function_call+0x50/0x1f0
lr: c00000000024878c: perf_remove_from_context+0x3c/0x100
sp: c000000ff66fbad0
msr: 9000000000009033
dar: 7d20e2a6f92d03c0
pid = 14, comm = cpuhp/0
While registering the cpuhotplug callbacks for nest-imc, if we fail in
the cpuhotplug online path for any random node in a multi node
system (because the opal call to stop nest-imc counters fails for that
node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who
successfully returned from cpuhotplug online path.
This call trace is generated since in the ppc_nest_imc_cpu_offline()
path we are trying to migrate the event context, when nest-imc
counters are not even initialized.
Patch to add a check to ensure that nest-imc is registered before
migrating the event context.
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are effectively two ASID types:
1. The one stored in the mmu_context that goes from 0..5
2. The one programmed into the hardware that goes from 1..6
This consolidates the locations where converting between the two (by doing
a +1) to a single place which gives us a nice place to comment.
PAGE_TABLE_ISOLATION will also need to, given an ASID, know which hardware
ASID to flush for the userspace mapping.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull ARM fix from Russell King:
"Just one fix this time around, for the late commit in the merge window
that triggered a problem with qemu. Qemu is apparently also going to
receive a fix for the discovered issue"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: avoid faulting on qemu
Pull clk fixes from Stephen Boyd:
"Here's a trio of fixes:
- The runtime PM clk patches that landed this merge window forgot to
runtime resume devices that may be off while recalculating and
setting rates of child clks of whatever clk is changing rates.
- We had a NULL pointer deref in an old clk tracepoint when
clk_set_parent() is called with a NULL parent pointer. This
shouldn't really happen, but it's best to avoid this regardless.
- The sun9i-mmc clk driver didn't provide 'reset' support, just
'assert' and 'deassert' so the MMC driver stopped probing when the
probe was changed to do a reset instead of assert/deassert pair.
This implements the reset so things work again"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
clk: fix a panic error caused by accessing NULL pointer
clk: Manage proper runtime PM state in clk_change_rate()
The vk cts test:
dEQP-VK.api.external.semaphore.opaque_fd.export_multiple_times_temporary
triggers a lot of
VFS: Close: file count is 0
Dave pointed out that clearing the syncobj->file from
drm_syncobj_file_release() was sufficient to silence the test, but that
opens a can of worm since we assumed that the syncobj->file was never
unset. Stop trying to reuse the same struct file for every fd pointing
to the drm_syncobj, and allocate one file for each fd instead.
v2: Fixup return handling of drm_syncobj_fd_to_handle
v2.1: [airlied: fix possible syncobj ref race]
Reported-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The wait for data is a non-atomic operation that can sleep and therefore
potentially release the socket lock. The release of the socket lock
allows another thread to modify the context data structure. The waiting
operation for new data therefore must be called at the beginning of
recvmsg. This prevents a race condition where checks of the members of
the context data structure are performed by recvmsg while there is a
potential for modification of these values.
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents. Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
bedata->ref can't be less than zero because it's unsigned. This affects
certain error paths in probe. We first set ->ref = -1 and then we set
it to a valid value later.
Fixes: 219681909913 ("xen/pvcalls: connect to the backend")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
It's theoretically possible that branch instructions recorded in
BHRB (Branch History Rolling Buffer) entries have already been
unmapped before they are processed by the kernel. Hence, trying to
dereference such memory location will result in a crash. eg:
Unable to handle kernel paging request for data at address 0xd000000019c41764
Faulting instruction address: 0xc000000000084a14
NIP [c000000000084a14] branch_target+0x4/0x70
LR [c0000000000eb828] record_and_restart+0x568/0x5c0
Call Trace:
[c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
[c0000000000ec378] perf_event_interrupt+0x298/0x460
[c000000000027964] performance_monitor_exception+0x54/0x70
[c000000000009ba4] performance_monitor_common+0x114/0x120
Fix it by deferefencing the addresses safely.
Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB")
Cc: stable@vger.kernel.org # v3.10+
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use probe_kernel_read() which is clearer, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
First, it's nice to remove the magic numbers.
Second, PAGE_TABLE_ISOLATION is going to consume half of the available ASID
space. The space is currently unused, but add a comment to spell out this
new restriction.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull i2c fixes from Wolfram Sang:
"Here are two bugfixes for I2C, fixing a memleak in the core and irq
allocation for i801.
Also three bugfixes for the at24 eeprom driver which Bartosz collected
while taking over maintainership for this driver"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
eeprom: at24: check at24_read/write arguments
eeprom: at24: fix reading from 24MAC402/24MAC602
eeprom: at24: correctly set the size for at24mac402
i2c: i2c-boardinfo: fix memory leaks on devinfo
i2c: i801: Fix Failed to allocate irq -2147483648 error
When qemu starts a kernel in a bare environment, the default SCR has
the AW and FW bits clear, which means that the kernel can't modify
the PSR A or PSR F bits, and means that FIQs and imprecise aborts are
always masked.
When running uboot under qemu, the AW and FW SCR bits are set, and the
kernel functions normally - and this is how real hardware behaves.
Fix this for qemu by ignoring the FIQ bit.
Fixes: 8bafae202c82 ("ARM: BUG if jumping to usermode address in kernel mode")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull networking fixes from David Miller"
"What's a holiday weekend without some networking bug fixes? [1]
1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
calls, from Daniel Borkmann.
2) Fix regression from errata limiting change to marvell PHY driver,
from Zhao Qiang.
3) Fix u16 overflow in SCTP, from Xin Long.
4) Fix potential memory leak during bridge newlink, from Nikolay
Aleksandrov.
5) Fix BPF selftest build on s390, from Hendrik Brueckner.
6) Don't append to cfg80211 automatically generated certs file,
always write new ones from scratch. From Thierry Reding.
7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.
8) Fix hang on tg3 MTU change with certain chips, from Brian King.
9) Add stall detection to arc emac driver and reset chip when this
happens, from Alexander Kochetkov.
10) Fix MTU limitng in GRE tunnel drivers, from Xin Long.
11) Fix stmmac timestamping bug due to mis-shifting of field. From
Fredrik Hallenberg.
12) Fix metrics match when deleting an ipv4 route. The kernel sets
some internal metrics bits which the user isn't going to set when
it makes the delete request. From Phil Sutter.
13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
from Yelena Krivosheev.
14) Fix double free and memory corruption in get_net_ns_by_id, from
Eric W. Biederman.
15) Flush ipv4 FIB tables in the reverse order. Some tables can share
their actual backing data, in particular this happens for the MAIN
and LOCAL tables. We have to kill the LOCAL table first, because
it uses MAIN's backing memory. Fix from Ido Schimmel.
16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
Horn, and Alexei Starovoitov.
17) Make changes to ipv6 autoflowlabel sysctl really propagate to
sockets, unless the socket has set the per-socket value
explicitly. From Shaohua Li.
18) Fix leaks and double callback invocations of zerocopy SKBs, from
Willem de Bruijn"
[1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
skbuff: skb_copy_ubufs must release uarg even without user frags
skbuff: orphan frags before zerocopy clone
net: reevalulate autoflowlabel setting after sysctl setting
openvswitch: Fix pop_vlan action for double tagged frames
ipv6: Honor specified parameters in fibmatch lookup
bpf: do not allow root to mangle valid pointers
selftests/bpf: add tests for recent bugfixes
bpf: fix integer overflows
bpf: don't prune branches when a scalar is replaced with a pointer
bpf: force strict alignment checks for stack pointers
bpf: fix missing error return in check_stack_boundary()
bpf: fix 32-bit ALU op verification
bpf: fix incorrect tracking of register size truncation
bpf: fix incorrect sign extension in check_alu_op()
bpf/verifier: fix bounds calculation on BPF_RSH
ipv4: Fix use-after-free when flushing FIB tables
s390/qeth: fix error handling in checksum cmd callback
tipc: remove joining group member from congested list
selftests: net: Adding config fragment CONFIG_NUMA=y
nfp: bpf: keep track of the offloaded program
...
Our MMC host driver now issues a reset, instead of just deasserting
the reset control, since commit c34eda69ad4c ("mmc: sunxi: Reset the
device at probe time"). The sun9i-mmc clock driver does not support
this, and will fail, which results in MMC not probing.
This patch implements the reset callback by asserting the reset control,
then deasserting it after a small delay.
Fixes: 7a6fca879f59 ("clk: sunxi: Add driver for A80 MMC config clocks/resets")
Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Link: lkml.kernel.org/r/20171218035751.20661-1-wens@csie.org
drm-misc-fixes before holidays:
- fixup for the lease fixup (Keith)
- fb leak in the ww mutex fallback code (Maarten)
- sun4i fixes (Maxime, Hans)
* tag 'drm-misc-fixes-2017-12-21' of git://anongit.freedesktop.org/drm/drm-misc:
drm: move lease init after validation in drm_lease_create
drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
drm/sun4i: hdmi: Move the mode_valid callback to the encoder
drm/sun4i: Fix error path handling
drm/sun4i: validate modes for HDMI
HPFS does not set SB_I_VERSION and does not use the i_version counter
internally.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Mikulas Patocka <mikulas@twibright.com>
Reviewed-by: Mikulas Patocka <mikulas@twibright.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
All the ChaCha20 algorithms as well as the ARM bit-sliced AES-XTS
algorithms call skcipher_walk_virt(), then access the IV (walk.iv)
before checking whether any bytes need to be processed (walk.nbytes).
But if the input is empty, then skcipher_walk_virt() doesn't set the IV,
and the algorithms crash trying to use the uninitialized IV pointer.
Fix it by setting the IV earlier in skcipher_walk_virt(). Also fix it
for the AEAD walk functions.
This isn't a perfect solution because we can't actually align the IV to
->cra_alignmask unless there are bytes to process, for one because the
temporary buffer for the aligned IV is freed by skcipher_walk_done(),
which is only called when there are bytes to process. Thus, algorithms
that require aligned IVs will still need to avoid accessing the IV when
walk.nbytes == 0. Still, many algorithms/architectures are fine with
IVs having any alignment, and even for those that aren't, a misaligned
pointer bug is much less severe than an uninitialized pointer bug.
This change also matches the behavior of the older blkcipher_walk API.
Fixes: 0cabf2af6f5a ("crypto: skcipher - Fix crash on zero-length input")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen. Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.
Found by adding clonerange to fsstress and running xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Smatch complains that "len" is uninitialized if xenbus_read() fails so
let's add some error handling.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
pointers printed with %p are hashed, ie. you don't see the actual
pointer value but rather a cryptographic hash of its value.
In xmon we want to see the actual pointer values, because xmon is a
debugger, so replace %p with %px which prints the actual pointer
value.
We justify doing this in xmon because 1) xmon is a kernel crash
debugger, it's only accessible via the console 2) xmon doesn't print
to dmesg, so the pointers it prints are not able to be leaked that
way.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For flushing the TLB, the ASID which has been programmed into the hardware
must be known. That differs from what is in 'cpu_tlbstate'.
Add functions to transform the 'cpu_tlbstate' values into to the one
programmed into the hardware (CR3).
It's not easy to include mmu_context.h into tlbflush.h, so just move the
CR3 building over to tlbflush.h.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull hwmon fixes from Guenter Roeck:
"Fixes:
- Drop reference to obsolete maintainer tree
- Fix overflow bug in pmbus driver
- Fix SMBUS timeout problem in jc42 driver
For the SMBUS timeout handling, we had a brief discussion if this
should be considered a bug fix or a feature. Peter says "it fixes real
problems where the application misbehave due to faulty content when
reading from an eeprom", and he needs the patch in his company's v4.14
images. This is good enough for me and warrants backport to stable
kernels"
* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (jc42) optionally try to disable the SMBUS timeout
hwmon: (pmbus) Use 64bit math for DIRECT format values
hwmon: Drop reference to Jean's tree
Please consider pulling the following fixes for v4.15. While it doesn't
fix any regression introduced in the v4.15 merge window, we have a
feature in at24 since linux v4.8 - reading the mac address block from
at24mac series - which turned out to be not working.
This pull request contains changes that fix it together with a patch
that hardens the read and write argument sanitization with
out-of-bounds checks that were missing.
Detect if we are returning to usermode via the normal kernel exit paths
but the saved PSR value indicates that we are in kernel mode. This
could occur due to corrupted stack state, which has been observed with
"ftracetest".
This ensures that we catch the problem case before we get to user code.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull block fixes from Jens Axboe:
"It's been a few weeks, so here's a small collection of fixes that
should go into the current series.
This contains:
- NVMe pull request from Christoph, with a few important fixes.
- kyber hang fix from Omar.
- A blk-throttl fix from Shaohua, fixing a case where we double
charge a bio.
- Two call_single_data alignment fixes from me, fixing up some
unfortunate changes that went into 4.14 without being properly
reviewed on the block side (since nobody was CC'ed on the
patch...).
- A bounce buffer fix in two parts, one from me and one from Ming.
- Revert bdi debug error handling patch. It's causing boot issues for
some folks, and a week down the line, we're still no closer to a
fix. Revert this patch for now until it's figured out, then we can
retry for 4.16"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "bdi: add error handle for bdi_debug_register"
null_blk: unalign call_single_data
block: unalign call_single_data in struct request
block-throttle: avoid double charge
block: fix blk_rq_append_bio
block: don't let passthrough IO go into .make_request_fn()
nvme: setup streams after initializing namespace head
nvme: check hw sectors before setting chunk sectors
nvme: call blk_integrity_unregister after queue is cleaned up
nvme-fc: remove double put reference if admin connect fails
nvme: set discard_alignment to zero
kyber: fix another domain token wait queue hang
Pull libnvdimm fixes from Dan Williams:
"These fixes are all tagged for -stable and have received a build
success notification from the kbuild robot.
- NVDIMM namespaces, configured to enforce 1GB alignment, fail to
initialize on platforms that mis-align the start or end of the
physical address range.
- The Linux implementation of the BTT (Block Translation Table) is
incompatible with the UEFI 2.7 definition of the BTT format. The
BTT layers a software atomic sector semantic on top of an NVDIMM
namespace. Linux needs to be compatible with the UEFI definition to
enable boot support or any pre-OS access of data on a BTT enabled
namespace.
- A fix for ACPI SMART notification events, this allows a userspace
monitor to register for health events rather than poll. This has
been broken since it was initially merged as the unit test
inadvertently worked around the problem. The urgency for fixing
this during the -rc series is driven by how expensive it is to poll
for this data (System Management Mode entry)"
* 'libnvdimm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
libnvdimm, btt: Fix an incompatibility in the log layout
libnvdimm, btt: add a couple of missing kernel-doc lines
libnvdimm, dax: fix 1GB-aligned namespaces vs physical misalignment
libnvdimm, pfn: fix start_pad handling for aligned namespaces
acpi, nfit: fix health event notification
Pull x86 PTI preparatory patches from Thomas Gleixner:
"Todays Advent calendar window contains twentyfour easy to digest
patches. The original plan was to have twenty three matching the date,
but a late fixup made that moot.
- Move the cpu_entry_area mapping out of the fixmap into a separate
address space. That's necessary because the fixmap becomes too big
with NRCPUS=8192 and this caused already subtle and hard to
diagnose failures.
The top most patch is fresh from today and cures a brain slip of
that tall grumpy german greybeard, who ignored the intricacies of
32bit wraparounds.
- Limit the number of CPUs on 32bit to 64. That's insane big already,
but at least it's small enough to prevent address space issues with
the cpu_entry_area map, which have been observed and debugged with
the fixmap code
- A few TLB flush fixes in various places plus documentation which of
the TLB functions should be used for what.
- Rename the SYSENTER stack to CPU_ENTRY_AREA stack as it is used for
more than sysenter now and keeping the name makes backtraces
confusing.
- Prevent LDT inheritance on exec() by moving it to arch_dup_mmap(),
which is only invoked on fork().
- Make vysycall more robust.
- A few fixes and cleanups of the debug_pagetables code. Check
PAGE_PRESENT instead of checking the PTE for 0 and a cleanup of the
C89 initialization of the address hint array which already was out
of sync with the index enums.
- Move the ESPFIX init to a different place to prepare for PTI.
- Several code moves with no functional change to make PTI
integration simpler and header files less convoluted.
- Documentation fixes and clarifications"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86/cpu_entry_area: Prevent wraparound in setup_cpu_entry_area_ptes() on 32bit
init: Invoke init_espfix_bsp() from mm_init()
x86/cpu_entry_area: Move it out of the fixmap
x86/cpu_entry_area: Move it to a separate unit
x86/mm: Create asm/invpcid.h
x86/mm: Put MMU to hardware ASID translation in one place
x86/mm: Remove hard-coded ASID limit checks
x86/mm: Move the CR3 construction functions to tlbflush.h
x86/mm: Add comments to clarify which TLB-flush functions are supposed to flush what
x86/mm: Remove superfluous barriers
x86/mm: Use __flush_tlb_one() for kernel memory
x86/microcode: Dont abuse the TLB-flush interface
x86/uv: Use the right TLB-flush API
x86/entry: Rename SYSENTER_stack to CPU_ENTRY_AREA_entry_stack
x86/doc: Remove obvious weirdnesses from the x86 MM layout documentation
x86/mm/64: Improve the memory map documentation
x86/ldt: Prevent LDT inheritance on exec
x86/ldt: Rework locking
arch, mm: Allow arch_dup_mmap() to fail
x86/vsyscall/64: Warn and fail vsyscall emulation in NATIVE mode
...
Due to a spec misinterpretation, the Linux implementation of the BTT log
area had different padding scheme from other implementations, such as
UEFI and NVML.
This fixes the padding scheme, and defaults to it for new BTT layouts.
We attempt to detect the padding scheme in use when probing for an
existing BTT. If we detect the older/incompatible scheme, we continue
using it.
Reported-by: Juston Li <juston.li@intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: <stable@vger.kernel.org>
Fixes: 5212e11fde4d ("nd_btt: atomic sector updates")
Signed-off-by: Vishal Verma <vishal.l.verma@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull powerpc fixes from Michael Ellerman:
"This is all fairly boring, except that there's two KVM fixes that
you'd normally get via Paul's kvm-ppc tree. He's away so I picked them
up. I was waiting to see if he would apply them, which is why they
have only been in my tree since today. But they were on the list for a
while and have been tested on the relevant hardware.
Of note is two fixes for KVM XIVE (Power9 interrupt controller). These
would normally go via the KVM tree but Paul is away so I've picked
them up.
Other than that, two fixes for error handling in the IMC driver, and
one for a potential oops in the BHRB code if the hardware records a
branch address that has subsequently been unmapped, and finally a
s/%p/%px/ in our oops code.
Thanks to: Anju T Sudhakar, Cédric Le Goater, Laurent Vivier, Madhavan
Srinivasan, Naveen N. Rao, Ravi Bangoria"
* tag 'powerpc-4.15-5' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix pending_pri value in kvmppc_xive_get_icp()
KVM: PPC: Book3S: fix XIVE migration of pending interrupts
powerpc/kernel: Print actual address of regs when oopsing
powerpc/perf: Fix kfree memory allocated for nest pmus
powerpc/perf/imc: Fix nest-imc cpuhotplug callback failure
powerpc/perf: Dereference BHRB entries safely
The loop which populates the CPU entry area PMDs can wrap around on 32bit
machines when the number of CPUs is small.
It worked wonderful for NR_CPUS=64 for whatever reason and the moron who
wrote that code did not bother to test it with !SMP.
Check for the wraparound to fix it.
Fixes: 92a0f81d8957 ("x86/cpu_entry_area: Move it out of the fixmap")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Thomas "Feels stupid" Gleixner <tglx@linutronix.de>
Tested-by: Borislav Petkov <bp@alien8.de>
Pull xen fixes from Juergen Gross:
"This contains two fixes for running under Xen:
- a fix avoiding resource conflicts between adding mmio areas and
memory hotplug
- a fix setting NX bits in page table entries copied from Xen when
running a PV guest"
* tag 'for-linus-4.15-rc5-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: Mark unallocated host memory as UNUSABLE
x86-64/Xen: eliminate W+X mappings
When we migrate a VM from a POWER8 host (XICS) to a POWER9 host
(XICS-on-XIVE), we have an error:
qemu-kvm: Unable to restore KVM interrupt controller state \
(0xff000000) for CPU 0: Invalid argument
This is because kvmppc_xics_set_icp() checks the new state
is internaly consistent, and especially:
...
1129 if (xisr == 0) {
1130 if (pending_pri != 0xff)
1131 return -EINVAL;
...
On the other side, kvmppc_xive_get_icp() doesn't set
neither the pending_pri value, nor the xisr value (set to 0)
(and kvmppc_xive_set_icp() ignores the pending_pri value)
As xisr is 0, pending_pri must be set to 0xff.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
init_espfix_bsp() needs to be invoked before the page table isolation
initialization. Move it into mm_init() which is the place where pti_init()
will be added.
While at it get rid of the #ifdeffery and provide proper stub functions.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The following namespace configuration attempt:
# ndctl create-namespace -e namespace0.0 -m devdax -a 1G -f
libndctl: ndctl_dax_enable: dax0.1: failed to enable
Error: namespace0.0: failed to enable
failed to reconfigure namespace: No such device or address
...fails when the backing memory range is not physically aligned to 1G:
# cat /proc/iomem | grep Persistent
210000000-30fffffff : Persistent Memory (legacy)
In the above example the 4G persistent memory range starts and ends on a
256MB boundary.
We handle this case correctly when needing to handle cases that violate
section alignment (128MB) collisions against "System RAM", and we simply
need to extend that padding/truncation for the 1GB alignment use case.
Cc: <stable@vger.kernel.org>
Fixes: 315c562536c4 ("libnvdimm, pfn: add 'align' attribute...")
Reported-and-tested-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull xfs fixes from Darrick Wong:
"Here are some XFS fixes for 4.15-rc5. Apologies for the unusually
large number of patches this late, but I wanted to make sure the
corruption fixes were really ready to go.
Changes since last update:
- Fix a locking problem during xattr block conversion that could lead
to the log checkpointing thread to try to write an incomplete
buffer to disk, which leads to a corruption shutdown
- Fix a null pointer dereference when removing delayed allocation
extents
- Remove post-eof speculative allocations when reflinking a block
past current inode size so that we don't just leave them there and
assert on inode reclaim
- Relax an assert which didn't accurately reflect the way locking
works and would trigger under heavy io load
- Avoid infinite loop when cancelling copy on write extents after a
writeback failure
- Try to avoid copy on write transaction reservation overflows when
remapping after a successful write
- Fix various problems with the copy-on-write reservation automatic
garbage collection not being cleaned up properly during a ro
remount
- Fix problems with rmap log items being processed in the wrong
order, leading to corruption shutdowns
- Fix problems with EFI recovery wherein the "remove any rmapping if
present" mechanism wasn't actually doing anything, which would lead
to corruption problems later when the extent is reallocated,
leading to multiple rmaps for the same extent"
* tag 'xfs-4.15-fixes-8' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: only skip rmap owner checks for unknown-owner rmap removal
xfs: always honor OWN_UNKNOWN rmap removal requests
xfs: queue deferred rmap ops for cow staging extent alloc/free in the right order
xfs: set cowblocks tag for direct cow writes too
xfs: remove leftover CoW reservations when remounting ro
xfs: don't be so eager to clear the cowblocks tag on truncate
xfs: track cowblocks separately in i_flags
xfs: allow CoW remap transactions to use reserve blocks
xfs: avoid infinite loop when cancelling CoW blocks after writeback failure
xfs: relax is_reflink_inode assert in xfs_reflink_find_cow_mapping
xfs: remove dest file's post-eof preallocations before reflinking
xfs: move xfs_iext_insert tracepoint to report useful information
xfs: account for null transactions in bunmapi
xfs: hold xfs_buf locked between shortform->leaf conversion and the addition of an attribute
xfs: add the ability to join a held buffer to a defer_ops
Commit f5775e0b6116 ("x86/xen: discard RAM regions above the maximum
reservation") left host memory not assigned to dom0 as available for
memory hotplug.
Unfortunately this also meant that those regions could be used by
others. Specifically, commit fa564ad96366 ("x86/PCI: Enable a 64bit BAR
on AMD Family 15h (Models 00-1f, 30-3f, 60-7f)") may try to map those
addresses as MMIO.
To prevent this mark unallocated host memory as E820_TYPE_UNUSABLE (thus
effectively reverting f5775e0b6116) and keep track of that region as
a hostmem resource that can be used for the hotplug.
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
When restoring a pending interrupt, we are setting the Q bit to force
a retrigger in xive_finish_unmask(). But we also need to force an EOI
in this case to reach the same initial state : P=1, Q=0.
This can be done by not setting 'old_p' for pending interrupts which
will inform xive_finish_unmask() that an EOI needs to be sent.
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller")
Cc: stable@vger.kernel.org # v4.12+
Suggested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Laurent Vivier <lvivier@redhat.com>
Tested-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Put the cpu_entry_area into a separate P4D entry. The fixmap gets too big
and 0-day already hit a case where the fixmap PTEs were cleared by
cleanup_highmap().
Aside of that the fixmap API is a pain as it's all backwards.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The alignment checks at pfn driver startup fail to properly account for
the 'start_pad' in the case where the namespace is misaligned relative
to its internal alignment. This is typically triggered in 1G aligned
namespace, but could theoretically trigger with small namespace
alignments. When this triggers the kernel reports messages of the form:
dax2.1: bad offset: 0x3c000000 dax disabled align: 0x40000000
Cc: <stable@vger.kernel.org>
Fixes: 1ee6667cd8d1 ("libnvdimm, pfn, dax: fix initialization vs autodetect...")
Reported-by: Jane Chu <jane.chu@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull crypto fixes from Herbert Xu:
"This fixes the following issues:
- fix chacha20 crash on zero-length input due to unset IV
- fix potential race conditions in mcryptd with spinlock
- only wait once at top of algif recvmsg to avoid inconsistencies
- fix potential use-after-free in algif_aead/algif_skcipher"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - fix race accessing cipher request
crypto: mcryptd - protect the per-CPU queue with a lock
crypto: af_alg - wait for data at beginning of recvmsg
crypto: skcipher - set walk.iv for zero-length inputs
A few thousand such pages are usually left around due to the re-use of
L1 tables having been provided by the hypervisor (Dom0) or tool stack
(DomU). Set NX in the direct map variant, which needs to be done in L2
due to the dual use of the re-used L1s.
For x86_configure_nx() to actually do what it is supposed to do, call
get_cpu_cap() first. This was broken by commit 4763ed4d45 ("x86, mm:
Clean up and simplify NX enablement") when switching away from the
direct EFER read.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
When we oops or otherwise call show_regs() we print the address of the
regs structure. Being able to see the address is fairly useful,
firstly to verify that the regs pointer is not completely bogus, and
secondly it allows you to dump the regs and surrounding memory with a
debugger if you have one.
In the normal case the regs will be located somewhere on the stack, so
printing their location discloses no further information than printing
the stack pointer does already.
So switch to %px and print the actual address, not the hashed value.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Separate the cpu_entry_area code out of cpu/common.c and the fixmap.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Integration testing with a BIOS that generates injected health event
notifications fails to communicate those events to userspace. The nfit
driver neglects to link the ACPI DIMM device with the necessary driver
data so acpi_nvdimm_notify() fails this lookup:
nfit_mem = dev_get_drvdata(dev);
if (nfit_mem && nfit_mem->flags_attr)
sysfs_notify_dirent(nfit_mem->flags_attr);
Add the necessary linkage when installing the notification handler and
clean it up when the nfit driver instance is torn down.
Cc: <stable@vger.kernel.org>
Cc: Toshi Kani <toshi.kani@hpe.com>
Cc: Vishal Verma <vishal.l.verma@intel.com>
Fixes: ba9c8dd3c222 ("acpi, nfit: add dimm device notification support")
Reported-by: Daniel Osawa <daniel.k.osawa@intel.com>
Tested-by: Daniel Osawa <daniel.k.osawa@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Pull pin control fix from Linus Walleij:
"A single pin control fix for Intel machines, affecting a bunch of
Chromebooks. Nothing else collected up amazingly"
* tag 'pinctrl-v4.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: cherryview: Mask all interrupts on Intel_Strago based systems
When invoking an asynchronous cipher operation, the invocation of the
callback may be performed before the subsequent operations in the
initial code path are invoked. The callback deletes the cipher request
data structure which implies that after the invocation of the
asynchronous cipher operation, this data structure must not be accessed
any more.
The setting of the return code size with the request data structure must
therefore be moved before the invocation of the asynchronous cipher
operation.
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Calling xfs_rmap_free with an unknown owner is supposed to remove any
rmaps covering that range regardless of owner. This is used by the EFI
recovery code to say "we're freeing this, it mustn't be owned by
anything anymore", but for whatever reason xfs_free_ag_extent filters
them out.
Therefore, remove the filter and make xfs_rmap_unmap actually treat it
as a wildcard owner -- free anything that's already there, and if
there's no owner at all then that's fine too.
There are two existing callers of bmap_add_free that take care the rmap
deferred ops themselves and use OWN_UNKNOWN to skip the EFI-based rmap
cleanup; convert these to use OWN_NULL (via helpers), and now we really
require that an RUI (if any) gets added to the defer ops before any EFI.
Lastly, now that xfs_free_extent filters out OWN_NULL rmap free requests,
growfs will have to consult directly with the rmap to ensure that there
aren't any rmaps in the grown region.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
imc_common_cpuhp_mem_free() is the common function for all
IMC (In-memory Collection counters) domains to unregister cpuhotplug
callback and free memory. Since kfree of memory allocated for
nest-imc (per_nest_pmu_arr) is in the common code, all
domains (core/nest/thread) can do the kfree in the failure case.
This could potentially create a call trace as shown below, where
core(/thread/nest) imc pmu initialization fails and in the failure
path imc_common_cpuhp_mem_free() free the memory(per_nest_pmu_arr),
which is allocated by successfully registered nest units.
The call trace is generated in a scenario where core-imc
initialization is made to fail and a cpuhotplug is performed in a p9
system. During cpuhotplug ppc_nest_imc_cpu_offline() tries to access
per_nest_pmu_arr, which is already freed by core-imc.
NIP [c000000000cb6a94] mutex_lock+0x34/0x90
LR [c000000000cb6a88] mutex_lock+0x28/0x90
Call Trace:
mutex_lock+0x28/0x90 (unreliable)
perf_pmu_migrate_context+0x90/0x3a0
ppc_nest_imc_cpu_offline+0x190/0x1f0
cpuhp_invoke_callback+0x160/0x820
cpuhp_thread_fun+0x1bc/0x270
smpboot_thread_fn+0x250/0x290
kthread+0x1a8/0x1b0
ret_from_kernel_thread+0x5c/0x74
To address this scenario do the kfree(per_nest_pmu_arr) only in case
of nest-imc initialization failure, and when there is no other nest
units registered.
Fixes: 73ce9aec65b1 ("powerpc/perf: Fix IMC_MAX_PMU macro")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Unclutter tlbflush.h a little.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull drm fixes from Dave Airlie:
"I've got most of two weeks worth of fixes here due to being on
holidays last week.
The main things are:
- Core:
* Syncobj fd reference count fix
* Leasing ioctl misuse fix
- nouveau regression fixes
- further amdgpu DC fixes
- sun4i regression fixes
I'm not sure I'll see many fixes over next couple of weeks, we'll see
how we go"
* tag 'drm-fixes-for-v4.15-rc5' of git://people.freedesktop.org/~airlied/linux: (27 commits)
drm/syncobj: Stop reusing the same struct file for all syncobj -> fd
drm: move lease init after validation in drm_lease_create
drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
drm/sun4i: hdmi: Move the mode_valid callback to the encoder
drm/nouveau: fix obvious memory leak
drm/i915: Protect DDI port to DPLL map from theoretical race.
drm/i915/lpe: Remove double-encapsulation of info string
drm/sun4i: Fix error path handling
drm/nouveau: use alternate memory type for system-memory buffers with kind != 0
drm/nouveau: avoid GPU page sizes > PAGE_SIZE for buffer objects in host memory
drm/nouveau/mmu/gp10b: use correct implementation
drm/nouveau/pci: do a msi rearm on init
drm/nouveau/imem/nv50: fix refcount_t warning
drm/nouveau/bios/dp: support DP Info Table 2.0
drm/nouveau/fbcon: fix NULL pointer access in nouveau_fbcon_destroy
drm/amd/display: Fix rehook MST display not light back on
drm/amd/display: fix missing pixel clock adjustment for dongle
drm/amd/display: set chroma taps to 1 when not scaling
drm/amd/display: add pipe locking before front end programing
drm/sun4i: validate modes for HDMI
...
Guenter Roeck reported an interrupt storm on a prototype system which is
based on Cyan Chromebook. The root cause turned out to be a incorrectly
configured pin that triggers spurious interrupts. This will be fixed in
coreboot but currently we need to prevent the interrupt storm from
happening by masking all interrupts (but not GPEs) on those systems.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=197953
Fixes: bcb48cca23ec ("pinctrl: cherryview: Do not mask all interrupts in probe")
Reported-and-tested-by: Guenter Roeck <linux@roeck-us.net>
Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
mcryptd_enqueue_request() grabs the per-CPU queue struct and protects
access to it with disabled preemption. Then it schedules a worker on the
same CPU. The worker in mcryptd_queue_worker() guards access to the same
per-CPU variable with disabled preemption.
If we take CPU-hotplug into account then it is possible that between
queue_work_on() and the actual invocation of the worker the CPU goes
down and the worker will be scheduled on _another_ CPU. And here the
preempt_disable() protection does not work anymore. The easiest thing is
to add a spin_lock() to guard access to the list.
Another detail: mcryptd_queue_worker() is not processing more than
MCRYPTD_BATCH invocation in a row. If there are still items left, then
it will invoke queue_work() to proceed with more later. *I* would
suggest to simply drop that check because it does not use a system
workqueue and the workqueue is already marked as "CPU_INTENSIVE". And if
preemption is required then the scheduler should do it.
However if queue_work() is used then the work item is marked as CPU
unbound. That means it will try to run on the local CPU but it may run
on another CPU as well. Especially with CONFIG_DEBUG_WQ_FORCE_RR_CPU=y.
Again, the preempt_disable() won't work here but lock which was
introduced will help.
In order to keep work-item on the local CPU (and avoid RR) I changed it
to queue_work_on().
Cc: stable@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Under the deferred rmap operation scheme, there's a certain order in
which the rmap deferred ops have to be queued to maintain integrity
during log replay. For alloc/map operations that order is cui -> rui;
for free/unmap operations that order is cui -> rui -> efi. However, the
initial refcount code got the ordering wrong in the free side of things
because it queued refcount free op and an EFI and the refcount free op
queued a rmap free op, resulting in the order cui -> efi -> rui.
If we fail before the efd finishes, the efi recovery will try to do a
wildcard rmap removal and the subsequent rui will fail to find the rmap
and blow up. This didn't ever happen due to other screws up in handling
unknown owner rmap removals, but those other screw ups broke recovery in
other ways, so fix the ordering to follow the intended rules.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Unconditionally reporting a value seen on the P4 or older invokes
functionality like io_apic_get_unique_id() on 32-bit builds, resulting
in a panic() with sufficiently many CPUs and/or IO-APICs. Doing what
that function does would be the hypervisor's responsibility anyway, so
makes no sense to be used when running on Xen. Uniformly report a more
modern version; this shouldn't matter much as both LAPIC and IO-APIC are
being managed entirely / mostly by the hypervisor.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Oops is observed during boot:
Faulting instruction address: 0xc000000000248340
cpu 0x0: Vector: 380 (Data Access Out of Range) at [c000000ff66fb850]
pc: c000000000248340: event_function_call+0x50/0x1f0
lr: c00000000024878c: perf_remove_from_context+0x3c/0x100
sp: c000000ff66fbad0
msr: 9000000000009033
dar: 7d20e2a6f92d03c0
pid = 14, comm = cpuhp/0
While registering the cpuhotplug callbacks for nest-imc, if we fail in
the cpuhotplug online path for any random node in a multi node
system (because the opal call to stop nest-imc counters fails for that
node), ppc_nest_imc_cpu_offline() will get invoked for other nodes who
successfully returned from cpuhotplug online path.
This call trace is generated since in the ppc_nest_imc_cpu_offline()
path we are trying to migrate the event context, when nest-imc
counters are not even initialized.
Patch to add a check to ensure that nest-imc is registered before
migrating the event context.
Fixes: 885dcd709ba9 ("powerpc/perf: Add nest IMC PMU support")
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
There are effectively two ASID types:
1. The one stored in the mmu_context that goes from 0..5
2. The one programmed into the hardware that goes from 1..6
This consolidates the locations where converting between the two (by doing
a +1) to a single place which gives us a nice place to comment.
PAGE_TABLE_ISOLATION will also need to, given an ASID, know which hardware
ASID to flush for the userspace mapping.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull clk fixes from Stephen Boyd:
"Here's a trio of fixes:
- The runtime PM clk patches that landed this merge window forgot to
runtime resume devices that may be off while recalculating and
setting rates of child clks of whatever clk is changing rates.
- We had a NULL pointer deref in an old clk tracepoint when
clk_set_parent() is called with a NULL parent pointer. This
shouldn't really happen, but it's best to avoid this regardless.
- The sun9i-mmc clk driver didn't provide 'reset' support, just
'assert' and 'deassert' so the MMC driver stopped probing when the
probe was changed to do a reset instead of assert/deassert pair.
This implements the reset so things work again"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi: sun9i-mmc: Implement reset callback for reset controls
clk: fix a panic error caused by accessing NULL pointer
clk: Manage proper runtime PM state in clk_change_rate()
The vk cts test:
dEQP-VK.api.external.semaphore.opaque_fd.export_multiple_times_temporary
triggers a lot of
VFS: Close: file count is 0
Dave pointed out that clearing the syncobj->file from
drm_syncobj_file_release() was sufficient to silence the test, but that
opens a can of worm since we assumed that the syncobj->file was never
unset. Stop trying to reuse the same struct file for every fd pointing
to the drm_syncobj, and allocate one file for each fd instead.
v2: Fixup return handling of drm_syncobj_fd_to_handle
v2.1: [airlied: fix possible syncobj ref race]
Reported-by: Dave Airlie <airlied@redhat.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The wait for data is a non-atomic operation that can sleep and therefore
potentially release the socket lock. The release of the socket lock
allows another thread to modify the context data structure. The waiting
operation for new data therefore must be called at the beginning of
recvmsg. This prevents a race condition where checks of the members of
the context data structure are performed by recvmsg while there is a
potential for modification of these values.
Fixes: e870456d8e7c ("crypto: algif_skcipher - overhaul memory management")
Fixes: d887c52d6ae4 ("crypto: algif_aead - overhaul memory management")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Stephan Mueller <smueller@chronox.de>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
If a user performs a direct CoW write, we end up loading the CoW fork
with preallocated extents. Therefore, we must set the cowblocks tag so
that they can be cleared out if we run low on space.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
bedata->ref can't be less than zero because it's unsigned. This affects
certain error paths in probe. We first set ->ref = -1 and then we set
it to a valid value later.
Fixes: 219681909913 ("xen/pvcalls: connect to the backend")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
It's theoretically possible that branch instructions recorded in
BHRB (Branch History Rolling Buffer) entries have already been
unmapped before they are processed by the kernel. Hence, trying to
dereference such memory location will result in a crash. eg:
Unable to handle kernel paging request for data at address 0xd000000019c41764
Faulting instruction address: 0xc000000000084a14
NIP [c000000000084a14] branch_target+0x4/0x70
LR [c0000000000eb828] record_and_restart+0x568/0x5c0
Call Trace:
[c0000000000eb3b4] record_and_restart+0xf4/0x5c0 (unreliable)
[c0000000000ec378] perf_event_interrupt+0x298/0x460
[c000000000027964] performance_monitor_exception+0x54/0x70
[c000000000009ba4] performance_monitor_common+0x114/0x120
Fix it by deferefencing the addresses safely.
Fixes: 691231846ceb ("powerpc/perf: Fix setting of "to" addresses for BHRB")
Cc: stable@vger.kernel.org # v3.10+
Suggested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
[mpe: Use probe_kernel_read() which is clearer, tweak change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
First, it's nice to remove the magic numbers.
Second, PAGE_TABLE_ISOLATION is going to consume half of the available ASID
space. The space is currently unused, but add a comment to spell out this
new restriction.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Dave Hansen <dave.hansen@intel.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull i2c fixes from Wolfram Sang:
"Here are two bugfixes for I2C, fixing a memleak in the core and irq
allocation for i801.
Also three bugfixes for the at24 eeprom driver which Bartosz collected
while taking over maintainership for this driver"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
eeprom: at24: check at24_read/write arguments
eeprom: at24: fix reading from 24MAC402/24MAC602
eeprom: at24: correctly set the size for at24mac402
i2c: i2c-boardinfo: fix memory leaks on devinfo
i2c: i801: Fix Failed to allocate irq -2147483648 error
When qemu starts a kernel in a bare environment, the default SCR has
the AW and FW bits clear, which means that the kernel can't modify
the PSR A or PSR F bits, and means that FIQs and imprecise aborts are
always masked.
When running uboot under qemu, the AW and FW SCR bits are set, and the
kernel functions normally - and this is how real hardware behaves.
Fix this for qemu by ignoring the FIQ bit.
Fixes: 8bafae202c82 ("ARM: BUG if jumping to usermode address in kernel mode")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull networking fixes from David Miller"
"What's a holiday weekend without some networking bug fixes? [1]
1) Fix some eBPF JIT bugs wrt. SKB pointers across helper function
calls, from Daniel Borkmann.
2) Fix regression from errata limiting change to marvell PHY driver,
from Zhao Qiang.
3) Fix u16 overflow in SCTP, from Xin Long.
4) Fix potential memory leak during bridge newlink, from Nikolay
Aleksandrov.
5) Fix BPF selftest build on s390, from Hendrik Brueckner.
6) Don't append to cfg80211 automatically generated certs file,
always write new ones from scratch. From Thierry Reding.
7) Fix sleep in atomic in mac80211 hwsim, from Jia-Ju Bai.
8) Fix hang on tg3 MTU change with certain chips, from Brian King.
9) Add stall detection to arc emac driver and reset chip when this
happens, from Alexander Kochetkov.
10) Fix MTU limitng in GRE tunnel drivers, from Xin Long.
11) Fix stmmac timestamping bug due to mis-shifting of field. From
Fredrik Hallenberg.
12) Fix metrics match when deleting an ipv4 route. The kernel sets
some internal metrics bits which the user isn't going to set when
it makes the delete request. From Phil Sutter.
13) mvneta driver loop over RX queues limits on "txq_number" :-) Fix
from Yelena Krivosheev.
14) Fix double free and memory corruption in get_net_ns_by_id, from
Eric W. Biederman.
15) Flush ipv4 FIB tables in the reverse order. Some tables can share
their actual backing data, in particular this happens for the MAIN
and LOCAL tables. We have to kill the LOCAL table first, because
it uses MAIN's backing memory. Fix from Ido Schimmel.
16) Several eBPF verifier value tracking fixes, from Edward Cree, Jann
Horn, and Alexei Starovoitov.
17) Make changes to ipv6 autoflowlabel sysctl really propagate to
sockets, unless the socket has set the per-socket value
explicitly. From Shaohua Li.
18) Fix leaks and double callback invocations of zerocopy SKBs, from
Willem de Bruijn"
[1] Is this a trick question? "Relaxing"? "Quiet"? "Fine"? - Linus.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (77 commits)
skbuff: skb_copy_ubufs must release uarg even without user frags
skbuff: orphan frags before zerocopy clone
net: reevalulate autoflowlabel setting after sysctl setting
openvswitch: Fix pop_vlan action for double tagged frames
ipv6: Honor specified parameters in fibmatch lookup
bpf: do not allow root to mangle valid pointers
selftests/bpf: add tests for recent bugfixes
bpf: fix integer overflows
bpf: don't prune branches when a scalar is replaced with a pointer
bpf: force strict alignment checks for stack pointers
bpf: fix missing error return in check_stack_boundary()
bpf: fix 32-bit ALU op verification
bpf: fix incorrect tracking of register size truncation
bpf: fix incorrect sign extension in check_alu_op()
bpf/verifier: fix bounds calculation on BPF_RSH
ipv4: Fix use-after-free when flushing FIB tables
s390/qeth: fix error handling in checksum cmd callback
tipc: remove joining group member from congested list
selftests: net: Adding config fragment CONFIG_NUMA=y
nfp: bpf: keep track of the offloaded program
...
Our MMC host driver now issues a reset, instead of just deasserting
the reset control, since commit c34eda69ad4c ("mmc: sunxi: Reset the
device at probe time"). The sun9i-mmc clock driver does not support
this, and will fail, which results in MMC not probing.
This patch implements the reset callback by asserting the reset control,
then deasserting it after a small delay.
Fixes: 7a6fca879f59 ("clk: sunxi: Add driver for A80 MMC config clocks/resets")
Cc: <stable@vger.kernel.org> # 4.14.x
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Acked-by: Philipp Zabel <p.zabel@pengutronix.de>
Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Signed-off-by: Michael Turquette <mturquette@baylibre.com>
Link: lkml.kernel.org/r/20171218035751.20661-1-wens@csie.org
drm-misc-fixes before holidays:
- fixup for the lease fixup (Keith)
- fb leak in the ww mutex fallback code (Maarten)
- sun4i fixes (Maxime, Hans)
* tag 'drm-misc-fixes-2017-12-21' of git://anongit.freedesktop.org/drm/drm-misc:
drm: move lease init after validation in drm_lease_create
drm/plane: Make framebuffer refcounting the responsibility of setplane_internal callers
drm/sun4i: hdmi: Move the mode_valid callback to the encoder
drm/sun4i: Fix error path handling
drm/sun4i: validate modes for HDMI
All the ChaCha20 algorithms as well as the ARM bit-sliced AES-XTS
algorithms call skcipher_walk_virt(), then access the IV (walk.iv)
before checking whether any bytes need to be processed (walk.nbytes).
But if the input is empty, then skcipher_walk_virt() doesn't set the IV,
and the algorithms crash trying to use the uninitialized IV pointer.
Fix it by setting the IV earlier in skcipher_walk_virt(). Also fix it
for the AEAD walk functions.
This isn't a perfect solution because we can't actually align the IV to
->cra_alignmask unless there are bytes to process, for one because the
temporary buffer for the aligned IV is freed by skcipher_walk_done(),
which is only called when there are bytes to process. Thus, algorithms
that require aligned IVs will still need to avoid accessing the IV when
walk.nbytes == 0. Still, many algorithms/architectures are fine with
IVs having any alignment, and even for those that aren't, a misaligned
pointer bug is much less severe than an uninitialized pointer bug.
This change also matches the behavior of the older blkcipher_walk API.
Fixes: 0cabf2af6f5a ("crypto: skcipher - Fix crash on zero-length input")
Reported-by: syzbot <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # v4.14+
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When we're remounting the filesystem readonly, remove all CoW
preallocations prior to going ro. If the fs goes down after the ro
remount, we never clean up the staging extents, which means xfs_check
will trip over them on a subsequent run. Practically speaking, the next
mount will clean them up too, so this is unlikely to be seen. Since we
shut down the cowblocks cleaner on remount-ro, we also have to make sure
we start it back up if/when we remount-rw.
Found by adding clonerange to fsstress and running xfs/017.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Smatch complains that "len" is uninitialized if xenbus_read() fails so
let's add some error handling.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
pointers printed with %p are hashed, ie. you don't see the actual
pointer value but rather a cryptographic hash of its value.
In xmon we want to see the actual pointer values, because xmon is a
debugger, so replace %p with %px which prints the actual pointer
value.
We justify doing this in xmon because 1) xmon is a kernel crash
debugger, it's only accessible via the console 2) xmon doesn't print
to dmesg, so the pointers it prints are not able to be leaked that
way.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
For flushing the TLB, the ASID which has been programmed into the hardware
must be known. That differs from what is in 'cpu_tlbstate'.
Add functions to transform the 'cpu_tlbstate' values into to the one
programmed into the hardware (CR3).
It's not easy to include mmu_context.h into tlbflush.h, so just move the
CR3 building over to tlbflush.h.
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: David Laight <David.Laight@aculab.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: Eduardo Valentin <eduval@amazon.com>
Cc: Greg KH <gregkh@linuxfoundation.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: aliguori@amazon.com
Cc: daniel.gruss@iaik.tugraz.at
Cc: hughd@google.com
Cc: keescook@google.com
Cc: linux-mm@kvack.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull hwmon fixes from Guenter Roeck:
"Fixes:
- Drop reference to obsolete maintainer tree
- Fix overflow bug in pmbus driver
- Fix SMBUS timeout problem in jc42 driver
For the SMBUS timeout handling, we had a brief discussion if this
should be considered a bug fix or a feature. Peter says "it fixes real
problems where the application misbehave due to faulty content when
reading from an eeprom", and he needs the patch in his company's v4.14
images. This is good enough for me and warrants backport to stable
kernels"
* tag 'hwmon-for-linus-v4.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (jc42) optionally try to disable the SMBUS timeout
hwmon: (pmbus) Use 64bit math for DIRECT format values
hwmon: Drop reference to Jean's tree
Please consider pulling the following fixes for v4.15. While it doesn't
fix any regression introduced in the v4.15 merge window, we have a
feature in at24 since linux v4.8 - reading the mac address block from
at24mac series - which turned out to be not working.
This pull request contains changes that fix it together with a patch
that hardens the read and write argument sanitization with
out-of-bounds checks that were missing.
Detect if we are returning to usermode via the normal kernel exit paths
but the saved PSR value indicates that we are in kernel mode. This
could occur due to corrupted stack state, which has been observed with
"ftracetest".
This ensures that we catch the problem case before we get to user code.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull block fixes from Jens Axboe:
"It's been a few weeks, so here's a small collection of fixes that
should go into the current series.
This contains:
- NVMe pull request from Christoph, with a few important fixes.
- kyber hang fix from Omar.
- A blk-throttl fix from Shaohua, fixing a case where we double
charge a bio.
- Two call_single_data alignment fixes from me, fixing up some
unfortunate changes that went into 4.14 without being properly
reviewed on the block side (since nobody was CC'ed on the
patch...).
- A bounce buffer fix in two parts, one from me and one from Ming.
- Revert bdi debug error handling patch. It's causing boot issues for
some folks, and a week down the line, we're still no closer to a
fix. Revert this patch for now until it's figured out, then we can
retry for 4.16"
* 'for-linus' of git://git.kernel.dk/linux-block:
Revert "bdi: add error handle for bdi_debug_register"
null_blk: unalign call_single_data
block: unalign call_single_data in struct request
block-throttle: avoid double charge
block: fix blk_rq_append_bio
block: don't let passthrough IO go into .make_request_fn()
nvme: setup streams after initializing namespace head
nvme: check hw sectors before setting chunk sectors
nvme: call blk_integrity_unregister after queue is cleaned up
nvme-fc: remove double put reference if admin connect fails
nvme: set discard_alignment to zero
kyber: fix another domain token wait queue hang