commits
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix potential NULL pointer dereference in the auxtrace option parser
- Fix access to PID in an array when setting a PID filter in 'perf ftrace'
- Fix error return code in the 'perf data' tool and in maps__clone(),
found using a static analysis tool from Huawei
* tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf map: Fix error return code in maps__clone()
perf ftrace: Fix access to pid in array when setting a pid filter
perf auxtrace: Fix potential NULL pointer dereference
perf data: Fix error return code in perf_data__create_dir()
Pull x86 perf fixes from Borislav Petkov:
- Fix Broadwell Xeon's stepping in the PEBS isolation table of CPUs
- Fix a panic when initializing perf uncore machinery on Haswell and
Broadwell servers
* tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
Although 'err' has been initialized to -ENOMEM, but it will be reassigned
by the "err = unwind__prepare_access(...)" statement in the for loop. So
that, the value of 'err' is unknown when map__clone() failed.
Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked thread")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhen lei <thunder.leizhen@huawei.com>
Link: http://lore.kernel.org/lkml/20210415092744.3793-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull locking fix from Borislav Petkov:
"Fix ordering in the queued writer lock's slowpath"
* tag 'locking_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/qrwlock: Fix ordering in queued_write_lock_slowpath()
The only stepping of Broadwell Xeon parts is stepping 1. Fix the
relevant isolation_ucodes[] entry, which previously enumerated
stepping 2.
Although the original commit was characterized as an optimization, it
is also a workaround for a correctness issue.
If a PMI arrives between kvm's call to perf_guest_get_msrs() and the
subsequent VM-entry, a stale value for the IA32_PEBS_ENABLE MSR may be
restored at the next VM-exit. This is because, unbeknownst to kvm, PMI
throttling may clear bits in the IA32_PEBS_ENABLE MSR. CPUs with "PEBS
isolation" don't suffer from this issue, because perf_guest_get_msrs()
doesn't report the IA32_PEBS_ENABLE value.
Fixes: 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Peter Shier <pshier@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210422001834.1748319-1-jmattson@google.com
Command 'perf ftrace -v -- ls' fails in s390 (at least 5.12.0rc6).
The root cause is a missing pointer dereference which causes an
array element address to be used as PID.
Fix this by extracting the PID.
Output before:
# ./perf ftrace -v -- ls
function_graph tracer is used
write '-263732416' to tracing/set_ftrace_pid failed: Invalid argument
failed to set ftrace pid
#
Output after:
./perf ftrace -v -- ls
function_graph tracer is used
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
4) | rcu_read_lock_sched_held() {
4) 0.552 us | rcu_lockdep_current_cpu_online();
4) 6.124 us | }
Reported-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210421120400.2126433-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull scheduler fix from Borislav Petkov:
"Fix a typo in a macro ifdeffery"
* tag 'sched_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
preempt/dynamic: Fix typo in macro conditional statement
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn’t ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads
following the lock acquisition to observe values speculatively before
the write lock is truly acquired.
We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.
Writer | Reader
--------------------------------------------------------------------------------
ep_scan_ready_list() |
|- write_lock_irq() |
|- queued_write_lock_slowpath() |
|- atomic_cond_read_acquire() |
| read_lock_irqsave(&ep->lock, flags);
--> (observes value before unlock) | chain_epi_lockless()
| | epi->next = xchg(&ep->ovflist, epi);
| | read_unlock_irqrestore(&ep->lock, flags);
| |
| atomic_cmpxchg_relaxed() |
|-- READ_ONCE(ep->ovflist); |
A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.
Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi <alisaidi@amazon.com>
[peterz: use try_cmpxchg()]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Steve Capper <steve.capper@arm.com>
There may be a kernel panic on the Haswell server and the Broadwell
server, if the snbep_pci2phy_map_init() return error.
The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in the cpu_init() to
detect the existence of the SBOX, which is a MSR type of PMON unit.
The uncore_extra_pci_dev is allocated in the uncore_pci_init(). If the
snbep_pci2phy_map_init() returns error, perf doesn't initialize the
PCI type of the PMON units, so the uncore_extra_pci_dev will not be
allocated. But perf may continue initializing the MSR type of PMON
units. A null dereference kernel panic will be triggered.
The sockets in a Haswell server or a Broadwell server are identical.
Only need to detect the existence of the SBOX once.
Current perf probes all available PCU devices and stores them into the
uncore_extra_pci_dev. It's unnecessary.
Use the pci_get_device() to replace the uncore_extra_pci_dev. Only
detect the existence of the SBOX on the first available PCU device once.
Factor out hswep_has_limit_sbox(), since the Haswell server and the
Broadwell server uses the same way to detect the existence of the SBOX.
Add some macros to replace the magic number.
Fixes: 5306c31c5733 ("perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes")
Reported-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lkml.kernel.org/r/1618521764-100923-1-git-send-email-kan.liang@linux.intel.com
In the function auxtrace_parse_snapshot_options(), the callback pointer
"itr->parse_snapshot_options" can be NULL if it has not been set during
the AUX record initialization. This can cause tool crashing if the
callback pointer "itr->parse_snapshot_options" is dereferenced without
performing NULL check.
Add a NULL check for the pointer "itr->parse_snapshot_options" before
invoke the callback.
Fixes: d20031bb63dd6dde ("perf tools: Add AUX area tracing Snapshot Mode")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: http://lore.kernel.org/lkml/20210420151554.2031768-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull x86 fix from Borislav Petkov:
"Fix an out-of-bounds memory access when setting up a crash kernel with
kexec"
* tag 'x86_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/crash: Fix crash_setup_memmap_entries() out-of-bounds access
Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched()
static call") tried to provide irqentry_exit_cond_resched() static call
in irqentry_exit, but has a typo in macro conditional statement.
Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call")
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210410073523.5493-1-zhouzhouyi@gmail.com
Although 'ret' has been initialized to -1, but it will be reassigned by
the "ret = open(...)" statement in the for loop. So that, the value of
'ret' is unknown when asprintf() failed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull kvm fix from Paolo Bonzini:
"Fix SRCU bug introduced in the merge window"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: x86/xen: Take srcu lock when accessing kvm_memslots()
Commit in Fixes: added support for kexec-ing a kernel on panic using a
new system call. As part of it, it does prepare a memory map for the new
kernel.
However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:
BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
Write of size 8 at addr ffffc90000426008 by task kexec/1187
(gdb) list *crash_setup_memmap_entries+0x17e
0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
317 unsigned long long mend)
318 {
319 unsigned long start, end;
320
321 cmem->ranges[0].start = mstart;
322 cmem->ranges[0].end = mend;
323 cmem->nr_ranges = 1;
324
325 /* Exclude elf header region */
326 start = image->arch.elf_load_addr;
(gdb)
Make sure the ranges array becomes a single element allocated.
[ bp: Write a proper commit message. ]
Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.camel@gmx.de
Pull btrfs fix from David Sterba:
"One more patch that we'd like to get to 5.12 before release.
It's changing where and how the superblock is stored in the zoned
mode. It is an on-disk format change but so far there are no
implications for users as the proper mkfs support hasn't been merged
and is waiting for the kernel side to settle.
Until now, the superblocks were derived from the zone index, but zone
size can differ per device. This is changed to be based on fixed
offset values, to make it independent of the device zone size.
The work on that got a bit delayed, we discussed the exact locations
to support potential device sizes and usecases. (Partially delayed
also due to my vacation.) Having that in the same release where the
zoned mode is declared usable is highly desired, there are userspace
projects that need to be updated to recognize the feature. Pushing
that to the next release would make things harder to test"
* tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: move superblock logging zone location
Pull ARM SoC fixes from Arnd Bergmann:
"Another smaller set of fixes for three of the Arm platforms:
TI OMAP:
Fix swapped mmc device order also for omap3 that got changed with
the recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the
aliases should be board specific, all the mmc device instances are
all there in the SoC, and we do probe them by default so that PM
runtime can idle the devices if left enabled from the bootloader.
Qualcomm Snapdragon:
This bypasses the recently introduced interconnect handling in
the GENI (serial engine) driver when running off ACPI, as this
causes the GENI probe to fail and the Lenovo Yoga C630 to boot
without keyboard and touchpad.
Allwinner:
One 32kHz clock fix for the beelink gs1, a CD polarity fix for the
SoPine, some MAINTAINERS maintainance, and a clk / reset switch to
our headers"
* tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
MAINTAINERS: Match on allwinner keyword
MAINTAINERS: Add our new mailing-list
arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
ARM: OMAP2+: Fix uninitialized sr_inst
ARM: dts: Fix swapped mmc order for omap3
ARM: OMAP2+: Fix warning for omap_init_time_of()
soc: qcom: geni: shield geni_icc_get() for ACPI boot
This reverts commit 04c53de57cb6435738961dace8b1b71d3ecd3c39.
Nathan Chancellor points out that it should not have been merged into
mainline by itself. It was a fix for "gcov: use kvmalloc()", which is
still in -mm/-next. Merging it alone has broken the build.
Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/2384465683?check_suite_focus=true
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 0c85a7e87465f2d4cbc768e245f4f45b2f299b05.
The games with 'rm' are on (two separate instances) of a local variable,
and make no difference.
Quoting Aditya Pakki:
"I was the author of the patch and it was the cause of the giant UMN
revert.
The patch is garbage and I was unaware of the steps involved in
retracting it. I *believed* the maintainers would pull it, given it
was already under Greg's list. The patch does not introduce any bugs
but is pointless and is stupid. I accept my incompetence and for not
requesting a revert earlier."
Link: https://lwn.net/Articles/854319/
Requested-by: Aditya Pakki <pakki001@umn.edu>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kvm_memslots() will be called by kvm_write_guest_offset_cached() so we should
take the srcu lock. Let's pull the srcu lock operation from kvm_steal_time_set_preempted()
again to fix xen part.
Fixes: 30b5c851af7 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1619166200-9215-1-git-send-email-wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull locking fixlets from Ingo Molnar:
"Two minor fixes: one for a Clang warning, the other improves an
ambiguous/confusing kernel log message"
* tag 'locking-urgent-2021-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Address clang -Wformat warning printing for %hd
lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
Moves the location of the superblock logging zones. The new locations of
the logging zones are now determined based on fixed block addresses
instead of on fixed zone numbers.
The old placement method based on fixed zone numbers causes problems when
one needs to inspect a file system image without access to the drive zone
information. In such case, the super block locations cannot be reliably
determined as the zone size is unknown. By locating the superblock logging
zones using fixed addresses, we can scan a dumped file system image without
the zone information since a super block copy will always be present at or
after the fixed known locations.
Introduce the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.
- primary superblock: offset 0B (and the following zone)
- first copy: offset 512G (and the following zone)
- Second copy: offset 4T (4096G, and the following zone)
If a logging zone is outside of the disk capacity, we do not record the
superblock copy.
The first copy position is much larger than for a non-zoned filesystem,
which is at 64M. This is to avoid overlapping with the log zones for
the primary superblock. This higher location is arbitrary but allows
supporting devices with very large zone sizes, plus some space around in
between.
Such large zone size is unrealistic and very unlikely to ever be seen in
real devices. Currently, SMR disks have a zone size of 256MB, and we are
expecting ZNS drives to be in the 1-4GB range, so this limit gives us
room to breathe. For now, we only allow zone sizes up to 8GB. The
maximum zone size that would still fit in the space is 256G.
The fixed location addresses are somewhat arbitrary, with the intent of
maintaining superblock reliability for smaller and larger devices, with
the preference for the latter. For this reason, there are two superblocks
under the first 1T. This should cover use cases for physical devices and
for emulated/device-mapper devices.
The superblock logging zones are reserved for superblock logging and
never used for data or metadata blocks. Note that we only reserve the
two zones per primary/copy actually used for superblock logging. We do
not reserve the ranges of zones possibly containing superblocks with the
largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G).
The zones containing the fixed location offsets used to store
superblocks on a non-zoned volume are also reserved to avoid confusion.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull ARM fixes from Russell King:
- Halve maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
- Fix conversion for_each_membock() to for_each_mem_range()
- Fix footbridge PCI mapping
- Avoid uprobes hooking on thumb instructions
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9071/1: uprobes: Don't hook on thumb instructions
ARM: footbridge: fix PCI interrupt mapping
ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
ARM: 9063/1: mm: reduce maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
Fixes for omaps for v5.12-rc cycle
Fix swapped mmc device order also for omap3 that got changed with the
recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the aliases
should be board specific, all the mmc device instances are all there in
the SoC, and we do probe them by default so that PM runtime can idle the
devices if left enabled from the bootloader.
Also included are two compiler warning fixes.
* tag 'omap-for-v5.12/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: Fix uninitialized sr_inst
ARM: dts: Fix swapped mmc order for omap3
ARM: OMAP2+: Fix warning for omap_init_time_of()
ARM: OMAP4: PM: update ROM return address for OSWR and OFF
ARM: OMAP4: Fix PMIC voltage domains for bionic
ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race
Link: https://lore.kernel.org/r/pull-1617702755-711306@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull pin control fixes from Linus Walleij:
"Late pin control fixes, would have been in the main pull request
normally but hey I got lucky and we got another week to polish up
v5.12 so here we go.
One driver fix and one making the core debugfs work:
- Fix the number of pins in the community of the Intel Lewisburg SoC
- Show pin numbers for controllers with base = 0 in the new debugfs
feature"
* tag 'pinctrl-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Show pin numbers for the controllers with base = 0
pinctrl: lewisburg: Update number of pins in community
Pull x86 fixes from Borislav Petkov:
- Fix the vDSO exception handling return path to disable interrupts
again.
- A fix for the CE collector to return the proper return values to its
callers which are used to convey what the collector has done with the
error address.
* tag 'x86_urgent_for_v5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/traps: Correct exc_general_protection() and math_error() return paths
RAS/CEC: Correct ce_add_elem()'s returned values
Clang doesn't like format strings that truncate a 32-bit
value to something shorter:
kernel/locking/lockdep.c:709:4: error: format specifies type 'short' but the argument has type 'int' [-Werror,-Wformat]
In this case, the warning is a slightly questionable, as it could realize
that both class->wait_type_outer and class->wait_type_inner are in fact
8-bit struct members, even though the result of the ?: operator becomes an
'int'.
However, there is really no point in printing the number as a 16-bit
'short' rather than either an 8-bit or 32-bit number, so just change
it to a normal %d.
Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322115531.3987555-1-arnd@kernel.org
Commit 1dae796aabf6 ("btrfs: inode: sink parameter start and len to
check_data_csum()") replaced the start parameter to check_data_csum()
with page_offset(), but page_offset() is not meaningful for direct I/O
pages. Bring back the start parameter.
Fixes: 265d4ac03fdf ("btrfs: sink parameter start and len to check_data_csum")
CC: stable@vger.kernel.org # 5.11+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull SCSI fixes from James Bottomley:
"Two fixes: the libsas fix is for a problem that occurs when trying to
change the cache type of an ATA device and the libiscsi one is a
regression fix from this merge window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: Reset num_scatter if libata marks qc as NODATA
scsi: iscsi: Fix iSCSI cls conn state
Since uprobes is not supported for thumb, check that the thumb bit is
not set when matching the uprobes instruction hooks.
The Arm UDF instructions used for uprobes triggering
(UPROBE_SWBP_ARM_INSN and UPROBE_SS_ARM_INSN) coincidentally share the
same encoding as a pair of unallocated 32-bit thumb instructions (not
UDF) when the condition code is 0b1111 (0xf). This in effect makes it
possible to trigger the uprobes functionality from thumb, and at that
using two unallocated instructions which are not permanently undefined.
Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Cc: stable@vger.kernel.org
Fixes: c7edc9e326d5 ("ARM: add uprobes support")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Qualcomm fix for 5.12
This bypasses the, recently introduced, interconnect handling in the
GENI (serial engine) driver when running off ACPI, as this causes the
GENI probe to fail and the Lenovo Yoga C630 to boot without keyboard and
touchpad.
* tag 'qcom-drivers-fixes-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: geni: shield geni_icc_get() for ACPI boot
Link: https://lore.kernel.org/r/20210404155604.712236-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fix uninitialized sr_inst.
Fixes: fbfa463be8dc ("ARM: OMAP2+: Fix smartreflex init regression after dropping legacy data")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Merge misc fixes from Andrew Morton:
"5 patches.
Subsystems affected by this patch series: coda, overlayfs, and
mm (pagecache and memcg)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/cgroup/slabinfo.py: updated to work on current kernel
mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
mm/filemap: fix find_lock_entries hang on 32-bit THP
ovl: fix reference counting in ovl_mmap error path
coda: fix reference counting in coda_file_mmap error path
The commit f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
enabled GPIO pin number and label in debugfs for pin controller. However,
it limited that feature to the chips where base is positive number. This,
in particular, excluded chips where base is 0 for the historical or backward
compatibility reasons. Refactor the code to include the latter as well.
Fixes: f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
Cc: Drew Fustini <drew@beagleboard.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Drew Fustini <drew@beagleboard.org>
Reviewed-by: Drew Fustini <drew@beagleboard.org>
Link: https://lore.kernel.org/r/20210415130356.15885-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Pull percpu fix from Dennis Zhou:
"This contains a fix for sporadically failing atomic percpu
allocations.
I only caught it recently while I was reviewing a new series [1] and
simultaneously saw reports by btrfs in xfstests [2] and [3].
In v5.9, memcg accounting was extended to percpu done by adding a
second type of chunk. I missed an interaction with the free page float
count used to ensure we can support atomic allocations. If one type of
chunk has no free pages, but the other has enough to satisfy the free
page float requirement, we will not repopulate the free pages for the
former type of chunk. This led to the sporadically failing atomic
allocations"
Link: https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/ [1]
Link: https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/ [2]
Link: https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/ [3]
* 'for-5.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: make pcpu_nr_empty_pop_pages per chunk type
Commit
334872a09198 ("x86/traps: Attempt to fixup exceptions in vDSO before signaling")
added return statements which bypass calling cond_local_irq_disable().
According to
ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code"),
cond_local_irq_disable() is needed because the asm return code no longer
disables interrupts. Follow the existing code as an example to use "goto
exit" instead of "return" statement.
[ bp: Massage commit message. ]
Fixes: 334872a09198 ("x86/traps: Attempt to fixup exceptions in vDSO before signaling")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/1617902914-83245-1-git-send-email-thomas.tai@oracle.com
Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest to developers to check whether initialization functions for objects
were called, before making developers wonder what annotation is missing.
[ mingo: Minor tweaks to the message. ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While removing a qgroup's sysfs entry we end up taking the kernfs_mutex,
through kobject_del(), while holding the fs_info->qgroup_lock spinlock,
producing the following trace:
[821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
[821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman
[821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G W 5.11.6 #15
[821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020
[821.843647] Call Trace:
[821.843650] dump_stack+0xa1/0xfb
[821.843656] ___might_sleep+0x144/0x160
[821.843659] mutex_lock+0x17/0x40
[821.843662] kernfs_remove_by_name_ns+0x1f/0x80
[821.843666] sysfs_remove_group+0x7d/0xe0
[821.843668] sysfs_remove_groups+0x28/0x40
[821.843670] kobject_del+0x2a/0x80
[821.843672] btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs]
[821.843685] __del_qgroup_rb+0x12/0x150 [btrfs]
[821.843696] btrfs_remove_qgroup+0x288/0x2a0 [btrfs]
[821.843707] btrfs_ioctl+0x3129/0x36a0 [btrfs]
[821.843717] ? __mod_lruvec_page_state+0x5e/0xb0
[821.843719] ? page_add_new_anon_rmap+0xbc/0x150
[821.843723] ? kfree+0x1b4/0x300
[821.843725] ? mntput_no_expire+0x55/0x330
[821.843728] __x64_sys_ioctl+0x5a/0xa0
[821.843731] do_syscall_64+0x33/0x70
[821.843733] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[821.843736] RIP: 0033:0x4cd3fb
[821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb
[821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f
[821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000
[821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d
[821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049
Fix this by removing the qgroup sysfs entry while not holding the spinlock,
since the spinlock is only meant for protection of the qgroup rbtree.
Reported-by: Stuart Shelton <srcshelton@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/
Fixes: 49e5fb46211de0 ("btrfs: qgroup: export qgroups in sysfs")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull vmwgfx fixes from Dave Airlie:
"This contains two regression fixes for vmwgfx, one due to a refactor
which meant locks were being used before initialisation, and the other
in fixing up some warnings from the core when destroying pinned
buffers.
vmwgfx:
- fixed unpinning before destruction
- lockdep init reordering"
* tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: Make sure bo's are unpinned before putting them back
drm/vmwgfx: Fix the lockdep breakage
drm/vmwgfx: Make sure we unpin no longer needed buffers
When the cache_type for the SCSI device is changed, the SCSI layer issues a
MODE_SELECT command. The caching mode details are communicated via a
request buffer associated with the SCSI command with data direction set as
DMA_TO_DEVICE (scsi_mode_select()). When this command reaches the libata
layer, as a part of generic initial setup, libata layer sets up the
scatterlist for the command using the SCSI command (ata_scsi_qc_new()).
This command is then translated by the libata layer into
ATA_CMD_SET_FEATURES (ata_scsi_mode_select_xlat()). The libata layer treats
this as a non-data command (ata_mselect_caching()), since it only needs an
ATA taskfile to pass the caching on/off information to the device. It does
not need the scatterlist that has been setup, so it does not perform
dma_map_sg() on the scatterlist (ata_qc_issue()). Unfortunately, when this
command reaches the libsas layer (sas_ata_qc_issue()), libsas layer sees it
as a non-data command with a scatterlist. It cannot extract the correct DMA
length since the scatterlist has not been mapped with dma_map_sg() for a
DMA operation. When this partially constructed SAS task reaches pm80xx
LLDD, it results in the following warning:
"pm80xx_chip_sata_req 6058: The sg list address
start_addr=0x0000000000000000 data_len=0x0end_addr_high=0xffffffff
end_addr_low=0xffffffff has crossed 4G boundary"
Update libsas to handle ATA non-data commands separately so num_scatter and
total_xfer_len remain 0.
Link: https://lore.kernel.org/r/20210318225632.2481291-1-jollys@google.com
Fixes: 53de092f47ff ("scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATA")
Tested-by: Luo Jiaxing <luojiaxing@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jolly Shah <jollys@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since commit 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in
pci_device_probe()"), the PCI code will call the IRQ mapping function
whenever a PCI driver is probed. If these are marked as __init, this
causes an oops if a PCI driver is loaded or bound after the kernel has
initialised.
Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
One 32kHz clock fix for the beelink gs1, a CD polarity fix for the SoPine, some
MAINTAINERS maintainance, and a clk / reset switch to our headers.
* tag 'sunxi-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
MAINTAINERS: Match on allwinner keyword
MAINTAINERS: Add our new mailing-list
arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
Link: https://lore.kernel.org/r/9972a85e-60b7-49f4-a246-db3396dd4764.lettre@localhost
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, GENI devices like i2c-qcom-geni fails to probe in ACPI boot,
if interconnect support is enabled. That's because interconnect driver
only supports DT right now. As interconnect is not necessarily required
for basic function of GENI devices, let's shield geni_icc_get() call,
and then all other ICC calls become nop due to NULL icc_path, so that
GENI devices keep working for ACPI boot.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210114112928.11368-1-shawn.guo@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Also some omap3 devices like n900 seem to have eMMC and micro-sd swapped
around with commit 21b2cec61c04 ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for
drivers that existed in v4.4").
Let's fix the issue with aliases as discussed on the mailing lists. While
the mmc aliases should be board specific, let's first fix the issue with
minimal changes.
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Pull block fix from Jens Axboe:
"A single fix for a behavioral regression in this series, when
re-reading the partition table with partitions open"
* tag 'block-5.12-2021-04-23' of git://git.kernel.dk/linux-block:
block: return -EBUSY when there are open partitions in blkdev_reread_part
slabinfo.py script does not work with actual kernel version.
First, it was unable to recognise SLUB susbsytem, and when I specified
it manually it failed again with
AttributeError: 'struct page' has no member 'obj_cgroups'
.. and then again with
File "tools/cgroup/memcg_slabinfo.py", line 221, in main
memcg.kmem_caches.address_of_(),
AttributeError: 'struct mem_cgroup' has no member 'kmem_caches'
Link: https://lkml.kernel.org/r/cec1a75e-43b4-3d64-2084-d9f98fda037f@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
intel-pinctrl for v5.12-4
* Fix pin numbering per community in Intel Lewisburg driver
The following is an automated git shortlog grouped by driver:
lewisburg:
- Update number of pins in community
Pull SCSI fixes from James Bottomley:
"Seven fixes, all in drivers.
The hpsa three are the most extensive and the most problematic: it's a
packed structure misalignment that oopses on ia64 but looks like it
would also oops on quite a few non-x86 architectures.
The pm80xx is a regression and the rest are bug fixes for patches in
the misc tree"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
scsi: target: iscsi: Fix zero tag inside a trace event
scsi: pm80xx: Fix chip initialization failure
scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
scsi: ufs: core: Fix task management request completion timeout
scsi: hpsa: Add an assert to prevent __packed reintroduction
scsi: hpsa: Fix boot on ia64 (atomic_t alignment)
scsi: hpsa: Use __packed on individual structs, not header-wide
nr_empty_pop_pages is used to guarantee that there are some free
populated pages to satisfy atomic allocations. Accounted and
non-accounted allocations are using separate sets of chunks,
so both need to have a surplus of empty pages.
This commit makes pcpu_nr_empty_pop_pages and the corresponding logic
per chunk type.
[Dennis]
This issue came up as I was reviewing [1] and realized I missed this.
Simultaneously, it was reported btrfs was seeing failed atomic
allocations in fsstress tests [2] and [3].
[1] https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/
[2] https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/
[3] https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
Fixes: 3c7be18ac9a0 ("mm: memcg/percpu: account percpu memory to memory cgroups")
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Roman Gushchin <guro@fb.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
ce_add_elem() uses different return values to signal a result from
adding an element to the collector. Commit in Fixes: broke the case
where the element being added is not found in the array. Correct that.
[ bp: Rewrite commit message, add kernel-doc comments. ]
Fixes: de0e0624d86f ("RAS/CEC: Check count_threshold unconditionally")
Signed-off-by: William Roche <william.roche@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1617722939-29670-1-git-send-email-william.roche@oracle.com
On a Haswell machine, the perf_fuzzer managed to trigger this message:
[117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to
write 0x0400000000000000) at rIP: 0xffffffff8106e4f4
(native_write_msr+0x4/0x20)
[117248.089957] Call Trace:
[117248.092685] intel_pmu_pebs_enable_all+0x31/0x40
[117248.097737] intel_pmu_enable_all+0xa/0x10
[117248.102210] __perf_event_task_sched_in+0x2df/0x2f0
[117248.107511] finish_task_switch.isra.0+0x15f/0x280
[117248.112765] schedule_tail+0xc/0x40
[117248.116562] ret_from_fork+0x8/0x30
A fake event called VLBR_EVENT may use the bit 58 of the PEBS_ENABLE, if
the precise_ip is set. The bit 58 is reserved by the HW. Accessing the
bit causes the unchecked MSR access error.
The fake event doesn't support PEBS. The case should be rejected.
Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR event without hw counter")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1615555298-140216-2-git-send-email-kan.liang@linux.intel.com
During the mount procedure we are calling btrfs_orphan_cleanup() against
the root tree, which will find all orphans items in this tree. When an
orphan item corresponds to a deleted subvolume/snapshot (instead of an
inode space cache), it must not delete the orphan item, because that will
cause btrfs_find_orphan_roots() to not find the orphan item and therefore
not add the corresponding subvolume root to the list of dead roots, which
results in the subvolume's tree never being deleted by the cleanup thread.
The same applies to the remount from RO to RW path.
Fix this by making btrfs_find_orphan_roots() run before calling
btrfs_orphan_cleanup() against the root tree.
A test case for fstests will follow soon.
Reported-by: Robbie Ko <robbieko@synology.com>
Link: https://lore.kernel.org/linux-btrfs/b19f4310-35e0-606e-1eea-2dd84d28c5da@synology.com/
Fixes: 638331fa56caea ("btrfs: fix transaction leak and crash after cleaning up orphans on RO mount")
CC: stable@vger.kernel.org # 5.11+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull i2c fix from Wolfram Sang:
"One more driver bugfix for I2C"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: mv64xxx: Fix random system lock caused by runtime PM
vmwgfx fixes for regressions in 5.12
Here's a set of 3 patches fixing ugly regressions
in the vmwgfx driver. We broke lock initialization
code and ended up using spinlocks before initialization
breaking lockdep.
Also there was a bit of a fallout from drm changes
which made the core validate that unreferenced buffers
have been unpinned. vmwgfx pinning code predates a lot
of the core drm and wasn't written to account for those
semantics. Fortunately changes required to fix it
are not too intrusive.
The changes have been validated by our internal ci.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f7add0a2-162e-3bd2-b1be-344a94f2acbf@vmware.com
In commit 9e67600ed6b8 ("scsi: iscsi: Fix race condition between login and
sync thread") I missed that libiscsi was now setting the iSCSI class state,
and that patch ended up resetting the state during conn stoppage and using
the wrong state value during ep_disconnect. This patch moves the setting of
the class state to the class module and then fixes the two issues above.
Link: https://lore.kernel.org/r/20210406171746.5016-1-michael.christie@oracle.com
Fixes: 9e67600ed6b8 ("scsi: iscsi: Fix race condition between login and sync thread")
Cc: Gulam Mohamed <gulam.mohamed@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
for_each_mem_range() uses a loop variable, yet looking into code it is
not just iteration counter but more complex entity which encodes
information about memblock. Thus condition i == 0 looks fragile.
Indeed, it broke boot of R-class platforms since it never took i == 0
path (due to i was set to 1). Fix that with restoring original flag
check.
Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix potential NULL pointer dereference in the auxtrace option parser
- Fix access to PID in an array when setting a PID filter in 'perf ftrace'
- Fix error return code in the 'perf data' tool and in maps__clone(),
found using a static analysis tool from Huawei
* tag 'perf-tools-fixes-for-v5.12-2021-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf map: Fix error return code in maps__clone()
perf ftrace: Fix access to pid in array when setting a pid filter
perf auxtrace: Fix potential NULL pointer dereference
perf data: Fix error return code in perf_data__create_dir()
Pull x86 perf fixes from Borislav Petkov:
- Fix Broadwell Xeon's stepping in the PEBS isolation table of CPUs
- Fix a panic when initializing perf uncore machinery on Haswell and
Broadwell servers
* tag 'perf_urgent_for_v5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/kvm: Fix Broadwell Xeon stepping in isolation_ucodes[]
perf/x86/intel/uncore: Remove uncore extra PCI dev HSWEP_PCI_PCU_3
Although 'err' has been initialized to -ENOMEM, but it will be reassigned
by the "err = unwind__prepare_access(...)" statement in the for loop. So
that, the value of 'err' is unknown when map__clone() failed.
Fixes: 6c502584438bda63 ("perf unwind: Call unwind__prepare_access for forked thread")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhen lei <thunder.leizhen@huawei.com>
Link: http://lore.kernel.org/lkml/20210415092744.3793-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The only stepping of Broadwell Xeon parts is stepping 1. Fix the
relevant isolation_ucodes[] entry, which previously enumerated
stepping 2.
Although the original commit was characterized as an optimization, it
is also a workaround for a correctness issue.
If a PMI arrives between kvm's call to perf_guest_get_msrs() and the
subsequent VM-entry, a stale value for the IA32_PEBS_ENABLE MSR may be
restored at the next VM-exit. This is because, unbeknownst to kvm, PMI
throttling may clear bits in the IA32_PEBS_ENABLE MSR. CPUs with "PEBS
isolation" don't suffer from this issue, because perf_guest_get_msrs()
doesn't report the IA32_PEBS_ENABLE value.
Fixes: 9b545c04abd4f ("perf/x86/kvm: Avoid unnecessary work in guest filtering")
Signed-off-by: Jim Mattson <jmattson@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Peter Shier <pshier@google.com>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/20210422001834.1748319-1-jmattson@google.com
Command 'perf ftrace -v -- ls' fails in s390 (at least 5.12.0rc6).
The root cause is a missing pointer dereference which causes an
array element address to be used as PID.
Fix this by extracting the PID.
Output before:
# ./perf ftrace -v -- ls
function_graph tracer is used
write '-263732416' to tracing/set_ftrace_pid failed: Invalid argument
failed to set ftrace pid
#
Output after:
./perf ftrace -v -- ls
function_graph tracer is used
# tracer: function_graph
#
# CPU DURATION FUNCTION CALLS
# | | | | | | |
4) | rcu_read_lock_sched_held() {
4) 0.552 us | rcu_lockdep_current_cpu_online();
4) 6.124 us | }
Reported-by: Alexander Schmidt <alexschm@de.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210421120400.2126433-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While this code is executed with the wait_lock held, a reader can
acquire the lock without holding wait_lock. The writer side loops
checking the value with the atomic_cond_read_acquire(), but only truly
acquires the lock when the compare-and-exchange is completed
successfully which isn’t ordered. This exposes the window between the
acquire and the cmpxchg to an A-B-A problem which allows reads
following the lock acquisition to observe values speculatively before
the write lock is truly acquired.
We've seen a problem in epoll where the reader does a xchg while
holding the read lock, but the writer can see a value change out from
under it.
Writer | Reader
--------------------------------------------------------------------------------
ep_scan_ready_list() |
|- write_lock_irq() |
|- queued_write_lock_slowpath() |
|- atomic_cond_read_acquire() |
| read_lock_irqsave(&ep->lock, flags);
--> (observes value before unlock) | chain_epi_lockless()
| | epi->next = xchg(&ep->ovflist, epi);
| | read_unlock_irqrestore(&ep->lock, flags);
| |
| atomic_cmpxchg_relaxed() |
|-- READ_ONCE(ep->ovflist); |
A core can order the read of the ovflist ahead of the
atomic_cmpxchg_relaxed(). Switching the cmpxchg to use acquire
semantics addresses this issue at which point the atomic_cond_read can
be switched to use relaxed semantics.
Fixes: b519b56e378ee ("locking/qrwlock: Use atomic_cond_read_acquire() when spinning in qrwlock")
Signed-off-by: Ali Saidi <alisaidi@amazon.com>
[peterz: use try_cmpxchg()]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steve Capper <steve.capper@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Steve Capper <steve.capper@arm.com>
There may be a kernel panic on the Haswell server and the Broadwell
server, if the snbep_pci2phy_map_init() return error.
The uncore_extra_pci_dev[HSWEP_PCI_PCU_3] is used in the cpu_init() to
detect the existence of the SBOX, which is a MSR type of PMON unit.
The uncore_extra_pci_dev is allocated in the uncore_pci_init(). If the
snbep_pci2phy_map_init() returns error, perf doesn't initialize the
PCI type of the PMON units, so the uncore_extra_pci_dev will not be
allocated. But perf may continue initializing the MSR type of PMON
units. A null dereference kernel panic will be triggered.
The sockets in a Haswell server or a Broadwell server are identical.
Only need to detect the existence of the SBOX once.
Current perf probes all available PCU devices and stores them into the
uncore_extra_pci_dev. It's unnecessary.
Use the pci_get_device() to replace the uncore_extra_pci_dev. Only
detect the existence of the SBOX on the first available PCU device once.
Factor out hswep_has_limit_sbox(), since the Haswell server and the
Broadwell server uses the same way to detect the existence of the SBOX.
Add some macros to replace the magic number.
Fixes: 5306c31c5733 ("perf/x86/uncore/hsw-ep: Handle systems with only two SBOXes")
Reported-by: Steve Wahl <steve.wahl@hpe.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Steve Wahl <steve.wahl@hpe.com>
Link: https://lkml.kernel.org/r/1618521764-100923-1-git-send-email-kan.liang@linux.intel.com
In the function auxtrace_parse_snapshot_options(), the callback pointer
"itr->parse_snapshot_options" can be NULL if it has not been set during
the AUX record initialization. This can cause tool crashing if the
callback pointer "itr->parse_snapshot_options" is dereferenced without
performing NULL check.
Add a NULL check for the pointer "itr->parse_snapshot_options" before
invoke the callback.
Fixes: d20031bb63dd6dde ("perf tools: Add AUX area tracing Snapshot Mode")
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: http://lore.kernel.org/lkml/20210420151554.2031768-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched()
static call") tried to provide irqentry_exit_cond_resched() static call
in irqentry_exit, but has a typo in macro conditional statement.
Fixes: 40607ee97e4e ("preempt/dynamic: Provide irqentry_exit_cond_resched() static call")
Signed-off-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210410073523.5493-1-zhouzhouyi@gmail.com
Although 'ret' has been initialized to -1, but it will be reassigned by
the "ret = open(...)" statement in the for loop. So that, the value of
'ret' is unknown when asprintf() failed.
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210415083417.3740-1-thunder.leizhen@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Commit in Fixes: added support for kexec-ing a kernel on panic using a
new system call. As part of it, it does prepare a memory map for the new
kernel.
However, while doing so, it wrongly accesses memory it has not
allocated: it accesses the first element of the cmem->ranges[] array in
memmap_exclude_ranges() but it has not allocated the memory for it in
crash_setup_memmap_entries(). As KASAN reports:
BUG: KASAN: vmalloc-out-of-bounds in crash_setup_memmap_entries+0x17e/0x3a0
Write of size 8 at addr ffffc90000426008 by task kexec/1187
(gdb) list *crash_setup_memmap_entries+0x17e
0xffffffff8107cafe is in crash_setup_memmap_entries (arch/x86/kernel/crash.c:322).
317 unsigned long long mend)
318 {
319 unsigned long start, end;
320
321 cmem->ranges[0].start = mstart;
322 cmem->ranges[0].end = mend;
323 cmem->nr_ranges = 1;
324
325 /* Exclude elf header region */
326 start = image->arch.elf_load_addr;
(gdb)
Make sure the ranges array becomes a single element allocated.
[ bp: Write a proper commit message. ]
Fixes: dd5f726076cc ("kexec: support for kexec on panic using new system call")
Signed-off-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Young <dyoung@redhat.com>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/725fa3dc1da2737f0f6188a1a9701bead257ea9d.camel@gmx.de
Pull btrfs fix from David Sterba:
"One more patch that we'd like to get to 5.12 before release.
It's changing where and how the superblock is stored in the zoned
mode. It is an on-disk format change but so far there are no
implications for users as the proper mkfs support hasn't been merged
and is waiting for the kernel side to settle.
Until now, the superblocks were derived from the zone index, but zone
size can differ per device. This is changed to be based on fixed
offset values, to make it independent of the device zone size.
The work on that got a bit delayed, we discussed the exact locations
to support potential device sizes and usecases. (Partially delayed
also due to my vacation.) Having that in the same release where the
zoned mode is declared usable is highly desired, there are userspace
projects that need to be updated to recognize the feature. Pushing
that to the next release would make things harder to test"
* tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: zoned: move superblock logging zone location
Pull ARM SoC fixes from Arnd Bergmann:
"Another smaller set of fixes for three of the Arm platforms:
TI OMAP:
Fix swapped mmc device order also for omap3 that got changed with
the recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the
aliases should be board specific, all the mmc device instances are
all there in the SoC, and we do probe them by default so that PM
runtime can idle the devices if left enabled from the bootloader.
Qualcomm Snapdragon:
This bypasses the recently introduced interconnect handling in
the GENI (serial engine) driver when running off ACPI, as this
causes the GENI probe to fail and the Lenovo Yoga C630 to boot
without keyboard and touchpad.
Allwinner:
One 32kHz clock fix for the beelink gs1, a CD polarity fix for the
SoPine, some MAINTAINERS maintainance, and a clk / reset switch to
our headers"
* tag 'arm-fixes-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
MAINTAINERS: Match on allwinner keyword
MAINTAINERS: Add our new mailing-list
arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
ARM: OMAP2+: Fix uninitialized sr_inst
ARM: dts: Fix swapped mmc order for omap3
ARM: OMAP2+: Fix warning for omap_init_time_of()
soc: qcom: geni: shield geni_icc_get() for ACPI boot
This reverts commit 04c53de57cb6435738961dace8b1b71d3ecd3c39.
Nathan Chancellor points out that it should not have been merged into
mainline by itself. It was a fix for "gcov: use kvmalloc()", which is
still in -mm/-next. Merging it alone has broken the build.
Link: https://github.com/ClangBuiltLinux/continuous-integration2/runs/2384465683?check_suite_focus=true
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Johannes Berg <johannes.berg@intel.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 0c85a7e87465f2d4cbc768e245f4f45b2f299b05.
The games with 'rm' are on (two separate instances) of a local variable,
and make no difference.
Quoting Aditya Pakki:
"I was the author of the patch and it was the cause of the giant UMN
revert.
The patch is garbage and I was unaware of the steps involved in
retracting it. I *believed* the maintainers would pull it, given it
was already under Greg's list. The patch does not introduce any bugs
but is pointless and is stupid. I accept my incompetence and for not
requesting a revert earlier."
Link: https://lwn.net/Articles/854319/
Requested-by: Aditya Pakki <pakki001@umn.edu>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kvm_memslots() will be called by kvm_write_guest_offset_cached() so we should
take the srcu lock. Let's pull the srcu lock operation from kvm_steal_time_set_preempted()
again to fix xen part.
Fixes: 30b5c851af7 ("KVM: x86/xen: Add support for vCPU runstate information")
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1619166200-9215-1-git-send-email-wanpengli@tencent.com>
Reviewed-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull locking fixlets from Ingo Molnar:
"Two minor fixes: one for a Clang warning, the other improves an
ambiguous/confusing kernel log message"
* tag 'locking-urgent-2021-04-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Address clang -Wformat warning printing for %hd
lockdep: Add a missing initialization hint to the "INFO: Trying to register non-static key" message
Moves the location of the superblock logging zones. The new locations of
the logging zones are now determined based on fixed block addresses
instead of on fixed zone numbers.
The old placement method based on fixed zone numbers causes problems when
one needs to inspect a file system image without access to the drive zone
information. In such case, the super block locations cannot be reliably
determined as the zone size is unknown. By locating the superblock logging
zones using fixed addresses, we can scan a dumped file system image without
the zone information since a super block copy will always be present at or
after the fixed known locations.
Introduce the following three pairs of zones containing fixed offset
locations, regardless of the device zone size.
- primary superblock: offset 0B (and the following zone)
- first copy: offset 512G (and the following zone)
- Second copy: offset 4T (4096G, and the following zone)
If a logging zone is outside of the disk capacity, we do not record the
superblock copy.
The first copy position is much larger than for a non-zoned filesystem,
which is at 64M. This is to avoid overlapping with the log zones for
the primary superblock. This higher location is arbitrary but allows
supporting devices with very large zone sizes, plus some space around in
between.
Such large zone size is unrealistic and very unlikely to ever be seen in
real devices. Currently, SMR disks have a zone size of 256MB, and we are
expecting ZNS drives to be in the 1-4GB range, so this limit gives us
room to breathe. For now, we only allow zone sizes up to 8GB. The
maximum zone size that would still fit in the space is 256G.
The fixed location addresses are somewhat arbitrary, with the intent of
maintaining superblock reliability for smaller and larger devices, with
the preference for the latter. For this reason, there are two superblocks
under the first 1T. This should cover use cases for physical devices and
for emulated/device-mapper devices.
The superblock logging zones are reserved for superblock logging and
never used for data or metadata blocks. Note that we only reserve the
two zones per primary/copy actually used for superblock logging. We do
not reserve the ranges of zones possibly containing superblocks with the
largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G).
The zones containing the fixed location offsets used to store
superblocks on a non-zoned volume are also reserved to avoid confusion.
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull ARM fixes from Russell King:
- Halve maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
- Fix conversion for_each_membock() to for_each_mem_range()
- Fix footbridge PCI mapping
- Avoid uprobes hooking on thumb instructions
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9071/1: uprobes: Don't hook on thumb instructions
ARM: footbridge: fix PCI interrupt mapping
ARM: 9069/1: NOMMU: Fix conversion for_each_membock() to for_each_mem_range()
ARM: 9063/1: mm: reduce maximum number of CPUs if DEBUG_KMAP_LOCAL is enabled
Fixes for omaps for v5.12-rc cycle
Fix swapped mmc device order also for omap3 that got changed with the
recent PROBE_PREFER_ASYNCHRONOUS changes. While eventually the aliases
should be board specific, all the mmc device instances are all there in
the SoC, and we do probe them by default so that PM runtime can idle the
devices if left enabled from the bootloader.
Also included are two compiler warning fixes.
* tag 'omap-for-v5.12/fixes-rc6-signed' of git://git.kernel.org/pub/scm/linux/kernel/git/tmlind/linux-omap:
ARM: OMAP2+: Fix uninitialized sr_inst
ARM: dts: Fix swapped mmc order for omap3
ARM: OMAP2+: Fix warning for omap_init_time_of()
ARM: OMAP4: PM: update ROM return address for OSWR and OFF
ARM: OMAP4: Fix PMIC voltage domains for bionic
ARM: dts: Fix moving mmc devices with aliases for omap4 & 5
ARM: dts: Drop duplicate sha2md5_fck to fix clk_disable race
Link: https://lore.kernel.org/r/pull-1617702755-711306@atomide.com
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Pull pin control fixes from Linus Walleij:
"Late pin control fixes, would have been in the main pull request
normally but hey I got lucky and we got another week to polish up
v5.12 so here we go.
One driver fix and one making the core debugfs work:
- Fix the number of pins in the community of the Intel Lewisburg SoC
- Show pin numbers for controllers with base = 0 in the new debugfs
feature"
* tag 'pinctrl-v5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: core: Show pin numbers for the controllers with base = 0
pinctrl: lewisburg: Update number of pins in community
Pull x86 fixes from Borislav Petkov:
- Fix the vDSO exception handling return path to disable interrupts
again.
- A fix for the CE collector to return the proper return values to its
callers which are used to convey what the collector has done with the
error address.
* tag 'x86_urgent_for_v5.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/traps: Correct exc_general_protection() and math_error() return paths
RAS/CEC: Correct ce_add_elem()'s returned values
Clang doesn't like format strings that truncate a 32-bit
value to something shorter:
kernel/locking/lockdep.c:709:4: error: format specifies type 'short' but the argument has type 'int' [-Werror,-Wformat]
In this case, the warning is a slightly questionable, as it could realize
that both class->wait_type_outer and class->wait_type_inner are in fact
8-bit struct members, even though the result of the ?: operator becomes an
'int'.
However, there is really no point in printing the number as a 16-bit
'short' rather than either an 8-bit or 32-bit number, so just change
it to a normal %d.
Fixes: de8f5e4f2dc1 ("lockdep: Introduce wait-type checks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210322115531.3987555-1-arnd@kernel.org
Commit 1dae796aabf6 ("btrfs: inode: sink parameter start and len to
check_data_csum()") replaced the start parameter to check_data_csum()
with page_offset(), but page_offset() is not meaningful for direct I/O
pages. Bring back the start parameter.
Fixes: 265d4ac03fdf ("btrfs: sink parameter start and len to check_data_csum")
CC: stable@vger.kernel.org # 5.11+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull SCSI fixes from James Bottomley:
"Two fixes: the libsas fix is for a problem that occurs when trying to
change the cache type of an ATA device and the libiscsi one is a
regression fix from this merge window"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: Reset num_scatter if libata marks qc as NODATA
scsi: iscsi: Fix iSCSI cls conn state
Since uprobes is not supported for thumb, check that the thumb bit is
not set when matching the uprobes instruction hooks.
The Arm UDF instructions used for uprobes triggering
(UPROBE_SWBP_ARM_INSN and UPROBE_SS_ARM_INSN) coincidentally share the
same encoding as a pair of unallocated 32-bit thumb instructions (not
UDF) when the condition code is 0b1111 (0xf). This in effect makes it
possible to trigger the uprobes functionality from thumb, and at that
using two unallocated instructions which are not permanently undefined.
Signed-off-by: Fredrik Strupe <fredrik@strupe.net>
Cc: stable@vger.kernel.org
Fixes: c7edc9e326d5 ("ARM: add uprobes support")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Qualcomm fix for 5.12
This bypasses the, recently introduced, interconnect handling in the
GENI (serial engine) driver when running off ACPI, as this causes the
GENI probe to fail and the Lenovo Yoga C630 to boot without keyboard and
touchpad.
* tag 'qcom-drivers-fixes-for-5.12' of git://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux:
soc: qcom: geni: shield geni_icc_get() for ACPI boot
Link: https://lore.kernel.org/r/20210404155604.712236-1-bjorn.andersson@linaro.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Merge misc fixes from Andrew Morton:
"5 patches.
Subsystems affected by this patch series: coda, overlayfs, and
mm (pagecache and memcg)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
tools/cgroup/slabinfo.py: updated to work on current kernel
mm/filemap: fix mapping_seek_hole_data on THP & 32-bit
mm/filemap: fix find_lock_entries hang on 32-bit THP
ovl: fix reference counting in ovl_mmap error path
coda: fix reference counting in coda_file_mmap error path
The commit f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
enabled GPIO pin number and label in debugfs for pin controller. However,
it limited that feature to the chips where base is positive number. This,
in particular, excluded chips where base is 0 for the historical or backward
compatibility reasons. Refactor the code to include the latter as well.
Fixes: f1b206cf7c57 ("pinctrl: core: print gpio in pins debugfs file")
Cc: Drew Fustini <drew@beagleboard.org>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Drew Fustini <drew@beagleboard.org>
Reviewed-by: Drew Fustini <drew@beagleboard.org>
Link: https://lore.kernel.org/r/20210415130356.15885-1-andriy.shevchenko@linux.intel.com
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Pull percpu fix from Dennis Zhou:
"This contains a fix for sporadically failing atomic percpu
allocations.
I only caught it recently while I was reviewing a new series [1] and
simultaneously saw reports by btrfs in xfstests [2] and [3].
In v5.9, memcg accounting was extended to percpu done by adding a
second type of chunk. I missed an interaction with the free page float
count used to ensure we can support atomic allocations. If one type of
chunk has no free pages, but the other has enough to satisfy the free
page float requirement, we will not repopulate the free pages for the
former type of chunk. This led to the sporadically failing atomic
allocations"
Link: https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/ [1]
Link: https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/ [2]
Link: https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/ [3]
* 'for-5.12-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu:
percpu: make pcpu_nr_empty_pop_pages per chunk type
Commit
334872a09198 ("x86/traps: Attempt to fixup exceptions in vDSO before signaling")
added return statements which bypass calling cond_local_irq_disable().
According to
ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code"),
cond_local_irq_disable() is needed because the asm return code no longer
disables interrupts. Follow the existing code as an example to use "goto
exit" instead of "return" statement.
[ bp: Massage commit message. ]
Fixes: 334872a09198 ("x86/traps: Attempt to fixup exceptions in vDSO before signaling")
Signed-off-by: Thomas Tai <thomas.tai@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com>
Link: https://lkml.kernel.org/r/1617902914-83245-1-git-send-email-thomas.tai@oracle.com
Since this message is printed when dynamically allocated spinlocks (e.g.
kzalloc()) are used without initialization (e.g. spin_lock_init()),
suggest to developers to check whether initialization functions for objects
were called, before making developers wonder what annotation is missing.
[ mingo: Minor tweaks to the message. ]
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210321064913.4619-1-penguin-kernel@I-love.SAKURA.ne.jp
Signed-off-by: Ingo Molnar <mingo@kernel.org>
While removing a qgroup's sysfs entry we end up taking the kernfs_mutex,
through kobject_del(), while holding the fs_info->qgroup_lock spinlock,
producing the following trace:
[821.843637] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:281
[821.843641] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 28214, name: podman
[821.843644] CPU: 3 PID: 28214 Comm: podman Tainted: G W 5.11.6 #15
[821.843646] Hardware name: Dell Inc. PowerEdge R330/084XW4, BIOS 2.11.0 12/08/2020
[821.843647] Call Trace:
[821.843650] dump_stack+0xa1/0xfb
[821.843656] ___might_sleep+0x144/0x160
[821.843659] mutex_lock+0x17/0x40
[821.843662] kernfs_remove_by_name_ns+0x1f/0x80
[821.843666] sysfs_remove_group+0x7d/0xe0
[821.843668] sysfs_remove_groups+0x28/0x40
[821.843670] kobject_del+0x2a/0x80
[821.843672] btrfs_sysfs_del_one_qgroup+0x2b/0x40 [btrfs]
[821.843685] __del_qgroup_rb+0x12/0x150 [btrfs]
[821.843696] btrfs_remove_qgroup+0x288/0x2a0 [btrfs]
[821.843707] btrfs_ioctl+0x3129/0x36a0 [btrfs]
[821.843717] ? __mod_lruvec_page_state+0x5e/0xb0
[821.843719] ? page_add_new_anon_rmap+0xbc/0x150
[821.843723] ? kfree+0x1b4/0x300
[821.843725] ? mntput_no_expire+0x55/0x330
[821.843728] __x64_sys_ioctl+0x5a/0xa0
[821.843731] do_syscall_64+0x33/0x70
[821.843733] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[821.843736] RIP: 0033:0x4cd3fb
[821.843741] RSP: 002b:000000c000906b20 EFLAGS: 00000206 ORIG_RAX: 0000000000000010
[821.843744] RAX: ffffffffffffffda RBX: 000000c000050000 RCX: 00000000004cd3fb
[821.843745] RDX: 000000c000906b98 RSI: 000000004010942a RDI: 000000000000000f
[821.843747] RBP: 000000c000907cd0 R08: 000000c000622901 R09: 0000000000000000
[821.843748] R10: 000000c000d992c0 R11: 0000000000000206 R12: 000000000000012d
[821.843749] R13: 000000000000012c R14: 0000000000000200 R15: 0000000000000049
Fix this by removing the qgroup sysfs entry while not holding the spinlock,
since the spinlock is only meant for protection of the qgroup rbtree.
Reported-by: Stuart Shelton <srcshelton@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/7A5485BB-0628-419D-A4D3-27B1AF47E25A@gmail.com/
Fixes: 49e5fb46211de0 ("btrfs: qgroup: export qgroups in sysfs")
CC: stable@vger.kernel.org # 5.10+
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
Pull vmwgfx fixes from Dave Airlie:
"This contains two regression fixes for vmwgfx, one due to a refactor
which meant locks were being used before initialisation, and the other
in fixing up some warnings from the core when destroying pinned
buffers.
vmwgfx:
- fixed unpinning before destruction
- lockdep init reordering"
* tag 'drm-fixes-2021-04-18' of git://anongit.freedesktop.org/drm/drm:
drm/vmwgfx: Make sure bo's are unpinned before putting them back
drm/vmwgfx: Fix the lockdep breakage
drm/vmwgfx: Make sure we unpin no longer needed buffers
When the cache_type for the SCSI device is changed, the SCSI layer issues a
MODE_SELECT command. The caching mode details are communicated via a
request buffer associated with the SCSI command with data direction set as
DMA_TO_DEVICE (scsi_mode_select()). When this command reaches the libata
layer, as a part of generic initial setup, libata layer sets up the
scatterlist for the command using the SCSI command (ata_scsi_qc_new()).
This command is then translated by the libata layer into
ATA_CMD_SET_FEATURES (ata_scsi_mode_select_xlat()). The libata layer treats
this as a non-data command (ata_mselect_caching()), since it only needs an
ATA taskfile to pass the caching on/off information to the device. It does
not need the scatterlist that has been setup, so it does not perform
dma_map_sg() on the scatterlist (ata_qc_issue()). Unfortunately, when this
command reaches the libsas layer (sas_ata_qc_issue()), libsas layer sees it
as a non-data command with a scatterlist. It cannot extract the correct DMA
length since the scatterlist has not been mapped with dma_map_sg() for a
DMA operation. When this partially constructed SAS task reaches pm80xx
LLDD, it results in the following warning:
"pm80xx_chip_sata_req 6058: The sg list address
start_addr=0x0000000000000000 data_len=0x0end_addr_high=0xffffffff
end_addr_low=0xffffffff has crossed 4G boundary"
Update libsas to handle ATA non-data commands separately so num_scatter and
total_xfer_len remain 0.
Link: https://lore.kernel.org/r/20210318225632.2481291-1-jollys@google.com
Fixes: 53de092f47ff ("scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATA")
Tested-by: Luo Jiaxing <luojiaxing@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Jolly Shah <jollys@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since commit 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in
pci_device_probe()"), the PCI code will call the IRQ mapping function
whenever a PCI driver is probed. If these are marked as __init, this
causes an oops if a PCI driver is loaded or bound after the kernel has
initialised.
Fixes: 30fdfb929e82 ("PCI: Add a call to pci_assign_irq() in pci_device_probe()")
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
One 32kHz clock fix for the beelink gs1, a CD polarity fix for the SoPine, some
MAINTAINERS maintainance, and a clk / reset switch to our headers.
* tag 'sunxi-fixes-for-5.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux:
arm64: dts: allwinner: h6: beelink-gs1: Remove ext. 32 kHz osc reference
MAINTAINERS: Match on allwinner keyword
MAINTAINERS: Add our new mailing-list
arm64: dts: allwinner: Fix SD card CD GPIO for SOPine systems
arm64: dts: allwinner: h6: Switch to macros for RSB clock/reset indices
Link: https://lore.kernel.org/r/9972a85e-60b7-49f4-a246-db3396dd4764.lettre@localhost
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Currently, GENI devices like i2c-qcom-geni fails to probe in ACPI boot,
if interconnect support is enabled. That's because interconnect driver
only supports DT right now. As interconnect is not necessarily required
for basic function of GENI devices, let's shield geni_icc_get() call,
and then all other ICC calls become nop due to NULL icc_path, so that
GENI devices keep working for ACPI boot.
Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Shawn Guo <shawn.guo@linaro.org>
Link: https://lore.kernel.org/r/20210114112928.11368-1-shawn.guo@linaro.org
Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Also some omap3 devices like n900 seem to have eMMC and micro-sd swapped
around with commit 21b2cec61c04 ("mmc: Set PROBE_PREFER_ASYNCHRONOUS for
drivers that existed in v4.4").
Let's fix the issue with aliases as discussed on the mailing lists. While
the mmc aliases should be board specific, let's first fix the issue with
minimal changes.
Cc: Aaro Koskinen <aaro.koskinen@iki.fi>
Cc: Peter Ujfalusi <peter.ujfalusi@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
slabinfo.py script does not work with actual kernel version.
First, it was unable to recognise SLUB susbsytem, and when I specified
it manually it failed again with
AttributeError: 'struct page' has no member 'obj_cgroups'
.. and then again with
File "tools/cgroup/memcg_slabinfo.py", line 221, in main
memcg.kmem_caches.address_of_(),
AttributeError: 'struct mem_cgroup' has no member 'kmem_caches'
Link: https://lkml.kernel.org/r/cec1a75e-43b4-3d64-2084-d9f98fda037f@virtuozzo.com
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Tested-by: Roman Gushchin <guro@fb.com>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI fixes from James Bottomley:
"Seven fixes, all in drivers.
The hpsa three are the most extensive and the most problematic: it's a
packed structure misalignment that oopses on ia64 but looks like it
would also oops on quite a few non-x86 architectures.
The pm80xx is a regression and the rest are bug fixes for patches in
the misc tree"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_transport_srp: Don't block target in SRP_PORT_LOST state
scsi: target: iscsi: Fix zero tag inside a trace event
scsi: pm80xx: Fix chip initialization failure
scsi: ufs: core: Fix wrong Task Tag used in task management request UPIUs
scsi: ufs: core: Fix task management request completion timeout
scsi: hpsa: Add an assert to prevent __packed reintroduction
scsi: hpsa: Fix boot on ia64 (atomic_t alignment)
scsi: hpsa: Use __packed on individual structs, not header-wide
nr_empty_pop_pages is used to guarantee that there are some free
populated pages to satisfy atomic allocations. Accounted and
non-accounted allocations are using separate sets of chunks,
so both need to have a surplus of empty pages.
This commit makes pcpu_nr_empty_pop_pages and the corresponding logic
per chunk type.
[Dennis]
This issue came up as I was reviewing [1] and realized I missed this.
Simultaneously, it was reported btrfs was seeing failed atomic
allocations in fsstress tests [2] and [3].
[1] https://lore.kernel.org/linux-mm/20210324190626.564297-1-guro@fb.com/
[2] https://lore.kernel.org/linux-mm/20210401185158.3275.409509F4@e16-tech.com/
[3] https://lore.kernel.org/linux-mm/CAL3q7H5RNBjCi708GH7jnczAOe0BLnacT9C+OBgA-Dx9jhB6SQ@mail.gmail.com/
Fixes: 3c7be18ac9a0 ("mm: memcg/percpu: account percpu memory to memory cgroups")
Cc: stable@vger.kernel.org # 5.9+
Signed-off-by: Roman Gushchin <guro@fb.com>
Tested-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Dennis Zhou <dennis@kernel.org>
ce_add_elem() uses different return values to signal a result from
adding an element to the collector. Commit in Fixes: broke the case
where the element being added is not found in the array. Correct that.
[ bp: Rewrite commit message, add kernel-doc comments. ]
Fixes: de0e0624d86f ("RAS/CEC: Check count_threshold unconditionally")
Signed-off-by: William Roche <william.roche@oracle.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Cc: <stable@vger.kernel.org>
Link: https://lkml.kernel.org/r/1617722939-29670-1-git-send-email-william.roche@oracle.com
On a Haswell machine, the perf_fuzzer managed to trigger this message:
[117248.075892] unchecked MSR access error: WRMSR to 0x3f1 (tried to
write 0x0400000000000000) at rIP: 0xffffffff8106e4f4
(native_write_msr+0x4/0x20)
[117248.089957] Call Trace:
[117248.092685] intel_pmu_pebs_enable_all+0x31/0x40
[117248.097737] intel_pmu_enable_all+0xa/0x10
[117248.102210] __perf_event_task_sched_in+0x2df/0x2f0
[117248.107511] finish_task_switch.isra.0+0x15f/0x280
[117248.112765] schedule_tail+0xc/0x40
[117248.116562] ret_from_fork+0x8/0x30
A fake event called VLBR_EVENT may use the bit 58 of the PEBS_ENABLE, if
the precise_ip is set. The bit 58 is reserved by the HW. Accessing the
bit causes the unchecked MSR access error.
The fake event doesn't support PEBS. The case should be rejected.
Fixes: 097e4311cda9 ("perf/x86: Add constraint to create guest LBR event without hw counter")
Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1615555298-140216-2-git-send-email-kan.liang@linux.intel.com
During the mount procedure we are calling btrfs_orphan_cleanup() against
the root tree, which will find all orphans items in this tree. When an
orphan item corresponds to a deleted subvolume/snapshot (instead of an
inode space cache), it must not delete the orphan item, because that will
cause btrfs_find_orphan_roots() to not find the orphan item and therefore
not add the corresponding subvolume root to the list of dead roots, which
results in the subvolume's tree never being deleted by the cleanup thread.
The same applies to the remount from RO to RW path.
Fix this by making btrfs_find_orphan_roots() run before calling
btrfs_orphan_cleanup() against the root tree.
A test case for fstests will follow soon.
Reported-by: Robbie Ko <robbieko@synology.com>
Link: https://lore.kernel.org/linux-btrfs/b19f4310-35e0-606e-1eea-2dd84d28c5da@synology.com/
Fixes: 638331fa56caea ("btrfs: fix transaction leak and crash after cleaning up orphans on RO mount")
CC: stable@vger.kernel.org # 5.11+
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
vmwgfx fixes for regressions in 5.12
Here's a set of 3 patches fixing ugly regressions
in the vmwgfx driver. We broke lock initialization
code and ended up using spinlocks before initialization
breaking lockdep.
Also there was a bit of a fallout from drm changes
which made the core validate that unreferenced buffers
have been unpinned. vmwgfx pinning code predates a lot
of the core drm and wasn't written to account for those
semantics. Fortunately changes required to fix it
are not too intrusive.
The changes have been validated by our internal ci.
Signed-off-by: Zack Rusin <zackr@vmware.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Zack Rusin <zackr@vmware.com>
Link: https://patchwork.freedesktop.org/patch/msgid/f7add0a2-162e-3bd2-b1be-344a94f2acbf@vmware.com
In commit 9e67600ed6b8 ("scsi: iscsi: Fix race condition between login and
sync thread") I missed that libiscsi was now setting the iSCSI class state,
and that patch ended up resetting the state during conn stoppage and using
the wrong state value during ep_disconnect. This patch moves the setting of
the class state to the class module and then fixes the two issues above.
Link: https://lore.kernel.org/r/20210406171746.5016-1-michael.christie@oracle.com
Fixes: 9e67600ed6b8 ("scsi: iscsi: Fix race condition between login and sync thread")
Cc: Gulam Mohamed <gulam.mohamed@oracle.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
for_each_mem_range() uses a loop variable, yet looking into code it is
not just iteration counter but more complex entity which encodes
information about memblock. Thus condition i == 0 looks fragile.
Indeed, it broke boot of R-class platforms since it never took i == 0
path (due to i was set to 1). Fix that with restoring original flag
check.
Fixes: b10d6bca8720 ("arch, drivers: replace for_each_membock() with for_each_mem_range()")
Signed-off-by: Vladimir Murzin <vladimir.murzin@arm.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>