commits
gcc doesn't care, but clang quite reasonably pointed out that the recent
commit e9ba16e68cce ("smpboot: Mark idle_init() as __always_inlined to
work around aggressive compiler un-inlining") did some really odd
things:
kernel/smpboot.c:50:20: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier]
static inline void __always_inline idle_init(unsigned int cpu)
^
which not only has that duplicate inlining specifier, but the new
__always_inline was put in the wrong place of the function definition.
We put the storage class specifiers (ie things like "static" and
"extern") first, and the type information after that. And while the
compiler may not care, we put the inline specifier before the types.
So it should be just
static __always_inline void idle_init(unsigned int cpu)
instead.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull powerpc fixes from Michael Ellerman:
- Fix guest to host memory corruption in H_RTAS due to missing nargs
check.
- Fix guest triggerable host crashes due to bad handling of nested
guest TM state.
- Fix possible crashes due to incorrect reference counting in
kvm_arch_vcpu_ioctl().
- Two commits fixing some regressions in KVM transactional memory
handling introduced by the recent rework of the KVM code.
Thanks to Nicholas Piggin, Alexey Kardashevskiy, and Michael Neuling.
* tag 'powerpc-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
KVM: PPC: Book3S HV P9: Fix guest TM support
Pull timer fixes from Thomas Gleixner:
"A small set of timer related fixes:
- Plug a race between rearm and process tick in the posix CPU timers
code
- Make the optimization to avoid recalculation of the next timer
interrupt work correctly when there are no timers pending"
* tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix get_next_timer_interrupt() with no timers pending
posix-cpu-timers: Fix rearm racing against process tick
The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
by the L1 to switch the context of the vCPU over to that of its L2
guest, and return with an interrupt indication. The L1 is responsible
for switching some registers to guest context, and the L0 switches
others (including all the hypervisor privileged state).
If the L2 MSR has TM active, then the L1 is responsible for
recheckpointing the L2 TM state. Then the L1 exits to L0 via the
H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
and then it recheckpoints the TM state as part of the nested entry and
finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
the simplest approach for something that's horrendously complicated.
Problems arise if the L1 exits to the L0 with a TM state which does not
match the L2 TM state being requested. For example if the L1 is
transactional but the L2 MSR is non-transactional, or vice versa. The
L0's HRFID can take a TM Bad Thing interrupt and crash.
Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
ensuring that if the L1 is suspended then the L2 must have TM active,
and if the L1 is not suspended then the L2 must not have TM active.
Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull x86 jump label fix from Thomas Gleixner:
"A single fix for jump labels to prevent the compiler from agressive
un-inlining which results in a section mismatch"
* tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
Pull dyntick fixes from Frederic Weisbecker:
- Fix a rearm race in the posix cpu timer code
- Handle get_next_timer_interrupt() correctly when no timers are pending
Link: https://lore.kernel.org/r/20210715104218.81276-1-frederic@kernel.org
The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on
the rtas_args.nargs that was provided by the guest. That guest nargs
value is not range checked, so the guest can cause the host rets pointer
to be pointed outside the args array. The individual rtas function
handlers check the nargs and nrets values to ensure they are correct,
but if they are not, the handlers store a -3 (0xfffffffd) failure
indication in rets[0] which corrupts host memory.
Fix this by testing up front whether the guest supplied nargs and nret
would exceed the array size, and fail the hcall directly without storing
a failure indication to rets[0].
Also expand on a comment about why we kill the guest and try not to
return errors directly if we have a valid rets[0] pointer.
Fixes: 8e591cb72047 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
Cc: stable@vger.kernel.org # v3.10+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull EFI fixes from Thomas Gleixner:
"A set of EFI fixes:
- Prevent memblock and I/O reserved resources to get out of sync when
EFI memreserve is in use.
- Don't claim a non-existing table is invalid
- Don't warn when firmware memory is already reserved correctly"
* tag 'efi-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/mokvar: Reserve the table only if it is in boot services data
efi/libstub: Fix the efi_load_initrd function description
firmware/efi: Tell memblock about EFI iomem reservations
efi/tpm: Differentiate missing and invalid final event log table.
In randconfig testing, certain UBSAN and CC Kconfig combinations
with GCC 10.3.0:
CONFIG_X86_32=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
# CONFIG_UBSAN_BOUNDS is not set
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
CONFIG_UBSAN_UNREACHABLE=y
CONFIG_UBSAN_BOOL=y
# CONFIG_UBSAN_ENUM is not set
# CONFIG_UBSAN_ALIGNMENT is not set
# CONFIG_UBSAN_SANITIZE_ALL is not set
... produce this build warning (and build error if
CONFIG_SECTION_MISMATCH_WARN_ONLY=y is set):
WARNING: modpost: vmlinux.o(.text+0x4c1cc): Section mismatch in reference from the function __jump_label_transform() to the function .init.text:text_poke_early()
The function __jump_label_transform() references
the function __init text_poke_early().
This is often because __jump_label_transform lacks a __init
annotation or the annotation of text_poke_early is wrong.
ERROR: modpost: Section mismatches detected.
The problem is that __jump_label_transform() gets uninlined by GCC,
despite there being only a single local scope user of the 'static inline'
function.
Mark the function __always_inline instead, to work around this compiler
bug/artifact.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
31cd0e119d50 ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.
In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward. Ultimately breaking the logic above.
A simplified example:
- Upon entering get_next_timer_interrupt() with:
jiffies = 1
base->clk = 0;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
returns KTIME_MAX.
- 'base->clk' is updated to the jiffies value.
- The next time we enter get_next_timer_interrupt(), taking into account
no timer operations happened:
base->clk = 1;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
returns a valid expire time, which is incorrect.
This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].
So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().
[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
'struct timer_base' any bigger.
Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
vcpu_put is not called if the user copy fails. This can result in preempt
notifier corruption and crashes, among other issues.
Fixes: b3cebfe8c1ca ("KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210716024310.164448-2-npiggin@gmail.com
Pull core fix from Thomas Gleixner:
"A single update for the boot code to prevent aggressive un-inlining
which causes a section mismatch"
* tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining
Pull EFI fixes for v5.14-rc2 from Ard Biesheuvel:
" - Ensure that memblock reservations and IO reserved resources remain in
sync when using the EFI memreserve feature.
- Don't complain about invalid TPM final event log table if it is
missing altogether.
- Comment header fix for the stub.
- Avoid a spurious warning when attempting to reserve firmware memory
that is already reserved in the first place."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No need to give up the original sd minor even with this option,
and if we did we'd also need to fix the number of minors for
this configuration to actually work.
Fixes: 7c3f828b522b0 ("block: refactor device number setup in __device_add_disk")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs
are offlined.
- Fix 'perf test' problems related to the recently added hybrid
(BIG/little) code.
- Split ARM's coresight (hw tracing) decode by aux records to avoid
fatal decoding errors.
- Fix add event failure in 'perf probe' when running 32-bit perf in a
64-bit kernel.
- Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set.
- Fix memory and refcount leaks detected by ASAn when running 'perf
test', should be clean of warnings now.
- Remove broken definition of __LITTLE_ENDIAN from tools'
linux/kconfig.h, which was breaking the build in some systems.
- Cast PTHREAD_STACK_MIN to int as it may turn into 'long
sysconf(__SC_THREAD_STACK_MIN_VALUE), breaking the build in some
systems.
- Fix libperf build error with LIBPFM4=1.
- Sync UAPI files changed by the memfd_secret new syscall.
* tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits)
perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel
perf data: Close all files in close_dir()
perf probe-file: Delete namelist in del_events() on the error path
perf test bpf: Free obj_buf
perf trace: Free strings in trace__parse_events_option()
perf trace: Free syscall tp fields in evsel->priv
perf trace: Free syscall->arg_fmt
perf trace: Free malloc'd trace fields on exit
perf lzma: Close lzma stream on exit
perf script: Fix memory 'threads' and 'cpus' leaks on exit
perf script: Release zstd data
perf session: Cleanup trace_event
perf inject: Close inject.output on exit
perf report: Free generated help strings for sort option
perf env: Fix memory leak of cpu_pmu_caps
perf test maps__merge_in: Fix memory leak of maps
perf dso: Fix memory leak in dso__new_map()
perf test event_update: Fix memory leak of unit
perf test event_update: Fix memory leak of evlist
...
Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:
CPU 0 CPU 1
----- -----
timer_settime(TIMER B)
posix_cpu_timer_rearm(TIMER A)
cpu_clock_sample_group()
(pct->timers_active already true)
handle_posix_cpu_timers()
check_process_timers()
stop_process_timers()
pct->timers_active = false
arm_timer(TIMER A)
tick -> run_posix_cpu_timers()
// sees !pct->timers_active, ignore
// our TIMER A
Fix this with simply locking process wide cputime counting start and
timer arm in the same block.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: stable@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
When running CPU_FTR_P9_TM_HV_ASSIST, HFSCR[TM] is set for the guest
even if the host has CONFIG_TRANSACTIONAL_MEM=n, which causes it to be
unprepared to handle guest exits while transactional.
Normal guests don't have a problem because the HTM capability will not
be advertised, but a rogue or buggy one could crash the host.
Fixes: 4bb3c7a0208f ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210716024310.164448-1-npiggin@gmail.com
Pull dma-mapping fix from Christoph Hellwig:
- handle vmalloc addresses in dma_common_{mmap,get_sgtable} (Roman
Skakun)
* tag 'dma-mapping-5.14-1' of git://git.infradead.org/users/hch/dma-mapping:
dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable}
While this function is a static inline, and is only used once in
local scope, certain Kconfig variations may cause it to be compiled
as a standalone function:
89231bf0 <idle_init>:
89231bf0: 83 05 60 d9 45 89 01 addl $0x1,0x8945d960
89231bf7: 55 push %ebp
Resulting in this build failure:
WARNING: modpost: vmlinux.o(.text.unlikely+0x7fd5): Section mismatch in reference from the function idle_init() to the function .init.text:fork_idle()
The function idle_init() references
the function __init fork_idle().
This is often because idle_init lacks a __init
annotation or the annotation of fork_idle is wrong.
ERROR: modpost: Section mismatches detected.
Certain USBSAN options x86-32 builds with CONFIG_CC_OPTIMIZE_FOR_SIZE=y
seem to be causing this.
So mark idle_init() as __always_inline to work around this compiler
bug/feature.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One of the SUSE QA tests triggered:
localhost kernel: efi: Failed to lookup EFI memory descriptor for 0x000000003dcf8000
which comes from x86's version of efi_arch_mem_reserve() trying to
reserve a memory region. Usually, that function expects
EFI_BOOT_SERVICES_DATA memory descriptors but the above case is for the
MOKvar table which is allocated in the EFI shim as runtime services.
That lead to a fix changing the allocation of that table to boot services.
However, that fix broke booting SEV guests with that shim leading to
this kernel fix
8d651ee9c71b ("x86/ioremap: Map EFI-reserved memory as encrypted for SEV")
which extended the ioremap hint to map reserved EFI boot services as
decrypted too.
However, all that wasn't needed, IMO, because that error message in
efi_arch_mem_reserve() was innocuous in this case - if the MOKvar table
is not in boot services, then it doesn't need to be reserved in the
first place because it is, well, in runtime services which *should* be
reserved anyway.
So do that reservation for the MOKvar table only if it is allocated
in boot services data. I couldn't find any requirement about where
that table should be allocated in, unlike the ESRT which allocation is
mandated to be done in boot services data by the UEFI spec.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Rewrite copy_huge_page() and move it into mm/util.c so it's always
available. Fixes an exposure of uninitialised memory on configurations
with HUGETLB and UFFD enabled and MIGRATION disabled.
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull xfs fixes from Darrick Wong:
"A few fixes for issues in the new online shrink code, additional
corrections for my recent bug-hunt w.r.t. extent size hints on
realtime, and improved input checking of the GROWFSRT ioctl.
IOW, the usual 'I somehow got bored during the merge window and
resumed auditing the farther reaches of xfs':
- Fix shrink eligibility checking when sparse inode clusters enabled
- Reset '..' directory entries when unlinking directories to prevent
verifier errors if fs is shrinked later
- Don't report unusable extent size hints to FSGETXATTR
- Don't warn when extent size hints are unusable because the sysadmin
configured them that way
- Fix insufficient parameter validation in GROWFSRT ioctl
- Fix integer overflow when adding rt volumes to filesystem"
* tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: detect misaligned rtinherit directory extent size hints
xfs: fix an integer overflow error in xfs_growfs_rt
xfs: improve FSGROWFSRT precondition checking
xfs: don't expose misaligned extszinherit hints to userspace
xfs: correct the narrative around misaligned rtinherit/extszinherit dirs
xfs: reset child dir '..' entry when unlinking child
xfs: check for sparse inode clusters that cross new EOAG when shrinking
The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user
if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events.
As a result, the command fails.
Before:
#perf sched record sleep 1
event syntax error: 'sched:sched_stat_wait'
\___ unknown tracepoint
Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Solution:
Check whether schedstat tracepoints are exposed. If no, these events are not recorded.
After:
# perf sched record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
# perf sched report
run measurement overhead: 4736 nsecs
sleep measurement overhead: 9059979 nsecs
the run test took 999854 nsecs
the sleep test took 8945271 nsecs
nr_run_events: 716
nr_sleep_events: 785
nr_wakeup_events: 0
...
------------------------------------------------------------
Fixes: 2a09b5de235a6 ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The conversion to C introduced several bugs in TM handling that can
cause host crashes with TM bad thing interrupts. Mostly just simple
typos or missed logic in the conversion that got through due to my
not testing TM in the guest sufficiently.
- Early TM emulation for the softpatch interrupt should be done if fake
suspend mode is _not_ active.
- Early TM emulation wants to return immediately to the guest so as to
not doom transactions unnecessarily.
- And if exiting from the guest, the host MSR should include the TM[S]
bit if the guest was T/S, before it is treclaimed.
After this fix, all the TM selftests pass when running on a P9 processor
that implements TM with softpatch interrupt.
Fixes: 89d35b2391015 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210712013650.376325-1-npiggin@gmail.com
Pull cifs fixes from Steve French:
"Five cifs/smb3 fixes, including a DFS failover fix, two fallocate
fixes, and two trivial coverity cleanups"
* tag '5.14-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix fallocate when trying to allocate a hole.
CIFS: Clarify SMB1 code for POSIX delete file
CIFS: Clarify SMB1 code for POSIX Create
cifs: support share failover when remounting
cifs: only write 64kb at a time when fallocating a small region of a file
xen-swiotlb can use vmalloc backed addresses for dma coherent allocations
and uses the common helpers. Properly handle them to unbreak Xen on
ARM platforms.
Fixes: 1b65c4e5a9af ("swiotlb-xen: use xen_alloc/free_coherent_pages")
Signed-off-by: Roman Skakun <roman_skakun@epam.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
[hch: split the patch, renamed the helpers]
Signed-off-by: Christoph Hellwig <hch@lst.de>
The soft_limit and hard_limit in the function efi_load_initrd describes
the preferred and max address of initrd loading location respectively.
However, the description wrongly describes it as the size of the
allocated memory.
Fix the function description.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Many thanks to Kirill for reminding that PageDoubleMap cannot be relied on
to warn of pte mappings in the Anon THP case; and a scan of subpages does
not seem appropriate here. Note how follow_trans_huge_pmd() does not even
mark an Anon THP as mlocked when compound_mapcount != 1: multiple mlocking
of Anon THP is avoided, so simply return from page_mlock() in this case.
Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
Fixes: d9770fcc1c0c ("mm/rmap: fix old bug: munlocking THP missed other mlocks")
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull iomap fixes from Darrick Wong:
"A handful of bugfixes for the iomap code.
There's nothing especially exciting here, just fixes for UBSAN (not
KASAN as I erroneously wrote in the tag message) warnings about
undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some
reshuffling of per-page block state info to fix some problems with
gfs2.
- Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE
- Fix assertion errors when using inlinedata files on gfs2"
* tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor
iomap: Don't create iomap_page objects for inline files
iomap: Permit pages without an iop to enter writeback
iomap: remove the length variable in iomap_seek_hole
iomap: remove the length variable in iomap_seek_data
If we encounter a directory that has been configured to pass on an
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review because that is a misconfiguration (that other
parts of the kernel will fix automatically).
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The "address" member of "struct probe_trace_point" uses long data type.
If kernel is 64-bit and perf program is 32-bit, size of "address"
variable is 32 bits.
As a result, upper 32 bits of address read from kernel are truncated, an
error occurs during address comparison in kprobe_warn_out_range().
Before:
# perf probe -a schedule
schedule is out of .text, skip it.
Error: Failed to add events.
Solution:
Change data type of "address" variable to u64 and change corresponding
address printing and value assignment.
After:
# perf.new.new probe -a schedule
Added new event:
probe:schedule (on schedule)
You can now use it in all perf tools, such as:
perf record -e probe:schedule -aR sleep 1
# perf probe -l
probe:schedule (on schedule@kernel/sched/core.c)
# perf record -e probe:schedule -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ]
# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'probe:schedule'
# Event count (approx.): 1366
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ............
#
6.22% migration/0 [kernel.kallsyms] [k] schedule
6.22% migration/1 [kernel.kallsyms] [k] schedule
6.22% migration/2 [kernel.kallsyms] [k] schedule
6.22% migration/3 [kernel.kallsyms] [k] schedule
6.15% migration/10 [kernel.kallsyms] [k] schedule
6.15% migration/11 [kernel.kallsyms] [k] schedule
6.15% migration/12 [kernel.kallsyms] [k] schedule
6.15% migration/13 [kernel.kallsyms] [k] schedule
6.15% migration/14 [kernel.kallsyms] [k] schedule
6.15% migration/15 [kernel.kallsyms] [k] schedule
6.15% migration/4 [kernel.kallsyms] [k] schedule
6.15% migration/5 [kernel.kallsyms] [k] schedule
6.15% migration/6 [kernel.kallsyms] [k] schedule
6.15% migration/7 [kernel.kallsyms] [k] schedule
6.15% migration/8 [kernel.kallsyms] [k] schedule
6.15% migration/9 [kernel.kallsyms] [k] schedule
0.22% rcu_sched [kernel.kallsyms] [k] schedule
...
#
# (Cannot load tips.txt file, please install perf!)
#
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jianlin Lv <jianlin.lv@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I know nothing about zone_device pages and !device_private pages; but if
try_to_migrate_one() will do nothing for them, then it's better that
try_to_migrate() filter them first, than trawl through all their vmas.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RISC-V fixes from Palmer Dabbelt:
- properly set the memory size, which fixes 32-bit systems
- allow initrd to load anywhere in memory, rather that restricting it
to the first 256MiB
- fix the 'mem=' parameter on 64-bit systems to properly account for
the maximum supported memory now that the kernel is outside the
linear map
- avoid installing mappings into the last 4KiB of memory, which
conflicts with error values
- avoid the stack from being freed while it is being walked
- a handful of fixes to the new copy to/from user routines
* tag 'riscv-for-linus-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: __asm_copy_to-from_user: Fix: Typos in comments
riscv: __asm_copy_to-from_user: Remove unnecessary size check
riscv: __asm_copy_to-from_user: Fix: fail on RV32
riscv: __asm_copy_to-from_user: Fix: overrun copy
riscv: stacktrace: pin the task's stack in get_wchan
riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE
riscv: Make sure the linear mapping does not use the kernel mapping
riscv: Fix memory_limit for 64-bit kernel
RISC-V: load initrd wherever it fits into memory
riscv: Fix 32-bit RISC-V boot failure
Remove the conditional checking for out_data_len and skipping the fallocate
if it is 0. This is wrong will actually change any legitimate the fallocate
where the entire region is unallocated into a no-op.
Additionally, before allocating the range, if FALLOC_FL_KEEP_SIZE is set then
we need to clamp the length of the fallocate region as to not extend the size of the file.
Fixes: 966a3cb7c7db ("cifs: improve fallocate emulation")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This reverts commit b7eb335e26a9c7f258c96b3962c283c379d3ede0.
It turns out that the problem with the clang -Wimplicit-fallthrough
warning is not about the kernel source code, but about clang itself, and
that the warning is unusable until clang fixes its broken ways.
In particular, when you enable this warning for clang, you not only get
warnings about implicit fallthroughs. You also get this:
warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough]
which is completely broken becasue it
(a) doesn't even tell you where the problem is (seriously: no line
numbers, no filename, no nothing).
(b) is fundamentally broken anyway, because there are perfectly valid
reasons to have a fallthrough statement even if it turns out that
it can perhaps not be reached.
In the kernel, an example of that second case is code in the scheduler:
switch (state) {
case cpuset:
if (IS_ENABLED(CONFIG_CPUSETS)) {
cpuset_cpus_allowed_fallback(p);
state = possible;
break;
}
fallthrough;
case possible:
where if CONFIG_CPUSETS is enabled you actually never hit the
fallthrough case at all. But that in no way makes the fallthrough
wrong.
So the warning is completely broken, and enabling it for clang is a very
bad idea.
In the meantime, we can keep the gcc option enabled, and make the gcc
build use
-Wimplicit-fallthrough=5
which means that we will at least continue to require a proper
fallthrough statement, and that gcc won't silently accept the magic
comment versions. Because gcc does this all correctly, and while the odd
"=5" part is kind of obscure, it's documented in [1]:
"-Wimplicit-fallthrough=5 doesn’t recognize any comments as
fallthrough comments, only attributes disable the warning"
so if clang ever fixes its bad behavior we can try enabling it there again.
Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html [1]
Cc: Kees Cook <keescook@chromium.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kexec_load_file() relies on the memblock infrastructure to avoid
stamping over regions of memory that are essential to the survival
of the system.
However, nobody seems to agree how to flag these regions as reserved,
and (for example) EFI only publishes its reservations in /proc/iomem
for the benefit of the traditional, userspace based kexec tool.
On arm64 platforms with GICv3, this can result in the payload being
placed at the location of the LPI tables. Shock, horror!
Let's augment the EFI reservation code with a memblock_reserve() call,
protecting our dear tables from the secondary kernel invasion.
Reported-by: Moritz Fischer <mdf@kernel.org>
Tested-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull Kbuild fixes from Masahiro Yamada:
- Restore the original behavior of scripts/setlocalversion when
LOCALVERSION is set to empty.
- Show Kconfig prompts even for 'make -s'
- Fix the combination of COFNIG_LTO_CLANG=y and CONFIG_MODVERSIONS=y
for older GNU Make versions
* tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Documentation: Fix intiramfs script name
Kbuild: lto: fix module versionings mismatch in GNU make 3.X
kbuild: do not suppress Kconfig prompts for silent build
scripts/setlocalversion: fix a bug when LOCALVERSION is empty
Now that we create those objects in iomap_writepage_map when needed,
there's no need to pre-create them in iomap_page_mkwrite_actor anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
During a realtime grow operation, we run a single transaction for each
rt bitmap block added to the filesystem. This means that each step has
to be careful to increase sb_rblocks appropriately.
Fix the integer overflow error in this calculation that can happen when
the extent size is very large. Found by running growfs to add a rt
volume to a filesystem formatted with a 1g rt extent size.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When using 'perf report' in directory mode, the first file is not closed
on exit, causing a memory leak.
The problem is caused by the iterating variable never reaching 0.
Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions")
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
cleared by the time it got page table lock, page_vma_mapped_walk_done()
must be called before returning, either explicitly, or by a final call
to page_vma_mapped_walk() - otherwise the page table remains locked.
Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 71f642833284 ("ACPI: utils: Fix reference counting in
for_each_acpi_dev_match()") started doing "acpi_dev_put()" on a pointer
that was possibly NULL. That fails miserably, because that helper
inline function is not set up to handle that case.
Just make acpi_dev_put() silently accept a NULL pointer, rather than
calling down to put_device() with an invalid offset off that NULL
pointer.
Link: https://lore.kernel.org/lkml/a607c149-6bf6-0fd0-0e31-100378504da2@kernel.dk/
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Daniel Scally <djrscally@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Fixing typos and grammar mistakes and using more intuitive label
name.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 CIFSPOSIXDelFile. This changeset
doesn't change the address but makes it slightly clearer.
Addresses-Coverity: 711519 ("Out of bounds write")
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull configfs fix from Christoph Hellwig:
- fix the read and write iterators (Bart Van Assche)
* tag 'configfs-5.13-1' of git://git.infradead.org/users/hch/configfs:
configfs: fix the read and write iterators
Missing TPM final event log table is not a firmware bug.
Clearly if providing event log in the old format makes the final event
log invalid it should not be provided at least in that case.
Fixes: b4f1874c6216 ("tpm: check event log version before reading final events")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull ARM SoC fixes from Arnd Bergmann:
"Here are the patches for this week that came as the fallout of the
merge window:
- Two fixes for the NVidia memory controller driver
- multiple defconfig files get patched to turn CONFIG_FB back on
after that is no longer selected by CONFIG_DRM
- ffa and scmpi firmware drivers fixes, mostly addressing compiler
and documentation warnings
- Platform specific fixes for device tree files on ASpeed, Renesas
and NVidia SoC, mostly for recent regressions.
- A workaround for a regression on the USB PHY with devlink when the
usb-nop-xceiv driver is not available until the rootfs is mounted.
- Device tree compiler warnings in Arm Versatile-AB"
* tag 'soc-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
ARM: dts: versatile: Fix up interrupt controller node names
ARM: multi_v7_defconfig: Make NOP_USB_XCEIV driver built-in
ARM: configs: Update u8500_defconfig
ARM: configs: Update Vexpress defconfig
ARM: configs: Update Versatile defconfig
ARM: configs: Update RealView defconfig
ARM: configs: Update Integrator defconfig
arm: Typo s/PCI_IXP4XX_LEGACY/IXP4XX_PCI_LEGACY/
firmware: arm_scmi: Fix range check for the maximum number of pending messages
firmware: arm_scmi: Avoid padding in sensor message structure
firmware: arm_scmi: Fix kernel doc warnings about return values
firmware: arm_scpi: Fix kernel doc warnings
firmware: arm_scmi: Fix kernel doc warnings
ARM: shmobile: defconfig: Restore graphical consoles
firmware: arm_ffa: Fix a possible ffa_linux_errmap buffer overflow
firmware: arm_ffa: Fix the comment style
firmware: arm_ffa: Simplify probe function
firmware: arm_ffa: Ensure drivers provide a probe function
firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow
firmware: arm_scmi: Ensure drivers provide a probe function
...
Documentation was not changed when renaming the script in commit
80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to
gen_initramfs.sh"). Fixing this.
Basically does:
$ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)
Fixes: 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
In iomap_readpage_actor, don't create iop objects for inline inodes.
Otherwise, iomap_read_inline_data will set PageUptodate without setting
iop->uptodate, and iomap_page_release will eventually complain.
To prevent this kind of bug from occurring in the future, make sure the
page doesn't have private data attached in iomap_read_inline_data.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Improve the checking at the start of a realtime grow operation so that
we avoid accidentally set a new extent size that is too large and avoid
adding an rt volume to a filesystem with rmap or reflink because we
don't support rt rmap or reflink yet.
While we're at it, separate the checks so that we're only testing one
aspect at a time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
ASan reports some memory leaks when running:
# perf test "42: BPF filter"
This second leak is caused by a strlist not being dellocated on error
inside probe_file__del_events.
This patch adds a goto label before the deallocation and makes the error
path jump to it.
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Fixes: e7895e422e4da63d ("perf probe: Split del_perf_probe_events()")
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as
try_to_munlock()) on a THP, because nothing got done even when it was
found to be mapped in another VM_LOCKED vma.
It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped THPs is supposed to work, even when multiple mappings
are mlocked and munlocked or munmapped. Refine the tests.
There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
page_mlock_one() does not even have to worry about that complication.
(I said the kernel recovers: but would page reclaim be likely to split
THP before rediscovering that it's VM_LOCKED? I've not followed that up)
Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI fixes from James Bottomley:
"Four fixes, all in drivers, all of which can lead to user visible
problems in certain situations"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: Fix NULL dereference on XCOPY completion
scsi: mpt3sas: Transition IOC to Ready state during shutdown
scsi: target: Fix protect handling in WRITE SAME(32)
scsi: iscsi: Fix iface sysfs attr detection
Clean up:
The size of 0 will be evaluated in the next step. Not
required here.
Signed-off-by: Akira Tsukamoto <akira.tsukamoto@gmail.com>
Fixes: ca6eaaa210de ("riscv: __asm_copy_to-from_user: Optimize unaligned memory access and pipeline stall")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 CIFSPOSIXCreate. This changeset
doesn't change the address but makes it slightly clearer.
Addresses-Coverity: 711518 ("Out of bounds write")
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull pwm fixes from Thierry Reding:
"A couple of fixes from Uwe that I missed for v5.14-rc1"
* tag 'pwm/for-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: ep93xx: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: berlin: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: tiecap: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: spear: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
Commit 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter
methods") changed the simple_read_from_buffer() calls into copy_to_iter()
calls and the simple_write_to_buffer() calls into copy_from_iter() calls.
The simple*buffer() methods update the file offset (*ppos) but the read
and write iterators not yet. Make the read and write iterators update the
file offset (iocb->ki_pos).
This patch has been tested as follows:
# modprobe target_core_user
# dd if=/sys/kernel/config/target/dbroot bs=1
/var/target
12+0 records in
12+0 records out
12 bytes copied, 9.5539e-05 s, 126 kB/s
# cd /sys/kernel/config/acpi/table
# mkdir test
# cd test
# dmesg -c >/dev/null; printf 'SSDT\x8\0\0\0abcdefghijklmnopqrstuvwxyz' | dd of=aml bs=1; dmesg -c
34+0 records in
34+0 records out
34 bytes copied, 0.010627 s, 3.2 kB/s
[ 261.056551] ACPI configfs: invalid table length
Reported-by: Yanko Kaneti <yaneti@declera.com>
Cc: Yanko Kaneti <yaneti@declera.com>
Fixes: 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter methods")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
gcc doesn't care, but clang quite reasonably pointed out that the recent
commit e9ba16e68cce ("smpboot: Mark idle_init() as __always_inlined to
work around aggressive compiler un-inlining") did some really odd
things:
kernel/smpboot.c:50:20: warning: duplicate 'inline' declaration specifier [-Wduplicate-decl-specifier]
static inline void __always_inline idle_init(unsigned int cpu)
^
which not only has that duplicate inlining specifier, but the new
__always_inline was put in the wrong place of the function definition.
We put the storage class specifiers (ie things like "static" and
"extern") first, and the type information after that. And while the
compiler may not care, we put the inline specifier before the types.
So it should be just
static __always_inline void idle_init(unsigned int cpu)
instead.
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull powerpc fixes from Michael Ellerman:
- Fix guest to host memory corruption in H_RTAS due to missing nargs
check.
- Fix guest triggerable host crashes due to bad handling of nested
guest TM state.
- Fix possible crashes due to incorrect reference counting in
kvm_arch_vcpu_ioctl().
- Two commits fixing some regressions in KVM transactional memory
handling introduced by the recent rework of the KVM code.
Thanks to Nicholas Piggin, Alexey Kardashevskiy, and Michael Neuling.
* tag 'powerpc-5.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV Nested: Sanitise H_ENTER_NESTED TM state
KVM: PPC: Book3S: Fix H_RTAS rets buffer overflow
KVM: PPC: Fix kvm_arch_vcpu_ioctl vcpu_load leak
KVM: PPC: Book3S: Fix CONFIG_TRANSACTIONAL_MEM=n crash
KVM: PPC: Book3S HV P9: Fix guest TM support
Pull timer fixes from Thomas Gleixner:
"A small set of timer related fixes:
- Plug a race between rearm and process tick in the posix CPU timers
code
- Make the optimization to avoid recalculation of the next timer
interrupt work correctly when there are no timers pending"
* tag 'timers-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
timers: Fix get_next_timer_interrupt() with no timers pending
posix-cpu-timers: Fix rearm racing against process tick
The H_ENTER_NESTED hypercall is handled by the L0, and it is a request
by the L1 to switch the context of the vCPU over to that of its L2
guest, and return with an interrupt indication. The L1 is responsible
for switching some registers to guest context, and the L0 switches
others (including all the hypervisor privileged state).
If the L2 MSR has TM active, then the L1 is responsible for
recheckpointing the L2 TM state. Then the L1 exits to L0 via the
H_ENTER_NESTED hcall, and the L0 saves the TM state as part of the exit,
and then it recheckpoints the TM state as part of the nested entry and
finally HRFIDs into the L2 with TM active MSR. Not efficient, but about
the simplest approach for something that's horrendously complicated.
Problems arise if the L1 exits to the L0 with a TM state which does not
match the L2 TM state being requested. For example if the L1 is
transactional but the L2 MSR is non-transactional, or vice versa. The
L0's HRFID can take a TM Bad Thing interrupt and crash.
Fix this by disallowing H_ENTER_NESTED in TM[T] state entirely, and then
ensuring that if the L1 is suspended then the L2 must have TM active,
and if the L1 is not suspended then the L2 must not have TM active.
Fixes: 360cae313702 ("KVM: PPC: Book3S HV: Nested guest entry via hypercall")
Cc: stable@vger.kernel.org # v4.20+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull x86 jump label fix from Thomas Gleixner:
"A single fix for jump labels to prevent the compiler from agressive
un-inlining which results in a section mismatch"
* tag 'locking-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
jump_labels: Mark __jump_label_transform() as __always_inlined to work around aggressive compiler un-inlining
The kvmppc_rtas_hcall() sets the host rtas_args.rets pointer based on
the rtas_args.nargs that was provided by the guest. That guest nargs
value is not range checked, so the guest can cause the host rets pointer
to be pointed outside the args array. The individual rtas function
handlers check the nargs and nrets values to ensure they are correct,
but if they are not, the handlers store a -3 (0xfffffffd) failure
indication in rets[0] which corrupts host memory.
Fix this by testing up front whether the guest supplied nargs and nret
would exceed the array size, and fail the hcall directly without storing
a failure indication to rets[0].
Also expand on a comment about why we kill the guest and try not to
return errors directly if we have a valid rets[0] pointer.
Fixes: 8e591cb72047 ("KVM: PPC: Book3S: Add infrastructure to implement kernel-side RTAS calls")
Cc: stable@vger.kernel.org # v3.10+
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Pull EFI fixes from Thomas Gleixner:
"A set of EFI fixes:
- Prevent memblock and I/O reserved resources to get out of sync when
EFI memreserve is in use.
- Don't claim a non-existing table is invalid
- Don't warn when firmware memory is already reserved correctly"
* tag 'efi-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi/mokvar: Reserve the table only if it is in boot services data
efi/libstub: Fix the efi_load_initrd function description
firmware/efi: Tell memblock about EFI iomem reservations
efi/tpm: Differentiate missing and invalid final event log table.
In randconfig testing, certain UBSAN and CC Kconfig combinations
with GCC 10.3.0:
CONFIG_X86_32=y
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_UBSAN=y
# CONFIG_UBSAN_TRAP is not set
# CONFIG_UBSAN_BOUNDS is not set
CONFIG_UBSAN_SHIFT=y
# CONFIG_UBSAN_DIV_ZERO is not set
CONFIG_UBSAN_UNREACHABLE=y
CONFIG_UBSAN_BOOL=y
# CONFIG_UBSAN_ENUM is not set
# CONFIG_UBSAN_ALIGNMENT is not set
# CONFIG_UBSAN_SANITIZE_ALL is not set
... produce this build warning (and build error if
CONFIG_SECTION_MISMATCH_WARN_ONLY=y is set):
WARNING: modpost: vmlinux.o(.text+0x4c1cc): Section mismatch in reference from the function __jump_label_transform() to the function .init.text:text_poke_early()
The function __jump_label_transform() references
the function __init text_poke_early().
This is often because __jump_label_transform lacks a __init
annotation or the annotation of text_poke_early is wrong.
ERROR: modpost: Section mismatches detected.
The problem is that __jump_label_transform() gets uninlined by GCC,
despite there being only a single local scope user of the 'static inline'
function.
Mark the function __always_inline instead, to work around this compiler
bug/artifact.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
31cd0e119d50 ("timers: Recalculate next timer interrupt only when
necessary") subtly altered get_next_timer_interrupt()'s behaviour. The
function no longer consistently returns KTIME_MAX with no timers
pending.
In order to decide if there are any timers pending we check whether the
next expiry will happen NEXT_TIMER_MAX_DELTA jiffies from now.
Unfortunately, the next expiry time and the timer base clock are no
longer updated in unison. The former changes upon certain timer
operations (enqueue, expire, detach), whereas the latter keeps track of
jiffies as they move forward. Ultimately breaking the logic above.
A simplified example:
- Upon entering get_next_timer_interrupt() with:
jiffies = 1
base->clk = 0;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry == base->clk + NEXT_TIMER_MAX_DELTA', the function
returns KTIME_MAX.
- 'base->clk' is updated to the jiffies value.
- The next time we enter get_next_timer_interrupt(), taking into account
no timer operations happened:
base->clk = 1;
base->next_expiry = NEXT_TIMER_MAX_DELTA;
'base->next_expiry != base->clk + NEXT_TIMER_MAX_DELTA', the function
returns a valid expire time, which is incorrect.
This ultimately might unnecessarily rearm sched's timer on nohz_full
setups, and add latency to the system[1].
So, introduce 'base->timers_pending'[2], update it every time
'base->next_expiry' changes, and use it in get_next_timer_interrupt().
[1] See tick_nohz_stop_tick().
[2] A quick pahole check on x86_64 and arm64 shows it doesn't make
'struct timer_base' any bigger.
Fixes: 31cd0e119d50 ("timers: Recalculate next timer interrupt only when necessary")
Signed-off-by: Nicolas Saenz Julienne <nsaenzju@redhat.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
vcpu_put is not called if the user copy fails. This can result in preempt
notifier corruption and crashes, among other issues.
Fixes: b3cebfe8c1ca ("KVM: PPC: Move vcpu_load/vcpu_put down to each ioctl case in kvm_arch_vcpu_ioctl")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210716024310.164448-2-npiggin@gmail.com
Pull core fix from Thomas Gleixner:
"A single update for the boot code to prevent aggressive un-inlining
which causes a section mismatch"
* tag 'core-urgent-2021-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
smpboot: Mark idle_init() as __always_inlined to work around aggressive compiler un-inlining
Pull EFI fixes for v5.14-rc2 from Ard Biesheuvel:
" - Ensure that memblock reservations and IO reserved resources remain in
sync when using the EFI memreserve feature.
- Don't complain about invalid TPM final event log table if it is
missing altogether.
- Comment header fix for the stub.
- Avoid a spurious warning when attempting to reserve firmware memory
that is already reserved in the first place."
Signed-off-by: Ingo Molnar <mingo@kernel.org>
No need to give up the original sd minor even with this option,
and if we did we'd also need to fix the number of minors for
this configuration to actually work.
Fixes: 7c3f828b522b0 ("block: refactor device number setup in __device_add_disk")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Skip invalid hybrid PMU on hybrid systems when the atom (little) CPUs
are offlined.
- Fix 'perf test' problems related to the recently added hybrid
(BIG/little) code.
- Split ARM's coresight (hw tracing) decode by aux records to avoid
fatal decoding errors.
- Fix add event failure in 'perf probe' when running 32-bit perf in a
64-bit kernel.
- Fix 'perf sched record' failure when CONFIG_SCHEDSTATS is not set.
- Fix memory and refcount leaks detected by ASAn when running 'perf
test', should be clean of warnings now.
- Remove broken definition of __LITTLE_ENDIAN from tools'
linux/kconfig.h, which was breaking the build in some systems.
- Cast PTHREAD_STACK_MIN to int as it may turn into 'long
sysconf(__SC_THREAD_STACK_MIN_VALUE), breaking the build in some
systems.
- Fix libperf build error with LIBPFM4=1.
- Sync UAPI files changed by the memfd_secret new syscall.
* tag 'perf-tools-fixes-for-v5.14-2021-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (35 commits)
perf sched: Fix record failure when CONFIG_SCHEDSTATS is not set
perf probe: Fix add event failure when running 32-bit perf in a 64-bit kernel
perf data: Close all files in close_dir()
perf probe-file: Delete namelist in del_events() on the error path
perf test bpf: Free obj_buf
perf trace: Free strings in trace__parse_events_option()
perf trace: Free syscall tp fields in evsel->priv
perf trace: Free syscall->arg_fmt
perf trace: Free malloc'd trace fields on exit
perf lzma: Close lzma stream on exit
perf script: Fix memory 'threads' and 'cpus' leaks on exit
perf script: Release zstd data
perf session: Cleanup trace_event
perf inject: Close inject.output on exit
perf report: Free generated help strings for sort option
perf env: Fix memory leak of cpu_pmu_caps
perf test maps__merge_in: Fix memory leak of maps
perf dso: Fix memory leak in dso__new_map()
perf test event_update: Fix memory leak of unit
perf test event_update: Fix memory leak of evlist
...
Since the process wide cputime counter is started locklessly from
posix_cpu_timer_rearm(), it can be concurrently stopped by operations
on other timers from the same thread group, such as in the following
unlucky scenario:
CPU 0 CPU 1
----- -----
timer_settime(TIMER B)
posix_cpu_timer_rearm(TIMER A)
cpu_clock_sample_group()
(pct->timers_active already true)
handle_posix_cpu_timers()
check_process_timers()
stop_process_timers()
pct->timers_active = false
arm_timer(TIMER A)
tick -> run_posix_cpu_timers()
// sees !pct->timers_active, ignore
// our TIMER A
Fix this with simply locking process wide cputime counting start and
timer arm in the same block.
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Fixes: 60f2ceaa8111 ("posix-cpu-timers: Remove unnecessary locking around cpu_clock_sample_group")
Cc: stable@vger.kernel.org
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Eric W. Biederman <ebiederm@xmission.com>
When running CPU_FTR_P9_TM_HV_ASSIST, HFSCR[TM] is set for the guest
even if the host has CONFIG_TRANSACTIONAL_MEM=n, which causes it to be
unprepared to handle guest exits while transactional.
Normal guests don't have a problem because the HTM capability will not
be advertised, but a rogue or buggy one could crash the host.
Fixes: 4bb3c7a0208f ("KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210716024310.164448-1-npiggin@gmail.com
While this function is a static inline, and is only used once in
local scope, certain Kconfig variations may cause it to be compiled
as a standalone function:
89231bf0 <idle_init>:
89231bf0: 83 05 60 d9 45 89 01 addl $0x1,0x8945d960
89231bf7: 55 push %ebp
Resulting in this build failure:
WARNING: modpost: vmlinux.o(.text.unlikely+0x7fd5): Section mismatch in reference from the function idle_init() to the function .init.text:fork_idle()
The function idle_init() references
the function __init fork_idle().
This is often because idle_init lacks a __init
annotation or the annotation of fork_idle is wrong.
ERROR: modpost: Section mismatches detected.
Certain USBSAN options x86-32 builds with CONFIG_CC_OPTIMIZE_FOR_SIZE=y
seem to be causing this.
So mark idle_init() as __always_inline to work around this compiler
bug/feature.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
One of the SUSE QA tests triggered:
localhost kernel: efi: Failed to lookup EFI memory descriptor for 0x000000003dcf8000
which comes from x86's version of efi_arch_mem_reserve() trying to
reserve a memory region. Usually, that function expects
EFI_BOOT_SERVICES_DATA memory descriptors but the above case is for the
MOKvar table which is allocated in the EFI shim as runtime services.
That lead to a fix changing the allocation of that table to boot services.
However, that fix broke booting SEV guests with that shim leading to
this kernel fix
8d651ee9c71b ("x86/ioremap: Map EFI-reserved memory as encrypted for SEV")
which extended the ioremap hint to map reserved EFI boot services as
decrypted too.
However, all that wasn't needed, IMO, because that error message in
efi_arch_mem_reserve() was innocuous in this case - if the MOKvar table
is not in boot services, then it doesn't need to be reserved in the
first place because it is, well, in runtime services which *should* be
reserved anyway.
So do that reservation for the MOKvar table only if it is allocated
in boot services data. I couldn't find any requirement about where
that table should be allocated in, unlike the ESRT which allocation is
mandated to be done in boot services data by the UEFI spec.
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Rewrite copy_huge_page() and move it into mm/util.c so it's always
available. Fixes an exposure of uninitialised memory on configurations
with HUGETLB and UFFD enabled and MIGRATION disabled.
Fixes: 8cc5fcbb5be8 ("mm, hugetlb: fix racy resv_huge_pages underflow on UFFDIO_COPY")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull xfs fixes from Darrick Wong:
"A few fixes for issues in the new online shrink code, additional
corrections for my recent bug-hunt w.r.t. extent size hints on
realtime, and improved input checking of the GROWFSRT ioctl.
IOW, the usual 'I somehow got bored during the merge window and
resumed auditing the farther reaches of xfs':
- Fix shrink eligibility checking when sparse inode clusters enabled
- Reset '..' directory entries when unlinking directories to prevent
verifier errors if fs is shrinked later
- Don't report unusable extent size hints to FSGETXATTR
- Don't warn when extent size hints are unusable because the sysadmin
configured them that way
- Fix insufficient parameter validation in GROWFSRT ioctl
- Fix integer overflow when adding rt volumes to filesystem"
* tag 'xfs-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: detect misaligned rtinherit directory extent size hints
xfs: fix an integer overflow error in xfs_growfs_rt
xfs: improve FSGROWFSRT precondition checking
xfs: don't expose misaligned extszinherit hints to userspace
xfs: correct the narrative around misaligned rtinherit/extszinherit dirs
xfs: reset child dir '..' entry when unlinking child
xfs: check for sparse inode clusters that cross new EOAG when shrinking
The tracepoints trace_sched_stat_{wait, sleep, iowait} are not exposed to user
if CONFIG_SCHEDSTATS is not set, "perf sched record" records the three events.
As a result, the command fails.
Before:
#perf sched record sleep 1
event syntax error: 'sched:sched_stat_wait'
\___ unknown tracepoint
Error: File /sys/kernel/tracing/events/sched/sched_stat_wait not found.
Hint: Perhaps this kernel misses some CONFIG_ setting to enable this feature?.
Run 'perf list' for a list of valid events
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
-e, --event <event> event selector. use 'perf list' to list available events
Solution:
Check whether schedstat tracepoints are exposed. If no, these events are not recorded.
After:
# perf sched record sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.163 MB perf.data (1091 samples) ]
# perf sched report
run measurement overhead: 4736 nsecs
sleep measurement overhead: 9059979 nsecs
the run test took 999854 nsecs
the sleep test took 8945271 nsecs
nr_run_events: 716
nr_sleep_events: 785
nr_wakeup_events: 0
...
------------------------------------------------------------
Fixes: 2a09b5de235a6 ("sched/fair: do not expose some tracepoints to user if CONFIG_SCHEDSTATS is not set")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Yafang Shao <laoar.shao@gmail.com>
Link: http://lore.kernel.org/lkml/20210713112358.194693-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The conversion to C introduced several bugs in TM handling that can
cause host crashes with TM bad thing interrupts. Mostly just simple
typos or missed logic in the conversion that got through due to my
not testing TM in the guest sufficiently.
- Early TM emulation for the softpatch interrupt should be done if fake
suspend mode is _not_ active.
- Early TM emulation wants to return immediately to the guest so as to
not doom transactions unnecessarily.
- And if exiting from the guest, the host MSR should include the TM[S]
bit if the guest was T/S, before it is treclaimed.
After this fix, all the TM selftests pass when running on a P9 processor
that implements TM with softpatch interrupt.
Fixes: 89d35b2391015 ("KVM: PPC: Book3S HV P9: Implement the rest of the P9 path in C")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210712013650.376325-1-npiggin@gmail.com
Pull cifs fixes from Steve French:
"Five cifs/smb3 fixes, including a DFS failover fix, two fallocate
fixes, and two trivial coverity cleanups"
* tag '5.14-rc2-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: fix fallocate when trying to allocate a hole.
CIFS: Clarify SMB1 code for POSIX delete file
CIFS: Clarify SMB1 code for POSIX Create
cifs: support share failover when remounting
cifs: only write 64kb at a time when fallocating a small region of a file
xen-swiotlb can use vmalloc backed addresses for dma coherent allocations
and uses the common helpers. Properly handle them to unbreak Xen on
ARM platforms.
Fixes: 1b65c4e5a9af ("swiotlb-xen: use xen_alloc/free_coherent_pages")
Signed-off-by: Roman Skakun <roman_skakun@epam.com>
Reviewed-by: Andrii Anisov <andrii_anisov@epam.com>
[hch: split the patch, renamed the helpers]
Signed-off-by: Christoph Hellwig <hch@lst.de>
The soft_limit and hard_limit in the function efi_load_initrd describes
the preferred and max address of initrd loading location respectively.
However, the description wrongly describes it as the size of the
allocated memory.
Fix the function description.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Many thanks to Kirill for reminding that PageDoubleMap cannot be relied on
to warn of pte mappings in the Anon THP case; and a scan of subpages does
not seem appropriate here. Note how follow_trans_huge_pmd() does not even
mark an Anon THP as mlocked when compound_mapcount != 1: multiple mlocking
of Anon THP is avoided, so simply return from page_mlock() in this case.
Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
Fixes: d9770fcc1c0c ("mm/rmap: fix old bug: munlocking THP missed other mlocks")
Reported-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Shakeel Butt <shakeelb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull iomap fixes from Darrick Wong:
"A handful of bugfixes for the iomap code.
There's nothing especially exciting here, just fixes for UBSAN (not
KASAN as I erroneously wrote in the tag message) warnings about
undefined behavior in the SEEK_DATA/SEEK_HOLE code, and some
reshuffling of per-page block state info to fix some problems with
gfs2.
- Fix KASAN warnings due to integer overflow in SEEK_DATA/SEEK_HOLE
- Fix assertion errors when using inlinedata files on gfs2"
* tag 'iomap-5.14-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
iomap: Don't create iomap_page objects in iomap_page_mkwrite_actor
iomap: Don't create iomap_page objects for inline files
iomap: Permit pages without an iop to enter writeback
iomap: remove the length variable in iomap_seek_hole
iomap: remove the length variable in iomap_seek_data
If we encounter a directory that has been configured to pass on an
extent size hint to a new realtime file and the hint isn't an integer
multiple of the rt extent size, we should flag the hint for
administrative review because that is a misconfiguration (that other
parts of the kernel will fix automatically).
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
The "address" member of "struct probe_trace_point" uses long data type.
If kernel is 64-bit and perf program is 32-bit, size of "address"
variable is 32 bits.
As a result, upper 32 bits of address read from kernel are truncated, an
error occurs during address comparison in kprobe_warn_out_range().
Before:
# perf probe -a schedule
schedule is out of .text, skip it.
Error: Failed to add events.
Solution:
Change data type of "address" variable to u64 and change corresponding
address printing and value assignment.
After:
# perf.new.new probe -a schedule
Added new event:
probe:schedule (on schedule)
You can now use it in all perf tools, such as:
perf record -e probe:schedule -aR sleep 1
# perf probe -l
probe:schedule (on schedule@kernel/sched/core.c)
# perf record -e probe:schedule -aR sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.156 MB perf.data (1366 samples) ]
# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 1K of event 'probe:schedule'
# Event count (approx.): 1366
#
# Overhead Command Shared Object Symbol
# ........ ............... ................. ............
#
6.22% migration/0 [kernel.kallsyms] [k] schedule
6.22% migration/1 [kernel.kallsyms] [k] schedule
6.22% migration/2 [kernel.kallsyms] [k] schedule
6.22% migration/3 [kernel.kallsyms] [k] schedule
6.15% migration/10 [kernel.kallsyms] [k] schedule
6.15% migration/11 [kernel.kallsyms] [k] schedule
6.15% migration/12 [kernel.kallsyms] [k] schedule
6.15% migration/13 [kernel.kallsyms] [k] schedule
6.15% migration/14 [kernel.kallsyms] [k] schedule
6.15% migration/15 [kernel.kallsyms] [k] schedule
6.15% migration/4 [kernel.kallsyms] [k] schedule
6.15% migration/5 [kernel.kallsyms] [k] schedule
6.15% migration/6 [kernel.kallsyms] [k] schedule
6.15% migration/7 [kernel.kallsyms] [k] schedule
6.15% migration/8 [kernel.kallsyms] [k] schedule
6.15% migration/9 [kernel.kallsyms] [k] schedule
0.22% rcu_sched [kernel.kallsyms] [k] schedule
...
#
# (Cannot load tips.txt file, please install perf!)
#
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Frank Ch. Eigler <fche@redhat.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jianlin Lv <jianlin.lv@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Li Huafei <lihuafei1@huawei.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lore.kernel.org/lkml/20210715063723.11926-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
I know nothing about zone_device pages and !device_private pages; but if
try_to_migrate_one() will do nothing for them, then it's better that
try_to_migrate() filter them first, than trawl through all their vmas.
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Link: https://lore.kernel.org/lkml/1241d356-8ec9-f47b-a5ec-9b2bf66d242@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull RISC-V fixes from Palmer Dabbelt:
- properly set the memory size, which fixes 32-bit systems
- allow initrd to load anywhere in memory, rather that restricting it
to the first 256MiB
- fix the 'mem=' parameter on 64-bit systems to properly account for
the maximum supported memory now that the kernel is outside the
linear map
- avoid installing mappings into the last 4KiB of memory, which
conflicts with error values
- avoid the stack from being freed while it is being walked
- a handful of fixes to the new copy to/from user routines
* tag 'riscv-for-linus-5.14-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: __asm_copy_to-from_user: Fix: Typos in comments
riscv: __asm_copy_to-from_user: Remove unnecessary size check
riscv: __asm_copy_to-from_user: Fix: fail on RV32
riscv: __asm_copy_to-from_user: Fix: overrun copy
riscv: stacktrace: pin the task's stack in get_wchan
riscv: Make sure the kernel mapping does not overlap with IS_ERR_VALUE
riscv: Make sure the linear mapping does not use the kernel mapping
riscv: Fix memory_limit for 64-bit kernel
RISC-V: load initrd wherever it fits into memory
riscv: Fix 32-bit RISC-V boot failure
Remove the conditional checking for out_data_len and skipping the fallocate
if it is 0. This is wrong will actually change any legitimate the fallocate
where the entire region is unallocated into a no-op.
Additionally, before allocating the range, if FALLOC_FL_KEEP_SIZE is set then
we need to clamp the length of the fallocate region as to not extend the size of the file.
Fixes: 966a3cb7c7db ("cifs: improve fallocate emulation")
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
This reverts commit b7eb335e26a9c7f258c96b3962c283c379d3ede0.
It turns out that the problem with the clang -Wimplicit-fallthrough
warning is not about the kernel source code, but about clang itself, and
that the warning is unusable until clang fixes its broken ways.
In particular, when you enable this warning for clang, you not only get
warnings about implicit fallthroughs. You also get this:
warning: fallthrough annotation in unreachable code [-Wimplicit-fallthrough]
which is completely broken becasue it
(a) doesn't even tell you where the problem is (seriously: no line
numbers, no filename, no nothing).
(b) is fundamentally broken anyway, because there are perfectly valid
reasons to have a fallthrough statement even if it turns out that
it can perhaps not be reached.
In the kernel, an example of that second case is code in the scheduler:
switch (state) {
case cpuset:
if (IS_ENABLED(CONFIG_CPUSETS)) {
cpuset_cpus_allowed_fallback(p);
state = possible;
break;
}
fallthrough;
case possible:
where if CONFIG_CPUSETS is enabled you actually never hit the
fallthrough case at all. But that in no way makes the fallthrough
wrong.
So the warning is completely broken, and enabling it for clang is a very
bad idea.
In the meantime, we can keep the gcc option enabled, and make the gcc
build use
-Wimplicit-fallthrough=5
which means that we will at least continue to require a proper
fallthrough statement, and that gcc won't silently accept the magic
comment versions. Because gcc does this all correctly, and while the odd
"=5" part is kind of obscure, it's documented in [1]:
"-Wimplicit-fallthrough=5 doesn’t recognize any comments as
fallthrough comments, only attributes disable the warning"
so if clang ever fixes its bad behavior we can try enabling it there again.
Link: https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html [1]
Cc: Kees Cook <keescook@chromium.org>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
kexec_load_file() relies on the memblock infrastructure to avoid
stamping over regions of memory that are essential to the survival
of the system.
However, nobody seems to agree how to flag these regions as reserved,
and (for example) EFI only publishes its reservations in /proc/iomem
for the benefit of the traditional, userspace based kexec tool.
On arm64 platforms with GICv3, this can result in the payload being
placed at the location of the LPI tables. Shock, horror!
Let's augment the EFI reservation code with a memblock_reserve() call,
protecting our dear tables from the secondary kernel invasion.
Reported-by: Moritz Fischer <mdf@kernel.org>
Tested-by: Moritz Fischer <mdf@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: James Morse <james.morse@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull Kbuild fixes from Masahiro Yamada:
- Restore the original behavior of scripts/setlocalversion when
LOCALVERSION is set to empty.
- Show Kconfig prompts even for 'make -s'
- Fix the combination of COFNIG_LTO_CLANG=y and CONFIG_MODVERSIONS=y
for older GNU Make versions
* tag 'kbuild-fixes-v5.14' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
Documentation: Fix intiramfs script name
Kbuild: lto: fix module versionings mismatch in GNU make 3.X
kbuild: do not suppress Kconfig prompts for silent build
scripts/setlocalversion: fix a bug when LOCALVERSION is empty
Now that we create those objects in iomap_writepage_map when needed,
there's no need to pre-create them in iomap_page_mkwrite_actor anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
During a realtime grow operation, we run a single transaction for each
rt bitmap block added to the filesystem. This means that each step has
to be careful to increase sb_rblocks appropriately.
Fix the integer overflow error in this calculation that can happen when
the extent size is very large. Found by running growfs to add a rt
volume to a filesystem formatted with a 1g rt extent size.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
When using 'perf report' in directory mode, the first file is not closed
on exit, causing a memory leak.
The problem is caused by the iterating variable never reaching 0.
Fixes: 145520631130bd64 ("perf data: Add perf_data__(create_dir|close_dir) functions")
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Zhen Lei <thunder.leizhen@huawei.com>
Link: http://lore.kernel.org/lkml/20210716141122.858082-1-rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the unlikely race case that page_mlock_one() finds VM_LOCKED has been
cleared by the time it got page table lock, page_vma_mapped_walk_done()
must be called before returning, either explicitly, or by a final call
to page_vma_mapped_walk() - otherwise the page table remains locked.
Fixes: cd62734ca60d ("mm/rmap: split try_to_munlock from try_to_unmap")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Alistair Popple <apopple@nvidia.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Link: https://lore.kernel.org/lkml/20210711151446.GB4070@xsang-OptiPlex-9020/
Link: https://lore.kernel.org/lkml/f71f8523-cba7-3342-40a7-114abc5d1f51@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 71f642833284 ("ACPI: utils: Fix reference counting in
for_each_acpi_dev_match()") started doing "acpi_dev_put()" on a pointer
that was possibly NULL. That fails miserably, because that helper
inline function is not set up to handle that case.
Just make acpi_dev_put() silently accept a NULL pointer, rather than
calling down to put_device() with an invalid offset off that NULL
pointer.
Link: https://lore.kernel.org/lkml/a607c149-6bf6-0fd0-0e31-100378504da2@kernel.dk/
Reported-and-tested-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Daniel Scally <djrscally@gmail.com>
Cc: Andy Shevchenko <andy.shevchenko@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 CIFSPOSIXDelFile. This changeset
doesn't change the address but makes it slightly clearer.
Addresses-Coverity: 711519 ("Out of bounds write")
Signed-off-by: Steve French <stfrench@microsoft.com>
Missing TPM final event log table is not a firmware bug.
Clearly if providing event log in the old format makes the final event
log invalid it should not be provided at least in that case.
Fixes: b4f1874c6216 ("tpm: check event log version before reading final events")
Signed-off-by: Michal Suchanek <msuchanek@suse.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Pull ARM SoC fixes from Arnd Bergmann:
"Here are the patches for this week that came as the fallout of the
merge window:
- Two fixes for the NVidia memory controller driver
- multiple defconfig files get patched to turn CONFIG_FB back on
after that is no longer selected by CONFIG_DRM
- ffa and scmpi firmware drivers fixes, mostly addressing compiler
and documentation warnings
- Platform specific fixes for device tree files on ASpeed, Renesas
and NVidia SoC, mostly for recent regressions.
- A workaround for a regression on the USB PHY with devlink when the
usb-nop-xceiv driver is not available until the rootfs is mounted.
- Device tree compiler warnings in Arm Versatile-AB"
* tag 'soc-fixes-5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (35 commits)
ARM: dts: versatile: Fix up interrupt controller node names
ARM: multi_v7_defconfig: Make NOP_USB_XCEIV driver built-in
ARM: configs: Update u8500_defconfig
ARM: configs: Update Vexpress defconfig
ARM: configs: Update Versatile defconfig
ARM: configs: Update RealView defconfig
ARM: configs: Update Integrator defconfig
arm: Typo s/PCI_IXP4XX_LEGACY/IXP4XX_PCI_LEGACY/
firmware: arm_scmi: Fix range check for the maximum number of pending messages
firmware: arm_scmi: Avoid padding in sensor message structure
firmware: arm_scmi: Fix kernel doc warnings about return values
firmware: arm_scpi: Fix kernel doc warnings
firmware: arm_scmi: Fix kernel doc warnings
ARM: shmobile: defconfig: Restore graphical consoles
firmware: arm_ffa: Fix a possible ffa_linux_errmap buffer overflow
firmware: arm_ffa: Fix the comment style
firmware: arm_ffa: Simplify probe function
firmware: arm_ffa: Ensure drivers provide a probe function
firmware: arm_scmi: Fix possible scmi_linux_errmap buffer overflow
firmware: arm_scmi: Ensure drivers provide a probe function
...
Documentation was not changed when renaming the script in commit
80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to
gen_initramfs.sh"). Fixing this.
Basically does:
$ sed -i -e s/gen_initramfs_list.sh/gen_initramfs.sh/g $(git grep -l gen_initramfs_list.sh)
Fixes: 80e715a06c2d ("initramfs: rename gen_initramfs_list.sh to gen_initramfs.sh")
Signed-off-by: Robert Richter <rrichter@amd.com>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
In iomap_readpage_actor, don't create iop objects for inline inodes.
Otherwise, iomap_read_inline_data will set PageUptodate without setting
iop->uptodate, and iomap_page_release will eventually complain.
To prevent this kind of bug from occurring in the future, make sure the
page doesn't have private data attached in iomap_read_inline_data.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Improve the checking at the start of a realtime grow operation so that
we avoid accidentally set a new extent size that is too large and avoid
adding an rt volume to a filesystem with rmap or reflink because we
don't support rt rmap or reflink yet.
While we're at it, separate the checks so that we're only testing one
aspect at a time.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
ASan reports some memory leaks when running:
# perf test "42: BPF filter"
This second leak is caused by a strlist not being dellocated on error
inside probe_file__del_events.
This patch adds a goto label before the deallocation and makes the error
path jump to it.
Signed-off-by: Riccardo Mancini <rickyman7@gmail.com>
Fixes: e7895e422e4da63d ("perf probe: Split del_perf_probe_events()")
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/174963c587ae77fa108af794669998e4ae558338.1626343282.git.rickyman7@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The kernel recovers in due course from missing Mlocked pages: but there
was no point in calling page_mlock() (formerly known as
try_to_munlock()) on a THP, because nothing got done even when it was
found to be mapped in another VM_LOCKED vma.
It's true that we need to be careful: Mlocked accounting of pte-mapped
THPs is too difficult (so consistently avoided); but Mlocked accounting
of only-pmd-mapped THPs is supposed to work, even when multiple mappings
are mlocked and munlocked or munmapped. Refine the tests.
There is already a VM_BUG_ON_PAGE(PageDoubleMap) in page_mlock(), so
page_mlock_one() does not even have to worry about that complication.
(I said the kernel recovers: but would page reclaim be likely to split
THP before rediscovering that it's VM_LOCKED? I've not followed that up)
Fixes: 9a73f61bdb8a ("thp, mlock: do not mlock PTE-mapped file huge pages")
Signed-off-by: Hugh Dickins <hughd@google.com>
Reviewed-by: Shakeel Butt <shakeelb@google.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Link: https://lore.kernel.org/lkml/cfa154c-d595-406-eb7d-eb9df730f944@google.com/
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Alistair Popple <apopple@nvidia.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Yang Shi <shy828301@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI fixes from James Bottomley:
"Four fixes, all in drivers, all of which can lead to user visible
problems in certain situations"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: Fix NULL dereference on XCOPY completion
scsi: mpt3sas: Transition IOC to Ready state during shutdown
scsi: target: Fix protect handling in WRITE SAME(32)
scsi: iscsi: Fix iface sysfs attr detection
Coverity also complains about the way we calculate the offset
(starting from the address of a 4 byte array within the
header structure rather than from the beginning of the struct
plus 4 bytes) for SMB1 CIFSPOSIXCreate. This changeset
doesn't change the address but makes it slightly clearer.
Addresses-Coverity: 711518 ("Out of bounds write")
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull pwm fixes from Thierry Reding:
"A couple of fixes from Uwe that I missed for v5.14-rc1"
* tag 'pwm/for-5.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/thierry.reding/linux-pwm:
pwm: ep93xx: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: berlin: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: tiecap: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: spear: Ensure configuring period and duty_cycle isn't wrongly skipped
pwm: sprd: Ensure configuring period and duty_cycle isn't wrongly skipped
Commit 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter
methods") changed the simple_read_from_buffer() calls into copy_to_iter()
calls and the simple_write_to_buffer() calls into copy_from_iter() calls.
The simple*buffer() methods update the file offset (*ppos) but the read
and write iterators not yet. Make the read and write iterators update the
file offset (iocb->ki_pos).
This patch has been tested as follows:
# modprobe target_core_user
# dd if=/sys/kernel/config/target/dbroot bs=1
/var/target
12+0 records in
12+0 records out
12 bytes copied, 9.5539e-05 s, 126 kB/s
# cd /sys/kernel/config/acpi/table
# mkdir test
# cd test
# dmesg -c >/dev/null; printf 'SSDT\x8\0\0\0abcdefghijklmnopqrstuvwxyz' | dd of=aml bs=1; dmesg -c
34+0 records in
34+0 records out
34 bytes copied, 0.010627 s, 3.2 kB/s
[ 261.056551] ACPI configfs: invalid table length
Reported-by: Yanko Kaneti <yaneti@declera.com>
Cc: Yanko Kaneti <yaneti@declera.com>
Fixes: 7fe1e79b59ba ("configfs: implement the .read_iter and .write_iter methods")
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>