commits
Pull smb client fixes from Steve French:
- Two fallocate fixes
- Fix warnings from new gcc
- Two symlink fixes
* tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client, common: fix fortify warnings
cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
smb: client: report correct st_size for SMB and NFS symlinks
smb: client: fix missing mode bits for SMB symlinks
Pull firewire fix from Takashi Sakamoto:
"A single patch to fix long-standing issue of memory leak at failure of
device registration for fw_unit. We rarely encounter the issue, but it
should be applied to stable releases, since it fixes inappropriate API
usage"
* tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix possible memory leak in create_units()
When compiling with gcc version 14.0.0 20231126 (experimental)
and CONFIG_FORTIFY_SOURCE=y, I've noticed the following:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/wait.h:9,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/smb/client/smb2pdu.c:18:
In function 'fortify_memcpy_chk',
inlined from '__SMB2_close' at fs/smb/client/smb2pdu.c:3480:4:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/wait.h:9,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/smb/client/cifssmb.c:17:
In function 'fortify_memcpy_chk',
inlined from 'CIFS_open' at fs/smb/client/cifssmb.c:1248:3:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In both cases, the fortification logic inteprets calls to 'memcpy()' as an
attempts to copy an amount of data which exceeds the size of the specified
field (i.e. more than 8 bytes from __le64 value) and thus issues an overread
warning. Both of these warnings may be silenced by using the convenient
'struct_group()' quirk.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull powerpc fixes from Michael Ellerman:
- Fix corruption of f0/vs0 during FP/Vector save, seen as userspace
crashes when using io-uring workers (in particular with MariaDB)
- Fix KVM_RUN potentially clobbering all host userspace FP/Vector
registers
Thanks to Timothy Pearson, Jens Axboe, and Nicholas Piggin.
* tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers
powerpc: Don't clobber f0/vs0 during fp|altivec register save
If device_register() fails, the refcount of device is not 0, the name
allocated in dev_set_name() is leaked. To fix this by calling put_device(),
so that it will be freed in callback function kobject_cleanup().
unreferenced object 0xffff9d99035c7a90 (size 8):
comm "systemd-udevd", pid 168, jiffies 4294672386 (age 152.089s)
hex dump (first 8 bytes):
66 77 30 2e 30 00 ff ff fw0.0...
backtrace:
[<00000000e1d62bac>] __kmem_cache_alloc_node+0x1e9/0x360
[<00000000bbeaff31>] __kmalloc_node_track_caller+0x44/0x1a0
[<00000000491f2fb4>] kvasprintf+0x67/0xd0
[<000000005b960ddc>] kobject_set_name_vargs+0x1e/0x90
[<00000000427ac591>] dev_set_name+0x4e/0x70
[<000000003b4e447d>] create_units+0xc5/0x110
fw_unit_release() will be called in the error path, move fw_device_get()
before calling device_register() to keep balanced with fw_device_put() in
fw_unit_release().
Cc: stable@vger.kernel.org
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array")
Fixes: a1f64819fe9f ("firewire: struct device - replace bus_id with dev_name(), dev_set_name()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Fix the cifs filesystem implementations of FALLOC_FL_INSERT_RANGE, in
smb3_insert_range(), to set i_size after extending the file on the server
and before we do the copy to open the gap (as we don't clean up the EOF
marker if the copy fails).
Fixes: 7fe6fe95b936 ("cifs: add FALLOC_FL_INSERT_RANGE support")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull vfio fixes from Alex Williamson:
- Fix the lifecycle of a mutex in the pds variant driver such that a
reset prior to opening the device won't find it uninitialized.
Implement the release path to symmetrically destroy the mutex. Also
switch a different lock from spinlock to mutex as the code path has
the potential to sleep and doesn't need the spinlock context
otherwise (Brett Creeley)
- Fix an issue detected via randconfig where KVM tries to symbol_get an
undeclared function. The symbol is temporarily declared
unconditionally here, which resolves the problem and avoids churn
relative to a series pending for the next merge window which resolves
some of this symbol ugliness, but also fixes Kconfig dependencies
(Sean Christopherson)
* tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio:
vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart
vfio/pds: Fix possible sleep while in atomic context
vfio/pds: Fix mutex lock->magic != lock warning
Before running a guest, the host process (e.g., QEMU) FP/VEC registers
are saved if they were being used, similarly to when the kernel uses FP
registers. The guest values are then loaded into regs, and the host
process registers will be restored lazily when it uses FP/VEC.
KVM HV has a bug here: the host process registers do get saved, but the
user MSR bits remain enabled, which indicates the registers are valid
for the process. After they are clobbered by running the guest, this
valid indication causes the host process to take on the FP/VEC register
values of the guest.
Fixes: 34e119c96b2b ("KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231122025811.2973-1-npiggin@gmail.com
Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in
smb3_zero_range(), to set i_size after extending the file on the server.
Fixes: 72c419d9b073 ("cifs: fix smb3_zero_range so it can expand the file-size when required")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull xen fixes from Juergen Gross:
- A fix for the Xen event driver setting the correct return value when
experiencing an allocation failure
- A fix for allocating space for a struct in the percpu area to not
cross page boundaries (this one is for x86, a similar one for Arm was
already in the pull request for rc3)
* tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: fix error code in xen_bind_pirq_msi_to_irq()
x86/xen: fix percpu vcpu_info allocation
Drop the vfio_file_iommu_group() stub and instead unconditionally declare
the function to fudge around a KVM wart where KVM tries to do symbol_get()
on vfio_file_iommu_group() (and other VFIO symbols) even if CONFIG_VFIO=n.
Ensuring the symbol is always declared fixes a PPC build error when
modules are also disabled, in which case symbol_get() simply points at the
address of the symbol (with some attributes shenanigans). Because KVM
does symbol_get() instead of directly depending on VFIO, the lack of a
fully defined symbol is not problematic (ugly, but "fine").
arch/powerpc/kvm/../../../virt/kvm/vfio.c:89:7:
error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
fn = symbol_get(vfio_file_iommu_group);
^
include/linux/module.h:805:60: note: expanded from macro 'symbol_get'
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
^
include/linux/vfio.h:294:35: note: previous definition is here
static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
^
arch/powerpc/kvm/../../../virt/kvm/vfio.c:89:7:
error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
fn = symbol_get(vfio_file_iommu_group);
^
include/linux/module.h:805:65: note: expanded from macro 'symbol_get'
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
^
include/linux/vfio.h:294:35: note: previous definition is here
static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
^
2 errors generated.
Although KVM is firmly in the wrong (there is zero reason for KVM to build
virt/kvm/vfio.c when VFIO is disabled), fudge around the error in VFIO as
the stub is unnecessary and doesn't serve its intended purpose (KVM is the
only external user of vfio_file_iommu_group()), and there is an in-flight
series to clean up the entire KVM<->VFIO interaction, i.e. fixing this in
KVM would result in more churn in the long run, and the stub needs to go
away regardless.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308251949.5IiaV0sz-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309030741.82aLACDG-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309110914.QLH0LU6L-lkp@intel.com
Link: https://lore.kernel.org/all/0-v1-08396538817d+13c5-vfio_kvm_kconfig_jgg@nvidia.com
Link: https://lore.kernel.org/all/20230916003118.2540661-1-seanjc@google.com
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: c1cce6d079b8 ("vfio: Compile vfio_group infrastructure optionally")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20231130001000.543240-1-seanjc@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
During floating point and vector save to thread data f0/vs0 are
clobbered by the FPSCR/VSCR store routine. This has been obvserved to
lead to userspace register corruption and application data corruption
with io-uring.
Fix it by restoring f0/vs0 after FPSCR/VSCR store has completed for
all the FP, altivec, VMX register save paths.
Tested under QEMU in kvm mode, running on a Talos II workstation with
dual POWER9 DD2.2 CPUs.
Additional detail (mpe):
Typically save_fpu() is called from __giveup_fpu() which saves the FP
regs and also *turns off FP* in the tasks MSR, meaning the kernel will
reload the FP regs from the thread struct before letting the task use FP
again. So in that case save_fpu() is free to clobber f0 because the FP
regs no longer hold live values for the task.
There is another case though, which is the path via:
sys_clone()
...
copy_process()
dup_task_struct()
arch_dup_task_struct()
flush_all_to_thread()
save_all()
That path saves the FP regs but leaves them live. That's meant as an
optimisation for a process that's using FP/VSX and then calls fork(),
leaving the regs live means the parent process doesn't have to take a
fault after the fork to get its FP regs back. The optimisation was added
in commit 8792468da5e1 ("powerpc: Add the ability to save FPU without
giving it up").
That path does clobber f0, but f0 is volatile across function calls,
and typically programs reach copy_process() from userspace via a syscall
wrapper function. So in normal usage f0 being clobbered across a
syscall doesn't cause visible data corruption.
But there is now a new path, because io-uring can call copy_process()
via create_io_thread() from the signal handling path. That's OK if the
signal is handled as part of syscall return, but it's not OK if the
signal is handled due to some other interrupt.
That path is:
interrupt_return_srr_user()
interrupt_exit_user_prepare()
interrupt_exit_user_prepare_main()
do_notify_resume()
get_signal()
task_work_run()
create_worker_cb()
create_io_worker()
copy_process()
dup_task_struct()
arch_dup_task_struct()
flush_all_to_thread()
save_all()
if (tsk->thread.regs->msr & MSR_FP)
save_fpu()
# f0 is clobbered and potentially live in userspace
Note the above discussion applies equally to save_altivec().
Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org # v4.6+
Closes: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/
Closes: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/
Tested-by: Timothy Pearson <tpearson@raptorengineering.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
[mpe: Reword change log to describe exact path of corruption & other minor tweaks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1921539696.48534988.1700407082933.JavaMail.zimbra@raptorengineeringinc.com
Pull tracing fixes from Steven Rostedt::
"Eventfs fixes:
- With the usage of simple_recursive_remove() recommended by Al Viro,
the code should not be calling "d_invalidate()" itself. Doing so is
causing crashes. The code was calling d_invalidate() on the race of
trying to look up a file while the parent was being deleted. This
was detected, and the added dentry was having d_invalidate() called
on it, but the deletion of the directory was also calling
d_invalidate() on that same dentry.
- A fix to not free the eventfs_inode (ei) until the last dput() was
called on its ei->dentry made the ei->dentry exist even after it
was marked for free by setting the ei->is_freed. But code elsewhere
still was checking if ei->dentry was NULL if ei->is_freed is set
and would trigger WARN_ON if that was the case. That's no longer
true and there should not be any warnings when it is true.
- Use GFP_NOFS for allocations done under eventfs_mutex. The
eventfs_mutex can be taken on file system reclaim, make sure that
allocations done under that mutex do not trigger file system
reclaim.
- Clean up code by moving the taking of inode_lock out of the helper
functions and into where they are needed, and not use the parameter
to know to take it or not. It must always be held but some callers
of the helper function have it taken when they were called.
- Warn if the inode_lock is not held in the helper functions.
- Warn if eventfs_start_creating() is called without a parent. As
eventfs is underneath tracefs, all files created will have a parent
(the top one will have a tracefs parent).
Tracing update:
- Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"
* tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
eventfs: Make sure that parent->d_inode is locked in creating files/dirs
eventfs: Do not allow NULL parent to eventfs_start_creating()
eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
eventfs: Do not invalidate dentry in create_file/dir_dentry()
eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
We can't rely on FILE_STANDARD_INFORMATION::EndOfFile for reparse
points as they will be always zero. Set it to symlink target's length
as specified by POSIX.
This will make stat() family of syscalls return the correct st_size
for such files.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix objpool overrun case on memory/cache access delay
especially on the big.LITTLE SoC. The objpool uses a copy of object
slot index internal loop, but the slot index can be changed on
another processor in parallel. In that case, the difference of 'head'
local copy and the 'slot->last' index will be bigger than local slot
size. In that case, we need to re-read the slot::head to update it.
- kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since
kretprobe_holder::rp is RCU managed, it should use
rcu_assign_pointer() and rcu_dereference_check() correctly. Also
adding __rcu tag for finding wrong usage by sparse.
- rethook: Fix to use appropriate rcu API for rethook::handler. The
same as kretprobe, rethook::handler is RCU managed and it should use
rcu_assign_pointer() and rcu_dereference_check(). This also adds
__rcu tag for finding wrong usage by sparse.
* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rethook: Use __rcu pointer for rethook::handler
kprobes: consistent rcu api usage for kretprobe holder
lib: objpool: fix head overrun on RK3588 SBC
Return -ENOMEM if xen_irq_init() fails. currently the code returns an
uninitialized variable or zero.
Fixes: 5dd9ad32d775 ("xen/events: drop xen_allocate_irqs_dynamic()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Juergen Gross <jgross@ssue.com>
Link: https://lore.kernel.org/r/3b9ab040-a92e-4e35-b687-3a95890a9ace@moroto.mountain
Signed-off-by: Juergen Gross <jgross@suse.com>
The driver could possibly sleep while in atomic context resulting
in the following call trace while CONFIG_DEBUG_ATOMIC_SLEEP=y is
set:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2817, name: bash
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
Call Trace:
<TASK>
dump_stack_lvl+0x36/0x50
__might_resched+0x123/0x170
mutex_lock+0x1e/0x50
pds_vfio_put_lm_file+0x1e/0xa0 [pds_vfio_pci]
pds_vfio_put_save_file+0x19/0x30 [pds_vfio_pci]
pds_vfio_state_mutex_unlock+0x2e/0x80 [pds_vfio_pci]
pci_reset_function+0x4b/0x70
reset_store+0x5b/0xa0
kernfs_fop_write_iter+0x137/0x1d0
vfs_write+0x2de/0x410
ksys_write+0x5d/0xd0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This can happen if pds_vfio_put_restore_file() and/or
pds_vfio_put_save_file() grab the mutex_lock(&lm_file->lock)
while the spin_lock(&pds_vfio->reset_lock) is held, which can
happen during while calling pds_vfio_state_mutex_unlock().
Fix this by changing the reset_lock to reset_mutex so there are no such
conerns. Also, make sure to destroy the reset_mutex in the driver specific
VFIO device release function.
This also fixes a spinlock bad magic BUG that was caused
by not calling spinlock_init() on the reset_lock. Since, the lock is
being changed to a mutex, make sure to call mutex_init() on it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/kvm/1f9bc27b-3de9-4891-9687-ba2820c1b390@moroto.mountain/
Fixes: bb500dbe2ac6 ("vfio/pds: Add VFIO live migration support")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231122192532.25791-3-brett.creeley@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull parisc architecture fixes from Helge Deller:
"This patchset fixes and enforces correct section alignments for the
ex_table, altinstructions, parisc_unwind, jump_table and bug_table
which are created by inline assembly.
Due to not being correctly aligned at link & load time they can
trigger unnecessarily the kernel unaligned exception handler at
runtime. While at it, I switched the bug table to use relative
addresses which reduces the size of the table by half on 64-bit.
We still had the ENOSYM and EREMOTERELEASE errno symbols as left-overs
from HP-UX, which now trigger build-issues with glibc. We can simply
remove them.
Most of the patches are tagged for stable kernel series.
Summary:
- Drop HP-UX ENOSYM and EREMOTERELEASE return codes to avoid glibc
build issues
- Fix section alignments for ex_table, altinstructions, parisc unwind
table, jump_table and bug_table
- Reduce size of bug_table on 64-bit kernel by using relative
pointers"
* tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Reduce size of the bug_table on 64-bit kernel by half
parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
parisc: Use natural CPU alignment for bug_table
parisc: Ensure 32-bit alignment on parisc unwind section
parisc: Mark lock_aligned variables 16-byte aligned on SMP
parisc: Mark jump_table naturally aligned
parisc: Mark altinstructions read-only and 32-bit aligned
parisc: Mark ex_table entries 32-bit aligned in uaccess.h
parisc: Mark ex_table entries 32-bit aligned in assembly.h
In order to make sure I get CC'd on tracing changes for which my input
would be relevant, add my name as reviewer of the TRACING subsystem.
Link: https://lore.kernel.org/linux-trace-kernel/20231115155018.8236-1-mathieu.desnoyers@efficios.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
When instantiating inodes for SMB symlinks, add the mode bits from
@cifs_sb->ctx->file_mode as we already do for the other special files.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull power management fixes from Rafael Wysocki:
"These fix issues in two cpufreq drivers, in the AMD P-state driver and
in the power-capping DTPM framework.
Specifics:
- Fix the AMD P-state driver's EPP sysfs interface in the cases when
the performance governor is in use (Ayush Jain)
- Make the ->fast_switch() callback in the AMD P-state driver return
the target frequency as expected (Gautham R. Shenoy)
- Allow user space to control the range of frequencies to use via
scaling_min_freq and scaling_max_freq when AMD P-state driver is in
use (Wyes Karny)
- Prevent power domains needed for wakeup signaling from being turned
off during system suspend on Qualcomm systems and prevent
performance states votes from runtime-suspended devices from being
lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
(Stephan Gerhold)
- Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the
i.MX6ULL types that can run at that frequency (Christoph
Niedermaier)
- Eliminate unnecessary and harmful conversions to uW from the DTPM
(dynamic thermal and power management) framework (Lukasz Luba)"
* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Only print supported EPP values for performance governor
cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
powercap: DTPM: Fix unneeded conversions to micro-Watts
cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
cpufreq: qcom-nvmem: Enable virtual power domain devices
cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
Since the rethook::handler is an RCU-maganged pointer so that it will
notice readers the rethook is stopped (unregistered) or not, it should
be an __rcu pointer and use appropriate functions to be accessed. This
will use appropriate memory barrier when accessing it. OTOH,
rethook::data is never changed, so we don't need to check it in
get_kretprobe().
NOTE: To avoid sparse warning, rethook::handler is defined by a raw
function pointer type with __rcu instead of rethook_handler_t.
Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311241808.rv9ceuAh-lkp@intel.com/
Tested-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Today the percpu struct vcpu_info is allocated via DEFINE_PER_CPU(),
meaning that it could cross a page boundary. In this case registering
it with the hypervisor will fail, resulting in a panic().
This can easily be fixed by using DEFINE_PER_CPU_ALIGNED() instead,
as struct vcpu_info is guaranteed to have a size of 64 bytes, matching
the cache line size of x86 64-bit processors (Xen doesn't support
32-bit processors).
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.con>
Link: https://lore.kernel.org/r/20231124074852.25161-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
The following BUG was found when running on a kernel with
CONFIG_DEBUG_MUTEXES=y set:
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
RIP: 0010:mutex_trylock+0x10d/0x120
Call Trace:
<TASK>
? __warn+0x85/0x140
? mutex_trylock+0x10d/0x120
? report_bug+0xfc/0x1e0
? handle_bug+0x3f/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? mutex_trylock+0x10d/0x120
? mutex_trylock+0x10d/0x120
pds_vfio_reset+0x3a/0x60 [pds_vfio_pci]
pci_reset_function+0x4b/0x70
reset_store+0x5b/0xa0
kernfs_fop_write_iter+0x137/0x1d0
vfs_write+0x2de/0x410
ksys_write+0x5d/0xd0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
As shown, lock->magic != lock. This is because
mutex_init(&pds_vfio->state_mutex) is called in the VFIO open path. So,
if a reset is initiated before the VFIO device is opened the mutex will
have never been initialized. Fix this by calling
mutex_init(&pds_vfio->state_mutex) in the VFIO init path.
Also, don't destroy the mutex on close because the device may
be re-opened, which would cause mutex to be uninitialized. Fix this by
implementing a driver specific vfio_device_ops.release callback that
destroys the mutex before calling vfio_pci_core_release_dev().
Fixes: bb500dbe2ac6 ("vfio/pds: Add VFIO live migration support")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231122192532.25791-2-brett.creeley@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull Kbuild fixes from Masahiro Yamada:
- Fix section mismatch warning messages for riscv and loongarch
- Remove CONFIG_IA64 left-over from linux/export-internal.h
- Fix the location of the quotes for UIMAGE_NAME
- Fix a memory leak bug in Kconfig
* tag 'kbuild-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix memory leak from range properties
kbuild: Move the single quotes for image name
linux/export: clean up the IA-64 KSYM_FUNC macro
modpost: fix section mismatch message for RELA
Pull x86 microcode fixes from Ingo Molnar:
"Fix/enhance x86 microcode version reporting: fix the bootup log spam,
and remove the driver version announcement to avoid version confusion
when distros backport fixes"
* tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Rework early revisions reporting
x86/microcode: Remove the driver announcement and version
Enable GENERIC_BUG_RELATIVE_POINTERS which will store 32-bit relative
offsets to the bug address and the source file name instead of 64-bit
absolute addresses. This effectively reduces the size of the
bug_table[] array by half on 64-bit kernels.
Signed-off-by: Helge Deller <deller@gmx.de>
Since the locking of the parent->d_inode has been moved outside the
creation of the files and directories (as it use to be locked via a
conditional), add a WARN_ON_ONCE() to the case that it's not locked.
Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull ACPI fixes from Rafael Wysocki:
"This fixes a recently introduced build issue on ARM32 and a NULL
pointer dereference in the ACPI backlight driver due to a design issue
exposed by a recent change in the ACPI bus type code.
Specifics:
- Fix a recently introduced build issue on ARM32 platforms caused by
an inadvertent header file breakage (Dave Jiang)
- Eliminate questionable usage of acpi_driver_data() in the ACPI
backlight cooling device code that leads to NULL pointer
dereferences after recent ACPI core changes (Hans de Goede)"
* tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Use acpi_video_device for cooling-dev driver data
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
Merge a power capping fix for 6.7-rc4 which eliminates unnecessary
and harmful conversions to uW from the DTPM (dynamic thermal and power
management) framework (Lukasz Luba).
* powercap:
powercap: DTPM: Fix unneeded conversions to micro-Watts
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is
RCU-managed, based on the (non-rethook) implementation of get_kretprobe().
The thought behind this patch is to make use of the RCU API where possible
when accessing this pointer so that the needed barriers are always in place
and to self-document the code.
The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes
done to the "rp" pointer are changed to make use of the RCU macro for
assignment. For the single read, the implementation of get_kretprobe()
is simplified by making use of an RCU macro which accomplishes the same,
but note that the log warning text will be more generic.
I did find that there is a difference in assembly generated between the
usage of the RCU macros vs without. For example, on arm64, when using
rcu_assign_pointer(), the corresponding store instruction is a
store-release (STLR) which has an implicit barrier. When normal assignment
is done, a regular store (STR) is found. In the macro case, this seems to
be a result of rcu_assign_pointer() using smp_store_release() when the
value to write is not NULL.
Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: stable@vger.kernel.org
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
xen_vcpu_info is a percpu area than needs to be mapped by Xen.
Currently, it could cross a page boundary resulting in Xen being unable
to map it:
[ 0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
[ 0.574002] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Fix the issue by using __alloc_percpu and requesting alignment for the
memory allocation.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2311221501340.2053963@ubuntu-linux-20-04-desktop
Fixes: 24d5373dda7c ("arm/xen: Use alloc_percpu rather than __alloc_percpu")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull irq fix from Borislav Petkov:
- Flush the translation service tables to prevent unpredictable
behavior on non-coherent GIC devices
* tag 'irq_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Flush ITS tables correctly in non-coherent GIC designs
Currently, sym_validate_range() duplicates the range string using
xstrdup(), which is overwritten by a subsequent sym_calc_value() call.
It results in a memory leak.
Instead, only the pointer should be copied.
Below is a test case, with a summary from Valgrind.
[Test Kconfig]
config FOO
int "foo"
range 10 20
[Test .config]
CONFIG_FOO=0
[Before]
LEAK SUMMARY:
definitely lost: 3 bytes in 1 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,465 bytes in 21 blocks
suppressed: 0 bytes in 0 blocks
[After]
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,462 bytes in 20 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull x86 perf event fix from Ingo Molnar:
"Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
code resulting in non-working events on those platforms"
* tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
The AMD side of the loader issues the microcode revision for each
logical thread on the system, which can become really noisy on huge
machines. And doing that doesn't make a whole lot of sense - the
microcode revision is already in /proc/cpuinfo.
So in case one is interested in the theoretical support of mixed silicon
steppings on AMD, one can check there.
What is also missing on the AMD side - something which people have
requested before - is showing the microcode revision the CPU had
*before* the early update.
So abstract that up in the main code and have the BSP on each vendor
provide those revision numbers.
Then, dump them only once on driver init.
On Intel, do not dump the patch date - it is not needed.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
Those return codes are only defined for the parisc architecture and
are leftovers from when we wanted to be HP-UX compatible.
They are not returned by any Linux kernel syscall but do trigger
problems with the glibc strerrorname_np() and strerror() functions as
reported in glibc issue #31080.
There is no need to keep them, so simply remove them.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Bruno Haible <bruno@clisp.org>
Closes: https://sourceware.org/bugzilla/show_bug.cgi?id=31080
Cc: stable@vger.kernel.org
The eventfs directory is dynamically created via the meta data supplied by
the existing trace events. All files and directories in eventfs has a
parent. Do not allow NULL to be passed into eventfs_start_creating() as
the parent because that should never happen. Warn if it does.
Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull arm64 fix from Catalin Marinas:
"Fix a regression where the arm64 KPTI ends up enabled even on systems
that don't need it"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Avoid enabling KPTI unnecessarily
Merge a fix for a recently introduced build issue on ARM32 platforms
caused by an inadvertent header file breakage (Dave Jiang).
* acpi-tables:
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
show_energy_performance_available_preferences() to show only supported
values which is performance in performance governor policy.
-------Before--------
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
amd-pstate-epp
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_available_preferences
default performance balance_performance balance_power power
-------After--------
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
amd-pstate-epp
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_available_preferences
performance
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The power values coming from the Energy Model are already in uW.
The PowerCap and DTPM frameworks operate on uW, so all places should
just use the values from the EM.
Fix the code by removing all of the conversion to uW still present in it.
Fixes: ae6ccaa65038 (PM: EM: convert power field to micro-Watts precision and align drivers)
Cc: 5.19+ <stable@vger.kernel.org> # v5.19+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
objpool overrun stress with test_objpool on OrangePi5+ SBC triggered the
following kernel warnings:
WARNING: CPU: 6 PID: 3115 at lib/objpool.c:168 objpool_push+0xc0/0x100
This message is from objpool.c:168:
WARN_ON_ONCE(tail - head > pool->nr_objs);
The overrun test case is to validate the case that pre-allocated objects
are insufficient: 8 objects are pre-allocated for each node and consumer
thread per node tries to grab 16 objects in a row. The testing system is
OrangePI 5+, with RK3588, a big.LITTLE SOC with 4x A76 and 4x A55. When
disabling either all 4 big or 4 little cores, the overrun tests run well,
and once with big and little cores mixed together, the overrun test would
always cause an overrun loop. It's likely the memory timing differences
of big and little cores cause this trouble. Here are the debugging data
of objpool_try_get_slot after try_cmpxchg_release:
objpool_pop: cpu: 4/0 0:0 head: 278/279 tail:278 last:276/278
The local copies of 'head' and 'last' were 278 and 276, and reloading of
'slot->head' and 'slot->last' got 279 and 278. After try_cmpxchg_release
'slot->head' became 'head + 1', which is correct. But what's wrong here
is the stale value of 'last', and that stale value of 'last' finally led
the overrun of 'head'.
Memory updating of 'last' and 'head' are performed in push() and pop()
independently, which could be the culprit leading this out of order
visibility of 'last' and 'head'. So for objpool_try_get_slot(), it's
not enough only checking the condition of 'head != slot', the implicit
condition 'last - head <= nr_objs' must also be explicitly asserted to
guarantee 'last' is always behind 'head' before the object retrieving.
This patch will check and try reloading of 'head' and 'last' to ensure
'last' is behind 'head' at the time of object retrieving. Performance
testings show the average impact is about 0.1% for X86_64 and 1.12% for
ARM64. Here are the results:
OS: Debian 10 X86_64, Linux 6.6rc
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s
1T 2T 4T 8T 16T
native: 49543304 99277826 199017659 399070324 795185848
objpool: 29909085 59865637 119692073 239750369 478005250
objpool+: 29879313 59230743 119609856 239067773 478509029
32T 48T 64T 96T 128T
native: 1596927073 2390099988 2929397330 3183875848 3257546602
objpool: 957553042 1435814086 1680872925 2043126796 2165424198
objpool+: 956476281 1434491297 1666055740 2041556569 2157415622
OS: Debian 11 AARCH64, Linux 6.6rc
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s
1T 2T 4T 8T 16T
native: 30890508 60399915 123111980 242257008 494002946
objpool: 14742531 28883047 57739948 115886644 232455421
objpool+: 14107220 29032998 57286084 113730493 232232850
24T 32T 48T 64T 96T
native: 746406039 1000174750 1493236240 1998318364 2942911180
objpool: 349164852 467284332 702296756 934459713 1387898285
objpool+: 348388180 462750976 696606096 927865887 1368402195
Link: https://lore.kernel.org/all/20231114115148.298821-1-wuqiang.matt@bytedance.com/
Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC")
Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fake flexible arrays (zero-length and one-element arrays) are deprecated,
and should be replaced by flexible-array members. So, replace
zero-length array with a flexible-array member in `struct
privcmd_kernel_ioreq`.
Also annotate array `ports` with `__counted_by()` to prepare for the
coming implementation by GCC and Clang of the `__counted_by` attribute.
Flexible array members annotated with `__counted_by` can have their
accesses bounds-checked at run-time via `CONFIG_UBSAN_BOUNDS` (for array
indexing) and `CONFIG_FORTIFY_SOURCE` (for strcpy/memcpy-family functions).
This fixes multiple -Warray-bounds warnings:
drivers/xen/privcmd.c:1239:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1240:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1241:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1245:33: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1258:67: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
This results in no differences in binary output.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/ZVZlg3tPMPCRdteh@work
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull x86 fixes from Borislav Petkov:
- Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
use
- A documentation fix
- Add Kirill as TDX maintainer
* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi: Ignore invalid x2APIC entries
x86/shstk: Delay signal entry SSP write until after user accesses
x86/Documentation: Indent 'note::' directive for protocol version number note
MAINTAINERS: Add Intel TDX entry
In non-coherent GIC designs, the ITS tables must be flushed before writing
to the GITS_BASER<n> registers, otherwise the ITS could read dirty tables,
which results in unpredictable behavior.
Flush the tables right at the begin of its_setup_baser() to prevent that.
[ tglx: Massage changelog ]
Fixes: a8707f553884 ("irqchip/gic-v3: Add Rockchip 3588001 erratum workaround")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fang Xiang <fangxiang3@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231030083256.4345-1-fangxiang3@xiaomi.com
Add quotes where UIMAGE_NAME is used, rather than where it is defined.
This allows the UIMAGE_NAME variable to be set by the user.
Signed-off-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull locking fix from Ingo Molnar:
"Fix lockdep block chain corruption resulting in KASAN warnings"
* tag 'locking-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
lockdep: Fix block chain corruption
When running perf-stat command on Intel hybrid platform, perf-stat
reports the following errors:
sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1
Opening: cpu/cycles/:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -16
Performance counter stats for 'sleep 1':
<not counted> cpu_atom/instructions/
It looks the cpu_atom/instructions/ event can't be enabled on atom PMU
even when the process is pinned on atom core. Investigation shows that
exclusive_event_init() helper always returns -EBUSY error in the perf
event creation. That's strange since the atom PMU should not be an
exclusive PMU.
Further investigation shows the issue was introduced by commit:
97588df87b56 ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
The commit originally intents to clear the bit PERF_PMU_CAP_AUX_OUTPUT
from PMU capabilities if intel_cap.pebs_output_pt_available is not set,
but it incorrectly uses 'or' operation and leads to all PMU capabilities
bits are set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT.
Testing this fix on Intel hybrid platforms, the observed issues
disappear.
Fixes: 97588df87b56 ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com
First of all, the print is useless. The driver will either load and say
which microcode revision the machine has or issue an error.
Then, the version number is meaningless and actively confusing, as Yazen
mentioned recently: when a subset of patches are backported to a distro
kernel, one can't assume the driver version is the same as the upstream
one. And besides, the version number of the loader hasn't been used and
incremented for a long time. So drop it.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231115210212.9981-2-bp@alien8.de
Make sure that the __bug_table section gets 32- or 64-bit aligned,
depending if a 32- or 64-bit kernel is being built.
Mark it non-writeable and use .blockz instead of the .org assembler
directive to pad the struct.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v6.0+
The both create_file_dentry() and create_dir_dentry() takes a boolean
parameter "lookup", as on lookup the inode_lock should already be taken,
but for dcache_dir_open_wrapper() it is not taken.
There's no reason that the dcache_dir_open_wrapper() can't take the
inode_lock before calling these functions. In fact, it's better if it
does, as the lock can be held throughout both directory and file
creations.
This also simplifies the code, and possibly prevents unexpected race
conditions when the lock is released.
Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull iommu fixes from Joerg Roedel:
- Fix race conditions in device probe path
- Handle ERR_PTR() returns in __iommu_domain_alloc() path
- Update MAINTAINERS entry for Qualcom IOMMUs
- Printk argument fix in device tree specific code
- Several Intel VT-d fixes from Lu Baolu:
- Do not support enforcing cache coherency for non-empty domains
- Avoid devTLB invalidation if iommu is off
- Disable PCI ATS in legacy passthrough mode
- Support non-PCI devices when clearing context
- Fix incorrect cache invalidation for mm notification
- Add MTL to quirk list to skip TE disabling
- Set variable intel_dirty_ops to static
* tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix printk arg in of_iommu_get_resv_regions()
iommu/vt-d: Set variable intel_dirty_ops to static
iommu/vt-d: Fix incorrect cache invalidation for mm notification
iommu/vt-d: Add MTL to quirk list to skip TE disabling
iommu/vt-d: Make context clearing consistent with context mapping
iommu/vt-d: Disable PCI ATS in legacy passthrough mode
iommu/vt-d: Omit devTLB invalidation requests when TES=0
iommu/vt-d: Support enforce_cache_coherency only for empty domains
iommu: Avoid more races around device probe
MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry
iommu: Flow ERR_PTR out from __iommu_domain_alloc()
Commit 42c5a3b04bf6 refactored the KPTI init code in a way that results
in the use of non-global kernel mappings even on systems that have no
need for it, and even when KPTI has been disabled explicitly via the
command line.
Ensure that this only happens when we have decided (based on the
detected system-wide CPU features) that KPTI should be enabled.
Fixes: 42c5a3b04bf6 ("arm64: Split kpti_install_ng_mappings()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231127120049.2258650-6-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The acpi_video code was storing the acpi_video_device as driver_data
in the acpi_device children of the acpi_video_bus acpi_device.
But the acpi_video driver only binds to the bus acpi_device.
It uses, but does not bind to, the children. Since it is not
the driver it should not be using the driver_data of the children's
acpi_device-s.
Since commit 0d16710146a1 ("ACPI: bus: Set driver_data to NULL every
time .add() fails") the childen's driver_data ends up getting set
to NULL after a driver fails to bind to the children leading to a NULL
pointer deref in video_get_max_state when registering the cooling-dev:
[ 3.148958] BUG: kernel NULL pointer dereference, address: 0000000000000090
<snip>
[ 3.149015] Hardware name: Sony Corporation VPCSB2X9R/VAIO, BIOS R2087H4 06/15/2012
[ 3.149021] RIP: 0010:video_get_max_state+0x17/0x30 [video]
<snip>
[ 3.149105] Call Trace:
[ 3.149110] <TASK>
[ 3.149114] ? __die+0x23/0x70
[ 3.149126] ? page_fault_oops+0x171/0x4e0
[ 3.149137] ? exc_page_fault+0x7f/0x180
[ 3.149147] ? asm_exc_page_fault+0x26/0x30
[ 3.149158] ? video_get_max_state+0x17/0x30 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149176] ? __pfx_video_get_max_state+0x10/0x10 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149192] __thermal_cooling_device_register.part.0+0xf2/0x2f0
[ 3.149205] acpi_video_bus_register_backlight.part.0.isra.0+0x414/0x570 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149227] acpi_video_register_backlight+0x57/0x80 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149245] intel_acpi_video_register+0x68/0x90 [i915 1f3a758130b32ef13d301d4f8f78c7d766d57f2a]
[ 3.149669] intel_display_driver_register+0x28/0x50 [i915 1f3a758130b32ef13d301d4f8f78c7d766d57f2a]
[ 3.150064] i915_driver_probe+0x790/0xb90 [i915 1f3a758130b32ef13d301d4f8f78c7d766d57f2a]
[ 3.150402] local_pci_probe+0x45/0xa0
[ 3.150412] pci_device_probe+0xc1/0x260
<snip>
Fix this by directly using the acpi_video_device as devdata for
the cooling-device, which avoids the need to set driver-data on
the children at all.
Fixes: 0d16710146a1 ("ACPI: bus: Set driver_data to NULL every time .add() fails")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9718
Cc: 6.6+ <stable@vger.kernel.org> # 6.6+
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus reported that:
After commit a103f46633fd the kernel stopped compiling for
several ARM32 platforms that I am building with a bare metal
compiler. Bare metal compilers (arm-none-eabi-) don't
define __linux__.
This is because the header <acpi/platform/acenv.h> is now
in the include path for <linux/irq.h>:
CC arch/arm/kernel/irq.o
CC kernel/sysctl.o
CC crypto/api.o
In file included from ../include/acpi/acpi.h:22,
from ../include/linux/fw_table.h:29,
from ../include/linux/acpi.h:18,
from ../include/linux/irqchip.h:14,
from ../arch/arm/kernel/irq.c:25:
../include/acpi/platform/acenv.h:218:2: error: #error Unknown target environment
218 | #error Unknown target environment
| ^~~~~
The issue is caused by the introducing of splitting out the ACPI code to
support the new generic fw_table code.
Rafael suggested [1] moving the fw_table.h include in linux/acpi.h to below
the linux/mutex.h. Remove the two includes in fw_table.h. Replace
linux/fw_table.h include in fw_table.c with linux/acpi.h.
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0idWdJq3JSqQWLG5q+b+b=zkEdWR55rGYEoxh7R6N8kFQ@mail.gmail.com/
Fixes: a103f46633fd ("acpi: Move common tables helper functions to common lib")
Closes: https://lore.kernel.org/linux-acpi/20231114-arm-build-bug-v1-1-458745fe32a4@linaro.org/
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When amd_pstate is running, writing to scaling_min_freq and
scaling_max_freq has no effect. These values are only passed to the
policy level, but not to the platform level. This means that the
platform does not know about the frequency limits set by the user.
To fix this, update the min_perf and max_perf values at the platform
level whenever the user changes the scaling_min_freq and scaling_max_freq
values.
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull smb client fixes from Steve French:
- Two fallocate fixes
- Fix warnings from new gcc
- Two symlink fixes
* tag 'v6.7-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client, common: fix fortify warnings
cifs: Fix FALLOC_FL_INSERT_RANGE by setting i_size after EOF moved
cifs: Fix FALLOC_FL_ZERO_RANGE by setting i_size if EOF moved
smb: client: report correct st_size for SMB and NFS symlinks
smb: client: fix missing mode bits for SMB symlinks
Pull firewire fix from Takashi Sakamoto:
"A single patch to fix long-standing issue of memory leak at failure of
device registration for fw_unit. We rarely encounter the issue, but it
should be applied to stable releases, since it fixes inappropriate API
usage"
* tag 'firewire-fixes-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
firewire: core: fix possible memory leak in create_units()
When compiling with gcc version 14.0.0 20231126 (experimental)
and CONFIG_FORTIFY_SOURCE=y, I've noticed the following:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/wait.h:9,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/smb/client/smb2pdu.c:18:
In function 'fortify_memcpy_chk',
inlined from '__SMB2_close' at fs/smb/client/smb2pdu.c:3480:4:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
and:
In file included from ./include/linux/string.h:295,
from ./include/linux/bitmap.h:12,
from ./include/linux/cpumask.h:12,
from ./arch/x86/include/asm/paravirt.h:17,
from ./arch/x86/include/asm/cpuid.h:62,
from ./arch/x86/include/asm/processor.h:19,
from ./arch/x86/include/asm/cpufeature.h:5,
from ./arch/x86/include/asm/thread_info.h:53,
from ./include/linux/thread_info.h:60,
from ./arch/x86/include/asm/preempt.h:9,
from ./include/linux/preempt.h:79,
from ./include/linux/spinlock.h:56,
from ./include/linux/wait.h:9,
from ./include/linux/wait_bit.h:8,
from ./include/linux/fs.h:6,
from fs/smb/client/cifssmb.c:17:
In function 'fortify_memcpy_chk',
inlined from 'CIFS_open' at fs/smb/client/cifssmb.c:1248:3:
./include/linux/fortify-string.h:588:25: warning: call to '__read_overflow2_field'
declared with attribute warning: detected read beyond size of field (2nd parameter);
maybe use struct_group()? [-Wattribute-warning]
588 | __read_overflow2_field(q_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In both cases, the fortification logic inteprets calls to 'memcpy()' as an
attempts to copy an amount of data which exceeds the size of the specified
field (i.e. more than 8 bytes from __le64 value) and thus issues an overread
warning. Both of these warnings may be silenced by using the convenient
'struct_group()' quirk.
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull powerpc fixes from Michael Ellerman:
- Fix corruption of f0/vs0 during FP/Vector save, seen as userspace
crashes when using io-uring workers (in particular with MariaDB)
- Fix KVM_RUN potentially clobbering all host userspace FP/Vector
registers
Thanks to Timothy Pearson, Jens Axboe, and Nicholas Piggin.
* tag 'powerpc-6.7-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
KVM: PPC: Book3S HV: Fix KVM_RUN clobbering FP/VEC user registers
powerpc: Don't clobber f0/vs0 during fp|altivec register save
If device_register() fails, the refcount of device is not 0, the name
allocated in dev_set_name() is leaked. To fix this by calling put_device(),
so that it will be freed in callback function kobject_cleanup().
unreferenced object 0xffff9d99035c7a90 (size 8):
comm "systemd-udevd", pid 168, jiffies 4294672386 (age 152.089s)
hex dump (first 8 bytes):
66 77 30 2e 30 00 ff ff fw0.0...
backtrace:
[<00000000e1d62bac>] __kmem_cache_alloc_node+0x1e9/0x360
[<00000000bbeaff31>] __kmalloc_node_track_caller+0x44/0x1a0
[<00000000491f2fb4>] kvasprintf+0x67/0xd0
[<000000005b960ddc>] kobject_set_name_vargs+0x1e/0x90
[<00000000427ac591>] dev_set_name+0x4e/0x70
[<000000003b4e447d>] create_units+0xc5/0x110
fw_unit_release() will be called in the error path, move fw_device_get()
before calling device_register() to keep balanced with fw_device_put() in
fw_unit_release().
Cc: stable@vger.kernel.org
Fixes: 1fa5ae857bb1 ("driver core: get rid of struct device's bus_id string array")
Fixes: a1f64819fe9f ("firewire: struct device - replace bus_id with dev_name(), dev_set_name()")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Takashi Sakamoto <o-takashi@sakamocchi.jp>
Fix the cifs filesystem implementations of FALLOC_FL_INSERT_RANGE, in
smb3_insert_range(), to set i_size after extending the file on the server
and before we do the copy to open the gap (as we don't clean up the EOF
marker if the copy fails).
Fixes: 7fe6fe95b936 ("cifs: add FALLOC_FL_INSERT_RANGE support")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull vfio fixes from Alex Williamson:
- Fix the lifecycle of a mutex in the pds variant driver such that a
reset prior to opening the device won't find it uninitialized.
Implement the release path to symmetrically destroy the mutex. Also
switch a different lock from spinlock to mutex as the code path has
the potential to sleep and doesn't need the spinlock context
otherwise (Brett Creeley)
- Fix an issue detected via randconfig where KVM tries to symbol_get an
undeclared function. The symbol is temporarily declared
unconditionally here, which resolves the problem and avoids churn
relative to a series pending for the next merge window which resolves
some of this symbol ugliness, but also fixes Kconfig dependencies
(Sean Christopherson)
* tag 'vfio-v6.7-rc4' of https://github.com/awilliam/linux-vfio:
vfio: Drop vfio_file_iommu_group() stub to fudge around a KVM wart
vfio/pds: Fix possible sleep while in atomic context
vfio/pds: Fix mutex lock->magic != lock warning
Before running a guest, the host process (e.g., QEMU) FP/VEC registers
are saved if they were being used, similarly to when the kernel uses FP
registers. The guest values are then loaded into regs, and the host
process registers will be restored lazily when it uses FP/VEC.
KVM HV has a bug here: the host process registers do get saved, but the
user MSR bits remain enabled, which indicates the registers are valid
for the process. After they are clobbered by running the guest, this
valid indication causes the host process to take on the FP/VEC register
values of the guest.
Fixes: 34e119c96b2b ("KVM: PPC: Book3S HV P9: Reduce mtmsrd instructions required to save host SPRs")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20231122025811.2973-1-npiggin@gmail.com
Fix the cifs filesystem implementations of FALLOC_FL_ZERO_RANGE, in
smb3_zero_range(), to set i_size after extending the file on the server.
Fixes: 72c419d9b073 ("cifs: fix smb3_zero_range so it can expand the file-size when required")
Cc: stable@vger.kernel.org
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Paulo Alcantara <pc@manguebit.com>
cc: Shyam Prasad N <nspmangalore@gmail.com>
cc: Rohith Surabattula <rohiths.msft@gmail.com>
cc: Jeff Layton <jlayton@kernel.org>
cc: linux-cifs@vger.kernel.org
cc: linux-mm@kvack.org
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull xen fixes from Juergen Gross:
- A fix for the Xen event driver setting the correct return value when
experiencing an allocation failure
- A fix for allocating space for a struct in the percpu area to not
cross page boundaries (this one is for x86, a similar one for Arm was
already in the pull request for rc3)
* tag 'for-linus-6.7a-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/events: fix error code in xen_bind_pirq_msi_to_irq()
x86/xen: fix percpu vcpu_info allocation
Drop the vfio_file_iommu_group() stub and instead unconditionally declare
the function to fudge around a KVM wart where KVM tries to do symbol_get()
on vfio_file_iommu_group() (and other VFIO symbols) even if CONFIG_VFIO=n.
Ensuring the symbol is always declared fixes a PPC build error when
modules are also disabled, in which case symbol_get() simply points at the
address of the symbol (with some attributes shenanigans). Because KVM
does symbol_get() instead of directly depending on VFIO, the lack of a
fully defined symbol is not problematic (ugly, but "fine").
arch/powerpc/kvm/../../../virt/kvm/vfio.c:89:7:
error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
fn = symbol_get(vfio_file_iommu_group);
^
include/linux/module.h:805:60: note: expanded from macro 'symbol_get'
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
^
include/linux/vfio.h:294:35: note: previous definition is here
static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
^
arch/powerpc/kvm/../../../virt/kvm/vfio.c:89:7:
error: attribute declaration must precede definition [-Werror,-Wignored-attributes]
fn = symbol_get(vfio_file_iommu_group);
^
include/linux/module.h:805:65: note: expanded from macro 'symbol_get'
#define symbol_get(x) ({ extern typeof(x) x __attribute__((weak,visibility("hidden"))); &(x); })
^
include/linux/vfio.h:294:35: note: previous definition is here
static inline struct iommu_group *vfio_file_iommu_group(struct file *file)
^
2 errors generated.
Although KVM is firmly in the wrong (there is zero reason for KVM to build
virt/kvm/vfio.c when VFIO is disabled), fudge around the error in VFIO as
the stub is unnecessary and doesn't serve its intended purpose (KVM is the
only external user of vfio_file_iommu_group()), and there is an in-flight
series to clean up the entire KVM<->VFIO interaction, i.e. fixing this in
KVM would result in more churn in the long run, and the stub needs to go
away regardless.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308251949.5IiaV0sz-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309030741.82aLACDG-lkp@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309110914.QLH0LU6L-lkp@intel.com
Link: https://lore.kernel.org/all/0-v1-08396538817d+13c5-vfio_kvm_kconfig_jgg@nvidia.com
Link: https://lore.kernel.org/all/20230916003118.2540661-1-seanjc@google.com
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Fixes: c1cce6d079b8 ("vfio: Compile vfio_group infrastructure optionally")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://lore.kernel.org/r/20231130001000.543240-1-seanjc@google.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
During floating point and vector save to thread data f0/vs0 are
clobbered by the FPSCR/VSCR store routine. This has been obvserved to
lead to userspace register corruption and application data corruption
with io-uring.
Fix it by restoring f0/vs0 after FPSCR/VSCR store has completed for
all the FP, altivec, VMX register save paths.
Tested under QEMU in kvm mode, running on a Talos II workstation with
dual POWER9 DD2.2 CPUs.
Additional detail (mpe):
Typically save_fpu() is called from __giveup_fpu() which saves the FP
regs and also *turns off FP* in the tasks MSR, meaning the kernel will
reload the FP regs from the thread struct before letting the task use FP
again. So in that case save_fpu() is free to clobber f0 because the FP
regs no longer hold live values for the task.
There is another case though, which is the path via:
sys_clone()
...
copy_process()
dup_task_struct()
arch_dup_task_struct()
flush_all_to_thread()
save_all()
That path saves the FP regs but leaves them live. That's meant as an
optimisation for a process that's using FP/VSX and then calls fork(),
leaving the regs live means the parent process doesn't have to take a
fault after the fork to get its FP regs back. The optimisation was added
in commit 8792468da5e1 ("powerpc: Add the ability to save FPU without
giving it up").
That path does clobber f0, but f0 is volatile across function calls,
and typically programs reach copy_process() from userspace via a syscall
wrapper function. So in normal usage f0 being clobbered across a
syscall doesn't cause visible data corruption.
But there is now a new path, because io-uring can call copy_process()
via create_io_thread() from the signal handling path. That's OK if the
signal is handled as part of syscall return, but it's not OK if the
signal is handled due to some other interrupt.
That path is:
interrupt_return_srr_user()
interrupt_exit_user_prepare()
interrupt_exit_user_prepare_main()
do_notify_resume()
get_signal()
task_work_run()
create_worker_cb()
create_io_worker()
copy_process()
dup_task_struct()
arch_dup_task_struct()
flush_all_to_thread()
save_all()
if (tsk->thread.regs->msr & MSR_FP)
save_fpu()
# f0 is clobbered and potentially live in userspace
Note the above discussion applies equally to save_altivec().
Fixes: 8792468da5e1 ("powerpc: Add the ability to save FPU without giving it up")
Cc: stable@vger.kernel.org # v4.6+
Closes: https://lore.kernel.org/all/480932026.45576726.1699374859845.JavaMail.zimbra@raptorengineeringinc.com/
Closes: https://lore.kernel.org/linuxppc-dev/480221078.47953493.1700206777956.JavaMail.zimbra@raptorengineeringinc.com/
Tested-by: Timothy Pearson <tpearson@raptorengineering.com>
Tested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Timothy Pearson <tpearson@raptorengineering.com>
[mpe: Reword change log to describe exact path of corruption & other minor tweaks]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/1921539696.48534988.1700407082933.JavaMail.zimbra@raptorengineeringinc.com
Pull tracing fixes from Steven Rostedt::
"Eventfs fixes:
- With the usage of simple_recursive_remove() recommended by Al Viro,
the code should not be calling "d_invalidate()" itself. Doing so is
causing crashes. The code was calling d_invalidate() on the race of
trying to look up a file while the parent was being deleted. This
was detected, and the added dentry was having d_invalidate() called
on it, but the deletion of the directory was also calling
d_invalidate() on that same dentry.
- A fix to not free the eventfs_inode (ei) until the last dput() was
called on its ei->dentry made the ei->dentry exist even after it
was marked for free by setting the ei->is_freed. But code elsewhere
still was checking if ei->dentry was NULL if ei->is_freed is set
and would trigger WARN_ON if that was the case. That's no longer
true and there should not be any warnings when it is true.
- Use GFP_NOFS for allocations done under eventfs_mutex. The
eventfs_mutex can be taken on file system reclaim, make sure that
allocations done under that mutex do not trigger file system
reclaim.
- Clean up code by moving the taking of inode_lock out of the helper
functions and into where they are needed, and not use the parameter
to know to take it or not. It must always be held but some callers
of the helper function have it taken when they were called.
- Warn if the inode_lock is not held in the helper functions.
- Warn if eventfs_start_creating() is called without a parent. As
eventfs is underneath tracefs, all files created will have a parent
(the top one will have a tracefs parent).
Tracing update:
- Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"
* tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
eventfs: Make sure that parent->d_inode is locked in creating files/dirs
eventfs: Do not allow NULL parent to eventfs_start_creating()
eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
eventfs: Do not invalidate dentry in create_file/dir_dentry()
eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
We can't rely on FILE_STANDARD_INFORMATION::EndOfFile for reparse
points as they will be always zero. Set it to symlink target's length
as specified by POSIX.
This will make stat() family of syscalls return the correct st_size
for such files.
Cc: stable@vger.kernel.org
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull probes fixes from Masami Hiramatsu:
- objpool: Fix objpool overrun case on memory/cache access delay
especially on the big.LITTLE SoC. The objpool uses a copy of object
slot index internal loop, but the slot index can be changed on
another processor in parallel. In that case, the difference of 'head'
local copy and the 'slot->last' index will be bigger than local slot
size. In that case, we need to re-read the slot::head to update it.
- kretprobe: Fix to use appropriate rcu API for kretprobe holder. Since
kretprobe_holder::rp is RCU managed, it should use
rcu_assign_pointer() and rcu_dereference_check() correctly. Also
adding __rcu tag for finding wrong usage by sparse.
- rethook: Fix to use appropriate rcu API for rethook::handler. The
same as kretprobe, rethook::handler is RCU managed and it should use
rcu_assign_pointer() and rcu_dereference_check(). This also adds
__rcu tag for finding wrong usage by sparse.
* tag 'probes-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rethook: Use __rcu pointer for rethook::handler
kprobes: consistent rcu api usage for kretprobe holder
lib: objpool: fix head overrun on RK3588 SBC
Return -ENOMEM if xen_irq_init() fails. currently the code returns an
uninitialized variable or zero.
Fixes: 5dd9ad32d775 ("xen/events: drop xen_allocate_irqs_dynamic()")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Juergen Gross <jgross@ssue.com>
Link: https://lore.kernel.org/r/3b9ab040-a92e-4e35-b687-3a95890a9ace@moroto.mountain
Signed-off-by: Juergen Gross <jgross@suse.com>
The driver could possibly sleep while in atomic context resulting
in the following call trace while CONFIG_DEBUG_ATOMIC_SLEEP=y is
set:
BUG: sleeping function called from invalid context at kernel/locking/mutex.c:283
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2817, name: bash
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
Call Trace:
<TASK>
dump_stack_lvl+0x36/0x50
__might_resched+0x123/0x170
mutex_lock+0x1e/0x50
pds_vfio_put_lm_file+0x1e/0xa0 [pds_vfio_pci]
pds_vfio_put_save_file+0x19/0x30 [pds_vfio_pci]
pds_vfio_state_mutex_unlock+0x2e/0x80 [pds_vfio_pci]
pci_reset_function+0x4b/0x70
reset_store+0x5b/0xa0
kernfs_fop_write_iter+0x137/0x1d0
vfs_write+0x2de/0x410
ksys_write+0x5d/0xd0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
This can happen if pds_vfio_put_restore_file() and/or
pds_vfio_put_save_file() grab the mutex_lock(&lm_file->lock)
while the spin_lock(&pds_vfio->reset_lock) is held, which can
happen during while calling pds_vfio_state_mutex_unlock().
Fix this by changing the reset_lock to reset_mutex so there are no such
conerns. Also, make sure to destroy the reset_mutex in the driver specific
VFIO device release function.
This also fixes a spinlock bad magic BUG that was caused
by not calling spinlock_init() on the reset_lock. Since, the lock is
being changed to a mutex, make sure to call mutex_init() on it.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/kvm/1f9bc27b-3de9-4891-9687-ba2820c1b390@moroto.mountain/
Fixes: bb500dbe2ac6 ("vfio/pds: Add VFIO live migration support")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231122192532.25791-3-brett.creeley@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull parisc architecture fixes from Helge Deller:
"This patchset fixes and enforces correct section alignments for the
ex_table, altinstructions, parisc_unwind, jump_table and bug_table
which are created by inline assembly.
Due to not being correctly aligned at link & load time they can
trigger unnecessarily the kernel unaligned exception handler at
runtime. While at it, I switched the bug table to use relative
addresses which reduces the size of the table by half on 64-bit.
We still had the ENOSYM and EREMOTERELEASE errno symbols as left-overs
from HP-UX, which now trigger build-issues with glibc. We can simply
remove them.
Most of the patches are tagged for stable kernel series.
Summary:
- Drop HP-UX ENOSYM and EREMOTERELEASE return codes to avoid glibc
build issues
- Fix section alignments for ex_table, altinstructions, parisc unwind
table, jump_table and bug_table
- Reduce size of bug_table on 64-bit kernel by using relative
pointers"
* tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Reduce size of the bug_table on 64-bit kernel by half
parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
parisc: Use natural CPU alignment for bug_table
parisc: Ensure 32-bit alignment on parisc unwind section
parisc: Mark lock_aligned variables 16-byte aligned on SMP
parisc: Mark jump_table naturally aligned
parisc: Mark altinstructions read-only and 32-bit aligned
parisc: Mark ex_table entries 32-bit aligned in uaccess.h
parisc: Mark ex_table entries 32-bit aligned in assembly.h
In order to make sure I get CC'd on tracing changes for which my input
would be relevant, add my name as reviewer of the TRACING subsystem.
Link: https://lore.kernel.org/linux-trace-kernel/20231115155018.8236-1-mathieu.desnoyers@efficios.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull power management fixes from Rafael Wysocki:
"These fix issues in two cpufreq drivers, in the AMD P-state driver and
in the power-capping DTPM framework.
Specifics:
- Fix the AMD P-state driver's EPP sysfs interface in the cases when
the performance governor is in use (Ayush Jain)
- Make the ->fast_switch() callback in the AMD P-state driver return
the target frequency as expected (Gautham R. Shenoy)
- Allow user space to control the range of frequencies to use via
scaling_min_freq and scaling_max_freq when AMD P-state driver is in
use (Wyes Karny)
- Prevent power domains needed for wakeup signaling from being turned
off during system suspend on Qualcomm systems and prevent
performance states votes from runtime-suspended devices from being
lost across a system suspend-resume cycle in qcom-cpufreq-nvmem
(Stephan Gerhold)
- Fix disabling the 792 Mhz OPP in the imx6q cpufreq driver for the
i.MX6ULL types that can run at that frequency (Christoph
Niedermaier)
- Eliminate unnecessary and harmful conversions to uW from the DTPM
(dynamic thermal and power management) framework (Lukasz Luba)"
* tag 'pm-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq/amd-pstate: Only print supported EPP values for performance governor
cpufreq/amd-pstate: Fix scaling_min_freq and scaling_max_freq update
powercap: DTPM: Fix unneeded conversions to micro-Watts
cpufreq/amd-pstate: Fix the return value of amd_pstate_fast_switch()
pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
cpufreq: qcom-nvmem: Enable virtual power domain devices
cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
Since the rethook::handler is an RCU-maganged pointer so that it will
notice readers the rethook is stopped (unregistered) or not, it should
be an __rcu pointer and use appropriate functions to be accessed. This
will use appropriate memory barrier when accessing it. OTOH,
rethook::data is never changed, so we don't need to check it in
get_kretprobe().
NOTE: To avoid sparse warning, rethook::handler is defined by a raw
function pointer type with __rcu instead of rethook_handler_t.
Link: https://lore.kernel.org/all/170126066201.398836.837498688669005979.stgit@devnote2/
Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook")
Cc: stable@vger.kernel.org
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202311241808.rv9ceuAh-lkp@intel.com/
Tested-by: JP Kobryn <inwardvessel@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Today the percpu struct vcpu_info is allocated via DEFINE_PER_CPU(),
meaning that it could cross a page boundary. In this case registering
it with the hypervisor will fail, resulting in a panic().
This can easily be fixed by using DEFINE_PER_CPU_ALIGNED() instead,
as struct vcpu_info is guaranteed to have a size of 64 bytes, matching
the cache line size of x86 64-bit processors (Xen doesn't support
32-bit processors).
Fixes: 5ead97c84fa7 ("xen: Core Xen implementation")
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.con>
Link: https://lore.kernel.org/r/20231124074852.25161-1-jgross@suse.com
Signed-off-by: Juergen Gross <jgross@suse.com>
The following BUG was found when running on a kernel with
CONFIG_DEBUG_MUTEXES=y set:
DEBUG_LOCKS_WARN_ON(lock->magic != lock)
RIP: 0010:mutex_trylock+0x10d/0x120
Call Trace:
<TASK>
? __warn+0x85/0x140
? mutex_trylock+0x10d/0x120
? report_bug+0xfc/0x1e0
? handle_bug+0x3f/0x70
? exc_invalid_op+0x17/0x70
? asm_exc_invalid_op+0x1a/0x20
? mutex_trylock+0x10d/0x120
? mutex_trylock+0x10d/0x120
pds_vfio_reset+0x3a/0x60 [pds_vfio_pci]
pci_reset_function+0x4b/0x70
reset_store+0x5b/0xa0
kernfs_fop_write_iter+0x137/0x1d0
vfs_write+0x2de/0x410
ksys_write+0x5d/0xd0
do_syscall_64+0x3b/0x90
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
As shown, lock->magic != lock. This is because
mutex_init(&pds_vfio->state_mutex) is called in the VFIO open path. So,
if a reset is initiated before the VFIO device is opened the mutex will
have never been initialized. Fix this by calling
mutex_init(&pds_vfio->state_mutex) in the VFIO init path.
Also, don't destroy the mutex on close because the device may
be re-opened, which would cause mutex to be uninitialized. Fix this by
implementing a driver specific vfio_device_ops.release callback that
destroys the mutex before calling vfio_pci_core_release_dev().
Fixes: bb500dbe2ac6 ("vfio/pds: Add VFIO live migration support")
Signed-off-by: Brett Creeley <brett.creeley@amd.com>
Reviewed-by: Shannon Nelson <shannon.nelson@amd.com>
Link: https://lore.kernel.org/r/20231122192532.25791-2-brett.creeley@amd.com
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Pull Kbuild fixes from Masahiro Yamada:
- Fix section mismatch warning messages for riscv and loongarch
- Remove CONFIG_IA64 left-over from linux/export-internal.h
- Fix the location of the quotes for UIMAGE_NAME
- Fix a memory leak bug in Kconfig
* tag 'kbuild-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix memory leak from range properties
kbuild: Move the single quotes for image name
linux/export: clean up the IA-64 KSYM_FUNC macro
modpost: fix section mismatch message for RELA
Pull x86 microcode fixes from Ingo Molnar:
"Fix/enhance x86 microcode version reporting: fix the bootup log spam,
and remove the driver version announcement to avoid version confusion
when distros backport fixes"
* tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Rework early revisions reporting
x86/microcode: Remove the driver announcement and version
Since the locking of the parent->d_inode has been moved outside the
creation of the files and directories (as it use to be locked via a
conditional), add a WARN_ON_ONCE() to the case that it's not locked.
Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull ACPI fixes from Rafael Wysocki:
"This fixes a recently introduced build issue on ARM32 and a NULL
pointer dereference in the ACPI backlight driver due to a design issue
exposed by a recent change in the ACPI bus type code.
Specifics:
- Fix a recently introduced build issue on ARM32 platforms caused by
an inadvertent header file breakage (Dave Jiang)
- Eliminate questionable usage of acpi_driver_data() in the ACPI
backlight cooling device code that leads to NULL pointer
dereferences after recent ACPI core changes (Hans de Goede)"
* tag 'acpi-6.7-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: video: Use acpi_video_device for cooling-dev driver data
ACPI: Fix ARM32 platforms compile issue introduced by fw_table changes
It seems that the pointer-to-kretprobe "rp" within the kretprobe_holder is
RCU-managed, based on the (non-rethook) implementation of get_kretprobe().
The thought behind this patch is to make use of the RCU API where possible
when accessing this pointer so that the needed barriers are always in place
and to self-document the code.
The __rcu annotation to "rp" allows for sparse RCU checking. Plain writes
done to the "rp" pointer are changed to make use of the RCU macro for
assignment. For the single read, the implementation of get_kretprobe()
is simplified by making use of an RCU macro which accomplishes the same,
but note that the log warning text will be more generic.
I did find that there is a difference in assembly generated between the
usage of the RCU macros vs without. For example, on arm64, when using
rcu_assign_pointer(), the corresponding store instruction is a
store-release (STLR) which has an implicit barrier. When normal assignment
is done, a regular store (STR) is found. In the macro case, this seems to
be a result of rcu_assign_pointer() using smp_store_release() when the
value to write is not NULL.
Link: https://lore.kernel.org/all/20231122132058.3359-1-inwardvessel@gmail.com/
Fixes: d741bf41d7c7 ("kprobes: Remove kretprobe hash")
Cc: stable@vger.kernel.org
Signed-off-by: JP Kobryn <inwardvessel@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
xen_vcpu_info is a percpu area than needs to be mapped by Xen.
Currently, it could cross a page boundary resulting in Xen being unable
to map it:
[ 0.567318] kernel BUG at arch/arm64/xen/../../arm/xen/enlighten.c:164!
[ 0.574002] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
Fix the issue by using __alloc_percpu and requesting alignment for the
memory allocation.
Signed-off-by: Stefano Stabellini <stefano.stabellini@amd.com>
Link: https://lore.kernel.org/r/alpine.DEB.2.22.394.2311221501340.2053963@ubuntu-linux-20-04-desktop
Fixes: 24d5373dda7c ("arm/xen: Use alloc_percpu rather than __alloc_percpu")
Reviewed-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Currently, sym_validate_range() duplicates the range string using
xstrdup(), which is overwritten by a subsequent sym_calc_value() call.
It results in a memory leak.
Instead, only the pointer should be copied.
Below is a test case, with a summary from Valgrind.
[Test Kconfig]
config FOO
int "foo"
range 10 20
[Test .config]
CONFIG_FOO=0
[Before]
LEAK SUMMARY:
definitely lost: 3 bytes in 1 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,465 bytes in 21 blocks
suppressed: 0 bytes in 0 blocks
[After]
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,462 bytes in 20 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull x86 perf event fix from Ingo Molnar:
"Fix a bug in the Intel hybrid CPUs hardware-capabilities enumeration
code resulting in non-working events on those platforms"
* tag 'perf-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Correct incorrect 'or' operation for PMU capabilities
The AMD side of the loader issues the microcode revision for each
logical thread on the system, which can become really noisy on huge
machines. And doing that doesn't make a whole lot of sense - the
microcode revision is already in /proc/cpuinfo.
So in case one is interested in the theoretical support of mixed silicon
steppings on AMD, one can check there.
What is also missing on the AMD side - something which people have
requested before - is showing the microcode revision the CPU had
*before* the early update.
So abstract that up in the main code and have the BSP on each vendor
provide those revision numbers.
Then, dump them only once on driver init.
On Intel, do not dump the patch date - it is not needed.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/CAHk-=wg=%2B8rceshMkB4VnKxmRccVLtBLPBawnewZuuqyx5U=3A@mail.gmail.com
Those return codes are only defined for the parisc architecture and
are leftovers from when we wanted to be HP-UX compatible.
They are not returned by any Linux kernel syscall but do trigger
problems with the glibc strerrorname_np() and strerror() functions as
reported in glibc issue #31080.
There is no need to keep them, so simply remove them.
Signed-off-by: Helge Deller <deller@gmx.de>
Reported-by: Bruno Haible <bruno@clisp.org>
Closes: https://sourceware.org/bugzilla/show_bug.cgi?id=31080
Cc: stable@vger.kernel.org
The eventfs directory is dynamically created via the meta data supplied by
the existing trace events. All files and directories in eventfs has a
parent. Do not allow NULL to be passed into eventfs_start_creating() as
the parent because that should never happen. Warn if it does.
Link: https://lkml.kernel.org/r/20231121231112.693841807@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
show_energy_performance_available_preferences() to show only supported
values which is performance in performance governor policy.
-------Before--------
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
amd-pstate-epp
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_available_preferences
default performance balance_performance balance_power power
-------After--------
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
amd-pstate-epp
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_available_preferences
performance
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The power values coming from the Energy Model are already in uW.
The PowerCap and DTPM frameworks operate on uW, so all places should
just use the values from the EM.
Fix the code by removing all of the conversion to uW still present in it.
Fixes: ae6ccaa65038 (PM: EM: convert power field to micro-Watts precision and align drivers)
Cc: 5.19+ <stable@vger.kernel.org> # v5.19+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
objpool overrun stress with test_objpool on OrangePi5+ SBC triggered the
following kernel warnings:
WARNING: CPU: 6 PID: 3115 at lib/objpool.c:168 objpool_push+0xc0/0x100
This message is from objpool.c:168:
WARN_ON_ONCE(tail - head > pool->nr_objs);
The overrun test case is to validate the case that pre-allocated objects
are insufficient: 8 objects are pre-allocated for each node and consumer
thread per node tries to grab 16 objects in a row. The testing system is
OrangePI 5+, with RK3588, a big.LITTLE SOC with 4x A76 and 4x A55. When
disabling either all 4 big or 4 little cores, the overrun tests run well,
and once with big and little cores mixed together, the overrun test would
always cause an overrun loop. It's likely the memory timing differences
of big and little cores cause this trouble. Here are the debugging data
of objpool_try_get_slot after try_cmpxchg_release:
objpool_pop: cpu: 4/0 0:0 head: 278/279 tail:278 last:276/278
The local copies of 'head' and 'last' were 278 and 276, and reloading of
'slot->head' and 'slot->last' got 279 and 278. After try_cmpxchg_release
'slot->head' became 'head + 1', which is correct. But what's wrong here
is the stale value of 'last', and that stale value of 'last' finally led
the overrun of 'head'.
Memory updating of 'last' and 'head' are performed in push() and pop()
independently, which could be the culprit leading this out of order
visibility of 'last' and 'head'. So for objpool_try_get_slot(), it's
not enough only checking the condition of 'head != slot', the implicit
condition 'last - head <= nr_objs' must also be explicitly asserted to
guarantee 'last' is always behind 'head' before the object retrieving.
This patch will check and try reloading of 'head' and 'last' to ensure
'last' is behind 'head' at the time of object retrieving. Performance
testings show the average impact is about 0.1% for X86_64 and 1.12% for
ARM64. Here are the results:
OS: Debian 10 X86_64, Linux 6.6rc
HW: XEON 8336C x 2, 64 cores/128 threads, DDR4 3200MT/s
1T 2T 4T 8T 16T
native: 49543304 99277826 199017659 399070324 795185848
objpool: 29909085 59865637 119692073 239750369 478005250
objpool+: 29879313 59230743 119609856 239067773 478509029
32T 48T 64T 96T 128T
native: 1596927073 2390099988 2929397330 3183875848 3257546602
objpool: 957553042 1435814086 1680872925 2043126796 2165424198
objpool+: 956476281 1434491297 1666055740 2041556569 2157415622
OS: Debian 11 AARCH64, Linux 6.6rc
HW: Kunpeng-920 96 cores/2 sockets/4 NUMA nodes, DDR4 2933 MT/s
1T 2T 4T 8T 16T
native: 30890508 60399915 123111980 242257008 494002946
objpool: 14742531 28883047 57739948 115886644 232455421
objpool+: 14107220 29032998 57286084 113730493 232232850
24T 32T 48T 64T 96T
native: 746406039 1000174750 1493236240 1998318364 2942911180
objpool: 349164852 467284332 702296756 934459713 1387898285
objpool+: 348388180 462750976 696606096 927865887 1368402195
Link: https://lore.kernel.org/all/20231114115148.298821-1-wuqiang.matt@bytedance.com/
Fixes: b4edb8d2d464 ("lib: objpool added: ring-array based lockless MPMC")
Signed-off-by: wuqiang.matt <wuqiang.matt@bytedance.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Fake flexible arrays (zero-length and one-element arrays) are deprecated,
and should be replaced by flexible-array members. So, replace
zero-length array with a flexible-array member in `struct
privcmd_kernel_ioreq`.
Also annotate array `ports` with `__counted_by()` to prepare for the
coming implementation by GCC and Clang of the `__counted_by` attribute.
Flexible array members annotated with `__counted_by` can have their
accesses bounds-checked at run-time via `CONFIG_UBSAN_BOUNDS` (for array
indexing) and `CONFIG_FORTIFY_SOURCE` (for strcpy/memcpy-family functions).
This fixes multiple -Warray-bounds warnings:
drivers/xen/privcmd.c:1239:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1240:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1241:30: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1245:33: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
drivers/xen/privcmd.c:1258:67: warning: array subscript i is outside array bounds of 'struct ioreq_port[0]' [-Warray-bounds=]
This results in no differences in binary output.
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/ZVZlg3tPMPCRdteh@work
Signed-off-by: Juergen Gross <jgross@suse.com>
Pull x86 fixes from Borislav Petkov:
- Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
use
- A documentation fix
- Add Kirill as TDX maintainer
* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi: Ignore invalid x2APIC entries
x86/shstk: Delay signal entry SSP write until after user accesses
x86/Documentation: Indent 'note::' directive for protocol version number note
MAINTAINERS: Add Intel TDX entry
In non-coherent GIC designs, the ITS tables must be flushed before writing
to the GITS_BASER<n> registers, otherwise the ITS could read dirty tables,
which results in unpredictable behavior.
Flush the tables right at the begin of its_setup_baser() to prevent that.
[ tglx: Massage changelog ]
Fixes: a8707f553884 ("irqchip/gic-v3: Add Rockchip 3588001 erratum workaround")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fang Xiang <fangxiang3@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231030083256.4345-1-fangxiang3@xiaomi.com
When running perf-stat command on Intel hybrid platform, perf-stat
reports the following errors:
sudo taskset -c 7 ./perf stat -vvvv -e cpu_atom/instructions/ sleep 1
Opening: cpu/cycles/:HG
------------------------------------------------------------
perf_event_attr:
type 0 (PERF_TYPE_HARDWARE)
config 0xa00000000
disabled 1
------------------------------------------------------------
sys_perf_event_open: pid 0 cpu -1 group_fd -1 flags 0x8
sys_perf_event_open failed, error -16
Performance counter stats for 'sleep 1':
<not counted> cpu_atom/instructions/
It looks the cpu_atom/instructions/ event can't be enabled on atom PMU
even when the process is pinned on atom core. Investigation shows that
exclusive_event_init() helper always returns -EBUSY error in the perf
event creation. That's strange since the atom PMU should not be an
exclusive PMU.
Further investigation shows the issue was introduced by commit:
97588df87b56 ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
The commit originally intents to clear the bit PERF_PMU_CAP_AUX_OUTPUT
from PMU capabilities if intel_cap.pebs_output_pt_available is not set,
but it incorrectly uses 'or' operation and leads to all PMU capabilities
bits are set to 1 except bit PERF_PMU_CAP_AUX_OUTPUT.
Testing this fix on Intel hybrid platforms, the observed issues
disappear.
Fixes: 97588df87b56 ("perf/x86/intel: Add common intel_pmu_init_hybrid()")
Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231121014628.729989-1-dapeng1.mi@linux.intel.com
First of all, the print is useless. The driver will either load and say
which microcode revision the machine has or issue an error.
Then, the version number is meaningless and actively confusing, as Yazen
mentioned recently: when a subset of patches are backported to a distro
kernel, one can't assume the driver version is the same as the upstream
one. And besides, the version number of the loader hasn't been used and
incremented for a long time. So drop it.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20231115210212.9981-2-bp@alien8.de
The both create_file_dentry() and create_dir_dentry() takes a boolean
parameter "lookup", as on lookup the inode_lock should already be taken,
but for dcache_dir_open_wrapper() it is not taken.
There's no reason that the dcache_dir_open_wrapper() can't take the
inode_lock before calling these functions. In fact, it's better if it
does, as the lock can be held throughout both directory and file
creations.
This also simplifies the code, and possibly prevents unexpected race
conditions when the lock is released.
Link: https://lkml.kernel.org/r/20231121231112.528544825@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Fixes: 5790b1fb3d672 ("eventfs: Remove eventfs_file and just use eventfs_inode")
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Pull iommu fixes from Joerg Roedel:
- Fix race conditions in device probe path
- Handle ERR_PTR() returns in __iommu_domain_alloc() path
- Update MAINTAINERS entry for Qualcom IOMMUs
- Printk argument fix in device tree specific code
- Several Intel VT-d fixes from Lu Baolu:
- Do not support enforcing cache coherency for non-empty domains
- Avoid devTLB invalidation if iommu is off
- Disable PCI ATS in legacy passthrough mode
- Support non-PCI devices when clearing context
- Fix incorrect cache invalidation for mm notification
- Add MTL to quirk list to skip TE disabling
- Set variable intel_dirty_ops to static
* tag 'iommu-fixes-v6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu:
iommu: Fix printk arg in of_iommu_get_resv_regions()
iommu/vt-d: Set variable intel_dirty_ops to static
iommu/vt-d: Fix incorrect cache invalidation for mm notification
iommu/vt-d: Add MTL to quirk list to skip TE disabling
iommu/vt-d: Make context clearing consistent with context mapping
iommu/vt-d: Disable PCI ATS in legacy passthrough mode
iommu/vt-d: Omit devTLB invalidation requests when TES=0
iommu/vt-d: Support enforce_cache_coherency only for empty domains
iommu: Avoid more races around device probe
MAINTAINERS: list all Qualcomm IOMMU drivers in the QUALCOMM IOMMU entry
iommu: Flow ERR_PTR out from __iommu_domain_alloc()
Commit 42c5a3b04bf6 refactored the KPTI init code in a way that results
in the use of non-global kernel mappings even on systems that have no
need for it, and even when KPTI has been disabled explicitly via the
command line.
Ensure that this only happens when we have decided (based on the
detected system-wide CPU features) that KPTI should be enabled.
Fixes: 42c5a3b04bf6 ("arm64: Split kpti_install_ng_mappings()")
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20231127120049.2258650-6-ardb@google.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The acpi_video code was storing the acpi_video_device as driver_data
in the acpi_device children of the acpi_video_bus acpi_device.
But the acpi_video driver only binds to the bus acpi_device.
It uses, but does not bind to, the children. Since it is not
the driver it should not be using the driver_data of the children's
acpi_device-s.
Since commit 0d16710146a1 ("ACPI: bus: Set driver_data to NULL every
time .add() fails") the childen's driver_data ends up getting set
to NULL after a driver fails to bind to the children leading to a NULL
pointer deref in video_get_max_state when registering the cooling-dev:
[ 3.148958] BUG: kernel NULL pointer dereference, address: 0000000000000090
<snip>
[ 3.149015] Hardware name: Sony Corporation VPCSB2X9R/VAIO, BIOS R2087H4 06/15/2012
[ 3.149021] RIP: 0010:video_get_max_state+0x17/0x30 [video]
<snip>
[ 3.149105] Call Trace:
[ 3.149110] <TASK>
[ 3.149114] ? __die+0x23/0x70
[ 3.149126] ? page_fault_oops+0x171/0x4e0
[ 3.149137] ? exc_page_fault+0x7f/0x180
[ 3.149147] ? asm_exc_page_fault+0x26/0x30
[ 3.149158] ? video_get_max_state+0x17/0x30 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149176] ? __pfx_video_get_max_state+0x10/0x10 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149192] __thermal_cooling_device_register.part.0+0xf2/0x2f0
[ 3.149205] acpi_video_bus_register_backlight.part.0.isra.0+0x414/0x570 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149227] acpi_video_register_backlight+0x57/0x80 [video 9b6f3f0d19d7b4a0e2df17a2d8b43bc19c2ed71f]
[ 3.149245] intel_acpi_video_register+0x68/0x90 [i915 1f3a758130b32ef13d301d4f8f78c7d766d57f2a]
[ 3.149669] intel_display_driver_register+0x28/0x50 [i915 1f3a758130b32ef13d301d4f8f78c7d766d57f2a]
[ 3.150064] i915_driver_probe+0x790/0xb90 [i915 1f3a758130b32ef13d301d4f8f78c7d766d57f2a]
[ 3.150402] local_pci_probe+0x45/0xa0
[ 3.150412] pci_device_probe+0xc1/0x260
<snip>
Fix this by directly using the acpi_video_device as devdata for
the cooling-device, which avoids the need to set driver-data on
the children at all.
Fixes: 0d16710146a1 ("ACPI: bus: Set driver_data to NULL every time .add() fails")
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/9718
Cc: 6.6+ <stable@vger.kernel.org> # 6.6+
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Linus reported that:
After commit a103f46633fd the kernel stopped compiling for
several ARM32 platforms that I am building with a bare metal
compiler. Bare metal compilers (arm-none-eabi-) don't
define __linux__.
This is because the header <acpi/platform/acenv.h> is now
in the include path for <linux/irq.h>:
CC arch/arm/kernel/irq.o
CC kernel/sysctl.o
CC crypto/api.o
In file included from ../include/acpi/acpi.h:22,
from ../include/linux/fw_table.h:29,
from ../include/linux/acpi.h:18,
from ../include/linux/irqchip.h:14,
from ../arch/arm/kernel/irq.c:25:
../include/acpi/platform/acenv.h:218:2: error: #error Unknown target environment
218 | #error Unknown target environment
| ^~~~~
The issue is caused by the introducing of splitting out the ACPI code to
support the new generic fw_table code.
Rafael suggested [1] moving the fw_table.h include in linux/acpi.h to below
the linux/mutex.h. Remove the two includes in fw_table.h. Replace
linux/fw_table.h include in fw_table.c with linux/acpi.h.
Link: https://lore.kernel.org/linux-acpi/CAJZ5v0idWdJq3JSqQWLG5q+b+b=zkEdWR55rGYEoxh7R6N8kFQ@mail.gmail.com/
Fixes: a103f46633fd ("acpi: Move common tables helper functions to common lib")
Closes: https://lore.kernel.org/linux-acpi/20231114-arm-build-bug-v1-1-458745fe32a4@linaro.org/
Reported-by: Linus Walleij <linus.walleij@linaro.org>
Suggested-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When amd_pstate is running, writing to scaling_min_freq and
scaling_max_freq has no effect. These values are only passed to the
policy level, but not to the platform level. This means that the
platform does not know about the frequency limits set by the user.
To fix this, update the min_perf and max_perf values at the platform
level whenever the user changes the scaling_min_freq and scaling_max_freq
values.
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>