commits
Pull smb server fixes from Steve French:
"Six SMB3 server fixes for various races found by RO0T Lab of Huawei:
- Fix oops when racing between oplock break ack and freeing file
- Simultaneous request fixes for parallel logoffs, and for parallel
lock requests
- Fixes for tree disconnect race, session expire race, and close/open
race"
* tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix race condition between tree conn lookup and disconnect
ksmbd: fix race condition from parallel smb2 lock requests
ksmbd: fix race condition from parallel smb2 logoff requests
ksmbd: fix uaf in smb20_oplock_break_ack
ksmbd: fix race condition with fp
ksmbd: fix race condition between session lookup and expire
Pull misc scheduler fixes from Ingo Molnar:
- Two EEVDF fixes: one to fix sysctl_sched_base_slice propagation, and
to fix an avg_vruntime() corner-case.
- A cpufreq frequency scaling fix
* tag 'sched-urgent-2023-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpufreq: schedutil: Update next_freq when cpufreq_limits change
sched/eevdf: Fix avg_vruntime()
sched/eevdf: Also update slice on placement
if thread A in smb2_write is using work-tcon, other thread B use
smb2_tree_disconnect free the tcon, then thread A will use free'd tcon.
Time
+
Thread A | Thread A
smb2_write | smb2_tree_disconnect
|
|
| kfree(tree_conn)
|
// UAF! |
work->tcon->share_conf |
+
This patch add state, reference count and lock for tree conn to fix race
condition issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull misc x86 fixes from Ingo Molnar:
- Fix SEV-SNP guest crashes that may happen on NMIs
- Fix a potential SEV platform memory setup overflow
* tag 'x86-urgent-2023-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Change npages to unsigned long in snp_accept_memory()
x86/sev: Use the GHCB protocol when available for SNP CPUID requests
When cpufreq's policy is 'single', there is a scenario that will
cause sg_policy's next_freq to be unable to update.
When the CPU's util is always max, the cpufreq will be max,
and then if we change the policy's scaling_max_freq to be a
lower freq, indeed, the sg_policy's next_freq need change to
be the lower freq, however, because the cpu_is_busy, the next_freq
would keep the max_freq.
For example:
The cpu7 is a single CPU:
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # while true;do done& [1] 4737
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # taskset -p 80 4737
pid 4737's current affinity mask: ff
pid 4737's new affinity mask: 80
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_max_freq
2301000
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_cur_freq
2301000
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # echo 2171000 > scaling_max_freq
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_max_freq
2171000
At this time, the sg_policy's next_freq would stay at 2301000, which
is wrong.
To fix this, add a check for the ->need_freq_update flag.
[ mingo: Clarified the changelog. ]
Co-developed-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230719130527.8074-1-xuewen.yan@unisoc.com
There is a race condition issue between parallel smb2 lock request.
Time
+
Thread A | Thread A
smb2_lock | smb2_lock
|
insert smb_lock to lock_list |
spin_unlock(&work->conn->llist_lock) |
|
| spin_lock(&conn->llist_lock);
| kfree(cmp_lock);
|
// UAF! |
list_add(&smb_lock->llist, &rollback_list) +
This patch swaps the line for adding the smb lock to the rollback list and
adding the lock list of connection to fix the race issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull parisc fixes from Helge Deller:
- fix random faults in mmap'd memory on pre PA8800 processors
- fix boot crash with nr_cpus=1 on kernel command line
* tag 'parisc-for-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Restore __ldcw_align for PA-RISC 2.0 processors
parisc: Fix crash with nr_cpus=1 option
In snp_accept_memory(), the npages variables value is calculated from
phys_addr_t variables but is an unsigned int. A very large range passed
into snp_accept_memory() could lead to truncating npages to zero. This
doesn't happen at the moment but let's be prepared.
Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/6d511c25576494f682063c9fb6c705b526a3757e.1687441505.git.thomas.lendacky@amd.com
The expectation is that placing a task at avg_vruntime() makes it
eligible. Turns out there is a corner case where this is not the case.
Specifically, avg_vruntime() relies on the fact that integer division
is a flooring function (eg. it discards the remainder). By this
property the value returned is slightly left of the true average.
However! when the average is a negative (relative to min_vruntime) the
effect is flipped and it becomes a ceil, with the result that the
returned value is just right of the average and thus not eligible.
Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
If parallel smb2 logoff requests come in before closing door, running
request count becomes more than 1 even though connection status is set to
KSMBD_SESS_NEED_RECONNECT. It can't get condition true, and sleep forever.
This patch fix race condition problem by returning error if connection
status was already set to KSMBD_SESS_NEED_RECONNECT.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull smb client fixes from Steve French:
- protect cifs/smb3 socket connect from BPF address overwrite
- fix case when directory leases disabled but wasting resources with
unneeded thread on each mount
* tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: do not start laundromat thread on nohandlecache
smb: use kernel_connect() and kernel_bind()
Back in 2005, Kyle McMartin removed the 16-byte alignment for
ldcw semaphores on PA 2.0 machines (CONFIG_PA20). This broke
spinlocks on pre PA8800 processors. The main symptom was random
faults in mmap'd memory (e.g., gcc compilations, etc).
Unfortunately, the errata for this ldcw change is lost.
The issue is the 16-byte alignment required for ldcw semaphore
instructions can only be reduced to natural alignment when the
ldcw operation can be handled coherently in cache. Only PA8800
and PA8900 processors actually support doing the operation in
cache.
Aligning the spinlock dynamically adds two integer instructions
to each spinlock.
Tested on rp3440, c8000 and a500.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Link: https://lore.kernel.org/linux-parisc/6b332788-2227-127f-ba6d-55e99ecf4ed8@bell.net/T/#t
Link: https://lore.kernel.org/linux-parisc/20050609050702.GB4641@roadwarrior.mcmartin.ca/
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
SNP retrieves the majority of CPUID information from the SNP CPUID page.
But there are times when that information needs to be supplemented by the
hypervisor, for example, obtaining the initial APIC ID of the vCPU from
leaf 1.
The current implementation uses the MSR protocol to retrieve the data from
the hypervisor, even when a GHCB exists. The problem arises when an NMI
arrives on return from the VMGEXIT. The NMI will be immediately serviced
and may generate a #VC requiring communication with the hypervisor.
Since a GHCB exists in this case, it will be used. As part of using the
GHCB, the #VC handler will write the GHCB physical address into the GHCB
MSR and the #VC will be handled.
When the NMI completes, processing resumes at the site of the VMGEXIT
which is expecting to read the GHCB MSR and find a CPUID MSR protocol
response. Since the NMI handling overwrote the GHCB MSR response, the
guest will see an invalid reply from the hypervisor and self-terminate.
Fix this problem by using the GHCB when it is available. Any NMI
received is properly handled because the GHCB contents are copied into
a backup page and restored on NMI exit, thus preserving the active GHCB
request or result.
[ bp: Touchups. ]
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com
Tasks that never consume their full slice would not update their slice value.
This means that tasks that are spawned before the sysctl scaling keep their
original (UP) slice length.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230915124822.847197830@noisy.programming.kicks-ass.net
drop reference after use opinfo.
Signed-off-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull xfs fixes from Chandan Babu:
- Prevent filesystem hang when executing fstrim operations on large and
slow storage
* tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c
Honor 'nohandlecache' mount option by not starting laundromat thread
even when SMB server supports directory leases. Do not waste system
resources by having laundromat thread running with no directory
caching at all.
Fixes: 2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
John David Anglin reported that giving "nr_cpus=1" on the command
line causes a crash, while "maxcpus=1" works.
Reported-by: John David Anglin <dave.anglin@bell.net>
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org # v5.18+
fp can used in each command. If smb2_close command is coming at the
same time, UAF issue can happen by race condition.
Time
+
Thread A | Thread B1 B2 .... B5
smb2_open | smb2_close
|
__open_id |
insert fp to file_table |
|
| atomic_dec_and_test(&fp->refcount)
| if fp->refcount == 0, free fp by kfree.
// UAF! |
use fp |
+
This patch add f_state not to use freed fp is used and not to free fp in
use.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull device mapper fixes from Mike Snitzer:
- Fix memory leak when freeing dm zoned target device
- Update dm-devel mailing list address in MAINTAINERS
* tag 'for-6.6/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
MAINTAINERS: update the dm-devel mailing list
dm zoned: free dmz->ddev array in dmz_put_zoned_devices
xfs: reduce AGF hold times during fstrim operations
A recent log space overflow and recovery failure was root caused to
a long running truncate blocking on the AGF and ending up pinning
the tail of the log. The filesystem then hung, the machine was
rebooted, and log recoery then refused to run because there wasn't
enough space in the log for EFI transaction reservation.
The reason the long running truncate got blocked on the AGF for so
long was that an fstrim was being run. THe underlying block device
was large and very slow (10TB ceph rbd volume) and so discarding all
the free space in the AG took a really long time.
The current fstrim implementation holds the AGF across the entire
operations - both the freee space scan and the issuing of all the
discards. The discards are synchronous and single depth, so if there
are millions of free spaces, we hold the AGF lock across millions of
discard operations.
It doesn't really need to be said that this is a Bad Thing.
This series reworks the fstrim discard path to use the same
mechanisms as online discard. This allows discards to be issued
asynchronously without holding the AGF locked, enabling higher
discard queue depths (much faster on fast devices) and only
requiring the AGF lock to be held whilst we are scanning free space.
To do this, we make use of busy extents - we lock the AGF, mark all
the extents we want to discard as "busy under discard" so that
nothing will be allowed to allocate them, and then drop the AGF
lock. We then issue discards on the gathered busy extents and on
discard completion remove them from the busy list.
This results in AGF lock holds times for fstrim dropping to a few
milliseconds each batch of free extents we scan, and so the hours
long hold times that can currently occur on large, slow, badly
fragmented device no longer occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'xfs-fstrim-busy-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c
Recent changes to kernel_connect() and kernel_bind() ensure that
callers are insulated from changes to the address parameter made by BPF
SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and
ops->bind() with kernel_connect() and kernel_bind() to ensure that SMB
mounts do not see their mount address overwritten in such cases.
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Jordan Rife <jrife@google.com>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull Kbuild fixes from Masahiro Yamada:
- Fix the module compression with xz so the in-kernel decompressor
works
- Document a kconfig idiom to express an optional dependency between
modules
- Make modpost, when W=1 is given, detect broken drivers that reference
.exit.* sections
- Remove unused code
* tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove stale code for 'source' symlink in packaging scripts
modpost: Don't let "driver"s reference .exit.*
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
modpost: add missing else to the "of" check
Documentation: kbuild: explain handling optional dependencies
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
Thread A + Thread B
ksmbd_session_lookup | smb2_sess_setup
sess = xa_load |
|
| xa_erase(&conn->sessions, sess->id);
|
| ksmbd_session_destroy(sess) --> kfree(sess)
|
// UAF! |
sess->last_active = jiffies |
+
This patch add rwsem to fix race condition between ksmbd_session_lookup
and ksmbd_expire_session.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull media fixes from Mauro Carvalho Chehab:
- two Kconfig build fixes under randconfig
- pxa_camera: Fix an error handling path
- mediatek: vcodec: Fix a NULL-access pointer
- tegra-video: fix an infinite recursion regression
* tag 'media/v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: mediatek: vcodec: Fix encoder access NULL pointer
staging: media: tegra-video: fix infinite recursion regression
media: pci: intel: ivsc: select V4L2_FWNODE
media: ipu-bridge: Fix Kconfig dependencies
media: pxa_camera: Fix an error handling path in pxa_camera_probe()
dm-devel@redhat.com has migrated to dm-devel@lists.linux.dev
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
A recent ext4 patch posting from Jan Kara reminded me of a
discussion a year ago about fstrim in progress preventing kernels
from suspending. The fix is simple, we should do the same for XFS.
This removes the -ERESTARTSYS error return from this code, replacing
it with either the last error seen or the number of blocks
successfully trimmed up to the point where we detected the stop
condition.
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
Since commit d8131c2965d5 ("kbuild: remove $(MODLIB)/source symlink"),
modules_install does not create the 'source' symlink.
Remove the stale code from builddeb and kernel.spec.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull devicetree fixes from Rob Herring:
- Fix potential memory leak in of_changeset_action()
- Fix some i.MX binding warnings
- Fix typo in renesas,vin binding field-even-active property
- Fix andestech,ax45mp-cache example unit-address
- Add missing additionalProperties on RiscV CPU interrupt-controller
node
- Add missing unevaluatedProperties on media bindings
- Fix brcm,iproc-pcie binding 'msi' child node schema
- Fix MEMSIC MXC4005 compatible string
* tag 'devicetree-fixes-for-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: trivial-devices: Fix MEMSIC MXC4005 compatible string
dt-bindings: PCI: brcm,iproc-pcie: Fix 'msi' child node schema
dt-bindings: PCI: brcm,iproc-pcie: Drop common pci-bus properties
dt-bindings: PCI: brcm,iproc-pcie: Fix example indentation
media: dt-bindings: Add missing unevaluatedProperties on child node schemas
dt-bindings: bus: fsl,imx8qxp-pixel-link-msi-bus: Drop child 'reg' property
media: dt-bindings: imx7-csi: Make power-domains not required for imx8mq
dt-bindings: media: renesas,vin: Fix field-even-active spelling
dt-bindings: cache: andestech,ax45mp-cache: Fix unit address in example
of: overlay: Reorder struct fragment fields kerneldoc
dt-bindings: display: fsl,imx6-hdmi: Change to 'unevaluatedProperties: false'
dt-bindings: riscv: cpus: Add missing additionalProperties on interrupt-controller node
of: dynamic: Fix potential memory leak in of_changeset_action()
Need to set the private data with encoder device, or will access
NULL pointer in encoder handler.
Fixes: 1972e32431ed ("media: mediatek: vcodec: Fix possible invalid memory access for encoder")
Signed-off-by: Irui Wang <irui.wang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices")
made the pointers to additional zoned devices to be stored in a
dynamically allocated dmz->ddev array. However, this array is not freed.
Rename dmz_put_zoned_device to dmz_put_zoned_devices and fix it to
free the dmz->ddev array when cleaning up zoned device information.
Remove NULL assignment for all dmz->ddev elements and just free the
dmz->ddev array instead.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
fstrim will hold the AGF lock for as long as it takes to walk and
discard all the free space in the AG that meets the userspace trim
criteria. For AGs with lots of free space extents (e.g. millions)
or the underlying device is really slow at processing discard
requests (e.g. Ceph RBD), this means the AGF hold time is often
measured in minutes to hours, not a few milliseconds as we normal
see with non-discard based operations.
This can result in the entire filesystem hanging whilst the
long-running fstrim is in progress. We can have transactions get
stuck waiting for the AGF lock (data or metadata extent allocation
and freeing), and then more transactions get stuck waiting on the
locks those transactions hold. We can get to the point where fstrim
blocks an extent allocation or free operation long enough that it
ends up pinning the tail of the log and the log then runs out of
space. At this point, every modification in the filesystem gets
blocked. This includes read operations, if atime updates need to be
made.
To fix this problem, we need to be able to discard free space
extents safely without holding the AGF lock. Fortunately, we already
do this with online discard via busy extents. We can mark free space
extents as "busy being discarded" under the AGF lock and then unlock
the AGF, knowing that nobody will be able to allocate that free
space extent until we remove it from the busy tree.
Modify xfs_trim_extents to use the same asynchronous discard
mechanism backed by busy extents as is used with online discard.
This results in the AGF only needing to be held for short periods of
time and it is never held while we issue discards. Hence if discard
submission gets throttled because it is slow and/or there are lots
of them, we aren't preventing other operations from being performed
on AGF while we wait for discards to complete...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Pull scheduler fixes from Ingo Molnar:
"Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors"
* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
sched/fair: Fix SMT4 group_smt_balance handling
sched/fair: Optimize should_we_balance() for large SMT systems
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:
$ readelf -S purgatory.ro | grep " .text"
[ 1] .text PROGBITS 0000000000000000 00000040
[ 7] .text.purgatory PROGBITS 0000000000000000 000020e0
[ 9] .text.warn PROGBITS 0000000000000000 000021c0
[13] .text.sha256_upda PROGBITS 0000000000000000 000022f0
[15] .text.sha224_upda PROGBITS 0000000000000000 00002be0
[17] .text.sha256_fina PROGBITS 0000000000000000 00002bf0
[19] .text.sha224_fina PROGBITS 0000000000000000 00002cc0
This causes WARNING from kexec_purgatory_setup_sechdrs():
WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
kexec_load_purgatory+0x37f/0x390
Fix this by disabling LTO for purgatory.
[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.
Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
Pull misc driver fix from Greg KH:
"Here is a single, much requested, fix for a set of misc drivers to
resolve a much reported regression in the -rc series that has also
propagated back to the stable releases. Sorry for the delay, lots of
conference travel for a few weeks put me very far behind in patch
wrangling.
It has been reported by many to resolve the reported problem, and has
been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period. They failed because failing to take __kexec_lock.
=======
[ 78.714569] Fallback order for Node 0: 0
[ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
[ 78.717133] Policy zone: Normal
[ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 80.056643] PEFILE: Unsigned PE binary
=======
The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively. So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.
Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically. This doesn't impact the usage of __kexec_lock.
Link: https://lkml.kernel.org/r/20230926120905.392903-1-bhe@redhat.com
Fixes: 247262756121 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Drivers must not reference functions marked with __exit as these likely
are not available when the code is built-in.
There are few creative offenders uncovered for example in ARCH=amd64
allmodconfig builds. So only trigger the section mismatch warning for
W=1 builds.
The dual rule that drivers must not reference .init.* is implemented
since commit 0db252452378 ("modpost: don't allow *driver to reference
.init.*") which however missed that .exit.* should be handled in the
same way.
Thanks to Masahiro Yamada and Arnd Bergmann who gave valuable hints to
find this improvement.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull gpio fixes from Bartosz Golaszewski:
"Another round of driver one-liners from the GPIO subsystem:
- disable pin control on MMP GPIOs in gpio-pxa
- fix the GPIO number passed to one of the pinctrl callbacks in
gpio-aspeed"
* tag 'gpio-fixes-for-v6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config()
gpio: pxa: disable pinctrl calls for MMP_GPIO
The correct name of this chip is MXC4005, not MX4005. This is confirmed
both by the manufacturer website and by the title of the original commit,
which added other MXCxxxx devices as well but only this one misses a "c" in
the compatible string.
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Fixes: d9bf5d37fd58 ("dt-bindings:trivial-devices: Add memsic,mxc4005/mxc6255/mxc6655 entries")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231004-mxc4005-device-tree-support-v1-1-e7c0faea72e4@bootlin.com
Signed-off-by: Rob Herring <robh@kernel.org>
Since commit 9bf19fbf0c8b ("media: v4l: async: Rework internal lists"), aka
v6.6-rc1~97^2~198, probing the tegra-video VI driver causes infinite
recursion due tegra_vi_graph_parse_one() calling itself until:
[ 1.571168] Insufficient stack space to handle exception!
...
[ 1.591416] Internal error: kernel stack overflow: 0 [#1] PREEMPT SMP ARM
...
[ 3.861013] of_phandle_iterator_init from __of_parse_phandle_with_args+0x40/0xf0
[ 3.868497] __of_parse_phandle_with_args from of_fwnode_graph_get_remote_endpoint+0x68/0xa8
[ 3.876938] of_fwnode_graph_get_remote_endpoint from fwnode_graph_get_remote_port_parent+0x30/0x7c
[ 3.885984] fwnode_graph_get_remote_port_parent from tegra_vi_graph_parse_one+0x7c/0x224
[ 3.894158] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.901459] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.908760] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.916061] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
...
[ 4.857892] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 4.865193] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 4.872494] tegra_vi_graph_parse_one from tegra_vi_init+0x574/0x6d4
[ 4.878842] tegra_vi_init from host1x_device_init+0x84/0x15c
[ 4.884594] host1x_device_init from host1x_video_probe+0xa0/0x114
[ 4.890770] host1x_video_probe from really_probe+0xe0/0x400
The reason is the mentioned commit changed tegra_vi_graph_find_entity() to
search for an entity in the done notifier list:
> @@ -1464,7 +1464,7 @@ tegra_vi_graph_find_entity(struct tegra_vi_channel *chan,
> struct tegra_vi_graph_entity *entity;
> struct v4l2_async_connection *asd;
>
> - list_for_each_entry(asd, &chan->notifier.asc_list, asc_entry) {
> + list_for_each_entry(asd, &chan->notifier.done_list, asc_entry) {
> entity = to_tegra_vi_graph_entity(asd);
> if (entity->asd.match.fwnode == fwnode)
> return entity;
This is not always correct, being tegra_vi_graph_find_entity() called in
three locations, in this order:
1. tegra_vi_graph_parse_one() -- called while probing
2. tegra_vi_graph_notify_bound() -- the .bound notifier op
3. tegra_vi_graph_build() -- called in the .complete notifier op
Locations 1 and 2 are called before moving the entity from waiting_list to
done_list, thus they won't find what they are looking for in
done_list. Location 3 happens afterwards and thus it is not broken, however
it means tegra_vi_graph_find_entity() should not search in the same list
every time.
The error appears at step 1: tegra_vi_graph_parse_one() iterates
recursively until it finds the entity already notified, which now never
happens.
Fix by passing the specific notifier list pointer to
tegra_vi_graph_find_entity() instead of the channel, so each caller can
search in whatever list is correct.
Also improve the tegra_vi_graph_find_entity() comment.
Fixes: 9bf19fbf0c8b ("media: v4l: async: Rework internal lists")
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
[Sakari Ailus: Wrapped some long lines.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Because we are going to use the same list-based discard submission
interface for fstrim-based discards, too.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Pull objtool fix from Ingo Molnar:
"Fix a cold functions related false-positive objtool warning that
triggers on Clang"
* tag 'objtool-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix _THIS_IP_ detection for cold functions
Commit 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup") dropped the
SD_ASYM_PACKING flag in the DIE domain added in commit 044f0e27dec6
("x86/sched: Add the SD_ASYM_PACKING flag to the die domain of hybrid
processors"). Restore it on hybrid processors.
The die-level domain does not depend on any build configuration and now
x86_sched_itmt_flags() is always needed. Remove the build dependency on
CONFIG_SCHED_[SMT|CLUSTER|MC].
Fixes: 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Link: https://lkml.kernel.org/r/20230815035747.11529-1-ricardo.neri-calderon@linux.intel.com
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.
The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.
In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.
Update wost-case calculations to include 5-level paging.
To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.
[ Also add a warning, should this memory run low - by Dave Hansen. ]
Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
Pull tty / serial driver fixes from Greg KH:
"Here are two tty/serial driver fixes for 6.6-rc4 that resolve some
reported regressions:
- revert a n_gsm change that ended up causing problems
- 8250_port fix for irq data
both have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
serial: 8250_port: Check IRQ data before use
commit 101bd907b424 ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg")
some readers no longer force #CLKREQ to low
when the system need to enter ASPM.
But some platform maybe not implement complete ASPM?
it causes some platforms can not boot
Like in the past only the platform support L1ss we release the #CLKREQ.
Move the judgment (L1ss) to probe,
we think read config space one time when the driver start is enough
Fixes: 101bd907b424 ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg")
Cc: stable <stable@kernel.org>
Reported-by: Paul Grandperrin <paul.grandperrin@gmail.com>
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Tested-By: Jade Lovelace <lists@jade.fyi>
Link: https://lore.kernel.org/r/37b1afb997f14946a8784c73d1f9a4f5@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).
The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).
Error Example:
awk: not an option: -e
Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Remove the left-over of commit e24f6628811e ("modpost: remove all
traces of cpuinit/cpuexit sections").
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Pull rdma fixes from Jason Gunthorpe:
"This includes a fix for a significant security miss in checking the
RDMA_NLDEV_CMD_SYS_SET operation.
Summary:
- UAF in SRP
- Error unwind failure in siw connection management
- Missing error checks
- NULL/ERR_PTR confusion in erdma
- Possible string truncation in CMA configfs and mlx4
- Data ordering issue in bnxt_re
- Missing stats decrement on object destroy in bnxt_re
- Mlx5 bugs in this merge window:
* Incorrect access_flag in the new mkey cache
* Missing unlock on error in flow steering
* lockdep possible deadlock on new mkey cache destruction (Plus a
fix for this too)
- Don't leak kernel stack memory to userspace in the CM
- Missing permission validation for RDMA_NLDEV_CMD_SYS_SET"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Require admin capabilities to set system parameters
RDMA/mlx5: Remove not-used cache disable flag
RDMA/cma: Initialize ib_sa_multicast structure to 0 when join
RDMA/mlx5: Fix mkey cache possible deadlock on cleanup
RDMA/mlx5: Fix NULL string error
RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation
RDMA/mlx5: Fix assigning access flags to cache mkeys
IB/mlx4: Fix the size of a buffer in add_port_entries()
RDMA/bnxt_re: Decrement resource stats correctly
RDMA/bnxt_re: Fix the handling of control path response data
RDMA/cma: Fix truncation compilation warning in make_cma_ports
RDMA/erdma: Fix NULL pointer access in regmr_cmd
RDMA/erdma: Fix error code in erdma_create_scatter_mtt()
RDMA/uverbs: Fix typo of sizeof argument
RDMA/cxgb4: Check skb value for failure to allocate
RDMA/siw: Fix connection failure handling
RDMA/srp: Do not call scsi_done() from srp_abort()
pinctrl_gpio_set_config() expects the GPIO number from the global GPIO
numberspace, not the controller-relative offset, which needs to be added
to the chip base.
Fixes: 5ae4cb94b313 ("gpio: aspeed: Add debounce support")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au>
The 'msi' child node schema is missing constraints on additional properties.
It turns out it is incomplete and properties for it are documented in the
parent node by mistake. Move the reference to msi-controller.yaml and
the custom properties to the 'msi' node. Adding 'unevaluatedProperties'
ensures all the properties in the 'msi' node are documented.
With the schema corrected, a minimal interrupt controller node is needed
to properly decode the interrupt properties since the example has
multiple interrupt parents.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Fixes: 905b986d099c ("dt-bindings: pci: Convert iProc PCIe to YAML")
Link: https://lore.kernel.org/r/20230926155613.33904-3-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Some missing select statements were already added back, but I ran into
another one that is missing:
ERROR: modpost: "v4l2_fwnode_endpoint_free" [drivers/media/pci/intel/ivsc/ivsc-csi.ko] undefined!
ERROR: modpost: "v4l2_fwnode_endpoint_alloc_parse" [drivers/media/pci/intel/ivsc/ivsc-csi.ko] undefined!
ERROR: modpost: "v4l2_fwnode_endpoint_parse" [drivers/media/pci/intel/ivsc/ivsc-csi.ko] undefined!
Fixes: 29006e196a56 ("media: pci: intel: ivsc: Add CSI submodule")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Sakari Ailus: Drop V4L2_ASYNC dependency, it is implied now.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Pull WARN fix from Ingo Molnar:
"Fix a missing preempt-enable in the WARN() slowpath"
* tag 'core-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
panic: Reenable preemption in WARN slowpath
Cold functions and their non-cold counterparts can use _THIS_IP_ to
reference each other. Don't warn about !ENDBR in that case.
Note that for GCC this is currently irrelevant in light of the following
commit
c27cd083cfb9 ("Compiler attributes: GCC cold function alignment workarounds")
which disabled cold functions in the kernel. However this may still be
possible with Clang.
Fixes several warnings like the following:
drivers/scsi/bnx2i/bnx2i.prelink.o: warning: objtool: bnx2i_hw_ep_disconnect+0x19d: relocation to !ENDBR: bnx2i_hw_ep_disconnect.cold+0x0
drivers/net/ipvlan/ipvlan.prelink.o: warning: objtool: ipvlan_addr4_event.cold+0x28: relocation to !ENDBR: ipvlan_addr4_event+0xda
drivers/net/ipvlan/ipvlan.prelink.o: warning: objtool: ipvlan_addr6_event.cold+0x26: relocation to !ENDBR: ipvlan_addr6_event+0xb7
drivers/net/ethernet/broadcom/tg3.prelink.o: warning: objtool: tg3_set_ringparam.cold+0x17: relocation to !ENDBR: tg3_set_ringparam+0x115
drivers/net/ethernet/broadcom/tg3.prelink.o: warning: objtool: tg3_self_test.cold+0x17: relocation to !ENDBR: tg3_self_test+0x2e1
drivers/target/iscsi/cxgbit/cxgbit.prelink.o: warning: objtool: __cxgbit_free_conn.cold+0x24: relocation to !ENDBR: __cxgbit_free_conn+0xfb
net/can/can.prelink.o: warning: objtool: can_rx_unregister.cold+0x2c: relocation to !ENDBR: can_rx_unregister+0x11b
drivers/net/ethernet/qlogic/qed/qed.prelink.o: warning: objtool: qed_spq_post+0xc0: relocation to !ENDBR: qed_spq_post.cold+0x9a
drivers/net/ethernet/qlogic/qed/qed.prelink.o: warning: objtool: qed_iwarp_ll2_comp_syn_pkt.cold+0x12f: relocation to !ENDBR: qed_iwarp_ll2_comp_syn_pkt+0x34b
net/tipc/tipc.prelink.o: warning: objtool: tipc_nametbl_publish.cold+0x21: relocation to !ENDBR: tipc_nametbl_publish+0xa6
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/d8f1ab6a23a6105bc023c132b105f245c7976be6.1694476559.git.jpoimboe@kernel.org
For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.
Also, handle rounding effect of adding (ncores_local + ncores_busy) when
the local is fully idle and busy group imbalance is less than 2 tasks.
Local group should try to pull at least 1 task in this case so imbalance
should be set to 2 instead.
Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/6cd1633036bb6b651af575c32c2a9608a106702c.camel@linux.intel.com
Pull smb server fixes from Steve French:
"Six SMB3 server fixes for various races found by RO0T Lab of Huawei:
- Fix oops when racing between oplock break ack and freeing file
- Simultaneous request fixes for parallel logoffs, and for parallel
lock requests
- Fixes for tree disconnect race, session expire race, and close/open
race"
* tag '6.6-rc4-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix race condition between tree conn lookup and disconnect
ksmbd: fix race condition from parallel smb2 lock requests
ksmbd: fix race condition from parallel smb2 logoff requests
ksmbd: fix uaf in smb20_oplock_break_ack
ksmbd: fix race condition with fp
ksmbd: fix race condition between session lookup and expire
Pull misc scheduler fixes from Ingo Molnar:
- Two EEVDF fixes: one to fix sysctl_sched_base_slice propagation, and
to fix an avg_vruntime() corner-case.
- A cpufreq frequency scaling fix
* tag 'sched-urgent-2023-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpufreq: schedutil: Update next_freq when cpufreq_limits change
sched/eevdf: Fix avg_vruntime()
sched/eevdf: Also update slice on placement
if thread A in smb2_write is using work-tcon, other thread B use
smb2_tree_disconnect free the tcon, then thread A will use free'd tcon.
Time
+
Thread A | Thread A
smb2_write | smb2_tree_disconnect
|
|
| kfree(tree_conn)
|
// UAF! |
work->tcon->share_conf |
+
This patch add state, reference count and lock for tree conn to fix race
condition issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull misc x86 fixes from Ingo Molnar:
- Fix SEV-SNP guest crashes that may happen on NMIs
- Fix a potential SEV platform memory setup overflow
* tag 'x86-urgent-2023-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Change npages to unsigned long in snp_accept_memory()
x86/sev: Use the GHCB protocol when available for SNP CPUID requests
When cpufreq's policy is 'single', there is a scenario that will
cause sg_policy's next_freq to be unable to update.
When the CPU's util is always max, the cpufreq will be max,
and then if we change the policy's scaling_max_freq to be a
lower freq, indeed, the sg_policy's next_freq need change to
be the lower freq, however, because the cpu_is_busy, the next_freq
would keep the max_freq.
For example:
The cpu7 is a single CPU:
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # while true;do done& [1] 4737
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # taskset -p 80 4737
pid 4737's current affinity mask: ff
pid 4737's new affinity mask: 80
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_max_freq
2301000
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_cur_freq
2301000
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # echo 2171000 > scaling_max_freq
unisoc:/sys/devices/system/cpu/cpufreq/policy7 # cat scaling_max_freq
2171000
At this time, the sg_policy's next_freq would stay at 2301000, which
is wrong.
To fix this, add a check for the ->need_freq_update flag.
[ mingo: Clarified the changelog. ]
Co-developed-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Xuewen Yan <xuewen.yan@unisoc.com>
Signed-off-by: Guohua Yan <guohua.yan@unisoc.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: "Rafael J. Wysocki" <rafael@kernel.org>
Link: https://lore.kernel.org/r/20230719130527.8074-1-xuewen.yan@unisoc.com
There is a race condition issue between parallel smb2 lock request.
Time
+
Thread A | Thread A
smb2_lock | smb2_lock
|
insert smb_lock to lock_list |
spin_unlock(&work->conn->llist_lock) |
|
| spin_lock(&conn->llist_lock);
| kfree(cmp_lock);
|
// UAF! |
list_add(&smb_lock->llist, &rollback_list) +
This patch swaps the line for adding the smb lock to the rollback list and
adding the lock list of connection to fix the race issue.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull parisc fixes from Helge Deller:
- fix random faults in mmap'd memory on pre PA8800 processors
- fix boot crash with nr_cpus=1 on kernel command line
* tag 'parisc-for-6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Restore __ldcw_align for PA-RISC 2.0 processors
parisc: Fix crash with nr_cpus=1 option
In snp_accept_memory(), the npages variables value is calculated from
phys_addr_t variables but is an unsigned int. A very large range passed
into snp_accept_memory() could lead to truncating npages to zero. This
doesn't happen at the moment but let's be prepared.
Fixes: 6c3211796326 ("x86/sev: Add SNP-specific unaccepted memory support")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/6d511c25576494f682063c9fb6c705b526a3757e.1687441505.git.thomas.lendacky@amd.com
The expectation is that placing a task at avg_vruntime() makes it
eligible. Turns out there is a corner case where this is not the case.
Specifically, avg_vruntime() relies on the fact that integer division
is a flooring function (eg. it discards the remainder). By this
property the value returned is slightly left of the true average.
However! when the average is a negative (relative to min_vruntime) the
effect is flipped and it becomes a ceil, with the result that the
returned value is just right of the average and thus not eligible.
Fixes: af4cf40470c2 ("sched/fair: Add cfs_rq::avg_vruntime")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
If parallel smb2 logoff requests come in before closing door, running
request count becomes more than 1 even though connection status is set to
KSMBD_SESS_NEED_RECONNECT. It can't get condition true, and sleep forever.
This patch fix race condition problem by returning error if connection
status was already set to KSMBD_SESS_NEED_RECONNECT.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull smb client fixes from Steve French:
- protect cifs/smb3 socket connect from BPF address overwrite
- fix case when directory leases disabled but wasting resources with
unneeded thread on each mount
* tag '6.6-rc4-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: do not start laundromat thread on nohandlecache
smb: use kernel_connect() and kernel_bind()
Back in 2005, Kyle McMartin removed the 16-byte alignment for
ldcw semaphores on PA 2.0 machines (CONFIG_PA20). This broke
spinlocks on pre PA8800 processors. The main symptom was random
faults in mmap'd memory (e.g., gcc compilations, etc).
Unfortunately, the errata for this ldcw change is lost.
The issue is the 16-byte alignment required for ldcw semaphore
instructions can only be reduced to natural alignment when the
ldcw operation can be handled coherently in cache. Only PA8800
and PA8900 processors actually support doing the operation in
cache.
Aligning the spinlock dynamically adds two integer instructions
to each spinlock.
Tested on rp3440, c8000 and a500.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Link: https://lore.kernel.org/linux-parisc/6b332788-2227-127f-ba6d-55e99ecf4ed8@bell.net/T/#t
Link: https://lore.kernel.org/linux-parisc/20050609050702.GB4641@roadwarrior.mcmartin.ca/
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
SNP retrieves the majority of CPUID information from the SNP CPUID page.
But there are times when that information needs to be supplemented by the
hypervisor, for example, obtaining the initial APIC ID of the vCPU from
leaf 1.
The current implementation uses the MSR protocol to retrieve the data from
the hypervisor, even when a GHCB exists. The problem arises when an NMI
arrives on return from the VMGEXIT. The NMI will be immediately serviced
and may generate a #VC requiring communication with the hypervisor.
Since a GHCB exists in this case, it will be used. As part of using the
GHCB, the #VC handler will write the GHCB physical address into the GHCB
MSR and the #VC will be handled.
When the NMI completes, processing resumes at the site of the VMGEXIT
which is expecting to read the GHCB MSR and find a CPUID MSR protocol
response. Since the NMI handling overwrote the GHCB MSR response, the
guest will see an invalid reply from the hypervisor and self-terminate.
Fix this problem by using the GHCB when it is available. Any NMI
received is properly handled because the GHCB contents are copied into
a backup page and restored on NMI exit, thus preserving the active GHCB
request or result.
[ bp: Touchups. ]
Fixes: ee0bfa08a345 ("x86/compressed/64: Add support for SEV-SNP CPUID table in #VC handlers")
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/a5856fa1ebe3879de91a8f6298b6bbd901c61881.1690578565.git.thomas.lendacky@amd.com
Tasks that never consume their full slice would not update their slice value.
This means that tasks that are spawned before the sysctl scaling keep their
original (UP) slice length.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230915124822.847197830@noisy.programming.kicks-ass.net
Pull xfs fixes from Chandan Babu:
- Prevent filesystem hang when executing fstrim operations on large and
slow storage
* tag 'xfs-6.6-fixes-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c
Honor 'nohandlecache' mount option by not starting laundromat thread
even when SMB server supports directory leases. Do not waste system
resources by having laundromat thread running with no directory
caching at all.
Fixes: 2da338ff752a ("smb3: do not start laundromat thread when dir leases disabled")
Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
fp can used in each command. If smb2_close command is coming at the
same time, UAF issue can happen by race condition.
Time
+
Thread A | Thread B1 B2 .... B5
smb2_open | smb2_close
|
__open_id |
insert fp to file_table |
|
| atomic_dec_and_test(&fp->refcount)
| if fp->refcount == 0, free fp by kfree.
// UAF! |
use fp |
+
This patch add f_state not to use freed fp is used and not to free fp in
use.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull device mapper fixes from Mike Snitzer:
- Fix memory leak when freeing dm zoned target device
- Update dm-devel mailing list address in MAINTAINERS
* tag 'for-6.6/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
MAINTAINERS: update the dm-devel mailing list
dm zoned: free dmz->ddev array in dmz_put_zoned_devices
xfs: reduce AGF hold times during fstrim operations
A recent log space overflow and recovery failure was root caused to
a long running truncate blocking on the AGF and ending up pinning
the tail of the log. The filesystem then hung, the machine was
rebooted, and log recoery then refused to run because there wasn't
enough space in the log for EFI transaction reservation.
The reason the long running truncate got blocked on the AGF for so
long was that an fstrim was being run. THe underlying block device
was large and very slow (10TB ceph rbd volume) and so discarding all
the free space in the AG took a really long time.
The current fstrim implementation holds the AGF across the entire
operations - both the freee space scan and the issuing of all the
discards. The discards are synchronous and single depth, so if there
are millions of free spaces, we hold the AGF lock across millions of
discard operations.
It doesn't really need to be said that this is a Bad Thing.
This series reworks the fstrim discard path to use the same
mechanisms as online discard. This allows discards to be issued
asynchronously without holding the AGF locked, enabling higher
discard queue depths (much faster on fast devices) and only
requiring the AGF lock to be held whilst we are scanning free space.
To do this, we make use of busy extents - we lock the AGF, mark all
the extents we want to discard as "busy under discard" so that
nothing will be allowed to allocate them, and then drop the AGF
lock. We then issue discards on the gathered busy extents and on
discard completion remove them from the busy list.
This results in AGF lock holds times for fstrim dropping to a few
milliseconds each batch of free extents we scan, and so the hours
long hold times that can currently occur on large, slow, badly
fragmented device no longer occur.
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>
* tag 'xfs-fstrim-busy-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs:
xfs: abort fstrim if kernel is suspending
xfs: reduce AGF hold times during fstrim operations
xfs: move log discard work to xfs_discard.c
Recent changes to kernel_connect() and kernel_bind() ensure that
callers are insulated from changes to the address parameter made by BPF
SOCK_ADDR hooks. This patch wraps direct calls to ops->connect() and
ops->bind() with kernel_connect() and kernel_bind() to ensure that SMB
mounts do not see their mount address overwritten in such cases.
Link: https://lore.kernel.org/netdev/9944248dba1bce861375fcce9de663934d933ba9.camel@redhat.com/
Cc: <stable@vger.kernel.org> # 6.0+
Signed-off-by: Jordan Rife <jrife@google.com>
Acked-by: Paulo Alcantara (SUSE) <pc@manguebit.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull Kbuild fixes from Masahiro Yamada:
- Fix the module compression with xz so the in-kernel decompressor
works
- Document a kconfig idiom to express an optional dependency between
modules
- Make modpost, when W=1 is given, detect broken drivers that reference
.exit.* sections
- Remove unused code
* tag 'kbuild-fixes-v6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kbuild: remove stale code for 'source' symlink in packaging scripts
modpost: Don't let "driver"s reference .exit.*
vmlinux.lds.h: remove unused CPU_KEEP and CPU_DISCARD macros
modpost: add missing else to the "of" check
Documentation: kbuild: explain handling optional dependencies
kbuild: Use CRC32 and a 1MiB dictionary for XZ compressed modules
Thread A + Thread B
ksmbd_session_lookup | smb2_sess_setup
sess = xa_load |
|
| xa_erase(&conn->sessions, sess->id);
|
| ksmbd_session_destroy(sess) --> kfree(sess)
|
// UAF! |
sess->last_active = jiffies |
+
This patch add rwsem to fix race condition between ksmbd_session_lookup
and ksmbd_expire_session.
Reported-by: luosili <rootlab@huawei.com>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Pull media fixes from Mauro Carvalho Chehab:
- two Kconfig build fixes under randconfig
- pxa_camera: Fix an error handling path
- mediatek: vcodec: Fix a NULL-access pointer
- tegra-video: fix an infinite recursion regression
* tag 'media/v6.6-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: mediatek: vcodec: Fix encoder access NULL pointer
staging: media: tegra-video: fix infinite recursion regression
media: pci: intel: ivsc: select V4L2_FWNODE
media: ipu-bridge: Fix Kconfig dependencies
media: pxa_camera: Fix an error handling path in pxa_camera_probe()
A recent ext4 patch posting from Jan Kara reminded me of a
discussion a year ago about fstrim in progress preventing kernels
from suspending. The fix is simple, we should do the same for XFS.
This removes the -ERESTARTSYS error return from this code, replacing
it with either the last error seen or the number of blocks
successfully trimmed up to the point where we detected the stop
condition.
References: https://bugzilla.kernel.org/show_bug.cgi?id=216322
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Pull x86 fixes from Ingo Molnar:
"Misc fixes:
- Fix an UV boot crash
- Skip spurious ENDBR generation on _THIS_IP_
- Fix ENDBR use in putuser() asm methods
- Fix corner case boot crashes on 5-level paging
- and fix a false positive WARNING on LTO kernels"
* tag 'x86-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/purgatory: Remove LTO flags
x86/boot/compressed: Reserve more memory for page tables
x86/ibt: Avoid duplicate ENDBR in __put_user_nocheck*()
x86/ibt: Suppress spurious ENDBR
x86/platform/uv: Use alternate source for socket to node data
Pull misc fixes from Andrew Morton:
"Fourteen hotfixes, eleven of which are cc:stable. The remainder
pertain to issues which were introduced after 6.5"
* tag 'mm-hotfixes-stable-2023-10-01-08-34' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
Crash: add lock to serialize crash hotplug handling
selftests/mm: fix awk usage in charge_reserved_hugetlb.sh and hugetlb_reparenting_test.sh that may cause error
mm: mempolicy: keep VMA walk if both MPOL_MF_STRICT and MPOL_MF_MOVE are specified
mm/damon/vaddr-test: fix memory leak in damon_do_test_apply_three_regions()
mm, memcg: reconsider kmem.limit_in_bytes deprecation
mm: zswap: fix potential memory corruption on duplicate store
arm64: hugetlb: fix set_huge_pte_at() to work with all swap entries
mm: hugetlb: add huge page size param to set_huge_pte_at()
maple_tree: add MAS_UNDERFLOW and MAS_OVERFLOW states
maple_tree: add mas_is_active() to detect in-tree walks
nilfs2: fix potential use after free in nilfs_gccache_submit_read_data()
mm: abstract moving to the next PFN
mm: report success more often from filemap_map_folio_range()
fs: binfmt_elf_efpic: fix personality for ELF-FDPIC
Pull devicetree fixes from Rob Herring:
- Fix potential memory leak in of_changeset_action()
- Fix some i.MX binding warnings
- Fix typo in renesas,vin binding field-even-active property
- Fix andestech,ax45mp-cache example unit-address
- Add missing additionalProperties on RiscV CPU interrupt-controller
node
- Add missing unevaluatedProperties on media bindings
- Fix brcm,iproc-pcie binding 'msi' child node schema
- Fix MEMSIC MXC4005 compatible string
* tag 'devicetree-fixes-for-6.6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux:
dt-bindings: trivial-devices: Fix MEMSIC MXC4005 compatible string
dt-bindings: PCI: brcm,iproc-pcie: Fix 'msi' child node schema
dt-bindings: PCI: brcm,iproc-pcie: Drop common pci-bus properties
dt-bindings: PCI: brcm,iproc-pcie: Fix example indentation
media: dt-bindings: Add missing unevaluatedProperties on child node schemas
dt-bindings: bus: fsl,imx8qxp-pixel-link-msi-bus: Drop child 'reg' property
media: dt-bindings: imx7-csi: Make power-domains not required for imx8mq
dt-bindings: media: renesas,vin: Fix field-even-active spelling
dt-bindings: cache: andestech,ax45mp-cache: Fix unit address in example
of: overlay: Reorder struct fragment fields kerneldoc
dt-bindings: display: fsl,imx6-hdmi: Change to 'unevaluatedProperties: false'
dt-bindings: riscv: cpus: Add missing additionalProperties on interrupt-controller node
of: dynamic: Fix potential memory leak in of_changeset_action()
Need to set the private data with encoder device, or will access
NULL pointer in encoder handler.
Fixes: 1972e32431ed ("media: mediatek: vcodec: Fix possible invalid memory access for encoder")
Signed-off-by: Irui Wang <irui.wang@mediatek.com>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Commit 4dba12881f88 ("dm zoned: support arbitrary number of devices")
made the pointers to additional zoned devices to be stored in a
dynamically allocated dmz->ddev array. However, this array is not freed.
Rename dmz_put_zoned_device to dmz_put_zoned_devices and fix it to
free the dmz->ddev array when cleaning up zoned device information.
Remove NULL assignment for all dmz->ddev elements and just free the
dmz->ddev array instead.
Found by Linux Verification Center (linuxtesting.org).
Fixes: 4dba12881f88 ("dm zoned: support arbitrary number of devices")
Cc: stable@vger.kernel.org
Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
Signed-off-by: Mike Snitzer <snitzer@kernel.org>
fstrim will hold the AGF lock for as long as it takes to walk and
discard all the free space in the AG that meets the userspace trim
criteria. For AGs with lots of free space extents (e.g. millions)
or the underlying device is really slow at processing discard
requests (e.g. Ceph RBD), this means the AGF hold time is often
measured in minutes to hours, not a few milliseconds as we normal
see with non-discard based operations.
This can result in the entire filesystem hanging whilst the
long-running fstrim is in progress. We can have transactions get
stuck waiting for the AGF lock (data or metadata extent allocation
and freeing), and then more transactions get stuck waiting on the
locks those transactions hold. We can get to the point where fstrim
blocks an extent allocation or free operation long enough that it
ends up pinning the tail of the log and the log then runs out of
space. At this point, every modification in the filesystem gets
blocked. This includes read operations, if atime updates need to be
made.
To fix this problem, we need to be able to discard free space
extents safely without holding the AGF lock. Fortunately, we already
do this with online discard via busy extents. We can mark free space
extents as "busy being discarded" under the AGF lock and then unlock
the AGF, knowing that nobody will be able to allocate that free
space extent until we remove it from the busy tree.
Modify xfs_trim_extents to use the same asynchronous discard
mechanism backed by busy extents as is used with online discard.
This results in the AGF only needing to be held for short periods of
time and it is never held while we issue discards. Hence if discard
submission gets throttled because it is slow and/or there are lots
of them, we aren't preventing other operations from being performed
on AGF while we wait for discards to complete...
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Pull scheduler fixes from Ingo Molnar:
"Fix a performance regression on large SMT systems, an Intel SMT4
balancing bug, and a topology setup bug on (Intel) hybrid processors"
* tag 'sched-urgent-2023-09-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sched: Restore the SD_ASYM_PACKING flag in the DIE domain
sched/fair: Fix SMT4 group_smt_balance handling
sched/fair: Optimize should_we_balance() for large SMT systems
-flto* implies -ffunction-sections. With LTO enabled, ld.lld generates
multiple .text sections for purgatory.ro:
$ readelf -S purgatory.ro | grep " .text"
[ 1] .text PROGBITS 0000000000000000 00000040
[ 7] .text.purgatory PROGBITS 0000000000000000 000020e0
[ 9] .text.warn PROGBITS 0000000000000000 000021c0
[13] .text.sha256_upda PROGBITS 0000000000000000 000022f0
[15] .text.sha224_upda PROGBITS 0000000000000000 00002be0
[17] .text.sha256_fina PROGBITS 0000000000000000 00002bf0
[19] .text.sha224_fina PROGBITS 0000000000000000 00002cc0
This causes WARNING from kexec_purgatory_setup_sechdrs():
WARNING: CPU: 26 PID: 110894 at kernel/kexec_file.c:919
kexec_load_purgatory+0x37f/0x390
Fix this by disabling LTO for purgatory.
[ AFAICT, x86 is the only arch that supports LTO and purgatory. ]
We could also fix this with an explicit linker script to rejoin .text.*
sections back into .text. However, given the benefit of LTOing purgatory
is small, simply disable the production of more .text.* sections for now.
Fixes: b33fff07e3e3 ("x86, build: allow LTO to be selected")
Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230914170138.995606-1-song@kernel.org
Pull misc driver fix from Greg KH:
"Here is a single, much requested, fix for a set of misc drivers to
resolve a much reported regression in the -rc series that has also
propagated back to the stable releases. Sorry for the delay, lots of
conference travel for a few weeks put me very far behind in patch
wrangling.
It has been reported by many to resolve the reported problem, and has
been in linux-next with no reported issues"
* tag 'char-misc-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
misc: rtsx: Fix some platforms can not boot and move the l1ss judgment to probe
Eric reported that handling corresponding crash hotplug event can be
failed easily when many memory hotplug event are notified in a short
period. They failed because failing to take __kexec_lock.
=======
[ 78.714569] Fallback order for Node 0: 0
[ 78.714575] Built 1 zonelists, mobility grouping on. Total pages: 1817886
[ 78.717133] Policy zone: Normal
[ 78.724423] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 78.727207] crash hp: kexec_trylock() failed, elfcorehdr may be inaccurate
[ 80.056643] PEFILE: Unsigned PE binary
=======
The memory hotplug events are notified very quickly and very many, while
the handling of crash hotplug is much slower relatively. So the atomic
variable __kexec_lock and kexec_trylock() can't guarantee the
serialization of crash hotplug handling.
Here, add a new mutex lock __crash_hotplug_lock to serialize crash hotplug
handling specifically. This doesn't impact the usage of __kexec_lock.
Link: https://lkml.kernel.org/r/20230926120905.392903-1-bhe@redhat.com
Fixes: 247262756121 ("crash: add generic infrastructure for crash hotplug support")
Signed-off-by: Baoquan He <bhe@redhat.com>
Tested-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Eric DeVolder <eric.devolder@oracle.com>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Cc: Sourabh Jain <sourabhjain@linux.ibm.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Drivers must not reference functions marked with __exit as these likely
are not available when the code is built-in.
There are few creative offenders uncovered for example in ARCH=amd64
allmodconfig builds. So only trigger the section mismatch warning for
W=1 builds.
The dual rule that drivers must not reference .init.* is implemented
since commit 0db252452378 ("modpost: don't allow *driver to reference
.init.*") which however missed that .exit.* should be handled in the
same way.
Thanks to Masahiro Yamada and Arnd Bergmann who gave valuable hints to
find this improvement.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull gpio fixes from Bartosz Golaszewski:
"Another round of driver one-liners from the GPIO subsystem:
- disable pin control on MMP GPIOs in gpio-pxa
- fix the GPIO number passed to one of the pinctrl callbacks in
gpio-aspeed"
* tag 'gpio-fixes-for-v6.6-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
gpio: aspeed: fix the GPIO number passed to pinctrl_gpio_set_config()
gpio: pxa: disable pinctrl calls for MMP_GPIO
The correct name of this chip is MXC4005, not MX4005. This is confirmed
both by the manufacturer website and by the title of the original commit,
which added other MXCxxxx devices as well but only this one misses a "c" in
the compatible string.
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
Fixes: d9bf5d37fd58 ("dt-bindings:trivial-devices: Add memsic,mxc4005/mxc6255/mxc6655 entries")
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Link: https://lore.kernel.org/r/20231004-mxc4005-device-tree-support-v1-1-e7c0faea72e4@bootlin.com
Signed-off-by: Rob Herring <robh@kernel.org>
Since commit 9bf19fbf0c8b ("media: v4l: async: Rework internal lists"), aka
v6.6-rc1~97^2~198, probing the tegra-video VI driver causes infinite
recursion due tegra_vi_graph_parse_one() calling itself until:
[ 1.571168] Insufficient stack space to handle exception!
...
[ 1.591416] Internal error: kernel stack overflow: 0 [#1] PREEMPT SMP ARM
...
[ 3.861013] of_phandle_iterator_init from __of_parse_phandle_with_args+0x40/0xf0
[ 3.868497] __of_parse_phandle_with_args from of_fwnode_graph_get_remote_endpoint+0x68/0xa8
[ 3.876938] of_fwnode_graph_get_remote_endpoint from fwnode_graph_get_remote_port_parent+0x30/0x7c
[ 3.885984] fwnode_graph_get_remote_port_parent from tegra_vi_graph_parse_one+0x7c/0x224
[ 3.894158] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.901459] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.908760] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 3.916061] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
...
[ 4.857892] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 4.865193] tegra_vi_graph_parse_one from tegra_vi_graph_parse_one+0x144/0x224
[ 4.872494] tegra_vi_graph_parse_one from tegra_vi_init+0x574/0x6d4
[ 4.878842] tegra_vi_init from host1x_device_init+0x84/0x15c
[ 4.884594] host1x_device_init from host1x_video_probe+0xa0/0x114
[ 4.890770] host1x_video_probe from really_probe+0xe0/0x400
The reason is the mentioned commit changed tegra_vi_graph_find_entity() to
search for an entity in the done notifier list:
> @@ -1464,7 +1464,7 @@ tegra_vi_graph_find_entity(struct tegra_vi_channel *chan,
> struct tegra_vi_graph_entity *entity;
> struct v4l2_async_connection *asd;
>
> - list_for_each_entry(asd, &chan->notifier.asc_list, asc_entry) {
> + list_for_each_entry(asd, &chan->notifier.done_list, asc_entry) {
> entity = to_tegra_vi_graph_entity(asd);
> if (entity->asd.match.fwnode == fwnode)
> return entity;
This is not always correct, being tegra_vi_graph_find_entity() called in
three locations, in this order:
1. tegra_vi_graph_parse_one() -- called while probing
2. tegra_vi_graph_notify_bound() -- the .bound notifier op
3. tegra_vi_graph_build() -- called in the .complete notifier op
Locations 1 and 2 are called before moving the entity from waiting_list to
done_list, thus they won't find what they are looking for in
done_list. Location 3 happens afterwards and thus it is not broken, however
it means tegra_vi_graph_find_entity() should not search in the same list
every time.
The error appears at step 1: tegra_vi_graph_parse_one() iterates
recursively until it finds the entity already notified, which now never
happens.
Fix by passing the specific notifier list pointer to
tegra_vi_graph_find_entity() instead of the channel, so each caller can
search in whatever list is correct.
Also improve the tegra_vi_graph_find_entity() comment.
Fixes: 9bf19fbf0c8b ("media: v4l: async: Rework internal lists")
Cc: Thierry Reding <thierry.reding@gmail.com>
Cc: Jonathan Hunter <jonathanh@nvidia.com>
Cc: Sowjanya Komatineni <skomatineni@nvidia.com>
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
[Sakari Ailus: Wrapped some long lines.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Commit 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup") dropped the
SD_ASYM_PACKING flag in the DIE domain added in commit 044f0e27dec6
("x86/sched: Add the SD_ASYM_PACKING flag to the die domain of hybrid
processors"). Restore it on hybrid processors.
The die-level domain does not depend on any build configuration and now
x86_sched_itmt_flags() is always needed. Remove the build dependency on
CONFIG_SCHED_[SMT|CLUSTER|MC].
Fixes: 8f2d6c41e5a6 ("x86/sched: Rewrite topology setup")
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Link: https://lkml.kernel.org/r/20230815035747.11529-1-ricardo.neri-calderon@linux.intel.com
The decompressor has a hard limit on the number of page tables it can
allocate. This limit is defined at compile-time and will cause boot
failure if it is reached.
The kernel is very strict and calculates the limit precisely for the
worst-case scenario based on the current configuration. However, it is
easy to forget to adjust the limit when a new use-case arises. The
worst-case scenario is rarely encountered during sanity checks.
In the case of enabling 5-level paging, a use-case was overlooked. The
limit needs to be increased by one to accommodate the additional level.
This oversight went unnoticed until Aaron attempted to run the kernel
via kexec with 5-level paging and unaccepted memory enabled.
Update wost-case calculations to include 5-level paging.
To address this issue, let's allocate some extra space for page tables.
128K should be sufficient for any use-case. The logic can be simplified
by using a single value for all kernel configurations.
[ Also add a warning, should this memory run low - by Dave Hansen. ]
Fixes: 34bbb0009f3b ("x86/boot/compressed: Enable 5-level paging during decompression stage")
Reported-by: Aaron Lu <aaron.lu@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230915070221.10266-1-kirill.shutemov@linux.intel.com
Pull tty / serial driver fixes from Greg KH:
"Here are two tty/serial driver fixes for 6.6-rc4 that resolve some
reported regressions:
- revert a n_gsm change that ended up causing problems
- 8250_port fix for irq data
both have been in linux-next for over a week with no reported
problems"
* tag 'tty-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
Revert "tty: n_gsm: fix UAF in gsm_cleanup_mux"
serial: 8250_port: Check IRQ data before use
commit 101bd907b424 ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg")
some readers no longer force #CLKREQ to low
when the system need to enter ASPM.
But some platform maybe not implement complete ASPM?
it causes some platforms can not boot
Like in the past only the platform support L1ss we release the #CLKREQ.
Move the judgment (L1ss) to probe,
we think read config space one time when the driver start is enough
Fixes: 101bd907b424 ("misc: rtsx: judge ASPM Mode to set PETXCFG Reg")
Cc: stable <stable@kernel.org>
Reported-by: Paul Grandperrin <paul.grandperrin@gmail.com>
Signed-off-by: Ricky Wu <ricky_wu@realtek.com>
Tested-By: Jade Lovelace <lists@jade.fyi>
Link: https://lore.kernel.org/r/37b1afb997f14946a8784c73d1f9a4f5@realtek.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
According to the awk manual, the -e option does not need to be specified
in front of 'program' (unless you need to mix program-file).
The redundant -e option can cause error when users use awk tools other
than gawk (for example, mawk does not support the -e option).
Error Example:
awk: not an option: -e
Link: https://lkml.kernel.org/r/VI1P193MB075228810591AF2FDD7D42C599C3A@VI1P193MB0752.EURP193.PROD.OUTLOOK.COM
Signed-off-by: Juntong Deng <juntong.deng@outlook.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Pull rdma fixes from Jason Gunthorpe:
"This includes a fix for a significant security miss in checking the
RDMA_NLDEV_CMD_SYS_SET operation.
Summary:
- UAF in SRP
- Error unwind failure in siw connection management
- Missing error checks
- NULL/ERR_PTR confusion in erdma
- Possible string truncation in CMA configfs and mlx4
- Data ordering issue in bnxt_re
- Missing stats decrement on object destroy in bnxt_re
- Mlx5 bugs in this merge window:
* Incorrect access_flag in the new mkey cache
* Missing unlock on error in flow steering
* lockdep possible deadlock on new mkey cache destruction (Plus a
fix for this too)
- Don't leak kernel stack memory to userspace in the CM
- Missing permission validation for RDMA_NLDEV_CMD_SYS_SET"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/core: Require admin capabilities to set system parameters
RDMA/mlx5: Remove not-used cache disable flag
RDMA/cma: Initialize ib_sa_multicast structure to 0 when join
RDMA/mlx5: Fix mkey cache possible deadlock on cleanup
RDMA/mlx5: Fix NULL string error
RDMA/mlx5: Fix mutex unlocking on error flow for steering anchor creation
RDMA/mlx5: Fix assigning access flags to cache mkeys
IB/mlx4: Fix the size of a buffer in add_port_entries()
RDMA/bnxt_re: Decrement resource stats correctly
RDMA/bnxt_re: Fix the handling of control path response data
RDMA/cma: Fix truncation compilation warning in make_cma_ports
RDMA/erdma: Fix NULL pointer access in regmr_cmd
RDMA/erdma: Fix error code in erdma_create_scatter_mtt()
RDMA/uverbs: Fix typo of sizeof argument
RDMA/cxgb4: Check skb value for failure to allocate
RDMA/siw: Fix connection failure handling
RDMA/srp: Do not call scsi_done() from srp_abort()
pinctrl_gpio_set_config() expects the GPIO number from the global GPIO
numberspace, not the controller-relative offset, which needs to be added
to the chip base.
Fixes: 5ae4cb94b313 ("gpio: aspeed: Add debounce support")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Reviewed-by: Andrew Jeffery <andrew@codeconstruct.com.au>
The 'msi' child node schema is missing constraints on additional properties.
It turns out it is incomplete and properties for it are documented in the
parent node by mistake. Move the reference to msi-controller.yaml and
the custom properties to the 'msi' node. Adding 'unevaluatedProperties'
ensures all the properties in the 'msi' node are documented.
With the schema corrected, a minimal interrupt controller node is needed
to properly decode the interrupt properties since the example has
multiple interrupt parents.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com>
Fixes: 905b986d099c ("dt-bindings: pci: Convert iProc PCIe to YAML")
Link: https://lore.kernel.org/r/20230926155613.33904-3-robh@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
Some missing select statements were already added back, but I ran into
another one that is missing:
ERROR: modpost: "v4l2_fwnode_endpoint_free" [drivers/media/pci/intel/ivsc/ivsc-csi.ko] undefined!
ERROR: modpost: "v4l2_fwnode_endpoint_alloc_parse" [drivers/media/pci/intel/ivsc/ivsc-csi.ko] undefined!
ERROR: modpost: "v4l2_fwnode_endpoint_parse" [drivers/media/pci/intel/ivsc/ivsc-csi.ko] undefined!
Fixes: 29006e196a56 ("media: pci: intel: ivsc: Add CSI submodule")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
[Sakari Ailus: Drop V4L2_ASYNC dependency, it is implied now.]
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
Cold functions and their non-cold counterparts can use _THIS_IP_ to
reference each other. Don't warn about !ENDBR in that case.
Note that for GCC this is currently irrelevant in light of the following
commit
c27cd083cfb9 ("Compiler attributes: GCC cold function alignment workarounds")
which disabled cold functions in the kernel. However this may still be
possible with Clang.
Fixes several warnings like the following:
drivers/scsi/bnx2i/bnx2i.prelink.o: warning: objtool: bnx2i_hw_ep_disconnect+0x19d: relocation to !ENDBR: bnx2i_hw_ep_disconnect.cold+0x0
drivers/net/ipvlan/ipvlan.prelink.o: warning: objtool: ipvlan_addr4_event.cold+0x28: relocation to !ENDBR: ipvlan_addr4_event+0xda
drivers/net/ipvlan/ipvlan.prelink.o: warning: objtool: ipvlan_addr6_event.cold+0x26: relocation to !ENDBR: ipvlan_addr6_event+0xb7
drivers/net/ethernet/broadcom/tg3.prelink.o: warning: objtool: tg3_set_ringparam.cold+0x17: relocation to !ENDBR: tg3_set_ringparam+0x115
drivers/net/ethernet/broadcom/tg3.prelink.o: warning: objtool: tg3_self_test.cold+0x17: relocation to !ENDBR: tg3_self_test+0x2e1
drivers/target/iscsi/cxgbit/cxgbit.prelink.o: warning: objtool: __cxgbit_free_conn.cold+0x24: relocation to !ENDBR: __cxgbit_free_conn+0xfb
net/can/can.prelink.o: warning: objtool: can_rx_unregister.cold+0x2c: relocation to !ENDBR: can_rx_unregister+0x11b
drivers/net/ethernet/qlogic/qed/qed.prelink.o: warning: objtool: qed_spq_post+0xc0: relocation to !ENDBR: qed_spq_post.cold+0x9a
drivers/net/ethernet/qlogic/qed/qed.prelink.o: warning: objtool: qed_iwarp_ll2_comp_syn_pkt.cold+0x12f: relocation to !ENDBR: qed_iwarp_ll2_comp_syn_pkt+0x34b
net/tipc/tipc.prelink.o: warning: objtool: tipc_nametbl_publish.cold+0x21: relocation to !ENDBR: tipc_nametbl_publish+0xa6
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/d8f1ab6a23a6105bc023c132b105f245c7976be6.1694476559.git.jpoimboe@kernel.org
For SMT4, any group with more than 2 tasks will be marked as
group_smt_balance. Retain the behaviour of group_has_spare by marking
the busiest group as the group which has the least number of idle_cpus.
Also, handle rounding effect of adding (ncores_local + ncores_busy) when
the local is fully idle and busy group imbalance is less than 2 tasks.
Local group should try to pull at least 1 task in this case so imbalance
should be set to 2 instead.
Fixes: fee1759e4f04 ("sched/fair: Determine active load balance for SMT sched groups")
Acked-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/6cd1633036bb6b651af575c32c2a9608a106702c.camel@linux.intel.com