commits

Commit a402006d48a9 ("misc: pci_endpoint_test: Remove global 'irq_type'
and 'no_msi'") changed so that the default IRQ vector requested by
pci_endpoint_test_probe() was no longer the module param 'irq_type', but
instead test->irq_type. test->irq_type is by default IRQ_TYPE_UNDEFINED
(until someone calls ioctl(PCITEST_SET_IRQTYPE)).

However, the commit also changed so that after initializing test->irq_type
to IRQ_TYPE_UNDEFINED, it also overrides it with driver_data->irq_type, if
the PCI device and vendor ID provides driver_data.

This causes a regression for PCI device and vendor IDs that do not provide
driver_data, and the host side pci_endpoint_test_driver driver failed to
probe on such platforms:

pci-endpoint-test 0001:01:00.0: Invalid IRQ type selected
pci-endpoint-test 0001:01:00.0: probe with driver pci-endpoint-test failed with error -22

Considering that the pci endpoint selftests and the old pcitest.sh always
call ioctl(PCITEST_SET_IRQTYPE) before performing any test that requires
IRQs, fix the regression by removing the allocation of IRQs in
pci_endpoint_test_probe(). The IRQ allocation will occur when
ioctl(PCITEST_SET_IRQTYPE) is called.

A positive side effect of this is that even if the endpoint controller has
issues with IRQs, the user can do still do all the tests/ioctls() that do
not require working IRQs, e.g. PCITEST_BAR and PCITEST_BARS.

This also means that we can remove the now unused irq_type from
driver_data. The irq_type will always be the one configured by the user
using ioctl(PCITEST_SET_IRQTYPE). (A user that does not know, or care
which irq_type that is used, can use PCITEST_IRQ_TYPE_AUTO. This has
superseded the need for a default irq_type in driver_data.)

[bhelgaas: add probe failure details]
Fixes: a402006d48a9c ("misc: pci_endpoint_test: Remove global 'irq_type' and 'no_msi'")
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20250416142825.336554-2-cassel@kernel.org

9mo ago

Linus Torvalds

06b31bdb

Merge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Chuck Lever

831e3f54

Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

9mo ago

Manivannan Sadhasivam

0747c136

MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer

9mo ago

Linus Torvalds

3d23ef05

Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Kirill A. Shutemov

85fd85bc

x86/insn: Fix CTEST instruction decoding

9mo ago

Li Lingfeng

a1d14d93

nfsd: decrease sc_count directly if fail to queue dl_recall

9mo ago

Ilpo Järvinen

39e703ed

selftests/pcie_bwctrl: Fix test progs list

9mo ago

Linus Torvalds

86baa549

Merge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Omar Sandoval

bbce3de7

sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

There is a code path in dequeue_entities() that can set the slice of a
sched_entity to U64_MAX, which sometimes results in a crash.

The offending case is when dequeue_entities() is called to dequeue a
delayed group entity, and then the entity's parent's dequeue is delayed.
In that case:

1. In the if (entity_is_task(se)) else block at the beginning of
dequeue_entities(), slice is set to
cfs_rq_min_slice(group_cfs_rq(se)). If the entity was delayed, then
it has no queued tasks, so cfs_rq_min_slice() returns U64_MAX.
2. The first for_each_sched_entity() loop dequeues the entity.
3. If the entity was its parent's only child, then the next iteration
tries to dequeue the parent.
4. If the parent's dequeue needs to be delayed, then it breaks from the
first for_each_sched_entity() loop _without updating slice_.
5. The second for_each_sched_entity() loop sets the parent's ->slice to
the saved slice, which is still U64_MAX.

This throws off subsequent calculations with potentially catastrophic
results. A manifestation we saw in production was:

6. In update_entity_lag(), se->slice is used to calculate limit, which
ends up as a huge negative number.
7. limit is used in se->vlag = clamp(vlag, -limit, limit). Because limit
is negative, vlag > limit, so se->vlag is set to the same huge
negative number.
8. In place_entity(), se->vlag is scaled, which overflows and results in
another huge (positive or negative) number.
9. The adjusted lag is subtracted from se->vruntime, which increases or
decreases se->vruntime by a huge number.
10. pick_eevdf() calls entity_eligible()/vruntime_eligible(), which
incorrectly returns false because the vruntime is so far from the
other vruntimes on the queue, causing the
(vruntime - cfs_rq->min_vruntime) * load calulation to overflow.
11. Nothing appears to be eligible, so pick_eevdf() returns NULL.
12. pick_next_entity() tries to dereference the return value of
pick_eevdf() and crashes.

Dumping the cfs_rq states from the core dumps with drgn showed tell-tale
huge vruntime ranges and bogus vlag values, and I also traced se->slice
being set to U64_MAX on live systems (which was usually "benign" since
the rest of the runqueue needed to be in a particular state to crash).

Fix it in dequeue_entities() by always setting slice from the first
non-empty cfs_rq.

Fixes: aef6987d8954 ("sched/eevdf: Propagate min_slice up the cgroup hierarchy")
Signed-off-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/f0c2d1072be229e1bdddc73c0703919a8b00c652.1745570998.git.osandov@fb.com

9mo ago

Ard Biesheuvel

032ce1ea

x86/boot: Work around broken busybox 'truncate' tool

9mo ago

Eric Biggers

cd35b6cb

nfs: add missing selections of CONFIG_CRC32

9mo ago

Ilpo Järvinen

a34d7487

PCI: Restore assigned resources fully after release

9mo ago

Linus Torvalds

a226e654

Merge tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Luo Gengkun

1a97fea9

perf/x86: Fix non-sampling (counting) events on certain x86 platforms

9mo ago

Linus Torvalds

9d7a0577

gcc-15: disable '-Wunterminated-string-initialization' entirely for now

I had left the warning around but as a non-fatal error to get my gcc-15
builds going, but fixed up some of the most annoying warning cases so
that it wouldn't be *too* verbose.

Because I like the _concept_ of the warning, even if I detested the
implementation to shut it up.

It turns out the implementation to shut it up is even more broken than I
thought, and my "shut up most of the warnings" patch just caused fatal
errors on gcc-14 instead.

I had tested with clang, but when I upgrade my development environment,
I try to do it on all machines because I hate having different systems
to maintain, and hadn't realized that gcc-14 now had issues.

The ACPI case is literally why I wanted to have a *type* that doesn't
trigger the warning (see commit d5d45a7f2619: "gcc-15: make
'unterminated string initialization' just a warning"), instead of
marking individual places as "__nonstring".

But gcc-14 doesn't like that __nonstring location that shut gcc-15 up,
because it's on an array of char arrays, not on one single array:

drivers/acpi/tables.c:399:1: error: 'nonstring' attribute ignored on objects of type 'const char[][4]' [-Werror=attributes]
399 | static const char table_sigs[][ACPI_NAMESEG_SIZE] __initconst __nonstring = {
| ^~~~~~

and my attempts to nest it properly with a type had failed, because of
how gcc doesn't like marking the types as having attributes, only
symbols.

There may be some trick to it, but I was already annoyed by the bad
attribute design, now I'm just entirely fed up with it.

I wish gcc had a proper way to say "this type is a *byte* array, not a
string".

The obvious thing would be to distinguish between "char []" and an
explicitly signed "unsigned char []" (as opposed to an implicitly
unsigned char, which is typically an architecture-specific default, but
for the kernel is universal thanks to '-funsigned-char').

But any "we can typedef a 8-bit type to not become a string just because
it's an array" model would be fine.

But "__attribute__((nonstring))" is sadly not that sane model.

Reported-by: Chris Clayton <chris2553@googlemail.com>
Fixes: 4b4bd8c50f48 ("gcc-15: acpi: sprinkle random '__nonstring' crumbles around")
Fixes: d5d45a7f2619 ("gcc-15: make 'unterminated string initialization' just a warning")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9mo ago

Juergen Gross

4ce385f5

x86/mm: Fix _pgd_alloc() for Xen PV mode

9mo ago

Chuck Lever

26a80762

NFSD: Add a Kconfig setting to enable delegated timestamps

10mo ago

Linus Torvalds

0af2f6be

Linux 6.15-rc1 v6.15-rc1

9mo ago

Linus Torvalds

e742bd19

Merge tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

9mo ago

Suzuki K Poulose

3318dc29

irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()

9mo ago

Namhyung Kim

0db61388

perf/core: Change to POLLERR for pinned events with error

9mo ago

Linus Torvalds

9c32cda4

Linux 6.15-rc3 v6.15-rc3

9mo ago

Mike Rapoport (Microsoft)

83b2d345

x86/e820: Discard high memory that can't be addressed by 32-bit systems

Dave Hansen reports the following crash on a 32-bit system with
CONFIG_HIGHMEM=y and CONFIG_X86_PAE=y:

> 0xf75fe000 is the mem_map[] entry for the first page >4GB. It
> obviously wasn't allocated, thus the oops.

BUG: unable to handle page fault for address: f75fe000
#PF: supervisor write access in kernel mode
#PF: error_code(0x0002) - not-present page
*pdpt = 0000000002da2001 *pde = 000000000300c067 *pte = 0000000000000000
Oops: Oops: 0002 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 0 Comm: swapper Not tainted 6.15.0-rc1-00288-ge618ee89561b-dirty #311 PREEMPT(undef)
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014
EIP: __free_pages_core+0x3c/0x74
...
Call Trace:
memblock_free_pages+0x11/0x2c
memblock_free_all+0x2ce/0x3a0
mm_core_init+0xf5/0x320
start_kernel+0x296/0x79c
i386_start_kernel+0xad/0xb0
startup_32_smp+0x151/0x154

The mem_map[] is allocated up to the end of ZONE_HIGHMEM which is defined
by max_pfn.

The bug was introduced by this recent commit:

6faea3422e3b ("arch, mm: streamline HIGHMEM freeing")

Previously, freeing of high memory was also clamped to the end of
ZONE_HIGHMEM but after this change, memblock_free_all() tries to
free memory above the of ZONE_HIGHMEM as well and that causes
access to mem_map[] entries beyond the end of the memory map.

To fix this, discard the memory after max_pfn from memblock on
32-bit systems so that core MM would be aware only of actually
usable memory.

Fixes: 6faea3422e3b ("arch, mm: streamline HIGHMEM freeing")
Reported-by: Dave Hansen <dave.hansen@intel.com>
Tested-by: Arnd Bergmann <arnd@kernel.org>
Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Andy Shevchenko <andy@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Davide Ciminaghi <ciminaghi@gnudd.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20250413080858.743221-1-rppt@kernel.org # discussion and submission

9mo ago

Nicolas Bouchinet

87480a8c

sysctl: Fixes nsm_local_state bounds

10mo ago

Thomas Weißschuh

0efdedb3

tools/include: make uapi/linux/types.h usable from assembly

9mo ago

Linus Torvalds

ec0c2d53

Merge tag 'for-linus' of https://github.com/openrisc/linux

9mo ago

Bibo Mao

5add0dbb

LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally

9mo ago

Linus Torvalds

bc337235

Merge tag 'for-6.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

9mo ago

Linus Torvalds

ac71fabf

gcc-15: work around sequence-point warning

9mo ago

Linus Torvalds

8560697b

Merge tag '6.15-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

9mo ago

Jeff Layton

261e3bbf

nfsd: use a long for the count in nfsd4_state_shrinker_count()

10mo ago

Linus Torvalds

71032925

Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

9mo ago

Linus Torvalds

a16ebe51

Merge tag 'move-lib-kunit-v6.15-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

9mo ago

Stafford Horne

66ffd2f3

Documentation: openrisc: Update toolchain binaries URL

9mo ago

Bibo Mao

9ea86232

LoongArch: KVM: Fully clear some CSRs when VM reboot

9mo ago

Linus Torvalds

e4b51cb6

Merge tag 'integrity-6.15-rc3-fix' of https://github.com/linux-integrity/linux

9mo ago

Naohiro Aota

866bafae

btrfs: zoned: skip reporting zone for new block group

There is a potential deadlock if we do report zones in an IO context, detailed
in below lockdep report. When one process do a report zones and another process
freezes the block device, the report zones side cannot allocate a tag because
the freeze is already started. This can thus result in new block group creation
to hang forever, blocking the write path.

Thankfully, a new block group should be created on empty zones. So, reporting
the zones is not necessary and we can set the write pointer = 0 and load the
zone capacity from the block layer using bdev_zone_capacity() helper.

======================================================
WARNING: possible circular locking dependency detected
6.14.0-rc1 #252 Not tainted
------------------------------------------------------
modprobe/1110 is trying to acquire lock:
ffff888100ac83e0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: __flush_work+0x38f/0xb60

but task is already holding lock:
ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #3 (&q->q_usage_counter(queue)#16){++++}-{0:0}:
blk_queue_enter+0x3d9/0x500
blk_mq_alloc_request+0x47d/0x8e0
scsi_execute_cmd+0x14f/0xb80
sd_zbc_do_report_zones+0x1c1/0x470
sd_zbc_report_zones+0x362/0xd60
blkdev_report_zones+0x1b1/0x2e0
btrfs_get_dev_zones+0x215/0x7e0 [btrfs]
btrfs_load_block_group_zone_info+0x6d2/0x2c10 [btrfs]
btrfs_make_block_group+0x36b/0x870 [btrfs]
btrfs_create_chunk+0x147d/0x2320 [btrfs]
btrfs_chunk_alloc+0x2ce/0xcf0 [btrfs]
start_transaction+0xce6/0x1620 [btrfs]
btrfs_uuid_scan_kthread+0x4ee/0x5b0 [btrfs]
kthread+0x39d/0x750
ret_from_fork+0x30/0x70
ret_from_fork_asm+0x1a/0x30

-> #2 (&fs_info->dev_replace.rwsem){++++}-{4:4}:
down_read+0x9b/0x470
btrfs_map_block+0x2ce/0x2ce0 [btrfs]
btrfs_submit_chunk+0x2d4/0x16c0 [btrfs]
btrfs_submit_bbio+0x16/0x30 [btrfs]
btree_write_cache_pages+0xb5a/0xf90 [btrfs]
do_writepages+0x17f/0x7b0
__writeback_single_inode+0x114/0xb00
writeback_sb_inodes+0x52b/0xe00
wb_writeback+0x1a7/0x800
wb_workfn+0x12a/0xbd0
process_one_work+0x85a/0x1460
worker_thread+0x5e2/0xfc0
kthread+0x39d/0x750
ret_from_fork+0x30/0x70
ret_from_fork_asm+0x1a/0x30

-> #1 (&fs_info->zoned_meta_io_lock){+.+.}-{4:4}:
__mutex_lock+0x1aa/0x1360
btree_write_cache_pages+0x252/0xf90 [btrfs]
do_writepages+0x17f/0x7b0
__writeback_single_inode+0x114/0xb00
writeback_sb_inodes+0x52b/0xe00
wb_writeback+0x1a7/0x800
wb_workfn+0x12a/0xbd0
process_one_work+0x85a/0x1460
worker_thread+0x5e2/0xfc0
kthread+0x39d/0x750
ret_from_fork+0x30/0x70
ret_from_fork_asm+0x1a/0x30

-> #0 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}:
__lock_acquire+0x2f52/0x5ea0
lock_acquire+0x1b1/0x540
__flush_work+0x3ac/0xb60
wb_shutdown+0x15b/0x1f0
bdi_unregister+0x172/0x5b0
del_gendisk+0x841/0xa20
sd_remove+0x85/0x130
device_release_driver_internal+0x368/0x520
bus_remove_device+0x1f1/0x3f0
device_del+0x3bd/0x9c0
__scsi_remove_device+0x272/0x340
scsi_forget_host+0xf7/0x170
scsi_remove_host+0xd2/0x2a0
sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
device_release_driver_internal+0x368/0x520
bus_remove_device+0x1f1/0x3f0
device_del+0x3bd/0x9c0
device_unregister+0x13/0xa0
sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
scsi_debug_exit+0x17/0x70 [scsi_debug]
__do_sys_delete_module.isra.0+0x321/0x520
do_syscall_64+0x93/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e

other info that might help us debug this:

Chain exists of:
(work_completion)(&(&wb->dwork)->work) --> &fs_info->dev_replace.rwsem --> &q->q_usage_counter(queue)#16

Possible unsafe locking scenario:

CPU0 CPU1
---- ----
lock(&q->q_usage_counter(queue)#16);
lock(&fs_info->dev_replace.rwsem);
lock(&q->q_usage_counter(queue)#16);
lock((work_completion)(&(&wb->dwork)->work));

*** DEADLOCK ***

5 locks held by modprobe/1110:
#0: ffff88811f7bc108 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
#1: ffff8881022ee0e0 (&shost->scan_mutex){+.+.}-{4:4}, at: scsi_remove_host+0x20/0x2a0
#2: ffff88811b4c4378 (&dev->mutex){....}-{4:4}, at: device_release_driver_internal+0x8f/0x520
#3: ffff8881205b6f20 (&q->q_usage_counter(queue)#16){++++}-{0:0}, at: sd_remove+0x85/0x130
#4: ffffffffa3284360 (rcu_read_lock){....}-{1:3}, at: __flush_work+0xda/0xb60

stack backtrace:
CPU: 0 UID: 0 PID: 1110 Comm: modprobe Not tainted 6.14.0-rc1 #252
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x6a/0x90
print_circular_bug.cold+0x1e0/0x274
check_noncircular+0x306/0x3f0
? __pfx_check_noncircular+0x10/0x10
? mark_lock+0xf5/0x1650
? __pfx_check_irq_usage+0x10/0x10
? lockdep_lock+0xca/0x1c0
? __pfx_lockdep_lock+0x10/0x10
__lock_acquire+0x2f52/0x5ea0
? __pfx___lock_acquire+0x10/0x10
? __pfx_mark_lock+0x10/0x10
lock_acquire+0x1b1/0x540
? __flush_work+0x38f/0xb60
? __pfx_lock_acquire+0x10/0x10
? __pfx_lock_release+0x10/0x10
? mark_held_locks+0x94/0xe0
? __flush_work+0x38f/0xb60
__flush_work+0x3ac/0xb60
? __flush_work+0x38f/0xb60
? __pfx_mark_lock+0x10/0x10
? __pfx___flush_work+0x10/0x10
? __pfx_wq_barrier_func+0x10/0x10
? __pfx___might_resched+0x10/0x10
? mark_held_locks+0x94/0xe0
wb_shutdown+0x15b/0x1f0
bdi_unregister+0x172/0x5b0
? __pfx_bdi_unregister+0x10/0x10
? up_write+0x1ba/0x510
del_gendisk+0x841/0xa20
? __pfx_del_gendisk+0x10/0x10
? _raw_spin_unlock_irqrestore+0x35/0x60
? __pm_runtime_resume+0x79/0x110
sd_remove+0x85/0x130
device_release_driver_internal+0x368/0x520
? kobject_put+0x5d/0x4a0
bus_remove_device+0x1f1/0x3f0
device_del+0x3bd/0x9c0
? __pfx_device_del+0x10/0x10
__scsi_remove_device+0x272/0x340
scsi_forget_host+0xf7/0x170
scsi_remove_host+0xd2/0x2a0
sdebug_driver_remove+0x52/0x2f0 [scsi_debug]
? kernfs_remove_by_name_ns+0xc0/0xf0
device_release_driver_internal+0x368/0x520
? kobject_put+0x5d/0x4a0
bus_remove_device+0x1f1/0x3f0
device_del+0x3bd/0x9c0
? __pfx_device_del+0x10/0x10
? __pfx___mutex_unlock_slowpath+0x10/0x10
device_unregister+0x13/0xa0
sdebug_do_remove_host+0x1fb/0x290 [scsi_debug]
scsi_debug_exit+0x17/0x70 [scsi_debug]
__do_sys_delete_module.isra.0+0x321/0x520
? __pfx___do_sys_delete_module.isra.0+0x10/0x10
? __pfx_slab_free_after_rcu_debug+0x10/0x10
? kasan_save_stack+0x2c/0x50
? kasan_record_aux_stack+0xa3/0xb0
? __call_rcu_common.constprop.0+0xc4/0xfb0
? kmem_cache_free+0x3a0/0x590
? __x64_sys_close+0x78/0xd0
do_syscall_64+0x93/0x180
? lock_is_held_type+0xd5/0x130
? __call_rcu_common.constprop.0+0x3c0/0xfb0
? lockdep_hardirqs_on+0x78/0x100
? __call_rcu_common.constprop.0+0x3c0/0xfb0
? __pfx___call_rcu_common.constprop.0+0x10/0x10
? kmem_cache_free+0x3a0/0x590
? lockdep_hardirqs_on_prepare+0x16d/0x400
? do_syscall_64+0x9f/0x180
? lockdep_hardirqs_on+0x78/0x100
? do_syscall_64+0x9f/0x180
? __pfx___x64_sys_openat+0x10/0x10
? lockdep_hardirqs_on_prepare+0x16d/0x400
? do_syscall_64+0x9f/0x180
? lockdep_hardirqs_on+0x78/0x100
? do_syscall_64+0x9f/0x180
entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f436712b68b
RSP: 002b:00007ffe9f1a8658 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 00005559b367fd80 RCX: 00007f436712b68b
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 00005559b367fde8
RBP: 00007ffe9f1a8680 R08: 1999999999999999 R09: 0000000000000000
R10: 00007f43671a5fe0 R11: 0000000000000206 R12: 0000000000000000
R13: 00007ffe9f1a86b0 R14: 0000000000000000 R15: 0000000000000000
</TASK>

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
CC: <stable@vger.kernel.org> # 6.13+
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>

9mo ago

Linus Torvalds

05e8d261

gcc-15: add '__nonstring' markers to byte arrays

9mo ago

Linus Torvalds

3088d269

Merge tag 'x86-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Kuniyuki Iwashima

95d2b9f6

Revert "smb: client: fix TCP timers deadlock after rmmod"

This reverts commit e9f2517a3e18a54a3943c098d2226b245d488801.

Commit e9f2517a3e18 ("smb: client: fix TCP timers deadlock after
rmmod") is intended to fix a null-ptr-deref in LOCKDEP, which is
mentioned as CVE-2024-54680, but is actually did not fix anything;
The issue can be reproduced on top of it. [0]

Also, it reverted the change by commit ef7134c7fc48 ("smb: client:
Fix use-after-free of network namespace.") and introduced a real
issue by reviving the kernel TCP socket.

When a reconnect happens for a CIFS connection, the socket state
transitions to FIN_WAIT_1. Then, inet_csk_clear_xmit_timers_sync()
in tcp_close() stops all timers for the socket.

If an incoming FIN packet is lost, the socket will stay at FIN_WAIT_1
forever, and such sockets could be leaked up to net.ipv4.tcp_max_orphans.

Usually, FIN can be retransmitted by the peer, but if the peer aborts
the connection, the issue comes into reality.

I warned about this privately by pointing out the exact report [1],
but the bogus fix was finally merged.

So, we should not stop the timers to finally kill the connection on
our side in that case, meaning we must not use a kernel socket for
TCP whose sk->sk_net_refcnt is 0.

The kernel socket does not have a reference to its netns to make it
possible to tear down netns without cleaning up every resource in it.

For example, tunnel devices use a UDP socket internally, but we can
destroy netns without removing such devices and let it complete
during exit. Otherwise, netns would be leaked when the last application
died.

However, this is problematic for TCP sockets because TCP has timers to
close the connection gracefully even after the socket is close()d. The
lifetime of the socket and its netns is different from the lifetime of
the underlying connection.

If the socket user does not maintain the netns lifetime, the timer could
be fired after the socket is close()d and its netns is freed up, resulting
in use-after-free.

Actually, we have seen so many similar issues and converted such sockets
to have a reference to netns.

That's why I converted the CIFS client socket to have a reference to
netns (sk->sk_net_refcnt == 1), which is somehow mentioned as out-of-scope
of CIFS and technically wrong in e9f2517a3e18, but **is in-scope and right
fix**.

Regarding the LOCKDEP issue, we can prevent the module unload by
bumping the module refcount when switching the LOCKDDEP key in
sock_lock_init_class_and_name(). [2]

For a while, let's revert the bogus fix.

Note that now we can use sk_net_refcnt_upgrade() for the socket
conversion, but I'll do so later separately to make backport easy.

Link: https://lore.kernel.org/all/20250402020807.28583-1-kuniyu@amazon.com/ #[0]
Link: https://lore.kernel.org/netdev/c08bd5378da647a2a4c16698125d180a@huawei.com/ #[1]
Link: https://lore.kernel.org/lkml/20250402005841.19846-1-kuniyu@amazon.com/ #[2]
Fixes: e9f2517a3e18 ("smb: client: fix TCP timers deadlock after rmmod")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Steve French <stfrench@microsoft.com>

9mo ago

Jeff Layton

38762580

nfsd: remove obsolete comment from nfs4_alloc_stid

10mo ago

Linus Torvalds

59f392fa

Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

9mo ago

Len Brown

03e00e37

tools/power turbostat: v2025.05.06

9mo ago

Linus Torvalds

fa573aef

Merge tag 'drm-fixes-2025-04-26' of https://gitlab.freedesktop.org/drm/kernel

9mo ago

Mark Brown

4ea404fd

lib: Ensure prime numbers tests are included in KUnit test runs

9mo ago

Stafford Horne

20a43732

Documentation: openrisc: Update mailing list

9mo ago

Yulong Han

8b2d01fe

LoongArch: KVM: Fix multiple typos of KVM code

9mo ago

Linus Torvalds

a33b5a08

Merge tag 'sched_ext-for-6.15-rc3-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

9mo ago

Frederick Lawler

30d68cb0

ima: process_measurement() needlessly takes inode_lock() on MAY_READ

9mo ago

Naohiro Aota

c1a79b1a

block: introduce zone capacity helper

9mo ago

Linus Torvalds

be913e7c

gcc-15: get rid of misc extra NUL character padding

9mo ago

Linus Torvalds

ac85740e

Merge tag 'timers-urgent-2025-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Ard Biesheuvel

d54d6102

x86/boot/sev: Avoid shared GHCB page for early memory acceptance

9mo ago

Kuniyuki Iwashima

c707193a

Revert "smb: client: Fix netns refcount imbalance causing leaks and use-after-free"

9mo ago

Jeff Layton

d917d783

nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()

10mo ago

Linus Torvalds

dda88878

Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Linux 6.15-rc4 v6.15-rc4

b4432656

Linus Torvalds

9mo

Merge tag 'pci-v6.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

5bc10186

Linus Torvalds

9mo

Merge tag 'nfsd-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

d22aad29

Linus Torvalds

9mo

misc: pci_endpoint_test: Defer IRQ allocation until ioctl(PCITEST_SET_IRQTYPE)

442cacac

Niklas Cassel

9mo

Merge tag 'x86-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

06b31bdb

Linus Torvalds

9mo

Revert "sunrpc: clean cache_detail immediately when flush is written frequently"

831e3f54

Chuck Lever

9mo

MAINTAINERS: Move Manivannan Sadhasivam as PCI Native host bridge and endpoint maintainer

0747c136

Manivannan Sadhasivam

9mo

Merge tag 'sched-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

3d23ef05

Linus Torvalds

9mo

x86/insn: Fix CTEST instruction decoding

85fd85bc

Kirill A. Shutemov

9mo

nfsd: decrease sc_count directly if fail to queue dl_recall

A deadlock warning occurred when invoking nfs4_put_stid following a failed
dl_recall queue operation:
T1 T2
nfs4_laundromat
nfs4_get_client_reaplist
nfs4_anylock_blockers
__break_lease
spin_lock // ctx->flc_lock
spin_lock // clp->cl_lock
nfs4_lockowner_has_blockers
locks_owner_has_blockers
spin_lock // flctx->flc_lock
nfsd_break_deleg_cb
nfsd_break_one_deleg
nfs4_put_stid
refcount_dec_and_lock
spin_lock // clp->cl_lock

When a file is opened, an nfs4_delegation is allocated with sc_count
initialized to 1, and the file_lease holds a reference to the delegation.
The file_lease is then associated with the file through kernel_setlease.

The disassociation is performed in nfsd4_delegreturn via the following
call chain:
nfsd4_delegreturn --> destroy_delegation --> destroy_unhashed_deleg -->
nfs4_unlock_deleg_lease --> kernel_setlease --> generic_delete_lease
The corresponding sc_count reference will be released after this
disassociation.

Since nfsd_break_one_deleg executes while holding the flc_lock, the
disassociation process becomes blocked when attempting to acquire flc_lock
in generic_delete_lease. This means:
1) sc_count in nfsd_break_one_deleg will not be decremented to 0;
2) The nfs4_put_stid called by nfsd_break_one_deleg will not attempt to
acquire cl_lock;
3) Consequently, no deadlock condition is created.

Given that sc_count in nfsd_break_one_deleg remains non-zero, we can
safely perform refcount_dec on sc_count directly. This approach
effectively avoids triggering deadlock warnings.

Fixes: 230ca758453c ("nfsd: put dl_stid if fail to queue dl_recall")
Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>

a1d14d93

Li Lingfeng

9mo

selftests/pcie_bwctrl: Fix test progs list

39e703ed

Ilpo Järvinen

9mo

Merge tag 'perf-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

86baa549

Linus Torvalds

9mo

sched/eevdf: Fix se->slice being set to U64_MAX and resulting crash

bbce3de7

Omar Sandoval

9mo

x86/boot: Work around broken busybox 'truncate' tool

032ce1ea

Ard Biesheuvel

9mo

nfs: add missing selections of CONFIG_CRC32

cd35b6cb

Eric Biggers

9mo

PCI: Restore assigned resources fully after release

a34d7487

Ilpo Järvinen

9mo

Merge tag 'irq-urgent-2025-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

a226e654

Linus Torvalds

9mo

perf/x86: Fix non-sampling (counting) events on certain x86 platforms

1a97fea9

Luo Gengkun

9mo

gcc-15: disable '-Wunterminated-string-initialization' entirely for now

9d7a0577

Linus Torvalds

9mo

x86/mm: Fix _pgd_alloc() for Xen PV mode

4ce385f5

Juergen Gross

9mo

NFSD: Add a Kconfig setting to enable delegated timestamps

26a80762

Chuck Lever

10mo

Linux 6.15-rc1 v6.15-rc1

0af2f6be

Linus Torvalds

9mo

Merge tag 'loongarch-fixes-6.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

e742bd19

Linus Torvalds

9mo

irqchip/gic-v2m: Prevent use after free of gicv2m_get_fwnode()

3318dc29

Suzuki K Poulose

9mo

perf/core: Change to POLLERR for pinned events with error

0db61388

Namhyung Kim

9mo

Linux 6.15-rc3 v6.15-rc3

9c32cda4

Linus Torvalds

9mo

x86/e820: Discard high memory that can't be addressed by 32-bit systems

83b2d345

Mike Rapoport (Microsoft)

9mo

sysctl: Fixes nsm_local_state bounds

87480a8c

Nicolas Bouchinet

10mo

tools/include: make uapi/linux/types.h usable from assembly

0efdedb3

Thomas Weißschuh

9mo

Merge tag 'for-linus' of https://github.com/openrisc/linux

ec0c2d53

Linus Torvalds

9mo

LoongArch: KVM: Fix PMU pass-through issue if VM exits to host finally

5add0dbb

Bibo Mao

9mo

Merge tag 'for-6.15-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

bc337235

Linus Torvalds

9mo

gcc-15: work around sequence-point warning

The C sequence points are complicated things, and gcc-15 has apparently
added a warning for the case where an object is both used and modified
multiple times within the same sequence point.

That's a great warning.

Or rather, it would be a great warning, except gcc-15 seems to not
really be very exact about it, and doesn't notice that the modification
are to two entirely different members of the same object: the array
counter and the array entries.

So that seems kind of silly.

That said, the code that gcc complains about is unnecessarily
complicated, so moving the array counter update into a separate
statement seems like the most straightforward fix for these warnings:

drivers/net/wireless/intel/iwlwifi/mld/d3.c: In function ‘iwl_mld_set_netdetect_info’:
drivers/net/wireless/intel/iwlwifi/mld/d3.c:1102:66: error: operation on ‘netdetect_info->n_matches’ may be undefined [-Werror=sequence-point]
1102 | netdetect_info->matches[netdetect_info->n_matches++] = match;
| ~~~~~~~~~~~~~~~~~~~~~~~~~^~

drivers/net/wireless/intel/iwlwifi/mld/d3.c:1120:58: error: operation on ‘match->n_channels’ may be undefined [-Werror=sequence-point]
1120 | match->channels[match->n_channels++] =
| ~~~~~~~~~~~~~~~~~^~

side note: the code at that second warning is actively buggy, and only
works on little-endian machines that don't do strict alignment checks.

The code casts an array of integers into an array of unsigned long in
order to use our bitmap iterators. That happens to work fine on any
sane architecture, but it's still wrong.

This does *not* fix that more serious problem. This only splits the two
assignments into two statements and fixes the compiler warning. I need
to get rid of the new warnings in order to be able to actually do any
build testing.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>