commits

When flushing a work item for cancellation, __flush_work() knows that it
exclusively owns the work item through its PENDING bit. 134874e2eee9
("workqueue: Allow cancel_work_sync() and disable_work() from atomic
contexts on BH work items") added a read of @work->data to determine whether
to use busy wait for BH work items that are being canceled. While the read
is safe when @from_cancel, @work->data was read before testing @from_cancel
to simplify code structure:

data = *work_data_bits(work);
if (from_cancel &&
!WARN_ON_ONCE(data & WORK_STRUCT_PWQ) && (data & WORK_OFFQ_BH)) {

While the read data was never used if !@from_cancel, this could trigger
KCSAN data race detection spuriously:

==================================================================
BUG: KCSAN: data-race in __flush_work / __flush_work

write to 0xffff8881223aa3e8 of 8 bytes by task 3998 on cpu 0:
instrument_write include/linux/instrumented.h:41 [inline]
___set_bit include/asm-generic/bitops/instrumented-non-atomic.h:28 [inline]
insert_wq_barrier kernel/workqueue.c:3790 [inline]
start_flush_work kernel/workqueue.c:4142 [inline]
__flush_work+0x30b/0x570 kernel/workqueue.c:4178
flush_work kernel/workqueue.c:4229 [inline]
...

read to 0xffff8881223aa3e8 of 8 bytes by task 50 on cpu 1:
__flush_work+0x42a/0x570 kernel/workqueue.c:4188
flush_work kernel/workqueue.c:4229 [inline]
flush_delayed_work+0x66/0x70 kernel/workqueue.c:4251
...

value changed: 0x0000000000400000 -> 0xffff88810006c00d

Reorganize the code so that @from_cancel is tested before @work->data is
accessed. The only problem is triggering KCSAN detection spuriously. This
shouldn't need READ_ONCE() or other access qualifiers.

No functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: syzbot+b3e4f2f51ed645fd5df2@syzkaller.appspotmail.com
Fixes: 134874e2eee9 ("workqueue: Allow cancel_work_sync() and disable_work() from atomic contexts on BH work items")
Link: http://lkml.kernel.org/r/000000000000ae429e061eea2157@google.com
Cc: Jens Axboe <axboe@kernel.dk>

1y ago

Chen Ridong

959ab635

cgroup/cpuset: fix panic caused by partcmd_update

We find a bug as below:
BUG: unable to handle page fault for address: 00000003
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 3 PID: 358 Comm: bash Tainted: G W I 6.6.0-10893-g60d6
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/4
RIP: 0010:partition_sched_domains_locked+0x483/0x600
Code: 01 48 85 d2 74 0d 48 83 05 29 3f f8 03 01 f3 48 0f bc c2 89 c0 48 9
RSP: 0018:ffffc90000fdbc58 EFLAGS: 00000202
RAX: 0000000100000003 RBX: ffff888100b3dfa0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000000000002fe80
RBP: ffff888100b3dfb0 R08: 0000000000000001 R09: 0000000000000000
R10: ffffc90000fdbcb0 R11: 0000000000000004 R12: 0000000000000002
R13: ffff888100a92b48 R14: 0000000000000000 R15: 0000000000000000
FS: 00007f44a5425740(0000) GS:ffff888237d80000(0000) knlGS:0000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000100030973 CR3: 000000010722c000 CR4: 00000000000006e0
Call Trace:
<TASK>
? show_regs+0x8c/0xa0
? __die_body+0x23/0xa0
? __die+0x3a/0x50
? page_fault_oops+0x1d2/0x5c0
? partition_sched_domains_locked+0x483/0x600
? search_module_extables+0x2a/0xb0
? search_exception_tables+0x67/0x90
? kernelmode_fixup_or_oops+0x144/0x1b0
? __bad_area_nosemaphore+0x211/0x360
? up_read+0x3b/0x50
? bad_area_nosemaphore+0x1a/0x30
? exc_page_fault+0x890/0xd90
? __lock_acquire.constprop.0+0x24f/0x8d0
? __lock_acquire.constprop.0+0x24f/0x8d0
? asm_exc_page_fault+0x26/0x30
? partition_sched_domains_locked+0x483/0x600
? partition_sched_domains_locked+0xf0/0x600
rebuild_sched_domains_locked+0x806/0xdc0
update_partition_sd_lb+0x118/0x130
cpuset_write_resmask+0xffc/0x1420
cgroup_file_write+0xb2/0x290
kernfs_fop_write_iter+0x194/0x290
new_sync_write+0xeb/0x160
vfs_write+0x16f/0x1d0
ksys_write+0x81/0x180
__x64_sys_write+0x21/0x30
x64_sys_call+0x2f25/0x4630
do_syscall_64+0x44/0xb0
entry_SYSCALL_64_after_hwframe+0x78/0xe2
RIP: 0033:0x7f44a553c887

It can be reproduced with cammands:
cd /sys/fs/cgroup/
mkdir test
cd test/
echo +cpuset > ../cgroup.subtree_control
echo root > cpuset.cpus.partition
cat /sys/fs/cgroup/cpuset.cpus.effective
0-3
echo 0-3 > cpuset.cpus // taking away all cpus from root

This issue is caused by the incorrect rebuilding of scheduling domains.
In this scenario, test/cpuset.cpus.partition should be an invalid root
and should not trigger the rebuilding of scheduling domains. When calling
update_parent_effective_cpumask with partcmd_update, if newmask is not
null, it should recheck newmask whether there are cpus is available
for parect/cs that has tasks.

Fixes: 0c7f293efc87 ("cgroup/cpuset: Add cpuset.cpus.exclusive.effective for v2")
Cc: stable@vger.kernel.org # v6.7+
Signed-off-by: Chen Ridong <chenridong@huawei.com>
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

1y ago

Manivannan Sadhasivam

cd06b713

scsi: ufs: core: Add a quirk for handling broken LSDBS field in controller capabilities register

1y ago

Alexander Gordeev

a3ca27c4

s390/mm: Prevent lowcore vs identity mapping overlap

1y ago

Linus Torvalds

b311c1b4

Merge tag '6.11-rc4-server-fixes' of git://git.samba.org/ksmbd

1y ago

Kent Overstreet

0b50b731

bcachefs: Fix refcounting in discard path

1y ago

Linus Torvalds

60f0560f

Merge tag 'nfs-for-6.11-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

1y ago

Catalin Marinas

75c8f387

Merge tag 'kvmarm-fixes-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into for-next/fixes

1y ago

Jiaxun Yang

1cb6ab44

MIPS: Loongson64: Set timer mode in cpu-probe

1y ago

Lai Jiangshan

98cc1730

workqueue: Remove incorrect "WARN_ON_ONCE(!list_empty(&worker->entry));" from dying worker

The commit 68f83057b913 ("workqueue: Reap workers via kthread_stop()
and remove detach_completion") changes the procedure of destroying
workers; the dying workers are kept in the cull_list in wake_dying_workers()
with the pool lock held and removed from the cull_list by the newly
added reap_dying_workers() without the pool lock.

This can cause a warning if the dying worker is wokenup earlier than
reaped as reported by Marc:

2024/07/23 18:01:21 [M83LP63]: [ 157.267727] ------------[ cut here ]------------
2024/07/23 18:01:21 [M83LP63]: [ 157.267735] WARNING: CPU: 21 PID: 725 at kernel/workqueue.c:3340 worker_thread+0x54e/0x558
2024/07/23 18:01:21 [M83LP63]: [ 157.267746] Modules linked in: binfmt_misc nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables sunrpc dm_service_time s390_trng vfio_ccw mdev vfio_iommu_type1 vfio sch_fq_codel
2024/07/23 18:01:21 [M83LP63]: loop dm_multipath configfs nfnetlink lcs ctcm fsm zfcp scsi_transport_fc ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common scm_block eadm_sch scsi_dh_rdac scsi_dh_emc scsi_dh_alua pkey zcrypt rng_core autofs4
2024/07/23 18:01:21 [M83LP63]: [ 157.267792] CPU: 21 PID: 725 Comm: kworker/dying Not tainted 6.10.0-rc2-00239-g68f83057b913 #95
2024/07/23 18:01:21 [M83LP63]: [ 157.267796] Hardware name: IBM 3906 M04 704 (LPAR)
2024/07/23 18:01:21 [M83LP63]: [ 157.267802] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:1 PM:0 RI:0 EA:3
2024/07/23 18:01:21 [M83LP63]: [ 157.267797] Krnl PSW : 0704d00180000000 000003d600fcd9fa (worker_thread+0x552/0x558)
2024/07/23 18:01:21 [M83LP63]: [ 157.267806] Krnl GPRS: 6479696e6700776f 000002c901b62780 000003d602493ec8 000002c914954600
2024/07/23 18:01:21 [M83LP63]: [ 157.267809] 0000000000000000 0000000000000008 000002c901a85400 000002c90719e840
2024/07/23 18:01:21 [M83LP63]: [ 157.267811] 000002c90719e880 000002c901a85420 000002c91127adf0 000002c901a85400
2024/07/23 18:01:21 [M83LP63]: [ 157.267813] 000002c914954600 0000000000000000 000003d600fcd772 000003560452bd98
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] Krnl Code: 000003d600fcd9ec: c0e500674262 brasl %r14,000003d601cb5eb0
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] 000003d600fcd9f2: a7f4ffc8 brc 15,000003d600fcd982
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] #000003d600fcd9f6: af000000 mc 0,0
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] >000003d600fcd9fa: a7f4fec2 brc 15,000003d600fcd77e
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] 000003d600fcd9fe: 0707 bcr 0,%r7
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] 000003d600fcda00: c00400682e10 brcl 0,000003d601cd3620
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] 000003d600fcda06: eb7ff0500024 stmg %r7,%r15,80(%r15)
2024/07/23 18:01:21 [M83LP63]: [ 157.267822] 000003d600fcda0c: b90400ef lgr %r14,%r15
2024/07/23 18:01:21 [M83LP63]: [ 157.267853] Call Trace:
2024/07/23 18:01:21 [M83LP63]: [ 157.267855] [<000003d600fcd9fa>] worker_thread+0x552/0x558
2024/07/23 18:01:21 [M83LP63]: [ 157.267859] ([<000003d600fcd772>] worker_thread+0x2ca/0x558)
2024/07/23 18:01:21 [M83LP63]: [ 157.267862] [<000003d600fd6c80>] kthread+0x120/0x128
2024/07/23 18:01:21 [M83LP63]: [ 157.267865] [<000003d600f5305c>] __ret_from_fork+0x3c/0x58
2024/07/23 18:01:21 [M83LP63]: [ 157.267868] [<000003d601cc746a>] ret_from_fork+0xa/0x30
2024/07/23 18:01:21 [M83LP63]: [ 157.267873] Last Breaking-Event-Address:
2024/07/23 18:01:21 [M83LP63]: [ 157.267874] [<000003d600fcd778>] worker_thread+0x2d0/0x558

Since the procedure of destroying workers is changed, the WARN_ON_ONCE()
becomes incorrect and should be removed.

Cc: Marc Hartmayer <mhartmay@linux.ibm.com>
Link: https://lore.kernel.org/lkml/87le1sjd2e.fsf@linux.ibm.com/
Reported-by: Marc Hartmayer <mhartmay@linux.ibm.com>
Fixes: 68f83057b913 ("workqueue: Reap workers via kthread_stop() and remove detach_completion")
Cc: stable@vger.kernel.org # v6.11+
Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

1y ago

Linus Torvalds

8400291e

Linux 6.11-rc1 v6.11-rc1

2y ago

Chaotian Jing

f03e94f2

scsi: core: Fix the return value of scsi_logical_block_count()

1y ago

Linus Torvalds

de9c2c66

Linux 6.11-rc2 v6.11-rc2

1y ago

Linus Torvalds

0108b7be

Merge tag 'for-linus-iommufd' of git://git.kernel.org/pub/scm/linux/kernel/git/jgg/iommufd

1y ago

Thorsten Blum

7c525ddd

ksmbd: Replace one-element arrays with flexible-array members

1y ago

Kent Overstreet

8ed823b1

bcachefs: Fix compat issue with old alloc_v4 keys

1y ago

Linus Torvalds

66ace9a8

Merge tag 'v6.11-rc4-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

1y ago

Trond Myklebust

f92214e4

NFS: Avoid unnecessary rescanning of the per-server delegation list

1y ago

Samuel Holland

f75c2355

arm64: Fix KASAN random tag seed initialization

1y ago

Marc Zyngier

3e6245eb

KVM: arm64: Make ICC_*SGI*_EL1 undef in the absence of a vGICv3

1y ago

Will Deacon

38f7e145

workqueue: Fix UBSAN 'subtraction overflow' error in shift_and_mask()

UBSAN reports the following 'subtraction overflow' error when booting
in a virtual machine on Android:

| Internal error: UBSAN: integer subtraction overflow: 00000000f2005515 [#1] PREEMPT SMP
| Modules linked in:
| CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.10.0-00006-g3cbe9e5abd46-dirty #4
| Hardware name: linux,dummy-virt (DT)
| pstate: 600000c5 (nZCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : cancel_delayed_work+0x34/0x44
| lr : cancel_delayed_work+0x2c/0x44
| sp : ffff80008002ba60
| x29: ffff80008002ba60 x28: 0000000000000000 x27: 0000000000000000
| x26: 0000000000000000 x25: 0000000000000000 x24: 0000000000000000
| x23: 0000000000000000 x22: 0000000000000000 x21: ffff1f65014cd3c0
| x20: ffffc0e84c9d0da0 x19: ffffc0e84cab3558 x18: ffff800080009058
| x17: 00000000247ee1f8 x16: 00000000247ee1f8 x15: 00000000bdcb279d
| x14: 0000000000000001 x13: 0000000000000075 x12: 00000a0000000000
| x11: ffff1f6501499018 x10: 00984901651fffff x9 : ffff5e7cc35af000
| x8 : 0000000000000001 x7 : 3d4d455453595342 x6 : 000000004e514553
| x5 : ffff1f6501499265 x4 : ffff1f650ff60b10 x3 : 0000000000000620
| x2 : ffff80008002ba78 x1 : 0000000000000000 x0 : 0000000000000000
| Call trace:
| cancel_delayed_work+0x34/0x44
| deferred_probe_extend_timeout+0x20/0x70
| driver_register+0xa8/0x110
| __platform_driver_register+0x28/0x3c
| syscon_init+0x24/0x38
| do_one_initcall+0xe4/0x338
| do_initcall_level+0xac/0x178
| do_initcalls+0x5c/0xa0
| do_basic_setup+0x20/0x30
| kernel_init_freeable+0x8c/0xf8
| kernel_init+0x28/0x1b4
| ret_from_fork+0x10/0x20
| Code: f9000fbf 97fffa2f 39400268 37100048 (d42aa2a0)
| ---[ end trace 0000000000000000 ]---
| Kernel panic - not syncing: UBSAN: integer subtraction overflow: Fatal exception

This is due to shift_and_mask() using a signed immediate to construct
the mask and being called with a shift of 31 (WORK_OFFQ_POOL_SHIFT) so
that it ends up decrementing from INT_MIN.

Use an unsigned constant '1U' to generate the mask in shift_and_mask().

Cc: Tejun Heo <tj@kernel.org>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Fixes: 1211f3b21c2a ("workqueue: Preserve OFFQ bits in cancel[_sync] paths")
Signed-off-by: Will Deacon <will@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>