commits

Syzkaller detected a use-after-free issue in ext4_insert_dentry that was
caused by out-of-bounds access due to incorrect splitting in do_split.

BUG: KASAN: use-after-free in ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
Write of size 251 at addr ffff888074572f14 by task syz-executor335/5847

CPU: 0 UID: 0 PID: 5847 Comm: syz-executor335 Not tainted 6.12.0-rc6-syzkaller-00318-ga9cda7c0ffed #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
Call Trace:
<TASK>
__dump_stack lib/dump_stack.c:94 [inline]
dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:377 [inline]
print_report+0x169/0x550 mm/kasan/report.c:488
kasan_report+0x143/0x180 mm/kasan/report.c:601
kasan_check_range+0x282/0x290 mm/kasan/generic.c:189
__asan_memcpy+0x40/0x70 mm/kasan/shadow.c:106
ext4_insert_dentry+0x36a/0x6d0 fs/ext4/namei.c:2109
add_dirent_to_buf+0x3d9/0x750 fs/ext4/namei.c:2154
make_indexed_dir+0xf98/0x1600 fs/ext4/namei.c:2351
ext4_add_entry+0x222a/0x25d0 fs/ext4/namei.c:2455
ext4_add_nondir+0x8d/0x290 fs/ext4/namei.c:2796
ext4_symlink+0x920/0xb50 fs/ext4/namei.c:3431
vfs_symlink+0x137/0x2e0 fs/namei.c:4615
do_symlinkat+0x222/0x3a0 fs/namei.c:4641
__do_sys_symlink fs/namei.c:4662 [inline]
__se_sys_symlink fs/namei.c:4660 [inline]
__x64_sys_symlink+0x7a/0x90 fs/namei.c:4660
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
</TASK>

The following loop is located right above 'if' statement.

for (i = count-1; i >= 0; i--) {
/* is more than half of this entry in 2nd half of the block? */
if (size + map[i].size/2 > blocksize/2)
break;
size += map[i].size;
move++;
}

'i' in this case could go down to -1, in which case sum of active entries
wouldn't exceed half the block size, but previous behaviour would also do
split in half if sum would exceed at the very last block, which in case of
having too many long name files in a single block could lead to
out-of-bounds access and following use-after-free.

Found by Linux Verification Center (linuxtesting.org) with Syzkaller.

Cc: stable@vger.kernel.org
Fixes: 5872331b3d91 ("ext4: fix potential negative array index in do_split()")
Signed-off-by: Artem Sadovnikov <a.sadovnikov@ispras.ru>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://patch.msgid.link/20250404082804.2567-3-a.sadovnikov@ispras.ru
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

9mo ago

Gao Xiang

be45319c

erofs: fix encoded extents handling

9mo ago

Linus Torvalds

7cdabafc

Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Hide get_vm_area() from MMUless builds

The function get_vm_area() is not defined when CONFIG_MMU is not
defined. Hide that function within #ifdef CONFIG_MMU.

- Fix output of synthetic events when they have dynamic strings

The print fmt of the synthetic event's format file use to have "%.*s"
for dynamic size strings even though the user space exported
arguments had only __get_str() macro that provided just a nul
terminated string. This was fixed so that user space could parse this
properly.

But the reason that it had "%.*s" was because internally it provided
the maximum size of the string as one of the arguments. The fix that
replaced "%.*s" with "%s" caused the trace output (when the kernel
reads the event) to write "(efault)" as it would now read the length
of the string as "%s".

As the string provided is always nul terminated, there's no reason
for the internal code to use "%.*s" anyway. Just remove the length
argument to match the "%s" that is now in the format.

- Fix the ftrace subops hash logic of the manager ops hash

The function_graph uses the ftrace subops code. The subops code is a
way to have a single ftrace_ops registered with ftrace to determine
what functions will call the ftrace_ops callback. More than one user
of function graph can register a ftrace_ops with it. The function
graph infrastructure will then add this ftrace_ops as a subops with
the main ftrace_ops it registers with ftrace. This is because the
functions will always call the function graph callback which in turn
calls the subops ftrace_ops callbacks.

The main ftrace_ops must add a callback to all the functions that the
subops want a callback from. When a subops is registered, it will
update the main ftrace_ops hash to include the functions it wants.
This is the logic that was broken.

The ftrace_ops hash has a "filter_hash" and a "notrace_hash" where
all the functions in the filter_hash but not in the notrace_hash are
attached by ftrace. The original logic would have the main ftrace_ops
filter_hash be a union of all the subops filter_hashes and the main
notrace_hash would be a intersect of all the subops filter hashes.
But this was incorrect because the notrace hash depends on the
filter_hash it is associated to and not the union of all
filter_hashes.

Instead, when a subops is added, just include all the functions of
the subops hash that are in its filter_hash but not in its
notrace_hash. The main subops hash should not use its notrace hash,
unless all of its subops hashes have an empty filter_hash (which
means to attach to all functions), and then, and only then, the main
ftrace_ops notrace hash can be the intersect of all the subops
hashes.

This not only fixes the bug, but also simplifies the code.

- Add a selftest to better test the subops filtering

Add a selftest that would catch the bug fixed by the above change.

- Fix extra newline printed in function tracing with retval

The function parameter code changed the output logic slightly and
called print_graph_retval() and also printed a newline. The
print_graph_retval() also prints a newline which caused blank lines
to be printed in the function graph tracer when retval was added.
This caused one of the selftests to fail if retvals were enabled.
Instead remove the new line output from print_graph_retval() and have
the callers always print the new line so that it doesn't have to do
special logic if it calls print_graph_retval() or not.

- Fix out-of-bound memory access in the runtime verifier

When rv_is_container_monitor() is called on the last entry on the
link list it references the next entry, which is the list head and
causes an out-of-bound memory access.

* tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
rv: Fix out-of-bound memory access in rv_is_container_monitor()
ftrace: Do not have print_graph_retval() add a newline
tracing/selftest: Add test to better test subops filtering of function graph
ftrace: Fix accounting of subop hashes
ftrace: Properly merge notrace hashes
tracing: Do not add length to print format in synthetic events
tracing: Hide get_vm_area() from MMUless builds

9mo ago

Masami Hiramatsu (Google)

ed471e19

memblock tests: Fix mutex related build error

9mo ago

Ojaswin Mujoo

ccad447a

ext4: make block validity check resistent to sb bh corruption

9mo ago

Gao Xiang

d385f15d

erofs: add __packed annotation to union(__le16..)

9mo ago

Linus Torvalds

b676ac48

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

9mo ago

Nam Cao

8d7861ac

rv: Fix out-of-bound memory access in rv_is_container_monitor()

9mo ago

Linus Torvalds

0af2f6be

Linux 6.15-rc1 v6.15-rc1

9mo ago

Gustavo A. R. Silva

7e50bbb1

ext4: avoid -Wflex-array-member-not-at-end warning

9mo ago

Sheng Yong

1595f153

erofs: set error to bio if file-backed IO fails

9mo ago

Linus Torvalds

ecd5d67a

Merge tag 'pwm/for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

9mo ago

Kumar Kartikeya Dwivedi

a650d389

bpf: Convert ringbuf map to rqspinlock

Convert the raw spinlock used by BPF ringbuf to rqspinlock. Currently,
we have an open syzbot report of a potential deadlock. In addition, the
ringbuf can fail to reserve spuriously under contention from NMI
context.

It is potentially attractive to enable unconstrained usage (incl. NMIs)
while ensuring no deadlocks manifest at runtime, perform the conversion
to rqspinlock to achieve this.

This change was benchmarked for BPF ringbuf's multi-producer contention
case on an Intel Sapphire Rapids server, with hyperthreading disabled
and performance governor turned on. 5 warm up runs were done for each
case before obtaining the results.

Before (raw_spinlock_t):

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1 11.440 ± 0.019M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2 2.706 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3 3.130 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4 2.472 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8 2.352 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 2.813 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 1.988 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.245 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 2.148 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.190 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.490 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.180 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.201 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.226 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.164 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 1.874 ± 0.001M/s (drops 0.000 ± 0.000M/s)

After (rqspinlock_t):

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1 11.078 ± 0.019M/s (drops 0.000 ± 0.000M/s) (-3.16%)
rb-libbpf nr_prod 2 2.801 ± 0.014M/s (drops 0.000 ± 0.000M/s) (3.51%)
rb-libbpf nr_prod 3 3.454 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.35%)
rb-libbpf nr_prod 4 2.567 ± 0.002M/s (drops 0.000 ± 0.000M/s) (3.84%)
rb-libbpf nr_prod 8 2.468 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.93%)
rb-libbpf nr_prod 12 2.510 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-10.77%)
rb-libbpf nr_prod 16 2.075 ± 0.001M/s (drops 0.000 ± 0.000M/s) (4.38%)
rb-libbpf nr_prod 20 2.640 ± 0.001M/s (drops 0.000 ± 0.000M/s) (17.59%)
rb-libbpf nr_prod 24 2.092 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-2.61%)
rb-libbpf nr_prod 28 2.426 ± 0.005M/s (drops 0.000 ± 0.000M/s) (10.78%)
rb-libbpf nr_prod 32 2.331 ± 0.004M/s (drops 0.000 ± 0.000M/s) (-6.39%)
rb-libbpf nr_prod 36 2.306 ± 0.003M/s (drops 0.000 ± 0.000M/s) (5.78%)
rb-libbpf nr_prod 40 2.178 ± 0.002M/s (drops 0.000 ± 0.000M/s) (-1.04%)
rb-libbpf nr_prod 44 2.293 ± 0.001M/s (drops 0.000 ± 0.000M/s) (3.01%)
rb-libbpf nr_prod 48 2.022 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-6.56%)
rb-libbpf nr_prod 52 1.809 ± 0.001M/s (drops 0.000 ± 0.000M/s) (-3.47%)

There's a fair amount of noise in the benchmark, with numbers on reruns
going up and down by 10%, so all changes are in the range of this
disturbance, and we see no major regressions.

Reported-by: syzbot+850aaf14624dc0c6d366@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000004aa700061379547e@google.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20250411101759.4061366-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

9mo ago

Steven Rostedt

485acd20

ftrace: Do not have print_graph_retval() add a newline

9mo ago

Thomas Weißschuh

0efdedb3

tools/include: make uapi/linux/types.h usable from assembly

9mo ago

Tom Vierjahn

ce7e8a65

Documentation: ext4: Add fields to ext4_super_block documentation

9mo ago

Linus Torvalds

3bde70a2

Merge tag 'v6.15-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

9mo ago

Uwe Kleine-König

a85e08a0

pwm: axi-pwmgen: Let .round_waveform_tohw() signal when request was rounded up

9mo ago

Kumar Kartikeya Dwivedi

2f41503d

bpf: Convert queue_stack map to rqspinlock

9mo ago

Steven Rostedt

a1fc89d4

tracing/selftest: Add test to better test subops filtering of function graph

9mo ago

Linus Torvalds

71032925

Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

9mo ago

Jann Horn

642335f3

ext4: don't treat fhandle lookup of ea_inode as FS corruption

9mo ago

Linus Torvalds

5d749923

Merge tag 'pci-v6.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

9mo ago

Steve French

56c283b9

smb3: Add defines for two new FileSystemAttributes

9mo ago

Uwe Kleine-König

fda6e003

pwm: stm32: Search an appropriate duty_cycle if period cannot be modified

9mo ago

Kumar Kartikeya Dwivedi

92b90f78

bpf: Use architecture provided res_smp_cond_load_acquire

9mo ago

Steven Rostedt

0ae6b8ce

ftrace: Fix accounting of subop hashes

The function graph infrastructure uses ftrace to hook to functions. It has
a single ftrace_ops to manage all the users of function graph. Each
individual user (tracing, bpf, fprobes, etc) has its own ftrace_ops to
track the functions it will have its callback called from. These
ftrace_ops are "subops" to the main ftrace_ops of the function graph
infrastructure.

Each ftrace_ops has a filter_hash and a notrace_hash that is defined as:

Only trace functions that are in the filter_hash but not in the
notrace_hash.

If the filter_hash is empty, it means to trace all functions.
If the notrace_hash is empty, it means do not disable any function.

The function graph main ftrace_ops needs to be a superset containing all
the functions to be traced by all the subops it has. The algorithm to
perform this merge was incorrect.

When the first subops was added to the main ops, it simply made the main
ops a copy of the subops (same filter_hash and notrace_hash).

When a second ops was added, it joined the new subops filter_hash with the
main ops filter_hash as a union of the two sets. The intersect between the
new subops notrace_hash and the main ops notrace_hash was created as the
new notrace_hash of the main ops.

The issue here is that it would then start tracing functions than no
subops were tracing. For example if you had two subops that had:

subops 1:

filter_hash = '*sched*' # trace all functions with "sched" in it
notrace_hash = '*time*' # except do not trace functions with "time"

subops 2:

filter_hash = '*lock*' # trace all functions with "lock" in it
notrace_hash = '*clock*' # except do not trace functions with "clock"

The intersect of '*time*' functions with '*clock*' functions could be the
empty set. That means the main ops will be tracing all functions with
'*time*' and all "*clock*" in it!

Instead, modify the algorithm to be a bit simpler and correct.

First, when adding a new subops, even if it's the first one, do not add
the notrace_hash if the filter_hash is not empty. Instead, just add the
functions that are in the filter_hash of the subops but not in the
notrace_hash of the subops into the main ops filter_hash. There's no
reason to add anything to the main ops notrace_hash.

The notrace_hash of the main ops should only be non empty iff all subops
filter_hashes are empty (meaning to trace all functions) and all subops
notrace_hashes include the same functions.

That is, the main ops notrace_hash is empty if any subops filter_hash is
non empty.

The main ops notrace_hash only has content in it if all subops
filter_hashes are empty, and the content are only functions that intersect
all the subops notrace_hashes. If any subops notrace_hash is empty, then
so is the main ops notrace_hash.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Shuah Khan <skhan@linuxfoundation.org>
Cc: Andy Chiu <andybnac@gmail.com>
Link: https://lore.kernel.org/20250409152720.216356767@goodmis.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

9mo ago

Linus Torvalds

59f392fa

Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

9mo ago

Len Brown

03e00e37

tools/power turbostat: v2025.05.06

9mo ago

Acs, Jakub

d5e20677

ext4: fix OOB read when checking dotdot dir

Mounting a corrupted filesystem with directory which contains '.' dir
entry with rec_len == block size results in out-of-bounds read (later
on, when the corrupted directory is removed).

ext4_empty_dir() assumes every ext4 directory contains at least '.'
and '..' as directory entries in the first data block. It first loads
the '.' dir entry, performs sanity checks by calling ext4_check_dir_entry()
and then uses its rec_len member to compute the location of '..' dir
entry (in ext4_next_entry). It assumes the '..' dir entry fits into the
same data block.

If the rec_len of '.' is precisely one block (4KB), it slips through the
sanity checks (it is considered the last directory entry in the data
block) and leaves "struct ext4_dir_entry_2 *de" point exactly past the
memory slot allocated to the data block. The following call to
ext4_check_dir_entry() on new value of de then dereferences this pointer
which results in out-of-bounds mem access.

Fix this by extending __ext4_check_dir_entry() to check for '.' dir
entries that reach the end of data block. Make sure to ignore the phony
dir entries for checksum (by checking name_len for non-zero).

Note: This is reported by KASAN as use-after-free in case another
structure was recently freed from the slot past the bound, but it is
really an OOB read.

This issue was found by syzkaller tool.

Call Trace:
[ 38.594108] BUG: KASAN: slab-use-after-free in __ext4_check_dir_entry+0x67e/0x710
[ 38.594649] Read of size 2 at addr ffff88802b41a004 by task syz-executor/5375
[ 38.595158]
[ 38.595288] CPU: 0 UID: 0 PID: 5375 Comm: syz-executor Not tainted 6.14.0-rc7 #1
[ 38.595298] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
[ 38.595304] Call Trace:
[ 38.595308] <TASK>
[ 38.595311] dump_stack_lvl+0xa7/0xd0
[ 38.595325] print_address_description.constprop.0+0x2c/0x3f0
[ 38.595339] ? __ext4_check_dir_entry+0x67e/0x710
[ 38.595349] print_report+0xaa/0x250
[ 38.595359] ? __ext4_check_dir_entry+0x67e/0x710
[ 38.595368] ? kasan_addr_to_slab+0x9/0x90
[ 38.595378] kasan_report+0xab/0xe0
[ 38.595389] ? __ext4_check_dir_entry+0x67e/0x710
[ 38.595400] __ext4_check_dir_entry+0x67e/0x710
[ 38.595410] ext4_empty_dir+0x465/0x990
[ 38.595421] ? __pfx_ext4_empty_dir+0x10/0x10
[ 38.595432] ext4_rmdir.part.0+0x29a/0xd10
[ 38.595441] ? __dquot_initialize+0x2a7/0xbf0
[ 38.595455] ? __pfx_ext4_rmdir.part.0+0x10/0x10
[ 38.595464] ? __pfx___dquot_initialize+0x10/0x10
[ 38.595478] ? down_write+0xdb/0x140
[ 38.595487] ? __pfx_down_write+0x10/0x10
[ 38.595497] ext4_rmdir+0xee/0x140
[ 38.595506] vfs_rmdir+0x209/0x670
[ 38.595517] ? lookup_one_qstr_excl+0x3b/0x190
[ 38.595529] do_rmdir+0x363/0x3c0
[ 38.595537] ? __pfx_do_rmdir+0x10/0x10
[ 38.595544] ? strncpy_from_user+0x1ff/0x2e0
[ 38.595561] __x64_sys_unlinkat+0xf0/0x130
[ 38.595570] do_syscall_64+0x5b/0x180
[ 38.595583] entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fixes: ac27a0ec112a0 ("[PATCH] ext4: initial copy of files from ext3")
Signed-off-by: Jakub Acs <acsjakub@amazon.de>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: linux-ext4@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: Mahmoud Adam <mngyadam@amazon.com>
Cc: stable@vger.kernel.org
Cc: security@kernel.org
Link: https://patch.msgid.link/b3ae36a6794c4a01944c7d70b403db5b@amazon.de
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

10mo ago

Linus Torvalds

e618ee89

Merge tag 'spi-fix-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

9mo ago

Zhangfei Gao

c8ba3f8a

PCI: Run quirk_huawei_pcie_sva() before arm_smmu_probe_device()

9mo ago

Pali Rohár

ef86ab13

cifs: Fix querying of WSL CHR and BLK reparse points over SMB1

9mo ago

Uwe Kleine-König

00e53d0f

pwm: Let pwm_set_waveform() succeed even if lowlevel driver rounded up

9mo ago

Kumar Kartikeya Dwivedi

1ddb9ad2

selftests/bpf: Make res_spin_lock AA test condition stronger

9mo ago

Andy Chiu

04a80a34

ftrace: Properly merge notrace hashes

9mo ago

Linus Torvalds

dda88878

Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Bard Liao

fcc0f169

ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE

9mo ago

Len Brown

ec4acd31

tools/power turbostat: disable "cpuidle" invocation counters, by default

9mo ago

Nicolas Bretz

d7b0befd

ext4: on a remount, only log the ro or r/w state when it has changed

10mo ago

Linus Torvalds

2f3e5ef2

Merge tag 'ata-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

9mo ago

Kevin Hao

82bedbfe

spi: fsl-spi: Remove redundant probe error message

9mo ago

Pali Rohár

56c0bea5

cifs: Split parse_reparse_point callback to functions: get buffer and parse buffer

9mo ago

Uwe Kleine-König

928446a5

pwm: fsl-ftm: Handle clk_get_rate() returning 0

9mo ago

Alexei Starovoitov

7bbb38f1

Merge branch 'support-skf_net_off-and-skf_ll_off-on-skb-frags'

9mo ago

Steven Rostedt

e1a453a5

tracing: Do not add length to print format in synthetic events

9mo ago

Linus Torvalds

302deb10

Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Yeoreum Yun

a3c3c666

perf/core: Fix child_total_time_enabled accounting bug at task exit

The perf events code fails to account for total_time_enabled of
inactive events.

Here is a failure case for accounting total_time_enabled for
CPU PMU events:

sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
...

armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
` ` `
` ` > total_time_running with child
` > total_time_enabled with child
> count with child

Performance counter stats for 'stress-ng --pthread=2 -t 2s':

1,138,698,008 armv8_pmuv3_0/event=0x08/ (94.99%)
1,826,791,390 armv8_pmuv3_1/event=0x08/ (43.47%)

The two events above are opened on two different CPU PMUs, for example,
each event is opened for a cluster in an Arm big.LITTLE system, they
will never run on the same CPU. In theory, the total enabled time should
be same for both events, as two events are opened and closed together.

As the result show, the two events' total enabled time including
child event is different (2289429840 vs 1950025700).

This is because child events are not accounted properly
if a event is INACTIVE state when the task exits:

perf_event_exit_event()
`> perf_remove_from_context()
`> __perf_remove_from_context()
`> perf_child_detach() -> Accumulate child_total_time_enabled
`> list_del_event() -> Update child event's time

The problem is the time accumulation happens prior to child event's
time updating. Thus, it misses to account the last period's time when
the event exits.

The perf core layer follows the rule that timekeeping is tied to state
change. To address the issue, make __perf_remove_from_context()
handle the task exit case by passing 'DETACH_EXIT' to it and
invoke perf_event_state() for state alongside with accounting the time.

Then, perf_child_detach() populates the time into the parent's time metrics.

After this patch, the bug is fixed:

sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 10s
...
armv8_pmuv3_0/event=0x08/: 15396770398 32157963940 21898169000
armv8_pmuv3_1/event=0x08/: 22428964974 32157963940 10259794940

Performance counter stats for 'stress-ng --pthread=2 -t 10s':

15,396,770,398 armv8_pmuv3_0/event=0x08/ (68.10%)
22,428,964,974 armv8_pmuv3_1/event=0x08/ (31.90%)

[ mingo: Clarified the changelog. ]

Fixes: ef54c1a476aef ("perf: Rework perf_event_exit_event()")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250326082003.1630986-1-yeoreum.yun@arm.com

9mo ago

Bard Liao

08ae0d61

soundwire: take in count the bandwidth of a prepared stream

10mo ago

Len Brown

99463389

tools/power turbostat: re-factor sysfs code

9mo ago

Zhang Yi

129245cf

ext4: correct the error handle in ext4_fallocate()

10mo ago

Linus Torvalds

ff885625

Merge tag 'block-6.15-20250411' of git://git.kernel.dk/linux

9mo ago

Wentao Liang

8d46a270

ata: sata_sx4: Add error handling in pdc20621_i2c_read()

9mo ago

Kevin Hao

5d07ab2a

spi: fsl-qspi: Fix double cleanup in probe error path

9mo ago

Pali Rohár

12193b98

cifs: Improve handling of name surrogate reparse points in reparse.c

9mo ago

Linux 6.15-rc2 v6.15-rc2

8ffd015d

Linus Torvalds

9mo

Merge tag 'erofs-for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

004a365e

Linus Torvalds

9mo

Merge tag 'ext4_for_linus-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

5aaaedb0

Linus Torvalds

9mo

erofs: remove duplicate code

f5ffef98

Bo Liu

9mo

Merge tag 'fixes-2025-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

051ea726

Linus Torvalds

9mo

ext4: fix off-by-one error in do_split

94824ac9

Artem Sadovnikov

9mo

erofs: fix encoded extents handling

be45319c

Gao Xiang

9mo

Merge tag 'trace-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

7cdabafc

Linus Torvalds

9mo

memblock tests: Fix mutex related build error

ed471e19

Masami Hiramatsu (Google)

9mo

ext4: make block validity check resistent to sb bh corruption

ccad447a

Ojaswin Mujoo

9mo

erofs: add __packed annotation to union(__le16..)

d385f15d

Gao Xiang

9mo

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

b676ac48

Linus Torvalds

9mo

rv: Fix out-of-bound memory access in rv_is_container_monitor()

8d7861ac

Nam Cao

9mo

Linux 6.15-rc1 v6.15-rc1

0af2f6be

Linus Torvalds

9mo

ext4: avoid -Wflex-array-member-not-at-end warning

7e50bbb1

Gustavo A. R. Silva

9mo

erofs: set error to bio if file-backed IO fails

1595f153

Sheng Yong

9mo

Merge tag 'pwm/for-6.15-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ukleinek/linux

ecd5d67a

Linus Torvalds

9mo

bpf: Convert ringbuf map to rqspinlock

a650d389

Kumar Kartikeya Dwivedi

9mo

ftrace: Do not have print_graph_retval() add a newline

485acd20

Steven Rostedt

9mo

tools/include: make uapi/linux/types.h usable from assembly

0efdedb3

Thomas Weißschuh

9mo

Documentation: ext4: Add fields to ext4_super_block documentation

ce7e8a65

Tom Vierjahn

9mo

Merge tag 'v6.15-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

3bde70a2

Linus Torvalds

9mo

pwm: axi-pwmgen: Let .round_waveform_tohw() signal when request was rounded up

a85e08a0

Uwe Kleine-König

9mo

bpf: Convert queue_stack map to rqspinlock

2f41503d

Kumar Kartikeya Dwivedi

9mo

tracing/selftest: Add test to better test subops filtering of function graph

a1fc89d4

Steven Rostedt

9mo

Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

71032925

Linus Torvalds

9mo

ext4: don't treat fhandle lookup of ea_inode as FS corruption

642335f3

Jann Horn

9mo

Merge tag 'pci-v6.15-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

5d749923

Linus Torvalds

9mo

smb3: Add defines for two new FileSystemAttributes

56c283b9

Steve French

9mo

pwm: stm32: Search an appropriate duty_cycle if period cannot be modified

fda6e003

Uwe Kleine-König

9mo

bpf: Use architecture provided res_smp_cond_load_acquire

92b90f78

Kumar Kartikeya Dwivedi

9mo

ftrace: Fix accounting of subop hashes

0ae6b8ce

Steven Rostedt

9mo

Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

59f392fa

Linus Torvalds

9mo

tools/power turbostat: v2025.05.06

03e00e37

Len Brown

9mo

ext4: fix OOB read when checking dotdot dir

d5e20677

Acs, Jakub

10mo

Merge tag 'spi-fix-v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

e618ee89

Linus Torvalds

9mo

PCI: Run quirk_huawei_pcie_sva() before arm_smmu_probe_device()

c8ba3f8a

Zhangfei Gao

9mo

cifs: Fix querying of WSL CHR and BLK reparse points over SMB1

ef86ab13

Pali Rohár

9mo

pwm: Let pwm_set_waveform() succeed even if lowlevel driver rounded up

00e53d0f

Uwe Kleine-König

9mo

selftests/bpf: Make res_spin_lock AA test condition stronger

1ddb9ad2

Kumar Kartikeya Dwivedi

9mo

ftrace: Properly merge notrace hashes

04a80a34

Andy Chiu

9mo

Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

dda88878

Linus Torvalds

9mo

ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE

fcc0f169

Bard Liao

9mo

tools/power turbostat: disable "cpuidle" invocation counters, by default

ec4acd31

Len Brown

9mo

ext4: on a remount, only log the ro or r/w state when it has changed

d7b0befd

Nicolas Bretz

10mo

Merge tag 'ata-6.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

2f3e5ef2

Linus Torvalds

9mo

spi: fsl-spi: Remove redundant probe error message

82bedbfe

Kevin Hao

9mo

cifs: Split parse_reparse_point callback to functions: get buffer and parse buffer

56c0bea5

Pali Rohár

9mo

pwm: fsl-ftm: Handle clk_get_rate() returning 0

928446a5

Uwe Kleine-König

9mo

Merge branch 'support-skf_net_off-and-skf_ll_off-on-skb-frags'

7bbb38f1

Alexei Starovoitov

9mo

tracing: Do not add length to print format in synthetic events

The following causes a vsnprintf fault:

# echo 's:wake_lat char[] wakee; u64 delta;' >> /sys/kernel/tracing/dynamic_events
# echo 'hist:keys=pid:ts=common_timestamp.usecs if !(common_flags & 0x18)' > /sys/kernel/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:delta=common_timestamp.usecs-$ts:onmatch(sched.sched_waking).trace(wake_lat,next_comm,$delta)' > /sys/kernel/tracing/events/sched/sched_switch/trigger

Because the synthetic event's "wakee" field is created as a dynamic string
(even though the string copied is not). The print format to print the
dynamic string changed from "%*s" to "%s" because another location
(__set_synth_event_print_fmt()) exported this to user space, and user
space did not need that. But it is still used in print_synth_event(), and
the output looks like:

<idle>-0 [001] d..5. 193.428167: wake_lat: wakee=(efault)sshd-sessiondelta=155
sshd-session-879 [001] d..5. 193.811080: wake_lat: wakee=(efault)kworker/u34:5delta=58
<idle>-0 [002] d..5. 193.811198: wake_lat: wakee=(efault)bashdelta=91
bash-880 [002] d..5. 193.811371: wake_lat: wakee=(efault)kworker/u35:2delta=21
<idle>-0 [001] d..5. 193.811516: wake_lat: wakee=(efault)sshd-sessiondelta=129
sshd-session-879 [001] d..5. 193.967576: wake_lat: wakee=(efault)kworker/u34:5delta=50

The length isn't needed as the string is always nul terminated. Just print
the string and not add the length (which was hard coded to the max string
length anyway).

Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/20250407154139.69955768@gandalf.local.home
Fixes: 4d38328eb442d ("tracing: Fix synth event printk format for str fields");
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>

e1a453a5

Steven Rostedt

9mo

Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

302deb10

Linus Torvalds

9mo

perf/core: Fix child_total_time_enabled accounting bug at task exit

a3c3c666

Yeoreum Yun

9mo

soundwire: take in count the bandwidth of a prepared stream

08ae0d61

Bard Liao

10mo

tools/power turbostat: re-factor sysfs code

99463389

Len Brown

9mo

ext4: correct the error handle in ext4_fallocate()

129245cf

Zhang Yi

10mo

Merge tag 'block-6.15-20250411' of git://git.kernel.dk/linux

ff885625

Linus Torvalds

9mo

ata: sata_sx4: Add error handling in pdc20621_i2c_read()

8d46a270

Wentao Liang

9mo

spi: fsl-qspi: Fix double cleanup in probe error path

5d07ab2a

Kevin Hao

9mo

cifs: Improve handling of name surrogate reparse points in reparse.c

12193b98

Pali Rohár

9mo