commits

Do not mix class->size and object size during offsets/sizes calculation in
zs_obj_write(). Size classes can merge into clusters, based on
objects-per-zspage and pages-per-zspage characteristics, so some size
classes can store objects smaller than class->size. This becomes
problematic when object size is much smaller than class->size. zsmalloc
can falsely decide that object spans two physical pages, because a larger
class->size value is used for that check, while the actual object is much
smaller and fits the free space of the first physical page, so there is
nothing to write to the second page and memcpy() size calculation
underflows.

Unable to handle kernel paging request at virtual address ffffc00081ff4000
pc : __memcpy+0x10/0x24
lr : zs_obj_write+0x1b0/0x1d0 [zsmalloc]
Call trace:
__memcpy+0x10/0x24 (P)
zram_write_page+0x150/0x4fc [zram]
zram_submit_bio+0x5e0/0x6a4 [zram]
__submit_bio+0x168/0x220
submit_bio_noacct_nocheck+0x128/0x2c8
submit_bio_noacct+0x19c/0x2f8

This is mostly seen on system with larger page-sizes, because size class
cluters of such systems hold wider size ranges than on 4K PAGE_SIZE
systems.

Assume a 16K PAGE_SIZE system, a write of 820 bytes object to a 864-bytes
size class at offset 15560. 15560 + 864 is more than 16384 so zsmalloc
attempts to memcpy() it to two physical pages. However, 16384 - 15560 =
824 which is more than 820, so the object in fact doesn't span two
physical pages, and there is no data to write to the second physical page.

We always know the exact size in bytes of the object that we are about to
write (store), so use it instead of class->size.

Link: https://lkml.kernel.org/r/20250507054312.4135983-1-senozhatsky@chromium.org
Fixes: 44f76413496e ("zsmalloc: introduce new object mapping API")
Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reported-by: Igor Belousov <igor.b@beldev.am>
Tested-by: Igor Belousov <igor.b@beldev.am>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

8mo ago

Thomas Weißschuh

0efdedb3

tools/include: make uapi/linux/types.h usable from assembly

9mo ago

Claudiu Beznea

55a387eb

phy: renesas: rcar-gen3-usb2: Lock around hardware registers and driver data

The phy-rcar-gen3-usb2 driver exposes four individual PHYs that are
requested and configured by PHY users. The struct phy_ops APIs access the
same set of registers to configure all PHYs. Additionally, PHY settings can
be modified through sysfs or an IRQ handler. While some struct phy_ops APIs
are protected by a driver-wide mutex, others rely on individual
PHY-specific mutexes.

This approach can lead to various issues, including:
1/ the IRQ handler may interrupt PHY settings in progress, racing with
hardware configuration protected by a mutex lock
2/ due to msleep(20) in rcar_gen3_init_otg(), while a configuration thread
suspends to wait for the delay, another thread may try to configure
another PHY (with phy_init() + phy_power_on()); re-running the
phy_init() goes to the exact same configuration code, re-running the
same hardware configuration on the same set of registers (and bits)
which might impact the result of the msleep for the 1st configuring
thread
3/ sysfs can configure the hardware (though role_store()) and it can
still race with the phy_init()/phy_power_on() APIs calling into the
drivers struct phy_ops

To address these issues, add a spinlock to protect hardware register access
and driver private data structures (e.g., calls to
rcar_gen3_is_any_rphy_initialized()). Checking driver-specific data remains
necessary as all PHY instances share common settings. With this change,
the existing mutex protection is removed and the cleanup.h helpers are
used.

While at it, to keep the code simpler, do not skip
regulator_enable()/regulator_disable() APIs in
rcar_gen3_phy_usb2_power_on()/rcar_gen3_phy_usb2_power_off() as the
regulators enable/disable operations are reference counted anyway.

Fixes: f3b5a8d9b50d ("phy: rcar-gen3-usb2: Add R-Car Gen3 USB2 PHY driver")
Cc: stable@vger.kernel.org
Reviewed-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Tested-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
Link: https://lore.kernel.org/r/20250507125032.565017-4-claudiu.beznea.uj@bp.renesas.com
Signed-off-by: Vinod Koul <vkoul@kernel.org>

8mo ago

Shuai Xue

a409e919

dmaengine: idxd: Refactor remove call with idxd_cleanup() helper

8mo ago

Linus Torvalds

4bcaa590

Merge tag 'perf-urgent-2025-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8mo ago

Lukas Bulwahn

03680913

x86/mm: Remove duplicated word in warning message

8mo ago

Marc Zyngier

fb0ea6e4

irqchip: Drop MSI_CHIP_FLAG_SET_ACK from unsuspecting MSI drivers

8mo ago

Kirill A. Shutemov

fefc0751

mm/page_alloc: fix race condition in unaccepted memory handling

8mo ago

Linus Torvalds

71032925

Merge tag 'turbostat-2025.05.06' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

9mo ago

Claudiu Beznea

de76809f

phy: renesas: rcar-gen3-usb2: Move IRQ request in probe

8mo ago

Shuai Xue

d5449ff1

dmaengine: idxd: Add missing idxd cleanup to fix memory leak in remove call

8mo ago

Linus Torvalds

c586c97d

Merge tag 'loongarch-fixes-6.15-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

8mo ago

Adrian Hunter

99bcd91f

perf/x86/intel: Fix segfault with PEBS-via-PT with sample_freq

8mo ago

Yazen Ghannam

24ee8d94

x86/CPU/AMD: Add X86_FEATURE_ZEN6

8mo ago

Linus Torvalds

82f2b0b9

Linux 6.15-rc6 v6.15-rc6

8mo ago

Kirill A. Shutemov

23fa022a

mm/page_alloc: ensure try_alloc_pages() plays well with unaccepted memory

8mo ago

Linus Torvalds

59f392fa

Merge tag 'soundwire-6.15-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire

9mo ago

Len Brown

03e00e37

tools/power turbostat: v2025.05.06

9mo ago

Claudiu Beznea

54c4c587

phy: renesas: rcar-gen3-usb2: Fix role detection on unbind/bind

8mo ago

Shuai Xue

90022b3a

dmaengine: idxd: fix memory leak in error handling path of idxd_pci_probe

8mo ago

Linus Torvalds

a1317e1c

Merge tag 'i2c-for-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

8mo ago

Tiezhu Yang

12614f79

LoongArch: uprobes: Remove redundant code about resume_era

8mo ago

Ashish Kalra

82b7f88f

x86/sev: Make sure pages are not skipped during kdump

8mo ago

Linus Torvalds

cd802e7e

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull KVM fixes from Paolo Bonzini:
"ARM:

- Avoid use of uninitialized memcache pointer in user_mem_abort()

- Always set HCR_EL2.xMO bits when running in VHE, allowing
interrupts to be taken while TGE=0 and fixing an ugly bug on
AmpereOne that occurs when taking an interrupt while clearing the
xMO bits (AC03_CPU_36)

- Prevent VMMs from hiding support for AArch64 at any EL virtualized
by KVM

- Save/restore the host value for HCRX_EL2 instead of restoring an
incorrect fixed value

- Make host_stage2_set_owner_locked() check that the entire requested
range is memory rather than just the first page

RISC-V:

- Add missing reset of smstateen CSRs

x86:

- Forcibly leave SMM on SHUTDOWN interception on AMD CPUs to avoid
causing problems due to KVM stuffing INIT on SHUTDOWN (KVM needs to
sanitize the VMCB as its state is undefined after SHUTDOWN,
emulating INIT is the least awful choice).

- Track the valid sync/dirty fields in kvm_run as a u64 to ensure KVM
KVM doesn't goof a sanity check in the future.

- Free obsolete roots when (re)loading the MMU to fix a bug where
pre-faulting memory can get stuck due to always encountering a
stale root.

- When dumping GHCB state, use KVM's snapshot instead of the raw GHCB
page to print state, so that KVM doesn't print stale/wrong
information.

- When changing memory attributes (e.g. shared <=> private), add
potential hugepage ranges to the mmu_invalidate_range_{start,end}
set so that KVM doesn't create a shared/private hugepage when the
the corresponding attributes will become mixed (the attributes are
commited *after* KVM finishes the invalidation).

- Rework the SRSO mitigation to enable BP_SPEC_REDUCE only when KVM
has at least one active VM. Effectively BP_SPEC_REDUCE when KVM is
loaded led to very measurable performance regressions for non-KVM
workloads"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Set/clear SRSO's BP_SPEC_REDUCE on 0 <=> 1 VM count transitions
KVM: arm64: Fix memory check in host_stage2_set_owner_locked()
KVM: arm64: Kill HCRX_HOST_FLAGS
KVM: arm64: Properly save/restore HCRX_EL2
KVM: arm64: selftest: Don't try to disable AArch64 support
KVM: arm64: Prevent userspace from disabling AArch64 support at any virtualisable EL
KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode
KVM: arm64: Fix uninitialized memcache pointer in user_mem_abort()
KVM: x86/mmu: Prevent installing hugepages when mem attributes are changing
KVM: SVM: Update dump_ghcb() to use the GHCB snapshot fields
KVM: RISC-V: reset smstateen CSRs
KVM: x86/mmu: Check and free obsolete roots in kvm_mmu_reload()
KVM: x86: Check that the high 32bits are clear in kvm_arch_vcpu_ioctl_run()
KVM: SVM: Forcibly leave SMM mode on SHUTDOWN interception

8mo ago

Lorenzo Stoakes

d55582d6

MAINTAINERS: add mm GUP section

8mo ago

Linus Torvalds

dda88878

Merge tag 'perf-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Bard Liao

fcc0f169

ASoC: SOF: Intel: Let SND_SOF_SOF_HDA_SDW_BPT select SND_HDA_EXT_CORE

9mo ago

Len Brown

ec4acd31

tools/power turbostat: disable "cpuidle" invocation counters, by default

9mo ago

Dan Carpenter

83c17847

phy: tegra: xusb: remove a stray unlock

8mo ago

Shuai Xue

46a5cca7

dmaengine: idxd: fix memory leak in error handling path of idxd_alloc

8mo ago

Linus Torvalds

172a9d94

Merge tag '6.15-rc6-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

8mo ago

Wolfram Sang

6c72fc56

Merge tag 'i2c-host-fixes-6.15-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

8mo ago

Tiezhu Yang

0b326b23

LoongArch: uprobes: Remove user_{en,dis}able_single_step()

8mo ago

Ashish Kalra

d2062cc1

x86/sev: Do not touch VMSA pages during SNP guest memory kdump

8mo ago

Linus Torvalds

ecb9194d

Merge tag 'mips-fixes_6.15_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux

8mo ago

Paolo Bonzini

add20321

Merge tag 'kvm-x86-fixes-6.15-rcN' of https://github.com/kvm-x86/linux into HEAD

8mo ago

David Wang

0ae0227f

mm/codetag: move tag retrieval back upfront in __free_pages()

8mo ago

Linus Torvalds

302deb10

Merge tag 'sched-urgent-2025-04-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9mo ago

Yeoreum Yun

a3c3c666

perf/core: Fix child_total_time_enabled accounting bug at task exit

The perf events code fails to account for total_time_enabled of
inactive events.

Here is a failure case for accounting total_time_enabled for
CPU PMU events:

sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 2s
...

armv8_pmuv3_0/event=0x08/: 1138698008 2289429840 2174835740
armv8_pmuv3_1/event=0x08/: 1826791390 1950025700 847648440
` ` `
` ` > total_time_running with child
` > total_time_enabled with child
> count with child

Performance counter stats for 'stress-ng --pthread=2 -t 2s':

1,138,698,008 armv8_pmuv3_0/event=0x08/ (94.99%)
1,826,791,390 armv8_pmuv3_1/event=0x08/ (43.47%)

The two events above are opened on two different CPU PMUs, for example,
each event is opened for a cluster in an Arm big.LITTLE system, they
will never run on the same CPU. In theory, the total enabled time should
be same for both events, as two events are opened and closed together.

As the result show, the two events' total enabled time including
child event is different (2289429840 vs 1950025700).

This is because child events are not accounted properly
if a event is INACTIVE state when the task exits:

perf_event_exit_event()
`> perf_remove_from_context()
`> __perf_remove_from_context()
`> perf_child_detach() -> Accumulate child_total_time_enabled
`> list_del_event() -> Update child event's time

The problem is the time accumulation happens prior to child event's
time updating. Thus, it misses to account the last period's time when
the event exits.

The perf core layer follows the rule that timekeeping is tied to state
change. To address the issue, make __perf_remove_from_context()
handle the task exit case by passing 'DETACH_EXIT' to it and
invoke perf_event_state() for state alongside with accounting the time.

Then, perf_child_detach() populates the time into the parent's time metrics.

After this patch, the bug is fixed:

sudo ./perf stat -vvv -e armv8_pmuv3_0/event=0x08/ -e armv8_pmuv3_1/event=0x08/ -- stress-ng --pthread=2 -t 10s
...
armv8_pmuv3_0/event=0x08/: 15396770398 32157963940 21898169000
armv8_pmuv3_1/event=0x08/: 22428964974 32157963940 10259794940

Performance counter stats for 'stress-ng --pthread=2 -t 10s':

15,396,770,398 armv8_pmuv3_0/event=0x08/ (68.10%)
22,428,964,974 armv8_pmuv3_1/event=0x08/ (31.90%)

[ mingo: Clarified the changelog. ]

Fixes: ef54c1a476aef ("perf: Rework perf_event_exit_event()")
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Leo Yan <leo.yan@arm.com>
Link: https://lore.kernel.org/r/20250326082003.1630986-1-yeoreum.yun@arm.com