commits

The thread flag TIF_NEED_FPU_LOAD indicates that the FPU saved state is
valid and should be reloaded when returning to userspace. However, the
kernel will skip doing this if the FPU registers are already valid as
determined by fpregs_state_valid(). The logic embedded there considers
the state valid if two cases are both true:

1: fpu_fpregs_owner_ctx points to the current tasks FPU state
2: the last CPU the registers were live in was the current CPU.

This is usually correct logic. A CPU’s fpu_fpregs_owner_ctx is set to
the current FPU during the fpregs_restore_userregs() operation, so it
indicates that the registers have been restored on this CPU. But this
alone doesn’t preclude that the task hasn’t been rescheduled to a
different CPU, where the registers were modified, and then back to the
current CPU. To verify that this was not the case the logic relies on the
second condition. So the assumption is that if the registers have been
restored, AND they haven’t had the chance to be modified (by being
loaded on another CPU), then they MUST be valid on the current CPU.

Besides the lazy FPU optimizations, the other cases where the FPU
registers might not be valid are when the kernel modifies the FPU register
state or the FPU saved buffer. In this case the operation modifying the
FPU state needs to let the kernel know the correspondence has been
broken. The comment in “arch/x86/kernel/fpu/context.h” has:
/*
...
* If the FPU register state is valid, the kernel can skip restoring the
* FPU state from memory.
*
* Any code that clobbers the FPU registers or updates the in-memory
* FPU state for a task MUST let the rest of the kernel know that the
* FPU registers are no longer valid for this task.
*
* Either one of these invalidation functions is enough. Invalidate
* a resource you control: CPU if using the CPU for something else
* (with preemption disabled), FPU for the current task, or a task that
* is prevented from running by the current task.
*/

However, this is not completely true. When the kernel modifies the
registers or saved FPU state, it can only rely on
__fpu_invalidate_fpregs_state(), which wipes the FPU’s last_cpu
tracking. The exec path instead relies on fpregs_deactivate(), which sets
the CPU’s FPU context to NULL. This was observed to fail to restore the
reset FPU state to the registers when returning to userspace in the
following scenario:

1. A task is executing in userspace on CPU0
- CPU0’s FPU context points to tasks
- fpu->last_cpu=CPU0

2. The task exec()’s

3. While in the kernel the task is preempted
- CPU0 gets a thread executing in the kernel (such that no other
FPU context is activated)
- Scheduler sets task’s fpu->last_cpu=CPU0 when scheduling out

4. Task is migrated to CPU1

5. Continuing the exec(), the task gets to
fpu_flush_thread()->fpu_reset_fpregs()
- Sets CPU1’s fpu context to NULL
- Copies the init state to the task’s FPU buffer
- Sets TIF_NEED_FPU_LOAD on the task

6. The task reschedules back to CPU0 before completing the exec() and
returning to userspace
- During the reschedule, scheduler finds TIF_NEED_FPU_LOAD is set
- Skips saving the registers and updating task’s fpu→last_cpu,
because TIF_NEED_FPU_LOAD is the canonical source.

7. Now CPU0’s FPU context is still pointing to the task’s, and
fpu->last_cpu is still CPU0. So fpregs_state_valid() returns true even
though the reset FPU state has not been restored.

So the root cause is that exec() is doing the wrong kind of invalidate. It
should reset fpu->last_cpu via __fpu_invalidate_fpregs_state(). Further,
fpu__drop() doesn't really seem appropriate as the task (and FPU) are not
going away, they are just getting reset as part of an exec. So switch to
__fpu_invalidate_fpregs_state().

Also, delete the misleading comment that says that either kind of
invalidate will be enough, because it’s not always the case.

Fixes: 33344368cb08 ("x86/fpu: Clean up the fpu__clear() variants")
Reported-by: Lei Wang <lei4.wang@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Lijun Pan <lijun.pan@intel.com>
Reviewed-by: Sohil Mehta <sohil.mehta@intel.com>
Acked-by: Lijun Pan <lijun.pan@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230818170305.502891-1-rick.p.edgecombe@intel.com

2y ago

Neil Armstrong

c422fbd5

scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5

2y ago

Linus Torvalds

7d2f353b

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

2y ago

Huacai Chen

9730870b

LoongArch: Fix hw_breakpoint_control() for watchpoints

2y ago

Thomas Gleixner

de990908

Merge tag 'irqchip-fixes-6.5-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

2y ago

Linus Torvalds

706a7415

Linux 6.5-rc7 v6.5-rc7

2y ago

Bao D. Nguyen

d0c89af3

scsi: ufs: mcq: Fix the search/wrap around logic

2y ago

Helge Deller

382d4cd1

lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels

The gcc compiler translates on some architectures the 64-bit
__builtin_clzll() function to a call to the libgcc function __clzdi2(),
which should take a 64-bit parameter on 32- and 64-bit platforms.

But in the current kernel code, the built-in __clzdi2() function is
defined to operate (wrongly) on 32-bit parameters if BITS_PER_LONG ==
32, thus the return values on 32-bit kernels are in the range from
[0..31] instead of the expected [0..63] range.

This patch fixes the in-kernel functions __clzdi2() and __ctzdi2() to
take a 64-bit parameter on 32-bit kernels as well, thus it makes the
functions identical for 32- and 64-bit kernels.

This bug went unnoticed since kernel 3.11 for over 10 years, and here
are some possible reasons for that:

a) Some architectures have assembly instructions to count the bits and
which are used instead of calling __clzdi2(), e.g. on x86 the bsr
instruction and on ppc cntlz is used. On such architectures the
wrong __clzdi2() implementation isn't used and as such the bug has
no effect and won't be noticed.

b) Some architectures link to libgcc.a, and the in-kernel weak
functions get replaced by the correct 64-bit variants from libgcc.a.

c) __builtin_clzll() and __clzdi2() doesn't seem to be used in many
places in the kernel, and most likely only in uncritical functions,
e.g. when printing hex values via seq_put_hex_ll(). The wrong return
value will still print the correct number, but just in a wrong
formatting (e.g. with too many leading zeroes).

d) 32-bit kernels aren't used that much any longer, so they are less
tested.

A trivial testcase to verify if the currently running 32-bit kernel is
affected by the bug is to look at the output of /proc/self/maps:

Here the kernel uses a correct implementation of __clzdi2():

root@debian:~# cat /proc/self/maps
00010000-00019000 r-xp 00000000 08:05 787324 /usr/bin/cat
00019000-0001a000 rwxp 00009000 08:05 787324 /usr/bin/cat
0001a000-0003b000 rwxp 00000000 00:00 0 [heap]
f7551000-f770d000 r-xp 00000000 08:05 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...

and this kernel uses the broken implementation of __clzdi2():

root@debian:~# cat /proc/self/maps
0000000010000-0000000019000 r-xp 00000000 000000008:000000005 787324 /usr/bin/cat
0000000019000-000000001a000 rwxp 000000009000 000000008:000000005 787324 /usr/bin/cat
000000001a000-000000003b000 rwxp 00000000 00:00 0 [heap]
00000000f73d1000-00000000f758d000 r-xp 00000000 000000008:000000005 794765 /usr/lib/hppa-linux-gnu/libc.so.6
...

Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 4df87bb7b6a22 ("lib: add weak clz/ctz functions")
Cc: Chanho Min <chanho.min@lge.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: stable@vger.kernel.org # v3.11+
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2y ago

Andrey Skvortsov

66fbfb35

clk: Fix slab-out-of-bounds error in devm_clk_release()

Problem can be reproduced by unloading snd_soc_simple_card, because in
devm_get_clk_from_child() devres data is allocated as `struct clk`, but
devm_clk_release() expects devres data to be `struct devm_clk_state`.

KASAN report:
==================================================================
BUG: KASAN: slab-out-of-bounds in devm_clk_release+0x20/0x54
Read of size 8 at addr ffffff800ee09688 by task (udev-worker)/287

Call trace:
dump_backtrace+0xe8/0x11c
show_stack+0x1c/0x30
dump_stack_lvl+0x60/0x78
print_report+0x150/0x450
kasan_report+0xa8/0xf0
__asan_load8+0x78/0xa0
devm_clk_release+0x20/0x54
release_nodes+0x84/0x120
devres_release_all+0x144/0x210
device_unbind_cleanup+0x1c/0xac
really_probe+0x2f0/0x5b0
__driver_probe_device+0xc0/0x1f0
driver_probe_device+0x68/0x120
__driver_attach+0x140/0x294
bus_for_each_dev+0xec/0x160
driver_attach+0x38/0x44
bus_add_driver+0x24c/0x300
driver_register+0xf0/0x210
__platform_driver_register+0x48/0x54
asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
do_one_initcall+0xac/0x340
do_init_module+0xd0/0x300
load_module+0x2ba4/0x3100
__do_sys_init_module+0x2c8/0x300
__arm64_sys_init_module+0x48/0x5c
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x124/0x154
do_el0_svc+0x44/0xdc
el0_svc+0x14/0x50
el0t_64_sync_handler+0xec/0x11c
el0t_64_sync+0x14c/0x150

Allocated by task 287:
kasan_save_stack+0x38/0x60
kasan_set_track+0x28/0x40
kasan_save_alloc_info+0x20/0x30
__kasan_kmalloc+0xac/0xb0
__kmalloc_node_track_caller+0x6c/0x1c4
__devres_alloc_node+0x44/0xb4
devm_get_clk_from_child+0x44/0xa0
asoc_simple_parse_clk+0x1b8/0x1dc [snd_soc_simple_card_utils]
simple_parse_node.isra.0+0x1ec/0x230 [snd_soc_simple_card]
simple_dai_link_of+0x1bc/0x334 [snd_soc_simple_card]
__simple_for_each_link+0x2ec/0x320 [snd_soc_simple_card]
asoc_simple_probe+0x468/0x4dc [snd_soc_simple_card]
platform_probe+0x90/0xf0
really_probe+0x118/0x5b0
__driver_probe_device+0xc0/0x1f0
driver_probe_device+0x68/0x120
__driver_attach+0x140/0x294
bus_for_each_dev+0xec/0x160
driver_attach+0x38/0x44
bus_add_driver+0x24c/0x300
driver_register+0xf0/0x210
__platform_driver_register+0x48/0x54
asoc_simple_card_init+0x24/0x1000 [snd_soc_simple_card]
do_one_initcall+0xac/0x340
do_init_module+0xd0/0x300
load_module+0x2ba4/0x3100
__do_sys_init_module+0x2c8/0x300
__arm64_sys_init_module+0x48/0x5c
invoke_syscall+0x64/0x190
el0_svc_common.constprop.0+0x124/0x154
do_el0_svc+0x44/0xdc
el0_svc+0x14/0x50
el0t_64_sync_handler+0xec/0x11c
el0t_64_sync+0x14c/0x150

The buggy address belongs to the object at ffffff800ee09600
which belongs to the cache kmalloc-256 of size 256
The buggy address is located 136 bytes inside of
256-byte region [ffffff800ee09600, ffffff800ee09700)

The buggy address belongs to the physical page:
page:000000002d97303b refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x4ee08
head:000000002d97303b order:1 compound_mapcount:0 compound_pincount:0
flags: 0x10200(slab|head|zone=0)
raw: 0000000000010200 0000000000000000 dead000000000122 ffffff8002c02480
raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffffff800ee09580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff800ee09600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>ffffff800ee09680: 00 fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
^
ffffff800ee09700: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffffff800ee09780: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
==================================================================

Fixes: abae8e57e49a ("clk: generalize devm_clk_get() a bit")
Signed-off-by: Andrey Skvortsov <andrej.skvortzov@gmail.com>
Link: https://lore.kernel.org/r/20230805084847.3110586-1-andrej.skvortzov@gmail.com
Signed-off-by: Stephen Boyd <sboyd@kernel.org>

2y ago

Huacai Chen

656f9aec

LoongArch: Ensure FP/SIMD registers in the core dump file is up to date

2y ago

Andy Shevchenko

67a4e1a3

irqdomain: Use return value of strreplace()

2y ago

Lorenzo Pieralisi

6fe5c68e

irqchip/gic-v3: Workaround for GIC-700 erratum 2941627

2y ago

Linus Torvalds

b320441c

Merge tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

2y ago

Nilesh Javali

ef222f55

scsi: qedf: Fix firmware halt over suspend and resume

2y ago

Linus Torvalds

6f0edbb8

Merge tag 'mm-hotfixes-stable-2023-08-25-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2y ago

Biju Das

2746f13f

clk: Fix undefined reference to `clk_rate_exclusive_{get,put}'

2y ago

Tiezhu Yang

c337c849

LoongArch: Put the body of play_dead() into arch_cpu_idle_dead()

2y ago

Linus Torvalds

00173879

Merge tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:

Core:

- Convert the interrupt descriptor storage to a maple tree to
overcome the limitations of the radixtree + fixed size bitmap.

This allows us to handle very large servers with a huge number of
guests without imposing a huge memory overhead on everyone

- Implement optional retriggering of interrupts which utilize the
fasteoi handler to work around a GICv3 architecture issue

Drivers:

- A set of fixes and updates for the Loongson/Loongarch related
drivers

- Workaound for an ASR8601 integration hickup which ends up with CPU
numbering which can't be represented in the GIC implementation

- The usual set of boring fixes and updates all over the place"

* tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
irqchip/loongson-eiointc: Add DT init support
dt-bindings: interrupt-controller: Add Loongson EIOINTC
irqchip/loongson-eiointc: Fix irq affinity setting during resume
irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag
irqchip/loongson-liointc: Fix IRQ trigger polarity
irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
irqchip/loongson-pch-pic: Fix initialization of HT vector register
irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
genirq: Allow fasteoi handler to resend interrupts on concurrent handling
genirq: Expand doc for PENDING and REPLAY flags
...

2y ago

Sebastian Reichel

567f67ac

irqchip/gic-v3: Enable Rockchip 3588001 erratum workaround for RK3588S

2y ago

Linus Torvalds

ec27a636

Merge tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linux

2y ago

Tony Lindgren

04c7f60c

serial: core: Fix serial core port id, including multiport devices

2y ago

Nilesh Javali

1516ee03

scsi: qedi: Fix firmware halt over suspend and resume

2y ago

Linus Torvalds

4942fed8

Merge tag 'riscv-for-linus-6.5-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

2y ago

Hugh Dickins

e5548f85

shmem: fix smaps BUG sleeping while atomic

2y ago

Francesco Dolcini

dd1df82c

clk: keystone: syscon-clk: Fix audio refclk

2y ago

Tiezhu Yang

8879515e

LoongArch: Add identifier names to arguments of die() declaration

2y ago

Linus Torvalds

cef2dd76

Merge tag 'core-debugobjects-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip

2y ago

Thomas Gleixner

f121ab7f

Merge tag 'irqchip-6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

2y ago

Marc Zyngier

926846a7

irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation

2y ago

Linus Torvalds

9e6c269d

Merge tag 'i2c-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

2y ago

Qingsong Chen

3fa7187e

rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`)

2y ago

Jiri Slaby (SUSE)

3d9e6f55

serial: 8250: drop lockdep annotation from serial8250_clear_IER()

2y ago

Chengfeng Ye

dd64f805

scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock

2y ago

Linus Torvalds

98c6b8a5

Merge tag 'gpio-fixes-for-v6.5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

2y ago

Mingzheng Xing

ef21fa7c

riscv: Fix build errors using binutils2.37 toolchains

2y ago

Andre Przywara

f84f62e6

selftests: cachestat: catch failing fsync test on tmpfs

2y ago

Stephen Boyd

ae9b1458

Merge tag 'clk-meson-fixes-v6.5-1' of https://github.com/BayLibre/clk-meson into clk-fixes

2y ago

Tiezhu Yang

a038ae71

LoongArch: Return earlier in die() if notify_die() returns NOTIFY_STOP

2y ago

Linus Torvalds

a0433f8c

Merge tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)

- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)

- use page pinning in the block layer for dio (David)

- convert old block dio code to page pinning (David, Christoph)

- cleanups for pktcdvd (Andy)

- cleanups for rnbd (Guoqing)

- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)

- fix overflows in the Amiga partition handling code (Michael)

- improve mq-deadline zoned device support (Bart)

- keep passthrough requests out of the IO schedulers (Christoph, Ming)

- improve support for flush requests, making them less special to deal
with (Christoph)

- add bdev holder ops and shutdown methods (Christoph)

- fix the name_to_dev_t() situation and use cases (Christoph)

- decouple the block open flags from fmode_t (Christoph)

- ublk updates and cleanups, including adding user copy support (Ming)

- BFQ sanity checking (Bart)

- convert brd from radix to xarray (Pankaj)

- constify various structures (Thomas, Ivan)

- more fine grained persistent reservation ioctl capability checks
(Jingbo)

- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)

* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...