commits
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats: 80e62bc8487b ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Pull irq update from Borislav Petkov:
- Optimize IRQ domain's name assignment
* tag 'irq_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqdomain: Use return value of strreplace()
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.
For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.
If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.
Fixes: 20347fca71a3 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull x86 fpu fix from Borislav Petkov:
- Do FPU AP initialization on Xen PV too which got missed by the recent
boot reordering work
* tag 'x86_urgent_for_v6.5_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/xen: Fix secondary processors' FPU initialization
Since strreplace() returns the pointer to the string itself, use it
directly.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230628150251.17832-1-andriy.shevchenko@linux.intel.com
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.
While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.
Fixes: 20347fca71a3 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Moving the call of fpu__init_cpu() from cpu_init() to start_secondary()
broke Xen PV guests, as those don't call start_secondary() for APs.
Call fpu__init_cpu() in Xen's cpu_bringup(), which is the Xen PV
replacement of start_secondary().
Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230703130032.22916-1-jgross@suse.com
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Convert the interrupt descriptor storage to a maple tree to
overcome the limitations of the radixtree + fixed size bitmap.
This allows us to handle very large servers with a huge number of
guests without imposing a huge memory overhead on everyone
- Implement optional retriggering of interrupts which utilize the
fasteoi handler to work around a GICv3 architecture issue
Drivers:
- A set of fixes and updates for the Loongson/Loongarch related
drivers
- Workaound for an ASR8601 integration hickup which ends up with CPU
numbering which can't be represented in the GIC implementation
- The usual set of boring fixes and updates all over the place"
* tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
irqchip/loongson-eiointc: Add DT init support
dt-bindings: interrupt-controller: Add Loongson EIOINTC
irqchip/loongson-eiointc: Fix irq affinity setting during resume
irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag
irqchip/loongson-liointc: Fix IRQ trigger polarity
irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
irqchip/loongson-pch-pic: Fix initialization of HT vector register
irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
genirq: Allow fasteoi handler to resend interrupts on concurrent handling
genirq: Expand doc for PENDING and REPLAY flags
...
Drivers have no business looking into dma-mapping internals and check
what backend is used. Unfortunstely the DRM core is still broken and
tries to do plain page allocations instead of using DMA API allocators
by default and uses various bandaids on when to use dma_alloc_coherent.
Switch nouveau to use the same (broken) scheme as amdgpu and radeon
to remove the last driver user of is_swiotlb_active.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Parking CPUs in INIT works well, except for the crash case when the CPU
which invokes smp_park_other_cpus_in_init() is not the boot CPU. Sending
INIT to the boot CPU resets the whole machine.
Prevent this by validating that this runs on the boot CPU. If not fall back
and let CPUs hang in HLT.
Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/87ttui91jo.ffs@tglx
Niklāvs reported a boot regression on an Alderlake machine and bisected it
to commit 9df9d2f0471b ("init: Invoke arch_cpu_finalize_init() earlier").
By moving the invocation of arch_cpu_finalize_init() further down he
identified that efi_enter_virtual_mode() is the function which causes the
boot hang.
The main difference of the earlier invocation is that the boot CPU is
already fully initialized and mitigations and alternatives are applied.
But the only really interesting change turned out to be IBT, which is now
enabled before efi_enter_virtual_mode(). "ibt=off" on the kernel command
line cured the problem.
Inspection of the involved calls in efi_enter_virtual_mode() unearthed that
efi_set_virtual_address_map() is the only place in the kernel which invokes
an EFI call without the IBT safe wrapper. This went obviously unnoticed so
far as IBT was enabled later.
Use arch_efi_call_virt() instead of efi_call() to cure that.
Fixes: fe379fa4d199 ("x86/ibt: Disable IBT around firmware")
Fixes: 9df9d2f0471b ("init: Invoke arch_cpu_finalize_init() earlier")
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217602
Link: https://lore.kernel.org/r/87jzvm12q0.ffs@tglx
Pull debugobjects update from Thomas Gleixner:
"A single update for debug objects:
- Recheck whether debug objects is enabled before reporting a problem
to avoid spamming the logs with messages which are caused by a
concurrent OOM"
* tag 'core-debugobjects-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Recheck debug_objects_enabled before reporting
Pull irqchip updates from Marc Zyngier:
- A number of Loogson/Loogarch fixes
- Allow the core code to retrigger an interrupt that has
fired while the same interrupt is being handled on another
CPU, papering over a GICv3 architecture issue
- Work around an integration problem on ASR8601, where the CPU
numbering isn't representable in the GIC implementation...
- Add some missing interrupt to the STM32 irqchip
- A bunch of warning squashing triggered by W=1 builds
Link: https://lore.kernel.org/r/20230623224345.3577134-1-maz@kernel.org
If DEBUG_FS is enabled, the cost of keeping an exact number of total
used slabs is already paid. In this case, there is no reason to use an
inexact number for statistics and kernel messages.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull xfs fix from Darrick Wong:
"Nothing exciting here, just getting rid of a gcc warning that I got
tired of seeing when I turn on gcov"
* tag 'xfs-6.5-merge-6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: fix uninit warning in xfs_growfs_data
Commit e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference") missed
converting one place accessing cop0 registers, which results in a build
error, if KVM_MIPS_DEBUG_COP0_COUNTERS is enabled.
Fixes: e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.
Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.
A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.
So chose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de
Pull x86 boot updates from Thomas Gleixner:
"Initialize FPU late.
Right now FPU is initialized very early during boot. There is no real
requirement to do so. The only requirement is to have it done before
alternatives are patched.
That's done in check_bugs() which does way more than what the function
name suggests.
So first rename check_bugs() to arch_cpu_finalize_init() which makes
it clear what this is about.
Move the invocation of arch_cpu_finalize_init() earlier in
start_kernel() as it has to be done before fork_init() which needs to
know the FPU register buffer size.
With those prerequisites the FPU initialization can be moved into
arch_cpu_finalize_init(), which removes it from the early and fragile
part of the x86 bringup"
* tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
x86/fpu: Mark init functions __init
x86/fpu: Remove cpuinfo argument from init functions
x86/init: Initialize signal frame size late
init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
init: Invoke arch_cpu_finalize_init() earlier
init: Remove check_bugs() leftovers
um/cpu: Switch to arch_cpu_finalize_init()
sparc/cpu: Switch to arch_cpu_finalize_init()
sh/cpu: Switch to arch_cpu_finalize_init()
mips/cpu: Switch to arch_cpu_finalize_init()
m68k/cpu: Switch to arch_cpu_finalize_init()
loongarch/cpu: Switch to arch_cpu_finalize_init()
ia64/cpu: Switch to arch_cpu_finalize_init()
ARM: cpu: Switch to arch_cpu_finalize_init()
x86/cpu: Switch to arch_cpu_finalize_init()
init: Provide arch_cpu_finalize_init()
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
syzbot is reporting false a positive ODEBUG message immediately after
ODEBUG was disabled due to OOM.
[ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
[ 1062.886755][ T5171] ------------[ cut here ]------------
[ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40
CPU 0 [ T5171] CPU 1 [T22911]
-------------- --------------
debug_object_assert_init() {
if (!debug_objects_enabled)
return;
db = get_bucket(addr);
lookup_object_or_alloc() {
debug_objects_enabled = 0;
return NULL;
}
debug_objects_oom() {
pr_warn("Out of memory. ODEBUG disabled\n");
// all buckets get emptied here, and
}
lookup_object_or_alloc(addr, db, descr, false, true) {
// this bucket is already empty.
return ERR_PTR(-ENOENT);
}
// Emits false positive warning.
debug_print_object(&o, "assert_init");
}
Recheck debug_object_enabled in debug_print_object() to avoid that.
Reported-by: syzbot <syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/492fe2ae-5141-d548-ebd5-62f5fe2e57f7@I-love.SAKURA.ne.jp
Closes: https://syzkaller.appspot.com/bug?extid=7937ba6a50bdd00fffdf
The current implementation uses a static bitmap for interrupt descriptor
allocation and a radix tree to pointer store the pointer for lookup.
However, the size of the bitmap is constrained by the build time macro
MAX_SPARSE_IRQS, which may not be sufficient to support high-end servers,
particularly those with GICv4.1 hardware, which require a large interrupt
space to cover LPIs and vSGIs.
Replace the bitmap and the radix tree with a maple tree, which not only
stores pointers for lookup, but also provides a mechanism to find free
ranges. That removes the build time hardcoded upper limit.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-4-sdonthineni@nvidia.com
* irq/misc-6.5:
: .
: Misc cleanups:
:
: - Add a number of missing prototypes
: - Mark global symbol as static where needed
: - Drop some now useless non-DT code paths
: - Add a missing interrupt mapping to the STM32 irqchip
: - Silence another STM32 warning when building with W=1
: - Fix the jcore-aic driver that actually never worked...
: .
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
Signed-off-by: Marc Zyngier <maz@kernel.org>
Commit 20347fca71a3 ("swiotlb: split up the global swiotlb lock") moved
the number of used slots to struct io_tlb_area, but it did not remove
the field from struct io_tlb_mem.
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Quiet down this gcc warning:
fs/xfs/xfs_fsops.c: In function ‘xfs_growfs_data’:
fs/xfs/xfs_fsops.c:219:21: error: ‘lastag_extended’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
219 | if (lastag_extended) {
| ^~~~~~~~~~~~~~~
fs/xfs/xfs_fsops.c:100:33: note: ‘lastag_extended’ was declared here
100 | bool lastag_extended;
| ^~~~~~~~~~~~~~~
By setting its value explicitly. From code analysis I don't think this
is a real problem, but I have better things to do than analyse this
closely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Add missing whitespace between node name/label and opening {.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Putting CPUs into INIT is a safer place during kexec() to park CPUs.
Split the INIT assert/deassert sequence out so it can be reused.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Link: https://lore.kernel.org/r/20230615193330.551157083@linutronix.de
Moving mem_encrypt_init() broke the AMD_MEM_ENCRYPT=n because the
declaration of that function was under #ifdef CONFIG_AMD_MEM_ENCRYPT and
the obvious placement for the inline stub was the #else path.
This is a leftover of commit 20f07a044a76 ("x86/sev: Move common memory
encryption code to mem_encrypt.c") which made mem_encrypt_init() depend on
X86_MEM_ENCRYPT without moving the prototype. That did not fail back then
because there was no stub inline as the core init code had a weak function.
Move both the declaration and the stub out of the CONFIG_AMD_MEM_ENCRYPT
section and guard it with CONFIG_X86_MEM_ENCRYPT.
Fixes: 439e17576eb4 ("init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/oe-kbuild-all/202306170247.eQtCJPE8-lkp@intel.com/
Pull io_uring updates from Jens Axboe:
"Nothing major in this release, just a bunch of cleanups and some
optimizations around networking mostly.
- clean up file request flags handling (Christoph)
- clean up request freeing and CQ locking (Pavel)
- support for using pre-registering the io_uring fd at setup time
(Josh)
- Add support for user allocated ring memory, rather than having the
kernel allocate it. Mostly for packing rings into a huge page (me)
- avoid an unnecessary double retry on receive (me)
- maintain ordering for task_work, which also improves performance
(me)
- misc cleanups/fixes (Pavel, me)"
* tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux: (39 commits)
io_uring: merge conditional unlock flush helpers
io_uring: make io_cq_unlock_post static
io_uring: inline __io_cq_unlock
io_uring: fix acquire/release annotations
io_uring: kill io_cq_unlock()
io_uring: remove IOU_F_TWQ_FORCE_NORMAL
io_uring: don't batch task put on reqs free
io_uring: move io_clean_op()
io_uring: inline io_dismantle_req()
io_uring: remove io_free_req_tw
io_uring: open code io_put_req_find_next
io_uring: add helpers to decode the fixed file file_ptr
io_uring: use io_file_from_index in io_msg_grab_file
io_uring: use io_file_from_index in __io_sync_cancel
io_uring: return REQ_F_ flags from io_file_get_flags
io_uring: remove io_req_ffs_set
io_uring: remove a confusing comment above io_file_get_flags
io_uring: remove the mode variable in io_file_get_flags
io_uring: remove __io_file_supports_nowait
io_uring: wait interruptibly for request completions on exit
...
In order to prevent request_queue to be freed before cleaning up
blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
debugfs entries leakage") use scsi_device_get(), however,
scsi_device_get() will also grab scsi module reference and scsi module
can't be removed.
It's reported that blktests can't unload scsi_debug after block/001:
blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
+++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
Running block/001
Stressing sd
+modprobe: FATAL: Module scsi_debug is in use.
Fix this problem by grabbing request_queue reference directly, so that
scsi host module can still be unloaded while request_queue will be
pinged by sg device.
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the open coded sparse bitmap handling into helper functions as
a preparatory step for converting the sparse interrupt management
to a maple tree.
No functional change.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-3-sdonthineni@nvidia.com
* irq/loongarch-fixes-6.5:
: .
: Yet another series of random fixes for the Loongson/Loongarch
: string of interrupt controller, covering
:
: - affinity setting,
: - trigger polarity,
: - wake-up,
: - DT support
: .
irqchip/loongson-eiointc: Add DT init support
dt-bindings: interrupt-controller: Add Loongson EIOINTC
irqchip/loongson-eiointc: Fix irq affinity setting during resume
irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag
irqchip/loongson-liointc: Fix IRQ trigger polarity
irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
irqchip/loongson-pch-pic: Fix initialization of HT vector register
Signed-off-by: Marc Zyngier <maz@kernel.org>
This reverts commit 5b7e5676209120814dbb9fec8bc3769f0f7a7958.
Although including linux/irqchip/mxs.h is technically correct,
this clashes with the parallel removal of this include file
with 32bit ARM modernizing the low level irq handling as part of
5bb578a0c1b8 ("ARM: 9298/1: Drop custom mdesc->handle_irq()").
As such, this patch is not only unnecessary, it also breaks
compilation in -next. Revert it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Shawn Guo <shawnguo@kernel.org>
If dma_direct_alloc() alloc memory in size of 64MB, the inner function
dma_common_contiguous_remap() will allocate 128KB memory by invoking
the function kmalloc_array(). and the kmalloc_array seems to fail to try to
allocate 128KB mem.
Call trace:
[14977.928623] qcrosvm: page allocation failure: order:5, mode:0x40cc0
[14977.928638] dump_backtrace.cfi_jt+0x0/0x8
[14977.928647] dump_stack_lvl+0x80/0xb8
[14977.928652] warn_alloc+0x164/0x200
[14977.928657] __alloc_pages_slowpath+0x9f0/0xb4c
[14977.928660] __alloc_pages+0x21c/0x39c
[14977.928662] kmalloc_order+0x48/0x108
[14977.928666] kmalloc_order_trace+0x34/0x154
[14977.928668] __kmalloc+0x548/0x7e4
[14977.928673] dma_direct_alloc+0x11c/0x4f8
[14977.928678] dma_alloc_attrs+0xf4/0x138
[14977.928680] gh_vm_ioctl_set_fw_name+0x3c4/0x610 [gunyah]
[14977.928698] gh_vm_ioctl+0x90/0x14c [gunyah]
[14977.928705] __arm64_sys_ioctl+0x184/0x210
work around by doing kvmalloc_array instead.
Signed-off-by: Gao Xu <gaoxu2@hihonor.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull NTB updates from Jon Mason:
"Fixes for pci_clean_master, error handling in driver inits, and
various other issues/bugs"
* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
ntb: hw: amd: Fix debugfs_create_dir error checking
ntb.rst: Fix copy and paste error
ntb_netdev: Fix module_init problem
ntb: intel: Remove redundant pci_clear_master
ntb: epf: Remove redundant pci_clear_master
ntb_hw_amd: Remove redundant pci_clear_master
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
MAINTAINERS: git://github -> https://github.com for jonmason
NTB: EPF: fix possible memory leak in pci_vntb_probe()
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
and drop cached directories after 30 seconds
Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
./fs/xfs/xfs_extfree_item.c:723:3-4: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5728
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
After commit 0e96ea5c3eb5904e5dc2f ("MIPS: Loongson64: Clean up use of
cc-ifversion") we get a build error when make modules_install:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
The reason is when make modules_install, 'call cc-option' doesn't work
in $(KBUILD_CFLAGS) of 'CHECKFLAGS'. Then there is no -mno-loongson-mmi
applied and -march=loongson3a enable MMI instructions.
To be detail, the error message comes from the CHECKFLAGS invocation of
$(CC) but it has no impact on the final result of make modules_install,
it is purely a cosmetic issue. The error occurs because cc-option is
defined in scripts/Makefile.compiler, which is not included in Makefile
when running 'make modules_install', as install targets are not supposed
to require the compiler; see commit 805b2e1d427aab4b ("kbuild: include
Makefile.compiler only when compiler is needed"). As a result, the call
to check for '-mno-loongson-mmi' just never happens.
Fix this by partially reverting to the old logic, use 'call cc-option'
to conditionally apply -march=loongson3a and -march=mips64r2.
By the way, Loongson-2E/2F is also broken in commit 13ceb48bc19c563e05f4
("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls") so fix it
together.
Fixes: 13ceb48bc19c563e05f4 ("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls")
Fixes: 0e96ea5c3eb5904e5dc2 ("MIPS: Loongson64: Clean up use of cc-ifversion")
Cc: stable@vger.kernel.org
Cc: Feiyang Chen <chenfeiyang@loongson.cn>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
TLDR: It's a mess.
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as
those make it come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
Initializing the FPU during the early boot process is a pointless
exercise. Early boot is convoluted and fragile enough.
Nothing requires that the FPU is set up early. It has to be initialized
before fork_init() because the task_struct size depends on the FPU register
buffer size.
Move the initialization to arch_cpu_finalize_init() which is the perfect
place to do so.
No functional change.
This allows to remove quite some of the custom early command line parsing,
but that's subject to the next installment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.902376621@linutronix.de
Pull splice updates from Jens Axboe:
"This kills off ITER_PIPE to avoid a race between truncate,
iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio
with unpinned/unref'ed pages from an O_DIRECT splice read. This causes
memory corruption.
Instead, we either use (a) filemap_splice_read(), which invokes the
buffered file reading code and splices from the pagecache into the
pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads
into it and then pushes the filled pages into the pipe; or (c) handle
it in filesystem-specific code.
Summary:
- Rename direct_splice_read() to copy_splice_read()
- Simplify the calculations for the number of pages to be reclaimed
in copy_splice_read()
- Turn do_splice_to() into a helper, vfs_splice_read(), so that it
can be used by overlayfs and coda to perform the checks on the
lower fs
- Make vfs_splice_read() jump to copy_splice_read() to handle
direct-I/O and DAX
- Provide shmem with its own splice_read to handle non-existent pages
in the pagecache. We don't want a ->read_folio() as we don't want
to populate holes, but filemap_get_pages() requires it
- Provide overlayfs with its own splice_read to call down to a lower
layer as overlayfs doesn't provide ->read_folio()
- Provide coda with its own splice_read to call down to a lower layer
as coda doesn't provide ->read_folio()
- Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs
and random files as they just copy to the output buffer and don't
splice pages
- Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3,
ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation
- Make cifs use filemap_splice_read()
- Replace pointers to generic_file_splice_read() with pointers to
filemap_splice_read() as DIO and DAX are handled in the caller;
filesystems can still provide their own alternate ->splice_read()
op
- Remove generic_file_splice_read()
- Remove ITER_PIPE and its paraphernalia as generic_file_splice_read
was the only user"
* tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits)
splice: kdoc for filemap_splice_read() and copy_splice_read()
iov_iter: Kill ITER_PIPE
splice: Remove generic_file_splice_read()
splice: Use filemap_splice_read() instead of generic_file_splice_read()
cifs: Use filemap_splice_read()
trace: Convert trace/seq to use copy_splice_read()
zonefs: Provide a splice-read wrapper
xfs: Provide a splice-read wrapper
orangefs: Provide a splice-read wrapper
ocfs2: Provide a splice-read wrapper
ntfs3: Provide a splice-read wrapper
nfs: Provide a splice-read wrapper
f2fs: Provide a splice-read wrapper
ext4: Provide a splice-read wrapper
ecryptfs: Provide a splice-read wrapper
ceph: Provide a splice-read wrapper
afs: Provide a splice-read wrapper
9p: Add splice_read wrapper
net: Make sock_splice_read() use copy_splice_read() by default
tty, proc, kernfs, random: Use copy_splice_read()
...
There is no reason not to use __io_cq_unlock_post_flush for intermediate
aux CQE flushing, all ->task_complete should apply there, i.e. if set it
should be the submitter task. Combine them, get rid of of
__io_cq_unlock_post() and rename the left function.
This place was also taking a couple percents of CPU according to
profiles for max throughput net benchmarks due to multishot recv
flooding it with completions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bbed60734cbec2e833d9c7bdcf9741aada5d8aab.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ext4_blkdev_remove() passes a wrong holder pointer to blkdev_put() which
triggers a warning there. Fix it.
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230622165107.13687-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull x86 cpu fix from Thomas Gleixner:
"A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is
caused by the CPUID evaluation trainwreck of X86. That recomputes
the value for each CPU, so the last CPU "wins". That can cause
completely bogus sibling values"
* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
The current implementation utilizes a bitmap for managing interrupt resend
handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros.
However, this method may not efficiently utilize memory during runtime,
particularly when IRQ_BITMAP_BITS is large.
Address this issue by using an hlist to manage interrupt resend handlers
instead of relying on a static bitmap memory allocation. Additionally, a
new function, clear_irq_resend(), is introduced and called from
irq_shutdown to ensure a graceful teardown of the interrupt.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-2-sdonthineni@nvidia.com
* irq/lpi-resend:
: .
: Patch series from James Gowans, working around an issue with
: GICv3 LPIs that can fire concurrently on multiple CPUs.
: .
irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
genirq: Allow fasteoi handler to resend interrupts on concurrent handling
genirq: Expand doc for PENDING and REPLAY flags
genirq: Use BIT() for the IRQD_* state flags
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add EIOINTC irqchip DT support, which is needed for Loongson chips
based on DT and supporting EIOINTC, such as the Loongson-2K0500 SOC.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/764e02d924094580ac0f1d15535f4b98308705c6.1683279769.git.zhoubinbin@loongson.cn
The initialization function for the J-Core AIC aic_irq_of_init() is
currently missing the call to irq_alloc_descs() which allocates and
initializes all the IRQ descriptors. Add missing function call and
return the error code from irq_alloc_descs() in case the allocation
fails.
Fixes: 981b58f66cfc ("irqchip/jcore-aic: Add J-Core AIC driver")
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230510163343.43090-1-glaubitz@physik.fu-berlin.de
'thing' -> 'think'
Signed-off-by: Sui Jingfeng <suijingfeng@loongson.cn>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The pointer 'server' is assigned but never read, the pointer is
redundant and can be removed. Cleans up clang scan build warning:
fs/smb/client/dfs.c:217:3: warning: Value stored to 'server' is
never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Similar to the recent patch strengthening the AGF agf_length
verification, the AGI verifier does not check that the AGI length field
is within known good bounds. This isn't currently checked by runtime
kernel code, yet we assume in many places that it is correct and verify
other metadata against it.
Add length verification to the AGI verifier. Just like the AGF length
checking, the length of the AGI must be equal to the size of the AG
specified in the superblock, unless it is the last AG in the filesystem.
In that case, it must be less than or equal to sb->sb_agblocks and
greater than XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs
operation will allow to exist.
There's only one place in the filesystem that actually uses agi_length,
but let's not leave it vulnerable to the same weird nonsense that
generates syzbot bugs, eh?
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
After commit 45c7e8af4a5e3f0bea4ac209 ("MIPS: Remove KVM_TE support") we
get a NULL pointer dereference when creating a KVM guest:
[ 146.243409] Starting KVM with MIPS VZ extensions
[ 149.849151] CPU 3 Unable to handle kernel paging request at virtual address 0000000000000300, epc == ffffffffc06356ec, ra == ffffffffc063568c
[ 149.849177] Oops[#1]:
[ 149.849182] CPU: 3 PID: 2265 Comm: qemu-system-mip Not tainted 6.4.0-rc3+ #1671
[ 149.849188] Hardware name: THTF CX TL630 Series/THTF-LS3A4000-7A1000-ML4A, BIOS KL4.1F.TF.D.166.201225.R 12/25/2020
[ 149.849192] $ 0 : 0000000000000000 000000007400cce0 0000000000400004 ffffffff8119c740
[ 149.849209] $ 4 : 000000007400cce1 000000007400cce1 0000000000000000 0000000000000000
[ 149.849221] $ 8 : 000000240058bb36 ffffffff81421ac0 0000000000000000 0000000000400dc0
[ 149.849233] $12 : 9800000102a07cc8 ffffffff80e40e38 0000000000000001 0000000000400dc0
[ 149.849245] $16 : 0000000000000000 9800000106cd0000 9800000106cd0000 9800000100cce000
[ 149.849257] $20 : ffffffffc0632b28 ffffffffc05b31b0 9800000100ccca00 0000000000400000
[ 149.849269] $24 : 9800000106cd09ce ffffffff802f69d0
[ 149.849281] $28 : 9800000102a04000 9800000102a07cd0 98000001106a8000 ffffffffc063568c
[ 149.849293] Hi : 00000335b2111e66
[ 149.849295] Lo : 6668d90061ae0ae9
[ 149.849298] epc : ffffffffc06356ec kvm_vz_vcpu_setup+0xc4/0x328 [kvm]
[ 149.849324] ra : ffffffffc063568c kvm_vz_vcpu_setup+0x64/0x328 [kvm]
[ 149.849336] Status: 7400cce3 KX SX UX KERNEL EXL IE
[ 149.849351] Cause : 1000000c (ExcCode 03)
[ 149.849354] BadVA : 0000000000000300
[ 149.849357] PrId : 0014c004 (ICT Loongson-3)
[ 149.849360] Modules linked in: kvm nfnetlink_queue nfnetlink_log nfnetlink fuse sha256_generic libsha256 cfg80211 rfkill binfmt_misc vfat fat snd_hda_codec_hdmi input_leds led_class snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_pcm snd_timer snd serio_raw xhci_pci radeon drm_suballoc_helper drm_display_helper xhci_hcd ip_tables x_tables
[ 149.849432] Process qemu-system-mip (pid: 2265, threadinfo=00000000ae2982d2, task=0000000038e09ad4, tls=000000ffeba16030)
[ 149.849439] Stack : 9800000000000003 9800000100ccca00 9800000100ccc000 ffffffffc062cef4
[ 149.849453] 9800000102a07d18 c89b63a7ab338e00 0000000000000000 ffffffff811a0000
[ 149.849465] 0000000000000000 9800000106cd0000 ffffffff80e59938 98000001106a8920
[ 149.849476] ffffffff80e57f30 ffffffffc062854c ffffffff811a0000 9800000102bf4240
[ 149.849488] ffffffffc05b0000 ffffffff80e3a798 000000ff78000000 000000ff78000010
[ 149.849500] 0000000000000255 98000001021f7de0 98000001023f0078 ffffffff81434000
[ 149.849511] 0000000000000000 0000000000000000 9800000102ae0000 980000025e92ae28
[ 149.849523] 0000000000000000 c89b63a7ab338e00 0000000000000001 ffffffff8119dce0
[ 149.849535] 000000ff78000010 ffffffff804f3d3c 9800000102a07eb0 0000000000000255
[ 149.849546] 0000000000000000 ffffffff8049460c 000000ff78000010 0000000000000255
[ 149.849558] ...
[ 149.849565] Call Trace:
[ 149.849567] [<ffffffffc06356ec>] kvm_vz_vcpu_setup+0xc4/0x328 [kvm]
[ 149.849586] [<ffffffffc062cef4>] kvm_arch_vcpu_create+0x184/0x228 [kvm]
[ 149.849605] [<ffffffffc062854c>] kvm_vm_ioctl+0x64c/0xf28 [kvm]
[ 149.849623] [<ffffffff805209c0>] sys_ioctl+0xc8/0x118
[ 149.849631] [<ffffffff80219eb0>] syscall_common+0x34/0x58
The root cause is the deletion of kvm_mips_commpage_init() leaves vcpu
->arch.cop0 NULL. So fix it by making cop0 from a pointer to an embedded
object.
Fixes: 45c7e8af4a5e3f0bea4ac209 ("MIPS: Remove KVM_TE support")
Cc: stable@vger.kernel.org
Reported-by: Yu Zhao <yuzhao@google.com>
Suggested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all what is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de
We just sorted the entries and fields last release, so just out of a
perverse sense of curiosity, I decided to see if we can keep things
ordered for even just one release.
The answer is "No. No we cannot".
I suggest that all kernel developers will need weekly training sessions,
involving a lot of Big Bird and Sesame Street. And at the yearly
maintainer summit, we will all sing the alphabet song together.
I doubt I will keep doing this. At some point "perverse sense of
curiosity" turns into just a cold dark place filled with sadness and
despair.
Repeats: 80e62bc8487b ("MAINTAINERS: re-sort all entries and fields")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull dma-mapping fixes from Christoph Hellwig:
- swiotlb area sizing fixes (Petr Tesarik)
* tag 'dma-mapping-6.5-2023-07-09' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: reduce the number of areas to match actual memory pool size
swiotlb: always set the number of areas before allocating the pool
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.
For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.
If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.
Fixes: 20347fca71a3 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.
While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.
Fixes: 20347fca71a3 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull x86 fix from Thomas Gleixner:
"A single fix for the mechanism to park CPUs with an INIT IPI.
On shutdown or kexec, the kernel tries to park the non-boot CPUs with
an INIT IPI. But the same code path is also used by the crash utility.
If the CPU which panics is not the boot CPU then it sends an INIT IPI
to the boot CPU which resets the machine.
Prevent this by validating that the CPU which runs the stop mechanism
is the boot CPU. If not, leave the other CPUs in HLT"
* tag 'x86-core-2023-07-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/smp: Don't send INIT to boot CPU
Moving the call of fpu__init_cpu() from cpu_init() to start_secondary()
broke Xen PV guests, as those don't call start_secondary() for APs.
Call fpu__init_cpu() in Xen's cpu_bringup(), which is the Xen PV
replacement of start_secondary().
Fixes: b81fac906a8f ("x86/fpu: Move FPU initialization into arch_cpu_finalize_init()")
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230703130032.22916-1-jgross@suse.com
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt subsystem:
Core:
- Convert the interrupt descriptor storage to a maple tree to
overcome the limitations of the radixtree + fixed size bitmap.
This allows us to handle very large servers with a huge number of
guests without imposing a huge memory overhead on everyone
- Implement optional retriggering of interrupts which utilize the
fasteoi handler to work around a GICv3 architecture issue
Drivers:
- A set of fixes and updates for the Loongson/Loongarch related
drivers
- Workaound for an ASR8601 integration hickup which ends up with CPU
numbering which can't be represented in the GIC implementation
- The usual set of boring fixes and updates all over the place"
* tag 'irq-core-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
irqchip/loongson-eiointc: Add DT init support
dt-bindings: interrupt-controller: Add Loongson EIOINTC
irqchip/loongson-eiointc: Fix irq affinity setting during resume
irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag
irqchip/loongson-liointc: Fix IRQ trigger polarity
irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
irqchip/loongson-pch-pic: Fix initialization of HT vector register
irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
genirq: Allow fasteoi handler to resend interrupts on concurrent handling
genirq: Expand doc for PENDING and REPLAY flags
...
Drivers have no business looking into dma-mapping internals and check
what backend is used. Unfortunstely the DRM core is still broken and
tries to do plain page allocations instead of using DMA API allocators
by default and uses various bandaids on when to use dma_alloc_coherent.
Switch nouveau to use the same (broken) scheme as amdgpu and radeon
to remove the last driver user of is_swiotlb_active.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Pull MIPS fixes from Thomas Bogendoerfer:
- fixes for KVM
- fix for loongson build and cpu probing
- DT fixes
* tag 'mips_6.5_1' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
MIPS: dts: add missing space before {
MIPS: Loongson: Fix build error when make modules_install
MIPS: KVM: Fix NULL pointer dereference
MIPS: Loongson: Fix cpu_probe_loongson() again
Parking CPUs in INIT works well, except for the crash case when the CPU
which invokes smp_park_other_cpus_in_init() is not the boot CPU. Sending
INIT to the boot CPU resets the whole machine.
Prevent this by validating that this runs on the boot CPU. If not fall back
and let CPUs hang in HLT.
Fixes: 45e34c8af58f ("x86/smp: Put CPUs into INIT on shutdown if possible")
Reported-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Baokun Li <libaokun1@huawei.com>
Link: https://lore.kernel.org/r/87ttui91jo.ffs@tglx
Niklāvs reported a boot regression on an Alderlake machine and bisected it
to commit 9df9d2f0471b ("init: Invoke arch_cpu_finalize_init() earlier").
By moving the invocation of arch_cpu_finalize_init() further down he
identified that efi_enter_virtual_mode() is the function which causes the
boot hang.
The main difference of the earlier invocation is that the boot CPU is
already fully initialized and mitigations and alternatives are applied.
But the only really interesting change turned out to be IBT, which is now
enabled before efi_enter_virtual_mode(). "ibt=off" on the kernel command
line cured the problem.
Inspection of the involved calls in efi_enter_virtual_mode() unearthed that
efi_set_virtual_address_map() is the only place in the kernel which invokes
an EFI call without the IBT safe wrapper. This went obviously unnoticed so
far as IBT was enabled later.
Use arch_efi_call_virt() instead of efi_call() to cure that.
Fixes: fe379fa4d199 ("x86/ibt: Disable IBT around firmware")
Fixes: 9df9d2f0471b ("init: Invoke arch_cpu_finalize_init() earlier")
Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=217602
Link: https://lore.kernel.org/r/87jzvm12q0.ffs@tglx
Pull debugobjects update from Thomas Gleixner:
"A single update for debug objects:
- Recheck whether debug objects is enabled before reporting a problem
to avoid spamming the logs with messages which are caused by a
concurrent OOM"
* tag 'core-debugobjects-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
debugobjects: Recheck debug_objects_enabled before reporting
Pull irqchip updates from Marc Zyngier:
- A number of Loogson/Loogarch fixes
- Allow the core code to retrigger an interrupt that has
fired while the same interrupt is being handled on another
CPU, papering over a GICv3 architecture issue
- Work around an integration problem on ASR8601, where the CPU
numbering isn't representable in the GIC implementation...
- Add some missing interrupt to the STM32 irqchip
- A bunch of warning squashing triggered by W=1 builds
Link: https://lore.kernel.org/r/20230623224345.3577134-1-maz@kernel.org
Commit e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference") missed
converting one place accessing cop0 registers, which results in a build
error, if KVM_MIPS_DEBUG_COP0_COUNTERS is enabled.
Fixes: e4de20576986 ("MIPS: KVM: Fix NULL pointer dereference")
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Parking CPUs in a HLT loop is not completely safe vs. kexec() as HLT can
resume execution due to NMI, SMI and MCE, which has the same issue as the
MWAIT loop.
Kicking the secondary CPUs into INIT makes this safe against NMI and SMI.
A broadcast MCE will take the machine down, but a broadcast MCE which makes
HLT resume and execute overwritten text, pagetables or data will end up in
a disaster too.
So chose the lesser of two evils and kick the secondary CPUs into INIT
unless the system has installed special wakeup mechanisms which are not
using INIT.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230615193330.608657211@linutronix.de
Pull x86 boot updates from Thomas Gleixner:
"Initialize FPU late.
Right now FPU is initialized very early during boot. There is no real
requirement to do so. The only requirement is to have it done before
alternatives are patched.
That's done in check_bugs() which does way more than what the function
name suggests.
So first rename check_bugs() to arch_cpu_finalize_init() which makes
it clear what this is about.
Move the invocation of arch_cpu_finalize_init() earlier in
start_kernel() as it has to be done before fork_init() which needs to
know the FPU register buffer size.
With those prerequisites the FPU initialization can be moved into
arch_cpu_finalize_init(), which removes it from the early and fragile
part of the x86 bringup"
* tag 'x86-boot-2023-06-26' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
x86/fpu: Mark init functions __init
x86/fpu: Remove cpuinfo argument from init functions
x86/init: Initialize signal frame size late
init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
init: Invoke arch_cpu_finalize_init() earlier
init: Remove check_bugs() leftovers
um/cpu: Switch to arch_cpu_finalize_init()
sparc/cpu: Switch to arch_cpu_finalize_init()
sh/cpu: Switch to arch_cpu_finalize_init()
mips/cpu: Switch to arch_cpu_finalize_init()
m68k/cpu: Switch to arch_cpu_finalize_init()
loongarch/cpu: Switch to arch_cpu_finalize_init()
ia64/cpu: Switch to arch_cpu_finalize_init()
ARM: cpu: Switch to arch_cpu_finalize_init()
x86/cpu: Switch to arch_cpu_finalize_init()
init: Provide arch_cpu_finalize_init()
Pull block updates from Jens Axboe:
- NVMe pull request via Keith:
- Various cleanups all around (Irvin, Chaitanya, Christophe)
- Better struct packing (Christophe JAILLET)
- Reduce controller error logs for optional commands (Keith)
- Support for >=64KiB block sizes (Daniel Gomez)
- Fabrics fixes and code organization (Max, Chaitanya, Daniel
Wagner)
- bcache updates via Coly:
- Fix a race at init time (Mingzhe Zou)
- Misc fixes and cleanups (Andrea, Thomas, Zheng, Ye)
- use page pinning in the block layer for dio (David)
- convert old block dio code to page pinning (David, Christoph)
- cleanups for pktcdvd (Andy)
- cleanups for rnbd (Guoqing)
- use the unchecked __bio_add_page() for the initial single page
additions (Johannes)
- fix overflows in the Amiga partition handling code (Michael)
- improve mq-deadline zoned device support (Bart)
- keep passthrough requests out of the IO schedulers (Christoph, Ming)
- improve support for flush requests, making them less special to deal
with (Christoph)
- add bdev holder ops and shutdown methods (Christoph)
- fix the name_to_dev_t() situation and use cases (Christoph)
- decouple the block open flags from fmode_t (Christoph)
- ublk updates and cleanups, including adding user copy support (Ming)
- BFQ sanity checking (Bart)
- convert brd from radix to xarray (Pankaj)
- constify various structures (Thomas, Ivan)
- more fine grained persistent reservation ioctl capability checks
(Jingbo)
- misc fixes and cleanups (Arnd, Azeem, Demi, Ed, Hengqi, Hou, Jan,
Jordy, Li, Min, Yu, Zhong, Waiman)
* tag 'for-6.5/block-2023-06-23' of git://git.kernel.dk/linux: (266 commits)
scsi/sg: don't grab scsi host module reference
ext4: Fix warning in blkdev_put()
block: don't return -EINVAL for not found names in devt_from_devname
cdrom: Fix spectre-v1 gadget
block: Improve kernel-doc headers
blk-mq: don't insert passthrough request into sw queue
bsg: make bsg_class a static const structure
ublk: make ublk_chr_class a static const structure
aoe: make aoe_class a static const structure
block/rnbd: make all 'class' structures const
block: fix the exclusive open mask in disk_scan_partitions
block: add overflow checks for Amiga partition support
block: change all __u32 annotations to __be32 in affs_hardblocks.h
block: fix signed int overflow in Amiga partition support
block: add capacity validation in bdev_add_partition()
block: fine-granular CAP_SYS_ADMIN for Persistent Reservation
block: disallow Persistent Reservation on partitions
reiserfs: fix blkdev_put() warning from release_journal_dev()
block: fix wrong mode for blkdev_get_by_dev() from disk_scan_partitions()
block: document the holder argument to blkdev_get_by_path
...
syzbot is reporting false a positive ODEBUG message immediately after
ODEBUG was disabled due to OOM.
[ 1062.309646][T22911] ODEBUG: Out of memory. ODEBUG disabled
[ 1062.886755][ T5171] ------------[ cut here ]------------
[ 1062.892770][ T5171] ODEBUG: assert_init not available (active state 0) object: ffffc900056afb20 object type: timer_list hint: process_timeout+0x0/0x40
CPU 0 [ T5171] CPU 1 [T22911]
-------------- --------------
debug_object_assert_init() {
if (!debug_objects_enabled)
return;
db = get_bucket(addr);
lookup_object_or_alloc() {
debug_objects_enabled = 0;
return NULL;
}
debug_objects_oom() {
pr_warn("Out of memory. ODEBUG disabled\n");
// all buckets get emptied here, and
}
lookup_object_or_alloc(addr, db, descr, false, true) {
// this bucket is already empty.
return ERR_PTR(-ENOENT);
}
// Emits false positive warning.
debug_print_object(&o, "assert_init");
}
Recheck debug_object_enabled in debug_print_object() to avoid that.
Reported-by: syzbot <syzbot+7937ba6a50bdd00fffdf@syzkaller.appspotmail.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/492fe2ae-5141-d548-ebd5-62f5fe2e57f7@I-love.SAKURA.ne.jp
Closes: https://syzkaller.appspot.com/bug?extid=7937ba6a50bdd00fffdf
The current implementation uses a static bitmap for interrupt descriptor
allocation and a radix tree to pointer store the pointer for lookup.
However, the size of the bitmap is constrained by the build time macro
MAX_SPARSE_IRQS, which may not be sufficient to support high-end servers,
particularly those with GICv4.1 hardware, which require a large interrupt
space to cover LPIs and vSGIs.
Replace the bitmap and the radix tree with a maple tree, which not only
stores pointers for lookup, but also provides a mechanism to find free
ranges. That removes the build time hardcoded upper limit.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-4-sdonthineni@nvidia.com
* irq/misc-6.5:
: .
: Misc cleanups:
:
: - Add a number of missing prototypes
: - Mark global symbol as static where needed
: - Drop some now useless non-DT code paths
: - Add a missing interrupt mapping to the STM32 irqchip
: - Silence another STM32 warning when building with W=1
: - Fix the jcore-aic driver that actually never worked...
: .
Revert "irqchip/mxs: Include linux/irqchip/mxs.h"
irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
irqchip/stm32-exti: Fix warning on initialized field overwritten
irqchip/stm32-exti: Add STM32MP15xx IWDG2 EXTI to GIC map
irqchip/gicv3: Add a iort_pmsi_get_dev_id() prototype
irqchip/mxs: Include linux/irqchip/mxs.h
irqchip/clps711x: Remove unused clps711x_intc_init() function
irqchip/mmp: Remove non-DT codepath
irqchip/ftintc010: Mark all function static
irqdomain: Include internals.h for function prototypes
Signed-off-by: Marc Zyngier <maz@kernel.org>
Pull more smb client updates from Steve French:
- fix potential use after free in unmount
- minor cleanup
- add worker to cleanup stale directory leases
* tag '6.5-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6:
cifs: Add a laundromat thread for cached directories
smb: client: remove redundant pointer 'server'
cifs: fix session state transition to avoid use-after-free issue
Quiet down this gcc warning:
fs/xfs/xfs_fsops.c: In function ‘xfs_growfs_data’:
fs/xfs/xfs_fsops.c:219:21: error: ‘lastag_extended’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
219 | if (lastag_extended) {
| ^~~~~~~~~~~~~~~
fs/xfs/xfs_fsops.c:100:33: note: ‘lastag_extended’ was declared here
100 | bool lastag_extended;
| ^~~~~~~~~~~~~~~
By setting its value explicitly. From code analysis I don't think this
is a real problem, but I have better things to do than analyse this
closely.
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
Moving mem_encrypt_init() broke the AMD_MEM_ENCRYPT=n because the
declaration of that function was under #ifdef CONFIG_AMD_MEM_ENCRYPT and
the obvious placement for the inline stub was the #else path.
This is a leftover of commit 20f07a044a76 ("x86/sev: Move common memory
encryption code to mem_encrypt.c") which made mem_encrypt_init() depend on
X86_MEM_ENCRYPT without moving the prototype. That did not fail back then
because there was no stub inline as the core init code had a weak function.
Move both the declaration and the stub out of the CONFIG_AMD_MEM_ENCRYPT
section and guard it with CONFIG_X86_MEM_ENCRYPT.
Fixes: 439e17576eb4 ("init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Closes: https://lore.kernel.org/oe-kbuild-all/202306170247.eQtCJPE8-lkp@intel.com/
Pull io_uring updates from Jens Axboe:
"Nothing major in this release, just a bunch of cleanups and some
optimizations around networking mostly.
- clean up file request flags handling (Christoph)
- clean up request freeing and CQ locking (Pavel)
- support for using pre-registering the io_uring fd at setup time
(Josh)
- Add support for user allocated ring memory, rather than having the
kernel allocate it. Mostly for packing rings into a huge page (me)
- avoid an unnecessary double retry on receive (me)
- maintain ordering for task_work, which also improves performance
(me)
- misc cleanups/fixes (Pavel, me)"
* tag 'for-6.5/io_uring-2023-06-23' of git://git.kernel.dk/linux: (39 commits)
io_uring: merge conditional unlock flush helpers
io_uring: make io_cq_unlock_post static
io_uring: inline __io_cq_unlock
io_uring: fix acquire/release annotations
io_uring: kill io_cq_unlock()
io_uring: remove IOU_F_TWQ_FORCE_NORMAL
io_uring: don't batch task put on reqs free
io_uring: move io_clean_op()
io_uring: inline io_dismantle_req()
io_uring: remove io_free_req_tw
io_uring: open code io_put_req_find_next
io_uring: add helpers to decode the fixed file file_ptr
io_uring: use io_file_from_index in io_msg_grab_file
io_uring: use io_file_from_index in __io_sync_cancel
io_uring: return REQ_F_ flags from io_file_get_flags
io_uring: remove io_req_ffs_set
io_uring: remove a confusing comment above io_file_get_flags
io_uring: remove the mode variable in io_file_get_flags
io_uring: remove __io_file_supports_nowait
io_uring: wait interruptibly for request completions on exit
...
In order to prevent request_queue to be freed before cleaning up
blktrace debugfs entries, commit db59133e9279 ("scsi: sg: fix blktrace
debugfs entries leakage") use scsi_device_get(), however,
scsi_device_get() will also grab scsi module reference and scsi module
can't be removed.
It's reported that blktests can't unload scsi_debug after block/001:
blktests (master) # ./check block
block/001 (stress device hotplugging) [failed]
+++ /root/blktests/results/nodev/block/001.out.bad 2023-06-19
Running block/001
Stressing sd
+modprobe: FATAL: Module scsi_debug is in use.
Fix this problem by grabbing request_queue reference directly, so that
scsi host module can still be unloaded while request_queue will be
pinged by sg device.
Reported-by: Chaitanya Kulkarni <chaitanyak@nvidia.com>
Link: https://lore.kernel.org/all/1760da91-876d-fc9c-ab51-999a6f66ad50@nvidia.com/
Fixes: db59133e9279 ("scsi: sg: fix blktrace debugfs entries leakage")
Signed-off-by: Yu Kuai <yukuai3@huawei.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230621160111.1433521-1-yukuai1@huaweicloud.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move the open coded sparse bitmap handling into helper functions as
a preparatory step for converting the sparse interrupt management
to a maple tree.
No functional change.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-3-sdonthineni@nvidia.com
* irq/loongarch-fixes-6.5:
: .
: Yet another series of random fixes for the Loongson/Loongarch
: string of interrupt controller, covering
:
: - affinity setting,
: - trigger polarity,
: - wake-up,
: - DT support
: .
irqchip/loongson-eiointc: Add DT init support
dt-bindings: interrupt-controller: Add Loongson EIOINTC
irqchip/loongson-eiointc: Fix irq affinity setting during resume
irqchip/loongson-liointc: Add IRQCHIP_SKIP_SET_WAKE flag
irqchip/loongson-liointc: Fix IRQ trigger polarity
irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
irqchip/loongson-pch-pic: Fix initialization of HT vector register
Signed-off-by: Marc Zyngier <maz@kernel.org>
This reverts commit 5b7e5676209120814dbb9fec8bc3769f0f7a7958.
Although including linux/irqchip/mxs.h is technically correct,
this clashes with the parallel removal of this include file
with 32bit ARM modernizing the low level irq handling as part of
5bb578a0c1b8 ("ARM: 9298/1: Drop custom mdesc->handle_irq()").
As such, this patch is not only unnecessary, it also breaks
compilation in -next. Revert it.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Shawn Guo <shawnguo@kernel.org>
If dma_direct_alloc() alloc memory in size of 64MB, the inner function
dma_common_contiguous_remap() will allocate 128KB memory by invoking
the function kmalloc_array(). and the kmalloc_array seems to fail to try to
allocate 128KB mem.
Call trace:
[14977.928623] qcrosvm: page allocation failure: order:5, mode:0x40cc0
[14977.928638] dump_backtrace.cfi_jt+0x0/0x8
[14977.928647] dump_stack_lvl+0x80/0xb8
[14977.928652] warn_alloc+0x164/0x200
[14977.928657] __alloc_pages_slowpath+0x9f0/0xb4c
[14977.928660] __alloc_pages+0x21c/0x39c
[14977.928662] kmalloc_order+0x48/0x108
[14977.928666] kmalloc_order_trace+0x34/0x154
[14977.928668] __kmalloc+0x548/0x7e4
[14977.928673] dma_direct_alloc+0x11c/0x4f8
[14977.928678] dma_alloc_attrs+0xf4/0x138
[14977.928680] gh_vm_ioctl_set_fw_name+0x3c4/0x610 [gunyah]
[14977.928698] gh_vm_ioctl+0x90/0x14c [gunyah]
[14977.928705] __arm64_sys_ioctl+0x184/0x210
work around by doing kvmalloc_array instead.
Signed-off-by: Gao Xu <gaoxu2@hihonor.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Pull NTB updates from Jon Mason:
"Fixes for pci_clean_master, error handling in driver inits, and
various other issues/bugs"
* tag 'ntb-6.5' of https://github.com/jonmason/ntb:
ntb: hw: amd: Fix debugfs_create_dir error checking
ntb.rst: Fix copy and paste error
ntb_netdev: Fix module_init problem
ntb: intel: Remove redundant pci_clear_master
ntb: epf: Remove redundant pci_clear_master
ntb_hw_amd: Remove redundant pci_clear_master
ntb: idt: drop redundant pci_enable_pcie_error_reporting()
MAINTAINERS: git://github -> https://github.com for jonmason
NTB: EPF: fix possible memory leak in pci_vntb_probe()
NTB: ntb_tool: Add check for devm_kcalloc
NTB: ntb_transport: fix possible memory leak while device_register() fails
ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
ntb: idt: Fix error handling in idt_pci_driver_init()
./fs/xfs/xfs_extfree_item.c:723:3-4: Unneeded semicolon
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=5728
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
After commit 0e96ea5c3eb5904e5dc2f ("MIPS: Loongson64: Clean up use of
cc-ifversion") we get a build error when make modules_install:
cc1: error: '-mloongson-mmi' must be used with '-mhard-float'
The reason is when make modules_install, 'call cc-option' doesn't work
in $(KBUILD_CFLAGS) of 'CHECKFLAGS'. Then there is no -mno-loongson-mmi
applied and -march=loongson3a enable MMI instructions.
To be detail, the error message comes from the CHECKFLAGS invocation of
$(CC) but it has no impact on the final result of make modules_install,
it is purely a cosmetic issue. The error occurs because cc-option is
defined in scripts/Makefile.compiler, which is not included in Makefile
when running 'make modules_install', as install targets are not supposed
to require the compiler; see commit 805b2e1d427aab4b ("kbuild: include
Makefile.compiler only when compiler is needed"). As a result, the call
to check for '-mno-loongson-mmi' just never happens.
Fix this by partially reverting to the old logic, use 'call cc-option'
to conditionally apply -march=loongson3a and -march=mips64r2.
By the way, Loongson-2E/2F is also broken in commit 13ceb48bc19c563e05f4
("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls") so fix it
together.
Fixes: 13ceb48bc19c563e05f4 ("MIPS: Loongson2ef: Remove unnecessary {as,cc}-option calls")
Fixes: 0e96ea5c3eb5904e5dc2 ("MIPS: Loongson64: Clean up use of cc-ifversion")
Cc: stable@vger.kernel.org
Cc: Feiyang Chen <chenfeiyang@loongson.cn>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
TLDR: It's a mess.
When kexec() is executed on a system with offline CPUs, which are parked in
mwait_play_dead() it can end up in a triple fault during the bootup of the
kexec kernel or cause hard to diagnose data corruption.
The reason is that kexec() eventually overwrites the previous kernel's text,
page tables, data and stack. If it writes to the cache line which is
monitored by a previously offlined CPU, MWAIT resumes execution and ends
up executing the wrong text, dereferencing overwritten page tables or
corrupting the kexec kernels data.
Cure this by bringing the offlined CPUs out of MWAIT into HLT.
Write to the monitored cache line of each offline CPU, which makes MWAIT
resume execution. The written control word tells the offlined CPUs to issue
HLT, which does not have the MWAIT problem.
That does not help, if a stray NMI, MCE or SMI hits the offlined CPUs as
those make it come out of HLT.
A follow up change will put them into INIT, which protects at least against
NMI and SMI.
Fixes: ea53069231f9 ("x86, hotplug: Use mwait to offline a processor, fix the legacy case")
Reported-by: Ashok Raj <ashok.raj@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.492257119@linutronix.de
Initializing the FPU during the early boot process is a pointless
exercise. Early boot is convoluted and fragile enough.
Nothing requires that the FPU is set up early. It has to be initialized
before fork_init() because the task_struct size depends on the FPU register
buffer size.
Move the initialization to arch_cpu_finalize_init() which is the perfect
place to do so.
No functional change.
This allows to remove quite some of the custom early command line parsing,
but that's subject to the next installment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230613224545.902376621@linutronix.de
Pull splice updates from Jens Axboe:
"This kills off ITER_PIPE to avoid a race between truncate,
iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio
with unpinned/unref'ed pages from an O_DIRECT splice read. This causes
memory corruption.
Instead, we either use (a) filemap_splice_read(), which invokes the
buffered file reading code and splices from the pagecache into the
pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads
into it and then pushes the filled pages into the pipe; or (c) handle
it in filesystem-specific code.
Summary:
- Rename direct_splice_read() to copy_splice_read()
- Simplify the calculations for the number of pages to be reclaimed
in copy_splice_read()
- Turn do_splice_to() into a helper, vfs_splice_read(), so that it
can be used by overlayfs and coda to perform the checks on the
lower fs
- Make vfs_splice_read() jump to copy_splice_read() to handle
direct-I/O and DAX
- Provide shmem with its own splice_read to handle non-existent pages
in the pagecache. We don't want a ->read_folio() as we don't want
to populate holes, but filemap_get_pages() requires it
- Provide overlayfs with its own splice_read to call down to a lower
layer as overlayfs doesn't provide ->read_folio()
- Provide coda with its own splice_read to call down to a lower layer
as coda doesn't provide ->read_folio()
- Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs
and random files as they just copy to the output buffer and don't
splice pages
- Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3,
ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation
- Make cifs use filemap_splice_read()
- Replace pointers to generic_file_splice_read() with pointers to
filemap_splice_read() as DIO and DAX are handled in the caller;
filesystems can still provide their own alternate ->splice_read()
op
- Remove generic_file_splice_read()
- Remove ITER_PIPE and its paraphernalia as generic_file_splice_read
was the only user"
* tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits)
splice: kdoc for filemap_splice_read() and copy_splice_read()
iov_iter: Kill ITER_PIPE
splice: Remove generic_file_splice_read()
splice: Use filemap_splice_read() instead of generic_file_splice_read()
cifs: Use filemap_splice_read()
trace: Convert trace/seq to use copy_splice_read()
zonefs: Provide a splice-read wrapper
xfs: Provide a splice-read wrapper
orangefs: Provide a splice-read wrapper
ocfs2: Provide a splice-read wrapper
ntfs3: Provide a splice-read wrapper
nfs: Provide a splice-read wrapper
f2fs: Provide a splice-read wrapper
ext4: Provide a splice-read wrapper
ecryptfs: Provide a splice-read wrapper
ceph: Provide a splice-read wrapper
afs: Provide a splice-read wrapper
9p: Add splice_read wrapper
net: Make sock_splice_read() use copy_splice_read() by default
tty, proc, kernfs, random: Use copy_splice_read()
...
There is no reason not to use __io_cq_unlock_post_flush for intermediate
aux CQE flushing, all ->task_complete should apply there, i.e. if set it
should be the submitter task. Combine them, get rid of of
__io_cq_unlock_post() and rename the left function.
This place was also taking a couple percents of CPU according to
profiles for max throughput net benchmarks due to multishot recv
flooding it with completions.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bbed60734cbec2e833d9c7bdcf9741aada5d8aab.1687518903.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
ext4_blkdev_remove() passes a wrong holder pointer to blkdev_put() which
triggers a warning there. Fix it.
Fixes: 2736e8eeb0cc ("block: use the holder as indication for exclusive opens")
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20230622165107.13687-1-jack@suse.cz
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pull x86 cpu fix from Thomas Gleixner:
"A single fix for x86:
- Prevent a bogus setting for the number of HT siblings, which is
caused by the CPUID evaluation trainwreck of X86. That recomputes
the value for each CPU, so the last CPU "wins". That can cause
completely bogus sibling values"
* tag 'x86-urgent-2023-05-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
The current implementation utilizes a bitmap for managing interrupt resend
handlers, which is allocated based on the SPARSE_IRQ/NR_IRQS macros.
However, this method may not efficiently utilize memory during runtime,
particularly when IRQ_BITMAP_BITS is large.
Address this issue by using an hlist to manage interrupt resend handlers
instead of relying on a static bitmap memory allocation. Additionally, a
new function, clear_irq_resend(), is introduced and called from
irq_shutdown to ensure a graceful teardown of the interrupt.
Signed-off-by: Shanker Donthineni <sdonthineni@nvidia.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230519134902.1495562-2-sdonthineni@nvidia.com
* irq/lpi-resend:
: .
: Patch series from James Gowans, working around an issue with
: GICv3 LPIs that can fire concurrently on multiple CPUs.
: .
irqchip/gic-v3-its: Enable RESEND_WHEN_IN_PROGRESS for LPIs
genirq: Allow fasteoi handler to resend interrupts on concurrent handling
genirq: Expand doc for PENDING and REPLAY flags
genirq: Use BIT() for the IRQD_* state flags
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add EIOINTC irqchip DT support, which is needed for Loongson chips
based on DT and supporting EIOINTC, such as the Loongson-2K0500 SOC.
Signed-off-by: Binbin Zhou <zhoubinbin@loongson.cn>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/764e02d924094580ac0f1d15535f4b98308705c6.1683279769.git.zhoubinbin@loongson.cn
The initialization function for the J-Core AIC aic_irq_of_init() is
currently missing the call to irq_alloc_descs() which allocates and
initializes all the IRQ descriptors. Add missing function call and
return the error code from irq_alloc_descs() in case the allocation
fails.
Fixes: 981b58f66cfc ("irqchip/jcore-aic: Add J-Core AIC driver")
Signed-off-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
Tested-by: Rob Landley <rob@landley.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230510163343.43090-1-glaubitz@physik.fu-berlin.de
Lockdep is certainly right to complain about
(&vma->vm_lock->lock){++++}-{3:3}, at: vma_start_write+0x2d/0x3f
but task is already holding lock:
(&mapping->i_mmap_rwsem){+.+.}-{3:3}, at: mmap_region+0x4dc/0x6db
Invert those to the usual ordering.
Fixes: 33313a747e81 ("mm: lock newly mapped VMA which can be modified after it becomes visible")
Cc: stable@vger.kernel.org
Signed-off-by: Hugh Dickins <hughd@google.com>
Tested-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The debugfs_create_dir function returns ERR_PTR in case of error, and the
only correct way to check if an error occurred is 'IS_ERR' inline function.
This patch will replace the null-comparison with IS_ERR.
Signed-off-by: Anup Sharma <anupnewsmail@gmail.com>
Suggested-by: Ivan Orlov <ivan.orlov0322@gmail.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>
The pointer 'server' is assigned but never read, the pointer is
redundant and can be removed. Cleans up clang scan build warning:
fs/smb/client/dfs.c:217:3: warning: Value stored to 'server' is
never read [deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
Similar to the recent patch strengthening the AGF agf_length
verification, the AGI verifier does not check that the AGI length field
is within known good bounds. This isn't currently checked by runtime
kernel code, yet we assume in many places that it is correct and verify
other metadata against it.
Add length verification to the AGI verifier. Just like the AGF length
checking, the length of the AGI must be equal to the size of the AG
specified in the superblock, unless it is the last AG in the filesystem.
In that case, it must be less than or equal to sb->sb_agblocks and
greater than XFS_MIN_AG_BLOCKS, which is the smallest AG a growfs
operation will allow to exist.
There's only one place in the filesystem that actually uses agi_length,
but let's not leave it vulnerable to the same weird nonsense that
generates syzbot bugs, eh?
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>
After commit 45c7e8af4a5e3f0bea4ac209 ("MIPS: Remove KVM_TE support") we
get a NULL pointer dereference when creating a KVM guest:
[ 146.243409] Starting KVM with MIPS VZ extensions
[ 149.849151] CPU 3 Unable to handle kernel paging request at virtual address 0000000000000300, epc == ffffffffc06356ec, ra == ffffffffc063568c
[ 149.849177] Oops[#1]:
[ 149.849182] CPU: 3 PID: 2265 Comm: qemu-system-mip Not tainted 6.4.0-rc3+ #1671
[ 149.849188] Hardware name: THTF CX TL630 Series/THTF-LS3A4000-7A1000-ML4A, BIOS KL4.1F.TF.D.166.201225.R 12/25/2020
[ 149.849192] $ 0 : 0000000000000000 000000007400cce0 0000000000400004 ffffffff8119c740
[ 149.849209] $ 4 : 000000007400cce1 000000007400cce1 0000000000000000 0000000000000000
[ 149.849221] $ 8 : 000000240058bb36 ffffffff81421ac0 0000000000000000 0000000000400dc0
[ 149.849233] $12 : 9800000102a07cc8 ffffffff80e40e38 0000000000000001 0000000000400dc0
[ 149.849245] $16 : 0000000000000000 9800000106cd0000 9800000106cd0000 9800000100cce000
[ 149.849257] $20 : ffffffffc0632b28 ffffffffc05b31b0 9800000100ccca00 0000000000400000
[ 149.849269] $24 : 9800000106cd09ce ffffffff802f69d0
[ 149.849281] $28 : 9800000102a04000 9800000102a07cd0 98000001106a8000 ffffffffc063568c
[ 149.849293] Hi : 00000335b2111e66
[ 149.849295] Lo : 6668d90061ae0ae9
[ 149.849298] epc : ffffffffc06356ec kvm_vz_vcpu_setup+0xc4/0x328 [kvm]
[ 149.849324] ra : ffffffffc063568c kvm_vz_vcpu_setup+0x64/0x328 [kvm]
[ 149.849336] Status: 7400cce3 KX SX UX KERNEL EXL IE
[ 149.849351] Cause : 1000000c (ExcCode 03)
[ 149.849354] BadVA : 0000000000000300
[ 149.849357] PrId : 0014c004 (ICT Loongson-3)
[ 149.849360] Modules linked in: kvm nfnetlink_queue nfnetlink_log nfnetlink fuse sha256_generic libsha256 cfg80211 rfkill binfmt_misc vfat fat snd_hda_codec_hdmi input_leds led_class snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_hda_core snd_pcm snd_timer snd serio_raw xhci_pci radeon drm_suballoc_helper drm_display_helper xhci_hcd ip_tables x_tables
[ 149.849432] Process qemu-system-mip (pid: 2265, threadinfo=00000000ae2982d2, task=0000000038e09ad4, tls=000000ffeba16030)
[ 149.849439] Stack : 9800000000000003 9800000100ccca00 9800000100ccc000 ffffffffc062cef4
[ 149.849453] 9800000102a07d18 c89b63a7ab338e00 0000000000000000 ffffffff811a0000
[ 149.849465] 0000000000000000 9800000106cd0000 ffffffff80e59938 98000001106a8920
[ 149.849476] ffffffff80e57f30 ffffffffc062854c ffffffff811a0000 9800000102bf4240
[ 149.849488] ffffffffc05b0000 ffffffff80e3a798 000000ff78000000 000000ff78000010
[ 149.849500] 0000000000000255 98000001021f7de0 98000001023f0078 ffffffff81434000
[ 149.849511] 0000000000000000 0000000000000000 9800000102ae0000 980000025e92ae28
[ 149.849523] 0000000000000000 c89b63a7ab338e00 0000000000000001 ffffffff8119dce0
[ 149.849535] 000000ff78000010 ffffffff804f3d3c 9800000102a07eb0 0000000000000255
[ 149.849546] 0000000000000000 ffffffff8049460c 000000ff78000010 0000000000000255
[ 149.849558] ...
[ 149.849565] Call Trace:
[ 149.849567] [<ffffffffc06356ec>] kvm_vz_vcpu_setup+0xc4/0x328 [kvm]
[ 149.849586] [<ffffffffc062cef4>] kvm_arch_vcpu_create+0x184/0x228 [kvm]
[ 149.849605] [<ffffffffc062854c>] kvm_vm_ioctl+0x64c/0xf28 [kvm]
[ 149.849623] [<ffffffff805209c0>] sys_ioctl+0xc8/0x118
[ 149.849631] [<ffffffff80219eb0>] syscall_common+0x34/0x58
The root cause is the deletion of kvm_mips_commpage_init() leaves vcpu
->arch.cop0 NULL. So fix it by making cop0 from a pointer to an embedded
object.
Fixes: 45c7e8af4a5e3f0bea4ac209 ("MIPS: Remove KVM_TE support")
Cc: stable@vger.kernel.org
Reported-by: Yu Zhao <yuzhao@google.com>
Suggested-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Monitoring idletask::thread_info::flags in mwait_play_dead() has been an
obvious choice as all what is needed is a cache line which is not written
by other CPUs.
But there is a use case where a "dead" CPU needs to be brought out of
MWAIT: kexec().
This is required as kexec() can overwrite text, pagetables, stacks and the
monitored cacheline of the original kernel. The latter causes MWAIT to
resume execution which obviously causes havoc on the kexec kernel which
results usually in triple faults.
Use a dedicated per CPU storage to prepare for that.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ashok Raj <ashok.raj@intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230615193330.434553750@linutronix.de