commits
Pull Kbuild fixes from Masahiro Yamada:
- Fix section mismatch warning messages for riscv and loongarch
- Remove CONFIG_IA64 left-over from linux/export-internal.h
- Fix the location of the quotes for UIMAGE_NAME
- Fix a memory leak bug in Kconfig
* tag 'kbuild-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix memory leak from range properties
kbuild: Move the single quotes for image name
linux/export: clean up the IA-64 KSYM_FUNC macro
modpost: fix section mismatch message for RELA
Pull irq fix from Borislav Petkov:
- Flush the translation service tables to prevent unpredictable
behavior on non-coherent GIC devices
* tag 'irq_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Flush ITS tables correctly in non-coherent GIC designs
Currently, sym_validate_range() duplicates the range string using
xstrdup(), which is overwritten by a subsequent sym_calc_value() call.
It results in a memory leak.
Instead, only the pointer should be copied.
Below is a test case, with a summary from Valgrind.
[Test Kconfig]
config FOO
int "foo"
range 10 20
[Test .config]
CONFIG_FOO=0
[Before]
LEAK SUMMARY:
definitely lost: 3 bytes in 1 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,465 bytes in 21 blocks
suppressed: 0 bytes in 0 blocks
[After]
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,462 bytes in 20 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull x86 fixes from Borislav Petkov:
- Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
use
- A documentation fix
- Add Kirill as TDX maintainer
* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi: Ignore invalid x2APIC entries
x86/shstk: Delay signal entry SSP write until after user accesses
x86/Documentation: Indent 'note::' directive for protocol version number note
MAINTAINERS: Add Intel TDX entry
In non-coherent GIC designs, the ITS tables must be flushed before writing
to the GITS_BASER<n> registers, otherwise the ITS could read dirty tables,
which results in unpredictable behavior.
Flush the tables right at the begin of its_setup_baser() to prevent that.
[ tglx: Massage changelog ]
Fixes: a8707f553884 ("irqchip/gic-v3: Add Rockchip 3588001 erratum workaround")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fang Xiang <fangxiang3@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231030083256.4345-1-fangxiang3@xiaomi.com
Add quotes where UIMAGE_NAME is used, rather than where it is defined.
This allows the UIMAGE_NAME variable to be set by the user.
Signed-off-by: Simon Glass <sjg@chromium.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull timer fix from Borislav Petkov:
- Do the push of pending hrtimers away from a CPU which is being
offlined earlier in the offlining process in order to prevent a
deadlock
* tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimers: Push pending hrtimers away from outgoing CPU earlier
Currently, the kernel enumerates the possible CPUs by parsing both ACPI
MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
even if they have duplicated APIC IDs in Local APIC and x2APIC, are always
enumerated.
Below is what ACPI MADT Local APIC and x2APIC describes on an
Ivebridge-EP system,
[02Ch 0044 1] Subtable Type : 00 [Processor Local APIC]
[02Fh 0047 1] Local Apic ID : 00
...
[164h 0356 1] Subtable Type : 00 [Processor Local APIC]
[167h 0359 1] Local Apic ID : 39
[16Ch 0364 1] Subtable Type : 00 [Processor Local APIC]
[16Fh 0367 1] Local Apic ID : FF
...
[3ECh 1004 1] Subtable Type : 09 [Processor Local x2APIC]
[3F0h 1008 4] Processor x2Apic ID : 00000000
...
[B5Ch 2908 1] Subtable Type : 09 [Processor Local x2APIC]
[B60h 2912 4] Processor x2Apic ID : 00000077
As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
And this wastes significant amount of memory for the per-cpu data.
Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
because __max_logical_packages is over-estimated by the APIC IDs in
the x2APIC entries.
According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure:
"[Compatibility note] On some legacy OSes, Logical processors with APIC
ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
Processor Local APIC structure to convey their APIC information to OSPM,
and those processors must be declared in the DSDT using the Processor()
keyword. Logical processors with APIC ID values 255 and greater must use
the Processor Local x2APIC structure and be declared using the Device()
keyword."
Therefore prevent the registration of x2APIC entries with an APIC ID less
than 255 if the local APIC table enumerates valid APIC IDs.
[ tglx: Simplify the logic ]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
Pull x86 TDX updates from Dave Hansen:
"The majority of this is a rework of the assembly and C wrappers that
are used to talk to the TDX module and VMM. This is a nice cleanup in
general but is also clearing the way for using this code when Linux is
the TDX VMM.
There are also some tidbits to make TDX guests play nicer with Hyper-V
and to take advantage the hardware TSC.
Summary:
- Refactor and clean up TDX hypercall/module call infrastructure
- Handle retrying/resuming page conversion hypercalls
- Make sure to use the (shockingly) reliable TSC in TDX guests"
[ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM
confidentiality technology ]
* tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Mark TSC reliable
x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP
x86/virt/tdx: Wire up basic SEAMCALL functions
x86/tdx: Remove 'struct tdx_hypercall_args'
x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm
x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
x86/tdx: Rename __tdx_module_call() to __tdcall()
x86/tdx: Make macros of TDCALLs consistent with the spec
x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid
x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
x86/tdx: Retry partially-completed page conversion hypercalls
With commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"),
there is no need to keep the IA-64 definition of the KSYM_FUNC macro.
Clean up the IA-64 definition of the KSYM_FUNC macro.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull scheduler fixes from Borislav Petkov:
- Fix virtual runtime calculation when recomputing a sched entity's
weights
- Fix wrongly rejected unprivileged poll requests to the cgroup psi
pressure files
- Make sure the load balancing is done by only one CPU
* tag 'sched_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the decision for load balance
sched: psi: fix unprivileged polling against cgroups
sched/eevdf: Fix vruntime adjustment on reweight
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug")
solved the straight forward CPU hotplug deadlock vs. the scheduler
bandwidth timer. Yu discovered a more involved variant where a task which
has a bandwidth timer started on the outgoing CPU holds a lock and then
gets throttled. If the lock required by one of the CPU hotplug callbacks
the hotplug operation deadlocks because the unthrottling timer event is not
handled on the dying CPU and can only be recovered once the control CPU
reaches the hotplug state which pulls the pending hrtimers from the dead
CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying
callbacks. Nothing can queue a hrtimer on the dying CPU at that point because
all other CPUs spin in stop_machine() with interrupts disabled and once the
operation is finished the CPU is marked offline.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liu Tie <liutie4@huawei.com>
Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
When a signal is being delivered, the kernel needs to make accesses to
userspace. These accesses could encounter an access error, in which case
the signal delivery itself will trigger a segfault. Usually this would
result in the kernel killing the process. But in the case of a SEGV signal
handler being configured, the failure of the first signal delivery will
result in *another* signal getting delivered. The second signal may
succeed if another thread has resolved the issue that triggered the
segfault (i.e. a well timed mprotect()/mmap()), or the second signal is
being delivered to another stack (i.e. an alt stack).
On x86, in the non-shadow stack case, all the accesses to userspace are
done before changes to the registers (in pt_regs). The operation is
aborted when an access error occurs, so although there may be writes done
for the first signal, control flow changes for the signal (regs->ip,
regs->sp, etc) are not committed until all the accesses have already
completed successfully. This means that the second signal will be
delivered as if it happened at the time of the first signal. It will
effectively replace the first aborted signal, overwriting the half-written
frame of the aborted signal. So on sigreturn from the second signal,
control flow will resume happily from the point of control flow where the
original signal was delivered.
The problem is, when shadow stack is active, the shadow stack SSP
register/MSR is updated *before* some of the userspace accesses. This
means if the earlier accesses succeed and the later ones fail, the second
signal will not be delivered at the same spot on the shadow stack as the
first one. So on sigreturn from the second signal, the SSP will be
pointing to the wrong location on the shadow stack (off by a frame).
Pengfei privately reported that while using a shadow stack enabled glibc,
the “signal06” test in the LTP test-suite hung. It turns out it is
testing the above described double signal scenario. When this test was
compiled with shadow stack, the first signal pushed a shadow stack
sigframe, then the second pushed another. When the second signal was
handled, the SSP was at the first shadow stack signal frame instead of
the original location. The test then got stuck as the #CP from the twice
incremented SSP was incorrect and generated segfaults in a loop.
Fix this by adjusting the SSP register only after any userspace accesses,
such that there can be no failures after the SSP is adjusted. Do this by
moving the shadow stack sigframe push logic to happen after all other
userspace accesses.
Note, sigreturn (as opposed to the signal delivery dealt with in this
patch) has ordering behavior that could lead to similar failures. The
ordering issues there extend beyond shadow stack to include the alt stack
restoration. Fixing that would require cross-arch changes, and the
ordering today does not cause any known test or apps breakages. So leave
it as is, for now.
[ dhansen: minor changelog/subject tweak ]
Fixes: 05e36022c054 ("x86/shstk: Handle signals for shadow stack")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com
Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
Pull parisc updates from Helge Deller:
"Usual fixes and updates:
- Add up to 12 nops after TLB inserts for PA8x00 CPUs as the
specification requires (Dave Anglin)
- Simplify the parisc smp_prepare_boot_cpu() code (Russell King)
- Use 64-bit little-endian values in SBA IOMMU PDIR table for AGP
Since there is upcoming support for booting a 64-bit kernel on QEMU,
some corner cases were fixed and improvements added:
- Fix 64-bit kernel crash in STI (graphics console) font setup code
which miscalculated the font start address as it gets signed vs
unsigned offsets wrong
- Support building an uncompressed Linux kernel
- Add support for soft power-off in qemu"
* tag 'parisc-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
fbdev: stifb: Make the STI next font pointer a 32-bit signed offset
parisc: Show default CPU PSW.W setting as reported by PDC
parisc/pdc: Add width field to struct pdc_model
parisc: Add nop instructions after TLB inserts
parisc: simplify smp_prepare_boot_cpu()
parisc/agp: Use 64-bit LE values in SBA IOMMU PDIR table
parisc/firmware: Use PDC constants for narrow/wide firmware
parisc: Move parisc_narrow_firmware variable to header file
parisc/power: Trivial whitespace cleanups and license update
parisc/power: Add power soft-off when running on qemu
parisc: Allow building uncompressed Linux kernel
parisc: Add some missing PDC functions and constants
parisc: sba-iommu: Fix comment when calculating IOC number
In x86 virtualization environments, including TDX, RDTSC instruction is
handled without causing a VM exit, resulting in minimal overhead and
jitters. On the other hand, other clock sources (such as HPET, ACPI
timer, APIC, etc.) necessitate VM exits to implement, resulting in more
fluctuating measurements compared to TSC. Thus, those clock sources are
not effective for calibrating TSC.
As a foundation, the host TSC is guaranteed to be invariant on any
system which enumerates TDX support.
TDX guests and the TDX module build on that foundation by enforcing:
- Virtual TSC is monotonously incrementing for any single VCPU;
- Virtual TSC values are consistent among all the TD’s VCPUs at the
level supported by the CPU:
+ VMM is required to set the same TSC_ADJUST;
+ VMM must not modify from initial value of TSC_ADJUST before
SEAMCALL;
- The frequency is determined by TD configuration:
+ Virtual TSC frequency is specified by VMM on TDH.MNG.INIT;
+ Virtual TSC starts counting from 0 at TDH.MNG.INIT;
The result is that a reliable TSC is a TDX architectural guarantee.
Use the TSC as the only reliable clock source in TD guests, bypassing
unstable calibration.
This is similar to what the kernel already does in some VMWare and
HyperV environments.
[ dhansen: changelog tweaks ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Erdem Aktas <erdemaktas@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20231006144549.2633-1-kirill.shutemov%40linux.intel.com
The section mismatch check prints a bogus symbol name on some
architectures.
[test code]
#include <linux/init.h>
int __initdata foo;
int get_foo(void) { return foo; }
If you compile it with GCC for riscv or loongarch, modpost will show an
incorrect symbol name:
WARNING: modpost: vmlinux: section mismatch in reference: get_foo+0x8 (section: .text) -> done (section: .init.data)
To get the correct symbol address, the st_value must be added.
This issue has never been noticed since commit 93684d3b8062 ("kbuild:
include symbol names in section mismatch warnings") presumably because
st_value becomes zero on most architectures when the referenced symbol
is looked up. It is not true for riscv or loongarch, at least.
With this fix, modpost will show the correct symbol name:
WARNING: modpost: vmlinux: section mismatch in reference: get_foo+0x8 (section: .text) -> foo (section: .init.data)
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Pull locking fix from Borislav Petkov:
- Fix a hardcoded futex flags case which lead to one robust futex test
failure
* tag 'locking_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Fix hardcoded flags
should_we_balance is called for the decision to do load-balancing.
When sched ticks invoke this function, only one CPU should return
true. However, in the current code, two CPUs can return true. The
following situation, where b means busy and i means idle, is an
example, because CPU 0 and CPU 2 return true.
[0, 1] [2, 3]
b b i b
This fix checks if there exists an idle CPU with busy sibling(s)
after looking for a CPU on an idle core. If some idle CPUs with busy
siblings are found, just the first one should do load-balancing.
Fixes: b1bfeab9b002 ("sched/fair: Consider the idle state of the whole core for load balance")
Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20231031133821.1570861-1-keisuke.nishimura@inria.fr
The protocol version number note is between the protocol version table and
the memory layout section. As such, Sphinx renders the note directive not
only on the actual note, but until the end of doc.
Indent the directive so that only the actual protocol version number
note is rendered as such.
Fixes: 2c33c27fd603 ("x86/boot: Introduce kernel_info")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20231106101206.76487-2-bagasdotme@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull m68k updates from Geert Uytterhoeven:
- misc aesthetical improvements for the floating point emulator
- remove the last user of strlcpy()
- use kernel's generic libgcc functions
- misc fixes for W=1 builds
- misc indentation fixes
- misc fixes and improvements
- defconfig updates
* tag 'm68k-for-v6.7-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (72 commits)
m68k: lib: Include <linux/libgcc.h> for __muldi3()
m68k: fpsp040: Fix indentation by 5 spaces
m68k: Fix indentation by 2 or 5 spaces in <asm/page_mm.h>
m68k: kernel: Fix indentation by 7 spaces in traps.c
m68k: sun3: Fix indentation by 5 or 7 spaces
m68k: Fix indentation by 7 spaces in <asm/io_mm.h>
m68k: defconfig: Update virt_defconfig for v6.6-rc3
m68k: defconfig: Update defconfigs for v6.6-rc1
m68k: io: Mark mmio read addresses as const
m68k: Replace GPL 2.0+ README.legal boilerplate with SPDX
m68k: sun3: Change led_pattern[] to unsigned char
m68k: Add missing types to asm/irq.h
m68k: sun3/3x: Add and use "sun3.h"
m68k: sun3x: Make dvma_print() static
m68k: sun3x: Make sun3x_halt() static
m68k: sun3x: Do not mark dvma_map_iommu() inline
m68k: sun3x: Fix signature of sun3_leds()
m68k: sun3: Make sun3_platform_init() static
m68k: sun3: Make print_pte() static
m68k: sun3: Annotate prom_printf() with __printf()
...
The pointer to the next STI font is actually a signed 32-bit
offset. With this change the 64-bit kernel will correctly subract
the (signed 32-bit) offset instead of adding a (unsigned 32-bit)
offset. It has no effect on 32-bit kernels.
This fixes the stifb driver with a 64-bit kernel on qemu.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
LKP reported below build warning:
vmlinux.o: warning: objtool: __tdx_hypercall+0x128: __tdx_hypercall_failed() is missing a __noreturn annotation
The __tdx_hypercall_failed() function definition already has __noreturn
annotation, but it turns out the __noreturn must be annotated to the
function declaration.
PeterZ explains:
"FWIW, the reason being that...
The point of noreturn is that the caller should know to stop generating
code. For that the declaration needs the attribute, because call sites
typically do not have access to the function definition in C."
Add __noreturn annotation to the declaration of __tdx_hypercall_failed()
to fix. It's not a bad idea to document the __noreturn nature at the
definition site either, so keep the annotation at the definition.
Note <asm/shared/tdx.h> is also included by TDX related assembly files.
Include <linux/compiler_attributes.h> only in case of !__ASSEMBLY__
otherwise compiling assembly file would trigger build error.
Also, following the objtool documentation, add __tdx_hypercall_failed()
to "tools/objtool/noreturns.h".
Fixes: c641cfb5c157 ("x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230918041858.331234-1-kai.huang@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309140828.9RdmlH2Z-lkp@intel.com/
Pull perf fix from Borislav Petkov:
- Make sure the context refcount is transferred too when migrating perf
events
* tag 'perf_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix cpuctx refcounting
Xi reported that commit 5694289ce183 ("futex: Flag conversion") broke
glibc's robust futex tests.
This was narrowed down to the change of FLAGS_SHARED from 0x01 to
0x10, at which point Florian noted that handle_futex_death() has a
hardcoded flags argument of 1.
Change this to: FLAGS_SIZE_32 | FLAGS_SHARED, matching how
futex_to_flags() unconditionally sets FLAGS_SIZE_32 for all legacy
futex ops.
Reported-by: Xi Ruoyao <xry111@xry111.site>
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20231114201402.GA25315@noisy.programming.kicks-ass.net
Fixes: 5694289ce183 ("futex: Flag conversion")
Cc: <stable@vger.kernel.org>
519fabc7aaba ("psi: remove 500ms min window size limitation for
triggers") breaks unprivileged psi polling on cgroups.
Historically, we had a privilege check for polling in the open() of a
pressure file in /proc, but were erroneously missing it for the open()
of cgroup pressure files.
When unprivileged polling was introduced in d82caa273565 ("sched/psi:
Allow unprivileged polling of N*2s period"), it needed to filter
privileges depending on the exact polling parameters, and as such
moved the CAP_SYS_RESOURCE check from the proc open() callback to
psi_trigger_create(). Both the proc files as well as cgroup files go
through this during write(). This implicitly added the missing check
for privileges required for HT polling for cgroups.
When 519fabc7aaba ("psi: remove 500ms min window size limitation for
triggers") followed right after to remove further restrictions on the
RT polling window, it incorrectly assumed the cgroup privilege check
was still missing and added it to the cgroup open(), mirroring what we
used to do for proc files in the past.
As a result, unprivileged poll requests that would be supported now
get rejected when opening the cgroup pressure file for writing.
Remove the cgroup open() check. psi_trigger_create() handles it.
Fixes: 519fabc7aaba ("psi: remove 500ms min window size limitation for triggers")
Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org # 6.5+
Link: https://lore.kernel.org/r/20231026164114.2488682-1-hannes@cmpxchg.org
Pull misc x86 fixes from Ingo Molnar:
- Fix a possible CPU hotplug deadlock bug caused by the new TSC
synchronization code
- Fix a legacy PIC discovery bug that results in device troubles on
affected systems, such as non-working keybards, etc
- Add a new Intel CPU model number to <asm/intel-family.h>
* tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Defer marking TSC unstable to a worker
x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
x86/cpu: Add model number for Intel Arrow Lake mobile processor
Add myself as Intel TDX maintainer.
I drove upstreaming most of TDX code so far and I will continue
working on TDX for foreseeable future.
[ dhansen: * Add myself as a reviewer too
* Swap Maintained=>Supported. I double
checked Kirill is still being paid
* Add drivers/virt/coco/tdx-guest ]
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20231101233314.2567-1-kirill.shutemov%40linux.intel.com
Pull arm64 updates from Catalin Marinas:
"No major architecture features this time around, just some new HWCAP
definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.
The bulk of the changes is reworking of the CPU capability checking
code (cpus_have_cap() etc).
- Major refactoring of the CPU capability detection logic resulting
in the removal of the cpus_have_const_cap() function and migrating
the code to "alternative" branches where possible
- Backtrace/kgdb: use IPIs and pseudo-NMI
- Perf and PMU:
- Add support for Ampere SoC PMUs
- Multi-DTC improvements for larger CMN configurations with
multiple Debug & Trace Controllers
- Rework the Arm CoreSight PMU driver to allow separate
registration of vendor backend modules
- Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
driver; use device_get_match_data() in the xgene driver; fix
NULL pointer dereference in the hisi driver caused by calling
cpuhp_state_remove_instance(); use-after-free in the hisi driver
- HWCAP updates:
- FEAT_SVE_B16B16 (BFloat16)
- FEAT_LRCPC3 (release consistency model)
- FEAT_LSE128 (128-bit atomic instructions)
- SVE: remove a couple of pseudo registers from the cpufeature code.
There is logic in place already to detect mismatched SVE features
- Miscellaneous:
- Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
bouncing is needed. The buffer is still required for small
kmalloc() buffers
- Fix module PLT counting with !RANDOMIZE_BASE
- Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
synchronisation code out of the set_ptes() loop
- More compact cpufeature displaying enabled cores
- Kselftest updates for the new CPU features"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
perf: hisi: Fix use-after-free when register pmu fails
drivers/perf: hisi_pcie: Initialize event->cpu only on success
drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
arm64: cpufeature: Change DBM to display enabled cores
arm64: cpufeature: Display the set of cores with a feature
perf/arm-cmn: Enable per-DTC counter allocation
perf/arm-cmn: Rework DTC counters (again)
perf/arm-cmn: Fix DTC domain detection
drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
arm64: Remove system_uses_lse_atomics()
arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
drivers/perf: xgene: Use device_get_match_data()
perf/amlogic: add missing MODULE_DEVICE_TABLE
arm64/mm: Hoist synchronization out of set_ptes() loop
...
When building with W=1:
arch/m68k/lib/muldi3.c:82:1: warning: no previous prototype for ‘__muldi3’ [-Wmissing-prototypes]
82 | __muldi3 (DItype u, DItype v)
| ^~~~~~~~
Fix this by including <linux/libgcc.h>.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/160c1fd14b4798f576d9649334b1d2c77db5cb07.1697008341.git.geert@linux-m68k.org
The last word shows the default PSW.W setting.
Signed-off-by: Helge Deller <deller@gmx.de>
SEAMCALL instruction causes #UD if the CPU isn't in VMX operation.
Currently the TDX_MODULE_CALL assembly doesn't handle #UD, thus making
SEAMCALL when VMX is disabled would cause Oops.
Unfortunately, there are legal cases that SEAMCALL can be made when VMX
is disabled. For instance, VMX can be disabled due to emergency reboot
while there are still TDX guests running.
Extend the TDX_MODULE_CALL assembly to return an error code for #UD to
handle this case gracefully, e.g., KVM can then quietly eat all SEAMCALL
errors caused by emergency reboot.
SEAMCALL instruction also causes #GP when TDX isn't enabled by the BIOS.
Use _ASM_EXTABLE_FAULT() to catch both exceptions with the trap number
recorded, and define two new error codes by XORing the trap number to
the TDX_SW_ERROR. This opportunistically handles #GP too while using
the same simple assembly code.
A bonus is when kernel mistakenly calls SEAMCALL when CPU isn't in VMX
operation, or when TDX isn't enabled by the BIOS, or when the BIOS is
buggy, the kernel can get a nicer error code rather than a less
understandable Oops.
This is basically based on Peter's code.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/de975832a367f476aab2d0eb0d9de66019a16b54.1692096753.git.kai.huang%40intel.com
The commands should be sorted inside the group definition.
Fix the ordering so we won't get following warning:
WARN_ON(iwl_cmd_groups_verify_sorted(trans_cfg))
Link: https://lore.kernel.org/regressions/2fa930bb-54dd-4942-a88d-05a47c8e9731@gmail.com/
Link: https://lore.kernel.org/linux-wireless/CAHk-=wix6kqQ5vHZXjOPpZBfM7mMm9bBZxi2Jh7XnaKCqVf94w@mail.gmail.com/
Fixes: b6e3d1ba4fcf ("wifi: iwlwifi: mvm: implement new firmware API for statistics")
Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Tested-by: Damian Tometzki <damian@riscv-rocks.de>
Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI fixes from James Bottomley:
"Seven small fixes, six in drivers and one in sd.
The sd fix is so large because it changes a struct pointer to a struct
but otherwise is fairly simple"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom-ufs: dt-bindings: Document the SM8650 UFS Controller
scsi: sd: Fix sshdr use in sd_suspend_common()
scsi: scsi_debug: Delete some bogus error checking
scsi: scsi_debug: Fix some bugs in sdebug_error_write()
scsi: ufs: core: Fix racing issue between ufshcd_mcq_abort() and ISR
scsi: ufs: core: Expand MCQ queue slot to DeviceQueueDepth + 1
scsi: qla2xxx: Fix system crash due to bad pointer access
Audit of the refcounting turned up that perf_pmu_migrate_context()
fails to migrate the ctx refcount.
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20230612093539.085862001@infradead.org
Cc: <stable@vger.kernel.org>
vruntime of the (on_rq && !0-lag) entity needs to be adjusted when
it gets re-weighted, and the calculations can be simplified based
on the fact that re-weight won't change the w-average of all the
entities. Please check the proofs in comments.
But adjusting vruntime can also cause position change in RB-tree
hence require re-queue to fix up which might be costly. This might
be avoided by deferring adjustment to the time the entity actually
leaves tree (dequeue/pick), but that will negatively affect task
selection and probably not good enough either.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231107090510.71322-2-wuyun.abel@bytedance.com
Pull irq fix from Ingo Molnar:
"Restore unintentionally lost quirk settings in the GIC irqchip driver,
which broke certain devices"
* tag 'irq-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/gic-v3-its: Don't override quirk settings with default values
Tetsuo reported the following lockdep splat when the TSC synchronization
fails during CPU hotplug:
tsc: Marking TSC unstable due to check_tsc_sync_source failed
WARNING: inconsistent lock state
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0
{IN-HARDIRQ-W} state was registered at:
_raw_spin_lock_irqsave+0x3f/0x60
clocksource_mark_unstable+0x1b/0x90
mark_tsc_unstable+0x41/0x50
check_tsc_sync_source+0x14f/0x180
sysvec_call_function_single+0x69/0x90
Possible unsafe locking scenario:
lock(watchdog_lock);
<Interrupt>
lock(watchdog_lock);
stack backtrace:
_raw_spin_lock+0x30/0x40
clocksource_watchdog+0x23/0x5a0
run_timer_softirq+0x2a/0x50
sysvec_apic_timer_interrupt+0x6e/0x90
The reason is the recent conversion of the TSC synchronization function
during CPU hotplug on the control CPU to a SMP function call. In case
that the synchronization with the upcoming CPU fails, the TSC has to be
marked unstable via clocksource_mark_unstable().
clocksource_mark_unstable() acquires 'watchdog_lock', but that lock is
taken with interrupts enabled in the watchdog timer callback to minimize
interrupt disabled time. That's obviously a possible deadlock scenario,
Before that change the synchronization function was invoked in thread
context so this could not happen.
As it is not crucical whether the unstable marking happens slightly
delayed, defer the call to a worker thread which avoids the lock context
problem.
Fixes: 9d349d47f0e3 ("x86/smpboot: Make TSC synchronization function call based")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87zg064ceg.ffs@tglx
Pull drm updates from Dave Airlie:
"Highlights:
- AMD adds some more upcoming HW platforms
- Intel made Meteorlake stable and started adding Lunarlake
- nouveau has a bunch of display rework in prepartion for the NVIDIA
GSP firmware support
- msm adds a7xx support
- habanalabs has finished migration to accel subsystem
Detail summary:
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver"
* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
drm/amdgpu: Remove duplicate fdinfo fields
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
drm/amdgpu: Identify data parity error corrected in replay mode
drm/amdgpu: Fix typo in IP discovery parsing
drm/amd/display: fix S/G display enablement
drm/amdxcp: fix amdxcp unloads incompletely
drm/amd/amdgpu: fix the GPU power print error in pm info
drm/amdgpu: Use pcie domain of xcc acpi objects
drm/amd: check num of link levels when update pcie param
drm/amdgpu: Add a read to GFX v9.4.3 ring test
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
drm/amdgpu: get RAS poison status from DF v4_6_2
drm/amdgpu: Use discovery table's subrevision
drm/amd/display: 3.2.256
drm/amd/display: add interface to query SubVP status
drm/amd/display: Read before writing Backlight Mode Set Register
drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
...
* for-next/cpus_have_const_cap: (38 commits)
: cpus_have_const_cap() removal
arm64: Remove cpus_have_const_cap()
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
arm64: Avoid cpus_have_const_cap() for ARM64_MTE
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
...
Indentation should use TABs, not spaces.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/5ab108be356a5d2a6e6d72bc418ccf1c1938e8fe.1696602993.git.geert@linux-m68k.org
PDC2.0 specifies the additional PSW-bit field.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks. A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.
TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
mode runs only the TDX module itself or other code to load the TDX
module.
The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead. The TDX
module establishes a new SEAMCALL ABI which allows the host to
initialize the module and to manage VMs.
The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
TDCALL infrastructure. Wire up basic functions to make SEAMCALLs for
the basic support of running TDX guests: __seamcall(), __seamcall_ret(),
and __seamcall_saved_ret() for TDH.VP.ENTER. All SEAMCALLs involved in
the basic TDX support don't use "callee-saved" registers as input and
output, except the TDH.VP.ENTER.
To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support. Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt-in TDX host kernel support (to distinguish with TDX guest kernel
support). So far only KVM uses TDX. Make the new config option depend
on KVM_INTEL.
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Link: https://lore.kernel.org/all/4db7c3fc085e6af12acc2932294254ddb3d320b3.1692096753.git.kai.huang%40intel.com
Pull parisc architecture fixes from Helge Deller:
- Include the upper 5 address bits when inserting TLB entries on a
64-bit kernel.
On physical machines those are ignored, but in qemu it's nice to have
them included and to be correct.
- Stop the 64-bit kernel and show a warning if someone tries to boot on
a machine with a 32-bit CPU
- Fix a "no previous prototype" warning in parport-gsc
* tag 'parisc-for-6.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Prevent booting 64-bit kernels on PA1.x machines
parport: gsc: mark init function static
parisc/pgtable: Do not drop upper 5 address bits of physical address
Pull parisc fixes from Helge Deller:
"On parisc we still sometimes need writeable stacks, e.g. if programs
aren't compiled with gcc-14. To avoid issues with the upcoming
systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now
(for parisc only).
The other two patches are minor: a bugfix for the soft power-off on
qemu with 64-bit kernel and prefer strscpy() over strlcpy():
- Fix power soft-off on qemu
- Disable prctl(PR_SET_MDWE) since parisc sometimes still needs
writeable stacks
- Use strscpy instead of strlcpy in show_cpuinfo()"
* tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
prctl: Disable prctl(PR_SET_MDWE) on parisc
parisc/power: Fix power soft-off when running on qemu
parisc: Replace strlcpy() with strscpy()
Pull in queued fixes for 6.7
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull perf event fix from Ingo Molnar:
"Fix a potential NULL dereference bug"
* tag 'perf-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: Fix potential NULL deref
When splitting the allocation of the ITS node from its configuration,
some of the default settings were kept in the latter instead of
being moved to the former.
This has the side effect of negating some of the quirk detections that
have happened in between, amongst which the dreaded Synquacer hack
(that also affect Dominic's TI platform).
Move the initialisation of these fields early, so that they can again be
overriden by the Synquacer quirk.
Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Reported by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Link: https://lore.kernel.org/r/20231024084831.GA3788@JADEVM-DRA
Link: https://lore.kernel.org/r/20231024143431.2144579-1-maz@kernel.org
David and a few others reported that on certain newer systems some legacy
interrupts fail to work correctly.
Debugging revealed that the BIOS of these systems leaves the legacy PIC in
uninitialized state which makes the PIC detection fail and the kernel
switches to a dummy implementation.
Unfortunately this fallback causes quite some code to fail as it depends on
checks for the number of legacy PIC interrupts or the availability of the
real PIC.
In theory there is no reason to use the PIC on any modern system when
IO/APIC is available, but the dependencies on the related checks cannot be
resolved trivially and on short notice. This needs lots of analysis and
rework.
The PIC detection has been added to avoid quirky checks and force selection
of the dummy implementation all over the place, especially in VM guest
scenarios. So it's not an option to revert the relevant commit as that
would break a lot of other scenarios.
One solution would be to try to initialize the PIC on detection fail and
retry the detection, but that puts the burden on everything which does not
have a PIC.
Fortunately the ACPI/MADT table header has a flag field, which advertises
in bit 0 that the system is PCAT compatible, which means it has a legacy
8259 PIC.
Evaluate that bit and if set avoid the detection routine and keep the real
PIC installed, which then gets initialized (for nothing) and makes the rest
of the code with all the dependencies work again.
Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately")
Reported-by: David Lazar <dlazar@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Lazar <dlazar@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003
Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
Pull devicetree updates from Rob Herring:
- Add a kselftest to check for unprobed DT devices
- Fix address translation for some 3 address cells cases
- Refactor firmware node refcounting for AMBA bus
- Add bindings for qcom,sm4450-pdc, Qualcomm Kryo 465 CPU, and
Freescale QMC HDLC
- Add Marantec vendor prefix
- Convert qcom,pm8921-keypad, cnxt,cx92755-wdt, da9062-wdt, and
atmel,at91rm9200-wdt bindings to DT schema
- Several additionalProperties/unevaluatedProperties on child node
schemas fixes
- Drop reserved-memory bindings which now live in dtschema project
- Fix a reference to rockchip,inno-usb2phy.yaml
- Remove backlight nodes from display panel examples
- Expand example for using DT_SCHEMA_FILES
- Merge simple LVDS panel bindings to one binding doc
* tag 'devicetree-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits)
dt-bindings: soc: fsl: cpm_qe: cpm1-scc-qmc: Add support for QMC HDLC
dt-bindings: soc: fsl: cpm_qe: cpm1-scc-qmc: Add 'additionalProperties: false' in child nodes
dt-bindings: soc: fsl: cpm_qe: cpm1-scc-qmc: Fix example property name
dt-bindings: arm,coresight-cti: Add missing additionalProperties on child nodes
dt-bindings: arm,coresight-cti: Drop type for 'cpu' property
dt-bindings: soundwire: Add reference to soundwire-controller.yaml schema
dt-bindings: input: syna,rmi4: Make "additionalProperties: true" explicit
media: dt-bindings: ti,ds90ub960: Add missing type for "i2c-alias"
dt-bindings: input: qcom,pm8921-keypad: convert to YAML format
of: overlay: unittest: overlay_bad_unresolved: Spelling s/ok/okay/
of: address: Consolidate bus .map() functions
of: address: Store number of bus flag cells rather than bool
of: unittest: Add tests for address translations
of: address: Remove duplicated functions
of: address: Fix address translation when address-size is greater than 2
dt-bindings: watchdog: cnxt,cx92755-wdt: convert txt to yaml
dt-bindings: watchdog: da9062-wdt: convert txt to yaml
dt-bindings: watchdog: fsl,scu-wdt: Document imx8dl
dt-bindings: watchdog: atmel,at91rm9200-wdt: convert txt to yaml
dt-bindings: usb: rockchip,dwc3: update inno usb2 phy binding name
...
amd-drm-next-6.7-2023-10-27:
amdgpu:
- RAS fixes
- Seamless boot fixes
- NBIO 7.7 fix
- SMU 14.0 fixes
- GC 11.5 fixes
- DML2 fixes
- ASPM fixes
- VPE fixes
- Misc code cleanups
- SRIOV fixes
- Add some missing copyright notices
- DCN 3.5 fixes
- FAMS fixes
- Backlight fix
- S/G display fix
- fdinfo cleanups
- EXT_COHERENT fixes for APU and NUMA systems
amdkfd:
- Misc fixes
- Misc code cleanups
- SVM fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027200343.57132-1-alexander.deucher@amd.com
* for-next/feat_lse128:
: HWCAP for FEAT_LSE128
kselftest/arm64: add FEAT_LSE128 to hwcap test
arm64: add FEAT_LSE128 HWCAP
There are no longer any users of cpus_have_const_cap(), and therefore it
can be removed.
Remove cpus_have_const_cap(). At the same time, remove
__cpus_have_const_cap(), as this is a trivial wrapper of
alternative_has_cap_unlikely(), which can be used directly instead.
The comment for __system_matches_cap() is updated to no longer refer to
cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
the specific suggestions are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Indentation should use TABs, not spaces.
Fix whitespace in reindented code while at it.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/2819709eee2be69c93497d4e97413bd0e05a9268.1696602993.git.geert@linux-m68k.org
An excerpt from the PA8800 ERS states:
* The PA8800 violates the seven instruction pipeline rule when performing
TLB inserts or PxTLBE instructions with the PSW C bit on. The instruction
will take effect by the 12th instruction after the insert or purge.
I believe we have a problem with handling TLB misses. We don't fill
the pipeline following TLB inserts. As a result, we likely fault again
after returning from the interruption.
The above statement indicates that we need at least seven instructions
after the insert on pre PA8800 processors and we need 12 instructions
on PA8800/PA8900 processors.
Here we add macros and code to provide the required number instructions
after a TLB insert.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Suggested-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Now 'struct tdx_hypercall_args' is basically 'struct tdx_module_args'
minus RCX. Although from __tdx_hypercall()'s perspective RCX isn't
used as shared register thus not part of input/output registers, it's
not worth to have a separate structure just due to one register.
Remove the 'struct tdx_hypercall_args' and use 'struct tdx_module_args'
instead in __tdx_hypercall() related code. This also saves the memory
copy between the two structures within __tdx_hypercall().
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/798dad5ce24e9d745cf0e16825b75ccc433ad065.1692096753.git.kai.huang%40intel.com
Pull LoongArch updates from Huacai Chen:
- support PREEMPT_DYNAMIC with static keys
- relax memory ordering for atomic operations
- support BPF CPU v4 instructions for LoongArch
- some build and runtime warning fixes
* tag 'loongarch-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
selftests/bpf: Enable cpu v4 tests for LoongArch
LoongArch: BPF: Support signed mod instructions
LoongArch: BPF: Support signed div instructions
LoongArch: BPF: Support 32-bit offset jmp instructions
LoongArch: BPF: Support unconditional bswap instructions
LoongArch: BPF: Support sign-extension mov instructions
LoongArch: BPF: Support sign-extension load instructions
LoongArch: Add more instruction opcodes and emit_* helpers
LoongArch/smp: Call rcutree_report_cpu_starting() earlier
LoongArch: Relax memory ordering for atomic operations
LoongArch: Mark __percpu functions as always inline
LoongArch: Disable module from accessing external data directly
LoongArch: Support PREEMPT_DYNAMIC with static keys
Bail out early with error message when trying to boot a 64-bit kernel on
32-bit machines. This fixes the previous commit to include the check for
true 64-bit kernels as well.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 591d2108f3abc ("parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines")
Cc: <stable@vger.kernel.org> # v6.0+
Pull Kbuild fixes from Masahiro Yamada:
- Fix section mismatch warning messages for riscv and loongarch
- Remove CONFIG_IA64 left-over from linux/export-internal.h
- Fix the location of the quotes for UIMAGE_NAME
- Fix a memory leak bug in Kconfig
* tag 'kbuild-fixes-v6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
kconfig: fix memory leak from range properties
kbuild: Move the single quotes for image name
linux/export: clean up the IA-64 KSYM_FUNC macro
modpost: fix section mismatch message for RELA
Currently, sym_validate_range() duplicates the range string using
xstrdup(), which is overwritten by a subsequent sym_calc_value() call.
It results in a memory leak.
Instead, only the pointer should be copied.
Below is a test case, with a summary from Valgrind.
[Test Kconfig]
config FOO
int "foo"
range 10 20
[Test .config]
CONFIG_FOO=0
[Before]
LEAK SUMMARY:
definitely lost: 3 bytes in 1 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,465 bytes in 21 blocks
suppressed: 0 bytes in 0 blocks
[After]
LEAK SUMMARY:
definitely lost: 0 bytes in 0 blocks
indirectly lost: 0 bytes in 0 blocks
possibly lost: 0 bytes in 0 blocks
still reachable: 17,462 bytes in 20 blocks
suppressed: 0 bytes in 0 blocks
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull x86 fixes from Borislav Petkov:
- Ignore invalid x2APIC entries in order to not waste per-CPU data
- Fix a back-to-back signals handling scenario when shadow stack is in
use
- A documentation fix
- Add Kirill as TDX maintainer
* tag 'x86_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/acpi: Ignore invalid x2APIC entries
x86/shstk: Delay signal entry SSP write until after user accesses
x86/Documentation: Indent 'note::' directive for protocol version number note
MAINTAINERS: Add Intel TDX entry
In non-coherent GIC designs, the ITS tables must be flushed before writing
to the GITS_BASER<n> registers, otherwise the ITS could read dirty tables,
which results in unpredictable behavior.
Flush the tables right at the begin of its_setup_baser() to prevent that.
[ tglx: Massage changelog ]
Fixes: a8707f553884 ("irqchip/gic-v3: Add Rockchip 3588001 erratum workaround")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Fang Xiang <fangxiang3@xiaomi.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20231030083256.4345-1-fangxiang3@xiaomi.com
Pull timer fix from Borislav Petkov:
- Do the push of pending hrtimers away from a CPU which is being
offlined earlier in the offlining process in order to prevent a
deadlock
* tag 'timers_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
hrtimers: Push pending hrtimers away from outgoing CPU earlier
Currently, the kernel enumerates the possible CPUs by parsing both ACPI
MADT Local APIC entries and x2APIC entries. So CPUs with "valid" APIC IDs,
even if they have duplicated APIC IDs in Local APIC and x2APIC, are always
enumerated.
Below is what ACPI MADT Local APIC and x2APIC describes on an
Ivebridge-EP system,
[02Ch 0044 1] Subtable Type : 00 [Processor Local APIC]
[02Fh 0047 1] Local Apic ID : 00
...
[164h 0356 1] Subtable Type : 00 [Processor Local APIC]
[167h 0359 1] Local Apic ID : 39
[16Ch 0364 1] Subtable Type : 00 [Processor Local APIC]
[16Fh 0367 1] Local Apic ID : FF
...
[3ECh 1004 1] Subtable Type : 09 [Processor Local x2APIC]
[3F0h 1008 4] Processor x2Apic ID : 00000000
...
[B5Ch 2908 1] Subtable Type : 09 [Processor Local x2APIC]
[B60h 2912 4] Processor x2Apic ID : 00000077
As a result, kernel shows "smpboot: Allowing 168 CPUs, 120 hotplug CPUs".
And this wastes significant amount of memory for the per-cpu data.
Plus this also breaks https://lore.kernel.org/all/87edm36qqb.ffs@tglx/,
because __max_logical_packages is over-estimated by the APIC IDs in
the x2APIC entries.
According to https://uefi.org/specs/ACPI/6.5/05_ACPI_Software_Programming_Model.html#processor-local-x2apic-structure:
"[Compatibility note] On some legacy OSes, Logical processors with APIC
ID values less than 255 (whether in XAPIC or X2APIC mode) must use the
Processor Local APIC structure to convey their APIC information to OSPM,
and those processors must be declared in the DSDT using the Processor()
keyword. Logical processors with APIC ID values 255 and greater must use
the Processor Local x2APIC structure and be declared using the Device()
keyword."
Therefore prevent the registration of x2APIC entries with an APIC ID less
than 255 if the local APIC table enumerates valid APIC IDs.
[ tglx: Simplify the logic ]
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230702162802.344176-1-rui.zhang@intel.com
Pull x86 TDX updates from Dave Hansen:
"The majority of this is a rework of the assembly and C wrappers that
are used to talk to the TDX module and VMM. This is a nice cleanup in
general but is also clearing the way for using this code when Linux is
the TDX VMM.
There are also some tidbits to make TDX guests play nicer with Hyper-V
and to take advantage the hardware TSC.
Summary:
- Refactor and clean up TDX hypercall/module call infrastructure
- Handle retrying/resuming page conversion hypercalls
- Make sure to use the (shockingly) reliable TSC in TDX guests"
[ TLA reminder: TDX is "Trust Domain Extensions", Intel's guest VM
confidentiality technology ]
* tag 'x86_tdx_for_6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tdx: Mark TSC reliable
x86/tdx: Fix __noreturn build warning around __tdx_hypercall_failed()
x86/virt/tdx: Make TDX_MODULE_CALL handle SEAMCALL #UD and #GP
x86/virt/tdx: Wire up basic SEAMCALL functions
x86/tdx: Remove 'struct tdx_hypercall_args'
x86/tdx: Reimplement __tdx_hypercall() using TDX_MODULE_CALL asm
x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL
x86/tdx: Extend TDX_MODULE_CALL to support more TDCALL/SEAMCALL leafs
x86/tdx: Pass TDCALL/SEAMCALL input/output registers via a structure
x86/tdx: Rename __tdx_module_call() to __tdcall()
x86/tdx: Make macros of TDCALLs consistent with the spec
x86/tdx: Skip saving output regs when SEAMCALL fails with VMFailInvalid
x86/tdx: Zero out the missing RSI in TDX_HYPERCALL macro
x86/tdx: Retry partially-completed page conversion hypercalls
With commit cf8e8658100d ("arch: Remove Itanium (IA-64) architecture"),
there is no need to keep the IA-64 definition of the KSYM_FUNC macro.
Clean up the IA-64 definition of the KSYM_FUNC macro.
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Pull scheduler fixes from Borislav Petkov:
- Fix virtual runtime calculation when recomputing a sched entity's
weights
- Fix wrongly rejected unprivileged poll requests to the cgroup psi
pressure files
- Make sure the load balancing is done by only one CPU
* tag 'sched_urgent_for_v6.7_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix the decision for load balance
sched: psi: fix unprivileged polling against cgroups
sched/eevdf: Fix vruntime adjustment on reweight
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug")
solved the straight forward CPU hotplug deadlock vs. the scheduler
bandwidth timer. Yu discovered a more involved variant where a task which
has a bandwidth timer started on the outgoing CPU holds a lock and then
gets throttled. If the lock required by one of the CPU hotplug callbacks
the hotplug operation deadlocks because the unthrottling timer event is not
handled on the dying CPU and can only be recovered once the control CPU
reaches the hotplug state which pulls the pending hrtimers from the dead
CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying
callbacks. Nothing can queue a hrtimer on the dying CPU at that point because
all other CPUs spin in stop_machine() with interrupts disabled and once the
operation is finished the CPU is marked offline.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liu Tie <liutie4@huawei.com>
Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
When a signal is being delivered, the kernel needs to make accesses to
userspace. These accesses could encounter an access error, in which case
the signal delivery itself will trigger a segfault. Usually this would
result in the kernel killing the process. But in the case of a SEGV signal
handler being configured, the failure of the first signal delivery will
result in *another* signal getting delivered. The second signal may
succeed if another thread has resolved the issue that triggered the
segfault (i.e. a well timed mprotect()/mmap()), or the second signal is
being delivered to another stack (i.e. an alt stack).
On x86, in the non-shadow stack case, all the accesses to userspace are
done before changes to the registers (in pt_regs). The operation is
aborted when an access error occurs, so although there may be writes done
for the first signal, control flow changes for the signal (regs->ip,
regs->sp, etc) are not committed until all the accesses have already
completed successfully. This means that the second signal will be
delivered as if it happened at the time of the first signal. It will
effectively replace the first aborted signal, overwriting the half-written
frame of the aborted signal. So on sigreturn from the second signal,
control flow will resume happily from the point of control flow where the
original signal was delivered.
The problem is, when shadow stack is active, the shadow stack SSP
register/MSR is updated *before* some of the userspace accesses. This
means if the earlier accesses succeed and the later ones fail, the second
signal will not be delivered at the same spot on the shadow stack as the
first one. So on sigreturn from the second signal, the SSP will be
pointing to the wrong location on the shadow stack (off by a frame).
Pengfei privately reported that while using a shadow stack enabled glibc,
the “signal06” test in the LTP test-suite hung. It turns out it is
testing the above described double signal scenario. When this test was
compiled with shadow stack, the first signal pushed a shadow stack
sigframe, then the second pushed another. When the second signal was
handled, the SSP was at the first shadow stack signal frame instead of
the original location. The test then got stuck as the #CP from the twice
incremented SSP was incorrect and generated segfaults in a loop.
Fix this by adjusting the SSP register only after any userspace accesses,
such that there can be no failures after the SSP is adjusted. Do this by
moving the shadow stack sigframe push logic to happen after all other
userspace accesses.
Note, sigreturn (as opposed to the signal delivery dealt with in this
patch) has ordering behavior that could lead to similar failures. The
ordering issues there extend beyond shadow stack to include the alt stack
restoration. Fixing that would require cross-arch changes, and the
ordering today does not cause any known test or apps breakages. So leave
it as is, for now.
[ dhansen: minor changelog/subject tweak ]
Fixes: 05e36022c054 ("x86/shstk: Handle signals for shadow stack")
Reported-by: Pengfei Xu <pengfei.xu@intel.com>
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Pengfei Xu <pengfei.xu@intel.com>
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20231107182251.91276-1-rick.p.edgecombe%40intel.com
Link: https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/syscalls/signal/signal06.c
Pull parisc updates from Helge Deller:
"Usual fixes and updates:
- Add up to 12 nops after TLB inserts for PA8x00 CPUs as the
specification requires (Dave Anglin)
- Simplify the parisc smp_prepare_boot_cpu() code (Russell King)
- Use 64-bit little-endian values in SBA IOMMU PDIR table for AGP
Since there is upcoming support for booting a 64-bit kernel on QEMU,
some corner cases were fixed and improvements added:
- Fix 64-bit kernel crash in STI (graphics console) font setup code
which miscalculated the font start address as it gets signed vs
unsigned offsets wrong
- Support building an uncompressed Linux kernel
- Add support for soft power-off in qemu"
* tag 'parisc-for-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
fbdev: stifb: Make the STI next font pointer a 32-bit signed offset
parisc: Show default CPU PSW.W setting as reported by PDC
parisc/pdc: Add width field to struct pdc_model
parisc: Add nop instructions after TLB inserts
parisc: simplify smp_prepare_boot_cpu()
parisc/agp: Use 64-bit LE values in SBA IOMMU PDIR table
parisc/firmware: Use PDC constants for narrow/wide firmware
parisc: Move parisc_narrow_firmware variable to header file
parisc/power: Trivial whitespace cleanups and license update
parisc/power: Add power soft-off when running on qemu
parisc: Allow building uncompressed Linux kernel
parisc: Add some missing PDC functions and constants
parisc: sba-iommu: Fix comment when calculating IOC number
In x86 virtualization environments, including TDX, RDTSC instruction is
handled without causing a VM exit, resulting in minimal overhead and
jitters. On the other hand, other clock sources (such as HPET, ACPI
timer, APIC, etc.) necessitate VM exits to implement, resulting in more
fluctuating measurements compared to TSC. Thus, those clock sources are
not effective for calibrating TSC.
As a foundation, the host TSC is guaranteed to be invariant on any
system which enumerates TDX support.
TDX guests and the TDX module build on that foundation by enforcing:
- Virtual TSC is monotonously incrementing for any single VCPU;
- Virtual TSC values are consistent among all the TD’s VCPUs at the
level supported by the CPU:
+ VMM is required to set the same TSC_ADJUST;
+ VMM must not modify from initial value of TSC_ADJUST before
SEAMCALL;
- The frequency is determined by TD configuration:
+ Virtual TSC frequency is specified by VMM on TDH.MNG.INIT;
+ Virtual TSC starts counting from 0 at TDH.MNG.INIT;
The result is that a reliable TSC is a TDX architectural guarantee.
Use the TSC as the only reliable clock source in TD guests, bypassing
unstable calibration.
This is similar to what the kernel already does in some VMWare and
HyperV environments.
[ dhansen: changelog tweaks ]
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Erdem Aktas <erdemaktas@google.com>
Reviewed-by: Isaku Yamahata <isaku.yamahata@intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20231006144549.2633-1-kirill.shutemov%40linux.intel.com
The section mismatch check prints a bogus symbol name on some
architectures.
[test code]
#include <linux/init.h>
int __initdata foo;
int get_foo(void) { return foo; }
If you compile it with GCC for riscv or loongarch, modpost will show an
incorrect symbol name:
WARNING: modpost: vmlinux: section mismatch in reference: get_foo+0x8 (section: .text) -> done (section: .init.data)
To get the correct symbol address, the st_value must be added.
This issue has never been noticed since commit 93684d3b8062 ("kbuild:
include symbol names in section mismatch warnings") presumably because
st_value becomes zero on most architectures when the referenced symbol
is looked up. It is not true for riscv or loongarch, at least.
With this fix, modpost will show the correct symbol name:
WARNING: modpost: vmlinux: section mismatch in reference: get_foo+0x8 (section: .text) -> foo (section: .init.data)
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
should_we_balance is called for the decision to do load-balancing.
When sched ticks invoke this function, only one CPU should return
true. However, in the current code, two CPUs can return true. The
following situation, where b means busy and i means idle, is an
example, because CPU 0 and CPU 2 return true.
[0, 1] [2, 3]
b b i b
This fix checks if there exists an idle CPU with busy sibling(s)
after looking for a CPU on an idle core. If some idle CPUs with busy
siblings are found, just the first one should do load-balancing.
Fixes: b1bfeab9b002 ("sched/fair: Consider the idle state of the whole core for load balance")
Signed-off-by: Keisuke Nishimura <keisuke.nishimura@inria.fr>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.vnet.ibm.com>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20231031133821.1570861-1-keisuke.nishimura@inria.fr
The protocol version number note is between the protocol version table and
the memory layout section. As such, Sphinx renders the note directive not
only on the actual note, but until the end of doc.
Indent the directive so that only the actual protocol version number
note is rendered as such.
Fixes: 2c33c27fd603 ("x86/boot: Introduce kernel_info")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20231106101206.76487-2-bagasdotme@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull m68k updates from Geert Uytterhoeven:
- misc aesthetical improvements for the floating point emulator
- remove the last user of strlcpy()
- use kernel's generic libgcc functions
- misc fixes for W=1 builds
- misc indentation fixes
- misc fixes and improvements
- defconfig updates
* tag 'm68k-for-v6.7-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k: (72 commits)
m68k: lib: Include <linux/libgcc.h> for __muldi3()
m68k: fpsp040: Fix indentation by 5 spaces
m68k: Fix indentation by 2 or 5 spaces in <asm/page_mm.h>
m68k: kernel: Fix indentation by 7 spaces in traps.c
m68k: sun3: Fix indentation by 5 or 7 spaces
m68k: Fix indentation by 7 spaces in <asm/io_mm.h>
m68k: defconfig: Update virt_defconfig for v6.6-rc3
m68k: defconfig: Update defconfigs for v6.6-rc1
m68k: io: Mark mmio read addresses as const
m68k: Replace GPL 2.0+ README.legal boilerplate with SPDX
m68k: sun3: Change led_pattern[] to unsigned char
m68k: Add missing types to asm/irq.h
m68k: sun3/3x: Add and use "sun3.h"
m68k: sun3x: Make dvma_print() static
m68k: sun3x: Make sun3x_halt() static
m68k: sun3x: Do not mark dvma_map_iommu() inline
m68k: sun3x: Fix signature of sun3_leds()
m68k: sun3: Make sun3_platform_init() static
m68k: sun3: Make print_pte() static
m68k: sun3: Annotate prom_printf() with __printf()
...
The pointer to the next STI font is actually a signed 32-bit
offset. With this change the 64-bit kernel will correctly subract
the (signed 32-bit) offset instead of adding a (unsigned 32-bit)
offset. It has no effect on 32-bit kernels.
This fixes the stifb driver with a 64-bit kernel on qemu.
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
LKP reported below build warning:
vmlinux.o: warning: objtool: __tdx_hypercall+0x128: __tdx_hypercall_failed() is missing a __noreturn annotation
The __tdx_hypercall_failed() function definition already has __noreturn
annotation, but it turns out the __noreturn must be annotated to the
function declaration.
PeterZ explains:
"FWIW, the reason being that...
The point of noreturn is that the caller should know to stop generating
code. For that the declaration needs the attribute, because call sites
typically do not have access to the function definition in C."
Add __noreturn annotation to the declaration of __tdx_hypercall_failed()
to fix. It's not a bad idea to document the __noreturn nature at the
definition site either, so keep the annotation at the definition.
Note <asm/shared/tdx.h> is also included by TDX related assembly files.
Include <linux/compiler_attributes.h> only in case of !__ASSEMBLY__
otherwise compiling assembly file would trigger build error.
Also, following the objtool documentation, add __tdx_hypercall_failed()
to "tools/objtool/noreturns.h".
Fixes: c641cfb5c157 ("x86/tdx: Make TDX_HYPERCALL asm similar to TDX_MODULE_CALL")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20230918041858.331234-1-kai.huang@intel.com
Closes: https://lore.kernel.org/oe-kbuild-all/202309140828.9RdmlH2Z-lkp@intel.com/
Xi reported that commit 5694289ce183 ("futex: Flag conversion") broke
glibc's robust futex tests.
This was narrowed down to the change of FLAGS_SHARED from 0x01 to
0x10, at which point Florian noted that handle_futex_death() has a
hardcoded flags argument of 1.
Change this to: FLAGS_SIZE_32 | FLAGS_SHARED, matching how
futex_to_flags() unconditionally sets FLAGS_SIZE_32 for all legacy
futex ops.
Reported-by: Xi Ruoyao <xry111@xry111.site>
Reported-by: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20231114201402.GA25315@noisy.programming.kicks-ass.net
Fixes: 5694289ce183 ("futex: Flag conversion")
Cc: <stable@vger.kernel.org>
519fabc7aaba ("psi: remove 500ms min window size limitation for
triggers") breaks unprivileged psi polling on cgroups.
Historically, we had a privilege check for polling in the open() of a
pressure file in /proc, but were erroneously missing it for the open()
of cgroup pressure files.
When unprivileged polling was introduced in d82caa273565 ("sched/psi:
Allow unprivileged polling of N*2s period"), it needed to filter
privileges depending on the exact polling parameters, and as such
moved the CAP_SYS_RESOURCE check from the proc open() callback to
psi_trigger_create(). Both the proc files as well as cgroup files go
through this during write(). This implicitly added the missing check
for privileges required for HT polling for cgroups.
When 519fabc7aaba ("psi: remove 500ms min window size limitation for
triggers") followed right after to remove further restrictions on the
RT polling window, it incorrectly assumed the cgroup privilege check
was still missing and added it to the cgroup open(), mirroring what we
used to do for proc files in the past.
As a result, unprivileged poll requests that would be supported now
get rejected when opening the cgroup pressure file for writing.
Remove the cgroup open() check. psi_trigger_create() handles it.
Fixes: 519fabc7aaba ("psi: remove 500ms min window size limitation for triggers")
Reported-by: Luca Boccassi <bluca@debian.org>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Luca Boccassi <bluca@debian.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Cc: stable@vger.kernel.org # 6.5+
Link: https://lore.kernel.org/r/20231026164114.2488682-1-hannes@cmpxchg.org
Pull misc x86 fixes from Ingo Molnar:
- Fix a possible CPU hotplug deadlock bug caused by the new TSC
synchronization code
- Fix a legacy PIC discovery bug that results in device troubles on
affected systems, such as non-working keybards, etc
- Add a new Intel CPU model number to <asm/intel-family.h>
* tag 'x86-urgent-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/tsc: Defer marking TSC unstable to a worker
x86/i8259: Skip probing when ACPI/MADT advertises PCAT compatibility
x86/cpu: Add model number for Intel Arrow Lake mobile processor
Add myself as Intel TDX maintainer.
I drove upstreaming most of TDX code so far and I will continue
working on TDX for foreseeable future.
[ dhansen: * Add myself as a reviewer too
* Swap Maintained=>Supported. I double
checked Kirill is still being paid
* Add drivers/virt/coco/tdx-guest ]
Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Acked-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Acked-by: Kai Huang <kai.huang@intel.com>
Link: https://lore.kernel.org/all/20231101233314.2567-1-kirill.shutemov%40linux.intel.com
Pull arm64 updates from Catalin Marinas:
"No major architecture features this time around, just some new HWCAP
definitions, support for the Ampere SoC PMUs and a few fixes/cleanups.
The bulk of the changes is reworking of the CPU capability checking
code (cpus_have_cap() etc).
- Major refactoring of the CPU capability detection logic resulting
in the removal of the cpus_have_const_cap() function and migrating
the code to "alternative" branches where possible
- Backtrace/kgdb: use IPIs and pseudo-NMI
- Perf and PMU:
- Add support for Ampere SoC PMUs
- Multi-DTC improvements for larger CMN configurations with
multiple Debug & Trace Controllers
- Rework the Arm CoreSight PMU driver to allow separate
registration of vendor backend modules
- Fixes: add missing MODULE_DEVICE_TABLE to the amlogic perf
driver; use device_get_match_data() in the xgene driver; fix
NULL pointer dereference in the hisi driver caused by calling
cpuhp_state_remove_instance(); use-after-free in the hisi driver
- HWCAP updates:
- FEAT_SVE_B16B16 (BFloat16)
- FEAT_LRCPC3 (release consistency model)
- FEAT_LSE128 (128-bit atomic instructions)
- SVE: remove a couple of pseudo registers from the cpufeature code.
There is logic in place already to detect mismatched SVE features
- Miscellaneous:
- Reduce the default swiotlb size (currently 64MB) if no ZONE_DMA
bouncing is needed. The buffer is still required for small
kmalloc() buffers
- Fix module PLT counting with !RANDOMIZE_BASE
- Restrict CPU_BIG_ENDIAN to LLVM IAS 15.x or newer move
synchronisation code out of the set_ptes() loop
- More compact cpufeature displaying enabled cores
- Kselftest updates for the new CPU features"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (83 commits)
arm64: Restrict CPU_BIG_ENDIAN to GNU as or LLVM IAS 15.x or newer
arm64: module: Fix PLT counting when CONFIG_RANDOMIZE_BASE=n
arm64, irqchip/gic-v3, ACPI: Move MADT GICC enabled check into a helper
perf: hisi: Fix use-after-free when register pmu fails
drivers/perf: hisi_pcie: Initialize event->cpu only on success
drivers/perf: hisi_pcie: Check the type first in pmu::event_init()
arm64: cpufeature: Change DBM to display enabled cores
arm64: cpufeature: Display the set of cores with a feature
perf/arm-cmn: Enable per-DTC counter allocation
perf/arm-cmn: Rework DTC counters (again)
perf/arm-cmn: Fix DTC domain detection
drivers: perf: arm_pmuv3: Drop some unused arguments from armv8_pmu_init()
drivers: perf: arm_pmuv3: Read PMMIR_EL1 unconditionally
drivers/perf: hisi: use cpuhp_state_remove_instance_nocalls() for hisi_hns3_pmu uninit process
clocksource/drivers/arm_arch_timer: limit XGene-1 workaround
arm64: Remove system_uses_lse_atomics()
arm64: Mark the 'addr' argument to set_ptes() and __set_pte_at() as unused
drivers/perf: xgene: Use device_get_match_data()
perf/amlogic: add missing MODULE_DEVICE_TABLE
arm64/mm: Hoist synchronization out of set_ptes() loop
...
When building with W=1:
arch/m68k/lib/muldi3.c:82:1: warning: no previous prototype for ‘__muldi3’ [-Wmissing-prototypes]
82 | __muldi3 (DItype u, DItype v)
| ^~~~~~~~
Fix this by including <linux/libgcc.h>.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Link: https://lore.kernel.org/r/160c1fd14b4798f576d9649334b1d2c77db5cb07.1697008341.git.geert@linux-m68k.org
SEAMCALL instruction causes #UD if the CPU isn't in VMX operation.
Currently the TDX_MODULE_CALL assembly doesn't handle #UD, thus making
SEAMCALL when VMX is disabled would cause Oops.
Unfortunately, there are legal cases that SEAMCALL can be made when VMX
is disabled. For instance, VMX can be disabled due to emergency reboot
while there are still TDX guests running.
Extend the TDX_MODULE_CALL assembly to return an error code for #UD to
handle this case gracefully, e.g., KVM can then quietly eat all SEAMCALL
errors caused by emergency reboot.
SEAMCALL instruction also causes #GP when TDX isn't enabled by the BIOS.
Use _ASM_EXTABLE_FAULT() to catch both exceptions with the trap number
recorded, and define two new error codes by XORing the trap number to
the TDX_SW_ERROR. This opportunistically handles #GP too while using
the same simple assembly code.
A bonus is when kernel mistakenly calls SEAMCALL when CPU isn't in VMX
operation, or when TDX isn't enabled by the BIOS, or when the BIOS is
buggy, the kernel can get a nicer error code rather than a less
understandable Oops.
This is basically based on Peter's code.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/de975832a367f476aab2d0eb0d9de66019a16b54.1692096753.git.kai.huang%40intel.com
The commands should be sorted inside the group definition.
Fix the ordering so we won't get following warning:
WARN_ON(iwl_cmd_groups_verify_sorted(trans_cfg))
Link: https://lore.kernel.org/regressions/2fa930bb-54dd-4942-a88d-05a47c8e9731@gmail.com/
Link: https://lore.kernel.org/linux-wireless/CAHk-=wix6kqQ5vHZXjOPpZBfM7mMm9bBZxi2Jh7XnaKCqVf94w@mail.gmail.com/
Fixes: b6e3d1ba4fcf ("wifi: iwlwifi: mvm: implement new firmware API for statistics")
Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@gmail.com>
Tested-by: Damian Tometzki <damian@riscv-rocks.de>
Acked-by: Kalle Valo <kvalo@kernel.org>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Pull SCSI fixes from James Bottomley:
"Seven small fixes, six in drivers and one in sd.
The sd fix is so large because it changes a struct pointer to a struct
but otherwise is fairly simple"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom-ufs: dt-bindings: Document the SM8650 UFS Controller
scsi: sd: Fix sshdr use in sd_suspend_common()
scsi: scsi_debug: Delete some bogus error checking
scsi: scsi_debug: Fix some bugs in sdebug_error_write()
scsi: ufs: core: Fix racing issue between ufshcd_mcq_abort() and ISR
scsi: ufs: core: Expand MCQ queue slot to DeviceQueueDepth + 1
scsi: qla2xxx: Fix system crash due to bad pointer access
Audit of the refcounting turned up that perf_pmu_migrate_context()
fails to migrate the ctx refcount.
Fixes: bd2756811766 ("perf: Rewrite core context handling")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20230612093539.085862001@infradead.org
Cc: <stable@vger.kernel.org>
vruntime of the (on_rq && !0-lag) entity needs to be adjusted when
it gets re-weighted, and the calculations can be simplified based
on the fact that re-weight won't change the w-average of all the
entities. Please check the proofs in comments.
But adjusting vruntime can also cause position change in RB-tree
hence require re-queue to fix up which might be costly. This might
be avoided by deferring adjustment to the time the entity actually
leaves tree (dequeue/pick), but that will negatively affect task
selection and probably not good enough either.
Fixes: 147f3efaa241 ("sched/fair: Implement an EEVDF-like scheduling policy")
Signed-off-by: Abel Wu <wuyun.abel@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20231107090510.71322-2-wuyun.abel@bytedance.com
Tetsuo reported the following lockdep splat when the TSC synchronization
fails during CPU hotplug:
tsc: Marking TSC unstable due to check_tsc_sync_source failed
WARNING: inconsistent lock state
inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
ffffffff8cfa1c78 (watchdog_lock){?.-.}-{2:2}, at: clocksource_watchdog+0x23/0x5a0
{IN-HARDIRQ-W} state was registered at:
_raw_spin_lock_irqsave+0x3f/0x60
clocksource_mark_unstable+0x1b/0x90
mark_tsc_unstable+0x41/0x50
check_tsc_sync_source+0x14f/0x180
sysvec_call_function_single+0x69/0x90
Possible unsafe locking scenario:
lock(watchdog_lock);
<Interrupt>
lock(watchdog_lock);
stack backtrace:
_raw_spin_lock+0x30/0x40
clocksource_watchdog+0x23/0x5a0
run_timer_softirq+0x2a/0x50
sysvec_apic_timer_interrupt+0x6e/0x90
The reason is the recent conversion of the TSC synchronization function
during CPU hotplug on the control CPU to a SMP function call. In case
that the synchronization with the upcoming CPU fails, the TSC has to be
marked unstable via clocksource_mark_unstable().
clocksource_mark_unstable() acquires 'watchdog_lock', but that lock is
taken with interrupts enabled in the watchdog timer callback to minimize
interrupt disabled time. That's obviously a possible deadlock scenario,
Before that change the synchronization function was invoked in thread
context so this could not happen.
As it is not crucical whether the unstable marking happens slightly
delayed, defer the call to a worker thread which avoids the lock context
problem.
Fixes: 9d349d47f0e3 ("x86/smpboot: Make TSC synchronization function call based")
Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87zg064ceg.ffs@tglx
Pull drm updates from Dave Airlie:
"Highlights:
- AMD adds some more upcoming HW platforms
- Intel made Meteorlake stable and started adding Lunarlake
- nouveau has a bunch of display rework in prepartion for the NVIDIA
GSP firmware support
- msm adds a7xx support
- habanalabs has finished migration to accel subsystem
Detail summary:
kernel:
- add initial vmemdup-user-array
core:
- fix platform remove() to return void
- drm_file owner updated to reflect owner
- move size calcs to drm buddy allocator
- let GPUVM build as a module
- allow variable number of run-queues in scheduler
edid:
- handle bad h/v sync_end in EDIDs
panfrost:
- add Boris as maintainer
fbdev:
- use fb_ops helpers more
- only allow logo use from fbcon
- rename fb_pgproto to pgprot_framebuffer
- add HPD state to drm_connector_oob_hotplug_event
- convert to fbdev i/o mem helpers
i915:
- Enable meteorlake by default
- Early Xe2 LPD/Lunarlake display enablement
- Rework subplatforms into IP version checks
- GuC based TLB invalidation for Meteorlake
- Display rework for future Xe driver integration
- LNL FBC features
- LNL display feature capability reads
- update recommended fw versions for DG2+
- drop fastboot module parameter
- added deviceid for Arrowlake-S
- drop preproduction workarounds
- don't disable preemption for resets
- cleanup inlines in headers
- PXP firmware loading fix
- Fix sg list lengths
- DSC PPS state readout/verification
- Add more RPL P/U PCI IDs
- Add new DG2-G12 stepping
- DP enhanced framing support to state checker
- Improve shared link bandwidth management
- stop using GEM macros in display code
- refactor related code into display code
- locally enable W=1 warnings
- remove PSR watchdog timers on LNL
amdgpu:
- RAS/FRU EEPROM updatse
- IP discovery updatses
- GC 11.5 support
- DCN 3.5 support
- VPE 6.1 support
- NBIO 7.11 support
- DML2 support
- lots of IP updates
- use flexible arrays for bo list handling
- W=1 fixes
- Enable seamless boot in more cases
- Enable context type property for HDMI
- Rework GPUVM TLB flushing
- VCN IB start/size alignment fixes
amdkfd:
- GC 10/11 fixes
- GC 11.5 support
- use partial migration in GPU faults
radeon:
- W=1 Fixes
- fix some possible buffer overflow/NULL derefs
nouveau:
- update uapi for NO_PREFETCH
- scheduler/fence fixes
- rework suspend/resume for GSP-RM
- rework display in preparation for GSP-RM
habanalabs:
- uapi: expose tsc clock
- uapi: block access to eventfd through control device
- uapi: force dma-buf export to PAGE_SIZE alignments
- complete move to accel subsystem
- move firmware interface include files
- perform hard reset on PCIe AXI drain event
- optimise user interrupt handling
msm:
- DP: use existing helpers for DPCD
- DPU: interrupts reworked
- gpu: a7xx (a730/a740) support
- decouple msm_drv from kms for headless devices
mediatek:
- MT8188 dsi/dp/edp support
- DDP GAMMA - 12 bit LUT support
- connector dynamic selection capability
rockchip:
- rv1126 mipi-dsi/vop support
- add planar formats
ast:
- rename constants
panels:
- Mitsubishi AA084XE01
- JDI LPM102A188A
- LTK050H3148W-CTA6
ivpu:
- power management fixes
qaic:
- add detach slice bo api
komeda:
- add NV12 writeback
tegra:
- support NVSYNC/NHSYNC
- host1x suspend fixes
ili9882t:
- separate into own driver"
* tag 'drm-next-2023-10-31-1' of git://anongit.freedesktop.org/drm/drm: (1803 commits)
drm/amdgpu: Remove unused variables from amdgpu_show_fdinfo
drm/amdgpu: Remove duplicate fdinfo fields
drm/amd/amdgpu: avoid to disable gfxhub interrupt when driver is unloaded
drm/amdgpu: Add EXT_COHERENT support for APU and NUMA systems
drm/amdgpu: Retrieve CE count from ce_count_lo_chip in EccInfo table
drm/amdgpu: Identify data parity error corrected in replay mode
drm/amdgpu: Fix typo in IP discovery parsing
drm/amd/display: fix S/G display enablement
drm/amdxcp: fix amdxcp unloads incompletely
drm/amd/amdgpu: fix the GPU power print error in pm info
drm/amdgpu: Use pcie domain of xcc acpi objects
drm/amd: check num of link levels when update pcie param
drm/amdgpu: Add a read to GFX v9.4.3 ring test
drm/amd/pm: call smu_cmn_get_smc_version in is_mode1_reset_supported.
drm/amdgpu: get RAS poison status from DF v4_6_2
drm/amdgpu: Use discovery table's subrevision
drm/amd/display: 3.2.256
drm/amd/display: add interface to query SubVP status
drm/amd/display: Read before writing Backlight Mode Set Register
drm/amd/display: Disable SYMCLK32_SE RCO on DCN314
...
* for-next/cpus_have_const_cap: (38 commits)
: cpus_have_const_cap() removal
arm64: Remove cpus_have_const_cap()
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_REPEAT_TLBI
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_NVIDIA_CARMEL_CNP
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_CAVIUM_23154
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_2645198
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1742098
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_1542419
arm64: Avoid cpus_have_const_cap() for ARM64_WORKAROUND_843419
arm64: Avoid cpus_have_const_cap() for ARM64_UNMAP_KERNEL_AT_EL0
arm64: Avoid cpus_have_const_cap() for ARM64_{SVE,SME,SME2,FA64}
arm64: Avoid cpus_have_const_cap() for ARM64_SPECTRE_V2
arm64: Avoid cpus_have_const_cap() for ARM64_SSBS
arm64: Avoid cpus_have_const_cap() for ARM64_MTE
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_TLB_RANGE
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_WFXT
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_RNG
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_EPAN
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_PAN
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_GIC_PRIO_MASKING
arm64: Avoid cpus_have_const_cap() for ARM64_HAS_DIT
...
Intel Trust Domain Extensions (TDX) protects guest VMs from malicious
host and certain physical attacks. A CPU-attested software module
called 'the TDX module' runs inside a new isolated memory range as a
trusted hypervisor to manage and run protected VMs.
TDX introduces a new CPU mode: Secure Arbitration Mode (SEAM). This
mode runs only the TDX module itself or other code to load the TDX
module.
The host kernel communicates with SEAM software via a new SEAMCALL
instruction. This is conceptually similar to a guest->host hypercall,
except it is made from the host to SEAM software instead. The TDX
module establishes a new SEAMCALL ABI which allows the host to
initialize the module and to manage VMs.
The SEAMCALL ABI is very similar to the TDCALL ABI and leverages much
TDCALL infrastructure. Wire up basic functions to make SEAMCALLs for
the basic support of running TDX guests: __seamcall(), __seamcall_ret(),
and __seamcall_saved_ret() for TDH.VP.ENTER. All SEAMCALLs involved in
the basic TDX support don't use "callee-saved" registers as input and
output, except the TDH.VP.ENTER.
To start to support TDX, create a new arch/x86/virt/vmx/tdx/tdx.c for
TDX host kernel support. Add a new Kconfig option CONFIG_INTEL_TDX_HOST
to opt-in TDX host kernel support (to distinguish with TDX guest kernel
support). So far only KVM uses TDX. Make the new config option depend
on KVM_INTEL.
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Isaku Yamahata <isaku.yamahata@intel.com>
Link: https://lore.kernel.org/all/4db7c3fc085e6af12acc2932294254ddb3d320b3.1692096753.git.kai.huang%40intel.com
Pull parisc architecture fixes from Helge Deller:
- Include the upper 5 address bits when inserting TLB entries on a
64-bit kernel.
On physical machines those are ignored, but in qemu it's nice to have
them included and to be correct.
- Stop the 64-bit kernel and show a warning if someone tries to boot on
a machine with a 32-bit CPU
- Fix a "no previous prototype" warning in parport-gsc
* tag 'parisc-for-6.7-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Prevent booting 64-bit kernels on PA1.x machines
parport: gsc: mark init function static
parisc/pgtable: Do not drop upper 5 address bits of physical address
Pull parisc fixes from Helge Deller:
"On parisc we still sometimes need writeable stacks, e.g. if programs
aren't compiled with gcc-14. To avoid issues with the upcoming
systemd-254 we therefore have to disable prctl(PR_SET_MDWE) for now
(for parisc only).
The other two patches are minor: a bugfix for the soft power-off on
qemu with 64-bit kernel and prefer strscpy() over strlcpy():
- Fix power soft-off on qemu
- Disable prctl(PR_SET_MDWE) since parisc sometimes still needs
writeable stacks
- Use strscpy instead of strlcpy in show_cpuinfo()"
* tag 'parisc-for-6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
prctl: Disable prctl(PR_SET_MDWE) on parisc
parisc/power: Fix power soft-off when running on qemu
parisc: Replace strlcpy() with strscpy()
When splitting the allocation of the ITS node from its configuration,
some of the default settings were kept in the latter instead of
being moved to the former.
This has the side effect of negating some of the quirk detections that
have happened in between, amongst which the dreaded Synquacer hack
(that also affect Dominic's TI platform).
Move the initialisation of these fields early, so that they can again be
overriden by the Synquacer quirk.
Fixes: 9585a495ac93 ("irqchip/gic-v3-its: Split allocation from initialisation of its_node")
Reported by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Dominic Rath <dominic.rath@ibv-augsburg.net>
Link: https://lore.kernel.org/r/20231024084831.GA3788@JADEVM-DRA
Link: https://lore.kernel.org/r/20231024143431.2144579-1-maz@kernel.org
David and a few others reported that on certain newer systems some legacy
interrupts fail to work correctly.
Debugging revealed that the BIOS of these systems leaves the legacy PIC in
uninitialized state which makes the PIC detection fail and the kernel
switches to a dummy implementation.
Unfortunately this fallback causes quite some code to fail as it depends on
checks for the number of legacy PIC interrupts or the availability of the
real PIC.
In theory there is no reason to use the PIC on any modern system when
IO/APIC is available, but the dependencies on the related checks cannot be
resolved trivially and on short notice. This needs lots of analysis and
rework.
The PIC detection has been added to avoid quirky checks and force selection
of the dummy implementation all over the place, especially in VM guest
scenarios. So it's not an option to revert the relevant commit as that
would break a lot of other scenarios.
One solution would be to try to initialize the PIC on detection fail and
retry the detection, but that puts the burden on everything which does not
have a PIC.
Fortunately the ACPI/MADT table header has a flag field, which advertises
in bit 0 that the system is PCAT compatible, which means it has a legacy
8259 PIC.
Evaluate that bit and if set avoid the detection routine and keep the real
PIC installed, which then gets initialized (for nothing) and makes the rest
of the code with all the dependencies work again.
Fixes: e179f6914152 ("x86, irq, pic: Probe for legacy PIC and set legacy_pic appropriately")
Reported-by: David Lazar <dlazar@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: David Lazar <dlazar@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218003
Link: https://lore.kernel.org/r/875y2u5s8g.ffs@tglx
Pull devicetree updates from Rob Herring:
- Add a kselftest to check for unprobed DT devices
- Fix address translation for some 3 address cells cases
- Refactor firmware node refcounting for AMBA bus
- Add bindings for qcom,sm4450-pdc, Qualcomm Kryo 465 CPU, and
Freescale QMC HDLC
- Add Marantec vendor prefix
- Convert qcom,pm8921-keypad, cnxt,cx92755-wdt, da9062-wdt, and
atmel,at91rm9200-wdt bindings to DT schema
- Several additionalProperties/unevaluatedProperties on child node
schemas fixes
- Drop reserved-memory bindings which now live in dtschema project
- Fix a reference to rockchip,inno-usb2phy.yaml
- Remove backlight nodes from display panel examples
- Expand example for using DT_SCHEMA_FILES
- Merge simple LVDS panel bindings to one binding doc
* tag 'devicetree-for-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits)
dt-bindings: soc: fsl: cpm_qe: cpm1-scc-qmc: Add support for QMC HDLC
dt-bindings: soc: fsl: cpm_qe: cpm1-scc-qmc: Add 'additionalProperties: false' in child nodes
dt-bindings: soc: fsl: cpm_qe: cpm1-scc-qmc: Fix example property name
dt-bindings: arm,coresight-cti: Add missing additionalProperties on child nodes
dt-bindings: arm,coresight-cti: Drop type for 'cpu' property
dt-bindings: soundwire: Add reference to soundwire-controller.yaml schema
dt-bindings: input: syna,rmi4: Make "additionalProperties: true" explicit
media: dt-bindings: ti,ds90ub960: Add missing type for "i2c-alias"
dt-bindings: input: qcom,pm8921-keypad: convert to YAML format
of: overlay: unittest: overlay_bad_unresolved: Spelling s/ok/okay/
of: address: Consolidate bus .map() functions
of: address: Store number of bus flag cells rather than bool
of: unittest: Add tests for address translations
of: address: Remove duplicated functions
of: address: Fix address translation when address-size is greater than 2
dt-bindings: watchdog: cnxt,cx92755-wdt: convert txt to yaml
dt-bindings: watchdog: da9062-wdt: convert txt to yaml
dt-bindings: watchdog: fsl,scu-wdt: Document imx8dl
dt-bindings: watchdog: atmel,at91rm9200-wdt: convert txt to yaml
dt-bindings: usb: rockchip,dwc3: update inno usb2 phy binding name
...
amd-drm-next-6.7-2023-10-27:
amdgpu:
- RAS fixes
- Seamless boot fixes
- NBIO 7.7 fix
- SMU 14.0 fixes
- GC 11.5 fixes
- DML2 fixes
- ASPM fixes
- VPE fixes
- Misc code cleanups
- SRIOV fixes
- Add some missing copyright notices
- DCN 3.5 fixes
- FAMS fixes
- Backlight fix
- S/G display fix
- fdinfo cleanups
- EXT_COHERENT fixes for APU and NUMA systems
amdkfd:
- Misc fixes
- Misc code cleanups
- SVM fixes
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027200343.57132-1-alexander.deucher@amd.com
There are no longer any users of cpus_have_const_cap(), and therefore it
can be removed.
Remove cpus_have_const_cap(). At the same time, remove
__cpus_have_const_cap(), as this is a trivial wrapper of
alternative_has_cap_unlikely(), which can be used directly instead.
The comment for __system_matches_cap() is updated to no longer refer to
cpus_have_const_cap(). As we have a number of ways to check the cpucaps,
the specific suggestions are removed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
An excerpt from the PA8800 ERS states:
* The PA8800 violates the seven instruction pipeline rule when performing
TLB inserts or PxTLBE instructions with the PSW C bit on. The instruction
will take effect by the 12th instruction after the insert or purge.
I believe we have a problem with handling TLB misses. We don't fill
the pipeline following TLB inserts. As a result, we likely fault again
after returning from the interruption.
The above statement indicates that we need at least seven instructions
after the insert on pre PA8800 processors and we need 12 instructions
on PA8800/PA8900 processors.
Here we add macros and code to provide the required number instructions
after a TLB insert.
Signed-off-by: John David Anglin <dave.anglin@bell.net>
Suggested-by: Helge Deller <deller@gmx.de>
Cc: stable@vger.kernel.org
Signed-off-by: Helge Deller <deller@gmx.de>
Now 'struct tdx_hypercall_args' is basically 'struct tdx_module_args'
minus RCX. Although from __tdx_hypercall()'s perspective RCX isn't
used as shared register thus not part of input/output registers, it's
not worth to have a separate structure just due to one register.
Remove the 'struct tdx_hypercall_args' and use 'struct tdx_module_args'
instead in __tdx_hypercall() related code. This also saves the memory
copy between the two structures within __tdx_hypercall().
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Kai Huang <kai.huang@intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/all/798dad5ce24e9d745cf0e16825b75ccc433ad065.1692096753.git.kai.huang%40intel.com
Pull LoongArch updates from Huacai Chen:
- support PREEMPT_DYNAMIC with static keys
- relax memory ordering for atomic operations
- support BPF CPU v4 instructions for LoongArch
- some build and runtime warning fixes
* tag 'loongarch-6.7' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
selftests/bpf: Enable cpu v4 tests for LoongArch
LoongArch: BPF: Support signed mod instructions
LoongArch: BPF: Support signed div instructions
LoongArch: BPF: Support 32-bit offset jmp instructions
LoongArch: BPF: Support unconditional bswap instructions
LoongArch: BPF: Support sign-extension mov instructions
LoongArch: BPF: Support sign-extension load instructions
LoongArch: Add more instruction opcodes and emit_* helpers
LoongArch/smp: Call rcutree_report_cpu_starting() earlier
LoongArch: Relax memory ordering for atomic operations
LoongArch: Mark __percpu functions as always inline
LoongArch: Disable module from accessing external data directly
LoongArch: Support PREEMPT_DYNAMIC with static keys
Bail out early with error message when trying to boot a 64-bit kernel on
32-bit machines. This fixes the previous commit to include the check for
true 64-bit kernels as well.
Signed-off-by: Helge Deller <deller@gmx.de>
Fixes: 591d2108f3abc ("parisc: Add runtime check to prevent PA2.0 kernels on PA1.x machines")
Cc: <stable@vger.kernel.org> # v6.0+