Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.
A lockdep fix removed two rdt_last_cmd_clear() calls that were used to
clear the last_cmd_status buffer but called without holding the required
rdtgroup_mutex.
The impacted resctrl commands are writing to the cpus or cpus_list files
and creating a new monitor or control group. With stale data in the
last_cmd_status buffer, the impacted resctrl commands report the stale error
on success, or append their own failure message to the stale error on
failure.
Consequently, restore the rdt_last_cmd_clear() calls after acquiring
rdtgroup_mutex.
Fixes: c8eafe149530 ("x86/resctrl: Fix potential lockdep warning")
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/all/20250603125828.1590067-1-zengheng4@huawei.com
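The locking rule behind the resctrl fix above can be sketched in userspace C. This is a single-threaded model under stated assumptions, not the kernel code: `group_lock()`, `group_unlock()`, `last_cmd_puts()` and `run_command()` are hypothetical names, a boolean stands in for rdtgroup_mutex, and an assert stands in for the lockdep check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for rdtgroup_mutex and the last_cmd_status buffer. */
static bool group_mutex_held;
static char last_cmd_status[128];

static void group_lock(void)
{
	assert(!group_mutex_held);
	group_mutex_held = true;
}

static void group_unlock(void)
{
	assert(group_mutex_held);
	group_mutex_held = false;
}

/* Must be called with the lock held; the assert models the lockdep check. */
static void last_cmd_clear(void)
{
	assert(group_mutex_held);
	last_cmd_status[0] = '\0';
}

/* Appends to the buffer, like the failure messages described above. */
static void last_cmd_puts(const char *msg)
{
	size_t used = strlen(last_cmd_status);

	assert(group_mutex_held);
	snprintf(last_cmd_status + used, sizeof(last_cmd_status) - used, "%s", msg);
}

/*
 * Model of one command: take the lock first, then clear the buffer, so a
 * message left by an earlier command cannot leak into this one.
 * Returns 0 on success and -1 on failure.
 */
static int run_command(bool should_fail)
{
	int ret = 0;

	group_lock();
	last_cmd_clear();
	if (should_fail) {
		last_cmd_puts("Invalid argument\n");
		ret = -1;
	}
	group_unlock();
	return ret;
}
```

Calling `run_command(true)` twice in a row leaves a single message in the buffer, and a following `run_command(false)` leaves it empty, which is the behavior the restored clear calls are meant to guarantee.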
io_bitmap_exit() is invoked from exit_thread() when a task exits or
when a fork fails. In the latter case, exit_thread() cleans up
resources which were allocated during fork().
io_bitmap_exit() invokes task_update_io_bitmap(), which in turn ends up
in tss_update_io_bitmap(). tss_update_io_bitmap() operates on the
current task. If current has TIF_IO_BITMAP set, but no bitmap installed,
tss_update_io_bitmap() crashes with a NULL pointer dereference.
Two issues lead to that problem:
1) io_bitmap_exit() should not invoke task_update_io_bitmap() when
the task, which is cleaned up, is not the current task. That's a
clear indicator for a cleanup after a failed fork().
2) A task should not have TIF_IO_BITMAP set and neither a bitmap
installed nor IOPL emulation level 3 activated.
This happens when a kernel thread is created in the context of
a user space thread, which has TIF_IO_BITMAP set as the thread
flags are copied and the IO bitmap pointer is cleared.
Other than in the failed fork() case this has no impact because
kernel threads including IO workers never return to user space and
therefore never invoke tss_update_io_bitmap().
Cure this by adding the missing cleanups and checks:
1) Prevent io_bitmap_exit() from invoking task_update_io_bitmap() if
the task being cleaned up is not the current task.
2) Clear TIF_IO_BITMAP in copy_thread() unconditionally. For user
space forks it is set later, when the IO bitmap is inherited in
io_bitmap_share().
For paranoia's sake, add a warning to tss_update_io_bitmap() to catch
the case where that code is invoked with inconsistent state.
Fixes: ea5f1cd7ab49 ("x86/ioperm: Remove bitmap if all permissions dropped")
Reported-by: syzbot+e2b1803445d236442e54@syzkaller.appspotmail.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/87wmdceom2.ffs@tglx
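The two checks described in the IO bitmap fix above can be modeled in a few lines of userspace C. This is a simplified sketch, not the kernel implementation: `struct task`, `current_task`, `copy_task()`, `bitmap_exit()` and `tss_update()` are hypothetical stand-ins, and IOPL emulation is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-task state. */
struct task {
	bool tif_io_bitmap;     /* models TIF_IO_BITMAP */
	unsigned long *bitmap;  /* models the installed IO bitmap, may be NULL */
};

static struct task *current_task;  /* models "current" */
static int tss_updates;            /* counts modeled TSS updates */

/* Models tss_update_io_bitmap(): always operates on the current task. */
static void tss_update(void)
{
	/* Flag set without a bitmap is the inconsistent state being guarded. */
	assert(!current_task->tif_io_bitmap || current_task->bitmap != NULL);
	tss_updates++;
}

/* Models copy_thread(): flags are copied, the bitmap pointer is cleared. */
static void copy_task(struct task *child, const struct task *parent)
{
	*child = *parent;
	child->bitmap = NULL;
	child->tif_io_bitmap = false;  /* check 2: clear unconditionally */
}

/* Models io_bitmap_exit(): drop the bitmap, touch the TSS only for current. */
static void bitmap_exit(struct task *tsk)
{
	tsk->bitmap = NULL;
	tsk->tif_io_bitmap = false;
	if (tsk == current_task)       /* check 1: skip after a failed fork */
		tss_update();
}
```

In this model, cleaning up a half-constructed child while the parent is still `current_task` never reaches `tss_update()`, whereas cleaning up the current task itself does.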
The following trace events are not used and defining them just wastes
memory:
x86_fpu_before_restore
x86_fpu_after_restore
x86_fpu_init_state
Simply remove them.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Link: https://lore.kernel.org/all/20250529130138.544ffec4@gandalf.local.home # background
Link: https://lore.kernel.org/r/20250529131024.7c2ef96f@gandalf.local.home # x86 submission
Pull AMD SEV update from Borislav Petkov:
"Add a virtual TPM driver glue which allows a guest kernel to talk to a
TPM device emulated by a Secure VM Service Module (SVSM) - a helper
module of sorts which runs at a different privilege level in the
SEV-SNP VM stack.
The intent being that a TPM device is emulated by a trusted entity and
not by the untrusted host which is the default assumption in the
confidential computing scenarios"
* tag 'x86_sev_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sev: Register tpm-svsm platform device
tpm: Add SNP SVSM vTPM driver
svsm: Add header with SVSM_VTPM_CMD helpers
x86/sev: Add SVSM vTPM probe/send_command functions
Pull mtrr update from Borislav Petkov:
"A single change to verify the presence of fixed MTRR ranges before
accessing the respective MSRs"
* tag 'x86_mtrr_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mtrr: Check if fixed-range MTRRs exist in mtrr_save_fixed_ranges()
SNP platform can provide a vTPM device emulated by SVSM.
The "tpm-svsm" device can be handled by the platform driver registered by the
x86/sev core code.
Register the platform device only when SVSM is available and it supports vTPM
commands as checked by snp_svsm_vtpm_probe().
[ bp: Massage commit message. ]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250410135118.133240-5-sgarzare@redhat.com
Pull EDAC updates from Borislav Petkov:
- ie31200: Add support for Raptor Lake-S and Alder Lake-S compute dies
- Rework how RRL registers per channel tracking is done in order to
support newer hardware with different RRL configurations and refactor
that code. Add support for Granite Rapids server
- i10nm: explicitly set RRL modes to fix any wrong BIOS programming
- Properly save and restore Retry Read error Log channel configuration
info on Intel drivers
- igen6: Handle correctly the case of fused off memory controllers on
Arizona Beach and Amston Lake SoCs before adding support for them
- the usual set of fixes and cleanups
* tag 'edac_updates_for_v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/bluefield: Don't use bluefield_edac_readl() result on error
EDAC/i10nm: Fix the bitwise operation between variables of different sizes
EDAC/ie31200: Add two Intel SoCs for EDAC support
EDAC/{skx_common,i10nm}: Add RRL support for Intel Granite Rapids server
EDAC/{skx_common,i10nm}: Refactor show_retry_rd_err_log()
EDAC/{skx_common,i10nm}: Refactor enable_retry_rd_err_log()
EDAC/{skx_common,i10nm}: Structure the per-channel RRL registers
EDAC/i10nm: Explicitly set the modes of the RRL register sets
EDAC/{skx_common,i10nm}: Fix the loss of saved RRL for HBM pseudo channel 0
EDAC/skx_common: Fix general protection fault
EDAC/igen6: Add Intel Amston Lake SoCs support
EDAC/igen6: Add Intel Arizona Beach SoCs support
EDAC/igen6: Skip absent memory controllers
When suspending, save_processor_state() calls mtrr_save_fixed_ranges()
to save fixed-range MTRRs.
On platforms without fixed-range MTRRs, such as the ACRN hypervisor, which
has removed fixed-range MTRR emulation, accessing these MSRs will
trigger an unchecked MSR access error. Make sure fixed-range MTRRs are
supported before accessing them to prevent this error.
Since mtrr_state.have_fixed is only set when MTRRs are present and
enabled, checking the CPU feature flag in mtrr_save_fixed_ranges() is
unnecessary.
Fixes: 3ebad5905609 ("[PATCH] x86: Save and restore the fixed-range MTRRs of the BSP when suspending")
Signed-off-by: Jiaqing Zhao <jiaqing.zhao@linux.intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/20250509170633.3411169-2-jiaqing.zhao@linux.intel.com
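The guard described in the MTRR fix above amounts to an early return before any MSR is touched. The following is a minimal sketch under stated assumptions: `struct mtrr_model`, `read_fixed_msr()` and `save_fixed_ranges()` are hypothetical names, the MSR read is simulated with made-up values, and `NUM_FIXED_MSRS` is an illustrative constant.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NUM_FIXED_MSRS 11  /* illustrative count for this sketch */

struct mtrr_model {
	bool have_fixed;                  /* models mtrr_state.have_fixed */
	uint64_t saved[NUM_FIXED_MSRS];   /* saved fixed-range values */
};

static int msr_reads;  /* number of modeled MSR accesses */

/* Simulated MSR read; without the MSRs, the real access would fault. */
static uint64_t read_fixed_msr(int idx)
{
	assert(idx >= 0 && idx < NUM_FIXED_MSRS);
	msr_reads++;
	return 0x0606060606060606ULL + (uint64_t)idx;  /* made-up value */
}

/* Save the fixed ranges only when the platform reports that they exist. */
static void save_fixed_ranges(struct mtrr_model *m)
{
	if (!m->have_fixed)
		return;

	for (int i = 0; i < NUM_FIXED_MSRS; i++)
		m->saved[i] = read_fixed_msr(i);
}
```

With `have_fixed` false, no simulated MSR access happens at all, which mirrors the intent of checking the flag before saving on suspend.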
Add driver for the vTPM defined by the AMD SVSM spec [1].
The specification defines a protocol that a SEV-SNP guest OS can use to
discover and talk to a vTPM emulated by the Secure VM Service Module (SVSM) in
the guest context, but at a more privileged level (VMPL0).
The new tpm-svsm platform driver uses the API exposed by the x86/sev core
implementation, which interfaces with an SVSM, to send commands and receive
responses.
The device cannot be hot-plugged/unplugged as it is emulated by the platform,
so module_platform_driver_probe() can be used. The device will be registered
by the platform only when it's available, so the probe function just needs to
set up the tpm_chip.
This device does not support interrupts and sends responses to commands
synchronously.
In order to have .recv() called just after .send() in tpm_try_transmit(), the
.status() callback is not implemented, which is supported since commit
980a573621ea ("tpm: Make chip->{status,cancel,req_canceled} opt").
[1] "Secure VM Service Module for SEV-SNP Guests"
Publication # 58019 Revision: 1.00
[ bp: Massage commit message. ]
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250410135118.133240-4-sgarzare@redhat.com
Pull x86 resource control updates from Borislav Petkov:
"Carve out the resctrl filesystem-related code into fs/resctrl/ so that
multiple architectures can share the fs API for manipulating their
respective hw resource control implementation.
This is the second step in the work towards sharing the resctrl
filesystem interface, the next one being plugging ARM's MPAM into the
aforementioned fs API"
* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
MAINTAINERS: Add reviewers for fs/resctrl
x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
x86/resctrl: Always initialise rid field in rdt_resources_all[]
x86/resctrl: Relax some asm #includes
x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
x86/resctrl: Squelch whitespace anomalies in resctrl core code
x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
x86/resctrl: Move enum resctrl_event_id to resctrl.h
x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
fs/resctrl: Add boiler plate for external resctrl code
x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
x86/resctrl: Split trace.h
x86/resctrl: Expand the width of domid by replacing mon_data_bits
x86/resctrl: Add end-marker to the resctrl_event_id enum
x86/resctrl: Move is_mba_sc() out of core.c
x86/resctrl: Drop __init/__exit on assorted symbols
x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
x86/resctrl: Check all domains are offline in resctrl_exit()
x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
...
The bluefield_edac_readl() routine returns an uninitialized result on error
paths. In those cases the calling routine should not use the uninitialized
result. The driver should simply log the error, and then return early.
Fixes: e41967575474 ("EDAC/bluefield: Use Arm SMC for EMI access on BlueField-2")
Signed-off-by: David Thompson <davthompson@nvidia.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Shravan Kumar Ramani <shravankr@nvidia.com>
Link: https://lore.kernel.org/20250318214747.12271-1-davthompson@nvidia.com
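The pattern the bluefield fix above describes (log the error, return early, never read the output variable) can be sketched as follows. The names `model_readl()` and `get_dimm_info()`, the register value, and the error code are hypothetical and only illustrate the control flow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical register read: returns 0 and fills *result on success,
 * or a negative error code and leaves *result untouched on failure.
 */
static int model_readl(int fail, uint32_t *result)
{
	assert(result != NULL);
	if (fail)
		return -5;      /* stands in for an errno such as -EIO */
	*result = 0x1234;   /* made-up register contents */
	return 0;
}

/* Caller: on error, log and return before touching the read result. */
static int get_dimm_info(int fail, uint32_t *info)
{
	uint32_t reg;
	int err;

	err = model_readl(fail, &reg);
	if (err) {
		fprintf(stderr, "register read failed: %d\n", err);
		return err;     /* reg is uninitialized here, so it is not used */
	}

	*info = reg & 0xff;
	return 0;
}
```

On the failure path the caller's `info` value is left unchanged, so no garbage derived from an uninitialized read can propagate.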
Add helpers for the SVSM_VTPM_CMD calls used by the vTPM protocol defined by
the AMD SVSM spec [1].
The vTPM protocol follows the Official TPM 2.0 Reference Implementation
(originally by Microsoft, now part of the TCG) simulator protocol.
[1] "Secure VM Service Module for SEV-SNP Guests"
Publication # 58019 Revision: 1.00
Co-developed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Co-developed-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Link: https://lore.kernel.org/r/20250403100943.120738-3-sgarzare@redhat.com
Pull timer core updates from Thomas Gleixner:
"Updates for the time/timer core code:
- Rework the initialization of the posix-timer kmem_cache and move
the cache pointer into the timer_data structure to prevent false
sharing
- Switch the alarmtimer code to lock guards
- Improve the CPU selection criteria in the per CPU validation of the
clocksource watchdog to avoid arbitrary selections (or omissions)
on systems with a small number of CPUs
- The usual cleanups and improvements"
* tag 'timers-core-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tick/nohz: Remove unused tick_nohz_full_add_cpus_to()
clocksource: Fix the CPUs' choice in the watchdog per CPU verification
alarmtimer: Switch spin_{lock,unlock}_irqsave() to guards
alarmtimer: Remove dead return value in clock2alarm()
time/jiffies: Change register_refined_jiffies() to void __init
timers: Remove unused __round_jiffies(_up)
posix-timers: Initialize cache early and move pointer into __timer_data
resctrl has existed for quite a while as a filesystem interface private to
arch/x86. To allow other architectures to support the same user interface
for similar hardware features, it has been moved to /fs/.
Add those with a vested interest in the common code as reviewers.
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Reinette Chatre <reinette.chatre@intel.com>
Acked-by: Dave Martin <Dave.Martin@arm.com>
Tested-by: Fenghua Yu <fenghuay@nvidia.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/20250515165855.31452-26-james.morse@arm.com
The Smatch static checker reported the following warning:
drivers/edac/i10nm_base.c:364 show_retry_rd_err_log()
warn: should bitwise negate be 'ullong'?
This warning was due to the bitwise NOT/AND operations between
'status_mask' (a u32 type) and 'log' (a u64 type), which resulted in
the high 32 bits of 'log' being cleared.
This was a false positive warning, as only the low 32 bits of 'log' were
written to the first RRL memory controller register (a u32 type).
To improve code sanity, fix this warning by changing 'status_mask' to
a u64 type, ensuring it matches the size of 'log' for bitwise operations.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aAih0KmEVq7ch6v2@stanley.mountain/
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://lore.kernel.org/r/20250424081454.2952632-1-qiuxu.zhuo@intel.com
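The effect Smatch warned about can be reproduced in isolation. This sketch assumes a platform where `unsigned int` is 32 bits wide; the function names and the example value are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Narrow mask: ~ is evaluated in 32 bits, then zero-extended to 64 bits. */
static uint64_t clear_bits_u32_mask(uint64_t log, uint32_t status_mask)
{
	return log & ~status_mask;
}

/* Wide mask: ~ is evaluated in 64 bits, so the high half of log survives. */
static uint64_t clear_bits_u64_mask(uint64_t log, uint64_t status_mask)
{
	return log & ~status_mask;
}

/* Self-check with an arbitrary example value. */
static void check_masks(void)
{
	const uint64_t log = 0xAAAAAAAA000000FFULL;

	/* u32 mask: the high 32 bits of log are lost along with the mask bits. */
	assert(clear_bits_u32_mask(log, 0x0F) == 0x00000000000000F0ULL);
	/* u64 mask: only the mask bits are cleared. */
	assert(clear_bits_u64_mask(log, 0x0F) == 0xAAAAAAAA000000F0ULL);
}
```

When only the low 32 bits of the result are consumed, as the commit message notes for the RRL register write, both variants give the same value, which is why the warning was harmless in practice.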