Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
Merge a power capping fix for 6.7-rc4 which eliminates unnecessary
and harmful conversions to uW from the DTPM (dynamic thermal and power
management) framework (Lukasz Luba).
* powercap:
powercap: DTPM: Fix unneeded conversions to micro-Watts
show_energy_performance_available_preferences() to show only supported
values which is performance in performance governor policy.
-------Before--------
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
amd-pstate-epp
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_available_preferences
default performance balance_performance balance_power power
-------After--------
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_driver
amd-pstate-epp
$ cat /sys/devices/system/cpu/cpu1/cpufreq/scaling_governor
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_preference
performance
$ cat /sys/devices/system/cpu/cpu1/cpufreq/energy_performance_available_preferences
performance
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Suggested-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Ayush Jain <ayush.jain3@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The power values coming from the Energy Model are already in uW.
The PowerCap and DTPM frameworks operate on uW, so all places should
just use the values from the EM.
Fix the code by removing all of the conversion to uW still present in it.
Fixes: ae6ccaa65038 (PM: EM: convert power field to micro-Watts precision and align drivers)
Cc: 5.19+ <stable@vger.kernel.org> # v5.19+
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When amd_pstate is running, writing to scaling_min_freq and
scaling_max_freq has no effect. These values are only passed to the
policy level, but not to the platform level. This means that the
platform does not know about the frequency limits set by the user.
To fix this, update the min_perf and max_perf values at the platform
level whenever the user changes the scaling_min_freq and scaling_max_freq
values.
Fixes: ffa5096a7c33 ("cpufreq: amd-pstate: implement Pstate EPP support for the AMD processors")
Acked-by: Huang Rui <ray.huang@amd.com>
Signed-off-by: Wyes Karny <wyes.karny@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_driver->fast_switch() callback expects a frequency as a return
value. amd_pstate_fast_switch() was returning the return value of
amd_pstate_update_freq(), which only indicates a success or failure.
Fix this by making amd_pstate_fast_switch() return the target_freq
when the call to amd_pstate_update_freq() is successful, and return
the current frequency from policy->cur when the call to
amd_pstate_update_freq() is unsuccessful.
Fixes: 4badf2eb1e98 ("cpufreq: amd-pstate: Add ->fast_switch() callback")
Acked-by: Huang Rui <ray.huang@amd.com>
Reviewed-by: Wyes Karny <wyes.karny@amd.com>
Reviewed-by: Perry Yuan <perry.yuan@amd.com>
Cc: 6.4+ <stable@vger.kernel.org> # v6.4+
Signed-off-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Pull tracing fixes from Steven Rostedt::
"Eventfs fixes:
- With the usage of simple_recursive_remove() recommended by Al Viro,
the code should not be calling "d_invalidate()" itself. Doing so is
causing crashes. The code was calling d_invalidate() on the race of
trying to look up a file while the parent was being deleted. This
was detected, and the added dentry was having d_invalidate() called
on it, but the deletion of the directory was also calling
d_invalidate() on that same dentry.
- A fix to not free the eventfs_inode (ei) until the last dput() was
called on its ei->dentry made the ei->dentry exist even after it
was marked for free by setting the ei->is_freed. But code elsewhere
still was checking if ei->dentry was NULL if ei->is_freed is set
and would trigger WARN_ON if that was the case. That's no longer
true and there should not be any warnings when it is true.
- Use GFP_NOFS for allocations done under eventfs_mutex. The
eventfs_mutex can be taken on file system reclaim, make sure that
allocations done under that mutex do not trigger file system
reclaim.
- Clean up code by moving the taking of inode_lock out of the helper
functions and into where they are needed, and not use the parameter
to know to take it or not. It must always be held but some callers
of the helper function have it taken when they were called.
- Warn if the inode_lock is not held in the helper functions.
- Warn if eventfs_start_creating() is called without a parent. As
eventfs is underneath tracefs, all files created will have a parent
(the top one will have a tracefs parent).
Tracing update:
- Add Mathieu Desnoyers as an official reviewer of the tracing subsystem"
* tag 'trace-v6.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
MAINTAINERS: TRACING: Add Mathieu Desnoyers as Reviewer
eventfs: Make sure that parent->d_inode is locked in creating files/dirs
eventfs: Do not allow NULL parent to eventfs_start_creating()
eventfs: Move taking of inode_lock into dcache_dir_open_wrapper()
eventfs: Use GFP_NOFS for allocation when eventfs_mutex is held
eventfs: Do not invalidate dentry in create_file/dir_dentry()
eventfs: Remove expectation that ei->is_freed means ei->dentry == NULL
Merge ARM cpufreq fixes for 6.7-rc4 from Viresh Kumar.
These fix issues related to power domains in the qcom cpufreq driver
and an OPP-related issue in the imx6q cpufreq driver.
* 'cpufreq/arm/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
pmdomain: qcom: rpmpd: Set GENPD_FLAG_ACTIVE_WAKEUP
cpufreq: qcom-nvmem: Preserve PM domain votes in system suspend
cpufreq: qcom-nvmem: Enable virtual power domain devices
cpufreq: imx6q: Don't disable 792 Mhz OPP unnecessarily
Pull parisc architecture fixes from Helge Deller:
"This patchset fixes and enforces correct section alignments for the
ex_table, altinstructions, parisc_unwind, jump_table and bug_table
which are created by inline assembly.
Due to not being correctly aligned at link & load time they can
trigger unnecessarily the kernel unaligned exception handler at
runtime. While at it, I switched the bug table to use relative
addresses which reduces the size of the table by half on 64-bit.
We still had the ENOSYM and EREMOTERELEASE errno symbols as left-overs
from HP-UX, which now trigger build-issues with glibc. We can simply
remove them.
Most of the patches are tagged for stable kernel series.
Summary:
- Drop HP-UX ENOSYM and EREMOTERELEASE return codes to avoid glibc
build issues
- Fix section alignments for ex_table, altinstructions, parisc unwind
table, jump_table and bug_table
- Reduce size of bug_table on 64-bit kernel by using relative
pointers"
* tag 'parisc-for-6.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux:
parisc: Reduce size of the bug_table on 64-bit kernel by half
parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes
parisc: Use natural CPU alignment for bug_table
parisc: Ensure 32-bit alignment on parisc unwind section
parisc: Mark lock_aligned variables 16-byte aligned on SMP
parisc: Mark jump_table naturally aligned
parisc: Mark altinstructions read-only and 32-bit aligned
parisc: Mark ex_table entries 32-bit aligned in uaccess.h
parisc: Mark ex_table entries 32-bit aligned in assembly.h
In order to make sure I get CC'd on tracing changes for which my input
would be relevant, add my name as reviewer of the TRACING subsystem.
Link: https://lore.kernel.org/linux-trace-kernel/20231115155018.8236-1-mathieu.desnoyers@efficios.com
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Set GENPD_FLAG_ACTIVE_WAKEUP for all RPM power domains so that power
domains necessary for wakeup/"awake path" devices are kept on across
suspend.
This is needed for example for the *_AO ("active-only") variants of the
RPMPDs used by the CPU. Those should maintain their votes also across
system suspend to ensure the CPU can keep running for the whole suspend
process (ending in a firmware call). When the RPM firmware detects that
the CPUs are in a deep idle state it will drop those votes automatically.
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Pull x86 microcode fixes from Ingo Molnar:
"Fix/enhance x86 microcode version reporting: fix the bootup log spam,
and remove the driver version announcement to avoid version confusion
when distros backport fixes"
* tag 'x86-urgent-2023-11-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/microcode: Rework early revisions reporting
x86/microcode: Remove the driver announcement and version
Enable GENERIC_BUG_RELATIVE_POINTERS which will store 32-bit relative
offsets to the bug address and the source file name instead of 64-bit
absolute addresses. This effectively reduces the size of the
bug_table[] array by half on 64-bit kernels.
Signed-off-by: Helge Deller <deller@gmx.de>
Since the locking of the parent->d_inode has been moved outside the
creation of the files and directories (as it use to be locked via a
conditional), add a WARN_ON_ONCE() to the case that it's not locked.
Link: https://lkml.kernel.org/r/20231121231112.853962542@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
>From the Linux point of view, the power domains used by the CPU must
stay always-on. This is because we still need the CPU to keep running
until the last instruction, which will typically be a firmware call that
shuts down the CPU cleanly.
At the moment the power domain votes (enable + performance state) are
dropped during system suspend, which means the CPU could potentially
malfunction while entering suspend.
We need to distinguish between two different setups used with
qcom-cpufreq-nvmem:
1. CPR power domain: The backing regulator used by CPR should stay
always-on in Linux; it is typically disabled automatically by
hardware when the CPU enters a deep idle state. However, we
should pause the CPR state machine during system suspend.
2. RPMPD: The power domains used by the CPU should stay always-on
in Linux (also across system suspend). The CPU typically only
uses the *_AO ("active-only") variants of the power domains in
RPMPD. For those, the RPM firmware will automatically drop
the votes internally when the CPU enters a deep idle state.
Make this work correctly by calling device_set_awake_path() on the
virtual genpd devices, so that the votes are maintained across system
suspend. The power domain drivers need to set GENPD_FLAG_ACTIVE_WAKEUP
to opt into staying on during system suspend.
For now we only set this for the RPMPD case. For CPR, not setting it
will ensure the state machine is still paused during system suspend,
while the backing regulator will stay on with "regulator-always-on".
Signed-off-by: Stephan Gerhold <stephan.gerhold@kernkonzept.com>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>