Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
The CoreThr column displays total thermal throttling events
since boot time.
Change it to report events during the measurement interval.
This is more useful for showing a user the current conditions.
Total events since boot time are still available to the user via
/sys/devices/system/cpu/cpu*/thermal_throttle/*
Document CoreThr on turbostat.8
Fixes: eae97e053fe30 ("turbostat: Support thermal throttle count print")
Reported-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Cc: Chen Yu <yu.c.chen@intel.com>
On systems with >= 1024 cpus (in my case 1152), turbostat fails with the error output:
"turbostat: /sys/fs/cgroup/cpuset.cpus.effective: cpu str malformat 0-1151"
A similar error appears with the use of turbostat --cpu when the inputted cpu
range contains a cpu number >= 1024:
# turbostat -c 1100-1151
"--cpu 1100-1151" malformed
...
Both errors are caused by parse_cpu_str() reaching its limit of CPU_SUBSET_MAXCPUS.
It's a good idea to limit the maximum cpu number being parsed, but 1024 is too low.
For a small increase in compute and allocated memory, increasing CPU_SUBSET_MAXCPUS
brings support for parsing cpu numbers >= 1024.
Increase CPU_SUBSET_MAXCPUS to 8192, a common setting for CONFIG_NR_CPUS on x86_64.
Signed-off-by: Justin Ernst <justin.ernst@hpe.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The idle governor provides the following per-idle state sysfs files:
* above - Indicates overshoots, where a more shallow state should have
been requested (if avaliale and enabled).
* below - Indicates undershoots, where a deeper state should have been
requested (if available and enabled).
These files offer valuable insights into how effectively the Linux kernel
idle governor selects idle states for a given workload. This commit adds
support for these files in turbostat.
Expose the contents of these files with the following naming convention:
* C1: The number of times the C1 state was requested (existing counter).
* C1+: The number of times the idle governor selected C1, but a deeper
idle state should have been selected instead.
* C1-: The number of times the idle governor selected C1, but a shallower
idle state should have been selected instead.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix the 'find_msrp_by_name()' function which returns incorrect matches for
cases like this:
s1 = "C1-";
find_msrp_by_name(head, s1);
Inside 'find_msrp_by_name()':
...
s2 = "C1"
if !(strcnmp(s1, s2, len(s2)))
// Incorrect match!
return mp;
Full strings should be match istead. Switch to 'strcmp()' to fix the problem.
Signed-off-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
turbostat aborted with below messages on a dual-package system,
turbostat: turbostat.c:3744: rapl_counter_accumulate: Assertion `dst->unit == src->unit' failed.
Aborted
This is because
1. the MSR_DRAM_PERF_STATUS returns Zero for one package, and non-Zero
for another package
2. probe_msr() treats Zero return value as a failure so this feature is
enabled on one package, and disabled for another package.
3. turbostat aborts because the feature is invalid on some package
Unlike the RAPL energy counter registers, MSR_DRAM_PERF_STATUS can
return Zero value, and this should not be treated as a failure.
Fix the problem by allowing Zero return value for RAPL registers other
than the energy counters.
Fixes: 7c6fee25bdf5 ("tools/power turbostat: Check for non-zero value when MSR probing")
Reported-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
The clustered uncore frequency counters, UMHz*.*
should honor the --show and --hide options.
All non-specified counters should be implicityly hidden.
But when --show was used, UMHz*.* showed up anyway:
$ sudo turbostat -q -S --show Busy%
Busy% UMHz0.0 UMHz1.0 UMHz2.0 UMHz3.0 UMHz4.0
Indeed, there was no string that can be used to explicitly
show or hide clustered uncore counters.
Even through they are dynamically probed and added,
group the clustered UMHz*.* counters with the legacy
built-in-counter "UncMHz" for show/hide.
turbostat --show Busy%
does not show UMHz*.*.
turbostat --show UncMHz
shows either UncMHz or UMHz*.*, if present
turbostat --hide UncMHz
hides either UncMHz or UMHz*.*, if present
Reported-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Tested-by: Artem Bityutskiy <artem.bityutskiy@intel.com>
Summary of Changes since 2024.11.30:
Fix regression in 2023.11.07 that affinitized forked child
in one-shot mode.
Harden one-shot mode against hotplug online/offline
Enable RAPL SysWatt column by default.
Add initial PTL, CWF platform support.
Harden initial PMT code in response to early use.
Enable first built-in PMT counter: CWF c1e residency
Refuse to run on unsupported platforms without --force,
to encourage updating to a version that supports the system,
and to avoid no-so-useful measurement results.
Signed-off-by: Len Brown <len.brown@intel.com>
Intel Clearwater Forest report PMT telemetry with GUID 0x14421519, which
can be used to obtain module c1e residency counter of type tcore clock.
Add early support for the counter by using heuristic that should work
for the Clearwater Forest platforms.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
when turbostat interval mode can't migrate to a CPU, it complains,
prints no data, re-initializes with the new CPU configuration
and starts a new interval.
But this strategy in the face of a CPU hotplug offline during an interval
doesn't help in one-shot mode. When the missing CPU is discovered
at the end of the interval, the forked program has already returned
and there is nothing left for a new interval to measure.
So instead of aborting get_coutners() and delta_cpu() if a missing CPU
is detected, complain, but carry on and output what statistics are
actually present.
Use the same strategy for delta_cpu when aperf:mperf are observed
to have been reset -- complain, but carry on and print data for
the CPUs that are still present.
Interval mode error handling is unchanged.
One-shot mode can now do this:
$ sudo chcpu -e 1 ; sudo ./turbostat --quiet --show PkgWatt,Busy%,CPU chcpu -d 1
CPU 1 enabled
CPU 1 disabled
get_counters: Could not migrate to CPU 1
./turbostat: Counter reset detected
0.036920 sec
CPU Busy% PkgWatt
- 0.00 10.00
0 99.73 10.00
1 0.00
2 91.53
3 16.83
Suggested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
In "one-shot" mode, turbostat
1. takes a counter snapshot
2. forks and waits for a child
3. takes the end counter snapshot and prints the result.
But turbostat counter snapshots currently use affinity to travel
around the system so that counter reads are "local", and this
affinity must be cleared between #1 and #2 above.
The offending commit removed that reset that allowed the child
to run on cpu_present_set.
Fix that issue, and improve upon the original by using
cpu_possible_set for the child. This allows the child
to also run on CPUs that hotplug online during its runtime.
Reported-by: Zhang Rui <rui.zhang@intel.com>
Fixes: 7bb3fe27ad4f ("tools/power/turbostat: Obey allowed CPUs during startup")
Signed-off-by: Len Brown <len.brown@intel.com>
Some PMT counters, for example module c1e residency on Intel Clearwater
Forest, are reported using tcore clock type.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Fix checkpatch whitespace issues since 2024.11.30
Summary of Changes since 2024.11.30:
Enable SysWatt by default.
Add initial PTL, CWF platform support.
Refuse to run on unsupported platforms without --force
to avoid not-so-useful measurements mistakenly made
using obsolete versions.
Harden initial PMT code in response to early use.
Signed-off-by: Len Brown <len.brown@intel.com>
Allow user to add PMT counters by either identifying the source with:
guid=%u,seq=%u
or, since this patch, with direct sysfs path:
path=%s, for example path=/sys/class/intel_pmt/telem5
In the later case, the guid and sequence number will be infered
by turbostat.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
Some platforms may expose multiple telemetry files identified with the
same GUID. Interpreting it correctly, to associate given counter with a
CPU, core or a package requires more metadata from the user.
Parse and create ordered, linked list of those PMT aggregators, so that
we can identify specific aggregator with GUID + sequence number.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>
PMT directories exposed in sysfs use the following pattern:
telem%u
for example:
telem0, telem2, telem3, ..., telem15, telem16
This naming scheme preserves the ordering from the PCIe discovery, which
is important to correctly map the telemetry directory to the specific
domain (cpu, core, package etc).
Because readdir() traverses the entries in alphabetical order, causing
for example "telem13" to be traversed before "telem3", it is necessary
to use scandir() with custom compare() callback to preserve the PCIe
ordering.
Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>