commits

Pull perf tools fixes from Arnaldo Carvalho de Melo:

- Fail gracefully if BUILD_BPF_SKEL=1 is specified and clang isn't
available

- Add empty 'struct rq' to 'perf lock contention' to satisfy libbpf
'runqueue' type verification. This feature is built only with
BUILD_BPF_SKEL=1

- Make vmlinux.h use bpf.h and perf_event.h in source directory, not
system ones that may be old and not have things like 'union
perf_sample_weight'

- Add system include paths to BPF builds to pick things missing in the
headers included by clang -target bpf

- Update various header copies with the kernel sources

- Change divide by zero and not supported events behavior to show
'nan'/'not counted' in 'perf stat' output.

This happens when using things like 'perf stat -M TopdownL2 true',
involving JSON metrics

- Update no event/metric expectations affected by using JSON metrics in
'perf stat -ddd' perf test

- Avoid segv with 'perf stat --topdown' for metrics without a group

- Do not assume which events may have a PMU name, allowing the logic to
keep an AUX event group together. Makes this use case work again:

$ perf record --no-bpf-event -c 10 -e '{intel_pt//,tlb_flush.stlb_any/aux-sample-size=8192/pp}:u' -- sleep 0.1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.078 MB perf.data ]
$ perf script -F-dso,+addr | grep -C5 tlb_flush.stlb_any | head -11
sleep 20444 [003] 7939.510243: 1 branches:uH: 7f5350cc82a2 dl_main+0x9a2 => 7f5350cb38f0 _dl_add_to_namespace_list+0x0
sleep 20444 [003] 7939.510243: 1 branches:uH: 7f5350cb3908 _dl_add_to_namespace_list+0x18 => 7f5350cbb080 rtld_mutex_dummy+0x0
sleep 20444 [003] 7939.510243: 1 branches:uH: 7f5350cc8350 dl_main+0xa50 => 0 [unknown]
sleep 20444 [003] 7939.510244: 1 branches:uH: 7f5350cc83ca dl_main+0xaca => 7f5350caeb60 _dl_process_pt_gnu_property+0x0
sleep 20444 [003] 7939.510245: 1 branches:uH: 7f5350caeb60 _dl_process_pt_gnu_property+0x0 => 0 [unknown]
sleep 20444 7939.510245: 10 tlb_flush.stlb_any/aux-sample-size=8192/pp: 0 7f5350caeb60 _dl_process_pt_gnu_property+0x0
sleep 20444 [003] 7939.510254: 1 branches:uH: 7f5350cc87fe dl_main+0xefe => 7f5350ccd240 strcmp+0x0
sleep 20444 [003] 7939.510254: 1 branches:uH: 7f5350cc8862 dl_main+0xf62 => 0 [unknown]

- Add a check for the above use case in 'perf test test_intel_pt'

- Fix build with refcount checking on arm64, it was still accessing
fields that need to be wrapped so that the refcounted struct gets
checked

- Fix contextid validation in ARM's CS-ETM, so that older kernels
without that field can still be supported

- Skip unsupported aggregation for stat events found in perf.data files
in 'perf script'

- Add stat test for record and script to check the previous problem

- Remove needless debuginfod queries from 'perf test java symbol', this
was just making the test take a long time to complete

- Address python SafeConfigParser() deprecation warning in 'perf test
attr'

- Fix __NR_execve undeclared on i386 'perf bench syscall' build error

* tag 'perf-tools-fixes-for-v6.4-1-2023-05-20' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (33 commits)
perf bench syscall: Fix __NR_execve undeclared build error
perf test attr: Fix python SafeConfigParser() deprecation warning
perf test attr: Update no event/metric expectations
tools headers disabled-features: Sync with the kernel sources
tools headers UAPI: Sync arch prctl headers with the kernel sources
tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench'
tools headers x86 cpufeatures: Sync with the kernel sources
tools headers UAPI: Sync s390 syscall table file that wires up the memfd_secret syscall
tools headers UAPI: Sync linux/prctl.h with the kernel sources
perf metrics: Avoid segv with --topdown for metrics without a group
perf lock contention: Add empty 'struct rq' to satisfy libbpf 'runqueue' type verification
perf cs-etm: Fix contextid validation
perf arm64: Fix build with refcount checking
perf test: Add stat test for record and script
perf script: Skip aggregation for stat events
perf build: Add system include paths to BPF builds
perf bpf skels: Make vmlinux.h use bpf.h and perf_event.h in source directory
perf parse-events: Do not break up AUX event group
perf test test_intel_pt.sh: Test sample mode with event with PMU name
perf evsel: Modify group pmu name for software events
...

2y ago

Mingwei Zhang

b9846a69

KVM: VMX: add MSR_IA32_TSX_CTRL into msrs_to_save

2y ago

Linus Torvalds

ac9a7868

Linux 6.4-rc1 v6.4-rc1

2y ago

Linus Torvalds

4927cb98

Merge tag 'powerpc-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

2y ago

Tiezhu Yang

4e111f0c

perf bench syscall: Fix __NR_execve undeclared build error

2y ago

Sean Christopherson

275a8724

KVM: x86: Don't adjust guest's CPUID.0x12.1 (allowed SGX enclave XFRM)

2y ago

Linus Torvalds

f085df1b

Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.

Build:

- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.

It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.

libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.

Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.

- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features require it, for instance Intel PT does
not use tracepoints.

- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:

$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)

- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.

- Use libstdc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.

- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.

- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:

Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev

- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.

perf BPF filters:

- New feature where BPF can be used to filter samples, for instance:

$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)

- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:

Essentially the BPF filter expression is:

<term> <operator> <value> (("," | "||") <term> <operator> <value>)*

The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops

The <operator> can be one of:
==, !=, >, >=, <, <=, &

The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_lock)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
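
The grammar quoted above can be checked mechanically. The following is a minimal sketch, assuming only the terms, operators and symbolic values listed in that excerpt; `parse_filter` is a hypothetical helper written for illustration and is not perf's own parser. Accepting decimal and 0x-prefixed hexadecimal numbers is an assumption of the sketch, and the connectors "," and "||" are returned as-is without interpreting them.

```python
import re

# Terms, operators and symbolic values copied from the excerpt above.
TERMS = {
    "ip", "id", "tid", "pid", "cpu", "time", "addr", "period", "txn",
    "weight", "phys_addr", "code_pgsz", "data_pgsz", "weight1", "weight2",
    "weight3", "ins_lat", "retire_lat", "p_stage_cyc", "mem_op", "mem_lvl",
    "mem_snoop", "mem_remote", "mem_lock", "mem_dtlb", "mem_blk", "mem_hops",
}
SYMBOLIC = {
    "mem_op": {"na", "load", "store", "pfetch", "exec"},
    "mem_lvl": {"l1", "l2", "l3", "l4", "cxl", "io", "any_cache", "lfb",
                "ram", "pmem"},
    "mem_snoop": {"na", "none", "hit", "miss", "hitm", "fwd", "peer"},
    "mem_remote": {"remote"},
    "mem_lock": {"na", "locked"},
    "mem_dtlb": {"na", "l1_hit", "l1_miss", "l2_hit", "l2_miss", "any_hit",
                 "any_miss", "walk", "fault"},
    "mem_blk": {"na", "by_data", "by_addr"},
    "mem_hops": {"hops0", "hops1", "hops2", "hops3"},
}

# One "<term> <operator> <value>" clause; two-character operators come first
# in the alternation so that ">=" is not read as ">".
CLAUSE = re.compile(r"^\s*(\w+)\s*(==|!=|>=|<=|>|<|&)\s*(\w+)\s*$")
# Assumption of this sketch: numbers are decimal or 0x-prefixed hexadecimal.
NUMBER = re.compile(r"^(0[xX][0-9a-fA-F]+|\d+)$")


def parse_filter(expr):
    """Return a list of (connector, term, operator, value) tuples.

    The connector is None for the first clause and "," or "||" afterwards.
    Raises ValueError when the expression does not fit the grammar.
    """
    # The capturing group keeps the separators, so the resulting list
    # alternates clause, separator, clause, ...
    parts = re.split(r"(,|\|\|)", expr)
    clauses = []
    connector = None
    for i, part in enumerate(parts):
        if i % 2 == 1:
            connector = part
            continue
        match = CLAUSE.match(part)
        if match is None:
            raise ValueError(f"malformed clause: {part!r}")
        term, operator, value = match.groups()
        if term not in TERMS:
            raise ValueError(f"unknown term: {term!r}")
        if not NUMBER.match(value) and value not in SYMBOLIC.get(term, ()):
            raise ValueError(f"bad value {value!r} for term {term!r}")
        clauses.append((connector, term, operator, value))
    return clauses
```

For example, `parse_filter("period > 1000")` accepts the expression used in the 'perf record' command above, while an unknown term such as `bogus` is rejected with a ValueError.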

perf lock contention:

- Show lock type with address.

- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:

$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80

- Update default map size to 16384.

- Allocate single letter option -M for --map-nr-entries, as it is
proving to be frequently used.

- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).

- Fix problems found with MSan.

perf report/top:

- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.

- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparison with each sample.

perf sched:

- Make 'perf sched latency/map/replay' use "sched:sched_waking"
instead of "sched:sched_wakeup", consistent with 'perf record'
since d566a9c2d482 ("perf sched: Prefer sched_waking event when it
exists").

perf ftrace:

- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:

# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
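
The row labels in the histogram above double from one bucket to the next. The following is a minimal sketch of that bucketing, assuming the millisecond rows simply continue the power-of-two microsecond series (so 1024 us lands in "1 - 2 ms"); `BUCKET_LABELS` and `bucket_index` are hypothetical names for illustration, and this is not taken from perf's source.

```python
# Labels in the same order as the rows printed above: one "0 - 1 us" row,
# ten doubling microsecond rows, ten doubling millisecond rows, and a final
# catch-all row.
BUCKET_LABELS = (
    ["0 - 1 us"]
    + [f"{1 << i} - {1 << (i + 1)} us" for i in range(10)]
    + [f"{1 << i} - {1 << (i + 1)} ms" for i in range(10)]
    + ["1 - ... s"]
)


def bucket_index(latency_us):
    """Map a non-negative integer latency in microseconds to a row index.

    int.bit_length() is 0 for 0 and floor(log2(n)) + 1 for n >= 1, which is
    exactly the power-of-two row number; anything past the last doubling row
    is clamped into the final catch-all row.
    """
    return min(int(latency_us).bit_length(), len(BUCKET_LABELS) - 1)
```

With this mapping a 3 us sample falls into the "2 - 4 us" row and anything of roughly a second or more falls into the last row.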

perf top:

- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.

- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.

perf annotate:

- Allow configuring objdump and addr2line in ~/.perfconfig, so that
you can use alternative binaries, such as llvm's.

perf kvm:

- Add TUI mode for 'perf kvm stat report'.

Reference counting:

- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.

To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:

https://perf.wiki.kernel.org/index.php/Reference_Count_Checking

- The above caught, for instance, this problem, fixed in this series:

- Fix maps use after put in 'perf test "Share thread maps"':

'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.

Fixed by reversing the order of puts so that the leader is put
last.

- Also several fixes were made to places where reference counts were
not being held.

- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
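
As a rough illustration of why routing every access through an accessor makes a use after put detectable, here is a toy model. It is only a sketch under stated assumptions: `Handle`, `RefCounted`, `get`, `put` and `access` are hypothetical names, and this is not how perf's checking infrastructure is implemented.

```python
class RefCounted:
    """Stand-in for a reference counted struct such as 'maps'."""

    def __init__(self, name):
        self.name = name
        self.refcnt = 0


class Handle:
    """One handle per get(); put() invalidates it."""

    def __init__(self, obj):
        self._obj = obj


def access(handle):
    """Accessor: the only sanctioned way to reach the underlying object."""
    if handle._obj is None:
        raise RuntimeError("use after put")
    return handle._obj


def get(obj):
    """Take a reference and hand back a fresh handle for it."""
    obj.refcnt += 1
    return Handle(obj)


def put(handle):
    """Drop the reference; later use of this handle is reported by access()."""
    obj = access(handle)
    obj.refcnt -= 1
    handle._obj = None
```

In this model, reading the reference count through a handle that was already put raises an error instead of silently succeeding, which mirrors the kind of bug described for 'perf test "Share thread maps"' above.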

ARM64:

- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.

- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".

arm64 vendor events:

- Add N1 metrics.

Intel vendor events:

- Add graniterapids, grandridge and sierraforest events.

- Refresh events for: alderlake, alderlaken, broadwell, broadwellde,
broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp

- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.

perf stat:

- Implement --topdown using JSON metrics.

- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.

- Use metrics for --smi-cost.

- Update topdown documentation.

Vendor events (JSON) infrastructure:

- Add support for computing and printing metric threshold values. For
instance, here is one found in the sapphirerapids json file:

{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},

- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.

- Support for printing metric thresholds in 'perf list'.

- Add --metric-no-threshold option to 'perf stat'.

- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.

- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
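
To make the relationship between "MetricExpr" and "MetricThreshold" in the smi_cycles entry above concrete, here is a minimal sketch that evaluates both by hand. The function names and the sample counter values are hypothetical; the real expressions are parsed and evaluated by perf itself, not by code like this.

```python
def smi_cycles(aperf, cycles, smi):
    """Mirror of the MetricExpr:
    ((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)
    """
    return (aperf - cycles) / aperf if smi > 0 else 0


def exceeds_threshold(value):
    """Mirror of the MetricThreshold: smi_cycles > 0.1"""
    return value > 0.1


def scaled(value):
    """Apply the "100%" ScaleUnit for display (formatting is an assumption)."""
    return f"{value * 100:.1f}%"
```

With made-up counts of aperf=1000, cycles=800 and smi=3, the metric is 0.2, which is displayed as 20.0% and is over the 0.1 threshold; with smi=0 the metric is 0 and the threshold is not crossed.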

S/390:

- Add common metrics: CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).

- Add cache metrics for z13, z14, z15 and z16.

- Add metric for TLB and cache.

ARM:

- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.

Intel PT hardware tracing:

- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.

- Add support for new branch instructions ERETS and ERETU.

- Fix CYC timestamps after standalone CBR

ARM CoreSight hardware tracing:

- Allow user to override timestamp and contextid settings.

- Fix segfault in dso lookup.

- Fix timeless decode mode detection.

- Add separate decode paths for timeless and per-thread modes.

auxtrace:

- Fix address filter entire kernel size.

Miscellaneous:

- Fix use-after-free and unaligned bugs in the PLT handling routines.

- Use zfree() to reduce chances of use after free.

- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.

- Suppress massive unsupported target platform errors in the unwind
code.

- Fix incorrect build_id size returned by elf_read_build_id().

- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2.

- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.

- Add 'perf bench syscall fork' benchmark.

- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.

- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.

- Fix some spelling mistakes"

* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...

2y ago

Linus Torvalds

90af47ed

Merge tag 'ata-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata

2y ago

Gaurav Batra

1f7aacc5

powerpc/iommu: Incorrect DDW Table is referenced for SR-IOV device

2y ago

Ian Rogers

75438f24

perf test attr: Fix python SafeConfigParser() deprecation warning

2y ago

Sean Christopherson

ad45413d

KVM: VMX: Don't rely _only_ on CPUID to enforce XCR0 restrictions for ECREATE

2y ago

Linus Torvalds

17784de6

Merge tag 'core-debugobjects-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2y ago

Arnaldo Carvalho de Melo

9a2d5178

Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"

2y ago

Linus Torvalds

70e137e3

Merge tag 'fbdev-for-6.4-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

2y ago

Michal Simek

a7844528

dt-bindings: ata: ahci-ceva: Cover all 4 iommus entries

2y ago

Gaurav Batra

096339ab

powerpc/iommu: DMA address offset is incorrectly calculated with 2MB TCEs

2y ago

Ian Rogers

951efb99

perf test attr: Update no event/metric expectations

2y ago

Michal Luczaj

afb2acb2

KVM: Fix vcpu_array[0] races

In kvm_vm_ioctl_create_vcpu(), add vcpu to vcpu_array iff it's safe to
access vcpu via kvm_get_vcpu() and kvm_for_each_vcpu(), i.e. when there's
no failure path requiring vcpu removal and destruction. Such order is
important because vcpu_array accessors may end up referencing vcpu at
vcpu_array[0] even before online_vcpus is set to 1.

When online_vcpus=0, any call to kvm_get_vcpu() goes through
array_index_nospec() and ends with an attempt to xa_load(vcpu_array, 0):

int num_vcpus = atomic_read(&kvm->online_vcpus);
i = array_index_nospec(i, num_vcpus);
return xa_load(&kvm->vcpu_array, i);

Similarly, when online_vcpus=0, a kvm_for_each_vcpu() does not iterate over
an "empty" range, but actually [0, ULONG_MAX]:

xa_for_each_range(&kvm->vcpu_array, idx, vcpup, 0, \
(atomic_read(&kvm->online_vcpus) - 1))

In both cases, such online_vcpus=0 edge case, even if leading to
unnecessary calls to XArray API, should not be an issue; requesting
unpopulated indexes/ranges is handled by xa_load() and xa_for_each_range().
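
The two online_vcpus=0 edge cases described above can be reproduced with plain arithmetic. This is a minimal sketch that models only the architectural (non-speculative) result of array_index_nospec() and assumes a 64-bit unsigned long; `for_each_range_end` is a hypothetical helper name.

```python
ULONG_MAX = (1 << 64) - 1  # assumption: unsigned long is 64 bits wide


def array_index_nospec(index, size):
    """Architectural result only: an in-bounds index passes through,
    anything else is clamped to 0. With size == 0 nothing is in bounds."""
    return index if 0 <= index < size else 0


def for_each_range_end(online_vcpus):
    """Upper bound used by the iteration macro:
    (atomic_read(&kvm->online_vcpus) - 1) converted to unsigned long."""
    return (online_vcpus - 1) & ULONG_MAX
```

With online_vcpus=0, every index is clamped to 0, so the lookup targets vcpu_array[0], and the iteration bound wraps around to ULONG_MAX instead of describing an empty range.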

However, this means that when the first vCPU is created and inserted in
vcpu_array *and* before online_vcpus is incremented, code calling
kvm_get_vcpu()/kvm_for_each_vcpu() already has access to that first vCPU.

This should not pose a problem assuming that once a vcpu is stored in
vcpu_array, it will remain there, but that's not the case:
kvm_vm_ioctl_create_vcpu() first inserts to vcpu_array, then requests a
file descriptor. If create_vcpu_fd() fails, newly inserted vcpu is removed
from the vcpu_array, then destroyed:

vcpu->vcpu_idx = atomic_read(&kvm->online_vcpus);
r = xa_insert(&kvm->vcpu_array, vcpu->vcpu_idx, vcpu, GFP_KERNEL_ACCOUNT);
kvm_get_kvm(kvm);
r = create_vcpu_fd(vcpu);
if (r < 0) {
xa_erase(&kvm->vcpu_array, vcpu->vcpu_idx);
kvm_put_kvm_no_destroy(kvm);
goto unlock_vcpu_destroy;
}
atomic_inc(&kvm->online_vcpus);

This results in a possible race condition when a reference to a vcpu is
acquired (via kvm_get_vcpu() or kvm_for_each_vcpu()) moments before said
vcpu is destroyed.

Signed-off-by: Michal Luczaj <mhal@rbox.co>
Message-Id: <20230510140410.1093987-2-mhal@rbox.co>
Cc: stable@vger.kernel.org
Fixes: c5b077549136 ("KVM: Convert the kvm->vcpus array to a xarray", 2021-12-08)
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

2y ago

Linus Torvalds

6f69c981

Merge tag 'v6.4-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

2y ago

Thomas Gleixner

0af462f1

debugobject: Ensure pool refill (again)

2y ago

Arnaldo Carvalho de Melo

c3e6df97

Revert "perf build: Warn for BPF skeletons if endian mismatches"

2y ago

Linus Torvalds

e2065b8c

Merge tag '6.4-rc2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

2y ago

Helge Deller

d9a45969

fbdev: stifb: Whitespace cleanups

2y ago

Linus Torvalds

f1fcbaa1

Linux 6.4-rc2 v6.4-rc2

2y ago

Jason Gunthorpe

ad593827

powerpc/iommu: Remove iommu_del_device()

2y ago

Arnaldo Carvalho de Melo

1b5f159c

tools headers disabled-features: Sync with the kernel sources

2y ago

Jacob Xu

3367eeab

KVM: VMX: Fix header file dependency of asm/vmx.h

2y ago

Linus Torvalds

63342b1d

Merge tag '6.4-rc-smb3-client-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

2y ago

Ondrej Mosnacek

b8969a1b

crypto: api - Fix CRYPTO_USER checks for report function

2y ago

Thomas Gleixner

63a75969

debugobject: Prevent init race with static objects

Statically initialized objects are usually not initialized via the init()
function of the subsystem. They are special cased and the subsystem
provides a function to validate whether an object which is not yet tracked
by debugobjects is statically initialized. This means the object is started
to be tracked on first use, e.g. activation.

This works perfectly fine, unless there are two concurrent operations on
that object. Schspa decoded the problem:

T0                                      T1

debug_object_assert_init(addr)
  lock_hash_bucket()
  obj = lookup_object(addr);
  if (!obj) {
        unlock_hash_bucket();
        -> preemption
                                        lock_subsystem_object(addr);
                                          activate_object(addr)
                                          lock_hash_bucket();
                                          obj = lookup_object(addr);
                                          if (!obj) {
                                                unlock_hash_bucket();
                                                if (is_static_object(addr))
                                                        init_and_track(addr);
                                          lock_hash_bucket();
                                          obj = lookup_object(addr);
                                          obj->state = ACTIVATED;
                                          unlock_hash_bucket();

                                        subsys function modifies content of addr,
                                        so static object detection no
                                        longer works.

                                        unlock_subsystem_object(addr);

        if (is_static_object(addr)) <- Fails

debugobject emits a warning and invokes the fixup function which
reinitializes the already active object in the worst case.

This race has existed forever, but was never observed until mod_timer() got a
debug_object_assert_init() added which is outside of the timer base lock
held section right at the beginning of the function to cover the lockless
early exit points too.

Rework the code so that the lookup, the static object check and the
tracking object association happen atomically under the hash bucket
lock. This prevents the issue completely as all callers are serialized on
the hash bucket lock and therefore cannot observe inconsistent state.
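
The shape of that rework can be sketched as follows: the lookup, the static check and the tracking insert all happen inside one critical section, so a second caller either finds the tracked object or still sees the untouched static initializer. This is a simplified model under stated assumptions, not the kernel code; `Bucket`, `lookup_or_track_static`, `memory`, `activate` and `assert_init` are hypothetical names.

```python
import threading


class Bucket:
    """Stand-in for one debugobjects hash bucket."""

    def __init__(self):
        self.lock = threading.Lock()
        self.tracked = {}

    def lookup_or_track_static(self, addr, is_static_object, new_state=None):
        """Lookup, static object check and tracking in one locked section."""
        with self.lock:
            obj = self.tracked.get(addr)
            if obj is None and is_static_object(addr):
                obj = {"state": "INIT"}
                self.tracked[addr] = obj
            if obj is not None and new_state is not None:
                obj["state"] = new_state
            return obj


bucket = Bucket()
# Pretend object memory: the value tells whether it still looks static.
memory = {0x1000: "static-initializer"}


def is_static(addr):
    return memory[addr] == "static-initializer"


def activate(addr):
    """Subsystem path: track and activate, then modify the object."""
    bucket.lookup_or_track_static(addr, is_static, new_state="ACTIVATED")
    memory[addr] = "in-use"  # static detection no longer works from here on


def assert_init(addr):
    """Assert path: succeeds if the object is tracked or still static."""
    return bucket.lookup_or_track_static(addr, is_static) is not None
```

Because tracking is established before the lock is dropped, an `assert_init()` that runs after `activate()` finds the tracked object and never needs to repeat the static check on modified memory.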

Fixes: 3ac7fe5a4aab ("infrastructure to debug (dynamic) objects")
Reported-by: syzbot+5093ba19745994288b53@syzkaller.appspotmail.com
Debugged-by: Schspa Shi <schspa@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Stephen Boyd <swboyd@chromium.org>
Link: https://syzkaller.appspot.com/bug?id=22c8a5938eab640d1c6bcc0e3dc7be519d878462
Link: https://lore.kernel.org/lkml/20230303161906.831686-1-schspa@gmail.com
Link: https://lore.kernel.org/r/87zg7dzgao.ffs@tglx

2y ago

Ian Rogers

6c73f819

perf metrics: Fix SEGV with --for-each-cgroup

2y ago

Linus Torvalds

0c9dcf12

Merge tag '6.4-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

2y ago

Gustav Johansson

e7b8b8ed

ksmbd: smb2: Allow messages padded to 8byte boundary

2y ago

Helge Deller

537adba4

fbdev: udlfb: Use usb_control_msg_send()

2y ago

Linus Torvalds

533c5454

Merge tag 'cxl-fixes-6.4-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

2y ago

Michael Ellerman

8133a3f0

powerpc/crypto: Fix aes-gcm-p10 build when VSX=n

2y ago

Arnaldo Carvalho de Melo

29719e31

tools headers UAPI: Sync arch prctl headers with the kernel sources

To pick the changes in this cset:

a03c376ebaf38394 ("x86/arch_prctl: Add AMX feature numbers as ABI constants")
23e5d9ec2bab53c4 ("x86/mm/iommu/sva: Make LAM and SVA mutually exclusive")
2f8794bd087e7958 ("x86/mm: Provide arch_prctl() interface for LAM")

This picks these new prctls in a third range, that was also added to the
tools/perf/trace/beauty/arch_prctl.c beautifier.

$ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/before
$ cp arch/x86/include/uapi/asm/prctl.h tools/arch/x86/include/uapi/asm/prctl.h
$ tools/perf/trace/beauty/x86_arch_prctl.sh > /tmp/after
$ diff -u /tmp/before /tmp/after
@@ -20,3 +20,11 @@
[0x2003 - 0x2001]= "MAP_VDSO_64",
};

+#define x86_arch_prctl_codes_3_offset 0x4001
+static const char *x86_arch_prctl_codes_3[] = {
+ [0x4001 - 0x4001]= "GET_UNTAG_MASK",
+ [0x4002 - 0x4001]= "ENABLE_TAGGED_ADDR",
+ [0x4003 - 0x4001]= "GET_MAX_TAG_BITS",
+ [0x4004 - 0x4001]= "FORCE_TAGGED_SVA",
+};
+
$

With this 'perf trace' can translate those numbers into strings and use
the strings in filter expressions:

# perf trace -e prctl
0.000 ( 0.011 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9c014b7df5) = 0
0.032 ( 0.002 ms): DOM Worker/3722622 prctl(option: SET_NAME, arg2: 0x7f9bb6b51580) = 0
5.452 ( 0.003 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfeb70) = 0
5.468 ( 0.002 ms): StreamT~ns #30/3722623 prctl(option: SET_NAME, arg2: 0x7f9bdbdfea70) = 0
24.494 ( 0.009 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f562a32ae28) = 0
24.540 ( 0.002 ms): IndexedDB #556/3722624 prctl(option: SET_NAME, arg2: 0x7f563c6d4b30) = 0
670.281 ( 0.008 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30805c8) = 0
670.293 ( 0.002 ms): systemd-userwo/3722339 prctl(option: SET_NAME, arg2: 0x564be30800f0) = 0
^C#
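
The generated table in the diff above is indexed by subtracting the range offset from the code. A minimal sketch of that lookup, using only the four entries shown in the diff; `arch_prctl_name` is a hypothetical helper, and falling back to the hexadecimal number for unknown codes is an assumption of this sketch.

```python
# Values copied from the generated x86_arch_prctl_codes_3 table above.
X86_ARCH_PRCTL_CODES_3_OFFSET = 0x4001
X86_ARCH_PRCTL_CODES_3 = [
    "GET_UNTAG_MASK",      # 0x4001
    "ENABLE_TAGGED_ADDR",  # 0x4002
    "GET_MAX_TAG_BITS",    # 0x4003
    "FORCE_TAGGED_SVA",    # 0x4004
]


def arch_prctl_name(code):
    """Translate a code in the third range into its name.

    Codes outside the table are returned as a hex string (an assumption
    made for this sketch, not necessarily what 'perf trace' prints).
    """
    index = code - X86_ARCH_PRCTL_CODES_3_OFFSET
    if 0 <= index < len(X86_ARCH_PRCTL_CODES_3):
        return X86_ARCH_PRCTL_CODES_3[index]
    return hex(code)
```

For instance, `arch_prctl_name(0x4002)` yields "ENABLE_TAGGED_ADDR".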

This addresses this perf build warning:

Warning: Kernel ABI header at 'tools/arch/x86/include/uapi/asm/prctl.h' differs from latest version at 'arch/x86/include/uapi/asm/prctl.h'
diff -u tools/arch/x86/include/uapi/asm/prctl.h arch/x86/include/uapi/asm/prctl.h

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Chang S. Bae <chang.seok.bae@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZGTjNPpD3FOWfetM@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

2y ago

Sean Christopherson

e0ceec22

KVM: Don't enable hardware after a restart/shutdown is initiated

2y ago

Linus Torvalds

d6b8a8c4

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

2y ago

Yang Li

9ee04875

cifs: Remove unneeded semicolon

2y ago

Olivier Bacon

4140aafc

crypto: engine - fix crypto_queue backlog handling

2y ago

Linus Torvalds

09a9639e

Linux 6.3-rc6 v6.3-rc6

2y ago

Arnaldo Carvalho de Melo

a8874665

perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE

Linus reported a build break due to using a vmlinux without a BTF elf
section to generate the vmlinux.h header with bpftool for use in the BPF
tools in tools/perf/util/bpf_skel/*.bpf.c.

Instead add a vmlinux.h file with the structs needed with the fields the
tools need, marking the structs with __attribute__((preserve_access_index)),
so that libbpf's CO-RE code can fixup the struct field offsets.

In some cases the vmlinux.h file that was being generated by bpftool
from the kernel BTF information was not needed at all, just including
linux/bpf.h, sometimes linux/perf_event.h was enough as non-UAPI
types were not being used.

To keep the patch small, include those UAPI headers from the trimmed down
vmlinux.h file, that then provides the tools with just the structs and
the subset of its fields needed for them.

Testing it:

# perf lock contention -b find / > /dev/null
^C contended total wait max wait avg wait type caller

7 53.59 us 10.86 us 7.66 us rwlock:R start_this_handle+0xa0
2 30.35 us 21.99 us 15.17 us rwsem:R iterate_dir+0x52
1 9.04 us 9.04 us 9.04 us rwlock:W start_this_handle+0x291
1 8.73 us 8.73 us 8.73 us spinlock raw_spin_rq_lock_nested+0x1e
#
# perf lock contention -abl find / > /dev/null
^C contended total wait max wait avg wait address symbol

1 262.96 ms 262.96 ms 262.96 ms ffff8e67502d0170 (mutex)
12 244.24 us 39.91 us 20.35 us ffff8e6af56f8070 mmap_lock (rwsem)
7 30.28 us 6.85 us 4.33 us ffff8e6c865f1d40 rq_lock (spinlock)
3 7.42 us 4.03 us 2.47 us ffff8e6c864b1d40 rq_lock (spinlock)
2 3.72 us 2.19 us 1.86 us ffff8e6c86571d40 rq_lock (spinlock)
1 2.42 us 2.42 us 2.42 us ffff8e6c86471d40 rq_lock (spinlock)
4 2.11 us 559 ns 527 ns ffffffff9a146c80 rcu_state (spinlock)
3 1.45 us 818 ns 482 ns ffff8e674ae8384c (rwlock)
1 870 ns 870 ns 870 ns ffff8e68456ee060 (rwlock)
1 663 ns 663 ns 663 ns ffff8e6c864f1d40 rq_lock (spinlock)
1 573 ns 573 ns 573 ns ffff8e6c86531d40 rq_lock (spinlock)
1 472 ns 472 ns 472 ns ffff8e6c86431740 (spinlock)
1 397 ns 397 ns 397 ns ffff8e67413a4f04 (spinlock)
#
# perf test offcpu
95: perf record offcpu profiling tests : Ok
#
# perf kwork latency --use-bpf
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
--------------------------------------------------------------------------------------------------------------------------------
(w)flush_memcg_stats_dwork | 0000 | 1056.212 ms | 2 | 2112.345 ms | 550113.229573 s | 550115.341919 s |
(w)toggle_allocation_gate | 0000 | 10.144 ms | 62 | 416.389 ms | 550113.453518 s | 550113.869907 s |
(w)0xffff8e6748e28080 | 0002 | 0.623 ms | 1 | 0.623 ms | 550110.989841 s | 550110.990464 s |
(w)vmstat_shepherd | 0000 | 0.586 ms | 10 | 2.828 ms | 550111.971536 s | 550111.974364 s |
(w)vmstat_update | 0007 | 0.363 ms | 5 | 1.634 ms | 550113.222520 s | 550113.224154 s |
(w)vmstat_update | 0000 | 0.324 ms | 10 | 2.827 ms | 550111.971526 s | 550111.974354 s |
(w)0xffff8e674c5f4a58 | 0002 | 0.102 ms | 5 | 0.134 ms | 550110.989839 s | 550110.989972 s |
(w)psi_avgs_work | 0001 | 0.086 ms | 3 | 0.107 ms | 550114.957852 s | 550114.957959 s |
(w)psi_avgs_work | 0000 | 0.079 ms | 5 | 0.100 ms | 550118.605668 s | 550118.605768 s |
(w)kfree_rcu_monitor | 0006 | 0.079 ms | 1 | 0.079 ms | 550110.925821 s | 550110.925900 s |
(w)psi_avgs_work | 0004 | 0.079 ms | 1 | 0.079 ms | 550109.581835 s | 550109.581914 s |
(w)psi_avgs_work | 0001 | 0.078 ms | 1 | 0.078 ms | 550109.197809 s | 550109.197887 s |
(w)psi_avgs_work | 0002 | 0.077 ms | 5 | 0.086 ms | 550110.669819 s | 550110.669905 s |
<SNIP>
# strace -e bpf -o perf-stat-bpf-counters.output perf stat -e cycles --bpf-counters sleep 1

Performance counter stats for 'sleep 1':

6,197,983 cycles

1.003922848 seconds time elapsed

0.000000000 seconds user
0.002032000 seconds sys

# head -7 perf-stat-bpf-counters.output
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/perf_attr_map", bpf_fd=0, file_flags=0}, 16) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=88, info=0x7ffcead64990}}, 16) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x24129e0, value=0x7ffcead65a48, flags=BPF_ANY}, 32) = 0
bpf(BPF_LINK_GET_FD_BY_ID, {link_id=1252}, 12) = -1 ENOENT (No such file or directory)
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65780, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0,
+func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 116) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65920, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0,
+func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 128) = 4
bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 4
#

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/lkml/ZFU1PJrn8YtHIqno@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>