commits

Without this check, a snapshot is taken whenever any bucket's max is hit,
rather than only when the global max is hit, which is the intended behavior.

Before:

In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (347), then do
a second run and note the max again.

In this case, the max in the second run (39) is below the max in the
first run, but since we haven't cleared the histogram, the first max
is still in the histogram and is higher than any other max, so it
should still be the max for the snapshot. It isn't, however: the
triggering value should still be 347 after the second run, but it is
reported as 39.

# echo 'hist:keys=pid:ts0=common_timestamp.usecs if comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_waking/trigger
# echo 'hist:keys=next_pid:wakeup_lat=common_timestamp.usecs-$ts0:onmax($wakeup_lat).save(next_prio,next_comm,prev_pid,prev_prio,prev_comm):onmax($wakeup_lat).snapshot() if next_comm=="cyclictest"' >> /sys/kernel/debug/tracing/events/sched/sched_switch/trigger

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 347
triggered by event with key: { next_pid: 2144 }

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2143 } hitcount: 199
max: 44 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2148 } hitcount: 199
max: 16 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/1

{ next_pid: 2145 } hitcount: 1325
max: 38 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2150 } hitcount: 1326
max: 39 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2144 } hitcount: 1982
max: 347 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

{ next_pid: 2149 } hitcount: 1983
max: 130 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/0

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 39
triggered by event with key: { next_pid: 2150 }

After:

In this example, we do a first run of the workload (cyclictest),
examine the output, note the max ('triggering value') (375), then do
a second run and note the max again.

In this case, the max in the second run is still 375, the highest in
any bucket, as it should be.

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5

{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }

# cyclictest -p 80 -n -s -t 2 -D 2

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/hist

{ next_pid: 2101 } hitcount: 199
max: 49 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

{ next_pid: 2072 } hitcount: 200
max: 28 next_prio: 120 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/5

{ next_pid: 2074 } hitcount: 1323
max: 375 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/2

{ next_pid: 2103 } hitcount: 1325
max: 74 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/4

{ next_pid: 2073 } hitcount: 1980
max: 153 next_prio: 19 next_comm: cyclictest
prev_pid: 0 prev_prio: 120 prev_comm: swapper/6

{ next_pid: 2102 } hitcount: 1981
max: 84 next_prio: 19 next_comm: cyclictest
prev_pid: 12 prev_prio: 120 prev_comm: kworker/0:1

Snapshot taken (see tracing/snapshot). Details:
triggering value { onmax($wakeup_lat) }: 375
triggered by event with key: { next_pid: 2074 }

Link: http://lkml.kernel.org/r/95958351329f129c07504b4d1769c47a97b70d65.1555597045.git.tom.zanussi@linux.intel.com

Cc: stable@vger.kernel.org
Fixes: a3785b7eca8fd ("tracing: Add hist trigger snapshot() action")
Signed-off-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>

6y ago

Linus Torvalds

b2ad8136

Merge tag 'libnvdimm-fixes-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

6y ago

Gabriel Krisman Bertazi

66883da1

ext4: fix dcache lookup of !casefolded directories

6y ago

Sebastian Andrzej Siewior

b7d5dc21

random: add a spinlock_t to struct batched_entropy

6y ago

Thomas Huth

c7957206

KVM: selftests: Wrap vcpu_nested_state_get/set functions with x86 guard

6y ago

Tom Zanussi

c8d94a18

tracing: Check keys for variable references in expressions too

6y ago

Linus Torvalds

a2c48d98

Merge tag 'trace-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

6y ago

Dan Williams

52f476a3

libnvdimm/pmem: Bypass CONFIG_HARDENED_USERCOPY overhead

6y ago

Jan Kara

ee0ed02c

ext4: do not delete unlinked inode from orphan list on failed truncate

6y ago

George Spelvin

92e507d2

random: document get_random_int() family

6y ago

Andrew Jones

98e68344

kvm: selftests: aarch64: compile with warnings on

6y ago

Tom Zanussi

55267c88

tracing: Prevent hist_field_var_ref() from accessing NULL tracing_map_elts

6y ago

Linus Torvalds

2409207a

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

6y ago

Dan Williams

7bf7eac8

dax: Arrange for dax_supported check to span multiple devices

6y ago

Jan Kara

82a25b02

ext4: wait for outstanding dio during truncate in nojournal mode

6y ago

Jon DeVree

fe6f1a6a

random: fix CRNG initialization when random.trust_cpu=1

6y ago

Andrew Jones

55eda003

kvm: selftests: aarch64: fix default vm mode

6y ago

Linus Torvalds

a188339c

Linux 5.2-rc1 v5.2-rc1

6y ago

Linus Torvalds

7fbc78e3

Merge tag 'for-linus-20190524' of git://git.kernel.dk/linux-block

6y ago

Martin K. Petersen

8acf608e

Revert "scsi: sd: Keep disk read-only when re-reading partition"

6y ago

Qian Cai

c01dafad

libnvdimm: Fix compilation warnings with W=1

6y ago

Theodore Ts'o

0a944e8a

ext4: don't perform block validity checks on the journal inode

6y ago

Kees Cook

d5553523

random: move rand_initialize() earlier

6y ago

Andrew Jones

bffed38d

kvm: selftests: aarch64: dirty_log_test: fix unaligned memslot size

6y ago

Linus Torvalds

2e2c1220

Merge tag 'upstream-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs

6y ago

Linus Torvalds

7f8b40e3

Merge tag 'linux-kselftest-5.2-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest

6y ago

Jens Axboe

096c7a6d

Merge branch 'nvme-5.2-rc2' of git://git.infradead.org/nvme into for-linus

6y ago

Colin Ian King

d0c0d902

scsi: bnx2fc: fix incorrect cast to u64 on shift operation

6y ago

Jan Kara

2c1d0e36

ext4: avoid panic during forced reboot due to aborted journal

6y ago

Theodore Ts'o

eb9d1bf0

random: only read from /dev/random after its pool has received 128 bits

6y ago

Christian Borntraeger

19ec166c

KVM: s390: fix memory slot handling for KVM_SET_USER_MEMORY_REGION

6y ago

Linus Torvalds

cb6f8739

Merge branch 'akpm' (patches from Andrew)

6y ago

Richard Weinberger

4dd04815

ubifs: Convert xattr inum to host order

6y ago

Linus Torvalds

e7bd3e24

Merge tag 'devicetree-fixes-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

6y ago

Kees Cook

fe483192

selftests/timers: Add missing fflush(stdout) calls

6y ago

Jens Axboe

004d564f

tools/io_uring: sync with liburing

6y ago

Keith Busch

cb9e0e50

nvme-pci: use blk-mq mapping for unmanaged irqs

If a device is providing a single IRQ vector, the IO queue will share
that vector with the admin queue. This is an unmanaged vector, so it does
not have a valid PCI IRQ affinity. Avoid trying to extract a managed
affinity in this case and let blk-mq set up the cpu:queue mapping instead.
Otherwise we'd hit the following warning when the device is using MSI:

WARNING: CPU: 4 PID: 7 at drivers/pci/msi.c:1272 pci_irq_get_affinity+0x66/0x80
Modules linked in: nvme nvme_core serio_raw
CPU: 4 PID: 7 Comm: kworker/u16:0 Tainted: G W 5.2.0-rc1+ #494
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014
Workqueue: nvme-reset-wq nvme_reset_work [nvme]
RIP: 0010:pci_irq_get_affinity+0x66/0x80
Code: 0b 31 c0 c3 83 e2 10 48 c7 c0 b0 83 35 91 74 2a 48 8b 87 d8 03 00 00 48 85 c0 74 0e 48 8b 50 30 48 85 d2 74 05 39 70 14 77 05 <0f> 0b 31 c0 c3 48 63 f6 48 8d 04 76 48 8d 04 c2 f3 c3 48 8b 40 30
RSP: 0000:ffffb5abc01d3cc8 EFLAGS: 00010246
RAX: ffff9536786a39c0 RBX: 0000000000000000 RCX: 0000000000000080
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9536781ed000
RBP: ffff95367346a008 R08: ffff95367d43f080 R09: ffff953678c07800
R10: ffff953678164800 R11: 0000000000000000 R12: 0000000000000000
R13: ffff9536781ed000 R14: 00000000ffffffff R15: ffff95367346a008
FS: 0000000000000000(0000) GS:ffff95367d400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fdf814a3ff0 CR3: 000000001a20f000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
blk_mq_pci_map_queues+0x37/0xd0
nvme_pci_map_queues+0x80/0xb0 [nvme]
blk_mq_alloc_tag_set+0x133/0x2f0
nvme_reset_work+0x105d/0x1590 [nvme]
process_one_work+0x291/0x530
worker_thread+0x218/0x3d0
? process_one_work+0x530/0x530
kthread+0x111/0x130
? kthread_park+0x90/0x90
ret_from_fork+0x1f/0x30
---[ end trace 74587339d93c83c0 ]---

Fixes: 22b5560195bd6 ("nvme-pci: Separate IO and admin queue IRQ vectors")
Reported-by: Iván Chavero <ichavero@chavero.com.mx>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>

6y ago

Erwan Velu

8ef860ae

scsi: smartpqi: Reporting unhandled SCSI errors

6y ago

Theodore Ts'o

170417c8

ext4: fix block validity checks for journal inodes using indirect blocks

6y ago

Rasmus Villemoes

764ed189

drivers/char/random.c: make primary_crng static

6y ago

Paolo Bonzini

2924b521

KVM: x86/pmu: do not mask the value that is written to fixed PMUs

6y ago

Linus Torvalds

ff8583d6

Merge tag 'kbuild-v5.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

6y ago

Feng Tang

de6da1e8

panic: add an option to replay all the printk message in buffer

6y ago

Richard Weinberger

76aa3494

ubifs: Use correct config name for encryption

6y ago

Linus Torvalds

86c2f5d6

Merge tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull more SPDX updates from Greg KH:
"Here is another set of reviewed patches that adds SPDX tags to
different kernel files, based on a set of rules that are being used to
parse the comments to try to determine that the license of the file is
"GPL-2.0-or-later".

Only the "obvious" versions of these matches are included here, a
number of "non-obvious" variants of text have been found but those
have been postponed for later review and analysis.

These patches have been out for review on the linux-spdx@vger mailing
list, and while they were created by automatic tools, they were
hand-verified by a bunch of different people, all of whose names are on
the patches as reviewers"

* tag 'spdx-5.2-rc2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (85 commits)
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 125
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 123
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 122
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 121
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 120
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 119
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 118
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 116
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 114
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 113
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 112
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 111
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 110
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 106
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 105
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 104
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 103
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 102
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 101
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 98
...

6y ago

Rob Herring

852d095d

checkpatch.pl: Update DT vendor prefix check

6y ago

Kees Cook

e8108866

selftests: Remove forced unbuffering for test running

6y ago

Jens Axboe

486f0692

tools/io_uring: fix Makefile for pthread library link

6y ago

Keith Busch

0decfd8b

nvme: update MAINTAINERS

6y ago

YueHaibing

41552199

scsi: myrs: Fix uninitialized variable

6y ago

Linux 5.2-rc2 v5.2-rc2

cd6c84d8

Linus Torvalds

Merge tag 'trace-v5.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

c5b44095

Linus Torvalds

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

862f0a32

Linus Torvalds

tracing: Silence GCC 9 array bounds warning

0c97bf86

Miguel Ojeda

Merge tag 'random_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random

128f2bfa

Linus Torvalds

KVM: x86: fix return value for reserved EFER

66f61c92

Paolo Bonzini

kernel/trace/trace.h: Remove duplicate header of trace_seq.h

4eebe38a

Jagadeesh Pagadala

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

35efb51e

Linus Torvalds

random: fix soft lockup when trying to read from an uninitialized blocking pool

58be0106

Theodore Ts'o

tools/kvm_stat: fix fields filter for child events

883d25e7

Stefan Raspl

tracing: Add a check_val() check before updating cond_snapshot() track_val

9b2ca371

Tom Zanussi

random: add a spinlock_t to struct batched_entropy

The per-CPU variable batched_entropy_uXX is protected by get_cpu_var().
This is just a preempt_disable() which ensures that the variable is only
accessed from the local CPU. It does not protect against users on the same CPU
from another context. It is possible that a preemptible context reads
slot 0 and then an interrupt occurs and the same value is read again.

The above scenario is confirmed by lockdep if we add a spinlock:
| ================================
| WARNING: inconsistent lock state
| 5.1.0-rc3+ #42 Not tainted
| --------------------------------
| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
| ksoftirqd/9/56 [HC0[0]:SC1[1]:HE0:SE0] takes:
| (____ptrval____) (batched_entropy_u32.lock){+.?.}, at: get_random_u32+0x3e/0xe0
| {SOFTIRQ-ON-W} state was registered at:
| _raw_spin_lock+0x2a/0x40
| get_random_u32+0x3e/0xe0
| new_slab+0x15c/0x7b0
| ___slab_alloc+0x492/0x620
| __slab_alloc.isra.73+0x53/0xa0
| kmem_cache_alloc_node+0xaf/0x2a0
| copy_process.part.41+0x1e1/0x2370
| _do_fork+0xdb/0x6d0
| kernel_thread+0x20/0x30
| kthreadd+0x1ba/0x220
| ret_from_fork+0x3a/0x50
…
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(batched_entropy_u32.lock);
| <Interrupt>
| lock(batched_entropy_u32.lock);
|
| *** DEADLOCK ***
|
| stack backtrace:
| Call Trace:
…
| kmem_cache_alloc_trace+0x20e/0x270
| ipmi_alloc_recv_msg+0x16/0x40
…
| __do_softirq+0xec/0x48d
| run_ksoftirqd+0x37/0x60
| smpboot_thread_fn+0x191/0x290
| kthread+0xfe/0x130
| ret_from_fork+0x3a/0x50

Add a spinlock_t to the batched_entropy data structure and acquire the
lock while accessing it. Acquire the lock with disabled interrupts
because this function may be used from interrupt context.

Remove the batched_entropy_reset_lock lock. Now that we have a lock for
the data structure, we can access it from a remote CPU.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>