commits
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for 5.1"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: report real fs size after failed resize
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: remove useless ext4_pin_inode()
ext4: avoid panic during forced reboot
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted
Pull scheduler updates from Thomas Gleixner:
"Third more careful attempt for this set of fixes:
- Prevent a 32bit math overflow in the cpufreq code
- Fix a buffer overflow when scanning the cgroup2 cpu.max property
- A set of fixes for the NOHZ scheduler logic to prevent waking up
CPUs even if the capacity of the busy CPUs is sufficient along with
other tweaks optimizing the behaviour for asymmetric systems
(big/little)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Skip LLC NOHZ logic for asymmetric systems
sched/fair: Tune down misfit NOHZ kicks
sched/fair: Comment some nohz_balancer_kick() kick conditions
sched/core: Fix buffer overflow in cgroup2 property cpu.max
sched/cpufreq: Fix 32-bit math overflow
The ext4 fstrim implementation uses the block bitmaps to find free space
that can be discarded. If we haven't replayed the journal, the bitmaps
will be stale and we absolutely *cannot* use stale metadata to zap the
underlying storage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull perf updates from Thomas Gleixner:
"A larger set of perf updates.
Not all of them are strictly fixes, but that's solely the tip
maintainers fault as they let the timely -rc1 pull request fall
through the cracks for various reasons including travel. So I'm
sending this nevertheless because rebasing and distangling fixes and
updates would be a mess and risky as well. As of tomorrow, a strict
fixes separation is happening again. Sorry for the slip-up.
Kernel:
- Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different
consumers of the mmap event get what they requested.
Tools:
- A larger set of updates to perf record/report/scripts vs. time
stamp handling
- More Python3 fixups
- A pile of memory leak plumbing
- perf BPF improvements and fixes
- Finalize the perf.data directory storage"
[ Note: the kernel part is strictly a fix, the updates are purely to
tooling - Linus ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf bpf: Show more BPF program info in print_bpf_prog_info()
perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog()
perf tools: Save bpf_prog_info and BTF of new BPF programs
perf evlist: Introduce side band thread
perf annotate: Enable annotation of BPF programs
perf build: Check what binutils's 'disassembler()' signature to use
perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation
perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO
perf feature detection: Add -lopcodes to feature-libbfd
perf top: Add option --no-bpf-event
perf bpf: Save BTF information as headers to perf.data
perf bpf: Save BTF in a rbtree in perf_env
perf bpf: Save bpf_prog_info information as headers to perf.data
perf bpf: Save bpf_prog_info in a rbtree in perf_env
perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool
perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()
bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()
tools lib bpf: Introduce bpf_program__get_prog_info_linear()
perf record: Replace option --bpf-event with --no-bpf-event
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
...
The LLC NOHZ condition will become true as soon as >=2 CPUs in a
single LLC domain are busy. On big.LITTLE systems, this translates to
two or more CPUs of a "cluster" (big or LITTLE) being busy.
Issuing a NOHZ kick in these conditions isn't desired for asymmetric
systems, as if the busy CPUs can provide enough compute capacity to
the running tasks, then we can leave the NOHZ CPUs in peace.
Skip the LLC NOHZ condition for asymmetric systems, and rely on
nr_running & capacity checks to trigger NOHZ kicks when the system
actually needs them.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-4-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, we are releasing the indirect buffer where we are done with
it in ext4_ind_remove_space(), so we can see the brelse() and
BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we
may probably forget to release the buffer some day. This patch cleans
up the code by putting of the code which releases the buffers to the
end of the function.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Prevent potential NULL pointer dereferences in the HPET and HyperV
code
- Exclude the GART aperture from /proc/kcore to prevent kernel
crashes on access
- Use the correct macros for Cyrix I/O on Geode processors
- Remove yet another kernel address printk leak
- Announce microcode reload completion as requested by quite some
people. Microcode loading has become popular recently.
- Some 'Make Clang' happy fixlets
- A few cleanups for recently added code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/gart: Exclude GART aperture from kcore
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
x86/mm/pti: Make local symbols static
x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
x86/microcode: Announce reload operation's completion
x86/hyperv: Prevent potential NULL pointer dereference
x86/hpet: Prevent potential NULL pointer dereference
x86/lib: Fix indentation issue, remove extra tab
x86/boot: Restrict header scope to make Clang happy
x86/mm: Don't leak kernel addresses
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Pull perf/core improvements and fixes from Arnaldo:
BPF:
Song Liu:
- Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT
and PERF_RECORD_KSYMBOL recently added to the kernel and plugging
binutils's libopcodes disassembly of BPF programs with the existing
annotation interfaces in 'perf annotate', 'perf report' and 'perf top'
various output formats (--stdio, --stdio2, --tui).
perf list:
Andi Kleen:
- Filter metrics when using substring search.
perf record:
Andi Kleen:
- Allow to limit number of reported perf.data files
- Clarify help for --switch-output.
perf report:
Andi Kleen
- Indicate JITed code better.
- Show all sort keys in help output.
perf script:
Andi Kleen:
- Support relative time.
perf stat:
Andi Kleen:
- Improve scaling.
General:
Changbin Du:
- Fix some mostly error path memory and reference count leaks found
using gcc's ASan and UBSan.
Vendor events:
Mamatha Inamdar:
- Remove P8 HW events which are not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In this commit:
3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")
we set rq->misfit_task_load whenever the current running task has a
utilization greater than 80% of rq->cpu_capacity. A non-zero value in
this field enables misfit load balancing.
However, if the task being looked at is already running on a CPU of
highest capacity, there's nothing more we can do for it. We can
currently spot this in update_sd_pick_busiest(), which prevents us
from selecting a sched_group of group_type == group_misfit_task as the
busiest group, but we don't do any of that in nohz_balancer_kick().
This means that we could repeatedly kick NOHZ CPUs when there's no
improvements in terms of load balance to be done.
Introduce a check_misfit_status() helper that returns true iff there
is a CPU in the system that could give more CPU capacity to a rq's
misfit task - IOW, there exists a CPU of higher capacity_orig or the
rq's CPU is severely pressured by rt/IRQ.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.
- Create and mount an empty ext4 filesystem without extent and quota
features,
- quotacheck and enable the user & group quota,
- Create some files and write some data to them, and then punch hole
to some files of them, it may trigger the buffer leak problem
mentioned above.
- Disable quota and run quotacheck again, it will create two new
aquota files and write the checked quota information to them, which
probably may reuse the freed indirect block(the buffer and page
cache was not freed) as data block.
- Enable quota again, it will invoke
vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
buffers and pagecache. Unfortunately, because of the buffer of quota
data block is still referenced, quota code cannot read the up to date
quota info from the device and lead to quota information corruption.
This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.
This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Pull timer fixes from Thomas Gleixner:
"A set of small fixes plus the removal of stale board support code:
- Remove the board support code from the clpx711x clocksource driver.
This change had fallen through the cracks and I'm sending it now
rather than dealing with people who want to improve that stale code
for 3 month.
- Use the proper clocksource mask on RICSV
- Make local scope functions and variables static"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/clps711x: Remove board support
clocksource/drivers/riscv: Fix clocksource mask
clocksource/drivers/mips-gic-timer: Make gic_compare_irqaction static
clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() static
clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() static
clocksource/drivers/clps711x: Make clps711x_clksrc_init() static
time/jiffies: Make refined_jiffies static
On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range. Accessing the GART range via
/proc/kcore results in a kernel crash.
vmcore used to have the same issue, until it was fixed with commit
2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging
existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
when attempting to read the aperture region, and so it won't read from the
actual memory.
Apply the same workaround for kcore. First implement the same hook
infrastructure for kcore, then reuse the hook functions introduced in the
previous vmcore fix. Just with some minor adjustment, rename some functions
for more general usage, and simplify the hook infrastructure a bit as there
is no module usage yet.
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
Pull perf/core improvements and fixes from Arnaldo:
kernel:
Stephane Eranian :
- Restore mmap record type correctly when handling PERF_RECORD_MMAP2
events, as the same template is used for all the threads interested
in mmap events, some may want just PERF_RECORD_MMAP, while some
may want the extra info in MMAP2 records.
perf probe:
Adrian Hunter:
- Fix getting the kernel map, because since changes related to x86 PTI
entry trampolines handling, there are more than one kernel map.
perf script:
Andi Kleen:
- Support insn output for normal samples, i.e.:
perf script -F ip,sym,insn --xed
Will fetch the sample IP from the thread address space and feed it
to Intel's XED disassembler, producing lines such as:
ffffffffa4068804 native_write_msr wrmsr
ffffffffa415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx
That match 'perf annotate's output.
- Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
addition to PERF_RECORD_SAMPLE.
perf report:
- Add a new --samples option to save a small random number of samples
per hist entry, using a reservoir technique to select a representative
number of samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or CPU of the sample is displayed. Then we use less' search
functionality to directly jump to the time stamp of the selected sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
- Fix the UI browser scripts pop up menu when there are many scripts
available.
perf report:
Andi Kleen:
- Add 'time' sort option. E.g.:
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
0.67% 277061.87300 [.] _dl_start
0.50% 277061.87300 [.] f1
0.50% 277061.87300 [.] f2
0.33% 277061.87300 [.] main
0.29% 277061.87300 [.] _dl_lookup_symbol_x
0.29% 277061.87300 [.] dl_main
0.29% 277061.87300 [.] do_lookup_x
0.17% 277061.87300 [.] _dl_debug_initialize
0.17% 277061.87300 [.] _dl_init_paths
0.08% 277061.87300 [.] check_match
0.04% 277061.87300 [.] _dl_count_modids
1.33% 277061.87400 [.] f1
1.33% 277061.87400 [.] f2
1.33% 277061.87400 [.] main
1.17% 277061.87500 [.] main
1.08% 277061.87500 [.] f1
1.08% 277061.87500 [.] f2
1.00% 277061.87600 [.] main
0.83% 277061.87600 [.] f1
0.83% 277061.87600 [.] f2
1.00% 277061.87700 [.] main
tools headers:
Arnaldo Carvalho de Melo:
- Update x86's syscall_64.tbl, no change in tools/perf behaviour.
- Sync copies asm-generic/unistd.h and linux/in with the kernel sources.
perf data:
Jiri Olsa:
- Prep work to support having perf.data stored as a directory, with one
file per CPU, that ultimately will allow having one ring buffer reading
thread per CPU.
Vendor events:
Martin Liška:
- perf PMU events for AMD Family 17h.
perf script python:
Tony Jones:
- Add python3 support for the remaining Intel PT related scripts, with
these we should have a clean build of perf with python3 while still
supporting the build with python2.
libbpf:
Arnaldo Carvalho de Melo:
- Fix the build on uCLibc, adding the missing stdarg.h since we use
va_list in one typedef.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch enables showing bpf program name, address, and size in the
header.
Before the patch:
perf report --header-only
...
# bpf_prog_info of id 9
# bpf_prog_info of id 10
# bpf_prog_info of id 13
After the patch:
# bpf_prog_info 9: bpf_prog_7be49e3934a125ba addr 0xffffffffa0024947 size 229
# bpf_prog_info 10: bpf_prog_2a142ef67aaad174 addr 0xffffffffa007c94d size 229
# bpf_prog_info 13: bpf_prog_47368425825d7384_task__task_newt addr 0xffffffffa0251137 size 369
Committer notes:
Fix the fallback definition when HAVE_LIBBPF_SUPPORT is not defined,
i.e. add the missing 'static inline' and add the __maybe_unused to the
args. Also add stdio.h since we now use FILE * in bpf-event.h.
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190319165454.1298742-3-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We now have a comment explaining the first sched_domain based NOHZ kick,
so might as well comment them all.
While at it, unwrap a line that fits under 80 characters.
Co-authored-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently when the file system resize using ext4_resize_fs() fails it
will report into log that "resized filesystem to <requested block
count>". However this may not be true in the case of failure. Use the
current block count as returned by ext4_blocks_count() to report the
block count.
Additionally, report a warning that "error occurred during file system
resize"
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull locking fixes from Thomas Gleixner:
"Two small fixes:
- Cure a recently introduces error path hickup which tries to
unregister a not registered lockdep key in te workqueue code
- Prevent unaligned cmpxchg() crashes in the robust list handling
code by sanity checking the user space supplied futex pointer"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Ensure that futex address is aligned in handle_futex_death()
workqueue: Only unregister a registered lockdep key
Since board support for the CLPS711X platform was removed,
remove the board support from the clps711x-timer driver.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20181220111626.17140-1-shc_work@mail.ru
Merge the forgotten cleanup patch for the new file, so the mess does not
propagate further.
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: kbuild-all@01.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06
The libbpf_print_fn_t typedef uses va_list without including the header
where that type is defined, stdarg.h, breaking in places where we're
unlucky for that type not to be already defined by some previously
included header.
Noticed while building on fedora 24 cross building tools/perf to the ARC
architecture using the uClibc C library:
28 fedora:24-x-ARC-uClibc : FAIL arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
CC /tmp/build/perf/tests/llvm.o
In file included from tests/llvm.c:3:0:
/git/linux/tools/lib/bpf/libbpf.h:57:20: error: unknown type name 'va_list'
const char *, va_list ap);
^~~~~~~
/git/linux/tools/lib/bpf/libbpf.h:59:34: error: unknown type name 'libbpf_print_fn_t'
LIBBPF_API void libbpf_set_print(libbpf_print_fn_t fn);
^~~~~~~~~~~~~~~~~
mv: cannot stat '/tmp/build/perf/tests/.llvm.o.tmp': No such file or directory
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: a8a1f7d09cfc ("libbpf: fix libbpf_print")
Link: https://lkml.kernel.org/n/tip-5270n2quu2gqz22o7itfdx00@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extract logic to create program names to synthesize_bpf_prog_name(), so
that it can be reused in header.c:print_bpf_prog_info().
This commit doesn't change the behavior.
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190319165454.1298742-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add limit into sscanf format string for on-stack buffer.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0d5936344f30 ("sched: Implement interface for cgroup unified hierarchy")
Link: https://lkml.kernel.org/r/155189230232.2620.13120481613524200065.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh
in case ext4_journal_get_write_access() fails.
Additionally kvfree() is missing in the same error path. Fix it by
moving the ext4_journal_get_write_access() before the ext4 sb update as
Ted suggested and release n_group_desc and gdb_bh in case it fails.
Fixes: 61a9c11e5e7a ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path")
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem:
- Remove secondary GIC support on systems w/o device-tree support
- A set of small fixlets in various irqchip drivers
- static and fall-through annotations
- Kernel doc and typo fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Mark expected switch case fall-through
genirq/devres: Remove excess parameter from kernel doc
irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
irqchip/mbigen: Don't clear eventid when freeing an MSI
irqchip/stm32: Don't set rising configuration registers at init
irqchip/stm32: Don't clear rising/falling config registers at init
dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
irqchip/mmp: Make mmp_irq_domain_ops static
irqchip/brcmstb-l2: Make two init functions static
genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
irqchip/gic: Drop support for secondary GIC in non-DT systems
irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
The futex code requires that the user space addresses of futexes are 32bit
aligned. sys_futex() checks this in futex_get_keys() but the robust list
code has no alignment check in place.
As a consequence the kernel crashes on architectures with strict alignment
requirements in handle_futex_death() when trying to cmpxchg() on an
unaligned futex address which was retrieved from the robust list.
[ tglx: Rewrote changelog, proper sizeof() based alignement check and add
comment ]
Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Signed-off-by: Chen Jie <chenjie6@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <dvhart@infradead.org>
Cc: <peterz@infradead.org>
Cc: <zengweilin@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.com
For all riscv architectures (RV32, RV64 and RV128), the clocksource
is a 64 bit incrementing counter.
Fix the clock source mask accordingly.
Tested on both 64bit and 32 bit virt machine in QEMU.
Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-riscv@lists.infradead.org
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Anup Patel <Anup.Patel@wdc.com>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190322215411.19362-1-atish.patra@wdc.com
When building with -Wsometimes-uninitialized, Clang warns:
arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used
uninitialized whenever switch default is taken
[-Wsometimes-uninitialized]
The default cannot be reached because arch_build_bp_info() initializes
hw->len to one of the specified cases. Nevertheless the warning is valid
and returning -EINVAL makes sure that this cannot be broken by future
modifications.
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: clang-built-linux@googlegroups.com
Link: https://github.com/ClangBuiltLinux/linux/issues/392
Link: https://lkml.kernel.org/r/20190307212756.4648-1-natechancellor@gmail.com
Thomas noticed that the new arch/x86/include/asm/cpu_device_id.h header is
a train-wreck that didn't incorporate review feedback like not using __u8
in kernel-only headers.
While at it also fix all the *other* problems this header has:
- Use canonical names for the header guards. It's inexplicable why a non-standard
guard was used.
- Don't define the header guard to 1. Plus annotate the closing #endif as done
absolutely every other header. Again, an inexplicable source of noise.
- Move the kernel API calls provided by this header next to each other, there's
absolutely no reason to have them spread apart in the header.
- Align the INTEL_CPU_DESC() macro initializations vertically, this is easier to
read and it's also the canonical style.
- Actually name the macro arguments properly: instead of 'mod, step, rev',
spell out 'model, stepping, revision' - it's not like we have a lack of
characters in this header.
- Actually make arguments macro-safe - again it's inexplicable why it wasn't
done properly to begin with.
Quite amazing how many problems a 41 lines header can contain.
This kind of code quality is unacceptable, and it slipped through the
review net of 2 developers and 2 maintainers, including myself, until
Thomas noticed it. :-/
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Guenter reported a build warning for CONFIG_CPU_SUP_INTEL=n:
> With allmodconfig-CONFIG_CPU_SUP_INTEL, this patch results in:
>
> In file included from arch/x86/events/amd/core.c:8:0:
> arch/x86/events/amd/../perf_event.h:1036:45: warning: ‘struct cpu_hw_event’ declared inside parameter list will not be visible outside of this definition or declaration
> static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu)
While harmless (an unsed pointer is an unused pointer, no matter the type)
it needs fixing.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: d01b1f96a82e ("perf/x86/intel: Make cpuc allocations consistent")
Link: http://lkml.kernel.org/r/20190315081410.GR5996@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a way to define custom scripts through ~/.perfconfig, which are then
added to the scripts menu. The scripts get the same arguments as 'perf
script', in particular -i, --cpu, --tid.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To fully annotate BPF programs with source code mapping, 4 different
information are needed:
1) PERF_RECORD_KSYMBOL
2) PERF_RECORD_BPF_EVENT
3) bpf_prog_info
4) btf
This patch handles 3) and 4) for BPF programs loaded after 'perf
record|top'.
For timely process of these information, a dedicated event is added to
the side band evlist.
When PERF_RECORD_BPF_EVENT is received via the side band event, the
polling thread gathers 3) and 4) vis sys_bpf and store them in perf_env.
This information is saved to perf.data at the end of 'perf record'.
Committer testing:
The 'wakeup_watermark' member in 'struct perf_event_attr' is inside a
unnamed union, so can't be used in a struct designated initialization
with older gccs, get it out of that, isolating as 'attr.wakeup_watermark
= 1;' to work with all gcc versions.
We also need to add '--no-bpf-event' to the 'perf record'
perf_event_attr tests in 'perf test', as the way that that test goes is
to intercept the events being setup and looking if they match the fields
described in the control files, since now it finds first the side band
event used to catch the PERF_RECORD_BPF_EVENT, they all fail.
With these issues fixed:
Same scenario as for testing BPF programs loaded before 'perf record' or
'perf top' starts, only start the BPF programs after 'perf record|top',
so that its information get collected by the sideband threads, the rest
works as for the programs loaded before start monitoring.
Add missing 'inline' to the bpf_event__add_sb_event() when
HAVE_LIBBPF_SUPPORT is not defined, fixing the build in systems without
binutils devel files installed.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-16-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Vincent Wang reported that get_next_freq() has a mult overflow bug on
32-bit platforms in the IOWAIT boost case, since in that case {util,max}
are in freq units instead of capacity units.
Solve this by moving the IOWAIT boost to capacity units. And since this
means @max is constant; simplify the code.
Reported-by: Vincent Wang <vincent.wang@unisoc.com>
Tested-by: Vincent Wang <vincent.wang@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190305083202.GU32494@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
This function is never used from the beginning (and is commented out);
let's remove it.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull core fixes from Thomas Gleixner:
"Two small fixes:
- Move the large objtool_file struct off the stack so objtool works
in setups with a tight stack limit.
- Make a few variables static in the watchdog core code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/core: Make variables static
objtool: Move objtool_file struct off the stack
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
With -Wimplicit-fallthrough added to CFLAGS:
kernel/irq/manage.c: In function ‘irq_do_set_affinity’:
kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
cpumask_copy(desc->irq_common_data.affinity, mask);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/irq/manage.c:199:2: note: here
case IRQ_SET_MASK_OK_NOCOPY:
^~~~
Annotate it.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190228213714.GA9246@embeddedor
The recent change to prevent use after free and a memory leak introduced an
unconditional call to wq_unregister_lockdep() in the error handling
path. If the lockdep key had not been registered yet, then the lockdep core
emits a warning.
Only call wq_unregister_lockdep() if wq_register_lockdep() has been
called first.
Fixes: 009bb421b6ce ("workqueue, lockdep: Fix an alloc_workqueue() error path")
Reported-by: syzbot+be0c198232f86389c3dd@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/20190311230255.176081-1-bvanassche@acm.org
Fix sparse warning:
drivers/clocksource/mips-gic-timer.c:70:18: warning:
symbol 'gic_compare_irqaction' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20190322144359.19516-1-yuehaibing@huawei.com
With 'make C=2 W=1', sparse and gcc both complain:
CHECK arch/x86/mm/pti.c
arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
CC arch/x86/mm/pti.o
arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
605 | void pti_set_kernel_image_nonglobal(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
local (static) symbol.
Make both static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
For bug workarounds or checks, it is useful to check for specific
microcode revisions.
Add a new generic function to match the CPU with stepping.
Add the other function to check the min microcode revisions for
the matched CPU.
A new table format is introduced to facilitate the quirk to
fill the related information.
This does not change the existing x86_cpu_id because it's an ABI
shared with modules, and also has quite different requirements,
as in no wildcards, but everything has to be matched exactly.
Originally-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1549319013-4522-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Through:
validate_event()
x86_pmu.get_event_constraints(.idx=-1)
tfa_get_event_constraints()
dyn_constraint()
cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access.
In this case, simply skip the TFA constraint code, there is no event
constraint with just PMC3, therefore the code will never result in the
empty set.
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Tony Jones <tonyj@suse.com>
Reported-by: "DSouza, Nelson" <nelson.dsouza@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Jones <tonyj@suse.com>
Tested-by: "DSouza, Nelson" <nelson.dsouza@intel.com>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.org
Fix the argv ui browser code to correctly display more entries than fit
on the screen without crashing. The problem was some type confusion with
pointer types in the ->seek function. Do the argv arithmetic correctly
with char ** pointers. Also add some asserts to find overruns and limit
the display function correctly.
Then finally remove a workaround for this in the res sample browser.
Committer testing:
1) Resize the x terminal to have just some 5 lines
2) Use 'perf report --samples 1' to activate the sample browser options
in the menu
3) Press ENTER, this will cause the crash:
# perf report --samples 1
perf: Segmentation fault
-------- backtrace --------
perf[0x5a514a]
/lib64/libc.so.6(+0x385bf)[0x7f27281b55bf]
/lib64/libc.so.6(+0x161a67)[0x7f27282dea67]
/lib64/libslang.so.2(SLsmg_write_wrapped_string+0x82)[0x7f272874a0b2]
perf(ui_browser__argv_refresh+0x77)[0x5939a7]
perf[0x5924cc]
perf(ui_browser__run+0x39)[0x593449]
perf(ui__popup_menu+0x83)[0x5a5263]
perf[0x59f421]
perf(perf_evlist__tui_browse_hists+0x3a0)[0x5a3780]
perf(cmd_report+0x2746)[0x447136]
perf[0x4a95fe]
perf(main+0x61c)[0x42dc6c]
/lib64/libc.so.6(__libc_start_main+0xf2)[0x7f27281a1412]
perf(_start+0x2d)[0x42de9d]
#
After applying this patch no crash takes place in such situation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-12-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces side band thread that captures extended
information for events like PERF_RECORD_BPF_EVENT.
This new thread uses its own evlist that uses ring buffer with very low
watermark for lower latency.
To use side band thread, we need to:
1. add side band event(s) by calling perf_evlist__add_sb_event();
2. calls perf_evlist__start_sb_thread();
3. at the end of perf run, perf_evlist__stop_sb_thread().
In the next patch, we use this thread to handle PERF_RECORD_BPF_EVENT.
Committer notes:
Add fix by Jiri Olsa for when te sb_tread can't get started and then at
the end the stop_sb_thread() segfaults when joining the (non-existing)
thread.
That can happen when running 'perf top' or 'perf record' as a normal
user, for instance.
Further checks need to be done on top of this to more graciously handle
these possible failure scenarios.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-15-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The TIMER_IRQSAFE usage was introduced in commit 22597dc3d97b1 ("kthread:
initial support for delayed kthread work") which modelled the delayed
kthread code after workqueue's code. The workqueue code requires the flag
TIMER_IRQSAFE for synchronisation purpose. This is not true for kthread's
delay timer since all operations occur under a lock.
Remove TIMER_IRQSAFE from the timer initialisation and use timer_setup()
for initialisation purpose which is the official function.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20190212162554.19779-2-bigeasy@linutronix.de
When admin calls "reboot -f" - i.e., does a hard system reboot by
directly calling reboot(2) - ext4 filesystem mounted with errors=panic
can panic the system. This happens because the underlying device gets
disabled without unmounting the filesystem and thus some syscall running
in parallel to reboot(2) can result in the filesystem getting IO errors.
This is somewhat surprising to the users so try improve the behavior by
switching to errors=remount-ro behavior when the system is running
reboot(2).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull thermal management fixes from Zhang Rui:
- Fix a wrong __percpu structure declaration in intel_powerclamp driver
(Luc Van Oostenryck)
- Fix truncated name of the idle injection kthreads created by
intel_powerclamp driver (Zhang Rui)
- Fix the missing UUID supports in int3400 thermal driver (Matthew
Garrett)
- Fix a crash when accessing the debugfs of bcm2835 SoC thermal driver
(Phil Elwell)
- A couple of trivial fixes/cleanups in some SoC thermal drivers
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/intel_powerclamp: fix truncated kthread name
thermal: mtk: Allocate enough space for mtk_thermal.
thermal/int340x_thermal: fix mode setting
thermal/int340x_thermal: Add additional UUIDs
thermal: cpu_cooling: Remove unused cur_freq variable
thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
thermal: samsung: Fix incorrect check after code merge
thermal/intel_powerclamp: fix __percpu declaration of worker_data
sparse complains:
CHECK kernel/watchdog.c
kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available'
was not declared. Should it be static?
kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask'
was not declared. Should it be static?
They're not referenced by name from anyplace else, make them static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/7855.1552383228@turing-police
Building with 'make W=1' complains:
CC kernel/irq/devres.o
kernel/irq/devres.c:104: warning: Excess function parameter 'thread_fn'
description in 'devm_request_any_context_irq'
Remove it.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/31207.1552378676@turing-police
Fix sparse warning:
drivers/clocksource/timer-ti-dm.c:589:5: warning:
symbol 'omap_dm_timer_set_load_start' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20190322144302.6704-1-yuehaibing@huawei.com
The getCx86_old() and setCx86_old() macros have been replaced with
correctly working getCx86() and setCx86(), so remove these unused macros.
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@kernel.org
Link: https://lkml.kernel.org/r/1552596361-8967-3-git-send-email-tedheadster@gmail.com
Some F17h models do not have CPB set in CPUID even though the CPU
supports it. Set the feature bit unconditionally on all F17h.
[ bp: Rewrite commit message and patch. ]
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181120030018.5185-1-jiaxun.yang@flygoat.com
Merge misc patches from Andrew Morton:
- a little bit more MM
- a few fixups
[ The "little bit more MM" is actually just one of the three patches
Andrew sent for mm/filemap.c, I'm still mulling over two more of them
from Josef Bacik - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
tools/testing/selftests/proc/proc-pid-vm.c: test with vsyscall in mind
zram: default to lzo-rle instead of lzo
filemap: pass vm_fault to the mmap ra helpers
Don't overflow array when the scripts directory is too large, or the
script file name is too long.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-11-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
a new function symbol__disassemble_bpf(), where annotation line
information is filled based on the bpf_prog_info and btf data saved in
given perf_env.
symbol__disassemble_bpf() uses binutils's libopcodes to disassemble bpf
programs.
Committer testing:
After fixing this:
- u64 *addrs = (u64 *)(info_linear->info.jited_ksyms);
+ u64 *addrs = (u64 *)(uintptr_t)(info_linear->info.jited_ksyms);
Detected when crossbuilding to a 32-bit arch.
And making all this dependent on HAVE_LIBBFD_SUPPORT and
HAVE_LIBBPF_SUPPORT:
1) Have a BPF program running, one that has BTF info, etc, I used
the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
by 'perf trace'.
# grep -B1 augmented_raw ~/.perfconfig
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#
# perf trace -e *mmsg
dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
NetworkManager/10055 sendmmsg(22<socket:[1056822]>, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2
2) Then do a 'perf record' system wide for a while:
# perf record -a
^C[ perf record: Woken up 68 times to write data ]
[ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
#
3) Check that we captured BPF and BTF info in the perf.data file:
# perf report --header-only | grep 'b[pt]f'
# event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# bpf_prog_info of id 13
# bpf_prog_info of id 14
# bpf_prog_info of id 15
# bpf_prog_info of id 16
# bpf_prog_info of id 17
# bpf_prog_info of id 18
# bpf_prog_info of id 21
# bpf_prog_info of id 22
# bpf_prog_info of id 41
# bpf_prog_info of id 42
# btf info of id 2
#
4) Check which programs got recorded:
# perf report | grep bpf_prog | head
0.16% exe bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.14% exe bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.08% fuse-overlayfs bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.07% fuse-overlayfs bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.01% clang-4.0 bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.01% clang-4.0 bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% clang bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.00% runc bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% clang bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% sh bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
#
This was with the default --sort order for 'perf report', which is:
--sort comm,dso,symbol
If we just look for the symbol, for instance:
# perf report --sort symbol | grep bpf_prog | head
0.26% [k] bpf_prog_819967866022f1e1_sys_enter - -
0.24% [k] bpf_prog_c1bd85c092d6e4aa_sys_exit - -
#
or the DSO:
# perf report --sort dso | grep bpf_prog | head
0.26% bpf_prog_819967866022f1e1_sys_enter
0.24% bpf_prog_c1bd85c092d6e4aa_sys_exit
#
We'll see the two BPF programs that augmented_raw_syscalls.o puts in
place, one attached to the raw_syscalls:sys_enter and another to the
raw_syscalls:sys_exit tracepoints, as expected.
Now we can finally do, from the command line, annotation for one of
those two symbols, with the original BPF program source coude intermixed
with the disassembled JITed code:
# perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter
Samples: 950 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
Percent int sys_enter(struct syscall_enter_args *args)
53.41 push %rbp
0.63 mov %rsp,%rbp
0.31 sub $0x170,%rsp
1.93 sub $0x28,%rbp
7.02 mov %rbx,0x0(%rbp)
3.20 mov %r13,0x8(%rbp)
1.07 mov %r14,0x10(%rbp)
0.61 mov %r15,0x18(%rbp)
0.11 xor %eax,%eax
1.29 mov %rax,0x20(%rbp)
0.11 mov %rdi,%rbx
return bpf_get_current_pid_tgid();
2.02 → callq *ffffffffda6776d9
2.76 mov %eax,-0x148(%rbp)
mov %rbp,%rsi
int sys_enter(struct syscall_enter_args *args)
add $0xfffffffffffffeb8,%rsi
return bpf_map_lookup_elem(pids, &pid) != NULL;
movabs $0xffff975ac2607800,%rdi
1.26 → callq *ffffffffda6789e9
cmp $0x0,%rax
2.43 → je 0
add $0x38,%rax
0.21 xor %r13d,%r13d
if (pid_filter__has(&pids_filtered, getpid()))
0.81 cmp $0x0,%rax
→ jne 0
mov %rbp,%rdi
probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
2.22 add $0xfffffffffffffeb8,%rdi
0.11 mov $0x40,%esi
0.32 mov %rbx,%rdx
2.74 → callq *ffffffffda658409
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
0.22 mov %rbp,%rsi
1.69 add $0xfffffffffffffec0,%rsi
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
movabs $0xffff975bfcd36000,%rdi
add $0xd0,%rdi
0.21 mov 0x0(%rsi),%eax
0.93 cmp $0x200,%rax
→ jae 0
0.10 shl $0x3,%rax
0.11 add %rdi,%rax
0.11 → jmp 0
xor %eax,%eax
if (syscall == NULL || !syscall->enabled)
1.07 cmp $0x0,%rax
→ je 0
if (syscall == NULL || !syscall->enabled)
6.57 movzbq 0x0(%rax),%rdi
if (syscall == NULL || !syscall->enabled)
cmp $0x0,%rdi
0.95 → je 0
mov $0x40,%r8d
switch (augmented_args.args.syscall_nr) {
mov -0x140(%rbp),%rdi
switch (augmented_args.args.syscall_nr) {
cmp $0x2,%rdi
→ je 0
cmp $0x101,%rdi
→ je 0
cmp $0x15,%rdi
→ jne 0
case SYS_OPEN: filename_arg = (const void *)args->args[0];
mov 0x10(%rbx),%rdx
→ jmp 0
case SYS_OPENAT: filename_arg = (const void *)args->args[1];
mov 0x18(%rbx),%rdx
if (filename_arg != NULL) {
cmp $0x0,%rdx
→ je 0
xor %edi,%edi
augmented_args.filename.reserved = 0;
mov %edi,-0x104(%rbp)
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %rbp,%rdi
add $0xffffffffffffff00,%rdi
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov $0x100,%esi
→ callq *ffffffffda658499
mov $0x148,%r8d
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %eax,-0x108(%rbp)
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %rax,%rdi
shl $0x20,%rdi
shr $0x20,%rdi
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
cmp $0xff,%rdi
→ ja 0
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
add $0x48,%rax
len &= sizeof(augmented_args.filename.value) - 1;
and $0xff,%rax
mov %rax,%r8
mov %rbp,%rcx
return perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
add $0xfffffffffffffeb8,%rcx
mov %rbx,%rdi
movabs $0xffff975fbd72d800,%rsi
mov $0xffffffff,%edx
→ callq *ffffffffda658ad9
mov %rax,%r13
}
mov %r13,%rax
0.72 mov 0x0(%rbp),%rbx
mov 0x8(%rbp),%r13
1.16 mov 0x10(%rbp),%r14
0.10 mov 0x18(%rbp),%r15
0.42 add $0x28,%rbp
0.54 leaveq
0.54 ← retq
#
Please see 'man perf-config' to see how to control what should be seen,
via ~/.perfconfig [annotate] section, for instance, one can suppress the
source code and see just the disassembly, etc.
Alternatively, use the TUI bu just using 'perf annotate', press
'/bpf_prog' to see the bpf symbols, press enter and do the interactive
annotation, which allows for dumping to a file after selecting the
the various output tunables, for instance, the above without source code
intermixed, plus showing all the instruction offsets:
# perf annotate bpf_prog_819967866022f1e1_sys_enter
Then press: 's' to hide the source code + 'O' twice to show all
instruction offsets, then 'P' to print to the
bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:
# cat bpf_prog_819967866022f1e1_sys_enter.annotation
bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
Event: cycles:ppp
53.41 0: push %rbp
0.63 1: mov %rsp,%rbp
0.31 4: sub $0x170,%rsp
1.93 b: sub $0x28,%rbp
7.02 f: mov %rbx,0x0(%rbp)
3.20 13: mov %r13,0x8(%rbp)
1.07 17: mov %r14,0x10(%rbp)
0.61 1b: mov %r15,0x18(%rbp)
0.11 1f: xor %eax,%eax
1.29 21: mov %rax,0x20(%rbp)
0.11 25: mov %rdi,%rbx
2.02 28: → callq *ffffffffda6776d9
2.76 2d: mov %eax,-0x148(%rbp)
33: mov %rbp,%rsi
36: add $0xfffffffffffffeb8,%rsi
3d: movabs $0xffff975ac2607800,%rdi
1.26 47: → callq *ffffffffda6789e9
4c: cmp $0x0,%rax
2.43 50: → je 0
52: add $0x38,%rax
0.21 56: xor %r13d,%r13d
0.81 59: cmp $0x0,%rax
5d: → jne 0
63: mov %rbp,%rdi
2.22 66: add $0xfffffffffffffeb8,%rdi
0.11 6d: mov $0x40,%esi
0.32 72: mov %rbx,%rdx
2.74 75: → callq *ffffffffda658409
0.22 7a: mov %rbp,%rsi
1.69 7d: add $0xfffffffffffffec0,%rsi
84: movabs $0xffff975bfcd36000,%rdi
8e: add $0xd0,%rdi
0.21 95: mov 0x0(%rsi),%eax
0.93 98: cmp $0x200,%rax
9f: → jae 0
0.10 a1: shl $0x3,%rax
0.11 a5: add %rdi,%rax
0.11 a8: → jmp 0
aa: xor %eax,%eax
1.07 ac: cmp $0x0,%rax
b0: → je 0
6.57 b6: movzbq 0x0(%rax),%rdi
bb: cmp $0x0,%rdi
0.95 bf: → je 0
c5: mov $0x40,%r8d
cb: mov -0x140(%rbp),%rdi
d2: cmp $0x2,%rdi
d6: → je 0
d8: cmp $0x101,%rdi
df: → je 0
e1: cmp $0x15,%rdi
e5: → jne 0
e7: mov 0x10(%rbx),%rdx
eb: → jmp 0
ed: mov 0x18(%rbx),%rdx
f1: cmp $0x0,%rdx
f5: → je 0
f7: xor %edi,%edi
f9: mov %edi,-0x104(%rbp)
ff: mov %rbp,%rdi
102: add $0xffffffffffffff00,%rdi
109: mov $0x100,%esi
10e: → callq *ffffffffda658499
113: mov $0x148,%r8d
119: mov %eax,-0x108(%rbp)
11f: mov %rax,%rdi
122: shl $0x20,%rdi
126: shr $0x20,%rdi
12a: cmp $0xff,%rdi
131: → ja 0
133: add $0x48,%rax
137: and $0xff,%rax
13d: mov %rax,%r8
140: mov %rbp,%rcx
143: add $0xfffffffffffffeb8,%rcx
14a: mov %rbx,%rdi
14d: movabs $0xffff975fbd72d800,%rsi
157: mov $0xffffffff,%edx
15c: → callq *ffffffffda658ad9
161: mov %rax,%r13
164: mov %r13,%rax
0.72 167: mov 0x0(%rbp),%rbx
16b: mov 0x8(%rbp),%r13
1.16 16f: mov 0x10(%rbp),%r14
0.10 173: mov 0x18(%rbp),%r15
0.42 177: add $0x28,%rbp
0.54 17b: leaveq
0.54 17c: ← retq
Another cool way to test all this is to symple use 'perf top' look for
those symbols, go there and press enter, annotate it live :-)
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.
This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.
Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Reported-by: Tim Sander <tim@krieglstein.org>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20190212162554.19779-1-bigeasy@linutronix.de
Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.
However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.
This is (very simplified) reproducer from Frank
// 41472 = (10 * 4096) + 512
// 37376 = 41472 - 4096
ftruncate(fd, 41472);
io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
io_submit(io_ctx, 1, &iocbs[1]);
io_submit(io_ctx, 1, &iocbs[2]);
io_getevents(io_ctx, 2, 2, events, NULL);
Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
*
0000a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
*
0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
*
0000b200
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org
Pull smb3 fixes from Steve French:
- two fixes for stable for guest mount problems with smb3.1.1
- two fixes for crediting (SMB3 flow control) on resent requests
- a byte range lock leak fix
- two fixes for incorrect rc mappings
* tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
SMB3: Fix SMB3.1.1 guest mounts to Samba
cifs: Fix slab-out-of-bounds when tracing SMB tcon
cifs: allow guest mounts to work for smb3.11
fix incorrect error code mapping for OBJECTID_NOT_FOUND
cifs: fix that return -EINVAL when do dedupe operation
CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN
CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN
Pull ext4 fixes from Ted Ts'o:
"Miscellaneous ext4 bug fixes for 5.1"
* tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
ext4: prohibit fstrim in norecovery mode
ext4: cleanup bh release code in ext4_ind_remove_space()
ext4: brelse all indirect buffer in ext4_ind_remove_space()
ext4: report real fs size after failed resize
ext4: add missing brelse() in add_new_gdb_meta_bg()
ext4: remove useless ext4_pin_inode()
ext4: avoid panic during forced reboot
ext4: fix data corruption caused by unaligned direct AIO
ext4: fix NULL pointer dereference while journal is aborted
Pull scheduler updates from Thomas Gleixner:
"Third more careful attempt for this set of fixes:
- Prevent a 32bit math overflow in the cpufreq code
- Fix a buffer overflow when scanning the cgroup2 cpu.max property
- A set of fixes for the NOHZ scheduler logic to prevent waking up
CPUs even if the capacity of the busy CPUs is sufficient along with
other tweaks optimizing the behaviour for asymmetric systems
(big/little)"
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Skip LLC NOHZ logic for asymmetric systems
sched/fair: Tune down misfit NOHZ kicks
sched/fair: Comment some nohz_balancer_kick() kick conditions
sched/core: Fix buffer overflow in cgroup2 property cpu.max
sched/cpufreq: Fix 32-bit math overflow
The ext4 fstrim implementation uses the block bitmaps to find free space
that can be discarded. If we haven't replayed the journal, the bitmaps
will be stale and we absolutely *cannot* use stale metadata to zap the
underlying storage.
Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull perf updates from Thomas Gleixner:
"A larger set of perf updates.
Not all of them are strictly fixes, but that's solely the tip
maintainers fault as they let the timely -rc1 pull request fall
through the cracks for various reasons including travel. So I'm
sending this nevertheless because rebasing and distangling fixes and
updates would be a mess and risky as well. As of tomorrow, a strict
fixes separation is happening again. Sorry for the slip-up.
Kernel:
- Handle RECORD_MMAP vs. RECORD_MMAP2 correctly so different
consumers of the mmap event get what they requested.
Tools:
- A larger set of updates to perf record/report/scripts vs. time
stamp handling
- More Python3 fixups
- A pile of memory leak plumbing
- perf BPF improvements and fixes
- Finalize the perf.data directory storage"
[ Note: the kernel part is strictly a fix, the updates are purely to
tooling - Linus ]
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (75 commits)
perf bpf: Show more BPF program info in print_bpf_prog_info()
perf bpf: Extract logic to create program names from perf_event__synthesize_one_bpf_prog()
perf tools: Save bpf_prog_info and BTF of new BPF programs
perf evlist: Introduce side band thread
perf annotate: Enable annotation of BPF programs
perf build: Check what binutils's 'disassembler()' signature to use
perf bpf: Process PERF_BPF_EVENT_PROG_LOAD for annotation
perf symbols: Introduce DSO_BINARY_TYPE__BPF_PROG_INFO
perf feature detection: Add -lopcodes to feature-libbfd
perf top: Add option --no-bpf-event
perf bpf: Save BTF information as headers to perf.data
perf bpf: Save BTF in a rbtree in perf_env
perf bpf: Save bpf_prog_info information as headers to perf.data
perf bpf: Save bpf_prog_info in a rbtree in perf_env
perf bpf: Make synthesize_bpf_events() receive perf_session pointer instead of perf_tool
perf bpf: Synthesize bpf events with bpf_program__get_prog_info_linear()
bpftool: use bpf_program__get_prog_info_linear() in prog.c:do_dump()
tools lib bpf: Introduce bpf_program__get_prog_info_linear()
perf record: Replace option --bpf-event with --no-bpf-event
perf tests: Fix a memory leak in test__perf_evsel__tp_sched_test()
...
The LLC NOHZ condition will become true as soon as >=2 CPUs in a
single LLC domain are busy. On big.LITTLE systems, this translates to
two or more CPUs of a "cluster" (big or LITTLE) being busy.
Issuing a NOHZ kick in these conditions isn't desired for asymmetric
systems, as if the busy CPUs can provide enough compute capacity to
the running tasks, then we can leave the NOHZ CPUs in peace.
Skip the LLC NOHZ condition for asymmetric systems, and rely on
nr_running & capacity checks to trigger NOHZ kicks when the system
actually needs them.
Suggested-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-4-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently, we are releasing the indirect buffer where we are done with
it in ext4_ind_remove_space(), so we can see the brelse() and
BUFFER_TRACE() everywhere. It seems fragile and hard to read, and we
may probably forget to release the buffer some day. This patch cleans
up the code by putting of the code which releases the buffers to the
end of the function.
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Pull x86 fixes from Thomas Gleixner:
"A set of x86 fixes:
- Prevent potential NULL pointer dereferences in the HPET and HyperV
code
- Exclude the GART aperture from /proc/kcore to prevent kernel
crashes on access
- Use the correct macros for Cyrix I/O on Geode processors
- Remove yet another kernel address printk leak
- Announce microcode reload completion as requested by quite some
people. Microcode loading has become popular recently.
- Some 'Make Clang' happy fixlets
- A few cleanups for recently added code"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/gart: Exclude GART aperture from kcore
x86/hw_breakpoints: Make default case in hw_breakpoint_arch_parse() return an error
x86/mm/pti: Make local symbols static
x86/cpu/cyrix: Remove {get,set}Cx86_old macros used for Cyrix processors
x86/cpu/cyrix: Use correct macros for Cyrix calls on Geode processors
x86/microcode: Announce reload operation's completion
x86/hyperv: Prevent potential NULL pointer dereference
x86/hpet: Prevent potential NULL pointer dereference
x86/lib: Fix indentation issue, remove extra tab
x86/boot: Restrict header scope to make Clang happy
x86/mm: Don't leak kernel addresses
x86/cpufeature: Fix various quality problems in the <asm/cpu_device_hd.h> header
Pull perf/core improvements and fixes from Arnaldo:
BPF:
Song Liu:
- Add support for annotating BPF programs, using the PERF_RECORD_BPF_EVENT
and PERF_RECORD_KSYMBOL recently added to the kernel and plugging
binutils's libopcodes disassembly of BPF programs with the existing
annotation interfaces in 'perf annotate', 'perf report' and 'perf top'
various output formats (--stdio, --stdio2, --tui).
perf list:
Andi Kleen:
- Filter metrics when using substring search.
perf record:
Andi Kleen:
- Allow to limit number of reported perf.data files
- Clarify help for --switch-output.
perf report:
Andi Kleen
- Indicate JITed code better.
- Show all sort keys in help output.
perf script:
Andi Kleen:
- Support relative time.
perf stat:
Andi Kleen:
- Improve scaling.
General:
Changbin Du:
- Fix some mostly error path memory and reference count leaks found
using gcc's ASan and UBSan.
Vendor events:
Mamatha Inamdar:
- Remove P8 HW events which are not supported.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
In this commit:
3b1baa6496e6 ("sched/fair: Add 'group_misfit_task' load-balance type")
we set rq->misfit_task_load whenever the current running task has a
utilization greater than 80% of rq->cpu_capacity. A non-zero value in
this field enables misfit load balancing.
However, if the task being looked at is already running on a CPU of
highest capacity, there's nothing more we can do for it. We can
currently spot this in update_sd_pick_busiest(), which prevents us
from selecting a sched_group of group_type == group_misfit_task as the
busiest group, but we don't do any of that in nohz_balancer_kick().
This means that we could repeatedly kick NOHZ CPUs when there's no
improvements in terms of load balance to be done.
Introduce a check_misfit_status() helper that returns true iff there
is a CPU in the system that could give more CPU capacity to a rq's
misfit task - IOW, there exists a CPU of higher capacity_orig or the
rq's CPU is severely pressured by rt/IRQ.
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-3-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.
- Create and mount an empty ext4 filesystem without extent and quota
features,
- quotacheck and enable the user & group quota,
- Create some files and write some data to them, and then punch hole
to some files of them, it may trigger the buffer leak problem
mentioned above.
- Disable quota and run quotacheck again, it will create two new
aquota files and write the checked quota information to them, which
probably may reuse the freed indirect block(the buffer and page
cache was not freed) as data block.
- Enable quota again, it will invoke
vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
buffers and pagecache. Unfortunately, because of the buffer of quota
data block is still referenced, quota code cannot read the up to date
quota info from the device and lead to quota information corruption.
This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.
This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: zhangyi (F) <yi.zhang@huawei.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
Pull timer fixes from Thomas Gleixner:
"A set of small fixes plus the removal of stale board support code:
- Remove the board support code from the clpx711x clocksource driver.
This change had fallen through the cracks and I'm sending it now
rather than dealing with people who want to improve that stale code
for 3 month.
- Use the proper clocksource mask on RICSV
- Make local scope functions and variables static"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
clocksource/drivers/clps711x: Remove board support
clocksource/drivers/riscv: Fix clocksource mask
clocksource/drivers/mips-gic-timer: Make gic_compare_irqaction static
clocksource/drivers/timer-ti-dm: Make omap_dm_timer_set_load_start() static
clocksource/drivers/tcb_clksrc: Make tc_clksrc_suspend/resume() static
clocksource/drivers/clps711x: Make clps711x_clksrc_init() static
time/jiffies: Make refined_jiffies static
On machines where the GART aperture is mapped over physical RAM,
/proc/kcore contains the GART aperture range. Accessing the GART range via
/proc/kcore results in a kernel crash.
vmcore used to have the same issue, until it was fixed with commit
2a3e83c6f96c ("x86/gart: Exclude GART aperture from vmcore")', leveraging
existing hook infrastructure in vmcore to let /proc/vmcore return zeroes
when attempting to read the aperture region, and so it won't read from the
actual memory.
Apply the same workaround for kcore. First implement the same hook
infrastructure for kcore, then reuse the hook functions introduced in the
previous vmcore fix. Just with some minor adjustment, rename some functions
for more general usage, and simplify the hook infrastructure a bit as there
is no module usage yet.
Suggested-by: Baoquan He <bhe@redhat.com>
Signed-off-by: Kairui Song <kasong@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Bohac <jbohac@suse.cz>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Omar Sandoval <osandov@fb.com>
Cc: Dave Young <dyoung@redhat.com>
Link: https://lkml.kernel.org/r/20190308030508.13548-1-kasong@redhat.com
Pull perf/core improvements and fixes from Arnaldo:
kernel:
Stephane Eranian :
- Restore mmap record type correctly when handling PERF_RECORD_MMAP2
events, as the same template is used for all the threads interested
in mmap events, some may want just PERF_RECORD_MMAP, while some
may want the extra info in MMAP2 records.
perf probe:
Adrian Hunter:
- Fix getting the kernel map, because since changes related to x86 PTI
entry trampolines handling, there are more than one kernel map.
perf script:
Andi Kleen:
- Support insn output for normal samples, i.e.:
perf script -F ip,sym,insn --xed
Will fetch the sample IP from the thread address space and feed it
to Intel's XED disassembler, producing lines such as:
ffffffffa4068804 native_write_msr wrmsr
ffffffffa415b95e __hrtimer_next_event_base movq 0x18(%rax), %rdx
That match 'perf annotate's output.
- Make the --cpu filter apply to PERF_RECORD_COMM/FORK/... events, in
addition to PERF_RECORD_SAMPLE.
perf report:
- Add a new --samples option to save a small random number of samples
per hist entry, using a reservoir technique to select a representative
number of samples.
Then allow browsing the samples using 'perf script' as part of the hist
entry context menu. This automatically adds the right filters, so only
the thread or CPU of the sample is displayed. Then we use less' search
functionality to directly jump to the time stamp of the selected sample.
It uses different menus for assembler and source display. Assembler
needs xed installed and source needs debuginfo.
- Fix the UI browser scripts pop up menu when there are many scripts
available.
perf report:
Andi Kleen:
- Add 'time' sort option. E.g.:
% perf report --sort time,overhead,symbol --time-quantum 1ms --stdio
...
0.67% 277061.87300 [.] _dl_start
0.50% 277061.87300 [.] f1
0.50% 277061.87300 [.] f2
0.33% 277061.87300 [.] main
0.29% 277061.87300 [.] _dl_lookup_symbol_x
0.29% 277061.87300 [.] dl_main
0.29% 277061.87300 [.] do_lookup_x
0.17% 277061.87300 [.] _dl_debug_initialize
0.17% 277061.87300 [.] _dl_init_paths
0.08% 277061.87300 [.] check_match
0.04% 277061.87300 [.] _dl_count_modids
1.33% 277061.87400 [.] f1
1.33% 277061.87400 [.] f2
1.33% 277061.87400 [.] main
1.17% 277061.87500 [.] main
1.08% 277061.87500 [.] f1
1.08% 277061.87500 [.] f2
1.00% 277061.87600 [.] main
0.83% 277061.87600 [.] f1
0.83% 277061.87600 [.] f2
1.00% 277061.87700 [.] main
tools headers:
Arnaldo Carvalho de Melo:
- Update x86's syscall_64.tbl, no change in tools/perf behaviour.
- Sync copies asm-generic/unistd.h and linux/in with the kernel sources.
perf data:
Jiri Olsa:
- Prep work to support having perf.data stored as a directory, with one
file per CPU, that ultimately will allow having one ring buffer reading
thread per CPU.
Vendor events:
Martin Liška:
- perf PMU events for AMD Family 17h.
perf script python:
Tony Jones:
- Add python3 support for the remaining Intel PT related scripts, with
these we should have a clean build of perf with python3 while still
supporting the build with python2.
libbpf:
Arnaldo Carvalho de Melo:
- Fix the build on uCLibc, adding the missing stdarg.h since we use
va_list in one typedef.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch enables showing bpf program name, address, and size in the
header.
Before the patch:
perf report --header-only
...
# bpf_prog_info of id 9
# bpf_prog_info of id 10
# bpf_prog_info of id 13
After the patch:
# bpf_prog_info 9: bpf_prog_7be49e3934a125ba addr 0xffffffffa0024947 size 229
# bpf_prog_info 10: bpf_prog_2a142ef67aaad174 addr 0xffffffffa007c94d size 229
# bpf_prog_info 13: bpf_prog_47368425825d7384_task__task_newt addr 0xffffffffa0251137 size 369
Committer notes:
Fix the fallback definition when HAVE_LIBBPF_SUPPORT is not defined,
i.e. add the missing 'static inline' and add the __maybe_unused to the
args. Also add stdio.h since we now use FILE * in bpf-event.h.
Signed-off-by: Song Liu <songliubraving@fb.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190319165454.1298742-3-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We now have a comment explaining the first sched_domain based NOHZ kick,
so might as well comment them all.
While at it, unwrap a line that fits under 80 characters.
Co-authored-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Dietmar.Eggemann@arm.com
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: morten.rasmussen@arm.com
Cc: vincent.guittot@linaro.org
Link: https://lkml.kernel.org/r/20190211175946.4961-2-valentin.schneider@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently when the file system resize using ext4_resize_fs() fails it
will report into log that "resized filesystem to <requested block
count>". However this may not be true in the case of failure. Use the
current block count as returned by ext4_blocks_count() to report the
block count.
Additionally, report a warning that "error occurred during file system
resize"
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull locking fixes from Thomas Gleixner:
"Two small fixes:
- Cure a recently introduces error path hickup which tries to
unregister a not registered lockdep key in te workqueue code
- Prevent unaligned cmpxchg() crashes in the robust list handling
code by sanity checking the user space supplied futex pointer"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
futex: Ensure that futex address is aligned in handle_futex_death()
workqueue: Only unregister a registered lockdep key
Since board support for the CLPS711X platform was removed,
remove the board support from the clps711x-timer driver.
Signed-off-by: Alexander Shiyan <shc_work@mail.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20181220111626.17140-1-shc_work@mail.ru
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Signed-off-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: kbuild-all@01.org
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190313184243.GA10820@lkp-sb-ep06
The libbpf_print_fn_t typedef uses va_list without including the header
where that type is defined, stdarg.h, breaking in places where we're
unlucky for that type not to be already defined by some previously
included header.
Noticed while building on fedora 24 cross building tools/perf to the ARC
architecture using the uClibc C library:
28 fedora:24-x-ARC-uClibc : FAIL arc-linux-gcc (ARCompact ISA Linux uClibc toolchain 2017.09-rc2) 7.1.1 20170710
CC /tmp/build/perf/tests/llvm.o
In file included from tests/llvm.c:3:0:
/git/linux/tools/lib/bpf/libbpf.h:57:20: error: unknown type name 'va_list'
const char *, va_list ap);
^~~~~~~
/git/linux/tools/lib/bpf/libbpf.h:59:34: error: unknown type name 'libbpf_print_fn_t'
LIBBPF_API void libbpf_set_print(libbpf_print_fn_t fn);
^~~~~~~~~~~~~~~~~
mv: cannot stat '/tmp/build/perf/tests/.llvm.o.tmp': No such file or directory
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jakub Kicinski <jakub.kicinski@netronome.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Quentin Monnet <quentin.monnet@netronome.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Yonghong Song <yhs@fb.com>
Fixes: a8a1f7d09cfc ("libbpf: fix libbpf_print")
Link: https://lkml.kernel.org/n/tip-5270n2quu2gqz22o7itfdx00@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Extract logic to create program names to synthesize_bpf_prog_name(), so
that it can be reused in header.c:print_bpf_prog_info().
This commit doesn't change the behavior.
Signed-off-by: Song Liu <songliubraving@fb.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190319165454.1298742-2-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add limit into sscanf format string for on-stack buffer.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0d5936344f30 ("sched: Implement interface for cgroup unified hierarchy")
Link: https://lkml.kernel.org/r/155189230232.2620.13120481613524200065.stgit@buzz
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Currently in add_new_gdb_meta_bg() there is a missing brelse of gdb_bh
in case ext4_journal_get_write_access() fails.
Additionally kvfree() is missing in the same error path. Fix it by
moving the ext4_journal_get_write_access() before the ext4 sb update as
Ted suggested and release n_group_desc and gdb_bh in case it fails.
Fixes: 61a9c11e5e7a ("ext4: add missing brelse() add_new_gdb_meta_bg()'s error path")
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull irq fixes from Thomas Gleixner:
"A set of fixes for the interrupt subsystem:
- Remove secondary GIC support on systems w/o device-tree support
- A set of small fixlets in various irqchip drivers
- static and fall-through annotations
- Kernel doc and typo fixes"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq: Mark expected switch case fall-through
genirq/devres: Remove excess parameter from kernel doc
irqchip/irq-mvebu-sei: Make mvebu_sei_ap806_caps static
irqchip/mbigen: Don't clear eventid when freeing an MSI
irqchip/stm32: Don't set rising configuration registers at init
irqchip/stm32: Don't clear rising/falling config registers at init
dt-bindings: irqchip: renesas-irqc: Document r8a774c0 support
irqchip/mmp: Make mmp_irq_domain_ops static
irqchip/brcmstb-l2: Make two init functions static
genirq: Fix typo in comment of IRQD_MOVE_PCNTXT
irqchip/gic-v3-its: Fix comparison logic in lpi_range_cmp
irqchip/gic: Drop support for secondary GIC in non-DT systems
irqchip/imx-irqsteer: Fix of_property_read_u32() error handling
The futex code requires that the user space addresses of futexes are 32bit
aligned. sys_futex() checks this in futex_get_keys() but the robust list
code has no alignment check in place.
As a consequence the kernel crashes on architectures with strict alignment
requirements in handle_futex_death() when trying to cmpxchg() on an
unaligned futex address which was retrieved from the robust list.
[ tglx: Rewrote changelog, proper sizeof() based alignement check and add
comment ]
Fixes: 0771dfefc9e5 ("[PATCH] lightweight robust futexes: core")
Signed-off-by: Chen Jie <chenjie6@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <dvhart@infradead.org>
Cc: <peterz@infradead.org>
Cc: <zengweilin@huawei.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/1552621478-119787-1-git-send-email-chenjie6@huawei.com
For all riscv architectures (RV32, RV64 and RV128), the clocksource
is a 64 bit incrementing counter.
Fix the clock source mask accordingly.
Tested on both 64bit and 32 bit virt machine in QEMU.
Fixes: 62b019436814 ("clocksource: new RISC-V SBI timer driver")
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Anup Patel <anup@brainfault.org>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: linux-riscv@lists.infradead.org
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Anup Patel <Anup.Patel@wdc.com>
Cc: Damien Le Moal <Damien.LeMoal@wdc.com>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20190322215411.19362-1-atish.patra@wdc.com
When building with -Wsometimes-uninitialized, Clang warns:
arch/x86/kernel/hw_breakpoint.c:355:2: warning: variable 'align' is used
uninitialized whenever switch default is taken
[-Wsometimes-uninitialized]
The default cannot be reached because arch_build_bp_info() initializes
hw->len to one of the specified cases. Nevertheless the warning is valid
and returning -EINVAL makes sure that this cannot be broken by future
modifications.
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: clang-built-linux@googlegroups.com
Link: https://github.com/ClangBuiltLinux/linux/issues/392
Link: https://lkml.kernel.org/r/20190307212756.4648-1-natechancellor@gmail.com
Thomas noticed that the new arch/x86/include/asm/cpu_device_id.h header is
a train-wreck that didn't incorporate review feedback like not using __u8
in kernel-only headers.
While at it also fix all the *other* problems this header has:
- Use canonical names for the header guards. It's inexplicable why a non-standard
guard was used.
- Don't define the header guard to 1. Plus annotate the closing #endif as done
absolutely every other header. Again, an inexplicable source of noise.
- Move the kernel API calls provided by this header next to each other, there's
absolutely no reason to have them spread apart in the header.
- Align the INTEL_CPU_DESC() macro initializations vertically, this is easier to
read and it's also the canonical style.
- Actually name the macro arguments properly: instead of 'mod, step, rev',
spell out 'model, stepping, revision' - it's not like we have a lack of
characters in this header.
- Actually make arguments macro-safe - again it's inexplicable why it wasn't
done properly to begin with.
Quite amazing how many problems a 41 lines header can contain.
This kind of code quality is unacceptable, and it slipped through the
review net of 2 developers and 2 maintainers, including myself, until
Thomas noticed it. :-/
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Guenter reported a build warning for CONFIG_CPU_SUP_INTEL=n:
> With allmodconfig-CONFIG_CPU_SUP_INTEL, this patch results in:
>
> In file included from arch/x86/events/amd/core.c:8:0:
> arch/x86/events/amd/../perf_event.h:1036:45: warning: ‘struct cpu_hw_event’ declared inside parameter list will not be visible outside of this definition or declaration
> static inline int intel_cpuc_prepare(struct cpu_hw_event *cpuc, int cpu)
While harmless (an unsed pointer is an unused pointer, no matter the type)
it needs fixing.
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: stable@vger.kernel.org
Fixes: d01b1f96a82e ("perf/x86/intel: Make cpuc allocations consistent")
Link: http://lkml.kernel.org/r/20190315081410.GR5996@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Add a way to define custom scripts through ~/.perfconfig, which are then
added to the scripts menu. The scripts get the same arguments as 'perf
script', in particular -i, --cpu, --tid.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To fully annotate BPF programs with source code mapping, 4 different
information are needed:
1) PERF_RECORD_KSYMBOL
2) PERF_RECORD_BPF_EVENT
3) bpf_prog_info
4) btf
This patch handles 3) and 4) for BPF programs loaded after 'perf
record|top'.
For timely process of these information, a dedicated event is added to
the side band evlist.
When PERF_RECORD_BPF_EVENT is received via the side band event, the
polling thread gathers 3) and 4) vis sys_bpf and store them in perf_env.
This information is saved to perf.data at the end of 'perf record'.
Committer testing:
The 'wakeup_watermark' member in 'struct perf_event_attr' is inside a
unnamed union, so can't be used in a struct designated initialization
with older gccs, get it out of that, isolating as 'attr.wakeup_watermark
= 1;' to work with all gcc versions.
We also need to add '--no-bpf-event' to the 'perf record'
perf_event_attr tests in 'perf test', as the way that that test goes is
to intercept the events being setup and looking if they match the fields
described in the control files, since now it finds first the side band
event used to catch the PERF_RECORD_BPF_EVENT, they all fail.
With these issues fixed:
Same scenario as for testing BPF programs loaded before 'perf record' or
'perf top' starts, only start the BPF programs after 'perf record|top',
so that its information get collected by the sideband threads, the rest
works as for the programs loaded before start monitoring.
Add missing 'inline' to the bpf_event__add_sb_event() when
HAVE_LIBBPF_SUPPORT is not defined, fixing the build in systems without
binutils devel files installed.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-16-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Vincent Wang reported that get_next_freq() has a mult overflow bug on
32-bit platforms in the IOWAIT boost case, since in that case {util,max}
are in freq units instead of capacity units.
Solve this by moving the IOWAIT boost to capacity units. And since this
means @max is constant; simplify the code.
Reported-by: Vincent Wang <vincent.wang@unisoc.com>
Tested-by: Vincent Wang <vincent.wang@unisoc.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Chunyan Zhang <zhang.lyra@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
Cc: Rik van Riel <riel@surriel.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190305083202.GU32494@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull core fixes from Thomas Gleixner:
"Two small fixes:
- Move the large objtool_file struct off the stack so objtool works
in setups with a tight stack limit.
- Make a few variables static in the watchdog core code"
* 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
watchdog/core: Make variables static
objtool: Move objtool_file struct off the stack
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
With -Wimplicit-fallthrough added to CFLAGS:
kernel/irq/manage.c: In function ‘irq_do_set_affinity’:
kernel/irq/manage.c:198:3: warning: this statement may fall through [-Wimplicit-fallthrough=]
cpumask_copy(desc->irq_common_data.affinity, mask);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
kernel/irq/manage.c:199:2: note: here
case IRQ_SET_MASK_OK_NOCOPY:
^~~~
Annotate it.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Kees Cook <keescook@chromium.org>
Link: https://lkml.kernel.org/r/20190228213714.GA9246@embeddedor
The recent change to prevent use after free and a memory leak introduced an
unconditional call to wq_unregister_lockdep() in the error handling
path. If the lockdep key had not been registered yet, then the lockdep core
emits a warning.
Only call wq_unregister_lockdep() if wq_register_lockdep() has been
called first.
Fixes: 009bb421b6ce ("workqueue, lockdep: Fix an alloc_workqueue() error path")
Reported-by: syzbot+be0c198232f86389c3dd@syzkaller.appspotmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Link: https://lkml.kernel.org/r/20190311230255.176081-1-bvanassche@acm.org
Fix sparse warning:
drivers/clocksource/mips-gic-timer.c:70:18: warning:
symbol 'gic_compare_irqaction' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20190322144359.19516-1-yuehaibing@huawei.com
With 'make C=2 W=1', sparse and gcc both complain:
CHECK arch/x86/mm/pti.c
arch/x86/mm/pti.c:84:3: warning: symbol 'pti_mode' was not declared. Should it be static?
arch/x86/mm/pti.c:605:6: warning: symbol 'pti_set_kernel_image_nonglobal' was not declared. Should it be static?
CC arch/x86/mm/pti.o
arch/x86/mm/pti.c:605:6: warning: no previous prototype for 'pti_set_kernel_image_nonglobal' [-Wmissing-prototypes]
605 | void pti_set_kernel_image_nonglobal(void)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
pti_set_kernel_image_nonglobal() is only used locally. 'pti_mode' exists in
drivers/hwtracing/intel_th/pti.c as well, but it's a completely unrelated
local (static) symbol.
Make both static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lkml.kernel.org/r/27680.1552376873@turing-police
For bug workarounds or checks, it is useful to check for specific
microcode revisions.
Add a new generic function to match the CPU with stepping.
Add the other function to check the min microcode revisions for
the matched CPU.
A new table format is introduced to facilitate the quirk to
fill the related information.
This does not change the existing x86_cpu_id because it's an ABI
shared with modules, and also has quite different requirements,
as in no wildcards, but everything has to be matched exactly.
Originally-by: Andi Kleen <ak@linux.intel.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: eranian@google.com
Link: https://lkml.kernel.org/r/1549319013-4522-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Through:
validate_event()
x86_pmu.get_event_constraints(.idx=-1)
tfa_get_event_constraints()
dyn_constraint()
cpuc->constraint_list[-1] is used, which is an obvious out-of-bound access.
In this case, simply skip the TFA constraint code, there is no event
constraint with just PMC3, therefore the code will never result in the
empty set.
Fixes: 400816f60c54 ("perf/x86/intel: Implement support for TSX Force Abort")
Reported-by: Tony Jones <tonyj@suse.com>
Reported-by: "DSouza, Nelson" <nelson.dsouza@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Tony Jones <tonyj@suse.com>
Tested-by: "DSouza, Nelson" <nelson.dsouza@intel.com>
Cc: eranian@google.com
Cc: jolsa@redhat.com
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/20190314130705.441549378@infradead.org
Fix the argv ui browser code to correctly display more entries than fit
on the screen without crashing. The problem was some type confusion with
pointer types in the ->seek function. Do the argv arithmetic correctly
with char ** pointers. Also add some asserts to find overruns and limit
the display function correctly.
Then finally remove a workaround for this in the res sample browser.
Committer testing:
1) Resize the x terminal to have just some 5 lines
2) Use 'perf report --samples 1' to activate the sample browser options
in the menu
3) Press ENTER, this will cause the crash:
# perf report --samples 1
perf: Segmentation fault
-------- backtrace --------
perf[0x5a514a]
/lib64/libc.so.6(+0x385bf)[0x7f27281b55bf]
/lib64/libc.so.6(+0x161a67)[0x7f27282dea67]
/lib64/libslang.so.2(SLsmg_write_wrapped_string+0x82)[0x7f272874a0b2]
perf(ui_browser__argv_refresh+0x77)[0x5939a7]
perf[0x5924cc]
perf(ui_browser__run+0x39)[0x593449]
perf(ui__popup_menu+0x83)[0x5a5263]
perf[0x59f421]
perf(perf_evlist__tui_browse_hists+0x3a0)[0x5a3780]
perf(cmd_report+0x2746)[0x447136]
perf[0x4a95fe]
perf(main+0x61c)[0x42dc6c]
/lib64/libc.so.6(__libc_start_main+0xf2)[0x7f27281a1412]
perf(_start+0x2d)[0x42de9d]
#
After applying this patch no crash takes place in such situation.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-12-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch introduces side band thread that captures extended
information for events like PERF_RECORD_BPF_EVENT.
This new thread uses its own evlist that uses ring buffer with very low
watermark for lower latency.
To use side band thread, we need to:
1. add side band event(s) by calling perf_evlist__add_sb_event();
2. calls perf_evlist__start_sb_thread();
3. at the end of perf run, perf_evlist__stop_sb_thread().
In the next patch, we use this thread to handle PERF_RECORD_BPF_EVENT.
Committer notes:
Add fix by Jiri Olsa for when te sb_tread can't get started and then at
the end the stop_sb_thread() segfaults when joining the (non-existing)
thread.
That can happen when running 'perf top' or 'perf record' as a normal
user, for instance.
Further checks need to be done on top of this to more graciously handle
these possible failure scenarios.
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-15-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The TIMER_IRQSAFE usage was introduced in commit 22597dc3d97b1 ("kthread:
initial support for delayed kthread work") which modelled the delayed
kthread code after workqueue's code. The workqueue code requires the flag
TIMER_IRQSAFE for synchronisation purpose. This is not true for kthread's
delay timer since all operations occur under a lock.
Remove TIMER_IRQSAFE from the timer initialisation and use timer_setup()
for initialisation purpose which is the official function.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Link: https://lkml.kernel.org/r/20190212162554.19779-2-bigeasy@linutronix.de
When admin calls "reboot -f" - i.e., does a hard system reboot by
directly calling reboot(2) - ext4 filesystem mounted with errors=panic
can panic the system. This happens because the underlying device gets
disabled without unmounting the filesystem and thus some syscall running
in parallel to reboot(2) can result in the filesystem getting IO errors.
This is somewhat surprising to the users so try improve the behavior by
switching to errors=remount-ro behavior when the system is running
reboot(2).
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Pull thermal management fixes from Zhang Rui:
- Fix a wrong __percpu structure declaration in intel_powerclamp driver
(Luc Van Oostenryck)
- Fix truncated name of the idle injection kthreads created by
intel_powerclamp driver (Zhang Rui)
- Fix the missing UUID supports in int3400 thermal driver (Matthew
Garrett)
- Fix a crash when accessing the debugfs of bcm2835 SoC thermal driver
(Phil Elwell)
- A couple of trivial fixes/cleanups in some SoC thermal drivers
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux:
thermal/intel_powerclamp: fix truncated kthread name
thermal: mtk: Allocate enough space for mtk_thermal.
thermal/int340x_thermal: fix mode setting
thermal/int340x_thermal: Add additional UUIDs
thermal: cpu_cooling: Remove unused cur_freq variable
thermal: bcm2835: Fix crash in bcm2835_thermal_debugfs
thermal: samsung: Fix incorrect check after code merge
thermal/intel_powerclamp: fix __percpu declaration of worker_data
sparse complains:
CHECK kernel/watchdog.c
kernel/watchdog.c:45:19: warning: symbol 'nmi_watchdog_available'
was not declared. Should it be static?
kernel/watchdog.c:47:16: warning: symbol 'watchdog_allowed_mask'
was not declared. Should it be static?
They're not referenced by name from anyplace else, make them static.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/7855.1552383228@turing-police
Building with 'make W=1' complains:
CC kernel/irq/devres.o
kernel/irq/devres.c:104: warning: Excess function parameter 'thread_fn'
description in 'devm_request_any_context_irq'
Remove it.
Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/31207.1552378676@turing-police
Fix sparse warning:
drivers/clocksource/timer-ti-dm.c:589:5: warning:
symbol 'omap_dm_timer_set_load_start' was not declared. Should it be static?
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: <daniel.lezcano@linaro.org>
Link: https://lkml.kernel.org/r/20190322144302.6704-1-yuehaibing@huawei.com
The getCx86_old() and setCx86_old() macros have been replaced with
correctly working getCx86() and setCx86(), so remove these unused macros.
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: luto@kernel.org
Link: https://lkml.kernel.org/r/1552596361-8967-3-git-send-email-tedheadster@gmail.com
Some F17h models do not have CPB set in CPUID even though the CPU
supports it. Set the feature bit unconditionally on all F17h.
[ bp: Rewrite commit message and patch. ]
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
Cc: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: x86-ml <x86@kernel.org>
Link: https://lkml.kernel.org/r/20181120030018.5185-1-jiaxun.yang@flygoat.com
Merge misc patches from Andrew Morton:
- a little bit more MM
- a few fixups
[ The "little bit more MM" is actually just one of the three patches
Andrew sent for mm/filemap.c, I'm still mulling over two more of them
from Josef Bacik - Linus ]
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
include/linux/swap.h: use offsetof() instead of custom __swapoffset macro
tools/testing/selftests/proc/proc-pid-vm.c: test with vsyscall in mind
zram: default to lzo-rle instead of lzo
filemap: pass vm_fault to the mmap ra helpers
Don't overflow array when the scripts directory is too large, or the
script file name is too long.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20190311144502.15423-11-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In symbol__disassemble(), DSO_BINARY_TYPE__BPF_PROG_INFO dso calls into
a new function symbol__disassemble_bpf(), where annotation line
information is filled based on the bpf_prog_info and btf data saved in
given perf_env.
symbol__disassemble_bpf() uses binutils's libopcodes to disassemble bpf
programs.
Committer testing:
After fixing this:
- u64 *addrs = (u64 *)(info_linear->info.jited_ksyms);
+ u64 *addrs = (u64 *)(uintptr_t)(info_linear->info.jited_ksyms);
Detected when crossbuilding to a 32-bit arch.
And making all this dependent on HAVE_LIBBFD_SUPPORT and
HAVE_LIBBPF_SUPPORT:
1) Have a BPF program running, one that has BTF info, etc, I used
the tools/perf/examples/bpf/augmented_raw_syscalls.c put in place
by 'perf trace'.
# grep -B1 augmented_raw ~/.perfconfig
[trace]
add_events = /home/acme/git/perf/tools/perf/examples/bpf/augmented_raw_syscalls.c
#
# perf trace -e *mmsg
dnf/6245 sendmmsg(20, 0x7f5485a88030, 2, MSG_NOSIGNAL) = 2
NetworkManager/10055 sendmmsg(22<socket:[1056822]>, 0x7f8126ad1bb0, 2, MSG_NOSIGNAL) = 2
2) Then do a 'perf record' system wide for a while:
# perf record -a
^C[ perf record: Woken up 68 times to write data ]
[ perf record: Captured and wrote 19.427 MB perf.data (366891 samples) ]
#
3) Check that we captured BPF and BTF info in the perf.data file:
# perf report --header-only | grep 'b[pt]f'
# event : name = cycles:ppp, , id = { 294789, 294790, 294791, 294792, 294793, 294794, 294795, 294796 }, size = 112, { sample_period, sample_freq } = 4000, sample_type = IP|TID|TIME|CPU|PERIOD, read_format = ID, disabled = 1, inherit = 1, mmap = 1, comm = 1, freq = 1, task = 1, precise_ip = 3, sample_id_all = 1, exclude_guest = 1, mmap2 = 1, comm_exec = 1, ksymbol = 1, bpf_event = 1
# bpf_prog_info of id 13
# bpf_prog_info of id 14
# bpf_prog_info of id 15
# bpf_prog_info of id 16
# bpf_prog_info of id 17
# bpf_prog_info of id 18
# bpf_prog_info of id 21
# bpf_prog_info of id 22
# bpf_prog_info of id 41
# bpf_prog_info of id 42
# btf info of id 2
#
4) Check which programs got recorded:
# perf report | grep bpf_prog | head
0.16% exe bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.14% exe bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.08% fuse-overlayfs bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.07% fuse-overlayfs bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.01% clang-4.0 bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.01% clang-4.0 bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% clang bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
0.00% runc bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% clang bpf_prog_819967866022f1e1_sys_enter [k] bpf_prog_819967866022f1e1_sys_enter
0.00% sh bpf_prog_c1bd85c092d6e4aa_sys_exit [k] bpf_prog_c1bd85c092d6e4aa_sys_exit
#
This was with the default --sort order for 'perf report', which is:
--sort comm,dso,symbol
If we just look for the symbol, for instance:
# perf report --sort symbol | grep bpf_prog | head
0.26% [k] bpf_prog_819967866022f1e1_sys_enter - -
0.24% [k] bpf_prog_c1bd85c092d6e4aa_sys_exit - -
#
or the DSO:
# perf report --sort dso | grep bpf_prog | head
0.26% bpf_prog_819967866022f1e1_sys_enter
0.24% bpf_prog_c1bd85c092d6e4aa_sys_exit
#
We'll see the two BPF programs that augmented_raw_syscalls.o puts in
place, one attached to the raw_syscalls:sys_enter and another to the
raw_syscalls:sys_exit tracepoints, as expected.
Now we can finally do, from the command line, annotation for one of
those two symbols, with the original BPF program source coude intermixed
with the disassembled JITed code:
# perf annotate --stdio2 bpf_prog_819967866022f1e1_sys_enter
Samples: 950 of event 'cycles:ppp', 4000 Hz, Event count (approx.): 553756947, [percent: local period]
bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
Percent int sys_enter(struct syscall_enter_args *args)
53.41 push %rbp
0.63 mov %rsp,%rbp
0.31 sub $0x170,%rsp
1.93 sub $0x28,%rbp
7.02 mov %rbx,0x0(%rbp)
3.20 mov %r13,0x8(%rbp)
1.07 mov %r14,0x10(%rbp)
0.61 mov %r15,0x18(%rbp)
0.11 xor %eax,%eax
1.29 mov %rax,0x20(%rbp)
0.11 mov %rdi,%rbx
return bpf_get_current_pid_tgid();
2.02 → callq *ffffffffda6776d9
2.76 mov %eax,-0x148(%rbp)
mov %rbp,%rsi
int sys_enter(struct syscall_enter_args *args)
add $0xfffffffffffffeb8,%rsi
return bpf_map_lookup_elem(pids, &pid) != NULL;
movabs $0xffff975ac2607800,%rdi
1.26 → callq *ffffffffda6789e9
cmp $0x0,%rax
2.43 → je 0
add $0x38,%rax
0.21 xor %r13d,%r13d
if (pid_filter__has(&pids_filtered, getpid()))
0.81 cmp $0x0,%rax
→ jne 0
mov %rbp,%rdi
probe_read(&augmented_args.args, sizeof(augmented_args.args), args);
2.22 add $0xfffffffffffffeb8,%rdi
0.11 mov $0x40,%esi
0.32 mov %rbx,%rdx
2.74 → callq *ffffffffda658409
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
0.22 mov %rbp,%rsi
1.69 add $0xfffffffffffffec0,%rsi
syscall = bpf_map_lookup_elem(&syscalls, &augmented_args.args.syscall_nr);
movabs $0xffff975bfcd36000,%rdi
add $0xd0,%rdi
0.21 mov 0x0(%rsi),%eax
0.93 cmp $0x200,%rax
→ jae 0
0.10 shl $0x3,%rax
0.11 add %rdi,%rax
0.11 → jmp 0
xor %eax,%eax
if (syscall == NULL || !syscall->enabled)
1.07 cmp $0x0,%rax
→ je 0
if (syscall == NULL || !syscall->enabled)
6.57 movzbq 0x0(%rax),%rdi
if (syscall == NULL || !syscall->enabled)
cmp $0x0,%rdi
0.95 → je 0
mov $0x40,%r8d
switch (augmented_args.args.syscall_nr) {
mov -0x140(%rbp),%rdi
switch (augmented_args.args.syscall_nr) {
cmp $0x2,%rdi
→ je 0
cmp $0x101,%rdi
→ je 0
cmp $0x15,%rdi
→ jne 0
case SYS_OPEN: filename_arg = (const void *)args->args[0];
mov 0x10(%rbx),%rdx
→ jmp 0
case SYS_OPENAT: filename_arg = (const void *)args->args[1];
mov 0x18(%rbx),%rdx
if (filename_arg != NULL) {
cmp $0x0,%rdx
→ je 0
xor %edi,%edi
augmented_args.filename.reserved = 0;
mov %edi,-0x104(%rbp)
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %rbp,%rdi
add $0xffffffffffffff00,%rdi
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov $0x100,%esi
→ callq *ffffffffda658499
mov $0x148,%r8d
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %eax,-0x108(%rbp)
augmented_args.filename.size = probe_read_str(&augmented_args.filename.value,
mov %rax,%rdi
shl $0x20,%rdi
shr $0x20,%rdi
if (augmented_args.filename.size < sizeof(augmented_args.filename.value)) {
cmp $0xff,%rdi
→ ja 0
len -= sizeof(augmented_args.filename.value) - augmented_args.filename.size;
add $0x48,%rax
len &= sizeof(augmented_args.filename.value) - 1;
and $0xff,%rax
mov %rax,%r8
mov %rbp,%rcx
return perf_event_output(args, &__augmented_syscalls__, BPF_F_CURRENT_CPU, &augmented_args, len);
add $0xfffffffffffffeb8,%rcx
mov %rbx,%rdi
movabs $0xffff975fbd72d800,%rsi
mov $0xffffffff,%edx
→ callq *ffffffffda658ad9
mov %rax,%r13
}
mov %r13,%rax
0.72 mov 0x0(%rbp),%rbx
mov 0x8(%rbp),%r13
1.16 mov 0x10(%rbp),%r14
0.10 mov 0x18(%rbp),%r15
0.42 add $0x28,%rbp
0.54 leaveq
0.54 ← retq
#
Please see 'man perf-config' to see how to control what should be seen,
via ~/.perfconfig [annotate] section, for instance, one can suppress the
source code and see just the disassembly, etc.
Alternatively, use the TUI bu just using 'perf annotate', press
'/bpf_prog' to see the bpf symbols, press enter and do the interactive
annotation, which allows for dumping to a file after selecting the
the various output tunables, for instance, the above without source code
intermixed, plus showing all the instruction offsets:
# perf annotate bpf_prog_819967866022f1e1_sys_enter
Then press: 's' to hide the source code + 'O' twice to show all
instruction offsets, then 'P' to print to the
bpf_prog_819967866022f1e1_sys_enter.annotation file, which will have:
# cat bpf_prog_819967866022f1e1_sys_enter.annotation
bpf_prog_819967866022f1e1_sys_enter() bpf_prog_819967866022f1e1_sys_enter
Event: cycles:ppp
53.41 0: push %rbp
0.63 1: mov %rsp,%rbp
0.31 4: sub $0x170,%rsp
1.93 b: sub $0x28,%rbp
7.02 f: mov %rbx,0x0(%rbp)
3.20 13: mov %r13,0x8(%rbp)
1.07 17: mov %r14,0x10(%rbp)
0.61 1b: mov %r15,0x18(%rbp)
0.11 1f: xor %eax,%eax
1.29 21: mov %rax,0x20(%rbp)
0.11 25: mov %rdi,%rbx
2.02 28: → callq *ffffffffda6776d9
2.76 2d: mov %eax,-0x148(%rbp)
33: mov %rbp,%rsi
36: add $0xfffffffffffffeb8,%rsi
3d: movabs $0xffff975ac2607800,%rdi
1.26 47: → callq *ffffffffda6789e9
4c: cmp $0x0,%rax
2.43 50: → je 0
52: add $0x38,%rax
0.21 56: xor %r13d,%r13d
0.81 59: cmp $0x0,%rax
5d: → jne 0
63: mov %rbp,%rdi
2.22 66: add $0xfffffffffffffeb8,%rdi
0.11 6d: mov $0x40,%esi
0.32 72: mov %rbx,%rdx
2.74 75: → callq *ffffffffda658409
0.22 7a: mov %rbp,%rsi
1.69 7d: add $0xfffffffffffffec0,%rsi
84: movabs $0xffff975bfcd36000,%rdi
8e: add $0xd0,%rdi
0.21 95: mov 0x0(%rsi),%eax
0.93 98: cmp $0x200,%rax
9f: → jae 0
0.10 a1: shl $0x3,%rax
0.11 a5: add %rdi,%rax
0.11 a8: → jmp 0
aa: xor %eax,%eax
1.07 ac: cmp $0x0,%rax
b0: → je 0
6.57 b6: movzbq 0x0(%rax),%rdi
bb: cmp $0x0,%rdi
0.95 bf: → je 0
c5: mov $0x40,%r8d
cb: mov -0x140(%rbp),%rdi
d2: cmp $0x2,%rdi
d6: → je 0
d8: cmp $0x101,%rdi
df: → je 0
e1: cmp $0x15,%rdi
e5: → jne 0
e7: mov 0x10(%rbx),%rdx
eb: → jmp 0
ed: mov 0x18(%rbx),%rdx
f1: cmp $0x0,%rdx
f5: → je 0
f7: xor %edi,%edi
f9: mov %edi,-0x104(%rbp)
ff: mov %rbp,%rdi
102: add $0xffffffffffffff00,%rdi
109: mov $0x100,%esi
10e: → callq *ffffffffda658499
113: mov $0x148,%r8d
119: mov %eax,-0x108(%rbp)
11f: mov %rax,%rdi
122: shl $0x20,%rdi
126: shr $0x20,%rdi
12a: cmp $0xff,%rdi
131: → ja 0
133: add $0x48,%rax
137: and $0xff,%rax
13d: mov %rax,%r8
140: mov %rbp,%rcx
143: add $0xfffffffffffffeb8,%rcx
14a: mov %rbx,%rdi
14d: movabs $0xffff975fbd72d800,%rsi
157: mov $0xffffffff,%edx
15c: → callq *ffffffffda658ad9
161: mov %rax,%r13
164: mov %r13,%rax
0.72 167: mov 0x0(%rbp),%rbx
16b: mov 0x8(%rbp),%r13
1.16 16f: mov 0x10(%rbp),%r14
0.10 173: mov 0x18(%rbp),%r15
0.42 177: add $0x28,%rbp
0.54 17b: leaveq
0.54 17c: ← retq
Another cool way to test all this is to symple use 'perf top' look for
those symbols, go there and press enter, annotate it live :-)
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stanislav Fomichev <sdf@google.com>
Link: http://lkml.kernel.org/r/20190312053051.2690567-13-songliubraving@fb.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In order to enable the queuing of kthread work items from hardirq context
even when PREEMPT_RT_FULL is enabled, convert the worker spin_lock to a
raw_spin_lock.
This is only acceptable to do because the work performed under the lock is
well-bounded and minimal.
Reported-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Reported-by: Tim Sander <tim@krieglstein.org>
Signed-off-by: Julia Cartwright <julia@ni.com>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Steffen Trumtrar <s.trumtrar@pengutronix.de>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Cc: Guenter Roeck <linux@roeck-us.net>
Link: https://lkml.kernel.org/r/20190212162554.19779-1-bigeasy@linutronix.de
Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.
However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.
This is (very simplified) reproducer from Frank
// 41472 = (10 * 4096) + 512
// 37376 = 41472 - 4096
ftruncate(fd, 41472);
io_prep_pwrite(iocbs[0], fd, buf[0], 4096, 37376);
io_prep_pwrite(iocbs[1], fd, buf[1], 4096, 41472);
io_submit(io_ctx, 1, &iocbs[1]);
io_submit(io_ctx, 1, &iocbs[2]);
io_getevents(io_ctx, 2, 2, events, NULL);
Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
*
0000a000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
With this patch the data corruption is avoided because we will recognize
the unaligned_aio and wait for the unwritten extent conversion.
00000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
00009200 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30 30
*
0000a200 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31 31
*
0000b200
Reported-by: Frank Sorenson <fsorenso@redhat.com>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Fixes: e9e3bcecf44c ("ext4: serialize unaligned asynchronous DIO")
Cc: stable@vger.kernel.org
Pull smb3 fixes from Steve French:
- two fixes for stable for guest mount problems with smb3.1.1
- two fixes for crediting (SMB3 flow control) on resent requests
- a byte range lock leak fix
- two fixes for incorrect rc mappings
* tag '5.1-rc1-cifs-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
cifs: update internal module version number
SMB3: Fix SMB3.1.1 guest mounts to Samba
cifs: Fix slab-out-of-bounds when tracing SMB tcon
cifs: allow guest mounts to work for smb3.11
fix incorrect error code mapping for OBJECTID_NOT_FOUND
cifs: fix that return -EINVAL when do dedupe operation
CIFS: Fix an issue with re-sending rdata when transport returning -EAGAIN
CIFS: Fix an issue with re-sending wdata when transport returning -EAGAIN