Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
The trapping of SMPRI_EL1 and TPIDR2_EL0 currently only really
work on nVHE, as only this mode uses the fine-grained trapping
that controls these two registers.
Move the trapping enable/disable code into
__{de,}activate_traps_common(), allowing it to be called when it
actually matters on VHE, and remove the flipping of EL2 control
for TPIDR2_EL0, which only affects the host access of this
register.
Fixes: 861262ab8627 ("KVM: arm64: Handle SME host state when running guests")
Reported-by: Mark Brown <broonie@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/86bkpqer4z.wl-maz@kernel.org
There are two capabilities related to ring-based dirty page tracking:
KVM_CAP_DIRTY_LOG_RING and KVM_CAP_DIRTY_LOG_RING_ACQ_REL. Both are
supported by x86. However, arm64 supports KVM_CAP_DIRTY_LOG_RING_ACQ_REL
only when the feature is supported on arm64. The userspace doesn't have
to enable the advertised capability, meaning KVM_CAP_DIRTY_LOG_RING can
be enabled on arm64 by userspace and it's wrong.
Fix it by double checking if the capability has been advertised prior to
enabling it. It's rejected to enable the capability if it hasn't been
advertised.
Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Reported-by: Sean Christopherson <seanjc@google.com>
Suggested-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Oliver Upton <oliver.upton@linux.dev>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221031003621.164306-4-gshan@redhat.com
enter_exception64() performs an MTE check, which involves dereferencing
vcpu->kvm. While vcpu has already been fixed up to be a HYP VA pointer,
kvm is still a pointer in the kernel VA space.
This only affects nVHE configurations with MTE enabled, as in other
cases, the pointer is either valid (VHE) or not dereferenced (!MTE).
Fix this by first converting kvm to a HYP VA pointer.
Fixes: ea7fc1bb1cd1 ("KVM: arm64: Introduce MTE VM feature")
Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
Reviewed-by: Steven Price <steven.price@arm.com>
[maz: commit message tidy-up]
Signed-off-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20221027120945.29679-1-ryan.roberts@arm.com
hyp_get_page_state() is used with pKVM to retrieve metadata about a page
by parsing a hypervisor stage-1 PTE. However, it incorrectly uses a
helper which parses *stage-2* mappings. Ouch.
Luckily, pkvm_getstate() only looks at the software bits, which happen
to be in the same place for stage-1 and stage-2 PTEs, and this all ends
up working correctly by accident. But clearly, we should do better.
Fix hyp_get_page_state() to use the correct helper.
Fixes: e82edcc75c4e ("KVM: arm64: Implement do_share() helper for sharing memory")
Signed-off-by: Quentin Perret <qperret@google.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20221025145156.855308-1-qperret@google.com
Pull kvm fixes from Paolo Bonzini:
"RISC-V:
- Fix compilation without RISCV_ISA_ZICBOM
- Fix kvm_riscv_vcpu_timer_pending() for Sstc
ARM:
- Fix a bug preventing restoring an ITS containing mappings for very
large and very sparse device topology
- Work around a relocation handling error when compiling the nVHE
object with profile optimisation
- Fix for stage-2 invalidation holding the VM MMU lock for too long
by limiting the walk to the largest block mapping size
- Enable stack protection and branch profiling for VHE
- Two selftest fixes
x86:
- add compat implementation for KVM_X86_SET_MSR_FILTER ioctl
selftests:
- synchronize includes between include/uapi and tools/include/uapi"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
tools: include: sync include/api/linux/kvm.h
KVM: x86: Add compat handler for KVM_X86_SET_MSR_FILTER
KVM: x86: Copy filter arg outside kvm_vm_ioctl_set_msr_filter()
kvm: Add support for arch compat vm ioctls
RISC-V: KVM: Fix kvm_riscv_vcpu_timer_pending() for Sstc
RISC-V: Fix compilation without RISCV_ISA_ZICBOM
KVM: arm64: vgic: Fix exit condition in scan_its_table()
KVM: arm64: nvhe: Fix build with profile optimization
KVM: selftests: Fix number of pages for memory slot in memslot_modification_stress_test
KVM: arm64: selftests: Fix multiple versions of GIC creation
KVM: arm64: Enable stack protection and branch profiling for VHE
KVM: arm64: Limit stage2_apply_range() batch size to largest block
KVM: arm64: Work out supported block level at compile time
This reverts commit 72a95859728a7866522e6633818bebc1c2519b17.
It broke reboots on big-endian MIPS and MIPS64 malta QEMU instances,
which use the syscon driver. Little-endian is not effected, which means
likely it's important to handle regmap_get_val_endian() in this function
after all.
Fixes: 72a95859728a ("mfd: syscon: Remove repetition of the regmap_get_val_endian()")
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Lee Jones <lee@kernel.org>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Provide a definition of KVM_CAP_DIRTY_LOG_RING_ACQ_REL.
Fixes: 17601bfed909 ("KVM: Add KVM_CAP_DIRTY_LOG_RING_ACQ_REL capability and config option")
Cc: Marc Zyngier <maz@kernel.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Commit bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch") added
a new entry to the uts_kern_table[] array, but didn't update the
UTS_PROC_xyz enumerators of older entries, breaking anything that used
them.
Which is admittedly not many cases: it's really just the two uses of
uts_proc_notify() in kernel/sys.c. But apparently journald-systemd
actually uses this to detect hostname changes.
Reported-by: Torsten Hilbrich <torsten.hilbrich@secunet.com>
Fixes: bfca3dd3d068 ("kernel/utsname_sysctl.c: print kernel arch")
Link: https://lore.kernel.org/lkml/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Link: https://linux-regtracking.leemhuis.info/regzbot/regression/0c2b92a6-0f25-9538-178f-eee3b06da23f@secunet.com/
Cc: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The KVM_X86_SET_MSR_FILTER ioctls contains a pointer in the passed in
struct which means it has a different struct size depending on whether
it gets called from 32bit or 64bit code.
This patch introduces compat code that converts from the 32bit struct to
its 64bit counterpart which then gets used going forward internally.
With this applied, 32bit QEMU can successfully set MSR bitmaps when
running on 64bit kernels.
Reported-by: Andrew Randrianasulu <randrianasulu@gmail.com>
Fixes: 1a155254ff937 ("KVM: x86: Introduce MSR filtering")
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-4-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull perf fixes from Borislav Petkov:
- Fix raw data handling when perf events are used in bpf
- Rework how SIGTRAPs get delivered to events to address a bunch of
problems with it. Add a selftest for that too
* tag 'perf_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
bpf: Fix sample_flags for bpf_perf_event_output
selftests/perf_events: Add a SIGTRAP stress test with disables
perf: Fix missing SIGTRAPs
In the next patch we want to introduce a second caller to
set_msr_filter() which constructs its own filter list on the stack.
Refactor the original function so it takes it as argument instead of
reading it through copy_from_user().
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-3-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Pull scheduler fixes from Borislav Petkov:
- Adjust code to not trip up CFI
- Fix sched group cookie matching
* tag 'sched_urgent_for_v6.1_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Introduce struct balance_callback to avoid CFI mismatches
sched/core: Fix comparison in sched_group_cookie_match()
* Raw data is also filled by bpf_perf_event_output.
* Add sample_flags to indicate raw data.
* This eliminates the segfaults as shown below:
Run ./samples/bpf/trace_output
BUG pid 9 cookie 1001000000004 sized 4
BUG pid 9 cookie 1001000000004 sized 4
BUG pid 9 cookie 1001000000004 sized 4
Segmentation fault (core dumped)
Fixes: 838d9bb62d13 ("perf: Use sample_flags for raw_data")
Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lkml.kernel.org/r/20221007081327.1047552-1-sumanthk@linux.ibm.com
We will introduce the first architecture specific compat vm ioctl in the
next patch. Add all necessary boilerplate to allow architectures to
override compat vm ioctls when necessary.
Signed-off-by: Alexander Graf <graf@amazon.com>
Message-Id: <20221017184541.2658-2-graf@amazon.com>
Cc: stable@vger.kernel.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>