Linux kernel
============
There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.
In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``. The formatted documentation can also be read online at:
https://www.kernel.org/doc/html/latest/
There are various text files in the Documentation/ subdirectory,
several of them using the Restructured Text markup notation.
Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result by upgrading your kernel.
Clone this repository
For self-hosted knots, clone URLs may differ based on your setup.
Download tar.gz
This reverts commit ef5b570d3700fbb8628a58da0487486ceeb713cd.
Zhouyi reported that commit is causing crashes when running rcutorture
with KASAN enabled:
BUG: using smp_processor_id() in preemptible [00000000] code: rcu_torture_rea/100
caller is rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
CPU: 4 PID: 100 Comm: rcu_torture_rea Tainted: G W 5.19.0-rc5-next-20220708-dirty #253
Call Trace:
dump_stack_lvl+0xbc/0x108 (unreliable)
check_preemption_disabled+0x154/0x160
rcu_preempt_deferred_qs_irqrestore+0x74/0xed0
__rcu_read_unlock+0x290/0x3b0
rcu_torture_read_unlock+0x30/0xb0
rcutorture_one_extend+0x198/0x810
rcu_torture_one_read+0x58c/0xc90
rcu_torture_reader+0x12c/0x360
kthread+0x1e8/0x220
ret_from_kernel_thread+0x5c/0x64
KASAN will generate instrumentation instructions around the
WRITE_ONCE(local_paca->irq_soft_mask, mask):
0xc000000000295cb0 <+0>: addis r2,r12,774
0xc000000000295cb4 <+4>: addi r2,r2,16464
0xc000000000295cb8 <+8>: mflr r0
0xc000000000295cbc <+12>: bl 0xc00000000008bb4c <mcount>
0xc000000000295cc0 <+16>: mflr r0
0xc000000000295cc4 <+20>: std r31,-8(r1)
0xc000000000295cc8 <+24>: addi r3,r13,2354
0xc000000000295ccc <+28>: mr r31,r13
0xc000000000295cd0 <+32>: std r0,16(r1)
0xc000000000295cd4 <+36>: stdu r1,-48(r1)
0xc000000000295cd8 <+40>: bl 0xc000000000609b98 <__asan_store1+8>
0xc000000000295cdc <+44>: nop
0xc000000000295ce0 <+48>: li r9,1
0xc000000000295ce4 <+52>: stb r9,2354(r31)
0xc000000000295ce8 <+56>: addi r1,r1,48
0xc000000000295cec <+60>: ld r0,16(r1)
0xc000000000295cf0 <+64>: ld r31,-8(r1)
0xc000000000295cf4 <+68>: mtlr r0
If there is a context switch before "stb r9,2354(r31)", r31 may
not equal to r13, in such case, irq soft mask will not work.
The usual solution of marking the code ineligible for instrumentation
forces the code out-of-line, which we would prefer to avoid. Christophe
proposed a partial revert, but Nick raised some concerns with that. So
for now do a full revert.
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
[mpe: Construct change log based on Zhouyi's original report]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220831131052.42250-1-mpe@ellerman.id.au
As reported by Zhouyi Zhou, WRITE_ONCE() is not atomic
as expected when KASAN or KCSAN are compiled in.
Fix it by re-implementing it using inline assembly.
Fixes: 077fc62b2b66 ("powerpc/irq: remove inline assembly in hard_irq_disable macro")
Reported-by: Zhouyi Zhou <zhouzhouyi@gmail.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/a8298991b3df049a54ee8e558838e34265812014.1661272586.git.christophe.leroy@csgroup.eu
The semi-recent changes to MSR handling when entering RTAS (firmware)
cause crashes on IBM Cell machines. An example trace:
kernel tried to execute user page (2fff01a8) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel instruction fetch
Faulting instruction address: 0x2fff01a8
Oops: Kernel access of bad area, sig: 11 [#1]
BE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=4 NUMA Cell
Modules linked in:
CPU: 0 PID: 0 Comm: swapper/0 Tainted: G W 6.0.0-rc2-00433-gede0a8d3307a #207
NIP: 000000002fff01a8 LR: 0000000000032608 CTR: 0000000000000000
REGS: c0000000015236b0 TRAP: 0400 Tainted: G W (6.0.0-rc2-00433-gede0a8d3307a)
MSR: 0000000008001002 <ME,RI> CR: 00000000 XER: 20000000
...
NIP 0x2fff01a8
LR 0x32608
Call Trace:
0xc00000000143c5f8 (unreliable)
.rtas_call+0x224/0x320
.rtas_get_boot_time+0x70/0x150
.read_persistent_clock64+0x114/0x140
.read_persistent_wall_and_boot_offset+0x24/0x80
.timekeeping_init+0x40/0x29c
.start_kernel+0x674/0x8f0
start_here_common+0x1c/0x50
Unlike PAPR platforms where RTAS is only used in guests, on the IBM Cell
machines Linux runs with MSR[HV] set but also uses RTAS, provided by
SLOF.
Fix it by copying the MSR[HV] bit from the MSR value we've just read
using mfmsr into the value used for RTAS.
It seems like we could also fix it using an #ifdef CELL to set MSR[HV],
but that doesn't work because it's possible to build a single kernel
image that runs on both Cell native and pseries.
Fixes: b6b1c3ce06ca ("powerpc/rtas: Keep MSR[RI] set when calling RTAS")
Cc: stable@vger.kernel.org # v5.19+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Reviewed-by: Jordan Niethe <jniethe5@gmail.com>
Link: https://lore.kernel.org/r/20220823115952.1203106-2-mpe@ellerman.id.au
This reverts commit 79b74a68486765a4fe685ac4069bc71366c538f5.
It broke booting on IBM Cell machines when the kernel is also built with
CONFIG_PPC_PS3=y.
That's because FW_FEATURE_NATIVE_ALWAYS = 0 does have an important
effect, which is to clear the PS3 ALWAYS features from
FW_FEATURE_ALWAYS.
Note that CONFIG_PPC_NATIVE has since been renamed
CONFIG_PPC_HASH_MMU_NATIVE.
Fixes: 79b74a684867 ("powerpc: Remove unused FW_FEATURE_NATIVE references")
Cc: stable@vger.kernel.org # v5.17+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220823115952.1203106-1-mpe@ellerman.id.au
Christophe Leroy reported that commit 7b4537199a4a ("kbuild: link
symbol CRCs at final link, removing CONFIG_MODULE_REL_CRCS") broke
mpc85xx_defconfig + CONFIG_RELOCATABLE=y.
LD vmlinux
SYSMAP System.map
SORTTAB vmlinux
CHKREL vmlinux
WARNING: 451 bad relocations
c0b312a9 R_PPC_UADDR32 .head.text-0x3ff9ed54
c0b312ad R_PPC_UADDR32 .head.text-0x3ffac224
c0b312b1 R_PPC_UADDR32 .head.text-0x3ffb09f4
c0b312b5 R_PPC_UADDR32 .head.text-0x3fe184dc
c0b312b9 R_PPC_UADDR32 .head.text-0x3fe183a8
...
The compiler emits a bunch of R_PPC_UADDR32, which is not supported by
arch/powerpc/kernel/reloc_32.S.
The reason is there exists an unaligned symbol.
$ powerpc-linux-gnu-nm -n vmlinux
...
c0b31258 d spe_aligninfo
c0b31298 d __func__.0
c0b312a9 D sys_call_table
c0b319b8 d __func__.0
Commit 7b4537199a4a is not the root cause. Even before that, I can
reproduce the same issue for mpc85xx_defconfig + CONFIG_RELOCATABLE=y
+ CONFIG_MODVERSIONS=n.
It is just that nobody noticed because when CONFIG_MODVERSIONS is
enabled, a __crc_* symbol inserted before sys_call_table was hiding the
unalignment issue.
Adding alignment to the syscall table for ppc32 fixes the issue.
Cc: stable@vger.kernel.org
Reported-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Tested-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Trim change log discussion, add Cc stable]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/lkml/38605f6a-a568-f884-f06f-ea4da5b214f0@csgroup.eu/
Link: https://lore.kernel.org/r/20220820165129.1147589-1-masahiroy@kernel.org
On 32-bit powerpc systems with more PCIe controllers and more PCI
domains, where on more PCI domains are same PCI numbers, when kernel is
compiled with CONFIG_PROC_FS=y and
CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT=y options, kernel prints
"proc_dir_entry 'pci/01' already registered" error message.
proc_dir_entry 'pci/01' already registered
WARNING: CPU: 0 PID: 1 at fs/proc/generic.c:377 proc_register+0x1a8/0x1ac
...
NIP proc_register+0x1a8/0x1ac
LR proc_register+0x1a8/0x1ac
Call Trace:
proc_register+0x1a8/0x1ac (unreliable)
_proc_mkdir+0x78/0xa4
pci_proc_attach_device+0x11c/0x168
pci_proc_init+0x80/0x98
do_one_initcall+0x80/0x284
kernel_init_freeable+0x1f4/0x2a0
kernel_init+0x24/0x150
ret_from_kernel_thread+0x5c/0x64
This regression started appearing after commit
566356813082 ("powerpc/pci: Add config option for using all 256 PCI
buses") in case in each mPCIe slot is connected PCIe card and therefore
PCI bus 1 is populated in for every PCIe controller / PCI domain.
The reason is that PCI procfs code expects that when PCI bus numbers are
not unique across all PCI domains, function pci_proc_domain() returns
true for domain dependent buses.
Fix this issue by setting PCI_ENABLE_PROC_DOMAINS and
PCI_COMPAT_DOMAIN_0 flags for 32-bit powerpc code when
CONFIG_PPC_PCI_BUS_NUM_DOMAIN_DEPENDENT is enabled. Same approach is
already implemented for 64-bit powerpc code (where PCI bus numbers are
always domain dependent).
Fixes: 566356813082 ("powerpc/pci: Add config option for using all 256 PCI buses")
Signed-off-by: Pali Rohár <pali@kernel.org>
[mpe: Trim change log oops message]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220820115113.30581-1-pali@kernel.org
Commit 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
added performance monitoring support for papr-scm nvdimm devices via
perf interface. Commit also added an array in papr_scm_priv
structure called "nvdimm_events_map", which got filled based on the
result of H_SCM_PERFORMANCE_STATS hcall.
Currently there is an assumption that the order of events in the
stats buffer, returned by the hypervisor is same. And order also
happens to matches with the events specified in nvdimm driver code.
But this assumption is not documented in Power Architecture
Platform Requirements (PAPR) document. Although the order
of events happens to be same on current generation od system, but
it might not be true in future generation systems. Fix the issue, by
adding a static mapping for nvdimm events to corresponding stat-id,
and removing the dynamic map from papr_scm_priv structure. Also
remove the function papr_scm_pmu_check_events from papr_scm.c file,
as we no longer need to copy stat-ids dynamically.
Fixes: 4c08d4bbc089 ("powerpc/papr_scm: Add perf interface support")
Reported-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Reviewed-by: Vaibhav Jain <vaibhav@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220804074852.55157-1-kjain@linux.ibm.com
Pull irq fixes from Ingo Molnar:
"Misc irqchip fixes: LoongArch driver fixes and a Hyper-V IOMMU fix"
* tag 'irq-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/loongson-liointc: Fix an error handling path in liointc_init()
irqchip/loongarch: Fix irq_domain_alloc_fwnode() abuse
irqchip/loongson-pch-pic: Move find_pch_pic() into CONFIG_ACPI
irqchip/loongson-eiointc: Fix a build warning
irqchip/loongson-eiointc: Fix irq affinity setting
iommu/hyper-v: Use helper instead of directly accessing affinity
Pull x86 kprobes fix from Ingo Molnar:
"Fix a kprobes bug in JNG/JNLE emulation when a kprobe is installed at
such instructions, possibly resulting in incorrect execution (the
wrong branch taken)"
* tag 'perf-urgent-2022-08-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/kprobes: Fix JNG/JNLE emulation
Pull irqchip fixes from Marc Zyngier:
- A bunch of small fixes for the recently merged LoongArch drivers
- A leftover from the non-SMP IRQ affinity rework affecting
the Hyper-V IOMMU code
Link: https://lore.kernel.org/r/20220812125910.2227338-1-maz@kernel.org
Pull tracing fixes from Steven Rostedt:
"Various fixes for tracing:
- Fix a return value of traceprobe_parse_event_name()
- Fix NULL pointer dereference from failed ftrace enabling
- Fix NULL pointer dereference when asking for registers from eprobes
- Make eprobes consistent with kprobes/uprobes, filters and
histograms"
* tag 'trace-v6.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Have filter accept "common_cpu" to be consistent
tracing/probes: Have kprobes and uprobes use $COMM too
tracing/eprobes: Have event probes be consistent with kprobes and uprobes
tracing/eprobes: Fix reading of string fields
tracing/eprobes: Do not hardcode $comm as a string
tracing/eprobes: Do not allow eprobes to use $stack, or % for regs
ftrace: Fix NULL pointer dereference in is_ftrace_trampoline when ftrace is dead
tracing/perf: Fix double put of trace event when init fails
tracing: React to error return from traceprobe_parse_event_name()
When kprobes emulates JNG/JNLE instructions on x86 it uses the wrong
condition. For JNG (opcode: 0F 8E), according to Intel SDM, the jump is
performed if (ZF == 1 or SF != OF). However the kernel emulation
currently uses 'and' instead of 'or'.
As a result, setting a kprobe on JNG/JNLE might cause the kernel to
behave incorrectly whenever the kprobe is hit.
Fix by changing the 'and' to 'or'.
Fixes: 6256e668b7af ("x86/kprobes: Use int3 instead of debug trap for single-step")
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220813225943.143767-1-namit@vmware.com