commits

Pull powerpc fixes from Michael Ellerman:
"A bit of a big batch, partly because I didn't send any last week, and
also just because the BPF fixes happened to land this week.

Summary:

- Fix a regression hit by the IPR SCSI driver, introduced by the
recent addition of MSI domains on pseries.

- A big series including 8 BPF fixes, some with potential security
impact and the rest various code generation issues.

- Fix our program check assembler entry path, which was accidentally
jumping into a gas macro and generating strange stack frames, which
could confuse find_bug().

- A couple of fixes, and related changes, to fix corner cases in our
machine check handling.

- Fix our DMA IOMMU ops, which were not always returning the optimal
DMA mask, leading to at least one device falling back to 32-bit DMA
when it shouldn't.

- A fix for KUAP handling on 32-bit Book3S.

- Fix crashes seen when kdumping on some pseries systems.

Thanks to Naveen N. Rao, Nicholas Piggin, Alexey Kardashevskiy, Cédric
Le Goater, Christophe Leroy, Mahesh Salgaonkar, Abdul Haleem,
Christoph Hellwig, Johan Almbladh, Stan Johnson"

* tag 'powerpc-5.15-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init
powerpc/32s: Fix kuap_kernel_restore()
powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler
powerpc/64s: Fix unrecoverable MCE calling async handler from NMI
powerpc/64/interrupt: Reconcile soft-mask state in NMI and fix false BUG
powerpc/64: warn if local irqs are enabled in NMI or hardirq context
powerpc/traps: do not enable irqs in _exception
powerpc/64s: fix program check interrupt emergency stack path
powerpc/bpf ppc32: Fix BPF_SUB when imm == 0x80000000
powerpc/bpf ppc32: Do not emit zero extend instruction for 64-bit BPF_END
powerpc/bpf ppc32: Fix JMP32_JSET_K
powerpc/bpf ppc32: Fix ALU32 BPF_ARSH operation
powerpc/bpf: Emit stf barrier instruction sequences for BPF_NOSPEC
powerpc/security: Add a helper to query stf_barrier type
powerpc/bpf: Fix BPF_SUB when imm == 0x80000000
powerpc/bpf: Fix BPF_MOD when imm == 1
powerpc/bpf: Validate branch ranges
powerpc/lib: Add helper to check if offset is within conditional branch range
powerpc/iommu: Report the correct most efficient DMA mask for PCI devices

4y ago

Linus Torvalds

6890acac

Merge tag 'objtool_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Michael Ellerman

cdeb5d7d

KVM: PPC: Book3S HV: Make idle_kvm_start_guest() return 0 if it went to guest

4y ago

Randy Dunlap

09b6addf

VDUSE: fix documentation underline warning

4y ago

Jens Axboe

78f8876c

io-wq: exclusively gate signal based exit on get_signal() return

4y ago

Christoph Hellwig

c4110804

kyber: avoid q->disk dereferences in trace points

4y ago

Linus Torvalds

75cd9b01

Merge tag 'objtool_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Mahesh Salgaonkar

eb8257a1

pseries/eeh: Fix the kdump kernel crash during eeh_pseries_init

On pseries LPAR when an empty slot is assigned to partition OR in single
LPAR mode, kdump kernel crashes during issuing PHB reset.

In the kdump scenario, we traverse all PHBs and issue reset using the
pe_config_addr of the first child device present under each PHB. However
the code assumes that none of the PHB slots can be empty and uses
list_first_entry() to get the first child device under the PHB. Since
list_first_entry() expects the list to be non-empty, it returns an
invalid pci_dn entry and ends up accessing NULL phb pointer under
pci_dn->phb causing kdump kernel crash.

This patch fixes the below kdump kernel crash by skipping empty slots:

audit: initializing netlink subsys (disabled)
thermal_sys: Registered thermal governor 'fair_share'
thermal_sys: Registered thermal governor 'step_wise'
cpuidle: using governor menu
pstore: Registered nvram as persistent store backend
Issue PHB reset ...
audit: type=2000 audit(1631267818.000:1): state=initialized audit_enabled=0 res=1
BUG: Kernel NULL pointer dereference on read at 0x00000268
Faulting instruction address: 0xc000000008101fb0
Oops: Kernel access of bad area, sig: 7 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
Modules linked in:
CPU: 7 PID: 1 Comm: swapper/7 Not tainted 5.14.0 #1
NIP: c000000008101fb0 LR: c000000009284ccc CTR: c000000008029d70
REGS: c00000001161b840 TRAP: 0300 Not tainted (5.14.0)
MSR: 8000000002009033 <SF,VEC,EE,ME,IR,DR,RI,LE> CR: 28000224 XER: 20040002
CFAR: c000000008101f0c DAR: 0000000000000268 DSISR: 00080000 IRQMASK: 0
...
NIP pseries_eeh_get_pe_config_addr+0x100/0x1b0
LR __machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
Call Trace:
0xc00000001161bb80 (unreliable)
__machine_initcall_pseries_eeh_pseries_init+0x2cc/0x350
do_one_initcall+0x60/0x2d0
kernel_init_freeable+0x350/0x3f8
kernel_init+0x3c/0x17c
ret_from_kernel_thread+0x5c/0x64

Fixes: 5a090f7c363fd ("powerpc/pseries: PCIE PHB reset")
Signed-off-by: Mahesh Salgaonkar <mahesh@linux.ibm.com>
[mpe: Tweak wording and trim oops]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/163215558252.413351.8600189949820258982.stgit@jupiter

4y ago

Linus Torvalds

f644750c

Merge tag 'edac_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

4y ago

Michael Forney

86e1e054

objtool: Update section header before relocations

4y ago

Michael Ellerman

9b4416c5

KVM: PPC: Book3S HV: Fix stack handling in idle_kvm_start_guest()

In commit 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in
C") kvm_start_guest() became idle_kvm_start_guest(). The old code
allocated a stack frame on the emergency stack, but didn't use the
frame to store anything, and also didn't store anything in its caller's
frame.

idle_kvm_start_guest() on the other hand is written more like a normal C
function, it creates a frame on entry, and also stores CR/LR into its
callers frame (per the ABI). The problem is that there is no caller
frame on the emergency stack.

The emergency stack for a given CPU is allocated with:

paca_ptrs[i]->emergency_sp = alloc_stack(limit, i) + THREAD_SIZE;

So emergency_sp actually points to the first address above the emergency
stack allocation for a given CPU, we must not store above it without
first decrementing it to create a frame. This is different to the
regular kernel stack, paca->kstack, which is initialised to point at an
initial frame that is ready to use.

idle_kvm_start_guest() stores the backchain, CR and LR all of which
write outside the allocation for the emergency stack. It then creates a
stack frame and saves the non-volatile registers. Unfortunately the
frame it creates is not large enough to fit the non-volatiles, and so
the saving of the non-volatile registers also writes outside the
emergency stack allocation.

The end result is that we corrupt whatever is at 0-24 bytes, and 112-248
bytes above the emergency stack allocation.

In practice this has gone unnoticed because the memory immediately above
the emergency stack happens to be used for other stack allocations,
either another CPUs mc_emergency_sp or an IRQ stack. See the order of
calls to irqstack_early_init() and emergency_stack_init().

The low addresses of another stack are the top of that stack, and so are
only used if that stack is under extreme pressue, which essentially
never happens in practice - and if it did there's a high likelyhood we'd
crash due to that stack overflowing.

Still, we shouldn't be corrupting someone else's stack, and it is purely
luck that we aren't corrupting something else.

To fix it we save CR/LR into the caller's frame using the existing r1 on
entry, we then create a SWITCH_FRAME_SIZE frame (which has space for
pt_regs) on the emergency stack with the backchain pointing to the
existing stack, and then finally we switch to the new frame on the
emergency stack.

Fixes: 10d91611f426 ("powerpc/64s: Reimplement book3s idle code in C")
Cc: stable@vger.kernel.org # v5.2+
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015133929.832061-1-mpe@ellerman.id.au

4y ago

Michael S. Tsirkin

ff631988

Revert "virtio-blk: Add validation for block size in config space"

4y ago

Pavel Begunkov

7df778be

io_uring: make OP_CLOSE consistent with direct open

4y ago

Christoph Hellwig

aec89dc5

block: keep q_usage_counter in atomic mode after del_gendisk

4y ago

Linus Torvalds

c22ccc4a

Merge tag 'x86_urgent_for_v5.15_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Joe Lawrence

fe255fe6

objtool: Remove redundant 'len' field from struct section

4y ago

Christophe Leroy

d93f9e23

powerpc/32s: Fix kuap_kernel_restore()

4y ago

Linus Torvalds

60ebc28b

Merge tag 'perf_urgent_for_v5.15_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Hans Potsch

d9b7748f

EDAC/armada-xp: Fix output of uncorrectable error counter

4y ago

Michael Forney

b46179d6

objtool: Check for gelf_update_rel[a] failures

4y ago

Cédric Le Goater

6f779e1d

powerpc/xive: Discard disabled interrupts in get_irqchip_state()

4y ago

Wu Zongyong

97f854be

vhost_vdpa: unset vq irq before freeing irq

4y ago

Pavel Begunkov

9f3a2cb2

io_uring: kill extra checks in io_write()

4y ago

Christoph Hellwig

8e141f9e

block: drain file system I/O on del_gendisk

4y ago

Linus Torvalds

7fd2bf83

Merge branch 'i2c/for-current-fixed' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

4y ago

Borislav Petkov

d298b035

x86/fpu: Restore the masking out of reserved MXCSR bits

4y ago

Joe Lawrence

dc023681

objtool: Make .altinstructions section entry size consistent

4y ago

Cédric Le Goater

5a4b0320

powerpc/pseries/msi: Add an empty irq_write_msi_msg() handler

4y ago

Linus Torvalds

424e7d87

Merge tag 'efi-urgent-for-v5.15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Kan Liang

71920ea9

perf/x86/msr: Add Sapphire Rapids CPU support

4y ago

Halil Pasic

2f9a174f

virtio: write back F_VERSION_1 before validate

The virtio specification virtio-v1.1-cs01 states: "Transitional devices
MUST detect Legacy drivers by detecting that VIRTIO_F_VERSION_1 has not
been acknowledged by the driver." This is exactly what QEMU as of 6.1
has done relying solely on VIRTIO_F_VERSION_1 for detecting that.

However, the specification also says: "... the driver MAY read (but MUST
NOT write) the device-specific configuration fields to check that it can
support the device ..." before setting FEATURES_OK.

In that case, any transitional device relying solely on
VIRTIO_F_VERSION_1 for detecting legacy drivers will return data in
legacy format. In particular, this implies that it is in big endian
format for big endian guests. This naturally confuses the driver which
expects little endian in the modern mode.

It is probably a good idea to amend the spec to clarify that
VIRTIO_F_VERSION_1 can only be relied on after the feature negotiation
is complete. Before validate callback existed, config space was only
read after FEATURES_OK. However, we already have two regressions, so
let's address this here as well.

The regressions affect the VIRTIO_NET_F_MTU feature of virtio-net and
the VIRTIO_BLK_F_BLK_SIZE feature of virtio-blk for BE guests when
virtio 1.0 is used on both sides. The latter renders virtio-blk unusable
with DASD backing, because things simply don't work with the default.
See Fixes tags for relevant commits.

For QEMU, we can work around the issue by writing out the feature bits
with VIRTIO_F_VERSION_1 bit set. We (ab)use the finalize_features
config op for this. This isn't enough to address all vhost devices since
these do not get the features until FEATURES_OK, however it looks like
the affected devices actually never handled the endianness for legacy
mode correctly, so at least that's not a regression.

No devices except virtio net and virtio blk seem to be affected.

Long term the right thing to do is to fix the hypervisors.

Cc: <stable@vger.kernel.org> #v4.11
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 82e89ea077b9 ("virtio-blk: Add validation for block size in config space")
Fixes: fe36cbe0671e ("virtio_net: clear MTU when out of range")
Reported-by: markver@us.ibm.com
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Link: https://lore.kernel.org/r/20211011053921.1198936-1-pasic@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>