commits

Pull x86 fixes from Thomas Gleixner:
"Three interrupt related fixes for X86:

- Move disabling of the local APIC after invoking fixup_irqs() to
ensure that interrupts which are incoming are noted in the IRR and
not ignored.

- Unbreak affinity setting.

The rework of the entry code reused the regular exception entry
code for device interrupts. The vector number is pushed into the
errorcode slot on the stack which is then lifted into an argument
and set to -1 because that's regs->orig_ax which is used in quite
some places to check whether the entry came from a syscall.

But it was overlooked that orig_ax is used in the affinity cleanup
code to validate whether the interrupt has arrived on the new
target. It turned out that this vector check is pointless because
interrupts are never moved from one vector to another on the same
CPU. That check is a historical leftover from the time where x86
supported multi-CPU affinities, but is no longer needed with the now
strict single CPU affinity. Famous last words ...

- Add a missing check for an empty cpumask into the matrix allocator.

The affinity change added a warning to catch the case where an
interrupt is moved on the same CPU to a different vector. This
triggers because a condition with an empty cpumask returns an
assignment from the allocator as the allocator uses for_each_cpu()
without checking the cpumask for being empty. The historical
inconsistent for_each_cpu() behaviour of ignoring the cpumask and
unconditionally claiming that CPU0 is in the mask struck again.
Sigh.

plus a new entry in the MAINTAINERS file for the HPE/UV platform"

* tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
x86/irq: Unbreak interrupt affinity setting
x86/hotplug: Silence APIC only after all interrupts are migrated
MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
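
The empty-cpumask problem described in the pull message can be illustrated with a small user-space model. This is only a sketch: the toy_* names are hypothetical and are not kernel APIs, and the macro merely imitates the behaviour the message describes (the UP variant of for_each_cpu() ignoring the mask and always visiting CPU0).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-word cpumask; bit 0 stands for CPU0. */
struct toy_cpumask {
	unsigned long bits;
};

/*
 * Models the historical UP shortcut described above: the mask argument is
 * evaluated but never tested, so the loop body always runs once for CPU0.
 */
#define toy_for_each_cpu_up(cpu, mask) \
	for ((cpu) = 0; (cpu) < 1; (cpu)++, (void)(mask))

static bool toy_cpumask_empty(const struct toy_cpumask *mask)
{
	return mask->bits == 0;
}

/* Allocator search without the guard: returns CPU0 even for an empty mask. */
static int toy_find_cpu_unchecked(const struct toy_cpumask *mask)
{
	int cpu, found = -1;

	toy_for_each_cpu_up(cpu, mask)
		found = cpu;
	return found;
}

/* Same search with the empty-mask check in front; -1 means "no CPU". */
static int toy_find_cpu_checked(const struct toy_cpumask *mask)
{
	if (toy_cpumask_empty(mask))
		return -1;
	return toy_find_cpu_unchecked(mask);
}
```

In this model the unchecked search hands out CPU0 for an empty mask, which is the kind of bogus assignment that tripped the new warning; the checked variant reports failure instead.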

5y ago

Herbert Xu

c195d66a

crypto: af_alg - Work around empty control messages without MSG_MORE

5y ago

Linus Torvalds

d2283cdc

Merge tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Thomas Gleixner

784a0830

genirq/matrix: Deal with the sillyness of for_each_cpu() on UP

5y ago

Randy Dunlap

bfe8fe93

crypto: sa2ul - add Kconfig selects to fix build error

5y ago

Linus Torvalds

0063a82d

Merge tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Thomas Gleixner

ceb2465c

Merge tag 'irqchip-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

5y ago

Thomas Gleixner

e027ffff

x86/irq: Unbreak interrupt affinity setting

Several people reported that 5.8 broke the interrupt affinity setting
mechanism.

The consolidation of the entry code reused the regular exception entry code
for device interrupts and changed the way the vector number is conveyed,
from ptregs->orig_ax to a function argument.

The low level entry uses the hardware error code slot to push the vector
number onto the stack which is retrieved from there into a function
argument and the slot on stack is set to -1.

The reason for setting it to -1 is that the error code slot is at the
position where pt_regs::orig_ax is. A positive value in pt_regs::orig_ax
indicates that the entry came via a syscall. If it's not set to a negative
value then a signal delivery on return to userspace would try to restart a
syscall. But there are other places which rely on pt_regs::orig_ax being a
valid indicator for syscall entry.

But setting pt_regs::orig_ax to -1 has a nasty side effect vs. the
interrupt affinity setting mechanism, which was overlooked when this change
was made.

Moving interrupts on x86 happens in several steps. A new vector on a
different CPU is allocated and the relevant interrupt source is
reprogrammed to that. But that's racy and there might be an interrupt
already in flight to the old vector. So the old vector is preserved until
the first interrupt arrives on the new vector and the new target CPU. Once
that happens the old vector is cleaned up, but this cleanup still depends
on the vector number being stored in pt_regs::orig_ax, which is now -1.

That -1 makes the check for cleanup: pt_regs::orig_ax == new_vector
always false. As a consequence the interrupt is moved once, but then it
cannot be moved anymore because the cleanup of the old vector never
happens.

There would be several ways to convey the vector information to that place
in the guts of the interrupt handling, but on deeper inspection it turned
out that this check is pointless and a leftover from the old affinity model
of X86 which supported multi-CPU affinities. Under this model it was
possible that an interrupt had an old and a new vector on the same CPU, so
the vector match was required.

Under the new model the effective affinity of an interrupt is always a
single CPU from the requested affinity mask. If the affinity mask changes
then either the interrupt stays on the CPU and on the same vector when that
CPU is still in the new affinity mask or it is moved to a different CPU, but
it is never moved to a different vector on the same CPU.

Ergo the cleanup check for the matching vector number is not required and
can be removed which makes the dependency on pt_regs::orig_ax go away.

The remaining check for new_cpu == smp_processor_id() is completely
sufficient. If it matches then the interrupt was successfully migrated and
the cleanup can proceed.

For paranoia's sake add a warning into the vector assignment code to
validate that the assumption of never moving to a different vector on
the same CPU holds.
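
The effect of the stale vector comparison can be sketched in a user-space model. Everything here is an assumption made for illustration: struct toy_irq_move and the toy_* helpers are hypothetical and do not mirror the kernel's data structures; they only encode the two checks discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-interrupt migration state. */
struct toy_irq_move {
	int new_cpu;
	long new_vector;
	bool move_in_progress;	/* old vector is still reserved */
};

/* Pre-fix check: also compares the vector read from pt_regs::orig_ax. */
static bool toy_cleanup_with_vector_check(struct toy_irq_move *m,
					  long orig_ax, int this_cpu)
{
	if (!m->move_in_progress)
		return false;
	if (orig_ax != m->new_vector || this_cpu != m->new_cpu)
		return false;
	m->move_in_progress = false;	/* old vector released */
	return true;
}

/* Post-fix check: arriving on the new target CPU is sufficient. */
static bool toy_cleanup_cpu_only(struct toy_irq_move *m, int this_cpu)
{
	if (!m->move_in_progress || this_cpu != m->new_cpu)
		return false;
	m->move_in_progress = false;
	return true;
}

/* A further affinity change is only accepted once cleanup has happened. */
static bool toy_can_move_again(const struct toy_irq_move *m)
{
	return !m->move_in_progress;
}
```

With orig_ax forced to -1, toy_cleanup_with_vector_check() can never succeed, so the model stays in the "move in progress" state forever; the CPU-only check clears it on the first interrupt that arrives on the new target.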

Fixes: 633260fa143 ("x86/irq: Convey vector as argument and not in ptregs")
Reported-by: Alex bykov <alex.bykov@scylladb.com>
Reported-by: Avi Kivity <avi@scylladb.com>
Reported-by: Alexander Graf <graf@amazon.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexander Graf <graf@amazon.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87wo1ltaxz.fsf@nanos.tec.linutronix.de

5y ago

Wei Yongjun

11a954ee

crypto: ingenic - Drop kfree for memory allocated with devm_kzalloc

5y ago

Linus Torvalds

b69bea8a

Merge tag 'locking-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Marco Elver

c94a88f3

sched: Use __always_inline on is_idle_task()

5y ago

Guenter Roeck

f107cee9

genirq: Unlock irq descriptor after errors

5y ago

Paul Cercueil

821fc9e2

irqchip/ingenic: Leave parent IRQ unmasked on suspend

5y ago

Ashok Raj

52d6b926

x86/hotplug: Silence APIC only after all interrupts are migrated

5y ago

Giovanni Cabiddu

9a5a668d

crypto: qat - add delay before polling mailbox

5y ago

Linus Torvalds

3edd8db2

Merge tag '5.9-rc2-smb-fix' of git://git.samba.org/sfrench/cifs-2.6

5y ago

Peter Zijlstra

eb1f0023

lockdep,trace: Expose tracepoints

5y ago

Linus Torvalds

d012a719

Linux 5.9-rc2 v5.9-rc2

5y ago

Guenter Roeck

e27b1636

genirq/PM: Always unlock IRQ descriptor in rearm_wake_irq()

5y ago

qiuguorui1

e579076a

irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake

5y ago

Steve Wahl

d4f07268

MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers

5y ago

Linus Torvalds

9123e3a7

Linux 5.9-rc1 v5.9-rc1

5y ago

Linus Torvalds

8bb5021c

Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

5y ago

Paulo Alcantara

e183785f

cifs: fix check of tcon dfs in smb1

5y ago

Nicholas Piggin

044d0d6d

lockdep: Only trace IRQ edges

5y ago

Linus Torvalds

cb957121

Merge tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

- Add perf support for emitting extended registers for power10.

- A fix for CPU hotplug on pseries, where on large/loaded systems we
may not wait long enough for the CPU to be offlined, leading to
crashes.

- Addition of a raw cputable entry for Power10, which is not required
to boot, but is required to make our PMU setup work correctly in
guests.

- Three fixes for the recent changes on 32-bit Book3S to move modules
into their own segment for strict RWX.

- A fix for a recent change in our powernv PCI code that could lead to
crashes.

- A change to our perf interrupt accounting to avoid soft lockups when
using some events, found by syzkaller.

- A change in the way we handle power loss events from the hypervisor
on pseries. We no longer immediately shut down if we're told we're
running on a UPS.

- A few other minor fixes.

Thanks to Alexey Kardashevskiy, Andreas Schwab, Aneesh Kumar K.V, Anju T
Sudhakar, Athira Rajeev, Christophe Leroy, Frederic Barrat, Greg Kurz,
Kajol Jain, Madhavan Srinivasan, Michael Neuling, Michael Roth,
Nageswara R Sastry, Oliver O'Halloran, Thiago Jung Bauermann,
Vaidyanathan Srinivasan, Vasant Hegde.

* tag 'powerpc-5.9-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver
powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000
powerpc/pseries: Do not initiate shutdown when system is running on UPS
powerpc/perf: Fix soft lockups due to missed interrupt accounting
powerpc/powernv/pci: Fix possible crash when releasing DMA resources
powerpc/pseries/hotplug-cpu: wait indefinitely for vCPU death
powerpc/32s: Fix is_module_segment() when MODULES_VADDR is defined
powerpc/kasan: Fix KASAN_SHADOW_START on BOOK3S_32
powerpc/fixmap: Fix the size of the early debug area
powerpc/pkeys: Fix build error with PPC_MEM_KEYS disabled
powerpc/kernel: Cleanup machine check function declarations
powerpc: Add POWER10 raw mode cputable entry
powerpc/perf: Add extended regs support for power10 platform
powerpc/perf: Add support for outputting extended regs in perf intr_regs
powerpc: Fix P10 PVR revision in /proc/cpuinfo for SMT4 cores

5y ago

Linus Torvalds

fb893de3

Merge tag 'tag-chrome-platform-for-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/chrome-platform/linux

5y ago

Marc Zyngier

a150dac5

irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse

5y ago

Sean Christopherson

6a3ea3e6

x86/entry/64: Do not use RDPID in paranoid entry to accomodate KVM

KVM has an optimization to avoid expensive MSR read/writes on
VMENTER/EXIT. It caches the MSR values and restores them either when
leaving the run loop, on preemption or when going out to user space.

The affected MSRs are not required for kernel context operations. This
changed with the recently introduced mechanism to handle FSGSBASE in the
paranoid entry code which has to retrieve the kernel GSBASE value by
accessing per CPU memory. The mechanism needs to retrieve the CPU number
and uses either LSL or RDPID if the processor supports it.

Unfortunately RDPID uses MSR_TSC_AUX which is in the list of cached and
lazily restored MSRs, which means between the point where the guest value
is written and the point of restore, MSR_TSC_AUX contains a random number.

If an NMI or any other exception which uses the paranoid entry path happens
in such a context, then RDPID returns the random guest MSR_TSC_AUX value.

As a consequence this reads from the wrong memory location to retrieve the
kernel GSBASE value. Kernel GS is used for all regular this_cpu_*()
operations. If the GSBASE in the exception handler points to the per CPU
memory of a different CPU then this has the obvious consequences of data
corruption and crashes.

As the paranoid entry path is the only place which accesses MSR_TSC_AUX
(via RDPID) and the fallback via LSL is not significantly slower, remove
the RDPID alternative from the entry path and always use LSL.

The alternative would be to write MSR_TSC_AUX on every VMENTER and VMEXIT
which would be inflicting massive overhead on that code path.
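
A rough user-space model of the failure mode, under loudly stated assumptions: the toy_* names, the GSBASE table and its values are invented for illustration, tsc_aux stands in for what RDPID would return and seg_limit for what LSL would return. It is not the entry code, only the lookup logic the changelog describes.

```c
#include <assert.h>

#define TOY_NR_CPUS 4

/* Hypothetical per-CPU GSBASE table, indexed by CPU number. */
static const unsigned long toy_gsbase[TOY_NR_CPUS] = {
	0x1000, 0x2000, 0x3000, 0x4000
};

struct toy_cpu {
	unsigned int tsc_aux;	/* what RDPID reads; may hold a guest value */
	unsigned int seg_limit;	/* what LSL reads; owned by the host only */
};

/* The modulo only keeps the toy array access in bounds. */
static unsigned long toy_gsbase_via_rdpid(const struct toy_cpu *c)
{
	return toy_gsbase[c->tsc_aux % TOY_NR_CPUS];
}

static unsigned long toy_gsbase_via_lsl(const struct toy_cpu *c)
{
	return toy_gsbase[c->seg_limit % TOY_NR_CPUS];
}

/* Lazy MSR switch: the guest value goes in, the host value returns later. */
static void toy_load_guest_tsc_aux(struct toy_cpu *c, unsigned int guest_val)
{
	c->tsc_aux = guest_val;
}
```

While the guest value is loaded, the RDPID-style lookup on CPU 1 resolves to another CPU's GSBASE, whereas the LSL-style lookup is unaffected, which is why the fix always takes the LSL path.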

[ tglx: Rewrote changelog ]

Fixes: eaad981291ee3 ("x86/entry/64: Introduce the FIND_PERCPU_BASE macro")
Reported-by: Tom Lendacky <thomas.lendacky@amd.com>
Debugged-by: Tom Lendacky <thomas.lendacky@amd.com>
Suggested-by: Andy Lutomirski <luto@kernel.org>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20200821105229.18938-1-pbonzini@redhat.com

5y ago

Linus Torvalds

2cc3c4b3

Merge tag 'io_uring-5.9-2020-08-15' of git://git.kernel.dk/linux-block

5y ago

Linus Torvalds

6f0306d1

Merge tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB fixes from Greg KH:
"Let's try this again... Here are some USB fixes for 5.9-rc3.

This differs from the previous pull request for this release in that
the usb gadget patch now does not break some systems, and actually
does what it was intended to do. Many thanks to Marek Szyprowski for
quickly noticing and testing the patch from Andy Shevchenko to resolve
this issue.

Additionally, some more new USB quirks have been added to get some new
devices to work properly based on user reports.

Other than that, the patches are all here, and they contain:

- usb gadget driver fixes

- xhci driver fixes

- typec fixes

- new quirks and ids

- fixes for USB patches that went into 5.9-rc1.

All of these have been tested in linux-next with no reported issues"

* tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: storage: Add unusual_uas entry for Sony PSZ drives
USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge
usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()
USB: gadget: u_f: Unbreak offset calculation in VLAs
USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D
usb: typec: tcpm: Fix Fix source hard reset response for TDA 2.3.1.1 and TDA 2.3.1.2 failures
USB: PHY: JZ4770: Fix static checker warning.
USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()
USB: gadget: u_f: add overflow checks to VLA macros
xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
xhci: Do warm-reset when both CAS and XDEV_RESUME are set
usb: host: xhci: fix ep context print mismatch in debugfs
usb: uas: Add quirk for PNY Pro Elite
tools: usb: move to tools buildsystem
USB: Fix device driver race
USB: Also match device drivers using the ->match vfunc
usb: host: xhci-tegra: fix tegra_xusb_get_phy()
usb: host: xhci-tegra: otg usb2/usb3 port init
usb: hcd: Fix use after free in usb_hcd_pci_remove()
usb: typec: ucsi: Hold con->lock for the entire duration of ucsi_register_port()
...

5y ago

Christophe Leroy

4a133eb3

powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU

5y ago

Peter Zijlstra

99dc56fe

mips: Implement arch_irqs_disabled()

5y ago

Linus Torvalds

550c2129

Merge tag 'x86-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Kajol Jain

64ef8f2c

powerpc/perf/hv-24x7: Move cpumask file to top folder of hv-24x7 driver

5y ago

Linus Torvalds

d668e848

Merge tag 'for-linus-5.9-ofs1' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux

5y ago

Brian Norris

fc8cacf3

platform/chrome: cros_ec_proto: check for missing EC_CMD_HOST_EVENT_GET_WAKE_MASK

5y ago

Marc Zyngier

7828a3ef

irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers

5y ago

Linus Torvalds

50f6c7db

Merge tag 'x86-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Mike Rapoport

6f6aea7e

parisc: fix PMD pages allocation by restoring pmd_alloc_one()

5y ago

Jens Axboe

f91daf56

io_uring: short circuit -EAGAIN for blocking read attempt

5y ago

Linus Torvalds

42df60fc

Merge tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

5y ago

Alan Stern

20934c0d

usb: storage: Add unusual_uas entry for Sony PSZ drives

5y ago

Pratik Rajesh Sampat

16d83a54

Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"

5y ago

Peter Zijlstra

021c1093

arm64: Implement arch_irqs_disabled()

5y ago

Linus Torvalds

cea05c19

Merge tag 'perf-urgent-2020-08-23' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Christophe Leroy

541cebb5

powerpc/32s: Fix module loading failure when VMALLOC_END is over 0xf0000000

5y ago

Linus Torvalds

57d528bf

Merge tag 'zonefs-5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

5y ago

Jing Xiangfeng

e848643b

orangefs: remove unnecessary assignment to variable ret

5y ago

Brian Norris

c214e564

platform/chrome: cros_ec_proto: ignore unnecessary wakeups on old ECs

5y ago

Lokesh Vutla

6da45875

arm64: dts: k3-am65: Update the RM resource types

5y ago

Linus Torvalds

1195d58f

Merge tag 'sched-urgent-2020-08-15' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5y ago

Sebastian Andrzej Siewior

a6d996cb

x86/alternatives: Acquire pte lock with interrupts enabled

5y ago

Linus Torvalds

4b6c093e

Merge tag 'block-5.9-2020-08-14' of git://git.kernel.dk/linux-block

5y ago

Jens Axboe

d4e7cd36

io_uring: sanitize double poll handling

5y ago

Linus Torvalds

c4011283

Merge tag 'dma-mapping-5.9-2' of git://git.infradead.org/users/hch/dma-mapping

5y ago

Shiju Jose

b972fdba

EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()

5y ago

Cyril Roelandt

9aa37788

USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge

5y ago

Linux 5.9-rc3 v5.9-rc3

f75aef39

Linus Torvalds

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

e43327c7

Linus Torvalds

Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

dcc5c6f0

Linus Torvalds

lockdep: Only trace IRQ edges

Problem:

  raw_local_irq_save();     // software state on
  local_irq_save();         // software state off
  ...
  local_irq_restore();      // software state still off, because we don't enable IRQs
  raw_local_irq_restore();  // software state still off, *whoopsie*

existing instances:

 - lock_acquire()
     raw_local_irq_save()
     __lock_acquire()
       arch_spin_lock(&graph_lock)
         pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
           local_irq_save()

 - trace_clock_global()
     raw_local_irq_save()
     arch_spin_lock()
       pv_wait() := kvm_wait()
         local_irq_save()

 - apic_retrigger_irq()
     raw_local_irq_save()
     apic->send_IPI() := default_send_IPI_single_phys()
       local_irq_save()

Possible solutions:

A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now

Now, the only reason to use the raw_* variant is because you don't want
tracing, so A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, which strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

  raw_local_irq_save()
  local_irq_save()
  lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because commit: 859d069ee1dd ("lockdep: Prepare for NMI IRQ
state tracking") changed IRQ tracing vs lockdep recursion and the
first instance is fairly common, the other cases hardly ever happen.
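
The sequence from the Problem section can be replayed in a small user-space model. This is a simplified sketch, not the kernel macros: struct toy_irq_state and the toy_* functions are hypothetical, hw_enabled stands for the CPU interrupt flag and sw_enabled for the software (tracing) state. The edge_only switch models option B), where the software state only changes when the hardware state actually changes.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_irq_state {
	bool hw_enabled;	/* what the CPU flag says */
	bool sw_enabled;	/* what the tracing/lockdep side believes */
};

/* raw_*() variants touch only the hardware flag. */
static bool toy_raw_save(struct toy_irq_state *s)
{
	bool was_enabled = s->hw_enabled;

	s->hw_enabled = false;
	return was_enabled;
}

static void toy_raw_restore(struct toy_irq_state *s, bool was_enabled)
{
	s->hw_enabled = was_enabled;
}

/* edge_only == false: trace "off" on every save (the problematic case). */
static bool toy_save(struct toy_irq_state *s, bool edge_only)
{
	bool was_enabled = toy_raw_save(s);

	if (!edge_only || was_enabled)
		s->sw_enabled = false;
	return was_enabled;
}

/* Software state is switched back on only when IRQs really get enabled. */
static void toy_restore(struct toy_irq_state *s, bool was_enabled)
{
	if (was_enabled)
		s->sw_enabled = true;
	toy_raw_restore(s, was_enabled);
}

/* Runs raw save, traced save, traced restore, raw restore. */
static struct toy_irq_state toy_run_nested(bool edge_only)
{
	struct toy_irq_state s = { .hw_enabled = true, .sw_enabled = true };
	bool raw_flags = toy_raw_save(&s);
	bool flags = toy_save(&s, edge_only);

	toy_restore(&s, flags);
	toy_raw_restore(&s, raw_flags);
	return s;
}
```

Without edge-only tracing the model ends with interrupts enabled in hardware but marked off in software; with it, both states agree again at the end. Inside the raw section the software state stays "on", which is why the changelog declares lockdep_assert_irqs_disabled() in that position broken.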

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[rewrote changelog]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Marco Elver <elver@google.com>
Link: https://lkml.kernel.org/r/20200723105615.1268126-1-npiggin@gmail.com