commits

Pull x86 mm changes from Ingo Molnar:
"PCID support, 5-level paging support, Secure Memory Encryption support

The main changes in this cycle are support for three new, complex
hardware features of x86 CPUs:

- Add 5-level paging support, which is a new hardware feature on
upcoming Intel CPUs allowing up to 128 PB of virtual address space
and 4 PB of physical RAM space - a 512-fold increase over the old
limits. (Supercomputers of the future forecasting hurricanes on an
ever warming planet can certainly make good use of more RAM.)

Many of the necessary changes went upstream in previous cycles,
v4.14 is the first kernel that can enable 5-level paging.

This feature is activated via CONFIG_X86_5LEVEL=y - disabled by
default.

(By Kirill A. Shutemov)

- Add 'encrypted memory' support, which is a new hardware feature on
upcoming AMD CPUs ('Secure Memory Encryption', SME) allowing system
RAM to be encrypted and decrypted (mostly) transparently by the
CPU, with a little help from the kernel to transition to/from
encrypted RAM. Such RAM should be more secure against various
attacks like RAM access via the memory bus and should make the
radio signature of memory bus traffic harder to intercept (and
decrypt) as well.

This feature is activated via CONFIG_AMD_MEM_ENCRYPT=y - disabled
by default.

(By Tom Lendacky)

- Enable PCID optimized TLB flushing on newer Intel CPUs: PCID is a
hardware feature that attaches an address space tag to TLB entries
and thus allows to skip TLB flushing in many cases, even if we
switch mm's.

(By Andy Lutomirski)

All three of these features were in the works for a long time, and
it's coincidence of the three independent development paths that they
are all enabled in v4.14 at once"

* 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (65 commits)
x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)
x86/mm: Use pr_cont() in dump_pagetable()
x86/mm: Fix SME encryption stack ptr handling
kvm/x86: Avoid clearing the C-bit in rsvd_bits()
x86/CPU: Align CR3 defines
x86/mm, mm/hwpoison: Clear PRESENT bit for kernel 1:1 mappings of poison pages
acpi, x86/mm: Remove encryption mask from ACPI page protection type
x86/mm, kexec: Fix memory corruption with SME on successive kexecs
x86/mm/pkeys: Fix typo in Documentation/x86/protection-keys.txt
x86/mm/dump_pagetables: Speed up page tables dump for CONFIG_KASAN=y
x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID
x86: Enable 5-level paging support via CONFIG_X86_5LEVEL=y
x86/mm: Allow userspace have mappings above 47-bit
x86/mm: Prepare to expose larger address space to userspace
x86/mpx: Do not allow MPX if we have mappings above 47-bit
x86/mm: Rename tasksize_32bit/64bit to task_size_32bit/64bit()
x86/xen: Redefine XEN_ELFNOTE_INIT_P2M using PUD_SIZE * PTRS_PER_PUD
x86/mm/dump_pagetables: Fix printout of p4d level
x86/mm/dump_pagetables: Generalize address normalization
x86/boot: Fix memremap() related build failure
...

8y ago

Alexandre Belloni

51218298

alarmtimer: Ensure RTC module is not unloaded

8y ago

Thomas Gleixner

b33394ba

genirq/proc: Avoid uninitalized variable warning

8y ago

Minghuan Lian

ae3efabf

irqchip/ls-scfg-msi: Add MSI affinity support

8y ago

Linus Torvalds

31ba04d9

Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Thomas Gleixner

19d39a38

genirq: Keep chip buslock across irq_request/release_resources()

Moving the irq_request/release_resources() callbacks out of the spinlocked,
irq disabled and bus locked region, unearthed an interesting abuse of the
irq_bus_lock/irq_bus_sync_unlock() callbacks.

The OMAP GPIO driver does merily power management inside of them. The
irq_request_resources() callback of this GPIO irqchip calls a function
which reads a GPIO register. That read aborts now because the clock of the
GPIO block is not magically enabled via the irq_bus_lock() callback.

Move the callbacks under the bus lock again to prevent this. In the
free_irq() path this requires to drop the bus_lock before calling
synchronize_irq() and reaquiring it before calling the
irq_release_resources() callback.

The bus lock can't be held because:

1) The data which has been changed between bus_lock/un_lock is cached in
the irq chip driver private data and needs to go out to the irq chip
via the slow bus (usually SPI or I2C) before calling
synchronize_irq().

That's the reason why this bus_lock/unlock magic exists in the first
place, as you cannot do SPI/I2C transactions while holding desc->lock
with interrupts disabled.

2) synchronize_irq() will actually deadlock, if there is a handler on
flight. These chips use threaded handlers for obvious reasons, as
they allow to do SPI/I2C communication. When the threaded handler
returns then bus_lock needs to be taken in irq_finalize_oneshot() as
we need to talk to the actual irq chip once more. After that the
threaded handler is marked done, which makes synchronize_irq() return.

So if we hold bus_lock accross the synchronize_irq() call, the
handler cannot mark itself done because it blocks on the bus
lock. That in turn makes synchronize_irq() wait forever on the
threaded handler to complete....

Add the missing unlock of desc->request_mutex in the error path of
__free_irq() and add a bunch of comments to explain the locking and
protection rules.

Fixes: 46e48e257360 ("genirq: Move irq resource handling out of spinlocked region")
Reported-and-tested-by: Sebastian Reichel <sebastian.reichel@collabora.co.uk>
Reported-and-tested-by: Tony Lindgren <tony@atomide.com>
Reported-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Not-longer-ranted-at-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: Grygorii Strashko <grygorii.strashko@ti.com>
Cc: Marc Zyngier <marc.zyngier@arm.com>

8y ago

Colin Ian King

5707b46a

x86/intel_rdt: Remove redundant ternary operator on return

8y ago

Hans de Goede

594a30fb

x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature

8y ago

Linus Torvalds

5f82e71a

Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull locking updates from Ingo Molnar:

- Add 'cross-release' support to lockdep, which allows APIs like
completions, where it's not the 'owner' who releases the lock, to be
tracked. It's all activated automatically under
CONFIG_PROVE_LOCKING=y.

- Clean up (restructure) the x86 atomics op implementation to be more
readable, in preparation of KASAN annotations. (Dmitry Vyukov)

- Fix static keys (Paolo Bonzini)

- Add killable versions of down_read() et al (Kirill Tkhai)

- Rework and fix jump_label locking (Marc Zyngier, Paolo Bonzini)

- Rework (and fix) tlb_flush_pending() barriers (Peter Zijlstra)

- Remove smp_mb__before_spinlock() and convert its usages, introduce
smp_mb__after_spinlock() (Peter Zijlstra)

* 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (56 commits)
locking/lockdep/selftests: Fix mixed read-write ABBA tests
sched/completion: Avoid unnecessary stack allocation for COMPLETION_INITIALIZER_ONSTACK()
acpi/nfit: Fix COMPLETION_INITIALIZER_ONSTACK() abuse
locking/pvqspinlock: Relax cmpxchg's to improve performance on some architectures
smp: Avoid using two cache lines for struct call_single_data
locking/lockdep: Untangle xhlock history save/restore from task independence
locking/refcounts, x86/asm: Disable CONFIG_ARCH_HAS_REFCOUNT for the time being
futex: Remove duplicated code and fix undefined behaviour
Documentation/locking/atomic: Finish the document...
locking/lockdep: Fix workqueue crossrelease annotation
workqueue/lockdep: 'Fix' flush_work() annotation
locking/lockdep/selftests: Add mixed read-write ABBA tests
mm, locking/barriers: Clarify tlb_flush_pending() barriers
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE and CONFIG_LOCKDEP_COMPLETIONS truly non-interactive
locking/lockdep: Explicitly initialize wq_barrier::done::map
locking/lockdep: Rename CONFIG_LOCKDEP_COMPLETE to CONFIG_LOCKDEP_COMPLETIONS
locking/lockdep: Reword title of LOCKDEP_CROSSRELEASE config
locking/lockdep: Make CONFIG_LOCKDEP_CROSSRELEASE part of CONFIG_PROVE_LOCKING
locking/refcounts, x86/asm: Implement fast refcount overflow protection
locking/lockdep: Fix the rollback and overwrite detection logic in crossrelease
...

8y ago

Vitaly Kuznetsov

9e52fc2b

x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)

8y ago

Thomas Gleixner

bc30658e

Merge branch 'clockevents/4.14' of http://git.linaro.org/people/daniel.lezcano/linux into timers/core

8y ago

Dan Carpenter

20c4d49c

irqdomain: Prevent potential NULL pointer dereference in irq_domain_push_irq()

8y ago

Minghuan Lian

fd100dab

irqchip/ls-scfg-msi: Add LS1043a v1.1 MSI support

8y ago

Linus Torvalds

338a57d5

Merge tag 'regmap-fix-w1-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap

8y ago

Thomas Gleixner

dea1d0f5

smp/hotplug: Replace BUG_ON and react useful

8y ago

Marc Zyngier

c5c601c4

irqdomain: Allow ACPI device nodes to be used as irqdomain identifiers

8y ago

Vikas Shivappa

24247aee

x86/intel_rdt/cqm: Improve limbo list processing

8y ago

Thomas Gleixner

1d792a67

x86/idt: Remove the tracing IDT leftovers

8y ago

Linus Torvalds

6c51e67b

Merge branch 'x86-syscall-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Ingo Molnar

edc2988c

Merge branch 'linus' into locking/core, to fix up conflicts

8y ago

Jan Beulich

39e48d9b

x86/mm: Use pr_cont() in dump_pagetable()

8y ago

Thomas Gleixner

4e2a8097

Merge branch 'fortglx/4.14/time' of https://git.linaro.org/people/john.stultz/linux into timers/core

8y ago

Rob Herring

469869d1

clocksource: Convert to using %pOF instead of full_name

8y ago

kbuild test robot

ce8bdd69

genirq: Fix semicolon.cocci warnings

8y ago

Minghuan Lian

4dd5da65

irqchip/ls-scfg-msi: Add LS1046a MSI support

8y ago

Linus Torvalds

e8e9941b

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

8y ago

minimumlaw@rambler.ru

5b20a436

regmap: regmap-w1: Fix build troubles

8y ago

Thomas Gleixner

9cd4f1a4

smp/hotplug: Move unparking of percpu threads to the control CPU

Vikram reported the following backtrace:

BUG: scheduling while atomic: swapper/7/0/0x00000002
CPU: 7 PID: 0 Comm: swapper/7 Not tainted 4.9.32-perf+ #680
schedule
schedule_hrtimeout_range_clock
schedule_hrtimeout
wait_task_inactive
__kthread_bind_mask
__kthread_bind
__kthread_unpark
kthread_unpark
cpuhp_online_idle
cpu_startup_entry
secondary_start_kernel

He analyzed correctly that a parked cpu hotplug thread of an offlined CPU
was still on the runqueue when the CPU came back online and tried to unpark
it. This causes the thread which invoked kthread_unpark() to call
wait_task_inactive() and subsequently schedule() with preemption disabled.
His proposed workaround was to "make sure" that a parked thread has
scheduled out when the CPU goes offline, so the situation cannot happen.

But that's still wrong because the root cause is not the fact that the
percpu thread is still on the runqueue and neither that preemption is
disabled, which could be simply solved by enabling preemption before
calling kthread_unpark().

The real issue is that the calling thread is the idle task of the upcoming
CPU, which is not supposed to call anything which might sleep. The moron,
who wrote that code, missed completely that kthread_unpark() might end up
in schedule().

The solution is simpler than expected. The thread which controls the
hotplug operation is waiting for the CPU to call complete() on the hotplug
state completion. So the idle task of the upcoming CPU can set its state to
CPUHP_AP_ONLINE_IDLE and invoke complete(). This in turn wakes the control
task on a different CPU, which then can safely do the unpark and kick the
now unparked hotplug thread of the upcoming CPU to complete the bringup to
the final target state.

Control CPU AP

bringup_cpu();
__cpu_up() ------------>
bringup_ap();
bringup_wait_for_ap()
wait_for_completion();
cpuhp_online_idle();
<------------ complete();
unpark(AP->stopper);
unpark(AP->hotplugthread);
while(1)
do_idle();
kick(AP->hotplugthread);
wait_for_completion(); hotplug_thread()
run_online_callbacks();
complete();

Fixes: 8df3e07e7f21 ("cpu/hotplug: Let upcoming cpu bring itself fully up")
Reported-by: Vikram Mulukutla <markivx@codeaurora.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Sebastian Sewior <bigeasy@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Tejun Heo <tj@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1707042218020.2131@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

8y ago

Thomas Gleixner

f610c9d6

genirq/debugfs: Remove redundant NULL pointer check

8y ago

Vikas Shivappa

bbc4615e

x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug

8y ago

Thomas Gleixner

facaa3e3

x86/idt: Hide set_intr_gate()

8y ago

Linus Torvalds

e0a195b5

Merge branch 'x86-spinlocks-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Thomas Garnier

cf7de27a

arm64/syscalls: Check address limit on user-mode return

8y ago

Peter Zijlstra

d82fed75

locking/lockdep/selftests: Fix mixed read-write ABBA tests

8y ago

Linus Torvalds

81a84ad3

Merge branch 'docs-next' of git://git.lwn.net/linux

8y ago

Borislav Petkov

6e0b52d4

x86/mm: Fix SME encryption stack ptr handling

8y ago

Krzysztof Opasiak

3cf29496

posix-cpu-timers: Use dedicated helper to access rlimit values

8y ago

Geert Uytterhoeven

47b4a457

alarmtimer: Fix unavailable wake-up source in sysfs

8y ago

Markus Elfring

ebbe2665

clocksource/drivers/bcm2835: Remove message for a memory allocation failure

8y ago

sched/core: WARN() when migrating to an offline CPU

4ff9083b

Peter Zijlstra

sched/fair: Plug hole between hotplug and active_load_balance()

edd8e41d

Peter Zijlstra

sched/fair: Avoid newidle balance for !active CPUs

2800486e

Peter Zijlstra

sched/fair: Fix nuisance kernel-doc warning

46123355

Randy Dunlap

sched/cpuset/pm: Fix cpuset vs. suspend-resume bugs

50e76632

Peter Zijlstra

sched/fair: Fix wake_affine_llc() balancing rules

a731ebe6

Peter Zijlstra

Merge branch 'x86-apic-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

24e700e2

Linus Torvalds

Merge branch 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 cache quality monitoring update from Thomas Gleixner:
"This update provides a complete rewrite of the Cache Quality
Monitoring (CQM) facility.

The existing CQM support was duct taped into perf with a lot of issues
and the attempts to fix those turned out to be incomplete and
horrible.

After lengthy discussions it was decided to integrate the CQM support
into the Resource Director Technology (RDT) facility, which is the
obvious choise as in hardware CQM is part of RDT. This allowed to add
Memory Bandwidth Monitoring support on top.

As a result the mechanisms for allocating cache/memory bandwidth and
the corresponding monitoring mechanisms are integrated into a single
management facility with a consistent user interface"

* 'x86-cache-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (37 commits)
x86/intel_rdt: Turn off most RDT features on Skylake
x86/intel_rdt: Add command line options for resource director technology
x86/intel_rdt: Move special case code for Haswell to a quirk function
x86/intel_rdt: Remove redundant ternary operator on return
x86/intel_rdt/cqm: Improve limbo list processing
x86/intel_rdt/mbm: Fix MBM overflow handler during CPU hotplug
x86/intel_rdt: Modify the intel_pqr_state for better performance
x86/intel_rdt/cqm: Clear the default RMID during hotcpu
x86/intel_rdt: Show bitmask of shareable resource with other executing units
x86/intel_rdt/mbm: Handle counter overflow
x86/intel_rdt/mbm: Add mbm counter initialization
x86/intel_rdt/mbm: Basic counting of MBM events (total and local)
x86/intel_rdt/cqm: Add CPU hotplug support
x86/intel_rdt/cqm: Add sched_in support
x86/intel_rdt: Introduce rdt_enable_key for scheduling
x86/intel_rdt/cqm: Add mount,umount support
x86/intel_rdt/cqm: Add rmdir support
x86/intel_rdt: Separate the ctrl bits from rmdir
x86/intel_rdt/cqm: Add mon_data
x86/intel_rdt: Prepare for RDT monitor data support
...

f5709176

Linus Torvalds

x86/idt: Fix the X86_TRAP_BP gate

c6ef8942

Ingo Molnar

Merge branch 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

d725c7ac

Linus Torvalds

x86/intel_rdt: Turn off most RDT features on Skylake

d56593eb

Tony Luck

x86/xen: Get rid of paravirt op adjust_exception_frame

5878d5d6

Juergen Gross

Merge branch 'irq-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

93cc1228

Linus Torvalds

smp/hotplug: Handle removal correctly in cpuhp_store_callbacks()

0c96b273

Ethan Barnes

x86/intel_rdt: Add command line options for resource director technology

1d9807fc

Tony Luck

x86/eisa: Add missing include

ef1d4dea

Thomas Gleixner

Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

dd90cccf

Linus Torvalds

Merge tag 'irqchip-4.14' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/core

9fbd7fd2

Thomas Gleixner

Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

935acd3f

Linus Torvalds

x86/intel_rdt: Move special case code for Haswell to a quirk function

0576113a

Tony Luck

x86/idt: Remove superfluous ALIGNment

04b5de3a

Jiri Slaby

Merge branch 'x86-mm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

b1b6f83a

Linus Torvalds

alarmtimer: Ensure RTC module is not unloaded

51218298

Alexandre Belloni

genirq/proc: Avoid uninitalized variable warning

b33394ba

Thomas Gleixner

irqchip/ls-scfg-msi: Add MSI affinity support

ae3efabf

Minghuan Lian

Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

31ba04d9

Linus Torvalds

genirq: Keep chip buslock across irq_request/release_resources()

19d39a38

Thomas Gleixner

x86/intel_rdt: Remove redundant ternary operator on return

5707b46a

Colin Ian King

x86/apic: Silence "FW_BUG TSC_DEADLINE disabled due to Errata" on CPUs without the feature

594a30fb

Hans de Goede

Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

5f82e71a

Linus Torvalds

x86/mm: Enable RCU based page table freeing (CONFIG_HAVE_RCU_TABLE_FREE=y)

There's a subtle bug in how some of the paravirt guest code handles
page table freeing on x86:

On x86 software page table walkers depend on the fact that remote TLB flush
does an IPI: walk is performed lockless but with interrupts disabled and in
case the page table is freed the freeing CPU will get blocked as remote TLB
flush is required. On other architectures which don't require an IPI to do
remote TLB flush we have an RCU-based mechanism (see
include/asm-generic/tlb.h for more details).

In virtualized environments we may want to override the ->flush_tlb_others
callback in pv_mmu_ops and use a hypercall asking the hypervisor to do a
remote TLB flush for us. This breaks the assumption about IPIs. Xen PV has
been doing this for years and the upcoming remote TLB flush for Hyper-V will
do it too.

This is not safe, as software page table walkers may step on an already
freed page.

Fix the bug by enabling the RCU-based page table freeing mechanism,
CONFIG_HAVE_RCU_TABLE_FREE=y.

Testing with kernbench and mmap/munmap microbenchmarks, and neither showed
any noticeable performance impact.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Juergen Gross <jgross@suse.com>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrew Cooper <andrew.cooper3@citrix.com>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Jork Loeser <Jork.Loeser@microsoft.com>
Cc: KY Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170828082251.5562-1-vkuznets@redhat.com
[ Rewrote/fixed/clarified the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>