commits

Some machines have EFI regions in page zero (physical address
0x00000000) and historically that region has been added to the e820
map via trim_bios_range(), and ultimately mapped into the kernel page
tables. It was not mapped via efi_map_regions() as one would expect.

Alexis reports that with the new separate EFI page tables some boot
services regions, such as page zero, are not mapped. This triggers an
oops during the SetVirtualAddressMap() runtime call.

For the EFI boot services quirk on x86 we need to memblock_reserve()
boot services regions until after SetVirtualAddressMap(). Doing that
while respecting the ownership of regions that may have already been
reserved by the kernel was the motivation behind this commit:

7d68dc3f1003 ("x86, efi: Do not reserve boot services regions within reserved areas")

That patch was merged at a time when the EFI runtime virtual mappings
were inserted into the kernel page tables as described above, and the
trick of setting ->numpages (and hence the region size) to zero to
track regions that should not be freed in efi_free_boot_services()
meant that we never mapped those regions in efi_map_regions(). Instead
we were relying solely on the existing kernel mappings.

Now that we have separate page tables we need to make sure the EFI
boot services regions are mapped correctly, even if someone else has
already called memblock_reserve(). Instead of stashing a tag in
->numpages, set the EFI_MEMORY_RUNTIME bit of ->attribute. Since it
generally makes no sense to mark a boot services region as required at
runtime, it's pretty much guaranteed the firmware will not have
already set this bit.

For the record, the specific circumstances under which Alexis
triggered this bug was that an EFI runtime driver on his machine was
responding to the EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE event during
SetVirtualAddressMap().

The event handler for this driver looks like this,

sub rsp,0x28
lea rdx,[rip+0x2445] # 0xaa948720
mov ecx,0x4
call func_aa9447c0 ; call to ConvertPointer(4, & 0xaa948720)
mov r11,QWORD PTR [rip+0x2434] # 0xaa948720
xor eax,eax
mov BYTE PTR [r11+0x1],0x1
add rsp,0x28
ret

Which is pretty typical code for an EVT_SIGNAL_VIRTUAL_ADDRESS_CHANGE
handler. The "mov r11, QWORD PTR [rip+0x2424]" was the faulting
instruction because ConvertPointer() was being called to convert the
address 0x0000000000000000, which when converted is left unchanged and
remains 0x0000000000000000.

The output of the oops trace gave the impression of a standard NULL
pointer dereference bug, but because we're accessing physical
addresses during ConvertPointer(), it wasn't. EFI boot services code
is stored at that address on Alexis' machine.

Reported-by: Alexis Murzeau <amurzeau@gmail.com>
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Cc: Matthew Garrett <mjg59@srcf.ucam.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raphael Hertzog <hertzog@debian.org>
Cc: Roger Shimizu <rogershimizu@gmail.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-efi@vger.kernel.org
Link: http://lkml.kernel.org/r/1457695163-29632-2-git-send-email-matt@codeblueprint.co.uk
Link: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=815125
Signed-off-by: Ingo Molnar <mingo@kernel.org>

9y ago

Linus Torvalds

f2c12421

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

9y ago

James Hogan

4b7b1ef2

ld-version: Fix awk regex compile failure

9y ago

Linus Torvalds

03c668a9

Merge tag 'for-linus-20160311' of git://git.infradead.org/linux-mtd

9y ago

Nicholas Bellinger

7f54ab5f

target: Drop incorrect ABORT_TASK put for completed commands

9y ago

Borislav Petkov

6e686709

x86/fpu: Fix eager-FPU handling on legacy FPU machines

9y ago

Linus Torvalds

c32c2cb2

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

9y ago

Paolo Bonzini

5f0b8199

KVM: MMU: fix reserved bit check for ept=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0

9y ago

Aaro Koskinen

ba9e72c2

MIPS: Fix build with DEBUG_ZBOOT and MACH_JZ4780

9y ago

Linus Torvalds

3ab0a0f9

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

9y ago

Boris BREZILLON

9df4f913

MAINTAINERS: add a maintainer for the NAND subsystem

9y ago

Linus Torvalds

81f70ba2

Linux 4.5-rc5 v4.5-rc5

10y ago

Borislav Petkov

84477336

x86/delay: Avoid preemptible context checks in delay_mwaitx()

9y ago

Linus Torvalds

2da33f9f

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

9y ago

Will Deacon

ff792584

arm64: hugetlb: partial revert of 66b3923a1a0f

9y ago

Paolo Bonzini

844a5fe2

KVM: MMU: fix ept=0/pte.u=1/pte.w=0/CR0.WP=0/CR4.SMEP=1/EFER.NX=0 combo

9y ago

Linus Torvalds

f6cede5b

Linux 4.5-rc7 v4.5-rc7

9y ago

Matthew Dawson

76401310

mm/mempool: avoid KASAN marking mempool poison checks as use-after-free

9y ago

Ville Syrjälä

0bbca274

drm/i915: Actually retry with bit-banging after GMBUS timeout

9y ago

Jorge Ramirez-Ortiz

c57753d4

mtd: nand: tests: fix regression introduced in mtd_nandectest

9y ago

Linus Torvalds

0389075e

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

10y ago

Yu-cheng Yu

a65050c6

x86/fpu: Revert ("x86/fpu: Disable AVX when eagerfpu is off")

9y ago

Linus Torvalds

8e0f93cd

Merge tag 'spi-fix-v4.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

9y ago

Martin Schwidefsky

3446c13b

s390/mm: four page table levels vs. fork

9y ago

Ard Biesheuvel

36e5cd6b

arm64: account for sparsemem section alignment when choosing vmemmap offset

9y ago

David Matlack

313f636d

kvm: cap halt polling at exactly halt_poll_ns

9y ago

Linus Torvalds

d55e08c8

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

9y ago

Linus Torvalds

2a4fb270

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

9y ago

Takashi Iwai

2f791908

drm/i915: Fix bogus dig_port_map[] assignment for pre-HSW

9y ago

Han Xu

44248aff

MAINTAINERS: add maintainer entry for FREESCALE GPMI NAND driver

10y ago

Linus Torvalds

06b74c65

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

10y ago

Toshi Kani

f4eafd8b

x86/mm: Fix vmalloc_fault() to handle large pages properly

A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges was allocated by ioremap() on a x86_64
system:

BUG: unable to handle kernel paging request at ffff880840000ff8
IP: vmalloc_fault+0x1be/0x300
PGD c7f03a067 PUD 0
Oops: 0000 [#1] SM
Call Trace:
__do_page_fault+0x285/0x3e0
do_page_fault+0x2f/0x80
? put_prev_entity+0x35/0x7a0
page_fault+0x28/0x30
? memcpy_erms+0x6/0x10
? schedule+0x35/0x80
? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
? schedule_timeout+0x183/0x240
btt_log_read+0x63/0x140 [nd_btt]
:
? __symbol_put+0x60/0x60
? kernel_read+0x50/0x80
SyS_finit_module+0xb9/0xf0
entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings in
x86_64 and PAE. vmalloc_fault() however assumes that the vmalloc
range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges since
ioremap() sets up the kernel page tables, which are shared by
user processes. pgd_ctor() sets the kernel's PGD entries to
user's during fork(). When allocation of the vmalloc ranges
crosses a 512GB boundary, ioremap() allocates a new pud table
and updates the kernel PGD entry to point it. If user process's
PGD entry does not have this update yet, a read/write syscall
to the range will cause a vmalloc fault, which hits the Oops
above as it does not handle a large page properly.

Following changes are made to vmalloc_fault().

64-bit:

- No change for the PGD sync operation as it handles large
pages already.
- Add pud_huge() and pmd_huge() to the validation code to
handle large pages.
- Change pud_page_vaddr() to pud_pfn() since an ioremap range
is not directly mapped (while the if-statement still works
with a bogus addr).
- Change pmd_page() to pmd_pfn() since an ioremap range is not
backed by struct page (while the if-statement still works
with a bogus addr).

32-bit:
- No change for the sync operation since the index3 PGD entry
covers the entire vmalloc range, which is always valid.
(A separate change to sync PGD entry is necessary if this
memory layout is changed regardless of the page size.)
- Add pmd_huge() to the validation code to handle large pages.
This is for completeness since vmalloc_fault() won't happen
in ioremap'd ranges as its PGD entry is always valid.

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

10y ago

Andy Lutomirski

f363938c

x86/fpu: Fix 'no387' regression

9y ago

Linus Torvalds

718e47a5

Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4

9y ago

Mark Brown

3ee20abb

Merge remote-tracking branch 'spi/fix/rockchip' into spi-linus

9y ago

Christian Borntraeger

7a76aa95

s390/cpumf: Fix lpp detection

9y ago

David Hildenbrand

9522b37f

KVM: s390: correct fprs on SIGP (STOP AND) STORE STATUS

9y ago

Linus Torvalds

dd273a80

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm

9y ago

Arnd Bergmann

f3c87e99

Merge tag 'renesas-dt-fixes2-for-v4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/horms/renesas into fixes

10y ago

Linus Torvalds

95f41fb2

Merge tag 'media/v4.5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media

9y ago

Thomas Petazzoni

d7d5a43c

ARM: mvebu: fix overlap of Crypto SRAM with PCIe memory window

When the Crypto SRAM mappings were added to the Device Tree files
describing the Armada XP boards in commit c466d997bb16 ("ARM: mvebu:
define crypto SRAM ranges for all armada-xp boards"), the fact that
those mappings were overlaping with the PCIe memory aperture was
overlooked. Due to this, we currently have for all Armada XP platforms
a situation that looks like this:

Memory mapping on Armada XP boards with internal registers at
0xf1000000:

- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf8000000 -> 0xffe0000 126M PCIe memory aperture
- 0xf8100000 -> 0xf8110000 64KB Crypto SRAM #0 => OVERLAPS WITH PCIE !
- 0xf8110000 -> 0xf8120000 64KB Crypto SRAM #1 => OVERLAPS WITH PCIE !
- 0xffe00000 -> 0xfff00000 1M PCIe I/O aperture
- 0xfff0000 -> 0xffffffff 1M BootROM

The overlap means that when PCIe devices are added, depending on their
memory window needs, they might or might not be mapped into the
physical address space. Indeed, they will not be mapped if the area
allocated in the PCIe memory aperture by the PCI core overlaps with
one of the Crypto SRAM. Typically, a Intel IGB PCIe NIC that needs 8MB
of PCIe memory will see its PCIe memory window allocated from
0xf80000000 for 8MB, which overlaps with the Crypto SRAM windows. Due
to this, the PCIe window is not created, and any attempt to access the
PCIe window makes the kernel explode:

[ 3.302213] igb: Copyright (c) 2007-2014 Intel Corporation.
[ 3.307841] pci 0000:00:09.0: enabling device (0140 -> 0143)
[ 3.313539] mvebu_mbus: cannot add window '4:f8', conflicts with another window
[ 3.320870] mvebu-pcie soc:pcie-controller: Could not create MBus window at [mem 0xf8000000-0xf87fffff]: -22
[ 3.330811] Unhandled fault: external abort on non-linefetch (0x1008) at 0xf08c0018

This problem does not occur on Armada 370 boards, because we use the
following memory mapping (for boards that have internal registers at
0xf1000000):

- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0 => OK !
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000 -> 0xffffffff 1M BootROM

Obviously, the solution is to align the location of the Crypto SRAM
mappings of Armada XP to be similar with the ones on Armada 370, i.e
have them between the "internal registers" area and the beginning of
the PCIe aperture.

However, we have a special case with the OpenBlocks AX3-4 platform,
which has a 128 MB NOR flash. Currently, this NOR flash is mapped from
0xf0000000 to 0xf8000000. This is possible because on OpenBlocks
AX3-4, the internal registers are not at 0xf1000000. And this explains
why the Crypto SRAM mappings were not configured at the same place on
Armada XP.

Hence, the solution is two-fold:

(1) Move the NOR flash mapping on Armada XP OpenBlocks AX3-4 from
0xe8000000 to 0xf0000000. This frees the 0xf0000000 ->
0xf80000000 space.

(2) Move the Crypto SRAM mappings on Armada XP to be similar to
Armada 370 (except of course that Armada XP has two Crypto SRAM
and not one).

After this patch, the memory mapping on Armada XP boards with
registers at 0xf1 is:

- 0x00000000 -> 0xf0000000 3.75G RAM
- 0xf0000000 -> 0xf1000000 16M NOR flashes (AXP GP / AXP DB)
- 0xf1000000 -> 0xf1100000 1M internal registers
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0
- 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000 -> 0xffffffff 1M BootROM

And the memory mapping for the special case of the OpenBlocks AX3-4
(internal registers at 0xd0000000, NOR of 128 MB):

- 0x00000000 -> 0xc0000000 3G RAM
- 0xd0000000 -> 0xd1000000 1M internal registers
- 0xe800000 -> 0xf0000000 128M NOR flash
- 0xf1100000 -> 0xf1110000 64KB Crypto SRAM #0
- 0xf1110000 -> 0xf1120000 64KB Crypto SRAM #1
- 0xf8000000 -> 0xffe0000 126M PCIe memory
- 0xffe00000 -> 0xfff00000 1M PCIe I/O
- 0xfff0000 -> 0xffffffff 1M BootROM

Fixes: c466d997bb16 ("ARM: mvebu: define crypto SRAM ranges for all armada-xp boards")
Reported-by: Phil Sutter <phil@nwl.cc>
Cc: Phil Sutter <phil@nwl.cc>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Olof Johansson <olof@lixom.net>

9y ago

David Woodhouse

be629c62

Fix directory hardlinks from deleted directories

10y ago

Linus Torvalds

e6a1c1e9

Merge tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

10y ago

Thomas Gleixner

059fcd8c

perf/core: Plug potential memory leak in CPU_UP_PREPARE

10y ago

Michael S. Tsirkin

4e7f9df2

hpet: Drop stale URLs

10y ago

Linus Torvalds

e2857b8f

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix ordering of WEXT netlink messages so we don't see a newlink
after a dellink, from Johannes Berg.

2) Out of bounds access in minstrel_ht_set_best_prob_rage, from
Konstantin Khlebnikov.

3) Paging buffer memory leak in iwlwifi, from Matti Gottlieb.

4) Wrong units used to set initial TCP rto from cached metrics, also
from Konstantin Khlebnikov.

5) Fix stale IP options data in the SKB control block from leaking
through layers of encapsulation, from Bernie Harris.

6) Zero padding len miscalculated in bnxt_en, from Michael Chan.

7) Only CHECKSUM_PARTIAL packets should be passed down through GSO, fix
from Hannes Frederic Sowa.

8) Fix suspend/resume with JME networking devices, from Diego Violat
and Guo-Fu Tseng.

9) Checksums not validated properly in bridge multicast support due to
the placement of the SKB header pointers at the time of the check,
fix from Álvaro Fernández Rojas.

10) Fix hang/tiemout with r8169 if a stats fetch is done while the
device is runtime suspended. From Chun-Hao Lin.

11) The forwarding database netlink dump facilities don't track the
state of the dump properly, resulting in skipped/missed entries.
From Minoura Makoto.

12) Fix regression from a recent 3c59x bug fix, from Neil Horman.

13) Fix list corruption in bna driver, from Ivan Vecera.

14) Big endian machines crash on vlan add in bnx2x, fix from Michal
Schmidt.

15) Ethtool RSS configuration not propagated properly in mlx5 driver,
from Tariq Toukan.

16) Fix regression in PHY probing in stmmac driver, from Gabriel
Fernandez.

17) Fix SKB tailroom calculation in igmp/mld code, from Benjamin
Poirier.

18) A past change to skip empty routing headers in ipv6 extention header
parsing accidently caused fragment headers to not be matched any
longer. Fix from Florian Westphal.

19) eTSEC-106 erratum needs to be applied to more gianfar chips, from
Atsushi Nemoto.

20) Fix netdev reference after free via workqueues in usb networking
drivers, from Oliver Neukum and Bjørn Mork.

21) mdio->irq is now an array rather than a pointer to dynamic memory,
but several drivers were still trying to free it :-/ Fixes from
Colin Ian King.

22) act_ipt iptables action forgets to set the family field, thus LOG
netfilter targets don't work with it. Fix from Phil Sutter.

23) SKB leak in ibmveth when skb_linearize() fails, from Thomas Falcon.

24) pskb_may_pull() cannot be called with interrupts disabled, fix code
that tries to do this in vmxnet3 driver, from Neil Horman.

25) be2net driver leaks iomap'd memory on removal, fix from Douglas
Miller.

26) Forgotton RTNL mutex unlock in ppp_create_interface() error paths,
from Guillaume Nault.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (97 commits)
ppp: release rtnl mutex when interface creation fails
cdc_ncm: do not call usbnet_link_change from cdc_ncm_bind
tcp: fix tcpi_segs_in after connection establishment
net: hns: fix the bug about loopback
jme: Fix device PM wakeup API usage
jme: Do not enable NIC WoL functions on S0
udp6: fix UDP/IPv6 encap resubmit path
be2net: Don't leak iomapped memory on removal.
vmxnet3: avoid calling pskb_may_pull with interrupts disabled
net: ethernet: Add missing MFD_SYSCON dependency on HAS_IOMEM
ibmveth: check return of skb_linearize in ibmveth_start_xmit
cdc_ncm: toggle altsetting to force reset before setup
usbnet: cleanup after bind() in probe()
mlxsw: pci: Correctly determine if descriptor queue is full
mlxsw: spectrum: Always decrement bridge's ref count
tipc: fix nullptr crash during subscription cancel
net: eth: altera: do not free array priv->mdio->irq
net/ethoc: do not free array priv->mdio->irq
net: sched: fix act_ipt for LOG target
asix: do not free array priv->mdio->irq
...