x86/mm: Fix vmalloc_fault() to handle large pages properly

A kernel page fault oops with the callstack below was observed
when a read syscall was made to a pmem device after a huge amount
(>512GB) of vmalloc ranges had been allocated by ioremap() on an
x86_64 system:

    BUG: unable to handle kernel paging request at ffff880840000ff8
    IP: vmalloc_fault+0x1be/0x300
    PGD c7f03a067 PUD 0
    Oops: 0000 [#1] SM
    Call Trace:
    __do_page_fault+0x285/0x3e0
    do_page_fault+0x2f/0x80
    ? put_prev_entity+0x35/0x7a0
    page_fault+0x28/0x30
    ? memcpy_erms+0x6/0x10
    ? schedule+0x35/0x80
    ? pmem_rw_bytes+0x6a/0x190 [nd_pmem]
    ? schedule_timeout+0x183/0x240
    btt_log_read+0x63/0x140 [nd_btt]
     :
    ? __symbol_put+0x60/0x60
    ? kernel_read+0x50/0x80
    SyS_finit_module+0xb9/0xf0
    entry_SYSCALL_64_fastpath+0x1a/0xa4

Since v4.1, ioremap() supports large page (pud/pmd) mappings on
x86_64 and PAE. vmalloc_fault(), however, assumes that the vmalloc
range is limited to pte mappings.

vmalloc faults do not normally happen in ioremap'd ranges, since
ioremap() sets up the kernel page tables, which are shared by
user processes: pgd_ctor() copies the kernel's PGD entries into a
new process's PGD during fork(). When allocation of the vmalloc
ranges crosses a 512GB boundary, ioremap() allocates a new pud
table and updates the kernel PGD entry to point to it. If a user
process's PGD entry has not received this update yet, a read/write
syscall to the range causes a vmalloc fault, which hits the Oops
above because vmalloc_fault() does not handle large pages properly.
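To make the 512GB figure concrete: with x86_64 4-level paging, each PGD entry translates 2^39 bytes (PGDIR_SHIFT = 39, 512 entries), so an allocation that grows past a 512GB-aligned boundary needs a PGD slot that may still have been empty when a process's PGD was copied at fork(). The sketch below computes PGD slots for that layout. The helper names are hypothetical, and 0xffffc90000000000 is only assumed here as the 4.5-era x86_64 VMALLOC_START for illustration.

```c
#include <stdint.h>

/* x86_64 4-level paging: 512 PGD entries, each covering 2^39 bytes. */
#define PGDIR_SHIFT  39
#define PTRS_PER_PGD 512

/* Hypothetical helper: PGD slot that translates a virtual address. */
static unsigned pgd_index64(uint64_t addr)
{
    return (unsigned)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/* Bytes covered by one PGD entry: 2^39 = 512 GiB. */
static uint64_t pgd_span64(void)
{
    return UINT64_C(1) << PGDIR_SHIFT;
}

/* Hypothetical helper: nonzero if [start, start + size) spans more than
 * one PGD slot, i.e. it crosses a 512 GiB boundary and may need a new
 * pud table hung off a PGD entry that user PGDs have not copied.
 * Precondition: size > 0. */
static int crosses_pgd_boundary(uint64_t start, uint64_t size)
{
    return pgd_index64(start) != pgd_index64(start + size - 1);
}
```

With the assumed start address, a 512GiB range still fits in one slot (index 402), while anything larger spills into the next slot.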

The following changes are made to vmalloc_fault():

64-bit:
- No change for the PGD sync operation, as it handles large
  pages already.
- Add pud_huge() and pmd_huge() to the validation code to
  handle large pages.
- Change pud_page_vaddr() to pud_pfn(), since an ioremap range
  is not directly mapped (while the if-statement still works
  with a bogus addr).
- Change pmd_page() to pmd_pfn(), since an ioremap range is not
  backed by struct page (while the if-statement still works
  with a bogus addr).
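The validation change can be pictured with a toy page-table walk. This is a hypothetical user-space model, not the kernel's code: `struct entry`, `validate_mapping()` and the `huge` flag are made up for illustration, and the real vmalloc_fault() additionally compares each level against init_mm's reference table and calls BUG() on a mismatch. What it shows is the ordering: once a pud or pmd entry maps a large page, the walk stops there instead of treating that entry as a pointer to a lower-level table.

```c
#include <stddef.h>

/* Hypothetical, simplified page-table entry; NOT the real x86 layout. */
struct entry {
    int present;               /* like !pud_none() / !pmd_none() */
    int huge;                  /* like pud_huge() / pmd_huge(): maps memory directly */
    const struct entry *table; /* next-level table; NULL for huge entries */
};

/* Returns 0 if the address is mapped, -1 if the walk hits a hole.
 * Checks for a large page at each level before descending. */
static int validate_mapping(const struct entry *pud, size_t pmd_idx, size_t pte_idx)
{
    if (!pud->present)
        return -1;
    if (pud->huge)             /* 1 GiB mapping: no pmd table below */
        return 0;

    const struct entry *pmd = &pud->table[pmd_idx];
    if (!pmd->present)
        return -1;
    if (pmd->huge)             /* 2 MiB mapping: no pte table below */
        return 0;

    const struct entry *pte = &pmd->table[pte_idx];
    return pte->present ? 0 : -1;
}
```

In this model, dropping the `huge` checks would make the walk index through a NULL `table` pointer, loosely analogous to the bogus address the unfixed code ended up reading.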

32-bit:
- No change for the sync operation, since the index3 PGD entry
  covers the entire vmalloc range, which is always valid.
  (A separate change to sync the PGD entry would be necessary if
  this memory layout changed, regardless of the page size.)
- Add pmd_huge() to the validation code to handle large pages.
  This is for completeness, since vmalloc_fault() won't happen
  in ioremap'd ranges as its PGD entry is always valid.
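For the 32-bit case, a sketch of why a single PGD entry suffices: with PAE the top-level table has only four entries, each covering 1GB (PGDIR_SHIFT = 30). Assuming the default 3G/1G user/kernel split (PAGE_OFFSET = 0xC0000000), every kernel virtual address, vmalloc included, resolves through index 3. The helper names and the split are assumptions for illustration; other VMSPLIT configurations move the boundary.

```c
#include <stdint.h>

/* 32-bit PAE: the top-level table (PDPT) has 4 entries of 1 GiB each. */
#define PAE_PGDIR_SHIFT  30
#define PAE_PTRS_PER_PGD 4

/* Assumed default 3G/1G split; other VMSPLIT configs change this. */
#define ASSUMED_PAGE_OFFSET UINT32_C(0xC0000000)

/* Hypothetical helper: top-level slot that translates an address. */
static unsigned pgd_index_pae(uint32_t addr)
{
    return (addr >> PAE_PGDIR_SHIFT) & (PAE_PTRS_PER_PGD - 1);
}

/* Hypothetical helper: nonzero if start..end share one top-level slot. */
static int single_pgd_slot(uint32_t start, uint32_t end)
{
    return pgd_index_pae(start) == pgd_index_pae(end);
}
```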

Reported-by: Henning Schild <henning.schild@siemens.com>
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Acked-by: Borislav Petkov <bp@alien8.de>
Cc: <stable@vger.kernel.org> # 4.1+
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Luis R. Rodriguez <mcgrof@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Toshi Kani <toshi.kani@hp.com>
Cc: linux-mm@kvack.org
Cc: linux-nvdimm@lists.01.org
Link: http://lkml.kernel.org/r/1455758214-24623-1-git-send-email-toshi.kani@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>

e6a1c1e9 Merge tag 'powerpc-4.5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux (Linus Torvalds)
059fcd8c perf/core: Plug potential memory leak in CPU_UP_PREPARE (Thomas Gleixner)
4e7f9df2 hpet: Drop stale URLs (Michael S. Tsirkin)
da6b7366 Merge tag 'dmaengine-fix-4.5-rc5' of git://git.infradead.org/users/vkoul/slave-dma (Linus Torvalds)
6ecad912 powerpc/ioda: Set "read" permission when "write" is set (Alexey Kardashevskiy)
27ca9236 perf/core: Remove the bogus and dangerous CPU_DOWN_FAILED hotplug state (Thomas Gleixner)
a82eee74 x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() (Toshi Kani)
37aa4dac Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux (Linus Torvalds)
ee1cdcda dmaengine: dw: disable BLOCK IRQs for non-cyclic xfer (Andy Shevchenko)
c777e2a8 powerpc/mm: Fix Multi hit ERAT cause by recent THP update (Aneesh Kumar K.V)
b4f75d44 perf/core: Remove bogus UP_CANCELED hotplug state (Thomas Gleixner)
ee9737c9 x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable (Toshi Kani)
a703f42d Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux (Linus Torvalds)
4462b4bb clk: gpio: Really allow an optional clock= DT property (Stephen Boyd)
4ac31d18 dmaengine: edma: fix residue race for cyclic (John Ogness)
1bc74f1c powerpc/powernv: Fix stale PE primary bus (Gavin Shan)
8bc9162c perf/x86/amd/uncore: Plug reference leak (Thomas Gleixner)
02a5f765 Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent (Ingo Molnar)
59ceeaaf kernel/resource.c: fix muxed resource handling in __request_region() (Simon Guinot)
4fbbed46 drm/nouveau: use post-decrement in error handling (Rasmus Villemoes)
c430daf9 Revert "clk: qcom: Specify LE device endianness" (Stephen Boyd)
3efaf2a9 dmaengine: dw: pci: add ID for WildcatPoint PCH (Andy Shevchenko)
05ba75f8 powerpc/eeh: Fix stale cached primary bus (Gavin Shan)
65c23c65 Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 (Linus Torvalds)
4682c211 Merge tag 'efi-urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent (Ingo Molnar)
a6807590 lib/ucs2_string: Correct ucs2 -> utf8 conversion (Jason Andryuk)
020ecbba Merge tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 (Linus Torvalds)
5fff80bb drm/atomic: Allow for holes in connector state, v2. (Maarten Lankhorst)
df9cd564 clk: versatile: mask VCO bits before writing (Linus Walleij)
8a695db0 dmaengine: IOATDMA: fix timer code that continues to restart channels during idle (Dave Jiang)
126df08c powerpc/pseries: Don't trace hcalls on offline CPUs (Denis Kirjanov)
d82834ee Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm (Linus Torvalds)
4b550af5 cifs: fix erroneous return value (Anton Protopopov)
1926e54f MAINTAINERS: Update mailing list for Renesas ARM64 SoC Development (Simon Horman)
ed8b0de5 efi: Make efivarfs entries immutable by default (Peter Jones)
e246eb56 efi: Add pstore variables to the deletion whitelist (Matt Fleming)
ce6b7143 Merge branch 'for-linus-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs (Linus Torvalds)
74dae427 ext4: fix crashes in dioread_nolock mode (Jan Kara)
5441ea11 Merge tag 'drm-vc4-fixes-2016-02-17' of github.com:anholt/linux into drm-fixes (Dave Airlie)
0e954fea Merge tag 'tegra-for-4.5-clk-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tegra/linux into clk-fixes (Stephen Boyd)
92e963f5 Linux 4.5-rc1 (tag: v4.5-rc1) (Linus Torvalds)
f15838e9 powerpc: Fix dedotify for binutils >= 2.26 (Andreas Schwab)
87bbcfde Merge tag 'for-linus-20160216' of git://git.infradead.org/intel-iommu (Linus Torvalds)
c53d7a84 Merge tag 'kvm-arm-for-4.5-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into kvm-master (Paolo Bonzini)
f34d69c3 cifs: fix potential overflow in cifs_compose_mount_options (Insu Yun)
631c0e84 Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux (Linus Torvalds)
8282f5d9 efi: Make our variable validation list include the guid (Peter Jones)
87d9ac71 Merge branch 'akpm' (patches from Andrew) (Linus Torvalds)
413eddc6 Merge branch 'for-chris-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/fdmanana/linux into for-linus-4.5 (Chris Mason)
ed8ad838 ext4: fix bh->b_state corruption (Jan Kara)
aaa7dd2c Merge branch 'drm-fixes-4.5' of git://people.freedesktop.org/~agd5f/linux into drm-fixes (Dave Airlie)
36cb6253 drm/vc4: Use runtime PM to power cycle the device when the GPU hangs. (Eric Anholt)
60c7e2d2 Merge tag 'v4.5-rockchip-clkfixes1' of git://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into clk-fixes (Stephen Boyd)
5a1d5eff clk: tegra: super: Fix sparse warnings for functions not declared as static (Jon Hunter)
e2464688 Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus (Linus Torvalds)
19f97c98 powerpc/book3s_32: Fix build error with checkpoint restart (Aneesh Kumar K.V)

81f70ba2 Linux 4.5-rc5 (tag: v4.5-rc5) (Linus Torvalds)
0389075e Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds)
06b74c65 Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds)
f4eafd8b x86/mm: Fix vmalloc_fault() to handle large pages properly (Toshi Kani)

powerpc/mm: Fix Multi hit ERAT cause by recent THP update

With ppc64 we use the deposited pgtable_t to store the hash pte slot
information. We should not withdraw the deposited pgtable_t without
marking the pmd none. This ensures that the low-level hash fault
handling skips this huge pte and that we handle it at the upper levels.

A recent change to pmd splitting changed the above in order to handle
the race between pmd split and exit_mmap. The race is explained below.

Consider the following race:

        CPU0                                CPU1
shrink_page_list()
  add_to_swap()
    split_huge_page_to_list()
      __split_huge_pmd_locked()
        pmdp_huge_clear_flush_notify()
        // pmd_none() == true
                                    exit_mmap()
                                      unmap_vmas()
                                        zap_pmd_range()
                                        // no action on pmd since pmd_none() == true
        pmd_populate()

As a result, the THP will not be freed. The leak is detected by check_mm():

BUG: Bad rss-counter state mm:ffff880058d2e580 idx:1 val:512

The above required us not to mark the pmd none during a pmd split.

The fix for ppc is to clear _PAGE_USER from the huge pte, so that the
low-level fault handling code skips this pte. At the higher level we
do take the ptl lock, which should serialize us against the pmd split.
Once the lock is acquired, we check the pmd again using pmd_same().
That should always return false for us, and hence we should retry the
access. We do the pmd_same() check in all cases after taking the ptl
with THP (do_huge_pmd_wp_page, do_huge_pmd_numa_page and
huge_pmd_set_accessed).
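The lock-then-recheck pattern described above (read the entry without the lock, take the ptl, then confirm with pmd_same() that nothing changed before acting) can be sketched in user space. This is a hypothetical model rather than the powerpc code: a pthread mutex stands in for the ptl, a plain uint64_t for the pmd, and `handle_huge_fault()` and its "accessed" bit are made up for illustration.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical stand-in for a pmd slot guarded by its page-table lock. */
struct pmd_slot {
    pthread_mutex_t ptl; /* plays the role of the ptl */
    uint64_t val;        /* plays the role of the pmd value */
};

enum fault_result { FAULT_HANDLED, FAULT_RETRY };

/* Made-up bit, used only to show that the entry was updated. */
#define HYPOTHETICAL_ACCESSED_BIT UINT64_C(0x1)

/* 'orig' is the value read locklessly before taking the lock. */
static enum fault_result handle_huge_fault(struct pmd_slot *slot, uint64_t orig)
{
    enum fault_result res;

    pthread_mutex_lock(&slot->ptl);
    if (slot->val != orig) {
        /* Like !pmd_same(): the entry changed under us (e.g. a split). */
        res = FAULT_RETRY;
    } else {
        /* Entry is stable while the lock is held; safe to update it. */
        slot->val |= HYPOTHETICAL_ACCESSED_BIT;
        res = FAULT_HANDLED;
    }
    pthread_mutex_unlock(&slot->ptl);
    return res;
}
```

If a concurrent split rewrites the entry between the lockless read and the lock, the check fails and the caller simply retries the access instead of acting on a stale entry.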

Also make sure we wait for irq-disabled sections on other CPUs to finish
before replacing a huge pte entry with a regular pmd entry. Code paths
like find_linux_pte_or_hugepte depend on irqs being disabled to get
a stable pte_t pointer. A parallel THP split needs to make sure we
don't convert a pmd pte to a regular pmd entry without waiting for the
irq-disabled section to finish.

Fixes: eef1b3ba053a ("thp: implement split_huge_pmd()")
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

c777e2a8

Aneesh Kumar K.V

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

Pull MIPS updates from Ralf Baechle:
"This is the main pull request for MIPS for 4.5 plus some 4.4 fixes.

The executive summary:

- ATH79 platform improvements, use DT bindings for the ATH79 USB PHY.
- Avoid useless rebuilds for zboot.
- jz4780: Add NEMC, BCH and NAND device tree nodes.
- Initial support for the MicroChip's DT platform. As all the device
  drivers are missing, this is still of limited use.
- Some Loongson3 cleanups.
- The unavoidable whitespace polishing.
- Reduce clock skew when synchronizing the CPU cycle counters on CPU
  startup.
- Add MIPS R6 fixes.
- Lots of cleanups across arch/mips as fallout from KVM.
- Lots of minor fixes and changes for IEEE 754-2008 support to the
  FPU emulator / fp-assist software.
- Minor Ralink, BCM47xx and bcm963xx platform support improvements.
- Support SMP on BCM63168"

* 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus: (84 commits)
MIPS: zboot: Add support for serial debug using the PROM
MIPS: zboot: Avoid useless rebuilds
MIPS: BMIPS: Enable ARCH_WANT_OPTIONAL_GPIOLIB
MIPS: bcm63xx: nvram: Remove unused bcm63xx_nvram_get_psi_size() function
MIPS: bcm963xx: Update bcm_tag field image_sequence
MIPS: bcm963xx: Move extended flash address to bcm_tag header file
MIPS: bcm963xx: Move Broadcom BCM963xx image tag data structure
MIPS: bcm63xx: nvram: Use nvram structure definition from header file
MIPS: bcm963xx: Add Broadcom BCM963xx board nvram data structure
MAINTAINERS: Add KVM for MIPS entry
MIPS: KVM: Add missing newline to kvm_err()
MIPS: Move KVM specific opcodes into asm/inst.h
MIPS: KVM: Use cacheops.h definitions
MIPS: Break down cacheops.h definitions
MIPS: Use EXCCODE_ constants with set_except_vector()
MIPS: Update trap codes
MIPS: Move Cause.ExcCode trap codes to mipsregs.h
MIPS: KVM: Make kvm_mips_{init,exit}() static
MIPS: KVM: Refactor added offsetof()s
MIPS: KVM: Convert EXPORT_SYMBOL to _GPL
...

e2464688

Linus Torvalds
