commits

mpe: On 64-bit Book3E vmalloc space starts at 0x8000000000000000.

Because of the way __pa() works we have:
__pa(0x8000000000000000) == 0, and therefore
virt_to_pfn(0x8000000000000000) == 0, and therefore
virt_addr_valid(0x8000000000000000) == true

That is wrong: virt_addr_valid() should be false for vmalloc space.
In fact all vmalloc addresses that alias with a valid PFN will return
true from virt_addr_valid(). That can cause bugs with hardened usercopy
as described below by Kefeng Wang:

When running ethtool eth0 on 64-bit Book3E, a BUG occurred:

usercopy: Kernel memory exposure attempt detected from SLUB object not in SLUB page?! (offset 0, size 1048)!
kernel BUG at mm/usercopy.c:99
...
usercopy_abort+0x64/0xa0 (unreliable)
__check_heap_object+0x168/0x190
__check_object_size+0x1a0/0x200
dev_ethtool+0x2494/0x2b20
dev_ioctl+0x5d0/0x770
sock_do_ioctl+0xf0/0x1d0
sock_ioctl+0x3ec/0x5a0
__se_sys_ioctl+0xf0/0x160
system_call_exception+0xfc/0x1f0
system_call_common+0xf8/0x200

The relevant code is shown below:

data = vzalloc(array_size(gstrings.len, ETH_GSTRING_LEN));
copy_to_user(useraddr, data, gstrings.len * ETH_GSTRING_LEN);

The data is allocated by vzalloc(), yet virt_addr_valid(ptr) returns
true on 64-bit Book3E, which leads to the panic.

As commit 4dd7554a6456 ("powerpc/64: Add VIRTUAL_BUG_ON checks for __va
and __pa addresses") does, make sure the virtual address is above
PAGE_OFFSET in virt_addr_valid() for 64-bit, and also add an upper-limit
check to make sure the virtual address is below high_memory.

Meanwhile, on 32-bit PAGE_OFFSET is the virtual address of the start
of lowmem and high_memory is the upper bound of the lowmem virtual
addresses, so the check is suitable for 32-bit as well. This will also
fix the issue mentioned in commit 602946ec2f90 ("powerpc: Set max_mapnr
correctly").

On 32-bit there is a similar problem with high memory, that was fixed in
commit 602946ec2f90 ("powerpc: Set max_mapnr correctly"), but that
commit breaks highmem and needs to be reverted.

We can't easily fix __pa() because we have code that relies on its
current behaviour. So for now add extra checks to virt_addr_valid().

For 64-bit Book3S the extra checks are not necessary, because the
combination of virt_to_pfn() and pfn_valid() should already yield the
correct result, but they are harmless.
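
The aliasing described above can be reproduced in a small user-space
model. This is only a sketch, not the kernel code: the model_* names, the
1 GiB RAM size, the 4 KiB page size and the __pa() mask value are
assumptions made for illustration. It contrasts a check that looks only
at the PFN with one that first bounds the address between PAGE_OFFSET and
high_memory, as the commit message describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants only; real values depend on the kernel config. */
#define MODEL_PAGE_SHIFT   12
#define MODEL_PAGE_OFFSET  0xc000000000000000ULL /* start of the linear map */
#define MODEL_VMALLOC_BASE 0x8000000000000000ULL /* Book3E vmalloc start */
#define MODEL_PA_MASK      0x0fffffffffffffffULL /* assumed __pa() mask */

/* Hypothetical machine: 1 GiB of RAM starting at physical address 0. */
static const uint64_t model_max_pfn = (1ULL << 30) >> MODEL_PAGE_SHIFT;
static const uint64_t model_high_memory = MODEL_PAGE_OFFSET + (1ULL << 30);

/* Model of __pa(): strips the top bits, so distinct regions can alias. */
static uint64_t model_pa(uint64_t vaddr)
{
    return vaddr & MODEL_PA_MASK;
}

static uint64_t model_virt_to_pfn(uint64_t vaddr)
{
    return model_pa(vaddr) >> MODEL_PAGE_SHIFT;
}

static bool model_pfn_valid(uint64_t pfn)
{
    return pfn < model_max_pfn;
}

/* PFN-only check: a vmalloc address that aliases a valid PFN passes. */
static bool virt_addr_valid_old(uint64_t vaddr)
{
    return model_pfn_valid(model_virt_to_pfn(vaddr));
}

/* Bounded check: the address must lie in [PAGE_OFFSET, high_memory). */
static bool virt_addr_valid_new(uint64_t vaddr)
{
    return vaddr >= MODEL_PAGE_OFFSET && vaddr < model_high_memory &&
           model_pfn_valid(model_virt_to_pfn(vaddr));
}

/* Checks the properties stated in the commit message against the model. */
static void model_self_check(void)
{
    assert(model_pa(MODEL_VMALLOC_BASE) == 0);
    assert(virt_addr_valid_old(MODEL_VMALLOC_BASE));  /* the bug */
    assert(!virt_addr_valid_new(MODEL_VMALLOC_BASE)); /* rejected */
    assert(virt_addr_valid_new(MODEL_PAGE_OFFSET));   /* linear map ok */
    assert(!virt_addr_valid_new(model_high_memory));  /* upper bound */
}
```

Calling model_self_check() from a test harness exercises both variants;
the range test runs before the PFN lookup, so addresses outside the
linear map never reach the aliasing __pa() model.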

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
[mpe: Add additional change log detail]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220406145802.538416-1-mpe@ellerman.id.au

3y ago

Linus Torvalds

e235f419

Merge tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

3y ago

Thomas Gleixner

d6d6d50f

x86/fpu/xstate: Consolidate size calculations

3y ago

Stephen Boyd

cf683abd

Merge branches 'clk-sifive' and 'clk-visconti' into clk-next

3y ago

Steven Rostedt (Google)

fcbf591c

tracing: Set user_events to BROKEN

3y ago

Linus Torvalds

b51f86e9

Merge tag 'perf_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

3y ago

Reto Buerki

59b18a1e

x86/msi: Fix msi message data shadow struct

3y ago

Andre Przywara

544808f7

irqchip/gic, gic-v3: Prevent GSI to SGI translations

3y ago

Michael Ellerman

7f921a2d

KVM: PPC: Move kvmhv_on_pseries() into kvm_ppc.h

3y ago

Linus Torvalds

63d12cc3

Merge tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping

3y ago

Thomas Gleixner

7dd5ad2d

Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"

3y ago

Thomas Gleixner

781c64bf

x86/fpu/xstate: Handle supervisor states in XSTATE permissions

3y ago

Stephen Boyd

c64dd8ea

Merge branches 'clk-range', 'clk-uniphier', 'clk-apple' and 'clk-qcom' into clk-next

- Make clk_set_rate_range() re-evaluate the limits each time
- Introduce various clk_set_rate_range() tests
- Add clk_drop_range() to drop a previously set range
- Support for NCO blocks on Apple SoCs

* clk-range:
clk: Drop the rate range on clk_put()
clk: test: Test clk_set_rate_range on orphan mux
clk: Initialize orphan req_rate
clk: bcm: rpi: Run some clocks at the minimum rate allowed
clk: bcm: rpi: Set a default minimum rate
clk: bcm: rpi: Add variant structure
clk: Add clk_drop_range
clk: Always set the rate on clk_set_range_rate
clk: Use clamp instead of open-coding our own
clk: Always clamp the rounded rate
clk: Enforce that disjoints limits are invalid
clk: Introduce Kunit Tests for the framework
clk: Fix clk_hw_get_clk() when dev is NULL

* clk-uniphier:
clk: uniphier: Fix fixed-rate initialization

* clk-apple:
clk: clk-apple-nco: Allow and fix module building
MAINTAINERS: Add clk-apple-nco under ARM/APPLE MACHINE
clk: clk-apple-nco: Add driver for Apple NCO
dt-bindings: clock: Add Apple NCO

* clk-qcom: (61 commits)
clk: qcom: gcc-msm8994: Fix gpll4 width
dt-bindings: clock: fix dt_binding_check error for qcom,gcc-other.yaml
clk: qcom: Add display clock controller driver for SM6125
dt-bindings: clock: add QCOM SM6125 display clock bindings
clk: qcom: Fix sorting of SDX_GCC_65 in Makefile and Kconfig
clk: qcom: gcc: Add emac GDSC support for SM8150
clk: qcom: gcc: sm8150: Fix some identation issues
clk: qcom: gcc: Add UFS_CARD and UFS_PHY GDSCs for SM8150
clk: qcom: gcc: Add PCIe0 and PCIe1 GDSC for SM8150
clk: qcom: clk-rcg2: Update the frac table for pixel clock
clk: qcom: clk-rcg2: Update logic to calculate D value for RCG
clk: qcom: smd: Add missing MSM8998 RPM clocks
clk: qcom: smd: Add missing RPM clocks for msm8992/4
dt-bindings: clock: qcom: rpmcc: Add RPM Modem SubSystem (MSS) clocks
clk: qcom: gcc-ipq806x: add CryptoEngine resets
dt-bindings: reset: add ipq8064 ce5 resets
clk: qcom: gcc-ipq806x: add CryptoEngine clocks
dt-bindings: clock: add ipq8064 ce5 clk define
clk: qcom: gcc-ipq806x: add additional freq for sdc table
clk: qcom: clk-rcg: add clk_rcg_floor_ops ops
...

3y ago

Zong Li

5e916932

clk: sifive: Move all stuff into SoCs header files from C files

3y ago

Dan Carpenter

c5601e07

clk: visconti: prevent array overflow in visconti_clk_register_gates()

3y ago

Beau Belgrave

768c1e7f

tracing/user_events: Remove eBPF interfaces

3y ago

Linus Torvalds

50c94de6

Merge tag 'locking_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

3y ago

Chengming Zhou

e19cd0b6

perf/core: Always set cpuctx cgrp when enable cgroup event

3y ago

Nick Desaulniers

334865b2

x86/extable: Prefer local labels in .set directives

3y ago

Marc Zyngier

0df66645

irqchip/gic-v3: Fix GICR_CTLR.RWP polling

3y ago

Srikar Dronamraju

e4ff7759

powerpc/numa: Handle partially initialized numa nodes

With commit 09f49dca570a ("mm: handle uninitialized numa nodes
gracefully") NODE_DATA for even a memoryless/cpuless node is partially
initialized at boot time.

Before onlining the node, current Powerpc code checks for NODE_DATA to
be NULL. However since NODE_DATA is partially initialized, this check
will end up always being false.

This causes hotplugging a CPU to a memoryless/cpuless node to fail.
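
Why the NULL test stops working can be shown with a small user-space
model. This is a sketch under stated assumptions: every model_* name is
hypothetical, and the message above does not say which check replaces the
NULL test, so the explicit online flag used here is only one plausible
alternative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_NODES 8

/* Stand-in for per-node data; not the kernel's structure. */
struct model_pgdat {
    int node_id;
    bool fully_initialized;
};

static struct model_pgdat model_pgdat_storage[MODEL_MAX_NODES];
static struct model_pgdat *model_node_data[MODEL_MAX_NODES];
static bool model_node_online[MODEL_MAX_NODES];

/*
 * Boot-time setup: every possible node gets at least a partially
 * initialized entry, so the per-node pointer is never NULL afterwards.
 */
static void model_boot(int online_nid)
{
    for (int nid = 0; nid < MODEL_MAX_NODES; nid++) {
        model_pgdat_storage[nid].node_id = nid;
        model_pgdat_storage[nid].fully_initialized = (nid == online_nid);
        model_node_data[nid] = &model_pgdat_storage[nid];
        model_node_online[nid] = (nid == online_nid);
    }
}

/* Old-style test: is always false once model_boot() has run. */
static bool needs_online_by_null_check(int nid)
{
    return model_node_data[nid] == NULL;
}

/* Assumed alternative: consult explicit per-node online state. */
static bool needs_online_by_state(int nid)
{
    return !model_node_online[nid];
}

/* Node 4 is online at boot, node 5 is memoryless/cpuless. */
static void model_self_check(void)
{
    model_boot(4);
    assert(!needs_online_by_null_check(5)); /* never triggers onlining */
    assert(needs_online_by_state(5));       /* correctly asks for it */
    assert(!needs_online_by_state(4));
}
```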

Before adding CPUs:
$ numactl -H
available: 1 nodes (4)
node 4 cpus: 0 1 2 3 4 5 6 7
node 4 size: 97372 MB
node 4 free: 95545 MB
node distances:
node 4
4: 10

$ lparstat
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=1 mem=99709440 kB cpus=0 ent=1.00

%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
2.66 2.67 0.16 94.51 0.00 0.00 5.33 0.00 67749 0

After hotplugging 32 cores:
$ numactl -H
node 4 cpus: 0 1 2 3 4 5 6 7 120 121 122 123 124 125 126 127 128 129 130
131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148
149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166
167 168 169 170 171 172 173 174 175
node 4 size: 97372 MB
node 4 free: 93636 MB
node distances:
node 4
4: 10

$ lparstat
System Configuration
type=Dedicated mode=Capped smt=8 lcpu=33 mem=99709440 kB cpus=0 ent=33.00

%user %sys %wait %idle physc %entc lbusy app vcsw phint
----- ----- ----- ----- ----- ----- ----- ----- ----- -----
0.04 0.02 0.00 99.94 0.00 0.00 0.06 0.00 1128751 3

As we can see numactl is listing only 8 cores while lparstat is showing
33 cores.

Also dmesg is showing messages like:
[ 2261.318350 ] BUG: arch topology borken
[ 2261.318357 ] the DIE domain not a subset of the NODE domain

Fixes: 09f49dca570a ("mm: handle uninitialized numa nodes gracefully")
Reported-by: Geetika Moolchandani <Geetika.Moolchandani1@ibm.com>
Signed-off-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Oscar Salvador <osalvador@suse.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220330135123.1868197-1-srikar@linux.vnet.ibm.com

3y ago

Linus Torvalds

5dee8721

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

3y ago

Christoph Hellwig

4fe87e81

dma-mapping: move pgprot_decrypted out of dma_pgprot

3y ago

Zi Yan

787af64d

mm: page_alloc: validate buddy before check its migratetype.

3y ago

Thomas Gleixner

7aa5128b

x86/fpu/xsave: Handle compacted offsets correctly with supervisor states

3y ago

Stephen Boyd

4222744d

Merge branches 'clk-starfive', 'clk-ti', 'clk-terminate' and 'clk-cleanup' into clk-next

3y ago

Maxime Ripard

7dabfa2b

clk: Drop the rate range on clk_put()

3y ago

Kunihiko Hayashi

ca85a667

clk: uniphier: Fix fixed-rate initialization

3y ago

Martin Povišer

236541ac

clk: clk-apple-nco: Allow and fix module building

3y ago

Konrad Dybcio

71021db1

clk: qcom: gcc-msm8994: Fix gpll4 width

3y ago

Zong Li

24a4a29f

clk: sifive: Add SoCs prefix in each SoCs-dependent data

3y ago

Linus Torvalds

e783362e

Linux 5.17-rc1 v5.17-rc1

4y ago

Beau Belgrave

efe34e99

tracing/user_events: Hold event_mutex during dyn_event_add

3y ago

Linus Torvalds

7136849e

Merge tag 'sched_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

3y ago

Sebastian Andrzej Siewior

273ba85b

Revert "mm/page_alloc: mark pagesets as __maybe_unused"

3y ago

Chengming Zhou

96492a6c

perf/core: Fix perf_cgroup_switch()

There is a race problem that can trigger WARN_ON_ONCE(cpuctx->cgrp)
in perf_cgroup_switch().

CPU1                                              CPU2
perf_cgroup_sched_out(prev, next)
  cgrp1 = perf_cgroup_from_task(prev)
  cgrp2 = perf_cgroup_from_task(next)
  if (cgrp1 != cgrp2)
    perf_cgroup_switch(prev, PERF_CGROUP_SWOUT)
                                                  cgroup_migrate_execute()
                                                    task->cgroups = ?
                                                    perf_cgroup_attach()
                                                      task_function_call(task, __perf_cgroup_move)
perf_cgroup_sched_in(prev, next)
  cgrp1 = perf_cgroup_from_task(prev)
  cgrp2 = perf_cgroup_from_task(next)
  if (cgrp1 != cgrp2)
    perf_cgroup_switch(next, PERF_CGROUP_SWIN)
__perf_cgroup_move()
  perf_cgroup_switch(task, PERF_CGROUP_SWOUT | PERF_CGROUP_SWIN)

Commit a8d757ef076f ("perf events: Fix slow and broken cgroup
context switch code") wants to skip perf_cgroup_switch() when the
perf_cgroup of "prev" and "next" are the same.

But task->cgroups can change concurrently with context_switch()
in cgroup_migrate_execute(). If cgrp1 == cgrp2 in sched_out(),
cpuctx won't do sched_out. If task->cgroups then changes so that
cgrp1 != cgrp2 in sched_in(), cpuctx will do sched_in, which triggers
WARN_ON_ONCE(cpuctx->cgrp).

Even though __perf_cgroup_move() is synchronized with the context
switch, which runs with interrupts disabled, context_switch() can still
see task->cgroups change midway, since task->cgroups is changed
before the IPI is sent.

So we have to combine perf_cgroup_sched_in() into perf_cgroup_sched_out(),
unified into perf_cgroup_switch(), to fix the inconsistency between
perf_cgroup_sched_out() and perf_cgroup_sched_in().

But we can't just compare prev->cgroups with next->cgroups to decide
whether to skip cpuctx sched_out/in, since prev->cgroups may be changing
too. For example:

CPU1                                              CPU2
                                                  cgroup_migrate_execute()
                                                    prev->cgroups = ?
                                                    perf_cgroup_attach()
                                                      task_function_call(task, __perf_cgroup_move)
perf_cgroup_switch(task)
  cgrp1 = perf_cgroup_from_task(prev)
  cgrp2 = perf_cgroup_from_task(next)
  if (cgrp1 != cgrp2)
    cpuctx sched_out/in ...
                                                  task_function_call() will return -ESRCH

In the above example, the change to prev->cgroups causes (cgrp1 == cgrp2)
to be true, so cpuctx sched_out/in is skipped. Later, task_function_call()
returns -ESRCH since the prev task isn't running on the CPU anymore.
So we would leave the perf_events of the old prev->cgroups still scheduled
on the CPU, which is wrong.

The solution is that we should use cpuctx->cgrp to compare with
the next task's perf_cgroup. Since cpuctx->cgrp can only be changed
on local CPU, and we have irq disabled, we can read cpuctx->cgrp to
compare without holding ctx lock.
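
The difference between the two comparisons can be sketched in a
single-threaded user-space model that replays the problematic
interleaving sequentially. All model_* types and both helper functions
are hypothetical simplifications, not the kernel's perf code; the remote
migration is simulated by an ordinary assignment before the switch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_cgrp { int id; };

/* A task's cgroup may be rewritten by a migration on another CPU. */
struct model_task { struct model_cgrp *cgrp; };

/* CPU-local record of which cgroup's events are currently scheduled. */
struct model_cpuctx { struct model_cgrp *cgrp; int switches; };

/* Racy variant: decides from prev and next, which can change remotely. */
static bool switch_by_tasks(struct model_cpuctx *ctx,
                            const struct model_task *prev,
                            const struct model_task *next)
{
    if (prev->cgrp == next->cgrp)
        return false; /* skipped, even if ctx->cgrp is stale */
    ctx->cgrp = next->cgrp;
    ctx->switches++;
    return true;
}

/* Variant described above: compare the CPU-local state with next. */
static bool switch_by_cpuctx(struct model_cpuctx *ctx,
                             const struct model_task *next)
{
    if (ctx->cgrp == next->cgrp)
        return false; /* events of next's cgroup are already scheduled */
    ctx->cgrp = next->cgrp;
    ctx->switches++;
    return true;
}

/* Replays the second interleaving from the commit message. */
static void model_self_check(void)
{
    struct model_cgrp a = { 1 }, b = { 2 };
    struct model_task prev = { &a }, next = { &b };
    struct model_cpuctx racy = { &a, 0 }, fixed = { &a, 0 };

    prev.cgrp = &b; /* simulated remote migration of prev */

    assert(!switch_by_tasks(&racy, &prev, &next));
    assert(racy.cgrp == &a);  /* stale events left on the CPU */
    assert(switch_by_cpuctx(&fixed, &next));
    assert(fixed.cgrp == &b); /* CPU state follows next's cgroup */
}
```

In the model, the CPU-local field is only ever written by the switch
helpers themselves, which mirrors the argument that cpuctx->cgrp can be
read without a lock because only the local CPU changes it.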

Fixes: a8d757ef076f ("perf events: Fix slow and broken cgroup context switch code")
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220329154523.86438-4-zhouchengming@bytedance.com

3y ago

Peter Zijlstra

be8a0965

x86,bpf: Avoid IBT objtool warning

3y ago

Marc Zyngier

af27e416

irqchip/gic-v4: Wait for GICR_VPENDBASER.Dirty to clear before descheduling

3y ago

Linux 5.18-rc2 v5.18-rc2

ce522ba9

Linus Torvalds

Merge tag 'tty-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

8b57b304

Linus Torvalds

Merge tag 'staging-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

95aa17c3

Linus Torvalds

tty: serial: mpc52xx_uart: make rx/tx hooks return unsigned, part II.

dbf3f093

Jiri Slaby

Merge tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

33563138

Linus Torvalds

staging: r8188eu: Fix PPPoE tag insertion on little endian systems

20314bac

Guenter Roeck

Linux 5.18-rc1 v5.18-rc1

31231092

Linus Torvalds

Merge tag 'char-misc-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

f58d3410

Linus Torvalds

kobject: kobj_type: remove default_attrs

cdb4f26a

Greg Kroah-Hartman

Merge tag 'trace-v5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

09bb8856

Linus Torvalds

Merge tag 'powerpc-5.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

4ea3c642

Linus Torvalds

habanalabs: Fix test build failures

94865e2d

Guenter Roeck

powerpc/pseries/vas: use default_groups in kobj_type

c31bc046

Greg Kroah-Hartman

Merge tag 'clk-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

34a53ff9

Linus Torvalds

tracing: Move user_events.h temporarily out of include/uapi

5cfff569

Steven Rostedt (Google)

Merge tag 'irq-urgent-2022-04-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

1519610b

Linus Torvalds

Revert "powerpc: Set max_mapnr correctly"

1ff5c8e8

Kefeng Wang

Merge tag 'x86-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8b5656bc

Linus Torvalds

Revert "clk: Drop the rate range on clk_put()"

859c2c7b

Stephen Boyd

ftrace: Make ftrace_graph_is_dead() a static branch

18bfee32

Christophe Leroy

Merge tag 'x86_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

9c6913b7

Linus Torvalds

Merge tag 'irqchip-fixes-5.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms into irq/urgent

63ef1a8a

Thomas Gleixner

powerpc: Fix virt_addr_valid() for 64-bit Book3E & 32-bit

ffa0b64e

Kefeng Wang

Merge tag 'core-urgent-2022-04-03' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

e235f419

Linus Torvalds

x86/fpu/xstate: Consolidate size calculations

d6d6d50f

Thomas Gleixner

Merge branches 'clk-sifive' and 'clk-visconti' into clk-next

cf683abd

Stephen Boyd

tracing: Set user_events to BROKEN

fcbf591c

Steven Rostedt (Google)

Merge tag 'perf_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

b51f86e9

Linus Torvalds

x86/msi: Fix msi message data shadow struct

59b18a1e

Reto Buerki

irqchip/gic, gic-v3: Prevent GSI to SGI translations

544808f7

Andre Przywara

KVM: PPC: Move kvmhv_on_pseries() into kvm_ppc.h

7f921a2d

Michael Ellerman

Merge tag 'dma-mapping-5.18-1' of git://git.infradead.org/users/hch/dma-mapping

63d12cc3

Linus Torvalds

Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"

7dd5ad2d

Thomas Gleixner

x86/fpu/xstate: Handle supervisor states in XSTATE permissions

781c64bf

Thomas Gleixner

Merge branches 'clk-range', 'clk-uniphier', 'clk-apple' and 'clk-qcom' into clk-next

c64dd8ea

Stephen Boyd

clk: sifive: Move all stuff into SoCs header files from C files

5e916932

Zong Li

clk: visconti: prevent array overflow in visconti_clk_register_gates()

c5601e07

Dan Carpenter

tracing/user_events: Remove eBPF interfaces

768c1e7f

Beau Belgrave

Merge tag 'locking_urgent_for_v5.18_rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

50c94de6

Linus Torvalds

perf/core: Always set cpuctx cgrp when enable cgroup event

e19cd0b6

Chengming Zhou

x86/extable: Prefer local labels in .set directives

334865b2

Nick Desaulniers

irqchip/gic-v3: Fix GICR_CTLR.RWP polling

0df66645

Marc Zyngier

powerpc/numa: Handle partially initialized numa nodes

e4ff7759

Srikar Dronamraju

Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm

5dee8721

Linus Torvalds

dma-mapping: move pgprot_decrypted out of dma_pgprot

4fe87e81

Christoph Hellwig

mm: page_alloc: validate buddy before check its migratetype.

787af64d

Zi Yan

x86/fpu/xsave: Handle compacted offsets correctly with supervisor states

7aa5128b

Thomas Gleixner

Merge branches 'clk-starfive', 'clk-ti', 'clk-terminate' and 'clk-cleanup' into clk-next

- Audio clks on StarFive JH7100 RISC-V SoC
- Terminate arrays with sentinels and make that clearer
- Cleanup SPDX tags
- Fix typos in comments

* clk-starfive:
clk: starfive: Add JH7100 audio clock driver
clk: starfive: jh7100: Support more clock types
clk: starfive: jh7100: Make hw clock implementation reusable
dt-bindings: clock: Add starfive,jh7100-audclk bindings
dt-bindings: clock: Add JH7100 audio clock definitions
clk: starfive: jh7100: Handle audio_div clock properly
clk: starfive: jh7100: Don't round divisor up twice

* clk-ti:
clk: ti: Drop legacy compatibility clocks for dra7
clk: ti: Drop legacy compatibility clocks for am4
clk: ti: Drop legacy compatibility clocks for am3
clk: ti: Update component clocks to use ti_dt_clk_name()
clk: ti: Update pll and clockdomain clocks to use ti_dt_clk_name()
clk: ti: Add ti_dt_clk_name() helper to use clock-output-names
clk: ti: Use clock-output-names for clkctrl
clk: ti: Add ti_find_clock_provider() to use clock-output-names
clk: ti: Optionally parse IO address from parent clock node
clk: ti: Preserve node in ti_dt_clocks_register()
clk: ti: Constify clkctrl_name

* clk-terminate:
clk: actions: Make sentinel elements more obvious
clk: clps711x: Terminate clk_div_table with sentinel element
clk: hisilicon: Terminate clk_div_table with sentinel element
clk: loongson1: Terminate clk_div_table with sentinel element
clk: actions: Terminate clk_div_table with sentinel element

* clk-cleanup:
clk: zynq: Update the parameters to zynq_clk_register_periph_clk
clk: zynq: trivial warning fix
clk: qcom: sm6125-gcc: fix typos in comments
clk: ti: clkctrl: fix typos in comments
clk: COMMON_CLK_LAN966X should depend on SOC_LAN966
clk: Use of_device_get_match_data()
clk: bcm2835: Remove unused variable
clk: tegra: tegra124-emc: Fix missing put_device() call in emc_ensure_emc_driver
clk: cleanup comments
clk: socfpga: cleanup spdx tags