commits

When a timer base is idle, it is forwarded when a new timer is added
to ensure that granularity does not become excessive. When not idle,
the timer tick is expected to increment the base.

However, there are several problems:

- If an existing timer is modified, the base is forwarded only after
the index is calculated.

- The base is not forwarded by add_timer_on.

- There is a window after a timer is restarted from a nohz idle, after
it is marked not-idle and before the timer tick on this CPU, where a
timer may be added but the ancient base does not get forwarded.

These result in excessive granularity (a 1 jiffy timeout can blow out
to 100s of jiffies), which causes the RCU lockup detector to trigger,
among other things.

Fix this by keeping track of whether the timer base has been idle
since it was last run or forwarded, and if so then forward it before
adding a new timer.
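
The idea can be illustrated with a small user-space model. This is only a
sketch under stated assumptions, not the kernel implementation: the struct
toy_timer_base, its must_forward_clk flag and all toy_* helpers are
hypothetical names, and the value returned by toy_add_timer() merely stands
in for the distance from the wheel clock that a real wheel would use to pick
a bucket and its granularity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a timer wheel base. Hypothetical; not the kernel's struct. */
struct toy_timer_base {
	unsigned long clk;      /* wheel clock, in jiffies */
	bool is_idle;           /* CPU is currently in nohz idle */
	bool must_forward_clk;  /* base has been idle since it was last run */
};

/* Entering nohz idle: the tick stops, so clk will start to go stale. */
static void toy_enter_idle(struct toy_timer_base *base)
{
	base->is_idle = true;
	base->must_forward_clk = true;
}

/* Leaving idle: the flag deliberately stays set until the next tick. */
static void toy_exit_idle(struct toy_timer_base *base)
{
	base->is_idle = false;
}

/* Timer tick: the wheel has been run up to 'now', so clk is current. */
static void toy_run_timers(struct toy_timer_base *base, unsigned long now)
{
	base->clk = now;
	base->must_forward_clk = false;
}

/*
 * Bring a stale clk up to date. The flag is left set in this sketch;
 * forwarding again before the next run is harmless.
 */
static void toy_forward_base(struct toy_timer_base *base, unsigned long now)
{
	if (!base->must_forward_clk)
		return;
	if ((long)(now - base->clk) > 0)
		base->clk = now;
}

/*
 * Queue a timer expiring 'delta' jiffies from 'now'. The base is forwarded
 * BEFORE the distance from clk is computed. Returns that distance.
 */
static unsigned long toy_add_timer(struct toy_timer_base *base,
				   unsigned long now, unsigned long delta)
{
	assert(base != NULL);
	toy_forward_base(base, now);
	return (now + delta) - base->clk;
}
```

In this model, a base whose clk stopped at 1000 and that is asked at jiffy
1500 for a 1 jiffy timeout reports a distance of 1 rather than 501, even if
the CPU was already marked not-idle and no tick has run yet.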

One case remains: mod_timer has a fast path for modifying a pending
timer to the same expiry time, and there the timer can still see
excessive granularity relative to the new, shorter interval. A comment
is added, but the behaviour is left unchanged because it is an
important fastpath for networking.

This has been tested and found to fix the RCU softlockup messages.

Testing was also done with tracing to measure requested versus
achieved wakeup latencies for all non-deferrable timers in an idle
system (with no lockup watchdogs running). Wakeup latency relative to
absolute latency is calculated (note this suffers from round-up skew
at low absolute times) and analysed:

             max    avg    std
upstream   506.0   1.20   4.68
patched      2.0   1.08   0.15

The bug was noticed due to the lockup detector Kconfig changes
dropping it out of people's .configs and resulting in larger base
clk skew. When the lockup detectors are enabled, no CPU can go idle
for longer than 4 seconds, which limits the granularity errors.
Sub-optimal timer behaviour is observable on a smaller scale in that
case:

             max    avg    std
upstream     9.0   1.05   0.19
patched      2.0   1.04   0.11

Fixes: a683f390b93f ("timers: Forward the wheel clock whenever possible")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Tested-by: David Miller <davem@davemloft.net>
Cc: dzickus@redhat.com
Cc: sfr@canb.auug.org.au
Cc: mpe@ellerman.id.au
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: linuxarm@huawei.com
Cc: abdhalee@linux.vnet.ibm.com
Cc: John Stultz <john.stultz@linaro.org>
Cc: akpm@linux-foundation.org
Cc: paulmck@linux.vnet.ibm.com
Cc: torvalds@linux-foundation.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20170822084348.21436-1-npiggin@gmail.com

8y ago

Dan Carpenter

eaa2f87c

x86/ldt: Fix off by one in get_segment_base()

8y ago

Linus Torvalds

cc4a41fe

Linux 4.13-rc7 v4.13-rc7

8y ago

Linus Torvalds

54f70f52

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

8y ago

Steve French

7e682f76

Fix warning messages when mounting to older servers

8y ago

Meng Xu

f12f42ac

perf/core: Fix potential double-fetch bug

8y ago

Linus Torvalds

14ccee78

Linux 4.13-rc6 v4.13-rc6

8y ago

Linus Torvalds

9c3a815f

page waitqueue: always add new entries at the end

8y ago

Linus Torvalds

2c25833c

Merge tag 'iommu-fixes-v4.13-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu

8y ago

Linus Torvalds

f8c6d724

Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

8y ago

Olof Johansson

6f71a925

Merge tag 'mvebu-fixes-4.13-3' of git://git.infradead.org/linux-mvebu into fixes

8y ago

Linus Torvalds

e89ce1f8

Merge tag 'cifs-fixes-for-4.13-rc7-and-stable' of git://git.samba.org/sfrench/cifs-2.6

8y ago

Linus Torvalds

197e7e52

Sanitize 'move_pages()' permission checks

8y ago

Tejun Heo

b339752d

cpumask: fix spurious cpumask_of_node() on non-NUMA multi-node configs

8y ago

Linus Torvalds

80f73b2d

Merge tag 'char-misc-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

8y ago

Joerg Roedel

2926a2aa

iommu: Fix wrong freeing of iommu_device->dev

8y ago

Oleg Nesterov

138e4ad6

epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove()

8y ago

Hans de Goede

231d069f

i2c: designware: Round down ACPI provided clk to nearest supported clk

8y ago

Olof Johansson

fabed5ad

Merge tag 'sunxi-fixes-for-4.13-3' of https://git.kernel.org/pub/scm/linux/kernel/git/sunxi/linux into fixes

8y ago

Thomas Petazzoni

a0ac89b5

arm64: dts: marvell: fix number of GPIOs in Armada AP806 description

8y ago

Linus Torvalds

501d9f79

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

8y ago

Steve French

6e3c1529

CIFS: remove endian related sparse warning

8y ago

Linus Torvalds

7f680d7e

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Alexey Brodkin

e8206d2b

ARCv2: SMP: Mask only private-per-core IRQ lines on boot at core intc

8y ago

Linus Torvalds

c3c16263

Merge tag 'staging-4.13-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

8y ago

Martijn Coenen

b2a6d1b9

ANDROID: binder: fix proc->tsk check.

8y ago

Artem Savkov

a7990c64

iommu/arm-smmu: fix null-pointer dereference in arm_smmu_add_device

8y ago

Linus Torvalds

8cf9f2a2

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Fix handling of pinned BPF map nodes in hash of maps, from Daniel
Borkmann.

2) IPSEC ESP error paths leak memory, from Steffen Klassert.

3) We need an RCU grace period before freeing fib6_node objects, from
Wei Wang.

4) Must check skb_put_padto() return value in HSR driver, from Florian
Fainelli.

5) Fix oops on PHY probe failure in ftgmac100 driver, from Andrew
Jeffery.

6) Fix infinite loop in UDP queue when using SO_PEEK_OFF, from Eric
Dumazet.

7) Use after free when tcf_chain_destroy() called multiple times, from
Jiri Pirko.

8) Fix KSZ DSA tag layer multiple free of SKBS, from Florian Fainelli.

9) Fix leak of uninitialized memory in sctp_get_sctp_info(),
inet_diag_msg_sctpladdrs_fill() and inet_diag_msg_sctpaddrs_fill().
From Stefano Brivio.

10) L2TP tunnel refcount fixes from Guillaume Nault.

11) Don't leak UDP secpath in udp_set_dev_scratch(), from Yossi
Kuperman.

12) Revert a PHY layer change wrt. handling of PHY_HALTED state in
phy_stop_machine(), it causes regressions for multiple people. From
Florian Fainelli.

13) When packets are sent out of br0 we have to clear the
offload_fwdq_mark value.

14) Several NULL pointer deref fixes in packet schedulers when their
->init() routine fails. From Nikolay Aleksandrov.

15) Aquantia devices cannot checksum offload correctly when the packet
is <= 60 bytes. From Pavel Belous.

16) Fix vnet header access past end of buffer in AF_PACKET, from
Benjamin Poirier.

17) Double free in probe error paths of nfp driver, from Dan Carpenter.

18) QOS capability not checked properly in DCB init paths of mlx5
driver, from Huy Nguyen.

19) Fix conflicts between firmware load failure and health_care timer in
mlx5, also from Huy Nguyen.

20) Fix dangling page pointer when DMA mapping errors occur in mlx5,
from Eran Ben Elisha.

21) ->ndo_setup_tc() in bnxt_en driver doesn't count rings properly,
from Michael Chan.

22) Missing MSIX vector free in bnxt_en, also from Michael Chan.

23) Refcount leak in xfrm layer when using sk_policy, from Lorenzo
Colitti.

24) Fix copy of uninitialized data in qlge driver, from Arnd Bergmann.

25) bpf_setsockopts() erroneously always returns -EINVAL even on
success. Fix from Yuchung Cheng.

26) tipc_rcv() needs to linearize the SKB before parsing the inner
headers, from Parthasarathy Bhuvaragan.

27) Fix deadlock between link status updates and link removal in netvsc
driver, from Stephen Hemminger.

28) Missed locking of page fragment handling in ESP output, from Steffen
Klassert.

29) Fix refcnt leak in ebpf congestion control code, from Sabrina
Dubroca.

30) sxgbe_probe_config_dt() doesn't check devm_kzalloc()'s return value,
from Christophe Jaillet.

31) Fix missing ipv6 rx_dst_cookie update when rx_dst is updated during
early demux, from Paolo Abeni.

32) Several info leaks in xfrm_user layer, from Mathias Krause.

33) Fix out of bounds read in cxgb4 driver, from Stefano Brivio.

34) Properly propagate obsolete state of route upwards in ipv6 so that
upper holders like xfrm can see it. From Xin Long.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (118 commits)
udp: fix secpath leak
bridge: switchdev: Clear forward mark when transmitting packet
mlxsw: spectrum: Forbid linking to devices that have uppers
wl1251: add a missing spin_lock_init()
Revert "net: phy: Correctly process PHY_HALTED in phy_stop_machine()"
net: dsa: bcm_sf2: Fix number of CFP entries for BCM7278
kcm: do not attach PF_KCM sockets to avoid deadlock
sch_tbf: fix two null pointer dereferences on init failure
sch_sfq: fix null pointer dereference on init failure
sch_netem: avoid null pointer deref on init failure
sch_fq_codel: avoid double free on init failure
sch_cbq: fix null pointer dereferences on init failure
sch_hfsc: fix null pointer deref and double free on init failure
sch_hhf: fix null pointer dereference on init failure
sch_multiq: fix double free on init failure
sch_htb: fix crash on init failure
net/mlx5e: Fix CQ moderation mode not set properly
net/mlx5e: Fix inline header size for small packets
net/mlx5: E-Switch, Unload the representors in the correct order
net/mlx5e: Properly resolve TC offloaded ipv6 vxlan tunnel source address
...

8y ago

Stephen Douthit

ba201c4f

i2c: ismt: Return EMSGSIZE for block reads with bogus length

8y ago

Hans Verkuil

93a4c835

ARM: dts: exynos: add needs-hpd for Odroid-XU3/4

8y ago

Maxime Ripard

fe45174b

arm: dts: sunxi: Revert EMAC changes

8y ago

Gregory CLEMENT

d7a65c49

ARM64: dts: marvell: armada-37xx: Fix the number of GPIO on south bridge

8y ago

Linus Torvalds

73adb8c5

Merge tag 'for-4.13/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

8y ago

Jens Axboe

7ef10f3c

Merge branch 'nvme-4.13' of git://git.infradead.org/nvme into for-linus

8y ago

Pavel Shilovsky

9e37b178

CIFS: Fix maximum SMB2 header size

8y ago

Linus Torvalds

2615a38f

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

8y ago

Arvind Yadav

45bd07ad

x86: Constify attribute_group structures

8y ago

Helge Deller

79de3cbe

fs/select: Fix memory corruption in compat_get_fd_set()

8y ago

Linus Torvalds

fff4e7a0

Merge tag 'ntb-4.13-bugfixes' of git://github.com/jonmason/ntb

8y ago

Greg Kroah-Hartman

2c68888f

Merge tag 'fixes-for-4.13b' of git://git.kernel.org/pub/scm/linux/kernel/git/jic23/iio into staging-linus

8y ago

Linus Torvalds

ef954844

Linux 4.13-rc5 v4.13-rc5

8y ago

Joerg Roedel

74ddda71

iommu/amd: Fix schedule-while-atomic BUG in initialization code

8y ago

Linus Torvalds

b8a78bb4

Merge tag 'ceph-for-4.13-rc8' of git://github.com/ceph/ceph-client

8y ago

Yossi Kuperman

e8a732d1

udp: fix secpath leak

8y ago

Stephen Douthit

b6c159a9

i2c: ismt: Don't duplicate the receive length for block reads

8y ago

Arnd Bergmann

dbeb0c8e

ARM: at91: don't select CONFIG_ARM_CPU_SUSPEND for old platforms

8y ago

Maxime Ripard

87e1f5e8

arm64: dts: allwinner: Revert EMAC changes

8y ago

Linux 4.13 v4.13

569dbb88

Linus Torvalds

Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/ralf/upstream-linus

5e3b19d8

Linus Torvalds

Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

d0fa6ea1

Linus Torvalds

irqchip: mips-gic: SYNC after enabling GIC region

2c0e8382

James Hogan

Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

3b62dc6c

Linus Torvalds

x86/boot: Prevent faulty bootparams.screeninfo from causing harm

fb1cc2f9

Jan H. Schönherr

MIPS: Remove pt_regs adjustments in indirect syscall handler

5af2ed36

James Cowgill

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

e92d51af

Linus Torvalds

time: Fix ktime_get_raw() incorrect base accumulation

0bcdc098

John Stultz

x86/boot: Provide more slack space during decompression

5746f055

Jan H. Schönherr

MIPS: seccomp: Fix indirect syscall args

3d729dea

James Hogan

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6

d0d6ab53

Linus Torvalds

perf/ftrace: Fix double traces of perf on ftrace:function

75e83876

Zhou Chengming

timers: Fix excessive granularity of new timers after a nohz idle

2fe59f50

Nicholas Piggin

x86/ldt: Fix off by one in get_segment_base()

eaa2f87c

Dan Carpenter

Linux 4.13-rc7 v4.13-rc7

cc4a41fe

Linus Torvalds

Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc

54f70f52

Linus Torvalds

Fix warning messages when mounting to older servers

7e682f76

Steve French

perf/core: Fix potential double-fetch bug

While examining the kernel source code, I found a dangerous operation that
could turn into a double-fetch situation (a race condition bug) where the same
userspace memory region is fetched twice into the kernel, with sanity checks
after the first fetch but no checks after the second fetch.

1. The first fetch happens in line 9573 get_user(size, &uattr->size).

2. Subsequently the 'size' variable undergoes a few sanity checks and
transformations (line 9577 to 9584).

3. The second fetch happens in line 9610 copy_from_user(attr, uattr, size)

4. Given that 'uattr' can be fully controlled in userspace, an attacker can
exploit a race condition to overwrite 'uattr->size' with an arbitrary value
(say, 0xFFFFFFFF) after the first fetch but before the second fetch. The
changed value will be copied to 'attr->size'.

5. There are no further checks on 'attr->size' until the end of this function,
and once the function returns, we lose the context to verify that 'attr->size'
conforms to the sanity checks performed in step 2 (line 9577 to 9584).

6. My manual analysis shows that 'attr->size' is not used elsewhere later,
so there is no working exploit against it right now. However, this could
easily turn into an exploitable one if careless developers start to use
'attr->size' later.

To fix this, overwrite the 'attr->size' obtained by the second fetch with the
value from the first fetch, regardless of what is actually copied in.

In this way, it is assured that 'attr->size' is consistent with the checks
performed after the first fetch.
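
The pattern can be sketched in user space as follows. This is a simplified
model under stated assumptions, not the perf code: struct toy_attr,
TOY_ATTR_SIZE_MIN, toy_copy_attr() and toy_racer() are hypothetical names,
plain memory reads stand in for get_user() and copy_from_user(), and the
between_fetches hook simulates userspace rewriting the field concurrently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a user-supplied attribute structure. */
struct toy_attr {
	uint32_t type;
	uint32_t size;    /* size of the structure as claimed by the caller */
	uint64_t config;
};

/* Smallest size accepted here; it must at least cover the size field. */
#define TOY_ATTR_SIZE_MIN 8u

/* Returns 0 on success, -1 if the claimed size fails validation. */
static int toy_copy_attr(struct toy_attr *uattr, struct toy_attr *attr,
			 void (*between_fetches)(struct toy_attr *))
{
	uint32_t size;

	assert(uattr && attr);

	/* First fetch: read only the size field (models get_user()). */
	size = uattr->size;

	/* Sanity checks apply to the first-fetch value only. */
	if (size < TOY_ATTR_SIZE_MIN || size > sizeof(*attr))
		return -1;

	/* Simulated race: "userspace" may rewrite the field right here. */
	if (between_fetches)
		between_fetches(uattr);

	/* Second fetch: copy the structure (models copy_from_user()). */
	memset(attr, 0, sizeof(*attr));
	memcpy(attr, uattr, size);

	/* The fix: discard the re-fetched size and keep the validated one. */
	attr->size = size;
	return 0;
}

/* Simulated attacker: changes the size after it has been validated. */
static void toy_racer(struct toy_attr *uattr)
{
	uattr->size = 0xFFFFFFFFu;
}
```

With toy_racer passed as the hook, the copy still ends up with the validated
size in 'attr->size', even though the source field was changed to 0xFFFFFFFF
between the two fetches.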

Signed-off-by: Meng Xu <mengxu.gatech@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: acme@kernel.org
Cc: alexander.shishkin@linux.intel.com
Cc: meng.xu@gatech.edu
Cc: sanidhya@gatech.edu
Cc: taesoo@gatech.edu
Link: http://lkml.kernel.org/r/1503522470-35531-1-git-send-email-meng.xu@gatech.edu
Signed-off-by: Ingo Molnar <mingo@kernel.org>