commits

hrtimer_force_reprogram() and hrtimer_interrupt() invoke
__hrtimer_get_next_event() to find the earliest expiry time of hrtimer
bases. __hrtimer_get_next_event() does not update
cpu_base::[softirq_]expires_next to preserve reprogramming logic. That
needs to be done at the callsites.

hrtimer_force_reprogram() updates cpu_base::softirq_expires_next only when
the first expiring timer is a softirq timer and the soft interrupt is not
activated. That's wrong because cpu_base::softirq_expires_next is left
stale when the first expiring timer of all bases is a timer which expires
in hard interrupt context. hrtimer_interrupt() never updates
cpu_base::softirq_expires_next, which is wrong too.

That becomes a problem when clock_settime() sets CLOCK_REALTIME forward and
the first soft expiring timer is in the CLOCK_REALTIME_SOFT base. Setting
CLOCK_REALTIME forward moves the clock MONOTONIC based expiry time of that
timer before the stale cpu_base::softirq_expires_next.

cpu_base::softirq_expires_next is cached to make the check for raising the
soft interrupt fast. In the above case the soft interrupt won't be raised
until clock monotonic reaches the stale cpu_base::softirq_expires_next
value. That's incorrect, but what's worse is that if the softirq timer
becomes the first expiring timer of all clock bases after the hard expiry
timer has been handled, the reprogramming of the clockevent from
hrtimer_interrupt() will result in an interrupt storm. That happens because
the reprogramming does not use cpu_base::softirq_expires_next, it uses
__hrtimer_get_next_event() which returns the actual expiry time. Once clock
MONOTONIC reaches cpu_base::softirq_expires_next the soft interrupt is
raised and the storm subsides.

Change the logic in hrtimer_force_reprogram() to evaluate the soft and hard
bases separately, update softirq_expires_next and handle the case when a
soft expiring timer is the first of all bases by comparing the expiry times
and updating the required cpu base fields. Split this functionality into a
separate function to be able to use it in hrtimer_interrupt() as well
without copy paste.

Fixes: 5da70160462e ("hrtimer: Implement support for softirq based hrtimers")
Reported-by: Mikael Beckius <mikael.beckius@windriver.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Mikael Beckius <mikael.beckius@windriver.com>
Signed-off-by: Anna-Maria Behnsen <anna-maria@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20210223160240.27518-1-anna-maria@linutronix.de

4y ago

Hans de Goede

4c7bcb51

genirq: Prevent [devm_]irq_alloc_desc from returning irq 0

5y ago

Paul Cercueil

5fbecd23

irqchip/ingenic: Add support for the JZ4760

4y ago

Linus Torvalds

19469d2a

Merge tag 'objtool-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Mathieu Desnoyers

ce29ddc4

sched/membarrier: fix missing local execution of ipi_sync_rq_state()

4y ago

Linus Torvalds

a38fd874

Linux 5.12-rc2 (tag: v5.12-rc2)

4y ago

Marc Zyngier

4c457e8c

genirq/msi: Activate Multi-MSI early when MSI_FLAG_ACTIVATE_EARLY is set

5y ago

Paul Cercueil

673433e7

dt-bindings/irq: Add compatible string for the JZ4760B

4y ago

Linus Torvalds

fa509ff8

Merge tag 'locking-urgent-2021-03-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Peter Zijlstra

ba08abca

objtool,x86: Fix uaccess PUSHF/POPF validation

4y ago

Peter Zijlstra

50caf9c1

sched: Simplify set_affinity_pending refcounts

4y ago

Linus Torvalds

f3ed4de6

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

4y ago

Linus Torvalds

13391c60

Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

5y ago

Marc Zyngier

a79f7051

irqchip: Do not blindly select CONFIG_GENERIC_IRQ_MULTI_HANDLER

4y ago

Linus Torvalds

75013c6c

Merge tag 'perf_urgent_for_v5.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Peter Zijlstra

4817a52b

seqlock,lockdep: Fix seqcount_latch_init()

4y ago

Peter Zijlstra

9e81889c

sched: Fix affine_move_task() self-concurrency

4y ago

Linus Torvalds

de5bd6c5

Merge tag 'gcc-plugins-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

4y ago

Bob Pearson

545c4ab4

RDMA/rxe: Fix errant WARN_ONCE in rxe_completer()

4y ago

Johannes Berg

f8ad8187

fs/pipe: allow sendfile() to pipe again

5y ago

Herbert Xu

4f6543f2

crypto: marvel/cesa - Fix tdma descriptor on 64-bit

5y ago

Marc Zyngier

3e895f4c

ARM: ep93xx: Select GENERIC_IRQ_MULTI_HANDLER directly

4y ago

Linus Torvalds

836d7f05

Merge tag 'efi-urgent-for-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Sean Christopherson

c8e2fe13

x86/perf: Use RET0 as default for guest_get_msrs to handle "no PMU" case

4y ago

Peter Zijlstra

d5b0e067

u64_stats,lockdep: Fix u64_stats_init() vs lockdep

4y ago

Peter Zijlstra

3f1bc119

sched: Optimize migration_cpu_stop()

4y ago

Linus Torvalds

8b24ef44

Merge tag 'pstore-v5.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

4y ago

Jason Yan

5477edca

gcc-plugins: latent_entropy: remove unneeded semicolon

4y ago

Bob Pearson

5e4a7ccc

RDMA/rxe: Fix extra deref in rxe_rcv_mcast_pkt()

4y ago

Sami Tolvanen

9f12e37c

Commit 9bb48c82aced ("tty: implement write_iter") converted the tty layer to use write_iter. Fix the redirected_tty_write declaration also in n_tty and change the comparisons to use write_iter instead of write.

5y ago

Arnd Bergmann

38281194

crypto: omap-sham - Fix link error without crypto-engine

5y ago

Greg Kroah-Hartman

69dd4503

irqdomain: Remove debugfs_file from struct irq_domain

4y ago

Linus Torvalds

0a7c10df

Merge tag 'x86_urgent_for_v5.12_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

4y ago

Ard Biesheuvel

9e9888a0

efi: stub: omit SetVirtualAddressMap() if marked unsupported in RT_PROP table

4y ago

Kan Liang

afbef301

perf/x86/intel: Set PERF_ATTACH_SCHED_CB for large PEBS and LBR

4y ago

Peter Zijlstra

50bf8080

static_call: Fix the module key fixup

4y ago

Peter Zijlstra

58b1a450

sched: Collate affine_move_task() stoppers

4y ago

Linus Torvalds

63dcd69d

Merge tag 'for-5.12/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm

4y ago

Dmitry Osipenko

7db688e9

pstore/ram: Rate-limit "uncorrectable error in header" message

4y ago

Jason Yan

b924a819

gcc-plugins: structleak: remove unneeded variable 'ret'

4y ago

Bob Pearson

21e27ac8

RDMA/rxe: Fix missed IB reference counting in loopback

4y ago

Linus Torvalds

007ad27d

Merge tag 'printk-for-5.11-urgent-fixup' of git://git.kernel.org/pub/scm/linux/kernel/git/printk/linux

5y ago

Kirill Tkhai

3c02e04f

crypto: xor - Fix divide error in do_xor_speed()

5y ago

Linus Torvalds

c3c7579f

Merge tag 'powerpc-5.12-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

4y ago

Joerg Roedel

bffe30dd

x86/sev-es: Use __copy_from_user_inatomic()

4y ago

Kan Liang

a5398bff

perf/core: Flush PMU internal buffers for per-CPU events

4y ago

Peter Zijlstra

c20cf065

sched: Simplify migration_cpu_stop()

4y ago

Linus Torvalds

47454caf

Merge tag 'block-5.12-2021-03-05' of git://git.kernel.dk/linux-block

4y ago

Milan Broz

df7b59ba

dm verity: fix FEC for RS roots unaligned to block size

Optional Forward Error Correction (FEC) code in dm-verity uses
Reed-Solomon code and should support roots from 2 to 24.

The error correction parity bytes (roots bytes per RS block) are
stored on a separate device in sequence without any padding.

Currently, to access the FEC device, the dm-verity-fec code uses a dm-bufio
client with the block size set to the verity data block size (usually 4096
or 512 bytes).

Because this block size is not divisible by some (most!) of the supported
roots lengths, data repair cannot work for partially stored parity
bytes.

This fix changes the FEC device dm-bufio block size to
"roots << SECTOR_SHIFT", where we can be sure that the full parity data is
always available. (There cannot be partial FEC blocks because parity must
cover whole sectors.)
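
The divisibility argument can be checked with a few lines of arithmetic.
The sketch below is illustrative only: fec_block_size() and
holds_whole_parity_groups() are hypothetical helper names, not functions
from dm-verity. SECTOR_SHIFT is 9 (512-byte sectors), as in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SHIFT 9  /* 512-byte sectors */

/* Block size proposed by the fix: roots << SECTOR_SHIFT = roots * 512. */
static uint32_t fec_block_size(uint32_t roots)
{
    assert(roots >= 2 && roots <= 24);  /* supported RS roots range */
    return roots << SECTOR_SHIFT;
}

/*
 * True if a buffer of block_size bytes contains only complete groups of
 * roots parity bytes, i.e. no group straddles a block boundary.
 */
static bool holds_whole_parity_groups(uint32_t block_size, uint32_t roots)
{
    return block_size % roots == 0;
}
```

For roots=13, a 4096-byte block leaves a remainder (4096 = 13 * 315 + 1),
so a parity group is split across two blocks, while 13 << 9 = 6656 bytes
holds exactly 512 groups. With roots=2, 4096 is already divisible.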

Because the optional FEC starting offset could be unaligned to this
new block size, we have to use dm_bufio_set_sector_offset() to
configure it.

The problem is easily reproduced using veritysetup, e.g. for roots=13:

  # create verity device with RS FEC
  dd if=/dev/urandom of=data.img bs=4096 count=8 status=none
  veritysetup format data.img hash.img --fec-device=fec.img --fec-roots=13 | awk '/^Root hash/{ print $3 }' >roothash

  # create an erasure that should be always repairable with this roots setting
  dd if=/dev/zero of=data.img conv=notrunc bs=1 count=8 seek=4088 status=none

  # try to read it through dm-verity
  veritysetup open data.img test hash.img --fec-device=fec.img --fec-roots=13 $(cat roothash)
  dd if=/dev/mapper/test of=/dev/null bs=4096 status=noxfer
  # wait for possible recursive recovery in kernel
  udevadm settle
  veritysetup close test

With this fix, errors are properly repaired:
  device-mapper: verity-fec: 7:1: FEC 0: corrected 8 errors
  ...

Without it, the FEC code usually ends with an unrecoverable failure in the
RS decoder:
  device-mapper: verity-fec: 7:1: FEC 0: failed to correct: -74
  ...

This problem is present in all kernels since the FEC code's
introduction (kernel 4.5).

It is thought that this problem is not visible in the Android ecosystem
because it always uses the default RS roots=2.

Depends-on: a14e5ec66a7a ("dm bufio: subtract the number of initial sectors in dm_bufio_get_device_size")
Signed-off-by: Milan Broz <gmazyland@gmail.com>
Tested-by: Jérôme Carretero <cJ-ko@zougloub.eu>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Cc: stable@vger.kernel.org # 4.5+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>