commits

We employ a "waitboost" heuristic to detect when userspace is stalled
waiting for results from earlier execution. Under latency sensitive work
mixed between the gpu/cpu, the GPU is typically under-utilised and so
RPS sees that low utilisation as a reason to downclock the frequency,
causing longer stalls and lower throughput. The user left waiting for
the results is not impressed.

On applying commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv
workaround") it was observed that deinterlacing h264 on Haswell
performance dropped by 2-5x. The reason being that the natural workload
was not intense enough to trigger RPS (using HW evaluation intervals) to
upclock, and so it was depending on waitboosting for the throughput.

Commit 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
changes the composition of dma-resv from keeping a single write fence +
multiple read fences, to a single array of multiple write and read
fences (a maximum of one pair of write/read fences per context). The
iteration order was also changed implicitly from all-read fences then
the single write fence, to a mix of write fences followed by read
fences. It is that ordering change that belied the fragility of
waitboosting.

Currently, a waitboost is inspected at the point of waiting on an
outstanding fence. If the GPU is backlogged such that we haven't yet
stated the request we need to wait on, we force the GPU to upclock until
the completion of that request. By changing the order in which we waited
upon requests, we ended up waiting on those requests in sequence and as
such we saw that each request was already started and so not a suitable
candidate for waitboosting.

Instead of asking whether to boost each fence in turn, we can look at
whether boosting is required for the dma-resv ensemble prior to waiting
on any fence, making the heuristic more robust to the order in which
fences are stored in the dma-resv.

Reported-by: Thomas Voegtle <tv@lio96.de>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/6284
Fixes: 047a1b877ed4 ("dma-buf & drm/amdgpu: remove dma_resv workaround")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Karolina Drobnik <karolina.drobnik@intel.com>
Tested-by: Thomas Voegtle <tv@lio96.de>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/07e05518d9f6620d20cc1101ec1849203fe973f9.1657289332.git.karolina.drobnik@intel.com
(cherry picked from commit 394e2b57a989113de494c52d4683444bcb02d4e1)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>

3y ago

Linus Torvalds

2eccaca7

Merge tag 'gpio-fixes-for-v5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

3y ago

Kim Phillips

bcf16315

x86/bugs: Remove apostrophe typo

3y ago

Linus Torvalds

88084a3d

Linux 5.19-rc5 v5.19-rc5

3y ago

Adrian Hunter

498c7a54

perf tests: Stop Convert perf time to TSC test opening events twice

3y ago

Chris Wilson

a1c5a7bf

drm/i915/gt: Serialize TLB invalidates with GT resets

3y ago

Linus Torvalds

8ad4b6fa

Merge tag 'input-for-v5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input

3y ago

Bartosz Golaszewski

7329b071

gpio: sim: fix the chip_name configfs item

3y ago

Peter Zijlstra

564d9981

um: Add missing apply_returns()

3y ago

Linus Torvalds

b8d5109f

lockref: remove unused 'lockref_get_or_lock()' function

Looking at the conditional lock acquire functions in the kernel due to
the new sparse support (see commit 4a557a5d1a61 "sparse: introduce
conditional lock acquire function attribute"), it became obvious that
the lockref code has a couple of them, but they don't match the usual
naming convention for the other ones, and their return value logic is
also reversed.

In the other very similar places, the naming pattern is '*_and_lock()'
(eg 'atomic_put_and_lock()' and 'refcount_dec_and_lock()'), and the
function returns true when the lock is taken.

The lockref code is superficially very similar to the refcount code,
only with the special "atomic wrt the embedded lock" semantics. But
instead of the '*_and_lock()' naming it uses '*_or_lock()'.

And instead of returning true in case it took the lock, it returns true
if it *didn't* take the lock.

Now, arguably the reflock code is quite logical: it really is a "either
decrement _or_ lock" kind of situation - and the return value is about
whether the operation succeeded without any special care needed.

So despite the similarities, the differences do make some sense, and
maybe it's not worth trying to unify the different conditional locking
primitives in this area.

But while looking at this all, it did become obvious that the
'lockref_get_or_lock()' function hasn't actually had any users for
almost a decade.

The only user it ever had was the shortlived 'd_rcu_to_refcount()'
function, and it got removed and replaced with 'lockref_get_not_dead()'
back in 2013 in commits 0d98439ea3c6 ("vfs: use lockred 'dead' flag to
mark unrecoverably dead dentries") and e5c832d55588 ("vfs: fix dentry
RCU to refcounting possibly sleeping dput()")

In fact, that single use was removed less than a week after the whole
function was introduced in commit b3abd80250c1 ("lockref: add
'lockref_get_or_lock() helper") so this function has been around for a
decade, but only had a user for six days.

Let's just put this mis-designed and unused function out of its misery.

We can think about the naming and semantic oddities of the remaining
'lockref_put_or_lock()' later, but at least that function has users.

And while the naming is different and the return value doesn't match,
that function matches the whole '{atomic,refcount}_dec_and_test()'
pattern much better (ie the magic happens when the count goes down to
zero, not when it is incremented from zero).

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3y ago

Arnaldo Carvalho de Melo

91d248c3

tools arch x86: Sync the msr-index.h copy with the kernel sources

3y ago

Chris Wilson

b24dcf1d

drm/i915/gt: Serialize GRDOM access between multiple engine resets

3y ago

Linus Torvalds

396df700

Merge tag 'for-v5.19-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/sre/linux-power-supply

3y ago

Siarhei Vishniakou

2a96271f

Input: document the units for resolution of size axes

3y ago

Linus Torvalds

32346491

Linux 5.19-rc6 v5.19-rc6

3y ago

Alexandre Chartre

d16e0b26

x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt

3y ago

Linus Torvalds

4a557a5d

sparse: introduce conditional lock acquire function attribute

The kernel tends to try to avoid conditional locking semantics because
it makes it harder to think about and statically check locking rules,
but we do have a few fundamental locking primitives that take locks
conditionally - most obviously the 'trylock' functions.

That has always been a problem for 'sparse' checking for locking
imbalance, and we've had a special '__cond_lock()' macro that we've used
to let sparse know how the locking works:

# define __cond_lock(x,c) ((c) ? ({ __acquire(x); 1; }) : 0)

so that you can then use this to tell sparse that (for example) the
spinlock trylock macro ends up acquiring the lock when it succeeds, but
not when it fails:

#define raw_spin_trylock(lock) __cond_lock(lock, _raw_spin_trylock(lock))

and then sparse can follow along the locking rules when you have code like

if (!spin_trylock(&dentry->d_lock))
return LRU_SKIP;
.. sparse sees that the lock is held here..
spin_unlock(&dentry->d_lock);

and sparse ends up happy about the lock contexts.

However, this '__cond_lock()' use does result in very ugly header files,
and requires you to basically wrap the real function with that macro
that uses '__cond_lock'. Which has made PeterZ NAK things that try to
fix sparse warnings over the years [1].

To solve this, there is now a very experimental patch to sparse that
basically does the exact same thing as '__cond_lock()' did, but using a
function attribute instead. That seems to make PeterZ happy [2].

Note that this does not replace existing use of '__cond_lock()', but
only exposes the new proposed attribute and uses it for the previously
unannotated 'refcount_dec_and_lock()' family of functions.

For existing sparse installations, this will make no difference (a
negative output context was ignored), but if you have the experimental
sparse patch it will make sparse now understand code that uses those
functions, the same way '__cond_lock()' makes sparse understand the very
similar 'atomic_dec_and_lock()' uses that have the old '__cond_lock()'
annotations.

Note that in some cases this will silence existing context imbalance
warnings. But in other cases it may end up exposing new sparse warnings
for code that sparse just didn't see the locking for at all before.

This is a trial, in other words. I'd expect that if it ends up being
successful, and new sparse releases end up having this new attribute,
we'll migrate the old-style '__cond_lock()' users to use the new-style
'__cond_acquires' function attribute.

The actual experimental sparse patch was posted in [3].

Link: https://lore.kernel.org/all/20130930134434.GC12926@twins.programming.kicks-ass.net/ [1]
Link: https://lore.kernel.org/all/Yr60tWxN4P568x3W@worktop.programming.kicks-ass.net/ [2]
Link: https://lore.kernel.org/all/CAHk-=wjZfO9hGqJ2_hGQG3U_XzSh9_XaXze=HgPdvJbgrvASfA@mail.gmail.com/ [3]
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Aring <aahringo@redhat.com>
Cc: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

3y ago

Arnaldo Carvalho de Melo

f098addb

tools headers cpufeatures: Sync with the kernel sources

3y ago

Matthew Auld

aff1e0b0

drm/i915/ttm: fix sg_table construction

3y ago

Linus Torvalds

972a278f

Merge tag 'for-5.19-rc7-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

3y ago

Dorian Rudolph

093d27bb

power: supply: core: Fix boundary conditions in interpolation

3y ago

Hans de Goede

3de93e6e

Input: goodix - call acpi_device_fix_up_power() in some cases

3y ago

Linus Torvalds

24f4b40e

Merge branch 'hot-fixes' (fixes for rc6)

3y ago

Jiapeng Chong

33a8573b

x86/bugs: Mark retbleed_strings static

3y ago

Linus Torvalds

20855e4c

Merge tag 'xfs-5.19-fixes-4' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

3y ago

Arnaldo Carvalho de Melo

eee51fe3

tools headers UAPI: Sync linux/kvm.h with the kernel sources

3y ago

Dan Carpenter

896dcabd

drm/i915/selftests: fix a couple IS_ERR() vs NULL tests

3y ago

Linus Torvalds

c5fe7a97

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

3y ago

David Sterba

088aea3b

Revert "btrfs: turn delayed_nodes_tree into an XArray"

3y ago

Miaoqian Lin

80192eff

power/reset: arm-versatile: Fix refcount leak in versatile_reboot_probe

3y ago

Uwe Kleine-König

12dc6adc

Input: wm97xx - make .remove() obviously always return 0

3y ago

Linus Torvalds

952c53cd

Merge tag 'dmaengine-fix-5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine

3y ago

Linus Torvalds

fc82bbf4

ida: don't use BUG_ON() for debugging

3y ago

Juergen Gross

230ec83d

x86/pat: Fix x86_has_pat_wp()

3y ago

Linus Torvalds

69cb6c65

Merge tag 'nfsd-5.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux

3y ago

Darrick J. Wong

7561cea5

xfs: prevent a UAF when log IO errors race with unmount

KASAN reported the following use after free bug when running
generic/475:

XFS (dm-0): Mounting V5 Filesystem
XFS (dm-0): Starting recovery (logdev: internal)
XFS (dm-0): Ending recovery (logdev: internal)
Buffer I/O error on dev dm-0, logical block 20639616, async page read
Buffer I/O error on dev dm-0, logical block 20639617, async page read
XFS (dm-0): log I/O error -5
XFS (dm-0): Filesystem has been shut down due to log error (0x2).
XFS (dm-0): Unmounting Filesystem
XFS (dm-0): Please unmount the filesystem and rectify the problem(s).
==================================================================
BUG: KASAN: use-after-free in do_raw_spin_lock+0x246/0x270
Read of size 4 at addr ffff888109dd84c4 by task 3:1H/136

CPU: 3 PID: 136 Comm: 3:1H Not tainted 5.19.0-rc4-xfsx #rc4 8e53ab5ad0fddeb31cee5e7063ff9c361915a9c4
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014
Workqueue: xfs-log/dm-0 xlog_ioend_work [xfs]
Call Trace:
<TASK>
dump_stack_lvl+0x34/0x44
print_report.cold+0x2b8/0x661
? do_raw_spin_lock+0x246/0x270
kasan_report+0xab/0x120
? do_raw_spin_lock+0x246/0x270
do_raw_spin_lock+0x246/0x270
? rwlock_bug.part.0+0x90/0x90
xlog_force_shutdown+0xf6/0x370 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318]
xlog_ioend_work+0x100/0x190 [xfs 4ad76ae0d6add7e8183a553e624c31e9ed567318]
process_one_work+0x672/0x1040
worker_thread+0x59b/0xec0
? __kthread_parkme+0xc6/0x1f0
? process_one_work+0x1040/0x1040
? process_one_work+0x1040/0x1040
kthread+0x29e/0x340
? kthread_complete_and_exit+0x20/0x20
ret_from_fork+0x1f/0x30
</TASK>

Allocated by task 154099:
kasan_save_stack+0x1e/0x40
__kasan_kmalloc+0x81/0xa0
kmem_alloc+0x8d/0x2e0 [xfs]
xlog_cil_init+0x1f/0x540 [xfs]
xlog_alloc_log+0xd1e/0x1260 [xfs]
xfs_log_mount+0xba/0x640 [xfs]
xfs_mountfs+0xf2b/0x1d00 [xfs]
xfs_fs_fill_super+0x10af/0x1910 [xfs]
get_tree_bdev+0x383/0x670
vfs_get_tree+0x7d/0x240
path_mount+0xdb7/0x1890
__x64_sys_mount+0x1fa/0x270
do_syscall_64+0x2b/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

Freed by task 154151:
kasan_save_stack+0x1e/0x40
kasan_set_track+0x21/0x30
kasan_set_free_info+0x20/0x30
____kasan_slab_free+0x110/0x190
slab_free_freelist_hook+0xab/0x180
kfree+0xbc/0x310
xlog_dealloc_log+0x1b/0x2b0 [xfs]
xfs_unmountfs+0x119/0x200 [xfs]
xfs_fs_put_super+0x6e/0x2e0 [xfs]
generic_shutdown_super+0x12b/0x3a0
kill_block_super+0x95/0xd0
deactivate_locked_super+0x80/0x130
cleanup_mnt+0x329/0x4d0
task_work_run+0xc5/0x160
exit_to_user_mode_prepare+0xd4/0xe0
syscall_exit_to_user_mode+0x1d/0x40
entry_SYSCALL_64_after_hwframe+0x46/0xb0

This appears to be a race between the unmount process, which frees the
CIL and waits for in-flight iclog IO; and the iclog IO completion. When
generic/475 runs, it starts fsstress in the background, waits a few
seconds, and substitutes a dm-error device to simulate a disk falling
out of a machine. If the fsstress encounters EIO on a pure data write,
it will exit but the filesystem will still be online.

The next thing the test does is unmount the filesystem, which tries to
clean the log, free the CIL, and wait for iclog IO completion. If an
iclog was being written when the dm-error switch occurred, it can race
with log unmounting as follows:

Thread 1 Thread 2

xfs_log_unmount
xfs_log_clean
xfs_log_quiesce
xlog_ioend_work
<observe error>
xlog_force_shutdown
test_and_set_bit(XLOG_IOERROR)
xfs_log_force
<log is shut down, nop>
xfs_log_umount_write
<log is shut down, nop>
xlog_dealloc_log
xlog_cil_destroy
<wait for iclogs>
spin_lock(&log->l_cilp->xc_push_lock)
<KABOOM>

Therefore, free the CIL after waiting for the iclogs to complete. I
/think/ this race has existed for quite a few years now, though I don't
remember the ~2014 era logging code well enough to know if it was a real
threat then or if the actual race was exposed only more recently.

Fixes: ac983517ec59 ("xfs: don't sleep in xlog_cil_force_lsn on shutdown")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Dave Chinner <dchinner@redhat.com>

3y ago

Rodrigo Vivi

f9954629

Merge tag 'gvt-fixes-2022-07-11' of https://github.com/intel/gvt-linux into drm-intel-fixes

3y ago

Linus Torvalds

6bca047e

Merge tag 'block-5.19-2022-07-15' of git://git.kernel.dk/linux-block

3y ago

Changyuan Lyu

355bf2e0

scsi: pm80xx: Set stopped phy's linkrate to Disabled

3y ago

David Sterba

5b8418b8

Revert "btrfs: turn name_cache radix tree into XArray in send_ctx"

3y ago

Gao Chao

0f5de2f0

power: supply: ab8500_fg: add missing destroy_workqueue in ab8500_fg_probe

3y ago

Johan Hovold

039d4ed3

Input: usbtouchscreen - add driver_info sanity check

3y ago

Linus Torvalds

5867f3b8

Merge tag 'staging-5.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

3y ago

Samuel Holland

607a48c7

dt-bindings: dma: allwinner,sun50i-a64-dma: Fix min/max typo

3y ago

Thomas Zimmermann

84499c5d

drm/aperture: Run fbdev removal before internal helpers

3y ago

Jiri Slaby

3131ef39

x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit

3y ago

Linus Torvalds

34074da5

Merge tag 'for-5.19/parisc-4' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux

3y ago

Chuck Lever

a23dd544

SUNRPC: Fix READ_PLUS crasher

3y ago

Darrick J. Wong

8944c6fb

xfs: dont treat rt extents beyond EOF as eofblocks to be cleared

3y ago

Thomas Hellström

48da0f67

drm/i915: Fix vm use-after-free in vma destruction

3y ago

Linux 5.19-rc7 v5.19-rc7

ff699273

Linus Torvalds

Merge tag 'drm-intel-fixes-2022-07-17' of git://anongit.freedesktop.org/drm/drm-intel

55ea9bd6

Linus Torvalds

Merge tag 'perf-tools-fixes-for-v5.19-2022-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

f7f4da30

Linus Torvalds

drm/i915/ttm: fix 32b build

ced7866d

Matthew Auld

Merge tag 'perf_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2b18593e

Linus Torvalds

perf trace: Fix SIGSEGV when processing syscall args

On powerpc, 'perf trace' is crashing with a SIGSEGV when trying to
process a perf.data file created with 'perf trace record -p':

#0 0x00000001225b8988 in syscall_arg__scnprintf_augmented_string <snip> at builtin-trace.c:1492
#1 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1492
#2 syscall_arg__scnprintf_filename <snip> at builtin-trace.c:1486
#3 0x00000001225bdd9c in syscall_arg_fmt__scnprintf_val <snip> at builtin-trace.c:1973
#4 syscall__scnprintf_args <snip> at builtin-trace.c:2041
#5 0x00000001225bff04 in trace__sys_enter <snip> at builtin-trace.c:2319

That points to the below code in tools/perf/builtin-trace.c:
/*
* If this is raw_syscalls.sys_enter, then it always comes with the 6 possible
* arguments, even if the syscall being handled, say "openat", uses only 4 arguments
* this breaks syscall__augmented_args() check for augmented args, as we calculate
* syscall->args_size using each syscalls:sys_enter_NAME tracefs format file,
* so when handling, say the openat syscall, we end up getting 6 args for the
* raw_syscalls:sys_enter event, when we expected just 4, we end up mistakenly
* thinking that the extra 2 u64 args are the augmented filename, so just check
* here and avoid using augmented syscalls when the evsel is the raw_syscalls one.
*/
if (evsel != trace->syscalls.events.sys_enter)
augmented_args = syscall__augmented_args(sc, sample, &augmented_args_size, trace->raw_augmented_syscalls_args_size);

As the comment points out, we should not be trying to augment the args
for raw_syscalls. However, when processing a perf.data file, we are not
initializing those properly. Fix the same.

Reported-by: Claudio Carvalho <cclaudio@linux.ibm.com>
Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/20220707090900.572584-1-naveen.n.rao@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>