commits

tl;dr: CPUs with CPUID.80000008H but without CPUID.01H:EDX[CLFSH]
will end up reporting cache_line_size()==0 and bad things happen.
Fill in a default on those to avoid the problem.

Long Story:

The kernel dies a horrible death if c->x86_cache_alignment (aka.
cache_line_size() is 0. Normally, this value is populated from
c->x86_clflush_size.

Right now the code is set up to get c->x86_clflush_size from two
places. First, modern CPUs get it from CPUID. Old CPUs that don't
have leaf 0x80000008 (or CPUID at all) just get some sane defaults
from the kernel in get_cpu_address_sizes().

The vast majority of CPUs that have leaf 0x80000008 also get
->x86_clflush_size from CPUID. But there are oddballs.

Intel Quark CPUs[1] and others[2] have leaf 0x80000008 but don't set
CPUID.01H:EDX[CLFSH], so they skip over filling in ->x86_clflush_size:

cpuid(0x00000001, &tfms, &misc, &junk, &cap0);
if (cap0 & (1<<19))
c->x86_clflush_size = ((misc >> 8) & 0xff) * 8;

So they: land in get_cpu_address_sizes() and see that CPUID has level
0x80000008 and jump into the side of the if() that does not fill in
c->x86_clflush_size. That assigns a 0 to c->x86_cache_alignment, and
hilarity ensues in code like:

buffer = kzalloc(ALIGN(sizeof(*buffer), cache_line_size()),
GFP_KERNEL);

To fix this, always provide a sane value for ->x86_clflush_size.

Big thanks to Andy Shevchenko for finding and reporting this and also
providing a first pass at a fix. But his fix was only partial and only
worked on the Quark CPUs. It would not, for instance, have worked on
the QEMU config.

1. https://raw.githubusercontent.com/InstLatx64/InstLatx64/master/GenuineIntel/GenuineIntel0000590_Clanton_03_CPUID.txt
2. You can also get this behavior if you use "-cpu 486,+clzero"
in QEMU.

[ dhansen: remove 'vp_bits_from_cpuid' reference in changelog
because bpetkov brutally murdered it recently. ]

Fixes: fbf6449f84bf ("x86/sev-es: Set x86_virt_bits to the correct value straight away, instead of a two-phase approach")
Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Tested-by: Jörn Heusipp <osmanx@heusipp.de>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/all/20240516173928.3960193-1-andriy.shevchenko@linux.intel.com/
Link: https://lore.kernel.org/lkml/5e31cad3-ad4d-493e-ab07-724cfbfaba44@heusipp.de/
Link: https://lore.kernel.org/all/20240517200534.8EC5F33E%40davehans-spike.ostc.intel.com

2y ago

Niklas Cassel

86aaa7e9

ata: libata-core: Add ATA_HORKAGE_NOLPM for Crucial CT240BX500SSD1

2y ago

Linus Torvalds

00a8c352

Merge tag 'hardening-v6.10-rc2-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

2y ago

Jeff Johnson

dc8e5dfb

perf/x86/intel: Add missing MODULE_DESCRIPTION() lines

2y ago

Cheng Yu

49217ea1

sched/core: Fix incorrect initialization of the 'burst' parameter in cpu_max_write()

2y ago

Thomas Gleixner

34bf6bae

x86/topology/amd: Evaluate SMT in CPUID leaf 0x8000001e only on family 0x17 and greater

2y ago

Jason Nader

9e2f46cd

ata: ahci: Do not apply Intel PCS quirk on Intel Alder Lake

2y ago

Linus Torvalds

83814698

Merge tag 'powerpc-6.10-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

2y ago

Kees Cook

99a6087d

kunit/fortify: Remove __kmalloc_node() test

2y ago

Jeff Johnson

0a44078f

perf/x86/rapl: Add missing MODULE_DESCRIPTION() line

2y ago

Christian Loehle

7cb7fb5b

sched/fair: Remove stale FREQUENCY_UTIL comment

2y ago

Linus Torvalds

1613e604

Linux 6.10-rc1 v6.10-rc1

2y ago

Linus Torvalds

54bec8ed

Merge tag 'firewire-fixes-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394

2y ago

Samuel Holland

be2fc65d

powerpc: Limit ARCH_HAS_KERNEL_FPU_SUPPORT to PPC64

2y ago

Kees Cook

4e173c82

mailmap: update entry for Kees Cook

2y ago

Linus Torvalds

4a4be1ad

Revert "vfs: Delete the associated dentry when deleting a file"

2y ago

Dawei Li

72bffbf5

sched/fair: Fix initial util_avg calculation

2y ago

Kent Overstreet

9b0abe79

mm: percpu: Include smp.h in alloc_tag.h

2y ago

Linus Torvalds

89be4025

Merge tag '6.10-rc1-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

2y ago

Jeff Johnson

f94b7770

firewire: add missing MODULE_DESCRIPTION() to test modules

2y ago

Michael Ellerman

50934945

powerpc/uaccess: Use YZ asm constraint for ld

2y ago

Kees Cook

adb77bba

scsi: mpt3sas: Avoid possible run-time warning with long manufacturer strings

2y ago

Linus Torvalds

397a83ab

Merge tag '9p-for-6.10-rc2' of https://github.com/martinetd/linux

2y ago

Vitalii Bursov

0f1c74be

docs: cgroup-v1: Clarify that domain levels are system-specific

2y ago

Linus Torvalds

6fbf7185

Merge tag 'perf-tools-fixes-for-v6.10-1-2024-05-26' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

2y ago

Linus Torvalds

ec9eeb89

Merge tag 'kbuild-fixes-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild

2y ago

Steve French

518549c1

cifs: fix creating sockets when using sfu mount options

2y ago

Michael Ellerman

2d43cc70

powerpc/uaccess: Fix build errors seen with GCC 13/14

Building ppc64le_defconfig with GCC 14 fails with assembler errors:

CC fs/readdir.o
/tmp/ccdQn0mD.s: Assembler messages:
/tmp/ccdQn0mD.s:212: Error: operand out of domain (18 is not a multiple of 4)
/tmp/ccdQn0mD.s:226: Error: operand out of domain (18 is not a multiple of 4)
... [6 lines]
/tmp/ccdQn0mD.s:1699: Error: operand out of domain (18 is not a multiple of 4)

A snippet of the asm shows:

# ../fs/readdir.c:210: unsafe_copy_dirent_name(dirent->d_name, name, namlen, efault_end);
ld 9,0(29) # MEM[(u64 *)name_38(D) + _88 * 1], MEM[(u64 *)name_38(D) + _88 * 1]
# 210 "../fs/readdir.c" 1
1: std 9,18(8) # put_user # *__pus_addr_52, MEM[(u64 *)name_38(D) + _88 * 1]

The 'std' instruction requires a 4-byte aligned displacement because
it is a DS-form instruction, and as the assembler says, 18 is not a
multiple of 4.

A similar error is seen with GCC 13 and CONFIG_UBSAN_SIGNED_WRAP=y.

The fix is to change the constraint on the memory operand to put_user(),
from "m" which is a general memory reference to "YZ".

The "Z" constraint is documented in the GCC manual PowerPC machine
constraints, and specifies a "memory operand accessed with indexed or
indirect addressing". "Y" is not documented in the manual but specifies
a "memory operand for a DS-form instruction". Using both allows the
compiler to generate a DS-form "std" or X-form "stdx" as appropriate.

The change has to be conditional on CONFIG_PPC_KERNEL_PREFIXED because
the "Y" constraint does not guarantee 4-byte alignment when prefixed
instructions are enabled.

Unfortunately clang doesn't support the "Y" constraint so that has to be
behind an ifdef.

Although the build error is only seen with GCC 13/14, that appears
to just be luck. The constraint has been incorrect since it was first
added.

Fixes: c20beffeec3c ("powerpc/uaccess: Use flexible addressing with __put_user()/__get_user()")
Cc: stable@vger.kernel.org # v5.10+
Suggested-by: Kewen Lin <linkw@gcc.gnu.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240529123029.146953-1-mpe@ellerman.id.au

2y ago

Linus Torvalds

db163660

Merge tag 'v6.10-p3' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

2y ago

Dominique Martinet

c898afdc

9p: add missing locking around taking dentry fid list

2y ago

Vitalii Bursov

287372fa

sched/debug: Dump domains' level

2y ago

Linus Torvalds

c1332049

Merge tag '6.10-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6

2y ago

Arnaldo Carvalho de Melo

4f1b0673

Revert "perf parse-events: Prefer sysfs/JSON hardware events over legacy"

2y ago

Linus Torvalds

bbeb1219

Merge tag 'xfs-6.10-fixes-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

2y ago

Rob Herring (Arm)

1b1c9f0f

dt-bindings: kbuild: Fix dt_binding_check on unconfigured build

2y ago

Jeff Johnson

9ee267a2

fs: smb: common: add missing MODULE_DESCRIPTION() macros

2y ago

Nathan Lynch

12870ae3

powerpc/pseries/lparcfg: drop error message from guest name lookup

2y ago

Linus Torvalds

e0cce98f

Merge tag 'tpmdd-next-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

2y ago

Herbert Xu

67ec8cdf

hwrng: core - Remove add_early_randomness

2y ago

Nikita Zhandarovich

25460d6f

net/9p: fix uninit-value in p9_client_rpc()

Syzbot with the help of KMSAN reported the following error:

BUG: KMSAN: uninit-value in trace_9p_client_res include/trace/events/9p.h:146 [inline]
BUG: KMSAN: uninit-value in p9_client_rpc+0x1314/0x1340 net/9p/client.c:754
trace_9p_client_res include/trace/events/9p.h:146 [inline]
p9_client_rpc+0x1314/0x1340 net/9p/client.c:754
p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031
v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410
v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122
legacy_get_tree+0x114/0x290 fs/fs_context.c:662
vfs_get_tree+0xa7/0x570 fs/super.c:1797
do_new_mount+0x71f/0x15e0 fs/namespace.c:3352
path_mount+0x742/0x1f20 fs/namespace.c:3679
do_mount fs/namespace.c:3692 [inline]
__do_sys_mount fs/namespace.c:3898 [inline]
__se_sys_mount+0x725/0x810 fs/namespace.c:3875
__x64_sys_mount+0xe4/0x150 fs/namespace.c:3875
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75

Uninit was created at:
__alloc_pages+0x9d6/0xe70 mm/page_alloc.c:4598
__alloc_pages_node include/linux/gfp.h:238 [inline]
alloc_pages_node include/linux/gfp.h:261 [inline]
alloc_slab_page mm/slub.c:2175 [inline]
allocate_slab mm/slub.c:2338 [inline]
new_slab+0x2de/0x1400 mm/slub.c:2391
___slab_alloc+0x1184/0x33d0 mm/slub.c:3525
__slab_alloc mm/slub.c:3610 [inline]
__slab_alloc_node mm/slub.c:3663 [inline]
slab_alloc_node mm/slub.c:3835 [inline]
kmem_cache_alloc+0x6d3/0xbe0 mm/slub.c:3852
p9_tag_alloc net/9p/client.c:278 [inline]
p9_client_prepare_req+0x20a/0x1770 net/9p/client.c:641
p9_client_rpc+0x27e/0x1340 net/9p/client.c:688
p9_client_create+0x1551/0x1ff0 net/9p/client.c:1031
v9fs_session_init+0x1b9/0x28e0 fs/9p/v9fs.c:410
v9fs_mount+0xe2/0x12b0 fs/9p/vfs_super.c:122
legacy_get_tree+0x114/0x290 fs/fs_context.c:662
vfs_get_tree+0xa7/0x570 fs/super.c:1797
do_new_mount+0x71f/0x15e0 fs/namespace.c:3352
path_mount+0x742/0x1f20 fs/namespace.c:3679
do_mount fs/namespace.c:3692 [inline]
__do_sys_mount fs/namespace.c:3898 [inline]
__se_sys_mount+0x725/0x810 fs/namespace.c:3875
__x64_sys_mount+0xe4/0x150 fs/namespace.c:3875
do_syscall_64+0xd5/0x1f0
entry_SYSCALL_64_after_hwframe+0x6d/0x75

If p9_check_errors() fails early in p9_client_rpc(), req->rc.tag
will not be properly initialized. However, trace_9p_client_res()
ends up trying to print it out anyway before p9_client_rpc()
finishes.

Fix this issue by assigning default values to p9_fcall fields
such as 'tag' and (just in case KMSAN unearths something new) 'id'
during the tag allocation stage.

Reported-and-tested-by: syzbot+ff14db38f56329ef68df@syzkaller.appspotmail.com
Fixes: 348b59012e5c ("net/9p: Convert net/9p protocol dumps to tracepoints")
Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru>
Reviewed-by: Christian Schoenebeck <linux_oss@crudebyte.com>
Cc: stable@vger.kernel.org
Message-ID: <20240408141039.30428-1-n.zhandarovich@fintech.ru>
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>

2y ago

Vitalii Bursov

a1fd0b9d

sched/fair: Allow disabling sched_balance_newidle with sched_relax_domain_level

2y ago

Linus Torvalds

9b62e02e

Merge tag 'mm-hotfixes-stable-2024-05-25-09-13' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

2y ago

David Howells

93a43155

cifs: Fix missing set of remote_i_size

2y ago

Linus Torvalds

29c73fc7

Merge tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:
"General:

- Integrate the shellcheck utility with the build of perf to allow
catching shell problems early in areas such as 'perf test', 'perf
trace' scrape scripts, etc

- Add 'uretprobe' variant in the 'perf bench uprobe' tool

- Add script to run instances of 'perf script' in parallel

- Allow parsing tracepoint names that start with digits, such as
9p/9p_client_req, etc. Make sure 'perf test' tests it even on
systems where those tracepoints aren't available

- Add Kan Liang to MAINTAINERS as a perf tools reviewer

- Add support for using the 'capstone' disassembler library in
various tools, such as 'perf script' and 'perf annotate'. This is
an alternative for the use of the 'xed' and 'objdump' disassemblers

Data-type profiling improvements:

- Resolve types for a->b->c by backtracking the assignments until it
finds DWARF info for one of those members

- Support for global variables, keeping a cache to speed up lookups

- Handle the 'call' instruction, dealing with effects on registers
and handling its return when tracking register data types

- Handle x86's segment based addressing like %gs:0x28, to support
things like per CPU variables, the stack canary, etc

- Data-type profiling got big speedups when using capstone for
disassembling. The objdump outoput parsing method is left as a
fallback when capstone fails or isn't available. There are patches
posted for 6.11 that to use a LLVM disassembler

- Support event group display in the TUI when annotating types with
--data-type, for instance to show memory load and store events for
the data type fields

- Optimize the 'perf annotate' data structures, reducing memory usage

- Add a initial 'perf test' for 'perf annotate', checking that a
target symbol appears on the output, specifying objdump via the
command line, etc

Vendor Events:

- Update Intel JSON files for Cascade Lake X, Emerald Rapids, Grand
Ridge, Ice Lake X, Lunar Lake, Meteor Lake, Sapphire Rapids, Sierra
Forest, Sky Lake X, Sky Lake and Snow Ridge X. Remove info metrics
erroneously in TopdownL1

- Add AMD's Zen 5 core and uncore events and metrics. Those come from
the "Performance Monitor Counters for AMD Family 1Ah Model 00h- 0Fh
Processors" document, with events that capture information on op
dispatch, execution and retirement, branch prediction, L1 and L2
cache activity, TLB activity, etc

- Mark L1D_CACHE_INVAL impacted by errata for ARM64's AmpereOne/
AmpereOneX

Miscellaneous:

- Sync header copies with the kernel sources

- Move some header copies used only for generating translation string
tables for ioctl cmds and other syscall integer arguments to a new
directory under tools/perf/beauty/, to separate from copies in
tools/include/ that are used to build the tools

- Introduce scrape script for several syscall 'flags'/'mask'
arguments

- Improve cpumap utilization, fixing up pairing of refcounts, using
the right iterators (perf_cpu_map__for_each_cpu), etc

- Give more details about raw event encodings in 'perf list', show
tracepoint encoding in the detailed output

- Refactor the DSOs handling code, reducing memory usage

- Document the BPF event modifier and add a 'perf test' for it

- Improve the event parser, better error messages and add further
'perf test's for it

- Add reference count checking to 'struct comm_str' and 'struct
mem_info'

- Make ARM64's 'perf test' entries for the Neoverse N1 more robust

- Tweak the ARM64's Coresight 'perf test's

- Improve ARM64's CoreSight ETM version detection and error reporting

- Fix handling of symbols when using kcore

- Fix PAI (Processor Activity Instrumentation) counter names for s390
virtual machines in 'perf report'

- Fix -g/--call-graph option failure in 'perf sched timehist'

- Add LIBTRACEEVENT_DIR build option to allow building with
libtraceevent installed in non-standard directories, such as when
doing cross builds

- Various 'perf test' and 'perf bench' fixes

- Improve 'perf probe' error message for long C++ probe names"

* tag 'perf-tools-for-v6.10-1-2024-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (260 commits)
tools lib subcmd: Show parent options in help
perf pmu: Count sys and cpuid JSON events separately
perf stat: Don't display metric header for non-leader uncore events
perf annotate-data: Ensure the number of type histograms
perf annotate: Fix segfault on sample histogram
perf daemon: Fix file leak in daemon_session__control
libsubcmd: Fix parse-options memory leak
perf lock: Avoid memory leaks from strdup()
perf sched: Rename 'switches' column header to 'count' and add usage description, options for latency
perf tools: Ignore deleted cgroups
perf parse: Allow tracepoint names to start with digits
perf parse-events: Add new 'fake_tp' parameter for tests
perf parse-events: pass parse_state to add_tracepoint
perf symbols: Fix ownership of string in dso__load_vmlinux()
perf symbols: Update kcore map before merging in remaining symbols
perf maps: Re-use __maps__free_maps_by_name()
perf symbols: Remove map from list before updating addresses
perf tracepoint: Don't scan all tracepoints to test if one exists
perf dwarf-aux: Fix build with HAVE_DWARF_CFI_SUPPORT
perf thread: Fixes to thread__new() related to initializing comm
...

2y ago

Linus Torvalds

f26ee67a

Merge tag 'tty-6.10-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

2y ago

Ritesh Harjani (IBM)

b0c6bcd5

xfs: Add cond_resched to block unmap range and reflink remap path

An async dio write to a sparse file can generate a lot of extents
and when we unlink this file (using rm), the kernel can be busy in umapping
and freeing those extents as part of transaction processing.

Similarly xfs reflink remapping path can also iterate over a million
extent entries in xfs_reflink_remap_blocks().

Since we can busy loop in these two functions, so let's add cond_resched()
to avoid softlockup messages like these.

watchdog: BUG: soft lockup - CPU#1 stuck for 22s! [kworker/1:0:82435]
CPU: 1 PID: 82435 Comm: kworker/1:0 Tainted: G S L 6.9.0-rc5-0-default #1
Workqueue: xfs-inodegc/sda2 xfs_inodegc_worker
NIP [c000000000beea10] xfs_extent_busy_trim+0x100/0x290
LR [c000000000bee958] xfs_extent_busy_trim+0x48/0x290
Call Trace:
xfs_alloc_get_rec+0x54/0x1b0 (unreliable)
xfs_alloc_compute_aligned+0x5c/0x144
xfs_alloc_ag_vextent_size+0x238/0x8d4
xfs_alloc_fix_freelist+0x540/0x694
xfs_free_extent_fix_freelist+0x84/0xe0
__xfs_free_extent+0x74/0x1ec
xfs_extent_free_finish_item+0xcc/0x214
xfs_defer_finish_one+0x194/0x388
xfs_defer_finish_noroll+0x1b4/0x5c8
xfs_defer_finish+0x2c/0xc4
xfs_bunmapi_range+0xa4/0x100
xfs_itruncate_extents_flags+0x1b8/0x2f4
xfs_inactive_truncate+0xe0/0x124
xfs_inactive+0x30c/0x3e0
xfs_inodegc_worker+0x140/0x234
process_scheduled_works+0x240/0x57c
worker_thread+0x198/0x468
kthread+0x138/0x140
start_kernel_thread+0x14/0x18

run fstests generic/175 at 2024-02-02 04:40:21
[ C17] watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
watchdog: BUG: soft lockup - CPU#17 stuck for 23s! [xfs_io:7679]
CPU: 17 PID: 7679 Comm: xfs_io Kdump: loaded Tainted: G X 6.4.0
NIP [c008000005e3ec94] xfs_rmapbt_diff_two_keys+0x54/0xe0 [xfs]
LR [c008000005e08798] xfs_btree_get_leaf_keys+0x110/0x1e0 [xfs]
Call Trace:
0xc000000014107c00 (unreliable)
__xfs_btree_updkeys+0x8c/0x2c0 [xfs]
xfs_btree_update_keys+0x150/0x170 [xfs]
xfs_btree_lshift+0x534/0x660 [xfs]
xfs_btree_make_block_unfull+0x19c/0x240 [xfs]
xfs_btree_insrec+0x4e4/0x630 [xfs]
xfs_btree_insert+0x104/0x2d0 [xfs]
xfs_rmap_insert+0xc4/0x260 [xfs]
xfs_rmap_map_shared+0x228/0x630 [xfs]
xfs_rmap_finish_one+0x2d4/0x350 [xfs]
xfs_rmap_update_finish_item+0x44/0xc0 [xfs]
xfs_defer_finish_noroll+0x2e4/0x740 [xfs]
__xfs_trans_commit+0x1f4/0x400 [xfs]
xfs_reflink_remap_extent+0x2d8/0x650 [xfs]
xfs_reflink_remap_blocks+0x154/0x320 [xfs]
xfs_file_remap_range+0x138/0x3a0 [xfs]
do_clone_file_range+0x11c/0x2f0
vfs_clone_file_range+0x60/0x1c0
ioctl_file_clone+0x78/0x140
sys_ioctl+0x934/0x1270
system_call_exception+0x158/0x320
system_call_vectored_common+0x15c/0x2ec

Cc: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Reviewed-by: Darrick J. Wong <djwong@kernel.org>
Tested-by: Disha Goel<disgoel@linux.ibm.com>
Signed-off-by: Chandan Babu R <chandanbabu@kernel.org>

2y ago

Miguel Ojeda

6e58e017

kheaders: use `command -v` to test for existence of `cpio`

2y ago

Puranjay Mohan

b1e7cee9

powerpc/bpf: enforce full ordering for ATOMIC operations with BPF_FETCH

The Linux Kernel Memory Model [1][2] requires RMW operations that have a
return value to be fully ordered.

BPF atomic operations with BPF_FETCH (including BPF_XCHG and
BPF_CMPXCHG) return a value back so they need to be JITed to fully
ordered operations. POWERPC currently emits relaxed operations for
these.

We can show this by running the following litmus-test:

PPC SB+atomic_add+fetch

{
0:r0=x; (* dst reg assuming offset is 0 *)
0:r1=2; (* src reg *)
0:r2=1;
0:r4=y; (* P0 writes to this, P1 reads this *)
0:r5=z; (* P1 writes to this, P0 reads this *)
0:r6=0;

1:r2=1;
1:r4=y;
1:r5=z;
}

P0 | P1 ;
stw r2, 0(r4) | stw r2,0(r5) ;
| ;
loop:lwarx r3, r6, r0 | ;
mr r8, r3 | ;
add r3, r3, r1 | sync ;
stwcx. r3, r6, r0 | ;
bne loop | ;
mr r1, r8 | ;
| ;
lwa r7, 0(r5) | lwa r7,0(r4) ;

~exists(0:r7=0 /\ 1:r7=0)

Witnesses
Positive: 9 Negative: 3
Condition ~exists (0:r7=0 /\ 1:r7=0)
Observation SB+atomic_add+fetch Sometimes 3 9

This test shows that the older store in P0 is reordered with a newer
load to a different address. Although there is a RMW operation with
fetch between them. Adding a sync before and after RMW fixes the issue:

Witnesses
Positive: 9 Negative: 0
Condition ~exists (0:r7=0 /\ 1:r7=0)
Observation SB+atomic_add+fetch Never 0 9

[1] https://www.kernel.org/doc/Documentation/memory-barriers.txt
[2] https://www.kernel.org/doc/Documentation/atomic_t.txt

Fixes: aea7ef8a82c0 ("powerpc/bpf/32: add support for BPF_ATOMIC bitwise operations")
Fixes: 2d9206b22743 ("powerpc/bpf/32: Add instructions for atomic_[cmp]xchg")
Fixes: dbe6e2456fb0 ("powerpc/bpf/64: add support for atomic fetch operations")
Fixes: 1e82dfaa7819 ("powerpc/bpf/64: Add instructions for atomic_[cmp]xchg")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Naveen N Rao <naveen@kernel.org>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/20240513100248.110535-1-puranjay@kernel.org